Last edited by b0207191 on 2023-6-17 09:55
I followed the method on the page below to log in to a website with Scrapy and scrape pages:
Scrapy 进行简单的自动登录_51CTO博客_scrapy crawl
But it throws an error:
2023-06-17 09:31:27 [scrapy.core.scraper] ERROR: Spider error processing <POST https://网站> (referer: https://网站)
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/scrapy/utils/defer.py", line 74, in mustbe_deferred
    result = f(*args, **kw)
  File "/root/miniconda3/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 94, in _process_spider_input
    return scrape_func(response, request, spider)
  File "/root/miniconda3/lib/python3.8/site-packages/scrapy/core/scraper.py", line 209, in call_spider
    warn_on_generator_with_return_value(spider, callback)
  File "/root/miniconda3/lib/python3.8/site-packages/scrapy/utils/misc.py", line 263, in warn_on_generator_with_return_value
    if is_generator_with_return_value(callable):
  File "/root/miniconda3/lib/python3.8/site-packages/scrapy/utils/misc.py", line 239, in is_generator_with_return_value
    src = inspect.getsource(func)
  File "/root/miniconda3/lib/python3.8/inspect.py", line 985, in getsource
    lines, lnum = getsourcelines(object)
  File "/root/miniconda3/lib/python3.8/inspect.py", line 967, in getsourcelines
    lines, lnum = findsource(object)
  File "/root/miniconda3/lib/python3.8/inspect.py", line 794, in findsource
    lines = linecache.getlines(file, module.__dict__)
  File "/root/miniconda3/lib/python3.8/linecache.py", line 47, in getlines
    return updatecache(filename, module_globals)
  File "/root/miniconda3/lib/python3.8/linecache.py", line 137, in updatecache
    lines = fp.readlines()
  File "/root/miniconda3/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 380: invalid continuation byte
Looking into it, I suspected it was because the page is GBK-encoded, so I went looking for where to set the encoding. I first added print statements at the entry and exit of every function, and found that the exception is thrown after the parse function finishes, before the next function is entered. How can I fix this?
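Reading the traceback again, the decode failure happens inside inspect.getsource()/linecache while Scrapy's warn_on_generator_with_return_value re-reads the spider's .py source file as UTF-8, not while handling the response. So my guess is the source file itself contains non-UTF-8 (e.g. GBK) bytes. A minimal stdlib-only reproduction of the same kind of error (the string 编码测试 is just an arbitrary example):

```python
# GBK-encoded bytes fail to decode as UTF-8 with a UnicodeDecodeError,
# just like the one in the traceback above.
data = "编码测试".encode("gbk")   # arbitrary Chinese text, GBK-encoded

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 decode failed:", exc.reason)

# Decoding with the correct codec works fine.
print(data.decode("gbk"))
```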
I also tried adding encoding="GBK" to scrapy.FormRequest, but it made no difference:
callback=self.next,
formdata=post_data,
encoding="GBK"