Views: 1187 | Replies: 0

[Software] A question about Scrapy

Posted on 2023-6-17 09:40
Last edited by b0207191 on 2023-6-17 09:55

I followed the method on the page below to log in to a site and scrape its pages:

"Scrapy 进行简单的自动登录" (Simple automatic login with Scrapy), 51CTO Blog

But it fails with this error:

    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 380: invalid continuation byte

    2023-06-17 09:31:27 [scrapy.core.scraper] ERROR: Spider error processing <POST https://网站> (referer: https://网站)
    Traceback (most recent call last):
      File "/root/miniconda3/lib/python3.8/site-packages/scrapy/utils/defer.py", line 74, in mustbe_deferred
        result = f(*args, **kw)
      File "/root/miniconda3/lib/python3.8/site-packages/scrapy/core/spidermw.py", line 94, in _process_spider_input
        return scrape_func(response, request, spider)
      File "/root/miniconda3/lib/python3.8/site-packages/scrapy/core/scraper.py", line 209, in call_spider
        warn_on_generator_with_return_value(spider, callback)
      File "/root/miniconda3/lib/python3.8/site-packages/scrapy/utils/misc.py", line 263, in warn_on_generator_with_return_value
        if is_generator_with_return_value(callable):
      File "/root/miniconda3/lib/python3.8/site-packages/scrapy/utils/misc.py", line 239, in is_generator_with_return_value
        src = inspect.getsource(func)
      File "/root/miniconda3/lib/python3.8/inspect.py", line 985, in getsource
        lines, lnum = getsourcelines(object)
      File "/root/miniconda3/lib/python3.8/inspect.py", line 967, in getsourcelines
        lines, lnum = findsource(object)
      File "/root/miniconda3/lib/python3.8/inspect.py", line 794, in findsource
        lines = linecache.getlines(file, module.__dict__)
      File "/root/miniconda3/lib/python3.8/linecache.py", line 47, in getlines
        return updatecache(filename, module_globals)
      File "/root/miniconda3/lib/python3.8/linecache.py", line 137, in updatecache
        lines = fp.readlines()
      File "/root/miniconda3/lib/python3.8/codecs.py", line 322, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 380: invalid continuation byte
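The traceback shows that before the callback runs, Scrapy passes it to warn_on_generator_with_return_value, which calls is_generator_with_return_value and, through inspect.getsource, reads the callback's source code. The purpose of that check is to warn about generator callbacks that also return a value. Below is a simplified, hypothetical sketch of such a check (not Scrapy's actual code); it works on source text so that it does not need to read any file, and the example callbacks are made up:

```python
import ast
import textwrap


def generator_returns_value(source: str) -> bool:
    """Simplified sketch, not Scrapy's implementation: return True if the
    first function defined in `source` both yields and returns a value.
    Nested functions are not excluded here, unlike a careful check."""
    func = ast.parse(textwrap.dedent(source)).body[0]
    has_yield = False
    has_return_value = False
    for node in ast.walk(func):
        if isinstance(node, (ast.Yield, ast.YieldFrom)):
            has_yield = True
        elif isinstance(node, ast.Return) and node.value is not None:
            has_return_value = True
    return has_yield and has_return_value


# Hypothetical callbacks, given as source text.
returns_value = '''
    def next(self, response):
        yield {"title": response.css("title::text").get()}
        return "done"
'''
plain_generator = '''
    def next(self, response):
        yield {"title": response.css("title::text").get()}
'''

print(generator_returns_value(returns_value))    # True
print(generator_returns_value(plain_generator))  # False
```

Because Scrapy runs its version of this check on the callback before calling it, a failure while reading the callback's source file surfaces after parse has returned but before next starts.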

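Looking into it, I suspected the page is GBK-encoded, so I went looking for where to set the encoding. First I added print statements at the entry and exit of every function.

That showed the exception is thrown after the parse function finishes but before the next function is entered. How can I fix this?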
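The last frames of the traceback are linecache.updatecache calling fp.readlines(), so the bytes that fail to decode come from a .py file being read for inspect.getsource (presumably the file that defines the next callback), not from the HTTP response. In Python 3.8, linecache opens source files with tokenize.open, which uses a PEP 263 coding declaration if one is present and UTF-8 otherwise. A minimal sketch, assuming (the post does not show the spider file) that the file contains GBK-encoded text such as Chinese comments and no coding declaration; the file contents and helper names below are hypothetical:

```python
import os
import tempfile
import tokenize


def read_source_lines(path):
    """Read a .py file the way Python 3.8's linecache does: tokenize.open()
    uses a PEP 263 coding declaration if present, otherwise UTF-8."""
    with tokenize.open(path) as fp:
        return fp.readlines()


def write_temp_py(data: bytes) -> str:
    """Write raw bytes to a temporary .py file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


gbk_comment = "# 中文注释\n".encode("gbk")  # hypothetical GBK-encoded comment

# Case 1: GBK bytes and no coding declaration, so the file is decoded as UTF-8.
no_cookie = write_temp_py(b"x = 1\n" + gbk_comment)
# Case 2: the same bytes, but line 1 declares the encoding.
with_cookie = write_temp_py(b"# -*- coding: gbk -*-\nx = 1\n" + gbk_comment)

try:
    read_source_lines(no_cookie)
    error = None
except UnicodeDecodeError as exc:
    error = exc

lines = read_source_lines(with_cookie)

print(error.encoding, error.reason)  # utf-8 invalid continuation byte
print(len(lines))                    # 3

os.remove(no_cookie)
os.remove(with_cookie)
```

If that assumption holds, re-saving the spider file as UTF-8 (or declaring its encoding on the first line, as in case 2) would be the first thing to try; an encoding setting on the request does not take part in this read.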

I also tried passing encoding="GBK" to scrapy.FormRequest, but that didn't help either:

    scrapy.FormRequest(
        ...,  # URL and other arguments not shown in the post
        callback=self.next,
        formdata=post_data,
        encoding="GBK",
    )
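For context: in Scrapy, the encoding argument of a FormRequest is used to build the request itself (FormRequest url-encodes formdata with it); as far as I know it is not used to decode the response. Its effect on the POST body can be sketched with the standard library's urllib.parse.urlencode, used here as a stand-in for what FormRequest does internally; the field name username and its value are hypothetical:

```python
from urllib.parse import urlencode

# Hypothetical login form data containing non-ASCII text.
post_data = {"username": "中文"}

# The POST body when the form is encoded as UTF-8 vs GBK.
body_utf8 = urlencode(post_data, encoding="utf-8")
body_gbk = urlencode(post_data, encoding="gbk")

print(body_utf8)  # username=%E4%B8%AD%E6%96%87
print(body_gbk)   # username=%D6%D0%CE%C4
```

So the encoding argument changes which bytes the server receives for non-ASCII form fields, which can matter for a GBK site's login form, but it is a separate step from the source-file read where the traceback fails.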

Stage1st