V2EX = way to explore
V2EX 是一个关于分享和探索的地方
现在注册
已注册用户请  登录
推荐学习书目
Learn Python the Hard Way
Python Sites
PyPI - Python Package Index
http://diveintopython.org/toc/index.html
Pocoo
值得关注的项目
PyPy
Celery
Jinja2
Read the Docs
gevent
pyenv
virtualenv
Stackless Python
Beautiful Soup
结巴中文分词
Green Unicorn
Sentry
Shovel
Pyflakes
pytest
Python 编程
pep8 Checker
Styles
PEP 8
Google Python Style Guide
Code Style from The Hitchhiker's Guide
Ewig
V2EX  ›  Python

scrapy 下载管道报错 301

  •  
  •   Ewig · 2019-01-01 19:27:49 +08:00 · 2046 次点击
    这是一个创建于 2157 天前的主题,其中的信息可能已经有所发展或是发生改变。

    7882 2019-01-01 19:21:26 [searchwww][scrapy.core.engine] INFO: Spider opened 7883 2019-01-01 19:21:26 [searchwww][scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 7884 2019-01-01 19:21:26 [searchwww][scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6031 7885 2019-01-01 19:21:38 [searchwww][scrapy.core.engine] DEBUG: Crawled (200) <GET https://searchwww.sec.gov/EDGARFSClient/jsp/EDGAR_MainAccess.jsp?search_text=F-1+ for&sort=Date&startDoc=101&numResults=100&isAdv=true&formType=FormF1&fromDate=mm/dd/yyyy&toDate=mm/dd/yyyy&stemming=true> (referer: None) 7886 2019-01-01 19:21:38 [searchwww][scrapy.core.engine] DEBUG: Crawled (301) <GET http://www.sec.gov/Archives/edgar/data/1747624/000121390018017885/ff12018_fitboxxholdings.htm> (referer: None) 7887 2019-01-01 19:21:38 [searchwww][scrapy.pipelines.files] WARNING: File (code: 301): Error downloading file from <GET http://www.sec.gov/Archives/edgar/data/1747624/000121390018017885/ ff12018_fitboxxholdings.htm> referred in <none></none>

    from scrapy.pipelines.files import FilesPipeline from scrapy import Request

    class download_pipeline(FilesPipeline):

    def file_path(self, request, response=None, info=None):
        return request.meta.get('filename', '')
    
    def get_media_requests(self, item, info):
        file_url = item['file_url']
        meta = {'filename': item['name']}
        yield Request(url=file_url, meta=meta)
    

    这个在下载的管道里面总是报错 301 求指教

    4 条回复    2019-01-02 19:19:06 +08:00
    wellCh4n
        1
    wellCh4n  
       2019-01-02 10:29:54 +08:00
    被重定向了吗?
    Ewig
        2
    Ewig  
    OP
       2019-01-02 17:39:23 +08:00
    @wellCh4n 如何解决
    wellCh4n
        3
    wellCh4n  
       2019-01-02 18:52:12 +08:00
    @Ewig #2 这个是服务端行为啊,你可以看下为什么被重定向了,在 response 里面看下被重定向到了哪个地址
    Ewig
        4
    Ewig  
    OP
       2019-01-02 19:19:06 +08:00
    这个主要是框架做的,yield 回去的,我用正常的 request 就没有问题,搞不懂
    关于   ·   帮助文档   ·   博客   ·   API   ·   FAQ   ·   实用小工具   ·   5559 人在线   最高记录 6679   ·     Select Language
    创意工作者们的社区
    World is powered by solitude
    VERSION: 3.9.8.5 · 24ms · UTC 06:51 · PVG 14:51 · LAX 22:51 · JFK 01:51
    Developed with CodeLauncher
    ♥ Do have faith in what you're doing.