A little crawler for Pixiv illustrations ( ̄ y▽ ̄)╭ Ohohoho.....

    pwcong · 2016-10-24 17:03:57 +08:00 · 3172 views
    1. python main.py
    2. Enter your username, password, and the folder to save to
    3. Choose which ranking's illustrations to download
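
    The three steps above amount to an interactive prompt flow. Here is a minimal sketch of what such a flow could look like. It assumes nothing about the actual main.py: the function name `collect_settings`, the prompt texts, and the ranking names in `RANKING_MODES` are all hypothetical. Only the idea of picking a ranking by number mirrors the `query_mode[int(qmNo)]` call visible in a traceback further down this thread.

```python
import getpass

# Hypothetical ranking names; the real project defines its own list.
RANKING_MODES = ["daily", "weekly", "monthly"]


def collect_settings(ask=input, ask_secret=getpass.getpass):
    """Prompt for credentials, a save folder and a ranking; return them as a dict."""
    username = ask("Username: ").strip()
    password = ask_secret("Password: ")
    save_dir = ask("Save folder: ").strip()

    # Show a numbered menu and keep asking until the choice is a valid index.
    for index, name in enumerate(RANKING_MODES):
        print("{}: {}".format(index, name))
    while True:
        choice = ask("Ranking number: ").strip()
        if choice.isdecimal() and int(choice) < len(RANKING_MODES):
            break
        print("Please enter one of the numbers listed above.")

    return {
        "username": username,
        "password": password,
        "save_dir": save_dir,
        "mode": RANKING_MODES[int(choice)],
    }
```

    Calling `collect_settings()` with no arguments prompts on the terminal. The two prompt callables are parameters only so that the flow can be exercised without a keyboard.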

    Today's rankings really are full of treats (●ˇ∀ˇ●)

    github: https://github.com/pwcong/PixivCrawler

    If you like this little crawler, feel free to give me a Star, ha

    8 replies · 2016-11-02 09:20:22 +08:00
    seewhy · #1 · 2016-10-24 19:45:12 +08:00
    Start~~
    pwcong (OP) · #2 · 2016-10-24 22:37:47 +08:00
    @seewhy つ﹏⊂
    402645707 · #3 · 2016-10-30 10:45:38 +08:00
    Hi OP, I'd like to package this with docker so it's easy to use on my Synology.
    At runtime it reports this:

    Traceback (most recent call last):
    File "/usr/local/lib/python3.5/urllib/request.py", line 1254, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
    File "/usr/local/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
    File "/usr/local/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
    File "/usr/local/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
    File "/usr/local/lib/python3.5/http/client.py", line 934, in _send_output
    self.send(msg)
    File "/usr/local/lib/python3.5/http/client.py", line 877, in send
    self.connect()
    File "/usr/local/lib/python3.5/http/client.py", line 849, in connect
    (self.host,self.port), self.timeout, self.source_address)
    File "/usr/local/lib/python3.5/socket.py", line 693, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
    File "/usr/local/lib/python3.5/socket.py", line 732, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno -2] Name or service not known

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
    File "man.py", line 25, in <module>
    query_tt = RankingCrawler.download_first(opener, RankingCrawler.query_mode[int(qmNo)], saveDir)
    File "/usr/src/app/crawler/RankingCrawler.py", line 119, in download_first
    with op.open(visit) as f:
    File "/usr/local/lib/python3.5/urllib/request.py", line 466, in open
    response = self._open(req, data)
    File "/usr/local/lib/python3.5/urllib/request.py", line 484, in _open
    '_open', req)
    File "/usr/local/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
    File "/usr/local/lib/python3.5/urllib/request.py", line 1282, in http_open
    return self.do_open( http.client.HTTPConnection, req)
    File "/usr/local/lib/python3.5/urllib/request.py", line 1256, in do_open
    raise URLError(err)
    urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>

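    I tried changing resolv.conf to 114, but still got the same error.
    Pinging pixiv works too, so it doesn't feel like a network problem.
    Could you point out which runtime library I haven't installed or have misconfigured?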
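
    `[Errno -2] Name or service not known` is raised by `getaddrinfo`, meaning the Python process itself could not resolve a host name. Where `ping` was run is not stated, and resolution inside a container can differ from the host. Below is a minimal sketch for checking resolution from the same interpreter that runs the crawler. The function name `check_dns` and the host `www.pixiv.net` in the usage note are illustrative assumptions, not taken from the project.

```python
import socket


def check_dns(host, port=80, resolver=socket.getaddrinfo):
    """Return (True, addresses) if host resolves, else (False, error text)."""
    try:
        results = resolver(host, port, 0, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, str(exc)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({item[4][0] for item in results})
    return True, addresses
```

    Running `print(check_dns("www.pixiv.net"))` inside the container would show whether the failure can be reproduced outside the crawler.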
    402645707 · #4 · 2016-10-30 11:24:00 +08:00
    After troubleshooting for ages I found that lxml hadn't been set up. Solved it myself.
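    pwcong (OP) · #5 · 2016-10-31 11:40:14 +08:00
    @402645707 (>人<;) Sorry, I've been out on long drives for my driving lessons these past two days and didn't see the message. The only external library is bs4 (with lxml as the parser).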
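
    Since bs4 and lxml are the only external dependencies named here, a quick pre-flight check can tell whether an image has them installed. This is a minimal sketch; `missing_modules` is a hypothetical helper, not part of PixivCrawler.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# bs4 and lxml are the two libraries mentioned in this thread.
missing = missing_modules(["bs4", "lxml"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("bs4 and lxml are both importable.")
```

    On PyPI these two are published as `beautifulsoup4` and `lxml`, so `pip install beautifulsoup4 lxml` is the usual way to add them.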
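    402645707 · #6 · 2016-11-01 00:37:56 +08:00
    It worked fine when debugging locally, but once I put it on daodocker I was stuck.
    Looks like a dependency problem again, but this time my luck has run out.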

    2016-11-01 00:36:26:start
    2016-11-01 00:36:27:Traceback (most recent call last):
    2016-11-01 00:36:27: File "all.py", line 23, in <module>
    2016-11-01 00:36:27: opener = PixivLoginer.login(userid, password)
    2016-11-01 00:36:27: File "/usr/src/app/api/PixivLoginer.py", line 56, in login
    2016-11-01 00:36:27: data = utils.ungzip(data).decode()
    2016-11-01 00:36:27:AttributeError: module 'utils' has no attribute 'ungzip'
    2016-11-01 00:36:28:Traceback (most recent call last):
    2016-11-01 00:36:28: File "man.py", line 23, in <module>
    2016-11-01 00:36:28: opener = PixivLoginer.login(userid, password)
    2016-11-01 00:36:28: File "/usr/src/app/api/PixivLoginer.py", line 56, in login
    2016-11-01 00:36:28: data = utils.ungzip(data).decode()
    2016-11-01 00:36:28:AttributeError: module 'utils' has no attribute 'ungzip'
    402645707 · #7 · 2016-11-01 00:38:38 +08:00
    I mainly do embedded work and don't know Python well. If you can, please give me some pointers.
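
    The `AttributeError` in the log above says that the `utils` module Python actually imported has no `ungzip` function. One possible cause, offered as a guess rather than a diagnosis, is that a different module named `utils` (for example one installed in the image) is found before the project's own file. The sketch below shows how to see which file a module was loaded from, and what a helper like `ungzip` typically does using the standard library `gzip` module. Both functions are assumptions about the intent, not the project's code.

```python
import gzip
import importlib


def module_origin(name):
    """Import a module by name and return the file it was loaded from."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)


def ungzip(data):
    """Decompress gzip-encoded bytes; return other bytes unchanged."""
    # Gzip streams start with the magic bytes 0x1f 0x8b.
    if data[:2] == b"\x1f\x8b":
        return gzip.decompress(data)
    return data
```

    Running `print(module_origin("utils"))` inside the container would show whether the path points into /usr/src/app or somewhere else.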
    mistak1992 · #8 · 2016-11-02 09:20:22 +08:00
    What on earth is the "小爬虫" (little crawler) tag~