
Python Scrapy crawler problem

    idotfish · 2019-04-09 23:35:05 +08:00 · 1546 views
    I'm using the Scrapy framework to crawl job listings from Zhaopin (智联), and I can't make sense of the errors it reports:
    2019-04-09 23:29:10 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
    2019-04-09 23:29:10 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:50132/session/b97f6963939467e28aa83493fcf91f9d/url {"url": "https://zhaopin.com", "sessionId": "b97f6963939467e28aa83493fcf91f9d"}
    [7964:9720:0409/232912.471:ERROR:ssl_client_socket_impl.cc(964)] handshake failed; returned -1, SSL error code 1, net_error -100
    [7964:9720:0409/232912.505:ERROR:ssl_client_socket_impl.cc(964)] handshake failed; returned -1, SSL error code 1, net_error -100
    [7964:10376:0409/232913.146:ERROR:platform_sensor_reader_win.cc(242)] NOT IMPLEMENTED
    2019-04-09 23:29:14 [urllib3.connectionpool] DEBUG: http://127.0.0.1:50132 "POST /session/b97f6963939467e28aa83493fcf91f9d/url HTTP/1.1" 200 72
    2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
    2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: GET http://127.0.0.1:50132/session/b97f6963939467e28aa83493fcf91f9d/window_handle {"sessionId": "b97f6963939467e28aa83493fcf91f9d"}
    2019-04-09 23:29:14 [urllib3.connectionpool] DEBUG: http://127.0.0.1:50132 "GET /session/b97f6963939467e28aa83493fcf91f9d/window_handle HTTP/1.1" 200 111
    2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
    2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:50132/session/b97f6963939467e28aa83493fcf91f9d/element {"using": "class name", "value": "zp-search__input", "sessionId": "b97f6963939467e28aa83493fcf91f9d"}
    2019-04-09 23:29:14 [urllib3.connectionpool] DEBUG: http://127.0.0.1:50132 "POST /session/b97f6963939467e28aa83493fcf91f9d/element HTTP/1.1" 200 102
    2019-04-09 23:29:14 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
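Most of the output above is DEBUG-level chatter from Selenium's HTTP client and urllib3, which makes the few ERROR lines hard to spot. A minimal sketch of quieting those two loggers with the standard `logging` module follows. The helper name `quiet_webdriver_logs` is made up for illustration; the logger names are taken from the log above.

```python
import logging

# Parent loggers of the names that appear in the log output above
# (selenium.webdriver.remote.remote_connection, urllib3.connectionpool).
NOISY_LOGGERS = ("selenium", "urllib3")


def quiet_webdriver_logs(level=logging.WARNING):
    """Raise the threshold of the noisy loggers so DEBUG records are dropped."""
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(level)


quiet_webdriver_logs()
```

The lines that start with a bracketed process ID, such as `[7964:9720:0409/232912.471:ERROR:...]`, are written by Chrome itself rather than by Python, so this setting does not affect them.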



    Here is the code:
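    import time

    import scrapy
    from scrapy import Request
    from scrapy.linkextractors import LinkExtractor
    from selenium import webdriver

    # Assumption: JobItem lives in the project's items.py (standard Scrapy layout).
    from ..items import JobItem


    class JobsSpider(scrapy.Spider):
        name = 'jobs'
        allowed_domains = ['zhaopin.com']
        start_urls = ['https://www.zhaopin.com/']

        def start_requests(self):
            # Drive a real browser to run the search, then hand the result URL to Scrapy.
            browser = webdriver.Chrome()
            browser.get("https://zhaopin.com")
            windows = browser.current_window_handle
            search_box = browser.find_element_by_class_name('zp-search__input')
            search_box.send_keys('Python')
            time.sleep(1)
            button = browser.find_element_by_class_name('zp-search__btn')
            button.click()
            # The search opens a new window; switch to it before reading the URL.
            all_handles = browser.window_handles
            for handle in all_handles:
                if handle != windows:
                    browser.switch_to.window(handle)
            url = browser.current_url
            browser.quit()  # the browser is no longer needed once the URL is known
            yield Request(url, callback=self.parse)

        def parse(self, response):
            # Follow every job link found inside a result item.
            le = LinkExtractor(restrict_css='div.contentpile__content__wrapper__item.clearfix')
            for link in le.extract_links(response):
                yield scrapy.Request(link.url, callback=self.parse_job)

        def parse_job(self, response):
            jobs = JobItem()
            sel = response.css('div.main')
            # 'hi' and '.1' in the posted selectors are read here as typos for 'h1' and '.l'.
            jobs['jobname'] = sel.css('h1.l.info-h3::text').extract_first()
            # Left as posted: '1' is not a valid element name, so this selector
            # still needs correcting against the page markup.
            jobs['Cname'] = sel.css('div.company 1::text').extract_first()
            jobs['salary'] = sel.css('div.l.info-money strong::text').extract_first()
            jobs['joblocation'] = sel.css('span.icon-address::text').extract_first()
            # XPath selects text with '/text()', not '.text()'.
            jobs['experience'] = sel.css('div.info-three.l').xpath('(.//span)[1]/text()').extract_first()
            jobs['education'] = sel.css('div.info-three.l').xpath('(.//span)[2]/text()').extract_first()
            jobs['count'] = sel.css('div.info-three.l').xpath('(.//span)[3]/text()').extract_first()
            # extract must be called; without parentheses the method itself is stored.
            jobs['jobintro'] = sel.css('div.pos-ul').extract()
            yield jobs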

    Does this have something to do with cookies? Any help would be much appreciated.
    Addendum 1  ·  2019-04-10 16:01:45 +08:00

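    import time

    import scrapy
    from scrapy import Request
    from scrapy.linkextractors import LinkExtractor
    from selenium import webdriver

    # Assumption: JobItem lives in the project's items.py (standard Scrapy layout).
    from ..items import JobItem


    class JobsSpider(scrapy.Spider):
        name = 'jobs'
        allowed_domains = ['zhaopin.com']
        start_urls = ['https://www.zhaopin.com/']

        def start_requests(self):
            # Drive a real browser to run the search, then hand the result URL to Scrapy.
            browser = webdriver.Chrome()
            browser.get("https://zhaopin.com")
            windows = browser.current_window_handle
            search_box = browser.find_element_by_class_name('zp-search__input')
            search_box.send_keys('Python')
            time.sleep(1)
            button = browser.find_element_by_class_name('zp-search__btn')
            button.click()
            # The search opens a new window; switch to it before reading the URL.
            all_handles = browser.window_handles
            for handle in all_handles:
                if handle != windows:
                    browser.switch_to.window(handle)
            url = browser.current_url
            browser.quit()  # the browser is no longer needed once the URL is known
            yield Request(url, callback=self.parse)

        def parse(self, response):
            # Follow every job link found inside a result item.
            le = LinkExtractor(restrict_css='div.contentpile__content__wrapper__item.clearfix')
            for link in le.extract_links(response):
                yield scrapy.Request(link.url, callback=self.parse_job)

        def parse_job(self, response):
            jobs = JobItem()
            sel = response.css('div.main')
            # 'hi' and '.1' in the posted selectors are read here as typos for 'h1' and '.l'.
            jobs['jobname'] = sel.css('h1.l.info-h3::text').extract_first()
            # Left as posted: '1' is not a valid element name, so this selector
            # still needs correcting against the page markup.
            jobs['Cname'] = sel.css('div.company 1::text').extract_first()
            jobs['salary'] = sel.css('div.l.info-money strong::text').extract_first()
            jobs['joblocation'] = sel.css('span.icon-address::text').extract_first()
            # XPath selects text with '/text()', not '.text()'.
            jobs['experience'] = sel.css('div.info-three.l').xpath('(.//span)[1]/text()').extract_first()
            jobs['education'] = sel.css('div.info-three.l').xpath('(.//span)[2]/text()').extract_first()
            jobs['count'] = sel.css('div.info-three.l').xpath('(.//span)[3]/text()').extract_first()
            # extract must be called; without parentheses the method itself is stored.
            jobs['jobintro'] = sel.css('div.pos-ul').extract()
            yield jobs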
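The window-handling loop in `start_requests` relies on the search opening a second browser window. The selection logic can be sketched on plain strings, without a browser. `pick_new_handle` is a hypothetical helper, not part of Selenium, and the handle values are made up. Unlike the loop in the post, which ends up on the last non-original window, this sketch returns the first one.

```python
def pick_new_handle(original, all_handles):
    """Return the first handle that differs from `original`.

    Falls back to `original` when no new window was opened, so the
    caller always gets a handle it can switch to.
    """
    for handle in all_handles:
        if handle != original:
            return handle
    return original


# Example with made-up handle strings.
handles = ["CDwindow-A", "CDwindow-B"]
target = pick_new_handle("CDwindow-A", handles)
print(target)  # CDwindow-B
```

In the spider this would be used as `browser.switch_to.window(pick_new_handle(windows, browser.window_handles))`. The fallback makes the single-window case explicit instead of silently staying on the start page.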
    
    #1 · huisezhiyin · 2019-04-10 15:13:00 +08:00
    The way you pasted this code makes it very hard to read.
    #2 · idotfish (OP) · 2019-04-10 15:46:10 +08:00
    @huisezhiyin Sorry, I've only just started with Python and don't know much about this yet. Would it be OK to post a screenshot of the code instead?
    #3 · huisezhiyin · 2019-04-10 16:17:04 +08:00
    @idotfish Just search for the ERROR text and you'll find the answer. For example, search for:
    error:ssl_client_socket_impl.cc(964)] handshake failed
    Here is one answer on Stack Overflow:
    https://stackoverflow.com/questions/37883759/errorssl-client-socket-openssl-cc1158-handshake-failed-with-chromedriver-chr
    If that doesn't work, try the other answers there.
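A fix often suggested for this Chrome message is to start the browser with switches that relax certificate checks and reduce console logging. The sketch below shows that approach under stated assumptions: `--ignore-certificate-errors` and `--log-level=3` are Chrome command-line switches, while the helper names `chrome_arguments` and `make_browser` are made up here, and whether the switches remove the message on a given setup is not guaranteed.

```python
def chrome_arguments(quiet=True):
    """Build the list of Chrome switches for the crawler's browser."""
    args = ["--ignore-certificate-errors"]  # do not abort on certificate problems
    if quiet:
        args.append("--log-level=3")  # only show fatal messages from Chrome
    return args


def make_browser(quiet=True):
    """Start Chrome with the switches above (needs selenium and chromedriver)."""
    # Imported lazily so chrome_arguments() can be used without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(quiet):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

In the spider, `browser = make_browser()` would replace `browser = webdriver.Chrome()`. The handshake messages may also be harmless: in the log above, the WebDriver calls that follow them still return HTTP 200.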