Python3 requests: getting redirected when scraping Baidu


This topic was created 3170 days ago; the information in it may have evolved or changed since then.

I searched Baidu for a keyword with requests.get. The results page Baidu returns has pagination buttons at the bottom (page 1, page 2, and so on). I take the href from one of those buttons and call requests.get again on that URL, but the response is:

<html>
<head>
<script>
location.replace(location.href.replace("https://","http://"));
</script>
</head>
<body>
<noscript><meta http-equiv="refresh" content="0;url=http://www.baidu.com/"></noscript>
</body>
</html>

Fellow V2EX users, how can I solve this?

6 replies • 2017-05-27 22:57:11 +08:00

dd99iii

May 27, 2017

Where's the code?

kindjeff

May 27, 2017

Just change the links you scraped to https before requesting them.

cwlmxwb

May 27, 2017

@kindjeff I got the same result.

cwlmxwb

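May 27, 2017

@dd99iii
import requests

def fetch(url):  # wrapper name is illustrative; the post showed only the body
    user_agent = "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
    headers = {'User-Agent': user_agent}  # the header name takes a hyphen; 'User_Agent' is not treated as the User-Agent header
    response = requests.get(url, headers=headers, timeout=10)
    return response.text  # response.text is already a str

Like this.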

GoBeyond

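May 27, 2017 via Android

If you read that HTML, you'll see that it is trying to downgrade from https to http. I don't know why it does that, but you could try requesting the http URL directly.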
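
A note on why the stub shows up at all: requests follows HTTP 3xx redirects, but it does not execute JavaScript or act on a meta refresh, so a script-based redirect like the one in the question comes back as the response body. Below is a minimal sketch for spotting that page, assuming the stub always contains the same script as the HTML quoted in the question. The function name is_https_downgrade_stub is made up for illustration.

```python
import re

# Matches the script from the response quoted in the question.
# Assumption: the stub always uses this exact location.replace(...) call.
DOWNGRADE_RE = re.compile(
    r'location\.replace\(\s*location\.href\.replace\('
    r'\s*"https://"\s*,\s*"http://"\s*\)\s*\)'
)


def is_https_downgrade_stub(html: str) -> bool:
    """Return True if the page is the JavaScript https-to-http redirect stub."""
    return DOWNGRADE_RE.search(html) is not None


# The stub from the question, shortened to the part that matters.
stub = (
    '<html> <head> <script> '
    'location.replace(location.href.replace("https://","http://")); '
    '</script> </head> <body></body> </html>'
)
print(is_https_downgrade_stub(stub))
print(is_https_downgrade_stub("<html><body>search results</body></html>"))
```

A check like this could run on response.text before parsing, so the scraper knows to retry over http instead of looking for result links in an empty page.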

cwlmxwb

May 27, 2017

@GoBeyond Nice one, thanks! I changed https to http and the request went through. That was a huge help.
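
For anyone finding this later, here is a minimal sketch of the fix that worked in this thread: rewrite the scheme of the scraped link from https to http before fetching it. The helper name force_http and the example URL are illustrative assumptions, not taken from the original posts, and whether Baidu still behaves this way today is not something this thread confirms.

```python
from urllib.parse import urlsplit, urlunsplit


def force_http(url: str) -> str:
    """Return the URL with an https scheme replaced by http; leave others unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # SplitResult is a namedtuple, so _replace returns a modified copy.
        parts = parts._replace(scheme="http")
    return urlunsplit(parts)


# Hypothetical pagination link of the kind scraped from the results page.
next_page = "https://www.baidu.com/s?wd=python&pn=10"
print(force_http(next_page))
```

The result can then be passed to requests.get in place of the original href. Parsing the URL rather than using str.replace means only the scheme is changed, even if "https://" also appears inside a query parameter.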