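Ever spent a weekend agonizing over which movie to watch? Today's hands-on tutorial walks you through scraping Douban movie ratings. In my previous article I mentioned session persistence but never explained it, so I'll cover it here.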
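0. When we open a website, the site gives us cookies. These may hold one parameter or several. When we later browse other pages, the site checks the cookies and related information to confirm that the same user is still visiting. So during a simulated login, once we have fetched the captcha, how do we get the site to recognize at login time that we are the same party that requested it? That is what session persistence is for. However, some companies block cookies from being carried over by a session in order to stop scrapers. What do you do then? I'll leave that as an Easter egg for you to research yourselves, and I'll post an update once I have a good example.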
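Here is a minimal sketch of session persistence using `requests.Session`. It assumes a cookie named `bid`, taken from the cookies in step 3.0, with a made-up value. No network call is made. The code sets the cookie by hand, standing in for what the site's first response would normally set. It then prepares requests to show which cookies the session would attach. The helper names `build_session` and `cookie_header_for` are hypothetical.

```python
import requests


def build_session() -> requests.Session:
    """Create a Session whose cookie jar is reused across requests."""
    session = requests.Session()
    # Hypothetical cookie standing in for one the site sets on the first visit;
    # the domain ".douban.com" makes it apply to all douban.com subdomains.
    session.cookies.set("bid", "example-bid", domain=".douban.com")
    return session


def cookie_header_for(session: requests.Session, url: str):
    """Return the Cookie header the session would send to url (no network call)."""
    prepared = session.prepare_request(requests.Request("GET", url))
    return prepared.headers.get("Cookie")


session = build_session()
# Both Douban hosts receive the same cookie, so the site sees one visitor
print(cookie_header_for(session, "https://movie.douban.com/explore"))
print(cookie_header_for(session, "https://m.douban.com/rexxar/api/v2/movie/recommend"))
```

The cookie sits in the session's jar scoped to `.douban.com`. As a result, it is attached to requests for both `movie.douban.com` and `m.douban.com`, but not to unrelated hosts. In a real login flow, you would send the captcha request and the login request through the same `session` object, for example with `session.get(...)` followed by `session.post(...)`.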
1. Without further ado, let's begin today's scraping journey.




3.0. Send a request straight to the API endpoint, then inspect the JSON it returns to pick out the values:
import requests

# Browser-like request headers
headers = {
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Cache-Control": "no-cache",
    "Connection": "keep-alive",
    "Origin": "https://movie.douban.com",
    "Pragma": "no-cache",
    "Referer": "https://movie.douban.com/explore",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-site",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36",
    # These values contain double quotes, so wrap them in single quotes
    "sec-ch-ua": '" Not A;Brand";v="99", "Chromium";v="100", "Google Chrome";v="100"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
}
# Cookies sent along with the request
cookies = {
    "ll": '"108288"',
    "bid": "5JjiG1qD2ik",
    "ap_v": "0,6.0",
}
# Douban movie recommendation API; start is the offset, count the page size
url = "https://m.douban.com/rexxar/api/v2/movie/recommend"
params = {
    "refresh": "0",
    "start": "40",
    "count": "20",
    "selected_categories": "{}",
    "uncollect": "false",
    "sort": "T",
    "tags": "",
}
response = requests.get(url, headers=headers, cookies=cookies, params=params, timeout=10)
response.raise_for_status()  # fail loudly on a blocked or failed request
# The movie list lives under the "items" key of the JSON body
for item in response.json()["items"]:
    title = item["title"]
    print(title)
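The request above fetches a single page (`start=40`, `count=20`). The sketch below shows one way to page through results by stepping `start`, and to pull a title plus a score from each item. It assumes each item may carry a `rating` object with a `value` field. That field name is an assumption about the API response, not something shown above, so the parser reads it defensively with `.get()`. The helper names `page_params` and `extract_movies` are hypothetical, and the sample payload is made up for illustration.

```python
import json

# Base query parameters copied from the request above
BASE_PARAMS = {
    "refresh": "0",
    "count": "20",
    "selected_categories": "{}",
    "uncollect": "false",
    "sort": "T",
    "tags": "",
}


def page_params(page: int, count: int = 20) -> dict:
    """Build params for a 0-based page by setting start = page * count."""
    params = dict(BASE_PARAMS)
    params["count"] = str(count)
    params["start"] = str(page * count)
    return params


def extract_movies(payload: dict) -> list:
    """Return (title, score) pairs from a decoded JSON response.

    Assumption: each item may carry rating.value; a missing or null
    rating yields None as the score.
    """
    movies = []
    for item in payload.get("items", []):
        rating = item.get("rating") or {}
        movies.append((item.get("title"), rating.get("value")))
    return movies


# Made-up payload shaped like the response used above (not real Douban data)
sample = json.loads('{"items": [{"title": "Movie A", "rating": {"value": 8.5}},'
                    ' {"title": "Movie B", "rating": null}]}')

print(page_params(2)["start"])  # page 2 with 20 per page starts at offset 40
print(extract_movies(sample))
```

In a real loop, you would pass `page_params(n)` as `params=` to the same `requests.get` call and pause between pages. Stopping once `items` comes back empty is an assumption here, since this article doesn't cover how the API signals the end of results.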
