Learning Python Web Scraping ==> Chapter 8: The Requests Library…
2018-06-18 02:14:39 Source: unknown
Learning goal:
The requests library is more concise than urllib and more convenient to use.
Steps
Step 1: What is requests
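requests is an HTTP library written in Python, built on urllib3 and released under the Apache2 License. It is more convenient than urllib, saves a great deal of work, and covers the needs of HTTP testing. In short, it is a simple, easy-to-use HTTP library.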
Step 2: A first example
    # -*- coding:utf-8 -*-
    import requests

    response = requests.get('http://www.baidu.com')
    print(type(response))
    print(response.content)       # raw response body as bytes
    print(response.status_code)
    print(response.text)          # response body decoded to str
    print(type(response.text))
    print(response.cookies)
Step 3: Request methods
    # -*- coding:utf-8 -*-
    import requests

    # Each HTTP method has a function of the same name
    requests.post('http://httpbin.org/post')
    requests.put('http://httpbin.org/put')
    requests.delete('http://httpbin.org/delete')
    requests.head('http://httpbin.org/get')
    requests.options('http://httpbin.org/get')
- GET requests
① Basic usage
    # -*- coding:utf-8 -*-
    import requests

    response = requests.get('http://httpbin.org/get')
    print(response.text)
Output:
    {
      "args": {},
      "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "close",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.18.4"
      },
      "origin": "222.94.50.178",
      "url": "http://httpbin.org/get"
    }
② GET request with parameters
    import requests

    # The params dict is encoded into the URL's query string
    data = {
        'name': 'python',
        'age': 17
    }
    response = requests.get('http://httpbin.org/get', params=data)
    print(response.text)
Output:
    {
      "args": {
        "age": "17",
        "name": "python"
      },
      "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "close",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.18.4"
      },
      "origin": "222.94.50.178",
      "url": "http://httpbin.org/get?name=python&age=17"
    }
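How requests turns params into a query string can be checked without sending anything, by preparing the request first. The following is a minimal sketch that reuses the httpbin URL from the example above; the second request and its 'q' parameter are made up for illustration.

```python
import requests

# Build the request object but do not send it.
data = {'name': 'python', 'age': 17}
request = requests.Request('GET', 'http://httpbin.org/get', params=data)
prepared = request.prepare()

# The params dict has been encoded and appended to the URL.
print(prepared.url)

# Hypothetical parameter: values containing spaces are escaped
# so that the resulting URL stays valid.
encoded = requests.Request(
    'GET', 'http://httpbin.org/get', params={'q': 'hello world'}
).prepare()
print(encoded.url)
```

Preparing a request this way is handy for debugging, since it shows exactly which URL would be requested.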
③ Parsing JSON
    import requests

    response = requests.get('http://httpbin.org/get')
    # json() parses the response body and returns Python objects (here a dict)
    print(response.json())
    print(type(response.json()))
④ Fetching binary data
    # -*- coding:utf-8 -*-
    '''
    Save the Baidu logo
    '''
    import requests

    response = requests.get('https://www.baidu.com/img/bd_logo1.png')
    # response.content holds the raw bytes; the with block closes the file itself
    with open('baidu.png', 'wb') as f:
        f.write(response.content)
⑤ Adding headers
Fetching the Zhihu site directly, without custom headers, returns an error. For example:
    import requests

    response = requests.get('https://www.zhihu.com/explore')
    print(response.text)
Output:
    <html><body><h1>500 Server Error</h1> An internal server error occured. </body></html>
Fix:
    import requests

    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
    }
    response = requests.get('https://www.zhihu.com/explore', headers=headers)
    print(response.text)
Adding a headers dictionary with a browser user-agent is enough for the page to be fetched normally. I found the header value with Chrome's built-in developer tools and copied it over.
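A likely reason the bare request is rejected is the default User-Agent, which identifies the client as python-requests (the httpbin output earlier shows "python-requests/2.18.4"). The sketch below inspects both the default and the overridden header without sending anything. It assumes the same Zhihu URL and user-agent string as the example above; requests.utils.default_user_agent and Session.prepare_request are used only for inspection.

```python
import requests

# The User-Agent that requests sends when none is supplied.
default_agent = requests.utils.default_user_agent()
print(default_agent)

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/65.0.3325.181 Safari/537.36'
}

# Prepare the request through a Session so the default headers are
# merged in, but do not send it.
session = requests.Session()
prepared = session.prepare_request(
    requests.Request('GET', 'https://www.zhihu.com/explore', headers=headers)
)

# Header names are case-insensitive, so the lowercase key above
# replaces the default User-Agent.
print(prepared.headers['User-Agent'])
```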
- Basic POST request
    import requests

    # data= sends the dict as a form-encoded request body
    data = {
        'name': 'python',
        'age': 18
    }
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
    }
    response = requests.post('http://httpbin.org/post', data=data, headers=headers)
    print(response.json())
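To see what the POST body actually looks like, the request can be prepared without being sent. This sketch compares data= (used above) with json=, a second requests parameter that the original example does not use and that is shown here only for contrast.

```python
import json
import requests

data = {'name': 'python', 'age': 18}
url = 'http://httpbin.org/post'

# data= sends the dict as an HTML form body.
form = requests.Request('POST', url, data=data).prepare()
print(form.headers['Content-Type'])
print(form.body)

# json= serialises the dict to a JSON document instead.
as_json = requests.Request('POST', url, json=data).prepare()
print(as_json.headers['Content-Type'])
print(json.loads(as_json.body))
```

Which one to use depends on what the server expects: HTML form handlers usually want the first, JSON APIs the second.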
- Response
    import requests
    '''
    Response attributes
    '''
    response = requests.get('http://www.baidu.com')
    print(response.status_code, type(response.status_code))
    print(response.history, type(response.history))
    print(response.cookies, type(response.cookies))
    print(response.url, type(response.url))
    print(response.headers, type(response.headers))
Output:
    200 <class 'int'>
    [] <class 'list'>
    <RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]> <class 'requests.cookies.RequestsCookieJar'>
    http://www.baidu.com/ <class 'str'>
    {'Server': 'bfe/1.0.8.18', 'Date': 'Thu, 05 Apr 2018 06:27:33 GMT', 'Content-Type': 'text/html', 'Last-Modified': 'Mon, 23 Jan 2017 13:28:24 GMT', 'Transfer-Encoding': 'chunked', 'Connection': 'Keep-Alive', 'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Pragma': 'no-cache', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Content-Encoding': 'gzip'} <class 'requests.structures.CaseInsensitiveDict'>
- Checking the status code
Status code reference table: http://www.cnblogs.com/wuzhiming/p/8722422.html
    # -*- coding:utf-8 -*-
    import requests

    # Exit unless the page is missing (404)
    response = requests.get('http://www.cnblogs.com/hello.html')
    if response.status_code != requests.codes.not_found:
        exit()
    print('404 not found')

    # Exit unless the request succeeded (200)
    response1 = requests.get('http://www.baidu.com')
    if response1.status_code != requests.codes.ok:
        exit()
    print('Request Successful')
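The named constants in requests.codes are plain integers, and a response also offers an ok shortcut for the common "did it succeed" check. The sketch below builds a Response by hand purely so that it runs offline; in real code the object would come from requests.get(...).

```python
import requests

# requests.codes maps readable names to numeric status codes.
print(requests.codes.ok)          # 200
print(requests.codes.not_found)   # 404

# Hand-built response, an assumption made only to avoid network access.
response = requests.Response()
response.status_code = 404

# response.ok is False for 4xx and 5xx status codes.
is_ok = response.ok
print(is_ok)
```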
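- Advanced usage
① File upload
    import requests

    # Open the file in binary mode; the with block closes it after the upload
    with open('baidu.png', 'rb') as f:
        response = requests.post('http://httpbin.org/post', files={'file': f})
    print(response.text)
Output not shown.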
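Since the upload output is not shown, here is a rough way to look at what files= produces without a network call or a real image. The in-memory bytes below are a made-up stand-in for baidu.png.

```python
import io
import requests

# In-memory stand-in for open('baidu.png', 'rb'); the bytes are made up.
fake_image = io.BytesIO(b'fake image bytes')

# A (filename, fileobj) tuple sets the uploaded file name explicitly.
request = requests.Request(
    'POST', 'http://httpbin.org/post',
    files={'file': ('baidu.png', fake_image)},
)
prepared = request.prepare()

# files= makes requests encode the body as multipart/form-data.
content_type = prepared.headers['Content-Type']
print(content_type)
print(b'filename="baidu.png"' in prepared.body)
```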
② Getting cookies
    import requests

    response = requests.get('http://www.baidu.com')
    cookies = response.cookies
    print(cookies)
    # The cookie jar can be iterated like a dict
    for key, value in cookies.items():
        print(key + '=' + value)
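③ Keeping a session
    import requests

    # A Session keeps cookies between requests, like a browser tab
    s = requests.Session()
    # httpbin's /cookies/set/<name>/<value> endpoint sets a cookie
    s.get('http://httpbin.org/cookies/set/number/123456789')
    response = s.get('http://httpbin.org/cookies')
    print(response.text)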
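Why the Session matters can be sketched offline: a request prepared through a session picks up the cookies stored in it, while a standalone request has none. The cookie below is stored by hand, as if an earlier httpbin response had set it.

```python
import requests

url = 'http://httpbin.org/cookies'

# A session with one cookie already stored (set by hand for illustration).
session = requests.Session()
session.cookies.set('number', '123456789', domain='httpbin.org', path='/')

# Requests prepared through the session carry the stored cookie.
with_session = session.prepare_request(requests.Request('GET', url))
print(with_session.headers.get('Cookie'))

# A standalone request has no cookie jar to draw on.
standalone = requests.Request('GET', url).prepare()
print(standalone.headers.get('Cookie'))
```

This is why two separate requests.get calls would not share the cookie: each one starts with an empty jar.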
④ Certificate verification
    import requests

    # verify=False disables certificate verification
    response = requests.get('https://www.12306.cn', verify=False)
    print(response.status_code)
Specifying a certificate manually
    response1 = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))
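⑤ Proxy settings
    import requests

    # Usage example; replace ip and port with a real proxy
    # (free proxies can be found by searching Baidu)
    proxies = {
        'http': 'http://127.0.0.1:port',
        'https': 'https://ip:port',
        # For a proxy that needs credentials, use this form for the entry
        # instead (a dict cannot hold two 'http' keys):
        # 'http': 'http://username:password@ip:port',
    }
    response = requests.get('http://www.baidu.com', proxies=proxies)
    print(response.status_code)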
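The proxies dict is keyed by URL scheme, and a Python dict keeps only one value per key. The sketch below shows both points offline. The addresses and ports are hypothetical, and select_proxy is a helper from requests.utils used here only to inspect which entry would be chosen.

```python
from requests.utils import select_proxy

# Hypothetical local proxy addresses, chosen only for illustration.
proxies = {
    'http': 'http://127.0.0.1:8080',
    'https': 'http://127.0.0.1:8081',
}

# The proxy is picked by the scheme of the URL being requested.
http_proxy = select_proxy('http://www.baidu.com', proxies)
https_proxy = select_proxy('https://www.baidu.com', proxies)
print(http_proxy)
print(https_proxy)

# A dict literal keeps only the last value for a repeated key,
# so two 'http' entries silently collapse into one.
duplicated = {
    'http': 'http://127.0.0.1:8080',
    'http': 'http://user:password@127.0.0.1:8080',
}
print(duplicated)
```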
⑥ Timeout settings
    import requests

    # timeout is in seconds; an exception is raised if the server is too slow
    response = requests.get('http://httpbin.org/get', timeout=1)
    print(response.status_code)
⑦ Authentication
    import requests
    from requests.auth import HTTPBasicAuth

    # A (user, password) tuple is shorthand for HTTPBasicAuth
    response = requests.get('http://127.0.0.1:8888', auth=('user', 'password'))
    response1 = requests.get('http://127.0.0.1:8888', auth=HTTPBasicAuth('user', 'password'))
    print(response.status_code)
PS: 127.0.0.1:8888 is only an example address.
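The two auth forms above produce the same request header, which can be confirmed without a server by preparing the requests. A minimal sketch, reusing the placeholder address and credentials from the example:

```python
import base64
import requests
from requests.auth import HTTPBasicAuth

url = 'http://127.0.0.1:8888'  # placeholder address from the example above

# Prepare both forms without sending them.
from_tuple = requests.Request('GET', url, auth=('user', 'password')).prepare()
from_class = requests.Request(
    'GET', url, auth=HTTPBasicAuth('user', 'password')
).prepare()

# Basic auth is "user:password", base64-encoded, in the Authorization header.
expected = 'Basic ' + base64.b64encode(b'user:password').decode('ascii')
print(from_tuple.headers['Authorization'])
print(from_tuple.headers['Authorization'] == from_class.headers['Authorization'])
```

Base64 is an encoding rather than encryption, so basic auth is only safe over HTTPS.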
⑧ Exception handling
    import requests
    from requests.exceptions import ReadTimeout, HTTPError, RequestException

    # Catch the most specific exceptions first; RequestException is the base class
    try:
        response = requests.get('http://httpbin.org/get', timeout=0.01)
        print(response.status_code)
    except ReadTimeout:
        print("TIME OUT")
    except HTTPError:
        print('HTTP ERROR')
    except RequestException:
        print("ERROR")
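The order of the except clauses above follows the exception hierarchy, which can be checked directly. The sketch also shows that HTTPError is only raised when raise_for_status() is called; the hand-built Response is an assumption made so the code runs offline.

```python
import requests
from requests.exceptions import (
    ConnectTimeout, HTTPError, ReadTimeout, RequestException, Timeout,
)

# Both timeout flavours derive from Timeout, and every requests
# exception derives from RequestException.
timeouts_related = (
    issubclass(ReadTimeout, Timeout) and issubclass(ConnectTimeout, Timeout)
)
all_share_base = all(
    issubclass(exc, RequestException)
    for exc in (ReadTimeout, ConnectTimeout, Timeout, HTTPError)
)
print(timeouts_related, all_share_base)

# A 4xx or 5xx response does not raise by itself;
# raise_for_status() turns it into an HTTPError.
response = requests.Response()   # built by hand so the sketch runs offline
response.status_code = 500
try:
    response.raise_for_status()
    outcome = 'no error'
except HTTPError:
    outcome = 'HTTP ERROR'
print(outcome)
```

With a very small timeout the connection phase may fail before any data is read, in which case ConnectTimeout is raised and the example above falls through to the RequestException branch. Catching Timeout covers both cases.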
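Summary:
Studying web scraping is a way to get a firmer grip on basic Python usage, which is my goal here. I will keep learning in the following chapters.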