Python3 Crawler (15): Proxies
2018-06-18 02:45:04
Infi-chu:
http://www.cnblogs.com/Infi-chu/
I. Setting Up a Proxy
1.urllib
    # HTTP proxy
    from urllib.error import URLError
    from urllib.request import ProxyHandler, build_opener

    proxy = '127.0.0.1:9743'
    # proxy = 'username:password@127.0.0.1:9743'  # credentials go in front of the host
    # The proxy itself is reached over plain HTTP, so both entries use http://
    proxy_handler = ProxyHandler({
        'http': 'http://' + proxy,
        'https': 'http://' + proxy,
    })
    opener = build_opener(proxy_handler)
    try:
        res = opener.open('http://httpbin.org/get')
        print(res.read().decode('utf-8'))
    except URLError as e:
        print(e.reason)

    # SOCKS5 proxy
    import socks  # pip3 install PySocks
    import socket
    from urllib import request
    from urllib.error import URLError

    # Route every new socket in this process through the SOCKS5 proxy
    socks.set_default_proxy(socks.SOCKS5, '127.0.0.1', 9742)
    socket.socket = socks.socksocket
    try:
        res = request.urlopen('http://httpbin.org/get')
        print(res.read().decode('utf-8'))
    except URLError as e:
        print(e.reason)
2.requests
Setting a proxy with requests is simpler than with urllib.

    # HTTP proxy
    import requests

    proxy = '127.0.0.1:9743'
    # The proxy itself is reached over plain HTTP, so both entries use http://
    proxies = {
        'http': 'http://' + proxy,
        'https': 'http://' + proxy,
    }
    try:
        res = requests.get('http://httpbin.org/get', proxies=proxies)
        print(res.text)
    except requests.exceptions.ConnectionError as e:
        print('Error', e.args)

    # SOCKS5 proxy (1): pass socks5:// URLs to requests
    import requests  # pip3 install 'requests[socks]'

    proxy = '127.0.0.1:9742'
    proxies = {
        'http': 'socks5://' + proxy,
        'https': 'socks5://' + proxy,
    }
    try:
        res = requests.get('http://httpbin.org/get', proxies=proxies)
        print(res.text)
    except requests.exceptions.ConnectionError as e:
        print('Error', e.args)

    # SOCKS5 proxy (2): patch the socket module globally
    import requests, socks, socket

    socks.set_default_proxy(socks.SOCKS5, '127.0.0.1', 9742)
    socket.socket = socks.socksocket
    try:
        # No proxies argument is needed: all sockets already go through the proxy
        res = requests.get('http://httpbin.org/get')
        print(res.text)
    except requests.exceptions.ConnectionError as e:
        print('Error', e.args)
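The urllib and requests examples both assemble the same proxy mapping by hand. As a minimal sketch, the mapping could be built by one small function; `build_proxies` is a hypothetical helper introduced here for illustration and is not part of requests or of the original article.

```python
from urllib.parse import quote


def build_proxies(host, port, scheme="http", username=None, password=None):
    """Return a mapping suitable for requests.get(..., proxies=...).

    Hypothetical helper: the name and parameters are assumptions.
    """
    auth = ""
    if username is not None:
        # Percent-encode credentials so characters such as '@' or ':'
        # cannot break the proxy URL.
        auth = quote(username, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    proxy_url = f"{scheme}://{auth}{host}:{port}"
    # The same proxy URL serves both plain and TLS target sites.
    return {"http": proxy_url, "https": proxy_url}
```

It would be used as `requests.get('http://httpbin.org/get', proxies=build_proxies('127.0.0.1', 9743))`, or with `scheme='socks5'` for the SOCKS5 case.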
3.Selenium
Setting a browser proxy

    from selenium import webdriver

    proxy = '127.0.0.1:9743'
    chrome_options = webdriver.ChromeOptions()
    # Pass the proxy to Chrome as a command-line argument
    chrome_options.add_argument('--proxy-server=http://' + proxy)
    browser = webdriver.Chrome(options=chrome_options)
    browser.get('http://httpbin.org/get')
Setting an authenticated proxy

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    import zipfile

    ip = '127.0.0.1'
    port = 9743
    username = 'test'
    password = 'test'

    # Extension manifest: requests the proxy and webRequest permissions
    manifest_json = """
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {"scripts": ["background.js"]}
    }
    """

    # Background script: sets the proxy and answers its authentication prompt
    background_js = """
    var config = {
        mode: "fixed_servers",
        rules: {
            singleProxy: {
                scheme: "http",
                host: "%(ip)s",
                port: %(port)s
            }
        }
    }
    chrome.proxy.settings.set({value: config, scope: "regular"}, function(){});
    function callbackFn(details) {
        return {
            authCredentials: {
                username: "%(username)s",
                password: "%(password)s"
            }
        }
    }
    chrome.webRequest.onAuthRequired.addListener(
        callbackFn,
        {urls: ["<all_urls>"]},
        ['blocking']
    )
    """ % {'ip': ip, 'port': port, 'username': username, 'password': password}

    # Package the two files as an extension and load it into Chrome
    plugin_file = 'proxy_auth_plugin.zip'
    with zipfile.ZipFile(plugin_file, 'w') as zp:
        zp.writestr("manifest.json", manifest_json)
        zp.writestr("background.js", background_js)
    chrome_options = Options()
    chrome_options.add_argument('--start-maximized')
    chrome_options.add_extension(plugin_file)
    browser = webdriver.Chrome(options=chrome_options)
    browser.get('http://httpbin.org/get')
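II. Maintaining a Proxy Pool
A single proxy is not enough for a crawling job, so we need a larger number of proxies.
We filter these proxies so that only the usable ones are kept and can be served efficiently.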
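The filtering step could be sketched as follows. This is a minimal illustration, not the code from the author's repository: `filter_proxies` and `http_check` are hypothetical names, the test URL and timeout are assumptions, and a thread pool is used here for brevity even though the article lists aiohttp among its dependencies.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_proxies(proxies, is_alive, max_workers=8):
    """Return the proxies for which is_alive(proxy) is truthy, in input order."""
    proxies = list(proxies)
    if not proxies:
        return []

    def safe_check(proxy):
        # A checker that raises (timeout, refused connection, ...) counts as dead.
        try:
            return bool(is_alive(proxy))
        except Exception:
            return False

    # Check the proxies concurrently; pool.map preserves the input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe_check, proxies))
    return [proxy for proxy, ok in zip(proxies, results) if ok]


def http_check(proxy, test_url="http://httpbin.org/get", timeout=5):
    """Hypothetical checker: True if test_url answers 200 through the proxy."""
    import requests  # imported lazily so filter_proxies works without it

    mapping = {"http": "http://" + proxy, "https": "http://" + proxy}
    return requests.get(test_url, proxies=mapping, timeout=timeout).status_code == 200
```

A call such as `filter_proxies(['127.0.0.1:9743'], http_check)` would then return only the proxies that answered in time.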
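1. Preparation
You need a Redis database and the aiohttp, requests, redis-py, pyquery, and Flask libraries.
2. Components of the proxy pool: a storage module, a getter module, a tester module, and an API module
3. Implementation of each module: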
https://github.com/Infi-chu/proxypool
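To make the role of the storage module more concrete, here is a minimal in-memory sketch. It is not the code from the repository above: the class name, the method names, and the scoring scheme (a new proxy starts at 10, a working one is set to 100, a failing one loses a point and is dropped at 0) are all assumptions chosen for illustration. With Redis, a sorted set keyed by score could play the role of the dict used here.

```python
import random

MAX_SCORE = 100     # assumed score of a proxy that just passed a check
MIN_SCORE = 0       # assumed score at which a proxy is dropped
INITIAL_SCORE = 10  # assumed score of a freshly fetched proxy


class InMemoryProxyStore:
    """Hypothetical storage module: proxy string -> score."""

    def __init__(self):
        self._scores = {}

    def add(self, proxy, score=INITIAL_SCORE):
        """Add a new proxy; return False if it is already stored."""
        if proxy in self._scores:
            return False
        self._scores[proxy] = score
        return True

    def mark_ok(self, proxy):
        """Called by the tester when the proxy works."""
        self._scores[proxy] = MAX_SCORE

    def mark_failed(self, proxy):
        """Called by the tester on failure: lower the score, drop at MIN_SCORE."""
        score = self._scores.get(proxy)
        if score is None:
            return
        score -= 1
        if score <= MIN_SCORE:
            del self._scores[proxy]
        else:
            self._scores[proxy] = score

    def get_random(self):
        """Called by the API module: pick one of the best-scored proxies."""
        if not self._scores:
            raise LookupError("proxy pool is empty")
        best = max(self._scores.values())
        return random.choice([p for p, s in self._scores.items() if s == best])

    def count(self):
        return len(self._scores)
```

In this layout the getter module would call `add`, the tester module would call `mark_ok` or `mark_failed`, and the API module would expose `get_random` and `count` over HTTP.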
III. Crawling WeChat Articles Through a Proxy
https://github.com/Infi-chu/weixinspider