A Simple Python Web Scraper Using Regular Expressions

2018-08-10 11:26:41 Source: cnblogs (博客园)


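Today I used regular expressions to do a simple scrape of Dianping (大众点评), collecting Beijing restaurant listings: shop name, average price per person, and address.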
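To make the extraction step concrete, here is a minimal sketch of how named groups can pull those three fields out of a listing. The HTML fragment is made up for illustration and only approximates the page structure, so the real markup may differ. `SAMPLE_HTML`, `LISTING`, and `extract_listings` are hypothetical names that do not appear in the script.

```python
import re

# Hypothetical fragment that only approximates one listing's markup.
SAMPLE_HTML = """
<div class="txt">
  <div class="tit"><a href="#"><h4>Example Noodle House</h4></a></div>
  <div class="comment"><a class="mean-price"><b>¥58</b></a></div>
  <div class="tag-addr"><span class="addr">1 Example Road</span></div>
</div>
"""

# re.S lets "." match newlines, because one listing spans several lines.
# Each (?P<name>...) group captures one field under that name.
LISTING = re.compile(
    r'<div class="txt">.*?<h4>(?P<shop_name>.*?)</h4>'
    r'.*?<b>¥(?P<per_capita>\d+)</b>'
    r'.*?<span class="addr">(?P<address>.*?)</span>',
    re.S,
)


def extract_listings(html):
    """Return one dict of named-group captures per matched listing."""
    return [m.groupdict() for m in LISTING.finditer(html)]


print(extract_listings(SAMPLE_HTML))
```

Testing the pattern against a saved fragment like this avoids repeated requests to the site while the expression is being tuned.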

import re
import urllib.request

# Pattern for one listing: shop name, per-capita price, and address.
# re.S lets "." match newlines, since each listing spans several lines of HTML.
com = re.compile(
        r'<div class="txt">.*?<h4>(?P<shop_name>.*?)</h4>'
        r'.*?<b>¥(?P<per_capita>\d+)</b>.*?<span class="addr">(?P<address>.*?)</span>', re.S)

def getPage(url):
    # Send a browser-like User-Agent header with the request.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/51.0.2704.63 Safari/537.36'}
    req = urllib.request.Request(url=url, headers=headers)
    # The with-block closes the connection once the body has been read.
    with urllib.request.urlopen(req) as res:
        return res.read().decode('utf-8')

def parsePage(s):
    # Yield one dict per listing matched in the page source.
    for match in com.finditer(s):
        yield {
            "shop name": match.group("shop_name"),
            "price per person": match.group("per_capita"),
            "address": match.group("address"),
        }

def main(num):
    # "%%" is a literal "%" under %-formatting, so "%%2C" becomes "%2C" (an encoded comma).
    url = "http://www.dianping.com/beijing/ch10/p%s?aid=92020785%%2C102284990&cpt=92020785%%2C102284990" % num
    response_html = getPage(url)
    # Append one line per shop; the with-block closes the file when done.
    with open("eat_info", "a", encoding="utf-8") as f:
        for obj in parsePage(response_html):
            print(obj)
            f.write(str(obj) + "\n")

# Crawl listing pages 1 through 50.
for page in range(1, 51):
    main(page)
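The page URL in the script escapes each percent sign by hand so that `%`-formatting leaves `%2C` intact. As an alternative, here is a small sketch that lets `urllib.parse.urlencode` do the percent-encoding. `BASE`, `QUERY`, and `build_page_url` are hypothetical names, and the query values are simply copied from the script's URL; what they mean to the site is not stated in the post.

```python
from urllib.parse import urlencode

BASE = "http://www.dianping.com/beijing/ch10"
# Query values copied from the script's URL (their meaning is an unknown here).
QUERY = {"aid": "92020785,102284990", "cpt": "92020785,102284990"}


def build_page_url(page):
    """Return the listing URL for a 1-based page number."""
    # urlencode percent-encodes each comma as %2C, so no manual "%%" is needed.
    return f"{BASE}/p{page}?{urlencode(QUERY)}"


print(build_page_url(1))
```

Keeping the parameters in a dict also makes it easier to change or drop one of them without editing a long format string.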