A First Attempt at Word Clouds in Python
2018-08-10 11:27:16  Source: 博客园 (cnblogs)
1 Analyzing English text
from wordcloud import WordCloud
import os

# Directory that contains this script
cur_path = os.path.dirname(__file__)

# Read the English source text
with open(os.path.join(cur_path, 'love_en.txt')) as fp:
    txt = fp.read()
# print(txt)

# Build the word cloud with default settings and display it
wordcloud = WordCloud().generate(txt)
image = wordcloud.to_image()
image.show()
Running this raised an error: OSError: cannot open resource
Solution: specify a font file through the font_path argument.
Corrected code:
from wordcloud import WordCloud
import os

# Directory that contains this script
cur_path = os.path.dirname(__file__)

# Read the English source text
with open(os.path.join(cur_path, 'love_en.txt')) as fp:
    txt = fp.read()
# print(txt)

# Pass an explicit font file so the font can be loaded
wordcloud = WordCloud(font_path='FZLTXIHK.TTF').generate(txt)
image = wordcloud.to_image()
image.show()
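Since the error above comes from a font file that cannot be opened, it may help to check the path before building the word cloud. The following is a minimal sketch under stated assumptions: `find_font` is a hypothetical helper (not part of wordcloud), and the candidate directories are illustrative guesses, not locations named in the article.

```python
import os

def find_font(candidates):
    """Return the first path in `candidates` that is an existing file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None

# Fall back to the working directory when __file__ is not defined (e.g. in a REPL).
if "__file__" in globals():
    cur_path = os.path.dirname(os.path.abspath(__file__))
else:
    cur_path = os.getcwd()

# Hypothetical search list: next to the script first, then a Windows font folder.
candidates = [
    os.path.join(cur_path, "FZLTXIHK.TTF"),
    os.path.join(r"C:\Windows\Fonts", "FZLTXIHK.TTF"),
]

font_path = find_font(candidates)
if font_path is None:
    print("Font file not found; pass a valid path to WordCloud(font_path=...)")
else:
    print("Using font:", font_path)
```

The resulting `font_path` can then be passed to `WordCloud(font_path=font_path)` in place of the bare file name.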
Further refined code:
from wordcloud import WordCloud
import os

# Directory that contains this script
cur_path = os.path.dirname(__file__)

# Read the English source text
with open(os.path.join(cur_path, 'love_en.txt')) as fp:
    txt = fp.read()
# print(txt)

wordcloud = WordCloud(font_path='FZLTXIHK.TTF',   # font file
                      background_color='black',   # background color
                      max_words=30,               # maximum number of words shown
                      max_font_size=60            # font size of the most frequent word
                      ).generate(txt)
image = wordcloud.to_image()
image.show()
Result: [word cloud image]
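To make the effect of max_words more concrete, here is a rough sketch of keeping only the N most frequent words. It is an approximation, not wordcloud's own tokenizer (which also drops English stopwords and normalizes plurals, so its counts can differ). `top_words` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def top_words(text, max_words=30):
    """Count lower-cased English words and keep the `max_words` most frequent."""
    words = re.findall(r"[a-z']+", text.lower())
    return dict(Counter(words).most_common(max_words))

# Made-up sample text, only for illustration
sample = "love is patient, love is kind; love never fails"
freqs = top_words(sample, max_words=2)
print(freqs)  # {'love': 3, 'is': 2}
```

A dictionary like `freqs` can be handed to `WordCloud.generate_from_frequencies(freqs)` instead of calling `generate(txt)`, which gives direct control over which words are counted.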
2 Analyzing Chinese text
import jieba
from wordcloud import WordCloud
import os

# Directory that contains this script
cur_path = os.path.dirname(__file__)

def chinese_jieba(txt):
    wordlist_jieba = jieba.cut(txt)        # segment the text; returns a generator of words
    txt_jieba = " ".join(wordlist_jieba)   # join the words into one space-separated string
    return txt_jieba

stopwords = {'这些', '那些', '因为', '所以'}   # noise words to filter out

# Pass encoding='utf-8' (or 'gbk') to open() if the platform default does not match the file
with open(os.path.join(cur_path, '择天记.txt')) as fp:
    txt = fp.read()
txt = chinese_jieba(txt)
# print(txt)

wordcloud = WordCloud(font_path='FZLTXIHK.TTF',   # font file (must contain Chinese glyphs)
                      background_color='black',   # background color
                      max_words=30,               # maximum number of words shown
                      max_font_size=60,           # font size of the most frequent word
                      stopwords=stopwords         # filter out the noise words
                      ).generate(txt)
image = wordcloud.to_image()
image.show()
Result: [word cloud image]
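The noise words can also be removed explicitly right after segmentation, before the text reaches WordCloud. The sketch below assumes a hypothetical helper `join_without_stopwords`; the token list is hand-written sample data so that the example runs without jieba. In the script above the tokens would come from `jieba.cut(txt)`.

```python
def join_without_stopwords(tokens, stopwords):
    """Drop stopwords and whitespace-only tokens, then join the rest with spaces."""
    kept = [t for t in tokens if t.strip() and t not in stopwords]
    return " ".join(kept)

stopwords = {'这些', '那些', '因为', '所以'}

# Hand-written sample tokens standing in for the output of jieba.cut(txt)
tokens = ['因为', '这些', '少年', '所以', '少年', '修行']
print(join_without_stopwords(tokens, stopwords))  # 少年 少年 修行
```

Filtering at this stage keeps the removal independent of how the word cloud library tokenizes its input.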
3 Further improving the display
import jieba
from wordcloud import WordCloud
import os
import numpy
from PIL import Image

# Directory that contains this script
cur_path = os.path.dirname(__file__)

def chinese_jieba(txt):
    wordlist_jieba = jieba.cut(txt)        # segment the text; returns a generator of words
    txt_jieba = " ".join(wordlist_jieba)   # join the words into one space-separated string
    return txt_jieba

stopwords = {'这些', '那些', '因为', '所以'}   # noise words to filter out
# Load the shape image as an array; pure white areas are left empty
mask_pic = numpy.array(Image.open(os.path.join(cur_path, 'love.jpg')))

# Pass encoding='utf-8' (or 'gbk') to open() if the platform default does not match the file
with open(os.path.join(cur_path, '择天记.txt')) as fp:
    txt = fp.read()
txt = chinese_jieba(txt)
# print(txt)

wordcloud = WordCloud(font_path='FZLTXIHK.TTF',   # font file (must contain Chinese glyphs)
                      background_color='white',   # background color
                      max_words=100,              # maximum number of words shown
                      max_font_size=60,           # font size of the most frequent word
                      stopwords=stopwords,        # filter out the noise words
                      mask=mask_pic               # image that defines the shape of the cloud
                      ).generate(txt)
image = wordcloud.to_image()
image.show()
Result: [word cloud image]
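A mask does not have to come from an image file. As a hedged illustration of how the mask array is interpreted (pure white entries are treated as masked out, other entries can receive words), the sketch below builds a circular mask with numpy. `circular_mask` and its default sizes are illustrative choices, not something taken from the article.

```python
import numpy as np

def circular_mask(size=300, radius=130):
    """Return a size x size uint8 array: 0 inside the circle, 255 (white) outside."""
    y, x = np.ogrid[:size, :size]
    center = size // 2
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    return (255 * outside).astype(np.uint8)

mask = circular_mask()
# Shape, then one value from the center (drawable) and one from a corner (masked out)
print(mask.shape, mask[150, 150], mask[0, 0])
```

Passing this array as `WordCloud(mask=mask, background_color='white', ...)` should confine the words to the circle; when a mask is given, its shape determines the size of the output image.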