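Parsing PASCAL annotation files and extracting their key information
2019-07-24 09:06:38 Source: cnblogs (博客园)
Problem description
A folder holds a number of PASCAL-format annotation files. The task is to read each file in turn and extract its key information.
A PASCAL annotation file looks like this: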
# PASCAL Annotation Version 1.00

Image filename : "Train/pos/crop001001.png"
Image size (X x Y x C) : 818 x 976 x 3
Database : "The INRIA Rhône-Alpes Annotated Person Database"
Objects with ground truth : 3 { "PASperson" "PASperson" "PASperson" }

# Note that there might be other objects in the image
# for which ground truth data has not been provided.

# Top left pixel co-ordinates : (0, 0)

# Details for object 1 ("PASperson")
# Center point -- not available in other PASCAL databases -- refers
# to person head center
Original label for object 1 "PASperson" : "UprightPerson"
Center point on object 1 "PASperson" (X, Y) : (396, 185)
Bounding box for object 1 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (261, 109) - (511, 705)

# Details for object 2 ("PASperson")
# Center point -- not available in other PASCAL databases -- refers
# to person head center
Original label for object 2 "PASperson" : "UprightPerson"
Center point on object 2 "PASperson" (X, Y) : (119, 385)
Bounding box for object 2 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (31, 326) - (209, 712)

# Details for object 3 ("PASperson")
# Center point -- not available in other PASCAL databases -- refers
# to person head center
Original label for object 3 "PASperson" : "UprightPerson"
Center point on object 3 "PASperson" (X, Y) : (219, 235)
Bounding box for object 3 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (148, 179) - (290, 641)
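The key information to extract is:
1. Image filename : "Train/pos/crop001001.png"
Extract the image file name: crop001001.png
2. Objects with ground truth : 3 { "PASperson" "PASperson" "PASperson" }
Extract the number of bounding boxes: 3
3. Bounding box for object 1 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (261, 109) - (511, 705)
Extract the top-left and bottom-right coordinates of the box: 261 109 511 705
The key information is then printed on one line per file. For this sample the output is:
crop001001.png 3 261 109 511 705 31 326 209 712 148 179 290 641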
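The three extractions listed above can each be tried on a single line before any file handling is involved. The sketch below is one possible way to do it with the standard `re` module. The helper names `extract_filename`, `extract_count` and `extract_box` are hypothetical and do not come from the original post, and the input lines are the sample lines quoted above.

```python
import re


def extract_filename(line):
    """Return the last path component of the quoted image path."""
    path = re.search(r'"([^"]+)"', line).group(1)
    return path.split('/')[-1]


def extract_count(line):
    """Return the integer right after the colon on the ground-truth line."""
    return int(re.search(r':\s*(\d+)', line).group(1))


def extract_box(line):
    """Return [xmin, ymin, xmax, ymax] from a bounding-box line."""
    # Only the text after the last " : " holds coordinates, so the
    # object index earlier in the line is never picked up.
    coords = line.rsplit(' : ', 1)[1]
    return [int(n) for n in re.findall(r'\d+', coords)]


# Sample lines copied from the annotation file shown in the post.
filename_line = 'Image filename : "Train/pos/crop001001.png"'
count_line = 'Objects with ground truth : 3 { "PASperson" "PASperson" "PASperson" }'
box_line = ('Bounding box for object 1 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : '
            '(261, 109) - (511, 705)')

print(extract_filename(filename_line))  # crop001001.png
print(extract_count(count_line))        # 3
print(extract_box(box_line))            # [261, 109, 511, 705]
```

Splitting on the last " : " is a design choice, not a requirement: it keeps the object number out of the coordinate list without relying on its position among the matched digits.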
Test code
import os
import re


def main():
    pascal_path = '/home/maxin/Desktop/pascal_list/'
    pascal_list = os.listdir(pascal_path)
    print(len(pascal_list))
    for pascal_file in pascal_list:
        # Encoding kept from the original post; change it if your files differ.
        with open(os.path.join(pascal_path, pascal_file), encoding='gbk') as f:
            line_list = f.readlines()

        str_line = ''
        count = 0  # stays 0 if the file has no "Objects with ground truth" line
        for line in line_list:
            if 'Image filename' in line:
                # Take the quoted path and keep only its last component.
                image_path = re.search(r'"(.*)"', line).group(1)
                str_line = image_path.split('/')[-1]
                break
        for line in line_list:
            if 'Objects with ground truth' in line:
                # The first number on the line is the object count.
                count = int(re.findall(r'\d+', line)[0])
                str_line += ' ' + str(count)
                break
        for index in range(1, count + 1):
            # The trailing space and quote stop "object 1" from matching "object 10".
            prefix = 'Bounding box for object ' + str(index) + ' "'
            for line in line_list:
                if prefix in line:
                    # coordinate[0] is the object index; the next four are the box.
                    coordinate = re.findall(r'\d+', line)
                    str_line += ' ' + ' '.join(coordinate[1:5])
                    break
        print(str_line)


if __name__ == "__main__":
    main()
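The test code reads files and parses them in the same loop, which makes it hard to check without the author's folder. A minimal sketch of an alternative layout follows: a pure function turns one file's text into the output line, and a second function walks a folder. The names `summarize`, `summarize_dir` and `SAMPLE` are hypothetical, `SAMPLE` is an abbreviated two-object annotation written for illustration, and the `latin-1` encoding is an assumption (the post itself opens files as `gbk`).

```python
import re
import tempfile
from pathlib import Path

NAME_RE = re.compile(r'Image filename : "(?:[^"]*/)?([^"/]+)"')
COUNT_RE = re.compile(r'Objects with ground truth : (\d+)')
BOX_RE = re.compile(
    r'Bounding box for object (\d+) .*? : '
    r'\((\d+), (\d+)\) - \((\d+), (\d+)\)'
)


def summarize(text):
    """Build 'name count xmin ymin xmax ymax ...' from one file's text."""
    name = NAME_RE.search(text).group(1)
    count = COUNT_RE.search(text).group(1)
    # Each match is (index, xmin, ymin, xmax, ymax); order boxes by index.
    boxes = sorted(BOX_RE.findall(text), key=lambda m: int(m[0]))
    fields = [name, count]
    for box in boxes:
        fields.extend(box[1:])
    return ' '.join(fields)


def summarize_dir(folder):
    """Return one summary line per file in folder, sorted by file name."""
    lines = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            # Assumption: latin-1 decodes any byte sequence without error.
            lines.append(summarize(path.read_text(encoding='latin-1')))
    return lines


# Abbreviated, illustrative annotation with two objects.
SAMPLE = '''Image filename : "Train/pos/crop001001.png"
Objects with ground truth : 2 { "PASperson" "PASperson" }
Bounding box for object 1 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (261, 109) - (511, 705)
Bounding box for object 2 "PASperson" (Xmin, Ymin) - (Xmax, Ymax) : (31, 326) - (209, 712)
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'crop001001.txt').write_text(SAMPLE, encoding='latin-1')
    for line in summarize_dir(tmp):
        print(line)
```

Because `summarize` takes a string, it can be exercised on pasted annotation text without touching the disk; `summarize_dir` only adds the folder walk on top.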
Original link: https://www.cnblogs.com/maxin/p/11089596.html