Problems I ran into with the RapidXml library, and how I tracked them down
2018-06-17 22:34:26
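There are plenty of open-source C++ libraries for parsing XML, and I won't list them all here. Today I want to talk about RapidXml. I haven't used it a great deal, so if anything below is wrong, please point it out. Thanks.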
Links:
Official site: http://rapidxml.sourceforge.net/
Official manual: http://rapidxml.sourceforge.net/manual.html
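I used this library once before and fell into a "pitfall", but I was short on time and didn't look into it. Now that I'm using it again, I don't want to fall into the same pit twice, so I decided to get to the bottom of it.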
First, two examples.
Creating the XML:
```cpp
#include <fstream>
#include <iterator>
#include <string>
#include "rapidxml.hpp"
#include "rapidxml_print.hpp"   // rapidxml::print

void CreateXml()
{
    rapidxml::xml_document<> doc;

    auto nodeDecl = doc.allocate_node(rapidxml::node_declaration);
    nodeDecl->append_attribute(doc.allocate_attribute("version", "1.0"));
    nodeDecl->append_attribute(doc.allocate_attribute("encoding", "UTF-8"));
    doc.append_node(nodeDecl); // add the XML declaration

    auto nodeRoot = doc.allocate_node(rapidxml::node_element, "Root"); // create the Root node
    // Add a comment to Root ("编程语言" = "programming languages").
    // A comment has no name, so the second argument is NULL.
    nodeRoot->append_node(doc.allocate_node(rapidxml::node_comment, NULL, "编程语言"));
    auto nodeLanguage = doc.allocate_node(rapidxml::node_element, "language", "This is C language"); // create a language node
    nodeLanguage->append_attribute(doc.allocate_attribute("name", "C")); // give it a name attribute
    nodeRoot->append_node(nodeLanguage); // append the language node to Root
    nodeLanguage = doc.allocate_node(rapidxml::node_element, "language", "This is C++ language"); // create another language node
    nodeLanguage->append_attribute(doc.allocate_attribute("name", "C++")); // give it a name attribute
    nodeRoot->append_node(nodeLanguage); // append it to Root

    doc.append_node(nodeRoot); // append Root to the document
    std::string buffer;
    rapidxml::print(std::back_inserter(buffer), doc, 0);
    std::ofstream outFile("language.xml");
    outFile << buffer;
    outFile.close();
}
```
Result:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<Root>
	<!--编程语言-->
	<language name="C">This is C language</language>
	<language name="C++">This is C++ language</language>
</Root>
```
Modifying the XML:
```cpp
#include <fstream>
#include <iterator>
#include <string>
#include "rapidxml.hpp"
#include "rapidxml_print.hpp"   // rapidxml::print
#include "rapidxml_utils.hpp"   // rapidxml::file

void MotifyXml()
{
    rapidxml::file<> requestFile("language.xml"); // load the XML from a file; doc points into this buffer
    rapidxml::xml_document<> doc;
    doc.parse<0>(requestFile.data()); // parse the XML

    auto nodeRoot = doc.first_node(); // get the first node, i.e. the Root node
    auto nodeLanguage = nodeRoot->first_node("language"); // get the first language node under Root
    nodeLanguage->first_attribute("name")->value("Motify C"); // change its name attribute to "Motify C"
    std::string buffer;
    rapidxml::print(std::back_inserter(buffer), doc, 0);
    std::ofstream outFile("MotifyLanguage.xml");
    outFile << buffer;
    outFile.close();
}
```
Result:
```xml
<Root>
	<language name="Motify C">This is C language</language>
	<language name="C++">This is C++ language</language>
</Root>
```
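The second result shows that the name attribute of the first language element did change to the value we wanted, but it is also easy to see that the XML declaration and the comment have both disappeared. What happened? This puzzled me for a while. Since the library is open source, let's follow the code and see what it actually does. Two places look suspicious: print and parse. Both take flags (parse as a template argument, print as a function argument), and what do those flags actually do? The official tutorial passes 0 to both. Since print is what finally writes the output, let's start by stepping into print.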
Here is the definition of print:
```cpp
template<class OutIt, class Ch>
inline OutIt print(OutIt out, const xml_node<Ch> &node, int flags = 0)
{
    return internal::print_node(out, &node, flags, 0);
}
```
Stepping into internal::print_node:
```cpp
// Print node
template<class OutIt, class Ch>
inline OutIt print_node(OutIt out, const xml_node<Ch> *node, int flags, int indent)
{
    // Print proper node type
    switch (node->type())
    {

    // Document
    case node_document:
        out = print_children(out, node, flags, indent);
        break;

    // Element
    case node_element:
        out = print_element_node(out, node, flags, indent);
        break;

    // Data
    case node_data:
        out = print_data_node(out, node, flags, indent);
        break;

    // CDATA
    case node_cdata:
        out = print_cdata_node(out, node, flags, indent);
        break;

    // Declaration
    case node_declaration:
        out = print_declaration_node(out, node, flags, indent);
        break;

    // Comment
    case node_comment:
        out = print_comment_node(out, node, flags, indent);
        break;

    // Doctype
    case node_doctype:
        out = print_doctype_node(out, node, flags, indent);
        break;

    // Pi
    case node_pi:
        out = print_pi_node(out, node, flags, indent);
        break;

    // Unknown
    default:
        assert(0);
        break;
    }

    // If indenting not disabled, add line break after node
    if (!(flags & print_no_indenting))
        *out = Ch('\n'), ++out;

    // Return modified iterator
    return out;
}
```
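Stepping into print_children shows that printing is recursive: it calls print_node on each child in turn. Let's keep going with print_element_node: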
```cpp
// Print element node
template<class OutIt, class Ch>
inline OutIt print_element_node(OutIt out, const xml_node<Ch> *node, int flags, int indent)
{
    assert(node->type() == node_element);

    // Print element name and attributes, if any
    if (!(flags & print_no_indenting))
        // ... (part of the code omitted)

    return out;
}
```
Note the bitwise & test in if (!(flags & print_no_indenting)). Here is how print_no_indenting is defined:
```cpp
// Printing flags

const int print_no_indenting = 0x1;   //!< Printer flag instructing the printer to suppress indenting of XML. See print() function.
```
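To make the dispatch-and-recurse structure concrete, here is a toy printer that mirrors print_node, print_children and the print_no_indenting test. It is a minimal sketch under stated assumptions: ToyType and ToyNode are hypothetical types, not rapidxml's; the functions reuse the names print_node and print_children only to show the correspondence; attributes and escaping are left out, and the declaration is printed as a fixed string. What it shows is that a printer built this way emits declaration and comment nodes whenever they are present in the tree.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model (hypothetical types, not rapidxml's).
enum class ToyType { document, element, declaration, comment, data };

struct ToyNode {
    ToyType type;
    std::string name;               // element name; unused for other types
    std::string value;              // comment or data text
    std::vector<ToyNode> children;  // child nodes, in document order
};

const int print_no_indenting = 0x1;  // same value as rapidxml's printer flag

std::string print_node(const ToyNode& node, int flags);

// Like print_children(): print every child via print_node(), in order.
std::string print_children(const ToyNode& node, int flags) {
    std::string out;
    for (const ToyNode& child : node.children)
        out += print_node(child, flags);
    return out;
}

// Like print_node(): dispatch on the node type, then append a line break
// unless the print_no_indenting bit is set in flags.
std::string print_node(const ToyNode& node, int flags) {
    std::string out;
    switch (node.type) {
    case ToyType::document:
        out = print_children(node, flags);
        break;
    case ToyType::element:
        out = "<" + node.name + ">" + print_children(node, flags) + "</" + node.name + ">";
        break;
    case ToyType::declaration:
        out = "<?xml version=\"1.0\"?>";  // fixed text; attributes not modelled
        break;
    case ToyType::comment:
        out = "<!--" + node.value + "-->";
        break;
    case ToyType::data:
        out = node.value;
        break;
    }
    if (!(flags & print_no_indenting))
        out += '\n';
    return out;
}
```

Printing a tree that already lacks the declaration and the comment simply yields <Root>...</Root>, so if they vanish from the output, the tree handed to print never contained them.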
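Now we can reason about it. print_no_indenting is the only printer flag, and it only controls indentation and line breaks, while print_node has explicit cases for node_declaration and node_comment. So print is not what drops them: the declaration and the comment must already be missing from the tree that parse built. Given the library's consistent style, parse should have similar flag definitions.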
(Tracing through parse is omitted here.)
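I also checked the official documentation, and it is just as I expected. Here are the descriptions of these flags from the header file; see the official documentation for details: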
```cpp
// Parsing flags

//! Parse flag instructing the parser to not create data nodes.
//! Text of first data node will still be placed in value of parent element, unless rapidxml::parse_no_element_values flag is also specified.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_no_data_nodes = 0x1;

//! Parse flag instructing the parser to not use text of first data node as a value of parent element.
//! Can be combined with other flags by use of | operator.
//! Note that child data nodes of element node take precendence over its value when printing.
//! That is, if element has one or more child data nodes <em>and</em> a value, the value will be ignored.
//! Use rapidxml::parse_no_data_nodes flag to prevent creation of data nodes if you want to manipulate data using values of elements.
//! <br><br>
//! See xml_document::parse() function.
const int parse_no_element_values = 0x2;

//! Parse flag instructing the parser to not place zero terminators after strings in the source text.
//! By default zero terminators are placed, modifying source text.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_no_string_terminators = 0x4;

//! Parse flag instructing the parser to not translate entities in the source text.
//! By default entities are translated, modifying source text.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_no_entity_translation = 0x8;

//! Parse flag instructing the parser to disable UTF-8 handling and assume plain 8 bit characters.
//! By default, UTF-8 handling is enabled.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_no_utf8 = 0x10;

//! Parse flag instructing the parser to create XML declaration node.
//! By default, declaration node is not created.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_declaration_node = 0x20;

//! Parse flag instructing the parser to create comments nodes.
//! By default, comment nodes are not created.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_comment_nodes = 0x40;

//! Parse flag instructing the parser to create DOCTYPE node.
//! By default, doctype node is not created.
//! Although W3C specification allows at most one DOCTYPE node, RapidXml will silently accept documents with more than one.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_doctype_node = 0x80;

//! Parse flag instructing the parser to create PI nodes.
//! By default, PI nodes are not created.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_pi_nodes = 0x100;

//! Parse flag instructing the parser to validate closing tag names.
//! If not set, name inside closing tag is irrelevant to the parser.
//! By default, closing tags are not validated.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_validate_closing_tags = 0x200;

//! Parse flag instructing the parser to trim all leading and trailing whitespace of data nodes.
//! By default, whitespace is not trimmed.
//! This flag does not cause the parser to modify source text.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_trim_whitespace = 0x400;

//! Parse flag instructing the parser to condense all whitespace runs of data nodes to a single space character.
//! Trimming of leading and trailing whitespace of data is controlled by rapidxml::parse_trim_whitespace flag.
//! By default, whitespace is not normalized.
//! If this flag is specified, source text will be modified.
//! Can be combined with other flags by use of | operator.
//! <br><br>
//! See xml_document::parse() function.
const int parse_normalize_whitespace = 0x800;

// Compound flags

//! Parse flags which represent default behaviour of the parser.
//! This is always equal to 0, so that all other flags can be simply ored together.
//! Normally there is no need to inconveniently disable flags by anding with their negated (~) values.
//! This also means that meaning of each flag is a <i>negation</i> of the default setting.
//! For example, if flag name is rapidxml::parse_no_utf8, it means that utf-8 is <i>enabled</i> by default,
//! and using the flag will disable it.
//! <br><br>
//! See xml_document::parse() function.
const int parse_default = 0;

//! A combination of parse flags that forbids any modifications of the source text.
//! This also results in faster parsing. However, note that the following will occur:
//! <ul>
//! <li>names and values of nodes will not be zero terminated, you have to use xml_base::name_size() and xml_base::value_size() functions to determine where name and value ends</li>
//! <li>entities will not be translated</li>
//! <li>whitespace will not be normalized</li>
//! </ul>
//! See xml_document::parse() function.
const int parse_non_destructive = parse_no_string_terminators | parse_no_entity_translation;

//! A combination of parse flags resulting in fastest possible parsing, without sacrificing important data.
//! <br><br>
//! See xml_document::parse() function.
const int parse_fastest = parse_non_destructive | parse_no_data_nodes;

//! A combination of parse flags resulting in largest amount of data being extracted.
//! This usually results in slowest parsing.
//! <br><br>
//! See xml_document::parse() function.
const int parse_full = parse_declaration_node | parse_comment_nodes | parse_doctype_node | parse_pi_nodes | parse_validate_closing_tags;
```
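The flag arithmetic can be checked in isolation. The sketch below copies the values from the header excerpt into a local namespace (rx_flags, a hypothetical name) so it compiles without rapidxml, and uses two hypothetical helpers, creates_declaration and creates_comments, which apply the same kind of bit test as print's print_no_indenting check:

```cpp
#include <cassert>

// Local copies of the rapidxml parse flag values quoted above, so this
// compiles on its own; real code should use the rapidxml:: constants.
namespace rx_flags {
constexpr int parse_no_data_nodes = 0x1;
constexpr int parse_no_string_terminators = 0x4;
constexpr int parse_no_entity_translation = 0x8;
constexpr int parse_declaration_node = 0x20;
constexpr int parse_comment_nodes = 0x40;
constexpr int parse_doctype_node = 0x80;
constexpr int parse_pi_nodes = 0x100;
constexpr int parse_validate_closing_tags = 0x200;

constexpr int parse_default = 0;
constexpr int parse_non_destructive = parse_no_string_terminators | parse_no_entity_translation;
constexpr int parse_fastest = parse_non_destructive | parse_no_data_nodes;
constexpr int parse_full = parse_declaration_node | parse_comment_nodes |
                           parse_doctype_node | parse_pi_nodes |
                           parse_validate_closing_tags;
}  // namespace rx_flags

// Hypothetical helpers: does this flag set ask the parser to create
// declaration / comment nodes?
constexpr bool creates_declaration(int flags) {
    return (flags & rx_flags::parse_declaration_node) != 0;
}
constexpr bool creates_comments(int flags) {
    return (flags & rx_flags::parse_comment_nodes) != 0;
}

// parse<0> is parse<parse_default>: neither bit is set, so neither node is created.
static_assert(!creates_declaration(rx_flags::parse_default), "default drops the declaration");
static_assert(!creates_comments(rx_flags::parse_default), "default drops comments");
static_assert(creates_declaration(rx_flags::parse_full) && creates_comments(rx_flags::parse_full),
              "parse_full keeps both");
```

Note that even parse_fastest leaves both bits clear; only parse_full, or ORing the two flags in explicitly, keeps the declaration and the comments.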
Based on this information, let's fix the earlier code.
Change:
```cpp
doc.parse<0>(requestFile.data()); // parse the XML
auto nodeRoot = doc.first_node(); // get the first node, i.e. the Root node
```
to:
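```cpp
// Parse the XML, keeping the declaration and comments and leaving the buffer unmodified
doc.parse<rapidxml::parse_declaration_node | rapidxml::parse_comment_nodes | rapidxml::parse_non_destructive>(requestFile.data());
auto nodeRoot = doc.first_node("Root"); // look up Root by name; first_node() would now return the declaration
```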
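To explain: parse now gets three flags, telling the parser to create the declaration node, to create comment nodes, and not to modify the buffer passed in (parse_non_destructive; as the header notes, names and values are then not zero-terminated, so use name_size() and value_size() to find where they end). The second line changes because, once the declaration is kept, the plain first_node() returns the declaration node rather than the Root node we want, so we look the node up by name instead.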
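Here is a toy model of why the lookup has to change. TopNode, top_level_nodes() and this first_node() are hypothetical stand-ins, not rapidxml code. The model only covers the document's top-level children (the comment sits inside Root, so it does not affect this lookup), and it assumes, per the header comments, that the declaration node exists only when parse_declaration_node is set:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Values from the rapidxml header excerpt above (local copies).
constexpr int parse_no_string_terminators = 0x4;
constexpr int parse_no_entity_translation = 0x8;
constexpr int parse_declaration_node = 0x20;
constexpr int parse_comment_nodes = 0x40;
constexpr int parse_non_destructive = parse_no_string_terminators | parse_no_entity_translation;

// The flag set used in the fixed MotifyXml().
constexpr int fixed_flags = parse_declaration_node | parse_comment_nodes | parse_non_destructive;

// Toy model (hypothetical): one top-level child of the parsed document.
struct TopNode {
    std::string kind;  // "declaration" or "element"
    std::string name;  // element name; left empty for the declaration in this model
};

// Top-level children the toy parser builds from language.xml for a flag set.
std::vector<TopNode> top_level_nodes(int flags) {
    std::vector<TopNode> nodes;
    if (flags & parse_declaration_node)
        nodes.push_back({"declaration", ""});
    nodes.push_back({"element", "Root"});  // Root is always created
    return nodes;
}

// Mimics first_node(): without a name, the first child of any kind;
// with a name, the first child whose name matches.
const TopNode* first_node(const std::vector<TopNode>& nodes, const char* name = nullptr) {
    for (const TopNode& n : nodes)
        if (name == nullptr || n.name == name)
            return &n;
    return nullptr;
}
```

With flags 0, first_node() lands on Root; with the fixed flags it lands on the declaration, while first_node("Root") still finds the element.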
Notes:
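1. When appending, the library does not check whether the item being added (node, attribute, etc.) already exists, so appending an attribute name that is already present produces a duplicate.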
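2. Changing items (nodes, attributes, etc.) while looping over them, for example removing the current node, breaks the iteration.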
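Note 2 can be illustrated with a toy model of rapidxml's sibling list. As far as I can tell from rapidxml.hpp, remove_node() clears the removed node's parent pointer and next_sibling() asserts that the parent is set, so calling next_sibling() on the node you just removed trips that assertion in debug builds. The sketch below uses hypothetical stand-ins (ToyNode, ToyList and remove_named are not rapidxml names) to show the usual workaround: fetch the next sibling before removing the current node.

```cpp
#include <cassert>
#include <string>

// Toy model (hypothetical, not rapidxml): children kept in an intrusive
// doubly linked list, like xml_node's sibling pointers.
struct ToyNode {
    std::string name;
    bool attached = false;  // stands in for xml_node's parent pointer
    ToyNode* prev = nullptr;
    ToyNode* next = nullptr;

    ToyNode* next_sibling() const {
        // Refuse to walk siblings of a detached node.
        assert(attached && "next_sibling() on a removed node");
        return next;
    }
};

struct ToyList {
    ToyNode* first = nullptr;
    ToyNode* last = nullptr;

    void append(ToyNode* n) {
        n->attached = true;
        n->prev = last;
        n->next = nullptr;
        if (last) last->next = n; else first = n;
        last = n;
    }

    void remove(ToyNode* n) {
        if (n->prev) n->prev->next = n->next; else first = n->next;
        if (n->next) n->next->prev = n->prev; else last = n->prev;
        n->attached = false;  // detached: next_sibling() must not be called now
    }
};

// Removes every child called `name`. The unsafe form
//   for (n = first; n; n = n->next_sibling()) if (...) remove(n);
// calls next_sibling() on a node that was just removed.
void remove_named(ToyList& list, const std::string& name) {
    for (ToyNode* n = list.first; n != nullptr;) {
        ToyNode* next = n->next_sibling();  // grab the successor first
        if (n->name == name)
            list.remove(n);
        n = next;
    }
}
```

With rapidxml itself the loop has the same shape: store node->next_sibling() before calling parent->remove_node(node).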
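Summary: when you use someone else's library, there are always some unexpected problems. These are the only ones I have run into so far; if you know of others, feel free to add them. By the way, a "pitfall" doesn't necessarily mean the open-source library is at fault; more often it just means we haven't yet learned to use the tool well.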
Thanks to the author of RapidXml for providing such an efficient and convenient tool.