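The requirement: fetch query results from a certain website. The environment: VB.NET.
I started from zero. I did not even know which parts of .NET I would need, so I searched blindly on Google, Baidu, Yisou and so on, with queries such as ".net ie", "split web page .net" and "embedded IE". Before long I learned about the WebBrowser control.
The article that helped me most was http://www.microsoft.com/china/msdn/archives/workshop/scrape.asp
It describes the VB environment, but things are not much different in .NET. Don't laugh! At first it took me ages just to find shdocvw.dll and mshtml.dll when adding references, because everyone calls the control WebBrowser, while .NET lists it as "Microsoft Web 浏览器" (Microsoft Web Browser).
Practice with the article above first!
Enough chatter.
First create a text box and a button, for entering and submitting the search text.
In the button's Click event, write: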
Dim postdata As String() = {"searchtext=" + Me.searchtext.Text}
Dim strurl As String = "http://"
Dim sessionhtml As String = postdate(strurl, postdata)
' Write the result to a temporary file
Dim sw As StreamWriter = New StreamWriter("d:\1.htm", False, Encoding.GetEncoding("gb2312"))
sw.WriteLine(sessionhtml)
sw.Close()
' Load the temporary file into the browser control
Me.axwebbrowserfill.Navigate("d:\1.htm")
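The hard-coded d:\1.htm only works on a machine that has a writable D: drive. A minimal sketch of one alternative, assuming it is acceptable to put the file in the user's temp folder; WriteTempPage is a hypothetical helper name, not something from the article above:

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module TempPage
    ' Hypothetical helper: writes html to a uniquely named .htm file
    ' in the current user's temp folder and returns the full path.
    Public Function WriteTempPage(ByVal html As String, ByVal enc As Encoding) As String
        Dim fileName As String = Guid.NewGuid().ToString("N") & ".htm"
        Dim filePath As String = Path.Combine(Path.GetTempPath(), fileName)
        Dim sw As New StreamWriter(filePath, False, enc)
        Try
            sw.Write(html)
        Finally
            ' Close the file even if the write fails
            sw.Close()
        End Try
        Return filePath
    End Function
End Module
```

The returned path can then be passed to Navigate in place of "d:\1.htm". The files are not cleaned up automatically, so deleting them when the form closes is left to the caller.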
The postdate function is as follows:
' Needs at the top of the file: Imports System.IO, System.Net, System.Text
Public Function postdate(ByVal url As String, ByVal postdata() As String) As String
    ' Join the name=value pairs into one form-data string
    Dim post As String = String.Join("&", postdata)
    Dim html As String = ""
    ' Named gbEncoding so the variable does not clash with the Encoding type
    Dim gbEncoding As Encoding = Encoding.GetEncoding("gb2312")
    Dim data As Byte() = gbEncoding.GetBytes(post)
    Dim myrequest As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
    myrequest.Method = "POST"
    ' Form data is sent as application/x-www-form-urlencoded
    myrequest.ContentType = "application/x-www-form-urlencoded"
    'myrequest.ContentType = "text/asp"
    myrequest.ContentLength = data.Length
    Dim newstream As Stream = myrequest.GetRequestStream()
    newstream.Write(data, 0, data.Length)
    newstream.Close()
    Dim resp As HttpWebResponse = CType(myrequest.GetResponse(), HttpWebResponse)
    Dim sr As StreamReader = New StreamReader(resp.GetResponseStream(), gbEncoding)
    ' Return the HTML source as a string
    html = sr.ReadToEnd()
    sr.Close()
    Return html
End Function
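That is all it takes.
As for showing the HTML directly in the WebBrowser control without going through a temporary file, everything I found online was for Delphi, and .NET does not seem to have a perfect solution.
I have tried: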
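One caveat: the click handler puts the text box value into the POST body unencoded, so a search term containing "&", "=", spaces or Chinese characters can corrupt the form data. A minimal sketch of encoding each field first, assuming the target site decodes form data as gb2312 (as the rest of the code suggests) and that the project references System.Web.dll; EncodeField is a hypothetical helper name:

```vbnet
Imports System.Text
Imports System.Web   ' assumption: the project references System.Web.dll

Module FormEncoding
    ' Hypothetical helper (not in the original code): returns "name=value"
    ' with both parts URL-encoded using the given character encoding.
    Public Function EncodeField(ByVal name As String, ByVal value As String, ByVal enc As Encoding) As String
        Return HttpUtility.UrlEncode(name, enc) & "=" & HttpUtility.UrlEncode(value, enc)
    End Function
End Module
```

In the click handler, the array element would then become EncodeField("searchtext", Me.searchtext.Text, Encoding.GetEncoding("gb2312")) and postdate can stay unchanged.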
axwebbrowserfill.Navigate(sessionhtml)
Me.axwebbrowserfill.Document.write(sessionhtml + "haga")
Me.axscriptlet.url = "about:blank" + sessionhtml
Me.axwebbrowserfill.Document.write(sessionhtml)
doc = Me.axwebbrowserfill.Document
doc.body.innerHTML = sessionhtml
doc.write(sessionhtml)
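These usually succeed only the first time, and the double quotes inside the HTML cause trouble along the way.
Some people online also suggest the following approach: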
' Show the report content field in the WebBrowser
Dim doc As IHTMLDocument2 = CType(axwebbrowserfill.Document, IHTMLDocument2)
Dim bodyelement As IHTMLElement = CType(doc.body, IHTMLElement)
bodyelement.innerHTML = sessionhtml + "haga"
But this method never worked for me!
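For what it is worth, there is a different route that avoids the temporary file, though it swaps in another control. From .NET 2.0 on, Windows Forms ships a managed System.Windows.Forms.WebBrowser control with a DocumentText property that accepts an HTML string. A minimal sketch, assuming .NET 2.0 or later and that replacing the AxWebBrowser COM wrapper is an option; ResultForm and ShowHtml are hypothetical names:

```vbnet
Imports System.Windows.Forms

' Sketch only: uses the managed WebBrowser control from .NET 2.0+,
' not the AxWebBrowser COM wrapper used in the code above.
Public Class ResultForm
    Inherits Form

    Private ReadOnly browser As New WebBrowser()

    Public Sub New()
        browser.Dock = DockStyle.Fill
        Controls.Add(browser)
    End Sub

    ' Hypothetical method: shows an HTML string without a temporary file.
    Public Sub ShowHtml(ByVal html As String)
        browser.DocumentText = html
    End Sub
End Class
```

Because the page is loaded from a string, it has no base URL of its own, so relative links and images in the fetched HTML will probably not resolve unless they are rewritten as absolute URLs first.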