Extracting Links from a Web Page - .NET Tutorial, ASP.NET Development

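Enter a URL and the page below extracts the links from that web page. The code is short and relies mainly on a regular expression: it matches <a> tags whose href is neither an absolute http:// URL nor a mailto: address, reports the row and column where each one was found, and strips a leading slash from the link.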
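The core idea, pulling href values out of HTML with a named capture group, can be tried in isolation before reading the full page. The following is a minimal console sketch, not the article's own code: the module name `LinkExtractionSketch`, the function `ExtractLinks`, the group name `url`, the simplified pattern, and the sample HTML are all assumptions made for illustration. It only handles double-quoted href attributes.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LinkExtractionSketch
    ' Hypothetical helper: returns the value of every double-quoted
    ' href attribute that appears inside an <a> tag.
    Function ExtractLinks(ByVal html As String) As List(Of String)
        Dim links As New List(Of String)()
        ' Simplified pattern (an assumption, not the article's expression):
        ' <a, some attributes, href="value", the rest of the tag, >
        Dim pattern As String = "<a\s[^>]*?href=""(?<url>[^"">]+)""[^>]*>"
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            links.Add(m.Groups("url").Value)
        Next
        Return links
    End Function

    Sub Main()
        ' Made-up sample markup with one relative link and one mailto: link.
        Dim sample As String = _
            "<p><a href=""/news/index.html"">News</a> " & _
            "<A class=""x"" HREF=""mailto:someone@example.com"">Mail</A></p>"
        For Each link As String In ExtractLinks(sample)
            Console.WriteLine(link)
        Next
    End Sub
End Module
```

Run against the sample string, this prints `/news/index.html` and `mailto:someone@example.com`, one per line. A regular expression is adequate for a quick scrape like this, but it will miss unquoted or single-quoted attributes and can be confused by malformed markup.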

The code for geturl.aspx is as follows:

    <%@ Page Language="vb" Codebehind="geturl.aspx.vb" AutoEventWireup="false" Inherits="aspxweb.geturl" %>
    <html>
    <head>
    <meta http-equiv="content-type" content="text/html; charset=gb2312">
    </head>
    <body>
    <form id="form1" method="post" runat="server">
      <p>
        <asp:label id="label1" runat="server"></asp:label>
        <asp:textbox id="urltextbox" runat="server" width="336px">http://lucky_elove.www1.dotnetplayground.com/</asp:textbox>
        <asp:button onclick="scrapebutton_click" id="scrapebutton" runat="server"></asp:button>
      </p>
      <hr width="100%" size="1">
      <p>
        <asp:label id="tipresult" runat="server"></asp:label>
        <asp:textbox id="resultlabel" runat="server" textmode="multiline" width="100%" height="400"></asp:textbox>
      </p>
    </form>
    </body>
    </html>

The code-behind file, geturl.aspx.vb, is as follows:

    Imports System
    Imports System.IO
    Imports System.Net
    Imports System.Text
    Imports System.Text.RegularExpressions

    Public Class geturl
        Inherits System.Web.UI.Page
        Protected WithEvents label1 As System.Web.UI.WebControls.Label
        Protected WithEvents urltextbox As System.Web.UI.WebControls.TextBox
        Protected WithEvents scrapebutton As System.Web.UI.WebControls.Button
        Protected WithEvents tipresult As System.Web.UI.WebControls.Label
        Protected WithEvents resultlabel As System.Web.UI.WebControls.TextBox

    #Region " Web Form Designer Generated Code "

        ' This call is required by the Web Form Designer.
        <System.Diagnostics.DebuggerStepThrough()> Private Sub InitializeComponent()

        End Sub

        Private Sub Page_Init(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Init
            ' CODEGEN: This method call is required by the Web Form Designer.
            ' Do not modify it using the code editor.
            InitializeComponent()
        End Sub

    #End Region

        Private Sub Page_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            ' Put user code to initialize the page here.
            label1.Text = "Enter a URL:"
            scrapebutton.Text = "Extract href links"
        End Sub

        Private report As New StringBuilder()   ' one report line per rewritten link
        Private webpage As String                ' raw HTML of the fetched page
        Private countofmatches As Int32

        Public Sub scrapebutton_click(ByVal sender As System.Object, ByVal e As System.EventArgs)
            webpage = graburl()
            Dim mydelegate As New MatchEvaluator(AddressOf matchhandler)

            ' Match <a> tags whose href is neither an absolute http:// URL nor a mailto: address.
            ' The named group is greedy so that it captures the whole href value; a lazy
            ' quantifier here would capture only the first character of the link.
            Dim linksexpression As New Regex( _
                "\<a.+?href=[""](?!http\:\/\/)(?!mailto\:)(?<foundanchor>[^"">]+)[^>]*?\>", _
                RegexOptions.Multiline Or RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace)

            ' matchhandler is called once per match and returns the replacement tag.
            Dim newwebpage As String = linksexpression.Replace(webpage, mydelegate)

            tipresult.Text = "<h2>href links extracted from " & urltextbox.Text & "</h2>" & _
                "<b>Found and fixed " & countofmatches.ToString() & " links</b><p>" & _
                report.ToString().Replace(Environment.NewLine, "<br />")
            tipresult.Text &= "<h2>Cleaned-up page</h2><script>window.document.title='Extracting Links from a Web Page'</script>"
            resultlabel.Text = newwebpage
        End Sub

        Public Function matchhandler(ByVal m As Match) As String
            Dim link As String = m.Groups("foundanchor").Value
            ' Searching right to left for line starts gives the row and column of the match.
            Dim rtol As New Regex("^", RegexOptions.Multiline Or RegexOptions.RightToLeft)
            Dim col, row As Int32
            Dim linebegin As Int32 = rtol.Match(webpage, m.Index).Index

            row = rtol.Matches(webpage, m.Index).Count
            col = m.Index - linebegin

            report.AppendFormat( _
                "link <b>{0}</b>, fixed at row: {1}, col: {2}{3}", _
                Server.HtmlEncode(m.Groups(0).Value), _
                row, _
                col, _
                Environment.NewLine _
            )
            ' Strip a leading slash from the link, if there is one.
            Dim newlink As String
            If link.StartsWith("/") Then
                newlink = link.Substring(1)
            Else
                newlink = link
            End If

            countofmatches += 1
            Return m.Groups(0).Value.Replace(link, newlink)
        End Function

        ' Downloads the page at the URL typed into urltextbox and returns its HTML.
        Private Function graburl() As String
            Dim wc As New WebClient()
            Dim s As Stream = wc.OpenRead(urltextbox.Text)
            Dim sr As StreamReader = New StreamReader(s, System.Text.Encoding.Default)
            graburl = sr.ReadToEnd()
            s.Close()
            wc.Dispose()
        End Function

    End Class
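The rewriting step, Regex.Replace driven by a MatchEvaluator callback, can be exercised without a web page or a network connection. The console sketch below is an illustration under stated assumptions rather than the article's code: the module name `LinkRewriteSketch`, the callback name `StripLeadingSlash`, and the sample HTML are made up, and the named group is written greedy (`+`) so that it holds the complete href value.

```vb
Imports System
Imports System.Text.RegularExpressions

Module LinkRewriteSketch
    ' Hypothetical callback: invoked once per match; the string it
    ' returns replaces the matched <a ...> tag in the output.
    Function StripLeadingSlash(ByVal m As Match) As String
        Dim link As String = m.Groups("foundAnchor").Value
        If link.StartsWith("/") Then
            Return m.Value.Replace(link, link.Substring(1))
        End If
        Return m.Value
    End Function

    Sub Main()
        ' Made-up sample: one relative link followed by one absolute link.
        Dim html As String = _
            "<a href=""/docs/a.html"">A</a> <a href=""http://example.com/"">B</a>"

        ' The negative lookaheads reject hrefs that start with http:// or mailto:,
        ' so only the relative link is handed to the callback.
        Dim relativeLinks As New Regex( _
            "<a.+?href=""(?!http://)(?!mailto:)(?<foundAnchor>[^"">]+)[^>]*?>", _
            RegexOptions.IgnoreCase)

        Dim rewritten As String = _
            relativeLinks.Replace(html, New MatchEvaluator(AddressOf StripLeadingSlash))
        Console.WriteLine(rewritten)
    End Sub
End Module
```

With this sample input the relative link loses its leading slash and the absolute link is left untouched, so the program prints `<a href="docs/a.html">A</a> <a href="http://example.com/">B</a>`. Because `.+?` can run across several tags on one line, an absolute link that precedes a relative one on the same line may be swallowed into a single match; the sample keeps the relative link first to avoid that case.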
