C#.Net: Scraping the Baidu Baijia Article List with Regular Expressions (Example)

Updated: 2021-08-12    Source: Regular Expressions


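In my spare time I have been learning regular expressions. Since practice is the sole criterion for testing truth, I wrote an example that uses a regular expression to scrape the article list from Baidu Baijia (百度百家). The source code below walks through the process: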

1. Fetching the Baidu Baijia page content

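using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

// Downloads the Baidu Baijia home page and hands the HTML to AnalysisHtml.
// Each returned item is { title, summary text, link }.
public List<string[]> GetUrl()
{
  string url = "http://baijia.baidu.com/";
  WebRequest webRequest = WebRequest.Create(url);
  // The using blocks close the response and the reader even if reading fails.
  using (WebResponse webResponse = webRequest.GetResponse())
  using (StreamReader reader = new StreamReader(webResponse.GetResponseStream()))
  {
    string result = reader.ReadToEnd();
    return AnalysisHtml(result);
  }
}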
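WebRequest still works on .NET Framework, but on .NET 6 and later WebRequest.Create is marked obsolete in favor of HttpClient. As a minimal sketch under that assumption, the same download step might look like the code below. PageFetcher and FetchHtmlAsync are hypothetical names introduced for illustration; they are not part of the original article.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageFetcher
{
    // Hypothetical helper (not from the article): downloads a page as a string.
    // The caller supplies the HttpClient so that one instance can be reused.
    public static async Task<string> FetchHtmlAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            // Throws HttpRequestException for non-success status codes.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

A caller could then write something like `string html = await PageFetcher.FetchHtmlAsync(client, "http://baijia.baidu.com/");` and pass the result to AnalysisHtml.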

2. Filtering with a regular expression

using System.Collections.Generic;
using System.Text.RegularExpressions;

// Extracts { title, summary text, link } from each article entry in the page HTML.
public List<string[]> AnalysisHtml(string htmlContent)
{
  List<string[]> list = new List<string[]>();
  // The opening of this pattern and the names of its named groups did not
  // survive in the source text; the remaining fragment is kept verbatim,
  // with "…" marking the gap.
  string strPattern = "…(?[^<]+)</h3>.*\\s*<p\\s*class=\"feeds-item-text\">(?[^<]+).*)\"\\s*target=\"_blank\"\\s*class=\"feeds-item-more\"\\s*mon=\".*\\s*\">.*\\s*</p>";
  Regex regex = new Regex(strPattern, RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.CultureInvariant);
  // Matches returns an empty collection when nothing is found,
  // so no separate IsMatch check is needed.
  MatchCollection matchCollection = regex.Matches(htmlContent);
  foreach (Match match in matchCollection)
  {
    string[] str = new string[3];
    str[0] = match.Groups[1].Value; // title of the list item
    str[1] = match.Groups[2].Value; // summary text
    str[2] = match.Groups[3].Value; // URL the item links to
    list.Add(str);
  }
  return list;
}

Source: http://www.bbyears.com/aspjiaocheng/135501.html
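To make the named-group technique concrete, here is a small self-contained sketch. Everything in it is an assumption made for illustration: the sample markup only borrows the `feeds-item-text` class name that appears in the article's pattern, the real page structure may differ, and the pattern, the group names (`url`, `title`, `text`) and the FeedParser class are hypothetical rather than the author's original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FeedParser
{
    // Hypothetical markup; the real Baidu Baijia page structure is an assumption.
    public const string SampleHtml =
        "<div class=\"feeds-item\">\n" +
        "  <h3><a href=\"http://example.com/a1\" target=\"_blank\">First title</a></h3>\n" +
        "  <p class=\"feeds-item-text\">First summary</p>\n" +
        "</div>\n" +
        "<div class=\"feeds-item\">\n" +
        "  <h3><a href=\"http://example.com/a2\" target=\"_blank\">Second title</a></h3>\n" +
        "  <p class=\"feeds-item-text\">Second summary</p>\n" +
        "</div>";

    // Illustrative pattern with three named groups: url, title and text.
    private static readonly Regex ItemPattern = new Regex(
        "<h3>\\s*<a\\s+href=\"(?<url>[^\"]+)\"[^>]*>(?<title>[^<]+)</a>\\s*</h3>" +
        "\\s*<p\\s+class=\"feeds-item-text\">(?<text>[^<]+)</p>",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Returns one { title, summary text, link } array per matched entry.
    public static List<string[]> Parse(string html)
    {
        var items = new List<string[]>();
        foreach (Match m in ItemPattern.Matches(html))
        {
            items.Add(new[]
            {
                m.Groups["title"].Value.Trim(),
                m.Groups["text"].Value.Trim(),
                m.Groups["url"].Value
            });
        }
        return items;
    }
}
```

Reading groups by name instead of by index keeps the extraction code correct if the order of the groups in the pattern changes. Calling `FeedParser.Parse(FeedParser.SampleHtml)` returns one array per sample entry. For real pages, a dedicated HTML parser is usually more robust than a regular expression, because small markup changes can silently break a pattern like this one.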