manivannan_v...
Member
38 Points
60 Posts
how to get the content or grab/crawl title value of respective page source url in gridview
Aug 16, 2010 10:00 AM
How can I crawl or grab the title from an HTML page, given its URL?
----------
URL: http://dictionary.reference.com/
Title: Dictionary.com | Find the Meanings and Definitions of Words at Dictionary.com
----------
I am using ASP.NET, with default.aspx and default.aspx.cs.
The .aspx page contains:
one text box (TextBox1)
an "enter URL" button (Button1)
a "search URL" GridView (GridView1), which should display the title (from the page head) of the URL entered in the text field
[i] When I enter a URL [http://dictionary.reference.com/] and
[ii] click search, GridView1 should display the respective title [Dictionary.com | Find the Meanings and Definitions of Words at Dictionary.com] of the page.
If anyone knows the full code, please answer here or mail me at ponmanivannangj@mobiusservices.in or pmv.vel@gmail.com
Shengqing Ya...
All-Star
45968 Points
2997 Posts
Re: how to get the content or grab/crawl title value of respective page source url in gridview
Aug 19, 2010 12:49 PM
Hi,
Based on my understanding, you need to fetch the title of a page from its URL. For example, if a customer inputs the URL "http://www.asp.net", we need to return the title "Home: The Official Microsoft ASP.NET Site".
To handle this task, we can use the WebClient class to download the full HTML of the page and then find its title with a regular expression. Here is a simple demo for you. Please give it a try.
<%@ Page Language="C#" %>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<script runat="server">
    protected void Button1_Click(object sender, EventArgs e)
    {
        // Download the raw HTML of the page whose URL was typed into TextBox1.
        System.Net.WebClient wc = new System.Net.WebClient();
        string html = wc.DownloadString(TextBox1.Text);

        // Capture the text between <title> and </title> (non-greedy).
        System.Text.RegularExpressions.Regex rg = new System.Text.RegularExpressions.Regex(@"<title>(.*?)</title>");
        System.Text.RegularExpressions.Match m = rg.Match(html);

        // Groups[1] holds the captured title; it is empty when nothing matched.
        Response.Write(m.Groups[1]);
    }
</script>
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
    <div>
        <asp:TextBox ID="TextBox1" runat="server" Text="http://www.asp.net/"></asp:TextBox>
        <asp:Button ID="Button1" runat="server" Text="Click Me" OnClick="Button1_Click" />
    </div>
    </form>
</body>
</html>
For more information about the WebClient class, you can refer to MSDN at http://msdn.microsoft.com/en-us/library/system.net.webclient.aspx.
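One caveat about the pattern in the demo: it only matches a lowercase <title> tag with no attributes, on a single line. The following is a minimal sketch of a more tolerant variant, not a definitive implementation. The TitleExtractor class and ExtractTitle method are hypothetical names introduced for illustration, and the sample HTML in Main is made up; only Regex, RegexOptions and WebUtility.HtmlDecode are real .NET APIs (WebUtility needs .NET 4.0 or later).

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original demo).
public static class TitleExtractor
{
    // IgnoreCase lets <TITLE> match; Singleline lets '.' span line breaks inside the title.
    private static readonly Regex TitlePattern = new Regex(
        @"<title\b[^>]*>(.*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the decoded, whitespace-normalized title, or null when no <title> is found.
    public static string ExtractTitle(string html)
    {
        if (string.IsNullOrEmpty(html)) return null;

        Match m = TitlePattern.Match(html);
        if (!m.Success) return null;

        // Collapse runs of whitespace (including newlines) into single spaces.
        string collapsed = Regex.Replace(m.Groups[1].Value, @"\s+", " ").Trim();

        // Turn entities such as &amp; back into plain characters.
        return WebUtility.HtmlDecode(collapsed);
    }

    public static void Main()
    {
        // Made-up sample markup, used only to exercise the helper.
        string sample = "<html><head><TITLE>\n  Tom &amp; Jerry\n</TITLE></head></html>";
        Console.WriteLine(ExtractTitle(sample)); // prints "Tom & Jerry"
    }
}
```

In the page above, Button1_Click could pass the string returned by wc.DownloadString(TextBox1.Text) to ExtractTitle and write the result only when it is not null.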
If I have misunderstood you, please feel free to let me know.
Best Regards,
Shengqing Yang
If you have any feedback about my replies, please contact msdnmg@microsoft.com.
Microsoft One Code Framework
manivannan_v...
Member
38 Points
60 Posts
Re: how to get the content or grab/crawl title value of respective page source url in gridview
Aug 20, 2010 09:53 AM
That's great.
You gave me a solution for the simple case, thanks.
It works, but I want something that goes deeper into the page, tag by tag, such as <table><tr><td> ... etc.
The input is the same:
----------
one text box (TextBox1)
enter url button field (Button1)
search url gridview (GridView1)
To be clear, here is a URL:
http://cgi.ebay.com/GRIFFIN-HANDSFREE-AUX-Cable-MobileSmartPhone-Compatible-/130422824174?pt=AU_Electronics_Portable_Audio_Accessories
----------
On that page, select the section and choose View Selection Source [right-click on the selection].
I want to crawl page values such as
Item specifics
Output:
When I enter that URL in TextBox1 (http://cgi.ebay.com/GRIFFIN-HANDSFREE-AUX-Cable-MobileSmartPhone-Compatible-/130422824174?pt=AU_Electronics_Portable_Audio_Accessories), the answer should look like this:
----------
title: GRIFFIN HANDSFREE AUX Cable-MobileSmartPhone Compatible - eBay (item 130422824174 end time Aug-24-10 17:21:37 PDT)
Item specifics
----------
[2] Sample input in TextBox1:
http://cgi.ebay.com/CHARGER-CAR-USE-THREE-OUTLETS-PLUS-USB-GREAT-VALUE-/230367337291?pt=AU_Electronics_Portable_Audio_Accessories
Corresponding output:
title: CHARGER CAR USE THREE OUTLETS PLUS USB GREAT VALUE - eBay (item 230367337291 end time Sep-07-10 22:42:12 PDT)
Item specifics:
SANSAI
----------
[3] Sample input in TextBox1:
http://cgi.ebay.com/CHARGER-CAR-USE-THREE-OUTLETS-PLUS-USB-GREAT-VALUE-/230367337291?pt=AU_Electronics_Portable_Audio_Accessories
Corresponding output:
title: USB AC CHARGER THREE YEAR WARRANTY FREE DELIVERY AUST - eBay (item 230430393196 end time Aug-26-10 15:02:49 PDT)
Item specifics
----------
Shengqing Ya...
All-Star
45968 Points
2997 Posts
Re: how to get the content or grab/crawl title value of respective page source url in gridview
Aug 24, 2010 03:52 AM
Hi,
Please try this Button1_Click code. It should work for you.
protected void Button1_Click(object sender, EventArgs e)
{
    // Download the page and write out its title.
    System.Net.WebClient wc = new System.Net.WebClient();
    string html = wc.DownloadString(TextBox1.Text);
    System.Text.RegularExpressions.Regex rg_title = new System.Text.RegularExpressions.Regex(@"<title>(.*?)</title>");
    System.Text.RegularExpressions.Match m_title = rg_title.Match(html);
    Response.Write("<b>Title:</b> " + m_title.Groups[1] + "<br /><br />");
    Response.Write("<b>Item Specifics:</b><br /><br />");

    // Each item-specifics field is a <th>label</th><td>value</td> pair.
    // A field is written only when its label is found in the page.
    string p_condition = @"<th.*?>Condition: </th><td.*?>(.*?)</td>";
    System.Text.RegularExpressions.Regex rg_condition = new System.Text.RegularExpressions.Regex(p_condition);
    System.Text.RegularExpressions.Match m_condition = rg_condition.Match(html);
    if (m_condition.Success)
    {
        Response.Write("<b>Condition:</b> " + m_condition.Groups[1] + "<br />");
    }

    string p_suits = @"<th.*?>Suits: </th><td.*?>(.*?)</td>";
    System.Text.RegularExpressions.Regex rg_suits = new System.Text.RegularExpressions.Regex(p_suits);
    System.Text.RegularExpressions.Match m_suits = rg_suits.Match(html);
    if (m_suits.Success)
    {
        Response.Write("<b>Suits:</b> " + m_suits.Groups[1] + "<br />");
    }

    string p_type = @"<th.*?>Product Type: </th><td.*?>(.*?)</td>";
    System.Text.RegularExpressions.Regex rg_type = new System.Text.RegularExpressions.Regex(p_type);
    System.Text.RegularExpressions.Match m_type = rg_type.Match(html);
    if (m_type.Success)
    {
        Response.Write("<b>Product Type:</b> " + m_type.Groups[1] + "<br />");
    }

    string p_brand = @"<th.*?>Brand: </th><td.*?>(.*?)</td>";
    System.Text.RegularExpressions.Regex rg_brand = new System.Text.RegularExpressions.Regex(p_brand);
    System.Text.RegularExpressions.Match m_brand = rg_brand.Match(html);
    if (m_brand.Success)
    {
        Response.Write("<b>Brand:</b> " + m_brand.Groups[1] + "<br />");
    }
}
Best Regards,
Shengqing Yang
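Since the original question asked for the values to appear in GridView1 rather than through Response.Write, the four repeated blocks in the reply above could be folded into one loop that fills a DataTable. This is only a sketch under stated assumptions: the ItemSpecifics class and its Extract method are hypothetical names, the <th>label: </th><td>value</td> layout is taken from the patterns in the reply and may not match the live eBay markup, and the sample HTML in Main is made up.

```csharp
using System;
using System.Data;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original reply).
public static class ItemSpecifics
{
    // Builds a two-column table (Field, Value) with one row per label found in the HTML.
    public static DataTable Extract(string html, params string[] labels)
    {
        DataTable table = new DataTable();
        table.Columns.Add("Field", typeof(string));
        table.Columns.Add("Value", typeof(string));

        foreach (string label in labels)
        {
            // Assumed layout: <th ...>Label: </th><td ...>value</td>.
            // Regex.Escape keeps characters in the label from being read as regex syntax.
            string pattern = "<th[^>]*>" + Regex.Escape(label) + @":\s*</th>\s*<td[^>]*>(.*?)</td>";
            Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
            if (m.Success)
            {
                // Drop any tags nested inside the cell and trim the remaining text.
                string value = Regex.Replace(m.Groups[1].Value, "<[^>]+>", "").Trim();
                table.Rows.Add(label, value);
            }
        }
        return table;
    }

    public static void Main()
    {
        // Made-up sample markup, used only to exercise the helper.
        string sample = "<table><tr><th class=\"a\">Condition: </th><td>Brand New</td>"
                      + "<th>Brand: </th><td><span>SANSAI</span></td></tr></table>";
        DataTable t = Extract(sample, "Condition", "Suits", "Product Type", "Brand");
        foreach (DataRow row in t.Rows)
        {
            Console.WriteLine(row["Field"] + ": " + row["Value"]);
        }
        // prints:
        // Condition: Brand New
        // Brand: SANSAI
    }
}
```

Inside Button1_Click, the returned table could then be bound with GridView1.DataSource = table; followed by GridView1.DataBind();, assuming GridView1 is left to auto-generate its columns.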