how to parse a web page?

Last post 09-20-2008 2:09 PM by samrid. 8 replies.


  • how to parse a web page?

    05-21-2007, 3:31 PM
    • test84

    Hi! I want to parse a page every day and get some numbers from it. How can I do it in ASP.NET? Are there any controls that would speed up my coding? I heard some people use regular expressions; should I use them too? Thanks in advance.

  • Re: how to parse a web page?

    05-21-2007, 3:53 PM
    • jeff@zina.com

    Google for "Screen Scraping" or "Page Scraping" and you'll find plenty of information.

    Jeff

  • Re: how to parse a web page?

    05-21-2007, 3:56 PM

    You would use HttpWebRequest to get the page text, then, usually, regular expressions to match the bit you want and extract it. HttpWebRequest returns the HTML source of the page, so you would be advised to look at that first and see if you can identify the area where the numbers appear. Hopefully, they will be in a div or span with a unique ID. Most often they are not, but that doesn't mean a regular expression to do the job will be too difficult to construct.

    If you want to post the URL of the page and show which bit you want to extract, someone might help you with the code.
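    A minimal sketch of those two steps (fetch the HTML, then pull the numbers out with a regular expression), written in Python with the standard library's urllib.request standing in for .NET's HttpWebRequest; it is not the ASP.NET code described above. It assumes, hypothetically, that the numbers sit inside a div with id="quote"; the pattern would have to be adapted to the real page's markup.

```python
import re
import urllib.request

def fetch_html(url: str) -> str:
    """Download a page and return its HTML source as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")

# Hypothetical markup: assumes the numbers sit inside <div id="quote">...</div>.
QUOTE_PATTERN = re.compile(r'<div id="quote">(.*?)</div>', re.DOTALL)
# Integers or decimals, optionally negative.
NUMBER_PATTERN = re.compile(r"-?\d+(?:\.\d+)?")

def extract_numbers(html: str) -> list[float]:
    """Locate the target element, then pull every number out of its text."""
    match = QUOTE_PATTERN.search(html)
    if match is None:
        return []
    return [float(n) for n in NUMBER_PATTERN.findall(match.group(1))]

if __name__ == "__main__":
    sample = '<html><div id="quote">Open 12.5, Close 13.25</div></html>'
    print(extract_numbers(sample))  # [12.5, 13.25]
```

    Against a live page you would pass the result of fetch_html(url) to extract_numbers, where url is whatever page you want to scrape each day.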

    Regards Mike
    [MVP - ASP/ASP.NET]
  • Re: how to parse a web page?

    05-21-2007, 4:02 PM
    • test84

    Thanks!

    The problem is, most of them try to parse an ASP.NET page. I want to parse a normal web page, and I don't know how it's made, since the site (say, Yahoo) hides the extension of its dynamic pages.

    Thanks!

  • Re: how to parse a web page?

    05-21-2007, 4:12 PM
    Answer

    It doesn't matter what technology is used to generate the web page you want to "read". It will always result in html.  HttpWebRequest returns a string containing the resulting html.
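    To illustrate: on the client side a response is only bytes plus a Content-Type header, whatever produced it on the server. A minimal Python sketch, using a hypothetical helper named page_text, of turning those bytes into the HTML string (the job HttpWebRequest does for you in .NET); it assumes UTF-8 when the server declares no charset.

```python
from email.message import Message

def page_text(raw: bytes, content_type: str) -> str:
    """Decode an HTTP response body using the charset from its Content-Type header.

    The server-side technology (ASP.NET, PHP, a static file, ...) is irrelevant:
    the client only ever sees these bytes and this header.
    """
    header = Message()
    header["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None if absent.
    charset = header.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")

print(page_text(b"<p>caf\xe9</p>", "text/html; charset=ISO-8859-1"))
```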

     

    Regards Mike
  • Re: how to parse a web page?

    09-19-2008, 10:47 PM
    • test84

    OK, I managed to get my page from the web; now I have to extract my data from it. I discovered that my data sits inside a span whose id is random text that changes:

    <span id="some random text that changes"> THE DATA THAT I WANT IS HERE </span>

    Would you please help me build a proper regular expression to extract my data from between those tags?
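    A Python sketch of a pattern that skips over the changing id, assuming the id is the span's only attribute and is double-quoted. Note that it matches every span that has an id, not just the one you want.

```python
import re

# Matches <span id="...">...</span> whatever the id value is: [^"]* skips the random id.
# Assumes id is the only attribute and uses double quotes.
SPAN_PATTERN = re.compile(r'<span\s+id="[^"]*"\s*>(.*?)</span>', re.IGNORECASE | re.DOTALL)

def span_contents(html: str) -> list[str]:
    """Return the trimmed text inside every <span id="..."> element."""
    return [text.strip() for text in SPAN_PATTERN.findall(html)]

html = '<span id="x9f2k"> THE DATA THAT I WANT IS HERE </span>'
print(span_contents(html))  # ['THE DATA THAT I WANT IS HERE']
```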
  • Re: how to parse a web page?

    09-20-2008, 12:40 PM

    Not easily. You need to see whether the <span> appears in some other, easily identifiable context. Otherwise, all you will get is the content of every span on the page.
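    One way to use such a context, sketched in Python: anchor the pattern on nearby text that does not change. The label "Closing price:" is purely hypothetical; it stands in for whatever stable text or markup precedes the span on the real page.

```python
import re
from typing import Optional

# Hypothetical context: assumes the wanted span always follows the text "Closing price:".
ANCHORED_PATTERN = re.compile(
    r'Closing price:\s*<span\s+id="[^"]*"\s*>(.*?)</span>',
    re.IGNORECASE | re.DOTALL,
)

def closing_price(html: str) -> Optional[str]:
    """Return the text of the span right after the anchor label, or None if absent."""
    match = ANCHORED_PATTERN.search(html)
    return match.group(1).strip() if match else None

page = (
    '<span id="a1">Menu</span>'
    'Closing price: <span id="k3j9">123.45</span>'
    '<span id="z7">Footer</span>'
)
print(closing_price(page))  # 123.45
```

    Without the anchor, a pattern matching any span with an id would return all three spans here; the fixed label narrows it to the one you want.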

     

    Regards Mike
  • Re: how to parse a web page?

    09-20-2008, 2:09 PM
    • samrid