Last post Sep 18, 2007 10:00 PM by kirilminev
Sep 18, 2007 06:42 PM|kirilminev|LINK
I am developing a small application that will visit certain web sites (set in the code), read their markup, and manipulate the content. Once I can access the markup, my objective is to retrieve
information from those sites by specifying certain tags. For example, if a site has a table with some information, I would like to pull out the content from <table><tr><td>something</td></tr></table>. Whatever is within these tags,
I need to be able to pull out and store in my own database for further manipulation.
I know there must be classes in either C# or VB with which I can manipulate the markup, but I guess that is my question: which ones, and how do I implement them?
If someone who has done something like this before is willing to share, it will be greatly appreciated.
Sep 18, 2007 09:28 PM|dwhite|LINK
This is typically referred to as "Screen Scraping"; have a search on Google...
Anyway, here are some links to get you started:
Once you get the HTML, you can parse it using standard XML objects (e.g. XPathDocument); however, this will only work for well-formed XHTML, because an XML parser rejects unclosed or mismatched tags. Failing that, you can try an HTML parser, such as:
Hope this helps.
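To make the XML route concrete, here is a minimal sketch. It is written in Python rather than C# purely for illustration, and it uses the standard library's `xml.etree.ElementTree` in place of XPathDocument. The sample markup and the function name `cells_from_xhtml` are made up for this example. It assumes the markup has no XML namespace; real XHTML pages usually declare one, in which case the tag name would need the namespace prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; in practice this would be the markup string
# returned by the web request.
SAMPLE_XHTML = "<table><tr><td>something</td></tr><tr><td>else</td></tr></table>"


def cells_from_xhtml(markup):
    """Return the text of every <td> element in well-formed markup.

    Raises xml.etree.ElementTree.ParseError if the markup is not
    well-formed, which is the limitation of the XML approach.
    """
    root = ET.fromstring(markup)
    # iter("td") walks the whole tree and yields every <td> element.
    return [(td.text or "").strip() for td in root.iter("td")]


if __name__ == "__main__":
    print(cells_from_xhtml(SAMPLE_XHTML))
    try:
        # The <td> is never closed, so an XML parser refuses this input.
        cells_from_xhtml("<table><tr><td>broken</tr></table>")
    except ET.ParseError as err:
        print("not well-formed:", err)
```

The second call shows why this only suits well-formed pages: one unclosed tag makes the whole parse fail.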
Sep 18, 2007 10:00 PM|kirilminev|LINK
Thank you, I will definitely look at the above-mentioned links. I have got as far as using the WebRequest and WebResponse objects, and I just return the whole markup as a string. Next comes the part where I need to organize the information.
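For that organizing step, a rough sketch of one approach: feed the markup string to a tolerant HTML parser and collect the table cells row by row. This is Python's standard `html.parser`, not a .NET class, and the names `TableCellParser` and `extract_rows` are invented for the example. It assumes tables are not nested and ignores attributes; a real page would likely need more care.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def _close_cell(self):
        if self._cell is not None:
            if self._row is None:
                self._row = []
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(markup):
    """Return a list of rows, each a list of cell texts."""
    parser = TableCellParser()
    parser.feed(markup)
    parser.close()
    parser._close_row()  # flush a row left open at end of input
    return parser.rows


if __name__ == "__main__":
    print(extract_rows("<table><tr><td>something</td></tr></table>"))
```

Each row comes back as a plain list of strings, which can then be written to a database table one row at a time.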