Member
5 Points
77 Posts
Best way to parse through HTTP response
Feb 13, 2012 10:48 AM|Brian_Burgit
I am capturing the response from an HTTP POST and getting back a large block of HTML. That HTML contains a listing of many products with associated info such as quantity, price, and vendor. I need to pull this data out and ultimately get it into some kind of data structure. Is there a more efficient approach than treating the HTML as one long string, scanning from the beginning for certain values, and pulling out the data as substrings?
Thanks,
Brian
Member
517 Points
425 Posts
Re: Best way to parse through HTTP response
Feb 13, 2012 11:02 AM|seamus1982
It sounds like you are screen scraping. You could look for repeated structures in the HTML and then parse just those sections. I must admit I use screenscraper. There is a free version, which could be handy for you if you are not using this at an enterprise level.
Regards,
Seamus
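To make the "repeated structures" idea concrete, here is a minimal sketch in Python using only the standard library's `html.parser`. It assumes, purely for illustration, that the product listing is a plain `<table>` whose first row holds the column headers; the sample markup, the class name `ProductTableParser`, and the function `parse_products` are all hypothetical, and the real page may be laid out quite differently.

```python
from html.parser import HTMLParser


class ProductTableParser(HTMLParser):
    """Collect the cell text of every <tr> as one list of strings per row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def _close_cell(self):
        # Store the current cell (if any) in the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> before the next cell
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()
            self.rows.append(self._row)
            self._row = None


def parse_products(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = ProductTableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *data = parser.rows
    return [dict(zip(header, row)) for row in data]


# Hypothetical sample standing in for the HTTP response body.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Quantity</th><th>Price</th><th>Vendor</th></tr>
  <tr><td>Widget</td><td>12</td><td>4.50</td><td>Acme</td></tr>
  <tr><td>Gadget</td><td>3</td><td>19.99</td><td>Globex</td></tr>
</table>
"""

products = parse_products(SAMPLE_HTML)
```

Each product ends up as a dictionary of strings, so converting quantity and price to numbers is left to the caller. The point is only that the parser reacts to tags as they arrive, so no manual substring bookkeeping is needed.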
Participant
1120 Points
282 Posts
Re: Best way to parse through HTTP response
Feb 13, 2012 11:04 AM|simon.hatchard
You could take a look at the HTML Agility Pack:
http://htmlagilitypack.codeplex.com/
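The HTML Agility Pack is a .NET library, so the following is not its API. It is only a rough Python sketch of the general approach a DOM-style parser enables: load the markup into a tree, select the repeated product nodes with a path query, and map each one to a typed record. It uses the standard library's `xml.etree.ElementTree`, which assumes well-formed markup (real-world HTML often is not, which is where a tolerant HTML parser earns its keep). The `div`/`span` layout, the class names, and the `Product` type are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    quantity: int
    price: float
    vendor: str


# Hypothetical, well-formed sample standing in for the HTTP response body.
SAMPLE_XHTML = """
<div id="listing">
  <div class="product">
    <span class="name">Widget</span>
    <span class="quantity">12</span>
    <span class="price">4.50</span>
    <span class="vendor">Acme</span>
  </div>
  <div class="product">
    <span class="name">Gadget</span>
    <span class="quantity">3</span>
    <span class="price">19.99</span>
    <span class="vendor">Globex</span>
  </div>
</div>
"""


def field(node, css_class):
    """Return the stripped text of the first matching <span>, or ''."""
    found = node.find(f".//span[@class='{css_class}']")
    if found is None:
        return ""
    return (found.text or "").strip()


def extract_products(markup):
    """Build one Product per <div class="product"> found in the tree."""
    root = ET.fromstring(markup.strip())
    return [
        Product(
            name=field(node, "name"),
            quantity=int(field(node, "quantity")),
            price=float(field(node, "price")),
            vendor=field(node, "vendor"),
        )
        for node in root.findall(".//div[@class='product']")
    ]


products = extract_products(SAMPLE_XHTML)
```

In this sketch a missing quantity or price raises a `ValueError` during conversion. Whether to fail loudly or skip such a row depends on how reliable the source page is.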
Member
5 Points
77 Posts
Re: Best way to parse through HTTP response
Feb 13, 2012 11:13 AM|Brian_Burgit
Yes Seamus, scraping I am. I do see repeated HTML structures, which will perhaps make it a little easier. Thanks.
Member
5 Points
77 Posts
Re: Best way to parse through HTTP response
Feb 13, 2012 11:15 AM|Brian_Burgit
Thanks Simon, I will download it and see how it goes. This sounds like what I need. Thanks for the tip.