Last post Nov 18, 2014 08:40 AM by NoBullMan
Nov 12, 2014 03:13 PM|NoBullMan|LINK
I am trying to parse a string containing HTML tags. I have tried both XML parsing and the HTML Agility Pack, but without much luck.
The page looks something like this:
<html>\n<body>\n<center>\nSystem Status<br><br>\n<table>\n<tr>\n<td>\n</td>\n<td>\n</td>\n</tr>\n<tr>\n<td>\nNetwork Storage (Image Drive) </td>\n<td>\nOK</td>\n</tr>\n<tr>\n<td>\nNetwork</td>\n<td>\nOK</td>\n</tr>\n<tr>\n<td>\nCamera</td>\n<td>\nOK</td>\n</tr>\n</table>\n</center>\n</body>\n</html>\n
The string is a response to an HttpRequest.
It is not well-formed XML (the <br> tags are never closed), so XML parsing chokes. I need to be able to get the values "Network Storage (Image Drive)" and "OK" from the first data row, for example, and the same pair of values from each of the three data rows.
Nov 12, 2014 03:29 PM|gerrylowry|LINK
Since you seem to know the format, and since you seem to have a fixed number of table rows, a not-so-elegant solution is possible using String methods.
In pseudocode:
Find the third <td>, then find the </td> that follows it; Trim() the Substring between the <td> and the </td> tags.
Same idea for the remaining <td></td> pairs.
Yes, it's ugly, but it works; downside: if someone changes the structure of your table, this ugly solution is fragile and could be easily broken, depending upon the nature of a given change.
N.B.: if you have any control over the web page that you are receiving, then tweak the design of the web page to make it more friendly to parsing.
edit: you could use regex but it's just as fragile for your situation and imho far more complex; imho, String methods are the better choice for your particular example. end edit.
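A minimal sketch of the String-methods idea above, under a few assumptions: the tags are lowercase and carry no attributes (as in the sample page), and the "\n" sequences in the sample are real line breaks, so Trim() removes them. The class name TdScanner and the method name ExtractCells are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class TdScanner
{
    // Returns the trimmed text of every <td>...</td> pair, in document order.
    // Assumes lowercase tags without attributes, as in the sample page.
    public static List<string> ExtractCells(string html)
    {
        const string open = "<td>";
        const string close = "</td>";
        var cells = new List<string>();
        int pos = 0;
        while (true)
        {
            // Locate the next opening tag; stop when there are no more.
            int start = html.IndexOf(open, pos, StringComparison.Ordinal);
            if (start < 0) break;
            start += open.Length;

            // Locate the closing tag that follows it.
            int end = html.IndexOf(close, start, StringComparison.Ordinal);
            if (end < 0) break;

            cells.Add(html.Substring(start, end - start).Trim());
            pos = end + close.Length;
        }
        return cells;
    }
}
```

Calling TdScanner.ExtractCells(sHtmlPage) returns every cell, including the two empty ones from the first row, so the caller would skip those. It is just as fragile as described above: a <td> with an attribute or different casing would be missed.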
Nov 18, 2014 08:40 AM|NoBullMan|LINK
This is what I used to parse the string:
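HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
try { document.LoadHtml(sHtmlPage); }  // the Agility Pack tolerates unclosed tags such as <br>
catch (Exception e) { /* handle the load failure */ }
// "//table" finds a table at any depth; plain "table" only matches a direct child of the document node
var tableTags = document.DocumentNode.SelectNodes("//table");
XElement table = null;
// parse only the table markup, since the full page is not well-formed XML
try { table = XElement.Parse(tableTags[0].OuterHtml); }
catch (Exception e) { /* handle the parse failure */ }
List<string> lsValues = table.Descendants("td").Select(td => td.Value).ToList();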
The resulting list of strings contains all td values.
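Since the list is flat, the name and status of each row still have to be matched up. A possible sketch, assuming every row has exactly two cells as in the sample page; the class name StatusRows and the method name Pair are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class StatusRows
{
    // Groups a flat list of cell texts into (name, status) pairs, two cells per row,
    // and drops rows whose name cell is empty (such as the blank first row).
    public static List<KeyValuePair<string, string>> Pair(IList<string> cells)
    {
        var rows = new List<KeyValuePair<string, string>>();
        for (int i = 0; i + 1 < cells.Count; i += 2)
        {
            string name = cells[i].Trim();
            string status = cells[i + 1].Trim();
            if (name.Length > 0)
                rows.Add(new KeyValuePair<string, string>(name, status));
        }
        return rows;
    }
}
```

StatusRows.Pair(lsValues) would then give one pair per data row, with Key holding the item name and Value holding its status.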