There are several reasons I would rather not use the HTML Agility Pack. The code has not been updated in a long time, and it is slow when traversing HTML that has few or no identifiers. If the HTML were structured so that I could easily identify the elements, it might make sense, but when there is no easy way to find the cells I need, it is easier with the RegEx.
The above RegEx runs very fast and is about 95% accurate. There are 2 - 3 extra items showing in the list which should not be there. I added some extra code to the original to check for title_popular and title_approx only, and found a small issue with what was showing in the Matches collection:
This is what I see when I look at myMatch.Value:
<a href="/title/tt1074193/" onclick="(new Image()).src='/rg/find-title-16/title_substring/images/b.gif?link=/title/tt1074193/';">Decoded: The Making of 'The Matrix Reloaded'</a> (2003) (TV) </td>
This is what View Source in IE8 shows:
<a href="/title/tt0410519/" onclick="(new Image()).src='/rg/find-title-16/title_approx/images/b.gif?link=/title/tt0410519/';">The Matrix Recalibrated</a> (2003) (TV) </td>
There are some weird discrepancies: 1) title_approx became title_substring, and 2) the apostrophes in the HTML came back escaped as a character entity instead of a literal '. Does anyone have any idea what in the RegEx could be causing these issues, or could it be a problem with the WebClient object?
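For reference, the extra check for title_popular and title_approx is roughly along these lines. This is a simplified sketch in C#, not my exact code: the pattern, the group names (id, section, title) and the hard-coded sample string are stand-ins (the real input comes from WebClient), and I am assuming the escaped apostrophe arrives as &#x27;. WebUtility.HtmlDecode lives in System.Net (.NET 4 or later).

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

class MatchFilterSketch
{
    static void Main()
    {
        // Stand-in for the downloaded page: the two anchors quoted above.
        // The &#x27; entities are an assumption about what the raw download contains.
        string html =
            "<a href=\"/title/tt1074193/\" onclick=\"(new Image()).src='/rg/find-title-16/title_substring/images/b.gif?link=/title/tt1074193/';\">Decoded: The Making of &#x27;The Matrix Reloaded&#x27;</a> (2003) (TV) </td>" +
            "<a href=\"/title/tt0410519/\" onclick=\"(new Image()).src='/rg/find-title-16/title_approx/images/b.gif?link=/title/tt0410519/';\">The Matrix Recalibrated</a> (2003) (TV) </td>";

        // Hypothetical pattern (not the RegEx from the earlier post): captures the
        // title id, the section name inside the onclick URL, and the link text.
        Regex pattern = new Regex(
            @"<a href=""/title/(?<id>tt\d+)/""[^>]*/(?<section>title_\w+)/images/[^>]*>(?<title>.*?)</a>",
            RegexOptions.IgnoreCase);

        foreach (Match myMatch in pattern.Matches(html))
        {
            string section = myMatch.Groups["section"].Value;

            // Keep only the two sections I care about.
            bool keep = section == "title_popular" || section == "title_approx";

            // Print the raw capture next to the decoded one, to see whether the
            // entity is already present in the text the RegEx was given.
            string rawTitle = myMatch.Groups["title"].Value;
            string decodedTitle = WebUtility.HtmlDecode(rawTitle);

            Console.WriteLine("{0} [{1}] keep={2}", myMatch.Groups["id"].Value, section, keep);
            Console.WriteLine("  raw:     {0}", rawTitle);
            Console.WriteLine("  decoded: {0}", decodedTitle);
        }
    }
}
```

If the raw capture already contains the entity, then the escaping is in the string that WebClient returned and was not introduced by the RegEx, since a match only returns a substring of its input.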