Last post Apr 02, 2008 12:08 PM by tasmisr
Apr 02, 2008 12:08 PM|tasmisr
I'm running into a roadblock here. I'm trying to fetch a web page and get its HTML contents. The problem is that, to get the content in the right format, I need to use the correct encoding for it. So I have 2 options here, and both create problems:
1- Read the stream as UTF8: this will be a problem if the website uses a non-UTF8 encoding, and will cause exceptions.
2- Read the stream using the encoding set by the website, which is sometimes set to the wrong value and causes the content to show "boxes" for non-Unicode characters. After this, I have no way to check for non-Unicode characters, since they are automatically hidden in "boxes" in VB.NET.
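For option 1, whether invalid bytes raise an exception or are silently replaced depends on how the decoder is constructed. The sketch below is only an illustration of that check on a byte array that is already in memory; the module name and the `TryDecodeUtf8` helper are hypothetical, not something from my code.

```vbnet
Imports System
Imports System.Text

Module StrictDecodeSketch
    ' Hypothetical helper: returns True (and the decoded text) only if
    ' the bytes are well-formed UTF-8.
    Function TryDecodeUtf8(ByVal data As Byte(), ByRef text As String) As Boolean
        ' Second constructor argument True = throw on invalid bytes
        ' instead of silently substituting replacement characters.
        Dim strictUtf8 As New UTF8Encoding(False, True)
        Try
            text = strictUtf8.GetString(data)
            Return True
        Catch ex As DecoderFallbackException
            text = Nothing
            Return False
        End Try
    End Function

    Sub Main()
        Dim text As String = Nothing

        ' "caf" followed by 0xE9, which is "é" in ISO-8859-1 but an
        ' incomplete multi-byte sequence in UTF-8.
        Dim latin1Bytes As Byte() = {&H63, &H61, &H66, &HE9}
        Console.WriteLine(TryDecodeUtf8(latin1Bytes, text)) ' prints False

        ' The same word encoded as real UTF-8.
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes("caf" & ChrW(&HE9))
        Console.WriteLine(TryDecodeUtf8(utf8Bytes, text))   ' prints True
    End Sub
End Module
```

A strict decoder like this turns "is it really UTF-8?" into a yes/no test, which is not possible once the bad bytes have already been replaced in the decoded string.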
To choose the right method, I was trying to get the stream from the web page using "webresponse.GetResponseStream()", then clone the stream and run the above two tests on it, but I cannot even clone it if I do not know the encoding... so I'm going in circles here! And I do not want to fetch the same website twice (once for each test); this would be silly and wasteful.
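One possible way out of the circle, as a sketch only: a stream can be copied as raw bytes without knowing any encoding, and the encoding only matters when those bytes are turned into a string. The code below downloads the page once into a byte array and then tries the decoders in turn. The names `FetchOnceSketch`, `ReadAllBytes`, `DecodeHtml` and `FetchHtml` are made up for illustration, and the order of attempts (strict UTF-8, then the declared charset, then ISO-8859-1) is an assumption rather than a rule.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module FetchOnceSketch
    ' Copies a stream into memory as raw bytes. No encoding is involved.
    Function ReadAllBytes(ByVal source As Stream) As Byte()
        Using copy As New MemoryStream()
            Dim chunk(8191) As Byte ' 8192-byte read buffer
            Dim count As Integer = source.Read(chunk, 0, chunk.Length)
            While count > 0
                copy.Write(chunk, 0, count)
                count = source.Read(chunk, 0, chunk.Length)
            End While
            Return copy.ToArray()
        End Using
    End Function

    ' Decodes the same bytes as many times as needed, with no second download.
    Function DecodeHtml(ByVal data As Byte(), ByVal declaredCharset As String) As String
        ' Test 1: strict UTF-8, which throws on invalid byte sequences.
        Try
            Return New UTF8Encoding(False, True).GetString(data)
        Catch ex As DecoderFallbackException
            ' Not valid UTF-8; fall through to the declared charset.
        End Try

        ' Test 2: the charset the server declared, if it is a known name.
        Try
            If Not String.IsNullOrEmpty(declaredCharset) Then
                Return Encoding.GetEncoding(declaredCharset).GetString(data)
            End If
        Catch ex As ArgumentException
            ' Unknown charset name; fall through to the last resort.
        End Try

        ' Last resort (assumption): ISO-8859-1 maps every byte to a character.
        Return Encoding.GetEncoding("iso-8859-1").GetString(data)
    End Function

    Function FetchHtml(ByVal url As String) As String
        Dim request As WebRequest = WebRequest.Create(url)
        Using response As WebResponse = request.GetResponse()
            Dim charset As String = Nothing
            Dim http As HttpWebResponse = TryCast(response, HttpWebResponse)
            If http IsNot Nothing Then charset = http.CharacterSet
            Using body As Stream = response.GetResponseStream()
                Return DecodeHtml(ReadAllBytes(body), charset)
            End Using
        End Using
    End Function
End Module
```

With something like this, `Dim returnHtml As String = FetchHtml(url)` would take the place of reading through a StreamReader, and both tests run on the same in-memory copy of the page.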
Let me know if I'm missing anything.
I did not attach source code because it is very standard:
Dim returnHtml as String = Reader.ReadToEnd()