tasmisr
Fetching webpage without knowing the encoding
Apr 02, 2008 04:08 PM
Hi Everyone,
I've hit a roadblock: I'm trying to fetch a webpage and get its HTML contents. The problem is that, to get the content in the right format, I need to read it with the correct encoding. I have two options here, and both create problems:
1- Read the stream as UTF-8: this is a problem if the website uses a non-UTF-8 encoding, and will cause exceptions.
2- Read the stream using the encoding declared by the website, which is sometimes set to the wrong value and causes the content to show "boxes" in place of non-Unicode characters. After that, I have no way to check for non-Unicode characters, since VB.NET automatically hides them in "boxes".
To choose the right method, I was trying to get the stream from the webpage using webresponse.GetResponseStream(), then clone the stream and run the two tests above on it, but I cannot even clone it if I do not know the encoding... so I'm going in circles here! And I do not want to fetch the same website twice (once for each test); that would be silly and wasteful.
Let me know if I'm missing anything.
I did not attach the full source code because it is very standard:
'------------code------------
' Create the HttpWebRequest object.
Dim req As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
Dim webresponse As HttpWebResponse = DirectCast(req.GetResponse(), HttpWebResponse)
' Get the stream associated with the response.
Dim responseStreamOriginal As Stream = webresponse.GetResponseStream()
' The second argument is the open question: which Encoding object goes here?
Dim Reader As StreamReader = New StreamReader(responseStreamOriginal, "I need to determine the right encoding object right here!!!!")
Dim returnHtml As String = Reader.ReadToEnd()
'We're all one hand'
-T
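
One possible way out of the circle described above, offered as a sketch rather than a definitive answer: read the response once into a byte array, then decode those bytes as many times as needed. The stream itself never has to be cloned, and the site is fetched only once. The sketch below tries a strict UTF-8 decode first and falls back to the charset the server declared. The names EncodingProbe, ReadAllBytes, DecodeHtml and FetchHtml are made up for illustration, and using ISO-8859-1 as the last resort is an assumption, not something from the post.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module EncodingProbe

    ' Copies the whole stream into memory so the bytes can be decoded more than once.
    Function ReadAllBytes(ByVal source As Stream) As Byte()
        Using copy As New MemoryStream()
            Dim chunk(8191) As Byte
            Dim count As Integer = source.Read(chunk, 0, chunk.Length)
            While count > 0
                copy.Write(chunk, 0, count)
                count = source.Read(chunk, 0, chunk.Length)
            End While
            Return copy.ToArray()
        End Using
    End Function

    ' Test 1: strict UTF-8. Test 2: the declared charset. Both run on the same bytes.
    Function DecodeHtml(ByVal raw As Byte(), ByVal declaredCharset As String) As String
        ' throwOnInvalidBytes:=True makes the decoder throw on malformed input
        ' instead of silently substituting replacement characters.
        Dim strictUtf8 As New UTF8Encoding(False, True)
        Try
            Return strictUtf8.GetString(raw)
        Catch ex As ArgumentException
            ' Not valid UTF-8 (DecoderFallbackException derives from ArgumentException).
        End Try

        ' Assumption: ISO-8859-1 as the last resort when the declared charset is unusable.
        Dim fallback As Encoding = Encoding.GetEncoding("iso-8859-1")
        If Not String.IsNullOrEmpty(declaredCharset) Then
            Try
                fallback = Encoding.GetEncoding(declaredCharset)
            Catch ex As ArgumentException
                ' Unknown or unsupported charset name: keep the last-resort encoding.
            End Try
        End If
        Return fallback.GetString(raw)
    End Function

    ' Fetches the page once and decodes the buffered bytes.
    Function FetchHtml(ByVal url As String) As String
        Dim req As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        Using resp As HttpWebResponse = DirectCast(req.GetResponse(), HttpWebResponse)
            Using body As Stream = resp.GetResponseStream()
                Dim raw As Byte() = ReadAllBytes(body)
                ' CharacterSet is the charset taken from the response headers.
                Return DecodeHtml(raw, resp.CharacterSet)
            End Using
        End Using
    End Function

    Sub Main()
        ' "caf" followed by a lone 0xE9 byte: valid ISO-8859-1, invalid UTF-8,
        ' so the strict UTF-8 attempt fails and the declared charset is used.
        Dim latin1Bytes As Byte() = {&H63, &H61, &H66, &HE9}
        Console.WriteLine(DecodeHtml(latin1Bytes, "iso-8859-1"))
    End Sub

End Module
```

Note that the default Encoding.UTF8 object does not throw on bad bytes; it substitutes a replacement character, which is why the sketch builds its own strict UTF8Encoding. The UTF-8 test is only a heuristic: pure ASCII content always passes it, and it says nothing about which non-UTF-8 encoding the page really uses when the declared one is wrong.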