Last post Dec 11, 2008 10:31 PM by Nai-Dong Jin - MSFT
Dec 08, 2008 04:54 PM|Waldo_Dobbs
Hi, I'm currently building a CMS that uses C#, ASP.NET (version 2) and SQL Server 2005 at the back end. The CMS must support many different languages, and to this end we've designed the system using Unicode throughout (NVARCHARs in the db etc.).
This all seems to work fine. We are also using URL rewriting. On our development machines (Windows XP with IIS 5) the URLs are delivered to the code correctly; however, when we roll out the site to a Win2003 server with IIS 6, any URLs that contain Cyrillic characters (and presumably other non-European character sets) can't be read properly. This seems to be outside the control of the application and is presumably an IIS issue. When we get the raw URL using Request.RawUrl we can see that these characters have been replaced by question marks. I can find a number of people reporting similar problems via Google, but no one seems to have a firm idea of how to fix the problem.
As mentioned, it works on WinXP with IIS 5 but not on Win2003 with IIS 6.
An English page with a valid URL (the page loads correctly)
An English page with an invalid URL (the page is not found but the RawUrl is correct)
A Russian page with a valid URL (the Cyrillic characters are replaced in the RawUrl with question marks)
Because the Cyrillic characters in the URL are replaced with question marks, the underlying code cannot determine what content to load from the db, hence the page-not-found error.
Any recommendations would be gratefully received. Unsurprisingly, this is quite urgent!
Dec 08, 2008 06:16 PM|MetalAsp.Net
I'm wondering if the UrlEncode method would help in this case. It's worth a try...
Dec 08, 2008 07:21 PM|Waldo_Dobbs
Thanks for the response, although I've tried URL encoding and get the same result.
All the CMS pages are served by a page called node.aspx, which takes a parameter telling it what page to load. The URL rewriting mechanism should take the supplied URL, strip the path out of it and then call the node.aspx page, setting the path parameter.
This works perfectly for English, Spanish, French and German (all Latin-based alphabets) but not for Cyrillic, and presumably not for Arabic, Chinese etc., although I've not tried any of those. Oddly enough, if I call node.aspx and set the path parameter manually then the pages load correctly, so I know that the underlying .NET and database code works. Either of the following examples will work;
The code I'm using to do the URL interception/rewriting is based off this method;
So I'm trying to understand why IIS would have replaced these characters with question marks before the .NET code can begin to manipulate the URL in the Application_BeginRequest event within global.asax. There must be other .NET-based multilingual systems that use non-European characters in their URLs?
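For readers following the thread, the rewriting step described above (strip the path from the incoming URL, then hand it to node.aspx as the path parameter) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code actually used on the site: the names `NodeRewriter` and `BuildRewriteTarget` are hypothetical, and only `node.aspx` and its `path` parameter come from the post.

```csharp
using System;

public static class NodeRewriter
{
    // Hypothetical helper: maps an incoming request path such as
    // "/products/widgets" to the internal URL "~/node.aspx?path=...".
    public static string BuildRewriteTarget(string requestPath)
    {
        if (requestPath == null)
        {
            throw new ArgumentNullException("requestPath");
        }

        // Strip leading slashes so only the content path remains.
        string contentPath = requestPath.TrimStart('/');

        // Percent-encode the path as UTF-8 so that non-Latin characters
        // travel as plain ASCII in the rewritten query string.
        return "~/node.aspx?path=" + Uri.EscapeDataString(contentPath);
    }
}
```

Inside Application_BeginRequest the result would presumably be passed to something like `HttpContext.Current.RewritePath(...)`. Note that a helper like this can only work with whatever path the request object exposes; if the characters have already been replaced by question marks before the code runs, encoding at this point will not recover them.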
Dec 11, 2008 10:31 PM|Nai-Dong Jin - MSFT
Just to clarify: when you use the UrlEncode method to encode the query strings in your URL, do you also use UrlDecode to decode the string before passing it to your core processing module?
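To illustrate the encode/decode round trip being asked about, here is a minimal sketch. It uses `Uri.EscapeDataString` and `Uri.UnescapeDataString` from the base class library as stand-ins for `HttpUtility.UrlEncode` and `HttpUtility.UrlDecode`, so that it compiles outside a web project; the class name `UrlRoundTrip` and the sample word are assumptions made for the example.

```csharp
using System;

public static class UrlRoundTrip
{
    // Encode a value as percent-escaped UTF-8; the result is ASCII only.
    public static string Encode(string value)
    {
        return Uri.EscapeDataString(value);
    }

    // Reverse the encoding before the value reaches the lookup code.
    public static string Decode(string encoded)
    {
        return Uri.UnescapeDataString(encoded);
    }
}
```

For example, `UrlRoundTrip.Encode("привет")` yields `%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82`, and passing that string to `UrlRoundTrip.Decode` returns the original Cyrillic text. If the decode step is skipped, the database lookup would be comparing the percent-escaped form against the stored Unicode path.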