Last post Apr 16, 2008 04:48 PM by CodingTheWheel
Apr 09, 2008 09:24 PM|kenpachi|LINK
It seems Google and other bots ignore my URL rewriting and go straight to the page it is rewritten to, and this is causing a lot of problems.
For example, the following category URL works for people using browsers.
Now, when Google and other bots try to access that URL, they don't go through the one above; they go through this one instead
This causes an error in the site code and returns a 500 server error.
My question is: why don't bots and Google go through the URLs as they are presented on my site? In no way, shape, or form do I use the second URL explicitly on any of my site pages... In my sitemap I use
and I use that everywhere else as well. This is just one example; it happens on all my categories :( I'm on an IIS 7.0 server using an open source URL rewriter module found here
Any help is appreciated, thanks!
Apr 10, 2008 02:14 AM|CodingTheWheel|LINK
Oh, there are a few ways for those links to get into the Google index... was your site ever live prior to implementing URL rewriting, even for a little bit? If so, Google may have spidered those links while they were sitting out in the open.
The question is not so much "why is Google spidering my permalinks?" as "is Google spidering my friendly URLs?" As long as your permalinked pages aren't showing up in the index, you're fine. If they are, you can remove them via Google's content removal tool, or the usual way with noindex, etc.
The other thing, and this is important especially if you do have those invalid links floating around somewhere at Google: you want to return a hard 404 for all external requests for your permalinks. If someone requests
http://www.bestmatchcomputers.com/productlisting.aspx?Category=CD-RW you should be returning a hard 404, not a 500. Only after an official rewrite has occurred is that a valid URL.
Rather than fiddle with URLRewriter.NET to get rid of the 500 error (I assume it's not caused by your code), you could add a lightweight HTTP module to do a sanity check on incoming URLs and hard-404 any of them that look like permalinks. Another option
is to 301 redirect, but really, you want those links extinct.
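A minimal sketch of such a module (the class name and the "productlisting.aspx" pattern are illustrative, not from the posts above; adjust the check to whatever your own internal page names look like):

```csharp
using System;
using System.Web;

// Sketch of a permalink guard: hard-404 any *external* request that
// targets an internal page directly. The pattern below is an example.
public class PermalinkGuardModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.BeginRequest += OnBeginRequest;
    }

    private static void OnBeginRequest(object sender, EventArgs e)
    {
        HttpApplication app = (HttpApplication)sender;

        // RawUrl is always the URL the client actually sent, so this
        // fires for external permalink requests only.
        string rawUrl = app.Request.RawUrl;

        if (rawUrl.IndexOf("productlisting.aspx",
                           StringComparison.OrdinalIgnoreCase) >= 0)
        {
            app.Response.StatusCode = 404;
            app.Response.StatusDescription = "Not Found";
            app.CompleteRequest();  // skip the rest of the pipeline
        }
    }

    public void Dispose() { }
}
```

Because Request.RawUrl is untouched by RewritePath (which only changes Request.Path), the check doesn't interfere with internally rewritten requests. The module would be registered in web.config under <system.web><httpModules>.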
Make sense? Hope this helps. Let us know if you figure out it was some specific thing.
Apr 10, 2008 02:31 AM|kenpachi|LINK
I've actually posted this on a few sites now, so I'll cut and paste my findings. The text below describes the problem. I found a quick fix, though I don't understand how it works at the moment.
In web.config, add this and the problem disappears, at least when I browse my site using a Googlebot user agent:
<forms cookieless="UseCookies" />
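For context, that element lives inside <authentication> under <system.web> in web.config. A sketch of the surrounding structure (assuming forms authentication; if you also use sessions, <sessionState> takes the same attribute):

```xml
<configuration>
  <system.web>
    <!-- Force cookie-based handling so unrecognized user agents like
         Googlebot are never routed down the cookieless code path,
         which munges the URL. -->
    <authentication mode="Forms">
      <forms cookieless="UseCookies" />
    </authentication>
    <sessionState cookieless="UseCookies" />
  </system.web>
</configuration>
```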
See below to another post I made on a different forum
"I'd like to share some debugging tools I found.
User agent switcher:
This tool rocks; it's a Firefox extension that allows you to switch
your user agent. I put in the Googlebot user agent string that was in
my logs and got an interesting error.
Here is the string: Mozilla/5.0+(compatible;+Googlebot/2.1;++http://
Why bots are having problems is kind of bizarre, but from what I read,
their request to an IIS server is treated differently from a normal
browser request, or at least some user agents are treated differently.
In addition, some ASP.NET 2.0 components generate ../-style
relative paths or "parent paths" for components, and this setting is
usually disabled by default on an IIS server. This doesn't seem to be
an issue for regular browsers, but it causes an error for bots
crawling the site.
Use the Googlebot agent string with the Firefox extension and click on
the link; it should produce the error, unless I have it fixed by then.
I'm still trying to wrap my head around all this weirdness, but I'll
keep you guys updated on my progress, and I plan on posting a
complete solution, since this is utter BS."
Apr 10, 2008 03:08 AM|kenpachi|LINK
Here's some more information about the bug that I found here and elsewhere.
OK... this is what I have found. All this is due to an ASP.NET 2.0 bug that is triggered when you use the following items together (URL rewriter, themes, master pages). The bug causes user agents that start with
"Mozilla/5.0" or similar to get a 500 response; Yahoo's bot also starts with
this. It seems the semicolons in the user agent set something off in ASP.NET and it screws things up. Below I posted some links to articles that explain the issue along with fixes. At the moment the fix seems to work; I'll check my log files for the next
few days to see how things go, then I'll add my URL rewriter block script and see how that goes.
This is one nasty bug that will prevent your site from being indexed by the major search engines. If you have an ASP.NET 2.0 site and are using URL rewriting, check your log files. If you constantly get 500 errors from user agents that start with "Mozilla",
then you're probably affected.
Get GoogleBot to crash your .NET 2.0 site (ASP.net 2.0 and url rewriting bug)
"It's a pretty severe bug that can potentially prevent you from getting indexed on google, and the best workaround thus far appears to be specifying this in your web.config:"
Current fix - add this to your web.config:
<forms cookieless="UseCookies" />
Master pages, themes and URL rewriting:
- Enable parent paths
- Move style sheet out of themes folder
- Add <forms cookieless="UseCookies" /> to web.config
Apr 10, 2008 01:25 PM|CodingTheWheel|LINK
Interesting - I had read that this issue only affected IIS 6.0 installations. Are you seeing it on 7.0?
I wasn't able to produce this error on the above URL, using the UA string you specified. Maybe it's fixed...?
Another good HTTP debugger is Fiddler:
Apr 10, 2008 02:07 PM|kenpachi|LINK
Yeah, I fixed it pretty fast after I posted. Thanks for the heads up on Fiddler.
Apr 10, 2008 11:31 PM|docluv|LINK
That is funny, because if you implement URL rewriting correctly the search engine would never know the difference. Are you sure you implemented rewriting correctly?
Apr 11, 2008 02:04 AM|kenpachi|LINK
Apr 11, 2008 03:03 AM|CodingTheWheel|LINK
That's the point I was trying to make - Googlebot will never see internal URLs if rewriting is implemented correctly. But URL rewriting is widely misunderstood, and in particular, I've yet to see any clear evidence that Googlebot can somehow
"get at" the internal rewritten URLs due to some flaw in ASP.NET 2.0 URL rewriting. ASP.NET allows 100% clean URL rewriting with zero messiness of any kind. Even assuming RewritePath were buggy (has that been proven?), RewritePath is just a convenience. It's
quite possible to implement URL rewriting without using RewritePath and without using URL mapping in web.config. ASP.NET gives you the ability to return any arbitrary response for any arbitrary request. End of story, so far as ASP.NET's URL rewriting "correctness" is concerned.
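To make the "RewritePath is just a convenience" point concrete, here is one hedged sketch of rewriting without it, using an IHttpHandlerFactory and PageParser.GetCompiledPageInstance; the friendly-URL-to-page mapping is invented for illustration:

```csharp
using System.Web;
using System.Web.UI;

// Sketch: serve a friendly URL by handing back the compiled internal
// page as the handler, with no call to HttpContext.RewritePath at all.
public class FriendlyUrlHandlerFactory : IHttpHandlerFactory
{
    public IHttpHandler GetHandler(HttpContext context, string requestType,
                                   string url, string pathTranslated)
    {
        // Hypothetical mapping: every friendly category URL is served
        // by the internal listing page. A real factory would parse
        // 'url' and pick the target page / stash parameters.
        string virtualPath = "~/productlisting.aspx";

        // GetCompiledPageInstance returns the page as an IHttpHandler;
        // the browser (or bot) only ever sees the friendly URL.
        return PageParser.GetCompiledPageInstance(
            virtualPath, context.Server.MapPath(virtualPath), context);
    }

    public void ReleaseHandler(IHttpHandler handler) { }
}
```

The factory would be mapped to the friendly path pattern via <httpHandlers>; since no query string is rewritten, any extracted parameters (e.g. the category) would typically be passed along in Context.Items.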
Apr 11, 2008 04:38 AM|kenpachi|LINK
"That's the point I was trying to make - Googlebot will never see internal URLs if rewriting is implemented correctly. But URL rewriting is widely misunderstood, and in particular, I've yet to see any clear evidence that Googlebot can somehow "get
at" the internal rewritten URLs due to some flaw in ASP.NET 2.0 URL rewriting."
This is true, they can't.
The reason I initially thought this was happening is that my server logs indicated it was. The logs kept showing bots entering the non-rewritten URL paths using parameters, so I truly thought this was happening. Trust me, this was no easy issue to figure
out; I spent the last couple of weeks on Google Groups and they were at a loss. My hosting company cut and pasted the URLs the bots were going through, trying to convince me I had a coding error, even though I didn't use the non-rewritten URLs anywhere on my
site, so obviously my code wasn't set up to handle those URLs. No one could figure out why the heck some user agents worked and others didn't; it wasn't until I came across some forums and posts that I found people who had the exact same problem I was having.
All the forums and posts that complained of this issue had a few things in common
A few years ago there was a bug with ASP.NET and RewritePath, so some URL rewriters may be using this method and triggering the bug. There is also a "bug", or really a shoddy feature, with themes that generates a really strange path for some files in the theme
folder. For example, in one of my files the path for my theme looked like this
The above path is no problem for browsers, but when a certain user agent comes along it crashes on this. Your server needs to have parent paths enabled to deal with this, or you need to use the web.config fix I posted in a previous post.
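For reference, "enable parent paths" on IIS 7 is a classic-ASP setting, configured in applicationHost.config (or in a <location> tag for a single site). Whether it is actually what matters for ASP.NET theme paths is an open question per the discussion above; this is just where the switch lives:

```xml
<!-- applicationHost.config: allow ../-style paths in classic ASP.
     Hedge: it is unclear whether this affects ASP.NET theme paths. -->
<system.webServer>
  <asp enableParentPaths="true" />
</system.webServer>
```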
To be honest, I still don't fully know if my problem was a combination of the URL rewriter interfering with the theme paths, or certain user agent strings containing semicolons that messed something up, or perhaps some other bizarre
combination of those components. From what I have read, for other people with similar issues each of those components can trigger some weird bugs by itself or in combination with the others, especially themes. This was hell to debug, but I
learned a lot of things, so it wasn't all bad.
Here is some advice to others
1. Do not use themes; the feature seems to be broken. I remember coming across some other annoying things about them a while back; hopefully they will be improved.
2. Use URL rewriting with caution, and think twice about using it in conjunction with themes.
Apr 11, 2008 06:42 AM|CodingTheWheel|LINK
Thanks for coming back and posting this info along with the previous post. I assume that URL rewriting is now working 100% correctly (so far as you know) on your ASP.NET website? :-) Right? So all's well that ends well, but I'm going to file this one away
in my toolbox just in case I do come across it. Good luck with your site.
Apr 16, 2008 10:18 AM|viva-emptiness|LINK
I don't know why Microsoft hasn't addressed this issue yet. This is a MAJOR flaw in their URL rewriting. Anyone who's using ASP.NET and URL rewriting (3rd party as well; I'm using urlrewriter.net) is affected.
I started experiencing 500 errors when I first saw them in Google's Webmaster Tools. Anyway, the UseCookies trick will do it for now; I don't know how nasty it is, but I'll definitely check...
Any updates regarding this issue?
Apr 16, 2008 04:48 PM|CodingTheWheel|LINK
Actually the flaw isn't in Microsoft / ASP.NET at all - well, it doesn't appear to be - and I've been hammering on this for a few days now. Does that mean RewritePath isn't quirky? No. But all evidence points to what has always been the case: ASP.NET
can perform 100% clean URL rewriting with zero 500 errors of any kind. I can go into details on that if you'd like, but for now, these 500 errors can arise for any number of reasons (including a bug in the 3rd party rewriting tool, etc.). May I ask
specifically what kind of URL rewriting you're doing and what problems you're having?