Last post Nov 13, 2013 01:51 PM by DavidLee
Oct 17, 2013 06:47 PM|DavidLee|LINK
We have several sites that Bingbot is hammering even though we adjusted robots.txt, but that's not what I'm asking about here. It did show us something interesting, though.
We have a page like: www.OurSite.com/something.aspx
Bingbot is trying to go to:
When I try to go there myself, I successfully arrive at www.OurSite.com/something.aspx with no error page. I tried the same thing on CNN's pages, Microsoft's pages, and other sites, and it works there as well.
Looking at my web logs, Bingbot gets a 200 (i.e. successful) response when it tries to go there, because the base page still loads. I'd like to throw a 404 or 403 to hopefully make it stop.
I would appreciate any ideas from the experts.
Oct 18, 2013 05:06 AM|Illeris|LINK
I learned something new. Didn't know IIS was that flexible with URLs :-)
If this can be solved, I think it will be through IIS configuration rather than your application architecture.
What do you want to achieve in the end? If you want to block some spiders and the normal methods do not work, you could try blocking at the network level. Block the IPs the bots are using and you'll get rid of them very quickly (most give up after receiving the message that the site does not exist a number of times).
Furthermore, check the official Bing FAQ: http://www.bing.com/webmaster/help/how-can-i-remove-a-url-or-page-from-the-bing-index-37c07477
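If the goal is specifically a 404 for these requests, one IIS-level option would be a rewrite rule that rejects any extra path segments appended after a .aspx page. This is only a sketch: it assumes the IIS URL Rewrite module is installed, and the rule name and pattern here are my own, not anything from your site.

```xml
<system.webServer>
  <rewrite>
    <rules>
      <!-- Hypothetical rule: return 404 for any extra path segments
           appended after a .aspx page, e.g. /something.aspx/foo/bar -->
      <rule name="BlockAspxPathInfo" stopProcessing="true">
        <match url="\.aspx/.+" />
        <action type="CustomResponse" statusCode="404"
                statusReason="Not Found"
                statusDescription="The requested URL does not exist." />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

The pattern matches any URL where something follows ".aspx/", so plain /something.aspx requests are unaffected.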
Oct 18, 2013 10:58 AM|DavidLee|LINK
Thanks for the reply!
The main reason I was curious about the application architecture was that, since it's a nopCommerce installation with URL rewriting turned on, I wondered if the app was somehow doing this.
Ideally I would like a bot (or human) who comes looking for a page that doesn't exist to get an error page (which is already defined).
Somehow these visitors can append arbitrary folder names and still get the page.
I am aware that I can just block the IPs (as I do with many attacks from Russia and China), but of course blocking a search engine isn't ideal. Like most sites, we get the lion's share of traffic from Google, but even though Bing sends only a fraction of that, it's still traffic : )
In Google Webmaster Tools (and Bing Toolbox) we can block individual URLs, of course, but I'm curious about the root of the issue.
Oct 21, 2013 08:25 AM|Illeris|LINK
I think it's down to the default handler catching the request in IIS. I doubt it behaved like this a few years ago. It does help when a user mistypes a URL, so it has an advantage.
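The handler behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not IIS's actual code: `resolve` and `known_pages` are made-up names. The handler walks the requested path left to right, serves the first prefix that names a real page, and treats everything after it as extra path info instead of returning a 404.

```python
def resolve(url_path, known_pages):
    """Sketch of ASP.NET-style PATH_INFO matching: the first path
    prefix that names a known page is served, and the remainder of
    the URL becomes path info rather than causing a 404."""
    segments = url_path.strip("/").split("/")
    for i in range(1, len(segments) + 1):
        candidate = "/" + "/".join(segments[:i])
        if candidate in known_pages:
            path_info = "/".join(segments[i:])
            return candidate, ("/" + path_info if path_info else "")
    return None, ""  # no matching page -> a genuine 404

pages = {"/something.aspx"}
print(resolve("/something.aspx/products/imagers", pages))
# → ('/something.aspx', '/products/imagers')
```

So any junk appended after the real page name still resolves to the real page, which is why the bot keeps getting a 200.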
Nov 13, 2013 01:51 PM|DavidLee|LINK
Thanks for that,
I can certainly appreciate that a customer would be assisted when they mistype a URL.
Sadly, it leaves us open to attack from bots.
The thing that made me wonder if it was application-related was that, since those directories do not exist (and never have), I don't see how Bingbot could have gotten the idea to go to them.
hxxp://www.OurSite.com/our-product-name.aspx does exist
hxxp://www.OurSite.com/download. does exist
hxxp://www.OurSite.com/products/ does exist
hxxp://www.OurSite.com/imagers/ does exist
It's just somehow stringing them together. That's why I wondered about URL rewriting.
At this point I'd be happy to just return an error code (404) when something tries to go to a URL like: