Marked as answer by manight on Jan 29, 2013 07:20 PM
manight
Member
59 Points
60 Posts
App_Data folder prevents search engine indexing?
Jan 29, 2013 05:31 PM
I base all my new CMSs on keeping static uploaded content (images, videos, PDFs and such) inside the App_Data folder, so that it must always be accessed through the web site and never directly.
I suddenly realized that this may prevent search engines from indexing images and other content. Am I right?
If so, I will need a public data folder (which can be browsed and indexed) and keep private content in the App_Data folder. In that case, is there a way to prevent direct access to static content (image hotlinking most of all) inside the public data folder without running the authorization modules on each request? (I have already excluded static files from firing HTTP modules, and I have also implemented custom authorization modules.)
Thanks in advance
bbcompent1
All-Star
33873 Points
8776 Posts
Moderator
Re: App_Data folder prevents search engine indexing?
Jan 29, 2013 05:59 PM
You can build a robots.txt file that tells crawlers to ignore certain directories. They will then index only the folders and files you want them to. See this site for a nice how-to: http://www.robotstxt.org/robotstxt.html
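As a rough illustration of how such rules are read, here is a minimal sketch using Python's standard `urllib.robotparser`. The folder names `/private/` and `/public/` and the helper `can_crawl` are assumptions made up for this example, not paths from the thread. Note that robots.txt only governs URLs the server would otherwise serve; it does not make a blocked folder such as App_Data reachable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the folder names are assumptions for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""


def can_crawl(path: str, agent: str = "*") -> bool:
    """Return True if a well-behaved crawler may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse the rules from memory, no network
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/public/photo.jpg", "/private/report.pdf", "/index.aspx"):
        print(path, can_crawl(path))
```

Crawlers follow these rules voluntarily, so this controls indexing by cooperative spiders but does nothing against hotlinking.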
manight
Member
59 Points
60 Posts
Re: App_Data folder prevents search engine indexing?
Jan 29, 2013 06:46 PM
Thank you bbcompent1, but maybe my question was not clear. I DO want spiders to crawl images inside my App_Data folder; the question is ... will they, given that content in the App_Data folder can be accessed only by the web application? This is my first doubt.
The second question is how to prevent hotlinking (not just crawling) of public static files (images, for example), so that someone cannot use them on their own site, e.g. http://www.theirsite.com
bbcompent1
All-Star
33873 Points
8776 Posts
Moderator
Re: App_Data folder prevents search engine indexing?
Jan 29, 2013 06:52 PM
Theoretically, because this folder is used only by the server, I sincerely doubt anything in it would be indexed, any more than anything in App_Code would be. The web server denies access to these folders to everything except the web application, which runs locally.
manight
Member
59 Points
60 Posts
Re: App_Data folder prevents search engine indexing?
Jan 29, 2013 07:21 PM
I guess so. Does anyone have hands-on experience with this?
Does anyone have hints on how to prevent hotlinking as well?
PrashanthRed...
Member
559 Points
96 Posts
Re: App_Data folder prevents search engine indexing?
Jan 29, 2013 09:06 PM
Hi manight,
I think it is not possible without a module. You must write a module that verifies the file extension and attach it to the ASP.NET pipeline.
Alternatively, you can use base64-encoded data URI images.
Thanks,
Prashanth Reddy
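For readers unfamiliar with data URIs, the following is a minimal sketch of how one is built. The helper name `to_data_uri` and the sample bytes are assumptions for illustration; in ASP.NET the same string would be produced server-side and written into the `src` attribute of an `img` tag.

```python
import base64


def to_data_uri(data: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as a data URI that can be placed in an img src attribute."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    # Only the 6-byte GIF signature is used here, not a complete image.
    print(to_data_uri(b"GIF89a", "image/gif"))  # data:image/gif;base64,R0lGODlh
```

Because the image bytes travel inside the page itself, there is no separate image URL to hotlink. The trade-off is that base64 adds roughly a third to the payload size and the image is not cached separately from the page.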
manight
Member
59 Points
60 Posts
Re: App_Data folder prevents search engine indexing?
Jan 29, 2013 10:11 PM
Thank you Prashanth for your reply. Looking around in more IIS oriented resources, I found a nice way to prevent hotlinking with url rewrite. This should be less resource intensive than running the modules I guess:
<system.webServer>
  <rewrite>
    <rules>
      <rule name="Prevent image hotlinking" enabled="true" stopProcessing="true">
        <match url=".*\.(gif|jpg|png)$" />
        <conditions>
          <add input="{HTTP_REFERER}" negate="true" pattern="^$" />
          <add input="{HTTP_REFERER}" negate="true" pattern="https?://(\w*\.)*yourdomain\.com/.*" />
        </conditions>
        <action type="Rewrite" url="/images/hotlinking.jpg" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
Original article is here:
https://help.maximumasp.com/KB/a738/using-url-rewrite-to-prevent-image-hotlinking.aspx
I wonder whether I can safely change the second pattern to "*YourDomain.com/.*" to allow any subdomain and https as well, or whether that added flexibility opens an exploit that would let someone hotlink the images.
EDIT: changed (and tested) the pattern to: https?://(\w*\.)*yourdomain\.com/.* which allows http(s) and optional (multiple) subdomains
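On the exploit question, a quick way to probe the pattern is to run it against a few referers outside IIS. The sketch below uses Python's `re` module and assumes that URL Rewrite evaluates the pattern as an unanchored, case-insensitive search (believed to be its default, but worth verifying on your server). `POSTED` is the pattern from the EDIT above; `ANCHORED` is a hypothetical stricter variant, not something from the original article; `is_hotlink` is an illustrative helper mirroring the rule's two negated conditions.

```python
import re

# Pattern copied from the rule above; "yourdomain.com" is the thread's placeholder.
POSTED = re.compile(r"https?://(\w*\.)*yourdomain\.com/.*", re.IGNORECASE)

# Hypothetical stricter variant (an assumption, not from the original article):
# anchored at the start, and allowing hyphens inside subdomain labels.
ANCHORED = re.compile(r"^https?://([\w-]+\.)*yourdomain\.com/", re.IGNORECASE)


def is_hotlink(referer: str, allowed: re.Pattern) -> bool:
    """Mirror the rule: a non-empty referer that does not match `allowed` is a hotlink."""
    return referer != "" and allowed.search(referer) is None


if __name__ == "__main__":
    referers = [
        "",                                                # direct request, allowed by the rule
        "https://www.yourdomain.com/gallery",              # own site
        "http://www.theirsite.com/page",                   # plain hotlink
        "http://www.theirsite.com/?x=http://yourdomain.com/",  # own domain hidden in query string
        "https://my-cdn.yourdomain.com/page",              # own subdomain containing a hyphen
    ]
    for referer in referers:
        print(repr(referer), is_hotlink(referer, POSTED), is_hotlink(referer, ANCHORED))
```

Under these assumptions the posted pattern treats the fourth referer as legitimate, because the unanchored search finds the allowed domain inside the query string, and it treats the hyphenated subdomain as a hotlink, because `\w` does not match `-`. Anchoring with `^` and using `[\w-]+` addresses both cases. Either way, the Referer header can be forged by non-browser clients, so this only deters ordinary hotlinking from other sites' pages.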