Over the past two years, I have worked on a number of Community Server sites. The goal of these tips is to share useful information about Telligent's Community Server platform. There are a few ways you can protect yourself from rogue search bots that steal your content and bring down your site.
The easiest way is to set up a robots.txt file, which tells search engines to ignore certain files or directories. I found this guide very helpful for setting up a robots.txt file in the root directory of your website. This approach has a few disadvantages:
- It isn’t guaranteed to work. Some companies don’t care what safeguards you have set up. They are only concerned with indexing or stealing your content.
- People often list their private directories in the robots.txt file, which tells everyone exactly where to find the documents they thought were hidden.
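Before relying on a robots.txt file, it can help to check that the rules say what you think they say. Here is a minimal sketch using Python's standard `urllib.robotparser` module. The bot name `BadBot` and the `/search/` path are made up for illustration, so substitute your own rules. Keep in mind that this only shows what a well-behaved crawler would do; a rogue bot can ignore the file entirely.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: the bot name and paths are illustrative only.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /search/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# BadBot is denied everything; other crawlers are only kept out of /search/.
print(parser.can_fetch("BadBot", "/forums/"))
print(parser.can_fetch("Googlebot", "/forums/"))
print(parser.can_fetch("Googlebot", "/search/results"))
```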
If this doesn’t work, and as I learned a few months ago, sometimes it doesn’t, you need to take an extra step. Take a look at your exception log in Community Server, either by navigating to /controlpanel/Tools/Reports/ExceptionsReport.aspx or by running the following SQL query:
SELECT [Frequency], [IPAddress], [UserAgent], [HttpReferrer], [HttpVerb],
       [PathAndQuery], [Exception], [ExceptionMessage], [DateLastOccurred]
FROM [dbo].[cs_Exceptions]
ORDER BY [DateLastOccurred] DESC;
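If the log is long, it may be easier to total the Frequency column per IP address and user agent than to read the rows one by one. The sketch below assumes you have already pulled the rows into Python as (Frequency, IPAddress, UserAgent) tuples. The sample rows and the `top_offenders` helper are hypothetical, and the addresses come from the reserved documentation ranges.

```python
from collections import Counter

# Hypothetical sample rows shaped like (Frequency, IPAddress, UserAgent).
# Real rows would come from the cs_Exceptions table.
SAMPLE_ROWS = [
    (120, "203.0.113.7", "ExampleSpider/1.0"),
    (3, "198.51.100.20", "Mozilla/5.0"),
    (80, "203.0.113.7", "ExampleSpider/1.0"),
    (45, "203.0.113.9", "ExampleSpider/1.0"),
]


def top_offenders(rows, limit=3):
    """Sum exception frequency per (IP, user agent), worst offender first."""
    totals = Counter()
    for frequency, ip_address, user_agent in rows:
        totals[(ip_address, user_agent)] += frequency
    return totals.most_common(limit)


for (ip_address, user_agent), total in top_offenders(SAMPLE_ROWS):
    print(f"{total:>5}  {ip_address:<15}  {user_agent}")
```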
Either way, you will see the IP address and the user agent of the spider that's bringing down your site. From here, open Internet Information Services (IIS) Manager, navigate to the site you want to protect, and click on IP Address and Domain Restrictions.
Next, right-click and choose Add Deny Entry to block the spider (Add Allow Entry does the opposite), then fill in the IP address or IP range.
The spider has now been blocked. I’d suggest keeping an eye on the exception log, because spiders often come back from a neighboring address, and blocking the spider's whole IP range if that happens. The last step is to contact the owner of the spider and ask them to stop. You can find out who owns the IP address by looking it up through Network Solutions.
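If the spider shows up from several nearby addresses, you need a network address and mask for the range entry. Here is a rough sketch using Python's standard `ipaddress` module that finds the smallest single IPv4 network covering a set of addresses. The `covering_network` helper and the sample addresses are hypothetical. The resulting range can include addresses that don't belong to the spider, so check it against the owner lookup before you block it.

```python
import ipaddress


def covering_network(addresses):
    """Return the smallest single IPv4 network that contains every address."""
    ips = sorted(ipaddress.IPv4Address(a) for a in addresses)
    low, high = int(ips[0]), int(ips[-1])
    # The highest bit where low and high differ sets how many host bits we need.
    host_bits = (low ^ high).bit_length()
    prefix = 32 - host_bits
    # Clear the host bits so the base address is a valid network address.
    base = (low >> host_bits) << host_bits
    return ipaddress.IPv4Network((base, prefix))


# Hypothetical spider addresses taken from the exception log.
network = covering_network(["203.0.113.7", "203.0.113.9", "203.0.113.14"])
print(network.network_address, network.netmask)
```

A single address comes back as a /32 network, so the same helper works whether you are blocking one IP or a range.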