2 September 2010 AQTRONIX

AQTRONIX WebKnight - Robots

Robots

WebKnight blocks bad robots or bad user-agents in three possible ways:

  • Robots Database
  • Bad Bot Trap
  • Aggressive Bot Trap

Robots Database

WebKnight uses a robots database to block known bad bots or any additional bots the administrator specifies. This robot database is the file Robots.xml located in the WebKnight folder of your installation.

Download the latest version of Robots.xml.

Download and overwrite the existing file in your WebKnight folder to have the latest database of known robots. WebKnight will automatically detect and load the new file.

If WebKnight blocks

  • a user agent specified in the User Agent section of the WebKnight configuration, or
  • a robot specified in the Robots section of the WebKnight configuration,

it will generate a log entry with one or both of the following messages:

BLOCKED: User Agent not allowed
BLOCKED: '[token]' not allowed in User Agent

To find out what the robot or user agent is and why it was blocked:

Look up the user agent in our database.

If you want to allow a blocked robot, remove its [token] and User Agent from the Robots.xml file, or uncheck the corresponding item in the Robots section of the WebKnight configuration. To find out which section your robot is in, look up the agent.
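To see which tokens are being blocked most often, the two messages above can be picked out of a log with a few lines of Python. This is only a sketch: it relies on nothing but the message texts quoted on this page, treats the rest of each log line as opaque, and the function name and sample lines are hypothetical.

```python
import re

# Message texts as quoted above; everything else on a log line is ignored.
GENERIC_MESSAGE = "BLOCKED: User Agent not allowed"
TOKEN_MESSAGE = re.compile(r"BLOCKED: '([^']*)' not allowed in User Agent")


def blocked_tokens(lines):
    """Return (tokens, generic_count) found in an iterable of log lines."""
    tokens = []
    generic_count = 0
    for line in lines:
        # A single entry may carry one or both messages, so check both.
        match = TOKEN_MESSAGE.search(line)
        if match:
            tokens.append(match.group(1))
        if GENERIC_MESSAGE in line:
            generic_count += 1
    return tokens, generic_count


# Hypothetical sample lines, not taken from a real WebKnight log.
sample = [
    "BLOCKED: User Agent not allowed",
    "BLOCKED: 'examplebot' not allowed in User Agent",
    "some unrelated entry",
]
tokens, generic_count = blocked_tokens(sample)
print(tokens, generic_count)
```

Each extracted token can then be looked up to decide whether the robot should stay blocked.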

Bad Bot Trap

This feature enables you to block robots that do not obey robots.txt. The trap consists of three parts:

  1. Robots.txt file
  2. Hidden links to lure bad robots
  3. WebKnight configuration

Add the bot trap URLs to your robots.txt file. The robots.txt file should look like this (you can also find a default robots.txt file with the installation):

User-Agent: *
Disallow: /forbidden.for.web.robots/
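As a quick sanity check, you can confirm that a well-behaved robot would stay out of the trap folder. The sketch below uses Python's standard urllib.robotparser module on the robots.txt content shown above; the agent name "ExampleBot" and the page /index.html are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from this page.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /forbidden.for.web.robots/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a hypothetical agent name used only for this check.
trap_allowed = parser.can_fetch("ExampleBot", "/forbidden.for.web.robots/")
home_allowed = parser.can_fetch("ExampleBot", "/index.html")
print("trap folder allowed:", trap_allowed)
print("normal page allowed:", home_allowed)
```

A compliant robot is denied the trap folder and allowed everywhere else, so only robots that ignore robots.txt should ever request the trap URL.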

To lure a bad bot into those URLs, add them to your web site as hidden anchors (you don't want a visitor to actually click the link):

<a href="/forbidden.for.web.robots/"></a>

Make sure this folder is also added to "Deny Bots BotTraps" in the Robots section of the WebKnight configuration. To catch all bad robots, do not add the trailing forward slash in the WebKnight config file, because some bots request the folder without the slash: if such a bot gets a redirect, it knows the folder exists; if not, it knows the URL is a bad bot trap. When WebKnight detects access to these URLs, it blocks the robot for several hours (36 hours by default). Blocking is done on the combination of IP address and User Agent. You will see this in your log file:

BLOCKED: Bad robot fell for trap: '/forbidden.for.web.robots'
BLOCKED: Robot not allowed until timeout expires
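The behaviour described above can be pictured with a small Python model. It is an illustrative sketch only, not WebKnight's implementation: the class and method names are hypothetical, the prefix matching of trap paths is an assumption, and only the 36-hour default, the IP address plus User Agent key, and the two log messages come from this page.

```python
import time

BLOCK_SECONDS = 36 * 60 * 60  # default block duration quoted on this page


class BotTrap:
    """Toy model: block (IP, User Agent) pairs that request a trap URL."""

    def __init__(self, trap_paths, block_seconds=BLOCK_SECONDS):
        # Store trap paths without the trailing slash so that both
        # "/trap" and "/trap/" (and anything below it) are caught.
        self.trap_paths = [p.rstrip("/") for p in trap_paths]
        self.block_seconds = block_seconds
        self.blocked_until = {}  # (ip, user_agent) -> expiry timestamp

    def matching_trap(self, path):
        for trap in self.trap_paths:
            if path == trap or path.startswith(trap + "/"):
                return trap
        return None

    def check(self, ip, user_agent, path, now=None):
        """Return a log message if the request is blocked, else None."""
        now = time.time() if now is None else now
        key = (ip, user_agent)
        trap = self.matching_trap(path)
        if trap is not None:
            self.blocked_until[key] = now + self.block_seconds
            return "BLOCKED: Bad robot fell for trap: '%s'" % trap
        expiry = self.blocked_until.get(key)
        if expiry is not None and now < expiry:
            return "BLOCKED: Robot not allowed until timeout expires"
        return None


# Hypothetical requests; timestamps are seconds from an arbitrary start.
trap = BotTrap(["/forbidden.for.web.robots/"])
first = trap.check("192.0.2.1", "BadBot/1.0", "/forbidden.for.web.robots", now=0)
later = trap.check("192.0.2.1", "BadBot/1.0", "/index.html", now=3600)
other_agent = trap.check("192.0.2.1", "Browser/1.0", "/index.html", now=3600)
expired = trap.check("192.0.2.1", "BadBot/1.0", "/index.html", now=BLOCK_SECONDS + 1)
```

Because the key is the pair of IP address and User Agent, a different agent coming from the same address is not affected by the block in this model.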

Aggressive Bot Trap

This filter enables you to block robots that request too many pages in a short period of time. By default, this filter is not enabled. When it is enabled with the default settings, robots that make more than 180 requests in 3 minutes after their initial request for robots.txt will be blocked. Blocking is done on the combination of IP address and User Agent. You will see this in your log file:

BLOCKED: Aggressive robot not allowed

If robots do not request the file robots.txt, they will not be seen as robots and will not be blocked by this filter. If you want to block aggressive users as well, you can block them in the Connection settings (Use Connection Requests Limits).
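The rule can be sketched as a simple counter over a sliding time window. This is a toy model under stated assumptions, not WebKnight's code: the class name is hypothetical, the sliding-window interpretation of "180 requests in 3 minutes" is a guess, and since this page gives no block duration for this filter, the sketch never lifts a block.

```python
from collections import deque


class AggressiveBotTrap:
    """Toy model: rate-limit clients that identified themselves as robots."""

    def __init__(self, max_requests=180, window_seconds=180):
        self.max_requests = max_requests      # default quoted on this page
        self.window_seconds = window_seconds  # 3 minutes
        self.history = {}   # (ip, user_agent) -> timestamps, robots only
        self.blocked = set()

    def allow(self, ip, user_agent, path, now):
        """Return True if the request is allowed, False if blocked."""
        key = (ip, user_agent)
        if key in self.blocked:
            return False
        if key not in self.history:
            # Only a request for robots.txt marks the client as a robot.
            if path == "/robots.txt":
                self.history[key] = deque()
            return True
        times = self.history[key]
        times.append(now)
        # Drop requests that fell out of the window.
        while times and now - times[0] >= self.window_seconds:
            times.popleft()
        if len(times) > self.max_requests:
            self.blocked.add(key)
            return False
        return True


# Hypothetical traffic: one robot and one ordinary client, two requests per second.
trap = AggressiveBotTrap()
trap.allow("192.0.2.1", "FastBot/1.0", "/robots.txt", now=0.0)
robot_results = [
    trap.allow("192.0.2.1", "FastBot/1.0", "/page%d" % i, now=i * 0.5)
    for i in range(1, 182)
]
user_results = [
    trap.allow("192.0.2.2", "Browser/1.0", "/page%d" % i, now=i * 0.5)
    for i in range(1, 182)
]
```

In this model the robot's 181st request inside the window is the first one refused, while the client that never asked for robots.txt is never counted.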


Published: 17/04/2007
Document Type: HOWTO
Last modified: 1/02/2010
Target: Administrator
Visibility: Public
Language: English

Copyright © 2010 AQTRONIX. All rights reserved.