WebKnight blocks bad robots or bad user-agents in four possible ways:
WebKnight uses a robots database to block known bad bots or any additional bots the administrator specifies. This robots database is the file Robots.xml located in the WebKnight folder of your installation.
Download the latest version of Robots.xml (right click and choose save as...).
If WebKnight blocks a user agent, you will see one of these entries in your log file:
BLOCKED: User Agent not allowed
BLOCKED: '[token]' not allowed in User Agent
To find out what the robot / user agent is and why it was blocked:
Look up the user agent in our database
If you want to allow a certain blocked robot, remove its [token] and User Agent from the Robots.xml file, or uncheck the appropriate item in the Robots section of the WebKnight configuration. To find out which section your robot is in, look up the agent.
The bot trap feature enables you to block robots that do not obey robots.txt. Setting it up consists of three things:
Add the bot trap urls to your robots.txt file. The robots.txt file should look like this (you can also find a default robots.txt file with the installation):
User-Agent: *
Disallow: /forbidden.for.web.robots/
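As a rough check of what these two directives mean for a well-behaved crawler, the sketch below feeds them to Python's standard `urllib.robotparser`. The helper name `is_allowed` and the agent name `ExampleBot` are made up for illustration; nothing here is part of WebKnight.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content from the text, one directive per line.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /forbidden.for.web.robots/
"""


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if a compliant robot may fetch `path` under ROBOTS_TXT.

    `ExampleBot` is a hypothetical agent name used only for this check.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/forbidden.for.web.robots/page.html"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

A crawler that honours robots.txt never requests anything under the trap folder, so only robots that ignore the file can fall into the trap.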
Now, to lure a bad bot into those urls, add them to your web site as hidden anchors (you don't want anyone to actually click on the link), for example an empty anchor such as <a href="/forbidden.for.web.robots/"></a>.
Make sure this folder is also added in the WebKnight configuration to "Deny Bots BotTraps" in the Robots section. To catch all bad robots, do not add the trailing forward slash in the WebKnight config file. Some bots request the folder without a slash: if they get a redirection, they know the folder exists; if not, they conclude it is a bot trap. When WebKnight detects access to these urls, it blocks the robot for several hours (36 hours by default). Blocking is done by the combination of IP address and User Agent. You will see this in your log file:
BLOCKED: Bad robot fell for trap: '/forbidden.for.web.robots'
BLOCKED: Robot not allowed until timeout expires
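The behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea, not WebKnight's implementation: the class name `BotTrap`, the prefix match on the url, and the use of plain numeric timestamps are all assumptions. The log messages and the 36 hour default come from the text.

```python
from dataclasses import dataclass, field

# Trap url without a trailing slash, as advised above, so that requests
# both with and without the slash are caught (prefix match is an assumption).
TRAP_PREFIX = "/forbidden.for.web.robots"
BLOCK_SECONDS = 36 * 60 * 60  # default timeout mentioned in the text


@dataclass
class BotTrap:
    # Maps (ip, user_agent) to the timestamp at which the block expires.
    blocked_until: dict = field(default_factory=dict)

    def check(self, ip, user_agent, path, now):
        """Return a log message if the request is blocked, else None."""
        key = (ip, user_agent)
        if path.startswith(TRAP_PREFIX):
            # The robot ignored robots.txt: start (or restart) its block.
            self.blocked_until[key] = now + BLOCK_SECONDS
            return f"BLOCKED: Bad robot fell for trap: '{TRAP_PREFIX}'"
        expiry = self.blocked_until.get(key)
        if expiry is not None:
            if now < expiry:
                return "BLOCKED: Robot not allowed until timeout expires"
            del self.blocked_until[key]  # timeout expired, forget the robot
        return None


if __name__ == "__main__":
    trap = BotTrap()
    print(trap.check("203.0.113.5", "BadBot/1.0", "/forbidden.for.web.robots", now=0))
    print(trap.check("203.0.113.5", "BadBot/1.0", "/index.html", now=60))
```

Keying the table on the pair of IP address and User Agent means a different agent coming from the same address is not affected by the block.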
Aggressive Bot Trap
This filter enables you to block robots that request too many pages in a short period of time. By default, this filter is not enabled. When it is enabled with the default settings, robots that make more than 180 requests within 3 minutes of their initial request for robots.txt are blocked. Blocking is done by the combination of IP address and User Agent. You will see this in your log file:
BLOCKED: Aggressive robot not allowed
If robots do not request the file robots.txt, they will not be seen as robots and will not be blocked by this filter. If you want to block aggressive users as well, you can block them in the Connection settings (Use Connection Requests Limits).
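One way to picture this filter is the Python sketch below. It is a simplified model under stated assumptions, not WebKnight's code: the class name `AggressiveBotFilter` is hypothetical, the 3 minute window is assumed to start at the first robots.txt request, the robots.txt request itself is not counted, and block expiry is not modelled. The thresholds and the log message come from the text.

```python
MAX_REQUESTS = 180        # default request limit from the text
WINDOW_SECONDS = 3 * 60   # default window from the text


class AggressiveBotFilter:
    def __init__(self):
        # (ip, user_agent) -> [time of first robots.txt request, request count]
        self.robots = {}
        self.blocked = set()

    def check(self, ip, user_agent, path, now):
        """Return a log message if the request is blocked, else None."""
        key = (ip, user_agent)
        if key in self.blocked:
            return "BLOCKED: Aggressive robot not allowed"
        if path == "/robots.txt" and key not in self.robots:
            # Asking for robots.txt is what marks the client as a robot.
            self.robots[key] = [now, 0]
            return None
        state = self.robots.get(key)
        if state is None:
            return None  # never asked for robots.txt: not treated as a robot
        start, count = state
        if now - start > WINDOW_SECONDS:
            return None  # outside the counting window
        state[1] = count + 1
        if state[1] > MAX_REQUESTS:
            self.blocked.add(key)
            return "BLOCKED: Aggressive robot not allowed"
        return None


if __name__ == "__main__":
    f = AggressiveBotFilter()
    f.check("203.0.113.5", "FastBot/1.0", "/robots.txt", now=0)
    results = [f.check("203.0.113.5", "FastBot/1.0", f"/page{i}", now=1) for i in range(181)]
    print(results[-1])
```

In this model a client that never requests robots.txt is never counted, which mirrors the limitation described above.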
WebKnight 2.5 and later supports URL rewriting. Requests for robots.txt can be mapped to a server-side script without redirecting the client. IIS executes the script, but the robot or browser still sees robots.txt in the url. This enables you to hide the true contents of your robots.txt file from certain robots or hackers.
In addition you can set session variables or block the IP address at the web application level.
A sample ASP script (robots.asp) is provided with WebKnight.
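The kind of decision such a server-side script might make can be sketched in Python. The bundled sample is ASP and is not reproduced here; the function name `robots_txt_for`, the trusted-agent token `examplebot`, and both response bodies are hypothetical choices made for illustration.

```python
# Hypothetical list of user-agent tokens that may see the real rules.
TRUSTED_AGENT_TOKENS = ("examplebot",)

# What trusted crawlers receive (here: the trap rule from earlier).
FULL_ROBOTS_TXT = "User-Agent: *\nDisallow: /forbidden.for.web.robots/\n"
# What everyone else receives: a generic file that reveals no paths.
MINIMAL_ROBOTS_TXT = "User-Agent: *\nDisallow: /\n"


def robots_txt_for(user_agent):
    """Pick the robots.txt body to serve for a given User-Agent header."""
    ua = user_agent.lower()
    if any(token in ua for token in TRUSTED_AGENT_TOKENS):
        return FULL_ROBOTS_TXT
    return MINIMAL_ROBOTS_TXT


if __name__ == "__main__":
    print(robots_txt_for("Mozilla/5.0 (compatible; ExampleBot/1.0)"))
    print(robots_txt_for("curl/8.0"))
```

Because the User-Agent header is easy to forge, a real script would probably combine a check like this with other signals, such as the client IP address.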