| User Agent | | Date Added |
| ES.NET |  | 5/07/2005 11:02:08 |
| Innerprise develops full-text indexing search engine software technology enabling search for your Web site, Intranet, or the Web. Advanced crawler features ensure that only documents you want indexed are indexed. Key features provide support for common file types, secure servers, multiple servers, and complete automation through built-in schedulers. |
| FAST Enterprise Crawler |  | 18/03/2004 21:41:19 |
|
| Felix IDE |  | 11/02/2004 22:55:39 |
| Felix IDE is a retail personal search spider sold by The Pentone Group, Inc. It supports the proprietary exclusion "Frequency: ??????????" in the robots.txt file. The question marks represent an integer giving the number of milliseconds to delay between document requests. This is called VDRF(tm), or Variable Document Retrieval Frequency. Note that users can redefine the user-agent name. |
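A site operator or crawler author could read such a directive roughly as follows. This is a minimal sketch, not Felix IDE's actual parser: the entry only says that the directive carries an integer number of milliseconds, so the exact line grammar (case-insensitive name, optional trailing `#` comment) and the function name `parse_frequency_ms` are assumptions made for illustration. The sketch also ignores which `User-agent` group the line belongs to.

```python
import re
from typing import Optional

# Assumption: the directive sits on its own line as "Frequency: <integer>".
# The listing does not specify the grammar, so this pattern is a guess.
FREQUENCY_RE = re.compile(r"^\s*Frequency\s*:\s*(\d+)\s*(?:#.*)?$", re.IGNORECASE)


def parse_frequency_ms(robots_txt: str) -> Optional[int]:
    """Return the first Frequency value (milliseconds), or None if absent."""
    for line in robots_txt.splitlines():
        match = FREQUENCY_RE.match(line)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /private/\nFrequency: 1500\n"
    delay_ms = parse_frequency_ms(sample)
    print(delay_ms)
```

A crawler honoring the value would convert it to seconds (`delay_ms / 1000`) before sleeping between document requests.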
| FetchRover |  | 11/02/2004 23:00:32 |
| FetchRover fetches Web Pages. It is an automated page-fetching engine. FetchRover can be used stand-alone or as the front-end to a full-featured Spider. Its database can use any ODBC compliant database server, including Microsoft Access, Oracle, Sybase SQL Server, FoxPro, etc. |
| Fish search |  | 11/02/2004 23:13:23 |
| Its purpose is to discover resources on the fly. A version exists that is integrated into the Tübingen Mosaic 2.4.2 browser (also written in C). |
| Fluid Dynamics Search Engine robot (FDSE) |  | 7/02/2004 23:10:47 |
| FDSE is an easy-to-install search engine for local and remote sites. It returns fast, accurate results from a template-driven architecture. |
| gammaSpider/FocusedCrawler |  | 11/02/2004 23:23:05 |
| Information gathering. Focused crawling on a specific topic. Uses gammaFetcherServer. Product for sale. RobotUserAgent may be changed by the user. More features are being added. The product is constantly under development. AKA FocusedCrawler. |
| GastroGnome |  | 29/01/2018 1:43:26 |
|
| GetBot |  | 12/02/2004 20:44:16 |
| GetBot's purpose is to index all the sites it can find that contain Shockwave movies. It is the first bot or spider written in Shockwave. The bot was originally written at Macromedia on a hungover Sunday as a proof of concept. - Alex Zavatone 3/29/96 |
| Google Search Appliance |  | 8/02/2004 0:02:23 |
|
| Googlebot-Image |  | 8/04/2004 22:35:33 |
|
| Grapnel/0.01 Experiment |  | 12/02/2004 20:53:07 |
| Resource Discovery Experimentation |
| Grub |  | 8/02/2004 0:00:25 |
| Leveraging the power of distributed computing, Grub allows everyone with an Internet connection to participate in the last frontier of discovery. By downloading the unique screensaver, you can donate your computer's unused bandwidth to probing the hidden depths of the Web. |
| havIndex |  | 14/02/2004 0:15:57 |
| havIndex allows individuals to build a searchable word index of (user-specified) lists of URLs. havIndex does not crawl; rather, it requires one or more user-supplied lists of URLs to be indexed. havIndex does (optionally) save URLs parsed from indexed pages. |
| Heritrix |  | 8/02/2004 0:05:08 |
| Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. |
| HKU WWW Octopus |  | 25/07/2005 1:08:54 |
| HKU Octopus is an ongoing project for resource discovery in the Hong Kong and China WWW domain. It is a research project conducted by three undergraduates at the University of Hong Kong. |
| ht://Dig |  | 14/02/2004 0:24:00 |
| The ht://Dig system is a complete world wide web indexing and searching system for a domain or intranet. This system is not meant to replace the need for powerful internet-wide search systems like Lycos, Infoseek, Google and AltaVista. Instead it is meant to cover the search needs for a single company, campus, or even a particular subsection of a web site. As opposed to some WAIS-based or web-server based search engines, ht://Dig can easily span several web servers. The type of these different web servers doesn't matter as long as they understand common protocols like HTTP. |
| Hyper-Decontextualizer |  | 14/02/2004 0:27:03 |
| Written in Perl 5. Takes an input sentence and marks up each word with an appropriate hypertext link. |
| IBM_Planetwide |  | 7/03/2004 23:48:26 |
| Restricted to IBM owned or related domains. |
| image.kapsi.net |  | 8/03/2004 0:50:22 |
| The image.kapsi.net robot is used to build the database for the image.kapsi.net search service. The robot currently runs at random times. It was built for image.kapsi.net's database in 2001. |
| Imagelock |  | 7/03/2004 23:54:21 |
| searches for image links |
| Ingrid |  | 7/03/2004 23:52:52 |
| Commercial as part of search engine package |
| InnerpriseBot |  | 25/07/2005 3:14:07 |
| Innerprise develops full-text indexing search engine software technology enabling search for your Web site, Intranet, or the Web. Advanced crawler features ensure that only documents you want indexed are indexed. Key features provide support for common file types, secure servers, multiple servers, and complete automation through built-in schedulers. |
| IXE Crawler |  | 7/02/2004 23:11:02 |
|
| JavaCrawler |  | 7/02/2004 23:11:28 |
| The JavaCrawler, a prototype next generation MetaCrawler written in Java, supports most of the features already present in the MetaCrawler. |
| JoBo Java Web Robot |  | 8/03/2004 0:40:18 |
| JoBo is a web site download tool. The core web spider can be used for any purpose. The user agent can be changed by the user. |
| Jobot |  | 8/03/2004 0:43:30 |
| Its purpose is to generate a Resource Discovery database. Intended to seek out sites of potential "career interest". Hence - Job Robot. |
| JoeBot |  | 8/03/2004 0:44:38 |
| JoeBot is a generic web crawler implemented as a collection of Java classes which can be used in a variety of applications, including resource discovery, link validation, mirroring, etc. It currently limits itself to one visit per host per minute. |
| JoobleBot |  | 23/12/2012 14:54:59 |
| Jooble indexes jobs from the web. |
| JumpStation |  | 8/03/2004 0:47:20 |
|
| KDD-Explorer |  | 8/03/2004 0:52:56 |
| KDD-Explorer is used for indexing valuable documents which will be retrieved via an experimental cross-language search engine, CLINKS. This robot was designed in Knowledge-bases Information processing Laboratory, KDD R&D Laboratories, 1996-1997. |
| Keyword Density |  | 8/09/2006 0:57:44 |
| ? |
| Kilroy |  | 8/03/2004 0:55:58 |
| Used to collect data for several projects. Runs constantly and visits a site no more often than once every 90 seconds. |
| Knowledge.com |  | 7/01/2005 17:15:41 |
|
| KO_Yappo_Robot |  | 8/03/2004 1:01:34 |
| The KO_Yappo_Robot robot is used to build the database for the Yappo search service by k,osawa (part of AOL). The robot runs on random days and visits sites in a random order. |
| KomodiaBot |  | 26/11/2012 14:15:03 |
|
| Krugle |  | 29/04/2006 23:52:36 |
| The Krugle spider crawls web pages, documents, and archives looking for technical information that would be of value to programmers.
We use the results of the crawl to provide a vertical search service for programmers. Our product page explains how Krugle helps programmers find code and answers to technical questions.
Our spider is based on Nutch (version 0.8 as of April 2006), and uses various open source components to pull down publicly available information via HTTP(S) and FTP.
To be polite, our spider tries to only access a given domain via one thread at any time. In addition, we impose a 5 second minimum delay between requests. |
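The minimum-delay rule described in this entry can be sketched as below. This is an illustration under stated assumptions, not Krugle's or Nutch's implementation: the class name `PolitenessGate` and its methods are hypothetical, the sketch keys on the URL's host name rather than the registered domain, and it models only the delay between requests in a single-threaded loop, not the one-thread-per-domain rule.

```python
import time
from urllib.parse import urlsplit


class PolitenessGate:
    """Tracks, per host, the earliest time the next request may be sent."""

    def __init__(self, min_delay_seconds=5.0, clock=time.monotonic):
        self.min_delay = min_delay_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.next_allowed = {}      # host -> earliest allowed request time

    def _host(self, url):
        return urlsplit(url).hostname or ""

    def wait_time(self, url):
        """Seconds the caller should still wait before requesting this URL."""
        now = self.clock()
        return max(0.0, self.next_allowed.get(self._host(url), now) - now)

    def record_request(self, url):
        """Note that a request to this URL's host was just sent."""
        self.next_allowed[self._host(url)] = self.clock() + self.min_delay


if __name__ == "__main__":
    gate = PolitenessGate(min_delay_seconds=5.0)
    for url in ["http://example.com/a", "http://example.com/b"]:
        time.sleep(gate.wait_time(url))   # second URL waits about 5 seconds
        gate.record_request(url)          # a real crawler would fetch here
```

Changing `min_delay_seconds` to 60 or 90 would model the per-host limits that other entries in this list describe for JoeBot and Kilroy.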
| LabelGrabber |  | 8/03/2004 1:05:03 |
| The PICS label grabber application searches the WWW for PICS Labels and submits them to the label bureau of your choice. It can be used to populate a label bureau for testing, or for any other label bureau purpose. |
| Larbin |  | 8/02/2004 0:11:21 |
| Larbin is a web crawler (also called (web) robot, spider, scooter...). It is intended to fetch a large number of web pages to fill the database of a search engine. |
| LexiBot |  | 8/09/2006 11:48:18 |
|
| linkdex.com |  | 1/12/2010 13:53:05 |
| Linkdex is an enterprise-class platform that combines team management software with search engine optimization tools. Sometimes we crawl websites to understand backlink profiles or gather information for our users so they can improve their websites and SEO strategies. |
| LinkScan |  | 7/02/2004 14:12:22 |
| LinkScan is an industrial-strength link checking and website management tool. LinkScan checks links, validates HTML and creates site maps. |
| LivelapBot |  | 29/08/2014 9:01:20 |
| Livelap is a content discovery app that indexes web content. You have probably seen the Livelapbot/0.1 or LivelapBot/0.2 crawler in your server logs. LivelapBot can visit a page if it is shared on social media, and as part of its RSS/page crawling schedule.
What does LivelapBot collect?
Livelap indexes web content and makes metadata and a link to your content available on livelap.com and in the Livelap app. For indexing we only use official HTML and media meta tags in your page. We don't scrape the contents of your articles. The following fields are used for indexing: title, description, author, publication date, type of content (article, photo, video, etc.), images (og, twitter and other standard tags), videos (og, twitter and other standard tags), RSS links, and whether showing the page in an iframe is allowed. |
| Lockon |  | 8/03/2004 1:15:24 |
| This robot gathers only HTML documents. |
| logo.gif Crawler |  | 24/07/2005 22:51:36 |
| Meta-indexing engine for corporate logo graphics. The robot runs at irregular intervals and will only pull a start page and its associated /.*logo\.gif/i (if any). It will be terminated once a statistically significant number of samples has been collected.
logo.gif is part of the design diploma of Markus Weisbeck, and tries to analyze the abundance of the logo metaphor in WWW corporate design. The crawler and image database were written by Sevo Stille and Peter Frank of the Institut für Neue Medien, respectively. |
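The pattern quoted in this entry, `/.*logo\.gif/i`, is written in Perl-style notation: match any URL containing `logo.gif`, ignoring case. A rough Python equivalent is sketched below. The function name `select_logo_urls` and the sample URLs are hypothetical, and the entry does not say how the original crawler extracted image references from the start page, so this only shows the filtering step.

```python
import re

# Python counterpart of the Perl-style /.*logo\.gif/i from the entry above.
LOGO_RE = re.compile(r".*logo\.gif", re.IGNORECASE)


def select_logo_urls(image_urls):
    """Keep only the image URLs that contain 'logo.gif' (case-insensitive)."""
    # search() is used because the Perl pattern is unanchored.
    return [url for url in image_urls if LOGO_RE.search(url)]


if __name__ == "__main__":
    candidates = ["/img/Logo.GIF", "/img/banner.gif", "/corp_logo.gif", "/logos.gif"]
    print(select_logo_urls(candidates))
```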
| lufsbot |  | 13/07/2013 9:44:15 |
| Fake search engine (no results in search). |
| Mac WWWWorm |  | 24/07/2005 22:53:56 |
| A French keyword-searching robot for the Mac. The author has decided not to release this robot to the public. |
| Magpie |  | 24/07/2005 22:54:50 |
| Used to obtain information from a specified list of web pages for local indexing. Runs every two hours, and visits only a small number of sites. |
| meanpathbot |  | 28/06/2013 13:56:19 |
| meanpath is a new search engine that allows software developers to access detailed snapshots of millions of websites without having to run their own crawlers. Our clients use the information we gather from your site to help solve problems in these areas: semantic analysis, linguistics, identity theft protection, and malware and virus analysis. |
| MediaFox |  | 24/07/2005 23:01:53 |
| The robot is used to index meta information of a specified set of documents and update a database accordingly. |