| User Agent | | Verified | Date Added |
| Almaden | | Yes | 5/02/2004 18:49:27 |
| Harnessing WebFountain's power will help enterprises gain insightful, highly synthesized, timely, and customized information that is not readily perceptible or available today. This includes information such as emerging trends and patterns, competitive activities, “buzz” about products, relationships affecting customers’ businesses, and pathways to discovery. |
| Arachnophilia | | | 10/02/2004 20:42:21 |
| The purpose of this run (undertaken by HaL Software) was to collect approximately 10k HTML documents for testing automatic abstract generation. |
| CACTVS Chemistry Spider | | | 11/02/2004 0:31:27 |
| Locates chemical structures in Chemical MIME formats on WWW and FTP servers and downloads them into a database searchable with structure queries (substructure, full structure, formula, properties, etc.). |
| Cerfinfo | | Yes | 5/02/2004 20:32:25 |
| CERFinfo.com is a dynamic directory of tens of thousands of carefully selected, information-rich, safe K-12 websites |
| CJNetworkQuality | | Yes | 5/02/2004 20:31:11 |
| The network quality utility searches each publisher Web site that is registered in the Commission Junction network and generates traffic, in order to monitor compliance with the Publisher Service Agreement, specifically Sections 1 and/or 2.2. |
| Conceptbot | | | 11/02/2004 16:26:22 |
| The Conceptbot spider is used to research concept-based search indexing techniques. It uses a breadth-first search to spread out the number of hits on a single site over time. The spider runs at irregular intervals and is still under construction. |
| DeWeb(c) Katalog/Index | | Yes | 11/02/2004 16:53:41 |
| Its purpose is to generate a Resource Discovery database, perform mirroring, and generate statistics. Uses a combination of an Informix(tm) database and WN 1.11 server software for indexing/resource discovery, full-text search, and text excerpts. |
| FunnelWeb | | Yes | 11/02/2004 23:19:50 |
| Its purpose is to generate a Resource Discovery database and generate statistics. It is a localised South Pacific discovery and search engine, with distributed operation under development. |
| GCreep | | | 12/02/2004 20:42:55 |
| Indexing robot to learn SQL |
| IRLbot | | Yes | 2/05/2006 0:42:29 |
| IRL-crawler is a Texas A&M research project sponsored in part by the National Science Foundation that investigates algorithms for mapping the topology of the Internet and discovering the various parts of the web. The crawler downloads random web pages (text only) and follows certain links to find other websites. |
| IssueCrawler | | | 5/02/2004 20:36:25 |
|
| IUSA Browser | | | 5/02/2004 20:33:57 |
|
| Kilroy | | | 8/03/2004 0:55:58 |
| Used to collect data for several projects. Runs constantly and visits a site no faster than once every 90 seconds. |
| knowledge | | | 12/09/2006 0:36:05 |
|
| legs | | Yes | 8/03/2004 1:07:28 |
| The legs robot is used to build the magazine article database for MagPortal.com. |
| Mediapartners | | Yes | 5/02/2004 20:40:07 |
| Google AdSense is for web publishers who want to make more revenue from advertising on their site while maintaining editorial quality. Mediapartners-Google/2.1 (via babelfish.yahoo.com): this one will look like it is coming from the Yahoo IP range, but the X-Forwarded-For header will contain a Google IP address. |
| MerzScope | | | 24/07/2005 23:03:08 |
| The robot is part of a Web-mapping package called MerzScope, used mainly by consultants and webmasters to create and publish maps on and of the World Wide Web. |
| miniRank | | | 3/05/2006 0:21:23 |
| miniRank is an online tool that ranks websites by popularity in their respective country. The rank is calculated from a wide range of qualitative factors. Webmasters can't pay for a higher score. |
| Miva | | | 3/05/2006 0:04:10 |
| MIVA is the new name for Espotting and the FindWhat.com Group. We are now one company, with one brand and one mission - to help businesses grow. |
| MOMspider | | | 25/07/2005 0:44:17 |
| Used to validate links and generate statistics. It's usually run from anywhere. Originated as a research project at the University of California, Irvine, in 1993. Presented at the First International WWW Conference in Geneva, 1994. |
| MSRBOT | | | 5/02/2004 20:37:06 |
| Microsoft is using the MSRBot web crawler to collect data from the web for further study. |
| NZexplorer | | Yes | 11/02/2004 22:48:30 |
| Started in 1995 to provide a comprehensive index to WWW pages within New Zealand. Now also used in Malaysia and other countries. |
| panscient.com | | Yes | 9/11/2006 22:42:55 |
| At Panscient Technologies we design, build and operate custom internet search engines that unlock the hidden structure of web data. Using state of the art AI technology, Panscient Technologies' software analyzes web sites for their information content and compiles the data into a searchable index. Our software can be trained to recognize specific entities and relations, so whatever your application, from searching product reviews to detecting new job ads, Panscient Technologies can supply a custom search engine for the task. |
| Patric | | | 25/07/2005 1:17:25 |
|