| User Agent | | Date Added |
| RuLeS |  | 27/07/2005 23:31:49 |
|
| SafeDNS Search Bot |  | 18/10/2015 8:24:58 |
| The main reason for us at SafeDNS to collect web pages is to correctly categorize Internet resources and to develop new technologies and products for SafeDNS. |
| SafetyNet Robot |  | 27/07/2005 23:32:43 |
| Finds URLs for K-12 content management. |
| scrapy-redis |  | 14/11/2016 13:07:20 |
| Distributed crawling/scraping |
| Seamus the Search Engine |  | 17/04/2026 1:38:33 |
| Annoying crawler that claims to be part of someone's coursework |
| Search.Aus-AU.COM |  | 27/07/2005 23:35:45 |
| Search-AU is a development tool I have built to investigate the power of a search engine and web crawler, giving me access to a database of web content (HTML/URLs), addresses, etc., from which I hope to build more accurate statistics about the .au zone's web content. The robot started crawling from http://www.geko.net.au/ on March 1st, 1998, and after nine days had 70 MB of compressed ASCII in a database to work with. I hope to run a refresh of the crawl every month initially, and soon every week, bandwidth and CPU allowing. If the project warrants further development, I will turn it into an Australian (.au) zone search engine and make it commercially available for advertising to cover the costs, which are starting to mount up. --dez (980313 - black friday!) |
| Senrigan |  | 31/07/2005 23:35:36 |
| This robot now fetches HTML pages only from the .jp domain. |
| SG-Scout |  | 31/07/2005 23:38:47 |
| Does a "server-oriented" breadth-first search in a round-robin fashion, with multiple processes. |
| Sherlock Holmes Search Engine |  | 8/02/2004 0:08:16 |
| Sherlock Holmes is a universal search engine – a system for gathering and indexing of textual data (text files, web pages, ...), both locally and over the network. |
| Shim-Crawler |  | 6/02/2006 15:38:26 |
| Shim-crawler was written by Shim Wonbo of the Chikayama-Taura Laboratory. The main goal behind writing the crawler is to collect web pages for research related to web search and data mining. Recently, we have been planning to use it for crawling weblogs too. The crawler is used by members of the Chikayama-Taura Laboratory to crawl web pages for research purposes only. Our crawling policy clearly respects general crawling norms. Though we understand the concerns of webmasters, we would like to assure you that our crawler only crawls pages for research and not for any business use. Please have a look at our crawling policy for a better understanding. We sincerely appreciate your cooperation and support. |
| ShopWiki |  | 23/02/2009 0:49:32 |
| ShopWiki finds products using Web crawlers, as other search engines do. This means we check each Web site's domain for a robots.txt file, which tells our crawlers which files they may search. Every Web site can define which parts of its domain are off-limits to specific robot user agents. ShopWiki respects and obeys all robots.txt files. |
| Sift |  | 31/07/2005 23:42:25 |
| Subject directed (via key phrase list) indexing. |
| Simmany Robot Ver1.0 |  | 31/07/2005 23:43:33 |
| The Simmany Robot is used to build the Map (DB) for the simmany service operated by HNC (Hangul & Computer Co., Ltd.). The robot runs weekly and visits sites that have useful Korean information in a defined order. This robot is part of the simmany service and the simmini products. simmini is a set of Web products that use the indexing and retrieval modules of simmany. |
| SiteSpider |  | 7/02/2004 23:38:20 |
| The indexer is capable of indexing up to 1,000 documents per site, and the information is stored to a database searchable by clients. The user can then utilize a simple search server protocol to query the database and generate a search service for their site. |
| Sleek |  | 31/07/2005 23:32:53 |
| Crawls remote sites and performs link popularity checks before inclusion. |
| Snipebot |  | 3/04/2013 11:06:20 |
|
| Spock Crawler |  | 2/08/2007 23:23:01 |
| As part of Spock's mission to index every single human being on the planet, we have developed a crawler to collect pages all over the Internet. |
| Sven |  | 7/02/2004 23:40:46 |
| Empty user agent |
| SWISH-E |  | 8/02/2004 0:12:03 |
| SWISH-E is a fast, powerful, flexible, free, and easy to use system for indexing collections of Web pages or other files |
| t6labs |  | 24/12/2006 22:54:34 |
| T6 Labs is an R&D lab that works on higher-order tensor analysis to solve a variety of industry-related problems. Philosophically, every problem that involves information overload, be it the web or computational fluid dynamics, needs higher-order tensor analysis for better abstraction. Higher-order tensor analysis techniques developed by T6 Labs are currently being used to develop SPAC, a search engine personalization and collaboration platform. |
| TeraText AGLS Harvester |  | 7/02/2004 23:50:11 |
| A text database system and search engine built for handling large text collections |
| The NorthStar Robot |  | 25/07/2005 1:05:10 |
| Recent runs (26 April 94) will concentrate on textual analysis of the Web versus GopherSpace (from the Veronica data) as well as indexing. |
| The Peregrinator |  | 25/07/2005 1:19:41 |
| This robot is being used to generate an index of documents on Web sites connected with mathematics and statistics. It ignores off-site links, so does not stray from a list of servers specified initially. |
| TheRarestParser |  | 23/02/2009 0:30:43 |
| TheRarestParser is my bot which goes out collecting words used in web pages for “The Rarest Words” project. |
| Thunderstone Webinator |  | 7/02/2004 23:48:37 |
| Webinator is a Web walking and indexing package that allows a Website administrator to easily create and provide a high quality retrieval interface to collections of HTML documents. |
| URL Spider Pro |  | 7/02/2004 23:51:44 |
| URL Spider Pro is designed for creating a small Google-like search engine. You simply configure (through keyword and domain filters) the spider for the type of information it should index and it will crawl through the Web collecting only documents matching that configuration. You can also instruct it to crawl through one or more specific domains, making it easy to add site search functionality to your Web site. |
| Verity Ultraseek |  | 8/02/2004 0:10:25 |
| Verity Ultraseek (formerly Inktomi Enterprise Search) |
| Vspider |  | 8/02/2004 0:15:31 |
| ColdFusion MX includes several Verity utilities to diagnose and manage your collections. These tools include the mkvdk, rcvdk, rck2, and vspider utilities... |
| WebImages |  | 29/06/2007 23:38:51 |
|
| WebRACE |  | 7/02/2004 23:54:25 |
| WebRACE is a prototype HTTP Retrieval, Annotation and Caching Engine developed in Java. It is the WWW Agent-Proxy of eRACE. |
| WeSEE:Ads |  | 30/09/2015 21:47:23 |
|
| Wired Digital |  | 14/02/2004 0:21:01 |
| wired-digital-newsbot/1.5 |
| WSB WebCrawler |  | 7/02/2004 23:52:38 |
| WebSearchBench consists of the two software components Web Crawler and Search Engine (Repository, Indexer and search software) |
| wsowner.com |  | 2/08/2013 11:19:21 |
| Search websites with the same IP address or Google Analytics account |
| www.petitsage.fr |  | 3/05/2006 22:31:13 |
|
| Xbot |  | 7/02/2004 23:57:22 |
| The xbot software is a modular bot environment based on the .NET Framework for autonomous mobile omniwheel robots driven by neural networks, scripts, or maps and using the SV203 controller. |
| Xtreeme SiteXpert |  | 9/02/2004 20:19:04 |
| With SiteXpert, without any HTML / JavaScript knowledge, you can quickly create a variety of cross-browser navigation systems without having to worry about compatibility between browsers. You can also create a search engine (hosted by your own web server or for CD-ROM distribution). The program will automatically crawl through your web site or local disk in search of documents. SiteXpert comes with 10 navigation system types and over 90 graphical schemes. |
| XYLEME Robot |  | 11/02/2004 16:39:30 |
| index XML, follow HTML |
| Yahoo-Blogs |  | 29/04/2006 23:32:30 |
| Yahoo-Blogs/v3.9 is Yahoo!'s blog indexing robot. As part of the crawling effort, Yahoo!'s blog crawler will take robots.txt standards into account to ensure we do not crawl and index content from those pages whose content you do not want included in Yahoo! Search Technology. If a page is disallowed to be crawled by robots.txt standards, Yahoo! will not read or use the contents of that page. The URL of a protected page may be included in Yahoo! Search Technology as a "thin" document with no text content. Links and reference text from other public web pages provide identifiable information about a URL and may be indexed as part of web search coverage. |
| YioopBot |  | 3/01/2013 16:23:51 |
|
| zzabmbot |  | 2/04/2015 14:29:15 |
|