User-Agents Database

User Agents

JavaCrawler (added 7/02/2004 23:11:28)
The JavaCrawler, a prototype next generation MetaCrawler written in Java, supports most of the features already present in the MetaCrawler.
JoBo Java Web Robot (added 8/03/2004 0:40:18)
JoBo is a web site download tool. The core web spider can be used for any purpose.
The user agent can be changed by the user.
Jobot (added 8/03/2004 0:43:30)
Its purpose is to generate a Resource Discovery database. It is intended to seek out sites of potential "career interest"; hence, Job Robot.
JoeBot (added 8/03/2004 0:44:38)
JoeBot is a generic web crawler implemented as a collection of Java classes which can be used in a variety of applications, including resource discovery, link validation, mirroring, etc. It currently limits itself to one visit per host per minute.
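
As a rough illustration of that rate limit, here is a minimal Python sketch of a per-host throttle for a single-threaded crawler; the names are illustrative, not JoeBot's actual code:

    import time
    from urllib.parse import urlparse

    MIN_INTERVAL = 60.0  # at most one visit per host per minute
    last_visit = {}      # host -> time of the most recent request

    def wait_for_host(url):
        # Sleep until at least MIN_INTERVAL has passed for this host.
        host = urlparse(url).netloc
        last = last_visit.get(host)
        if last is not None:
            wait = last + MIN_INTERVAL - time.monotonic()
            if wait > 0:
                time.sleep(wait)
        last_visit[host] = time.monotonic()
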
JoobleBot (added 23/12/2012 14:54:59)
Jooble indexes jobs from the web.
JumpStation (added 8/03/2004 0:47:20)
KDD-Explorer (added 8/03/2004 0:52:56)
KDD-Explorer is used for indexing valuable documents which will be retrieved via an experimental cross-language search engine, CLINKS.
This robot was designed at the Knowledge-based Information Processing Laboratory, KDD R&D Laboratories, 1996-1997.
Keyword Density (added 8/09/2006 0:57:44)
?
Kilroy (added 8/03/2004 0:55:58)
Used to collect data for several projects. Runs constantly and visits sites no faster than once every 90 seconds.
Knowledge.com (added 7/01/2005 17:15:41)
KO_Yappo_Robot (added 8/03/2004 1:01:34)
The KO_Yappo_Robot is used to build the database for the Yappo search service by k,osawa (part of AOL). The robot runs on random days and visits sites in a random order.
KomodiaBot (added 26/11/2012 14:15:03)
Krugle (added 29/04/2006 23:52:36)
The Krugle spider crawls web pages, documents, and archives looking for technical information that would be of value to programmers.

We use the results of the crawl to provide a vertical search service for programmers. Our product page explains how Krugle helps programmers find code and answers to technical questions.

Our spider is based on Nutch (version 0.8 as of April 2006), and uses various open source components to pull down publicly available information via HTTP(S) and FTP.

To be polite, our spider tries to access a given domain via only one thread at any time. In addition, we impose a 5-second minimum delay between requests.
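
One way to implement both rules is a per-domain lock plus a fixed sleep after each request; the Python below is a minimal sketch under that assumption, not Krugle's actual implementation (their spider is built on Nutch):

    import threading
    import time
    from urllib.parse import urlparse

    DELAY = 5.0          # minimum delay between requests to a domain
    domain_locks = {}    # domain -> lock serializing access to that domain
    locks_guard = threading.Lock()

    def polite_fetch(url, fetch):
        # Only one thread may talk to a given domain at any time,
        # and every request is followed by a 5-second pause.
        domain = urlparse(url).netloc
        with locks_guard:
            lock = domain_locks.setdefault(domain, threading.Lock())
        with lock:
            result = fetch(url)  # fetch is any HTTP download callable
            time.sleep(DELAY)
        return result
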
LabelGrabber (added 8/03/2004 1:05:03)
The PICS label grabber application searches the WWW for PICS Labels and submits them to the label bureau of your choice. It can be used to populate a label bureau for testing, or for any other label bureau purpose.
Larbin (added 8/02/2004 0:11:21)
Larbin is a web crawler (also called (web) robot, spider, scooter...). It is intended to fetch a large number of web pages to fill the database of a search engine.
LexiBot (added 8/09/2006 11:48:18)
linkdex.com (added 1/12/2010 13:53:05)
Linkdex is an enterprise-class platform that combines team management software with search engine optimization tools. Sometimes we crawl websites to understand backlink profiles or gather information for our users so they can improve their websites and SEO strategies.
LinkScan (added 7/02/2004 14:12:22)
LinkScan is an industrial-strength link checking and website management tool. It checks links, validates HTML, and creates site maps.
LivelapBot (added 29/08/2014 9:01:20)
Livelap is a content discovery app that indexes web content. You may have seen the Livelapbot/0.1 or LivelapBot/0.2 crawler in your server logs. LivelapBot can visit a page when it is shared on social media, and as part of its RSS/page crawling schedule.

What does LivelapBot collect?

Livelap indexes web content and makes metadata and a link to your content available on livelap.com and in the Livelap app. For indexing we only use official HTML and media meta tags in your page. We don't scrape the contents of your articles. The following fields are used for indexing (see the sketch after this list):
Title
Description
Author
Publication date
Type of content (article, photo, video, etc)
Images (og, twitter and other standard tags)
Videos (og, twitter and other standard tags)
RSS links
Whether showing the page in an iframe is allowed
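
For illustration, a minimal standard-library Python sketch of meta-tag-only indexing of the kind described; this is an assumption about the approach, not Livelap's code:

    from html.parser import HTMLParser

    class MetaTagCollector(HTMLParser):
        # Collects <meta> name/property -> content pairs and the <title>.
        def __init__(self):
            super().__init__()
            self.tags = {}
            self.title = ""
            self._in_title = False

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            if tag == "meta" and "content" in a:
                key = a.get("property") or a.get("name")
                if key:
                    self.tags[key] = a["content"]
            elif tag == "title":
                self._in_title = True

        def handle_endtag(self, tag):
            if tag == "title":
                self._in_title = False

        def handle_data(self, data):
            if self._in_title:
                self.title += data

Feeding page HTML to the collector and reading, for example, tags.get("og:image"), tags.get("twitter:card"), or tags.get("article:published_time") covers the fields above without scraping article bodies.
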
Lockon (added 8/03/2004 1:15:24)
This robot gathers only HTML documents.
logo.gif Crawler (added 24/07/2005 22:51:36)
A meta-indexing engine for corporate logo graphics. The robot runs at irregular intervals and will only pull a start page and its associated /.*logo\.gif/i (if any). It will be terminated once a statistically significant number of samples has been collected.
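
The quoted pattern, translated into Python's re module for illustration (the crawler's actual implementation language isn't stated):

    import re

    # /.*logo\.gif/i: any path containing "logo.gif", case-insensitively.
    LOGO_RE = re.compile(r".*logo\.gif", re.IGNORECASE)

    assert LOGO_RE.match("/images/Logo.GIF")
    assert not LOGO_RE.match("/images/banner.png")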

logo.gif is part of the design diploma of Markus Weisbeck, and tries to analyze the abundance of the logo metaphor in WWW corporate design. The crawler and image database were written by Sevo Stille and Peter Frank of the Institut für Neue Medien, respectively.
lufsbot (added 13/07/2013 9:44:15)
Fake search engine (no results in search).
Mac WWWWorm (added 24/07/2005 22:53:56)
A French keyword-searching robot for the Mac. The author has decided not to release this robot to the public.
Magpie (added 24/07/2005 22:54:50)
Used to obtain information from a specified list of web pages for local indexing. Runs every two hours, and visits only a small number of sites.
meanpathbot (added 28/06/2013 13:56:19)
meanpath is a new search engine that allows software developers to access detailed snapshots of millions of websites without having to run their own crawlers. Our clients use the information we gather from your site to help solve problems in these areas:
- Semantic analysis
- Linguistics
- Identity theft protection
- Malware and virus analysis
MediaFox (added 24/07/2005 23:01:53)
The robot is used to index meta information of a specified set of documents and update a database accordingly.
MerzScope (added 24/07/2005 23:03:08)
The robot is part of a web-mapping package called MerzScope, used mainly by consultants and webmasters to create and publish maps on and of the World Wide Web.
Mnogosearch (added 7/02/2004 23:13:37)
mnoGoSearch (formerly known as UdmSearch) is full-featured web search engine software for intranet and internet servers. mnoGoSearch for UNIX is free software covered by the GNU General Public License; mnoGoSearch for Windows is a commercial version.
Motor (added 25/07/2005 0:48:09)
The Motor robot is used to build the database for the www.webindex.de search service operated by CyberCon. The robot is under development; it runs at random intervals and visits sites in a priority-driven order (.de/.ch/.at first, root and robots.txt first).
MS Sharepoint Portal Server (added 7/02/2004 23:12:33)
MSNBot Media (added 13/06/2006 0:06:56)
Muncher (added 25/07/2005 0:50:11)
Used to build the index for www.goodlookingcooking.co.uk. Seeks out cooking and recipe pages.
Muscat Ferret (added 25/07/2005 0:54:52)
Used to build the database for the EuroFerret.
Mwd.Search (added 25/07/2005 0:55:50)
Robot for indexing Finnish (top-level domain .fi) web pages for a search engine called Fifi. Visits sites in random order.
NDSpider (added 25/07/2005 0:57:50)
It is designed to index the web.
NEC-MeshExplorer (added 24/07/2005 23:04:16)
The NEC-MeshExplorer robot is used to build the database for the NETPLAZA search service operated by NEC Corporation. The robot searches URLs around sites in Japan (JP domain). The robot runs every day and visits sites in a random order.

A prototype version of this robot was developed at C&C Research Laboratories, NEC Corporation. The current robot (Version 1.0) is based on the prototype and has more functions.
NetCarta WebMap Engine (added 25/07/2005 0:59:30)
The NetCarta WebMap Engine is a general-purpose, commercial spider. Packaged with a full GUI in the CyberPilot Pro product, it acts as a personal spider that works with a browser to facilitate context-based navigation. The WebMapper product uses the robot to manage a site (site copy, site diff, and extensive link management facilities). All versions can create publishable NetCarta WebMaps, which capture the crawled information. If the robot sees a published map, it will return the published map rather than continuing its crawl. Since this is a personal spider, it will be launched from multiple domains. This robot tends to focus on a particular site. No instance of the robot should have more than one outstanding request to any given site at a time. The User-agent field contains a coded ID identifying the instance of the spider; specific users can be blocked via robots.txt using this ID.
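
Since a specific spider instance can be blocked via robots.txt using its coded ID, a site owner could test such a rule with Python's standard urllib.robotparser; "NetCarta-1234" is a hypothetical example ID, not a documented one:

    from urllib.robotparser import RobotFileParser

    robots_txt = """\
    User-agent: NetCarta-1234
    Disallow: /
    """

    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    print(rp.can_fetch("NetCarta-1234", "http://example.com/"))  # False
    print(rp.can_fetch("SomeOtherBot", "http://example.com/"))   # True
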
NetResearchServer (added 8/02/2004 16:31:37)
NRS crawls pages all over the world in order to build full-text search indexes and/or to compile lists of search engine forms.
NetScoop (added 25/07/2005 1:01:18)
The NetScoop robot is used to build the database for the NetScoop search engine.

The robot has been used in a research project at the Faculty of Engineering, Tokushima University, Japan, since Dec. 1996.
newscan-online (added 25/07/2005 1:02:31)
The newscan-online robot is used to build a database for the newscan-online news search service operated by smart information services. The robot runs daily and visits predefined sites in a random order.

This robot has its roots in prerelease news-filtering software for Lotus Notes from 1995.
NextopiaBOT (added 7/02/2004 23:16:00)
NHSE Web Forager (added 25/07/2005 1:03:15)
Used to generate a Resource Discovery database.
Nomad (added 25/07/2005 1:04:17)
Developed in 1995 at Colorado State University.
NutchCVS (added 7/02/2004 23:18:12)
When we crawl to populate our index, we advertise the "User-agent" string "NutchOrg". If you see the agent "Nutch" or "NutchCVS", that's probably a developer testing a new version of our robot, or someone running their own instance.
Occam (added 25/07/2005 1:08:07)
The robot takes high-level queries, breaks them down into multiple web requests, and answers them by combining disparate data gathered in one minute from numerous web sites, or from the robot's cache.

The robot is a descendant of Rodney, an earlier project at the University of Washington.
omgilibot (added 31/03/2008 17:10:34)
Crawls forums.
OpenIntelligenceData (added 3/09/2005 17:15:02)
Open Intelligence Data™ is a project by Tortuga Group LLC to provide free tools for collecting information for millions of Internet domains.
Oracle Ultra Search (added 7/02/2004 23:33:14)
Ultra Search can be used to search across Collaboration Suite components, corporate web servers, databases, mail servers, file servers, and Oracle10g Portal instances.
Orb Search (added 25/07/2005 1:12:12)
Orbsearch builds the database for Orb Search Engine. It runs when requested.
Origin (added 7/02/2004 23:40:28)
Empty user agent
