Bot Patterns Reference

This page provides a comprehensive list of AI bot user agent patterns detected by DarkForest Protocol.

Overview

DarkForest Protocol detects AI bots by matching their user agent strings against known patterns. These patterns are categorized into three groups:

  1. AI Search Bots: Bots used by AI-powered search engines
  2. AI Crawl Bots: Bots used for AI training data collection
  3. Open Data Crawlers: Bots used for open data collection

The patterns are regularly updated to include new AI systems as they emerge.

AI Search Bots

These bots are used by AI-powered search engines to index content for search results and AI-assisted search features.

| Bot Name | User Agent Pattern | Company | Purpose |
|---|---|---|---|
| Applebot | Applebot | Apple | Web crawler for Apple services like Siri and Spotlight Suggestions |
| Applebot-Extended | Applebot-Extended | Apple | Extended version of Applebot for additional Apple services |
| DuckAssistBot | DuckAssistBot | DuckDuckGo | Crawler for DuckDuckGo's AI-assisted search features |
| Google-Extended | Google-Extended | Google | Extended Google crawler for AI features |
| GoogleOther | GoogleOther | Google | Specialized Google crawler for various services |
| GoogleOther-Image | GoogleOther-Image | Google | Google crawler for image content |
| GoogleOther-Video | GoogleOther-Video | Google | Google crawler for video content |
| OAI-SearchBot | OAI-SearchBot | OpenAI | Crawler for OpenAI's search features |
| PerplexityBot | PerplexityBot | Perplexity | Crawler for Perplexity AI search engine |
| PetalBot | PetalBot | Petal (Huawei) | Web crawler for Huawei's search services |
| YouBot | YouBot | You.com | Crawler for You.com AI search engine |
| ChatGPT-User | ChatGPT-User | OpenAI | Fetches pages on behalf of ChatGPT users during web browsing |
| Cohere AI | cohere-ai | Cohere | Crawler for Cohere's AI services |

AI Crawl Bots

These bots are used by AI companies to collect training data for large language models and other AI systems.

| Bot Name | User Agent Pattern | Company | Purpose |
|---|---|---|---|
| AI2 Bot | AI2Bot | Allen Institute for AI | Research data collection |
| AI2 Bot Dolma | Ai2Bot-Dolma | Allen Institute for AI | Dolma dataset collection |
| Amazonbot | Amazonbot | Amazon | Data collection for Amazon AI services |
| Anthropic AI | anthropic-ai | Anthropic | Data collection for Claude and other models |
| Claude Web | Claude-Web | Anthropic | Web browsing feature for Claude |
| ClaudeBot | ClaudeBot | Anthropic | General crawler for Anthropic services |
| Cohere AI | cohere-ai | Cohere | Data collection for Cohere models |
| Cohere Training Crawler | cohere-training-data-crawler | Cohere | Specific crawler for model training data |
| Crawlspace | Crawlspace | Various | Generic AI crawler |
| Diffbot | Diffbot | Diffbot | Structured data extraction for AI |
| FacebookBot | FacebookBot | Meta | Data collection for Meta AI services |
| FriendlyCrawler | FriendlyCrawler | Various | Generic AI crawler |
| GPTBot | GPTBot | OpenAI | Data collection for GPT models |
| ICCCrawler | ICCCrawler | Various | Research data collection |
| ImagesiftBot | ImagesiftBot | Various | Image data collection |
| img2dataset | img2dataset | Various | Image dataset collection tool |
| Kangaroo Bot | Kangaroo Bot | Various | Research data collection |
| Meta External Agent | Meta-ExternalAgent | Meta | External data collection for Meta AI |
| Meta External Fetcher | Meta-ExternalFetcher | Meta | External data fetching for Meta AI |
| Omgili | omgili | Omgili | Web data collection service |
| Omgilibot | omgilibot | Omgili | Bot for Omgili data collection |
| PanguBot | PanguBot | Huawei | Data collection for Pangu models |
| Scrapy | Scrapy | Various | Popular web scraping framework |
| Sidetrade Indexer | Sidetrade indexer bot | Sidetrade | Business data collection |
| Timpibot | Timpibot | Various | Research data collection |
| Velen Public Crawler | VelenPublicWebCrawler | Velen | Public web data collection |
| Webzio Extended | Webzio-Extended | Webzio | Extended data collection service |
| Bytespider | Bytespider | ByteDance | Data collection for ByteDance AI |
| iaskspider | iaskspider/2.0 | iask | Data collection for iask services |
| ISS Cyber Risk Crawler | ISSCyberRiskCrawler | ISS | Security and risk assessment crawler |

Open Data Crawlers

These bots are used for collecting data for open datasets and research purposes.

| Bot Name | User Agent Pattern | Company | Purpose |
|---|---|---|---|
| CCBot | CCBot | Common Crawl | Web crawler for the Common Crawl dataset |

How Bot Detection Works

DarkForest Protocol uses substring matching to detect AI bots. If a user agent string contains any of the patterns listed above, it will be identified as an AI bot.

For example, a user agent string like Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot; +https://openai.com/gptbot) would be detected as an AI bot because it contains the GPTBot pattern.
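
The substring check described above can be sketched as follows. This is a minimal illustration, not DarkForest Protocol's actual implementation: the `findMatchingPattern` function and the short `PATTERNS` array are hypothetical, and case-sensitive matching is an assumption, since this page does not say how letter case is handled.

```javascript
// Hypothetical subset of the patterns listed on this page.
const PATTERNS = ['GPTBot', 'ClaudeBot', 'CCBot', 'PerplexityBot'];

// Returns the first pattern contained in the user agent string, or null.
// Assumption: matching is case-sensitive; the real library may differ.
function findMatchingPattern(userAgent, patterns = PATTERNS) {
  if (typeof userAgent !== 'string') return null;
  return patterns.find((pattern) => userAgent.includes(pattern)) ?? null;
}

const ua =
  'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot; +https://openai.com/gptbot)';

console.log(findMatchingPattern(ua)); // GPTBot
console.log(findMatchingPattern('Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0')); // null
```

One consequence of substring matching is that a shorter pattern also matches any longer user agent that contains it. For example, the pattern Applebot matches a user agent containing Applebot-Extended, and omgili matches omgilibot.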

Customizing Bot Detection

You can customize which categories of bots are blocked by configuring the presetCategories option in your DarkForest Protocol implementation:

// Block only AI search bots
const darkforestMiddleware = createExpressBlocker({
  apiKey: 'your-api-key',
  presetCategories: ['ai-search-bots']
});

// Block both AI search bots and AI crawl bots
const darkforestMiddleware = createExpressBlocker({
  apiKey: 'your-api-key',
  presetCategories: ['ai-search-bots', 'ai-crawl-bots']
});

// Block all categories
const darkforestMiddleware = createExpressBlocker({
  apiKey: 'your-api-key',
  presetCategories: ['ai-search-bots', 'ai-crawl-bots', 'open-data-crawlers']
});

You can also add custom patterns to block additional user agents:

const darkforestMiddleware = createExpressBlocker({
  apiKey: 'your-api-key',
  presetCategories: ['ai-search-bots', 'ai-crawl-bots'],
  customBlockedUserAgents: ['custom-bot-pattern', 'another-pattern']
});
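
To picture how presetCategories and customBlockedUserAgents might combine into a single list of patterns, consider the sketch below. The category names are the ones used on this page, but the abbreviated `PRESETS` map and the `buildBlockList` function are hypothetical and do not reflect the library's internals. Throwing on an unknown category is also an assumption.

```javascript
// Abbreviated, hypothetical preset table keyed by the category names used above.
const PRESETS = new Map([
  ['ai-search-bots', ['OAI-SearchBot', 'PerplexityBot', 'YouBot']],
  ['ai-crawl-bots', ['GPTBot', 'ClaudeBot', 'Bytespider']],
  ['open-data-crawlers', ['CCBot']],
]);

// Merges the patterns of the selected categories with any custom patterns,
// dropping duplicates. Unknown category names raise an error (an assumption).
function buildBlockList({ presetCategories = [], customBlockedUserAgents = [] }) {
  const patterns = new Set(customBlockedUserAgents);
  for (const category of presetCategories) {
    const preset = PRESETS.get(category);
    if (!preset) throw new Error(`Unknown category: ${category}`);
    preset.forEach((pattern) => patterns.add(pattern));
  }
  return [...patterns];
}

const blockList = buildBlockList({
  presetCategories: ['open-data-crawlers'],
  customBlockedUserAgents: ['custom-bot-pattern'],
});
console.log(blockList); // [ 'custom-bot-pattern', 'CCBot' ]
```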

Exempting Specific Paths

You may want to allow AI bots to access certain parts of your website, such as your robots.txt file or public API. You can configure exempt paths in your DarkForest Protocol implementation:

const darkforestMiddleware = createExpressBlocker({
  apiKey: 'your-api-key',
  presetCategories: ['ai-search-bots', 'ai-crawl-bots'],
  exemptPaths: ['/robots.txt', '/sitemap.xml', '/api/public']
});
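
The following sketch shows one way path exemption could interact with bot detection in Express-style middleware. It is illustrative only: `isExemptPath` and `createSketchBlocker` are hypothetical names, the rule that an exempt entry covers the path itself plus everything beneath it is an assumption (this page does not say whether matching is exact or by prefix), and responding with 403 is likewise assumed.

```javascript
// Assumption: an exempt entry matches the path itself and anything beneath it.
function isExemptPath(requestPath, exemptPaths) {
  return exemptPaths.some(
    (exempt) => requestPath === exempt || requestPath.startsWith(exempt + '/')
  );
}

// Express-style middleware (req, res, next). Hypothetical, not the library's code.
function createSketchBlocker({ blockedPatterns, exemptPaths = [] }) {
  return function sketchBlocker(req, res, next) {
    const userAgent = req.headers['user-agent'] || '';
    const isBot = blockedPatterns.some((pattern) => userAgent.includes(pattern));
    if (isBot && !isExemptPath(req.path, exemptPaths)) {
      res.statusCode = 403; // assumed response; the real library may respond differently
      res.end('Forbidden');
      return;
    }
    next();
  };
}

const blocker = createSketchBlocker({
  blockedPatterns: ['GPTBot'],
  exemptPaths: ['/robots.txt', '/api/public'],
});

// Minimal stand-ins for the Express request and response objects.
const botRequest = { headers: { 'user-agent': 'GPTBot/1.0' }, path: '/robots.txt' };
const fakeResponse = { statusCode: 200, end() {} };
blocker(botRequest, fakeResponse, () => console.log('allowed')); // allowed
```

Under this prefix rule, /api/public/v1 is exempt while /api/public-private is not, because the match requires a path separator after the exempt entry.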

Staying Updated

The list of AI bot patterns is regularly updated as new AI systems emerge. When you update the DarkForest Protocol package, you'll automatically get the latest patterns.

For npm-based installations:

npm update darkforest-blocker

For other integration methods, refer to the specific integration guide.

References

The bot patterns used by DarkForest Protocol are based on research from various sources.

If you discover a new AI bot pattern that should be added to our list, please contact us or open an issue on our GitHub repository.