OpenAI recently published documentation on how to adjust your website to prevent its so-called GPTBot from crawling and reusing your content. Neil Clarke also posted a list of bots to block on his blog.
Let's assume this actually works and has an effect; here is how you can adjust the output of the robots.txt file in Neos CMS with minimal effort:
```fusion
prototype(Neos.Seo:RobotsTxt) {
    data {
        disallowBots = Neos.Fusion:Join {
            GPTBot = 'User-agent: GPTBot'
            GPTBotPath = 'Disallow: /'
            OAISearchBot = 'User-agent: OAI-SearchBot'
            OAISearchBotPath = 'Disallow: /'
            ChatGPTUser = 'User-agent: ChatGPT-User'
            ChatGPTUserPath = 'Disallow: /'
            ClaudeBot = 'User-agent: ClaudeBot'
            ClaudeBotPath = 'Disallow: /'
            AnthropicAI = 'User-agent: anthropic-ai'
            AnthropicAIPath = 'Disallow: /'
            ClaudeWeb = 'User-agent: Claude-Web'
            ClaudeWebPath = 'Disallow: /'
            GoogleExtended = 'User-agent: Google-Extended'
            GoogleExtendedPath = 'Disallow: /'
            CCBot = 'User-agent: CCBot'
            CCBotPath = 'Disallow: /'
            PerplexityBot = 'User-agent: PerplexityBot'
            PerplexityBotPath = 'Disallow: /'
            FacebookBot = 'User-agent: FacebookBot'
            FacebookBotPath = 'Disallow: /'
            MetaExternalAgent = 'User-agent: Meta-ExternalAgent'
            MetaExternalAgentPath = 'Disallow: /'
            MetaExternalFetcher = 'User-agent: Meta-ExternalFetcher'
            MetaExternalFetcherPath = 'Disallow: /'
            OmgiliBot = 'User-agent: OmgiliBot'
            OmgiliBotPath = 'Disallow: /'
            CohereAI = 'User-agent: cohere-ai'
            CohereAIPath = 'Disallow: /'

            @glue = "\n"
            @position = 'after disallowNeos'
        }
    }
}
```
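With this in place, the `Neos.Fusion:Join` glues each key's value together with a newline, so the generated robots.txt should contain entries along these lines (excerpt):

```
User-agent: GPTBot
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
```

Each `User-agent` line starts a new group, so per the Robots Exclusion Protocol the `Disallow: /` that follows it applies to that bot.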
Add this Fusion code, for example, to a new file `Override.RobotsTxt.fusion` somewhere in your site package's Fusion folder.
Test the change by opening "your.domain/robots.txt" in a browser and checking that the output looks as expected.
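If you prefer checking from the command line, here is a small sketch. The domain and bot list are placeholders; for the example to be self-contained, a sample string stands in for the real HTTP response:

```shell
# In practice you would fetch the live file, e.g.:
#   robots=$(curl -s https://your.domain/robots.txt)
# Sample of what the Fusion code above should generate:
robots='User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /'

# Verify that each expected bot has its own User-agent rule.
for bot in GPTBot ClaudeBot; do
  printf '%s\n' "$robots" | grep -q "^User-agent: $bot$" && echo "rule present: $bot"
done
```

This prints one `rule present:` line per bot that is correctly listed, so a missing bot is easy to spot.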
This only works if the Neos.Seo package is installed, and there must not be an actual robots.txt file in the Web folder of your Neos installation, as a static file would be served directly and take precedence.
In the same way you can of course make other adjustments and provide additional directives to other robots that frequent your website.