
How to block GPTBot from crawling your Neos CMS site

OpenAI recently published documentation on how to adjust your website to prevent its crawler, the so-called GPTBot, from parsing and reusing your content.

Assuming this is true and actually has an effect, here is how you can adjust the output of the robots.txt file in Neos CMS with minimal effort:

prototype(Neos.Seo:RobotsTxt) {
    data {
        # Disallow GPTBot (https://platform.openai.com/docs/gptbot)
        disallowGPTBot = 'User-agent: GPTBot'
        disallowGPTBot.@position = 'after disallowNeos'
        disallowGPTBotPath = 'Disallow: /'
        disallowGPTBotPath.@position = 'after disallowGPTBot'
    }
}

Add this Fusion code, for example, to a new file `Override.RobotsTxt.fusion` somewhere in your site package's Fusion folder.

Test the change by opening "your.domain/robots.txt" in a browser. The output should now contain the lines `User-agent: GPTBot` and `Disallow: /`.

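Of course, the Neos.Seo package needs to be installed for this to work, and you should not have an actual robots.txt file in the Web folder of your Neos installation, because the web server would typically serve that static file instead of the one rendered by Fusion.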

You can of course use the same approach to make other adjustments and provide more information to other robots that visit your website.
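
As a minimal sketch of such an adjustment, the following adds a second rule set for Common Crawl's crawler, which identifies itself as CCBot. The keys `disallowCCBot` and `disallowCCBotPath` are hypothetical names chosen for illustration, and the positions assume the GPTBot snippet from above is already in place:

```fusion
prototype(Neos.Seo:RobotsTxt) {
    data {
        # Illustrative second rule set: also disallow Common Crawl's CCBot.
        # The key names are made up; any unique keys would do.
        disallowCCBot = 'User-agent: CCBot'
        # Assumes the GPTBot keys from the snippet above exist
        disallowCCBot.@position = 'after disallowGPTBotPath'
        disallowCCBotPath = 'Disallow: /'
        disallowCCBotPath.@position = 'after disallowCCBot'
    }
}
```

If you only want to block CCBot and not GPTBot, position the first key relative to `disallowNeos` instead, as in the GPTBot snippet.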

Disclaimer: I allow the GPTBot on my website, as I currently prefer that my examples and tutorials help other people in whatever way. Depending on what those ML companies do, I might change my mind at some point. On the other hand, I also don't trust them to actually respect whatever I configure in the robots.txt.