Reddit's ban on third-party clients put user messages behind a paywall. I think we, the Lemmitors, should stop AI training on our content, or at least monetise it (for our instances).
You can’t stop them. Publicly available data can and will be a training source for LLMs.
Instances could add this snippet to their robots.txt (source: eff.org, businessinsider.com and nytimes.com/robots.txt):
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Meta-ExternalAgent
User-agent: meta-externalagent
Disallow: /
Note: this only tells the crawlers of OpenAI, Google and Meta not to crawl the site to train an LLM. The nytimes.com robots.txt has a much longer list of other crawlers.
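If an instance admin wants to sanity-check those rules before deploying them, something like the following might help. It is only a rough sketch using Python's standard `urllib.robotparser`; the URL `https://lemmy.example/post/1` and the agent name `SomeOtherBot` are made up for illustration. Keep in mind it only shows what a crawler that honours robots.txt would do; the file is a request, not an access control.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt snippet from the comment above, one directive per line.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Meta-ExternalAgent
User-agent: meta-externalagent
Disallow: /
"""


def blocked_agents(robots_txt, agents, url="https://lemmy.example/post/1"):
    """Return {agent: True if the rules disallow `url` for that agent}.

    The default URL is a hypothetical instance address used only for the check.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


# "SomeOtherBot" is a hypothetical crawler that the snippet does not mention.
result = blocked_agents(
    ROBOTS_TXT,
    ["GPTBot", "Google-Extended", "meta-externalagent", "SomeOtherBot"],
)
for agent, blocked in result.items():
    print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

Any crawler that is not listed stays allowed, which is why the longer list matters if the goal is to cover more than these three companies.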
With the way federation works, not much. People on all sorts of federation-capable sites can see content posted from different instances, but considering its conveniences, I think it's worth it.
Broadly, this is about preventing plagiarism. We don't want someone to scrape all our knowledge, strip out the human connection and the references back to experts and people, and serve up the information itself, uncredited.
But if a human can read something, so can a bot. I think ultimately we need legislation.