this post was submitted on 20 Oct 2023
1231 points (98.1% liked)

[–] [email protected] 6 points 1 year ago* (last edited 1 year ago) (1 children)

Speaking of this, which parts of the fediverse have added rules to their robots.txt to opt out of generative AI training?

https://blog.google/technology/ai/an-update-on-web-publisher-controls/

https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers

https://techcrunch.com/2023/09/28/medium-hints-at-a-nascent-media-coalition-to-block-ai-crawlers/

It looks like there are a handful of these lines you'd have to add to robots.txt, one block per crawler.

Is there anywhere that keeps a comprehensive list of these?
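
For anyone wondering what those lines look like, here's a rough sketch in Python that builds the blocks and checks them with the standard library's `urllib.robotparser`. The three tokens are ones the operators have documented publicly (they aren't named in this thread, and the list is nowhere near comprehensive). The helper names `build_robots_txt` and `is_blocked` and the example.com URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative, NOT comprehensive: tokens documented by their operators.
# These are assumptions added for the example, not a list from this thread.
AI_CRAWLER_TOKENS = [
    "Google-Extended",  # Google's robots.txt token for generative AI use
    "GPTBot",           # OpenAI's crawler
    "CCBot",            # Common Crawl's crawler
]


def build_robots_txt(tokens):
    """Return robots.txt text that disallows the whole site for each token."""
    blocks = [f"User-agent: {token}\nDisallow: /\n" for token in tokens]
    # A blank line between blocks separates the per-crawler groups.
    return "\n".join(blocks)


def is_blocked(robots_txt, user_agent, url="https://example.com/post/1"):
    """Check whether the given robots.txt text forbids user_agent from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(AI_CRAWLER_TOKENS)
print(robots_txt)
```

Keep in mind robots.txt is only a request. It does nothing against a crawler that chooses to ignore it, and each new crawler needs its own block, which is why a maintained list would help.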

[–] [email protected] 1 points 1 year ago (1 children)

I've been trying to find a list as well, to no avail. The ones I do know of are in my own robots.txt, at volcanolair.co/robots.txt

[–] [email protected] 0 points 1 year ago

Someone should make a GitHub repo so people can find them all in one place, with sources, and keep the list updated as new ones appear.