this post was submitted on 29 Jun 2024
[–] [email protected] 14 points 4 months ago (1 children)

Quora should be respecting robots.txt, but also why are the NYT etc. serving the full article to the Quora bot anyway?
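
For anyone curious what "respecting robots.txt" looks like on the crawler side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` name, and the `is_allowed` helper are all made up for illustration; this is not any real site's file or Quora's actual crawler logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some-article"
    print(is_allowed(ROBOTS_TXT, "ExampleBot", url))
    print(is_allowed(ROBOTS_TXT, "SomeBrowser", url))
```

Note that this check is entirely voluntary: it only works if the crawler runs it and honors the answer.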

[–] [email protected] 22 points 4 months ago

Usually NYT sets a cookie to track how many free articles you've read, and once you exceed that limit you get the paywall. The bots probably don't store or send cookies, so NYT never counts them toward the limit and doesn't block them. Also, I'd imagine the bots come from many different IPs, so even server-side blocking by IP wouldn't catch everything, and eventually the bot would get to the article. User agents can also be spoofed.
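
To make the cookie point concrete, here is a toy sketch of a cookie-based article meter. It is only a guess at the general mechanism, not NYT's actual implementation: the `articles_read` cookie name, the limit of 3, and the `serve_article` and `browse` helpers are all hypothetical.

```python
FREE_ARTICLE_LIMIT = 3  # assumed value, purely for illustration


def serve_article(cookies: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Toy server: return (response body, cookies to set) for one request."""
    try:
        seen = int(cookies.get("articles_read", "0"))
    except ValueError:
        seen = 0  # treat a malformed cookie like a missing one
    if seen >= FREE_ARTICLE_LIMIT:
        return "PAYWALL", {"articles_read": str(seen)}
    return "FULL ARTICLE", {"articles_read": str(seen + 1)}


def browse(requests: int, keep_cookies: bool) -> list[str]:
    """Toy client: make several requests, optionally storing cookies."""
    jar: dict[str, str] = {}
    results = []
    for _ in range(requests):
        body, set_cookies = serve_article(jar)
        results.append(body)
        if keep_cookies:
            jar.update(set_cookies)  # a browser sends these back next time
        # a cookieless bot discards them, so the server always sees zero
    return results


if __name__ == "__main__":
    print(browse(5, keep_cookies=True))
    print(browse(5, keep_cookies=False))
```

With cookies kept, the fourth and fifth requests hit the paywall; with cookies dropped, every request looks like a first visit. The counter lives entirely on the client, which is why a meter like this only stops clients that cooperate.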