1077
AI companies are violating a basic social contract of the web and and ignoring robots.txt
(www.theverge.com)
This is a most excellent place for technology news and articles.
Put something in robots.txt that isn't supposed to be hit and is hard to hit by non-robots. Log and ban all IPs that hit it.
Imperfect, but can't think of a better solution.
Yeah, this is a pretty classic honeypot method. Basically make something available but inaccessible to the normal user. Then you know anyone who accesses it is not a normal user.
I’ve even seen this done with Steam achievements before; There was a hidden game achievement which was only available via hacking. So anyone who used hacks immediately outed themselves with a rare achievement that was visible on their profile.
There are tools that just flag you as having gotten an achievement on Steam, you don't even have to have the game open to do it. I'd hardly call that 'hacking'.