Technology

AI crawlers cause Wikimedia (the umbrella organization of Wikipedia and a dozen or so other crowdsourced knowledge projects) Commons bandwidth demands to surge 50%. (diff.wikimedia.org)

submitted 1 day ago* (last edited 1 day ago) by [email protected] to c/[email protected]

[–] [email protected] 9 points 1 day ago (2 children)

So, uh. What about Lemmy?

They can also crawl this publicly accessible social media source for their data sets.

I'm on board with abandoning mainstream social media, but my point is that your suggestion would not solve the problem, just relocate it. A better solution to the AI conglomerates stealing everyone's data from the open Internet is legislation and regulation, i.e. tackling the whole 'stealing data' component, along with stronger privacy regulations for everyone to make it harder for them to do the same in the future. It's nice seeing the EU taking some positive steps, but we will not see the US take any steps in that direction anytime soon, due to corporate capture of its politicians and the AI companies all being among the 10 wealthiest companies in the US.

[–] [email protected] 9 points 1 day ago* (last edited 1 day ago)

It's nice seeing the EU taking some positive steps

Yet they helped introduce supercookies and are trying to end encryption of communications.

[–] [email protected] 1 points 1 day ago (1 children)

They can also crawl this publicly accessible social media source for their data sets.

Crawling would be silly. They can simply set up a Lemmy node and subscribe to every other server. An ActivityPub crawler would be much more efficient, as it wouldn't accidentally re-crawl things that haven't changed, but could instead read the ActivityPub updates.
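
To make the push-versus-crawl point concrete, here is a rough sketch of what "reading the updates" could look like. It is not Lemmy's actual federation code: `apply_activity`, the `mirror` dict, and the `example.invalid` URLs are all made up for illustration. The only things taken from ActivityPub itself are the `Create`, `Update`, and `Delete` activity types and the `type`, `object`, and `id` fields.

```python
from typing import Any


def apply_activity(store: dict[str, dict[str, Any]], activity: dict[str, Any]) -> bool:
    """Apply one pushed activity to a local mirror (hypothetical helper).

    Returns True if the mirror changed. Only the object named in the
    activity is touched; nothing else is re-fetched.
    """
    kind = activity.get("type")
    obj = activity.get("object")

    if kind in ("Create", "Update"):
        # Assumption: the full object is embedded in the activity.
        if isinstance(obj, dict) and isinstance(obj.get("id"), str):
            store[obj["id"]] = obj
            return True
        return False

    if kind == "Delete":
        # The object may be given as a bare id string or as an embedded object.
        obj_id = obj.get("id") if isinstance(obj, dict) else obj
        if not isinstance(obj_id, str):
            return False
        return store.pop(obj_id, None) is not None

    # Likes, Announces, and so on do not change the mirrored content.
    return False


# Hypothetical inbox contents, in the order they were delivered.
post_id = "https://example.invalid/post/1"
inbox = [
    {"type": "Create", "object": {"id": post_id, "type": "Note", "content": "hello"}},
    {"type": "Update", "object": {"id": post_id, "type": "Note", "content": "hello, edited"}},
    {"type": "Like", "object": post_id},
]

mirror: dict[str, dict[str, Any]] = {}
changed = [apply_activity(mirror, activity) for activity in inbox]
```

Each delivered activity names exactly one object, so the work done is proportional to what actually changed, whereas a crawler has to revisit pages just to find out whether they changed.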

[–] [email protected] 1 points 19 hours ago

Sure, but we're in the comments section of an article about Wikipedia being crawled, which is silly because they could just download a snapshot of Wikipedia.