Technology

Meta’s new AI image generator was trained on 1.1 billion Instagram and Facebook photos (arstechnica.com)

submitted 10 months ago by [email protected] to c/[email protected]

[–] [email protected] 50 points 10 months ago* (last edited 10 months ago) (13 children)

So I assume they added any necessary stuff to the TOS to allow this.

My question is if there's any legal mechanism to prevent this on other platforms? Pixelfed for example.

Companies will likely federate and pull images regardless, but can we go after them when they're caught? Nothing prevents them from taking the images for internal R&D, but at least we could stop them from selling products built on that training data.

[–] [email protected] 10 points 10 months ago (4 children)

My question is if there’s any legal mechanism to prevent this on other platforms? Pixelfed for example.

Good question!

I’ve been saying for a while that the fediverse is blind to this issue as everything here is completely scrapable through either the public web or by running federated servers. On top of that, being culturally inclined toward more “serious” conversation and providing content warnings and alt-text for images, we’re probably generating relatively valuable training data.

And yet everything is public as though it’s still 2012.

There are alternatives. Bluesky, for instance, is basically private to members only. They recently announced that content would be made public to the web, and a number of users were upset.

Group chats and Discord servers are probably similar and, from what I can tell, are the “new” popular places for social activity online.

A major issue the fediverse has, IMO, is that it’s kinda stuck trying to fight Twitter and Facebook circa 2012, when that battle was lost and we’re on to new battle fronts now.

[–] [email protected] 4 points 10 months ago

Yeah, that's something that's been on my mind as well.

There are benefits to that openness and verifiability in public spaces (e.g. Lemmy communities), since it's easier to determine whether there's vote manipulation or astroturfing. But I think the fediverse needs a lot of work around privacy, and also education about what is and isn't private on these platforms.

There should also be more of a focus on setting up legal requirements for what can be done with the information, but I'm not sure if that's a thing just yet. We developed GPLv3 to make sure FOSS code can't be folded into proprietary products without the source staying open, but I'm not sure how it would work for data.

E.g., it should be easy to save, record, and share posts on the fediverse, such as with embeds/screenshots/news stories.

But we also want to prevent abuse, misuse, and AI training.
