this post was submitted on 03 Sep 2024
1580 points (97.8% liked)
Technology
I can already tell this is going to be an unpopular opinion judging by the comments, but this is my ideology on it
It's totally true. I'm indifferent on it: if the data was acquired from a public-facing source I don't really care, but I'm definitely against using data dumps or data that wasn't available to the public in the first place. The whole thing with AI is ridiculous; it's the same as someone going to a website and making a mirror, or a reporter writing an article that talks about what's in it. The last three web-search-based AIs even gave sources for where they got the info. I don't get the argument.
If it's image-based AI, well, it's the equivalent of an artist going to an art museum and deciding they want to replicate the art style seen in a painting. Maybe they shouldn't be in a publishing field if they don't want their work seen/used. That's my ideology on it. It's not like the AI is taking a one-to-one copy and selling the artwork as its own, which in my opinion is a much more harmful instance and already happens commonly in today's art world; it's analyzing existing artwork that was available through the same means everyone else had: going online, loading up images, and scraping the data. By this logic, artists should not be allowed to enter any art-based websites, museums, or galleries, since by looking at others' art they are able to adjust their own, which is stealing the author's work. I'm not for or against it, but the ideology is insane to me.
@Pika @flop_leash_973 This is largely my thoughts on the whole thing, the process of actually training the AI is no different from a human learning
The thing about that is that there's likely enough precedent in copyright law to actually handle it. Most copyright law is all about intent and scale, and I think that's likely where this will all go
Here the intent is to replace and the scale is astronomical, whereas an individual's intent is to add and the scale is minimal
The process of training the model is arguably similar to a human learning, and if the model just sat on a server doing nothing but knowing, there'd be no problem. Taking that knowledge and selling it to the public en masse is the issue.
This is precisely what copyrights and patents are here to safeguard. Is there already a book like A Song of Ice and Fire? Write something else, maybe better! There's already a patent for an idea you have? Change and improve upon it and get your own patent!
You see, copyrights and patents are supposed to spur creativity, not hinder it. OpenAI should improve upon its system so that it actually thinks and is creative itself rather than regurgitating copyrighted materials, themes and ideas. Then they wouldn't have this problem.
OpenAI wants literally all of human knowledge and creativity for free so that they can sell it back to you. And you're okay-ish with it?
@Subverb that is, quite impressively, the opposite of what I said
Is a person infringing on copyright by producing content? No. It’s about intent and scale. Humans don’t just sit on this knowledge, they do something with it
There is nothing illegal about WHAT it’s doing, there is everything illegal about HOW and WHY
I very clearly stated that OpenAI’s intent and their scale at which they operate are blatant copyright infringement and that it has been backed up with decades of precedents
Hello fellow human. I also learn by having information shoveled to me without regard to my agency.
@zbyte64 with everything you see you are scraping data from your environment whether you want to or not
How does a child learn what pain is? How does a teenager learn what heartbreak is? It’s certainly not because they made the decision to find that out themselves
I bring up agency and I get an exemplary response of what I mean.
Raising a child well requires someone who is able to engage in the child's own theory of mind. If you just treat a child as an information sponge they will need more therapy than usual. A good parent takes interest in their child's ability to exercise agency.
@zbyte64 you’re getting away from the original conversation
Then I guess my original point of agency being an essential element in human learning had nothing to do with your conversation about how AI learns like humans. Carry on.
@zbyte64 we’re saying the same thing
It’s a matter of scale, not process
I'm literally saying (an aspect of) process matters, how are we saying the same thing?
@zbyte64 from what I understand, you’re referring to the process at scale—the amount of information the AI can take in is inhuman—which I’m not disagreeing with
None of which is relevant to my original point: the scale of their operations, which has already been used countless times in copyright law
The scale at which they operate and their intention to profit is the basis for their infringement, how they’re doing it would be largely irrelevant in a copyright case, is my point
I don't understand how when I say "agency" or "an aspect of the process" one would think I'm talking about the volume of information and not the quality.
@zbyte64 1) In no way is quality a part of that equation and 2) In what other contexts is quality ever a part of the equation? I mean I can go look at some Monets and paint some shitty water lilies, is that somehow problematic?
If we're using your paintings as training data for a Monet copy, then it could be.
Are we even talking about AI if we're saying data quality doesn't matter?
@zbyte64 data quality, again, was out of the scope of what I was talking about originally
Which, again, was that legal precedent would suggest that the *how* is largely irrelevant in copyright cases, they’re mostly focused on *why* and the *scale of the operation*
I’m not getting sued for copyright infringement by the NYT because I used inspect element to delete content to read behind their paywall, OpenAI is
I was narrowly taking issue with the comparison to how humans learn, I really don't care about copyrights.
@zbyte64 where am I wrong? The process is effectively the same: you get a set of training data (a textbook) and a set of validation data (a test) and voila, I’m trained
To learn how to draw an image of a thing, you look at the thing a lot (training data) and try sketching it out (validation data) until it’s right
How the data is acquired is irrelevant, I can pirate the textbook or trespass to find a particular flower, that doesn’t mean I’m learning differently than someone who paid for it
Do we assume everything read in a textbook is correct? When we get feedback on drawing, do we accept the feedback as always correct and applicable? We filter and groom data for the AI so it doesn't need to learn these things.
Agreed. I don't understand how training an LLM on publicly available data is an issue. As you say, it doesn't copy the work. Rather, the data is used as "inspiration," to stay in the art analogy.
Maybe I'm ignorant. Would love to be proven wrong. Right now it seems to me that failing media publishers are trying to do a money grab and use copyright as an argument, even though their data/material isn't getting illegally reproduced.