this post was submitted on 24 Jun 2024
Technology
I can't tell which one is a shittier actor here...
Either way, this is not good for the end consumer lol
We always get fucked
I hate to say it, but I kinda hope the music copyright cartel wins this one, only for the precedent it would set about things like proprietary use of MS Copilot output being an infringement of GPL-licensed code.
Yeah, as someone who's fought against the RIAA/MPAA copyright lobbying in my country, I think I'm on their side on this one.
GPL code is the least of the concerns; you can always just say the AI-generated code is GPL. What about training on leaked proprietary code? The training data is already known to include medical records, CSAM, etc., so I wouldn't be surprised if it also contained proprietary code.
I don't know which one is better tbh
the devil you know or the one you don't!
Having all AI-generated code be either "viral" copyleft or illegal to use at all would certainly be better than allowing massive laundering of GPL-licensed code for exploitation in proprietary software.
If they are using GPL code, shouldn't they also release their source code?
That's the argument I would be making, but it certainly isn't the position of Microsoft (Copilot), OpenAI (Codex), etc.: they think the output is sufficiently laundered from the GPL training data so as not to constitute a derivative work (which means none of the original licenses -- "open source" or otherwise -- would apply, and the recipient could do whatever they want).
Edit: actually, to be more clear, I would take either of two positions:
1. That the presence of GPL (or in general, copyleft) code in the training dataset requires all output to be GPL (or in general, copyleft).
2. That the presence of both GPL code and code under incompatible licenses in the training dataset means that the AI output cannot legally be used at all.
(Position #2 seems more likely, as the license for proprietary code would be violated, too. It's just that I don't care about that; I only care about protecting the copyleft parts.)
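To make the two positions concrete, here's a rough sketch of each as a rule over the set of licenses found in a training dataset. Everything in it is made up for illustration: the names `Verdict`, `position_1`, and `position_2` are hypothetical, the license sets are tiny and hand-picked, and treating any license outside those sets as "incompatible" is a simplifying assumption, not legal analysis. The fallback in `position_2` (defer to position 1 when there is no conflict) is also my own assumption.

```python
from enum import Enum


class Verdict(Enum):
    UNRESTRICTED = "no copyleft obligation on the output"
    COPYLEFT_REQUIRED = "output must be released under copyleft terms"
    UNUSABLE = "output cannot legally be used"


# Illustrative, deliberately tiny sets (assumption, not a complete list).
COPYLEFT = {"GPL-3.0"}
GPL_COMPATIBLE = {"MIT", "BSD-3-Clause"}


def position_1(training_licenses):
    """Position 1: any copyleft code in the training set makes all output copyleft."""
    if COPYLEFT & set(training_licenses):
        return Verdict.COPYLEFT_REQUIRED
    return Verdict.UNRESTRICTED


def position_2(training_licenses):
    """Position 2: copyleft code plus incompatibly licensed code makes output unusable."""
    licenses = set(training_licenses)
    has_copyleft = bool(licenses & COPYLEFT)
    # Assumption: anything not listed as copyleft or GPL-compatible is incompatible.
    has_incompatible = bool(licenses - COPYLEFT - GPL_COMPATIBLE)
    if has_copyleft and has_incompatible:
        return Verdict.UNUSABLE
    # Assumption: with no conflict, fall back to what position 1 would say.
    return position_1(licenses)


if __name__ == "__main__":
    print(position_1(["MIT", "GPL-3.0"]).name)
    print(position_2(["GPL-3.0", "Proprietary"]).name)
```

The only point of the sketch is that position 1 looks at whether copyleft code is present at all, while position 2 looks at whether copyleft code is mixed with something it can't be combined with.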
Damn, I see your point here tbh...
I'm only vaguely familiar with software licensing. Is GPL a type of open source?
You could say that, LOL. It's the OG of "copyleft" licenses (the guy that made it invented the concept), although "permissive" licenses (BSD, MIT) existed before.
"Copyleft" and "permissive" are the two major categories of Free Software (a.k.a. "open source", although that term has different connotations) license. The difference between them is that "copyleft" requires future modifications by people other than the copyright holders to be released under the same terms, while "permissive" does not. In other words, "copyleft" protects the freedom of future users to control their computer by being able to modify the software on it, while "permissive" maximizes the freedom of other developers to do whatever they want with the code (including using it in proprietary apps, to exploit people).
See also: https://www.gnu.org/philosophy/free-sw.en.html