this post was submitted on 23 May 2024
1862 points (98.6% liked)
Technology
Is this real though? Does ChatGPT just literally take whole snippets of text like that? I thought it used some aggregate or probability based on the whole corpus of text it was trained on.
It does, but the thing with the probability is that it doesn't always pick the most likely next bit of text. It basically rolls dice and picks maybe the second, third, or in rare cases hundredth most likely continuation. This chaotic behaviour is part of what makes it feel "intelligent" and why it's possible to reroll responses to the same prompt.
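For anyone who wants to see the dice roll, here is a minimal Python sketch of weighted sampling. It is not how ChatGPT is actually implemented: the `scores` values, the `sample_next` helper, and the `temperature` knob are all made up for illustration, and a real model scores tens of thousands of tokens rather than three words.

```python
import math
import random
from collections import Counter

def sample_next(scores, temperature=1.0, rng=random):
    """Pick one continuation at random, weighted by softmax(score / temperature)."""
    tokens = list(scores)
    scaled = [scores[t] / temperature for t in tokens]
    top = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical scores for what might follow "to my math".
scores = {"class": 2.0, "homework": 1.0, "teacher": 0.1}

rng = random.Random(0)  # fixed seed so the run is repeatable
picks = [sample_next(scores, temperature=1.0, rng=rng) for _ in range(1000)]
cold_picks = [sample_next(scores, temperature=0.01, rng=rng) for _ in range(100)]

print(Counter(picks))       # mostly "class", but the others show up too
print(Counter(cold_picks))  # a very low temperature makes the top choice win every time
```

Rerolling a response amounts to running the same sampling again with different random draws, which is why the same prompt can give different answers.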
I remember doing crude text generation in my NLP (Natural Language Processing) class, and the logic was basically this:
This is a rough explanation of Bayesian nets, which I think are what's used in LLMs. We used a very simple n-gram model (i.e. the previous n words are considered for the statistics, so "to my math" is much more likely to generate "class" than "homework"), but they're probably doing fancy things with text categorization and whatnot to generate more relevant text.
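A simple n-gram generator along those lines could look roughly like the Python below. This is a sketch from memory of the general technique, not the actual class assignment: the toy `corpus` and the `train_ngrams` and `generate` helpers are invented for the example.

```python
import random
from collections import Counter, defaultdict

def train_ngrams(words, n=3):
    """Count which word follows each run of n consecutive words."""
    model = defaultdict(Counter)
    for i in range(len(words) - n):
        context = tuple(words[i:i + n])
        model[context][words[i + n]] += 1
    return model

def generate(model, seed, length=10, rng=random):
    """Extend seed by repeatedly sampling a follower of the last n words."""
    out = list(seed)
    n = len(seed)
    for _ in range(length):
        followers = model.get(tuple(out[-n:]))
        if not followers:
            break  # this context never appeared in the training text
        words, counts = zip(*followers.items())
        out.append(rng.choices(words, weights=counts, k=1)[0])
    return out

# Tiny made-up corpus: "class" follows "to my math" twice, "homework" once.
corpus = ("i walked to my math class . i ran to my math class . "
          "i went to my math homework .").split()
model = train_ngrams(corpus, n=3)

print(model[("to", "my", "math")])
print(" ".join(generate(model, ["to", "my", "math"], length=5)))
```

With a corpus this small the output is mostly the training text replayed, which also shows how a model like this can end up spitting out whole snippets when a context only ever appeared once.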
The LLM isn't really "thinking" here, it's just associating input text and the training data to generate output text.
Sounds quite similar to Markov chains which made me think of this story:
https://thedailywtf.com/articles/the-automated-curse-generator
Still gets a snort out of me every time Markov chains are mentioned.
Yup, and I'm guessing LLMs use Markov chains, which are also a really old concept (the idea is >100 years old, and it's used in compression algorithms like LZMA).