this post was submitted on 10 Apr 2024
Programmer Humor
Hmm. I'm not really sure why anyone would write such a text. There is no "weighted proportionality" (or pathways). Is this a common conception?
I guess you picked up on the fact that transformers output a probability distribution. I don't think anyone calls those an average, though you could have an average distribution. Come to think of it, before you use that to pick the next token, you usually mess with it a little to make it more or less "creative". That's certainly no longer an average.
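To make the "mess with it a little" part concrete, here is a minimal sketch of temperature scaling, which is one common way to do it. The function names and the example logits are made up for illustration and do not come from any particular library; real samplers often add more steps such as top-k or top-p filtering.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores (logits) into a probability distribution.

    Hypothetical helper for illustration. A temperature below 1 sharpens
    the distribution (less "creative"), above 1 flattens it (more "creative").
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index at random from the adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Made-up logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

The point is that the next token is sampled from the adjusted distribution, so the output is a random draw and not any kind of mean over tokens.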
You can see a neural net as a kind of regression analysis. I don't think I have ever heard anyone call that a kind of average, though. I'm also skeptical that you can see a transformer as a regression, but I don't know this stuff well enough. When you train on some data more often than on other data, that is not how you would do a regression. Certainly, once you start RLHF training, you have left regression territory for good.
The GPTisms might be there because they are overrepresented in the fine-tuning data. They might also come from the RLHF and/or be brought out by the system prompt.