overview for leftzero

OpenAI and Anthropic are ignoring an established rule that prevents bots scraping online content in c/[email protected]

[email protected] 2 points 4 months ago

> The problem is not them being random.

They are not random, that's the point. They're entirely deterministic and very precise, and they aren't hiding anything; they will give you the most likely (not blacklisted) sequence of characters to follow your input according to their model. What they won't give you is information, except by accident.

If they were random (hidden or not) they'd be harmless, no one would trust them any more than one of those eight ball toys, or your average horoscope.

The issue is that they're very not random, so much so that there's no way to know if what they are saying bears any accidental semblance to the truth without fact checking... and that very soon they'll have replaced any feasible way to fact check them, since all the supposed "facts" we'll have access to will have been generated by LLMs trained on LLM-generated garbage.

OpenAI and Anthropic are ignoring an established rule that prevents bots scraping online content in c/[email protected]

[email protected] 2 points 4 months ago (2 children)

> If the models are random then we shouldn't be trusting them to do anything, let alone serious applications.

That's not the reason we shouldn't be using them for anything other than generating lorem ipsum style text or dialogue for non quest critical NPCs in games.

The reason is that, paraphrasing Neil Gaiman, LLMs don't generate information, they generate information shaped sentences.

Specifically, an LLM takes a sequence of characters (not a word or text; LLMs have no concept of words, or text, or anything else for that matter; they're just an application of statistics on large volumes of sequences of characters; no meaning or intelligence involved, artificial or not)... as I was saying, an LLM takes a sequence of characters, pushes it through its model, and outputs the sequence of characters most likely to follow it in the texts its model has been trained on (or rather, the most likely after discarding the ones its creators have labelled as politically incorrect).

That's all they do, and they're excellent at it (or would be if it weren't for the aforementioned filters), but that'll never give you a cure for cancer unless there already was one in their training data.
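To make the "most likely sequence to follow your input" idea concrete, here's a toy sketch. Everything in it is an assumption for illustration: it's a character-level n-gram counter, not a neural network, the names `train`, `complete`, and `blocked` are made up, and real LLMs work on tokens and are vastly bigger. It only shows the shape of the thing: count what followed what in the training text, then always pick the most frequent continuation that isn't on the blocklist.

```python
from collections import Counter, defaultdict


def train(text, order=3):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        counts[context][text[i + order]] += 1
    return counts


def complete(counts, prompt, length=20, order=3, blocked=frozenset()):
    """Greedily append the most frequent next character that is not blocked."""
    out = prompt
    for _ in range(length):
        context = out[-order:]
        candidates = [
            (ch, n) for ch, n in counts.get(context, {}).items()
            if ch not in blocked
        ]
        if not candidates:
            # Nothing in the training text ever followed this context.
            break
        # Highest count wins; ties are broken by character so the
        # result is fully deterministic.
        out += max(candidates, key=lambda c: (c[1], c[0]))[0]
    return out


# Hypothetical miniature "training data".
corpus = "the cat sat on the mat. the cat ate the rat. "
model = train(corpus)
print(complete(model, "the", length=12))
```

Same input, same output, every time, and nothing comes out that wasn't statistically in the corpus to begin with. Feed it a context it has never seen and it simply stops.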

They take texts written by humans, shred them, and give you their badly put back together desiccated corpses, drained of any and all meaning or information, but looking very convincingly (until you fact check them) like actually meaningful or informative texts.

That is what makes them dangerous. That and the fact that the bastards selling them are marketing them for the jobs they're least capable of doing, that is, providing reliable information.

(And that's while they can still be trained on meaningful and informative texts written by humans — inasmuch as anything found on reddit, facebook, or xitter can be considered to be meaningful or informative —, but given that a higher and higher percentage of the text on the internet is being generated by LLMs soon enough it'll be impossible to train new models on anything but 99% LLM generated garbage, at which point the whole bubble will implode, as anyone who's wasted time, paper, and toner playing with a photocopier or anyone familiar with the phrase “garbage in, garbage out” will already have realised... which is probably why the LLM peddlers are ignoring robots.txt and copyright laws in a desperate effort to scrape whatever's left of the bottom of the barrel.)

Do you pay for some pirated contents in c/[email protected]

[email protected] 1 points 5 months ago

Hades didn't really seem like my kind of game, so I torrented it to try it out. Then I bought it, and later Hades 2, too.

I've also bought some comics I'd previously read on the computer, too, if they were good enough and I've come across a nice edition.

People who started learning a second language, how has it made you aware how broken English is ? in c/[email protected]

[email protected] 1 points 5 months ago

Except many languages' vocabularies share common roots (e.g. Latin and Greek) even if the languages themselves don't, so quite often someone learning Spanish will be able to make an educated attempt at figuring out the equivalent Spanish word (for instance, an English speaker might figure out that machine ≈ máquin_)... but will have no clue about the gender, having a 50% chance of ending up with, say, máquino.

And, as I said, misgendering words seems to be a relatively common mistake for people learning Spanish without having a Romance language base.

People who started learning a second language, how has it made you aware how broken English is ? in c/[email protected]

[email protected] 3 points 5 months ago* (last edited 5 months ago)

> That's a good thing.

Nah, man. That's the abused justifying the abuser. That's pure Stockholm syndrome.

There's no world in which the oos in moon, book, door, blood, brooch, and cooperation (I had forgotten about this one. There are six. SIX! 😩) representing SIX different sounds is a good thing. There simply isn't.

A sane language would replace some of those with u, ø, ō, ô, ö, õ, whatever, make some rule so that the poor sod attempting to decipher the written word could begin to know how to pronounce it... but not English. Not English. 😞

People who started learning a second language, how has it made you aware how broken English is ? in c/[email protected]

[email protected] 2 points 5 months ago

Someone learning Spanish as a second language will have to remember that it's máquina and not máquino when speaking or writing it, though (and will then probably be quite confused if they ever meet some guy nicknamed El Máquina, which would somehow be a perfectly cromulent nickname in Spanish).

Confusing genders when speaking or writing is one of the most common mistakes amongst people new to the language, because while everything else has some form of rule, this doesn't (sure, when reading or listening you can most of the time use the word ending, and you'll probably have an article, too, but when you are the one speaking or writing you have no option but to just know a word's gender, or how it ends, which is the same thing).

People who started learning a second language, how has it made you aware how broken English is ? in c/[email protected]

[email protected] 2 points 5 months ago (2 children)

> you don't memorize it. You memorize the words and how they sound

Potahto potayto. 🤷‍♂️

People who started learning a second language, how has it made you aware how broken English is ? in c/[email protected]

[email protected] 1 points 5 months ago

> That could just be a comment on how much I still have to learn about Spanish :P

Gotten into verbal tenses yet..? 😉

But, hey, at least it doesn't have [weak pronouns](https://en.wikipedia.org/wiki/Catalan_personal_pronouns#:~:text=The%20weak%20pronouns%20%28Catalan%3A%20pronoms,different%20element%20of%20the%20sentence.) as we do in Catalan... Those can be confusing even for native speakers! 😅

People who started learning a second language, how has it made you aware how broken English is ? in c/[email protected]

[email protected] 5 points 5 months ago (5 children)

I mean, you do memorise them, you just don't realise you're doing it because you're a baby or toddler and babies and toddlers are language sponges, and not very aware of how their own minds work.

When learning a gendered language as an adult you definitely have no option but to memorise what gender each word uses, since there's generally no specific rule, just how the language happened to evolve. (And this can be particularly hard if your native language is gendered, but you're trying to learn one that genders words differently, for instance when learning German coming from a Romance language, or vice versa.)

People who started learning a second language, how has it made you aware how broken English is ? in c/[email protected]

[email protected] 2 points 5 months ago (2 children)

I honestly wasn't aware naïve had a dieresis in English.

I mean, it makes complete sense for it to have one in languages that use them, but I wasn't aware it was a loanword (from French or Norman, I assume).

People who started learning a second language, how has it made you aware how broken English is ? in c/[email protected]

[email protected] 6 points 5 months ago (13 children)

> I don't feel it's particularly broken honestly.

There are five (5) ways of pronouncing oo, if you people haven't added a sixth one since the last time I looked.

Radii, fiancé, and façade are apparently perfectly cromulent English words that native English speakers who've never seen an ii, an é, or a ç are supposed to be able to pronounce correctly...

Your words for food animals come from completely different and unrelated languages depending on whether the animal is alive or dead (since the people who tended to the farms and the people who actually ate their meat spoke different languages)...

There are probably more irregular verbs than regular ones... (again, probably because of English really being three different languages in a trenchcoat)...

At some point in the sixteenth century you apparently just up and decided to randomly switch the pronunciation of all your vowels... without changing how you wrote them...

While most languages have developed some form of standard and regulative body, English seems like it'd rather leave the whole grammar, orthography, pronunciation, and whatnot situation as an exercise for the ~~victim~~ speaker, writer, or reader...

Yeah, no, not particularly broken at all... 😒

People who started learning a second language, how has it made you aware how broken English is ? in c/[email protected]

[email protected] 3 points 5 months ago* (last edited 4 months ago) (2 children)

Seriously, other languages at least adapt loanwords to their own grammar, orthography, and whatnot... English just grabs them as they are and runs away without looking back.

That's why you end up with the plural of radius being radii, or stuff like fiancé or façade (seriously, how are people who only speak English and have never seen a ç before in their lives supposed to know how to pronounce that‽)...

Of course it all comes from English being really three or four languages (Anglo-Saxon, Norman or Old French, and Norse) badly put together, so sprinkling bits of other languages on top didn't make much of a difference, when there were already about five different ways to pronounce, for instance, oo, and the whole vowel shift debacle didn't exactly help with this mess... but while other languages which may have had similar (if maybe less spectacular) growing pains eventually developed normative bodies, mostly from the eighteenth century onwards, that define and maintain a standard form of the language, English seems to have ignored all that and left grammar and orthography as a stylistic choice on the writers' part, and pronunciation as an exercise for the readers...