this post was submitted on 09 Mar 2024
62 points (94.3% liked)

Technology


Wikifunctions is a new site that has been added to the list of sites operated by WMF. I definitely see uses for it in automating updates on Wikipedia and bots (and also for programmers to reference), but their goal is to translate Wikipedia articles to more languages by writing them in code that has a lot of linguistic information. I have mixed feelings about this, as I don't like existing programs that automatically generate articles (see the Cebuano and Dutch Wikipedias), and I worry that the system will be too complicated for average people.

[–] [email protected] 1 points 6 months ago

The writer will need to tag things down, to minimal details, for the sake of languages that they don’t care about.

Sure and that’s likely a good bit of work.

It isn't just "a good bit of work", it's an unreasonably large amount of work. It's like draining the ocean with a bucket. I'm talking about tagging hundreds of subtle distinctions for each sentence, where leaving any of them untagged will output nonsense in at least some languages.

However, you must consider [implied: "you didn't consider"] the alternative which is translating the entire text to dozens of languages

I did consider it. And it's clearly less work overall, and easier to distribute among multiple translators.

For example, if I'm translating some genitive construction from Portuguese to Latin, I don't need to care which side of English's esoteric "of vs. 's" distinction it falls on. Or whether I'm expected to use の/no in Japanese in that situation. Or to tag "hey, this is not alienable!" for the sake of Nahuatl. I need to deal with the oddities of exactly two languages - source and target.

Under the proposed system though? Enjoy tagging a single word [jap-no][eng-of][lat-gen][nah-inal]. And that's only for four languages.

(inb4: this shit depends on meaning, so no, code can't handle it. At most code can convert sea[lat-gen] to "maris", but it won't "magically" know if it needs to use the genitive or ablative, or if English would use "of" or "'s".)
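To make that concrete, here's a minimal sketch of the split between what code can and can't do. Everything in it is an assumption for illustration: the `LATIN_FORMS` table, the `inflect_latin` and `render` functions, and the tag names are hypothetical, not anything Wikifunctions actually exposes. The lookup is the mechanical part; the case tag still has to come from a human who understood the sentence.

```python
# Hypothetical inflection table: (lemma, case tag) -> Latin surface form.
LATIN_FORMS = {
    ("sea", "nom"): "mare",
    ("sea", "gen"): "maris",
    ("sea", "abl"): "mari",
}


def inflect_latin(lemma, case_tag):
    """Mechanical step: look up the form for a case somebody already chose."""
    return LATIN_FORMS[(lemma, case_tag)]


def render(lemma, tags):
    """Render the Latin form of a tagged word.

    `tags` maps a language code to the tag the writer picked for it.
    If the Latin tag is missing there is nothing to fall back on,
    because choosing genitive vs. ablative depends on meaning.
    """
    if "lat" not in tags:
        raise ValueError(f"no Latin case tag for {lemma!r}; code can't infer it")
    return inflect_latin(lemma, tags["lat"])


# One word, already carrying a tag per target language.
print(render("sea", {"jap": "no", "eng": "of", "lat": "gen", "nah": "inal"}))
```

Calling `render("sea", {"eng": "of"})` raises instead of guessing, which is the point: the table only answers "what does the genitive of this word look like", never "is this a genitive".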

and doing the same for any update done to said text

False dichotomy.

I’d assume

If you're eager to assume (i.e. to make shit up and take it as true), please do not waste my time.

that to be even more work by at least one order of magnitude.

Source: you made it up.

Many languages are quite similar to one another. An article written in the hypothetical abstract language and tuned on an abstract level to produce good results in German would likely produce good results in Dutch too and likely wouldn’t need much tweaking for good results in e.g. English. This has the potential to save a ton of work.

Okay... I've stopped reading here. If your low-hanging fruit example is three closely related languages, then it's blatantly clear that you're ignorant of the sheer scale of the problem.