this post was submitted on 09 Mar 2024
62 points (94.3% liked)

Technology

Wikifunctions is a new site that has been added to the list of sites operated by the WMF (Wikimedia Foundation). I can definitely see uses for it in automating updates on Wikipedia and in bots (and as a reference for programmers), but its goal is to translate Wikipedia articles into more languages by writing them in code that carries a lot of linguistic information. I have mixed feelings about this: I don't like existing programs that automatically generate articles (see the Cebuano and Dutch Wikipedias), and I worry that the system will be too complicated for average people.

[email protected] 37 points 8 months ago

Sounds like a great idea. Plain English (or any human language) is not the best way to store information. I've certainly noticed mismatches between the data in different languages, or across related articles, because they don't share the same data source.

Take a look at the article for NYC in English and French and you'll see a bunch of data points, like total area, that are different. Not huge differences, but any difference at all is enough to demonstrate the problem. There should be one canonical source of data shared by all representations.
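To make the mismatch problem concrete, here is a minimal Python sketch that flags infobox fields whose values differ between language editions. It assumes the values have already been extracted from each article somehow; the field names, the `find_mismatches` helper and all numbers are hypothetical placeholders, not the real NYC figures.

```python
# Hypothetical placeholder values for illustration only; NOT real NYC figures.
infobox_values = {
    "total_area_km2": {"en": 100.0, "fr": 101.5},
    "population": {"en": 5000, "fr": 5000},
}


def find_mismatches(values_by_field):
    """Return {field: {lang: value}} for every field whose values are not
    identical across language editions ("any difference at all")."""
    return {
        field: by_lang
        for field, by_lang in values_by_field.items()
        if len(set(by_lang.values())) > 1
    }


if __name__ == "__main__":
    for field, by_lang in find_mismatches(infobox_values).items():
        print(f"{field}: {by_lang}")
```

With one shared data source this check would be unnecessary, since every edition would read the same value.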

Wikipedia is available in hundreds of languages. Why should hundreds of editors need to update the NYC page every time a new census comes out with new population numbers? Ideally, that would require only one change to update every version of the article.

In programming, the convention is to separate data from presentation. In this context, plain English is the presentation, and weaving the actual data into it is suboptimal. Something like the population or area of a city is not language-dependent, and should not be stored in a language-dependent way.
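A toy Python sketch of what separating data from presentation could look like: the facts live in one language-neutral record, and each language only supplies a sentence template and its own number formatting. `CityFacts`, `RENDERERS`, `render` and the values are hypothetical, and this is not how Wikifunctions works internally; real multilingual generation has to handle grammar, not just fill-in-the-blank templates.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CityFacts:
    """Language-independent data, stored once and shared by every edition."""
    name: str
    population: int
    area_km2: float


def fmt_en(x):
    # English: comma between digit groups, decimal point (1,234,567 / 100.5).
    return f"{x:,}"


def fmt_fr(x):
    # French: narrow no-break space between digit groups, decimal comma.
    return f"{x:,}".replace(",", "\u202f").replace(".", ",")


# Presentation layer: a sentence template plus a number formatter per language.
RENDERERS = {
    "en": ("{name} has a population of {pop} and covers {area} km².", fmt_en),
    "fr": ("{name} compte {pop} habitants et couvre {area} km².", fmt_fr),
}


def render(facts, lang):
    """Render the same facts in the requested language."""
    template, fmt = RENDERERS[lang]
    return template.format(
        name=facts.name, pop=fmt(facts.population), area=fmt(facts.area_km2)
    )


if __name__ == "__main__":
    city = CityFacts("Example City", 1234567, 100.5)  # hypothetical values
    # A new census means one edit to the data; every language picks it up.
    city = replace(city, population=1300000)
    for lang in RENDERERS:
        print(render(city, lang))
```

The point of the design is that a census update touches only the `CityFacts` record; the per-language templates never contain numbers.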

Ultimately, this is about reducing duplicate effort and maintaining data integrity.

[email protected] 14 points 8 months ago

This problem was solved in like 2012 or 2013 with the introduction of Wikidata, but not all language editions have decided to use that.
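For reference, Wikidata serves each item as JSON at `https://www.wikidata.org/wiki/Special:EntityData/<QID>.json`, with statements grouped under `claims` by property ID (P1082 is "population"; Q60 is the item for New York City). Below is a rough Python sketch of reading a quantity out of that JSON, preferring a statement marked with `preferred` rank. The helper names are made up, the sample data is a hypothetical placeholder rather than real figures, and it only approximates how a wiki might pull values from Wikidata; it is not the code any Wikipedia edition actually uses.

```python
import json
import urllib.request
from decimal import Decimal

ENTITY_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def fetch_entity(qid):
    """Download an item's JSON (needs network access; not used in the test)."""
    req = urllib.request.Request(
        ENTITY_URL.format(qid=qid),
        headers={"User-Agent": "wikidata-sketch/0.1 (example script)"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def quantity_claim(entity_json, qid, prop):
    """Return the amount of the best-ranked quantity statement for `prop`,
    or None. 'preferred' beats 'normal'; 'deprecated' is ignored."""
    claims = entity_json["entities"][qid]["claims"].get(prop, [])
    for wanted_rank in ("preferred", "normal"):
        for claim in claims:
            snak = claim["mainsnak"]
            if claim.get("rank") == wanted_rank and snak.get("snaktype") == "value":
                # Quantity amounts are strings with a sign, e.g. "+2000".
                return Decimal(snak["datavalue"]["value"]["amount"])
    return None


# Hypothetical sample shaped like Special:EntityData output; NOT real figures.
SAMPLE = {"entities": {"Q60": {"claims": {"P1082": [
    {"rank": "normal", "mainsnak": {"snaktype": "value", "datavalue": {
        "type": "quantity", "value": {"amount": "+1000", "unit": "1"}}}},
    {"rank": "preferred", "mainsnak": {"snaktype": "value", "datavalue": {
        "type": "quantity", "value": {"amount": "+2000", "unit": "1"}}}},
]}}}}

if __name__ == "__main__":
    print(quantity_claim(SAMPLE, "Q60", "P1082"))
```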

[email protected] 3 points 8 months ago

How common is it on the English Wikipedia? I haven't checked a lot of articles, but I did check the source of the English and French NYC articles I linked, and it seems like all the information is hardcoded, not referenced from Wikidata.

[email protected] 2 points 8 months ago

I think enwiki tends to use Wikidata relatively sparingly.

[email protected] 1 points 8 months ago

but not all language editions have decided to use that.

Some people like the little bit of power they call "meritocracy" that lets them decide what belongs in the article and what doesn't.
