antonim

joined 1 year ago
[–] [email protected] 16 points 1 week ago

Yeah, totally makes sense, "they" attacked IA a month before the elections, knowing that IA would spend around a month rewriting and improving their site code until the Save Page option would be enabled again (unless IA themselves are a part of the plot???), so that news articles could be "edited on the fly" (with what result?) until election day, while other similar web archiving services such as archive.is would keep working just fine.

[–] [email protected] 30 points 1 week ago

clbottomt when the chtopt shows up [imagine this as that popular GIF meme]

[–] [email protected] 1 points 2 weeks ago (1 children)

And that's more or less what I was aiming for, so we're back at square one. What you wrote is in line with my first comment:

it is a weak compliment for AI, and more of a criticism of the current web search engines

The point is that there isn't something that makes AI inherently superior to ordinary search engines. (Personally I haven't found AI to be superior at all, but that's a different topic.) The difference in quality is mainly a consequence of some corporate fuckery to wring out more money from the investors and/or advertisers and/or users at the given moment. AI is good (according to you) just because search engines suck.

[–] [email protected] 1 points 2 weeks ago (3 children)

AI LLMs simply are better at surfacing it

Ok, but how exactly? Is there some magical emergent property of LLMs that guides them to filter out the garbage from the quality content?

[–] [email protected] 1 points 2 weeks ago (5 children)

If you don't feel like discussing this and won't do anything more than deliberately miss the point, you don't have to reply to me at all.

[–] [email protected] 2 points 2 weeks ago (7 children)

they’re a great use in surfacing information that is discussed and available, but might be buried with no SEO behind it to surface it

This is what I've seen many people claim. But it is a weak compliment for AI, and more of a criticism of the current web search engines. Why is that information unavailable to search engines, but is available to LLMs? If someone has put in the work to find and feed the quality content to LLMs, why couldn't that same effort have been invested in Google Search?

[–] [email protected] 2 points 1 month ago

Here in my southeast European shithole I'm not worrying about my tax money, the upgrade is going to be pretty cheap, they're just going to switch from unlicensed XP to unlicensed Win7.

[–] [email protected] 2 points 1 month ago

Yep, but I didn't mention that because it's not a part of the "Wayback Machine", it's just the general "Internet Archive" business of archiving media, which is for now still completely unavailable. (I've uploaded dozens of public-domain books there myself, and I'm really missing it...)

[–] [email protected] 15 points 1 month ago (2 children)

You can (well, could) put in any live URL there and IA would take a snapshot of the current page on your request. They also actively crawl the web and take new snapshots on their own. All of that counts as 'writing' to the database.

[–] [email protected] 5 points 1 month ago

it is quite literally named the “land of the blacks” after all that is what Egypt means

Egypt is from Greek and definitely doesn't mean that. The Egyptian endonym was kmt (traditionally pronounced as kemet), which is interpreted as "black land" (km means "black", -t is a nominal suffix, so it might be translated as black-ness, not at all "quite literally land of the blacks"), most likely referring to the fertile black soil around the Nile river. Trying to interpret that as "land of the blacks" should be suspicious already due to the fact that people would hardly name themselves after their most ordinary physical characteristic; the Egyptians might call themselves black only if they were surrounded by non-black people and could view that as their own special characteristic, but they certainly neighboured and had contact with black peoples. And either way one has to wonder if the ancient views of white and black skin were meaningfully comparable to modern western ones. On the other hand, the fertile black soil most certainly is a differentia specifica of the settled Egyptian land that is surrounded by a desert.

[–] [email protected] 19 points 1 month ago

More screenshots are here: https://xcancel.com/p9cker_girl/status/1844203626681794716

What I find odd is that the message that they actually left on the site has nothing to do with Palestine, it's just a childish "lol btfo" sort of message. So I wouldn't be surprised if these guys aren't the ones who actually did it, and it's merely a false flag to make pro-Palestinian protesters look like idiotic assholes.

 

Quite frequently I come across scanned books that are viewable for free online. For example, a publisher may have put them there (such as preview chapters), or a library (old books from their collection that are in the public domain), etc. Since I like hoarding data, and the online viewers that are used to present the book to me might not be very practical, I frequently try to download the books one way or another. This requires toying with the "inspect element" tool and various other methods of getting the images/PDF. Now, all that I access is what is, well, accessible; I don't hack into the servers or anything. But the stuff is meant to be hidden from the normal user. Does that act of hiding the material, no matter how primitive and easily circumvented, mean that I'm not allowed to access it at all?

I suppose ripping a public domain book is no big deal, but would books under copyright fare differently?

Mainly I'm asking out of curiosity, I don't expect the police to come visit me for ripping a 16th century dictionary.

Note: I live in the EU, but I'd be curious to hear how this is treated elsewhere too.

Edit: I also remembered a funny trick I noticed on one site - it allows viewing PDFs on their website, but not downloading, unless you pay for the PDF. But when you load the page, even without paying, the PDF is already downloaded onto your computer and can be found in the browser cache. Is it legal to simply save the file that is already on your computer?

 
96
submitted 10 months ago* (last edited 10 months ago) by [email protected] to c/[email protected]
 

https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2024-01-10/Traffic_report

Here's the top 50 list, with the number of views in brackets. The actual article also includes commentary and the dates with the peak number of views.

  1. ChatGPT [52,565,681]
  2. Deaths in 2023 [48,603,284]
  3. 2023 Cricket World Cup [38,723,498]
  4. Oppenheimer (film) [31,265,503]
  5. J. Robert Oppenheimer [28,681,943]
  6. Cricket World Cup [26,390,217]
  7. Jawan (film) [23,112,884]
  8. Taylor Swift [22,179,656]
  9. The Last of Us (TV series) [21,000,722]
  10. Pathaan (film) [20,614,066]
  11. Premier League [19,968,486]
  12. Barbie (film) [19,930,916]
  13. Cristiano Ronaldo [19,287,757]
  14. The Idol (TV series) [19,186,512]
  15. United States [18,135,421]
  16. Matthew Perry [17,882,508]
  17. Lionel Messi [17,768,818]
  18. Animal (2023 film) [16,988,676]
  19. Elon Musk [16,026,256]
  20. India [15,200,006]
  21. Avatar: The Way of Water [15,062,733]
  22. Lisa Marie Presley [14,812,928]
  23. Guardians of the Galaxy Vol. 3 [14,155,874]
  24. Russian invasion of Ukraine [13,998,378]
  25. Leo (2023 Indian film) [13,994,461]
  26. List of highest-grossing Indian films [13,904,959]
  27. 2023 Israel–Hamas war [13,647,220]
  28. Israel [13,344,140]
  29. Andrew Tate [13,604,475]
  30. Elizabeth II [13,021,033]
  31. David Beckham [12,850,994]
  32. Fast X [12,763,269]
  33. Sinéad O'Connor [12,712,846]
  34. Spider-Man: Across the Spider-Verse [12,705,868]
  35. Elvis Presley [12,584,150]
  36. Killers of the Flower Moon (film) [12,525,826]
  37. Twitter [12,220,814]
  38. List of American films of 2023 [12,197,227]
  39. Travis Kelce [12,155,733]
  40. The Super Mario Bros. Movie [12,065,680]
  41. Pedro Pascal [12,022,551]
  42. Charles III [11,978,873]
  43. Donald Trump [11,925,480]
  44. Tina Turner [11,634,915]
  45. Indiana Jones and the Dial of Destiny [11,563,900]
  46. Joe Biden [11,152,150]
  47. John Wick: Chapter 4 [11,133,720]
  48. Gadar 2 [11,129,684]
  49. Everything Everywhere All at Once [11,115,623]
  50. Margot Robbie [11,041,143]
 

From https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-10-03/Recent_research

^By^ ^Tilman^ ^Bayer^

A preprint titled "Do You Trust ChatGPT? -- Perceived Credibility of Human and AI-Generated Content" presents what the authors (four researchers from Mainz, Germany) call surprising and troubling findings:

"We conduct an extensive online survey with overall 606 English speaking participants and ask for their perceived credibility of text excerpts in different UI [user interface] settings (ChatGPT UI, Raw Text UI, Wikipedia UI) while also manipulating the origin of the text: either human-generated or generated by [a large language model] ("LLM-generated"). Surprisingly, our results demonstrate that regardless of the UI presentation, participants tend to attribute similar levels of credibility to the content. Furthermore, our study reveals an unsettling finding: participants perceive LLM-generated content as clearer and more engaging while on the other hand they are not identifying any differences with regards to message’s competence and trustworthiness."

The human-generated texts were taken from the lead section of four English Wikipedia articles (Academy Awards, Canada, malware and US Senate). The LLM-generated versions were obtained from ChatGPT using the prompt: *Write a dictionary article on the topic "[TITLE]". The article should have about [WORDS] words.*
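For anyone who wants to see what that prompt looks like once filled in, here is a minimal sketch. It only builds the prompt strings for the four topics and does not call ChatGPT. The names `PROMPT_TEMPLATE`, `TOPICS` and `build_prompt` are my own, and the word counts are hypothetical placeholders, since the actual values aren't given in the summary.

```python
# Template as quoted in the summary; [TITLE] and [WORDS] become named fields.
PROMPT_TEMPLATE = (
    'Write a dictionary article on the topic "{title}". '
    "The article should have about {words} words."
)

# The four topics named in the summary. The word counts are hypothetical
# placeholders, NOT values reported by the paper.
TOPICS = {
    "Academy Awards": 300,
    "Canada": 300,
    "malware": 300,
    "US Senate": 300,
}


def build_prompt(title: str, words: int) -> str:
    """Fill the quoted template for a single topic."""
    return PROMPT_TEMPLATE.format(title=title, words=words)


prompts = [build_prompt(title, words) for title, words in TOPICS.items()]

for prompt in prompts:
    print(prompt)
```

Presumably the word count was matched to the length of the corresponding Wikipedia lead section, but that is my guess, not something stated in the summary.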

The researchers report that

"[...] even if the participants know that the texts are from ChatGPT, they consider them to be as credible as human-generated and curated texts [from Wikipedia]. Furthermore, we found that the texts generated by ChatGPT are perceived as more clear and captivating by the participants than the human-generated texts. This perception was further supported by the finding that participants spent less time reading LLM-generated content while achieving comparable comprehension levels."

One caveat about these results (which is only indirectly acknowledged in the paper's "Limitations" section) is that the study focused on four quite popular (i.e. non-obscure) topics – Academy Awards, Canada, malware and US Senate. Also, it sought to present only the most important information about each of these, in the form of a dictionary entry (as per the ChatGPT prompt) or the lead section of a Wikipedia article. It is well known that the output of LLMs tends to have fewer errors when it draws from information that is amply present in their training data (see e.g. our previous coverage of a paper that, for this reason, called for assessing the factual accuracy of LLM output on a benchmark that specifically includes lesser-known "tail topics"). Indeed, the authors of the present paper "manually checked the LLM-generated texts for factual errors and did not find any major mistakes," something that is well reported to not be the case for ChatGPT output in general. That said, it has similarly been claimed that Wikipedia, too, is less reliable on obscure topics. Also, the paper used the freely available version of ChatGPT (in its 23 March 2023 revision) which is based on the GPT 3.5 model, rather than the premium "ChatGPT Plus" version which, since March 2023, has been using the more powerful GPT-4 model (as does Microsoft's free Bing chatbot). GPT-4 has been found to have a significantly lower hallucination rate than GPT 3.5.
