Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ


⚓ Dedicated to the discussion of digital piracy, including ethical problems and legal advancements.


The New York Times tried to block the Internet Archive: another reason to value the latter (walledculture.org)

submitted 1 year ago by [email protected] to c/[email protected]


[+] [email protected] -83 points 1 year ago (37 children)

This is useful for pointing out when a news site is manipulating a narrative, but for other things, I think news sites should get the privacy they need to make stealth edits.

Like:

> More recently, the Times stealth-edited an article that originally listed “death” as one of six ways “you can still cancel your federal student loan debt.” Following the edit, the “death” section title was changed to a more opaque heading of “debt won’t carry on.”

This was just poor wording. No reason sites shouldn't have the peace of mind to change poor wording without being called out.

[–] [email protected] 19 points 1 year ago (1 children)

While I agree in theory, in practice it's hard to allow private wording and typo edits without also allowing more insidious changes, like pushing a certain narrative and then quietly changing words here and there to erase the evidence after most people have read it.

If news websites kept their own visible audit trail, much like Wikipedia, I could see the argument that the Internet Archive doesn't need to capture these articles immediately. Maybe capture could be delayed until a year after publication or somesuch, so the NYT could keep recent news behind its paywall without the Internet Archive sidestepping it. (While it's annoying that articles are paywalled, news sites do need to make money and pay for actual news reporters.)
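To make the audit-trail idea concrete, here is a minimal sketch of what a public revision record could look like, using Python's standard `difflib`. The `Revision` class, the `audit_diff` helper, and the timestamps are all hypothetical and only for illustration. The two heading strings come from the stealth edit quoted earlier in the thread. Nothing here reflects how the NYT or Wikipedia actually stores revisions.

```python
import difflib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Revision:
    """One published version of a piece of text (hypothetical structure)."""
    timestamp: datetime
    text: str


def audit_diff(old: Revision, new: Revision) -> list[str]:
    """Return a unified diff between two revisions, labeled by timestamp."""
    return list(
        difflib.unified_diff(
            old.text.splitlines(),
            new.text.splitlines(),
            fromfile=old.timestamp.isoformat(),
            tofile=new.timestamp.isoformat(),
            lineterm="",
        )
    )


# Timestamps are made up; the heading texts are the ones quoted in the thread.
original = Revision(datetime(2023, 1, 1, 9, 0, tzinfo=timezone.utc), "death")
edited = Revision(datetime(2023, 1, 1, 15, 0, tzinfo=timezone.utc), "debt won’t carry on")

for line in audit_diff(original, edited):
    print(line)
```

With a record like this published next to the article, a wording fix and a narrative change would both be visible, and readers could judge which is which.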

[+] [email protected] -7 points 1 year ago (1 children)

Yeah I'm surprised the archive hasn't worked out a deal with publishers simply to delay showing articles.

[–] [email protected] 14 points 1 year ago (1 children)

It exists, it's called a robots.txt file that the developers can put into place, and then bots like the webarchive crawler will ignore the content.

And therein lies the issue: if you place a robots.txt out for the content, all bots will ignore the content, including search engine indexers.
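That is the effect of a blanket rule. The robots.txt format also lets rules be grouped by `User-agent`, so here is a rough sketch of the difference using Python's standard `urllib.robotparser`. The bot names `ExampleArchiveBot` and `ExampleSearchBot`, the path, and the URL are made-up placeholders, not real crawler names. It only shows what a compliant crawler would conclude. It can't make any crawler comply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one blanket block, one scoped to a single (made-up) bot.
BLANKET_RULES = """\
User-agent: *
Disallow: /articles/
"""

SCOPED_RULES = """\
User-agent: ExampleArchiveBot
Disallow: /articles/

User-agent: *
Allow: /
"""

URL = "https://news.example/articles/story.html"  # placeholder URL


def allowed(rules: str, agent: str, url: str) -> bool:
    """Ask the standard-library parser whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


for name, rules in (("blanket", BLANKET_RULES), ("scoped", SCOPED_RULES)):
    for agent in ("ExampleArchiveBot", "ExampleSearchBot"):
        print(name, agent, allowed(rules, agent, URL))
```

Under the blanket rules both bots are refused. Under the scoped rules only the bot named in the first group is refused.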

So huge publishers want it both ways: they want to be indexed, but they don't want the content to be archived.

If the NYT is serious about not wanting its content on the webarchive but still wants humans to see it, the solution is simple: Put that content behind a login! But the NYT doesn't want to do that, since then they'll lose out on the ad revenue of having regular people load their website.

I think in the case of the article here though, the motivation is a bit more nefarious, in that the NYT et al simply don't want to be held accountable. So there's a choice to be had for them, either retain the privilege of being regarded as serious journalism, or act like a bunch of hacks that can't be relied upon.

[–] [email protected] 4 points 1 year ago

> It exists, it's called a robots.txt file that the developers can put into place, and then bots like the webarchive crawler will ignore the content.

the internet archive doesn't respect robots.txt:

> Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes.

the only way to stay out of the internet archive is to follow the process they created and hope they agree to remove you. or firewall them.

https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/
