this post was submitted on 21 May 2025
579 points (98.8% liked)

Technology


Researchers published a massive database of more than 2 billion Discord messages that they say they scraped using Discord’s public API. The data was pulled from 3,167 servers and covers posts made between 2015 and 2024, the entire time Discord has been active.

Though the researchers claim they’ve anonymized the data, it’s hard to imagine anyone is comfortable with almost a decade of their Discord messages sitting in a public JSON file online. Separately, a different programmer released a Discord tool called "Searchcord" based on a different data set that shows non-anonymized chat histories.

(page 2) 33 comments
[–] [email protected] 3 points 2 weeks ago

If they were on OPEN servers, I doubt they cared that much.

[–] [email protected] 3 points 2 weeks ago (3 children)

I can't find this "public" JSON.

[–] [email protected] 2 points 2 weeks ago

I was hoping people would do this!!!

[–] [email protected] 1 point 1 week ago

I was hoping to play around with the dataset over the weekend to toy with some text-embedding techniques, but they’ve pulled the cord on the download links.

Anyone have a copy of the full archive they’re willing to share, or a magnet link?

[–] [email protected] 0 points 1 week ago

404? Another source, please? I don't trust them on this exact thing.
