this post was submitted on 19 Sep 2024
Technology
At least in theory you could still do NLP on online sources, but the sheer amount of work needed to make sure you've filtered out the bots makes it infeasible.
Even if I like the idea behind generative A"I", and have found some use cases for it... yeah, I can't help but sympathise with Speer. Those businesses are collecting our data for free, without consent, so they can sell us a product built on it.
Not just that, but the growing number of sites that block the tools they use, or deploy countermeasures against them, also adds to the work and makes it harder.
Several years ago, it would have been easy and cheap to noodle up a quick Twitter or Reddit bot to churn through posts and spit out the posts on the other side. These days, you need to pay for that, and in some cases, pay quite a lot.
X (formerly known as Twitter), for example, wants to charge $100/month, and Reddit wants $0.24 per 1,000 API calls.
You can scrape, of course, but that risks getting you banned, assuming you don't run into barriers first. The website formerly known as Twitter, for example, no longer lets you see parent tweets or replies unless you're logged in.