This is signal detection theory combined with an arms race that keeps the problem hard. You cannot block scrapers without blocking people, and you cannot inconvenience bots without also inconveniencing readers. You might figure something clever out temporarily, but eventually this truism will resurface. Excuse me while I solve a few more captchas.
Time to start hosting Trojans on your website
The internet as we know it is dead; we just need a few more years to realise it. And I'm afraid that telecommunications will go the same way once no one can trust that anyone is who they say they are anymore.
Thanks, great site! 😊
You're welcome.
I believe I found it originally via the "distribuverse"... specifically, ZeroNet.
They block VPN exit nodes. Why bother hosting a web site if you don't want anyone to read your content?
Fuck that noise. My privacy is more important to me than your blog.
A problem with this approach was that many readers use VPNs and other proxies that change IP addresses virtually every time they use them. For that reason and because I believe in protecting every Internet user's privacy as much as possible, I wanted a way of immediately unblocking visitors to my website without them having to reveal personal information like names and email addresses.
I recently spent a few weeks on a new idea for solving this problem. With some help from two knowledgeable users on Blue Dwarf, I came up with a workable approach two weeks ago. So far, it looks like it works well enough.
To summarize this method, when a blocked visitor reaches my custom 403 error page, he is asked whether he would like to be unblocked by having his IP address added to the website's white list. If he follows that hypertext link, he is sent to the robot test page. If he answers the robot test question correctly, his IP address is automatically added to the white list. He doesn't need to enter it or even know what it is. If he fails the test, he is told to click on the back button in his browser and try again. After he has passed the robot test, Nginx is commanded to reload its configuration file (PHP command: shell_exec("sudo nginx -s reload");), which causes it to immediately accept the new whitelist entry, and he is granted immediate access.
He is then allowed to visit cheapskatesguide as often as he likes for as long as he continues to use the same IP address. If he switches IP addresses in the future, he has about a one in twenty chance of needing to pass the robot test again each time he switches IP addresses. My hope is that visitors who use proxies will only have to pass the test a few times a year. As the whitelist grows, I suppose that frequency may decrease. Of course, it will reach a non-zero equilibrium point that depends on the churn in the IP addresses being used by commercial web-hosting companies. In a few years, I may have a better idea of where that equilibrium point is.
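For anyone curious what that flow might look like in code, here is a rough Python sketch. It is not the author's implementation (the blog uses PHP). The function names, the whitelist file path, and the idea that the whitelist is a file of `allow <ip>;` lines pulled into the Nginx config with an `include` are all assumptions made for illustration. Only the `sudo nginx -s reload` step comes from the quoted text.

```python
import ipaddress
import subprocess
from pathlib import Path


def reload_nginx():
    # Same command the blog runs through PHP's shell_exec.
    subprocess.run(["sudo", "nginx", "-s", "reload"], check=True)


def add_to_whitelist(ip_text, whitelist_path, reload=reload_nginx):
    """Append an nginx `allow` rule for ip_text. Returns True if a rule was added."""
    # Validate before writing anything into a config file; raises ValueError if malformed.
    ip = ipaddress.ip_address(ip_text)
    rule = f"allow {ip};"
    path = Path(whitelist_path)  # hypothetical file, assumed to be included by nginx.conf
    existing = path.read_text().splitlines() if path.exists() else []
    if rule in existing:
        return False  # already whitelisted, no reload needed
    with path.open("a") as fh:
        fh.write(rule + "\n")
    reload()  # make Nginx pick up the new entry immediately
    return True


def handle_robot_test(answer, expected, client_ip, whitelist_path, reload=reload_nginx):
    """Whitelist client_ip only if the robot test answer matches. Returns True on a pass."""
    if answer.strip().lower() != expected.strip().lower():
        return False  # the visitor is told to go back and try again
    add_to_whitelist(client_ip, whitelist_path, reload)
    return True
```

The visitor never types the address: the server would pass in whatever IP the request came from. Passing `reload` as a parameter is just a convenience so the logic can be exercised without sudo rights or a running Nginx.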
They block VPN exit nodes. Why bother hosting a web site if you don’t want anyone to read your content?
Fuck that noise. My privacy is more important to me than your blog.
It's a minimalist private blog that sets no 3rd party cookies and loads no 3rd party resources. I presume that alleviates your concerns? 😜
The admin could use a CDN and not worry about it, if it's just static content.
I believe using a CDN would defeat the author's goal of not being reliant on third-party service providers.
And filtering malicious traffic is more important to me than you visiting my services, so I guess that makes us even :-)
You know how popular VPNs are, right? And how they improve privacy and security for people who use them? And you're blocking anyone who's exercising a basic privacy right?
It's not an ethically sound position.
You had me until the "ethically sound position" part.
You're saying that Joe Blogger is acting unethically because he doesn't allow VPN users to visit his site. C'mon, brother.
You're saying targeting people who are taking steps to improve their privacy and security is ethical? Or do you just believe that there's no such thing as ethics in CIS?
You're putting words in my mouth. I didn't say that. Targeting sounds like specifically doing it with an agenda.
What you're saying is the equivalent of being offended that you can't bring guns onto someone's private property because they don't want you to, period. "It is not ethical that you forbid me from exercising my constitutional right to bear arms in your house. How dare you not allow me to put my AK-47 on your kitchen counter!"
Nope. I said that if someone doesn't want to deal with VPN users because it's more hassle than it's worth (e.g. bots), then so be it. Joe Blogger may get 20 visitors a month instead of 24. Oh the horror!
I am a huge advocate of privacy laws. But if Joe Blogger doesn't allow me on his personal website, eh. I might try archive.org.