Google won’t comment on a potentially massive leak of its search algorithm documentation
(www.theverge.com)
Can't wait for self-hosted web search to become better.
You mean hosting your own crawler/indexer? That doesn't really sound like a thing you could do cost-effectively.
Surprisingly, it's very doable: it requires only basic technical knowledge and relatively minimal computing resources (it runs in the background on your computer).
https://yacy.net/ (also on GitHub)
I have a Tampermonkey script that sends YaCy to crawl any website I visit, and it keeps a reasonably good index of the visited sites for personal use. Combine YaCy with ~300 GB of Kiwix databases, add SearXNG as a frontend, and you have a pretty strong self-hosted search engine.
Of course, you need to supplement your searches with other search engines, as YaCy does not crawl the whole web, just what you tell it to.
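For anyone curious what "send YaCy to crawl what I visit" could look like, here is a rough sketch, not the actual userscript. It only builds the request URL for a local YaCy instance. The port 8090, the `Crawler_p.html` endpoint, and the parameter names are assumptions based on a default YaCy install, so check them against your own instance (that page is admin-protected, so a real request also needs credentials). The function name `build_crawl_request` is made up for illustration.

```python
from urllib.parse import urlencode

# Assumed default address of a local YaCy instance; adjust to your setup.
YACY_BASE = "http://localhost:8090"


def build_crawl_request(page_url: str, yacy_base: str = YACY_BASE, depth: int = 0) -> str:
    """Build a URL that asks YaCy to start a crawl of a single page.

    The endpoint and parameter names below are assumptions about YaCy's
    crawl-start page, not a verified API reference.
    """
    params = {
        "crawlingMode": "url",         # crawl starting from an explicit URL
        "crawlingURL": page_url,       # the page that was just visited
        "crawlingDepth": str(depth),   # 0 = only this page, no followed links
        "crawlingstart": "1",          # flag that triggers the crawl start
    }
    return f"{yacy_base.rstrip('/')}/Crawler_p.html?{urlencode(params)}"


if __name__ == "__main__":
    # Print the request a userscript or cron job would send for one page.
    print(build_crawl_request("https://example.org/some/article"))
```

Keeping the depth at 0 means only the visited page gets indexed, which fits the "index what I actually read" approach. Raising it makes YaCy follow links and grows the index much faster.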
I encourage anyone who's even slightly interested in this stuff to try YaCy. It's an ancient piece of software, but it still works very well and is not an abandoned project yet!
--
I personally use YaCy mostly in private mode, but it does have the distributed network as well.
Yeah, I guess the P2P component sort of solves part of the issue I was imagining by distributing indexes and crawling. I was thinking that people were trying to run all of Google on a Raspberry Pi at home.