Hadn't heard of Project Gutenberg before, seems pretty neat!
I'd probably add repair.wiki to the list of things I'd archive, although some of that content is picture-heavy, so it wouldn't compress as easily as Wikipedia.
There was a project that lets you download Wikipedia and some other online resources into an easy-to-search, easy-to-navigate UI. I think it was called Kiwi something, but I can't remember. It was targeted at regions with poor internet coverage.
I feel sorry for whichever researchers are in charge of training and fine-tuning those models... ouch