People don’t realize how ephemeral information is. How much information from the internet do you think will survive 200 years from now?
On the one hand, what a tragedy. On the other hand, thank fuck.
It’s an interesting thought experiment. We could preserve specific data if we cared to. But as others have echoed, with dynamic content delivery systems, editable forum and social media posts, and in some cases, the ability to petition companies to delete your online persona… all of these mean that storing snapshots becomes a more complex problem.
So far we have storage media that are probably good for 100 years or so before the physical medium begins to degrade. We then have to ensure that the connections (physical plugs, protocols) are maintained or still available 100 years from now. Offline cold storage sites exist but aren’t storing information to preserve human history. Any data that’s been overwritten or lost to dead links on the web may be sitting on a tape in a warehouse somewhere, but unless you know where to look and have the right credentials, it might as well be lost to time.
Cached webpages were lowkey clutch. Helped with some Reddit threads where the posts I needed for tech help had been deleted.
While sucky, this feels inevitable.
LLMs and the massive wave of spam coming out right now make caching content way more expensive, and Google gains no value from it. Long-tail spam attacks are already strangling Google lately.
I think the only way to run a search engine in the mid-2020s is to download the data, process the page in memory, extract metadata and embeddings, and store only those. There’s no value in storing the rendered page offline for later analysis since you’re likely not doing that later analysis.
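A rough sketch of that "process in memory, keep only metadata and embeddings" idea, in Python. This is not how Google or any real search engine does it. The names `index_page` and `toy_embedding` are made up for illustration, and the "embedding" is just a hashed bag-of-words vector standing in for whatever model a real system would use. The page HTML is assumed to be already downloaded and passed in as a string.

```python
import hashlib
import math
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the <title> and visible text, skipping script/style bodies."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def toy_embedding(text, dim=64):
    """Hypothetical stand-in for a real embedding model:
    hash each token into one of `dim` buckets, then L2-normalize."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def index_page(url, html):
    """Parse the page in memory and return only what gets stored.
    The HTML itself is not part of the returned record."""
    parser = _TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks)
    return {
        "url": url,
        "title": parser.title,
        "content_hash": hashlib.sha256(html.encode("utf-8")).hexdigest(),
        "embedding": toy_embedding(text),
    }
```

The record that comes back has the URL, the title, a hash of the content, and a fixed-size vector. The rendered page is gone once `index_page` returns, which is exactly why there would be nothing left to serve as a cached copy.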
Internet Archive hopefully can fare better by being curated by humans and storing data infrequently when important, whereas Google needs to scan a lot of info frequently with nearly no human input.