Any “small-web” search engines?

dch82@lemmy.zip · 6 months ago


𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍 · 6 months ago

I’m designing off the top of my head, but I think you could do it with a DHT, or even just steal some distributed ledger algorithm from a blockchain. Or you could develop a distributed skip tree. But you’re right: any sort of distributed query is going to have possibly unacceptable latency. So you might, like Bitcoin, distribute the index itself to participants (even though the index could be large), but federate the indexing operation so that, rather than a dozen different search engine crawlers hitting each web site, you’d have one or two crawlers per site feeding the shared index.
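To make the "one or two crawlers per site" idea a bit more concrete, here's a rough sketch of how participants could agree on who crawls what without a coordinator. It uses rendezvous (highest-random-weight) hashing, which is my own choice of technique and not something from the thread; the crawler names and the `crawlers_for` helper are hypothetical. Every node can compute the same assignment locally, and dropping a crawler only reassigns the sites it was responsible for.

```python
import hashlib
from urllib.parse import urlsplit


def _score(node: str, site: str) -> int:
    # Deterministic pseudo-random weight for a (node, site) pair.
    digest = hashlib.sha256(f"{node}|{site}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def crawlers_for(url: str, nodes: list[str], k: int = 2) -> list[str]:
    """Pick the k crawler nodes responsible for the site hosting `url`.

    Rendezvous hashing: rank every node by its weight for this site and
    take the top k. All participants get the same answer from the same
    node list, so each site is crawled by k nodes instead of by everyone.
    """
    site = urlsplit(url).hostname or url  # assign per host, not per page
    ranked = sorted(nodes, key=lambda n: _score(n, site), reverse=True)
    return ranked[:k]


# Hypothetical crawler pool; real participants would learn this list
# from whatever membership mechanism the federation uses.
nodes = ["crawler-a.example", "crawler-b.example",
         "crawler-c.example", "crawler-d.example"]
print(crawlers_for("https://blog.example.org/post/1", nodes))
```

Because the weight depends only on the node and the host, every page on a given site lands on the same one or two crawlers, which is what keeps the load on that site down.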

Distributed search engines have existed for over a decade. Several solutions for distributed Lucene clusters exist (Solr, Katta, Elasticsearch, O2), and while they’re mostly designed to run on a LAN where the latency between nodes is small, I don’t think it’s impossible to imagine a fairly low-latency distributed, replicated index in which each node has a small set of peer nodes that, together, encompass the entire index. No two instances would have the same set of peers, but the combined index would be eventually consistent.
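Here's a toy illustration of the "eventually consistent" part. It's a simplification I'm assuming for the sake of the example: it fully replicates a small inverted index instead of sharding it, and it only ever adds postings. Because merging is plain set union (a grow-only set CRDT), replicas that only talk to their own few peers still converge, no matter what order the merges happen in. The `IndexReplica` class and `gossip_round` function are hypothetical names, not part of any of the projects above.

```python
from collections import defaultdict


class IndexReplica:
    """One node's copy of an inverted index: term -> set of URLs.

    Postings only grow and merge by set union, so merging is commutative,
    associative and idempotent; replicas converge once every posting has
    propagated along some chain of peers.
    """

    def __init__(self, name: str):
        self.name = name
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, term: str, url: str) -> None:
        self.postings[term].add(url)

    def merge(self, other: "IndexReplica") -> None:
        # Anti-entropy: pull every posting the peer has that we lack.
        for term, urls in other.postings.items():
            self.postings[term] |= urls

    def snapshot(self) -> dict[str, frozenset[str]]:
        return {t: frozenset(u) for t, u in self.postings.items()}


def gossip_round(replicas: list[IndexReplica],
                 peers: dict[str, list[IndexReplica]]) -> None:
    # Each replica merges state from its own small set of peers only.
    for r in replicas:
        for p in peers[r.name]:
            r.merge(p)


# Four nodes in a ring: each one only ever talks to a single neighbour.
a, b, c, d = (IndexReplica(n) for n in "abcd")
a.add("lemmy", "https://join-lemmy.org/")
c.add("smallweb", "https://example.org/smallweb")
replicas = [a, b, c, d]
peers = {"a": [b], "b": [c], "c": [d], "d": [a]}
for _ in range(3):
    gossip_round(replicas, peers)
print(all(r.snapshot() == a.snapshot() for r in replicas))
```

Deletions and re-crawls are where this gets harder in practice (you'd need tombstones or versioned postings), which the sketch deliberately ignores.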

Again, I’m thinking more about federating and distributing the index-building, so that web sites aren’t hammered by search engine crawlers, which make up 80% of their traffic. Federating and distributing the query mechanism is a harder problem, but there’s a lot of existing R&D in this area, and there are technologies that could be borrowed from other domains (the aforementioned DHT and distributed ledger algorithms).
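For the DHT route, the core idea is that a search term hashes to a point in the same ID space as the nodes, and the nodes "closest" to that point hold its posting list. A rough sketch, assuming Kademlia-style XOR distance over SHA-1 IDs; the node names and `responsible_nodes` helper are hypothetical. A real Kademlia node would find these peers iteratively through its routing table rather than scanning a full node list like this, so this only shows the placement rule, not the lookup protocol or its latency.

```python
import hashlib


def key_id(text: str) -> int:
    """Map a node name or search term into a shared 160-bit ID space."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big")


def responsible_nodes(term: str, node_names: list[str], k: int = 3) -> list[str]:
    """Return the k nodes whose IDs are closest to the term by XOR distance.

    Indexers would publish postings for `term` to these nodes, and a
    searcher would send its query for `term` to the same nodes.
    """
    target = key_id(term.lower())  # normalise so "Lemmy" and "lemmy" agree
    return sorted(node_names, key=lambda n: key_id(n) ^ target)[:k]


# Hypothetical participants in the shared index.
nodes = [f"node-{i}.example" for i in range(10)]
print(responsible_nodes("smallweb", nodes))
```

A multi-word query would then mean one such lookup per term plus an intersection of the returned posting lists, and every one of those lookups is a network round trip, which is where the latency concern comes from.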