One of the most unforgivable things about reddit is how pathetic the search engine is, considering the amount of free, top-notch information captured there. You need google +reddit to get at it. What can we do to make federated alternatives self-searchable?
This is going to be even worse than reddit search, unfortunately. There’s no easy way to make a search like this scale, even for the small number of instances we know about. Considering there are tons of instances out there and there will probably be more in the future, these problems are going to crop up a lot more. Searching one centralized location is actually much easier, regardless of how the reddit search ended up being implemented.
It does become very fragmented. A post on my single-user server is going to be low down the rankings compared to the same post on a subreddit with the weight of the reddit domain name behind it. I’m also not entirely sure if/how content here gets indexed, especially when it appears under different federated domains. Content discovery is very different in a distributed world.
Don’t know how to help, but I agree on how important search is, and it might be even harder to do given federation.
Also upvote for firefly user name
Discord, for example, means all useful information is captured by discord, never to be searched by plebs. IRC is usually ephemeral. Most web search has been diluted by SEO and content farms to the point of uselessness. Perhaps we can think about next-gen search right now. A point of hope is things like gigabrain, which, it would seem, use LLMs to ‘cut through the noise’ but also summarize and collate. That seems like a useful way forward if distributed. Happy to look into it myself, but would like to hear others’ input. (Pleasantly, people were commenting before I finished.)
Eventually I hope lemmy.directory will be great for this purpose. It’s a Lemmy instance configured to pick up every Lemmy community it can find.
I work for a small company that runs a website with lots of information and our search has always sucked. We tried several tweaks and free solutions - the final decision was to pay for search which is what we did and it is awesome now, but expensive. A major company like Reddit should be able to figure it out, but search is harder than most people realize. Google just makes it look easy.
In the past I normally used Pushshift to search Reddit because the built-in search engine was so poor. I think it was only very recently that they finally added comment searching.
Simplest implementation is that an instance searches its own content while sending requests to federated instances and merging their results in with its own based on whatever method the instance admins want (whether it puts its own results at the top, or treats them as one set, or whatever). That could cause a lot of traffic and has a load of latency while your search spreads out hop by hop, to the instances that yours is federated with, to the ones they’re federated with, etc. Plus you’d need a mechanism to stop instances from sending a search to an instance that’s already got it, to avoid hammering instances that have multiple federation paths to yours. Not an easy problem.
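A rough sketch of that fan-out, in case it helps make the idea concrete. This is not how Lemmy works; `Instance`, `seen_queries`, `query_id` and the `hops` limit are all names made up for illustration, and the "network" here is just in-process method calls. Each search carries an ID, and an instance that has already handled that ID answers with nothing, which is one way to stop instances with multiple federation paths from being hammered.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for a federated instance (not a real Lemmy API)."""
    name: str
    posts: list[str]
    peers: list["Instance"] = field(default_factory=list)
    seen_queries: set[str] = field(default_factory=set)

    def local_search(self, term):
        # Naive substring match over this instance's own posts.
        return [(self.name, p) for p in self.posts if term.lower() in p.lower()]

    def search(self, term, query_id=None, hops=2):
        # The originating instance mints the query ID; peers reuse it.
        query_id = query_id or uuid.uuid4().hex
        if query_id in self.seen_queries:
            return []  # already handled via another federation path
        self.seen_queries.add(query_id)

        results = self.local_search(term)
        if hops > 0:
            # Forward hop by hop, spending one hop per level.
            for peer in self.peers:
                results.extend(peer.search(term, query_id, hops - 1))
        return results


a = Instance("a", ["Rust search tips"])
b = Instance("b", ["Rust on the fediverse"])
c = Instance("c", ["Cooking thread", "Learning Rust"])
# Fully connected, so every instance is reachable by more than one path.
a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]

for origin, post in a.search("rust"):
    print(origin, post)
```

One caveat: because this walks depth-first, an instance can first be reached along a longer path with fewer hops left, so it may forward less far than it would have along the shortest path. A real version would also do the forwarding concurrently and with timeouts, which is where the traffic and latency concerns come in.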
You might be able to do some kind of index publication where an instance publishes the most notable posts for other instances to include in their indexes, so that when you search it could show you results from among hot posts elsewhere in the fediverse - not an exhaustive list, but a search within posts that are getting attention.
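Something like this is what I have in mind for index publication, as a minimal sketch. `Post`, `publish_digest`, `build_index` and the `limit` cutoff are hypothetical names, and "most notable" is just the vote score here. Each instance publishes a short digest, and a search runs over local posts plus whatever digests have been pulled in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    instance: str
    title: str
    score: int  # assumed notability signal, e.g. net votes


def publish_digest(posts, limit=2):
    """Return the instance's top posts by score, for other instances to index."""
    return sorted(posts, key=lambda p: p.score, reverse=True)[:limit]


def build_index(local_posts, remote_digests):
    """Combine all local posts with the digests received from other instances."""
    index = list(local_posts)
    for digest in remote_digests:
        index.extend(digest)
    return index


def search(index, term):
    """Substring search over the combined index, highest score first."""
    term = term.lower()
    hits = [p for p in index if term in p.title.lower()]
    return sorted(hits, key=lambda p: p.score, reverse=True)


local = [Post("home", "Search on small instances", 3)]
remote = [
    Post("big", "Federated search ideas", 120),
    Post("big", "Search is hard", 45),
    Post("big", "Old search complaint", 1),
]
index = build_index(local, [publish_digest(remote)])
for post in search(index, "search"):
    print(post.instance, post.score, post.title)
```

The low-scoring remote post never makes it into the index, which is the trade-off: no network traffic at search time, but the results are not exhaustive.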
There’s also other stuff I’d be tempted to experiment with, like using some kind of TF-IDF ranking to choose what counts as “most notable”, rather than just activity or view count, so that posts that are particularly relevant to certain topics could be publicised. An instance could even choose to filter that. For example, an instance that chooses to focus on tech topics could publicise highly relevant tech posts but filter out politics keywords even when a post gets high relevance scores, so that political discussion on that instance is less visible, even when searched for.
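As a sketch of the TF-IDF idea with a keyword filter on top: this is one possible reading, where a post's notability is the summed TF-IDF weight of a list of topic terms the instance cares about. The function names, the `+ 1.0` smoothing on the IDF, and the `topic_terms` / `blocked_terms` parameters are all assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_scores(posts, topic_terms):
    """Score each post by the summed TF-IDF weight of the topic terms."""
    docs = [Counter(tokenize(p)) for p in posts]
    n = len(docs)
    scores = []
    for doc in docs:
        total = sum(doc.values()) or 1
        score = 0.0
        for term in topic_terms:
            df = sum(1 for d in docs if term in d)  # posts containing the term
            if df == 0:
                continue
            idf = math.log(n / df) + 1.0  # smoothed so common terms still count
            score += (doc[term] / total) * idf
        scores.append(score)
    return scores


def notable_posts(posts, topic_terms, blocked_terms=(), limit=2):
    """Pick the most topic-relevant posts, skipping any with blocked keywords."""
    blocked = set(blocked_terms)
    scored = [
        (score, post)
        for score, post in zip(tfidf_scores(posts, topic_terms), posts)
        if score > 0 and not blocked & set(tokenize(post))
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in scored[:limit]]


posts = [
    "Rust compiler internals and the borrow checker",
    "Election results and a rust belt compiler factory",
    "Cat pictures thread",
]
print(notable_posts(posts, ["rust", "compiler"], blocked_terms=["election"]))
```

The second post scores well on the tech terms but is dropped by the filter, so it would not be publicised. A plain keyword blocklist is crude, and a real instance would probably want something smarter.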
Thank you for applying solid thought. What there would you consider actionable? As in, what could likely be coded (for free)?
Any of that could be done; there are some parts that are more challenging, but there are certainly harder things that have been solved by open-source software. I know almost nothing about how Lemmy’s innards are built though, so I couldn’t hazard a guess as to how much effort any of it would take. Some of it could possibly be achieved through separate services that you could host alongside a Lemmy instance, or entirely on their own, while other parts would really work best as features within Lemmy’s own codebase.