Federation Lag-o-meter

hawkwind@lemmy.management · 2 years ago

Federation Lag-o-meter

BadlyHunt@lemmy.pwzle.com · 2 years ago

Awesome work! Added my instance!

aleph@lemm.ee · 2 years ago

When I saw the bar looking like the Burj Khalifa, I assumed it was .world instead of .ml. Interesting.

Props to Ruud@lemmy.world for dealing admirably with the Rexxit hug of death.

nsfw_alt_2023@lemmynsfw.com · edit-2 2 years ago

I expect JSON parsing is a huge overhead in the fediverse. I work on a SaaS that needs to do all its internal processing in under 10 ms, and serializing/deserializing ends up being a sizable chunk of server time. I saw a 40% reduction in runtime using simdjson for deserializing, and there is a Rust crate for it, but I haven’t had time to look over the Lemmy code.

Can anyone with an overloaded instance get on their command line and gather a decent flamegraph so the performance folks can aim optimizations in the right direction?

https://github.com/brendangregg/FlameGraph

CurlyWurlies4All@slrpnk.net · 2 years ago

Beehaw is currently doing the Burj

aleph@lemm.ee · 2 years ago

Yep, it seems completely different to when I last looked.

It seems everyone gets a turn at the top.

myofficialaccount@feddit.de · 2 years ago

Nice work! Maybe add feddit.de?

hawkwind@lemmy.management · edit-2 2 years ago

Fixed! The regex was not getting content from < 0.18.0 instances. Thanks!

EDIT: I was wrong. Something in feddit.de’s messages that I THOUGHT was a version thing must be a localization thing: a string in the JSON was breaking a regex. Regardless… fixed.
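
To illustrate how a string inside JSON can break a regex: the sketch below is not the actual pattern or payload (neither is shown in this thread). The activity body, the `name` field value and the `naive` pattern are all made up for the example. It shows a typical failure mode, where a pattern that assumes a value never contains an escaped quote cuts the string short, while a real JSON parser returns it intact.

```python
import json
import re

# Hypothetical activity body: ActivityPub-style field names, invented values.
raw = json.dumps({
    "type": "Create",
    "published": "2023-07-01T12:00:00Z",
    "object": {"name": 'Titel mit "Anführungszeichen" und ü'},
})

# Naive pattern (an assumption, not the dashboard's real regex): it treats the
# first double quote it meets as the end of the value, escaped or not.
naive = re.compile(r'"name":\s*"([^"]*)"')


def name_via_regex(body: str):
    """Extract the name with the naive pattern; stops at the first escaped quote."""
    match = naive.search(body)
    return match.group(1) if match else None


def name_via_json(body: str):
    """Extract the name by parsing the body, which handles escapes and non-ASCII."""
    return json.loads(body)["object"]["name"]


if __name__ == "__main__":
    print("regex:", name_via_regex(raw))
    print("json: ", name_via_json(raw))
```

Parsing first and reading fields from the resulting dict avoids depending on how a given instance escapes or localizes its strings.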

myofficialaccount@feddit.de · 2 years ago

Awesome, thank you :-)

maegul (he/they)@lemmy.ml · 2 years ago

Oooohhh … Nice!! I’m repeatedly impressed at how many hackers are going ahead and just getting some stuff done here!!

Questions/thoughts:

What instance is used as a reference for the delay? One you self-host (lemmy.management)?
Sooo … what’s the deal with lemmy.ml … that seems to have gone beyond lag and is basically falling over … seems like the devs have neglected their own instance’s health?
What’s that Redash? Is it a plotly thing or some other product that just uses their graphing library? How have you found it?

hawkwind@lemmy.management · 2 years ago

What instance is used as a reference for the delay? One you self-host (lemmy.management)?

Yes. lemmy.management. It purposefully subscribes to as many communities as possible (via automation). This doesn’t correct for network lag, but the idea was to capture the “federation” lag. There’s no code I’m aware of that allows admins to prioritize outbound federation traffic. I could be wrong though.
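
The thread doesn't spell out how the lag number is computed. A minimal sketch of one plausible definition, assuming lag is the time an activity arrived at the reference instance minus the activity's ActivityPub `published` timestamp (the function name and the sample values are hypothetical):

```python
from datetime import datetime, timezone


def federation_lag_seconds(published: str, received_at: datetime) -> float:
    """Seconds between an activity's `published` time and its arrival.

    `received_at` must be timezone-aware. Network latency is included in the
    result, matching the caveat that this does not correct for network lag.
    """
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
    published_at = datetime.fromisoformat(published.replace("Z", "+00:00"))
    if published_at.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        published_at = published_at.replace(tzinfo=timezone.utc)
    return (received_at - published_at).total_seconds()


if __name__ == "__main__":
    arrived = datetime(2023, 7, 1, 12, 0, 42, tzinfo=timezone.utc)
    print(federation_lag_seconds("2023-07-01T12:00:00Z", arrived))
```

Clock skew between the sending and receiving servers ends up in the number as well, so small values should be read loosely.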

Sooo … what’s the deal with lemmy.ml … that seems to have gone beyond lag and is basically falling over … seems like the devs have neglected their own instance’s health?

I just collect the data.

What’s that Redash? Is it a plotly thing or some other product that just uses their graphing library? How have you found it?

https://redash.io I don’t remember how I found it. Probably an “awesome” list on github.

tenth@lemmy.world · 2 years ago

Great idea. I was trying to figure out if it was lemmy.world trying to deal with new users or a bug with the Memmy app that caused random errors.

Is it possible to have the lag metrics by instance in a table format? It’s so hard to view your site on mobile.

hawkwind@lemmy.management · 2 years ago

I didn’t even load it on mobile. I will check it out tonight and maybe just create a separate “mobile friendly” dashboard.

Wailzy@lemmy.world · 2 years ago

Not the person you’re replying to, but I didn’t find it awful on mobile. The zoom by dragging worked well, as did the double tap to view the whole dataset.

For a quick browse I wasn’t frustrated at all and found the information I wanted in a short amount of time!

adrian@kbin.social · 2 years ago

This looks great. Is there any chance that this could be extended to include Kbin as well, since those instances federate with Lemmy, too?

hawkwind@lemmy.management · 2 years ago

I am actually working on that! Stay tuned. Like days though, don’t get too excited. :)

adrian@kbin.social · 2 years ago

Aye aye. I’m mildly excited.

hawkwind@lemmy.management · 2 years ago

kbin posts DO show up in the details table, but you would need to know the IP they are coming from. They don’t include their instance host name in the header, which is why kbin isn’t in the table and the instance is null for some IPs. Also, I don’t scrape and subscribe to kbin magazines like I do for Lemmy ATM, so the traffic will be low, probably just a few from kbin.social.
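
One possible fallback when the header carries no host name, sketched under the assumption that the request body is a standard ActivityPub activity: take the host from the `actor` field, which is a URL (or an object with an `id` URL). Whether the dashboard has access to the body is not stated in the thread, and `instance_from_activity` is a hypothetical helper.

```python
from urllib.parse import urlparse


def instance_from_activity(activity: dict):
    """Return the host name of the activity's actor, or None if unavailable."""
    actor = activity.get("actor")
    if isinstance(actor, dict):
        # ActivityPub allows the actor to be embedded as an object with an id.
        actor = actor.get("id")
    if not isinstance(actor, str):
        return None
    return urlparse(actor).hostname


if __name__ == "__main__":
    print(instance_from_activity({"actor": "https://kbin.social/u/example"}))
```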

thegiddystitcher@lemm.ee · 2 years ago

It’ll be interesting to see how this changes through the day! I know .world tends to slow down later in the day when the US contingent is getting going.

(also, yay lemm.ee)

FakeJake@fr3diver.se · 2 years ago

This looks really good.

As an admin of a small kbin instance, I’ll be keeping an eye on updates from you as this will be very handy!

UncleStewart@lemmy.world · 2 years ago

On mobile, when touching the “Federation Lag-o-meter (now - 1h)” statistics, the page is hard to scroll. Other than this the page is gold

cornflour@lemmy.ca · 2 years ago

This is really cool! Would it be possible to grab this data as JSON, CSV or some other equivalent format? I’m working on making my own Lemmy client and this would be very helpful to be able to display, I think.

hawkwind@lemmy.management · 2 years ago

Should already be able to:

https://redash.io/help/user-guide/integrations-and-api/api

For example: https://aftershock.lemmy.management/api/queries/4/results

The API key for public users is the same as the dashboard slug: oT7pdcoeHWccpvZCNmTpJKoGZND8ZdRO3wDWpMug
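
A minimal sketch of pulling those results from a client, using only the standard library. It assumes the Redash conventions of a `.json` suffix on the results endpoint, an `api_key` query parameter, and rows nested under `query_result.data.rows`. Check the API docs linked above before relying on any of that. The column names in the sample payload are made up.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://aftershock.lemmy.management"


def results_url(query_id: int, api_key: str, fmt: str = "json") -> str:
    """Build the URL for a query's cached results in the given format."""
    query = urlencode({"api_key": api_key})
    return f"{BASE_URL}/api/queries/{query_id}/results.{fmt}?{query}"


def rows_from_payload(payload: dict) -> list:
    """Pull the row dicts out of a Redash-style results payload."""
    return payload["query_result"]["data"]["rows"]


def fetch_rows(query_id: int, api_key: str, timeout: float = 10.0) -> list:
    """Download and decode the rows for one query (performs a network call)."""
    with urlopen(results_url(query_id, api_key), timeout=timeout) as response:
        return rows_from_payload(json.load(response))


if __name__ == "__main__":
    # Offline demonstration with an invented payload; the real columns may differ.
    sample = {"query_result": {"data": {"rows": [{"instance": "lemm.ee", "lag": 3}]}}}
    print(results_url(4, "PUBLIC_API_KEY"))
    print(rows_from_payload(sample))
```

Calling `fetch_rows(4, key)` with the public key from this comment would hit the live dashboard, so cache the result rather than polling on every client refresh.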

Ranger@programming.dev · 2 years ago

The graph should drop the outlier, as it skews the results for every other instance and keeps the smaller numbers from showing up.

Or we should move to log scale so that it can be displayed correctly.
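
To show what a log scale buys here, a small sketch with made-up lag values (the instance names and numbers are invented, and `bar_widths` is a hypothetical helper, not anything from the dashboard). On a linear scale one huge value squeezes every other bar to zero width; on a log scale the small ones stay visible.

```python
import math


def bar_widths(lags: dict, width: int = 40, log: bool = False) -> dict:
    """Scale each lag to a bar width in characters, linearly or by log10(1 + x)."""
    def scale(value: float) -> float:
        return math.log10(1 + value) if log else value

    top = max(scale(v) for v in lags.values())
    if top == 0:
        return {name: 0 for name in lags}
    return {name: round(scale(v) / top * width) for name, v in lags.items()}


if __name__ == "__main__":
    # Invented sample: two healthy instances and one extreme outlier.
    sample = {"small.example": 2, "medium.example": 15, "outlier.example": 40000}
    for label, use_log in (("linear", False), ("log", True)):
        print(label)
        for name, bar in bar_widths(sample, log=use_log).items():
            print(f"  {name:16} {'#' * bar}")
```

Using `log10(1 + x)` rather than `log10(x)` keeps a lag of zero defined. Most charting tools expose the same idea as a logarithmic axis option.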

possum@lemmy.ml · 2 years ago

This is awesome! Hopefully it’ll help spread the load among instances. Definitely going to use this to see which instance to move to (and which to avoid)

hawkwind@lemmy.management · 2 years ago

Keep in mind this is a one-hour snapshot. I am working on a historical rating as well to give a better indication of overall long-term stability.