PostgreSQL Optimizations

daq@lemmy.daqfx.com · 1 year ago (edited)


I never manually VACUUMed the DB; I just assumed it runs automatically at regular intervals. VACUUMing manually didn’t seem to make any difference, and after a few minutes of running on various tables it gave me the following error:

ERROR: could not resize shared memory segment "/PostgreSQL.1987530338" to 67128672 bytes: No space left on device

I’m not 100% sure where it ran out of space, but I’m assuming it was one of the configured buffers, since there was still plenty of space left on disk and in RAM. I didn’t notice any difference in iowait while it was running or after.
Yes, seeding is mostly inserts, but I see a roughly equal number of selects. I did increase shared_buffers and effective_cache_size with no effect.
https://ctxt.io/2/AABQciw3FA https://ctxt.io/2/AABQTprTEg https://ctxt.io/2/AABQKqOaEg

I did install Prometheus with PG exporter and Grafana. I’m not a DB expert and certainly not a PostgreSQL expert, but I don’t see anything that would indicate an issue. Anything specific you can suggest that I should focus on?

Thanks for all the suggestions!

bahmanm@lemmy.ml · 1 year ago

could not resize shared memory

That means too many memory-hungry parallel maintenance workers are using shared memory at the same time (see max_parallel_maintenance_workers and maintenance_work_mem).
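To narrow down where the space actually ran out, here is a minimal sketch. It assumes PostgreSQL is using POSIX dynamic shared memory backed by /dev/shm, which is the usual default on Linux but may not match your setup. The helper name `shm_headroom` is made up for illustration.

```python
import shutil


def shm_headroom(requested_bytes, path="/dev/shm"):
    """Return (free_bytes, fits) for the filesystem that backs `path`.

    Assumption: dynamic shared memory segments live under /dev/shm.
    Pass a different path if your system is configured otherwise.
    """
    usage = shutil.disk_usage(path)
    return usage.free, usage.free >= requested_bytes
```

Calling `shm_headroom(67128672)` on the database host (or inside the container, if there is one) compares the size from the error message against the space currently free on that filesystem.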

VACUUMing is a very important part of how PG works; can you try setting max_parallel_maintenance_workers to 1 or even 0 (which disables parallelism altogether) and retry the experiment?
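A rough sketch of that experiment: the helper below only builds the SQL text to paste into psql, so the setting applies to that one session and nothing changes in postgresql.conf. The function name and the table name `comment` are placeholders, and it assumes an unqualified table name.

```python
def vacuum_statements(table, workers=0):
    """Build session-level SQL for a VACUUM with limited parallelism.

    `table` is assumed to be an unqualified table name; it is
    double-quoted so unusual characters cannot break the statement.
    """
    if workers < 0:
        raise ValueError("workers must be 0 or greater")
    quoted = '"' + table.replace('"', '""') + '"'
    return [
        # 0 disables parallel maintenance workers for this session.
        f"SET max_parallel_maintenance_workers = {int(workers)};",
        f"VACUUM (VERBOSE, ANALYZE) {quoted};",
    ]


for statement in vacuum_statements("comment"):
    print(statement)
```

If the error disappears with 0 workers, raising the value one step at a time should show roughly where the shared memory limit sits.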

I did increase shared_buffers and effective_cache_size with no effect.

That probably rules out the theory of thrashed indices.

https://ctxt.io/2/AABQciw3FA https://ctxt.io/2/AABQTprTEg https://ctxt.io/2/AABQKqOaEg

Since those stats are cumulative, it’s hard to tell anything w/o knowing when the SELECT was run. It’d be very helpful if you could run those queries a few times at 1-minute intervals and share the output.
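To show why the interval matters, here is a small sketch that turns two snapshots of cumulative counters into per-second rates. The column names are borrowed from pg_stat_user_tables purely as an illustration and the numbers are invented; plug in whatever your queries return.

```python
def counter_rates(earlier, later, seconds):
    """Per-second rate for every counter present in both snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    rates = {}
    for name, new_value in later.items():
        if name in earlier:
            # A negative rate would suggest the stats were reset in between.
            rates[name] = (new_value - earlier[name]) / seconds
    return rates


# Hypothetical snapshots taken 60 seconds apart.
first = {"seq_scan": 100, "n_tup_ins": 5000}
second = {"seq_scan": 160, "n_tup_ins": 11000}
print(counter_rates(first, second, 60))
```

Rates like these are comparable across runs and workloads, which the raw cumulative totals are not.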

I did install Prometheus with PG exporter and Grafana…Anything specific you can suggest that I should focus on?

I’d start w/ the 3 tables I mentioned in the previous point and try to find anomalies esp under different workloads. The rest, I’m afraid, is going to be a bit of an investigation and detective work.

If you like, you can give me access to the Grafana dashboard so I can take a look and we can take it from there. It’s going to be totally free of charge of course as I am quite interested in your problem: it’s both a challenge for me and helping a fellow Lemmy user. The only thing I ask is that we report back the results and solution here so that others can benefit from the work.

daq@lemmy.daqfx.com · 1 year ago

If you like, you can give me access to the Grafana dashboard so I can take a look and we can take it from there. It’s going to be totally free of charge of course as I am quite interested in your problem: it’s both a challenge for me and helping a fellow Lemmy user. The only thing I ask is that we report back the results and solution here so that others can benefit from the work.

No problem. PM me an IP (v4 or v6) or an email address (disposable is fine) and I’ll reply with a link to access Grafana, with the above added to the allow list.