Re: stats collector process high CPU utilization (Pgsql Performance forum, PostgreSQL category)
I wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Regarding temp tables, I'd think that the pgstat entries should be
>> getting dropped at some point in both releases. Maybe there's a bug
>> preventing that in 8.2?

> Hmmm ... I did rewrite the backend-side code for that just recently for
> performance reasons ... could I have broken it?

I did some testing with HEAD and verified that pgstat_vacuum_tabstat() still seems to do what it's supposed to, so that theory falls down.

Alvaro, could you send Benjamin your stat-file-dumper tool so we can get some more info? Alternatively, if Benjamin wants to send me a copy of his stats file (off-list), I'd be happy to take a look.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
| |||
Benjamin Minshall <minshall@intellicon.biz> writes:
> When I checked on the server this morning, the huge stats file has
> returned to a normal size. I set up a script to track CPU usage and
> stats file size, and it appears to have decreased from 90MB down to
> about 2MB over roughly 6 hours last night. The CPU usage of the stats
> collector also decreased accordingly.

> The application logs indicate that there was no variation in the
> workload over this time period, however the file size started to
> decrease soon after the nightly pg_dump backups completed. Coincidence
> perhaps?

Well, that's pretty interesting. What are your vacuuming arrangements for this installation? Could the drop in file size have coincided with VACUUM operations? Because the ultimate backstop against bloated stats files is pgstat_vacuum_tabstat(), which is run by VACUUM and arranges to clean out any entries that shouldn't be there anymore.

It's sounding like what you had was just transient bloat, in which case it might be useful to inquire whether anything out-of-the-ordinary had been done to the database right before the excessive-CPU-usage problem started.

regards, tom lane
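Benjamin's tracking script is not shown in the thread. A minimal sketch of the file-size half of such tracking, assuming a POSIX system, might look like the following. The function names are hypothetical, and the path to the stats file is left to the caller because the thread does not give it. CPU usage sampling is not covered here.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Return the size of the file at path in bytes, or -1 if stat() fails.
 * (Hypothetical helper; not part of PostgreSQL.) */
long long stats_file_size(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return (long long) st.st_size;
}

/* Append one "epoch-seconds size-in-bytes" sample to out.
 * A size of -1 means the file could not be examined. */
void log_stats_file_size(FILE *out, const char *path)
{
    fprintf(out, "%lld %lld\n", (long long) time(NULL), stats_file_size(path));
}
```

Calling log_stats_file_size() from a loop or a cron-driven program every minute or so would give a time series that can be lined up against VACUUM and pg_dump start times.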
| |||
Benjamin Minshall <minshall@intellicon.biz> writes:
> Tom Lane wrote:
>> It's sounding like what you had was just transient bloat, in which case
>> it might be useful to inquire whether anything out-of-the-ordinary had
>> been done to the database right before the excessive-CPU-usage problem
>> started.

> I don't believe that there was any unusual activity on the server, but I
> have set up some more detailed logging to hopefully identify a pattern
> if the problem resurfaces.

A further report led us to realize that 8.2.x in fact has a nasty bug here: the stats collector is supposed to dump its stats to a file at most every 500 milliseconds, but the code was actually waiting only 500 microseconds :-(. The larger the stats file, the more obvious this problem gets.

If you want to patch this before 8.2.4, try this...

Index: pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.140.2.2
diff -c -r1.140.2.2 pgstat.c
*** pgstat.c    26 Jan 2007 20:07:01 -0000    1.140.2.2
--- pgstat.c    1 Mar 2007 20:04:50 -0000
***************
*** 1689,1695 ****
      /* Preset the delay between status file writes */
      MemSet(&write_timeout, 0, sizeof(struct itimerval));
      write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
!     write_timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;

      /*
       * Read in an existing statistics stats file or initialize the stats to
--- 1689,1695 ----
      /* Preset the delay between status file writes */
      MemSet(&write_timeout, 0, sizeof(struct itimerval));
      write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
!     write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;

      /*
       * Read in an existing statistics stats file or initialize the stats to

regards, tom lane
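The unit mix-up in the patch can be reproduced in isolation. The sketch below is not code from pgstat.c: the function names are hypothetical, and PGSTAT_STAT_INTERVAL is set to 500 on the assumption that it holds the 500-millisecond interval described in the message. It splits a millisecond interval into the seconds and microseconds fields of a struct timeval, once with the unscaled remainder and once with the remainder multiplied by 1000.

```c
#include <assert.h>
#include <sys/time.h>

/* Assumed value, taken from the 500 ms interval described in the thread. */
#define PGSTAT_STAT_INTERVAL 500

/* Buggy split: the remainder is a count of milliseconds, but tv_usec
 * counts microseconds, so 500 ms ends up stored as 500 us. */
struct timeval interval_buggy(long interval_ms)
{
    struct timeval tv;

    assert(interval_ms >= 0);
    tv.tv_sec = interval_ms / 1000;
    tv.tv_usec = interval_ms % 1000;
    return tv;
}

/* Fixed split: scale the millisecond remainder to microseconds. */
struct timeval interval_fixed(long interval_ms)
{
    struct timeval tv;

    assert(interval_ms >= 0);
    tv.tv_sec = interval_ms / 1000;
    tv.tv_usec = (interval_ms % 1000) * 1000;
    return tv;
}

/* Total length of a timeval in microseconds, for comparison. */
long timeval_to_usec(struct timeval tv)
{
    return (long) tv.tv_sec * 1000000L + (long) tv.tv_usec;
}
```

With an interval of 500, the buggy split yields 500 microseconds in total and the fixed split yields 500000, a factor of 1000. The whole-seconds part is unaffected, so an interval that is an exact multiple of 1000 ms would not have exposed the bug.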
| |||
On 3/1/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Benjamin Minshall <minshall@intellicon.biz> writes:
> > Tom Lane wrote:
> >> It's sounding like what you had was just transient bloat, in which case
> >> it might be useful to inquire whether anything out-of-the-ordinary had
> >> been done to the database right before the excessive-CPU-usage problem
> >> started.
>
> > I don't believe that there was any unusual activity on the server, but I
> > have set up some more detailed logging to hopefully identify a pattern
> > if the problem resurfaces.
>
> A further report led us to realize that 8.2.x in fact has a nasty bug
> here: the stats collector is supposed to dump its stats to a file at
> most every 500 milliseconds, but the code was actually waiting only
> 500 microseconds :-(. The larger the stats file, the more obvious
> this problem gets.

I think this explains the trigger that was blowing up my FC4 box.

merlin
| |||
"Merlin Moncure" <mmoncure@gmail.com> writes:
> On 3/1/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A further report led us to realize that 8.2.x in fact has a nasty bug
>> here: the stats collector is supposed to dump its stats to a file at
>> most every 500 milliseconds, but the code was actually waiting only
>> 500 microseconds :-(. The larger the stats file, the more obvious
>> this problem gets.

> I think this explains the trigger that was blowing up my FC4 box.

I dug in the archives a bit and couldn't find the report you're referring to?

regards, tom lane
| |||
On 3/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > On 3/1/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> A further report led us to realize that 8.2.x in fact has a nasty bug
> >> here: the stats collector is supposed to dump its stats to a file at
> >> most every 500 milliseconds, but the code was actually waiting only
> >> 500 microseconds :-(. The larger the stats file, the more obvious
> >> this problem gets.
>
> > I think this explains the trigger that was blowing up my FC4 box.
>
> I dug in the archives a bit and couldn't find the report you're
> referring to?

I was referring to this:
http://archives.postgresql.org/pgsql...2/msg01418.php

Even though the fundamental reason was obvious (and btw, I inherited this server less than two months ago), I was still curious what was making 8.2 blow up a box that was handling a million tps/hour for over a year. :-)

merlin
| |||
"Merlin Moncure" <mmoncure@gmail.com> writes:
> On 3/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Merlin Moncure" <mmoncure@gmail.com> writes:
>>> I think this explains the trigger that was blowing up my FC4 box.
>>
>> I dug in the archives a bit and couldn't find the report you're
>> referring to?

> I was referring to this:
> http://archives.postgresql.org/pgsql...2/msg01418.php

Oh, the kernel-panic thing. Hm, I wouldn't have thought that replacing a file at a huge rate would induce a kernel panic ... but who knows? Do you want to try installing the one-liner patch and see if the panic goes away?

Actually I was wondering a bit if that strange Windows error discussed earlier today could be triggered by this behavior:
http://archives.postgresql.org/pgsql...3/msg00000.php

regards, tom lane
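For readers unfamiliar with what "replacing a file" involves at the filesystem level, one common pattern is to write the new contents to a temporary file and then rename it over the old one. The sketch below illustrates that pattern only. Whether it matches what pgstat.c does in detail is an assumption, the function name and paths are hypothetical, and it relies on POSIX rename() semantics, under which an existing destination is replaced.

```c
#include <assert.h>
#include <stdio.h>

/* Write len bytes of data to tmp_path, then rename it over final_path.
 * Returns 0 on success, -1 on failure. (Hypothetical helper.) */
int replace_file(const char *tmp_path, const char *final_path,
                 const void *data, size_t len)
{
    FILE *fp;

    assert(tmp_path != NULL && final_path != NULL);

    fp = fopen(tmp_path, "wb");
    if (fp == NULL)
        return -1;

    if (len > 0 && fwrite(data, 1, len, fp) != len)
    {
        fclose(fp);
        remove(tmp_path);
        return -1;
    }

    /* fclose() flushes buffered data, so its result must be checked too. */
    if (fclose(fp) != 0)
    {
        remove(tmp_path);
        return -1;
    }

    /* On POSIX systems this replaces final_path if it already exists. */
    if (rename(tmp_path, final_path) != 0)
    {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

Each call creates, fills, and renames a file, so a timer that fires a thousand times more often than intended multiplies all of that filesystem work accordingly.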
| |||
Tom Lane wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
>> On 3/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> "Merlin Moncure" <mmoncure@gmail.com> writes:
>>>> I think this explains the trigger that was blowing up my FC4 box.
>>> I dug in the archives a bit and couldn't find the report you're
>>> referring to?
>
>> I was referring to this:
>> http://archives.postgresql.org/pgsql...2/msg01418.php
>
> Oh, the kernel-panic thing. Hm, I wouldn't have thought that replacing
> a file at a huge rate would induce a kernel panic ... but who knows?
> Do you want to try installing the one-liner patch and see if the panic
> goes away?
>
> Actually I was wondering a bit if that strange Windows error discussed
> earlier today could be triggered by this behavior:
> http://archives.postgresql.org/pgsql...3/msg00000.php

I think that's very likely. If we're updating the file *that* often, we're certainly doing something that's very unusual for the Windows filesystem, and possibly for the hardware as well :-)

//Magnus
| |||
Sorry, I introduced this bug.

---------------------------------------------------------------------------

Tom Lane wrote:
> A further report led us to realize that 8.2.x in fact has a nasty bug
> here: the stats collector is supposed to dump its stats to a file at
> most every 500 milliseconds, but the code was actually waiting only
> 500 microseconds :-(. The larger the stats file, the more obvious
> this problem gets.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
| ||||
Bruce Momjian wrote:
> Sorry, I introduced this bug.

To the gallows with you! that missed the math on that one.

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/