Unix Technical Forum

Re: stats collector process high CPU utilization

Pgsql Performance forum (PostgreSQL category)

#11 (04-19-2008, 10:12 AM)
Tom Lane
Re: stats collector process high CPU utilization

I wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Regarding temp tables, I'd think that the pgstat entries should be
>> getting dropped at some point in both releases. Maybe there's a bug
>> preventing that in 8.2?


> Hmmm ... I did rewrite the backend-side code for that just recently for
> performance reasons ... could I have broken it?


I did some testing with HEAD and verified that pgstat_vacuum_tabstat()
still seems to do what it's supposed to, so that theory falls down.

Alvaro, could you send Benjamin your stat-file-dumper tool so we can
get some more info? Alternatively, if Benjamin wants to send me a copy
of his stats file (off-list), I'd be happy to take a look.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

#12 (04-19-2008, 10:12 AM)
Tom Lane
Re: stats collector process high CPU utilization

Benjamin Minshall <minshall@intellicon.biz> writes:
> When I checked on the server this morning, the huge stats file has
> returned to a normal size. I set up a script to track CPU usage and
> stats file size, and it appears to have decreased from 90MB down to
> about 2MB over roughly 6 hours last night. The CPU usage of the stats
> collector also decreased accordingly.


> The application logs indicate that there was no variation in the
> workload over this time period; however, the file size started to
> decrease soon after the nightly pg_dump backups completed. Coincidence
> perhaps?


Well, that's pretty interesting. What are your vacuuming arrangements
for this installation? Could the drop in file size have coincided with
VACUUM operations? Because the ultimate backstop against bloated stats
files is pgstat_vacuum_tabstat(), which is run by VACUUM and arranges to
clean out any entries that shouldn't be there anymore.

It's sounding like what you had was just transient bloat, in which case
it might be useful to inquire whether anything out-of-the-ordinary had
been done to the database right before the excessive-CPU-usage problem
started.

regards, tom lane

#13 (04-19-2008, 10:18 AM)
Tom Lane
Re: stats collector process high CPU utilization

Benjamin Minshall <minshall@intellicon.biz> writes:
> Tom Lane wrote:
>> It's sounding like what you had was just transient bloat, in which case
>> it might be useful to inquire whether anything out-of-the-ordinary had
>> been done to the database right before the excessive-CPU-usage problem
>> started.


> I don't believe that there was any unusual activity on the server, but I
> have set up some more detailed logging to hopefully identify a pattern
> if the problem resurfaces.


A further report led us to realize that 8.2.x in fact has a nasty bug
here: the stats collector is supposed to dump its stats to a file at
most every 500 milliseconds, but the code was actually waiting only
500 microseconds :-(. The larger the stats file, the more obvious
this problem gets.

If you want to patch this before 8.2.4, try this...

Index: pgstat.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/pgstat.c,v
retrieving revision 1.140.2.2
diff -c -r1.140.2.2 pgstat.c
*** pgstat.c 26 Jan 2007 20:07:01 -0000 1.140.2.2
--- pgstat.c 1 Mar 2007 20:04:50 -0000
***************
*** 1689,1695 ****
      /* Preset the delay between status file writes */
      MemSet(&write_timeout, 0, sizeof(struct itimerval));
      write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
!     write_timeout.it_value.tv_usec = PGSTAT_STAT_INTERVAL % 1000;
  
      /*
       * Read in an existing statistics stats file or initialize the stats to
--- 1689,1695 ----
      /* Preset the delay between status file writes */
      MemSet(&write_timeout, 0, sizeof(struct itimerval));
      write_timeout.it_value.tv_sec = PGSTAT_STAT_INTERVAL / 1000;
!     write_timeout.it_value.tv_usec = (PGSTAT_STAT_INTERVAL % 1000) * 1000;
  
      /*
       * Read in an existing statistics stats file or initialize the stats to


regards, tom lane

#14 (04-19-2008, 10:18 AM)
Merlin Moncure
Re: stats collector process high CPU utilization

On 3/1/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Benjamin Minshall <minshall@intellicon.biz> writes:
> > Tom Lane wrote:
> >> It's sounding like what you had was just transient bloat, in which case
> >> it might be useful to inquire whether anything out-of-the-ordinary had
> >> been done to the database right before the excessive-CPU-usage problem
> >> started.

>
> > I don't believe that there was any unusual activity on the server, but I
> > have set up some more detailed logging to hopefully identify a pattern
> > if the problem resurfaces.

>
> A further report led us to realize that 8.2.x in fact has a nasty bug
> here: the stats collector is supposed to dump its stats to a file at
> most every 500 milliseconds, but the code was actually waiting only
> 500 microseconds :-(. The larger the stats file, the more obvious
> this problem gets.


I think this explains the trigger that was blowing up my FC4 box.

merlin

#15 (04-19-2008, 10:18 AM)
Tom Lane
Re: stats collector process high CPU utilization

"Merlin Moncure" <mmoncure@gmail.com> writes:
> On 3/1/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> A further report led us to realize that 8.2.x in fact has a nasty bug
>> here: the stats collector is supposed to dump its stats to a file at
>> most every 500 milliseconds, but the code was actually waiting only
>> 500 microseconds :-(. The larger the stats file, the more obvious
>> this problem gets.


> I think this explains the trigger that was blowing up my FC4 box.


I dug in the archives a bit and couldn't find the report you're
referring to?

regards, tom lane

#16 (04-19-2008, 10:18 AM)
Merlin Moncure
Re: stats collector process high CPU utilization

On 3/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > On 3/1/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> A further report led us to realize that 8.2.x in fact has a nasty bug
> >> here: the stats collector is supposed to dump its stats to a file at
> >> most every 500 milliseconds, but the code was actually waiting only
> >> 500 microseconds :-(. The larger the stats file, the more obvious
> >> this problem gets.

>
> > I think this explains the trigger that was blowing up my FC4 box.

>
> I dug in the archives a bit and couldn't find the report you're
> referring to?


I was referring to this:
http://archives.postgresql.org/pgsql...2/msg01418.php

Even though the fundamental reason was obvious (and btw, I inherited
this server less than two months ago), I was still curious what was
making 8.2 blow up a box that had been handling a million transactions
per hour for over a year. :-)

merlin

#17 (04-19-2008, 10:19 AM)
Tom Lane
Re: stats collector process high CPU utilization

"Merlin Moncure" <mmoncure@gmail.com> writes:
> On 3/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Merlin Moncure" <mmoncure@gmail.com> writes:
>>> I think this explains the trigger that was blowing up my FC4 box.

>>
>> I dug in the archives a bit and couldn't find the report you're
>> referring to?


> I was referring to this:
> http://archives.postgresql.org/pgsql...2/msg01418.php


Oh, the kernel-panic thing. Hm, I wouldn't have thought that replacing
a file at a huge rate would induce a kernel panic ... but who knows?
Do you want to try installing the one-liner patch and see if the panic
goes away?

Actually I was wondering a bit if that strange Windows error discussed
earlier today could be triggered by this behavior:
http://archives.postgresql.org/pgsql...3/msg00000.php

regards, tom lane

#18 (04-19-2008, 10:19 AM)
Magnus Hagander
Re: stats collector process high CPU utilization

Tom Lane wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
>> On 3/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> "Merlin Moncure" <mmoncure@gmail.com> writes:
>>>> I think this explains the trigger that was blowing up my FC4 box.
>>> I dug in the archives a bit and couldn't find the report you're
>>> referring to?

>
>> I was referring to this:
>> http://archives.postgresql.org/pgsql...2/msg01418.php

>
> Oh, the kernel-panic thing. Hm, I wouldn't have thought that replacing
> a file at a huge rate would induce a kernel panic ... but who knows?
> Do you want to try installing the one-liner patch and see if the panic
> goes away?
>
> Actually I was wondering a bit if that strange Windows error discussed
> earlier today could be triggered by this behavior:
> http://archives.postgresql.org/pgsql...3/msg00000.php


I think that's very likely. If we're updating the file *that* often,
we're certainly doing something that's very unusual for the Windows
filesystem, and possibly for the hardware as well :-)

//Magnus

#19 (04-19-2008, 10:19 AM)
Bruce Momjian
Re: stats collector process high CPU utilization


Sorry, I introduced this bug.

---------------------------------------------------------------------------

Tom Lane wrote:
> A further report led us to realize that 8.2.x in fact has a nasty bug
> here: the stats collector is supposed to dump its stats to a file at
> most every 500 milliseconds, but the code was actually waiting only
> 500 microseconds :-(. The larger the stats file, the more obvious
> this problem gets.


--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#20 (04-19-2008, 10:19 AM)
Joshua D. Drake
Re: stats collector process high CPU utilization

Bruce Momjian wrote:
> Sorry, I introduced this bug.


To the gallows with you! Don't feel bad, there were several hackers
that missed the math on that one.

Joshua D. Drake



--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


All times are GMT.