Unix Technical Forum

slow checkpoint question

This is a discussion on slow checkpoint question within the Pgsql Performance forums, part of the PostgreSQL category; --> What problems await if the I/O rates on devices used for WAL and data are very different? As background, ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 10:26 AM
shakahshakah
 
Posts: n/a
Default slow checkpoint question

What problems await if the I/O rates on devices used
for WAL and data are very different?

As background, I've been debugging a performance issue on
a ~1TB Postgres DB over the past week. While in this case
the root cause is most likely a sickly external RAID array,
it led me to some general questions about checkpointing.

While watching the logs I noticed some blocks of INSERT stmts
sometimes taking *minutes* to complete, followed by
periods of near-normal behavior. A closer look at an
application log showed a pattern similar to the following:

<period of normal inserts until...>
00:58:36 inserted 250 rows in 814 ms
01:01:37 inserted 250 rows in 180032 ms
<another period of normal inserts until...>
01:03:37 inserted 250 rows in 962 ms
01:05:59 inserted 250 rows in 142440 ms
<another period of normal inserts until...>
01:08:36 inserted 250 rows in 782 ms
01:11:19 inserted 250 rows in 163928 ms

My take on the above is the normal INSERTs are evidence
that writing to the WAL device is proceeding as normal,
whereas the slow blocks (appearing almost exactly at
5 min intervals) probably straddle a checkpoint
to a very slow device. It also suggests that
"checkpoint_timeout" is measured/applied from the
start of the previous WAL flush (not from completion).

1. Does the above sound reasonable?
2. What happens when checkpointing time exceeds
the checkpoint_timeout (i.e. am I on the
edge of disaster because of this)?
3. Is there a point in increasing
"checkpoint_timeout", or would the checkpoints
then just take even longer to complete?

FWIW, hardware is a dual-Opteron Sun v20z, WAL on local SCSI disks in
a RAID-1
mirror, data on external RAID 10 array connected via 2Gb FibreChannel.

Postgres version is:
reporting=> select version() ;
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 8.1.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc
(GCC) 4.0.2 20051125 (Red Hat 4.0.2-8)
(1 row)

Checkpoint and bgwriter settings are almost all defaults:
prompt$ egrep '(bgwriter|background|checkpoint)' /data02/pgsql/data/
postgresql.conf
#bgwriter_delay = 200 # 10-10000 milliseconds
between rounds
#bgwriter_lru_percent = 1.0 # 0-100% of LRU buffers
scanned/round
#bgwriter_lru_maxpages = 5 # 0-1000 buffers max written/
round
#bgwriter_all_percent = 0.333 # 0-100% of all buffers
scanned/round
#bgwriter_all_maxpages = 5 # 0-1000 buffers max written/
round
checkpoint_segments = 32 # in logfile segments, min 1,
16MB each
#checkpoint_timeout = 300 # range 30-3600, in seconds
#checkpoint_warning = 30 # in seconds, 0 is off

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:28 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com