Unix Technical Forum

io storm on checkpoints, postgresql 8.2.4, linux

This is a discussion on io storm on checkpoints, postgresql 8.2.4, linux within the Pgsql Performance forums, part of the PostgreSQL category; --> On Aug 22, 2007, at 10:57 AM, Kenneth Marshall wrote: > On Wed, Aug 22, 2007 at 07:33:35PM +0400, ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #11 (permalink)  
Old 04-19-2008, 11:26 AM
Erik Jones
 
Posts: n/a
Default Re: io storm on checkpoints, postgresql 8.2.4, linux


On Aug 22, 2007, at 10:57 AM, Kenneth Marshall wrote:

> On Wed, Aug 22, 2007 at 07:33:35PM +0400, Dmitry Potapov wrote:
>> Hello!
>>
>> We run a large (~66Gb) web-backend database on Postgresql
>> 8.2.4 on
>> Linux. The hardware is Dual Xeon 5130 with 16Gb ram, LSI Megaraid
>> U320-2x
>> scsi controller w/512Mb writeback cache and a BBU. Storage setup
>> contains 3
>> raid10 arrays (data, xlog, indexes, each on different array), 12
>> HDDs total.
>> Frontend application uses jdbc driver, connection pooling and
>> threads.
>>
>> We've run into an issue of IO storms on checkpoints. Once in
>> 20min
>> (which is checkpoint_interval) the database becomes unresponsive
>> for about
>> 4-8 seconds. Query processing is suspended, server does nothing
>> but writing
>> a large amount of data to disks. Because of the db server being
>> stalled,
>> some of the web clients get timeout and disconnect, which is
>> unacceptable.
>> Even worse, as the new requests come at a pretty constant rate, by
>> the time
>> this storm comes to an end there is a huge amount of sleeping app.
>> threads
>> waiting for their queries to complete. After the db server comes
>> back to
>> life again, these threads wake up and flood it with queries, so
>> performance
>> suffer even more, for some minutes after the checkpoint.
>>
>> It seemed strange to me that our 70%-read db generates so much
>> dirty
>> pages that writing them out takes 4-8 seconds and grabs the full
>> bandwidth.
>> First, I started to tune bgwriter to a more aggressive settings,
>> but this
>> was of no help, nearly no performance changes at all. Digging into
>> the issue
>> further, I discovered that linux page cache was the reason. "Dirty"
>> parameter in /proc/meminfo (which shows the amount of ready-to-
>> write "dirty"
>> data currently sitting in page cache) grows between checkpoints
>> from 0 to
>> about 100Mb. When checkpoint comes, all the 100mb got flushed out
>> to disk,
>> effectively causing a IO storm.
>>
>> I found this (http://www.westnet.com/~gsmith/content/linux-
>> pdflush.htm
>> <http://www.westnet.com/%7Egsmith/content/linux-pdflush.htm>)
>> document and
>> peeked into mm/page-writeback.c in linux kernel source tree. I'm
>> not sure
>> that I understand pdflush writeout semantics correctly, but looks
>> like when
>> the amount of "dirty" data is less than dirty_background_ratio*RAM/
>> 100,
>> pdflush only writes pages in background, waking up every
>> dirty_writeback_centisecs and writing no more than 1024 pages
>> (MAX_WRITEBACK_PAGES constant). When we hit
>> dirty_background_ratio, pdflush
>> starts to write out more agressively.
>>
>> So, looks like the following scenario takes place: postgresql
>> constantly
>> writes something to database and xlog files, dirty data gets to
>> the page
>> cache, and then slowly written out by pdflush. When postgres
>> generates more
>> dirty pages than pdflush writes out, the amount of dirty data in the
>> pagecache is growing. When we're at checkpoint, postgres does fsync
>> () on the
>> database files, and sleeps until the whole page cache is written out.
>>
>> By default, dirty_background_ratio is 2%, which is about 328Mb
>> of 16Gb
>> total. Following the curring pdflush logic, nearly this amount of
>> data we
>> face to write out on checkpoint effective stalling everything
>> else, so even
>> 1% of 16Gb is too much. My setup experience 4-8 sec pause in
>> operation even
>> on ~100Mb dirty pagecache...
>>
>> I temporaly solved this problem by setting
>> dirty_background_ratio to
>> 0%. This causes the dirty data to be written out immediately. It
>> is ok for
>> our setup (mostly because of large controller cache), but it
>> doesn't looks
>> to me as an elegant solution. Is there some other way to fix this
>> issue
>> without disabling pagecache and the IO smoothing it was designed
>> to perform?
>>
>> --
>> Regards,
>> Dmitry

>
> Dmitry,
>
> You are working at the correct level. The bgwriter performs the I/O
> smoothing
> function at the database level. Obviously, the OS level smoothing
> function
> needed to be tuned and you have done that within the parameters of
> the OS.
> You may want to bring this up on the Linux kernel lists and see if
> they have
> any ideas.
>
> Good luck,
>
> Ken


Have you tried decreasing you checkpoint interval? That would at
least help to reduce the amount of data that needs to be flushed when
Postgres fsyncs.

Erik Jones

Software Developer | EmmaŽ
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #12 (permalink)  
Old 04-19-2008, 11:27 AM
Decibel!
 
Posts: n/a
Default Re: io storm on checkpoints, postgresql 8.2.4, linux

On Tue, Aug 28, 2007 at 10:00:57AM -0500, Erik Jones wrote:
> >> It seemed strange to me that our 70%-read db generates so much
> >>dirty
> >>pages that writing them out takes 4-8 seconds and grabs the full
> >>bandwidth.
> >>First, I started to tune bgwriter to a more aggressive settings,
> >>but this
> >>was of no help, nearly no performance changes at all. Digging into
> >>the issue
> >>further, I discovered that linux page cache was the reason. "Dirty"
> >>parameter in /proc/meminfo (which shows the amount of ready-to-
> >>write "dirty"
> >>data currently sitting in page cache) grows between checkpoints
> >>from 0 to
> >>about 100Mb. When checkpoint comes, all the 100mb got flushed out
> >>to disk,
> >>effectively causing a IO storm.
> >>
> >> I found this (http://www.westnet.com/~gsmith/content/linux-
> >>pdflush.htm
> >><http://www.westnet.com/%7Egsmith/content/linux-pdflush.htm>)
> >>document and
> >>peeked into mm/page-writeback.c in linux kernel source tree. I'm
> >>not sure
> >>that I understand pdflush writeout semantics correctly, but looks
> >>like when
> >>the amount of "dirty" data is less than dirty_background_ratio*RAM/
> >>100,
> >>pdflush only writes pages in background, waking up every
> >>dirty_writeback_centisecs and writing no more than 1024 pages
> >>(MAX_WRITEBACK_PAGES constant). When we hit
> >>dirty_background_ratio, pdflush
> >>starts to write out more agressively.
> >>
> >> So, looks like the following scenario takes place: postgresql
> >>constantly
> >>writes something to database and xlog files, dirty data gets to
> >>the page
> >>cache, and then slowly written out by pdflush. When postgres
> >>generates more
> >>dirty pages than pdflush writes out, the amount of dirty data in the
> >>pagecache is growing. When we're at checkpoint, postgres does fsync
> >>() on the
> >>database files, and sleeps until the whole page cache is written out.
> >>
> >> By default, dirty_background_ratio is 2%, which is about 328Mb
> >>of 16Gb
> >>total. Following the curring pdflush logic, nearly this amount of
> >>data we
> >>face to write out on checkpoint effective stalling everything
> >>else, so even
> >>1% of 16Gb is too much. My setup experience 4-8 sec pause in
> >>operation even
> >>on ~100Mb dirty pagecache...
> >>
> >> I temporaly solved this problem by setting
> >>dirty_background_ratio to
> >>0%. This causes the dirty data to be written out immediately. It
> >>is ok for
> >>our setup (mostly because of large controller cache), but it
> >>doesn't looks
> >>to me as an elegant solution. Is there some other way to fix this
> >>issue
> >>without disabling pagecache and the IO smoothing it was designed
> >>to perform?

> >
> >You are working at the correct level. The bgwriter performs the I/O
> >smoothing
> >function at the database level. Obviously, the OS level smoothing
> >function
> >needed to be tuned and you have done that within the parameters of
> >the OS.
> >You may want to bring this up on the Linux kernel lists and see if
> >they have
> >any ideas.
> >
> >Good luck,
> >
> >Ken

>
> Have you tried decreasing you checkpoint interval? That would at
> least help to reduce the amount of data that needs to be flushed when
> Postgres fsyncs.


The downside to that is it will result in writing a lot more data to WAL
as long as full page writes are on.

Isn't there some kind of a timeout parameter for how long dirty data
will sit in the cache? It seems pretty broken to me to allow stuff to
sit in a dirty state indefinitely.
--
Decibel!, aka Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.3 (FreeBSD)

iD8DBQFG1JTMdO30qud8SkgRAtk2AJwIQJCAstBw+lZa/a++KDBP9kvWvgCgvWHG
Lg25pyfHSy3cw/4thK0qiO8=
=s3wh
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:21 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com