Unix Technical Forum

8.3beta1 testing on Solaris

This is a discussion on 8.3beta1 testing on Solaris within the Pgsql Performance forums, part of the PostgreSQL category; --> I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS is just avoiding the symptom to a later value.. I ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #11 (permalink)  
Old 04-19-2008, 11:41 AM
Jignesh K. Shah
 
Posts: n/a
Default Re: [HACKERS] 8.3beta1 testing on Solaris

I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS is just
avoiding the symptom to a later value.. I promise to look more into it
before making any recommendations to increase NUM_CLOG_BUFFERs.


Because though "iGen" showed improvements in that area by increasing
num_clog_buffers , EAStress had shown no improvements.. Plus the reason
I think this is not the problem in 8.3beta1 since the Lock Output
clearly does not show CLOGControlFile as to be the issue which I had
seen in earlier case. So I dont think that increasing NUM_CLOG_BUFFERS
will change thing here.

Now I dont understand the code pretty well yet I see three hotspots and
not sure if they are related to each other
* ProcArrayLock waits - causing Waits as reported by
83_lockwait.d script
* SimpleLRUReadPage - causing read IOs as reported by
iostat/rsnoop.d
* GetSnapshotData - causing CPU utilization as reported by hotuser

But I will shut up and do more testing.

Regards,
Jignesh



Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>
>> Actually, 32 made a significant difference as I recall ... do you still have
>> the figures for that, Jignesh?
>>

>
> I'd want to see a new set of test runs backing up any call for a change
> in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area that
> benchmarks using code from a few months back shouldn't carry a lot of
> weight.
>
> regards, tom lane
>


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #12 (permalink)  
Old 04-19-2008, 11:41 AM
Jignesh K. Shah
 
Posts: n/a
Default Re: [HACKERS] 8.3beta1 testing on Solaris

Also to give perspective on the equivalent writes on CLOG

I used the following script which runs for 10 sec to track all writes to
the clog directory and here is what it came up with... (This is with 500
users running)

# cat write.d
#!/usr/sbin/dtrace -s
syscall::write:entry
/execname=="postgres" &&
dirname(fds[arg0].fi_pathname)=="/export/home0/igen/pgdata/pg_clog"/
{
@write[fds[arg0].fi_pathname,arg1] = count();
}
tick-10sec
{
exit(0);
}

# ./write.d
dtrace: script './write.d' matched 2 probes
CPU ID FUNCTION:NAME
3 1026 :tick-10sec

/export/home0/igen/pgdata/pg_clog/001E
-2753028277088 1
#
I modified read.d to do a 5sec read
# ./read.d
dtrace: script './read.d' matched 3 probes
CPU ID FUNCTION:NAME
0 1 :BEGIN
0 1027 :tick-5sec

/export/home0/igen/pgdata/pg_clog/001F
-2753028268896 1
/export/home0/igen/pgdata/pg_clog/001F
-2753028252512 1
/export/home0/igen/pgdata/pg_clog/001F
-2753028285280 2
/export/home0/igen/pgdata/pg_clog/001F
-2753028277088 3
/export/home0/igen/pgdata/pg_clog/001F
-2753028236128 3
/export/home0/igen/pgdata/pg_clog/001E
-2753028285280 5
/export/home0/igen/pgdata/pg_clog/001E
-2753028236128 9
/export/home0/igen/pgdata/pg_clog/001E
-2753028277088 13
/export/home0/igen/pgdata/pg_clog/001E
-2753028268896 15
/export/home0/igen/pgdata/pg_clog/001E
-2753028252512 27
#

So the ratio of reads vs writes to clog files is pretty huge..


-Jignesh



Jignesh K. Shah wrote:
> Tom,
>
> Here is what I did:
>
> I started aggregating all read information:
>
> First I also had added group by pid (arg0,arg1, pid) and the counts
> were all coming as 1
>
> Then I just grouped by filename and location (arg0,arg1 of reads) and
> the counts came back as
>
> # cat read.d
> #!/usr/sbin/dtrace -s
> syscall::read:entry
> /execname=="postgres"/
> {
> @read[fds[arg0].fi_pathname, arg1] = count();
> }
>
>
> # ./read.d
> dtrace: script './read.d' matched 1 probe
> ^C
>
> /export/home0/igen/pgdata/pg_clog/0014
> -2753028293472 1
> /export/home0/igen/pgdata/pg_clog/0014
> -2753028277088 1
> /export/home0/igen/pgdata/pg_clog/0015
> -2753028244320 2
> /export/home0/igen/pgdata/pg_clog/0015
> -2753028268896 14
> /export/home0/igen/pgdata/pg_clog/0015
> -2753028260704 25
> /export/home0/igen/pgdata/pg_clog/0015
> -2753028252512 27
> /export/home0/igen/pgdata/pg_clog/0015
> -2753028277088 28
> /export/home0/igen/pgdata/pg_clog/0015
> -2753028293472 37
>
>
> FYI I pressed ctrl-c within like less than a second
>
> So to me this seems that multiple processes are reading the same page
> from different pids. (This was with about 600 suers active.
>
> Aparently we do have a problem that we are reading the same buffer
> address again. (Same as not being cached anywhere or not finding it
> in cache anywhere).
>
> I reran lock wait script on couple of processes and did not see
> CLogControlFileLock as a problem..
>
> # ./83_lwlock_wait.d 14341
>
> Lock Id Mode Count
> WALInsertLock Exclusive 1
> ProcArrayLock Exclusive 16
>
> Lock Id Combined Time (ns)
> WALInsertLock 383109
> ProcArrayLock 198866236
>
> # ./83_lwlock_wait.d 14607
>
> Lock Id Mode Count
> WALInsertLock Exclusive 2
> ProcArrayLock Exclusive 15
>
> Lock Id Combined Time (ns)
> WALInsertLock 55243
> ProcArrayLock 69700140
>
> #
>
> What will help you find out why it is reading the same page again?
>
>
> -Jignesh
>
>
>
> Jignesh K. Shah wrote:
>> I agree with Tom.. somehow I think increasing NUM_CLOG_BUFFERS is
>> just avoiding the symptom to a later value.. I promise to look more
>> into it before making any recommendations to increase NUM_CLOG_BUFFERs.
>>
>>
>> Because though "iGen" showed improvements in that area by increasing
>> num_clog_buffers , EAStress had shown no improvements.. Plus the
>> reason I think this is not the problem in 8.3beta1 since the Lock
>> Output clearly does not show CLOGControlFile as to be the issue which
>> I had seen in earlier case. So I dont think that increasing
>> NUM_CLOG_BUFFERS will change thing here.
>>
>> Now I dont understand the code pretty well yet I see three hotspots
>> and not sure if they are related to each other
>> * ProcArrayLock waits - causing Waits as reported by
>> 83_lockwait.d script
>> * SimpleLRUReadPage - causing read IOs as reported by
>> iostat/rsnoop.d
>> * GetSnapshotData - causing CPU utilization as reported by hotuser
>>
>> But I will shut up and do more testing.
>>
>> Regards,
>> Jignesh
>>
>>
>>
>> Tom Lane wrote:
>>> Josh Berkus <josh@agliodbs.com> writes:
>>>
>>>> Actually, 32 made a significant difference as I recall ... do you
>>>> still have the figures for that, Jignesh?
>>>>
>>>
>>> I'd want to see a new set of test runs backing up any call for a change
>>> in NUM_CLOG_BUFFERS --- we've changed enough stuff around this area
>>> that
>>> benchmarks using code from a few months back shouldn't carry a lot of
>>> weight.
>>>
>>> regards, tom lane
>>>

>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: don't forget to increase your free space map settings


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #13 (permalink)  
Old 04-19-2008, 11:41 AM
Tom Lane
 
Posts: n/a
Default Re: [HACKERS] 8.3beta1 testing on Solaris

"Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
> So the ratio of reads vs writes to clog files is pretty huge..


It looks to me that the issue is simply one of not having quite enough
CLOG buffers. Your first run shows 8 different pages being fetched and
the second shows 10. Bearing in mind that we "pin" the latest CLOG page
into buffers, there are only NUM_CLOG_BUFFERS-1 buffers available for
older pages, so what we've got here is thrashing for the available
slots.

Try increasing NUM_CLOG_BUFFERS to 16 and see how it affects this test.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #14 (permalink)  
Old 04-19-2008, 11:41 AM
Jignesh K. Shah
 
Posts: n/a
Default Re: [HACKERS] 8.3beta1 testing on Solaris


I changed CLOG Buffers to 16

Running the test again:
# ./read.d
dtrace: script './read.d' matched 2 probes
CPU ID FUNCTION:NAME
0 1027 :tick-5sec

/export/home0/igen/pgdata/pg_clog/0024
-2753028219296 1
/export/home0/igen/pgdata/pg_clog/0025
-2753028211104 1
# ./read.d
dtrace: script './read.d' matched 2 probes
CPU ID FUNCTION:NAME
1 1027 :tick-5sec

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU ID FUNCTION:NAME
1 1027 :tick-5sec

# ./read.d
dtrace: script './read.d' matched 2 probes
CPU ID FUNCTION:NAME
0 1027 :tick-5sec

/export/home0/igen/pgdata/pg_clog/0025
-2753028194720 1


So Tom seems to be correct that it is a case of CLOG Buffer thrashing.
But since I saw the same problem with two different workloads, I think
people hitting this problem is pretty high.

Also I am bit surprised that CLogControlFile did not show up as being
hot.. Maybe because not much writes are going on .. Or maybe since I did
not trace all 500 users to see their hot lock status..


Dmitri has another workload to test, I might try that out later on to
see if it causes similar impact or not.

Of course I havent seen my throughput go up yet since I am already CPU
bound... But this is good since the number of IOPS to the disk are
reduced (and hence system calls).


If I take this as my baseline number.. I can then proceed to hunt other
bottlenecks????


Whats the view of the community?

Hunt down CPU utilizations or Lock waits next?

Your votes are crucial on where I put my focus.

Another thing Josh B told me to check out was the wal_writer_delay setting:

I have done two settings with almost equal performance (with the CLOG 16
setting) .. One with 100ms and other default at 200ms.. Based on the
runs it seemed that the 100ms was slightly better than the default ..
(Plus the risk of loosing data is reduced from 600ms to 300ms)

Thanks.

Regards,
Jignesh




Tom Lane wrote:
> "Jignesh K. Shah" <J.K.Shah@Sun.COM> writes:
>
>> So the ratio of reads vs writes to clog files is pretty huge..
>>

>
> It looks to me that the issue is simply one of not having quite enough
> CLOG buffers. Your first run shows 8 different pages being fetched and
> the second shows 10. Bearing in mind that we "pin" the latest CLOG page
> into buffers, there are only NUM_CLOG_BUFFERS-1 buffers available for
> older pages, so what we've got here is thrashing for the available
> slots.
>
> Try increasing NUM_CLOG_BUFFERS to 16 and see how it affects this test.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:58 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com