This is a discussion on "strange performance regression between 7.4 and 8.1" within the Pgsql Performance forums, part of the PostgreSQL category.
On 3/1/07, Jeff Frost <jeff@frostconsultingllc.com> wrote:
> On Thu, 1 Mar 2007, Joshua D. Drake wrote:
>
> > Alex Deucher wrote:
> >> Hello,
> >>
> >> I have noticed a strange performance regression and I'm at a loss as
> >> to what's happening. We have a fairly large database (~16 GB). The
> >> original postgres 7.4 was running on a Sun V880 with 4 CPUs and 8 GB
> >> of RAM running Solaris on local SCSI discs. The new server is a Sun
> >> Opteron box with 4 cores, 8 GB of RAM running postgres 8.1.4 on Linux
> >> (AMD64) on a 4 Gbps FC SAN volume. When we created the new database
> >> it was created from scratch rather than copying over the old one;
> >> however, the table structure is almost identical (UTF8 on the new one
> >> vs. C on the old). The problem is queries are ~10x slower on the new
> >> hardware. I read in several places that the SAN might be to blame, but
> >> testing with bonnie and dd indicates that the SAN is actually almost
> >> twice as fast as the SCSI discs in the old Sun server. I've tried
> >> adjusting just about every option in the postgres config file, but
> >> performance remains the same. Any ideas?
> >
> > Vacuum? Analyze? default_statistics_target? How many shared_buffers?
> > effective_cache_size? work_mem?
>
> Also, an explain analyze from both the 7.4 and 8.1 systems with one of the
> 10x slower queries would probably be handy.

Here are some examples. Analyze is still running on the new db; I'll post
results when that is done.
Mostly what our apps do is prepared row selects from different tables:
select c1,c2,c3,c4,c5 from t1 where c1='XXX';

old server:
db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c2_index on t1  (cost=0.00..166.89 rows=42 width=26) (actual time=5.722..5.809 rows=2 loops=1)
   Index Cond: ((c2)::text = '6258261'::text)
 Total runtime: 5.912 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c1_key on t1  (cost=0.00..286.08 rows=72 width=26) (actual time=12.423..12.475 rows=12 loops=1)
   Index Cond: ((c1)::text = '6258261'::text)
 Total runtime: 12.538 ms
(3 rows)

new server:
db=# EXPLAIN ANALYZE select c1,c2 from t1 where c2='6258261';
                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c2_index on t1  (cost=0.00..37.63 rows=11 width=26) (actual time=33.461..51.377 rows=2 loops=1)
   Index Cond: ((c2)::text = '6258261'::text)
 Total runtime: 51.419 ms
(3 rows)

db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Index Scan using t1_c1_index on t1  (cost=0.00..630.45 rows=2907 width=26) (actual time=45.733..46.271 rows=12 loops=1)
   Index Cond: ((c1)::text = '6258261'::text)
 Total runtime: 46.325 ms
(3 rows)

Alex

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
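To put a number on the "~10x" figure, here is a small Python sketch that computes the per-query slowdown from the "Total runtime" values in the EXPLAIN ANALYZE runs above. It only does arithmetic on the posted numbers; the names TIMINGS_MS and slowdown are illustrative helpers, not anything from PostgreSQL.

```python
# Total runtime values (ms) copied from the EXPLAIN ANALYZE output posted above.
# The dictionary keys are illustrative labels for the two example queries.
TIMINGS_MS = {
    "where c2='6258261'": {"pg74": 5.912, "pg81": 51.419},
    "where c1='6258261'": {"pg74": 12.538, "pg81": 46.325},
}


def slowdown(old_ms: float, new_ms: float) -> float:
    """How many times longer the new runtime is than the old one."""
    if old_ms <= 0:
        raise ValueError("old runtime must be positive")
    return new_ms / old_ms


if __name__ == "__main__":
    for query, t in TIMINGS_MS.items():
        factor = slowdown(t["pg74"], t["pg81"])
        print(f"{query}: {factor:.1f}x slower on 8.1")
```

On these two samples the gap works out to roughly 8.7x and 3.7x, so neither query is quite the 10x reported for the application as a whole; the 8.1 numbers were also taken while ANALYZE was still running.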
On Thu, 1 Mar 2007, Alex Deucher wrote:

> here are some examples. Analyze is still running on the new db, I'll
> post results when that is done. Mostly what our apps do is prepared
> row selects from different tables:
> select c1,c2,c3,c4,c5 from t1 where c1='XXX';
>
> [...]
>
> new server:
> [...]
> db=# EXPLAIN ANALYZE select c1,c2 from t1 where c1='6258261';
>                                  QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using t1_c1_index on t1  (cost=0.00..630.45 rows=2907 width=26) (actual time=45.733..46.271 rows=12 loops=1)
>    Index Cond: ((c1)::text = '6258261'::text)
>  Total runtime: 46.325 ms
> (3 rows)

Notice the huge disparity here between the expected number of rows (2907)
and the actual rows (12)? That's indicative of needing to run analyze. The
time is only about 4x the 7.4 runtime, and that's with the analyze running
merrily along in the background. It's probably not as bad off as you think.
At least this query isn't 10x. :-)

Run these again for us after analyze is complete.

--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954
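Jeff's point about the estimate (2907 rows) versus reality (12 rows) can be checked mechanically. The sketch below is a rough heuristic, not a PostgreSQL tool: it assumes the text EXPLAIN ANALYZE format shown in this thread, and the regex and the misestimate_factor helper are illustrative names. It divides the larger row count by the smaller so that over- and under-estimates both show up as a factor of at least 1.

```python
import re

# Pulls the planner's estimated row count and the actual row count out of one
# EXPLAIN ANALYZE plan-node line (text format as printed by psql above).
PLAN_ROWS_RE = re.compile(
    r"\(cost=\S+ rows=(?P<est>\d+) width=\d+\)\s*"
    r"\(actual time=\S+ rows=(?P<act>\d+) loops=\d+\)"
)


def misestimate_factor(plan_line: str) -> float:
    """Ratio between estimated and actual rows, always >= 1."""
    m = PLAN_ROWS_RE.search(plan_line)
    if m is None:
        raise ValueError("no estimated/actual row counts in this line")
    est, act = int(m["est"]), int(m["act"])
    # Clamp both sides to at least 1 so zero-row nodes don't divide by zero.
    return max(est, act, 1) / max(min(est, act), 1)


NEW_C1 = ("Index Scan using t1_c1_index on t1  (cost=0.00..630.45 rows=2907 "
          "width=26) (actual time=45.733..46.271 rows=12 loops=1)")
OLD_C1 = ("Index Scan using t1_c1_key on t1  (cost=0.00..286.08 rows=72 "
          "width=26) (actual time=12.423..12.475 rows=12 loops=1)")

if __name__ == "__main__":
    print(f"8.1: off by {misestimate_factor(NEW_C1):.0f}x")
    print(f"7.4: off by {misestimate_factor(OLD_C1):.0f}x")
```

On the c1 query this gives about 242x on 8.1 versus 6x on 7.4, which fits the suggestion that the new database simply had not been analyzed yet.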
On Thu, 1 Mar 2007, Alex Deucher wrote:

>> >> PostgreSQL might be choosing a bad plan because your
>> >> effective_cache_size is way off (it's the default now, right?). Also,
>> >> what was the block read/write
>> >
>> > Yes, it's set to the default.
>> >
>> >> speed of the SAN from your bonnie tests? Probably want to tune
>> >> random_page_cost as well if it's also at the default.
>> >>
>> >
>> >                   ------Sequential Output------ --Sequential Input- --Random-
>> >                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> > Machine      Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
>> > luna12-san 16000M 58896  91 62931   9 35870   5 54869  82 145504 13 397.7   0
>>
>> So, you're getting 62MB/s writes and 145MB/s reads. Just FYI, that write
>> speed is about the same as my single SATA drive write speed on my
>> workstation, so not that great. The read speed is decent, though, and with
>> that sort of read performance, you might want to lower random_page_cost to
>> something like 2.5 or 2 so the planner will tend to prefer index scans.
>
> Right, but the old box was getting ~45MBps on both reads and writes,
> so it's an improvement for me
> know how it goes.

Do you think that is because you have a different interface between you and
the SAN? ~45MBps is pretty slow; your average 7200RPM ATA133 drive can do that
and costs quite a bit less than a SAN.

Is the SAN being shared between the database servers and other servers? Maybe
it was just random timing that gave you the poor write performance on the old
server, which might also be yielding occasional poor performance on the new
one.
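The figures Jeff reads off the bonnie output are the two "Block" columns: 62931 K/sec sequential block output and 145504 K/sec sequential block input. A small sketch that pulls them out of the quoted row, assuming the 14-column layout of the header above; the field labels are illustrative, and dividing K/sec by 1000 matches the rounding in the reply (if bonnie's K means 1024 bytes, the MB/s figures come out a few percent higher).

```python
# Column labels for one bonnie result row, in the order of the header quoted
# above: Sequential Output (Per Chr, Block, Rewrite), Sequential Input
# (Per Chr, Block), Random Seeks. Each rate column is followed by its %CP.
FIELDS = [
    "machine", "size",
    "out_chr_kps", "out_chr_cpu",
    "out_block_kps", "out_block_cpu",
    "rewrite_kps", "rewrite_cpu",
    "in_chr_kps", "in_chr_cpu",
    "in_block_kps", "in_block_cpu",
    "seeks_per_sec", "seeks_cpu",
]


def parse_bonnie_row(row: str) -> dict[str, str]:
    """Map each whitespace-separated token of a result row to its column."""
    tokens = row.split()
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(tokens)}")
    return dict(zip(FIELDS, tokens))


def block_mb_per_sec(row: str) -> tuple[float, float]:
    """(block write, block read) in MB/s, dividing K/sec by 1000."""
    r = parse_bonnie_row(row)
    return int(r["out_block_kps"]) / 1000, int(r["in_block_kps"]) / 1000


ROW = "luna12-san 16000M 58896 91 62931 9 35870 5 54869 82 145504 13 397.7 0"

if __name__ == "__main__":
    write, read = block_mb_per_sec(ROW)
    print(f"block write ~{write:.1f} MB/s, block read ~{read:.1f} MB/s")
```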
At 07:36 PM 3/1/2007, Jeff Frost wrote:
>On Thu, 1 Mar 2007, Alex Deucher wrote:
>[...]
>>>So, you're getting 62MB/s writes and 145MB/s reads. Just FYI, that write
>>>speed is about the same as my single SATA drive write speed on my
>>>workstation, so not that great. [...]
>[...]
>Is the SAN being shared between the database servers and other
>servers? Maybe it was just random timing that gave you the poor
>write performance on the old server, which might also be yielding
>occasional poor performance on the new one.

Remember that pg, even pg 8.2.3, has a known history of very poor insert
speed (see comments on this point by Josh Berkus, Luke Lonergan, etc.).

For some reason, the code changes that have resulted in dramatic improvements
in pg's read speed have not had nearly the same efficacy for writes.

Bottom line: pg presently has a fairly low and fairly harsh upper bound on
write performance. What exactly that bound is has been the subject of some
discussion, but IIUC the fact of its existence is well established.

Various proposals for improving the situation exist (I've even made some of
them), but AFAIK this is currently considered one of the "tough pg problems".

Cheers,
Ron Peacetree
On 3/1/07, Jeff Frost <jeff@frostconsultingllc.com> wrote:
> On Thu, 1 Mar 2007, Alex Deucher wrote:
> [...]
> Do you think that is because you have a different interface between you and
> the SAN? ~45MBps is pretty slow; your average 7200RPM ATA133 drive can do
> that and costs quite a bit less than a SAN.
>
> Is the SAN being shared between the database servers and other servers? Maybe
> it was just random timing that gave you the poor write performance on the old
> server, which might also be yielding occasional poor performance on the new
> one.

The direct attached SCSI discs on the old database server were getting 45MBps,
not the SAN. The SAN got 62/145Mbps, which is not as bad. We have 4 servers on
the SAN, each with its own 4 GBps FC link via an FC switch. I'll try and re-run
the numbers when the servers are idle this weekend.

Alex
>> [...]
>> Is the SAN being shared between the database servers and other servers?
>
> The direct attached SCSI discs on the old database server were getting
> 45MBps, not the SAN. The SAN got 62/145Mbps, which is not as bad.

How many spindles you got in that SAN?

> We have 4 servers on the SAN, each with its own 4 GBps FC link via an FC
> switch. I'll try and re-run the numbers when the servers are idle
> this weekend.
>
> Alex

--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
On 3/1/07, Joshua D. Drake <jd@commandprompt.com> wrote:
> > The direct attached SCSI discs on the old database server were getting
> > 45MBps, not the SAN. The SAN got 62/145Mbps, which is not as bad.
>
> How many spindles you got in that SAN?

105 IIRC.

Alex
On Thu, 1 Mar 2007, Alex Deucher wrote:

> On 3/1/07, Jeff Frost <jeff@frostconsultingllc.com> wrote:
>> [...]
>> Is the SAN being shared between the database servers and other servers?
>> Maybe it was just random timing that gave you the poor write performance
>> on the old server, which might also be yielding occasional poor
>> performance on the new one.
>
> The direct attached SCSI discs on the old database server were getting
> 45MBps, not the SAN. The SAN got 62/145Mbps, which is not as bad. We
> have 4 servers on the SAN, each with its own 4 GBps FC link via an FC
> switch. I'll try and re-run the numbers when the servers are idle
> this weekend.

Sorry, I thought the old server was also attached to the SAN. My fault for
not hanging onto the entire email thread.

I think you're mixing and matching your capital and lower case Bs in your
sentence above, though. :-)

I suspect what you really mean is that the SAN got 62/145MBps (megabytes/sec)
and the FC link is 4Gbps (gigabits/sec), or 500MBps. Is that correct? If so,
and seeing that you think there are 105 spindles on the SAN, I'd say you're
either maxing out the switch fabric of the SAN with your servers, or you have
a really poorly performing SAN in general, or you just misunderstood the .

As a comparison, with 8 WD Raptors configured in a RAID10 with normal ext3, I
get about 160MB/s write and 305MB/s read performance. Hopefully the SAN has
lots of other super nifty features that make up for the poor performance. :-(
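Jeff's unit correction is easy to check with a few lines of arithmetic. This sketch uses the same naive 8-bits-per-byte conversion he applies (4 Gbps is about 500 MBps); real Fibre Channel throughput is somewhat lower because of link encoding overhead, so the percentages below somewhat understate how busy the link is. The helper names are illustrative.

```python
BITS_PER_BYTE = 8


def gbit_to_mbyte_per_sec(gbit_per_sec: float) -> float:
    """Naive Gb/s -> MB/s conversion (decimal units, ignores protocol and
    encoding overhead on the link)."""
    return gbit_per_sec * 1000 / BITS_PER_BYTE


def mbit_to_mbyte_per_sec(mbit_per_sec: float) -> float:
    """Mb/s -> MB/s."""
    return mbit_per_sec / BITS_PER_BYTE


if __name__ == "__main__":
    link = gbit_to_mbyte_per_sec(4)  # one 4 Gbps FC link per server
    for label, mb_s in (("write", 62.0), ("read", 145.0)):
        print(f"{label}: {mb_s:.0f} MB/s is {mb_s / link:.0%} of the link")
    # If the figures really were megabits, the SAN would be very slow:
    print(f"62 Mbps would be only {mbit_to_mbyte_per_sec(62):.2f} MB/s")
```

Read literally as megabits, 62 Mbps would be under 8 MB/s, far below the ~45MBps of the old direct-attached SCSI discs, so the megabytes reading is the plausible one. Either way, 62/145 MB/s uses well under a third of a single link, which is consistent with Jeff looking for the bottleneck elsewhere in the SAN.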
On 3/1/07, Jeff Frost <jeff@frostconsultingllc.com> wrote:
> On Thu, 1 Mar 2007, Alex Deucher wrote:
> [...]
> Sorry, I thought the old server was also attached to the SAN. My fault for
> not hanging onto the entire email thread.
>
> I think you're mixing and matching your capital and lower case Bs in your
> sentence above, though. :-)

Whoops.

> [...]
> As a comparison, with 8 WD Raptors configured in a RAID10 with normal ext3, I
> get about 160MB/s write and 305MB/s read performance. Hopefully the SAN has
> lots of other super nifty features that make up for the poor performance. :-(

It's big and reliable (and, compared to lots of others, relatively
inexpensive), which is why we bought it. We bought it mostly as a huge file
store. The RAID groups on the SAN were set up for maximum capacity rather than
for performance. Using it for the databases just came up recently.

Alex
Alex Deucher wrote:
> On 3/1/07, Joshua D. Drake <jd@commandprompt.com> wrote:
>> [...]
>> How many spindles you got in that SAN?
>
> 105 IIRC.

You have 105 spindles and you are only getting 62megs on writes? That seems
about half what you should be getting (at least).

Joshua D. Drake
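Joshua's reaction is easier to see per drive. This is a deliberately naive sketch: it divides the throughput quoted in this thread by the spindle count and ignores RAID level, mirroring overhead, and the fact that (per Alex) the SAN's RAID groups were laid out for capacity, so the volume the database sits on may not span all 105 spindles. The per_spindle helper and the ARRAYS table are illustrative.

```python
def per_spindle(mb_per_sec: float, spindles: int) -> float:
    """Naive throughput per drive: total MB/s divided by drive count."""
    if spindles <= 0:
        raise ValueError("spindles must be positive")
    return mb_per_sec / spindles


# (spindles, write MB/s, read MB/s) as quoted in this thread. The 105 figure
# was given "IIRC", and not all of those drives necessarily back the DB volume.
ARRAYS = {
    "SAN": (105, 62.0, 145.0),
    "8x WD Raptor RAID10": (8, 160.0, 305.0),
}

if __name__ == "__main__":
    for name, (n, write, read) in ARRAYS.items():
        print(f"{name}: {per_spindle(write, n):.1f} MB/s write, "
              f"{per_spindle(read, n):.1f} MB/s read per drive")
```

The SAN works out to roughly 0.6 MB/s written per spindle, against about 20 MB/s per drive on Jeff's 8-disk RAID10, which is the gap behind both Jeff's and Joshua's replies.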