Unix Technical Forum

Large table performance

This is a discussion on Large table performance within the Pgsql Performance forums, part of the PostgreSQL category; --> Hello - I have a fairly large table (3 million records), and am fetching 10,000 non-contigous records doing a ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 10:05 AM
Mark Dobbrow
 
Posts: n/a
Default Large table performance

Hello -

I have a fairly large table (3 million records), and am fetching 10,000 non-contigous records doing a simple select on an indexed column ie

select grades from large_table where teacher_id = X

This is a test database, so the number of records is always 10,000 and i have 300 different teacher ids.

The problem is, sometimes fetching un-cached records takes 0.5 secs and sometimes (more often) is takes more like 10.0 seconds

(fetching the same records for a given teacher_id a second time takes about 0.25 secs)

Has anyone seen similar behavior or know what the solution might be?

any help much appreciated,
Mark



ps. My shared_buffers is set at 5000 (kernal max), and work_mem=8192



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 10:05 AM
Dave Cramer
 
Posts: n/a
Default Re: Large table performance


On 12-Jan-07, at 7:31 PM, Mark Dobbrow wrote:

> Hello -
>
> I have a fairly large table (3 million records), and am fetching
> 10,000 non-contigous records doing a simple select on an indexed
> column ie
>
> select grades from large_table where teacher_id = X
>
> This is a test database, so the number of records is always 10,000
> and i have 300 different teacher ids.
>
> The problem is, sometimes fetching un-cached records takes 0.5 secs
> and sometimes (more often) is takes more like 10.0 seconds
>
> (fetching the same records for a given teacher_id a second time
> takes about 0.25 secs)
>
> Has anyone seen similar behavior or know what the solution might be?
>
> any help much appreciated,
> Mark
>
>
>
> ps. My shared_buffers is set at 5000 (kernal max), and work_mem=8192


5000 is pretty low, you need at least 1/4 of memory for an 8.1.x or
newer server.
effective cache should be 3/4 of available memory

Dave
>
>



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 10:05 AM
Steinar H. Gunderson
 
Posts: n/a
Default Re: Large table performance

On Fri, Jan 12, 2007 at 07:40:25PM -0500, Dave Cramer wrote:
> 5000 is pretty low, you need at least 1/4 of memory for an 8.1.x or
> newer server.


Is this the new "common wisdom"? It looks like at some point, someone here
said "oh, and it looks like you're better off using large values here for
8.1.x and newer", and now everybody seems to repeat it as if it was always
well-known.

Are there any real benchmarks out there that we can point to? And, if you set
shared_buffers to half of the available memory, won't the kernel cache
duplicate more or less exactly the same data? (At least that's what people
used to say around here, but I guess the kernel cache gets adapted to the
fact that Postgres won't ask for the most common stuff, ie. the one in the
shared buffer cache.)

/* Steinar */
--
Homepage: http://www.sesse.net/

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 10:05 AM
Daniel Cristian Cruz
 
Posts: n/a
Default Re: Large table performance

What if we start a project where we define tests for PostgreSQL
overall performance and individual points with any database structure?
It could be done, throught a SQL logger and statistics, where we can
see complete processess and measure then after. We have many things to
measure, and something that would help here is pg_buffercache (contrib
module). We could define many other tests.

I was thinking about something like that, where an aplication reads
information (from catalog too) about an production database, and use
this information to build a data set of any size, respecting anything
measured before.

Is it too complicated? I'm trying to make programs with C++ and
libpqxx, and successfully used Python with PostgreSQL before (was a
database structure comparer). Python could make it easyer, C++ could
be a chalenge for someone like me.

Someone would like to contribute? When we start the project?

On 1/12/07, Steinar H. Gunderson <sgunderson@bigfoot.com> wrote:
> On Fri, Jan 12, 2007 at 07:40:25PM -0500, Dave Cramer wrote:
> > 5000 is pretty low, you need at least 1/4 of memory for an 8.1.x or
> > newer server.

>
> Is this the new "common wisdom"? It looks like at some point, someone here
> said "oh, and it looks like you're better off using large values here for
> 8.1.x and newer", and now everybody seems to repeat it as if it was always
> well-known.
>
> Are there any real benchmarks out there that we can point to? And, if you set
> shared_buffers to half of the available memory, won't the kernel cache
> duplicate more or less exactly the same data? (At least that's what people
> used to say around here, but I guess the kernel cache gets adapted to the
> fact that Postgres won't ask for the most common stuff, ie. the one in the
> shared buffer cache.)
>
> /* Steinar */
> --
> Homepage: http://www.sesse.net/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
> http://www.postgresql.org/about/donate
>



--
Daniel Cristian Cruz
Analista de Sistemas
Especialista postgreSQL e Linux
Instrutor Certificado Mandriva

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 10:05 AM
Dave Dutcher
 
Posts: n/a
Default Re: Large table performance

Have you run vacuum and analyze on the table? What version of Postgres are
you running? What OS are you using?

This looks like a straight forward query. With any database the first time
you run the query its going to be slower because it actually has to read off
disk. The second time its faster because some or all of the data/indexes
will be cached. However 10 seconds sounds like a long time for pulling
10,000 records out of a table of 3 million. If you post an EXPLAIN ANALYZE,
it might give us a clue.

Dave


-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailtogsql-performance-owner@postgresql.org] On Behalf Of Mark Dobbrow
Sent: Friday, January 12, 2007 6:31 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Large table performance


Hello -

I have a fairly large table (3 million records), and am fetching 10,000
non-contigous records doing a simple select on an indexed column ie

select grades from large_table where teacher_id = X

This is a test database, so the number of records is always 10,000 and i
have 300 different teacher ids.

The problem is, sometimes fetching un-cached records takes 0.5 secs and
sometimes (more often) is takes more like 10.0 seconds

(fetching the same records for a given teacher_id a second time takes about
0.25 secs)

Has anyone seen similar behavior or know what the solution might be?

any help much appreciated,
Mark



ps. My shared_buffers is set at 5000 (kernal max), and work_mem=8192





Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 10:05 AM
Shoaib Mir
 
Posts: n/a
Default Re: Large table performance

Depending on the available memory try increasing the shared buffers and
work_mem and see if that changes the query execution time. Also make sure
you have proper indices created and also if possible try doing partitions
for the table.

Once you post the EXPLAIN ANALYZE output that will certainly help solving
the problem...

-------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 1/14/07, Dave Dutcher <dave@tridecap.com> wrote:
>
> Have you run vacuum and analyze on the table? What version of Postgres
> are you running? What OS are you using?
>
> This looks like a straight forward query. With any database the first
> time you run the query its going to be slower because it actually has to
> read off disk. The second time its faster because some or all of the
> data/indexes will be cached. However 10 seconds sounds like a long time for
> pulling 10,000 records out of a table of 3 million. If you post an EXPLAIN
> ANALYZE, it might give us a clue.
>
> Dave
>
>
> -----Original Message-----
> *From:* pgsql-performance-owner@postgresql.org [mailto:
> pgsql-performance-owner@postgresql.org] *On Behalf Of *Mark Dobbrow
> *Sent:* Friday, January 12, 2007 6:31 PM
> *To:* pgsql-performance@postgresql.org
> *Subject:* [PERFORM] Large table performance
>
> Hello -
>
> I have a fairly large table (3 million records), and am fetching 10,000
> non-contigous records doing a simple select on an indexed column ie
>
> select grades from large_table where teacher_id = X
>
> This is a test database, so the number of records is always 10,000 and i
> have 300 different teacher ids.
>
> The problem is, sometimes fetching un-cached records takes 0.5 secs and
> sometimes (more often) is takes more like 10.0 seconds
>
> (fetching the same records for a given teacher_id a second time takes
> about 0.25 secs)
>
> Has anyone seen similar behavior or know what the solution might be?
>
> any help much appreciated,
> Mark
>
>
>
> ps. My shared_buffers is set at 5000 (kernal max), and work_mem=8192
>
>
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-19-2008, 10:05 AM
Luke Lonergan
 
Posts: n/a
Default Re: Large table performance

Mark,

This behavior likely depends on how the data is loaded into the DBMS. If
the records you are fetching are distributed widely among the 3M records on
disk, then


On 1/12/07 4:31 PM, "Mark Dobbrow" <mdobbrow@yahoo.com> wrote:

> Hello -
>
> I have a fairly large table (3 million records), and am fetching 10,000
> non-contigous records doing a simple select on an indexed column ie
>
> select grades from large_table where teacher_id = X
>
> This is a test database, so the number of records is always 10,000 and i have
> 300 different teacher ids.
>
> The problem is, sometimes fetching un-cached records takes 0.5 secs and
> sometimes (more often) is takes more like 10.0 seconds
>
> (fetching the same records for a given teacher_id a second time takes about
> 0.25 secs)
>
> Has anyone seen similar behavior or know what the solution might be?
>
> any help much appreciated,
> Mark
>
>
>
> ps. My shared_buffers is set at 5000 (kernal max), and work_mem=8192
>
>
>




Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-19-2008, 10:05 AM
Luke Lonergan
 
Posts: n/a
Default Re: Large table performance

Mark,

Note that selecting an index column means that Postgres fetches the whole
rows from disk. I think your performance problem is either: 1) slow disk or
2) index access of distributed data. If itıs (1), there are plenty of
references from this list on how to check for that and fix it. If itıs (2),
see below.

The performance of index accessed data in Postgres depends on how the data
is loaded into the DBMS. If the records you are fetching are distributed
widely among the 3M records on disk, then the select is going to ³hop, skip
and jump² across the disk to get the records you need. If those records are
stored more closely together, then the fetching from disk is going to be
sequential. A good example of the best situation for an index is an index
on a date column when the data is loaded sequentially by date. A query
against a specific date range will result in an ordered fetch from the disk,
which leverages fast sequential access.

The difference in performance between ordered and distributed access is
similar to the difference between ³random seek² and ³sequential² performance
of the disk subsystem. The random seek performance of typical disk
subsystems with one thread (for one user in postgres) is 120 seeks per
second. If your data was randomly distributed, youıd expect about
10,000/120 = 83 seconds to gather these records. Since youıre getting 10
seconds, I expect that your data is lumped into groups and you are getting a
mix of sequential reads and seeks.

Note that adding more disks into a RAID does not help the random seek
performance within Postgres, but may linearly improve the ordered access
speed. So even with 44 disks in a RAID10 pool on a Sun X4500, the seek
performance of Postgres (and other DBMSıs without async or threaded I/O) is
that of a single disk * 120 seeks per second. Adding more users allows the
seeks to scale on such a machine as users are added, up to the number of
disks in the RAID. But for your one user example * no help.

If your problem is (2), you can re-order the data on disk by using a CREATE
TABLE statement like this:
CREATE TABLE fast_table AS SELECT * FROM slow_table ORDER BY teacher_id;
CREATE INDEX teacher_id_ix ON fast_table;
VACUUM ANALYZE fast_table;

You should then see ordered access when you do index scans on teacher_id.

- Luke


On 1/12/07 4:31 PM, "Mark Dobbrow" <mdobbrow@yahoo.com> wrote:

> Hello -
>
> I have a fairly large table (3 million records), and am fetching 10,000
> non-contigous records doing a simple select on an indexed column ie
>
> select grades from large_table where teacher_id = X
>
> This is a test database, so the number of records is always 10,000 and i have
> 300 different teacher ids.
>
> The problem is, sometimes fetching un-cached records takes 0.5 secs and
> sometimes (more often) is takes more like 10.0 seconds
>
> (fetching the same records for a given teacher_id a second time takes about
> 0.25 secs)
>
> Has anyone seen similar behavior or know what the solution might be?
>
> any help much appreciated,
> Mark
>
>
>
> ps. My shared_buffers is set at 5000 (kernal max), and work_mem=8192
>
>
>




Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:23 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com