Unix Technical Forum

RAID arrays and performance

This is a discussion on RAID arrays and performance within the Pgsql Performance forums, part of the PostgreSQL category; --> Mark Mielke wrote: > At a minimum, this breaks your query into: 1) Preload all the index > pages ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #21 (permalink)  
Old 04-19-2008, 11:42 AM
James Mansion
 
Posts: n/a
Default Re: RAID arrays and performance

Mark Mielke wrote:
> At a minimum, this breaks your query into: 1) Preload all the index
> pages you will need

Isn't this fairly predictable - the planner has chosen the index so it
will be operating
on a bounded subset.
> , 2) Scan the index pages you needed

Yes, and AIO helps when you can scan them in arbitrary order, as they
are returned.
> , 3) Preload all the table page you will need

No - just request that they load. You can do work as soon as any page
is returned.
> , 4) Scan the table pages you needed.

In the order that is most naturally returned by the disks.

> But do you really need the whole index?

I don't think I suggested that.
> What if you only need parts of the index, and this plan now reads the
> whole index using async I/O "just in case" it is useful?

Where did you get that from?
> index scan into a regular B-Tree scan, which is difficult to perform
> async I/O for, as you don't necessarily know which pages to read next.
>

Most B-trees are not so deep. It would generally be a win to retain
interior nodes of indices in
shared memory, even if leaf pages are not present. In such a case, it
is quite quick to establish
which leaf pages must be demand loaded.

I'm not suggesting that Postgres indices are structured in a way that
would support this sort
of thing now.

James


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #22 (permalink)  
Old 04-19-2008, 11:42 AM
Mark Mielke
 
Posts: n/a
Default Re: RAID arrays and performance

James Mansion wrote:
> Mark Mielke wrote:
>> At a minimum, this breaks your query into: 1) Preload all the index
>> pages you will need

> Isn't this fairly predictable - the planner has chosen the index so it
> will be operating
> on a bounded subset.

What is the bounded subset? It is bounded by the value. What value? You
need to access the first page before you know what the second page is.
PostgreSQL or the kernel should already have the hottest pages in
memory, so the value of doing async I/O is very likely the cooler pages
that are unique to the query. We don't know what the cooler pages are
until we follow three tree down.

>> , 2) Scan the index pages you needed

> Yes, and AIO helps when you can scan them in arbitrary order, as they
> are returned.


I don't think you are talking about searching a B-Tree, as the order is
important when searching, and search performance would be reduced if one
reads and scans more pages than necessary to map from the value to the
row. I presume you are talking about scanning the entire index. Where
"needed" means "all". Again, this only benefits a subset of the queries.

>> , 3) Preload all the table page you will need

> No - just request that they load. You can do work as soon as any page
> is returned.


The difference between preload and handling async I/O in terms of
performance is debatable. Greg reports that async I/O on Linux sucks,
but posix_fadvise*() has substantial benefits. posix_fadvise*() is
preload not async I/O (he also reported that async I/O on Solaris seems
to work well). Being able to do work as the first page is available is a
micro-optimization as far as I am concerned at this point (one that may
not yet work on Linux), as the real benefit comes from utilizing all 12
disks in Matthew's case, not from guaranteeing that data is processed as
soon as possible.

>> , 4) Scan the table pages you needed.

> In the order that is most naturally returned by the disks.


Micro-optimization.

>> But do you really need the whole index?

> I don't think I suggested that.
>> What if you only need parts of the index, and this plan now reads the
>> whole index using async I/O "just in case" it is useful?

> Where did you get that from?


I get it from your presumption that you can know which pages of the
index to load in advance. The only way you can know which pages must be
loaded, is to know that you need to query them all. Unless you have some
way of speculating with some degree of accuracy which colder pages in
the index you will need, without looking at the index?

>> index scan into a regular B-Tree scan, which is difficult to perform
>> async I/O for, as you don't necessarily know which pages to read next.

> Most B-trees are not so deep. It would generally be a win to retain
> interior nodes of indices in
> shared memory, even if leaf pages are not present. In such a case, it
> is quite quick to establish
> which leaf pages must be demand loaded.


This is bogus. The less deep the B-Tree is, the less there should be any
requirement for async I/O. Hot index pages will be in cache.

> I'm not suggesting that Postgres indices are structured in a way that
> would support this sort
> of thing now.


In your hand waving, you have assumed that PostgreSQL B-Tree index might
need to be replaced? :-)

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #23 (permalink)  
Old 04-19-2008, 11:42 AM
James Mansion
 
Posts: n/a
Default Re: RAID arrays and performance

Mark Mielke wrote:
> PostgreSQL or the kernel should already have the hottest pages in
> memory, so the value of doing async I/O is very likely the cooler
> pages that are unique to the query. We don't know what the cooler
> pages are until we follow three tree down.
>

I'm assuming that at the time we start to search the index, we have some
idea of value or values that
we are looking for. Or, as you say, we are applying a function to 'all
of it'.

Think of a 'between' query. The subset of the index that can be a match
can be bounded by the leaf
pages that contain the end point(s). Similarly if we have a merge with
a sorted intermediate set from
a prior step then we also have bounds on the values.

I'm not convinced that your assertion that the index leaf pages must
necessarily be processed in-order
is true - it depends what sort of operation is under way. I am assuming
that we try hard to keep
interior index nodes and data in meory and that having identified the
subset of these that we want, we
can immediately infer the set of leaves that are potentially of interest.
> The difference between preload and handling async I/O in terms of
> performance is debatable. Greg reports that async I/O on Linux sucks,
> but posix_fadvise*() has substantial benefits. posix_fadvise*() is
> preload not async I/O (he also reported that async I/O on Solaris
> seems to work well). Being able to do work as the first page is
> available is a micro-optimization as far as I am concerned at this
> point (one that may not yet work on Linux), as the real benefit comes
> from utilizing all 12 disks in Matthew's case, not from guaranteeing
> that data is processed as soon as possible.
>

I see it as part of the same problem. You can partition the data across
all the disks and run queries in parallel
against the partitions, or you can lay out the data in the RAID array in
which case the optimiser has very little idea
how the data will map to physical data layout - so its best bet is to
let the systems that DO know decide the
access strategy. And those systems can only do that if you give them a
lot of requests that CAN be reordered,
so they can choose a good ordering.

> Micro-optimization.
>

Well, you like to assert this - but why? If a concern is the latency
(and my experience suggests that latency is the
biggest issue in practice, not throughput per se) then overlapping
processing while waiting for 'distant' data is
important - and we don't have any information about the physical layout
of the data that allows us to assert that
forward access pre-read of data from a file is the right strategy for
accessing it as fast as possible - we have to
allow the OS (and to an increasing extent the disks) to manage the
elevator IO to best effect. Its clear that the
speed of streaming read and write of modern disks is really high
compared to that of random access, so anything
we can do to help the disks run in that mode is pretty worthwhile even
if the physical streaming doesn't match
any obvious logical ordering of the OS files or logical data pages
within them. If you have a function to apply to
a set of data elements and the elements are independant, then requiring
that the function is applied in an order
rather than conceptually in parallel is going to put a lot of constraint
on how the hardware can optimise it.

Clearly a hint to preload is better than nothing. But it seems to me
that the worst case is that we wait for
the slowest page to load and then start processing hoping that the rest
of the data stays in the buffer cache
and is 'instant', while AIO and evaluate-when-ready means that process
is still bound by the slowest
data to arrive, but at that point there is little processing still to
do, and the already-processed buffers can be
reused earlier. In the case where there is significant presure on the
buffer cache, this can be significant.

Of course, a couple of decades bullying Sybase systems on Sun Enterprise
boxes may have left me
somewhat jaundiced - but Sybase can at least parallelise things.
Sometimes. When it does, its quite
a big win.

> In your hand waving, you have assumed that PostgreSQL B-Tree index
> might need to be replaced? :-)
>

Sure, if the intent is to make the system thread-hot or AIO-hot, then
the change is potentially very
invasive. The strategy to evaluate queries based on parallel execution
and async IO is not necessarily
very like a strategy where you delegate to the OS buffer cache.

I'm not too bothered for the urpose of this discussion whether the way
that postgres currently
navigates indexes is amenable to this. This is bikeshed land, right?

I think it is foolish to disregard strategies that will allow
overlapping IO and processing - and we want to
keep disks reading and writing rather than seeking. To me that suggests
AIO and disk-native queuing
are quite a big deal. And parallel evaluation will be too as the number
of cores goes up and there is
an expectation that this should reduce latency of individual query, not
just allow throughput with lots
of concurrent demand.


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #24 (permalink)  
Old 04-19-2008, 11:42 AM
Mark Mielke
 
Posts: n/a
Default Re: RAID arrays and performance

James Mansion wrote:
> Mark Mielke wrote:
>> PostgreSQL or the kernel should already have the hottest pages in
>> memory, so the value of doing async I/O is very likely the cooler
>> pages that are unique to the query. We don't know what the cooler
>> pages are until we follow three tree down.
>>

> I'm assuming that at the time we start to search the index, we have
> some idea of value or values that
> we are looking for. Or, as you say, we are applying a function to
> 'all of it'.
>
> Think of a 'between' query. The subset of the index that can be a
> match can be bounded by the leaf
> pages that contain the end point(s). Similarly if we have a merge
> with a sorted intermediate set from
> a prior step then we also have bounds on the values.


How do you find the bounding points for these pages? Does this not
require descending through the tree in a more traditional way?

> I'm not convinced that your assertion that the index leaf pages must
> necessarily be processed in-order
> is true - it depends what sort of operation is under way. I am
> assuming that we try hard to keep
> interior index nodes and data in meory and that having identified the
> subset of these that we want, we
> can immediately infer the set of leaves that are potentially of interest.


It is because you are missing my point. In order to find a reduced set
of pages to load, one must descend through the B-Tree. Identifying the
subset requires sequential loading of pages. There is some theoretical
potential for async I/O, but I doubt your own assertion that this is
significant in any significant way. I ask you again - how do you know
which lower level pages to load before you load the higher level pages?
The only time the B-Tree can be processed out of order in this regard is
if you are doing a bitmap index scan or some similar operation that will
scan the entire tree, and you do not care what order the pages arrive
in. If you are looking for a specific key, one parent page leads to one
child page, and the operation is sequential.

>> The difference between preload and handling async I/O in terms of
>> performance is debatable. Greg reports that async I/O on Linux sucks,
>> but posix_fadvise*() has substantial benefits. posix_fadvise*() is
>> preload not async I/O (he also reported that async I/O on Solaris
>> seems to work well). Being able to do work as the first page is
>> available is a micro-optimization as far as I am concerned at this
>> point (one that may not yet work on Linux), as the real benefit comes
>> from utilizing all 12 disks in Matthew's case, not from guaranteeing
>> that data is processed as soon as possible.

> I see it as part of the same problem. You can partition the data
> across all the disks and run queries in parallel
> against the partitions, or you can lay out the data in the RAID array
> in which case the optimiser has very little idea
> how the data will map to physical data layout - so its best bet is to
> let the systems that DO know decide the
> access strategy. And those systems can only do that if you give them
> a lot of requests that CAN be reordered,
> so they can choose a good ordering.


You can see it however you like - what remains, is that the 12X speed up
is going to come from using 12 disks, and loading the 12 disks in
parallel. More theoretical improvements with regard to the ability for a
particular hard disk to schedule reads and return results out of order,
have not, in my reading, been shown to reliably improve performance to a
significant degree. Unfortunately, Linux doesn't support my SATA NCQ
yet, so I haven't been able to experiment myself. Gregory provided
numbers showing a 2X - 3X performance when using three disks. This has
the potential for significant improvement with real hardware, modest
cost, and perhaps trivial changes to PostgreSQL. What you describe is a
re-design of the I/O strategy that will be designed for asynchronous
I/O, with some sort of state machine that will be able to deal
efficiently with either index pages or table pages out of order. Do you
have numbers to show that such a significant change would result in a
reasonable return on the investment?

>> Micro-optimization.

> Well, you like to assert this - but why?

I'll quote from:

http://en.wikipedia.org/wiki/Native_Command_Queuing

Most reading I have done shows NCQ to have limited gains, with most
benchmarks being suspect. Also, latency is only for the first page. One
presumes that asynch I/O will be mostly valuable in the case where many
pages can be scheduled for reading at the same time. In the case that
many pages are scheduled for reading, the first page will be eventually
served, and the overall bandwidth is still the same. In your theoretical
model, you are presuming the CPU is a bottleneck, either for processing,
or scheduling the next batch of reads. I think you are hand waving, and
given that Linux doesn't yet support asynch I/O well, Gregory's model
will serve more PostgreSQL users than yours with less complexity.

> Clearly a hint to preload is better than nothing. But it seems to me
> that the worst case is that we wait for
> the slowest page to load and then start processing hoping that the
> rest of the data stays in the buffer cache
> and is 'instant', while AIO and evaluate-when-ready means that process
> is still bound by the slowest
> data to arrive, but at that point there is little processing still to
> do, and the already-processed buffers can be
> reused earlier. In the case where there is significant presure on the
> buffer cache, this can be significant.


It seems to me that you have yet to prove that there will be substantial
gains in overall performance for preload. Leaping on to presuming that
PostgreSQL should switch to a fully asynch I/O model seems a greater
stretch. By the sounds of it, Gregory could have the first implemented
very soon. When will you have yours? :-)

>> In your hand waving, you have assumed that PostgreSQL B-Tree index
>> might need to be replaced? :-)

> Sure, if the intent is to make the system thread-hot or AIO-hot, then
> the change is potentially very
> invasive. The strategy to evaluate queries based on parallel
> execution and async IO is not necessarily
> very like a strategy where you delegate to the OS buffer cache.
>
> I'm not too bothered for the urpose of this discussion whether the way
> that postgres currently
> navigates indexes is amenable to this. This is bikeshed land, right?


I am only interested by juicy projects that have a hope of success. This
subject does interest me - I am hoping my devil's advocate participation
encourages people to seek a practical implementation that will benefit me.

> I think it is foolish to disregard strategies that will allow
> overlapping IO and processing - and we want to
> keep disks reading and writing rather than seeking. To me that
> suggests AIO and disk-native queuing
> are quite a big deal. And parallel evaluation will be too as the
> number of cores goes up and there is
> an expectation that this should reduce latency of individual query,
> not just allow throughput with lots
> of concurrent demand.


I am more amenable to multi-threaded index processing for the same query
than async I/O to take advantage of NCQ. Guess we each come from a
different background. :-)

Cheers,
mark

--
Mark Mielke <mark@mielke.cc>

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #25 (permalink)  
Old 04-19-2008, 11:42 AM
Greg Smith
 
Posts: n/a
Default Re: RAID arrays and performance

On Tue, 4 Dec 2007, Mark Mielke wrote:

>> This is bikeshed land, right?

> I am only interested by juicy projects that have a hope of success. This
> subject does interest me - I am hoping my devil's advocate participation
> encourages people to seek a practical implementation that will benefit me.


Nah, you're both in bikeshed land. Ultimately this feature is going to
get built out in a way that prioritizes as much portability as is possible
while minimizing the impact on existing code. The stated PostgreSQL bias
is that async-I/O is not worth the trouble until proven otherwise:
http://www.postgresql.org/docs/faqs.....html#item1.14

That's why Greg Stark is busy with low-level tests of I/O options while
you're arguing much higher level details. Until you've got some
benchmarks in this specific area to go by, you can talk theory all day and
not impact what implementation will actually get built one bit.

As an example of an area you've already brought up where theory and
benchmarks clash violently, the actual situation with NCQ on SATA drives
has been heavily blurred because of how shoddy some manufacturer's NCQ
implementation are. Take a look at the interesting thread on this topic
at http://lkml.org/lkml/2007/4/3/159 to see what I'm talking about. Even
if you have an NCQ drive, and a version of Linux that supports it, and
you've setup everything up right, you can still end up with unexpected
performance regressions under some circumstances. It's still the wild
west with that technology.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #26 (permalink)  
Old 04-19-2008, 11:43 AM
Bruce Momjian
 
Posts: n/a
Default Re: RAID arrays and performance

Mark Mielke wrote:
> Asynchronous I/O is no more a magic bullet than threading. It requires a
> lot of work to get it right, and if one gets it wrong, it can be slower
> than the regular I/O or single threaded scenarios. Both look sexy on
> paper, neither may be the solution to your problem. Or they may be. We
> wouldn't know without numbers.


Agreed. We currently don't use multiple CPUs or multiple disks
efficiently for single-query loads. There is certainly more we could do
in these areas, and it is on the TODO list.

The good news is that most work loads are multi-user and we use
resources more evenly in those cases.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #27 (permalink)  
Old 09-20-2008, 06:18 PM
Greg Smith
 
Posts: n/a
Default Re: RAID arrays and performance

On Thu, 18 Sep 2008, Matthew Wakeling wrote:

> On Tue, 29 Jan 2008, Gregory Stark wrote:


>> I have a patch which implements it for the low hanging fruit of bitmap
>> index scans. it does it using an extra trip through the buffer manager
>> which is the least invasive approach but not necessarily the best.

>
> Gregory - what's the status of that patch at the moment? Will it be making it
> into a new version of Postgres, or are we waiting for it to be implemented
> fully?


It and a related fadvise patch have been floating around the project queue
for a while now. I just sorted through all the various patches and
mesasges related to this area and updated the list at
http://wiki.postgresql.org/wiki/Comm...ending_patches
recently, I'm trying to kick back a broader reviewed version of this
concept right now.

> It's just that our system is doing a lot of bitmap index scans at the moment,
> and it'd help to be able to spread them across the 16 discs in the RAID
> array. It's the bottleneck in our system at the moment.


If you have some specific bitmap index scan test case suggestions you can
pass along (either publicly or in private to me, I can probably help
anonymize them), that's one of the things that has been holding this up.
Alternately, if you'd like to join in on testing this all out more help
would certainly be welcome.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #28 (permalink)  
Old 09-20-2008, 06:18 PM
Scott Marlowe
 
Posts: n/a
Default Re: RAID arrays and performance

On Thu, Sep 18, 2008 at 1:30 PM, Greg Smith <gsmith@gregsmith.com> wrote:
> If you have some specific bitmap index scan test case suggestions you can
> pass along (either publicly or in private to me, I can probably help
> anonymize them), that's one of the things that has been holding this up.
> Alternately, if you'd like to join in on testing this all out more help
> would certainly be welcome.


I posted in pgsql-perform about a problem that's using bitmap heap
scans that's really slow compared to just using nested-loops. Don't
know if that is relevant or not.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #29 (permalink)  
Old 09-20-2008, 06:18 PM
Matthew Wakeling
 
Posts: n/a
Default Re: RAID arrays and performance

On Thu, 18 Sep 2008, Greg Smith wrote:
>> It's just that our system is doing a lot of bitmap index scans at the
>> moment, and it'd help to be able to spread them across the 16 discs in the
>> RAID array. It's the bottleneck in our system at the moment.

>
> If you have some specific bitmap index scan test case suggestions you can
> pass along (either publicly or in private to me, I can probably help
> anonymize them), that's one of the things that has been holding this up.


Okay, here's a description of what we're doing. We are importing data from
a large file (so we don't have a choice on the order that the data comes
in). For each entry in the source, we have to look up the corresponding
row in the database, and issue the correct "UPDATE" command according to
what's in the database and what's in the data source. Long story - it
isn't feasible to change the overall process.

In order to improve the performance, I made the system look ahead in the
source, in groups of a thousand entries, so instead of running:

SELECT * FROM table WHERE field = 'something';

a thousand times, we now run:

SELECT * FROM table WHERE field IN ('something', 'something else'...);

with a thousand things in the IN. Very simple query. It does run faster
than the individual queries, but it still takes quite a while. Here is an
example query:

SELECT a1_.id AS a1_id, a1_.primaryIdentifier AS a2_ FROM Gene AS a1_
WHERE a1_.primaryIdentifier IN ('SPAC11D3.15', 'SPAC11D3.16c',
'SPAC11D3.17', 'SPAC11D3.18c', 'SPAC15F9.01c', 'SPAC15F9.02', 'SPAC16.01',
'SPAC18G6.01c', 'SPAC18G6.02c', 'SPAC18G6.04c', 'SPAC18G6.05c',
'SPAC18G6.06', 'SPAC18G6.07c', 'SPAC18G6.09c', 'SPAC18G6.10',
'SPAC18G6.11c', 'SPAC18G6.12c', 'SPAC18G6.13', 'SPAC18G6.14c',
'SPAC18G6.15', 'SPAC1B9.01c', 'SPAC1D4.02c', 'SPAC1D4.03c', 'SPAC1D4.04',
'SPAC1D4.05c', 'SPAC1D4.07c', 'SPAC1D4.08', 'SPAC1D4.09c', 'SPAC1D4.10',
'SPAC1D4.11c', 'SPAC1F3.11', 'SPAC23A1.10', 'SPAC23E2.01', 'SPAC23E2.02',
'SPAC23E2.03c', 'SPAC26A3.02', 'SPAC26A3.03c', 'SPAC26A3.05',
'SPAC26A3.06', 'SPAC26A3.07c', 'SPAC26A3.08', 'SPAC26A3.09c',
'SPAC26A3.10', 'SPAC26A3.11', 'SPAC26A3.14c', 'SPAC26A3.15c',
'SPAC26A3.16', 'SPAC27F1.01c', 'SPAC27F1.03c', 'SPAC27F1.04c',
'SPAC27F1.05c', 'SPAC27F1.06c', 'SPAC3H8.02', 'SPAC3H8.03', 'SPAC3H8.04',
'SPAC3H8.05c', 'SPAC3H8.06', 'SPAC3H8.07c', 'SPAC3H8.08c', 'SPAC3H8.09c',
'SPAC3H8.10', 'SPAC3H8.11', 'SPAC8E11.11', 'SPBC106.15', 'SPBC17G9.10',
'SPBC24E9.15c', 'WBGene00000969', 'WBGene00003035', 'WBGene00004095',
'WBGene00016011', 'WBGene00018672', 'WBGene00018674', 'WBGene00018675',
'WBGene00018676', 'WBGene00018959', 'WBGene00018960', 'WBGene00018961',
'WBGene00023407') ORDER BY a1_.id LIMIT 2000;

And the corresponding EXPLAIN ANALYSE:

Limit (cost=331.28..331.47 rows=77 width=17) (actual time=121.973..122.501 rows=78 loops=1)
-> Sort (cost=331.28..331.47 rows=77 width=17) (actual time=121.968..122.152 rows=78 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 29kB
-> Bitmap Heap Scan on gene a1_ (cost=174.24..328.87 rows=77 width=17) (actual time=114.311..121.705 rows=78 loops=1)
Recheck Cond: (primaryidentifier = ANY ('{SPAC11D3.15...
-> Bitmap Index Scan on gene__key_primaryidentifier (cost=0.00..174.22 rows=77 width=0) (actual time=44.434..44.434 rows=150 loops=1)
Index Cond: (primaryidentifier = ANY ('{SPAC11D3.15,SPAC11D3.16c...
Total runtime: 122.733 ms
(9 rows)

Although it's probably in the cache, as it took 1073 ms the first time.
The table has half a million rows, but tables all over the database are
being accessed, so the cache is shared between several hundred million
rows.

Postgres executes this query in two stages. First it does a trawl of the
index (on field), and builds an bitmap. Then it fetches the pages
according to the bitmap. I can see the second stage being quite easy to
adapt for fadvise, but the first stage would be a little more tricky. Both
stages are equally important, as they take a comparable amount of time.

We are running this database on a 16-spindle RAID array, so the benefits
to our process of fully utilising them would be quite large. I'm
considering if I can parallelise things a little though.

> Alternately, if you'd like to join in on testing this all out more help would
> certainly be welcome.


How would you like me to help?

Matthew

--
What goes up must come down. Ask any system administrator.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #30 (permalink)  
Old 09-20-2008, 06:18 PM
Tom Lane
 
Posts: n/a
Default Re: RAID arrays and performance

Matthew Wakeling <matthew@flymine.org> writes:
> In order to improve the performance, I made the system look ahead in the
> source, in groups of a thousand entries, so instead of running:
> SELECT * FROM table WHERE field = 'something';
> a thousand times, we now run:
> SELECT * FROM table WHERE field IN ('something', 'something else'...);
> with a thousand things in the IN. Very simple query. It does run faster
> than the individual queries, but it still takes quite a while. Here is an
> example query:


Your example shows the IN-list as being sorted, but I wonder whether you
actually are sorting the items in practice? If not, you might try that
to improve locality of access to the index.

Also, parsing/planning time could be part of your problem here with 1000
things to look at. Can you adjust your client code to use a prepared
query? I'd try
SELECT * FROM table WHERE field = ANY($1::text[])
(or whatever the field datatype actually is) and then push the list
over as a single parameter value using array syntax. You might find
that it scales to much larger IN-lists that way.

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:05 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com