Unix Technical Forum

Propagating outer join conditions

This is a discussion on Propagating outer join conditions within the Pgsql Performance forums, part of the PostgreSQL category; --> The following left outer join plan puzzles me: EXPLAIN ANALYZE SELECT * from t28 LEFT OUTER JOIN (t1 JOIN ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 09:48 AM
Aaron Birkland
 
Posts: n/a
Default Propagating outer join conditions

The following left outer join plan puzzles me:

EXPLAIN ANALYZE SELECT * from t28 LEFT OUTER JOIN (t1 JOIN t11 ON
(t11.o = '<http://example.org>' AND t11.s = t1.o)) ON t28.s = t1.s
WHERE t28.o = '"spec"';

t28, t1, and t11 all have indexed columns named 's' and 'o' that contain 'text';

Nested Loop Left Join (cost=794249.26..3289704.61 rows=1 width=301)
(actual time=581293.390..581293.492 rows=1 loops=1)
Join Filter: (t28.s = t1.s)
-> Index Scan using t28_o on t28 (cost=0.00..9.22 rows=1
width=89) (actual time=0.073..0.077 rows=1 loops=1)
Index Cond: (o = '"spec"'::text)
-> Merge Join (cost=794249.26..3267020.66 rows=1813979 width=212)
(actual time=230365.522..577078.266 rows=1894969 loops=1)
Merge Cond: (t1.o = t11.s)
-> Index Scan using t1_o on t1 (cost=0.00..2390242.10
rows=22225696 width=109) (actual time=0.209..162586.801 rows=22223925
loops=1)
-> Sort (cost=794249.26..798784.21 rows=1813979 width=103)
(actual time=230365.175..237409.474 rows=1894969 loops=1)
Sort Key: t11.s
-> Bitmap Heap Scan on t11 (cost=78450.82..605679.55
rows=1813979 width=103) (actual time=3252.103..22782.271 rows=1894969
loops=1)
Recheck Cond: (o = '<http://example.org>'::text)
-> Bitmap Index Scan on t11_o
(cost=0.00..78450.82 rows=1813979 width=0) (actual
time=2445.422..2445.422 rows=1894969 loops=1)
Index Cond: (o = '<http://example.org>'::text)


It seems to me that this plan is not very desirable, since the outer
part of the nested loop left join (the merge join node) is very
expensive. Is is possible to generate a plan that looks like this:

Nested Loop Left Join (cost=???)
-> Index Scan using t28_o on t28 (cost=0.00..9.11 rows=1 width=89)
Index Cond: (o = '"spec"'::text)
-> Nested Loop (cost=???)
-> Index Scan using t1_s on t1 (cost=???)
Index Cond: (s = t28.s)
-> Bitmap Heap Scan on t11 (cost=???)
Recheck Cond: (t11.s = t1.o)
Filter: (o = '<http://example.org>'::text)
-> Bitmap Index Scan on t11_s (cost=??? )
Index Cond: (t11.s = t1.o)

I *think* this plan is equivalent to the above if I'm assuming the
behaviour of the 'nested loop left join' node correctly. So far, I
have been tweaking the statistics, cost estimates, and
enabling.disabling certain plans to see if I can get it to propagate
the join condition t1.s = t28.s to the outer node of the left join..
but so far, I cannot. So, my questions are:

1) Is my 'desired' query plan logic correct
2) Can the executor execute a plan such as my 'desired' plan
3) If (1) and (2) are 'yes', then how may I get the planner to
generate such a plan, or do I just need to look harder into tweaking
the statistics and cost estimates

-Aaron

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 09:49 AM
Jonathan Blitz
 
Posts: n/a
Default Re: Propagating outer join conditions

How about trying:

Select *
From
(Select * from t28 where t28.0='spec') t28a
Left out join (t1 JOIN t11 ON
> (t11.o = '<http://example.org>' AND t11.s = t1.o)) ON t28a.s = t1.s


In this way, I think, the where clause on t28 would be performed before the
join rather than after.

Jonathan Blitz


> -----Original Message-----
> From: Aaron Birkland [mailto:birkie@gmail.com]
> Sent: Sunday, December 03, 2006 5:12 PM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Propagating outer join conditions
>
> The following left outer join plan puzzles me:
>
> EXPLAIN ANALYZE SELECT * from t28 LEFT OUTER JOIN (t1 JOIN t11 ON
> (t11.o = '<http://example.org>' AND t11.s = t1.o)) ON t28.s = t1.s
> WHERE t28.o = '"spec"';
>
> t28, t1, and t11 all have indexed columns named 's' and 'o' that contain

'text';
>
> Nested Loop Left Join (cost=794249.26..3289704.61 rows=1 width=301)
> (actual time=581293.390..581293.492 rows=1 loops=1)
> Join Filter: (t28.s = t1.s)
> -> Index Scan using t28_o on t28 (cost=0.00..9.22 rows=1
> width=89) (actual time=0.073..0.077 rows=1 loops=1)
> Index Cond: (o = '"spec"'::text)
> -> Merge Join (cost=794249.26..3267020.66 rows=1813979 width=212)
> (actual time=230365.522..577078.266 rows=1894969 loops=1)
> Merge Cond: (t1.o = t11.s)
> -> Index Scan using t1_o on t1 (cost=0.00..2390242.10
> rows=22225696 width=109) (actual time=0.209..162586.801 rows=22223925
> loops=1)
> -> Sort (cost=794249.26..798784.21 rows=1813979 width=103)
> (actual time=230365.175..237409.474 rows=1894969 loops=1)
> Sort Key: t11.s
> -> Bitmap Heap Scan on t11 (cost=78450.82..605679.55
> rows=1813979 width=103) (actual time=3252.103..22782.271 rows=1894969
> loops=1)
> Recheck Cond: (o = '<http://example.org>'::text)
> -> Bitmap Index Scan on t11_o
> (cost=0.00..78450.82 rows=1813979 width=0) (actual
> time=2445.422..2445.422 rows=1894969 loops=1)
> Index Cond: (o = '<http://example.org>'::text)
>
>
> It seems to me that this plan is not very desirable, since the outer
> part of the nested loop left join (the merge join node) is very
> expensive. Is is possible to generate a plan that looks like this:
>
> Nested Loop Left Join (cost=???)
> -> Index Scan using t28_o on t28 (cost=0.00..9.11 rows=1 width=89)
> Index Cond: (o = '"spec"'::text)
> -> Nested Loop (cost=???)
> -> Index Scan using t1_s on t1 (cost=???)
> Index Cond: (s = t28.s)
> -> Bitmap Heap Scan on t11 (cost=???)
> Recheck Cond: (t11.s = t1.o)
> Filter: (o = '<http://example.org>'::text)
> -> Bitmap Index Scan on t11_s (cost=??? )
> Index Cond: (t11.s = t1.o)
>
> I *think* this plan is equivalent to the above if I'm assuming the
> behaviour of the 'nested loop left join' node correctly. So far, I
> have been tweaking the statistics, cost estimates, and
> enabling.disabling certain plans to see if I can get it to propagate
> the join condition t1.s = t28.s to the outer node of the left join..
> but so far, I cannot. So, my questions are:
>
> 1) Is my 'desired' query plan logic correct
> 2) Can the executor execute a plan such as my 'desired' plan
> 3) If (1) and (2) are 'yes', then how may I get the planner to
> generate such a plan, or do I just need to look harder into tweaking
> the statistics and cost estimates
>
> -Aaron
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>
>
> --
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.1.409 / Virus Database: 268.15.6/565 - Release Date: 12/2/2006
>
>
> --
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.1.409 / Virus Database: 268.15.6/565 - Release Date: 12/02/2006
>


--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.409 / Virus Database: 268.15.6/565 - Release Date: 12/02/2006



---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 09:49 AM
Aaron Birkland
 
Posts: n/a
Default Re: Propagating outer join conditions

First, I forgot to mention - this is 8.2 RC1 I was trying on

The suggested change produces an identical 'bad' query plan. The main
issue (I think) is that the query node that processes "t1 JOIN t11 ON
...' is not aware of the join condition 't28.s = t1.s'.. even though
the value of t28.s (as determined by the inner index scan where t28.o
= 'spec') could(should?) theoretically be known to it. If it did, then
I imagine it would realize that a nested loop join starting with t1.s
= t28.s (which is very selective) would be much cheaper than doing the
big merge join.

-Aaron

On 12/3/06, Jonathan Blitz <jb@anykey.co.il> wrote:
> How about trying:
>
> Select *
> From
> (Select * from t28 where t28.0='spec') t28a
> Left out join (t1 JOIN t11 ON
> > (t11.o = '<http://example.org>' AND t11.s = t1.o)) ON t28a.s = t1.s

>
> In this way, I think, the where clause on t28 would be performed before the
> join rather than after.


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 09:49 AM
Alexandru Coseru
 
Posts: n/a
Default Hardware advice

Hello..

I'm waiting for my new system , and meanwhile , i have some questions.
First , here are the specs:


The server will have kernel 2.1.19 and it will be use only as a postgresql
server (nothing else... no named,dhcp,web,mail , etc).
Postgresql version will be 8.2.
It will be heavily used on inserts , triggers on each insert/update and
occasionally selects.

System: SuperMicro 7045B-3
CPU: 1 Dual Core Woodcrest ,2.66 Ghz , 4Mb cache , FSB 1333Mhz
RAM: 4 Gb (4 x 1 Gb modules) at 667Mhz
RAID CTRL: LSI MegaRAID SAS 8408E
DISKS: 8 x SATA II 7200 rpm , NCQ , 8 Mb cache with Supermicro 8
Sas enclosure

Based on the needs , i'm planning an update of the drives to 15.000 rpms
SAS. (pretty expensive now)


Question 1:
The RAID layout should be:
a) 2 hdd in raid 1 for system and pg_xlog and 6 hdd in raid10
for data ?
b) 8 hdd in raid10 for all ?
c) 2 hdd in raid1 for system , 2 hdd in raid1 for pg_xlog , 4
hdd in raid10 for data ?
Obs: I'm going for setup a) , but i want to hear your thoughts as well.


Question 2: (Don't want to start a flame here..... but here is goes)
What filesystem should i run for data ? ext3 or xfs ?
The tables have ~ 15.000 rel_pages each. The biggest table has now
over 30.000 pages.

Question 3:
The block size in postgresql is 8kb. The strip size in the raid
ctrl is 64k.
Should i increase the pgsql block size to 16 or 32 or even 64k ?



As soon as the system will be delivered , i'm planning some benchmarks.

Regards
Alex


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 09:49 AM
Tom Lane
 
Posts: n/a
Default Re: Propagating outer join conditions

"Aaron Birkland" <birkie@gmail.com> writes:
> ... Is is possible to generate a plan that looks like this:


> Nested Loop Left Join (cost=???)
> -> Index Scan using t28_o on t28 (cost=0.00..9.11 rows=1 width=89)
> Index Cond: (o = '"spec"'::text)
> -> Nested Loop (cost=???)
> -> Index Scan using t1_s on t1 (cost=???)
> Index Cond: (s = t28.s)
> -> Bitmap Heap Scan on t11 (cost=???)
> Recheck Cond: (t11.s = t1.o)
> Filter: (o = '<http://example.org>'::text)
> -> Bitmap Index Scan on t11_s (cost=??? )
> Index Cond: (t11.s = t1.o)


No. It'd take massive work in both the planner and executor to support
something like that --- they are designed around the assumption that
nestloop with inner index scan is exactly that, just one indexscan on
the inside of the loop. (As of 8.2 there's some klugery to handle an
Append of indexscans too, but it won't generalize readily.)

It might be something to work on for the future, but I can see a couple
of risks to trying to support this type of plan:
* exponential growth in the number of plans the planner has to consider;
* greatly increased runtime penalty if the planner has underestimated
the number of rows produced by the outer side of the nestloop.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 09:49 AM
Josh Berkus
 
Posts: n/a
Default Re: Hardware advice

Alexandru,

> The server will have kernel 2.1.19 *and it will be use only as a postgresql


Assuming you're talking Linux, I think you mean 2.6.19?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-19-2008, 09:49 AM
Alexandru Coseru
 
Posts: n/a
Default Re: Hardware advice

Hello..

Yes , sorry for the mistype..

Regards
Alex
----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: <pgsql-performance@postgresql.org>
Cc: "Alexandru Coseru" <alexandru.coseru@totaltelecom.ro>
Sent: Sunday, December 03, 2006 10:11 PM
Subject: Re: [PERFORM] Hardware advice


Alexandru,

> The server will have kernel 2.1.19 and it will be use only as a postgresql


Assuming you're talking Linux, I think you mean 2.6.19?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco



--
No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.1.409 / Virus Database: 268.15.4/563 - Release Date: 12/2/2006



---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-19-2008, 09:49 AM
Sven Geisler
 
Posts: n/a
Default Re: Hardware advice

Hi Alexandru,

Alexandru Coseru schrieb:
> [...]
> Question 1:
> The RAID layout should be:
> a) 2 hdd in raid 1 for system and pg_xlog and 6 hdd in
> raid10 for data ?
> b) 8 hdd in raid10 for all ?
> c) 2 hdd in raid1 for system , 2 hdd in raid1 for pg_xlog ,
> 4 hdd in raid10 for data ?
> Obs: I'm going for setup a) , but i want to hear your thoughts as well.


This depends on you data size. I think, option a and c are good.
The potential bottleneck may the RAID 1 for pg_xlog if you have huge
amount of updates and insert.
What is about another setup

4 hdd in RAID 10 for System and pg_xlog - System partitions are normally
not in heavy use and pg_xlog should be fast for writing.
4 hdd in RAID 10 for data.

>
>
> Question 2: (Don't want to start a flame here..... but here is goes)
> What filesystem should i run for data ? ext3 or xfs ?
> The tables have ~ 15.000 rel_pages each. The biggest table has
> now over 30.000 pages.


We have a database running with 60,000+ tables. The tables size is
between a few kByte for the small tables and up to 30 GB for the largest
one. We had no issue with ext3 in the past.

>
> Question 3:
> The block size in postgresql is 8kb. The strip size in the
> raid ctrl is 64k.
> Should i increase the pgsql block size to 16 or 32 or even 64k ?


You should keep in mind that the file system has also a block size. Ext3
has as maximum 4k.
I would set up the partitions aligned to the stripe size to prevent
unaligned reads. I guess, you can imagine that a larger block size of
postgresql may also end up in unaligned reads because the file system
has a smaller block size.

RAID Volume and File system set up
1. Make all partitions aligned to the RAID strip size.
The first partition should be start at 128 kByte.
You can do this with fdisk. after you created the partition switch
to the expert mode (type x) and modify the begin of the partition
(type b). You should change this value to 128 (default is 63).
All other partition should also start on a multiple of 128 kByte.

2. Give the file system a hint that you work with larger block sizes.
Ext3: mke2fs -b 4096 -j -R stride=2 /dev/sda1 -L LABEL
I made a I/O test with PostgreSQL on a RAID system with stripe size
of 64kByte and block size of 8 kByte in the RAID system.
Stride=2 was the best value.


PS: You should have a second XEON in your budget plan.

Sven.


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-19-2008, 09:50 AM
Alexandru Coseru
 
Posts: n/a
Default Re: Hardware advice

Hello..

Thanks for the advices..

Actually , i'm waiting for the clovertown to show up on the market...

Regards
Alex

----- Original Message -----
From: "Sven Geisler" <sgeisler@aeccom.com>
To: "Alexandru Coseru" <alexandru.coseru@totaltelecom.ro>
Cc: <pgsql-performance@postgresql.org>
Sent: Tuesday, December 05, 2006 11:57 AM
Subject: Re: [PERFORM] Hardware advice


> Hi Alexandru,
>
> Alexandru Coseru schrieb:
>> [...]
>> Question 1:
>> The RAID layout should be:
>> a) 2 hdd in raid 1 for system and pg_xlog and 6 hdd in
>> raid10 for data ?
>> b) 8 hdd in raid10 for all ?
>> c) 2 hdd in raid1 for system , 2 hdd in raid1 for pg_xlog ,
>> 4 hdd in raid10 for data ?
>> Obs: I'm going for setup a) , but i want to hear your thoughts as
>> well.

>
> This depends on you data size. I think, option a and c are good.
> The potential bottleneck may the RAID 1 for pg_xlog if you have huge
> amount of updates and insert.
> What is about another setup
>
> 4 hdd in RAID 10 for System and pg_xlog - System partitions are normally
> not in heavy use and pg_xlog should be fast for writing.
> 4 hdd in RAID 10 for data.
>
>>
>>
>> Question 2: (Don't want to start a flame here..... but here is goes)
>> What filesystem should i run for data ? ext3 or xfs ?
>> The tables have ~ 15.000 rel_pages each. The biggest table has
>> now over 30.000 pages.

>
> We have a database running with 60,000+ tables. The tables size is
> between a few kByte for the small tables and up to 30 GB for the largest
> one. We had no issue with ext3 in the past.
>
>>
>> Question 3:
>> The block size in postgresql is 8kb. The strip size in the
>> raid ctrl is 64k.
>> Should i increase the pgsql block size to 16 or 32 or even 64k ?

>
> You should keep in mind that the file system has also a block size. Ext3
> has as maximum 4k.
> I would set up the partitions aligned to the stripe size to prevent
> unaligned reads. I guess, you can imagine that a larger block size of
> postgresql may also end up in unaligned reads because the file system
> has a smaller block size.
>
> RAID Volume and File system set up
> 1. Make all partitions aligned to the RAID strip size.
> The first partition should be start at 128 kByte.
> You can do this with fdisk. after you created the partition switch
> to the expert mode (type x) and modify the begin of the partition
> (type b). You should change this value to 128 (default is 63).
> All other partition should also start on a multiple of 128 kByte.
>
> 2. Give the file system a hint that you work with larger block sizes.
> Ext3: mke2fs -b 4096 -j -R stride=2 /dev/sda1 -L LABEL
> I made a I/O test with PostgreSQL on a RAID system with stripe size
> of 64kByte and block size of 8 kByte in the RAID system.
> Stride=2 was the best value.
>
>
> PS: You should have a second XEON in your budget plan.
>
> Sven.
>
>
>
>
> --
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.1.409 / Virus Database: 268.15.7/569 - Release Date: 12/5/2006
>
>



---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-19-2008, 09:50 AM
Alex Turner
 
Posts: n/a
Default Re: Hardware advice

The test that I did - which was somewhat limited, showed no benefit
splitting disks into seperate partitions for large bulk loads.

The program read from one very large file and wrote the input out to two
other large files.

The totaly throughput on a single partition was close to the maximum
theoretical for that logical drive, even though the process was reading and
writing to three seperate places on the disk. I don't know what this means
for postgresql setups directly, but I would postulate that the benefit from
splitting pg_xlog onto a seperate spindle is not as great as it might once
have been for large bulk transactions. I am therefore going to be going to
a single 6 drive RAID 5 for my data wharehouse application because I want
the read speed to be availalbe. I can benefit from fast reads when I want
to do large data scans at the expense of slightly slower insert speed.

Alex.

On 12/5/06, Alexandru Coseru <alexandru.coseru@totaltelecom.ro> wrote:
>
> Hello..
>
> Thanks for the advices..
>
> Actually , i'm waiting for the clovertown to show up on the market...
>
> Regards
> Alex
>
> ----- Original Message -----
> From: "Sven Geisler" <sgeisler@aeccom.com>
> To: "Alexandru Coseru" <alexandru.coseru@totaltelecom.ro>
> Cc: <pgsql-performance@postgresql.org>
> Sent: Tuesday, December 05, 2006 11:57 AM
> Subject: Re: [PERFORM] Hardware advice
>
>
> > Hi Alexandru,
> >
> > Alexandru Coseru schrieb:
> >> [...]
> >> Question 1:
> >> The RAID layout should be:
> >> a) 2 hdd in raid 1 for system and pg_xlog and 6 hdd in
> >> raid10 for data ?
> >> b) 8 hdd in raid10 for all ?
> >> c) 2 hdd in raid1 for system , 2 hdd in raid1 for pg_xlog

> ,
> >> 4 hdd in raid10 for data ?
> >> Obs: I'm going for setup a) , but i want to hear your thoughts as
> >> well.

> >
> > This depends on you data size. I think, option a and c are good.
> > The potential bottleneck may the RAID 1 for pg_xlog if you have huge
> > amount of updates and insert.
> > What is about another setup
> >
> > 4 hdd in RAID 10 for System and pg_xlog - System partitions are normally
> > not in heavy use and pg_xlog should be fast for writing.
> > 4 hdd in RAID 10 for data.
> >
> >>
> >>
> >> Question 2: (Don't want to start a flame here..... but here is goes)
> >> What filesystem should i run for data ? ext3 or xfs ?
> >> The tables have ~ 15.000 rel_pages each. The biggest table has
> >> now over 30.000 pages.

> >
> > We have a database running with 60,000+ tables. The tables size is
> > between a few kByte for the small tables and up to 30 GB for the largest
> > one. We had no issue with ext3 in the past.
> >
> >>
> >> Question 3:
> >> The block size in postgresql is 8kb. The strip size in the
> >> raid ctrl is 64k.
> >> Should i increase the pgsql block size to 16 or 32 or even 64k

> ?
> >
> > You should keep in mind that the file system has also a block size. Ext3
> > has as maximum 4k.
> > I would set up the partitions aligned to the stripe size to prevent
> > unaligned reads. I guess, you can imagine that a larger block size of
> > postgresql may also end up in unaligned reads because the file system
> > has a smaller block size.
> >
> > RAID Volume and File system set up
> > 1. Make all partitions aligned to the RAID strip size.
> > The first partition should be start at 128 kByte.
> > You can do this with fdisk. after you created the partition switch
> > to the expert mode (type x) and modify the begin of the partition
> > (type b). You should change this value to 128 (default is 63).
> > All other partition should also start on a multiple of 128 kByte.
> >
> > 2. Give the file system a hint that you work with larger block sizes.
> > Ext3: mke2fs -b 4096 -j -R stride=2 /dev/sda1 -L LABEL
> > I made a I/O test with PostgreSQL on a RAID system with stripe size
> > of 64kByte and block size of 8 kByte in the RAID system.
> > Stride=2 was the best value.
> >
> >
> > PS: You should have a second XEON in your budget plan.
> >
> > Sven.
> >
> >
> >
> >
> > --
> > No virus found in this incoming message.
> > Checked by AVG Free Edition.
> > Version: 7.1.409 / Virus Database: 268.15.7/569 - Release Date:

> 12/5/2006
> >
> >

>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:34 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com