Unix Technical Forum

RAID controllers for Postgresql on large setups

This is a discussion on RAID controllers for Postgresql on large setups within the Pgsql Performance forums, part of the PostgreSQL category; --> On May 12, 2008, at 11:24 PM, Francisco Reyes wrote: > Any PCI controller you have had good experience ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #11 (permalink)  
Old 05-13-2008, 07:16 PM
Vivek Khera
 
Posts: n/a
Default Re: RAID controllers for Postgresql on large setups


On May 12, 2008, at 11:24 PM, Francisco Reyes wrote:

> Any PCI controller you have had good experience with?
> How any other PCI-X/PCI-e controller that you have had good results?


The LSI controllers are top-notch, and always my first choice. They
have PCI-X and PCI-e versions.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #12 (permalink)  
Old 05-16-2008, 02:43 PM
James Mansion
 
Posts: n/a
Default Re: RAID controllers for Postgresql on large setups

PFC wrote:
> PCI limits you to 133 MB/s (theoretical), actual speed being
> around 100-110 MB/s.

Many servers do have more than one bus. You have to process that data
too so its not going to be as much of a limit as you are suggesting. It
may be possible to stream a compressed data file to the server and copy
in from that after decompression, which will free LAN bandwidth. Or
even if you RPC blocks of compressed data and decompress in the proc and
insert right there.

> On your current setup with 15K drives if you need 1 fsync per
> INSERT you won't do more than 250 per second, which is very limiting...

Well, thats 250 physical syncs. But if you have multiple insert streams
(for group commit), or can batch the rows in each insert or copy, its
not necessarily as much of a problem as you seem to be implying.
Particularly if you are doing the holding table trick.

James


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #13 (permalink)  
Old 05-16-2008, 02:44 PM
PFC
 
Posts: n/a
Default Re: RAID controllers for Postgresql on large setups

>
>> You say that like you don't mind having PCI in a server whose job is
>> to perform massive query over large data sets.

>
> I am in my 4th week at a new job. Trying to figure what I am working
> with.


LOOL, ok, hehe, not exactly the time to have a "let's change everything"
fit

> From what I see I will likely get as much improvement from new hardware
> as from re-doing some of the database design. Can't get everything done
> at once, not to mention I have to redo one machine sooner rather than
> later so I need to prioritize.
>
>> In fact for bulk IO a box with 2 SATA drives would be just as fast as
>> your monster RAID, lol.

>
> I am working on setting up a standard test based on the type of
> operations that the company does. This will give me a beter idea.
> Specially I will work with the developers to make sure the queries I
> create for the benchmark are representative of the workload.


watching vmstat (or iostat) while running a very big seq scan query will
give you information about the reading speed of your drives.
Same for writes, during one of your big updates, watch vmstat, you'll
know if you are CPU bound or IO bound...

- one core at 100% -> CPU bound
- lots of free CPU but lots of iowait -> disk bound
- disk throughput decent (in your setup, 100 MB/s) -> PCI bus saturation
- disk throughput miserable (< 10 MB/s) -> random IO bound (either random
reads or fsync() or random writes depending on the case)

> In your opinion if we get a new machine with PCI-e, at how many spindles
> will the SCSI random access superiority start to be less notable?
> Specially given the low number of connections we usually have running
> against these machines.


Sorting of random reads depends on multiple concurrent requests (which
you don't have). Sorting of random writes does not depend on concurrent
requests so, you'll benefit on your updates. About SCSI vs SATA vs number
of spindles : can't answer this one.

> We are using one single SQL statement.


OK, so forget about fsync penalty, but do tune your checkpoints so they
are not happening all the time... and bgwriter etc.




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #14 (permalink)  
Old 05-29-2008, 10:44 AM
Marinos Yannikos
 
Posts: n/a
Default Re: RAID controllers for Postgresql on large setups

PFC schrieb:
> PCI limits you to 133 MB/s (theoretical), actual speed being around
> 100-110 MB/s.


"Current" PCI 2.1+ implementations allow 533MB/s (32bit) to 1066MB/s
(64bit) since 6-7 years ago or so.

> For instance here I have a box with PCI, Giga Ethernet and a
> software RAID5 ; reading from the RAID5 goes to about 110 MB/s (actual
> disk bandwidth is closer to 250 but it's wasted) ; however when using
> the giga ethernet to copy a large file over a LAN, disk and ethernet
> have to share the PCI bus, so throughput falls to 50 MB/s. Crummy, eh ?


Sounds like a slow Giga Ethernet NIC...

> Let me repeat this : at the current state of SATA drives, just TWO
> of them is enough to saturate a PCI bus. I'm speaking desktop SATA
> drives, not high-end SCSI ! (which is not necessarily faster for pure
> throughput anyway).
> Adding more drives will help random reads/writes but do nothing for
> throughput since the tiny PCI pipe is choking.


In my experience, SATA drives are very slow for typical database work
(which is heavy on random writes). They often have very slow access
times, bad or missing NCQ implementation (controllers / SANs as well)
and while I am not very familiar with the protocol differences, they
seem to add a hell of a lot more latency than even old U320 SCSI drives.

Sequential transfer performance is a nice indicator, but not very
useful, since most serious RAID arrays will have bottlenecks other than
the theoretical cumulated transfer rate of all the drives (from
controller cache speed to SCSI bus to fibre channel). Thus, lower
sequential transfer rate and lower access times scale much better.

>> Whether a SAN or just an external enclosure is 12disk enough to
>> substain 5K inserts/updates per second on rows in the 30 to 90bytes
>> territory? At 5K/second inserting/updating 100 Million records would
>> take 5.5 hours. That is fairly reasonable if we can achieve. Faster
>> would be better, but it depends on what it would cost to achieve.


5K/s inserts (with no indexes) are easy with PostgreSQL and typical
(current) hardware. We are copying about 175K rows/s with our current
server (Quad core Xeon 2.93GHz, lots of RAM, meagre performance SATA SAN
with RAID-5 but 2GB writeback cache). Rows are around 570b each on
average. Performance is CPU-bound with a typical number of indexes on
the table and much lower than 175K/s though, for single row updates we
get about 9K/s per thread (=5.6MB/s) and that's 100% CPU-bound on the
server - if we had to max this out, we'd thus use several clients in
parallel and/or collect inserts in text files and make bulk updates
using COPY. The slow SAN isn't a problem now.

Our SATA SAN suffers greatly when reads are interspersed with writes,
for that you want more spindles and faster disks.

To the OP I have 1 hearty recommendation: if you are using the
RAID-functionality of the 2120, get rid of it. If you can wipe the
disks, try using Linux software-RAID (yes, it's an admin's nightmare
etc. but should give much better performance even though the 2120's
plain SCSI won't be hot either) and then start tuning your PostgreSQL
installation (there's much to gain here). Your setup looks decent
otherwise for what you are trying to do (but you need a fast CPU) and
your cheapest upgrade path would be a decent RAID controller or at least
a decent non-RAID SCSI controller for software-RAID (at least 2 ports
for 12 disks), although the plain PCI market is dead.

-mjy

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:26 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com