This is a discussion on New to PostgreSQL, performance considerations within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi yall, although I've worked with databases for more than 7 years now, I'm petty new to PostgreSQL. I ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi yall, although I've worked with databases for more than 7 years now, I'm petty new to PostgreSQL. I have an application using SQLite3 as an embedded SQL solution because it's simple and it can handle the load that *most* of my clients have. Because of that '*most*' part, because of the client/server way and because of the license, I'm think about start using PostgreSQL. My app uses only three tables: one has low read and really high write rates, a second has high read and low write and the third one is equally high on both. I need a db that can handle something like 500 operations/sec continuously. It's something like 250 writes/sec and 250 reads/sec. My databases uses indexes. Each table would have to handle 5 million rows/day. So I'm thinking about creating different tables (clusters?) to different days to make queries return faster. Am I right or there is no problem in having a 150 million (one month) rows on a table? All my data is e-mail traffic: user's quarentine, inbond traffic, outbond traffic, sender, recipients, subjects, attachments, etc... What do you people say, is it possible with PostgreSQL? What kind of hardware would I need to handle that kind of traffic? On a first test, at a badly tunned AMD Athlon XP 1800+ (ergh!) I could do 1400 writes/sec locally after I disabled fsync. We have UPSs, in the last year we only had 1 power failure. Thank you all for your tips. Best regards, Daniel Colchete ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Daniel van Ham Colchete wrote: > although I've worked with databases for more than 7 years now, I'm > petty new to PostgreSQL. Same here. > I need a db that can handle something like 500 operations/sec > continuously. It's something like 250 writes/sec and 250 reads/sec. My > databases uses indexes. Taken from an email to the admin list about a week ago - Stats about the system: Postgres 8.1.4 db size: 200+ GB Inheritance is used extremely heavily, so in figuring out what could cause a create to hang, it may be of interest to know that there are: 101,745 tables 314,821 indexes 1,569 views The last averages taken on the number of writes per hour on this database: ~3 million (this stat is a few weeks old) Machine info: OS: Solaris 10 Sunfire X4100 XL 2x AMD Opteron Model 275 dual core procs 8GB of ram > Each table would have to handle 5 million rows/day. So I'm thinking > about creating different tables (clusters?) to different days to make > queries return faster. Am I right or there is no problem in having a > 150 million (one month) rows on a table? Sounds to me that a month might be on the large size for real fast response times - I would think of seperating weekly rather than daily. Start with http://www.postgresql.org/docs/8.2/i...l-inherit.html then the next chapter explains using that to partition data into different tables dependant on specified criteria. You may be interested in tsearch2 which is in the contrib dir and adds full text indexing. -- Shane Ambler pgSQL@007Marketing.com Get Sheeky @ http://Sheeky.Biz ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Hi Gene, at my postgresql.conf, the only non-comented lines are: fsync = off lc_messages = 'C' lc_monetary = 'C' lc_numeric = 'C' lc_time = 'C' max_connections = 100 shared_buffers = 5000 temp_buffers = 1000 work_mem = 4096 The only two values I changed are shared_buffers and work_mem. *** BUT *** I'm using Gentoo Linux, so all my libraries (including glibc that is very important to PostgreSQL), and all my softwares are compiled with good CFLAG options to my processor ("-O2 march=athlon-xp (...)"). My Linux is not an Intel-AMD binary compatible turtle like Fedora/RedHat/SUSE/... It's really important to have your GLIBC compiled for your processor. It is essencial for performance. I can't imagine anyone buying a $1k-dollar quad-core XEON and using an i585 compatible distro that doesn't even know what the fudge is SSE/SSE2/vectorized instructions. Best regards, Daniel Colchete On 12/10/06, Gene <genekhart@gmail.com> wrote: > I have a similar type application, I'm partitioning using constraint > exclusion so queries only have to look at a few tables. I've found that > there is some overhead to using partitioning so you should test to see how > many partitions you want to create. Could I check out you postgresql.conf > parameters to compare? thanks > > > Gene Hart ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Shane, really good answer man!!! It really helped me! I can't wait to have time to test this partitions thing. This will really solve a lot of my problems (and another one that I was delaying). Instead of making partitions over days/weeks I can tune it to the seconds (timestamp) and try to find the best partition size for performance. That partition thing is really good. That server you sent me handles about 15 times more writes than I would have to. Thank you very much for your answer. Best regards, Daniel Colchete On 12/10/06, Shane Ambler <pgsql@007marketing.com> wrote: > Daniel van Ham Colchete wrote: > > although I've worked with databases for more than 7 years now, I'm > > petty new to PostgreSQL. > > Same here. > > > I need a db that can handle something like 500 operations/sec > > continuously. It's something like 250 writes/sec and 250 reads/sec. My > > databases uses indexes. > > Taken from an email to the admin list about a week ago - > > Stats about the system: > Postgres 8.1.4 > db size: 200+ GB > Inheritance is used extremely heavily, so in figuring out what could > cause a create to hang, it may be of interest to know that there are: > 101,745 tables > 314,821 indexes > 1,569 views > The last averages taken on the number of writes per hour on this > database: ~3 million (this stat is a few weeks old) > > Machine info: > OS: Solaris 10 > Sunfire X4100 XL > 2x AMD Opteron Model 275 dual core procs > 8GB of ram > > > > Each table would have to handle 5 million rows/day. So I'm thinking > > about creating different tables (clusters?) to different days to make > > queries return faster. Am I right or there is no problem in having a > > 150 million (one month) rows on a table? > > Sounds to me that a month might be on the large size for real fast > response times - I would think of seperating weekly rather than daily. > > Start with > http://www.postgresql.org/docs/8.2/i...l-inherit.html > then the next chapter explains using that to partition data into > different tables dependant on specified criteria. > > You may be interested in tsearch2 which is in the contrib dir and adds > full text indexing. > > > -- > > Shane Ambler > pgSQL@007Marketing.com > > Get Sheeky @ http://Sheeky.Biz > ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Gene, at Postgres's docs they say that the "constraint checks are relatively expensive". From what you're saying, it's really worth studying the matter deply first. I never understood what's the matter between the ASCII/ISO-8859-1/UTF8 charsets to a database. They're all simple C strings that doesn't have the zero-byte in the midlle (like UTF16 would) and that doesn't require any different processing unless you are doing case insensitive search (them you would have a problem). ASCII chars are also correct UTF8 chars as well. The first 127 Unicode chars are the same as the ASCII chars. So you would not have any problems changing your table from ASCII to UTF8. My software uses UTF16 and UTF8 at some of it's internals and I only notice performance problems with UTF16 (because of the zero-byte thing, the processing I make is diferent). So, I imagine that you wouldn't have any performance issues changing from ASCII to UTF8 if necessary. Nowadays everything is turning to Unicode (thank god). I wouldn't start anything with any other charset. I would only be asking for a rewrite in a near future. Best, Daniel On 12/10/06, Gene <genekhart@gmail.com> wrote: > I'm using gentoo as well, I'm having performance issues as the number of > partitions is increasing I imagine do due to overhead managing them and > figuring out where to put each insert/update. I'm switching to weekly > partitions instead of daily. I believe in PG8.2 constraint exclusion works > with updates/deletes also so I'm eager to upgrade. I get about 1 million > records per day in two tables each, each record updated about 4 times within > 30 minutes. > > Do you think using UTF8 vs US-ASCII hurts performance signficantly, some of > my smaller tables require unicode, and I don't think you can have some > tables be unicode and some be ASCII. ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| I'm using gentoo as well, I'm having performance issues as the number of partitions is increasing I imagine do due to overhead managing them and figuring out where to put each insert/update. I'm switching to weekly partitions instead of daily. I believe in PG8.2 constraint exclusion works with updates/deletes also so I'm eager to upgrade. I get about 1 million records per day in two tables each, each record updated about 4 times within 30 minutes. Do you think using UTF8 vs US-ASCII hurts performance signficantly, some of my smaller tables require unicode, and I don't think you can have some tables be unicode and some be ASCII. On 12/10/06, Daniel van Ham Colchete <daniel.colchete@gmail.com> wrote: > > Hi Gene, > > at my postgresql.conf, the only non-comented lines are: > fsync = off > lc_messages = 'C' > lc_monetary = 'C' > lc_numeric = 'C' > lc_time = 'C' > max_connections = 100 > shared_buffers = 5000 > temp_buffers = 1000 > work_mem = 4096 > > The only two values I changed are shared_buffers and work_mem. > > *** BUT *** > I'm using Gentoo Linux, so all my libraries (including glibc that is > very important to PostgreSQL), and all my softwares are compiled with > good CFLAG options to my processor ("-O2 march=athlon-xp (...)"). My > Linux is not an Intel-AMD binary compatible turtle like > Fedora/RedHat/SUSE/... It's really important to have your GLIBC > compiled for your processor. It is essencial for performance. > > I can't imagine anyone buying a $1k-dollar quad-core XEON and using an > i585 compatible distro that doesn't even know what the fudge is > SSE/SSE2/vectorized instructions. > > Best regards, > Daniel Colchete > > On 12/10/06, Gene <genekhart@gmail.com> wrote: > > I have a similar type application, I'm partitioning using constraint > > exclusion so queries only have to look at a few tables. I've found that > > there is some overhead to using partitioning so you should test to see > how > > many partitions you want to create. Could I check out you > postgresql.conf > > parameters to compare? thanks > > > > > > Gene Hart > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > -- Gene Hart cell: 443-604-2679 |
| |||
| On Dec 11, 2006, at 10:47 , Daniel van Ham Colchete wrote: > I never understood what's the matter between the ASCII/ISO-8859-1/UTF8 > charsets to a database. If what you mean by ASCII is SQL_ASCII, then there is at least one significant difference between UTF8 (the PostgreSQL encoding) and SQL_ASCII. AIUI, SQL_ASCII does no checking at all with respect to what bytes are going in, while UTF8 does make sure to the best of its ability that the bytes represent valid UTF-8 characters, throwing an error if an invalid byte sequence is detected. There's more information regarding this here: http://www.postgresql.org/docs/8.2/interactive/ multibyte.html#MULTIBYTE-CHARSET-SUPPORTED Michael Glaesemann grzm seespotcode net ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Sun, Dec 10, 2006 at 11:02:44PM -0200, Daniel van Ham Colchete wrote: >I'm using Gentoo Linux, so all my libraries (including glibc that is >very important to PostgreSQL), and all my softwares are compiled with >good CFLAG options to my processor ("-O2 march=athlon-xp (...)"). My >Linux is not an Intel-AMD binary compatible turtle like >Fedora/RedHat/SUSE/... It's really important to have your GLIBC >compiled for your processor. It is essencial for performance. Please, point to the benchmarks that demonstrate this for a postgres application. Mike Stone ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Hi Daniel On 10-Dec-06, at 8:02 PM, Daniel van Ham Colchete wrote: > Hi Gene, > > at my postgresql.conf, the only non-comented lines are: > fsync = off This can, and will result in lost data. > lc_messages = 'C' > lc_monetary = 'C' > lc_numeric = 'C' > lc_time = 'C' How much memory does this machine have and what version of postgresql are you using? > max_connections = 100 > shared_buffers = 5000 > temp_buffers = 1000 > work_mem = 4096 > > The only two values I changed are shared_buffers and work_mem. Dave > > *** BUT *** > I'm using Gentoo Linux, so all my libraries (including glibc that is > very important to PostgreSQL), and all my softwares are compiled with > good CFLAG options to my processor ("-O2 march=athlon-xp (...)"). My > Linux is not an Intel-AMD binary compatible turtle like > Fedora/RedHat/SUSE/... It's really important to have your GLIBC > compiled for your processor. It is essencial for performance. > > I can't imagine anyone buying a $1k-dollar quad-core XEON and using an > i585 compatible distro that doesn't even know what the fudge is > SSE/SSE2/vectorized instructions. > > Best regards, > Daniel Colchete > > On 12/10/06, Gene <genekhart@gmail.com> wrote: >> I have a similar type application, I'm partitioning using constraint >> exclusion so queries only have to look at a few tables. I've found >> that >> there is some overhead to using partitioning so you should test to >> see how >> many partitions you want to create. Could I check out you >> postgresql.conf >> parameters to compare? thanks >> >> >> Gene Hart > > ---------------------------(end of > broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| ||||
| On Dec 11, 2006, at 02:47 , Daniel van Ham Colchete wrote: > I never understood what's the matter between the ASCII/ISO-8859-1/UTF8 > charsets to a database. They're all simple C strings that doesn't have > the zero-byte in the midlle (like UTF16 would) and that doesn't > require any different processing unless you are doing case insensitive > search (them you would have a problem). That's not the whole story. UTF-8 and other variable-width encodings don't provide a 1:1 mapping of logical characters to single bytes; in particular, combination characters opens the possibility of multiple different byte sequences mapping to the same code point; therefore, string comparison in such encodings generally cannot be done at the byte level (unless, of course, you first acertain that the strings involved are all normalized to an unambiguous subset of your encoding). PostgreSQL's use of strings is not limited to string comparison. Substring extraction, concatenation, regular expression matching, up/ downcasing, tokenization and so on are all part of PostgreSQL's small library of text manipulation functions, and all deal with logical characters, meaning they must be Unicode-aware. Alexander. ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |