This is a discussion on TB-sized databases within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi all, I have a user who is looking to store 500+ GB of data in a database (and ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi all, I have a user who is looking to store 500+ GB of data in a database (and when all the indexes and metadata are factored in, it's going to be more like 3-4 TB). He is wondering how well PostgreSQL scales with TB-sized databases and what can be done to help optimize them (mostly hardware and config parameters, maybe a little advocacy). I can't speak on that since I don't have any DBs approaching that size. The other part of this puzzle is that he's torn between MS SQL Server (running on Windows and unsupported by us) and PostgreSQL (running on Linux...which we would fully support). If any of you have ideas of how well PostgreSQL compares to SQL Server, especially in TB-sized databases, that would be much appreciated. We're running PG 8.2.5, by the way. Peter ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Peter Koczan wrote: > Hi all, > > I have a user who is looking to store 500+ GB of data in a database > (and when all the indexes and metadata are factored in, it's going to > be more like 3-4 TB). He is wondering how well PostgreSQL scales with > TB-sized databases and what can be done to help optimize them (mostly > hardware and config parameters, maybe a little advocacy). I can't > speak on that since I don't have any DBs approaching that size. > > The other part of this puzzle is that he's torn between MS SQL Server > (running on Windows and unsupported by us) and PostgreSQL (running on > Linux...which we would fully support). If any of you have ideas of how > well PostgreSQL compares to SQL Server, especially in TB-sized > databases, that would be much appreciated. > > We're running PG 8.2.5, by the way. Well I can't speak to MS SQL-Server because all of our clients run PostgreSQL 1.5TB range. All perform admirably as long as you have the hardware behind it and are doing correct table structuring (such as table partitioning). Sincerely, Joshua D. Drake > > Peter > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org > ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| We have several TB database in production and it works well on HP rx1620 dual Itanium2, MSA 20, running Linux. It's read-only storage for astronomical catalogs with about 4-billions objects. We have custom index for spherical coordinates which provide great performance. Oleg On Mon, 26 Nov 2007, Peter Koczan wrote: > Hi all, > > I have a user who is looking to store 500+ GB of data in a database > (and when all the indexes and metadata are factored in, it's going to > be more like 3-4 TB). He is wondering how well PostgreSQL scales with > TB-sized databases and what can be done to help optimize them (mostly > hardware and config parameters, maybe a little advocacy). I can't > speak on that since I don't have any DBs approaching that size. > > The other part of this puzzle is that he's torn between MS SQL Server > (running on Windows and unsupported by us) and PostgreSQL (running on > Linux...which we would fully support). If any of you have ideas of how > well PostgreSQL compares to SQL Server, especially in TB-sized > databases, that would be much appreciated. > > We're running PG 8.2.5, by the way. > > Peter > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org > Regards, Oleg __________________________________________________ ___________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83 ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| I had a client that tried to use Ms Sql Server to run a 500Gb+ database. The database simply colapsed. They switched to Teradata and it is running good. This database has now 1.5Tb+. Currently I have clients using postgresql huge databases and they are happy. In one client's database the biggest table has 237Gb+ (only 1 table!) and postgresql run the database without problem using partitioning, triggers and rules (using postgresql 8.2.5). Pablo Peter Koczan wrote: > Hi all, > > I have a user who is looking to store 500+ GB of data in a database > (and when all the indexes and metadata are factored in, it's going to > be more like 3-4 TB). He is wondering how well PostgreSQL scales with > TB-sized databases and what can be done to help optimize them (mostly > hardware and config parameters, maybe a little advocacy). I can't > speak on that since I don't have any DBs approaching that size. > > The other part of this puzzle is that he's torn between MS SQL Server > (running on Windows and unsupported by us) and PostgreSQL (running on > Linux...which we would fully support). If any of you have ideas of how > well PostgreSQL compares to SQL Server, especially in TB-sized > databases, that would be much appreciated. > > We're running PG 8.2.5, by the way. > > Peter > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org > > ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| I think either would work; both PostgreSQL and MS SQL Server have success stories out there running VLDBs. It really depends on what you know and what you have. If you have a lot of experience with Postgres running on Linux, and not much with SQL Server on Windows, of course the former would be a better choice for you. You stand a much better chance working with tools you know. Pablo Alcaraz wrote: > I had a client that tried to use Ms Sql Server to run a 500Gb+ database. > The database simply colapsed. They switched to Teradata and it is > running good. This database has now 1.5Tb+. > > Currently I have clients using postgresql huge databases and they are > happy. In one client's database the biggest table has 237Gb+ (only 1 > table!) and postgresql run the database without problem using > partitioning, triggers and rules (using postgresql 8.2.5). > > Pablo > > Peter Koczan wrote: >> Hi all, >> >> I have a user who is looking to store 500+ GB of data in a database >> (and when all the indexes and metadata are factored in, it's going to >> be more like 3-4 TB). He is wondering how well PostgreSQL scales with >> TB-sized databases and what can be done to help optimize them (mostly >> hardware and config parameters, maybe a little advocacy). I can't >> speak on that since I don't have any DBs approaching that size. >> >> The other part of this puzzle is that he's torn between MS SQL Server >> (running on Windows and unsupported by us) and PostgreSQL (running on >> Linux...which we would fully support). If any of you have ideas of how >> well PostgreSQL compares to SQL Server, especially in TB-sized >> databases, that would be much appreciated. >> >> We're running PG 8.2.5, by the way. >> >> Peter >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 4: Have you searched our list archives? >> >> http://archives.postgresql.org >> >> > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Thanks all. This is just what I needed. On Nov 26, 2007 1:16 PM, Stephen Cook <sclists@gmail.com> wrote: > I think either would work; both PostgreSQL and MS SQL Server have > success stories out there running VLDBs. It really depends on what you > know and what you have. If you have a lot of experience with Postgres > running on Linux, and not much with SQL Server on Windows, of course the > former would be a better choice for you. You stand a much better chance > working with tools you know. > > > > Pablo Alcaraz wrote: > > I had a client that tried to use Ms Sql Server to run a 500Gb+ database. > > The database simply colapsed. They switched to Teradata and it is > > running good. This database has now 1.5Tb+. > > > > Currently I have clients using postgresql huge databases and they are > > happy. In one client's database the biggest table has 237Gb+ (only 1 > > table!) and postgresql run the database without problem using > > partitioning, triggers and rules (using postgresql 8.2.5). > > > > Pablo > > > > Peter Koczan wrote: > >> Hi all, > >> > >> I have a user who is looking to store 500+ GB of data in a database > >> (and when all the indexes and metadata are factored in, it's going to > >> be more like 3-4 TB). He is wondering how well PostgreSQL scales with > >> TB-sized databases and what can be done to help optimize them (mostly > >> hardware and config parameters, maybe a little advocacy). I can't > >> speak on that since I don't have any DBs approaching that size. > >> > >> The other part of this puzzle is that he's torn between MS SQL Server > >> (running on Windows and unsupported by us) and PostgreSQL (running on > >> Linux...which we would fully support). If any of you have ideas of how > >> well PostgreSQL compares to SQL Server, especially in TB-sized > >> databases, that would be much appreciated. > >> > >> We're running PG 8.2.5, by the way. > >> > >> Peter > >> > >> ---------------------------(end of broadcast)--------------------------- > >> TIP 4: Have you searched our list archives? > >> > >> http://archives.postgresql.org > >> > >> > > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 5: don't forget to increase your free space map settings > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| On Tue, 2007-11-27 at 14:18 -0600, Peter Koczan wrote: > Thanks all. This is just what I needed. All of those responses have cooked up quite a few topics into one. Large databases might mean text warehouses, XML message stores, relational archives and fact-based business data warehouses. The main thing is that TB-sized databases are performance critical. So it all depends upon your workload really as to how well PostgreSQL, or another other RDBMS vendor can handle them. Anyway, my reason for replying to this thread is that I'm planning changes for PostgreSQL 8.4+ that will make allow us to get bigger and faster databases. If anybody has specific concerns then I'd like to hear them so I can consider those things in the planning stages. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Simon Riggs wrote: > All of those responses have cooked up quite a few topics into one. Large > databases might mean text warehouses, XML message stores, relational > archives and fact-based business data warehouses. > > The main thing is that TB-sized databases are performance critical. So > it all depends upon your workload really as to how well PostgreSQL, or > another other RDBMS vendor can handle them. > > > Anyway, my reason for replying to this thread is that I'm planning > changes for PostgreSQL 8.4+ that will make allow us to get bigger and > faster databases. If anybody has specific concerns then I'd like to hear > them so I can consider those things in the planning stages it would be nice to do something with selects so we can recover a rowset on huge tables using a criteria with indexes without fall running a full scan. In my opinion, by definition, a huge database sooner or later will have tables far bigger than RAM available (same for their indexes). I think the queries need to be solved using indexes enough smart to be fast on disk. Pablo ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| On Tue, 27 Nov 2007, Pablo Alcaraz wrote: > it would be nice to do something with selects so we can recover a rowset > on huge tables using a criteria with indexes without fall running a full > scan. You mean: Be able to tell Postgres "Don't ever do a sequential scan of this table. It's silly. I would rather the query failed than have to wait for a sequential scan of the entire table." Yes, that would be really useful, if you have huge tables in your database. Matthew -- Trying to write a program that can't be written is... well, it can be an enormous amount of fun! -- Computer Science Lecturer ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| ||||
| Pablo Alcaraz escribió: > In my opinion there are queries that I think they ll need to be tuned for > "huge databases" (huge databases = a database which relevant > tables(indexes) are (will be) far bigger that all the ram available): > > -- example table > CREATE TABLE homes ( > id bigserial, > name text, > location text, > bigint money_win, > int zipcode; > ); > CREATE INDEX money_win_idx ON homes(money_win); > CREATE INDEX zipcode_idx ON homes(zipcode); Your example does not work, so I created my own for your first item. alvherre=# create table test (a int primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test" CREATE TABLE alvherre=# insert into test select * from generate_series(1, 100000); INSERT 0 100000 alvherre=# analyze test; ANALYZE > SELECT max( id) from homes; > I think the information to get the max row quickly could be found using the > pk index. Idem min( id). alvherre=# explain analyze select max(a) from test; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Result (cost=0.03..0.04 rows=1 width=0) (actual time=0.054..0.057 rows=1 loops=1) InitPlan -> Limit (cost=0.00..0.03 rows=1 width=4) (actual time=0.041..0.043 rows=1 loops=1) -> Index Scan Backward using test_pkey on test (cost=0.00..3148.26 rows=100000 width=4) (actual time=0.034..0.034 rows=1 loops=1) Filter: (a IS NOT NULL) Total runtime: 0.143 ms (6 rows) > SELECT max( id) from homes WHERE id > 8000000000; > Same, but useful to find out the same thing in partitioned tables (using id > like partition criteria). It would be nice if Postgres would not need the > WHERE clause to realize it does not need to scan every single partition, > but only the last. Idem min(id). Yeah, this could be improved. > SELECT * from homes WHERE money_win = 1300000000; > Postgres thinks too easily to solve these kind of queries that it must to > do a sequential scan where the table (or the index) does not fix in memory > if the number of rows is not near 1 (example: if the query returns 5000 > rows). Same case with filters like 'WHERE money_win >= xx', 'WHERE > money_win BETWEEN xx AND yy'. But I do not know if this behavior is because > I did a wrong posgresql's configuration or I missed something. There are thresholds to switch from index scan to seqscans. It depends on the selectivity of the clauses. > SELECT count( *) from homes; > it would be *cute* that Postgres stores this value and only recalculate if > it thinks the stored value is wrong (example: after an anormal shutdown). This is not as easy as you put it for reasons that have been discussed at length. I'll only say that there are workarounds to make counting quick. > SELECT zipcode, count( zipcode) as n from homes GROUP BY zipcode; > it would be *very cute* that Postgres could store this value (or is this > there?) on the index or wherever and it only recalculates if it thinks the > stored value is wrong (example: after an anormal shutdown). Same as above. > Last but not least, it would be *excelent* that this kind of optimization > would be posible without weird non standard sql sentences. Right. If you can afford to sponsor development, it could make them a reality sooner. -- Alvaro Herrera Valdivia, Chile ICBM: S 39º 49' 18.1", W 73º 13' 56.4" "You're _really_ hosed if the person doing the hiring doesn't understand relational systems: you end up with a whole raft of programmers, none of whom has had a Date with the clue stick." (Andrew Sullivan) ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |