Unix Technical Forum

Thousands of tables versus on table?

This is a discussion on Thousands of tables versus on table? within the Pgsql Performance forums, part of the PostgreSQL category; --> On 6/6/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote: > Well, you will always have to deal with the sort of people ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #31 (permalink)  
Old 04-19-2008, 10:56 AM
Jonah H. Harris
 
Posts: n/a
Default Re: Thousands of tables versus on table?

On 6/6/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
> Well, you will always have to deal with the sort of people who will
> base their technical prescriptions on the shiny ads they read in
> SuperGlobalNetworkedExecutiveGoFast, or whatever rag they're reading
> these days.


Always.

> I usually encourage such people actually to perform the
> analysis of the license, salary, contingency, and migrations costs


Yes, this is the best way.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 3rd Floor | jharris@enterprisedb.com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #32 (permalink)  
Old 04-19-2008, 10:56 AM
Andrew Sullivan
 
Posts: n/a
Default control of benchmarks (was: Thousands of tables)

On Wed, Jun 06, 2007 at 02:01:59PM -0400, Jonah H. Harris wrote:
> They did this for the same reason as everyone else. They don't want
> non-experts tuning the database incorrectly, writing a benchmark paper
> about it, and making the software look bad.


I agree that Oracle is a fine system, and I have my doubts about the
likelihood Oracle will fall over under fairly heavy loads. But I
think the above is giving Oracle Corp a little too much credit.

Corporations exist to make money, and the reason they prohibit doing
anything with their software and then publishing it without their
approval is because they want to control all the public perception of
their software, whether deserved or not. Every user of any large
software system (Oracle or otherwise) has their favourite horror
story about the grotty corners of that software;
commercially-licensed people just aren't allowed to prove it in
public. It's not only the clueless Oracle is protecting themselves
against; it's also the smart, accurate, but expensive corner-case
testers. I get to complain that PostgreSQL is mostly fast but has
terrible outlier performance problems. I can think of another system
that I've used that certainly had a similar issue, but I couldn't
show you the data to prove it. Everyone who used it knew about it,
though.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
--Alexander Hamilton

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #33 (permalink)  
Old 04-19-2008, 10:56 AM
Jonah H. Harris
 
Posts: n/a
Default Re: control of benchmarks (was: Thousands of tables)

On 6/6/07, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
> But I think the above is giving Oracle Corp a little too
> much credit.


Perhaps. However, Oracle has a thousand or so knobs which can control
almost every aspect of every subsystem. If you know how they interact
with each other and how to use them properly, they can make a huge
difference in performance. Most people do not know all the knobs or
understand what difference each can make given the theory and
architecture of the system, which results in poor general
configurations. Arguably, there is a cost associated with having
someone staffed and/or consulted that has the depth of knowledge
required to tune it in such a manner which goes back to a basic
cost/benefit analysis.

Oracle, while seeming like a one-size-fits-all system, has the same
basic issue as PostgreSQL and everyone else; to get optimum
performance, it has to be tuned specifically for the
application/workload at hand.

> Corporations exist to make money, and the reason they prohibit doing
> anything with their software and then publishing it without their
> approval is because they want to control all the public perception of
> their software, whether deserved or not.


Of course. Which is why audited benchmarks like SPEC and TPC are
around. While they may not represent one's particular workload, they
are the only way to fairly demonstrate comparable performance.

> Every user of any large software system (Oracle or otherwise)
> has their favourite horror story about the grotty corners of
> that software;


Of course, but they also never say why it was caused. With Oracle,
almost all bad-performance cases I've seen are related to improper
tuning and/or hardware; even by experienced DBAs.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 3rd Floor | jharris@enterprisedb.com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #34 (permalink)  
Old 04-19-2008, 10:56 AM
Craig James
 
Posts: n/a
Default Re: Thousands of tables versus on table?

Jonah H. Harris wrote:
> On 6/6/07, Craig James <craig_james@emolecules.com> wrote:
>> They're blowing smoke if they think Oracle can do this.

>
> Oracle could handle this fine.
>
>> Oracle fell over dead, even with the best indexing possible,
>> tuned by the experts, and using partitions keyed to the
>> customerID.

>
> I don't think so, whoever tuned this likely didn't know what they were
> doing.


Wrong on both counts.

You didn't read my message. I said that *BOTH* Oracle and Postgres performed well with table-per-customer. I wasn't Oracle bashing. In fact, I was doing the opposite: Someone's coworker claimed ORACLE was the miracle cure for all problems, and I was simply pointing out that there are no miracle cures. (I prefer Postgres for many reasons, but Oracle is a fine RDBMS that I have used extensively.)

The technical question is simple: Table-per-customer or big-table-for-everyone. The answer is, "it depends." It depends on your application, your read-versus-write ratio, the table size, the design of your application software, and a dozen other factors. There is no simple answer, but there are important technical insights which, I'm happy to report, various people contributed to this discussion. Perhaps you have some technical insight too, because it really is an important question.

The reason I assert (and stand by this) that "They're blowing smoke" when they claim Oracle has the magic cure, is because Oracle and Postgres are both relational databases, they write their data to disks, and they both have indexes with O(log(N)) retrieval/update times. Oracle doesn't have a magical workaround to these facts, nor does Postgres.

Craig

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #35 (permalink)  
Old 04-19-2008, 10:56 AM
Jonah H. Harris
 
Posts: n/a
Default Re: Thousands of tables versus on table?

On 6/6/07, Craig James <craig_james@emolecules.com> wrote:
> You didn't read my message. I said that *BOTH* Oracle
> and Postgres performed well with table-per-customer.


Yes, I did. My belief is that Oracle can handle all customers in a
single table.

> The technical question is simple: Table-per-customer or
> big-table-for-everyone. The answer is, "it depends."


I agree, it does depend on the data, workload, etc. No
one-size-fits-all answer there.

> The reason I assert (and stand by this) that "They're
> blowing smoke" when they claim Oracle has the magic
> cure, is because Oracle and Postgres are both relational
> databases, they write their data to disks, and they both
> have indexes with O(log(N)) retrieval/update times. Oracle
> doesn't have a magical workaround to these facts,
> nor does Postgres.


Agreed that they are similar on the basics, but they do use
significantly different algorithms and optimizations. Likewise, there
is more tuning that can be done with Oracle given the amount of time
and money one has to spend on it. Again, cost/benefit analysis on
this type of an issue... but you're right, there is no "magic cure".

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 3rd Floor | jharris@enterprisedb.com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #36 (permalink)  
Old 04-19-2008, 10:57 AM
david@lang.hm
 
Posts: n/a
Default Re: Thousands of tables versus on table?

On Wed, 6 Jun 2007, Scott Marlowe wrote:

>> > pretty doggone cheap relative to the table searches, which means it
>> > almost certainly will lose badly if you carry the subdivision out to
>> > the extent that the individual tables become small. (This last could

>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> what is considered 'small'? a few thousand records, a few million records?

>
> I would say small is when the individual tables are in the 10 to 20 Megabyte
> range. How many records that is depends on record width, of course.
> Basically, once the tables get small enough that you don't really need
> indexes much, since you tend to grab 25% or more of each one that you're
> going to hit in a query.


thanks, that makes a lot of sense

>> what multiplication factor would there need to be on the partitioning to
>> make it worth while? 100 tables, 1000 tables, 10000 tables?

> Really depends on the size of the master table I think. If the master table
> is about 500 Megs in size, and you partition it down to about 1 meg per child
> table, you're probably ok. Walking through 500 entries for constraint
> exclusion seems pretty speedy from the tests I've run on a 12M row table that
> was about 250 Megs, split into 200 to 400 or so equisized child tables. The
> time to retrieve 85,000 rows that were all neighbors went from 2 to 6
> seconds, to about 0.2 seconds, and we got rid of indexes entirely since they
> weren't really needed anymore.


remember, I'm talking about a case wher eyou don't have to go through
contraint checking. you know to start with what customerID you are dealing
with so you just check the tables for that customer

>> the company that I'm at started out with a seperate database per customer
>> (not useing postgres), there are basicly zero cross-customer queries, with
>> a large volume of updates and lookups.
>>
>> overall things have now grown to millions of updates/day (some multiple of
>> this in lookups), and ~2000 customers, with tens of millions of rows
>> between them.
>>
>> having each one as a seperate database has really helped us over the years
>> as it's made it easy to scale (run 500 databases on each server instead of
>> 1000, performance just doubled)

> I think that for what you're doing, partitioning at the database level is
> probably a pretty good compromise solution. Like you say, it's easy to put
> busy databases on a new server to balance out the load. Hardware is cheap.
>
>> various people (not database experts) are pushing to install Oracle
>> cluster so that they can move all of these to one table with a customerID
>> column.

> Have these people identified a particular problem they're trying to solve, or
> is this a religious issue for them? From your description it sounds like a
> matter of dogma, not problem solving.


in part it is, in part it's becouse the commercial database companies have
told management that doing database replication is impossible with so many
databases (we first heard this back when we had 300 or so databases),
we've gone the expensive EMC disk-layer replication route, but they think
that mergeing everything will simplify things somehow so the database can
do it's job better.

I see it as just a limitation on the replication solution offered by the
bigname vendors.

>> the database folks won't comment much on this either way, but they don't
>> seem enthusiastic to combine all the data togeather.

> I think they can see the fecal matter heading towards the rotational cooling
> device on this one. I can't imagine this being a win from the perspective of
> saving the company money.


neither do I.

David Lang

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 04:54 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com