Unix Technical Forum

count * performance issue

This is a discussion on count * performance issue within the Pgsql Performance forums, part of the PostgreSQL category; --> On Thu, Mar 06, 2008 at 07:28:50AM -0800, Craig James wrote: > Count() on Oracle and MySQL is almost ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #31 (permalink)  
Old 04-19-2008, 11:45 AM
Steinar H. Gunderson
 
Posts: n/a
Default Re: count * performance issue

On Thu, Mar 06, 2008 at 07:28:50AM -0800, Craig James wrote:
> Count() on Oracle and MySQL is almost instantaneous, even for very large
> tables. So why can't Postgres do what they do?


In MySQL's case: Handle transactions. (Try COUNT(*) on an InnoDB table.)

/* Steinar */
--
Homepage: http://www.sesse.net/


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_www...ql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #32 (permalink)  
Old 04-19-2008, 11:45 AM
Greg Smith
 
Posts: n/a
Default Re: count * performance issue

On Thu, 6 Mar 2008, Steinar H. Gunderson wrote:

> On Thu, Mar 06, 2008 at 07:28:50AM -0800, Craig James wrote:
>> Count() on Oracle and MySQL is almost instantaneous, even for very large
>> tables. So why can't Postgres do what they do?

>
> In MySQL's case: Handle transactions. (Try COUNT(*) on an InnoDB table.)


Exactly. There is a good discussion of this at
http://www.mysqlperformanceblog.com/...t-vs-countcol/ and I
found the comments from Ken Jacobs were the most informative.

In short, if you want any reasonable database integrity you have to use
InnoDB with MySQL, and once you make that choice it has the same problem.
You only get this accelerated significantly when using MyISAM, which can
tell you an exact count of all the rows it hasn't corrupted yet.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_www...ql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #33 (permalink)  
Old 04-19-2008, 11:45 AM
Bill Moran
 
Posts: n/a
Default Re: count * performance issue

In response to Craig James <craig_james@emolecules.com>:

> In the 3 years I've been using Postgres, the problem of count() performance has come up more times than I can recall, and each time the answer is, "It's a sequential scan -- redesign your application."
>
> My question is: What do the other databases do that Postgres can't do, and why not?
>
> Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do?


I don't know about Oracle, but MySQL has this problem as well. Use
innodb tables and see how slow it is. The only reason myisam tables
don't have this problem is because they don't implement any of the
features that make the problem difficult to solve.

> On the one hand, I understand that Postgres has its architecture, and I understand the issue of row visibility, and so forth. On the other hand, my database is just sitting there, nothing going on, no connections except me, and... it takes FIFTY FIVE SECONDS to count 20 million rows, a query that either Oracle or MySQL would answer in a fraction of a second. It's hard for me to believe there isn't a better way.


There's been discussion about putting visibility information in indexes.
I don't know how far along that effort is, but I expect that will improve
count() performance significantly.

> This is a real problem. Countless people (including me) have spent significant effort rewriting applications because of this performance flaw in Postgres. Over and over, the response is, "You don't really need to do that ... change your application." Well, sure, it's always possible to change the application, but that misses the point. To most of us users, count() seems like it should be a trivial operation. On other relational database systems, it is a trivial operation.
>
> This is really a significant flaw on an otherwise excellent relational database system.


Not really. It really is a design flaw in your application ... it doesn't
make relational sense to use the number of rows in a table for anything.
Just because other people do it frequently doesn't make it right.

That being said, it's still a useful feature, and I don't hear anyone
denying that. As I said, google around a bit WRT to PG storing
visibility information in indexes, as I think that's the way this will
be improved.

> My rant for today...


Feel better now?

--
Bill Moran

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_www...ql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #34 (permalink)  
Old 04-19-2008, 11:45 AM
Tom Lane
 
Posts: n/a
Default Re: count * performance issue

Craig James <craig_james@emolecules.com> writes:
> Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do?


AFAIK the above claim is false for Oracle. They have the same
transactional issues we do.

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_www...ql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #35 (permalink)  
Old 04-19-2008, 11:45 AM
Mark Kirkwood
 
Posts: n/a
Default Re: count * performance issue

Craig James wrote:
>
> My question is: What do the other databases do that Postgres can't do,
> and why not?
>
> Count() on Oracle and MySQL is almost instantaneous, even for very
> large tables. So why can't Postgres do what they do?
>


I think Mysql can only do that for the myisam engine - innodb and
falcon are similar to Postgres.

I don't believe Oracle optimizes bare count(*) on a table either - tho
it may be able to use a suitable index (if present) to get the answer
quicker.

regards

Mark

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #36 (permalink)  
Old 04-19-2008, 11:45 AM
Craig James
 
Posts: n/a
Default Re: count * performance issue

Tom Lane wrote:
> Craig James <craig_james@emolecules.com> writes:
>> Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do?

>
> AFAIK the above claim is false for Oracle. They have the same
> transactional issues we do.


My experience doesn't match this claim. When I ported my application from Oracle to Postgres, this was the single biggest performance problem. count() in Oracle was always very fast. We're not talking about a 20% or 50% difference, we're talking about a small fraction of a second (Oracle) versus a minute (Postgres) -- something like two or three orders of magnitude.

It may be that Oracle has a way to detect when there's no transaction and use a faster method. If so, this was a clever optimization -- in my experience, that represents the vast majority of the times you want to use count(). It's not very useful to count the rows of a table that many apps are actively modifying since the result may change the moment your transaction completes. Most of the time when you use count(), it's because you're the only one modifying the table, so the count will be meaningful.

Craig


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #37 (permalink)  
Old 04-19-2008, 11:45 AM
paul rivers
 
Posts: n/a
Default Re: count * performance issue

Craig James wrote:
> Tom Lane wrote:
>> Craig James <craig_james@emolecules.com> writes:
>>> Count() on Oracle and MySQL is almost instantaneous, even for very
>>> large tables. So why can't Postgres do what they do?

>>
>> AFAIK the above claim is false for Oracle. They have the same
>> transactional issues we do.

>
> My experience doesn't match this claim. When I ported my application
> from Oracle to Postgres, this was the single biggest performance
> problem. count() in Oracle was always very fast. We're not talking
> about a 20% or 50% difference, we're talking about a small fraction of
> a second (Oracle) versus a minute (Postgres) -- something like two or
> three orders of magnitude.
>
> It may be that Oracle has a way to detect when there's no transaction
> and use a faster method. If so, this was a clever optimization -- in
> my experience, that represents the vast majority of the times you want
> to use count(). It's not very useful to count the rows of a table
> that many apps are actively modifying since the result may change the
> moment your transaction completes. Most of the time when you use
> count(), it's because you're the only one modifying the table, so the
> count will be meaningful.
>
> Craig
>
>


Oracle will use a btree index on a not null set of columns to do a fast
full index scan, which can be an order of magnitude or faster compared
to a table scan. Also, Oracle can use a bitmap index (in cases where a
bitmap index isn't otherwise silly) for a bitmap fast index scan/bitmap
conversion for similar dramatic results.

For "large" tables, Oracle is not going to be as fast as MyISAM tables
in MySQL, even with these optimizations, since MyISAM doesn't have to
scan even index pages to get a count(*) answer against the full table.

Paul



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #38 (permalink)  
Old 04-19-2008, 11:45 AM
paul rivers
 
Posts: n/a
Default Re: count * performance issue

Mark Mielke wrote:
> Josh Berkus wrote:
>>>> Count() on Oracle and MySQL is almost instantaneous, even for very
>>>> large tables. So why can't Postgres do what they do?
>>>>
>>> AFAIK the above claim is false for Oracle. They have the same
>>> transactional issues we do.
>>>

>>
>> Nope. Oracle's MVCC is implemented through rollback segments, rather than
>> non-overwriting the way ours is. So Oracle can just do a count(*) on the
>> index, then check the rollback segment for any concurrent
>> update/delete/insert activity and adjust the count. This sucks if there's
>> a *lot* of concurrent activity, but in the usual case it's pretty fast

>
> I read the "almost instantaneous" against "the above claim is false" and
> "Nope.", and I am not sure from the above whether you are saying that
> Oracle keeps an up-to-date count for the index (which might make it
> instantaneous?), or whether you are saying it still has to scan the
> index - which can take time if the index is large (therefore not
> instantaneous).
>
> Cheers,
> mark
>
> --
> Mark Mielke <mark@mielke.cc>
>


Oracle scans the index pages, if the b-tree index is on non-nullable
columns, or if the bitmap index is on low-ish cardinality data.
Otherwise, it table scans. MyISAM in MySQL would be an example where a
counter is kept.





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #39 (permalink)  
Old 04-19-2008, 11:45 AM
Arjen van der Meijden
 
Posts: n/a
Default Re: count * performance issue

On 6-3-2008 16:28 Craig James wrote:
> On the one hand, I understand that Postgres has its architecture, and I
> understand the issue of row visibility, and so forth. On the other
> hand, my database is just sitting there, nothing going on, no
> connections except me, and... it takes FIFTY FIVE SECONDS to count 20
> million rows, a query that either Oracle or MySQL would answer in a
> fraction of a second. It's hard for me to believe there isn't a better
> way.


Can you explain to me how you'd fit this in a fraction of a second?

mysql> select count(*) from messages;
+----------+
| count(*) |
+----------+
| 21908505 |
+----------+
1 row in set (8 min 35.09 sec)

This is a table containing the messages on forumtopics and is therefore
relatively large. The hardware is quite beefy for a forum however (4
3Ghz cores, 16GB, 14+1 disk raid5). This table has about 20GB of data.

If I use a table that contains about the same amount of records as the
above and was before this query probably much less present in the
innodb-buffer (but also less frequently touched by other queries), we
see this:

mysql> select count(*) from messagesraw;
+----------+
| count(*) |
+----------+
| 21962804 |
+----------+
1 row in set (5 min 16.41 sec)

This table is about 12GB.

In both cases MySQL claimed to be 'Using index' with the PRIMARY index,
which for those tables is more or less identical.

Apparently the time is still table-size related, not necessarily
tuple-count related. As this shows:

mysql> select count(*) from articlestats;
+----------+
| count(*) |
+----------+
| 34467246 |
+----------+
1 row in set (54.14 sec)

that table is only 2.4GB, but contains 57% more records, although this
was on another database on a system with somewhat different specs (8
2.6Ghz cores, 16GB, 7+7+1 raid50), used a non-primary index and I have
no idea how well that index was in the system's cache prior to this query.

Repeating it makes it do that query in 6.65 seconds, repeating the
12GB-query doesn't make it any faster.

Anyway, long story short: MySQL's table-count stuff also seems
table-size related. As soon as the index it uses fits in the cache or it
doesn't have to use the primary index, it might be a different story,
but when the table(index) is too large to fit, it is quite slow.
Actually, it doesn't appear to be much faster than Postgresql's (8.2)
table-based counts. If I use a much slower machine (2 2Ghz opterons, 8GB
ddr memory, 5+1 old 15k rpm scsi disks in raid5) with a 1GB, 13M record
table wich is similar to the above articlestats, it is able to return a
count(*) in 3 seconds after priming the cache.

If you saw instantaneous results with MySQL, you have either seen the
query-cache at work or where using myisam. Or perhaps with a fast
system, you had small tuples with a nice index in a nicely primed cache.

Best regards,

Arjen

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #40 (permalink)  
Old 04-19-2008, 11:45 AM
Gregory Stark
 
Posts: n/a
Default Re: count * performance issue

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Well, scanning an index to get a count might be significantly faster
> than scanning the main table, but it's hardly "instantaneous". It's
> still going to take time proportional to the table size.


Hm, Mark's comment about bitmap indexes makes that not entirely true. A bitmap
index can do RLE compression which makes the relationship between the size of
the table and the time taken to scan the index more complex. In the degenerate
case where there are no concurrent updates (assuming you can determine that
quickly) it might actually be constant time.

> Unless they keep a central counter of the number of index entries;
> which would have all the same serialization penalties we've talked
> about before...


Bitmap indexes do in fact have concurrency issues -- arguably they're just a
baroque version of this central counter in this case.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:09 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com