Unix Technical Forum

Performance of count(*)

This is a discussion on Performance of count(*) within the Pgsql Performance forums, part of the PostgreSQL category; --> "Craig A. James" <cjames@modgraph-usa.com> writes: > Tom Lane wrote: >> You mean >> http://archives.postgresql.org/pgsql...0/msg00283.php >> ? I don't see ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #31 (permalink)  
Old 04-19-2008, 10:25 AM
Tom Lane
 
Posts: n/a
Default Re: Performance of count(*)

"Craig A. James" <cjames@modgraph-usa.com> writes:
> Tom Lane wrote:
>> You mean
>> http://archives.postgresql.org/pgsql...0/msg00283.php
>> ? I don't see anything there that bears on Steve's suggestion.


> Mea culpa, it's October 8, not October 18:
> http://archives.postgresql.org/pgsql...0/msg00143.php


I still don't see the relevance to Steve's suggestion.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #32 (permalink)  
Old 04-19-2008, 10:25 AM
Tino Wildenhain
 
Posts: n/a
Default Re: Performance of count(*)

Michael Stone schrieb:
> On Thu, Mar 22, 2007 at 02:24:39PM -0400, Merlin Moncure wrote:
>> Tino was saying that rather that build a complete indexing storage
>> management solution that lives outside the database, it is better to
>> do intelligent session management so that you get the simplicity if a
>> two tier client server system but the scalability of a web app.

>
> No, what he was saying was "there's this thing called a cursor". I
> thought there was enough information in the original message to indicate
> that the author knew about cursors. There are certainly pros and cons
> and nuances to different approaches, but Tino's message didn't touch on
> anything that specific.


Sure, the message thread sometimes loose history so I wasnt 100% sure
what the framework really is - although I assumed it could be a web
solution. With stand alone applications you usually have a limited
number of users connecting and they are connected during the session
so you can easily use cursors there.

> And even if you do use some kind of "intelligent session management",
> how many simultaneous cursors can postgres sanely keep track of?
> Thousands? Millions? Tens of Millions? I suspect there's a scalability
> limit in there somewhere. Luckily I don't spend much time in the web
> application space, so I don't need to know.


Depending on the application, you can even simulate above situation
with a web framework if you manage session in the web framework
with persistent connections for a limited amount of users to work
the same time (certainly not feasable for a public web shop but for
data management systems for inhouse use). In this case, cursors
would be perfect too.

In any other case I fail to see the advantage in storing "index
data" outside the database with all the roundtripping involved.

If the query is complex and rerunning it for every batch is
expensive, fetching the whole result to the application in
case of users really traversing the complete batch
(How often is that really done? I mean, who browses to an
end of a huge result set?) is costy as well w/o really
benefit.

It would be much more easy and clean imho, in this case
to really fetch the data to session and batch linked
scratch table.

If its fast or you can prepare a batch helper table
with index, you can just select the batch equival
portion of the result.

You dont need extensive session management in the
web application to scroll thru result sets in this
way. This can all be encoded in forms or links.

Regards
Tino



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #33 (permalink)  
Old 04-19-2008, 10:26 AM
Tino Wildenhain
 
Posts: n/a
Default Re: Performance of count(*)

Craig A. James schrieb:
> Tino Wildenhain wrote:
>>> You guys can correct me if I'm wrong, but the key feature that's
>>> missing from Postgres's flexible indexing is the ability to maintain
>>> state across queries. Something like this:
>>>
>>> select a, b, my_index_state() from foo where ...
>>> offset 100 limit 10 using my_index(prev_my_index_state);
>>>

>>
>> Yes, you are wrong :-) The technique is called "CURSOR"
>> if you maintain persistent connection per session
>> (e.g. stand allone application or clever pooling webapplication)

>
> That's my whole point: If relational databases had a simple mechanism
> for storing their internal state in an external application, the need
> for cursors, connection pools, and all those other tricks would be
> eliminated.


Well the cursor is exactly the simple handle to the internal
state of the relational db you are looking for.
Do you really think transferring the whole query-tree, open index
and data files to the client over the network would really improve
the situation?

> As I said earlier, relational technology was invented in an earlier era,
> and hasn't caught up with the reality of modern web apps.


There is nothing modern with todays web apps.

>> If its a naive web application you just store your session
>> in tables where you can easily maintain the scroll state
>> as well.

>
> One thing I've learned in 25 years of software development is that
> people who use my software have problems I never imagined. I've been
> the one who was naive when I said similar things about my customers, and
> was later embarrassed to learn that their problems were more complex
> than I ever imagined.


Sure it really depends on the application how the best solution
would look like but I'm quite certain, counterfaiting internal
stuff of the underlying relational database in the application
makes more problems then it solves. If you can't handle SQL,
dont use SQL, you can build web applications w/o any relational
database if you want it.

Regards
Tino Wildenhain

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #34 (permalink)  
Old 04-19-2008, 10:26 AM
Tino Wildenhain
 
Posts: n/a
Default Re: Performance of count(*)

Michael Stone schrieb:
> On Thu, Mar 22, 2007 at 06:27:32PM +0100, Tino Wildenhain wrote:
>> Craig A. James schrieb:
>>> You guys can correct me if I'm wrong, but the key feature that's
>>> missing from Postgres's flexible indexing is the ability to maintain
>>> state across queries. Something like this:
>>>
>>> select a, b, my_index_state() from foo where ...
>>> offset 100 limit 10 using my_index(prev_my_index_state);
>>>

>>
>> Yes, you are wrong :-) The technique is called "CURSOR"
>> if you maintain persistent connection per session
>> (e.g. stand allone application or clever pooling webapplication)

>
> Did you read the email before correcting it? From the part you trimmed out:
>
>> The problem is that relational databases were invented before the web
>> and its stateless applications. In the "good old days", you could
>> connect to a database and work for hours, and in that environment
>> cursors and such work well -- the RDBMS maintains the internal state
>> of the indexing system. But in a web environment, state information
>> is very difficult to maintain. There are all sorts of systems that
>> try (Enterprise Java Beans, for example), but they're very complex.


Yes, but actually this is not true. They are not so complex in this
regard. All you have to do is to look in the pg_cursor view if
your cursor is there and if not, create it in your session.
All you need to maintain is the cursor name which maps to your
session + the special query you run. This should be easy
in any web application.

> It sounds like they wrote their own middleware to handle the problem,
> which is basically what you suggested (a "clever pooling web
> application") after saying "wrong".


I read about "building index data outside postgres" which still is
the wrong approach imho.

This discussion is a bit theoretical until we see the actual problem
and the proposed solution here.

Regards
Tino

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #35 (permalink)  
Old 04-19-2008, 10:26 AM
Michael Stone
 
Posts: n/a
Default Re: Performance of count(*)

On Fri, Mar 23, 2007 at 01:01:02PM +0100, Tino Wildenhain wrote:
>This discussion is a bit theoretical until we see the actual problem
>and the proposed solution here.


It's good to see you back off a bit from your previous stance of
assuming that someone doesn't know what they're doing and that their
solution is absolutely wrong without actually knowing anything about
what they are trying to do.

Mike Stone

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #36 (permalink)  
Old 04-19-2008, 10:26 AM
Tino Wildenhain
 
Posts: n/a
Default Re: Performance of count(*)

Michael Stone schrieb:
> On Fri, Mar 23, 2007 at 01:01:02PM +0100, Tino Wildenhain wrote:
>> This discussion is a bit theoretical until we see the actual problem
>> and the proposed solution here.

>
> It's good to see you back off a bit from your previous stance of
> assuming that someone doesn't know what they're doing and that their
> solution is absolutely wrong without actually knowing anything about
> what they are trying to do.


Well I'm sure its very likely wrong :-) At least the core part of
it with the statement of "keeping index data outside postgres".

What I meant with my comment about the theoreticalness: we cannot
make educated suggestions about alternative solutions to the problem
until we know the problem and maybe the current solution in detail.

Regards
Tino

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:25 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com