Unix Technical Forum

Experiences with extensibility

This is a discussion on Experiences with extensibility within the Pgsql General forums, part of the PostgreSQL category; --> On Wed, Jan 09, 2008 at 12:38:43PM -0700, Guido Neitzer wrote: > >>Easy multi-master clustering with just two machines. ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql General

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #51 (permalink)  
Old 04-10-2008, 12:17 AM
Andrew Sullivan
 
Posts: n/a
Default Re: Experiences with extensibility

On Wed, Jan 09, 2008 at 12:38:43PM -0700, Guido Neitzer wrote:
> >>Easy multi-master clustering with just two machines.


> As I said: FrontBase is offering that.


It looks like a two-phase commit answer, if I'm reading correctly. You can
do this today on many systems (including Postgres), but the classical
problem with 2PC clustering is that it is very slow compared to single-node
systems. Anyway, FrontBase looks interesting. Thanks for the reference.

A


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #52 (permalink)  
Old 04-10-2008, 12:17 AM
Sim Zacks
 
Posts: n/a
Default Re: count(*) and bad design was: Experiences with extensibility

It would be an administrative nightmare unless you had very few where clauses
that you were tracking.

Instead of using a trigger, you could use Listen/Notify to call a daemon on the
server to run the procedure and then you have no insert/delete overhead.
Or you could call the function on a cron job every 10 minutes...

Chris Browne wrote:
> zb@cybertec.at (Zoltan Boszormenyi) writes:
>> which will be fast and depending on the initial value of COUNT(*)
>> it will be very close to the exact figure. You can extend the example
>> with more columns if you know your SELECT COUNT(*) ... WHERE
>> conditions in advance but this way you have to keep several administrative
>> tables for different monitored tables. Again, this trades some disk space
>> and INSERT/DELETE operation speed on the monitored tables for
>> quicker count.

>
> Actually, this approach will be Really Terrible for any cases where
> multiple connections are adding/deleting tuples concurrently, as it
> will force ALL updates to serialize behind the update to the central
> table.
>
> Occasionally, you'll have something even worse, namely a deadlock,
> where two or more of the updates fighting over the single summary
> tuple fall into a bad state, and one of them is forced to give up,
> potentially rolling back its whole transaction.
>
> [Waving hands for a moment]
>
> What I would do *instead* would be for each INSERT to add a tuple with
> a count of 1, and for each DELETE to add a tuple with count of -1, and
> then to periodically have a single process walk through to summarize
> the table. There may be a further optimization to be had by doing a
> per-statement trigger that counts the number of INSERTs/DELETEs done,
> so that inserting 30 tuples (in the table being tracked) leads to
> adding a single tuple with count of 30 in the summary table.
>
> That changes the tradeoffs, again...
>
> - Since each INSERT/DELETE is simply doing an INSERT into the summary
> table, the ongoing activity is *never* blocking anything
>
> - You get the count by requesting
> SELECT SUM(rec_cnt) as record_count from rec_count where tablename = 'foo';
>
> - Note that the query is MVCC-consistent with the table!
>
> - Once in a while, you'll want to run a single procedure that, for
> each table, deletes all the existing records, and replaces them
> with a single one consisting of the sum of the individual values.
>
> - You can re-sync a table by running the query:
> begin;
> delete from record_count where tablename = 'foo';
> insert into record_count(tablename, rec_cnt) select 'foo', (select count(*) from foo);
> commit;

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #53 (permalink)  
Old 04-10-2008, 12:17 AM
Martijn van Oosterhout
 
Posts: n/a
Default Re: count(*) and bad design was: Experiences with extensibility

On Wed, Jan 09, 2008 at 03:28:04PM +0100, Ivan Sergio Borgonovo wrote:
> Let me consider an everyday use where count() looks as the most
> obvious solution: paging.
>
> I search trough a table and I need to know which is the last page.


There's an often overlooked solution to this. Let's say your count
returns 100,000 records, are you going to give them link to 1000
different pages? Not really. Probably about 10, so really your count
only is interested in an exact result less than 100, or that's it's
more than 100.

By placing the where clause in a subselect with a limit of 101 and a
count() around it you have an upper bound on the cost of the count, the
result 101 simply means "more than 100". Depending on the data you
might just put the limit on the query that fetches the data and using
everything after the 10th record to determine your count on the client
side and skip the extra round trip.

> Is there a way to count based on indexes without taking into account
> deleted rows so to "count" faster?


I've also often pondered whether the SQL standard support for table
sampling would be good here. Sure, you still need to check visibility,
but if you specify that the DB only needs to check 10% of the tuples
and to extrapolate the results from that, you could get a fast yet
reasonably accurate result. IIRC patches for this have been floated on
the lists.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
> -- John F Kennedy


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFHhfo2IB7bNG8LQkwRAviuAJ9nbGeFCOdr4m8QGq1AjD 70Y3HE7QCdERPN
g/ssQLq3Dsyqz1Z6vjqaumQ=
=JOy1
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #54 (permalink)  
Old 04-10-2008, 12:17 AM
Harald Fuchs
 
Posts: n/a
Default Re: count(*) and bad design was: Experiences with extensibility

In article <60ejcqy6j0.fsf@dba2.int.libertyrms.com>,
Chris Browne <cbbrowne@acm.org> writes:

> There may be a further optimization to be had by doing a
> per-statement trigger that counts the number of INSERTs/DELETEs done,
> so that inserting 30 tuples (in the table being tracked) leads to
> adding a single tuple with count of 30 in the summary table.


This would be nice, but at least the 8.2.4 docs say

Statement-level triggers do not currently have any way to examine
the individual row(s) modified by the statement.

Is this restriction removed in a later version?


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #55 (permalink)  
Old 04-10-2008, 12:17 AM
Alvaro Herrera
 
Posts: n/a
Default Re: count(*) and bad design was: Experiences withextensibility

Harald Fuchs wrote:
> In article <60ejcqy6j0.fsf@dba2.int.libertyrms.com>,
> Chris Browne <cbbrowne@acm.org> writes:
>
> > There may be a further optimization to be had by doing a
> > per-statement trigger that counts the number of INSERTs/DELETEs done,
> > so that inserting 30 tuples (in the table being tracked) leads to
> > adding a single tuple with count of 30 in the summary table.

>
> This would be nice, but at least the 8.2.4 docs say
>
> Statement-level triggers do not currently have any way to examine
> the individual row(s) modified by the statement.
>
> Is this restriction removed in a later version?


Nope.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #56 (permalink)  
Old 04-10-2008, 12:17 AM
Guido Neitzer
 
Posts: n/a
Default Re: Experiences with extensibility

On 09.01.2008, at 13:51, Martin wrote:

> I've been working with FrontBase a lot lately and I wouldn't say
> anything about it qualifies as "incredibly easy" and reliable it
> is not.


We had never ever any reliability issues with FrontBase as long as
didn't try to insert garbage. It really doesn't like that.

> Performance of FrontBase is just plain terrible. One of
> our reports takes 9 minutes on FrontBase and 10 seconds on
> Postgres.


As I said: depends on what you are doing.

Which version did you use where you got that terrible performance? The
latest one? There was a bug in there query planner in an old version
that totally killed some queries.

There is also the or-query problem (not sure whether that one is still
in there): if your report does something like "select bar from foo
where a = 1 or b = 1;" it didn't use indexes on a or b which is
terrible. If you have that, use separate selects combined with a union
if you can.

In my cases it was never as fast as PostgreSQL as soon as there are
more than two tables involved. But except a couple of cases where I
had to use PostgreSQL for performance reasons, I was okay with it. But
performance is not everything.

PostgreSQL is for most cases the better product, but for some cases,
FrontBase is. Depends on what you're doing how much which side has ...

> Then there's the documentation issue...


PostgreSQL documentation is better, but what is your issue exactly?
The main problem I have with the FrontBase documentation is that it
mostly gives no useful examples.

As this is off-topic here: we can transfer that either to the
FrontBase list or to private mail if you like.

cug

--
http://www.event-s.net


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #57 (permalink)  
Old 04-10-2008, 12:18 AM
Ron Mayer
 
Posts: n/a
Default Re: count(*) and bad design was: Experiences with extensibility

Chris Browne wrote:
> zb@cybertec.at (Zoltan Boszormenyi) writes:
>> SELECT COUNT(*)

> [Waving hands for a moment]


Would what Chris describes below be a good candidate for
a pgfoundry project that has functions that'll create the
triggers for you? (yeah, I might be volunteering, but would
undoubtedly need help)

Then when people ask it again and again the response could
be "install http://pgfoundry.org/fast_count_star" rather
than "go read the mailing list archives and roll
your own - but remember to worry about deadlock and
contention on the table containing the counts".


> What I would do *instead* would be for each INSERT to add a tuple with
> a count of 1, and for each DELETE to add a tuple with count of -1, and
> then to periodically have a single process walk through to summarize
> the table. There may be a further optimization to be had by doing a
> per-statement trigger that counts the number of INSERTs/DELETEs done,
> so that inserting 30 tuples (in the table being tracked) leads to
> adding a single tuple with count of 30 in the summary table.
>
> That changes the tradeoffs, again...
>
> - Since each INSERT/DELETE is simply doing an INSERT into the summary
> table, the ongoing activity is *never* blocking anything
>
> - You get the count by requesting
> SELECT SUM(rec_cnt) as record_count from rec_count where tablename = 'foo';
>
> - Note that the query is MVCC-consistent with the table!
>
> - Once in a while, you'll want to run a single procedure that, for
> each table, deletes all the existing records, and replaces them
> with a single one consisting of the sum of the individual values.
>
> - You can re-sync a table by running the query:
> begin;
> delete from record_count where tablename = 'foo';
> insert into record_count(tablename, rec_cnt) select 'foo', (select count(*) from foo);
> commit;

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #58 (permalink)  
Old 04-10-2008, 12:18 AM
Alban Hertroys
 
Posts: n/a
Default Re: count(*) and bad design was: Experiences with extensibility

On Jan 9, 2008, at 8:07 PM, Scott Marlowe wrote:
> I could see a use for an approximate count(*) with where clause, just
> like I could see a use for the ability to retrieve random rows from a
> table without using order by random() on it. And those are both
> things that would require some form of hacking in the db that I'm
> certainly not capable of pulling off...


About returning random rows... I've successfully applied a scrolling
cursor for that.

You need to scroll to the last row to find the size of the result
set, but after that it's pretty easy to return random rows by
scrolling to them (and marking them 'read' in some way to prevent
accidentally returning the same row again).

It does require some specific application code though - doing it
server side would mean to pass the query as a function argument
(which still requires unnatural SQL statements in your application
code) or write a function for each query (*cough*).

Performance was quite adequate (a few 100 ms) for a query returning
random 5 rows from 3 joined tables or more, some of which had a few
100k rows. Calculating random() for each record in the result set (to
sort on) was taking much longer. That was on a dual 64-bit opteron
with 4GB RAM, iirc.

Of course a built-in statement would be preferable, I just felt like
pointing out that order by random() isn't necessarily the best
alternative

Regards,
Alban Hertroys.

!DSPAM:737,478cb43e9496078213597!



---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #59 (permalink)  
Old 04-10-2008, 12:18 AM
Ivan Sergio Borgonovo
 
Posts: n/a
Default Re: count(*) and bad design was: Experiences withextensibility

On Tue, 15 Jan 2008 14:43:35 +0100
Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:

> You need to scroll to the last row to find the size of the result
> set, but after that it's pretty easy to return random rows by
> scrolling to them (and marking them 'read' in some way to prevent
> accidentally returning the same row again).


Could you post a snippet of code or something giving a more detailed
idea of it?

BTW since cursors support offset if you're not interested if the
order of the retrieved rows is random too you don't even have to
remember which one you read I think.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #60 (permalink)  
Old 04-10-2008, 12:18 AM
Alban Hertroys
 
Posts: n/a
Default Re: count(*) and bad design was: Experiences with extensibility

On Jan 15, 2008, at 3:03 PM, Ivan Sergio Borgonovo wrote:

> On Tue, 15 Jan 2008 14:43:35 +0100
> Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:
>
>> You need to scroll to the last row to find the size of the result
>> set, but after that it's pretty easy to return random rows by
>> scrolling to them (and marking them 'read' in some way to prevent
>> accidentally returning the same row again).

>
> Could you post a snippet of code or something giving a more detailed
> idea of it?
>
> BTW since cursors support offset if you're not interested if the
> order of the retrieved rows is random too you don't even have to
> remember which one you read I think.


I posted it on this list a while ago when I came up with this
solution. I had some trouble finding my old post in the pgsql-general
archives though - I could find the thread, just not my final posting,
and searching didn't even turn up the thread.

I did find it here: http://www.mail-archive.com/pgsql-
general@postgresql.org/msg103670.html
The thread contains several other approaches to the problem, it
really depends on your problem domain which one fits your bill.

I think the function in my original posting could do with clearer
comments though, so here's the function again:

/*
* Return $limit random rows from the result set of SQL query $query
*/
function randomSet(
$query, // The query to execute
$limit // The (max) number of random rows required
) {
// SQL to declare the cursor
query("DECLARE _cur SCROLL CURSOR WITHOUT HOLD FOR $query");

/* Get the range for random(1, n)
*
* Determined by scrolling the cursor to the last row.
* Equivalent to select count(*), but without a separate query.
*/
query("MOVE FORWARD ALL IN _cur");
$count = pg_affected_rows();

$uniques = array(); // A list of used cursor offsets
$resultSet = array();

// Fetch random rows until we have enough or there are no more
while ($limit > 0 && count($uniques) < $count) {
// Determine random scroll offset
$idx = random(1, $count);

// Skip records with an index we already used
if (in_array($idx, $uniques))
continue;

//Fetch the random row
$record = query("FETCH ABSOLUTE $idx FROM _cur");

// Add the row offset to the list of used offsets
$uniques[] = $idx;

$resultSet[] = $record;
$limit--;
}

// query
query("CLOSE _cur");

return $resultSet;
}



Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,478f32e59497683469944!



---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:49 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com