Unix Technical Forum

Re: Oracle Analytical Functions

This is a discussion on Re: Oracle Analytical Functions within the Pgsql General forums, part of the PostgreSQL category; --> Hi Willem, Il giorno 30/gen/08, alle ore 22:15, Willem Buitendyk ha scritto: > I'm trying to replicate the use ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql General

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-10-2008, 12:19 AM
Enrico Sirola
 
Posts: n/a
Default Re: Oracle Analytical Functions

Hi Willem,

Il giorno 30/gen/08, alle ore 22:15, Willem Buitendyk ha scritto:

> I'm trying to replicate the use of Oracle's 'lag' and 'over
> partition by' analytical functions in my query. I have a table
> (all_client_times) such as:
>
> client_id, datetime
> 122, 2007-05-01 12:00:00
> 122, 2007-05-01 12:01:00
> 455, 2007-05-01 12:02:00
> 455, 2007-05-01 12:03:00
> 455, 2007-05-01 12:08:00
> 299, 2007-05-01 12:10:00
> 299, 2007-05-01 12:34:00
>
> and I would like to create a new view that takes the first table and
> calculates the time difference in minutes between each row so that
> the result is something like:
>
> client_id,datetime, previousTime, difftime
> 122,2007-05-01 12:01:00, 2007-05-01 12:00:00, 1
> 455,2007-05-01 12:03:00, 2007-05-01 12:02:00, 1
> 455,2007-05-01 12:08:00, 2007-05-01 12:03:00, 5
> 299,2007-05-01 12:34:00, 2007-05-01 12:10:00, 24


I'd create a "previousTime" column and manage it using a trigger.
Anyway, it depends on the time-dependancy of the table
Then you can perform "temporal" in a much easier way.
You could be interested in taking a look at the following link

http://www.cs.arizona.edu/~rts/tdbbook.pdf

Cheers,
e.


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-10-2008, 12:19 AM
Enrico Sirola
 
Posts: n/a
Default Re: Oracle Analytical Functions

Hi Adam,

Il giorno 31/gen/08, alle ore 16:13, Adam Rich ha scritto:

>>> I'm trying to replicate the use of Oracle's 'lag' and 'over
>>> partition by' analytical functions in my query. I have a table
>>> (all_client_times) such as:
>>> and I would like to create a new view that takes the first table and
>>> calculates the time difference in minutes between each row so that
>>> the result is something like:

>
> I thought of a another way of doing this. In my tests, it's a little
> faster, too.
>
> DROP SEQUENCE if exists seq1;
> DROP SEQUENCE if exists seq2;
> CREATE TEMPORARY SEQUENCE seq1 CACHE 1000;
> CREATE TEMPORARY SEQUENCE seq2 CACHE 1000;
>
> select a.client_id, b.datetime, a.datetime as previousTime,
> (b.datetime -
> a.datetime) as difftime from
> (select nextval('seq1') as s, client_id, datetime from
> all_client_times
> order by client_id, datetime OFFSET 0) as a
> inner join
> (select nextval('seq2') as s, client_id, datetime from
> all_client_times
> order by client_id, datetime OFFSET 0) as b
> on a.s=(b.s-1) where a.client_id=b.client_id


very interesting indeed. I guess this strategy is more interesting than
the trigger (or rule) based one when you perform much more inserts on
the table that
the select you proposed above. It sounds strange that the select
proposed
is faster than a (single) select on the same table with an additional
previousTime
column populated via trigger/rule.
Bye,
e.



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-10-2008, 12:19 AM
Willem Buitendyk
 
Posts: n/a
Default Re: Oracle Analytical Functions

Hey Adam,

I tried your sequence method this morning on an unsorted table and for
some reason the order by's aren't working. If I create a sorted view
(client_id, datetime) on the 'all_client_times' table and then use that
view with your sequence method all works fine. The strange thing is
that my table which has about 750K rows only ends up returning 658 rows
with your sequence method using the unsorted table. In fact, when I
tried the same thing with the lagfunc() method you wrote earlier on an
unsorted table the same thing occurs - only returning 658 rows instead
of the 750K. Again, all works well with lagfunc() if I use it on a
sorted view and I remove the order by in the function. This is not too
much of a problem as I can use a sorted view first but I don't
understand why this is happening. Perhaps this is a bug?

As well, I am finding that the lagfunc() is consistently faster than the
sequence method.

cheers,

Willem

Adam Rich wrote:
>>> I'm trying to replicate the use of Oracle's 'lag' and 'over
>>> partition by' analytical functions in my query. I have a table
>>> (all_client_times) such as:
>>> and I would like to create a new view that takes the first table and
>>> calculates the time difference in minutes between each row so that
>>> the result is something like:
>>>

>
> I thought of a another way of doing this. In my tests, it's a little
> faster, too.
>
> DROP SEQUENCE if exists seq1;
> DROP SEQUENCE if exists seq2;
> CREATE TEMPORARY SEQUENCE seq1 CACHE 1000;
> CREATE TEMPORARY SEQUENCE seq2 CACHE 1000;
>
> select a.client_id, b.datetime, a.datetime as previousTime, (b.datetime -
> a.datetime) as difftime from
> (select nextval('seq1') as s, client_id, datetime from all_client_times
> order by client_id, datetime OFFSET 0) as a
> inner join
> (select nextval('seq2') as s, client_id, datetime from all_client_times
> order by client_id, datetime OFFSET 0) as b
> on a.s=(b.s-1) where a.client_id=b.client_id
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>
>



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-10-2008, 12:19 AM
Willem Buitendyk
 
Posts: n/a
Default Re: Oracle Analytical Functions

The 'all_client_times' table has 753698 rows. The lagfunc() on the
sorted view returns 753576 rows and appears to work exactly as needed.
Using the function on an unsorted table returns only 686 rows and is
missing a whole lot of data. Running the count query returns 122 -
which is correct as the amount of clients that I have. Each client has
between 5 - 7K records each.

The way I see it is for each client there will be one row, namely, the
first in the series, that will not be included in the final results as
it would not have a previous time. With that in mind, if I take my
table row count as 753698 and minus the amount of clients I have, 122,
then I should get the number of results as 753576 which is correct when
I use your methods on a sorted table but which is not correct when I
use your methods on an unsorted table.

willem

Adam Rich wrote:
> Hi Willem,
>
>
>> for some reason the order by's aren't working.
>>

>
> Could you provide more details? Do you get a specific error message?
>
>
>> only returning 658 rows instead of the 750K.
>>

>
> You should not expect the same row count in both source table and
> result set. Even in your example -- you provided 8 source rows, and
> 4 result rows. You can determine the correct number of results via
> "the number of records, related to client_ids having two or more records
> in all_client_times, minus one". It may be true that you have 750k
> records but only 658 rows that satisfy this requirement.
>
> What do you get for this query?
>
> select count(*) from ( select client_id, count(*) as rows
> from all_client_times group by client_id having count(*) > 1 ) as x
>
>
> Adam
>
>
>
>
>



---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-10-2008, 12:19 AM
Willem Buitendyk
 
Posts: n/a
Default Re: Oracle Analytical Functions

Here is a little test example. It seems that the second order by
condition is not working - in this case datetime.

create table arb_test (
client_id integer,
arbnum integer);

insert into arb_test values (2,1);
insert into arb_test values (2,33);
insert into arb_test values (2,6);
insert into arb_test values (2,76);
insert into arb_test values (2,111);
insert into arb_test values (2,10);
insert into arb_test values (2,55);
insert into arb_test values (7,12);
insert into arb_test values (7,6);
insert into arb_test values (7,144);
insert into arb_test values (7,63);
insert into arb_test values (7,87);
insert into arb_test values (7,24);
insert into arb_test values (7,22);
insert into arb_test values (1,14);
insert into arb_test values (1,23);
insert into arb_test values (1,67);
insert into arb_test values (1,90);
insert into arb_test values (1,2);
insert into arb_test values (1,5);
insert into arb_test values (5,8);
insert into arb_test values (5,42);
insert into arb_test values (5,77);
insert into arb_test values (5,9);
insert into arb_test values (5,89);
insert into arb_test values (5,23);
insert into arb_test values (5,11);

DROP SEQUENCE if exists seq1;
DROP SEQUENCE if exists seq2;
CREATE TEMPORARY SEQUENCE seq1 CACHE 1000;
CREATE TEMPORARY SEQUENCE seq2 CACHE 1000;

select a.client_id, b.arbnum, a.arbnum as previousarbnum, (b.arbnum -
a.arbnum) as diffarbnum from
(select nextval('seq1') as s, client_id, arbnum from arb_test
order by client_id, arbnum OFFSET 0) as a
inner join
(select nextval('seq2') as s, client_id, arbnum from arb_test
order by client_id, arbnum OFFSET 0) as b
on a.s=(b.s-1) where a.client_id=b.client_id;

--create or replace view arb_view as select * from arb_test order by
client_id, arbnum;

Here are the results:

client_id | arbnum | previousarbnum | diffarbnum
-----------+--------+----------------+------------
1 | 23 | 14 | 9
1 | 67 | 23 | 44
1 | 90 | 67 | 23
1 | 2 | 90 | -88
1 | 5 | 2 | 3
2 | 33 | 1 | 32
2 | 6 | 33 | -27
2 | 76 | 6 | 70
2 | 111 | 76 | 35
2 | 10 | 111 | -101
2 | 55 | 10 | 45
5 | 42 | 8 | 34
5 | 77 | 42 | 35
5 | 9 | 77 | -68
5 | 89 | 9 | 80
5 | 23 | 89 | -66
5 | 11 | 23 | -12
7 | 6 | 12 | -6
7 | 144 | 6 | 138
7 | 63 | 144 | -81
7 | 87 | 63 | 24
7 | 24 | 87 | -63

When I used a sorted view:

create or replace view arb_view as select * from arb_test order by
client_id, arbnum;

and redid it the results are:

client_id | arbnum | previousarbnum | diffarbnum
-----------+--------+----------------+------------
1 | 5 | 2 | 3
1 | 14 | 5 | 9
1 | 23 | 14 | 9
1 | 67 | 23 | 44
1 | 90 | 67 | 23
2 | 6 | 1 | 5
2 | 10 | 6 | 4
2 | 33 | 10 | 23
2 | 55 | 33 | 22
2 | 76 | 55 | 21
2 | 111 | 76 | 35
5 | 9 | 8 | 1
5 | 11 | 9 | 2
5 | 23 | 11 | 12
5 | 42 | 23 | 19
5 | 77 | 42 | 35
5 | 89 | 77 | 12
7 | 12 | 6 | 6
7 | 22 | 12 | 10
7 | 24 | 22 | 2
7 | 63 | 24 | 39
7 | 87 | 63 | 24
7 | 144 | 87 | 57
(23 rows)


This works the way it should.

--drop table arb_test;
--drop view arb_view;

willem
> The 'all_client_times' table has 753698 rows. The lagfunc() on the
> sorted view returns 753576 rows and appears to work exactly as
> needed. Using the function on an unsorted table returns only 686 rows
> and is missing a whole lot of data. Running the count query returns
> 122 - which is correct as the amount of clients that I have. Each
> client has between 5 - 7K records each.
>
> The way I see it is for each client there will be one row, namely, the
> first in the series, that will not be included in the final results as
> it would not have a previous time. With that in mind, if I take my
> table row count as 753698 and minus the amount of clients I have,
> 122, then I should get the number of results as 753576 which is
> correct when I use your methods on a sorted table but which is not
> correct when I use your methods on an unsorted table.
>
> willem
>
> Adam Rich wrote:
>> Hi Willem,
>>
>>
>>> for some reason the order by's aren't working.
>>>

>>
>> Could you provide more details? Do you get a specific error message?
>>
>>
>>> only returning 658 rows instead of the 750K.
>>>

>>
>> You should not expect the same row count in both source table and
>> result set. Even in your example -- you provided 8 source rows, and
>> 4 result rows. You can determine the correct number of results via
>> "the number of records, related to client_ids having two or more
>> records in all_client_times, minus one". It may be true that you
>> have 750k
>> records but only 658 rows that satisfy this requirement.
>>
>> What do you get for this query?
>>
>> select count(*) from ( select client_id, count(*) as rows from
>> all_client_times group by client_id having count(*) > 1 ) as x
>>
>>
>> Adam
>>
>>
>>
>>
>>

>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org/
>



---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-10-2008, 12:19 AM
=?UTF-8?Q?Rodrigo_E._De_Le=C3=B3n_Plicet?=
 
Posts: n/a
Default Re: Oracle Analytical Functions

On Jan 31, 2008 8:49 AM, Enrico Sirola <enrico.sirola@gmail.com> wrote:
> I'd create a "previousTime" column and manage it using a trigger.
> Anyway, it depends on the time-dependancy of the table
> Then you can perform "temporal" in a much easier way.
> You could be interested in taking a look at the following link
>
> http://www.cs.arizona.edu/~rts/tdbbook.pdf


+1

That book provides many sane examples for handling temporal data. Recommended.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:55 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com