Unix Technical Forum

percentile rank query

This is a discussion on percentile rank query within the Pgsql General forums, part of the PostgreSQL category; --> Hi all I'm trying to calculate the percentile rank for a record based on a 'score' column, e.g. a ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql General

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-10-2008, 06:14 PM
William Temperley
 
Posts: n/a
Default percentile rank query

Hi all

I'm trying to calculate the percentile rank for a record based on a
'score' column, e.g. a column of integers such as:
23,77,88,23,23,23,12,12,12,13,13,13
without using a stored procedure.

So,
select count(*) as frequency, score
from scoretable
group by score
order by score

Yields:

frequency score
3 12
3 13
4 23
1 77
1 88


However I'd like this result set:

frequency score runningtotal
3 12 3
3 13 6
4 23 10
1 77 11
1 88 12

Where the running total is the previous frequency added to the current
frequency. Score order is significant.

So I can then do ((runningtotal-frequency)+(frequency/2))/(select
count(*) from scoretable) to give me the percentile rank for each
score.

Is this possible in one query? I just can't figure out how to get the
running total in a result set.

Thanks very much,

Will Temperley

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-10-2008, 06:14 PM
Osvaldo Rosario Kussama
 
Posts: n/a
Default Re: percentile rank query

William Temperley escreveu:
> Hi all
>
> I'm trying to calculate the percentile rank for a record based on a
> 'score' column, e.g. a column of integers such as:
> 23,77,88,23,23,23,12,12,12,13,13,13
> without using a stored procedure.
>
> So,
> select count(*) as frequency, score
> from scoretable
> group by score
> order by score
>
> Yields:
>
> frequency score
> 3 12
> 3 13
> 4 23
> 1 77
> 1 88
>
>
> However I'd like this result set:
>
> frequency score runningtotal
> 3 12 3
> 3 13 6
> 4 23 10
> 1 77 11
> 1 88 12
>
> Where the running total is the previous frequency added to the current
> frequency. Score order is significant.
>
> So I can then do ((runningtotal-frequency)+(frequency/2))/(select
> count(*) from scoretable) to give me the percentile rank for each
> score.
>
> Is this possible in one query? I just can't figure out how to get the
> running total in a result set.
>



Try:

SELECT count(*) AS frequency, score,
count((SELECT * FROM scoretable st2 WHERE st2.score <= st1.score)) AS
runningtotal
FROM scoretable st1
GROUP BY score
ORDER BY score

Osvaldo

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-10-2008, 06:14 PM
William Temperley
 
Posts: n/a
Default Re: percentile rank query

On Thu, Apr 10, 2008 at 4:36 PM, Osvaldo Rosario Kussama
<osvaldo.kussama@gmail.com> wrote:
>
> Try:
>
> SELECT count(*) AS frequency, score,
> count((SELECT * FROM scoretable st2 WHERE st2.score <= st1.score)) AS
> runningtotal
> FROM scoretable st1
> GROUP BY score
> ORDER BY score
>
> Osvaldo
>


Thankyou Osvaldo- that worked!

Final version:

SELECT count(*) AS frequency, score,
((((SELECT count(uid) FROM scoretable st2 WHERE st2.score <=
st1.score) - count(*)) + (count(*)/2))::float/(select
count(*) from scoretable))

FROM scoretable st1
GROUP BY score
ORDER BY score

I think that's a percentile rank now.

Cheers

Will

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-11-2008, 01:53 AM
Sam Mason
 
Posts: n/a
Default Re: percentile rank query

On Thu, Apr 10, 2008 at 05:20:21PM +0100, William Temperley wrote:
> SELECT count(*) AS frequency, score,
> ((((SELECT count(uid) FROM scoretable st2 WHERE st2.score <=
> st1.score) - count(*)) + (count(*)/2))::float/(select
> count(*) from scoretable))
>
> FROM scoretable st1
> GROUP BY score
> ORDER BY score
>
> I think that's a percentile rank now.


I'm not quite sure how this is calculated but I think you may want to
be converting to a non-integral type earlier (i.e. as you're dividing
by two, not after). I also find all the subselects a bit difficult to
follow so have moved them around:

SELECT x.frequency, x.score, (x.rank + x.frequency / 2.0) / y.total AS pr
FROM (
SELECT count(*) AS frequency, score,
(SELECT count(*) FROM scoretable t WHERE t.score < s.score) AS rank
FROM scoretable s
GROUP BY score) x, (SELECT count(*) AS total FROM scoretable) y
ORDER BY score;


Sam

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 10:57 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com