Unix Technical Forum

How pull

This is a discussion on How pull within the pgsql Sql forums, part of the PostgreSQL category; --> Hello, My SQL skills are limited and I'm struggling with a query where I want to return a single ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Sql

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 05:55 PM
Matt Magoffin
 
Posts: n/a
Default How pull

Hello,

My SQL skills are limited and I'm struggling with a query where I want to
return a single item of an aggregate join. The query looks like this:

select
(case
when agg.avg_rating is null then 0.0
when agg.avg_rating < 0.75 then 0.5
when agg.avg_rating < 1.25 then 1.0
when agg.avg_rating < 1.75 then 1.5
when agg.avg_rating < 2.25 then 2.0
when agg.avg_rating < 2.75 then 2.5
when agg.avg_rating < 3.25 then 3.0
when agg.avg_rating < 3.75 then 3.5
when agg.avg_rating < 4.25 then 4.0
when agg.avg_rating < 4.75 then 4.5
else 5.0
end) as avg_rating,
count(item.itemid) as item_count
from media_item item
inner join (
select rating.mediaitem_userrating_hjid as ritemid,
avg(rating.rating) as avg_rating
from media_item_rating rating, media_item item
where rating.mediaitem_userrating_hjid = item.itemid
group by rating.mediaitem_userrating_hjid
) as agg
on item.itemid = agg.ritemid
group by avg_rating
order by avg_rating desc

and a sample of results is this:

avg_rating | item_count
------------+------------
5.0 | 21
4.0 | 33
3.0 | 13
2.0 | 4
1.0 | 1

What I want as well is the ID of the item (and possibly it's avg_rating
value) from the "agg" join with the highest avg_rating for each output
row... something like this

avg_rating | item_count | item_id | item_rating
------------+-----------------------------------
5.0 | 21 | 109890 | 4.9
4.0 | 33 | 89201 | 4.1
3.0 | 13 | 119029 | 2.8
2.0 | 4 | 182999 | 2.2
1.0 | 1 | 1929 | 1.0

So the intention in this example is that item #109890 has an average
rating of 4.9 and that is the highest rating within the > 4.75 rating
group.

If anyone had any tips I'd greatly appreciate it.

-- m@

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 05:55 PM
Matt Magoffin
 
Posts: n/a
Default Re: How pull

> On 23/09/2007, Matt Magoffin <postgresql.org@msqr.us> wrote:
>> Hello,
>>
>> My SQL skills are limited and I'm struggling with a query where I want
>> to
>> return a single item of an aggregate join. The query looks like this:
>>
>> select
>> (case
>> when agg.avg_rating is null then 0.0
>> when agg.avg_rating < 0.75 then 0.5
>> when agg.avg_rating < 1.25 then 1.0
>> when agg.avg_rating < 1.75 then 1.5
>> when agg.avg_rating < 2.25 then 2.0
>> when agg.avg_rating < 2.75 then 2.5
>> when agg.avg_rating < 3.25 then 3.0
>> when agg.avg_rating < 3.75 then 3.5
>> when agg.avg_rating < 4.25 then 4.0
>> when agg.avg_rating < 4.75 then 4.5
>> else 5.0
>> end) as avg_rating,
>> count(item.itemid) as item_count
>> from media_item item
>> inner join (
>> select rating.mediaitem_userrating_hjid as ritemid,
>> avg(rating.rating) as avg_rating
>> from media_item_rating rating, media_item item
>> where rating.mediaitem_userrating_hjid = item.itemid
>> group by rating.mediaitem_userrating_hjid
>> ) as agg
>> on item.itemid = agg.ritemid
>> group by avg_rating
>> order by avg_rating desc
>>
>> and a sample of results is this:
>>
>> avg_rating | item_count
>> ------------+------------
>> 5.0 | 21
>> 4.0 | 33
>> 3.0 | 13
>> 2.0 | 4
>> 1.0 | 1
>>
>> What I want as well is the ID of the item (and possibly it's avg_rating
>> value) from the "agg" join with the highest avg_rating for each output
>> row... something like this
>>
>> avg_rating | item_count | item_id | item_rating
>> ------------+-----------------------------------
>> 5.0 | 21 | 109890 | 4.9
>> 4.0 | 33 | 89201 | 4.1
>> 3.0 | 13 | 119029 | 2.8
>> 2.0 | 4 | 182999 | 2.2
>> 1.0 | 1 | 1929 | 1.0
>>
>> So the intention in this example is that item #109890 has an average
>> rating of 4.9 and that is the highest rating within the > 4.75 rating
>> group.
>>
>> If anyone had any tips I'd greatly appreciate it.
>>

>
> create ranking function to make queries look simpler:
>
> create or replace function ranking_group(numeric) returns numeric as
> $$ select case
> when $1 < 0.3456 then 'quite small'
> ...
> end $$ language sql immutable;
> (I'd make it STRICT, but you allow null rankings)
>
> 1st way: DISTINCT ON + subquery
>
> select *, (select count(*) from rating where rating_group(rating) =
> subq.rating_group ) as rating_group_size
> from (
> select distinct on (rating_group)
> rating_group(r.rating),
> r.item_id as best_rated_item_id,
> r.rating as best_rating
> from rating r
> order by rating_group desc, r.rating desc
> ) subq;
>
>
> 2nd way (faster - actually 2 x faster)
> using FIRST aggregate to calculate all in one pass
>
> create function first(numeric,numeric) returns numeric as 'select $1'
> language sql immutable strict;
> create function first(integer,integer) returns integer as 'select $1'
> language sql immutable strict;
> create aggregate first( integer ) ( SFUNC = first, STYPE = integer );
> create aggregate first ( numeric ) ( SFUNC = first, STYPE = numeric );
>
> select
> rating_group(rating),
> count(*) as num_ratings,
> first(item_id) as best_rated_item_id,
> first(rating) as best_rating
> from ( select * from rating order by rating desc ) ordered_ratings
> group by rating_group
> order by rating_group desc;
>
>
> note: if you can, get rid of null ratings. what are they supposed to
> mean? they make things a bit more complicated.



Thanks very much for the helpful tips, Filip, you understand perfectly
what I'm trying to do. For those nulls, they are not actually needed with
the query I gave, you are right. I have another version of this query that
I started with that uses a left outer join instead of the inner join to
produce another grouping for all items that have no rating set (thus the
null value).

An aggregate function like first() is precisely what I need, however I am
looking for a fairly portable solution I can also use in other databases
(Derby and MySQL at a minimum). Unfortunately I don't think either of
these suggestions will work for those systems because they don't support
DISTINCT ON or custom SQL aggregate functions as nicely as Postgres does.

Also, the "avg_rating" I'm using is itself an aggregate value, but I don't
see how your second solution handles this? The rating group values come
from the avg() calculation on a rating table, like

ratinguser | rating | item_id | idx
------------+--------+---------+-----
1098 | 2 | 11016 | 1
2 | 3 | 11016 | 0
43 | 5 | 9021 | 2
2 | 4 | 9021 | 1
1098 | 2 | 9021 | 0

So here the rating for item 11016 is 2.5 and for 9021 it is 3.66.

In addition, this is not my entire SQL statement, because I need another
join to narrow the item results first... so the full SQL I currently have
is like

select
(case
when agg.avg_rating < 0.75 then 0.5
when agg.avg_rating < 1.25 then 1.0
when agg.avg_rating < 1.75 then 1.5
when agg.avg_rating < 2.25 then 2.0
when agg.avg_rating < 2.75 then 2.5
when agg.avg_rating < 3.25 then 3.0
when agg.avg_rating < 3.75 then 3.5
when agg.avg_rating < 4.25 then 4.0
when agg.avg_rating < 4.75 then 4.5
else 5.0
end) as avg_rating,
count(item.itemid) as item_count
from media_item item
inner join (
select mi.itemid as sitemid
from media_item mi
inner join album_item ai on mi.itemid = ai.mediaitem_item_hjchildid
inner join album a on ai.album_item_hjid = a.albumid
where a.owner_ = 2 and a.allowanonymous = TRUE and a.allowbrowse = TRUE
) as shared
on item.itemid = shared.sitemid
left outer join ( -- change to inner for only items with a rating
select rating.mediaitem_userrating_hjid as ritemid,
avg(rating.rating) as avg_rating
from media_item_rating rating, media_item item
where rating.mediaitem_userrating_hjid = item.itemid
group by rating.mediaitem_userrating_hjid
) as agg
on item.itemid = agg.ritemid
group by avg_rating
order by avg_rating desc

Thanks again for the helpful tips.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 10:05 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com