Unix Technical Forum

OFFSET impact on Performance???

This is a discussion on OFFSET impact on Performance??? within the Pgsql Performance forums, part of the PostgreSQL category; --> As I read the docs, a temp table doesn't solve our problem, as it does not persist between sessions. ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #11 (permalink)  
Old 04-18-2008, 12:00 PM
Alex Turner
 
Posts: n/a
Default Re: [SQL] OFFSET impact on Performance???

As I read the docs, a temp table doesn't solve our problem, as it does
not persist between sessions. With a web page there is no guarentee
that you will receive the same connection between requests, so a temp
table doesn't solve the problem. It looks like you either have to
create a real table (which is undesirable becuase it has to be
physicaly synced, and TTFB will be very poor) or create an application
tier in between the web tier and the database tier to allow data to
persist between requests tied to a unique session id.

Looks like the solutions to this problem is not RDBMS IMHO.

Alex Turner
NetEconomist


On Wed, 26 Jan 2005 12:11:49 +0200, Andrei Bintintan <klodoma@ar-sd.net> wrote:
> The problems still stays open.
>
> The thing is that I have about 20 - 30 clients that are using that SQL query
> where the offset and limit are involved. So, I cannot create a temp table,
> because that means that I'll have to make a temp table for each session...
> which is a very bad ideea. Cursors somehow the same. In my application the
> Where conditions can be very different for each user(session) apart.
>
> The only solution that I see in the moment is to work at the query, or to
> write a more complex where function to limit the results output. So no
> replace for Offset/Limit.
>
> Best regards,
> Andy.
>
>
> ----- Original Message -----
> From: "Greg Stark" <gsstark@mit.edu>
> To: <alex@neteconomist.com>
> Cc: "Richard Huxton" <dev@archonet.com>; "Andrei Bintintan"
> <klodoma@ar-sd.net>; <pgsql-sql@postgresql.org>;
> <pgsql-performance@postgresql.org>
> Sent: Tuesday, January 25, 2005 8:28 PM
> Subject: Re: [PERFORM] [SQL] OFFSET impact on Performance???
>
>
> >
> > Alex Turner <armtuk@gmail.com> writes:
> >
> >> I am also very interesting in this very question.. Is there any way to
> >> declare a persistant cursor that remains open between pg sessions?
> >> This would be better than a temp table because you would not have to
> >> do the initial select and insert into a fresh table and incur those IO
> >> costs, which are often very heavy, and the reason why one would want
> >> to use a cursor.

> >
> > TANSTAAFL. How would such a persistent cursor be implemented if not by
> > building a temporary table somewhere behind the scenes?
> >
> > There could be some advantage if the data were stored in a temporary table
> > marked as not having to be WAL logged. Instead it could be automatically
> > cleared on every database start.
> >
> > --
> > greg
> >
> >

>
>


---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #12 (permalink)  
Old 04-18-2008, 12:02 PM
David Brown
 
Posts: n/a
Default Re: OFFSET impact on Performance???

Although larger offsets have some effect, your real problem is the sort
(of 42693 rows).

Try:

SELECT r.id_order
FROM report r
WHERE r.id_order IN
(SELECT id
FROM orders
WHERE id_status = 6
ORDER BY 1
LIMIT 10 OFFSET 1000)
ORDER BY 1

The subquery doesn't *have* to sort because the table is already ordered
on the primary key.
You can still add a join to orders outside the subselect without
significant cost.

Incidentally, I don't know how you got the first plan - it should
include a sort as well.

Andrei Bintintan wrote:

> explain analyze
> SELECT o.id
> FROM report r
> INNER JOIN orders o ON o.id=r.id_order AND o.id_status=6
> ORDER BY 1 LIMIT 10 OFFSET 10
>
> Limit (cost=44.37..88.75 rows=10 width=4) (actual time=0.160..0.275

rows=10 loops=1)
> -> Merge Join (cost=0.00..182150.17 rows=41049 width=4) (actual

time=0.041..0.260 rows=20 loops=1)
> Merge Cond: ("outer".id_order = "inner".id)
> -> Index Scan using report_id_order_idx on report r

(cost=0.00..157550.90 rows=42862 width=4) (actual time=0.018..0.075
rows=20 loops=1)
> -> Index Scan using orders_pkey on orders o

(cost=0.00..24127.04 rows=42501 width=4) (actual time=0.013..0.078
rows=20 loops=1)
> Filter: (id_status = 6)
> Total runtime: 0.373 ms
>
> explain analyze
> SELECT o.id
> FROM report r
> INNER JOIN orders o ON o.id=r.id_order AND o.id_status=6
> ORDER BY 1 LIMIT 10 OFFSET 1000000
> Limit (cost=31216.85..31216.85 rows=1 width=4) (actual

time=1168.152..1168.152 rows=0 loops=1)
> -> Sort (cost=31114.23..31216.85 rows=41049 width=4) (actual

time=1121.769..1152.246 rows=42693 loops=1)
> Sort Key: o.id
> -> Hash Join (cost=2329.99..27684.03 rows=41049 width=4)

(actual time=441.879..925.498 rows=42693 loops=1)
> Hash Cond: ("outer".id_order = "inner".id)
> -> Seq Scan on report r (cost=0.00..23860.62

rows=42862 width=4) (actual time=38.634..366.035 rows=42864 loops=1)
> -> Hash (cost=2077.74..2077.74 rows=42501 width=4)

(actual time=140.200..140.200 rows=0 loops=1)
> -> Seq Scan on orders o (cost=0.00..2077.74

rows=42501 width=4) (actual time=0.059..96.890 rows=42693 loops=1)
> Filter: (id_status = 6)
> Total runtime: 1170.586 ms


---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #13 (permalink)  
Old 04-18-2008, 12:03 PM
PFC
 
Posts: n/a
Default Re: [SQL] OFFSET impact on Performance???


> As I read the docs, a temp table doesn't solve our problem, as it does
> not persist between sessions. With a web page there is no guarentee
> that you will receive the same connection between requests, so a temp
> table doesn't solve the problem. It looks like you either have to
> create a real table (which is undesirable becuase it has to be
> physicaly synced, and TTFB will be very poor) or create an application
> tier in between the web tier and the database tier to allow data to
> persist between requests tied to a unique session id.
>
> Looks like the solutions to this problem is not RDBMS IMHO.
>
> Alex Turner
> NetEconomist


Did you miss the proposal to store arrays of the found rows id's in a
"cache" table ? Is 4 bytes per result row still too large ?

If it's still too large, you can still implement the same cache in the
filesystem !
If you want to fetch 100.000 rows containing just an integer, in my case
(psycopy) it's a lot faster to use an array aggregate. Time to get the
data in the application (including query) :

select id from temp
=> 849 ms
select int_array_aggregate(id) as ids from temp
=> 300 ms

So you can always fetch the whole wuery results (in the form of an
integer per row) and cache it in the filesystem. It won't work if you have
10 million rows though !

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:24 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com