Unix Technical Forum

Slow join query

This is a discussion on Slow join query within the Pgsql Performance forums, part of the PostgreSQL category; --> I have a query that runs about 30-50 seconds.  The query is a join between 2 tables (customer and ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 11:06 AM
Tom Tamulewicz
 
Posts: n/a
Default Slow join query





I have a query that runs about 30-50 seconds.  The query is a join between 2 tables (customer and address), each table with about 400,000 rows.  My customer table has fields like first_name and last_name where the address table has city, state, etc.  I'm using "like" in most of the query columns, which all have indexes.  The actual query is:

SELECT p.party_id, p.first_name, p.last_name, pli.address1, pli.city, pli.state FROM customer as p JOIN address as pli ON ( p.party_id = pli.party_id&nbsp WHERE ( p.void_flag IS NULL OR p.void_flag = false )  AND  (first_name like 'B%') AND (last_name like 'S%') AND (pli.state like 'M%') AND (pli.city like 'AL%') ORDER BY last_name, first_name LIMIT 51




When the query runs, the hard drive lights up for the duration.  (I'm confused by this as 'top' reports only 24k of swap in use).  My SUSE 9 test machine has 512 Meg of RAM with 300 Meg used by a Java app.  Postmaster reports 56 Meg under "top" and has a 52 Meg segment under "ipcs".  I've played with the cache size, shared buffers, and OS shmmax with little change in the query performance.



Q: Would this query benefit from using a view between these two tables?



Q: Any idea why the reported swap usage is so low, yet the query slams the drive?  Is postgres not caching this data?  If I run the query with the same arguments, it comes right back the second time.  If I change the args and re-run, it goes back to the hard drive and takes 30-50 seconds. 



Suggestions very welcome,



Tom



 




Who's that on the Red Carpet? Play & win glamorous prizes.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 11:06 AM
Michael Glaesemann
 
Posts: n/a
Default Re: Slow join query


On Jun 22, 2007, at 13:32 , Tom Tamulewicz wrote:
> ( p.void_flag IS NULL OR p.void_flag = false )

Just a note: you can rewrite (a IS NULL or a = false) as (a IS NOT
TRUE). Shouldn't affect performance, but might make your query easier
to read.

What's the EXPLAIN ANALYZE output for this query?
> When the query runs, the hard drive lights up for the duration.
> (I'm confused by this as 'top' reports only 24k of swap in use).
> My SUSE 9 test machine has 512 Meg of RAM with 300 Meg used by a
> Java app. Postmaster reports 56 Meg under "top" and has a 52 Meg
> segment under "ipcs". I've played with the cache size, shared
> buffers, and OS shmmax with little change in the query performance.
>
> Q: Would this query benefit from using a view between these two
> tables?

I doubt it, as views are just pre-parsed queries: no data is
materialized for the view.
> Q: Any idea why the reported swap usage is so low, yet the query
> slams the drive? Is postgres not caching this data? If I run the
> query with the same arguments, it comes right back the second
> time. If I change the args and re-run, it goes back to the hard
> drive and takes 30-50 seconds.

How much is cached depends on shared_buffers, I believe. If the
result is still cached, that'd explain why running the query with the
same arguments returns so quickly. You might see some improvement
using a prepared query, as the server shouldn't have to reparse and
replan the query. Of course, if you change the arguments, it can't
use the result that's cached from the previous run.

Take this all with an appropriate amount of salt. I'm learning about
this, too.

Michael Glaesemann
grzm seespotcode net



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 11:06 AM
Tom Tamulewicz
 
Posts: n/a
Default Re: Slow join query





The explain is as follows...



        &n bsp;       &nbs p;                 &n bsp;       &nbs p;                 &n bsp;       &nbs p;                 &n bsp;       &nbs p;                QUERY PLAN       &nbs p;                 &n bsp;       &nbs p;      
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..96.48 rows=1 width=2450)
   ->  Nested Loop  (cost=0.00..96.48 rows=1 width=2450)
       &nbsp ; ->  Index Scan using idx_last_name on customer p  (cost=0.00..50.22 rows=1 width=1209)
        &n bsp;      Index Cond: (((last_name)::text >= 'S'::character varying) AND ((last_name)::text < 'T'::character varying) AND ((first_name)::text >= 'B'::character varying) AND ((first_name)::text < 'C'::character varying))
        &n bsp;      Filter: (((void_flag IS NULL) OR (void_flag = false)) AND ((first_name)::text ~~ 'B%'::text) AND ((last_name)::text ~~ 'S%'::text))
       &nbsp ; ->  Index Scan using address_pkey on address pli  (cost=0.00..46.23 rows=1 width=1257)
        &n bsp;      Index Cond: (("outer".party_id = pli.party_id))
        &n bsp;      Filter: (((state)::text ~~ 'M%'::text) AND ((city)::text ~~ 'AL%'::text))









 

From: Michael Glaesemann <grzm@seespotcode.net>
To: Tom Tamulewicz <tomjt7@hotmail.com>
CC: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow join query
Date: Fri, 22 Jun 2007 14:51:32 -0500
>
>On Jun 22, 2007, at 13:32 , Tom Tamulewicz wrote:
>>( p.void_flag IS NULL OR p.void_flag = false )
>Just a note: you can rewrite (a IS NULL or a = false) as (a IS NOT
>TRUE). Shouldn't affect performance, but might make your query
>easier to read.
>
>What's the EXPLAIN ANALYZE output for this query?
>>When the query runs, the hard drive lights up for the duration.
>>(I'm confused by this as 'top' reports only 24k of swap in use).
>>My SUSE 9 test machine has 512 Meg of RAM with 300 Meg used by a
>>Java app. Postmaster reports 56 Meg under "top" and has a 52 Meg
>>segment under "ipcs". I've played with the cache size, shared
>>buffers, and OS shmmax with little change in the query performance.
>>
>>Q: Would this query benefit from using a view between these two
>>tables?
>I doubt it, as views are just pre-parsed queries: no data is
>materialized for the view.
>>Q: Any idea why the reported swap usage is so low, yet the query
>>slams the drive? Is postgres not caching this data? If I run the
>>query with the same arguments, it comes right back the second
>>time. If I change the args and re-run, it goes back to the hard
>>drive and takes 30-50 seconds.
>How much is cached depends on shared_buffers, I believe. If the
>result is still cached, that'd explain why running the query with
>the same arguments returns so quickly. You might see some
>improvement using a prepared query, as the server shouldn't have to
>reparse and replan the query. Of course, if you change the
>arguments, it can't use the result that's cached from the previous
>run.
>
>Take this all with an appropriate amount of salt. I'm learning about
> this, too.
>
>Michael Glaesemann
>grzm seespotcode net
>
>
>
>---------------------------(end of
>broadcast)---------------------------
>TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do
>not
> match



Picture this – share your photos and you could win big!
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 11:06 AM
Michael Glaesemann
 
Posts: n/a
Default Re: Slow join query

[Please don't top post as it makes the discussion more difficult to
follow.]

On Jun 22, 2007, at 16:25 , Tom Tamulewicz wrote:
> The explain is as follows...

EXPLAIN ANALYZE, please. (And for convenience, it helps if you
include the query )

Michael Glaesemann
grzm seespotcode net



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 11:06 AM
Tom Tamulewicz
 
Posts: n/a
Default Re: Slow join query





 









From: Michael Glaesemann <grzm@seespotcode.net>
To: Tom Tamulewicz <tomjt7@hotmail.com>
CC: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow join query
Date: Fri, 22 Jun 2007 14:51:32 -0500
>
>On Jun 22, 2007, at 13:32 , Tom Tamulewicz wrote:
>>( p.void_flag IS NULL OR p.void_flag = false )
>Just a note: you can rewrite (a IS NULL or a = false) as (a IS NOT
>TRUE). Shouldn't affect performance, but might make your query
>easier to read.
>
>What's the EXPLAIN ANALYZE output for this query?
>>When the query runs, the hard drive lights up for the duration.
>>(I'm confused by this as 'top' reports only 24k of swap in use).
>>My SUSE 9 test machine has 512 Meg of RAM with 300 Meg used by a
>>Java app. Postmaster reports 56 Meg under "top" and has a 52 Meg
>>segment under "ipcs". I've played with the cache size, shared
>>buffers, and OS shmmax with little change in the query performance.
>>
>>Q: Would this query benefit from using a view between these two
>>tables?
>I doubt it, as views are just pre-parsed queries: no data is
>materialized for the view.
>>Q: Any idea why the reported swap usage is so low, yet the query
>>slams the drive? Is postgres not caching this data? If I run the
>>query with the same arguments, it comes right back the second
>>time. If I change the args and re-run, it goes back to the hard
>>drive and takes 30-50 seconds.
>How much is cached depends on shared_buffers, I believe. If the
>result is still cached, that'd explain why running the query with
>the same arguments returns so quickly. You might see some
>improvement using a prepared query, as the server shouldn't have to
>reparse and replan the query. Of course, if you change the
>arguments, it can't use the result that's cached from the previous
>run.
>
>Take this all with an appropriate amount of salt. I'm learning about
> this, too.
>
>Michael Glaesemann
>grzm seespotcode net
>
>
>
>---------------------------(end of
>broadcast)---------------------------
>TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do
>not
> match




 





SELECT p.party_id, p.first_name, p.last_name, pli.address1, pli.city, pli.state FROM customer as p JOIN address as pli ON ( p.party_id = pli.party_id&nbsp WHERE ( p.void_flag IS NULL OR p.void_flag = false )  AND  (first_name like 'B%') AND (last_name like 'S%') AND (pli.state like 'M%') AND (pli.city like 'AL%') ORDER BY last_name, first_name LIMIT 51  



        &n bsp;       &nbs p;                 &n bsp;       &nbs p;                 &n bsp;       &nbs p;                 &n bsp;       &nbs p;       &nbsp ; QUERY PLAN       &nbs p;                 &n bsp;       &nbs p;      
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..96.48 rows=1 width=2450) (actual time=13459.814..13459.814 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..96.48 rows=1 width=2450) (actual time=13459.804..13459.804 rows=0 loops=1)
       &nbsp ; ->  Index Scan using idx_last_name on customer p  (cost=0.00..50.22 rows=1 width=1209) (actual time=57.812..13048.524 rows=2474 loops=1)
        &n bsp;      Index Cond: (((last_name)::text >= 'S'::character varying) AND ((last_name)::text < 'T'::character varying) AND ((first_name)::text >= 'B'::character varying) AND ((first_name)::text < 'C'::character varying))
        &n bsp;      Filter: (((void_flag IS NULL) OR (void_flag = false)) AND ((first_name)::text ~~ 'B%'::text) AND ((last_name)::text ~~ 'S%'::text))
       &nbsp ; ->  Index Scan using address_pkey on address pli  (cost=0.00..46.23 rows=1 width=1257) (actual time=0.149..0.149 rows=0 loops=2474)
        &n bsp;      Index Cond: (("outer".party_id = pli.party_id))
        &n bsp;      Filter: (((state)::text ~~ 'M%'::text) AND ((city)::text ~~ 'AL%'::text))
 Total runtime: 13460.292 ms





Picture this – share your photos and you could win big!





Get a preview of Live Earth, the hottest event this summer - only on MSN
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 11:06 AM
Scott Marlowe
 
Posts: n/a
Default Re: Slow join query

Tom Tamulewicz wrote:
>
>
>
> ------------------------------------------------------------------------
>
> SELECT p.party_id, p.first_name, p.last_name, pli.address1,
> pli.city, pli.state FROM customer as p JOIN address as pli ON (
> p.party_id = pli.party_id ) WHERE ( p.void_flag IS NULL OR
> p.void_flag = false ) AND (first_name like 'B%') AND (last_name
> like 'S%') AND (pli.state like 'M%') AND (pli.city like 'AL%')
> ORDER BY last_name, first_name LIMIT 51
>
>
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.00..96.48 rows=1 width=2450) (actual
> time=13459.814..13459.814 rows=0 loops=1)
> -> Nested Loop (cost=0.00..96.48 rows=1 width=2450) (actual
> time=13459.804..13459.804 rows=0 loops=1)
> -> Index Scan using idx_last_name on customer p
> (cost=0.00..50.22 rows=1 width=1209) (actual
> time=57.812..13048.524 rows=2474 loops=1)
> Index Cond: (((last_name)::text >= 'S'::character
> varying) AND ((last_name)::text < 'T'::character varying) AND
> ((first_name)::text >= 'B'::character varying) AND
> ((first_name)::text < 'C'::character varying))
> Filter: (((void_flag IS NULL) OR (void_flag =
> false)) AND ((first_name)::text ~~ 'B%'::text) AND
> ((last_name)::text ~~ 'S%'::text))
> -> Index Scan using address_pkey on address pli
> (cost=0.00..46.23 rows=1 width=1257) (actual time=0.149..0.149
> rows=0 loops=2474)
> Index Cond: (("outer".party_id = pli.party_id))
> Filter: (((state)::text ~~ 'M%'::text) AND
> ((city)::text ~~ 'AL%'::text))
> Total runtime: 13460.292 ms
>


The problem here is this bit:

-> Index Scan using idx_last_name on customer p (cost=0.00..50.22
rows=1 width=1209) (actual time=57.812..13048.524 rows=2474 loops=1)
Index Cond: (((last_name)::text >= 'S'::character
varying) AND ((last_name)::text < 'T'::character varying) AND
((first_name)::text >= 'B'::character varying) AND ((first_name)::text <
'C'::character varying))
Filter: (((void_flag IS NULL) OR (void_flag = false)) AND
((first_name)::text ~~ 'B%'::text) AND ((last_name)::text ~~ 'S%'::text))

Note that you're getting back 2474 rows, but the planner expects 1. Not
the actual time going from 57 to 13048, it's spending all it's time
looking up each tuple in the index, then in the table. Using a seq scan
would be much faster.

Have you analyzed this table? If so, you might need to up the stats
target on last_name and see if that helps.

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:22 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com