Unix Technical Forum

Suggestions on an update query

This is a discussion on Suggestions on an update query within the Pgsql Performance forums, part of the PostgreSQL category; --> PostgreSql version 8.2.4 Memory = 8 Gig CPUs 1 dual core Zeon running at 3.0 I have a problem ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 11:41 AM
Campbell, Lance
 
Posts: n/a
Default Suggestions on an update query

PostgreSql version 8.2.4

Memory = 8 Gig

CPUs 1 dual core Zeon running at 3.0



I have a problem with an update query taking over 10 hours in order to
run. I rebooted my server. I ran the SQL command "analyze". Could
you please help me with any suggestions? I have included the two tables
involved in the update below as well as the indexes I am using.



The table result_entry contains 17,767,240 rows and the table
question_number contains 40,787. Each row from the result_entry table
will match to one and only one row in the table question_number using
the fk_question_id field. Each row from the question_number table
matches to an average of 436 rows on the result_entry table.



CREATE TABLE question_number

(

fk_form_id integer not null,

fk_question_id integer not null,

question_number integer not null,

sequence_id integer not null

);



ALTER TABLE ONLY question_number ADD CONSTRAINT question_number_pkey
PRIMARY KEY (fk_question_id);

CREATE INDEX question_number_index1 ON question_number USING btree
(question_number);





CREATE TABLE result_entry (

fk_result_submission_id integer NOT NULL,

fk_question_id integer NOT NULL,

fk_option_order_id integer NOT NULL,

value character varying,

order_id integer NOT NULL,

question_number integer

);



CREATE INDEX result_entery_index1 ON result_entry USING btree
(fk_question_id);





update result_entry set question_number=question_number.question_number


from question_number where
result_entry.fk_question_id=question_number.fk_que stion_id;







explain update result_entry set
question_number=question_number.question_number

from question_number where
result_entry.fk_question_id=question_number.fk_que stion_id;



QUERY PLAN


------------------------------------------------------------------------
---------

Hash Join (cost=1437.71..1046983.94 rows=17333178 width=32)

Hash Cond: (result_entry.fk_question_id =
question_number.fk_question_id)

-> Seq Scan on result_entry (cost=0.00..612216.78 rows=17333178
width=28)

-> Hash (cost=927.87..927.87 rows=40787 width=8)

-> Seq Scan on question_number (cost=0.00..927.87 rows=40787
width=8)

(5 rows)







Postgresql.conf settings:



shared_buffers = 1GB

work_mem = 10MB

max_fsm_pages = 204800

random_page_cost = 1.0

effective_cache_size = 8GB





Thanks for any help!





Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu




Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 11:41 AM
Campbell, Lance
 
Posts: n/a
Default Re: Suggestions on an update query

I forgot to include an additional parameter I am using in
Postgresql.conf:



checkpoint_segments = 30



Thanks,



Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu



________________________________

From: pgsql-performance-owner@postgresql.org
[mailtogsql-performance-owner@postgresql.org] On Behalf Of Campbell,
Lance
Sent: Friday, October 26, 2007 3:27 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Suggestions on an update query



PostgreSql version 8.2.4

Memory = 8 Gig

CPUs 1 dual core Zeon running at 3.0



I have a problem with an update query taking over 10 hours in order to
run. I rebooted my server. I ran the SQL command "analyze". Could
you please help me with any suggestions? I have included the two tables
involved in the update below as well as the indexes I am using.



The table result_entry contains 17,767,240 rows and the table
question_number contains 40,787. Each row from the result_entry table
will match to one and only one row in the table question_number using
the fk_question_id field. Each row from the question_number table
matches to an average of 436 rows on the result_entry table.



CREATE TABLE question_number

(

fk_form_id integer not null,

fk_question_id integer not null,

question_number integer not null,

sequence_id integer not null

);



ALTER TABLE ONLY question_number ADD CONSTRAINT question_number_pkey
PRIMARY KEY (fk_question_id);

CREATE INDEX question_number_index1 ON question_number USING btree
(question_number);





CREATE TABLE result_entry (

fk_result_submission_id integer NOT NULL,

fk_question_id integer NOT NULL,

fk_option_order_id integer NOT NULL,

value character varying,

order_id integer NOT NULL,

question_number integer

);



CREATE INDEX result_entery_index1 ON result_entry USING btree
(fk_question_id);





update result_entry set question_number=question_number.question_number


from question_number where
result_entry.fk_question_id=question_number.fk_que stion_id;







explain update result_entry set
question_number=question_number.question_number

from question_number where
result_entry.fk_question_id=question_number.fk_que stion_id;



QUERY PLAN


------------------------------------------------------------------------
---------

Hash Join (cost=1437.71..1046983.94 rows=17333178 width=32)

Hash Cond: (result_entry.fk_question_id =
question_number.fk_question_id)

-> Seq Scan on result_entry (cost=0.00..612216.78 rows=17333178
width=28)

-> Hash (cost=927.87..927.87 rows=40787 width=8)

-> Seq Scan on question_number (cost=0.00..927.87 rows=40787
width=8)

(5 rows)







Postgresql.conf settings:



shared_buffers = 1GB

work_mem = 10MB

max_fsm_pages = 204800

random_page_cost = 1.0

effective_cache_size = 8GB





Thanks for any help!





Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu




Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 11:41 AM
Gregory Stark
 
Posts: n/a
Default Re: Suggestions on an update query

"Campbell, Lance" <lance@uiuc.edu> writes:

> QUERY PLAN
>
>
> ------------------------------------------------------------------------
>
> Hash Join (cost=1437.71..1046983.94 rows=17333178 width=32)
> Hash Cond: (result_entry.fk_question_id = question_number.fk_question_id)
> -> Seq Scan on result_entry (cost=0.00..612216.78 rows=17333178 width=28)
> -> Hash (cost=927.87..927.87 rows=40787 width=8)
> -> Seq Scan on question_number (cost=0.00..927.87 rows=40787 width=8)
>
> (5 rows)


That looks like an entirely reasonable plan. Is it possible some other session
was blocking this update with a lock on a record? Was there lots of I/O at the
time? You could peek in pg_locks while the update seems frozen.

This looks like a one-time administrative job to add a new column, is that it?
You might also consider creating a new table with the new data and replacing
the old table with the new one with something like:

CREATE TABLE new_result_entry AS
SELECT fk_result_submission_id, fk_question_id, fk_option_order_id,
value, order_id,
question_number.question_number
FROM result_entry
JOIN question_number USING (fk_question_id)

CREATE INDEX result_entery_index1n ON new_result_entry USING btree (fk_question_id);

ALTER TABLE result_entry RENAME TO old_result_entry
ALTER TABLE newresult_entry RENAME TO result_entry

Unfortunately (for this use case) any views, triggers, etc which reference the
old table will continue to reference the old table after the renames. You'll
have to drop and recreate them.

That may not be an option if the data is actively being used though. But if it
is an option there are a few advantages 1) it'll be a bit faster 2) you can
build the indexes on the new data at the end of the creation b) the resulting
table and indexes won't have all the old versions taking up space waiting for
a vacuum.


> Postgresql.conf settings:
> shared_buffers = 1GB
> work_mem = 10MB
> max_fsm_pages = 204800
> random_page_cost = 1.0
> effective_cache_size = 8GB


I would suggest keeping random_page_cost at least slightly above 1.0 and
effective_cache_size should probably be about 6GB rather than 8 since the
shared buffers and other things which use memory reduce the memory available
for cache. Also, work_mem could be larger at least for large batch queries
like this.

None of this is relevant for this query though. Actually I think a larger
work_mem can avoid problems with hash joins so you might try that but I don't
think it would be choosing it estimated that might happen -- and the estimates
all look accurate.


--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 11:41 AM
Joshua D. Drake
 
Posts: n/a
Default Re: Suggestions on an update query

On Fri, 26 Oct 2007 15:31:44 -0500
"Campbell, Lance" <lance@uiuc.edu> wrote:

> I forgot to include an additional parameter I am using in
> Postgresql.conf:
>


O.k. first, just to get it out of the way (and then I will try and
help). Please do not top post, it makes replying contextually very
difficult.
>
> PostgreSql version 8.2.4
>
> Memory = 8 Gig
>
> CPUs 1 dual core Zeon running at 3.0
>


O.k. first you might be grinding through your 20 checkpoint segments
but in reality what I think is happening is you are doing foreign key
checks against all of it and slowing things down.


>
> The table result_entry contains 17,767,240 rows and the table
> question_number contains 40,787. Each row from the result_entry table
> will match to one and only one row in the table question_number using
> the fk_question_id field. Each row from the question_number table
> matches to an average of 436 rows on the result_entry table.
>
>



You could disable the foreign key for the update and then reapply it.

Joshua D. Drake


>
> CREATE TABLE question_number
>
> (
>
> fk_form_id integer not null,
>
> fk_question_id integer not null,
>
> question_number integer not null,
>
> sequence_id integer not null
>
> );
>
>
>
> ALTER TABLE ONLY question_number ADD CONSTRAINT question_number_pkey
> PRIMARY KEY (fk_question_id);
>
> CREATE INDEX question_number_index1 ON question_number USING btree
> (question_number);
>
>
>
>
>
> CREATE TABLE result_entry (
>
> fk_result_submission_id integer NOT NULL,
>
> fk_question_id integer NOT NULL,
>
> fk_option_order_id integer NOT NULL,
>
> value character varying,
>
> order_id integer NOT NULL,
>
> question_number integer
>
> );
>
>
>
> CREATE INDEX result_entery_index1 ON result_entry USING btree
> (fk_question_id);
>
>
>
>
>
> update result_entry set
> question_number=question_number.question_number
>
>
> from question_number where
> result_entry.fk_question_id=question_number.fk_que stion_id;
>
>
>
>
>
>
>
> explain update result_entry set
> question_number=question_number.question_number
>
> from question_number where
> result_entry.fk_question_id=question_number.fk_que stion_id;
>
>
>
> QUERY PLAN
>
>
> ------------------------------------------------------------------------
> ---------
>
> Hash Join (cost=1437.71..1046983.94 rows=17333178 width=32)
>
> Hash Cond: (result_entry.fk_question_id =
> question_number.fk_question_id)
>
> -> Seq Scan on result_entry (cost=0.00..612216.78 rows=17333178
> width=28)
>
> -> Hash (cost=927.87..927.87 rows=40787 width=8)
>
> -> Seq Scan on question_number (cost=0.00..927.87
> rows=40787 width=8)
>
> (5 rows)
>
>
>
>
>
>
>
> Postgresql.conf settings:
>
>
>
> shared_buffers = 1GB
>
> work_mem = 10MB
>
> max_fsm_pages = 204800
>
> random_page_cost = 1.0
>
> effective_cache_size = 8GB
>
>
>
>
>
> Thanks for any help!
>
>
>
>
>
> Lance Campbell
>
> Project Manager/Software Architect
>
> Web Services at Public Affairs
>
> University of Illinois
>
> 217.333.0382
>
> http://webservices.uiuc.edu
>
>
>



--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997 http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHIlkFATb/zqfZUUQRAnSeAKCKrXAiWmjifE/yzHtNJU/OY9SPtwCcDHII
ie4bHtTF27Fh7ZtPjON4+zk=
=ZLNo
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 11:41 AM
Gregory Stark
 
Posts: n/a
Default Re: Suggestions on an update query


"Joshua D. Drake" <jd@commandprompt.com> writes:

> On Fri, 26 Oct 2007 15:31:44 -0500
> "Campbell, Lance" <lance@uiuc.edu> wrote:
>
>> I forgot to include an additional parameter I am using in
>> Postgresql.conf:
>>

>
> O.k. first, just to get it out of the way (and then I will try and
> help). Please do not top post, it makes replying contextually very
> difficult.
>
>> PostgreSql version 8.2.4
>>
>> Memory = 8 Gig
>>
>> CPUs 1 dual core Zeon running at 3.0
>>

>
> O.k. first you might be grinding through your 20 checkpoint segments
> but in reality what I think is happening is you are doing foreign key
> checks against all of it and slowing things down.


If you're going to berate someone about top-posting perhaps you should attach
your own commentary to relevant bits of context :P

But the original post didn't include any foreign key constraints. I suspect
you've guessed it right though. In fact I suspect what's happening is he
doesn't have an index on the referencing column so the foreign key checks are
doing sequential scans of.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 11:41 AM
Joshua D. Drake
 
Posts: n/a
Default Re: Suggestions on an update query

On Sat, 27 Oct 2007 03:04:47 +0100
Gregory Stark <stark@enterprisedb.com> wrote:


> > O.k. first you might be grinding through your 20 checkpoint segments
> > but in reality what I think is happening is you are doing foreign
> > key checks against all of it and slowing things down.

>
> If you're going to berate someone about top-posting perhaps you
> should attach your own commentary to relevant bits of context :P


It was hardly berating Greg, I even said please.


> I
> suspect you've guessed it right though. In fact I suspect what's
> happening is he doesn't have an index on the referencing column so
> the foreign key checks are doing sequential scans of.
>


Sincerely,

Joshua D. Drake


--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997 http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHIqLyATb/zqfZUUQRAoDCAJ9gqo50Ck0kx5PNE0LlAbEh6BT28wCdG17p
TznzpE+C53GMbUHdrpCoZs0=
=YQhY
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-19-2008, 11:41 AM
Campbell, Lance
 
Posts: n/a
Default Re: Suggestions on an update query

Thanks for all of your help. The problem was that the result_entry table
had some constraints that pointed to a third table. When I removed
those constraints the performance was amazing. The update took less
than seven minutes to execute. I did not even consider the fact that
constraints to another table would impact the performance.

Thanks again,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailtogsql-performance-owner@postgresql.org] On Behalf Of Gregory
Stark
Sent: Friday, October 26, 2007 9:05 PM
To: Joshua D. Drake
Cc: Campbell, Lance; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Suggestions on an update query


"Joshua D. Drake" <jd@commandprompt.com> writes:

> On Fri, 26 Oct 2007 15:31:44 -0500
> "Campbell, Lance" <lance@uiuc.edu> wrote:
>
>> I forgot to include an additional parameter I am using in
>> Postgresql.conf:
>>

>
> O.k. first, just to get it out of the way (and then I will try and
> help). Please do not top post, it makes replying contextually very
> difficult.
>
>> PostgreSql version 8.2.4
>>
>> Memory = 8 Gig
>>
>> CPUs 1 dual core Zeon running at 3.0
>>

>
> O.k. first you might be grinding through your 20 checkpoint segments
> but in reality what I think is happening is you are doing foreign key
> checks against all of it and slowing things down.


If you're going to berate someone about top-posting perhaps you should
attach
your own commentary to relevant bits of context :P

But the original post didn't include any foreign key constraints. I
suspect
you've guessed it right though. In fact I suspect what's happening is he
doesn't have an index on the referencing column so the foreign key
checks are
doing sequential scans of.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-19-2008, 11:41 AM
Scott Marlowe
 
Posts: n/a
Default Re: Suggestions on an update query

On 10/29/07, Campbell, Lance <lance@uiuc.edu> wrote:
> Thanks for all of your help. The problem was that the result_entry table
> had some constraints that pointed to a third table. When I removed
> those constraints the performance was amazing. The update took less
> than seven minutes to execute. I did not even consider the fact that
> constraints to another table would impact the performance.


Usually you can put an index on the refrerenced key in the foreign
table to speed things up.

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:21 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com