Unix Technical Forum

Non-blocking vacuum full

This is a discussion on Non-blocking vacuum full within the Pgsql Performance forums, part of the PostgreSQL category; --> Hello, I was wondering whether any thought has previously been given to having a non-blocking "vacuum full", in the ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 11:38 AM
Peter Schuller
 
Posts: n/a
Default Non-blocking vacuum full

Hello,

I was wondering whether any thought has previously been given to
having a non-blocking "vacuum full", in the sense of space reclamation
and table compactation.

The motivation is that it is useful to be able to assume that
operations that span a table will *roughtly* scale linearly with the
size of the table. But when you have a table that over an extended
period of time begins small, grows large, and grows small again (where
"large" might be, say, 200 GB), that assumption is most definitely
not correct when you're on the downward slope of that graph. Having
this assumption remain true simplifies things a lot for certain
workloads (= my particular work load ).

I have only looked very very briefly at the PG code so I don't know
how far fetched it is, but my thought was that it should be possible
to have a slow background process (similar to normal non-full vacuums
nows) that would, instead of registering dead tuples in the FSM, move
live tuples around.

Combine that slow moving operations with a policy to a new tuple space
allocation policy that prefers earlier locations on-disk, it should in
time result in a situation where the physical on-disk file contains
only dead tuples after a certain percentage location. At this point
the file can be truncated, giving space back to the OS as well as
eliminating all that dead space from having to be covered by
sequential scans on the table.

This does of course increase the total cost of all updates and
deletes, but would be very useful in some senarios. It also has the
interesting property that the scan for live tuples to move need not
touch the entire table to be effective; it could by design be applied
to the last <n> percentage of the table, where <n> would be scaled
appropriately with the frequency of the checks relative to
update/insert frequency.

Other benefits:

* Never vacuum full - EVER. Not even after discovering too small
max_fsm_pages or too infrequent vacuums and needing to retroactively
shrink the table.
* Increased locality in general; even if one does not care about
the diskspace or sequential scanning. Particularly relevant for low-update frequency
tables suffering from sudden shrinkage, where a blocking VACUUM FULL Isnot
acceptable.
* Non-blocking CLUSTER is perhaps suddently more trivial to implement?
Or at least SORTOFCLUSTER when you want it for reasons other than
perfect order ("mostly sorted").

Opinions/thoughts?

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller@infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQFG/UK4DNor2+l1i30RApbXAJ9tCgYIOSgUwJG39o23pdzvp/XA7QCfSi6w
vH2uO1sqCy6bBL9/IjiUPqs=
=9g+K
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 11:38 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: Non-blocking vacuum full

Peter Schuller wrote:
> I have only looked very very briefly at the PG code so I don't know
> how far fetched it is, but my thought was that it should be possible
> to have a slow background process (similar to normal non-full vacuums
> nows) that would, instead of registering dead tuples in the FSM, move
> live tuples around.


What you've described is actually very close to VACUUM FULL. VACUUM FULL
needs to take an exclusive lock to lock out concurrent scanners that
might miss or see a tuple twice, when a live tuple is moved. That's the
fundamental problem you need to solve.

I think it's doable, if you take a copy of the tuple, and set the ctid
pointer on the old one like an UPDATE, and wait until the old tuple is
no longer visible to anyone before removing it. It does require some
changes to tuple visibility code. For example, a transaction running in
serializable mode shouldn't throw a serialization error when it tries to
update an old, moved row version, but follow the ctid pointer instead.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 11:38 AM
Ron Mayer
 
Posts: n/a
Default Re: Non-blocking vacuum full

Heikki Linnakangas wrote:
> Peter Schuller wrote:
>> to have a slow background process (similar to normal non-full vacuums

> ...
> I think it's doable, if you take a copy of the tuple, and set the ctid
> pointer on the old one like an UPDATE, and wait until the old tuple is
> no longer visible to anyone before removing it. It does require some
> changes to tuple visibility code.


Wouldn't just having this slow background process
repeatedly alternating between
update table set anycol=anycol where ctid > [some ctid near the end]
and running normal VACUUM statements do what the original poster
was asking? And with 8.3, I guess also avoiding HOT?


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 11:38 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: Non-blocking vacuum full

Ron Mayer wrote:
> Heikki Linnakangas wrote:
>> Peter Schuller wrote:
>>> to have a slow background process (similar to normal non-full vacuums

>> ...
>> I think it's doable, if you take a copy of the tuple, and set the ctid
>> pointer on the old one like an UPDATE, and wait until the old tuple is
>> no longer visible to anyone before removing it. It does require some
>> changes to tuple visibility code.

>
> Wouldn't just having this slow background process
> repeatedly alternating between
> update table set anycol=anycol where ctid > [some ctid near the end]
> and running normal VACUUM statements do what the original poster
> was asking?


Almost. Updaters would block waiting for the UPDATE, and updaters in
serializable mode would throw serialization errors. And the "WHERE ctid
> ?" would actually result in a seq scan scanning the whole table, since

our tid scans don't support inequality searches.

> And with 8.3, I guess also avoiding HOT?


HOT shouldn't cause any complications here AFAICS.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:57 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com