This is a discussion on Maintenance question / DB size anomaly... within the Pgsql Performance forums, part of the PostgreSQL category; --> Kurt Overberg <kurt@hotdogrecords.com> writes: > That's the thing thats kinda blowing my mind here, when I look at > ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Kurt Overberg <kurt@hotdogrecords.com> writes: > That's the thing thats kinda blowing my mind here, when I look at > that table: > db1=# select count(*) from _my_cluster.sl_log_1 ; > count > ------- > 6788 > (1 row) Well, that's real interesting. AFAICS there are only two possibilities: 1. VACUUM sees the other 300k tuples as INSERT_IN_PROGRESS; a look at the code shows that these are counted the same as plain live tuples, but they'd not be visible to other transactions. I wonder if you could have any really old open transactions that might have inserted all those tuples? 2. The other 300k tuples are committed good, but they are not seen as valid by a normal MVCC-aware transaction, probably because of transaction wraparound. This would require the sl_log_1 table to have escaped vacuuming for more than 2 billion transactions, which seems a bit improbable but maybe not impossible. (You did say you were running PG 8.0.x, right? That's the last version without any strong defenses against transaction wraparound...) The way to get some facts, instead of speculating, would be to get hold of the appropriate version of pg_filedump from http://sources.redhat.com/rhdb/ and dump out sl_log_1 with it (probably the -i option would be sufficient), then take a close look at the tuples that aren't visible to other transactions. (You could do "select ctid from sl_log_1" to determine which ones are visible.) regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Kurt Overberg <kurt@hotdogrecords.com> wrote: > > That's the thing thats kinda blowing my mind here, when I look at > that table: > > db1=# select count(*) from _my_cluster.sl_log_1 ; > count > ------- > 6788 > (1 row) > > As far as my DB is concerned, there's only ~7000 rows (on average) > when I look > in there (it does fluctuate, I've seen it go as high as around 12k, > but then its > gone back down, so I know events are moving around in there). This is consistent with my experience with Slony and sl_log_[12] I'm pretty sure that the slon processes vacuum sl_log_* on a fairly regular basis. I'm absolutely positive that slon occasionally switches from using sl_log_1, to sl_log_2, then truncates sl_log_1 (then, after some time, does the same in reverse) So, in order for you to get massive bloat of the sl_log_* tables, you must be doing a LOT of transactions in the time before it switches logs and truncates the unused version. Either that, or something is going wrong. > So from what I can tell- from the disk point of view, there's ~11Gb > of data; from the > vacuum point of view there's 309318 rows. From the psql point of > view, there's only > around 7,000. Am I missing something? Something seems wrong here. Correct me if I'm missing something, but you're saying the table takes up 11G on disk, but vacuum says there are ~14000 pages. That would mean your page size is ~800K. Doesn't seem right. > Unless there's something > going on under the > hood that I don't know about (more than likely), it seems like my > sl_log_1 table is munged or > somehow otherwise very screwed up. I fear that a re-shuffling or > dropping/recreating > the index will mess it up further. Maybe when I take my production > systems down for > maintenance, can I wait until sl_log_1 clears out, so then I can just > drop that > table altogether (and re-create it of course)? Possibly drop this node from the Slony cluster and re-add it. Unless it's the origin node, in which case you'll have to switchover, then redo the origin then switch back ... > > Thanks! > > /kurt > > > > > On Jun 19, 2007, at 5:33 PM, Tom Lane wrote: > > > Kurt Overberg <kurt@hotdogrecords.com> writes: > >> mydb # vacuum verbose _my_cluster.sl_log_1 ; > >> INFO: "sl_log_1": found 455001 removable, 309318 nonremovable row > >> versions in 13764 pages > >> DETAIL: 0 dead row versions cannot be removed yet. > > > > Hmm. So you don't have a long-running-transactions problem (else that > > DETAIL number would have been large). What you do have is a failure > > to vacuum sl_log_1 on a regular basis (because there are so many > > dead/removable rows). I suspect also some sort of Slony problem, > > because AFAIK a properly operating Slony system shouldn't have that > > many live rows in sl_log_1 either --- don't they all represent > > as-yet-unpropagated events? I'm no Slony expert though. You probably > > should ask about that on the Slony lists. > > > >> ...I then checked the disk and those pages are still there. > > > > Yes, regular VACUUM doesn't try very hard to shorten the disk file. > > > >> Would a VACUUM FULL take care of this? > > > > It would, but it will take an unpleasantly long time with so many live > > rows to reshuffle. I'd advise first working to see if you can get the > > table down to a few live rows. Then a VACUUM FULL will be a snap. > > Also, you might want to do REINDEX after VACUUM FULL to compress the > > indexes --- VACUUM FULL isn't good at that. > > > > regards, tom lane > > > > ---------------------------(end of > > broadcast)--------------------------- > > TIP 4: Have you searched our list archives? > > > > http://archives.postgresql.org > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate > > > > > > -- Bill Moran Collaborative Fusion Inc. wmoran@collaborativefusion.com Phone: 412-422-3463x4023 ************************************************** ************** IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ************************************************** ************** ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Jun 19, 2007, at 7:26 PM, Tom Lane wrote: > Kurt Overberg <kurt@hotdogrecords.com> writes: >> That's the thing thats kinda blowing my mind here, when I look at >> that table: > >> db1=# select count(*) from _my_cluster.sl_log_1 ; >> count >> ------- >> 6788 >> (1 row) > > Well, that's real interesting. AFAICS there are only two > possibilities: > > 1. VACUUM sees the other 300k tuples as INSERT_IN_PROGRESS; a look at > the code shows that these are counted the same as plain live tuples, > but they'd not be visible to other transactions. I wonder if you > could > have any really old open transactions that might have inserted all > those > tuples? > Unlikely- the database has been stopped and restarted, which I think closes out transactions? Or could that cause the problems? > 2. The other 300k tuples are committed good, but they are not seen as > valid by a normal MVCC-aware transaction, probably because of > transaction wraparound. This would require the sl_log_1 table to have > escaped vacuuming for more than 2 billion transactions, which seems a > bit improbable but maybe not impossible. (You did say you were > running > PG 8.0.x, right? That's the last version without any strong defenses > against transaction wraparound...) Yep, this 8.0.4. It has been running for over a year, fairly heavy updates, so I would guess its possible. > The way to get some facts, instead of speculating, would be to get > hold > of the appropriate version of pg_filedump from > http://sources.redhat.com/rhdb/ and dump out sl_log_1 with it > (probably the -i option would be sufficient), then take a close look > at the tuples that aren't visible to other transactions. (You could > do "select ctid from sl_log_1" to determine which ones are visible.) > Okay, I've grabbed pg_filedump and got it running on the appropriate server. I really have No Idea how to read its output though. Where does the ctid from sl_log_1 appear in the following listing? Block 0 ************************************************** ****** <Header> ----- Block Offset: 0x00000000 Offsets: Lower 20 (0x0014) Block: Size 8192 Version 2 Upper 8176 (0x1ff0) LSN: logid 949 recoff 0xae63b06c Special 8176 (0x1ff0) Items: 0 Free Space: 8156 Length (including item array): 24 BTree Meta Data: Magic (0x00053162) Version (2) Root: Block (1174413) Level (3) FastRoot: Block (4622) Level (1) <Data> ------ Empty block - no items listed <Special Section> ----- BTree Index Section: Flags: 0x0008 (META) Blocks: Previous (0) Next (0) Level (0) ..../this was taken from the first page file (955960160.0 I guess you could call it). Does this look interesting to you, Tom? FWIW- this IS on my master DB. I've been slowly preparing an upgrade to 8.2, I guess I'd better get that inta gear, hmmm? :-( /kurt > regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Kurt Overberg <kurt@hotdogrecords.com> writes: > Okay, I've grabbed pg_filedump and got it running on the appropriate > server. > I really have No Idea how to read its output though. Where does the > ctid from sl_log_1 > appear in the following listing? ctid is (block number, item number) > Block 0 ************************************************** ****** > BTree Meta Data: Magic (0x00053162) Version (2) > Root: Block (1174413) Level (3) > FastRoot: Block (4622) Level (1) This seems to be an index, not the sl_log_1 table. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| OOooooookaaaaaaaaay. Since the discussion has wandered a bit I just wanted to restate things in an effort to clear the problem in my head. Okay, so the sl_log_1 TABLE looks okay. Its the indexes that seem to be messed up, specifically sl_log_1_idx1 seems to think that there's > 300,000 rows in the table its associated with. I just want to fix the index, really. So my question remains: Its it okay to dump and recreate that index (or reindex it) while the servers are down and the database is not being accessed? Tom, Bill, Chris and Richard, thank you so much for your thoughts on this matter so far. It helps to not feel "so alone" when dealing with difficult issues (for me anyway) on a system I don't know so much about. Thanks guys, /kurt On Jun 19, 2007, at 10:51 PM, Tom Lane wrote: > Kurt Overberg <kurt@hotdogrecords.com> writes: >> Okay, I've grabbed pg_filedump and got it running on the appropriate >> server. >> I really have No Idea how to read its output though. Where does the >> ctid from sl_log_1 >> appear in the following listing? > > ctid is (block number, item number) > >> Block 0 ************************************************** ****** >> BTree Meta Data: Magic (0x00053162) Version (2) >> Root: Block (1174413) Level (3) >> FastRoot: Block (4622) Level (1) > > This seems to be an index, not the sl_log_1 table. > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: explain analyze is your friend ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| In response to Kurt Overberg <kurt@hotdogrecords.com>: > OOooooookaaaaaaaaay. Since the discussion has wandered a bit I just > wanted to restate things in an effort to clear the problem in my head. > > Okay, so the sl_log_1 TABLE looks okay. Its the indexes that seem to > be messed up, specifically sl_log_1_idx1 seems to think that there's > > 300,000 rows in the table its associated with. I just want to fix > the index, really. So my question remains: Apologies, I must have misunderstood some previous message. > Its it okay to dump and recreate that index (or reindex it) while the > servers are down and the database is not being accessed? There are people here who know _way_ more than me -- but I can't see any reason why you couldn't just REINDEX it while everything is running. There may be some performance slowdown during the reindex, but everything should continue to chug along. A drop/recreate of the index should be OK as well. > Tom, Bill, Chris and Richard, thank you so much for your thoughts on > this matter so far. It helps to not feel "so alone" when dealing > with difficult issues (for me anyway) on a system I don't know so > much about. -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ wmoran@collaborativefusion.com Phone: 412-422-3463x4023 ************************************************** ************** IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ************************************************** ************** ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Kurt Overberg wrote: > OOooooookaaaaaaaaay. Since the discussion has wandered a bit I just > wanted to restate things in an effort to clear the problem in my head. > > Okay, so the sl_log_1 TABLE looks okay. Its the indexes that seem to be > messed up, specifically sl_log_1_idx1 seems to think that there's >> 300,000 rows in the table its associated with. I just want to fix the > index, really. So my question remains: > > Its it okay to dump and recreate that index (or reindex it) while the > servers are down and the database is not being accessed? Well, I would probably stop the slon daemons => dropping needed indexes which slony needs can lead to quite a slowdown, and worse, the slowdown happens because the database server is doing things the wrong way. But that's mostly what you need to do. OTOH, depending upon the size of your database, you might consider starting out from a scratch database. Andreas > > Tom, Bill, Chris and Richard, thank you so much for your thoughts on > this matter so far. It helps to not feel "so alone" when dealing > with difficult issues (for me anyway) on a system I don't know so much > about. #slony@Freenode.net, #postgresql@freenode.net are quite helpful, and sometimes faster than mail. Andreas -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGeTFtHJdudm4KnO0RAqDaAKDB1/eGqdwtLQdpTJzrChcp4J5M5wCglphW ljxag882h33fDWXX1ILiUU8= =jzBw -----END PGP SIGNATURE----- ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Kurt Overberg <kurt@hotdogrecords.com> writes: > Okay, so the sl_log_1 TABLE looks okay. Its the indexes that seem to > be messed up, specifically sl_log_1_idx1 seems to think that there's >>> 300,000 rows in the table its associated with. I just want to fix > the index, really. I'm not sure how you arrive at that conclusion. The VACUUM VERBOSE output you provided here: http://archives.postgresql.org/pgsql...6/msg00370.php shows clearly that there are lots of rows in the table as well as the indexes. A REINDEX would certainly cut the size of the indexes but it isn't going to do anything about the extraneous rows. When last heard from, you were working on getting pg_filedump output for some of the bogus rows --- what was the result? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Dang it, Tom, don't you ever get tired of being right? I guess I had been focusing on the index numbers since they came up first, and its the index files that are > 10Gb. Okay, so I did some digging with pg_filedump, and found the following: .. .. .. .. Block 406 ************************************************** ****** <Header> ----- Block Offset: 0x0032c000 Offsets: Lower 208 (0x00d0) Block: Size 8192 Version 2 Upper 332 (0x014c) LSN: logid 950 recoff 0x9ebcc6e4 Special 8192 (0x2000) Items: 47 Free Space: 124 Length (including item array): 212 <Data> ------ Item 1 -- Length: 472 Offset: 7720 (0x1e28) Flags: USED XMIN: 1489323584 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 Block Id: 406 linp Index: 1 Attributes: 6 Size: 32 infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID) Item 2 -- Length: 185 Offset: 7532 (0x1d6c) Flags: USED XMIN: 1489323584 CMIN: 4 XMAX: 0 CMAX|XVAC: 0 Block Id: 406 linp Index: 2 Attributes: 6 Size: 32 infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID) Item 3 -- Length: 129 Offset: 7400 (0x1ce8) Flags: USED XMIN: 1489323590 CMIN: 2 XMAX: 0 CMAX|XVAC: 0 Block Id: 406 linp Index: 3 Attributes: 6 Size: 32 infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID) Item 4 -- Length: 77 Offset: 7320 (0x1c98) Flags: USED XMIN: 1489323592 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 Block Id: 406 linp Index: 4 Attributes: 6 Size: 32 infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID) ....I then looked in the DB: mydb=# select * from _my_cluster.sl_log_1 where ctid = '(406,1)'; log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype | log_cmddata ------------+---------+-------------+---------------+------------- +------------- (0 rows) mydb=# select * from _my_cluster.sl_log_1 where ctid = '(406,2)'; log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype | log_cmddata ------------+---------+-------------+---------------+------------- +------------- (0 rows) mydb=# select * from _my_cluster.sl_log_1 where ctid = '(406,3)'; log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype | log_cmddata ------------+---------+-------------+---------------+------------- +------------- (0 rows) ....is this what you were looking for, Tom? The only thing that stands out to me is the XMAX_INVALID mask. Thoughts? Thanks, /kurt On Jun 20, 2007, at 11:22 AM, Tom Lane wrote: > Kurt Overberg <kurt@hotdogrecords.com> writes: >> Okay, so the sl_log_1 TABLE looks okay. Its the indexes that seem to >> be messed up, specifically sl_log_1_idx1 seems to think that there's >>>> 300,000 rows in the table its associated with. I just want to fix >> the index, really. > > I'm not sure how you arrive at that conclusion. The VACUUM VERBOSE > output you provided here: > http://archives.postgresql.org/pgsql...6/msg00370.php > shows clearly that there are lots of rows in the table as well as > the indexes. A REINDEX would certainly cut the size of the indexes > but it isn't going to do anything about the extraneous rows. > > When last heard from, you were working on getting pg_filedump > output for > some of the bogus rows --- what was the result? > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| ||||
| Kurt Overberg <kurt@hotdogrecords.com> writes: > Okay, so I did some digging with pg_filedump, and found the following: > Block 406 ************************************************** ****** > Item 1 -- Length: 472 Offset: 7720 (0x1e28) Flags: USED > XMIN: 1489323584 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 > Block Id: 406 linp Index: 1 Attributes: 6 Size: 32 > infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID) This is pretty much what you'd expect for a never-updated tuple... > mydb=# select * from _my_cluster.sl_log_1 where ctid = '(406,1)'; > log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype | > log_cmddata > ------------+---------+-------------+---------------+------------- > +------------- > (0 rows) so I have to conclude that you've got a wraparound problem. What is the current XID counter? (pg_controldata will give you that, along with a lot of other junk.) It might also be interesting to take a look at "ls -l $PGDATA/pg_clog"; the mod times on the files in there would give us an idea how fast XIDs are being consumed. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |