This is a discussion on Maintenance question / DB size anomaly... within the Pgsql Performance forums, part of the PostgreSQL category; --> Drat! I'm wrong again. I thought for sure there wouldn't be a wraparound problem. So does this affect the ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Drat! I'm wrong again. I thought for sure there wouldn't be a wraparound problem. So does this affect the entire database server, or just this table? Is best way to proceed to immediately ditch this db and promote one of my slaves to a master? I'm just concerned about the data integrity. Note that I don't use OID for anything really, so I'm hoping I'll be safe. Thanks again, Tom. /kurt pg_controldata output: -bash-3.00$ pg_controldata pg_control version number: 74 Catalog version number: 200411041 Database system identifier: 4903924957417782767 Database cluster state: in production pg_control last modified: Wed 20 Jun 2007 03:19:52 PM CDT Current log file ID: 952 Next log file segment: 154 Latest checkpoint location: 3B8/920F0D78 Prior checkpoint location: 3B8/8328E4A4 Latest checkpoint's REDO location: 3B8/9200BBF0 Latest checkpoint's UNDO location: 0/0 Latest checkpoint's TimeLineID: 1 Latest checkpoint's NextXID: 1490547335 Latest checkpoint's NextOID: 3714961319 Time of latest checkpoint: Wed 20 Jun 2007 03:17:50 PM CDT Database block size: 8192 Blocks per segment of large relation: 131072 Bytes per WAL segment: 16777216 Maximum length of identifiers: 64 Maximum number of function arguments: 32 Date/time type storage: floating-point numbers Maximum length of locale name: 128 LC_COLLATE: en_US.UTF-8 LC_CTYPE: en_US.UTF-8 -bash-3.00$ echo $PGDATA Here's the list from pg_clog for June: -rw------- 1 postgres postgres 262144 Jun 1 03:36 054D -rw------- 1 postgres postgres 262144 Jun 1 08:16 054E -rw------- 1 postgres postgres 262144 Jun 1 10:24 054F -rw------- 1 postgres postgres 262144 Jun 1 17:03 0550 -rw------- 1 postgres postgres 262144 Jun 2 03:32 0551 -rw------- 1 postgres postgres 262144 Jun 2 10:04 0552 -rw------- 1 postgres postgres 262144 Jun 2 19:24 0553 -rw------- 1 postgres postgres 262144 Jun 3 03:38 0554 -rw------- 1 postgres postgres 262144 Jun 3 13:19 0555 -rw------- 1 postgres postgres 262144 Jun 4 00:02 0556 -rw------- 1 postgres postgres 262144 Jun 4 07:12 0557 -rw------- 1 postgres postgres 262144 Jun 4 12:37 0558 -rw------- 1 postgres postgres 262144 Jun 4 19:46 0559 -rw------- 1 postgres postgres 262144 Jun 5 03:36 055A -rw------- 1 postgres postgres 262144 Jun 5 10:54 055B -rw------- 1 postgres postgres 262144 Jun 5 18:11 055C -rw------- 1 postgres postgres 262144 Jun 6 03:38 055D -rw------- 1 postgres postgres 262144 Jun 6 10:15 055E -rw------- 1 postgres postgres 262144 Jun 6 15:10 055F -rw------- 1 postgres postgres 262144 Jun 6 23:21 0560 -rw------- 1 postgres postgres 262144 Jun 7 07:15 0561 -rw------- 1 postgres postgres 262144 Jun 7 13:43 0562 -rw------- 1 postgres postgres 262144 Jun 7 22:53 0563 -rw------- 1 postgres postgres 262144 Jun 8 07:12 0564 -rw------- 1 postgres postgres 262144 Jun 8 14:42 0565 -rw------- 1 postgres postgres 262144 Jun 9 01:30 0566 -rw------- 1 postgres postgres 262144 Jun 9 09:19 0567 -rw------- 1 postgres postgres 262144 Jun 9 20:19 0568 -rw------- 1 postgres postgres 262144 Jun 10 03:39 0569 -rw------- 1 postgres postgres 262144 Jun 10 15:38 056A -rw------- 1 postgres postgres 262144 Jun 11 03:34 056B -rw------- 1 postgres postgres 262144 Jun 11 09:14 056C -rw------- 1 postgres postgres 262144 Jun 11 13:59 056D -rw------- 1 postgres postgres 262144 Jun 11 19:41 056E -rw------- 1 postgres postgres 262144 Jun 12 03:37 056F -rw------- 1 postgres postgres 262144 Jun 12 09:59 0570 -rw------- 1 postgres postgres 262144 Jun 12 17:23 0571 -rw------- 1 postgres postgres 262144 Jun 13 03:32 0572 -rw------- 1 postgres postgres 262144 Jun 13 09:16 0573 -rw------- 1 postgres postgres 262144 Jun 13 16:25 0574 -rw------- 1 postgres postgres 262144 Jun 14 01:28 0575 -rw------- 1 postgres postgres 262144 Jun 14 08:40 0576 -rw------- 1 postgres postgres 262144 Jun 14 15:07 0577 -rw------- 1 postgres postgres 262144 Jun 14 22:00 0578 -rw------- 1 postgres postgres 262144 Jun 15 03:36 0579 -rw------- 1 postgres postgres 262144 Jun 15 12:21 057A -rw------- 1 postgres postgres 262144 Jun 15 18:10 057B -rw------- 1 postgres postgres 262144 Jun 16 03:32 057C -rw------- 1 postgres postgres 262144 Jun 16 09:17 057D -rw------- 1 postgres postgres 262144 Jun 16 19:32 057E -rw------- 1 postgres postgres 262144 Jun 17 03:39 057F -rw------- 1 postgres postgres 262144 Jun 17 13:26 0580 -rw------- 1 postgres postgres 262144 Jun 17 23:11 0581 -rw------- 1 postgres postgres 262144 Jun 18 04:40 0582 -rw------- 1 postgres postgres 262144 Jun 18 12:23 0583 -rw------- 1 postgres postgres 262144 Jun 18 17:22 0584 -rw------- 1 postgres postgres 262144 Jun 18 19:40 0585 -rw------- 1 postgres postgres 262144 Jun 19 03:38 0586 -rw------- 1 postgres postgres 262144 Jun 19 09:30 0587 -rw------- 1 postgres postgres 262144 Jun 19 10:23 0588 -rw------- 1 postgres postgres 262144 Jun 19 16:10 0589 -rw------- 1 postgres postgres 262144 Jun 19 21:45 058A -rw------- 1 postgres postgres 262144 Jun 20 03:38 058B -rw------- 1 postgres postgres 262144 Jun 20 12:17 058C -rw------- 1 postgres postgres 131072 Jun 20 15:13 058D On Jun 20, 2007, at 2:37 PM, Tom Lane wrote: > > so I have to conclude that you've got a wraparound problem. What > is the > current XID counter? (pg_controldata will give you that, along with a > lot of other junk.) It might also be interesting to take a look at > "ls -l $PGDATA/pg_clog"; the mod times on the files in there would > give > us an idea how fast XIDs are being consumed. > > regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Kurt Overberg <kurt@hotdogrecords.com> writes: > Drat! I'm wrong again. I thought for sure there wouldn't be a > wraparound problem. Well, I'm not sure what it is now. You showed some invisible tuples with XMINs of XMIN: 1489323584 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 XMIN: 1489323590 CMIN: 2 XMAX: 0 CMAX|XVAC: 0 XMIN: 1489323592 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 but the nextXID is 1490547335 which is not that far ahead of those --- about 1.2 million transactions, or less than a day's time according to the clog timestamps, which suggest that you're burning several million XIDs a day. Perhaps you've wrapped past them since your earlier check --- if you try the same "select where ctid = " queries now, do they show rows? The other thing that's strange here is that an 8.0 installation should be pretty aggressive about recycling pg_clog segments, and yet you've got a bunch there. How far back do the files in pg_clog go --- what's the numeric range of the filenames, and the date range of their mod times? Have you checked the postmaster log to see if you're getting any complaints about checkpoint failures or anything like that? It would also be useful to look at the output of select datname, age(datfrozenxid) from pg_database; regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| ||||
| Okay, select * from _my_cluster.sl_log_1 where ctid = '(1,1)'; select * from _my_cluster.sl_log_1 where ctid = '(1,2)'; select * from _my_cluster.sl_log_1 where ctid = '(1,3)'; select * from _my_cluster.sl_log_1 where ctid = '(1,4)'; all returns zero rows. When I do a dump of that file, I get: Block 1 ************************************************** ****** <Header> ----- Block Offset: 0x00002000 Offsets: Lower 408 (0x0198) Block: Size 8192 Version 2 Upper 7680 (0x1e00) LSN: logid 955 recoff 0x0daed68c Special 8192 (0x2000) Items: 97 Free Space: 7272 Length (including item array): 412 <Data> ------ Item 1 -- Length: 121 Offset: 8068 (0x1f84) Flags: USED XMIN: 1491480520 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 Block Id: 1 linp Index: 1 Attributes: 6 Size: 32 infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID) ....the fact that they weren't in the table, but in the file (I did the filedump first, then the query), then redid the filedump, the results are the same, the rows are still in the file. I have no idea how frequently these files are getting written to, I assume frequently. I also looked at the last block listed in the file, 6445, and also looked for items 1-4, and also did not find them in the table using a similar select as above. That seems kinda strange, since there's right this second 11,000 items in that table, but I'll roll with it for awhile. Intrigued, I wanted to see what a filedump looked like of a row that WAS in the table: ctid | log_origin | log_xid | log_tableid | log_actionseq | log_cmdtype | (7,1) | 10 | 1491481037 | 8 | 473490934 | I | (memberid,answerid,taskinstanceid) values ('144854','148707','0') Block 7 ************************************************** ****** <Header> ----- Block Offset: 0x0000e000 Offsets: Lower 424 (0x01a8) Block: Size 8192 Version 2 Upper 508 (0x01fc) LSN: logid 955 recoff 0x0dc4bcc0 Special 8192 (0x2000) Items: 101 Free Space: 84 Length (including item array): 428 <Data> ------ Item 1 -- Length: 129 Offset: 8060 (0x1f7c) Flags: USED XMIN: 1491481037 CMIN: 7 XMAX: 0 CMAX|XVAC: 0 Block Id: 7 linp Index: 1 Attributes: 6 Size: 32 infomask: 0x0912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID) ....the NextID was (taken about 5 mins after the previous filedump): Latest checkpoint's NextXID: 1491498183 I don't see any real differences in the file entry for a row that is in the table, and one that I don't see in the table. I hope I'm getting this right, its totally fascinating seeing how all this works. About your other questions: 1. I have pg_clog segments all the way back to the start of the database, all the way back to March 14th, 2006 (most likely when the database was first brought up on this machine). The numeric names start at 0000 and go to 058E. I checked the recent (within last 8 days) and saw no errors containing the word 'checkpoint'. In fact, very few errors at all. The dang thing seems to be running pretty well, just a little slow. mydb=# select datname, age(datfrozenxid) from pg_database; datname | age -----------+------------ template1 | 1491520270 template0 | 1491520270 postgres | 1491520270 mydb | 1076194904 Oooooooo..... thats not good, is it? Thanks for taking an interest, Tom. I'm most likely going to promote one of my subscribers to be master, then nuke this database, but I have no problems keeping it around if you think I may have found some obscure bug that could help someone debug. Again, this DB gets vacuumed every day, and in the beginning, I think I remember doing a vacuum full every day. Thanks, /kurt On Jun 20, 2007, at 5:08 PM, Tom Lane wrote: > Kurt Overberg <kurt@hotdogrecords.com> writes: >> Drat! I'm wrong again. I thought for sure there wouldn't be a >> wraparound problem. > > Well, I'm not sure what it is now. You showed some invisible tuples > with XMINs of > XMIN: 1489323584 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 > XMIN: 1489323590 CMIN: 2 XMAX: 0 CMAX|XVAC: 0 > XMIN: 1489323592 CMIN: 1 XMAX: 0 CMAX|XVAC: 0 > but the nextXID is > 1490547335 > which is not that far ahead of those --- about 1.2 million > transactions, > or less than a day's time according to the clog timestamps, which > suggest that you're burning several million XIDs a day. Perhaps > you've > wrapped past them since your earlier check --- if you try the same > "select where ctid = " queries now, do they show rows? > > The other thing that's strange here is that an 8.0 installation should > be pretty aggressive about recycling pg_clog segments, and yet you've > got a bunch there. How far back do the files in pg_clog go --- what's > the numeric range of the filenames, and the date range of their mod > times? Have you checked the postmaster log to see if you're > getting any > complaints about checkpoint failures or anything like that? It would > also be useful to look at the output of > select datname, age(datfrozenxid) from pg_database; > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |