Unix Technical Forum

PITR Backups

This is a discussion on PITR Backups within the Pgsql Performance forums, part of the PostgreSQL category.


  #1 (permalink)  
Old 04-19-2008, 11:05 AM
Dan Gorman
 
Posts: n/a
Default PITR Backups

Hi -
I'm looking at ways to do clean PITR backups. Currently we're
pg_dumping our data, which in some cases is about 100GB when compressed.
Needless to say, it's slow and I/O-intensive on both the host and the
backup server.

All of our databases are on NetApp storage and I have been looking
at SnapMirror (PITR RO copy) and FlexClone (near-instant RW volume
replica) for backing up our databases. The problem is that, because
there is no write-suspend or even a 'hot backup mode' for Postgres,
it's very plausible that the database has data in RAM that hasn't been
written yet, so the snapshot would be corrupt. NetApp suggested that
if we do a SnapMirror, we take a couple in quick succession (< 1s
apart) so that, should one be corrupt, we can try the next one. They
said Oracle does something similar.

Is there a better way to quiesce the database without shutting it
down? Some of our databases are doing about 250,000 commits/min.

Best Regards,
Dan Gorman


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

  #2 (permalink)  
Old 04-19-2008, 11:05 AM
Tom Lane
 
Posts: n/a
Default Re: PITR Backups

Dan Gorman <dgorman@hi5.com> writes:
> All of our databases are on NetApp storage and I have been looking
> at SnapMirror (PITR RO copy ) and FlexClone (near instant RW volume
> replica) for backing up our databases. The problem is because there
> is no write-suspend or even a 'hot backup mode' for postgres it's
> very plausible that the database has data in RAM that hasn't been
> written and will corrupt the data.


I think you need to read the fine manual a bit more closely:
http://www.postgresql.org/docs/8.2/s...ckup-file.html
If the NetApp does provide an instantaneous-snapshot operation then
it will work fine; you just have to be sure the snap covers both
data and WAL files.

Alternatively, you can use a PITR base backup as suggested here:
http://www.postgresql.org/docs/8.2/s...archiving.html

In either case, the key point is that you need both the data files
and matching WAL files.
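
As a rough illustration of the point that the snapshot must cover both data and WAL files, the sketch below checks whether the data directory and the WAL directory sit on the same filesystem. The function name and the example paths are hypothetical, and "same device ID" is only a proxy: it assumes one filesystem corresponds to one snapshot unit on the storage side, which has to be confirmed separately (tablespaces on other volumes are not covered either).

```python
import os


def on_same_filesystem(data_dir: str, wal_dir: str) -> bool:
    """Return True if both paths resolve to the same filesystem (device ID)."""
    # realpath follows symlinks, so a WAL directory that is symlinked
    # onto another volume is compared by its real location.
    data_dev = os.stat(os.path.realpath(data_dir)).st_dev
    wal_dev = os.stat(os.path.realpath(wal_dir)).st_dev
    return data_dev == wal_dev


if __name__ == "__main__":
    # Hypothetical paths; adjust to the actual installation.
    data = "/var/lib/pgsql/data"
    wal = os.path.join(data, "pg_xlog")
    if os.path.isdir(data) and os.path.isdir(wal):
        print("single snapshot may cover both:", on_same_filesystem(data, wal))
```

If the check returns False, a single-volume snapshot would miss one of the two pieces, and the snapshots of the separate volumes would need to be taken together or handled through the PITR base backup procedure instead.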

regards, tom lane


  #3 (permalink)  
Old 04-19-2008, 11:05 AM
Toru SHIMOGAKI
 
Posts: n/a
Default Re: PITR Backups


Tom Lane wrote:
> Dan Gorman <dgorman@hi5.com> writes:
>> All of our databases are on NetApp storage and I have been looking
>> at SnapMirror (PITR RO copy ) and FlexClone (near instant RW volume
>> replica) for backing up our databases. The problem is because there
>> is no write-suspend or even a 'hot backup mode' for postgres it's
>> very plausible that the database has data in RAM that hasn't been
>> written and will corrupt the data.


> Alternatively, you can use a PITR base backup as suggested here:
> http://www.postgresql.org/docs/8.2/s...archiving.html


I think Dan's problem is important if we use PostgreSQL for a large database:

- When we take a PITR base backup with a hardware-level snapshot operation
(not filesystem-level), which a lot of storage vendors provide, the backup data
can be corrupted, as Dan said. During recovery we can't even read it,
especially if metadata was corrupted.

- If we don't use a hardware-level snapshot operation, it takes a long time to
back up a large database, and a lot of WAL files containing full-page writes
are generated.

So, I think users need a new feature that avoids writing out heap pages while a
backup is being taken.

Any comments?

Best regards,

--
Toru SHIMOGAKI<shimogaki.toru@oss.ntt.co.jp>
NTT Open Source Software Center



  #4 (permalink)  
Old 04-19-2008, 11:05 AM
Joshua D. Drake
 
Posts: n/a
Default Re: PITR Backups

Toru SHIMOGAKI wrote:
> Tom Lane wrote:


> - When we take a PITR base backup with a hardware-level snapshot operation
> (not filesystem-level), which a lot of storage vendors provide, the backup data
> can be corrupted, as Dan said. During recovery we can't even read it,
> especially if metadata was corrupted.
>
> - If we don't use a hardware-level snapshot operation, it takes a long time to
> back up a large database, and a lot of WAL files containing full-page writes
> are generated.


Does it? I have done it with fairly large databases without issue.

Joshua D. Drake


>
> So, I think users need a new feature not to write out heap pages during taking a
> backup.
>
> Any comments?
>
> Best regards,
>



--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



  #5 (permalink)  
Old 04-19-2008, 11:05 AM
Steve Atkins
 
Posts: n/a
Default Re: PITR Backups


On Jun 21, 2007, at 7:30 PM, Toru SHIMOGAKI wrote:

>
> Tom Lane wrote:
>> Dan Gorman <dgorman@hi5.com> writes:
>>> All of our databases are on NetApp storage and I have been
>>> looking
>>> at SnapMirror (PITR RO copy ) and FlexClone (near instant RW volume
>>> replica) for backing up our databases. The problem is because there
>>> is no write-suspend or even a 'hot backup mode' for postgres it's
>>> very plausible that the database has data in RAM that hasn't been
>>> written and will corrupt the data.

>
>> Alternatively, you can use a PITR base backup as suggested here:
>> http://www.postgresql.org/docs/8.2/s...archiving.html

>
> I think Dan's problem is important if we use PostgreSQL for a large
> database:
>
> - When we take a PITR base backup with a hardware-level snapshot
> operation
> (not filesystem-level), which a lot of storage vendors provide, the
> backup data
> can be corrupted, as Dan said. During recovery we can't even read it,
> especially if metadata was corrupted.


I can't see any explanation for how this could happen, other
than your hardware vendor is lying about snapshot ability.

What problems have you actually seen?

Cheers,
Steve






  #6 (permalink)  
Old 04-19-2008, 11:05 AM
Toru SHIMOGAKI
 
Posts: n/a
Default Re: PITR Backups


Steve Atkins wrote:

>> - When we take a PITR base backup with a hardware-level snapshot operation
>> (not filesystem-level), which a lot of storage vendors provide, the
>> backup data
>> can be corrupted, as Dan said. During recovery we can't even read it,
>> especially if metadata was corrupted.

>
> I can't see any explanation for how this could happen, other
> than your hardware vendor is lying about snapshot ability.


All of the hardware vendors I asked said the same thing:

"The hardware-level snapshot has nothing to do with the filesystem's condition,
or with what data has been written from the operating system cache to the
hard disk platter. It just copies the byte data on the storage to the other volume.

So, if any data is written while the snapshot is being taken, we can't
*strictly* guarantee data correctness.

In Oracle, no table data is written between BEGIN BACKUP and END BACKUP, and it
is not a problem if REDO is written..."

I'd like to know the correct information if this explanation has any mistakes,
or a good way to avoid the problem.

I think there are users who want to migrate from Oracle to PostgreSQL but can't
because of the problem described above.


Best regards,

--
Toru SHIMOGAKI<shimogaki.toru@oss.ntt.co.jp>
NTT Open Source Software Center



  #7 (permalink)  
Old 04-19-2008, 11:05 AM
Toru SHIMOGAKI
 
Posts: n/a
Default Re: PITR Backups


Joshua D. Drake wrote:

>> - If we don't use a hardware-level snapshot operation, it takes a long time to
>> back up a large database, and a lot of WAL files containing full-page writes
>> are generated.

>
> Does it? I have done it with fairly large databases without issue.


You mean a hardware snapshot? I know that taking a backup using rsync (or tar, cp?)
as an online backup method is not such a big problem, as documented. But it just
takes a long time if we handle a terabyte database. We also have to run VACUUM and
other batch processes on the large database, so we don't want the backup to take
a long time...

Regards,

--
Toru SHIMOGAKI<shimogaki.toru@oss.ntt.co.jp>
NTT Open Source Software Center



  #8 (permalink)  
Old 04-19-2008, 11:05 AM
Dan Gorman
 
Posts: n/a
Default Re: PITR Backups

Here is an example. Most of the snapshots worked fine, but I did get
this once:

Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [9-1] 2007-06-21
00:39:43 PDTLOG: redo done at 71/99870670
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [10-1] 2007-06-21
00:39:43 PDTWARNING: page 28905 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [11-1] 2007-06-21
00:39:43 PDTWARNING: page 13626 of relation 1663/16384/76716 did not
exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [12-1] 2007-06-21
00:39:43 PDTWARNING: page 28904 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [13-1] 2007-06-21
00:39:43 PDTWARNING: page 26711 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [14-1] 2007-06-21
00:39:43 PDTWARNING: page 28900 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [15-1] 2007-06-21
00:39:43 PDTWARNING: page 3535208 of relation 1663/16384/33190 did
not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [16-1] 2007-06-21
00:39:43 PDTWARNING: page 28917 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [17-1] 2007-06-21
00:39:43 PDTWARNING: page 3535207 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [18-1] 2007-06-21
00:39:43 PDTWARNING: page 28916 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [19-1] 2007-06-21
00:39:43 PDTWARNING: page 28911 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [20-1] 2007-06-21
00:39:43 PDTWARNING: page 26708 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [21-1] 2007-06-21
00:39:43 PDTWARNING: page 28914 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [22-1] 2007-06-21
00:39:43 PDTWARNING: page 28909 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [23-1] 2007-06-21
00:39:43 PDTWARNING: page 28908 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [24-1] 2007-06-21
00:39:43 PDTWARNING: page 28913 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [25-1] 2007-06-21
00:39:43 PDTWARNING: page 26712 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [26-1] 2007-06-21
00:39:43 PDTWARNING: page 28918 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [27-1] 2007-06-21
00:39:43 PDTWARNING: page 28912 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [28-1] 2007-06-21
00:39:43 PDTWARNING: page 3535209 of relation 1663/16384/33190 did
not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [29-1] 2007-06-21
00:39:43 PDTWARNING: page 28907 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [30-1] 2007-06-21
00:39:43 PDTWARNING: page 28906 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [31-1] 2007-06-21
00:39:43 PDTWARNING: page 26713 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [32-1] 2007-06-21
00:39:43 PDTWARNING: page 17306 of relation 1663/16384/76710 did not
exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [33-1] 2007-06-21
00:39:43 PDTWARNING: page 26706 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [34-1] 2007-06-21
00:39:43 PDTWARNING: page 800226 of relation 1663/16384/33204 did
not exist
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [35-1] 2007-06-21
00:39:43 PDTWARNING: page 28915 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [36-1] 2007-06-21
00:39:43 PDTWARNING: page 26710 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [37-1] 2007-06-21
00:39:43 PDTWARNING: page 28903 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [38-1] 2007-06-21
00:39:43 PDTWARNING: page 28902 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [39-1] 2007-06-21
00:39:43 PDTWARNING: page 28910 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:39:43 sfmedstorageha001 postgres[3506]: [40-1] 2007-06-21
00:39:43 PDTPANIC: WAL contains references to invalid pages
Jun 21 00:39:43 sfmedstorageha001 postgres[3503]: [1-1] 2007-06-21
00:39:43 PDTLOG: startup process (PID 3506) was terminated by signal 6
Jun 21 00:39:43 sfmedstorageha001 postgres[3503]: [2-1] 2007-06-21
00:39:43 PDTLOG: aborting startup due to startup process failure
Jun 21 00:39:43 sfmedstorageha001 postgres[3505]: [1-1] 2007-06-21
00:39:43 PDTLOG: logger shutting down
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-1] 2007-06-21
00:40:39 PDTLOG: database system was interrupted while in recovery
at 2007-06-21 00:36:40 PDT
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-2] 2007-06-21
00:40:39 PDTHINT: This probably means that some data is corrupted
and you will have to use the last backup for
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [1-3] recovery.
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [2-1] 2007-06-21
00:40:39 PDTLOG: checkpoint record is at 71/9881E928
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [3-1] 2007-06-21
00:40:39 PDTLOG: redo record is at 71/986BF148; undo record is at
0/0; shutdown FALSE
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [4-1] 2007-06-21
00:40:39 PDTLOG: next transaction ID: 0/2871389429; next OID: 83795
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [5-1] 2007-06-21
00:40:39 PDTLOG: next MultiXactId: 1; next MultiXactOffset: 0
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [6-1] 2007-06-21
00:40:39 PDTLOG: database system was not properly shut down;
automatic recovery in progress
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [7-1] 2007-06-21
00:40:39 PDTLOG: redo starts at 71/986BF148
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [8-1] 2007-06-21
00:40:39 PDTLOG: record with zero length at 71/998706A8
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [9-1] 2007-06-21
00:40:39 PDTLOG: redo done at 71/99870670
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [10-1] 2007-06-21
00:40:39 PDTWARNING: page 28905 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [11-1] 2007-06-21
00:40:39 PDTWARNING: page 13626 of relation 1663/16384/76716 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [12-1] 2007-06-21
00:40:39 PDTWARNING: page 28904 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [13-1] 2007-06-21
00:40:39 PDTWARNING: page 26711 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [14-1] 2007-06-21
00:40:39 PDTWARNING: page 28900 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [15-1] 2007-06-21
00:40:39 PDTWARNING: page 3535208 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [16-1] 2007-06-21
00:40:39 PDTWARNING: page 28917 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [17-1] 2007-06-21
00:40:39 PDTWARNING: page 3535207 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [18-1] 2007-06-21
00:40:39 PDTWARNING: page 28916 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [19-1] 2007-06-21
00:40:39 PDTWARNING: page 28911 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [20-1] 2007-06-21
00:40:39 PDTWARNING: page 26708 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [21-1] 2007-06-21
00:40:39 PDTWARNING: page 28914 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [22-1] 2007-06-21
00:40:39 PDTWARNING: page 28909 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [23-1] 2007-06-21
00:40:39 PDTWARNING: page 28908 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [24-1] 2007-06-21
00:40:39 PDTWARNING: page 28913 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [25-1] 2007-06-21
00:40:39 PDTWARNING: page 26712 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [26-1] 2007-06-21
00:40:39 PDTWARNING: page 28918 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [27-1] 2007-06-21
00:40:39 PDTWARNING: page 28912 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [28-1] 2007-06-21
00:40:39 PDTWARNING: page 3535209 of relation 1663/16384/33190 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [29-1] 2007-06-21
00:40:39 PDTWARNING: page 28907 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [30-1] 2007-06-21
00:40:39 PDTWARNING: page 28906 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [31-1] 2007-06-21
00:40:39 PDTWARNING: page 26713 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [32-1] 2007-06-21
00:40:39 PDTWARNING: page 17306 of relation 1663/16384/76710 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [33-1] 2007-06-21
00:40:39 PDTWARNING: page 26706 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [34-1] 2007-06-21
00:40:39 PDTWARNING: page 800226 of relation 1663/16384/33204 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [35-1] 2007-06-21
00:40:39 PDTWARNING: page 28915 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [36-1] 2007-06-21
00:40:39 PDTWARNING: page 26710 of relation 1663/16384/76719 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [37-1] 2007-06-21
00:40:39 PDTWARNING: page 28903 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [38-1] 2007-06-21
00:40:39 PDTWARNING: page 28902 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [39-1] 2007-06-21
00:40:39 PDTWARNING: page 28910 of relation 1663/16384/76718 was
uninitialized
Jun 21 00:40:39 sfmedstorageha001 postgres[3757]: [40-1] 2007-06-21
00:40:39 PDTPANIC: WAL contains references to invalid pages
Jun 21 00:40:39 sfmedstorageha001 postgres[3755]: [1-1] 2007-06-21
00:40:39 PDTLOG: startup process (PID 3757) was terminated by signal 6
Jun 21 00:40:39 sfmedstorageha001 postgres[3755]: [2-1] 2007-06-21
00:40:39 PDTLOG: aborting startup due to startup process failure
Jun 21 00:40:39 sfmedstorageha001 postgres[3756]: [1-1] 2007-06-21
00:40:39 PDTLOG: logger shutting down
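
A log like the one above is easier to read when the warnings are grouped by relation. The following is a minimal sketch, not an official tool: the function name is made up for illustration, and the regular expression assumes the two warning wordings that appear in this log ("was uninitialized" and "did not exist"). Each relation is written as tablespace OID/database OID/relfilenode.

```python
import re
from collections import Counter

# Matches the two invalid-page warnings seen in the recovery log.
PAGE_WARNING = re.compile(
    r"page (\d+) of relation (\d+/\d+/\d+) (was uninitialized|did not exist)"
)


def summarize_invalid_pages(log_text: str) -> Counter:
    """Count invalid-page warnings per relation in a recovery log."""
    # Collapse all whitespace first so that warnings wrapped across
    # several lines (as in the mail above) still match.
    flat = " ".join(log_text.split())
    return Counter(m.group(2) for m in PAGE_WARNING.finditer(flat))


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < recovery.log
    for relation, count in summarize_invalid_pages(sys.stdin.read()).most_common():
        print(relation, count)
```

The relfilenode can then be mapped back to a table or index through pg_class in the affected database to see which objects the replayed WAL was referring to.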


On Jun 22, 2007, at 12:30 AM, Toru SHIMOGAKI wrote:

>
> Steve Atkins wrote:
>
>>> - When we take a PITR base backup with hardware level snapshot
>>> operation
>>> (not filesystem level) which a lot of storage vender provide,
>>> the backup data
>>> can be corrupted as Dan said. During recovery we can't even
>>> read it,
>>> especially if meta-data was corrupted.

>> I can't see any explanation for how this could happen, other
>> than your hardware vendor is lying about snapshot ability.

>
> All of the hardware vendors I asked always said:
>
> "The hardware level snapshot has nothing to do with filesystem
> condition and of course with what data has been written from
> operating system chache to the hard disk platter. It just copies
> byte data on storage to the other volume.
>
> So, if any data is written during taking snapshot, we can't
> assurance data correctness *strictly* .
>
> In Oracle, no table data is written between BEGIN BACKUP and END
> BACKUP, and it is not a problem REDO is written..."
>
> I'd like to know the correct information if the explanation has any
> mistakes, or a good way to avoid the probrem.
>
> I think there are users who want to migrate Oracle to PostgreSQL
> but can't because of the problem as above.
>
>
> Best regards,
>
> --
> Toru SHIMOGAKI<shimogaki.toru@oss.ntt.co.jp>
> NTT Open Source Software Center
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster





  #9 (permalink)  
Old 04-19-2008, 11:05 AM
Toru SHIMOGAKI
 
Posts: n/a
Default Re: PITR Backups


Dan Gorman wrote:
> Here is an example. Most of the snap shots worked fine, but I did get
> this once:


Thank you for your example. I'd appreciate any responses on whether
we should tackle this problem for 8.4.

Regards,

--
Toru SHIMOGAKI<shimogaki.toru@oss.ntt.co.jp>
NTT Open Source Software Center



  #10 (permalink)  
Old 04-19-2008, 11:05 AM
Simon Riggs
 
Posts: n/a
Default Re: PITR Backups

On Fri, 2007-06-22 at 11:30 +0900, Toru SHIMOGAKI wrote:
> Tom Lane wrote:
> > Dan Gorman <dgorman@hi5.com> writes:
> >> All of our databases are on NetApp storage and I have been looking
> >> at SnapMirror (PITR RO copy ) and FlexClone (near instant RW volume
> >> replica) for backing up our databases. The problem is because there
> >> is no write-suspend or even a 'hot backup mode' for postgres it's
> >> very plausible that the database has data in RAM that hasn't been
> >> written and will corrupt the data.

>
> > Alternatively, you can use a PITR base backup as suggested here:
> > http://www.postgresql.org/docs/8.2/s...archiving.html

>
> I think Dan's problem is important if we use PostgreSQL to a large size database:
>
> - When we take a PITR base backup with hardware level snapshot operation
> (not filesystem level) which a lot of storage vender provide, the backup data
> can be corrupted as Dan said. During recovery we can't even read it,
> especially if meta-data was corrupted.
>
> - If we don't use hardware level snapshot operation, it takes long time to take
> a large backup data, and a lot of full-page-written WAL files are made.
>
> So, I think users need a new feature not to write out heap pages during taking a
> backup.


Your worries are unwarranted, IMHO. It appears Dan was taking a snapshot
without having read the procedure as clearly outlined in the manual.

pg_start_backup() flushes all currently dirty blocks to disk as part of
a checkpoint. If you snapshot after that point, then you will have all
the data blocks required from which to correctly roll forward. On its
own, the snapshot is an inconsistent backup and will give errors as Dan
shows. It is only when the snapshot is used as the base backup in a full
continuous recovery that the inconsistencies are removed and the
database is fully and correctly restored.
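
The ordering described above can be sketched roughly as follows. This is an illustration under assumptions, not the documented procedure itself: `run_sql` and `take_snapshot` are hypothetical callables standing in for a database connection and for whatever command triggers the storage snapshot. `pg_start_backup()` and `pg_stop_backup()` are the function names used in the 8.x releases discussed in this thread; PostgreSQL 15 and later rename them to `pg_backup_start()` and `pg_backup_stop()`. WAL archiving still has to be configured separately so that the matching WAL files are available at restore time.

```python
from typing import Callable


def snapshot_base_backup(
    run_sql: Callable[[str], None],
    take_snapshot: Callable[[], None],
    label: str = "snapshot_backup",
) -> None:
    """Bracket a storage snapshot with pg_start_backup() / pg_stop_backup()."""
    # Quote the label as an SQL string literal (double any single quotes).
    quoted = "'" + label.replace("'", "''") + "'"
    # Entering backup mode forces a checkpoint before the snapshot is taken.
    run_sql(f"SELECT pg_start_backup({quoted})")
    try:
        take_snapshot()
    finally:
        # Always leave backup mode, even if the snapshot step fails.
        run_sql("SELECT pg_stop_backup()")
```

The `try`/`finally` is a design choice for the sketch: if the snapshot command fails, the server should not be left in backup mode.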

pg_start_backup() is the direct analogue of Oracle's ALTER DATABASE
BEGIN BACKUP. Snapshots work with Oracle too, in much the same way.

After reviewing the manual, if you honestly think there is a problem,
please let me know and I'll work with you to investigate.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



