This is a discussion on Adding WHERE clause to pg_dump within the pgsql Hackers forums, part of the PostgreSQL category; --> On Fri, 2008-07-25 at 14:11 -0700, daveg wrote: > On Fri, Jul 25, 2008 at 08:26:35PM +0100, Simon Riggs ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| On Fri, 2008-07-25 at 14:11 -0700, daveg wrote: > On Fri, Jul 25, 2008 at 08:26:35PM +0100, Simon Riggs wrote: > - This can be done with a script. > > Not really. The script would pretty much have to contain most of > pg_dump. That's more than a script. > Yes really. schema pull. In a situation like this you would pull a pg_dump -s then only restore data that you want based on a single transaction snapshot of the objects you are going to query. > - users could make partial dumps and be confused and lose data. > > Yes, but they can already do that with -n, -t, and the new pre-data > and post-data switches. This is one more case where the default is > a full dump but you one can specificly request less. No they actually can't. You are guaranteed that regardless of a -n or -t flag that the data you receive is consistent. You can't guarantee that with -w because you could pull different data based on an arbitrary conditional that can not apply to all objects. Joshua D. Drake -- The PostgreSQL Company since 1997: http://www.commandprompt.com/ PostgreSQL Community Conference: http://www.postgresqlconference.org/ United States PostgreSQL Association: http://www.postgresql.us/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Fri, 2008-07-25 at 14:11 -0700, daveg wrote: > On Fri, Jul 25, 2008 at 08:26:35PM +0100, Simon Riggs wrote: > > > > On Fri, 2008-07-25 at 11:46 -0700, Joshua D. Drake wrote: > > > On Fri, 2008-07-25 at 19:33 +0100, Simon Riggs wrote: > > > > On Fri, 2008-07-25 at 13:31 -0400, Tom Lane wrote: > > > > > Simon Riggs <simon@2ndquadrant.com> writes: > > > > > > Attached patch implements WHERE clauses for pg_dump. > > > > > > > > > > I still have serious reservations about adding such an ugly, > > > > > non-orthogonal wart to pg_dump. Why is it not appropriate to just > > > > > do a COPY (SELECT ...) TO STDOUT when you need this? > > > > > > > > So you can dump a coherent sample database in one command, not 207. > > > > > > > > Every user of PostgreSQL wants a dev/test database. If the database is > > > > large it isn't practical to take a complete copy. Nor is it practical to > > > > hand-write a data sampling extraction program and if you do, its usually > > > > imperfect in many ways. > > > > > > > > Adding this feature gives a very fast capability to create sample > > > > databases, or incremental backups for many cases. > > > > > > Not sure I buy this argument. I am all for usability and I would be the > > > first to shout about the general ridiculousness of pg_dump/all/restore > > > but in this case I think Tom is right. This feature could easily be done > > > in a script without harassing pg_dump. > > > > You can do it, yes. But it takes a lot longer. If the time to implement > > was similar, then I would immediately agree "feature available already". > > > > pg_dump is not "harassed" by this. What is lost by adding this feature? > > This was discussed at the beginning of June on patches, Dave Durham submitted > a patch to add where clauses via a -w option and then in response to feedback > to add it to each each table of -t. See discussion here: > > http://archives.postgresql.org/pgsql...6/msg00001.php > > and final patch here: > > http://archives.postgresql.org/pgsql...6/msg00026.php. > > We now have two patches on this topic from different submitters with > different use cases supplied as justification. Well, that is truly bizarre. I had no idea about the existence of the other patch. I guess I must have been busy that week. This was designed a while back in conjunction with other related thoughts. I still want an easy way to create a data sample for creating dev databases from large production systems. I defer and apologise to the previous submitter, since he got there first, and apologise again for the noise. (Cheeky code review: Davy's patch fails if used with -o option, plus I think it outputs the wrong text into the dump file, AFAICS). -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Fri, Jul 25, 2008 at 11:17:20PM +0100, Gregory Stark wrote: > "Joshua D. Drake" <jd@commandprompt.com> writes: > > > How do we deal with this? > > > > pg_dump -w "last_update_timestamp < ..." -t 'table*' > > > > What I see is a recipe for inconsistent, un-restorable backups without a > > user realizing what they have done. The only way to deal with the above > > is: > > > > 1. Wildcards aren't allowed if you have -w > > 2. You dump everything, if the WHERE clause isn't relevant you just dump > > the whole table > > There's always > > 3. Apply the WHERE clause to all tables and if there's a table missing > columns referenced in the where clause then fail with the appropriate > error. > > Which seems like the right option to me. The tricky bit would be how to deal > with cases where you want a different where clause for different tables. But > even if it doesn't handle all cases that doesn't mean a partial solution is > unreasonable. Actually, Davy's patch does deal with the case "where you want a different where clause for different tables". -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Fri, 2008-07-25 at 12:38 -0700, Joshua D. Drake wrote: > Gained. Code complexity. Hardly, patch is very small. I would recognise that as a factor otherwise. > What I see is a recipe for inconsistent, un-restorable backups without a > user realizing what they have done. I agree on the backup side, but then who would extract just a portion of their data for backup? It would be no backup at all. If you did use this as part of an incremental backup scheme, then they would have to test it (just like any backup method). Incremental backups rarely have self-consistency except as part of a greater whole. As a dev tool it makes sense. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Simon Riggs wrote: > On Fri, 2008-07-25 at 12:38 -0700, Joshua D. Drake wrote: > > >> Gained. Code complexity. >> > > Hardly, patch is very small. I would recognise that as a factor > otherwise. > > >> What I see is a recipe for inconsistent, un-restorable backups without a >> user realizing what they have done. >> > > I agree on the backup side, but then who would extract just a portion of > their data for backup? It would be no backup at all. > > If you did use this as part of an incremental backup scheme, then they > would have to test it (just like any backup method). Incremental backups > rarely have self-consistency except as part of a greater whole. > > As a dev tool it makes sense. > > I think we have yet another case for moving the core bits of pg_dump into a library that can then be used by lots of clients. Until we do that we're going to get continual pressure to add extra cases to pg_dump unrelated to its principal functionality. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Sat, 2008-07-26 at 07:47 -0400, Andrew Dunstan wrote: > > Simon Riggs wrote: > > As a dev tool it makes sense. > > > I think we have yet another case for moving the core bits of pg_dump > into a library that can then be used by lots of clients. Until we do > that we're going to get continual pressure to add extra cases to pg_dump > unrelated to its principal functionality. That's a good idea and I support that. I'm slightly suprised at the "principal functionality" bit. In a world where PITR exists the role and importance of pg_dump has waned considerably. What *is* its principal function? Does it have just one? One man's dev system is another man's data warehouse, or another man's backup. The meaning of a dump is defined by the user making the data dump, not the tool used. Is this one option sufficient to make us invent pg_make_dev_database? (With all pg_dump options, plus -w). If that's what we need, fine by me. I'm always interested in the capability not the structure/naming. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Simon Riggs wrote: > In a world > where PITR exists the role and importance of pg_dump has waned > considerably. What *is* its principal function? Does it have just one? > > > I think that's probably a rather narrow perspective. PITR doesn't work across versions or architectures or OSes. And if you're using it for failover, then using it for standalone backups as well means you will need a custom archive_command which can be a bit tricky to get right. And a custom dump is almost always far smaller than a PITR dump, even when it's compressed. I suspect that the vast majority of our users are still using pg_dump to make normal backups, and that it works quite happily for them. It's really only when databases get pretty large that this becomes unmanageable. I think using pg_dump for backups and PITR for failover is a good combination for a great many users. So, IMNSHO, making a full database backup is still pg_dump's principal function. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Sat, 2008-07-26 at 09:08 -0400, Andrew Dunstan wrote: > So, IMNSHO, making a full database backup is still pg_dump's principal > function. Making copies for development databases is also a common use case, and if not more common than backups, at least not far behind. This was my stated use case. >From my perspective, this should be fairly simple * do we agree the use case is a problem we care about? * do we agree the proposal would help that use case? * whats the best way to package that feature? If we wish to protect pg_dump's role, then lets have another utility or some packaging that can be used for its other hidden roles. That sounds like we might all agree on that. pg_dev_dump? How should it look? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| ||||
| Simon Riggs wrote: > If we wish to protect pg_dump's role, then lets have another utility or > some packaging that can be used for its other hidden roles. That sounds > like we might all agree on that. pg_dev_dump? How should it look? > > Actually, if we libraryise pg_dump and add some corresponding \ commands to psql, then this would possibly be unnecessary . cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |