Unix Technical Forum

Best practice to load a huge table from ORACLE to PG

This is a discussion on Best practice to load a huge table from ORACLE to PG within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi All, I´d like to know what´s the best practice to LOAD a 70 milion rows, 101 columns table ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-29-2008, 09:32 PM
Adonias Malosso
 
Posts: n/a
Default Best practice to load a huge table from ORACLE to PG

Hi All,

I´d like to know what´s the best practice to LOAD a 70 milion rows, 101
columns table
from ORACLE to PGSQL.

The current approach is to dump the data in CSV and than COPY it to
Postgresql.

Anyone has a better idea.


Regards
Adonias Malosso

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-29-2008, 09:32 PM
Joshua D. Drake
 
Posts: n/a
Default Re: Best practice to load a huge table from ORACLE to PG

Adonias Malosso wrote:
> Hi All,
>
> I´d like to know what´s the best practice to LOAD a 70 milion rows, 101
> columns table
> from ORACLE to PGSQL.
>
> The current approach is to dump the data in CSV and than COPY it to
> Postgresql.
>
> Anyone has a better idea.


Write a java trigger in Oracle that notes when a row has been
added/delete/updated and does the exact same thing in postgresql.

Joshua D. Drake


>
>
> Regards
> Adonias Malosso



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-29-2008, 09:32 PM
Luke Lonergan
 
Posts: n/a
Default Re: Best practice to load a huge table from ORACLE to PG

Yep * just do something like this within sqlplus (from
http://www.dbforums.com/showthread.php?t=350614):

set termout off
set hea off
set pagesize 0

spool c:\whatever.csv

select a.a||','||a.b||','||a.c
from a
where a.a="whatever";

spool off

COPY is the fastest approach to get it into PG.

- Luke

On 4/26/08 6:25 AM, "Adonias Malosso" <malosso@gmail.com> wrote:

> Hi All,
>
> I´d like to know what´s the best practice to LOAD a 70 milion rows, 101
> columns table
> from ORACLE to PGSQL.
>
> The current approach is to dump the data in CSV and than COPY it to
> Postgresql.
>
> Anyone has a better idea.
>
>
> Regards
> Adonias Malosso
>



Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-29-2008, 09:32 PM
Jonah H. Harris
 
Posts: n/a
Default Re: Best practice to load a huge table from ORACLE to PG

On Sat, Apr 26, 2008 at 9:25 AM, Adonias Malosso <malosso@gmail.com> wrote:
> I´d like to know what´s the best practice to LOAD a 70 milion rows, 101
> columns table
> from ORACLE to PGSQL.


The fastest and easiest method would be to dump the data from Oracle
into CSV/delimited format using something like ociuldr
(http://www.anysql.net/en/ociuldr.html) and load it back into PG using
pg_bulkload (which is a helluva lot faster than COPY). Of course, you
could try other things as well... such as setting up generic
connectivity to PG and inserting the data to a PG table over the
database link.

Similarly, while I hate to see shameless self-plugs in the community,
the *fastest* method you could use is dblink_ora_copy, contained in
EnterpriseDB's PG+ Advanced Server; it uses an optimized OCI
connection to COPY the data directly from Oracle into Postgres, which
also saves you the intermediate step of dumping the data.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-29-2008, 09:32 PM
Greg Smith
 
Posts: n/a
Default Re: Best practice to load a huge table from ORACLE to PG

On Sat, 26 Apr 2008, Adonias Malosso wrote:

> The current approach is to dump the data in CSV and than COPY it to
> Postgresql.


You would have to comment on what you don't like about what you're doing
now, what parts need to be improved for your priorities, to get a properly
targeted answer here.

> I´d like to know what´s the best practice to LOAD a 70 milion rows, 101
> columns table from ORACLE to PGSQL.


There is no one best practice. There's a wide variety of techniques on
both the Oracle and PostgreSQL side in this area that might be used
depending on what trade-offs are important to you.

For example, if the goal was to accelerate a dump of a single table to run
as fast as possible because you need , you'd want to look into techniques
that dumped that table with multiple sessions going at once, each handling
a section of that table. Typically you'd use one session per CPU on the
server, and you'd use something with a much more direct path into the data
than SQL*PLUS. Then on the PostgreSQL side, you could run multiple COPY
sessions importing at once to read this data all back in, because COPY
will bottleneck at the CPU level before the disks will if you've got
reasonable storage hardware.

There's a list of utilities in this are at
http://www.orafaq.com/wiki/SQL*Loade...a_flat_file.3F
you might look for inspiration in that area, I know the WisdomForce
FastReader handles simultaneous multi-section dumps via a very direct path
to the data.

....but that's just one example based on one set of priorities, and it will
be expensive in terms of dollars and complexity.

As another example of something that changes things considerably, if
there's any data with errors that will cause COPY to abort you might
consider a different approach on the PG side.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-29-2008, 09:32 PM
Dimitri Fontaine
 
Posts: n/a
Default Re: Best practice to load a huge table from ORACLE to PG

Hi,

Le dimanche 27 avril 2008, Greg Smith a écrit*:
> than SQL*PLUS. Then on the PostgreSQL side, you could run multiple COPY
> sessions importing at once to read this data all back in, because COPY
> will bottleneck at the CPU level before the disks will if you've got
> reasonable storage hardware.


Latest pgloader version has been made to handle this exact case, so if you
want to take this route, please consider pgloader 2.3.0:
http://pgloader.projects.postgresql....rallel_loading
http://pgfoundry.org/projects/pgloader/

Another good reason to consider using pgloader is when the datafile contains
erroneous input lines and you don't want the COPY transaction to abort. Those
error lines will get rejected out by pgloader while the correct ones will get
COPYied in.

Regards,
--
dim

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQBIFYGUlBXRlnbh1bkRAlcuAJ9X8NCdCyz61H8LwRR17L/m3vBAmACgrtg9
oxPM0SLlDbJC/cld9F4x7l8=
=qIhZ
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-29-2008, 09:32 PM
Tino Wildenhain
 
Posts: n/a
Default Re: Best practice to load a huge table from ORACLE to PG

Adonias Malosso wrote:
> Hi All,
>
> I´d like to know what´s the best practice to LOAD a 70 milion rows, 101
> columns table
> from ORACLE to PGSQL.
>
> The current approach is to dump the data in CSV and than COPY it to
> Postgresql.
>

Uhm. 101 columns you say? Sounds interesting. There are dataloaders
like: http://pgfoundry.org/projects/pgloader/ which could speed
up loading the data over just copy csv. I wonder how much normalizing
could help.

Tino

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-29-2008, 09:32 PM
Adonias Malosso
 
Posts: n/a
Default Re: Best practice to load a huge table from ORACLE to PG

Jonah,

Thank you for the answer. Good to know about this enterprise DB feature.

I´ll follow using pgloader.

Regards.

Adonias Malosso

On Sat, Apr 26, 2008 at 10:14 PM, Jonah H. Harris <jonah.harris@gmail.com>
wrote:

> On Sat, Apr 26, 2008 at 9:25 AM, Adonias Malosso <malosso@gmail.com>
> wrote:
> > I´d like to know what´s the best practice to LOAD a 70 milion rows,101
> > columns table
> > from ORACLE to PGSQL.

>
> The fastest and easiest method would be to dump the data from Oracle
> into CSV/delimited format using something like ociuldr
> (http://www.anysql.net/en/ociuldr.html) and load it back into PG using
> pg_bulkload (which is a helluva lot faster than COPY). Of course, you
> could try other things as well... such as setting up generic
> connectivity to PG and inserting the data to a PG table over the
> database link.
>
> Similarly, while I hate to see shameless self-plugs in the community,
> the *fastest* method you could use is dblink_ora_copy, contained in
> EnterpriseDB's PG+ Advanced Server; it uses an optimized OCI
> connection to COPY the data directly from Oracle into Postgres, which
> also saves you the intermediate step of dumping the data.
>
> --
> Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
> EnterpriseDB Corporation | fax: 732.331.1301
> 499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
> Edison, NJ 08837 | http://www.enterprisedb.com/
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-29-2008, 09:32 PM
Jonah H. Harris
 
Posts: n/a
Default Re: Best practice to load a huge table from ORACLE to PG

On Mon, Apr 28, 2008 at 5:37 PM, Adonias Malosso <malosso@gmail.com> wrote:
> Thank you for the answer. Good to know about this enterprise DB feature.


No problem.

> I´ll follow using pgloader.


That's fine. Though, I'd really suggest pg_bulkload, it's quite a bit faster.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-29-2008, 09:32 PM
Dorren
 
Posts: n/a
Default Re: Best practice to load a huge table from ORACLE to PG

Someone mentioned FastReader. Here the pdf "Best Practices for high-
speed data transfer from Oracle to Netezza using FastReader (relevant
for PostgreSQL as well)": http://www.wisdomforce.com/dweb/reso...FastReader.pdf
There are few limitations in PG loader that you would need to overcome


On Apr 26, 6:25 am, malo...@gmail.com ("Adonias Malosso") wrote:
> Hi All,
>
> I´d like to know what´s the best practice to LOAD a 70 milion rows, 101
> columns table
> from ORACLE to PGSQL.
>
> The current approach is to dump the data in CSV and than COPY it to
> Postgresql.
>
> Anyone has a better idea.
>
> Regards
> Adonias Malosso


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:09 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com