Unix Technical Forum

multiple insertions through jdbc

This is a discussion on multiple insertions through jdbc within the DB2 forums, part of the Database Server Software category; --> Hi everybody! I'm using DB2 PE v8.2.3 for linux. I've defined a database with the following schema: ANNOTATION(ID,AUTHOR,TEXT) ANNOTATION_BOOK(ANNOTATION_ID,OBJECT_ID) ...


Go Back   Unix Technical Forum > Database Server Software > DB2

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-27-2008, 07:05 AM
Diego
 
Posts: n/a
Default multiple insertions through jdbc

Hi everybody!

I'm using DB2 PE v8.2.3 for linux. I've defined a database with the
following schema:

ANNOTATION(ID,AUTHOR,TEXT)
ANNOTATION_BOOK(ANNOTATION_ID,OBJECT_ID)
BOOK(ID,AUTHOR,TITLE).

Between the book and annotation entities there is a many-to-many
relation: one annotation can involve many books and a book can be
annotated by many annotations. So the attributes ANNOTATION_ID and
OBJECT_ID of the ANNOTATION_BOOK table are foreign keys referring to the
ANNOTATION and BOOK tables ids, respectively. I'm using two sequences
for the attributes ID of tables ANNOTATION and BOOK: annotation_id_seq
and book_id_seq.

I've to make batch insertions in this database that are the result of an
algorithm. The algorithm output is a set made up of thousands of
annotations involving a small number of books. These books might
already been stored in the database and obviously I don't want to have
multiple tuples that represent the same book, i.e., have the same values
for the attributes AUTHOR and TITLE.

The result of the algorithm has to be stored in the database through
JDBC. I'm using the driver in the packages: db2jcc.jar and db2java.jar.


My current solution is:

1. Query the table BOOK to determine the ids of the books that are
already persistent.

2. Inserting the book annotations in sequence in a batch. More
specifically:
2.1 Creating an instance of object java.sql.Statement, let's call it
statement.
2.2 Adding an sql insert command for table ANNOTATION to the statement:

insert into ANNOTATION(ID,AUTHOR,TEXT)
values(next value for annotation_id_seq,...).

2.3 For each book involved in the book annotation:
2.3.1(a) If the book is not already persistent add two sql insert
command for table BOOK and table ANNOTATION_BOOK to the statement:

insert into BOOK(ID,AUTHOR,TITLE)
values(next value for public.book_id_seq,...)

insert into ANNOTATION_BOOK(ANNOTATION_ID,BOOK_ID)
values(previous value for annotation_id_seq,
previous value for book_id_seq);

2.3.1(b) If the book annotation involves a book that is already
persistent add one sql insert command for table ANNOTATION_BOOK with the
value that you retrieved at point 1 for the attribute BOOK_ID:

insert into ANNOTATION_BOOK(ANNOTATION_ID,BOOK_ID)
values(previous value for annotation_id_seq,
<value_retrieved_at_point_1>);
2.4 Execute the batch:

statement.executeBatch();


This solution is not very efficient. I would like to improve the
insertion performance. Maybe after retrieving the ids of the persistent
books (point 1), I could ask the dbms to increase the book and
annotation sequence by how much I need and then insert the tuples one
table at the time, maybe using a prepared statement instead of a statement.

Do you have any suggestion?

Thank you very much,

Diego
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-27-2008, 07:07 AM
Phil Sherman
 
Posts: n/a
Default Re: multiple insertions through jdbc

The nature of this question and the use of PE indicates this could be a
class problem.

The key to improving performance here is to minimize the overhead.
1. Use parameter markers and multiple statements.
2. Figure out a way to make the database manager do some of the work for
you.
3. Consider inserting multiple ANNOTATION_BOOK rows with each statement.

You'll have problems with your 1. because you've made no allowance for
the books you'll be adding to the table. You've also said nothing about
how you will be determining the id for the persistant book associated
with the current annotation. You've also said nothing about determining
that an annotation you are processing already exists in the annotation
table. (It's possible that all annotations are considered new and get a
new row in their table.)

Is a sequence appropriate? Would identity columns for BOOK and
ANNOTATION be an alternative?


Phil Sherman



Diego wrote:
> Hi everybody!
>
> I'm using DB2 PE v8.2.3 for linux. I've defined a database with the
> following schema:
>
> ANNOTATION(ID,AUTHOR,TEXT)
> ANNOTATION_BOOK(ANNOTATION_ID,OBJECT_ID)
> BOOK(ID,AUTHOR,TITLE).
>
> Between the book and annotation entities there is a many-to-many
> relation: one annotation can involve many books and a book can be
> annotated by many annotations. So the attributes ANNOTATION_ID and
> OBJECT_ID of the ANNOTATION_BOOK table are foreign keys referring to the
> ANNOTATION and BOOK tables ids, respectively. I'm using two sequences
> for the attributes ID of tables ANNOTATION and BOOK: annotation_id_seq
> and book_id_seq.
>
> I've to make batch insertions in this database that are the result of an
> algorithm. The algorithm output is a set made up of thousands of
> annotations involving a small number of books. These books might
> already been stored in the database and obviously I don't want to have
> multiple tuples that represent the same book, i.e., have the same values
> for the attributes AUTHOR and TITLE.
>
> The result of the algorithm has to be stored in the database through
> JDBC. I'm using the driver in the packages: db2jcc.jar and db2java.jar.
>
>
> My current solution is:
>
> 1. Query the table BOOK to determine the ids of the books that are
> already persistent.
>
> 2. Inserting the book annotations in sequence in a batch. More
> specifically:
> 2.1 Creating an instance of object java.sql.Statement, let's call it
> statement.
> 2.2 Adding an sql insert command for table ANNOTATION to the
> statement:
>
> insert into ANNOTATION(ID,AUTHOR,TEXT)
> values(next value for annotation_id_seq,...).
>
> 2.3 For each book involved in the book annotation:
> 2.3.1(a) If the book is not already persistent add two sql
> insert command for table BOOK and table ANNOTATION_BOOK to the statement:
>
> insert into BOOK(ID,AUTHOR,TITLE)
> values(next value for public.book_id_seq,...)
>
> insert into ANNOTATION_BOOK(ANNOTATION_ID,BOOK_ID)
> values(previous value for annotation_id_seq,
> previous value for book_id_seq);
>
> 2.3.1(b) If the book annotation involves a book that is already
> persistent add one sql insert command for table ANNOTATION_BOOK with the
> value that you retrieved at point 1 for the attribute BOOK_ID:
>
> insert into ANNOTATION_BOOK(ANNOTATION_ID,BOOK_ID)
> values(previous value for annotation_id_seq,
> <value_retrieved_at_point_1>);
> 2.4 Execute the batch:
>
> statement.executeBatch();
>
>
> This solution is not very efficient. I would like to improve the
> insertion performance. Maybe after retrieving the ids of the persistent
> books (point 1), I could ask the dbms to increase the book and
> annotation sequence by how much I need and then insert the tuples one
> table at the time, maybe using a prepared statement instead of a statement.
>
> Do you have any suggestion?
>
> Thank you very much,
>
> Diego

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-27-2008, 07:08 AM
Diego
 
Posts: n/a
Default Re: multiple insertions through jdbc

Hi Phil,

I've made some modifications to this procedure that go in the direction
you were suggesting in your points 1,2 and 3 and I had a good improvement.

> You've also said nothing about how you will be determining the id for the persistant book associated with the current annotation.


I assume that the pair of attributes (AUTHOR, TITLE) of the relation
BOOK is unique (actually the db is more complex than that but this was a
significant simplification). So I query BOOK with two values for this
pair of attributes, e.g.:

select *
from BOOK " +
where AUTHOR = ? and TITLE = ?

> You've also said nothing about determining that an annotation you are processing already exists in the annotation table. (It's possible that all annotations are considered new and get a new row in their table.)


Yes, that's exactly the case. All annotation are considered new and get
a new row.

> Is a sequence appropriate? Would identity columns for BOOK and ANNOTATION be an alternative?


In the new procedure I cannot use identity columns.


The new procedure works as follows:

1. Query the table BOOK to determine the ids of the books that are
already persistent. To do that I execute a PreparedStatement for each
book:

String query = "select * " +
"from public.BOOK " +
"where AUTHOR = ? and TITLE = ?";
preparedStatement = connection.prepareStatement(query);
for each book
preparedStatement.setXXX(1,...);
preparedStatement.setXXX(2,...);
resultSet = preparedStatement.executeQuery();

As you suggested in point 1, I tried to execute a multiple statement
(using addBatch) with a preparedStatement but it is not supported with a
SELECT.

2. Request as many ids as needed for the annotations from the sequences
that manages the ANNOTATION table id:

preparedStatement = connection.prepareStatement("values nextval for
public.annotation_id_seq");
for each annotation
ResultSet resultSet = preparedStatement.getResultSet();

3. Request as many ids as needed for the books.

4. Inserts tuple in ANNOTATION table.
5. Inserts tuple in BOOK_ANNOTATION table.
6. Insert tuples in BOOK table.

Points 4,5 and 6 are implemented through a prepared statement batch
(what you called in your point "multiple statement", if I got it right).
As an example, consider point 4:

String sql = "insert into public.annotation(id,author,text) " +
"values(?,?,?);";
preparedStatement = connection.prepareStatement(sql);
for each annotation
preparedStatement.setXXX(1,...);
preparedStatement.setXXX(2,...);
preparedStatement.setXXX(3,...);
preparedStatement.addBatch();

This is analogous to what I intended to do for point 1. But apparently
multiple sql commands with prepared statements don't work with SELECTs.

Thanks a lot!

Diego
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 01:35 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com