This is a discussion on multiple insertions through jdbc within the DB2 forums, part of the Database Server Software category; --> Hi everybody! I'm using DB2 PE v8.2.3 for linux. I've defined a database with the following schema: ANNOTATION(ID,AUTHOR,TEXT) ANNOTATION_BOOK(ANNOTATION_ID,OBJECT_ID) ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi everybody! I'm using DB2 PE v8.2.3 for linux. I've defined a database with the following schema: ANNOTATION(ID,AUTHOR,TEXT) ANNOTATION_BOOK(ANNOTATION_ID,OBJECT_ID) BOOK(ID,AUTHOR,TITLE). Between the book and annotation entities there is a many-to-many relation: one annotation can involve many books and a book can be annotated by many annotations. So the attributes ANNOTATION_ID and OBJECT_ID of the ANNOTATION_BOOK table are foreign keys referring to the ANNOTATION and BOOK tables ids, respectively. I'm using two sequences for the attributes ID of tables ANNOTATION and BOOK: annotation_id_seq and book_id_seq. I've to make batch insertions in this database that are the result of an algorithm. The algorithm output is a set made up of thousands of annotations involving a small number of books. These books might already been stored in the database and obviously I don't want to have multiple tuples that represent the same book, i.e., have the same values for the attributes AUTHOR and TITLE. The result of the algorithm has to be stored in the database through JDBC. I'm using the driver in the packages: db2jcc.jar and db2java.jar. My current solution is: 1. Query the table BOOK to determine the ids of the books that are already persistent. 2. Inserting the book annotations in sequence in a batch. More specifically: 2.1 Creating an instance of object java.sql.Statement, let's call it statement. 2.2 Adding an sql insert command for table ANNOTATION to the statement: insert into ANNOTATION(ID,AUTHOR,TEXT) values(next value for annotation_id_seq,...). 2.3 For each book involved in the book annotation: 2.3.1(a) If the book is not already persistent add two sql insert command for table BOOK and table ANNOTATION_BOOK to the statement: insert into BOOK(ID,AUTHOR,TITLE) values(next value for public.book_id_seq,...) insert into ANNOTATION_BOOK(ANNOTATION_ID,BOOK_ID) values(previous value for annotation_id_seq, previous value for book_id_seq); 2.3.1(b) If the book annotation involves a book that is already persistent add one sql insert command for table ANNOTATION_BOOK with the value that you retrieved at point 1 for the attribute BOOK_ID: insert into ANNOTATION_BOOK(ANNOTATION_ID,BOOK_ID) values(previous value for annotation_id_seq, <value_retrieved_at_point_1>); 2.4 Execute the batch: statement.executeBatch(); This solution is not very efficient. I would like to improve the insertion performance. Maybe after retrieving the ids of the persistent books (point 1), I could ask the dbms to increase the book and annotation sequence by how much I need and then insert the tuples one table at the time, maybe using a prepared statement instead of a statement. Do you have any suggestion? Thank you very much, Diego |
| |||
| The nature of this question and the use of PE indicates this could be a class problem. The key to improving performance here is to minimize the overhead. 1. Use parameter markers and multiple statements. 2. Figure out a way to make the database manager do some of the work for you. 3. Consider inserting multiple ANNOTATION_BOOK rows with each statement. You'll have problems with your 1. because you've made no allowance for the books you'll be adding to the table. You've also said nothing about how you will be determining the id for the persistant book associated with the current annotation. You've also said nothing about determining that an annotation you are processing already exists in the annotation table. (It's possible that all annotations are considered new and get a new row in their table.) Is a sequence appropriate? Would identity columns for BOOK and ANNOTATION be an alternative? Phil Sherman Diego wrote: > Hi everybody! > > I'm using DB2 PE v8.2.3 for linux. I've defined a database with the > following schema: > > ANNOTATION(ID,AUTHOR,TEXT) > ANNOTATION_BOOK(ANNOTATION_ID,OBJECT_ID) > BOOK(ID,AUTHOR,TITLE). > > Between the book and annotation entities there is a many-to-many > relation: one annotation can involve many books and a book can be > annotated by many annotations. So the attributes ANNOTATION_ID and > OBJECT_ID of the ANNOTATION_BOOK table are foreign keys referring to the > ANNOTATION and BOOK tables ids, respectively. I'm using two sequences > for the attributes ID of tables ANNOTATION and BOOK: annotation_id_seq > and book_id_seq. > > I've to make batch insertions in this database that are the result of an > algorithm. The algorithm output is a set made up of thousands of > annotations involving a small number of books. These books might > already been stored in the database and obviously I don't want to have > multiple tuples that represent the same book, i.e., have the same values > for the attributes AUTHOR and TITLE. > > The result of the algorithm has to be stored in the database through > JDBC. I'm using the driver in the packages: db2jcc.jar and db2java.jar. > > > My current solution is: > > 1. Query the table BOOK to determine the ids of the books that are > already persistent. > > 2. Inserting the book annotations in sequence in a batch. More > specifically: > 2.1 Creating an instance of object java.sql.Statement, let's call it > statement. > 2.2 Adding an sql insert command for table ANNOTATION to the > statement: > > insert into ANNOTATION(ID,AUTHOR,TEXT) > values(next value for annotation_id_seq,...). > > 2.3 For each book involved in the book annotation: > 2.3.1(a) If the book is not already persistent add two sql > insert command for table BOOK and table ANNOTATION_BOOK to the statement: > > insert into BOOK(ID,AUTHOR,TITLE) > values(next value for public.book_id_seq,...) > > insert into ANNOTATION_BOOK(ANNOTATION_ID,BOOK_ID) > values(previous value for annotation_id_seq, > previous value for book_id_seq); > > 2.3.1(b) If the book annotation involves a book that is already > persistent add one sql insert command for table ANNOTATION_BOOK with the > value that you retrieved at point 1 for the attribute BOOK_ID: > > insert into ANNOTATION_BOOK(ANNOTATION_ID,BOOK_ID) > values(previous value for annotation_id_seq, > <value_retrieved_at_point_1>); > 2.4 Execute the batch: > > statement.executeBatch(); > > > This solution is not very efficient. I would like to improve the > insertion performance. Maybe after retrieving the ids of the persistent > books (point 1), I could ask the dbms to increase the book and > annotation sequence by how much I need and then insert the tuples one > table at the time, maybe using a prepared statement instead of a statement. > > Do you have any suggestion? > > Thank you very much, > > Diego |
| ||||
| Hi Phil, I've made some modifications to this procedure that go in the direction you were suggesting in your points 1,2 and 3 and I had a good improvement. > You've also said nothing about how you will be determining the id for the persistant book associated with the current annotation. I assume that the pair of attributes (AUTHOR, TITLE) of the relation BOOK is unique (actually the db is more complex than that but this was a significant simplification). So I query BOOK with two values for this pair of attributes, e.g.: select * from BOOK " + where AUTHOR = ? and TITLE = ? > You've also said nothing about determining that an annotation you are processing already exists in the annotation table. (It's possible that all annotations are considered new and get a new row in their table.) Yes, that's exactly the case. All annotation are considered new and get a new row. > Is a sequence appropriate? Would identity columns for BOOK and ANNOTATION be an alternative? In the new procedure I cannot use identity columns. The new procedure works as follows: 1. Query the table BOOK to determine the ids of the books that are already persistent. To do that I execute a PreparedStatement for each book: String query = "select * " + "from public.BOOK " + "where AUTHOR = ? and TITLE = ?"; preparedStatement = connection.prepareStatement(query); for each book preparedStatement.setXXX(1,...); preparedStatement.setXXX(2,...); resultSet = preparedStatement.executeQuery(); As you suggested in point 1, I tried to execute a multiple statement (using addBatch) with a preparedStatement but it is not supported with a SELECT. 2. Request as many ids as needed for the annotations from the sequences that manages the ANNOTATION table id: preparedStatement = connection.prepareStatement("values nextval for public.annotation_id_seq"); for each annotation ResultSet resultSet = preparedStatement.getResultSet(); 3. Request as many ids as needed for the books. 4. Inserts tuple in ANNOTATION table. 5. Inserts tuple in BOOK_ANNOTATION table. 6. Insert tuples in BOOK table. Points 4,5 and 6 are implemented through a prepared statement batch (what you called in your point "multiple statement", if I got it right). As an example, consider point 4: String sql = "insert into public.annotation(id,author,text) " + "values(?,?,?);"; preparedStatement = connection.prepareStatement(sql); for each annotation preparedStatement.setXXX(1,...); preparedStatement.setXXX(2,...); preparedStatement.setXXX(3,...); preparedStatement.addBatch(); This is analogous to what I intended to do for point 1. But apparently multiple sql commands with prepared statements don't work with SELECTs. Thanks a lot! Diego |