This is a discussion on Efficient delete from table with compound keys? within the Ingres forums, part of the Database Server Software category; --> Using II 9.0.4 (hp2.us5/105) What is the most efficient way to do the following? TableA is btree with col1, ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Using II 9.0.4 (hp2.us5/105) What is the most efficient way to do the following? TableA is btree with col1, col2, col3 being the compound key and has about 12.5M rows of data. TableA is also compressed and stored across 3 locations if it makes a difference. delete from tableA where col1+col2+col3 in (select col1+col2+col3 from tableB) Thanks Dennis d underscore roesler at agilent dot com |
| |||
| At 11:12 AM -0400 7/25/07, Dennis Roesler wrote: >Using II 9.0.4 (hp2.us5/105) > >What is the most efficient way to do the following? TableA is btree >with col1, col2, col3 being the compound key and has about 12.5M rows of >data. TableA is also compressed and stored across 3 locations if it >makes a difference. > >delete from tableA where col1+col2+col3 in > (select col1+col2+col3 from tableB) > It really depends on how big tableB is. For small tableB (say, up to a few thousand rows), you'll want to use EXISTS so that Ingres can use a K-join: delete from tableA where exists (select * from tableB where tableB.col1 = tableA.col1 and tableB.col2 = tableA.col2 and tableB.col3 = tableB.col3); (at least, I assume that is what you really meant with your original query.) If tableB is into the hundreds-of-thousands of rows, you can do it the same way as long as you don't run into transaction log problems. You might have to break the delete up into a few chunks, perhaps using session temps to hold a few chunks of tableB keys. If tableB is a significant fraction of tableA, you might be better off to copy the surviving rows into a session temp, truncate tableA, stuff the survivors back in, and remodify to btree. I don't know where the breakeven point for this would kick in (and it is somewhat installation dependent), but I'm guessing that deleting somewhere around 1/3 to 1/2 of the table is the breakeven point; probably less if there are a bunch of secondary indexes on tableA. dgtt session.foo as select * from tableA where not exists (same subselect as above); modify tableA to truncated; insert into tableA select * from session.foo; modify tableA to btree .... commit; With the current Ingres, this will still log rows doing the insert. That's unavoidable if you are journaling all this; if you can get away with a journaling hole for tableA, you can avoid the tx log by dropping tableA and doing a create-table-as-select instead of a truncate/insert. (Or, you could copy the session temp into a file and do a copy into the truncated tableA! might be slower though.) Karl |
| |||
| At 11:12 AM -0400 7/25/07, Dennis Roesler wrote: >Using II 9.0.4 (hp2.us5/105) > >What is the most efficient way to do the following? TableA is btree >with col1, col2, col3 being the compound key and has about 12.5M rows of >data. TableA is also compressed and stored across 3 locations if it >makes a difference. > >delete from tableA where col1+col2+col3 in > (select col1+col2+col3 from tableB) > It really depends on how big tableB is. For small tableB (say, up to a few thousand rows), you'll want to use EXISTS so that Ingres can use a K-join: delete from tableA where exists (select * from tableB where tableB.col1 = tableA.col1 and tableB.col2 = tableA.col2 and tableB.col3 = tableB.col3); (at least, I assume that is what you really meant with your original query.) If tableB is into the hundreds-of-thousands of rows, you can do it the same way as long as you don't run into transaction log problems. You might have to break the delete up into a few chunks, perhaps using session temps to hold a few chunks of tableB keys. If tableB is a significant fraction of tableA, you might be better off to copy the surviving rows into a session temp, truncate tableA, stuff the survivors back in, and remodify to btree. I don't know where the breakeven point for this would kick in (and it is somewhat installation dependent), but I'm guessing that deleting somewhere around 1/3 to 1/2 of the table is the breakeven point; probably less if there are a bunch of secondary indexes on tableA. dgtt session.foo as select * from tableA where not exists (same subselect as above); modify tableA to truncated; insert into tableA select * from session.foo; modify tableA to btree .... commit; With the current Ingres, this will still log rows doing the insert. That's unavoidable if you are journaling all this; if you can get away with a journaling hole for tableA, you can avoid the tx log by dropping tableA and doing a create-table-as-select instead of a truncate/insert. (Or, you could copy the session temp into a file and do a copy into the truncated tableA! might be slower though.) Karl |
| ||||
| Karl & Betty Schendel wrote: > At 11:12 AM -0400 7/25/07, Dennis Roesler wrote: >> Using II 9.0.4 (hp2.us5/105) >> >> What is the most efficient way to do the following? TableA is btree >> with col1, col2, col3 being the compound key and has about 12.5M rows of >> data. TableA is also compressed and stored across 3 locations if it >> makes a difference. >> >> delete from tableA where col1+col2+col3 in >> (select col1+col2+col3 from tableB) >> > > It really depends on how big tableB is. > > For small tableB (say, up to a few thousand rows), you'll > want to use EXISTS so that Ingres can use a K-join: tableB could vary from a few hundred to a ~200k-300k rows. > > delete from tableA where exists > (select * from tableB where tableB.col1 = tableA.col1 > and tableB.col2 = tableA.col2 and tableB.col3 = tableB.col3); > > (at least, I assume that is what you really meant > with your original query.) yes > > If tableB is into the hundreds-of-thousands of rows, > you can do it the same way as long as you don't run into > transaction log problems. You might have to break > the delete up into a few chunks, perhaps using session > temps to hold a few chunks of tableB keys. I think I will need to do this, because I have bumped into log full problems. > > If tableB is a significant fraction of tableA, you > might be better off to copy the surviving rows into > a session temp, truncate tableA, stuff the survivors > back in, and remodify to btree. I don't know where > the breakeven point for this would kick in (and it > is somewhat installation dependent), but I'm guessing > that deleting somewhere around 1/3 to 1/2 of the table > is the breakeven point; probably less if there are > a bunch of secondary indexes on tableA. > > dgtt session.foo as select * from tableA > where not exists (same subselect as above); > modify tableA to truncated; > insert into tableA select * from session.foo; > modify tableA to btree .... > commit; > > With the current Ingres, this will still log rows doing > the insert. That's unavoidable if you are journaling all > this; if you can get away with a journaling hole for > tableA, you can avoid the tx log by dropping tableA and > doing a create-table-as-select instead of a truncate/insert. > (Or, you could copy the session temp into a file > and do a copy into the truncated tableA! might be > slower though.) Thanks Karl. Gives me some things to think about. Dennis |