gnuoytr@rcn.com (robert) wrote in message news:<da3c2186.0410021233.6c4c9c23@posting.google. com>...
> consider the following hypothetical (it isn't but....)
>
> - start with a COBOL/VSAM codebase that's REALLY old
> - end with both a COBOL/VSAM and a COBOL/DB2 system which differs in only
> where the data lives. code, including copybooks, remains the same.
>
> mostly the system does periodic (daily/weekly/<etc>) batch runs.
>
> the VSAM order file has 99 line items. this gets converted to an
> Order table with 99 line_items.
>
> beyond the need to fit the copybook definition, it is asserted that a
> two table (Order and Order_line) implementation will be too slow.
>
> this rumination was motivated by reading an article linked to from an
> earlier thread, which discussed join implementation on 390. i suspect what
> it had to say applies generally. what caught me was, as i understood it,
> that nested table reads is most often used. if this is true, and it seems
> that hash joins are only more efficient on equality constraints, then is
> there a known analysis which at least mitigates the reading?
>
> what (hypothetically) we tend to do is put each table in a tablespace.
> for the Order/Order_line implementation, it seems logical to put them into
> one tablespace, cluster Order on Order_num and Order_line on Order_num,
> line_num; and buffer the hell out of it. similarly for indexes.
>
> does this sound remotely on the right track?
Assuming that you never have an SQL statement result in a tablespace
scan, it might be OK to put the Order/Order_line in one simple
tablespace. But if a tablespace scan does occur, DB2 will
(unnecessarily) scan both tables, when it could have just scanned the
table needed if they were in separate tablespaces.
If you use a segmented tablespace, this will not help you since the
data for two tables in a single segmented tablespace will not be on
the same page (the lines items and the associated order), they will
reside in different segments.
I think you are over designing just a bit. By using DB2 buffer pools
effectively (this is the biggest difference between how DB2 and plain
VSAM works), I don't think you need to put the two tables in the same
tablespace.
Have one bufferpool for the catalog, indexes, and small tables that
frequently accessed. Second bufferpool for medium and large tables.
Third bufferpool for large decision support tables (if you have any in
your application).
The speed of a join has nothing to do with whether the tables are
using the same tablespace. The speed is related to whether the
required data page is already in the bufferpool, or if it needs to be
fetched from disk.