This is a discussion on MDC Question about Disk Space within the DB2 forums, part of the Database Server Software category; --> Hello, We are trying to use MDC tables in our DataWareHouse, but the tests shows that MDC use a ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hello, We are trying to use MDC tables in our DataWareHouse, but the tests shows that MDC use a lot of space in a relation 1/10 or 1/100. (1 byte of source data use 10 or 100 bytes in the tablespace not including the space used on indexes). We did this: CREATE TABLE SALES( TIME INTEGER NOT NULL, STORE INTEGER NOT NULL, PRODUCT INTEGER NOT NULL, MEASURE1 BIGINT, MEASURE2 BIGINT, MEASURE3 BIGINT ) ORGANIZE BY DIMENSIONS(TIME,STORE,PRODUCT) IN DWHSPACE1 INDEX IN INDEXSPACE1 ALTER TABLE LOCAL ADD PRIMARY KEY (TIME,STORE,PRODUCT); CREATE INDEX I_SALES_TIME ON SALES CREATE INDEX I_SALES_STORE ON SALES CREATE INDEX I_SALES_PRODUCT ON SALES For every dimension we have a separate index. First we tryed a little file with 10 rows. With a normal table it uses 4K (I am using 4K pages), but with MDC it uses 40K. Maybe this is not a good example because the sample was not representative, etc, etc. So we tried to load a sample from a production source. We selected a month with 5,382,851 rows. Normally it uses 200MB, so a 10GB tablespace will be enough. When we tried to load after a while, an error appears indicating that there is no more space in the tablespace, in desperation we gave 20GB to the tablespace, having the same results. Later we did this change to the table, because of the high cardinality of time: .... ORGANIZE BY DIMENSIONS(STORE,PRODUCT)... Obtaining the same results, finally this: .... ORGANIZE BY DIMENSIONS(STORE)... but I am writing to the news group. ¿What are we doing wrong? ¿Is there somebody with a successfull example to follow? Greetings, Carlos Farias |
| |||
| Carlos, MDC tables grab an entire extent of disk space for each unique combination of all of the dimension values. Many people need to use a smaller tablespace extent size with MDC tables in order to minimize wasted space when the dimensions contain too much cardinality. If your dimensions are STORE and PRODUCT, You can estimate the number of extents required by multiplying COUNT(DISTINCT(PRODUCT)) * COUNT(DISTINCT(STORE)). Unless you reduce the granularity of the time column by round it down to the month, you should probably not use TIME as a dimension, and just leave it in a regular Type-2 RID index. If you've already made STORE and PRODUCT dimensions for MDC, you may not be gaining much by building RID indexes on them as well. Good Luck, Fred |
| |||
| On Thu, 14 Aug 2003 08:35:23 -0700, Carlos wrote: > Hello, > > We are trying to use MDC tables in our DataWareHouse, but the tests > shows that MDC use a lot of space in a relation 1/10 or 1/100. (1 byte > of source data use 10 or 100 bytes in the tablespace not including the > space used on indexes). > > We did this: > > CREATE TABLE SALES( > TIME INTEGER NOT NULL, > STORE INTEGER NOT NULL, > PRODUCT INTEGER NOT NULL, > MEASURE1 BIGINT, > MEASURE2 BIGINT, > MEASURE3 BIGINT > ) ORGANIZE BY DIMENSIONS(TIME,STORE,PRODUCT) > IN DWHSPACE1 > INDEX IN INDEXSPACE1 > > ALTER TABLE LOCAL ADD PRIMARY KEY (TIME,STORE,PRODUCT); > CREATE INDEX I_SALES_TIME ON SALES > CREATE INDEX I_SALES_STORE ON SALES > CREATE INDEX I_SALES_PRODUCT ON SALES > > For every dimension we have a separate index. > > First we tryed a little file with 10 rows. With a normal table it > uses 4K (I am using 4K pages), but with MDC it uses 40K. Maybe this > is not a good example because the sample was not representative, etc, > etc. > So we tried to load a sample from a production source. We selected a > month with 5,382,851 rows. Normally it uses 200MB, so a 10GB > tablespace will be enough. > When we tried to load after a while, an error appears indicating that > there is no more space in the tablespace, in desperation we gave 20GB > to the tablespace, having the same results. > > Later we did this change to the table, because of the high cardinality > of time: > ... ORGANIZE BY DIMENSIONS(STORE,PRODUCT)... > Obtaining the same results, finally this: > ... ORGANIZE BY DIMENSIONS(STORE)... > > but I am writing to the news group. > > ¿What are we doing wrong? > ¿Is there somebody with a successfull example to follow? > > > Greetings, > > Carlos Farias Carlos, DB2 will allocate an extent of storage for each intersection of your dimensions. So using all of the keys in your primary key as dimensions means you will allocate as many extents as you have rows in your table. So you don't want to use columns with high cardinality (and time and product are likely to be such). If you want to include a time dimension (BTW, why are you not using a DB2 supplied TIME or TIMESTAMP for this) then I'd suggest using a derived value (e.g. YEAR(TIME_COLUMN) + MONTH(TIME_COLUMN)) to group together everything for a month (you could do similar things with weeks, but beware the fact that the last few days of one year and the first few of the next are given separate week numbers by the DB2 week function). Phil |
| ||||
| Fred, Philip thanks for your help. I'm not using the time provided by DB2 because the advice from Ralph Kimball: "Most data warehouses need an explict time dimension even though the primary time key may be an SQL date-valued object. The explicit time is needed to describe fiscal periods, seasons, holidays, weekends, and other calendar calculations that are difficult to get from the SQL date machinery" Thanks again, Carlos |
| Thread Tools | |
| Display Modes | |
|
|