Unix Technical Forum

MDC Question about Disk Space

This is a discussion on MDC Question about Disk Space within the DB2 forums, part of the Database Server Software category; --> Hello, We are trying to use MDC tables in our DataWareHouse, but the tests shows that MDC use a ...


Go Back   Unix Technical Forum > Database Server Software > DB2

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-26-2008, 11:17 AM
Carlos
 
Posts: n/a
Default MDC Question about Disk Space

Hello,

We are trying to use MDC tables in our DataWareHouse, but the tests
shows that MDC use a lot of space in a relation 1/10 or 1/100. (1 byte
of source data use 10 or 100 bytes in the tablespace not including the
space used on indexes).

We did this:

CREATE TABLE SALES(
TIME INTEGER NOT NULL,
STORE INTEGER NOT NULL,
PRODUCT INTEGER NOT NULL,
MEASURE1 BIGINT,
MEASURE2 BIGINT,
MEASURE3 BIGINT
) ORGANIZE BY DIMENSIONS(TIME,STORE,PRODUCT)
IN DWHSPACE1
INDEX IN INDEXSPACE1

ALTER TABLE LOCAL ADD PRIMARY KEY (TIME,STORE,PRODUCT);
CREATE INDEX I_SALES_TIME ON SALES
CREATE INDEX I_SALES_STORE ON SALES
CREATE INDEX I_SALES_PRODUCT ON SALES

For every dimension we have a separate index.

First we tryed a little file with 10 rows. With a normal table it
uses 4K (I am using 4K pages), but with MDC it uses 40K. Maybe this
is not a good example because the sample was not representative, etc,
etc.
So we tried to load a sample from a production source. We selected a
month with 5,382,851 rows. Normally it uses 200MB, so a 10GB
tablespace will be enough.
When we tried to load after a while, an error appears indicating that
there is no more space in the tablespace, in desperation we gave 20GB
to the tablespace, having the same results.

Later we did this change to the table, because of the high cardinality
of time:
.... ORGANIZE BY DIMENSIONS(STORE,PRODUCT)...
Obtaining the same results, finally this:
.... ORGANIZE BY DIMENSIONS(STORE)...

but I am writing to the news group.

¿What are we doing wrong?
¿Is there somebody with a successfull example to follow?


Greetings,

Carlos Farias
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-26-2008, 11:17 AM
Fred
 
Posts: n/a
Default Re: MDC Question about Disk Space

Carlos,

MDC tables grab an entire extent of disk space for each unique
combination of all of the dimension values. Many people need to use a
smaller tablespace extent size with MDC tables in order to minimize
wasted space when the dimensions contain too much cardinality.

If your dimensions are STORE and PRODUCT, You can estimate the number
of extents required by multiplying COUNT(DISTINCT(PRODUCT)) *
COUNT(DISTINCT(STORE)). Unless you reduce the granularity of the time
column by round it down to the month, you should probably not use TIME
as a dimension, and just leave it in a regular Type-2 RID index.

If you've already made STORE and PRODUCT dimensions for MDC, you may
not be gaining much by building RID indexes on them as well.

Good Luck,

Fred
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-26-2008, 03:31 PM
Philip Nelson
 
Posts: n/a
Default Re: MDC Question about Disk Space

On Thu, 14 Aug 2003 08:35:23 -0700, Carlos wrote:

> Hello,
>
> We are trying to use MDC tables in our DataWareHouse, but the tests
> shows that MDC use a lot of space in a relation 1/10 or 1/100. (1 byte
> of source data use 10 or 100 bytes in the tablespace not including the
> space used on indexes).
>
> We did this:
>
> CREATE TABLE SALES(
> TIME INTEGER NOT NULL,
> STORE INTEGER NOT NULL,
> PRODUCT INTEGER NOT NULL,
> MEASURE1 BIGINT,
> MEASURE2 BIGINT,
> MEASURE3 BIGINT
> ) ORGANIZE BY DIMENSIONS(TIME,STORE,PRODUCT)
> IN DWHSPACE1
> INDEX IN INDEXSPACE1
>
> ALTER TABLE LOCAL ADD PRIMARY KEY (TIME,STORE,PRODUCT);
> CREATE INDEX I_SALES_TIME ON SALES
> CREATE INDEX I_SALES_STORE ON SALES
> CREATE INDEX I_SALES_PRODUCT ON SALES
>
> For every dimension we have a separate index.
>
> First we tryed a little file with 10 rows. With a normal table it
> uses 4K (I am using 4K pages), but with MDC it uses 40K. Maybe this
> is not a good example because the sample was not representative, etc,
> etc.
> So we tried to load a sample from a production source. We selected a
> month with 5,382,851 rows. Normally it uses 200MB, so a 10GB
> tablespace will be enough.
> When we tried to load after a while, an error appears indicating that
> there is no more space in the tablespace, in desperation we gave 20GB
> to the tablespace, having the same results.
>
> Later we did this change to the table, because of the high cardinality
> of time:
> ... ORGANIZE BY DIMENSIONS(STORE,PRODUCT)...
> Obtaining the same results, finally this:
> ... ORGANIZE BY DIMENSIONS(STORE)...
>
> but I am writing to the news group.
>
> ¿What are we doing wrong?
> ¿Is there somebody with a successfull example to follow?
>
>
> Greetings,
>
> Carlos Farias


Carlos,

DB2 will allocate an extent of storage for each intersection of your
dimensions. So using all of the keys in your primary key as dimensions
means you will allocate as many extents as you have rows in your table.

So you don't want to use columns with high cardinality (and time and
product are likely to be such).

If you want to include a time dimension (BTW, why are you not using a DB2
supplied TIME or TIMESTAMP for this) then I'd suggest using a derived
value (e.g. YEAR(TIME_COLUMN) + MONTH(TIME_COLUMN)) to group together
everything for a month (you could do similar things with weeks, but beware
the fact that the last few days of one year and the first few of the next
are given separate week numbers by the DB2 week function).

Phil
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-26-2008, 03:32 PM
Carlos
 
Posts: n/a
Default Re: MDC Question about Disk Space

Fred, Philip thanks for your help.

I'm not using the time provided by DB2 because the advice from Ralph
Kimball:

"Most data warehouses need an explict time dimension even though the
primary time key may be an SQL date-valued object. The explicit time
is needed to describe fiscal periods, seasons, holidays, weekends, and
other calendar calculations that are difficult to get from the SQL
date machinery"

Thanks again,

Carlos
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:30 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com