Unix Technical Forum

UDM and Star Schema

This is a discussion on UDM and Star Schema within the SQL Server Data Warehousing forums, part of the Microsoft SQL Server category; --> Hi everyone, I have a general question to datawarehousing and it'll be great if someone can drop some tips ...


Go Back   Unix Technical Forum > Database Server Software > Microsoft SQL Server > SQL Server Data Warehousing

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 02-27-2008, 07:08 PM
Nestor
 
Posts: n/a
Default UDM and Star Schema

Hi everyone,

I have a general question to datawarehousing and it'll be great if someone
can drop some tips here. I'm about to start designing a corporate real
time datawarehouse with SQL Server 2005 and is deciding on the approach
for this.

SQL Server 2005 now allows us to actually build a datawarehouse without
actually restructuring the datasources into star/snowflake schemas. I'm
curious to know if this will have any performance impact on the actual
implementation? This project will probably require the usage of proactive
caching as well as medium latency molap storage structure.

By using proactive caching, SSAS will intelligently switch between ROLAP
and MOLAP depending on the data changes in the production database. What
is confusing to me is, what happens to ETL then? Does this mean that if
SSAS is made to query directly from the production database, then ETL is
totally eliminated?

Any recommendations will be appreciated.

Thanks.
Nes
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 02-27-2008, 07:08 PM
Jeje
 
Posts: n/a
Default Re: UDM and Star Schema

SSAS 2005 like any other OLAP tool on the market can works with flat schemas
and more complicated schemas.

but using star or snowflake schema is far better for a DW project and when
you create a cube.
SSAS 2005 is able to create the tables for you, so instead of creating the
tables in the database then creating the cube, you create your dimensions
and cubes first and SSAS2005 will generates the tables for you.
this greatly simplify the prototype step of your project.

proactive caching don't switch from MOLAP to ROLAP.
proactive caching can detect changes in the source database and then update
the cube automatically or can do an incremental loading automatically.
there is a lot of options available here for different usage and
requirements.
but there is no switch from MOLAP to ROLAP.

the ETL process is not required if your cubes are directly connect to the
operational database.
today, a lot of databases schemas are more accessible then before and the
data quality is good, so accessing the source database is a good option.
it's also a good option when you have only 1 database to play with (and not
a lot of databases to synchronize)
but if you have to sync. multiple databases, or you have to do a lot of
cleansing / transformations, then an ETL process is required.
another option we can found today:
the transaction made to the operational system are also replicated into the
datawarehouse in real time (using triggers, procedures or workflow systems)
so the DW database himself become a realtime system.


"Nestor" <n3570r@yahoo.com> wrote in message
newsp.txe6yasa83lx0t@david.virginam.com...
> Hi everyone,
>
> I have a general question to datawarehousing and it'll be great if someone
> can drop some tips here. I'm about to start designing a corporate real
> time datawarehouse with SQL Server 2005 and is deciding on the approach
> for this.
>
> SQL Server 2005 now allows us to actually build a datawarehouse without
> actually restructuring the datasources into star/snowflake schemas. I'm
> curious to know if this will have any performance impact on the actual
> implementation? This project will probably require the usage of proactive
> caching as well as medium latency molap storage structure.
>
> By using proactive caching, SSAS will intelligently switch between ROLAP
> and MOLAP depending on the data changes in the production database. What
> is confusing to me is, what happens to ETL then? Does this mean that if
> SSAS is made to query directly from the production database, then ETL is
> totally eliminated?
>
> Any recommendations will be appreciated.
>
> Thanks.
> Nes


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 02-27-2008, 07:08 PM
Marco Russo
 
Posts: n/a
Default Re: UDM and Star Schema

On Aug 22, 1:40 am, "Jeje" <willg...@hotmail.com> wrote:
> SSAS 2005 like any other OLAP tool on the market can works with flat schemas
> and more complicated schemas.
>
> but using star or snowflake schema is far better for a DW project and when
> you create a cube.
> SSAS 2005 is able to create the tables for you, so instead of creating the
> tables in the database then creating the cube, you create your dimensions
> and cubes first and SSAS2005 will generates the tables for you.
> this greatly simplify the prototype step of your project.
>
> proactive caching don't switch from MOLAP to ROLAP.
> proactive caching can detect changes in the source database and then update
> the cube automatically or can do an incremental loading automatically.
> there is a lot of options available here for different usage and
> requirements.
> but there is no switch from MOLAP to ROLAP.
>
> the ETL process is not required if your cubes are directly connect to the
> operational database.
> today, a lot of databases schemas are more accessible then before and the
> data quality is good, so accessing the source database is a good option.
> it's also a good option when you have only 1 database to play with (and not
> a lot of databases to synchronize)
> but if you have to sync. multiple databases, or you have to do a lot of
> cleansing / transformations, then an ETL process is required.
> another option we can found today:
> the transaction made to the operational system are also replicated into the
> datawarehouse in real time (using triggers, procedures or workflow systems)
> so the DW database himself become a realtime system.
>
> "Nestor" <n35...@yahoo.com> wrote in message
>
> newsp.txe6yasa83lx0t@david.virginam.com...
>
>
>
> > Hi everyone,

>
> > I have a general question to datawarehousing and it'll be great if someone
> > can drop some tips here. I'm about to start designing a corporate real
> > time datawarehouse with SQL Server 2005 and is deciding on the approach
> > for this.

>
> > SQL Server 2005 now allows us to actually build a datawarehouse without
> > actually restructuring the datasources into star/snowflake schemas. I'm
> > curious to know if this will have any performance impact on the actual
> > implementation? This project will probably require the usage of proactive
> > caching as well as medium latency molap storage structure.

>
> > By using proactive caching, SSAS will intelligently switch between ROLAP
> > and MOLAP depending on the data changes in the production database. What
> > is confusing to me is, what happens to ETL then? Does this mean that if
> > SSAS is made to query directly from the production database, then ETL is
> > totally eliminated?

>
> > Any recommendations will be appreciated.

>
> > Thanks.
> > Nes- Hide quoted text -

>
> - Show quoted text -


I would add that in my experience creating a set of views to replace
the star schema is good only for prototypes. Chances are that in a
real world environment you need to do some transformation and/or
cleansing operations that requires you having some staging/working
area. Moreover, you will have a longer lifetime of your solution if
you decouple your OLTP source from the db that becomes the physical
OLAP Data Source.

The capability of SSAS to handle relationship more complex than star/
snowflake schema is a tentative to reach the "UDM utopia" (UDM means
Unified Dimensional Model, at the beginning of Yukon story it was the
"final" model on which every report will be based on - but just only
SSRS has a different model from UDM, so we are still far from
realizing this vision). Today, I'm very happy to use UDM as an OLAP
model that is much richer than those offered by other products (the
single feature of many-to-many dimension relationships allowed me to
build a lot of complex analytical models, like I described in my paper
- http://www.sqlbi.eu/manytomany.aspx).

Marco Russo
http://www.sqlbi.eu
http://sqlblog.com/blogs/marco_russo

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 02-27-2008, 07:08 PM
Nestor
 
Posts: n/a
Default Re: UDM and Star Schema

Hi Jeje,

Thanks for the recommendation. The tricky part is will need the system to
fulfil real time scenarios, have the capability to store historical data
as well as being able to due with very large SCDs.

Proactive caching when set up properly does changes from MOLAP to ROLAP
and to MOLAP again according to set up (some reference
http://msdn2.microsoft.com/en-us/library/ms174769.aspx). I suppose at the
end of the day I might need to identify what needs to be real time (hence
ROLAP structure) and what needs to be historical + scheduled (hence MOLAP
and Snowflaked).

I did some intensive research to UDM and basically 'concluded' that the
concept of data cleansing doesn't sit very well in it. Seems like I need
to break out the system into 2 conceptual areas in order to fulfil all
that requirements...

Thanks.
Regards,
David Ong




On Tue, 21 Aug 2007 16:40:56 -0700, Jeje <willgart@hotmail.com> wrote:

> SSAS 2005 like any other OLAP tool on the market can works with flat
> schemas and more complicated schemas.
>
> but using star or snowflake schema is far better for a DW project and
> when you create a cube.
> SSAS 2005 is able to create the tables for you, so instead of creating
> the tables in the database then creating the cube, you create your
> dimensions and cubes first and SSAS2005 will generates the tables for
> you.
> this greatly simplify the prototype step of your project.
>
> proactive caching don't switch from MOLAP to ROLAP.
> proactive caching can detect changes in the source database and then
> update the cube automatically or can do an incremental loading
> automatically.
> there is a lot of options available here for different usage and
> requirements.
> but there is no switch from MOLAP to ROLAP.
>
> the ETL process is not required if your cubes are directly connect to
> the operational database.
> today, a lot of databases schemas are more accessible then before and
> the data quality is good, so accessing the source database is a good
> option.
> it's also a good option when you have only 1 database to play with (and
> not a lot of databases to synchronize)
> but if you have to sync. multiple databases, or you have to do a lot of
> cleansing / transformations, then an ETL process is required.
> another option we can found today:
> the transaction made to the operational system are also replicated into
> the datawarehouse in real time (using triggers, procedures or workflow
> systems) so the DW database himself become a realtime system.
>
>
> "Nestor" <n3570r@yahoo.com> wrote in message
> newsp.txe6yasa83lx0t@david.virginam.com...
>> Hi everyone,
>>
>> I have a general question to datawarehousing and it'll be great if
>> someone can drop some tips here. I'm about to start designing a
>> corporate real time datawarehouse with SQL Server 2005 and is deciding
>> on the approach for this.
>>
>> SQL Server 2005 now allows us to actually build a datawarehouse without
>> actually restructuring the datasources into star/snowflake schemas. I'm
>> curious to know if this will have any performance impact on the actual
>> implementation? This project will probably require the usage of
>> proactive caching as well as medium latency molap storage structure.
>>
>> By using proactive caching, SSAS will intelligently switch between
>> ROLAP and MOLAP depending on the data changes in the production
>> database. What is confusing to me is, what happens to ETL then? Does
>> this mean that if SSAS is made to query directly from the production
>> database, then ETL is totally eliminated?
>>
>> Any recommendations will be appreciated.
>>
>> Thanks.
>> Nes

>




--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 02-27-2008, 07:08 PM
Jeje
 
Posts: n/a
Default Re: UDM and Star Schema

Like marco says,

use views in your source database to apply some business rules and cleansing
process.

for the choice of MOLAP or ROLAP...

historical partitions generally are stored in MOLAP mode, because you don't
update them frequently.
the "Today" partition can be in ROLAP mode and use indexed views.
But I recommend the low latency MOLAP mode. provides good performance by
keeping the cache longer, but you suffer few seconds of latency between an
update and the cube update.
if you want to be more realtime the HOLAP mode provides updated result all
the time.

SCD dimension should not impact historical data. but your business key must
be unique other the time to insure usage of both real time and historical
content.

but regarding the data cleansing process you have to do and other technical
constraints.
Try to split your solution in 2 projects:
Real time project --> contains only "today" data, there is no SCD dimension
Historical project -> contains all other data and the SCD dimensions

generally doing analysis against the history will not answer the same
questions as the real time access.

"Nestor" <n3570r@yahoo.com> wrote in message
newsp.txgz9yzc83lx0t@david.virginam.com...
> Hi Jeje,
>
> Thanks for the recommendation. The tricky part is will need the system to
> fulfil real time scenarios, have the capability to store historical data
> as well as being able to due with very large SCDs.
>
> Proactive caching when set up properly does changes from MOLAP to ROLAP
> and to MOLAP again according to set up (some reference
> http://msdn2.microsoft.com/en-us/library/ms174769.aspx). I suppose at the
> end of the day I might need to identify what needs to be real time (hence
> ROLAP structure) and what needs to be historical + scheduled (hence MOLAP
> and Snowflaked).
>
> I did some intensive research to UDM and basically 'concluded' that the
> concept of data cleansing doesn't sit very well in it. Seems like I need
> to break out the system into 2 conceptual areas in order to fulfil all
> that requirements...
>
> Thanks.
> Regards,
> David Ong
>
>
>
>
> On Tue, 21 Aug 2007 16:40:56 -0700, Jeje <willgart@hotmail.com> wrote:
>
>> SSAS 2005 like any other OLAP tool on the market can works with flat
>> schemas and more complicated schemas.
>>
>> but using star or snowflake schema is far better for a DW project and
>> when you create a cube.
>> SSAS 2005 is able to create the tables for you, so instead of creating
>> the tables in the database then creating the cube, you create your
>> dimensions and cubes first and SSAS2005 will generates the tables for
>> you.
>> this greatly simplify the prototype step of your project.
>>
>> proactive caching don't switch from MOLAP to ROLAP.
>> proactive caching can detect changes in the source database and then
>> update the cube automatically or can do an incremental loading
>> automatically.
>> there is a lot of options available here for different usage and
>> requirements.
>> but there is no switch from MOLAP to ROLAP.
>>
>> the ETL process is not required if your cubes are directly connect to
>> the operational database.
>> today, a lot of databases schemas are more accessible then before and
>> the data quality is good, so accessing the source database is a good
>> option.
>> it's also a good option when you have only 1 database to play with (and
>> not a lot of databases to synchronize)
>> but if you have to sync. multiple databases, or you have to do a lot of
>> cleansing / transformations, then an ETL process is required.
>> another option we can found today:
>> the transaction made to the operational system are also replicated into
>> the datawarehouse in real time (using triggers, procedures or workflow
>> systems) so the DW database himself become a realtime system.
>>
>>
>> "Nestor" <n3570r@yahoo.com> wrote in message
>> newsp.txe6yasa83lx0t@david.virginam.com...
>>> Hi everyone,
>>>
>>> I have a general question to datawarehousing and it'll be great if
>>> someone can drop some tips here. I'm about to start designing a
>>> corporate real time datawarehouse with SQL Server 2005 and is deciding
>>> on the approach for this.
>>>
>>> SQL Server 2005 now allows us to actually build a datawarehouse without
>>> actually restructuring the datasources into star/snowflake schemas. I'm
>>> curious to know if this will have any performance impact on the actual
>>> implementation? This project will probably require the usage of
>>> proactive caching as well as medium latency molap storage structure.
>>>
>>> By using proactive caching, SSAS will intelligently switch between
>>> ROLAP and MOLAP depending on the data changes in the production
>>> database. What is confusing to me is, what happens to ETL then? Does
>>> this mean that if SSAS is made to query directly from the production
>>> database, then ETL is totally eliminated?
>>>
>>> Any recommendations will be appreciated.
>>>
>>> Thanks.
>>> Nes

>>

>
>
>
> --
> Using Opera's revolutionary e-mail client: http://www.opera.com/mail/


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 02-27-2008, 07:08 PM
Nestor
 
Posts: n/a
Default Re: UDM and Star Schema

Hi Jeje,

Thanks for the recommendation, I've been ponder about this and thought
that's the best approach as well. It's great to hear from someone who
agrees with the design. At the end of the day there should be one tha
caters to real time reporting and another for historical analysis
reporting. It's a pity that the vision of 'one version of truth' might
have to be compromised in this case though.

Cheers.


On Wed, 22 Aug 2007 17:58:46 -0700, Jeje <willgart@hotmail.com> wrote:

> Like marco says,
>
> use views in your source database to apply some business rules and
> cleansing process.
>
> for the choice of MOLAP or ROLAP...
>
> historical partitions generally are stored in MOLAP mode, because you
> don't update them frequently.
> the "Today" partition can be in ROLAP mode and use indexed views.
> But I recommend the low latency MOLAP mode. provides good performance by
> keeping the cache longer, but you suffer few seconds of latency between
> an update and the cube update.
> if you want to be more realtime the HOLAP mode provides updated result
> all the time.
>
> SCD dimension should not impact historical data. but your business key
> must be unique other the time to insure usage of both real time and
> historical content.
>
> but regarding the data cleansing process you have to do and other
> technical constraints.
> Try to split your solution in 2 projects:
> Real time project --> contains only "today" data, there is no SCD
> dimension
> Historical project -> contains all other data and the SCD dimensions
>
> generally doing analysis against the history will not answer the same
> questions as the real time access.
>
> "Nestor" <n3570r@yahoo.com> wrote in message
> newsp.txgz9yzc83lx0t@david.virginam.com...
>> Hi Jeje,
>>
>> Thanks for the recommendation. The tricky part is will need the system
>> to fulfil real time scenarios, have the capability to store historical
>> data as well as being able to due with very large SCDs.
>>
>> Proactive caching when set up properly does changes from MOLAP to ROLAP
>> and to MOLAP again according to set up (some reference
>> http://msdn2.microsoft.com/en-us/library/ms174769.aspx). I suppose at
>> the end of the day I might need to identify what needs to be real time
>> (hence ROLAP structure) and what needs to be historical + scheduled
>> (hence MOLAP and Snowflaked).
>>
>> I did some intensive research to UDM and basically 'concluded' that the
>> concept of data cleansing doesn't sit very well in it. Seems like I
>> need to break out the system into 2 conceptual areas in order to fulfil
>> all that requirements...
>>
>> Thanks.
>> Regards,
>> David Ong
>>
>>
>>
>>
>> On Tue, 21 Aug 2007 16:40:56 -0700, Jeje <willgart@hotmail.com> wrote:
>>
>>> SSAS 2005 like any other OLAP tool on the market can works with flat
>>> schemas and more complicated schemas.
>>>
>>> but using star or snowflake schema is far better for a DW project and
>>> when you create a cube.
>>> SSAS 2005 is able to create the tables for you, so instead of creating
>>> the tables in the database then creating the cube, you create your
>>> dimensions and cubes first and SSAS2005 will generates the tables for
>>> you.
>>> this greatly simplify the prototype step of your project.
>>>
>>> proactive caching don't switch from MOLAP to ROLAP.
>>> proactive caching can detect changes in the source database and then
>>> update the cube automatically or can do an incremental loading
>>> automatically.
>>> there is a lot of options available here for different usage and
>>> requirements.
>>> but there is no switch from MOLAP to ROLAP.
>>>
>>> the ETL process is not required if your cubes are directly connect to
>>> the operational database.
>>> today, a lot of databases schemas are more accessible then before and
>>> the data quality is good, so accessing the source database is a good
>>> option.
>>> it's also a good option when you have only 1 database to play with
>>> (and not a lot of databases to synchronize)
>>> but if you have to sync. multiple databases, or you have to do a lot
>>> of cleansing / transformations, then an ETL process is required.
>>> another option we can found today:
>>> the transaction made to the operational system are also replicated
>>> into the datawarehouse in real time (using triggers, procedures or
>>> workflow systems) so the DW database himself become a realtime system.
>>>
>>>
>>> "Nestor" <n3570r@yahoo.com> wrote in message
>>> newsp.txe6yasa83lx0t@david.virginam.com...
>>>> Hi everyone,
>>>>
>>>> I have a general question to datawarehousing and it'll be great if
>>>> someone can drop some tips here. I'm about to start designing a
>>>> corporate real time datawarehouse with SQL Server 2005 and is
>>>> deciding on the approach for this.
>>>>
>>>> SQL Server 2005 now allows us to actually build a datawarehouse
>>>> without actually restructuring the datasources into star/snowflake
>>>> schemas. I'm curious to know if this will have any performance impact
>>>> on the actual implementation? This project will probably require the
>>>> usage of proactive caching as well as medium latency molap storage
>>>> structure.
>>>>
>>>> By using proactive caching, SSAS will intelligently switch between
>>>> ROLAP and MOLAP depending on the data changes in the production
>>>> database. What is confusing to me is, what happens to ETL then? Does
>>>> this mean that if SSAS is made to query directly from the production
>>>> database, then ETL is totally eliminated?
>>>>
>>>> Any recommendations will be appreciated.
>>>>
>>>> Thanks.
>>>> Nes
>>>

>>
>>
>>
>> -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

>




--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 02-27-2008, 07:08 PM
Nestor
 
Posts: n/a
Default Re: UDM and Star Schema

I'm still curious to know if there're anyone who is able to advise on
building a realtime BI system which is capable of storing historical
information, and which dimensions are large SCD Type IIs.

For all I know SQLServer 2005 Analysis Service's proactive caching is good
for direct connection with the relational database, and it does a good job
in balancing performance with near real time data. However, due to the
inherited nature of this, we miss out a very important layer which makes
it very hard to hold historical data, and even harder to make what appears
to be SCD TYPE 1 (even though it's really not type 1) into a Type II.

I'm wondering if there's any solution better than splitting the project
into 2 sub projects with 1 as real time and the other as historical. I'm
trying to maintain the 'one version of truth' perspective in this project.

Cheers.


On Thu, 23 Aug 2007 15:09:03 -0700, Nestor <n3570r@yahoo.com> wrote:

> Hi Jeje,
>
> Thanks for the recommendation, I've been ponder about this and thought
> that's the best approach as well. It's great to hear from someone who
> agrees with the design. At the end of the day there should be one tha
> caters to real time reporting and another for historical analysis
> reporting. It's a pity that the vision of 'one version of truth' might
> have to be compromised in this case though.
>
> Cheers.
>
>
> On Wed, 22 Aug 2007 17:58:46 -0700, Jeje <willgart@hotmail.com> wrote:
>
>> Like marco says,
>>
>> use views in your source database to apply some business rules and
>> cleansing process.
>>
>> for the choice of MOLAP or ROLAP...
>>
>> historical partitions generally are stored in MOLAP mode, because you
>> don't update them frequently.
>> the "Today" partition can be in ROLAP mode and use indexed views.
>> But I recommend the low latency MOLAP mode. provides good performance
>> by keeping the cache longer, but you suffer few seconds of latency
>> between an update and the cube update.
>> if you want to be more realtime the HOLAP mode provides updated result
>> all the time.
>>
>> SCD dimension should not impact historical data. but your business key
>> must be unique other the time to insure usage of both real time and
>> historical content.
>>
>> but regarding the data cleansing process you have to do and other
>> technical constraints.
>> Try to split your solution in 2 projects:
>> Real time project --> contains only "today" data, there is no SCD
>> dimension
>> Historical project -> contains all other data and the SCD dimensions
>>
>> generally doing analysis against the history will not answer the same
>> questions as the real time access.
>>
>> "Nestor" <n3570r@yahoo.com> wrote in message
>> newsp.txgz9yzc83lx0t@david.virginam.com...
>>> Hi Jeje,
>>>
>>> Thanks for the recommendation. The tricky part is will need the system
>>> to fulfil real time scenarios, have the capability to store historical
>>> data as well as being able to due with very large SCDs.
>>>
>>> Proactive caching when set up properly does changes from MOLAP to
>>> ROLAP and to MOLAP again according to set up (some reference
>>> http://msdn2.microsoft.com/en-us/library/ms174769.aspx). I suppose at
>>> the end of the day I might need to identify what needs to be real time
>>> (hence ROLAP structure) and what needs to be historical + scheduled
>>> (hence MOLAP and Snowflaked).
>>>
>>> I did some intensive research to UDM and basically 'concluded' that
>>> the concept of data cleansing doesn't sit very well in it. Seems like
>>> I need to break out the system into 2 conceptual areas in order to
>>> fulfil all that requirements...
>>>
>>> Thanks.
>>> Regards,
>>> David Ong
>>>
>>>
>>>
>>>
>>> On Tue, 21 Aug 2007 16:40:56 -0700, Jeje <willgart@hotmail.com> wrote:
>>>
>>>> SSAS 2005 like any other OLAP tool on the market can works with flat
>>>> schemas and more complicated schemas.
>>>>
>>>> but using star or snowflake schema is far better for a DW project and
>>>> when you create a cube.
>>>> SSAS 2005 is able to create the tables for you, so instead of
>>>> creating the tables in the database then creating the cube, you
>>>> create your dimensions and cubes first and SSAS2005 will generates
>>>> the tables for you.
>>>> this greatly simplify the prototype step of your project.
>>>>
>>>> proactive caching don't switch from MOLAP to ROLAP.
>>>> proactive caching can detect changes in the source database and then
>>>> update the cube automatically or can do an incremental loading
>>>> automatically.
>>>> there is a lot of options available here for different usage and
>>>> requirements.
>>>> but there is no switch from MOLAP to ROLAP.
>>>>
>>>> the ETL process is not required if your cubes are directly connect to
>>>> the operational database.
>>>> today, a lot of databases schemas are more accessible then before and
>>>> the data quality is good, so accessing the source database is a good
>>>> option.
>>>> it's also a good option when you have only 1 database to play with
>>>> (and not a lot of databases to synchronize)
>>>> but if you have to sync. multiple databases, or you have to do a lot
>>>> of cleansing / transformations, then an ETL process is required.
>>>> another option we can found today:
>>>> the transaction made to the operational system are also replicated
>>>> into the datawarehouse in real time (using triggers, procedures or
>>>> workflow systems) so the DW database himself become a realtime system.
>>>>
>>>>
>>>> "Nestor" <n3570r@yahoo.com> wrote in message
>>>> newsp.txe6yasa83lx0t@david.virginam.com...
>>>>> Hi everyone,
>>>>>
>>>>> I have a general question to datawarehousing and it'll be great if
>>>>> someone can drop some tips here. I'm about to start designing a
>>>>> corporate real time datawarehouse with SQL Server 2005 and is
>>>>> deciding on the approach for this.
>>>>>
>>>>> SQL Server 2005 now allows us to actually build a datawarehouse
>>>>> without actually restructuring the datasources into star/snowflake
>>>>> schemas. I'm curious to know if this will have any performance
>>>>> impact on the actual implementation? This project will probably
>>>>> require the usage of proactive caching as well as medium latency
>>>>> molap storage structure.
>>>>>
>>>>> By using proactive caching, SSAS will intelligently switch between
>>>>> ROLAP and MOLAP depending on the data changes in the production
>>>>> database. What is confusing to me is, what happens to ETL then? Does
>>>>> this mean that if SSAS is made to query directly from the production
>>>>> database, then ETL is totally eliminated?
>>>>>
>>>>> Any recommendations will be appreciated.
>>>>>
>>>>> Thanks.
>>>>> Nes
>>>>
>>>
>>>
>>>
>>> -- Using Opera's revolutionary e-mail client:
>>> http://www.opera.com/mail/

>>

>
>
>




--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 02-27-2008, 07:08 PM
Marco Russo
 
Posts: n/a
Default Re: UDM and Star Schema

On Aug 29, 4:14 pm, Nestor <n35...@yahoo.com> wrote:
> I'm still curious to know if there're anyone who is able to advise on
> building a realtime BI system which is capable of storing historical
> information, and which dimensions are large SCD Type IIs.
>
> For all I know SQLServer 2005 Analysis Service's proactive caching is good
> for direct connection with the relational database, and it does a good job
> in balancing performance with near real time data. However, due to the
> inherited nature of this, we miss out a very important layer which makes
> it very hard to hold historical data, and even harder to make what appears
> to be SCD TYPE 1 (even though it's really not type 1) into a Type II.
>
> I'm wondering if there's any solution better than splitting the project
> into 2 sub projects with 1 as real time and the other as historical. I'm
> trying to maintain the 'one version of truth' perspective in this project.
>
> Cheers.
>
>
>
>
>
> On Thu, 23 Aug 2007 15:09:03 -0700, Nestor <n35...@yahoo.com> wrote:
> > Hi Jeje,

>
> > Thanks for the recommendation, I've been ponder about this and thought
> > that's the best approach as well. It's great to hear from someone who
> > agrees with the design. At the end of the day there should be one tha
> > caters to real time reporting and another for historical analysis
> > reporting. It's a pity that the vision of 'one version of truth' might
> > have to be compromised in this case though.

>
> > Cheers.

>
> > On Wed, 22 Aug 2007 17:58:46 -0700, Jeje <willg...@hotmail.com> wrote:

>
> >> Like marco says,

>
> >> use views in your source database to apply some business rules and
> >> cleansing process.

>
> >> for the choice of MOLAP or ROLAP...

>
> >> historical partitions generally are stored in MOLAP mode, because you
> >> don't update them frequently.
> >> the "Today" partition can be in ROLAP mode and use indexed views.
> >> But I recommend the low latency MOLAP mode. provides good performance
> >> by keeping the cache longer, but you suffer few seconds of latency
> >> between an update and the cube update.
> >> if you want to be more realtime the HOLAP mode provides updated result
> >> all the time.

>
> >> SCD dimension should not impact historical data. but your business key
> >> must be unique other the time to insure usage of both real time and
> >> historical content.

>
> >> but regarding the data cleansing process you have to do and other
> >> technical constraints.
> >> Try to split your solution in 2 projects:
> >> Real time project --> contains only "today" data, there is no SCD
> >> dimension
> >> Historical project -> contains all other data and the SCD dimensions

>
> >> generally doing analysis against the history will not answer the same
> >> questions as the real time access.

>
> >> "Nestor" <n35...@yahoo.com> wrote in message
> >>newsp.txgz9yzc83lx0t@david.virginam.com...
> >>> Hi Jeje,

>
> >>> Thanks for the recommendation. The tricky part is will need the system
> >>> to fulfil real time scenarios, have the capability to store historical
> >>> data as well as being able to due with very large SCDs.

>
> >>> Proactive caching when set up properly does changes from MOLAP to
> >>> ROLAP and to MOLAP again according to set up (some reference
> >>>http://msdn2.microsoft.com/en-us/library/ms174769.aspx). I suppose at
> >>> the end of the day I might need to identify what needs to be real time
> >>> (hence ROLAP structure) and what needs to be historical + scheduled
> >>> (hence MOLAP and Snowflaked).

>
> >>> I did some intensive research to UDM and basically 'concluded' that
> >>> the concept of data cleansing doesn't sit very well in it. Seems like
> >>> I need to break out the system into 2 conceptual areas in order to
> >>> fulfil all that requirements...

>
> >>> Thanks.
> >>> Regards,
> >>> David Ong

>
> >>> On Tue, 21 Aug 2007 16:40:56 -0700, Jeje <willg...@hotmail.com> wrote:

>
> >>>> SSAS 2005 like any other OLAP tool on the market can works with flat
> >>>> schemas and more complicated schemas.

>
> >>>> but using star or snowflake schema is far better for a DW project and
> >>>> when you create a cube.
> >>>> SSAS 2005 is able to create the tables for you, so instead of
> >>>> creating the tables in the database then creating the cube, you
> >>>> create your dimensions and cubes first and SSAS2005 will generates
> >>>> the tables for you.
> >>>> this greatly simplify the prototype step of your project.

>
> >>>> proactive caching don't switch from MOLAP to ROLAP.
> >>>> proactive caching can detect changes in the source database and then
> >>>> update the cube automatically or can do an incremental loading
> >>>> automatically.
> >>>> there is a lot of options available here for different usage and
> >>>> requirements.
> >>>> but there is no switch from MOLAP to ROLAP.

>
> >>>> the ETL process is not required if your cubes are directly connect to
> >>>> the operational database.
> >>>> today, a lot of databases schemas are more accessible then before and
> >>>> the data quality is good, so accessing the source database is a good
> >>>> option.
> >>>> it's also a good option when you have only 1 database to play with
> >>>> (and not a lot of databases to synchronize)
> >>>> but if you have to sync. multiple databases, or you have to do a lot
> >>>> of cleansing / transformations, then an ETL process is required.
> >>>> another option we can found today:
> >>>> the transaction made to the operational system are also replicated
> >>>> into the datawarehouse in real time (using triggers, procedures or
> >>>> workflow systems) so the DW database himself become a realtime system.

>
> >>>> "Nestor" <n35...@yahoo.com> wrote in message
> >>>>newsp.txe6yasa83lx0t@david.virginam.com...
> >>>>> Hi everyone,

>
> >>>>> I have a general question to datawarehousing and it'll be great if
> >>>>> someone can drop some tips here. I'm about to start designing a
> >>>>> corporate real time datawarehouse with SQL Server 2005 and is
> >>>>> deciding on the approach for this.

>
> >>>>> SQL Server 2005 now allows us to actually build a datawarehouse
> >>>>> without actually restructuring the datasources into star/snowflake
> >>>>> schemas. I'm curious to know if this will have any performance
> >>>>> impact on the actual implementation? This project will probably
> >>>>> require the usage of proactive caching as well as medium latency
> >>>>> molap storage structure.

>
> >>>>> By using proactive caching, SSAS will intelligently switch between
> >>>>> ROLAP and MOLAP depending on the data changes in the production
> >>>>> database. What is confusing to me is, what happens to ETL then? Does
> >>>>> this mean that if SSAS is made to query directly from the production
> >>>>> database, then ETL is totally eliminated?

>
> >>>>> Any recommendations will be appreciated.

>
> >>>>> Thanks.
> >>>>> Nes

>
> >>> -- Using Opera's revolutionary e-mail client:
> >>>http://www.opera.com/mail/

>
> --
> Using Opera's revolutionary e-mail client:http://www.opera.com/mail/- Hide quoted text -
>
> - Show quoted text -


As you note, the problem linking directly with relational database is
the lack of surrogate keys.
One simple solution is splitting the issue in 2 projects.
One harder solution (to build and to maintain) is the use of a
separate partition for the "current data". You might prefer to feed
this partition with a very high frequency using a SSIS package that
"pushes" data directly into the partition. In this case you have an
ETL that might generate all surrogate keys you need. At the end of the
day, the "current day" partition can be destroyed and the "regular"
ETL can process the new data with full historical attributes (you
might skip some dimensions in the "current day" processing).

In general, you have to balance the cost of development, the time to
process, the volume of data and the latency the user can tolerate. I
think there is no "silver bullet" that is good for every requirement.

Marco Russo
http://www.sqlbi.eu
http://sqlblog.com/blogs/marco_russo

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:41 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.2.0
www.UnixAdminTalk.com