Table of Contents Table of Contents
Previous Page  33 / 96 Next Page
Information
Show Menu
Previous Page 33 / 96 Next Page
Page Background

W H I T E P A P E R

© 2017 Persistent Systems Ltd. All rights reserved.

33

www.persistent.com

Adopting data modeling and ETL tools

that improve developer productivity (see sections

and

4.4.1.1 5.4.1 )

.

Placing a

heavy emphasis on repeatable, automated testing

that implies a test automation framework –we

already commented on this in section

above.

4.3.4

Investing in emerging technologies such as BI SaaS or cloud data warehouse may also help in

developing new reporting capacity and capabilities faster

without the footprint of setting up nd managing

an on-premises data warehouse environment (see the discussion in section

) 3.1.1

Selectively repurposing the staging area from a hands-off area to a publicly available resource to

business users.

This area represents a partial view of source data in a raw form. Typically, business users

cannot understand the raw source data, and in this area performance is not reliable. However, the upside to

keeping a full view of the raw source data in the staging database is to keep all the data deemed relevant for

analysis without filtering out anything and even without having completed all the details of the data model (you

will have recognized here the same principle animating data lakes, see section

)

. That way, technically

3.2.2

savvy users can access the raw source data through ad hoc queries against the staging database and allow

them to validate initial business rules, which usually helps uncover new business logic that impacts the data

integration and data quality endeavors. Reaching these crucial validation points early in the design and

development stage can help avoid scrap and redesign.

5. Dimensional datamodeling

5.1 Dimension DataModelingOverview

In

[1] ,

Kimball describes Dimensional modeling as “a logical design technique for structuring data so that it is

intuitive to business users and delivers fast query performance. It divides the world into measurements, or facts,

and context, or dimension

s 15

.

Facts: Themeasurements of the business are usually numerical and are referred to as

facts.

Dimensions: The context that describes the “who, what, when, where, why, and how” of the measurement are

the dimensions.

Every dimensional model is composed of one table with a multi-part key, called the fact table, and a set of smaller

tables called

dimension

tables. Each dimension table has a single part primary key that corresponds exactly to one of

the components of themultipart key in the fact table.

Dimensional modeling is arguably the best approach to support the twin goals of understandability and query

performance in BI / analytic environments.

15

A historical footnote: the terms “dimension” and “fact” originated from developments conducted jointly by General Mills and Dartmoth

University in the late 1960s. Data marts were defined as dimensional models describing a single business process. Nielsen was the first

vendor to provide dimensional marts for retail sales in the late 70s. The original terms “conformed dimensions” and “conformed facts” were

described by Nielsen Marketing Research to Ralph Kimball in 1984. Finally, the idea that a data warehouse can be built incrementally from a

series of data marts with conformed dimensions was fully described by Ralph Kimball in a DBMS magazine article in August 1996.