

W H I T E P A P E R
© 2017 Persistent Systems Ltd. All rights reserved.
33
www.persistent.com
—
Adopting data modeling and ETL tools
that improve developer productivity (see sections
and
4.4.1.1 5.4.1 ).
—
Placing a
heavy emphasis on repeatable, automated testing
that implies a test automation framework –we
already commented on this in section
above.
4.3.4—
Investing in emerging technologies such as BI SaaS or cloud data warehouse may also help in
developing new reporting capacity and capabilities faster
without the footprint of setting up nd managing
an on-premises data warehouse environment (see the discussion in section
) 3.1.1—
Selectively repurposing the staging area from a hands-off area to a publicly available resource to
business users.
This area represents a partial view of source data in a raw form. Typically, business users
cannot understand the raw source data, and in this area performance is not reliable. However, the upside to
keeping a full view of the raw source data in the staging database is to keep all the data deemed relevant for
analysis without filtering out anything and even without having completed all the details of the data model (you
will have recognized here the same principle animating data lakes, see section
). That way, technically
3.2.2savvy users can access the raw source data through ad hoc queries against the staging database and allow
them to validate initial business rules, which usually helps uncover new business logic that impacts the data
integration and data quality endeavors. Reaching these crucial validation points early in the design and
development stage can help avoid scrap and redesign.
5. Dimensional datamodeling
5.1 Dimension DataModelingOverview
In
[1] ,Kimball describes Dimensional modeling as “a logical design technique for structuring data so that it is
intuitive to business users and delivers fast query performance. It divides the world into measurements, or facts,
and context, or dimension
s 15.
—
Facts: Themeasurements of the business are usually numerical and are referred to as
facts.
—
Dimensions: The context that describes the “who, what, when, where, why, and how” of the measurement are
the dimensions.
Every dimensional model is composed of one table with a multi-part key, called the fact table, and a set of smaller
tables called
dimension
tables. Each dimension table has a single part primary key that corresponds exactly to one of
the components of themultipart key in the fact table.
Dimensional modeling is arguably the best approach to support the twin goals of understandability and query
performance in BI / analytic environments.
15
A historical footnote: the terms “dimension” and “fact” originated from developments conducted jointly by General Mills and Dartmoth
University in the late 1960s. Data marts were defined as dimensional models describing a single business process. Nielsen was the first
vendor to provide dimensional marts for retail sales in the late 70s. The original terms “conformed dimensions” and “conformed facts” were
described by Nielsen Marketing Research to Ralph Kimball in 1984. Finally, the idea that a data warehouse can be built incrementally from a
series of data marts with conformed dimensions was fully described by Ralph Kimball in a DBMS magazine article in August 1996.