Table of Contents Table of Contents
Previous Page  36 / 96 Next Page
Information
Show Menu
Previous Page 36 / 96 Next Page
Page Background

W H I T E P A P E R

16

For instance, a hire date on employees, or a product introduction date, and the business needs to slice and dice by non-calendar date

attributes (in this case, a virtual outrigger is to be set up on a date dimension table, as it plays a different role).

© 2017 Persistent Systems Ltd. All rights reserved. 36

www.persistent.com

Dimensions are large, as theremay be space savings,

Slicing-dicing by all possible hierarchy attributes is needed (in addition, some BI tools work better with a

snowflaked design), and/or

There is a need for other fact tables at a grain corresponding to higher hierarchy levels (as they can hook

into the appropriate level of the snowflake structure; besides, surrogate keys at higher hierarchy level may

be used for aggregating fact data with summary tables or with OLAP cubes–these keys enjoy the same

advantages related to stability under changes just discussed in the previous point).

The hierarchy is ragged, i.e., where the number of levels is variable and of the same type of data (as in

organization structures or in a parts hierarchy). In this case, a bridge table (a form of snowflaking) works

best, if you need to both restrict facts based on dimension data and summarize them based on the

hierarchical recursive relationship (see

,

page 269).

[1]

Another exception is what is called an “outrigger dimension”, which is a dimension table joined to another

16

dimension tabl

e

, i.e., another form of snowflaking. A discussion when it makes sense to use outriggers

can be read in

,

page 267.

[1]

When a fact measurement is associated with multiple values of the same dimension (even at the lowest level

of granularity), it is necessary to have

a bridge table to link the fact and the dimension

. Examples of open-

ended multiple values, such as a variable number of diagnoses for the same hospital bill line item can’t be

resolved directly on the fact table.

To satisfy lookups and joins, it is recommended to

have the dimension columns of the Fact table declared

as NOT NULL. This also means that each dimension should have a Null value as a row

. Generally, a row

with key -1, and value as “undefined” or “unknown” is inserted in the Dimension table, and the -1 is the

corresponding value in the related Fact table record, to make sure the lookup/join from fact to dimension is

correctly satisfied. NULLs are also not very intuitive for SQLsummary and ranking operations.

Historization/Slowly Changing Dimension: Kimball

,

page 257 also has a great and educative discussion on

[1]

SCD dimensions. There are multiple SCDs,

the most common of them is the Type 2

, which provides both

the newer and older entry, with an indicator for which is the most current record, and the begin/end validity

dates. See also the discussion on subsystem9 in the sectio

n

below, as well as the discussion about mini-

5.3.5

dimensions in subsystem12.

Special Types of Fact tables: These two kinds of facts have always been constructed where needed, in

addition to the basic transaction fact, in response to a report/query requirement.

Periodic Snapshot Fact Tables:

At the periodicity of a day, week or month, a snapshot is captured for

the fact table, with regular repeating measurement or set of measurements

. This is common in the

financial industry for monthly balances, in ERP and retail systems for inventory balances, and in the travel

industry for bookings. Both

en-masse

, end of period loads (adding rows per period) and continuous, rolling

loads (updating the current period rows) may be implemented.

Accumulating Snapshot Fact Tables: For processes with many states with a well-defined beginning and

end (e.g., added to cart, paid, shipped, delivered, returned), the lowest grain transaction fact would have

multiple records with different dates for the same order.

The accumulating snapshot fact has the

discrete states listed with dates

to enable queries like average interval between delivery dates and

return dates.