WP_Data Management Best Practices

W H I T E P A P E R

www.persistent.com

7.2.3 Performance at Dimensional DWModel design stage

—

Design the DW model in such a way that it provides a balance between requirements of slice/dice and filters

across dimensions and fact data, and the need for preserving history of changes

(

page 262, 270, 332, 338,

[1]

339). We review the three patterns introduced in section

froma performance point of view hereunder:

5.2

—

Star Schemas allows for history to be tracked individually on dimension tables, as well as slicing/dicing by

dimensions. De-normalization of dimensions is done to avoid capturing too many minuscule relationships.

This is generally themost efficient froma performance angle.

—

Snow-flake model (normalized or 3NF) works best if you want to slice-dice by all possible relationships.

Other condition where this may apply is in the case of ragged dimension hierarchies and outrigger

dimensions, as explained in section

However, due to too many joins, the database view complexity

5.3.3

goes up and performance is slower. If this is really a requirement, then assess the performance impact

before designing, as the space saving may not be substantial (between 1 to 3% as compared to de-

normalization –

page 267).

[1]

—

Fully de-normalized fact table with dimension attributes and no dimension tables (degenerate

dimensions) – this may be the most simplistic design approach but may be difficult to maintain in the long

run. If any attribute value changes (SCD Type-I or SCD Type-II), it may warrant updating the whole fact

table which is costly from performance point. Also, if there is a need to show a filter in reports to choose

values from, then it would require creating additional views on the fact table with DISTINCT clause which is

inefficient. If slicing/dicing by dimensions is not a key requirement, then this pattern is best from

performance as there are no or minimal joins or indexes lookups. Degenerate dimensions are normal and

helpful too.

—

Design aggregate tables to speed-up dashboards and score-cards which only show summary data. The

aggregations are used to delivery fast query responses. Listed below are 3 aggregation patterns deriving from

[1] .

For purpose of discussion, we shall use a dimension model with Attendance as the main fact, and the

Dimensions as Date, Promotion, Showing, Theatre, andMovie.

—

Lost dimensions/ Summaries of Fact tables are created by only using required dimensions. Data from

other dimensions is “lost”. For instance, a daily sum of earnings by theatre is a case of lost dimensions as

the dimensions Promotions, Showing, andMovie do not matter.

—

Rolled up dimension: Aggregations is often used with dimensions that are hierarchical in the business

world. Some common aggregation dimensions would be date and geographies. However, some business

specific aggregations follow dimensions of fixed hierarchies. These dimensions are also called a

Shrunken Rollup dimension.

—

Collapsed dimension (de-normalized): Columns that are part of dimensions usually get reproduced as is in

the de-normalized structure that has the facts and the dimension attributes.