Table of Contents Table of Contents
Previous Page  39 / 96 Next Page
Information
Show Menu
Previous Page 39 / 96 Next Page
Page Background

W H I T E P A P E R

© 2017 Persistent Systems Ltd. All rights reserved. 39

www.persistent.com

Subsystem 10: Surrogate Key Generator: The ability to

generate surrogate keys

independently for each

dimension, independent of database instance, independent of the operational key, with the ability to serve

distributed clients, in a non-bottlenecked and scalable manner

is an often overlooked, but very crucial

practice for ETLperformance and portability.

Subsystem 11: Hierarchy Manager: For intermediate tables and

ETL staging tables storing dimensions

with hierarchies, often normalized structures work best

, to have the database assist in pre-verifying the

many-to-one hierarchy relationships through referential integrity, especially if the data comes from an informal

source (see pages 393 and 442). However, as mentioned in section

above, the same discussion about

5.3.3

dimensions with hierarchies at the presentation layer points to a different conclusion: prefer de-normalization,

except whenmeeting several of the conditions mentioned above.

Subsystem 12: Special Dimensions Manager: This ETL subsystem deals with dimensions as the ones

presented below.

The best practice in this case is to manage them in a special way and not just like the

normal dimensions.

Date/Time: Dates and times over last few decades, and next few decades are stored here, and with

consideration to financial reporting years of the organization. These dimensions generally don’t come from

amanaged source and are typically created in spreadsheets and imported.

“Junk” dimensions

( ,

page 263): These are dimensions recommended to be built from test and

[1]

miscellaneous flags left over in the fact table, and that are not naturally members of any existing

dimension. Rows should be created either based on number of rows known in advance, or as-and- when

fact table inputs rows arrive.

Mini-dimensions: These are recommended when it is required to track changes to frequently changing

dimension attributes in a very large (wide) dimension and the SCD Type 2 technique is too costly in terms

of size. The solution is to

break these dimension attributes in their own separate dimension table

,

called amini-dimension (see

,

page 259).

[1]

Shrunken dimensions: These consist of conformed dimensions that are a subset of rows and/or columns

of a base dimension, and built entirely from its base dimension rather than from source data to assure

consistency.

Small static dimensions: These are dimensions built without a real outside source usually for small lookups

(e.g., time zones, full names of months of the year).

User maintained dimensions: These dimensions are created by the lines of business and are typically

used for descriptions, groupings and hierarchies for reporting and analysis. Examples are organization

chart structures, and lines of business sales hierarchies. As such,

it is recommended to have the

ownership of the data be with the business team,

and the ETL team just imports data from an

operational/front-end systemwith the appropriate defaults.

Subsystem 13: Fact Table Builders: The ETL considerations for the three kinds of fact tables are listed below.

All three table typesmay coexist, as each allowsmeeting different needs.

Transaction Grain Fact Table Loader: After the initial load, records are fed into the transaction fact table by

using the CDC (Change Data Capture) system, and the data is loaded with the proper surrogate foreign

keys in a process described in subsystem 14 below, thereby guaranteeing referential integrity. Tables are

usually

partitioned by transaction date

(for database administration and performance), and columns

related to ETL job sequence ID, loading timestamp and source lineage information. Late arriving fact data

should be processed along the lines of subsystem16 (see below).