

W H I T E P A P E R
© 2017 Persistent Systems Ltd. All rights reserved. 39
www.persistent.com
—
Subsystem 10: Surrogate Key Generator: The ability to
generate surrogate keys
independently for each
dimension, independent of database instance, independent of the operational key, with the ability to serve
distributed clients, in a non-bottlenecked and scalable manner
is an often overlooked, but very crucial
practice for ETLperformance and portability.
—
Subsystem 11: Hierarchy Manager: For intermediate tables and
ETL staging tables storing dimensions
with hierarchies, often normalized structures work best
, to have the database assist in pre-verifying the
many-to-one hierarchy relationships through referential integrity, especially if the data comes from an informal
source (see pages 393 and 442). However, as mentioned in section
above, the same discussion about
5.3.3dimensions with hierarchies at the presentation layer points to a different conclusion: prefer de-normalization,
except whenmeeting several of the conditions mentioned above.
—
Subsystem 12: Special Dimensions Manager: This ETL subsystem deals with dimensions as the ones
presented below.
The best practice in this case is to manage them in a special way and not just like the
normal dimensions.
—
Date/Time: Dates and times over last few decades, and next few decades are stored here, and with
consideration to financial reporting years of the organization. These dimensions generally don’t come from
amanaged source and are typically created in spreadsheets and imported.
—
“Junk” dimensions
( ,page 263): These are dimensions recommended to be built from test and
[1]miscellaneous flags left over in the fact table, and that are not naturally members of any existing
dimension. Rows should be created either based on number of rows known in advance, or as-and- when
fact table inputs rows arrive.
—
Mini-dimensions: These are recommended when it is required to track changes to frequently changing
dimension attributes in a very large (wide) dimension and the SCD Type 2 technique is too costly in terms
of size. The solution is to
break these dimension attributes in their own separate dimension table
,
called amini-dimension (see
,page 259).
[1]—
Shrunken dimensions: These consist of conformed dimensions that are a subset of rows and/or columns
of a base dimension, and built entirely from its base dimension rather than from source data to assure
consistency.
—
Small static dimensions: These are dimensions built without a real outside source usually for small lookups
(e.g., time zones, full names of months of the year).
—
User maintained dimensions: These dimensions are created by the lines of business and are typically
used for descriptions, groupings and hierarchies for reporting and analysis. Examples are organization
chart structures, and lines of business sales hierarchies. As such,
it is recommended to have the
ownership of the data be with the business team,
and the ETL team just imports data from an
operational/front-end systemwith the appropriate defaults.
—
Subsystem 13: Fact Table Builders: The ETL considerations for the three kinds of fact tables are listed below.
All three table typesmay coexist, as each allowsmeeting different needs.
—
Transaction Grain Fact Table Loader: After the initial load, records are fed into the transaction fact table by
using the CDC (Change Data Capture) system, and the data is loaded with the proper surrogate foreign
keys in a process described in subsystem 14 below, thereby guaranteeing referential integrity. Tables are
usually
partitioned by transaction date
(for database administration and performance), and columns
related to ETL job sequence ID, loading timestamp and source lineage information. Late arriving fact data
should be processed along the lines of subsystem16 (see below).