

W H I T E P A P E R
17
For instance, a sales by city aggregate for last month will be wrong when notice of a customer moving two months ago arrives.
© 2017 Persistent Systems Ltd. All rights reserved. 40
www.persistent.com
—
Periodic Snapshot Loader: An ETL fact table builder can be run at the periodicity of a day, week or month,
to capture a snapshot to update this fact table.
—
Accumulating Snapshot Loader: An ETL Fact table builder runs at intervals to gather the state transitions
and summarize them into theAccumulating Snapshot Fact.
—
Subsystem14: Surrogate Key Pipeline: This is the process that exchanges the natural keys of each dimension
by the surrogate keys and handles any referential integrity error. Dimension table processing must be
complete before the fact data enters the surrogate key pipeline, for both the initial (historic) load and the
incremental load.
Mapping tables should be retained in the ETL pipeline to get the corresponding
surrogate keys for natural keys.
The ETL should have logic to insert into the dimension as well as the
mapping tables, in case a new natural key is encountered.
—
Subsystem 15: Multi-Values Dimension Bridge Table Builder:
Construction of the bridge table should
happen during the during the ETL pipeline
. Additional complications arise as these dimensions may also
be SCDType 2.
—
Subsystem 16: LateArriving Data Handler:
The ETL system needs to take care of late arriving data, either
dimension or fact records
. In the former case, if the business can live without facts that have no dimension
data, one strategy is to park aside (into another area), fact data arriving before its dimension data. If reports
with all fact data are needed, the facts could be loaded with dummy dimension placeholder records with their
definitive surrogate key initially. These dimension placeholders are later updated when its data arrives.
Additional consideration is needed when the dimension is slowly changing and updates arrive late, as the
surrogate keys in the fact table point to the wrong dimension data, and for summary (aggregate) tables based
17on now obsolete dimension data . If the Facts arrive late, expensive searches are needed to backtrack and
adjust all the summaries, semi-additive facts, as well as locating which dimension surrogate keys were in
effect at that time. During this time window, the reports will not be always accurate.
—
Subsystem 17: Dimension Manager System: The set of dimensions should be tightly controlled by a
Dimension manager interacting with other subsystems managing dimensions (9 to 12, 15, 16) The difficulty
increases with multi-system EDWs and distributed environments where dimension tables are replicated. In
this case,
a best practice is to have this subsystem replicate dimensions to the Fact Provider Systems
(see below) with a version number attached to each dimension row.
The version number is very useful in
case of drill across queries spanning systems, to ensure consistency.
—
Subsystem 18:
Creation of a Fact Provider system
is a best practice, as it is responsible for
replacing/recalculating surrogate keys, accumulating snapshots, and aggregations using some of the
subsystems introduced above, plus subsystem 19. In distributed, drill-across query environments, it receives
conformed dimensions released by the dimensionmanager.
—
Subsystem 19: Aggregate Builder: This applies in general to environments that have not selected OLAP
engines in their technical architecture (see section
as well as the point below).
ETL must make sure
5.3.3that the aggregates are consistent with the base data
. There are several possible implementations that
range from relying on materialized view DBMS technology, to explicit ETL code implementation of this
subsystem. The Fact Provider System usually triggers the Aggregate Builder based on changes to Fact table
data.