

In this case, the relational aggregation tables would not be needed (except in mixed OLAP/relational environments).
—
Subsystem 20: OLAP Cube Builder. This applies to environments that have selected OLAP engines as their primary user access presentation server in their technical architecture (see section 5.3.3 above). At the very end of the ETL processing, the OLAP cubes get loaded. However, as mentioned above, they are sensitive to the SCD nature of dimensions: Type 1 and Type 3 SCDs would cause large-scale reloading/rebuilding of OLAP data (the same is true of the aggregate tables just described).
—
Subsystem 21: Data Propagation Manager. The ETL system is now the platform that handles the distribution of the warehouse's integrated and conformed data to other systems through data extracts (subsuming the former EAI systems). Extracts from the warehouse feed external entities (customers, regulatory bodies, partners, vendors) as well as corporate analytical systems for data mining and algorithmic analysis. Some amount of transformation may be required from facts and dimensions into the formats required by each target system.
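As an illustration only, the following Python sketch (using SQLite, with hypothetical table names fact_sales and dim_customer that are assumptions rather than part of any reference architecture) shows how a fact/dimension join might be flattened into a CSV extract for a downstream consumer:

import csv
import sqlite3

# Hypothetical star schema: a sales fact joined to a conformed customer
# dimension, flattened into the CSV layout a downstream system expects.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, region TEXT);
    CREATE TABLE fact_sales (customer_key INTEGER, sale_date TEXT, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'Acme Corp', 'EMEA'), (2, 'Globex', 'APAC');
    INSERT INTO fact_sales VALUES (1, '2017-03-01', 1200.0), (2, '2017-03-02', 550.0);
""")

# Transform facts and dimensions into the flat format required by the target system.
rows = conn.execute("""
    SELECT s.sale_date, c.customer_name, c.region, s.amount
    FROM fact_sales s JOIN dim_customer c ON c.customer_key = s.customer_key
    ORDER BY s.sale_date
""")

with open("sales_extract.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sale_date", "customer_name", "region", "amount"])
    writer.writerows(rows)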
Below are some best practices related to the development of dimensional modeling ([1], Chapter 10):
—
Use a data profiling tool (or a set of queries that can perform data profiling). A data profiling tool helps uncover the actual state of data quality (which is usually not expressed via database metadata), understand data issues, and sometimes even discover relationships and derivation rules. This is discussed more widely in section 6.2 below. The data profile is an input to the ETL transformations and logic, and guides the data quality effort.
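A minimal sketch of query-based profiling follows, in Python over SQLite; the staging table, its columns, and the chosen metrics (row count, null count, distinct count, min/max) are illustrative assumptions:

import sqlite3

# Profiling sketch: for each column of a hypothetical staging table, report
# row count, null count, distinct count and min/max values. The deliberately
# bad sample rows (duplicate key, missing email, invalid dates) show the kind
# of issues a profile surfaces.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customer (customer_id TEXT, email TEXT, birth_date TEXT);
    INSERT INTO stg_customer VALUES ('C1', 'a@x.com', '1980-02-29'),
                                    ('C2', NULL,      '1975-13-01'),
                                    ('C2', 'b@x.com', NULL);
""")

table = "stg_customer"
columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

for col in columns:
    total, nulls, distinct, lo, hi = conn.execute(f"""
        SELECT COUNT(*),
               SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END),
               COUNT(DISTINCT {col}),
               MIN({col}), MAX({col})
        FROM {table}
    """).fetchone()
    print(f"{col}: rows={total} nulls={nulls} distinct={distinct} min={lo} max={hi}")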
—
Step 5: Perform the Dimension Table Historical (initial) Load. This step of the dimension-table checklist covers the initial population of dimension data, including data cleansing and match/consolidation, especially when populating from different sources (this is developed in section 6.2.4 below); Type 1 / Type 2 transformations; validating relationships; surrogate key generation and assignment; techniques for fast loading; and loading the Date and other special dimensions introduced above.
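A minimal sketch of the initial dimension load is shown below, in Python over SQLite; the dim_customer layout, the AUTOINCREMENT surrogate key, and the Type 2 effective/expiry dating are illustrative assumptions, not a prescribed design:

import sqlite3
from datetime import date

# Initial dimension population with surrogate key assignment and Type 2
# effective/expiry dates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id    TEXT,                               -- natural (source) key
        customer_name  TEXT,
        effective_date TEXT,
        expiry_date    TEXT,
        current_flag   INTEGER
    );
""")

# Cleansed, de-duplicated source rows (match/consolidation already applied).
source_rows = [("C1", "Acme Corp"), ("C2", "Globex")]

today = date.today().isoformat()
conn.executemany(
    """INSERT INTO dim_customer
       (customer_id, customer_name, effective_date, expiry_date, current_flag)
       VALUES (?, ?, ?, '9999-12-31', 1)""",
    [(cid, name, today) for cid, name in source_rows],
)

for row in conn.execute("SELECT * FROM dim_customer"):
    print(row)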
—
Step 6: Perform the Fact Table Historical (initial) Load. Extracting the right range of facts, with the right date filter criteria, is important to start the fact table loading. Audit statistics with counts and sums gathered in these steps are important for validation and verification later. The fact records may contain derived, calculated facts, although in many cases it is more efficient to calculate them in an OLAP cube or a view rather than in the physical table. Loading should be done by disabling constraints, dropping indexes, fast loading, and then re-enabling constraints and re-creating indexes.
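The Python/SQLite sketch below illustrates the drop-indexes / fast-load / re-create-indexes pattern together with gathering audit counts and sums; the schema and index names are assumptions made only for illustration:

import sqlite3

# Historical fact load: drop the index, bulk-insert the extracted range of
# facts, re-create the index, and gather audit statistics for later validation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (date_key INTEGER, customer_key INTEGER, amount REAL);
    CREATE INDEX ix_fact_sales_date ON fact_sales (date_key);
""")

extracted_facts = [(20170101, 1, 100.0), (20170102, 2, 250.0), (20170102, 1, 75.0)]

conn.execute("DROP INDEX ix_fact_sales_date")              # fast-load without index maintenance
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", extracted_facts)
conn.execute("CREATE INDEX ix_fact_sales_date ON fact_sales (date_key)")

# Audit statistics: row count and total amount, to be reconciled against the source.
count, total = conn.execute("SELECT COUNT(*), SUM(amount) FROM fact_sales").fetchone()
print(f"loaded rows={count}, sum(amount)={total}")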
—
Step 7: Dimension Table Incremental Processing has two basic strategies: extract a full load at the source and compute the delta at the target, or extract only new and changed records at the source. The first strategy applies when the source system has no capability to identify new, changed and deleted data; the second strategy should be preferred, as it minimizes data transfer loads and expensive comparison operations. As with initial loads, the dimension data should be cleansed and transformed prior to identifying new and changed rows. Techniques for identifying changed rows for Type 1 and Type 2 SCD processing when the dimension has many rows and columns are presented on page 458.
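One common change-detection technique, a hash computed over the tracked attribute columns and compared against the hash stored with the current dimension row, is sketched below in Python; the column list and hashing scheme are illustrative assumptions:

import hashlib

def row_hash(row, columns):
    """Concatenate the tracked columns and hash them, so wide rows can be
    compared with a single value instead of column by column."""
    payload = "|".join(str(row.get(c, "")) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Hypothetical Type 2 attribute columns to track for changes.
tracked = ["customer_name", "address", "segment"]

# Current dimension row (target) and the newly extracted source row.
target = {"customer_id": "C1", "customer_name": "Acme Corp", "address": "Pune",   "segment": "SMB"}
source = {"customer_id": "C1", "customer_name": "Acme Corp", "address": "Mumbai", "segment": "SMB"}

if row_hash(source, tracked) != row_hash(target, tracked):
    print("changed row: expire the current version and insert a new Type 2 row")
else:
    print("unchanged row: nothing to do")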
—
Step 8: Fact Table Incremental Processing. Automatic auditing, error handling (including persistence in an error event schema and verification of quality measures, as mentioned in section 6.2.4 below), and enabling restarts after catastrophic errors in case the incremental load fails are key in this step. Loading of the transactional fact table should be straightforward; however, loading the accumulating snapshot fact table can be complex. Loading can be improved by trickle load or parallel batch load operations, as suited to the database.
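To make the error-event persistence and restartability concrete, here is a simplified Python/SQLite sketch; the error_event and etl_batch tables and the quality screens are illustrative assumptions, not the schema described in section 6.2.4:

import sqlite3

# Incremental fact load with an error event table and a batch audit row that
# supports restart after a failure: rejected rows are persisted for review,
# accepted rows are loaded, and the batch status marks successful completion.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales  (date_key INTEGER, customer_key INTEGER, amount REAL);
    CREATE TABLE error_event (batch_id INTEGER, record TEXT, reason TEXT);
    CREATE TABLE etl_batch   (batch_id INTEGER PRIMARY KEY, status TEXT);
""")

batch_id = 42
conn.execute("INSERT INTO etl_batch VALUES (?, 'started')", (batch_id,))

incoming = [(20170301, 1, 100.0), (20170301, None, 55.0), (20170302, 2, -10.0)]

for rec in incoming:
    date_key, customer_key, amount = rec
    # Quality screens: reject rows that fail, persisting them for later review.
    if customer_key is None or amount is None or amount < 0:
        conn.execute("INSERT INTO error_event VALUES (?, ?, ?)",
                     (batch_id, repr(rec), "failed quality screen"))
        continue
    conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", rec)

conn.execute("UPDATE etl_batch SET status = 'completed' WHERE batch_id = ?", (batch_id,))
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone())
print(conn.execute("SELECT * FROM error_event").fetchall())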