

—
Step 9: Aggregate Table and OLAP Loads. Aggregation may be managed by technology such as relational materialized views or OLAP cubes, in which case most of the maintenance work is done automatically after the initial load. Incremental refreshing of the aggregates is possible even when they are maintained by ETL code (and you can be confident it is available with OLAP and materialized view technology). However, a major change to a dimension attribute, such as a Type 1 SCD attribute change, may require reprocessing the aggregate table; this is true no matter how the aggregates are implemented, whether by ETL code or through database technology.
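As an illustration, the sketch below shows what such a refresh might look like when the aggregate is maintained by hand-written ETL code rather than database technology. The table and column names (sales_fact, date_dim, agg_sales_monthly) and the generic DB-API connection are assumptions made for the example; the placeholder syntax follows sqlite3 and may differ per driver.

```python
# Sketch: refreshing a monthly sales aggregate after fact loads.
# Table and column names are illustrative, not from any particular product.

def refresh_monthly_aggregate(conn, month_keys=None):
    """Refresh agg_sales_monthly for the given months, or rebuild it entirely.

    A Type 1 change on a dimension attribute used as a grouping column
    rewrites history, so pass month_keys=None to force a full rebuild.
    """
    cur = conn.cursor()
    if month_keys is None:
        # Full reprocess: every aggregated row has to be recomputed.
        cur.execute("DELETE FROM agg_sales_monthly")
        cur.execute("""
            INSERT INTO agg_sales_monthly (month_key, product_key, sales_amount)
            SELECT d.month_key, f.product_key, SUM(f.sales_amount)
            FROM sales_fact f
            JOIN date_dim d ON d.date_key = f.date_key
            GROUP BY d.month_key, f.product_key
        """)
    else:
        # Incremental refresh: recompute only the months touched by this load.
        for mk in month_keys:
            cur.execute("DELETE FROM agg_sales_monthly WHERE month_key = ?", (mk,))
            cur.execute("""
                INSERT INTO agg_sales_monthly (month_key, product_key, sales_amount)
                SELECT d.month_key, f.product_key, SUM(f.sales_amount)
                FROM sales_fact f
                JOIN date_dim d ON d.date_key = f.date_key
                WHERE d.month_key = ?
                GROUP BY d.month_key, f.product_key
            """, (mk,))
    conn.commit()
```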
—
Step 10: ETL System Operation and Automation.
—
Jobs are scheduled for a specific time, or may be triggered based on polling. A clear workflow should be built for steps that must run in order, such as dimension management before fact loading, aggregate refresh after incremental fact processing, and OLAP rebuild last.
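A minimal sketch of such an ordered workflow, using Python's standard graphlib module to resolve step dependencies; the step names and empty step bodies are purely illustrative.

```python
# Sketch of an ordered ETL workflow: each step names the steps it depends on,
# and run_workflow() executes them in a valid order.
from graphlib import TopologicalSorter  # Python 3.9+

def run_workflow(steps, dependencies):
    """steps: {name: callable}; dependencies: {name: {names it depends on}}."""
    for name in TopologicalSorter(dependencies).static_order():
        print(f"running {name}")
        steps[name]()

if __name__ == "__main__":
    # Illustrative step bodies; in a real system these would call the ETL jobs.
    steps = {
        "load_dimensions":    lambda: None,
        "load_facts":         lambda: None,
        "refresh_aggregates": lambda: None,
        "rebuild_olap":       lambda: None,
    }
    dependencies = {
        "load_facts":         {"load_dimensions"},     # dimensions before facts
        "refresh_aggregates": {"load_facts"},          # aggregates after facts
        "rebuild_olap":       {"refresh_aggregates"},  # OLAP rebuild last
    }
    run_workflow(steps, dependencies)
```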
—
Concentrate on ETL testing ([1], p. 455). First, use a small dataset to unit-test both fact table loading and dimension loading before throwing the full set of historic data at it. Direct error records to a file, investigate what went wrong, fix the code, and rerun the test. When your first production load completes with the right record count, don't declare victory too soon: confirm that the data is accurate at the target (again, a profiling tool is your best option here). When the initial loads are confirmed accurate, move on to test the incremental load processes ([1], p. 467). This is often neglected and very unglamorous, but these test scripts make the difference between a successful ETL project and an unsuccessful one.
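A minimal sketch of such a test, assuming a small in-memory SQLite dataset and an illustrative load_fact_increment() function standing in for the real incremental load job:

```python
# Sketch of a unit test for an incremental fact load, run against a small
# in-memory dataset. The assertions check row counts and totals at the target.
import sqlite3
import unittest

def load_fact_increment(conn, batch_id):
    """Illustrative incremental load: move one staged batch into the fact table."""
    conn.execute("""
        INSERT INTO sales_fact (batch_id, product_key, sales_amount)
        SELECT batch_id, product_key, sales_amount
        FROM stage_sales WHERE batch_id = ?
    """, (batch_id,))
    conn.commit()

class IncrementalLoadTest(unittest.TestCase):
    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript("""
            CREATE TABLE stage_sales (batch_id INT, product_key INT, sales_amount REAL);
            CREATE TABLE sales_fact  (batch_id INT, product_key INT, sales_amount REAL);
            INSERT INTO stage_sales VALUES (1, 10, 100.0), (1, 11, 50.0), (2, 10, 25.0);
        """)

    def test_single_batch_counts_and_totals(self):
        load_fact_increment(self.conn, batch_id=1)
        rows, total = self.conn.execute(
            "SELECT COUNT(*), SUM(sales_amount) FROM sales_fact").fetchone()
        self.assertEqual(rows, 2)             # only batch 1 arrives
        self.assertAlmostEqual(total, 150.0)  # no loss, no inflation

if __name__ == "__main__":
    unittest.main()
```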
—
A point that deserves particular attention is how to handle unpredictable errors, and how to recover from them automatically in a transaction-consistent manner, especially since these errors typically occur in a "lights-out" environment. Many ETL products provide tools and features to help you build a system that stays robust under all possible failures, but doing so is more complicated than it seems. Alerts on unpredictable errors should be raised and audited.
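One possible shape for this is sketched below: each step runs inside a transaction, is rolled back on any unexpected error, and raises an audited alert. The send_alert() function is a placeholder for whatever notification channel (email, pager, monitoring API) is actually in use.

```python
# Sketch: run a load step inside a transaction, roll back on any unexpected
# error, and raise an audited alert so the failure is visible the next morning.
import logging

log = logging.getLogger("etl")

def send_alert(message):
    # Placeholder: hook this to email/pager/monitoring in a real system.
    log.error("ALERT: %s", message)

def run_step_transactionally(conn, step_name, step_fn):
    """Execute step_fn(conn); commit on success, roll back and alert on failure."""
    try:
        step_fn(conn)
        conn.commit()
        log.info("step %s committed", step_name)
        return True
    except Exception as exc:   # unpredictable error in a lights-out run
        conn.rollback()        # leave the warehouse transaction-consistent
        send_alert(f"step {step_name} failed and was rolled back: {exc}")
        return False
```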
—
If an unpredictable error occurs and the ETL is re-run, the re-run should not result in any data loss or inflation; i.e., the ETL queries or workflows must be coded to handle this. This is very important because the DW is supposed to provide a single version of the truth.
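One common way to achieve this is to make each load idempotent, for example by keying every run on a batch identifier and deleting any rows left by a previous partial run before reloading, as sketched below. Table and column names are illustrative, placeholder syntax follows sqlite3, and the same effect can be obtained with a MERGE/upsert.

```python
# Sketch of an idempotent fact load: every run first removes any rows already
# written for this batch, so a re-run after a failure cannot lose or duplicate data.

def load_batch_idempotently(conn, batch_id):
    cur = conn.cursor()
    # Remove anything a previous, partially successful run may have left behind.
    cur.execute("DELETE FROM sales_fact WHERE batch_id = ?", (batch_id,))
    # Reload the whole batch from staging.
    cur.execute("""
        INSERT INTO sales_fact (batch_id, product_key, sales_amount)
        SELECT batch_id, product_key, sales_amount
        FROM stage_sales WHERE batch_id = ?
    """, (batch_id,))
    # Commit once, so the delete and the reload succeed or fail together
    # (assuming autocommit is off, which is the DB-API default).
    conn.commit()
```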
—
There are several supporting database tasks necessary for ETL operations, e.g., tablespace management, periodic index maintenance, and database backups. These tasks should be automated within the ETL environment to allow for end-to-end operations testing (see next section), preferably by using database system metadata.
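As an illustration, the sketch below drives index maintenance from the database catalog rather than from a hand-maintained list. It assumes a PostgreSQL-style connection (psycopg2 placeholders, the pg_indexes catalog view, REINDEX syntax); other platforms expose equivalent system metadata under different names.

```python
# Sketch: rebuild all indexes in a warehouse schema, discovering them from the
# database's own catalog instead of a hard-coded list.

def reindex_warehouse_schema(conn, schema="warehouse"):
    cur = conn.cursor()
    cur.execute(
        "SELECT schemaname, indexname FROM pg_indexes WHERE schemaname = %s",
        (schema,),
    )
    for schemaname, indexname in cur.fetchall():
        # Identifiers come from the catalog itself, so simple quoting suffices here.
        cur.execute(f'REINDEX INDEX "{schemaname}"."{indexname}"')
    conn.commit()
```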
—
Finally, strive to develop and test an operation for the ETL processes that is as fully automated as possible: the ideal ETL operation runs the regular load processes in a lights-out manner, without human intervention. More on this in the section below.
5.3.6 Dimension Data Modeling - Deployment Considerations
The deployment stage should go through the traditional rigors of alpha and beta testing prior to general availability and
rollout (and this is true of any subsequent delivery after the initial release of the data warehouse).