System deployment in production starts by passing the complete set of tests on a test system that is as similar to the production system as possible. Then, move the software artifacts from the test repository to the production repository. The more automated and parameterized this move can be, the better, and the deployment move itself is something else to test. Finally, run an automated test on the production system before letting people in. Deployment is much harder on a system that is already in production. Pay attention to the chain of dependencies: reports depend on views, which depend on data warehouse tables, which depend on ETL data pipelines, which depend on data sources. Changing any of these may break something down the line, so a system that maintains dependencies and allows impacts to be dealt with individually is critical.
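As an illustration of such dependency tracking, the following minimal Python sketch (our own illustration, not from the reference publication; all artifact names are hypothetical) inverts a dependency map and walks it to find everything that may break when an artifact changes:

    # Each artifact lists the artifacts it depends on, mirroring the
    # report -> view -> table -> ETL pipeline -> source chain above.
    from collections import defaultdict

    DEPENDS_ON = {
        "sales_report":          ["sales_view"],
        "sales_view":            ["fact_sales", "dim_customer"],
        "fact_sales":            ["etl_sales_pipeline"],
        "dim_customer":          ["etl_customer_pipeline"],
        "etl_sales_pipeline":    ["crm_source"],
        "etl_customer_pipeline": ["crm_source"],
    }

    def impacted_by(changed):
        """Return every downstream artifact that may break if `changed` changes."""
        # Invert the map: for each artifact, who depends on it?
        dependents = defaultdict(set)
        for artifact, deps in DEPENDS_ON.items():
            for dep in deps:
                dependents[dep].add(artifact)
        # Walk the inverted graph outward from the changed artifact.
        impacted, frontier = set(), [changed]
        while frontier:
            node = frontier.pop()
            for d in dependents[node]:
                if d not in impacted:
                    impacted.add(d)
                    frontier.append(d)
        return impacted

    print(sorted(impacted_by("crm_source")))
    # ['dim_customer', 'etl_customer_pipeline', 'etl_sales_pipeline',
    #  'fact_sales', 'sales_report', 'sales_view']

A change to the source flags every downstream pipeline, table, view, and report, so each impact can be reviewed and redeployed individually.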
Other deployment considerations for dimensional data warehouses are the following:
— From a dimension data perspective, having a single deployment platform for MDM, identity data, Customer 360, and BI/Reporting is the recommended approach.
— Storage management is another key component of deployment, as the growth of data under analysis can affect performance (response times) and may force storage reconfigurations.
— The deployment should facilitate behind-the-scenes housekeeping: ETL metadata, availability, BCP/DR (Business Continuity Planning/Disaster Recovery), performance, security, archival, and staging management.
5.4 Enhancements to the reference publication
5.4.1 Dimension Modeling in our own experience
— A high-level model diagram is important for anchoring discussion. The creation of a high-level model diagram (Kimball, figure 7-3) helps all stakeholders, including business users, in discussions. Diagrams [1] usually elicit better feedback and discussion. This is even more important with distributed teams. The visual anchoring of the conversations, as well as the visual recall provided by the diagram, is of great value. To this effect, the advice about not radically altering the model (or even the relative positions within the high-level diagram) is very relevant.
— Automatic Change Data Capture (CDC) mechanisms at the source are key for keeping the data warehouse up to date in the presence of hard deletes at the source. Indeed, if the source deletes rows physically, only database logs, or database triggers that generate data as a side effect of the deletion, with the correct operation code (DELETE), will provide the ETL layer with the information necessary to keep the target database up to date without having to compare full table contents between source and target during incremental loads, which would be a performance killer when tables are large. Also, as further discussed in the point below, CDC mechanisms based on database transaction logs or triggers can be used for pushing changes in real time to the data warehouse (as opposed to pulling them through queries); a minimal sketch of applying such CDC events appears after this list. Section 7.2.4 compares these mechanisms from a performance point of view.
— Creation of a “data highway” is often needed. Kimball highlights the need for data traveling at various speeds / in different lanes (a lane-routing sketch also follows this list):
  — Raw source (immediate) – CEP, alerts, fraud
  — Real time (seconds) – ad selection, sports, stock market, IoT, systems monitoring
  — Business Activity (minutes) – trouble tickets, workflows, mobile app dashboards
  — Top Line (24 hours) – tactical reporting, overview status dashboards
  — EDW (daily, periodic, yearly) – historical analysis, analytics, all reporting
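To make the CDC point above concrete, here is a minimal Python sketch (our illustration; the event format and table are hypothetical, not a specific product’s API) that applies log- or trigger-based change events, including hard deletes, to a warehouse-side table:

    # Hypothetical CDC events, as a transaction log reader or trigger
    # might emit them, each carrying an operation code.
    events = [
        {"op": "INSERT", "key": 101, "row": {"name": "Acme", "tier": "gold"}},
        {"op": "UPDATE", "key": 101, "row": {"name": "Acme", "tier": "silver"}},
        {"op": "DELETE", "key": 101, "row": None},
    ]

    target = {}  # stands in for the warehouse table, keyed by natural key

    for event in events:
        if event["op"] in ("INSERT", "UPDATE"):
            target[event["key"]] = event["row"]
        elif event["op"] == "DELETE":
            # Without the DELETE opcode from the log or trigger, this row
            # would linger until a costly full source-vs-target comparison.
            target.pop(event["key"], None)

    print(target)  # {} : the hard delete propagated incrementally

Because the events arrive as a stream, they can also be pushed to the warehouse in real time rather than pulled by periodic queries.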
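Similarly, here is a minimal sketch (lane names and latency bounds are illustrative assumptions, not Kimball’s figures) of routing a record onto the slowest, and therefore cheapest, data highway lane that still meets its latency requirement:

    # Lanes ordered fastest to slowest, with an assumed typical
    # delivery latency in seconds.
    LANES = [
        ("raw-source",        0),           # CEP, alerts, fraud
        ("real-time",         5),           # ad selection, IoT, monitoring
        ("business-activity", 300),         # trouble tickets, workflows
        ("top-line",          86_400),      # tactical reporting
        ("edw",               31_536_000),  # periodic/yearly historical loads
    ]

    def pick_lane(required_latency_s):
        """Slowest lane whose delivery latency meets the requirement."""
        for name, latency in reversed(LANES):
            if latency <= required_latency_s:
                return name
        return LANES[0][0]  # defensive fallback: the immediate lane

    print(pick_lane(60))      # real-time
    print(pick_lane(86_400))  # top-line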