

Finally, it is highly recommended to influence management toward establishing a corporate data governance program ([1], p. 56) committed to a continuous data quality improvement process that transcends departmental-level organizations and works at the enterprise level. Technology is an enabler, but it does not by itself fix quality problems in an organization. In [1], page 381, Kimball proposes an interesting nine-step data governance program template for any organization that wants to address and build data quality as part of its culture (he calls it information governance, but the two terms are basically synonymous). Michael Hammer, in his famous reengineering book [19], points to several case studies where improvements in information technology, and in particular in the quality of data involved in key business processes, were credited as an essential enabler of spectacular gains in productivity in well-known corporations (in his own words: “seemingly small data quality issues are, in reality, important indications of broken business processes”).
6.2.2 Data Quality at Requirements Definition Stage
At this stage, it is recommended to have the data steward take a first “dig into the data” ([1], pp. 95, 99) to better understand the underlying data sources, starting with the primary data source for the project at hand. It is suggested to talk to the owners of the core operational system of the project, as well as to the database administrator and the data modeler. The goal of this data audit at this early stage is to perform a strategic, light assessment of the data to determine its suitability for inclusion in the data warehouse and provide an early go/no-go decision. It is far better to disqualify a data source at this juncture, even if the consequence is a major disappointment for your business sponsor and users, than to come to this realization during the ETL development effort. This exploration journey will be greatly facilitated by a profiling tool, rather than hand coding all your queries (e.g., SELECT DISTINCT on a database column). Profiling should continue as requirements are being gathered.
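To make the hand-coded alternative concrete, here is a minimal profiling sketch in Python. The database file, table, and column names are illustrative assumptions, not part of the original text; a dedicated profiling tool would produce these statistics (and many more) without this manual effort.

```python
import sqlite3

# Hypothetical source database, table, and columns (assumptions for this sketch).
DB_FILE = "source_system.db"
TABLE = "customers"
COLUMNS = ["customer_id", "email", "country_code", "birth_date"]

conn = sqlite3.connect(DB_FILE)
for col in COLUMNS:
    # Row count, null count, and distinct count per column: the most basic
    # profiling statistics (fill level and cardinality).
    total, nulls, distinct = conn.execute(
        f"SELECT COUNT(*), "
        f"SUM(CASE WHEN {col} IS NULL THEN 1 ELSE 0 END), "
        f"COUNT(DISTINCT {col}) FROM {TABLE}"
    ).fetchone()
    print(f"{col}: rows={total}, nulls={nulls}, distinct={distinct}")
conn.close()
```

A column whose distinct count equals the row count is a candidate key; a column with a high null count may be unsuitable for the warehouse without remediation at the source.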
6.2.3 Data Quality at Design Stage
The data quality related activities at this stage are still driven by a deeper, tactical profiling effort of the formal data sources, i.e., those maintained by IT, and of the informal sources, coming from the lines of business or external to the organization. The first step in this process is to understand all the sources that are candidates for populating the target model; the second is to evaluate each of them and determine its quality ([1], p. 307). The outcome ([1], p. 308) includes the following:
— A basic go/no-go decision for each data source
— Data quality issues that must be corrected at the source systems before the project can proceed
— Data quality issues that can be corrected in the ETL processing flow after extraction, sorting out standardization, validation, cleansing, and matching needs
— Unanticipated business rules, hierarchical structures, and foreign key/primary key relationships (see the sketch below)
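As an illustration of the last outcome, the following sketch shows how a tactical profiling pass can surface a broken (or unanticipated) foreign key/primary key relationship between two extracted tables. The staging database and table names are assumptions for the example.

```python
import sqlite3

# Hypothetical staging database holding two extracted source tables
# (orders, customers); all names are assumptions for this sketch.
conn = sqlite3.connect("staging.db")

# Suspected foreign key / primary key relationship: does every order
# reference an existing customer?
orphans = conn.execute(
    """
    SELECT COUNT(*)
    FROM orders o
    LEFT JOIN customers c ON o.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
    """
).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(f"orphaned order rows: {orphans} of {total}")

# A non-zero orphan count is a data quality issue to correct at the source,
# or to handle in the ETL flow (e.g., by mapping orphans to an
# "unknown customer" default dimension row).
conn.close()
```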
From this analysis, a decision needs to be taken regarding the best source to populate the dimensional model. The criteria for choosing between two or more possible feeds for the data include accessibility and data accuracy, as explained on page 308 of [1]. Data stewards should also strive to obtain and validate optional data, which source system and data owners are happy to leave unfilled ([1], p. 321).
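A simple sketch of validating such an optional field follows. The table, the email column, and the deliberately naive format check are illustrative assumptions only; production validation rules would be stricter.

```python
import re
import sqlite3

# Hypothetical optional field on a source table (assumptions for this sketch).
# The regex is deliberately naive; real email validation is more involved.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

conn = sqlite3.connect("source_system.db")
rows = [r[0] for r in conn.execute("SELECT email FROM customers")]
conn.close()

filled = [v for v in rows if v is not None and v.strip()]
valid = [v for v in filled if EMAIL_RE.match(v)]

# Fill rate shows how often the optional field is populated at all;
# validity rate shows how trustworthy the populated values are.
print(f"fill rate:  {len(filled)} of {len(rows)} rows")
print(f"valid rate: {len(valid)} of {len(filled)} filled values")
```

Low fill or validity rates are concrete evidence the data steward can bring to source system owners when negotiating for better capture of optional data.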