

6.2 Best Practices
6.2.1 Data quality at project definition stage
At the time an analytics initiative is being evaluated to become a project, the first recommended best practice is to perform a readiness check of your organization, to determine whether the project has a chance of succeeding.
In [1], p. 16, Kimball convincingly points out that potential issues with the data needed to support the business motivation for the initiative are the single technical feasibility factor that may be a deal breaker for the readiness of an organization to launch the project (the other two readiness factors being strong business sponsorship and a compelling business reason for the project).
It is important to say that Kimball's strong stance assumes all along that analytics projects rely on a well-defined, well-structured data warehouse (or a data mart, which conceptually is a subset of a warehouse) on which to build business intelligence applications. This is not the case with data lakes, where the primary motivation is not to lose any data related to a business process, as it might be important for analysis further down the road. We will further develop this topic in section 6.3.3.2 below.

Readiness to proceed on this data feasibility aspect then translates into readiness of the candidate data sources.
Another best practice at this stage is to perform a quick assessment to disqualify early a candidate data source from the quality point of view ([1], p. 16). There are several causes for disqualifying a data source: (i) the required data is not yet collected, (ii) it is collected but not at the level of detail required, or (iii) it has severe data quality problems (data values are incomplete, inaccurate, inconsistent, duplicated, obsolete, etc.).
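A quick assessment of this kind can often be scripted against a sample extract of the candidate source. The sketch below, written with pandas, illustrates the spirit of the check for causes (i) and (iii): it flags missing columns, incomplete values, and duplicated rows. The column names and thresholds are illustrative assumptions, not values from [1].

    # A minimal sketch of a "quick disqualification" check for a candidate
    # data source, assuming it can be loaded into a pandas DataFrame.
    # Column names and thresholds are illustrative, not prescriptive.
    import pandas as pd

    def quick_quality_check(df: pd.DataFrame,
                            required_columns: list[str],
                            max_null_ratio: float = 0.05,
                            max_duplicate_ratio: float = 0.01) -> list[str]:
        """Return a list of findings that could disqualify the source."""
        findings = []

        # (i) Required data not collected at all: expected columns are missing.
        missing = [c for c in required_columns if c not in df.columns]
        if missing:
            findings.append(f"missing columns: {missing}")

        # (iii) Severe quality problems: incompleteness and duplication.
        for col in df.columns:
            null_ratio = df[col].isna().mean()
            if null_ratio > max_null_ratio:
                findings.append(f"{col}: {null_ratio:.1%} null values")

        dup_ratio = df.duplicated().mean()
        if dup_ratio > max_duplicate_ratio:
            findings.append(f"{dup_ratio:.1%} fully duplicated rows")

        return findings

    # Illustrative use with a toy extract of a hypothetical 'orders' source.
    orders = pd.DataFrame({
        "order_id": [1, 2, 2, 4],
        "customer_name": ["Acme", None, None, "Globex"],
        "amount": [100.0, 250.0, 250.0, None],
    })
    print(quick_quality_check(orders, ["order_id", "customer_name", "amount", "order_date"]))

Such a script does not replace a proper data profiling effort; its purpose is only to surface deal-breaking gaps early, before the project is committed.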
Depending on the seriousness of the issues, it may be a challenge to construct a data warehouse with the available data sources. If the issues are very serious and many data sources are disqualified, it may be wiser to defer the project until the IT department closes the data feasibility gaps, and to consider another business initiative with fewer data feasibility problems.
When the project is being launched and the core team is lined up, in addition to the classic roles of project manager, data architect, ETL developer, BI developer, etc., a best practice we already mentioned is to make sure there is a data steward ([1], p. 35). He/she should work with business experts to address data quality issues and make sure that IT developers understand them if they are to implement the corresponding validation and cleansing rules. (This actually depends on the available technology, which could be difficult for data stewards to use, or not; more on this in section 6.3.4 below.)
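To make this hand-off concrete, the sketch below shows one way a validation rule and a cleansing rule specified by a data steward could be expressed in code. The field names and the specific rules are hypothetical illustrations, not rules taken from [1].

    # A minimal sketch of a steward-specified validation rule and cleansing
    # rule; the fields and rules are illustrative assumptions.
    import pandas as pd

    # Validation rule: flag rows that violate the steward's definition of a
    # valid record (here: non-empty customer name and a non-negative amount).
    def validate(df: pd.DataFrame) -> pd.Series:
        name_ok = df["customer_name"].fillna("").str.strip() != ""
        amount_ok = df["amount"].fillna(-1) >= 0
        return name_ok & amount_ok   # True where the row passes validation

    # Cleansing rule: normalize values before loading (trim whitespace,
    # standardize case so "ACME corp " and "Acme Corp" compare equal).
    def cleanse(df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()
        out["customer_name"] = out["customer_name"].str.strip().str.title()
        return out

    records = pd.DataFrame({"customer_name": ["  ACME corp", "Globex ", None],
                            "amount": [120.0, -5.0, 30.0]})
    clean = cleanse(records)
    print(clean[validate(clean)])   # only rows passing the validation rule

The point of writing rules this explicitly is that the steward can review them against the business definition, while the developer owns where in the ETL flow they are applied.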
As [1] points out (p. 125), master data managed by MDM systems are starting to get traction because of their support for transactional systems: they are positioned to be the authoritative source in an enterprise to reconcile different sources for the same attribute, such as customer name or product price. Errors in master data can have significant costs (e.g., an incorrectly priced product may mean that money is lost); MDM systems fix this kind of problem. Our experience on the subject (see section 6.3.1) is aligned with this view, and MDM systems are great news for analytics projects: they make the integration problem much simpler by solving it in the source systems that created the problem.
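The reconciliation role of master data can be illustrated with a small sketch: once an MDM hub holds the golden record for a customer, integrating two systems that disagree on the customer name reduces to a lookup against that record. The identifiers and attribute names below are hypothetical.

    # A minimal sketch of reconciling the same attribute from two systems
    # against a master (golden) record; names and ids are hypothetical.
    import pandas as pd

    # The same customer appears with different names in two source systems.
    crm = pd.DataFrame({"customer_id": [17], "customer_name": ["Acme Corp."]})
    billing = pd.DataFrame({"customer_id": [17], "customer_name": ["ACME Corporation"]})

    # The MDM hub holds the authoritative record for each customer.
    master = pd.DataFrame({"customer_id": [17], "customer_name": ["Acme Corporation"]})

    # Integration becomes a lookup against the golden record rather than a
    # fuzzy reconciliation between the two conflicting source values.
    combined = pd.concat([crm, billing]).drop(columns="customer_name")
    resolved = combined.drop_duplicates().merge(master, on="customer_id", how="left")
    print(resolved)   # one row, carrying the authoritative customer_name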
If the organization has a serious customer, vendor, or product integration problem, the recommendation is to start lobbying for a master data management (MDM) system rather than continuously trying to fix the problem in the EDW ETL system.