

6.2 Best Practices
6.2.1 Data quality at project definition stage
At the time an analytics initiative is being evaluated to become a project, the first recommended best practice is to perform a readiness check of your organization, to determine whether the project has a chance of succeeding.
In [1], p. 16, Kimball convincingly points out that potential issues with the data needed to support the business motivation for the initiative are the single technical feasibility factor that may be a deal breaker for the readiness of an organization to launch the project (the other two readiness factors being strong business sponsorship and a compelling business reason for the project).
It is important to say that Kimball's strong stance assumes all along that analytics projects rely on a well-defined, well-structured data warehouse (or a data mart, which conceptually is a subset of a warehouse) on which to build business intelligence applications. This is not the case with data lakes, where the primary motivation is not to lose any data related to a business process, as it might be important for analysis further down the road. We will further develop this topic in section 6.3.3.2 below.

Readiness to proceed on this data feasibility aspect then translates into readiness of the candidate data sources.
Another best practice at this stage is to perform a quick assessment to disqualify early a candidate data source from the quality point of view ([1], p. 16). There are several causes for disqualifying a data source: (i) the required data is not yet collected, (ii) it is collected but not at the level of detail required, or (iii) it has severe data quality problems (data values are incomplete, inaccurate, inconsistent, duplicated, obsolete, etc.).
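A quick assessment of this kind can often be scripted against a sample extract of the candidate source. The sketch below, written with pandas, illustrates the spirit of the check for causes (i) and (iii): it flags missing columns, incomplete values, and duplicated rows. The column names and thresholds are illustrative assumptions, not values from [1].

    # A minimal sketch of a "quick disqualification" check for a candidate
    # data source, assuming it can be loaded into a pandas DataFrame.
    # Column names and thresholds are illustrative, not prescriptive.
    import pandas as pd

    def quick_quality_check(df: pd.DataFrame,
                            required_columns: list[str],
                            max_null_ratio: float = 0.05,
                            max_duplicate_ratio: float = 0.01) -> list[str]:
        """Return a list of findings that could disqualify the source."""
        findings = []

        # (i) Required data not collected at all: expected columns are missing.
        missing = [c for c in required_columns if c not in df.columns]
        if missing:
            findings.append(f"missing columns: {missing}")

        # (iii) Severe quality problems: incompleteness and duplication.
        for col in df.columns:
            null_ratio = df[col].isna().mean()
            if null_ratio > max_null_ratio:
                findings.append(f"{col}: {null_ratio:.1%} null values")

        dup_ratio = df.duplicated().mean()
        if dup_ratio > max_duplicate_ratio:
            findings.append(f"{dup_ratio:.1%} fully duplicated rows")

        return findings

    # Illustrative use with a toy extract of a hypothetical 'orders' source.
    orders = pd.DataFrame({
        "order_id": [1, 2, 2, 4],
        "customer_name": ["Acme", None, None, "Globex"],
        "amount": [100.0, 250.0, 250.0, None],
    })
    print(quick_quality_check(orders, ["order_id", "customer_name", "amount", "order_date"]))

Such a script does not replace a proper data profiling effort; its purpose is only to surface deal-breaking gaps early, before the project is committed.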
Depending on the seriousness of the issues, it may be a challenge to construct a data warehouse with the available data sources. If the issues are very serious and many data sources are disqualified, it may be wiser to defer the project until the IT department closes the data feasibility gaps, and to consider another business initiative with fewer data feasibility problems.
When the project is being launched and the core team is lined up, in addition to the classic roles of project manager, data architect, ETL developer, BI developer, etc., a best practice we already mentioned is to make sure there is a data steward ([1], p. 35). He/she should work with business experts to address data quality issues and make sure that IT developers understand them if they are to implement the corresponding validation and cleansing rules. (This actually depends on the available technology, which could be difficult for data stewards to use, or not; more on this in section 6.3.4 below.)
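To make this hand-off concrete, the sketch below shows one way a validation rule and a cleansing rule specified by a data steward could be expressed in code. The field names and the specific rules are hypothetical illustrations, not rules taken from [1].

    # A minimal sketch of a steward-specified validation rule and cleansing
    # rule; the fields and rules are illustrative assumptions.
    import pandas as pd

    # Validation rule: flag rows that violate the steward's definition of a
    # valid record (here: non-empty customer name and a non-negative amount).
    def validate(df: pd.DataFrame) -> pd.Series:
        name_ok = df["customer_name"].fillna("").str.strip() != ""
        amount_ok = df["amount"].fillna(-1) >= 0
        return name_ok & amount_ok   # True where the row passes validation

    # Cleansing rule: normalize values before loading (trim whitespace,
    # standardize case so "ACME corp " and "Acme Corp" compare equal).
    def cleanse(df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()
        out["customer_name"] = out["customer_name"].str.strip().str.title()
        return out

    records = pd.DataFrame({"customer_name": ["  ACME corp", "Globex ", None],
                            "amount": [120.0, -5.0, 30.0]})
    clean = cleanse(records)
    print(clean[validate(clean)])   # only rows passing the validation rule

The point of writing rules this explicitly is that the steward can review them against the business definition, while the developer owns where in the ETL flow they are applied.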
As [1] points out (p. 125), master data managed by MDM systems are starting to get traction because of their support for transactional systems: they are positioned to be the authoritative source in an enterprise to reconcile different sources for the same attribute, such as customer name or product price. Errors in master data can have significant costs (e.g., an incorrectly priced product may mean that money is lost); MDM systems fix this kind of problem. Our experience on the subject (see section 6.3.1) is aligned with this view, and MDM systems are great news for analytics projects: they make the integration problem much simpler by solving it in the source systems that created the problem.
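The reconciliation role of master data can be illustrated with a small sketch: once an MDM hub holds the golden record for a customer, integrating two systems that disagree on the customer name reduces to a lookup against that record. The identifiers and attribute names below are hypothetical.

    # A minimal sketch of reconciling the same attribute from two systems
    # against a master (golden) record; names and ids are hypothetical.
    import pandas as pd

    # The same customer appears with different names in two source systems.
    crm = pd.DataFrame({"customer_id": [17], "customer_name": ["Acme Corp."]})
    billing = pd.DataFrame({"customer_id": [17], "customer_name": ["ACME Corporation"]})

    # The MDM hub holds the authoritative record for each customer.
    master = pd.DataFrame({"customer_id": [17], "customer_name": ["Acme Corporation"]})

    # Integration becomes a lookup against the golden record rather than a
    # fuzzy reconciliation between the two conflicting source values.
    combined = pd.concat([crm, billing]).drop(columns="customer_name")
    resolved = combined.drop_duplicates().merge(master, on="customer_id", how="left")
    print(resolved)   # one row, carrying the authoritative customer_name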
If the organization has a serious customer, vendor, or product integration problem, the recommendation is to start lobbying for a master data management (MDM) system rather than continuously trying to fix the problem in the EDW ETL system.