

WHITE PAPER
© 2017 Persistent Systems Ltd. All rights reserved.
www.persistent.com
3.3 Self-service tools and agility
Data integration continues to be a primarily IT-centric activity, given the data, database, and technology know-how it requires. Typically, data integration platforms are purchased, managed, and used by the IT groups responsible for BI, data warehousing, master data management, and other data management programs.
On the other hand, business unit and department heads want faster, better, cheaper information processing. Analysts need data to answer an increasing number of business questions, and frequently the data they have at hand (LOB Excel files, or corporate reports and dashboards) is not sufficient. They have to rely on IT data architects to:
(i) Ingest and integrate new, disparate datasets into the warehouse,
(ii) Curate new datasets from existing warehouse data; in other words, transform existing tables or views, either virtually (through views) or in a materialized way (through cleansing/integration transforms), into the new layouts requested by the analyst, and/or
(iii) Build reports from existing datasets in the warehouse for the analysts.
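Task (ii) distinguishes virtual curation (a view, recomputed on every read) from materialized curation (the transform persisted as a table). A minimal sketch of the difference using SQLite from Python; the table and column names are purely illustrative:

```python
import sqlite3

# In-memory stand-in for a warehouse, with an illustrative raw sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA ", 100.0), ("emea", 50.0), ("APAC", 75.0)])

# Virtual curation: a view reshapes the data on every read;
# here it standardizes region codes and aggregates amounts.
conn.execute("""
    CREATE VIEW sales_by_region AS
    SELECT UPPER(TRIM(region)) AS region, SUM(amount) AS total
    FROM sales
    GROUP BY UPPER(TRIM(region))
""")

# Materialized curation: the same transform persisted as a table,
# trading freshness for read performance.
conn.execute("""
    CREATE TABLE sales_by_region_mat AS
    SELECT region, total FROM sales_by_region
""")

rows = conn.execute(
    "SELECT * FROM sales_by_region_mat ORDER BY region").fetchall()
print(rows)
# [('APAC', 75.0), ('EMEA', 150.0)]
```

The view always reflects the current contents of `sales`; the materialized table must be refreshed when the source changes, which is exactly the kind of pipeline work point (ii) places on the IT architect.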
The IT architects are already short on capacity to manage an increasingly complex information technology landscape for the enterprise, and cannot keep up with these (generally unplanned) analyst requests, creating a bottleneck.
Today, we live in the era of self-service, and this applies to information as well. Business units and departments are eager to process information and run applications themselves, with little or no IT assistance or intervention, to get around this high-latency path to the data they need.
The first category of self-service tools to appear in the market was self-service, interactive data visualization and discovery tools. They provide BI capabilities suitable for workgroup or personal use, where the key criteria are ease of use, flexibility, and control over how to model and build content without having to go to IT. This has been the fastest-growing area of BI in recent years, and it is not uncommon to see such tools in use in a significant portion of PSL's customer engagements.
However, this category of tools addresses only point (iii) in the list of IT architect tasks above, leaving the other two untouched: finding and preparing the data for analysis, and curating a dataset that an analyst can use, which is still roughly 80% of the work. Self-service visualization alone therefore does not go far. Business users and analysts are now demanding self-service capabilities beyond data discovery and interactive visualization of IT-curated datasets, including access to sophisticated data integration tools to prepare data for analysis.
What is meant by “data preparation” in this setting is the ability, through a data-driven experience, for the user:
— To profile the data and visually highlight its structure, distribution, anomalies, and repetitive patterns, thanks to data mining and machine learning algorithms,
— To standardize, cleanse, and de-duplicate the data, again thanks in part to the same technology,
— To enrich it with data that is functionally dependent on the data at hand (for instance, to determine missing postal codes from the available address data),
— To combine it with other datasets,