

WHITE PAPER
© 2017 Persistent Systems Ltd. All rights reserved.
www.persistent.com
3.3 Self-service tools and agility
Data integration continues to be a primarily IT-centric activity, given the data, database, and technology know-how it requires. Typically, data integration platforms are purchased, managed, and used by the IT groups responsible for BI, data warehousing, master data management, and other data management programs.
On the other hand, business unit and department heads want faster, better, cheaper information processing. Analysts need data to answer an increasing number of business questions, and frequently the data they have at hand (LOB Excel files, or corporate reports and dashboards) is not sufficient. They have to rely on IT data architects to:
(i) Ingest and integrate new, disparate datasets into the warehouse,
(ii) Curate new datasets from existing warehouse data; in other words, transform existing tables or views, either virtually (through views) or in a materialized way (through cleansing/integration transforms), into the new layouts requested by the analyst, and/or
(iii) Build reports from existing datasets in the warehouse for the analysts.
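Task (ii) distinguishes virtual curation (a view, recomputed on every read) from materialized curation (the transform persisted as a table). A minimal sketch of the difference using SQLite from Python; the table and column names are purely illustrative:

```python
import sqlite3

# In-memory stand-in for a warehouse, with an illustrative raw sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA ", 100.0), ("emea", 50.0), ("APAC", 75.0)])

# Virtual curation: a view reshapes the data on every read;
# here it standardizes region codes and aggregates amounts.
conn.execute("""
    CREATE VIEW sales_by_region AS
    SELECT UPPER(TRIM(region)) AS region, SUM(amount) AS total
    FROM sales
    GROUP BY UPPER(TRIM(region))
""")

# Materialized curation: the same transform persisted as a table,
# trading freshness for read performance.
conn.execute("""
    CREATE TABLE sales_by_region_mat AS
    SELECT region, total FROM sales_by_region
""")

rows = conn.execute(
    "SELECT * FROM sales_by_region_mat ORDER BY region").fetchall()
print(rows)
# [('APAC', 75.0), ('EMEA', 150.0)]
```

The view always reflects the current contents of `sales`; the materialized table must be refreshed when the source changes, which is exactly the kind of pipeline work point (ii) places on the IT architect.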
The IT architects are already short on capacity to manage an increasingly complex information technology landscape for the enterprise, and cannot keep up with these (generally unplanned) analyst requests, creating a bottleneck.
Today, we live in the era of self-service, and this applies to information as well. Business units and departments are eager to process information and run applications themselves, with little or no IT assistance or intervention, to get around this high-latency path to the data they need.
The first category of self-service tools to appear in the market was self-service, interactive data visualization and discovery tools. They provide BI capabilities suitable for workgroup or personal use, where the key criteria are ease of use, flexibility, and control over how to model and build content without having to go to IT. This has been the fastest-growing area of BI in recent years, and it is not uncommon to see such tools in use in a significant portion of PSL's customer engagements.
However, this category of tools addresses only point (iii) in the list of IT architect tasks above, leaving the other two untouched: finding and preparing the data for analysis, and curating a dataset that an analyst can use, which is still roughly 80% of the work. Self-service visualization alone therefore does not go far. Business users and analysts are now demanding self-service capabilities beyond data discovery and interactive visualization of IT-curated datasets, including access to sophisticated data integration tools to prepare data for analysis.
What is meant by “data preparation” in this setting is the ability, through a data-driven experience, for the user:
— To profile the data and visually highlight its structure, distribution, anomalies, and repetitive patterns, thanks to data mining and machine learning algorithms,
— To standardize, cleanse, and de-duplicate the data, again thanks in part to the same technology,
— To enrich it with data that is functionally dependent on the data at hand (for instance, to determine missing postal codes from the available address data),
— To combine it with other datasets,