
W H I T E P A P E R

www.persistent.com

© 2017 Persistent Systems Ltd. All rights reserved.


e. End user skills – A growing demand for easy-to-use tools that access trusted data in the cloud has created a shift in the BI market towards governed self-service. Organizations can now broaden access to analytic insight and remain competitive without requiring their business users to improve their technical skills: business users can analyze data without necessarily having to write SQL queries, much as they did with Excel, but now through more powerful tools such as PowerBI, Tableau, and Qlik Sense. At the other end, traditional BI product suites require dedicated IT resources with developer skills, as they are more complex to implement; as mentioned above, most are available as SaaS services and can also be used in single-tenant mode, installed from the cloud provider marketplace directly on top of their infrastructure. The platform also needs to support a new breed of users, data scientists, who run experiments with the data, develop predictive analytics and ML models, and assist in real-time decision-making.

4.5 Data movement / ETL tools

Once you have decided on a data model for your cloud database, you also need to decide how to transform and load data from one or multiple sources into it. Integration and data movement were identified in [1] as the second leading obstacle to cloud adoption, after security, which points to the critical importance of full-featured data integration tools for the cloud. For this reason, you may want to evaluate the available data movement tools before making the final choice of cloud platform provider.

a. Data Integration and Data quality – Data needs to be integrated and processed for quality either when it is written to the cloud data warehouse schema or to a simpler NoSQL schema, or at a later point in time, in a data lake. Make sure your transformation needs are covered, whatever your data requirements might be. Possible pitfalls of PaaS data movement / ETL include (i) reusing legacy transforms: this is generally not supported, as the tool that was used on premises is not the same as the tool retained for the cloud (see footnote 13); (ii) the non-availability of quality-specific transforms such as cleansing and de-duplicating, which are generally present in more mature on-premise tools; (iii) processing data at high velocities (see below): typical transformations on high-velocity data may include joining data from multiple streams and rolling-window aggregation; and (iv) the lack of comprehensive data lifecycle management and administration capabilities.

We believe that development productivity remains a serious obstacle in the cloud, as it does with on-premise ETL. Self-service data integration tools such as Trifacta and Alteryx (see last point below) are a possible path for mitigating this problem.
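To make the transforms named in point a more concrete, the following is a minimal sketch of the two kinds of processing mentioned there: a quality transform (cleansing followed by de-duplication) and a rolling-window aggregation over a stream. It is an illustration only and does not represent any vendor tool. The field name `email`, the function names, and the `RollingSum` class are all hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque


def cleanse(record):
    """Trim whitespace and lower-case the (hypothetical) 'email' field."""
    cleaned = dict(record)
    cleaned["email"] = record["email"].strip().lower()
    return cleaned


def deduplicate(records, key="email"):
    """Keep the first record seen for each value of `key`."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique


class RollingSum:
    """Sum of event values over a sliding time window.

    Assumes timestamps are numbers (seconds) and arrive in increasing order.
    """

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, ts, value):
        """Record an event and return the sum over (ts - window, ts]."""
        self.events.append((ts, value))
        self.total += value
        # Evict events that have fallen out of the window ending at ts.
        while self.events and self.events[0][0] <= ts - self.window:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total
```

Cleansing is applied before de-duplication so that values differing only in case or padding collapse into a single record. When evaluating a PaaS data movement tool, a useful check is whether equivalents of these steps are available as built-in transforms or have to be hand-coded.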

b. The choice of service level when managing your data movement tool – As with cloud databases, data movement tools also come with a deployment choice between IaaS and PaaS, so this can be seen as part of Resource management.

IaaS deployment of traditional ETL tools is one way to avoid the DI/DQ pitfalls enumerated in the previous point, as these tools are still more mature than PaaS data movement tools. Internal technical skills, analyzed separately below, may also weigh in on the final choice. On the other hand, PaaS data movement requires less administration, management and setup than traditional ETL deployed on IaaS. As with their cloud database platform service counterparts, availability and scalability of the ETL tools are also taken care of by the PaaS provider: this matters with large data volumes (see more on this requirement below). PaaS data movement tools are also much more likely to outperform third-party ETL tools, for instance by taking advantage of parallelism in data transfers to the internal nodes of a target MPP data warehouse cluster, something the JDBC-based connections of non-native ETL tools will have a hard time doing (especially if running outside of the cloud provider).
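The parallelism argument above can be illustrated with a small sketch: rows are split by a distribution key into one partition per node slice, and the partitions are loaded concurrently rather than through a single connection. This is a simplified model under stated assumptions, not the mechanism of any particular cloud service. `load_slice` is a hypothetical stand-in for a per-node bulk load (here it only appends to an in-memory list), and the CRC32 hash is an arbitrary choice of partitioning function.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition_by_key(rows, key, num_slices):
    """Assign each row to a slice by hashing its distribution key."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slot = crc32(str(row[key]).encode("utf-8")) % num_slices
        slices[slot].append(row)
    return slices


def load_slice(slice_id, rows, target):
    """Hypothetical per-node bulk load: appends rows to the slice's list."""
    target[slice_id].extend(rows)
    return len(rows)


def parallel_load(rows, key, num_slices):
    """Partition rows, then load every partition concurrently."""
    target = [[] for _ in range(num_slices)]  # stands in for the MPP nodes
    slices = partition_by_key(rows, key, num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        counts = list(
            pool.map(load_slice, range(num_slices), slices, [target] * num_slices)
        )
    return target, counts
```

For example, `parallel_load(rows, "id", 4)` returns the four loaded slices and the row count written to each. A tool restricted to a single JDBC connection would instead push all rows through one entry point, which is the bottleneck the paragraph above describes.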

13 Even within the same vendor, we have found that the on-premises tool and the cloud tool are not always fully interoperable. In this case, one possible option is to deploy the on-premise tool on IaaS; another is to use the on-premise tool from your premises if it supports connectivity to the selected cloud database (this requires IT administrators to open an external communications port, something that administrators don’t easily allow).