
W H I T E P A P E R

www.persistent.com

© 2017 Persistent Systems Ltd. All rights reserved.


As you can see, no single technology can satisfy all requirements. Even two or three requirements, when taken to their extreme, are difficult to satisfy with a single existing technology and may call for a combination of different database technologies (the Lambda architecture is in fact an example). Let us illustrate this with a couple of examples.

a. At Facebook, the analytics group had to provide OLAP-style queries with very low latencies and very high velocities [12]. They experimented with several technologies; none worked and, in the end, they had to build a data and query execution engine that worked for them (see footnote 12).

b. Even without extreme query performance requirements, the variety of analytics tasks may bend an architecture towards polyglot persistence, as in the case of Flipkart, an eCommerce company [13]: large incoming data volumes, data processing at different velocities (both real time and batch), and an analytics layer requiring ad-hoc analysis, search, machine learning and canned reporting. Their data layer includes Hadoop (Hive, Spark), Storm, Vertica (an MPP warehouse) and ElasticSearch (see 10 .4.2.1 1 ). If the high-velocity requirement were dropped, Hadoop, Vertica and ElasticSearch would probably still be in the picture, given the analytics requirements.
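The polyglot persistence idea in example (b) can be sketched in a few lines of Python. This is a minimal illustration, not Flipkart's actual design: the source names the technologies in the data layer but does not say which workload each one serves, so the `WORKLOAD_TO_STORE` mapping, the workload names and the `select_stores` helper are all assumptions made for the sake of the example.

```python
# Illustrative mapping only: the assignment of workloads to stores below is an
# assumption, not something stated in the source.
WORKLOAD_TO_STORE = {
    "batch_processing": "Hadoop (Hive, Spark)",
    "real_time_processing": "Storm",
    "ad_hoc_analysis": "Vertica",
    "canned_reporting": "Vertica",
    "search": "ElasticSearch",
    "machine_learning": "Hadoop (Hive, Spark)",
}


def select_stores(workloads):
    """Return the distinct stores needed for the workloads, in first-use order."""
    stores = []
    for workload in workloads:
        store = WORKLOAD_TO_STORE[workload]
        if store not in stores:
            stores.append(store)
    return stores


all_workloads = [
    "batch_processing",
    "real_time_processing",
    "ad_hoc_analysis",
    "search",
    "machine_learning",
    "canned_reporting",
]

# Every requirement present: four different technologies are needed.
full_stack = select_stores(all_workloads)

# Dropping the high-velocity (real-time) requirement removes only one of them.
without_real_time = select_stores(
    w for w in all_workloads if w != "real_time_processing"
)
```

Under this assumed mapping, removing the real-time workload eliminates only Storm, which matches the observation that the remaining analytics requirements alone still call for several stores.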

4.4 The BI / analytics tools

This is a very important aspect of decision making, being the one that most directly affects business end users. The modern cloud database needs to support the breadth of tools that organizations can use to get actionable results from the data. BI is a good fit for the cloud when the visualization tools are close to where the data is, which is now the case with cloud analytics. The choice of BI/analytics tool depends on several dimensions.

a. Query types – Traditional BI tools were built for the reporting analytic workload; ad-hoc querying and OLAP came later and have more “free-form” user experiences and interfaces, sometimes imposing limitations on the types of queries that may be defined (see section 5.2.4).

b. Performance and scalability – This is an area normally associated with the database / data warehouse layer, but the analytics layer also contributes to the overall time spent (again, refer to section 5.2.4 below for an example).

c. Analytic workload – If the requirement is about reporting or dashboarding, then most cloud platforms also provide solutions, e.g. from SAP, IBM and Oracle as SaaS offerings from their own clouds, or from Azure and AWS. However, if you are looking for exploration and discovery use cases, then look for tools like Tableau, QlikView, etc.; these are mainly desktop solutions but can work with cloud sources and can publish reports and dashboards to the cloud. For machine learning use cases, cloud service providers offer them as a service, e.g. Amazon ML, Azure ML, Watson Analytics. Google Analytics offers a complete BI stack in the cloud: it offers not only visual data discovery, exploration, collaboration, and reporting, but also analytic applications for marketing, sales, service, and social platforms. Finally, if the requirement is to build a full-blown solution in a given vertical industry, then we are talking about embedding analytics capabilities in an application that is to be built and deployed using PaaS development services and tools.

d. Data integration / data quality – Also referred to as data preparation. It has recently been recognized that a modern BI toolset should include features to integrate data coming from different sources and to address the heterogeneity of data representations, conventions and standards, as well as the missing values and duplicated records that impact the quality of data. The most common way this is being addressed is by loosely coupling self-service data preparation tools with BI tools, as will be explained in the next section.
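To make the data preparation concerns in item (d) concrete, here is a minimal Python sketch that reconciles two sources with different conventions, handles missing values and drops duplicated records. The field names (`name`, `email`, `signup`), the two date formats and the choice of email as the matching key are all hypothetical; real self-service data preparation tools offer far richer matching and profiling than this.

```python
from datetime import datetime

# Date conventions assumed for the two hypothetical sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalize(record):
    """Map one raw record to a common representation; missing values become None."""
    name = (record.get("name") or "").strip().title() or None
    email = (record.get("email") or "").strip().lower() or None
    signup = None
    for fmt in DATE_FORMATS:
        try:
            signup = datetime.strptime(record.get("signup"), fmt).date()
            break
        except (TypeError, ValueError):
            continue  # missing value or a different convention: try the next format
    return {"name": name, "email": email, "signup": signup}


def prepare(*sources):
    """Integrate several sources, dropping records whose email was already seen."""
    seen = set()
    result = []
    for source in sources:
        for raw in source:
            rec = normalize(raw)
            key = rec["email"]
            if key is not None:
                if key in seen:
                    continue  # duplicate of an earlier record
                seen.add(key)
            result.append(rec)  # records without a key cannot be matched; keep them
    return result


# Two sources describing the same customer with different conventions.
crm = [{"name": " ada lovelace ", "email": "ADA@Example.com", "signup": "2017-03-01"}]
web = [
    {"name": "Ada Lovelace", "email": "ada@example.com ", "signup": "01/03/2017"},
    {"name": "grace hopper", "email": None, "signup": ""},
]

cleaned = prepare(crm, web)
```

Here `cleaned` holds two records: the two Ada Lovelace entries collapse into one after normalization, while the record with no email is kept with its missing fields set to `None` rather than being silently discarded.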

12 At the root of the problem, OLAP engines operate on mostly static datasets