WHITEPAPER

© 2017 Persistent Systems Ltd. All rights reserved.

www.persistent.com

We describe the impact of the cloud on analytics projects at two levels: first, globally, in this chapter; and second, in our transversal topics, in the dimensional modeling, data quality and non-functional aspects chapters. At a global level, we will look at the following market categories:

1. Cloud data warehouses

2. Cloud integration tools

3. Cloud / on-premises application integration

3.1.1 Cloud data warehouses

Until three years ago, data warehousing was essentially an on-premises initiative, with data migration and security issues playing a large role in keeping warehoused corporate information within the walls of organizations. The first vendor to challenge this status quo was Amazon, which launched its Redshift data warehouse service in 2013. The main value proposition was low cost (about a tenth of the cost of traditional data warehouse solutions when it first came out); cost has often been one of the most painful aspects of data warehousing. This is becoming possible not only thanks to the pay-per-use model, but also because database vendors are increasingly switching from expensive proprietary hardware to low-cost commodity servers and storage technology.²

The data-warehouse-as-a-service options now available in the market include IBM's dashDB and Microsoft's Azure SQL Data Warehouse. In addition, Oracle, Teradata and SAP offer cloud-based versions of their data warehouse platforms. New startups such as Snowflake, a columnar cloud database built on top of AWS S3 that offers SQL access to semi-structured data, were launched in 2015. Google has also introduced its own offering: BigQuery, the public implementation of Dremel, a (really) massively parallel, scalable query service for datasets of nested data, which is now positioned as an analytics data warehouse.

The most common use cases for cloud databases are (i) the traditional enterprise data warehouse, (ii) the big data/data lake use case, which we describe in detail in section 3.2.2, and (iii) software-as-a-service companies embedding analytic functionality within their cloud applications.

Most vendors offer both private and public cloud versions of their software. On public clouds, cloud warehouse vendors also have PaaS and IaaS offers. The former are fully managed data warehouses, while the latter let customers take advantage of cloud-based hardware and storage while retaining administrative rights on the DBMS running on this hardware,³ so that they can implement, for instance, hybrid cloud warehouses (more on this below).

The advertised benefits of public cloud data warehouses are:

• Faster time to market, especially because of the specialized personnel skills available at the provider to get a cloud data warehouse up and running, and to operate it.

• All the lower-TCO benefits of generic cloud computing (e.g., pay per usage, multiple tenants).

• Periodic warehouses (load-process-discard) with associated cheaper payment by the hour, for both report generation and for testing environments for every target platform.
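To make the pay-by-the-hour argument for periodic warehouses concrete, the following minimal sketch compares an always-on cluster with a load-process-discard cluster that exists only for the duration of each run. The hourly rate, node count, run frequency and run duration are hypothetical figures chosen for illustration, not any vendor's actual pricing, and the function names are ours.

```python
import math

# Assumption: average number of hours in a month, used for the always-on case.
HOURS_PER_MONTH = 730


def always_on_cost(hourly_rate, nodes):
    """Monthly cost of a cluster that is never shut down."""
    return hourly_rate * nodes * HOURS_PER_MONTH


def periodic_cost(hourly_rate, nodes, runs_per_month, hours_per_run):
    """Monthly cost of a load-process-discard cluster.

    Assumes billing by whole hours, so each run is rounded up
    to the next full hour before being charged.
    """
    billed_hours = math.ceil(hours_per_run) * runs_per_month
    return hourly_rate * nodes * billed_hours


if __name__ == "__main__":
    rate = 0.25   # hypothetical price per node-hour
    nodes = 4     # hypothetical cluster size
    print("always on:", always_on_cost(rate, nodes))
    print("periodic :", periodic_cost(rate, nodes, runs_per_month=30, hours_per_run=2))
```

Under these assumed figures, a nightly two-hour reporting run is billed for 60 cluster-hours a month instead of 730; the actual saving depends on the provider's billing granularity and on any storage charged while the cluster is discarded.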

² For instance, Redshift is powered by a massively parallel processing (MPP) database system, ParAccel (which is now part of Actian), running on commodity servers, so performance is not being sacrificed just for a low-cost point.

³ A good example is Microsoft: Azure SQL Database is a fully managed PaaS offer, and SQL Server in Azure VM is an IaaS offer.