

W H I T E P A P E R
© 2017 Persistent Systems Ltd. All rights reserved. 9
www.persistent.com
We describe the impact of the cloud on analytics projects at two levels: first, globally, in this chapter; and second, within our transversal (cross-cutting) topics, in the chapters on dimensional modeling, data quality and non-functional aspects. At the global level, we look at the following market categories:
1. Cloud data warehouses
2. Cloud integration tools
3. Cloud/on-premises application integration
3.1.1 Cloud data warehouses
Until three years ago, data warehousing was essentially an on-premises initiative. Data migration and security concerns kept the warehoused storage of corporate information within the walls of organizations. The first vendor to challenge this status quo was Amazon, which launched its Redshift data warehouse service in 2013. Its main value proposition was low cost: about a tenth of the cost of traditional data warehouse solutions when it first came out. Cost has often been one of the most painful aspects of data warehousing. This low cost is possible thanks to the pay-per-use model. It also reflects a broader trend: database vendors are increasingly switching from expensive proprietary hardware to low-cost commodity servers and storage technology².
The data-warehouse-as-a-service options now available on the market include IBM's dashDB and Microsoft's Azure SQL Data Warehouse. In addition, Oracle, Teradata and SAP offer cloud-based versions of their data warehouse platforms. New startups were launched in 2015, such as Snowflake, a columnar cloud database built on top of AWS S3 that offers SQL access to semi-structured data. Google has also introduced its own offering: BigQuery. BigQuery is the public implementation of Dremel, a truly massively parallel, scalable query service for nested datasets, and is now positioned as an analytics data warehouse.
The most common use cases for cloud databases are:
(i) the traditional enterprise data warehouse;
(ii) the big data/data lake use case, which we describe in detail in section 3.2.2;
(iii) software-as-a-service companies embedding analytic functionality within their cloud applications.
Most vendors offer both private and public cloud versions of their software. On public clouds, cloud warehouse vendors offer both PaaS and IaaS options. PaaS offerings are fully managed data warehouses. IaaS offerings let customers use cloud-based hardware and storage while keeping administrative rights on the DBMS warehouse running on that hardware³. This allows customers to implement, for instance, hybrid cloud warehouses (more on this below).
The advertised benefits of public cloud data warehouses are:
- Faster time to market, largely because the provider has specialized staff to get a cloud data warehouse up and running and to operate it.
- Lower total cost of ownership (TCO), as with cloud computing in general (e.g., pay-per-use pricing, multi-tenancy).
- Periodic warehouses (load, process, discard), billed more cheaply by the hour, both for report generation and for test environments on every target platform.
² For instance, Redshift is powered by ParAccel (now part of Actian), a massively parallel processing (MPP) database system running on commodity servers. Performance is therefore not sacrificed for a low price point.
³ A good example is Microsoft: Azure SQL Database is a fully managed PaaS offering, while SQL Server in an Azure VM is an IaaS offering.