WHITE PAPER
www.persistent.com
© 2017 Persistent Systems Ltd. All rights reserved.
Elasticity
is the ability to dynamically fit resources to the current load, usually through scale-out: when load increases, you add resources, and when demand wanes, you shrink back and remove the resources no longer needed. Elasticity is especially important in cloud environments, where you pay per use and do not want to pay for resources you do not currently need, yet still want to meet rising demand when it arrives.
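The scale-out/scale-in behavior described above can be sketched as a simple threshold-based policy. This is a minimal illustration only; the `Cluster` class, utilization thresholds, and node counts are hypothetical, not the API of any specific cloud provider:

```python
# Minimal sketch of a threshold-based elasticity policy (hypothetical API).
class Cluster:
    def __init__(self, nodes=2, min_nodes=1, max_nodes=10):
        self.nodes = nodes
        self.min_nodes = min_nodes
        self.max_nodes = max_nodes

def autoscale(cluster, avg_cpu_utilization):
    """Add a node under heavy load; remove one when demand wanes."""
    if avg_cpu_utilization > 0.80 and cluster.nodes < cluster.max_nodes:
        cluster.nodes += 1   # scale out: meet rising demand
    elif avg_cpu_utilization < 0.30 and cluster.nodes > cluster.min_nodes:
        cluster.nodes -= 1   # scale in: stop paying for idle capacity
    return cluster.nodes
```

In a pay-per-use setting, each node carries an hourly price, so the scale-in branch is what realizes the cost benefit of elasticity.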
Each approach has benefits and disadvantages. Scaling up can be expensive, and some experts argue that it is ultimately not viable because individual hardware components on the market have hard limits. However, it makes a system easier to control and helps address certain data-quality issues. Scale-out is very popular, as it is the approach behind tools like Apache Hadoop: central data-handling software administers huge clusters of commodity hardware, yielding systems that are often very versatile and capable.
At the infrastructure layer, elasticity is easier to handle, and Hadoop-as-a-Service offerings providing elastic, auto-scaling clusters are becoming common (although not pervasive yet: this is something to watch out for). At the database layer, NoSQL databases pioneered the introduction of elasticity: these DBMSs have been designed so that they can be elastic, i.e., dynamically provisioned in the presence of load fluctuations. As we argue in section 4.3, NoSQL databases are meant for operational systems with simple analytics requirements, and they solve the elasticity problem by providing write atomicity only at the record level (i.e., for a single key in key-value NoSQL systems)20. On the other hand, traditional DBMSs handle more general transactions and more complex queries, and are in general intended for an enterprise infrastructure that is statically provisioned. Thus, elasticity for general DBMSs is a much harder problem to solve. Even MPP databases, which are optimized for analytics workloads, assume static cluster hardware configurations. This is, however, starting to change: some cloud databases now provide elasticity by separating storage and compute nodes, detecting tenant contention, and scaling tenants up and down within resource limits. Both Azure SQL Data Warehouse and Snowflake are examples of this new trend.
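With storage and compute separated, the control plane only needs to resize the compute tier in response to contention; the storage layer stays put. The sketch below illustrates the idea; the contention signal (queued queries), unit counts, and limits are hypothetical, not drawn from any specific product:

```python
# Hypothetical sketch: scale a tenant's compute units within its resource
# limits, independently of storage, based on a contention signal.
def rescale_compute(compute_units, queued_queries, min_units=1, max_units=8):
    if queued_queries > 10:                 # contention detected: scale up
        return min(compute_units * 2, max_units)
    if queued_queries == 0 and compute_units > min_units:
        return compute_units // 2           # idle: scale down, storage unchanged
    return compute_units                    # steady state: no change
```

Because storage is not resized, scaling decisions like this can be applied per tenant in seconds rather than requiring data redistribution.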
9 Appendix 2 – IaaS management in Microsoft Azure
This appendix complements our running example in section 5 by providing more detail on the management of the infrastructure layer and the tools available to IT administrators for managing SQL Server on virtual machines in Azure.
At the infrastructure level (VMs, storage, and networks), IT administrators use IaaS services as follows:
• The Azure IaaS VM service allows administrators to control the size of the VMs; parameters include the number of cores, the size of RAM, storage capacity, disk throughput, and network bandwidth. Each size determines an hourly price. As mentioned below, when SQL Server runs on these VMs, right-sizing and properly configuring them for performance must be well understood. The VM service provides automated features that dramatically simplify patching, backup, and high availability, as well as monitoring to diagnose problems in VMs and alert notifications on metric values or events.
• The VM service provides a means to detect the health of virtual machines running on the platform and to auto-recover those virtual machines if they fail. Microsoft provides an availability SLA of 99.9% for single-instance virtual machines21. This SLA does not cover processes (such as SQL Server) running on the VM, and it requires customers to host at least two VM instances in an availability set22.
• In the Azure IaaS storage system, an unfamiliar aspect of cloud deployments is that there is no access to the underlying hardware. However, IO activity can be monitored and analyzed using storage analytics when enabled at the account level; in this case, blob operations are persisted, and metrics can be defined and aggregated over time to understand and benchmark the storage system.
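Right-sizing a VM, as discussed in the first bullet above, amounts to picking the cheapest size whose cores, RAM, and disk throughput all meet the workload's requirements. A minimal sketch follows; the size names and hourly prices are made up for illustration and are not actual Azure sizes or pricing:

```python
# Hypothetical VM size catalog: (name, cores, ram_gb, disk_mbps, hourly_price).
SIZES = [
    ("S1", 2,  8,  100, 0.10),
    ("S2", 4, 16,  200, 0.20),
    ("S3", 8, 32,  400, 0.40),
]

def right_size(cores, ram_gb, disk_mbps):
    """Return the cheapest size meeting all requirements, or None if none fits."""
    candidates = [s for s in SIZES
                  if s[1] >= cores and s[2] >= ram_gb and s[3] >= disk_mbps]
    return min(candidates, key=lambda s: s[4])[0] if candidates else None
```

Since each size determines an hourly price, over-provisioning by even one tier is a recurring cost, which is why right-sizing matters for SQL Server deployments on IaaS VMs.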
20 Evidence is emerging that in many application scenarios this is not enough. This problem was recognized recently by senior architects from Amazon and Google (which has led to systems such as MegaStore, at the heart of Google's App Engine, that provide transactional guarantees on entity groups representing fine-grained, application-defined partitions).
21 This corresponds to approximately 43 minutes of downtime per month (Microsoft even talks about guaranteed downtime of 15 minutes!).
22 Defined as two or more VMs deployed across different Fault Domains, to avoid a single point of failure and to avoid guaranteed downtime.