WHITE PAPER
www.persistent.com
© 2017 Persistent Systems Ltd. All rights reserved.
Elasticity
is the ability to dynamically fit resources to the current load, usually through scale-out: when load increases, you add resources, and when demand wanes, you shrink back and remove the resources no longer needed. Elasticity is especially important in cloud environments, where you pay per use and do not want to pay for resources you do not currently need, yet still want to meet rising demand when it arrives.
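The scale-out/scale-in behavior described above can be sketched as a simple threshold-based policy. This is a minimal illustration only; the `Cluster` class, utilization thresholds, and node counts are hypothetical, not the API of any specific cloud provider:

```python
# Minimal sketch of a threshold-based elasticity policy (hypothetical API).
class Cluster:
    def __init__(self, nodes=2, min_nodes=1, max_nodes=10):
        self.nodes = nodes
        self.min_nodes = min_nodes
        self.max_nodes = max_nodes

def autoscale(cluster, avg_cpu_utilization):
    """Add a node under heavy load; remove one when demand wanes."""
    if avg_cpu_utilization > 0.80 and cluster.nodes < cluster.max_nodes:
        cluster.nodes += 1   # scale out: meet rising demand
    elif avg_cpu_utilization < 0.30 and cluster.nodes > cluster.min_nodes:
        cluster.nodes -= 1   # scale in: stop paying for idle capacity
    return cluster.nodes
```

In a pay-per-use setting, each node carries an hourly price, so the scale-in branch is what realizes the cost benefit of elasticity.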
Each approach has benefits and disadvantages. Scaling up can be expensive, and some experts argue that it is ultimately not viable because individual hardware components on the market have hard limits. However, it makes a system easier to control and helps address certain data-quality issues. Scale-out is very popular, as it is the approach behind tools like Apache Hadoop: central data-handling software administers huge clusters of commodity hardware, yielding systems that are often very versatile and capable.
At the infrastructure layer, elasticity is easier to handle, and Hadoop-as-a-Service offerings providing elastic, auto-scaling clusters are becoming common (although not pervasive yet: this is something to watch out for). At the database layer, NoSQL databases pioneered the introduction of elasticity: these DBMSs have been designed so that they can be elastic, i.e., dynamically provisioned in the presence of load fluctuations. As we argue in section 4.3, NoSQL databases are meant for operational systems with simple analytics requirements, and they solve the elasticity problem by providing write atomicity only at the record level (i.e., for a single key in key-value NoSQL systems)20. On the other hand, traditional DBMSs handle more general transactions and more complex queries, and are in general intended for an enterprise infrastructure that is statically provisioned. Thus, elasticity for general DBMSs is a much harder problem to solve. Even MPP databases, which are optimized for analytics workloads, assume static cluster hardware configurations. This is, however, starting to change: some cloud databases now provide elasticity by separating storage and compute nodes, detecting tenant contention, and scaling tenants up and down within resource limits. Both Azure SQL Data Warehouse and Snowflake are examples of this new trend.
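With storage and compute separated, the control plane only needs to resize the compute tier in response to contention; the storage layer stays put. The sketch below illustrates the idea; the contention signal (queued queries), unit counts, and limits are hypothetical, not drawn from any specific product:

```python
# Hypothetical sketch: scale a tenant's compute units within its resource
# limits, independently of storage, based on a contention signal.
def rescale_compute(compute_units, queued_queries, min_units=1, max_units=8):
    if queued_queries > 10:                 # contention detected: scale up
        return min(compute_units * 2, max_units)
    if queued_queries == 0 and compute_units > min_units:
        return compute_units // 2           # idle: scale down, storage unchanged
    return compute_units                    # steady state: no change
```

Because storage is not resized, scaling decisions like this can be applied per tenant in seconds rather than requiring data redistribution.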
9 Appendix 2 – IaaS management in Microsoft Azure
This appendix complements our running example in section 5 by providing more detail on the management of the infrastructure layer and the tools available to IT administrators for managing SQL Server on virtual machines in Azure.
At the infrastructure level (VMs, storage, and networks), IT administrators use IaaS services as follows:
• The Azure IaaS VM service allows administrators to control the size of the VMs; parameters include the number of cores, the size of RAM, storage capacity, disk throughput, and network bandwidth. Each size determines an hourly price. As mentioned below, when SQL Server runs on these VMs, right-sizing and properly configuring them for performance must be well understood. The VM service provides automated features that dramatically simplify patching, backup, and high availability, as well as monitoring to diagnose problems in VMs and alert notifications on metric values or events.
• The VM service provides a means to detect the health of virtual machines running on the platform and to auto-recover those virtual machines if they fail. Microsoft provides an availability SLA of 99.9% for single-instance virtual machines21. This SLA does not cover processes (such as SQL Server) running on the VM, and it requires customers to host at least two VM instances in an availability set22.
• In the Azure IaaS storage system, an unfamiliar aspect of cloud deployments is that there is no access to the underlying hardware. However, IO activity can be monitored and analyzed using storage analytics when enabled at the account level; in this case, blob operations are persisted, and metrics can be defined and aggregated over time to understand and benchmark the storage system.
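Right-sizing a VM, as discussed in the first bullet above, amounts to picking the cheapest size whose cores, RAM, and disk throughput all meet the workload's requirements. A minimal sketch follows; the size names and hourly prices are made up for illustration and are not actual Azure sizes or pricing:

```python
# Hypothetical VM size catalog: (name, cores, ram_gb, disk_mbps, hourly_price).
SIZES = [
    ("S1", 2,  8,  100, 0.10),
    ("S2", 4, 16,  200, 0.20),
    ("S3", 8, 32,  400, 0.40),
]

def right_size(cores, ram_gb, disk_mbps):
    """Return the cheapest size meeting all requirements, or None if none fits."""
    candidates = [s for s in SIZES
                  if s[1] >= cores and s[2] >= ram_gb and s[3] >= disk_mbps]
    return min(candidates, key=lambda s: s[4])[0] if candidates else None
```

Since each size determines an hourly price, over-provisioning by even one tier is a recurring cost, which is why right-sizing matters for SQL Server deployments on IaaS VMs.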
20 Evidence is emerging that in many application scenarios this is not enough. This problem was recognized recently by senior architects from Amazon and Google (which has led to systems such as MegaStore, at the heart of Google's App Engine, that provide transactional guarantees on entity groups representing fine-grained, application-defined partitions).
21 This corresponds to approximately 43 minutes of downtime per month (Microsoft even talks about guaranteed downtime of 15 minutes!).
22 Defined as two or more VMs deployed across different Fault Domains, to avoid a single point of failure and to avoid guaranteed downtime.