
WHITEPAPER

www.persistent.com

© 2017 Persistent Systems Ltd. All rights reserved.

5.2.6 Additional PaaS services

The customer used other PaaS services offered by the Azure cloud to meet several other key requirements, as described below.

• Azure Service Bus to decouple the source system database from the DW: The OLTP product, which is also to be deployed in the cloud, is the source application for the analytic component. The underlying OLTP database schema changes more frequently than the Analytical DW product is revised. In addition, the customer did not want tight coupling between the DW and the source database, as there could be several consumers of the OLTP data in the future; for now, the DW is the only consumer. Changes in the source OLTP database are sent to Azure Service Bus as event messages in JSON format. Each message envelope carries a product major, minor and patch version, which lets the ETL process subscribe to and consume only the messages relevant to its ETL/DW product version. The Service Bus topics were partitioned per tenant to allow parallel consumption of messages.

• DW backup and recovery: The customer plans to use Backup vault and Recovery Services vault to back up the VMs periodically and restore them in case of failure.

• Deployment tools: The customer used Windows PowerShell scripts and workflows (runbooks) to build the code from the repository and create the deployment kit. This process is automated.

• Real-time resource monitoring: The customer plans to use Azure Diagnostics extensions to collect performance statistics on Service Bus worker roles, VMs and the OLTP application.
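The decoupling pattern in the Service Bus bullet above can be illustrated with a small Python sketch. It is a minimal, broker-free simulation under stated assumptions, not the customer's implementation: it does not call the Azure Service Bus SDK, an in-memory grouping stands in for the per-tenant topic partitions, and the envelope field names (`tenantId`, `version`, `payload`), the `SUPPORTED_VERSION` constant, the compatibility rule (same major version, minor version not newer than supported) and the `load_tenant` step are all hypothetical. It shows how a consumer can use the version fields to skip irrelevant messages and how grouping by tenant enables parallel consumption.

```python
import json
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical: the product version this ETL/DW release was built against.
SUPPORTED_VERSION = (2, 3, 0)  # (major, minor, patch)


@dataclass(frozen=True)
class ChangeEvent:
    tenant_id: str
    version: tuple  # (major, minor, patch) taken from the message envelope
    payload: dict


def parse_envelope(raw: str) -> ChangeEvent:
    """Parse a JSON change message; the field names are illustrative assumptions."""
    doc = json.loads(raw)
    v = doc["version"]
    return ChangeEvent(
        tenant_id=doc["tenantId"],
        version=(v["major"], v["minor"], v["patch"]),
        payload=doc["payload"],
    )


def is_relevant(event: ChangeEvent, supported=SUPPORTED_VERSION) -> bool:
    """Assumed rule: same major version, and a minor version no newer than supported."""
    major, minor, _patch = event.version
    return major == supported[0] and minor <= supported[1]


def partition_by_tenant(events):
    """Keep relevant events only, grouped per tenant in arrival order."""
    partitions = defaultdict(list)
    for event in events:
        if is_relevant(event):
            partitions[event.tenant_id].append(event)
    return dict(partitions)


def load_tenant(tenant_id, events):
    """Hypothetical stand-in for the real ETL load step: counts the events applied."""
    return tenant_id, len(events)


def consume(raw_messages, max_workers=4):
    events = [parse_envelope(m) for m in raw_messages]
    partitions = partition_by_tenant(events)
    # One task per tenant: tenants are processed in parallel, while each
    # tenant's events stay in their original order within its task.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda item: load_tenant(*item), partitions.items())
    return dict(results)


if __name__ == "__main__":
    msgs = [
        json.dumps({"tenantId": "t1", "version": {"major": 2, "minor": 3, "patch": 1}, "payload": {"id": 1}}),
        json.dumps({"tenantId": "t2", "version": {"major": 2, "minor": 1, "patch": 0}, "payload": {"id": 2}}),
        json.dumps({"tenantId": "t1", "version": {"major": 3, "minor": 0, "patch": 0}, "payload": {"id": 3}}),
        json.dumps({"tenantId": "t1", "version": {"major": 2, "minor": 2, "patch": 5}, "payload": {"id": 4}}),
    ]
    print(consume(msgs))  # {'t1': 2, 't2': 1}; the 3.0.0 message is skipped
```

Because the version check sits in the consumer rather than in the source database, the OLTP side can publish newer schema versions without breaking an older DW release; a future consumer would simply apply its own version rule to the same topic.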

6 Product-to-Product comparison

As can be seen from the previous sections, most cloud service providers offer similar building blocks for data ingestion, processing, streaming, machine learning and visualization. At the outset, all four have everything covered; however, there are minor feature differences in implementation. This section describes these differences.

6.1.1 ETL

All cloud providers offer ETL services for moving data from external sources into cloud storage. AWS has Data Pipeline and Kinesis, Azure has Data Factory and Stream Analytics, Google has Dataflow, and IBM has Data Connect and Streaming Analytics. All of them provide basic ETL/data processing functionality; however, their support for input/output sources differs. Since support for additional sources is added every couple of months, we suggest readers check the respective vendor sites to find out which sources each product supports. One main difference is that Google is the only provider that does not have two separate offerings for traditional ETL and stream processing.

6.1.2 Machine Learning

While AWS has a solid set of machine learning products, it lacks pre-trained models when compared to Azure. Although AWS has a good UI for ML, it lacks a managed lab notebook, a feature data scientists generally appreciate. Azure also offers custom R models running over big data. Compared to AWS and Azure, Google's TensorFlow has been getting a lot of attention recently, and many will be keen to see Google's Machine Learning service come out of preview. While Google has a rich set of pre-trained APIs, it lacks BI dashboards and visualizations. IBM's Watson Analytics, on the other hand, works by letting users interrogate data with questions in plain English, which is more useful for problems where pre-packaged solutions are not readily available. In short, Watson Analytics can be cataloged more as a data discovery tool, while Azure ML is more of a development tool.
