WHITE PAPER
www.persistent.com
© 2017 Persistent Systems Ltd. All rights reserved.
5.2.6 Additional PaaS services
The customer used other PaaS services offered by the Azure cloud to meet a few additional key requirements,
described below.
• Azure Service Bus to decouple the source system database from the DW: The OLTP product, which is also
to be deployed in the cloud, is the source application for the analytic component. The underlying OLTP
database schema changes more frequently than the Analytical DW product is revised. In addition, the
customer did not want tight coupling between the DW and the source database, because there could be several
consumers of the OLTP data in the future; for now, the DW is the only consumer. Changes in the source OLTP
database are published to Azure Service Bus as JSON event messages. Each message envelope carries the
product's major, minor and patch version numbers, which lets the ETL subscribe to and consume only the
messages relevant to its ETL/DW product version. The Service Bus topics were partitioned per tenant so that
messages could be consumed in parallel.
• DW backup and recovery: The customer plans to use the Backup vault and the Recovery Services vault to
back up the VMs periodically and to restore them in case of failure.
• Deployment tools: The customer used Windows PowerShell scripts and workflows (runbooks) to build
the code from the repository and create the deployment kit. This process is automated.
• Real-time resource monitoring: The customer plans to use Azure Diagnostics extensions to collect
performance statistics on the Service Bus worker roles, the VMs and the OLTP application.
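The version-based filtering and per-tenant partitioning described in the Service Bus item can be sketched as follows. This is a minimal, broker-free sketch under stated assumptions: the envelope field names (`tenant_id`, `version`, `table`, `op`), the DW version constant and the relevance rule (same major version, minor version no newer than the DW's) are all hypothetical, and no Azure SDK calls are shown. In a real deployment the messages would arrive through a Service Bus subscription, whose filter rules could also push part of this check to the broker.

```python
import json
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ETL/DW product version as (major, minor, patch).
DW_VERSION = (2, 3, 0)


def is_relevant(envelope: dict, dw_version=DW_VERSION) -> bool:
    """Assumed relevance rule: same major version, and a minor version the DW
    already understands. The patch number is ignored in this sketch."""
    v = envelope["version"]
    return v["major"] == dw_version[0] and v["minor"] <= dw_version[1]


def partition_by_tenant(raw_messages):
    """Parse JSON messages, drop irrelevant ones, and group the rest by tenant,
    mimicking per-tenant topic partitions."""
    partitions = defaultdict(list)
    for raw in raw_messages:
        envelope = json.loads(raw)
        if is_relevant(envelope):
            partitions[envelope["tenant_id"]].append(envelope)
    return dict(partitions)


def load_partition(tenant_id, envelopes):
    """Hypothetical ETL load step; here it only reports how many changes it saw."""
    return tenant_id, len(envelopes)


def consume(raw_messages, workers=4):
    """Process each tenant's partition in parallel and return change counts per tenant."""
    partitions = partition_by_tenant(raw_messages)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda item: load_partition(*item), partitions.items()))


# Hypothetical sample change events from the OLTP database.
SAMPLE = [
    json.dumps({"tenant_id": "t1", "version": {"major": 2, "minor": 3, "patch": 1}, "table": "orders", "op": "insert"}),
    json.dumps({"tenant_id": "t1", "version": {"major": 2, "minor": 1, "patch": 0}, "table": "orders", "op": "update"}),
    json.dumps({"tenant_id": "t2", "version": {"major": 2, "minor": 2, "patch": 5}, "table": "customers", "op": "insert"}),
    json.dumps({"tenant_id": "t2", "version": {"major": 3, "minor": 0, "patch": 0}, "table": "customers", "op": "insert"}),
    json.dumps({"tenant_id": "t3", "version": {"major": 2, "minor": 4, "patch": 0}, "table": "orders", "op": "delete"}),
]

if __name__ == "__main__":
    print(consume(SAMPLE))  # {'t1': 2, 't2': 1}
```

Keeping the version in the envelope rather than in the schema itself is what lets the OLTP schema evolve faster than the DW: older DW builds simply skip messages they do not yet understand.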
6 Product-to-Product comparison
As can be seen from the previous sections, most cloud service providers offer similar building blocks for data
ingestion, processing, streaming, machine learning and visualization. At first glance, all four cover everything;
however, there are minor differences in how features are implemented. This section describes these
differences.
6.1.1 ETL
All cloud providers have ETL services that move data from external sources into cloud storage. AWS has
Data Pipeline and Kinesis, Azure has Data Factory and Stream Analytics, Google has Dataflow, and IBM has
Data Connect and Streaming Analytics. All of them provide basic ETL/data processing functionality; however,
their support for input/output sources differs. Because support for additional sources is being added every
couple of months, we suggest that readers check each provider's site for the sources each product currently
supports. One main difference is that Google is the only provider that does not have two separate offerings
for traditional ETL and stream processing.
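The idea behind a single offering for both traditional ETL and stream processing is that one transformation is written once and applied to either a bounded data set (batch) or an unbounded one (stream). The plain-Python sketch below illustrates only that idea; it is not the Dataflow or Apache Beam API, and the record fields (`id`, `amount`, `amount_usd`) are hypothetical.

```python
import itertools
from typing import Iterable, Iterator

Record = dict


def transform(records: Iterable[Record]) -> Iterator[Record]:
    """One hypothetical ETL step, written once: drop rows without an amount
    and convert cents to dollars. Works lazily on any iterable."""
    for r in records:
        if r.get("amount") is None:
            continue
        yield {"id": r["id"], "amount_usd": round(r["amount"] / 100, 2)}


# Batch: a bounded collection processed to completion.
batch = [{"id": 1, "amount": 1250}, {"id": 2, "amount": None}, {"id": 3, "amount": 99}]
batch_out = list(transform(batch))


# Stream: an unbounded generator processed record by record as events arrive.
def event_stream() -> Iterator[Record]:
    for i in itertools.count(100):
        yield {"id": i, "amount": i * 10}


# Take only the first three results, since the stream never ends.
stream_out = list(itertools.islice(transform(event_stream()), 3))

if __name__ == "__main__":
    print(batch_out)   # [{'id': 1, 'amount_usd': 12.5}, {'id': 3, 'amount_usd': 0.99}]
    print(stream_out)
```

With two separate offerings, the same business logic would typically be implemented twice, once for the batch ETL tool and once for the stream processor.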
6.1.2 Machine Learning
While AWS has a solid set of machine learning products, it lacks pre-trained models when compared to Azure.
And although AWS has a good UI for ML, it lacks a managed lab notebook, a feature generally appreciated by data
scientists. Azure also offers custom R models running over big data. Compared to AWS and Azure, Google's
TensorFlow has been getting a lot of attention recently, and many will be keen to see Google's Machine Learning
service come out of preview. While Google has a strong, rich set of pre-trained APIs, it lacks BI dashboards and
visualizations. IBM's Watson Analytics, on the other hand, lets users interrogate data by asking questions in plain
English, which makes it more useful for problems where pre-packaged solutions are not readily available.
In short, Watson Analytics can be classified as a data discovery tool, whereas Azure ML is more of a development tool.