
W H I T E P A P E R

© 2017 Persistent Systems Ltd. All rights reserved. 86

www.persistent.com

Cloud-based services, whether operating as high-level analytics services or foundational platform services, provide some security capabilities while introducing new challenges. The service provider may address platform and network security to a high degree of assurance but lack visibility into who has accessed what. In such cases, it is desirable to build an auditing framework (or reuse an existing one) on premise to control data access.
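As a rough illustration of the "who has accessed what" record such an auditing framework would keep, the following minimal Python sketch wraps a read operation so that every attempt, allowed or denied, is logged. The names `AuditRecord`, `AuditLog`, and `audited_read`, as well as the policy and fetch callables, are hypothetical and not taken from any specific product; a real framework would persist the records on premise rather than in memory.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class AuditRecord:
    timestamp: float
    user: str
    action: str
    resource: str
    allowed: bool


class AuditLog:
    """Append-only, in-memory audit trail (illustrative only)."""

    def __init__(self) -> None:
        self._records: List[AuditRecord] = []

    def record(self, user: str, action: str, resource: str, allowed: bool) -> AuditRecord:
        rec = AuditRecord(time.time(), user, action, resource, allowed)
        self._records.append(rec)
        return rec

    def accesses_by(self, user: str) -> List[AuditRecord]:
        # Answers "what did this user touch?"
        return [r for r in self._records if r.user == user]

    def to_json_lines(self) -> str:
        # One JSON object per line, a common format for shipping logs.
        return "\n".join(json.dumps(asdict(r)) for r in self._records)


def audited_read(
    log: AuditLog,
    user: str,
    resource: str,
    is_allowed: Callable[[str, str], bool],
    fetch: Callable[[str], str],
) -> str:
    """Log every read attempt before deciding whether to serve it."""
    allowed = is_allowed(user, resource)
    log.record(user, "read", resource, allowed)
    if not allowed:
        raise PermissionError(f"{user} may not read {resource}")
    return fetch(resource)
```

Logging the attempt before enforcing the decision means denied requests also leave a trace, which is usually what an auditor wants to see.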

Compliance: Compliance requirements can come from both internal and external sources, and organizations may have to adhere to regulatory requirements or to those imposed by customers or partners. Typical data-related compliance requirements that might affect a cloud provider include PCI DSS, HIPAA, and SOX.

Visibility: Extended governance requires visibility into cloud operations, including ETL, archiving, and the like.

Cloud providers offer tools and protection strategies (i) to avoid problems that may occur during normal operations and (ii) to support service-level agreements. These are summarized in the following table.

7.3.4 Big data

The following best practices apply to the performance and security management of a big data environment.

7.3.4.1 Performance

Hadoop is a flexible, general-purpose environment for the many forms of processing presented in section 3.2 above. The same data in Hadoop can be accessed and transformed with Hive, Pig, HBase, Spark, and MapReduce (MR) code written in a variety of languages, even simultaneously. Choose the tooling that provides optimal performance for your use case, as depicted below.

MR applies massively parallel computation to the data, but it is a batch operation and is too slow for interactive workloads. Hive on MapReduce and Pig inherit this problem.

Partitioning the data sets is the single most important recommended practice for speeding up computations on data lakes.
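To see why partitioning helps, consider the following minimal Python sketch. It stores rows under one directory per partition value, using the `key=value` directory naming that Hive uses for partitioned tables, so that a query filtered on the partition key only opens the matching directory instead of scanning everything. The function names, the CSV file format, and the sample data are assumptions made for illustration; real engines apply the same pruning idea to files in HDFS or object storage.

```python
import csv
import tempfile
from pathlib import Path


def write_partitioned(rows, root, key):
    """Append each row to <root>/<key>=<value>/part.csv."""
    for row in rows:
        part_dir = Path(root) / f"{key}={row[key]}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part.csv"
        is_new = not path.exists()
        with path.open("a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=sorted(row))
            if is_new:
                writer.writeheader()
            writer.writerow(row)


def read_partition(root, key, value):
    """Read only the partition for key=value; return (rows, files_opened)."""
    path = Path(root) / f"{key}={value}" / "part.csv"
    if not path.exists():
        return [], 0  # pruned: nothing to scan
    with path.open(newline="") as f:
        return list(csv.DictReader(f)), 1


def demo():
    """Sum the amounts for one year while touching a single partition."""
    rows = [
        {"year": "2016", "amount": "10"},
        {"year": "2017", "amount": "20"},
        {"year": "2017", "amount": "30"},
    ]
    with tempfile.TemporaryDirectory() as root:
        write_partitioned(rows, root, "year")
        matched, files_opened = read_partition(root, "year", "2017")
        return sum(int(r["amount"]) for r in matched), files_opened
```

In `demo()`, the query for year 2017 never opens the 2016 data. The benefit depends on choosing a partition key that matches the most common filters.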

Concern                                                      Protection strategy
-----------------------------------------------------------  -------------------------------------------------------------------
Accidental information disclosure                            Permissions; file, partition, volume or application-level encryption
Data integrity compromise                                    Permissions; data integrity checks; backup / restore; versioning
Accidental deletion                                          Permissions; backup; versioning
System, infrastructure, hardware or software availability    Backup / restore; replication
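Two of the protection strategies listed above, data integrity checks and versioning, can be sketched in a few lines of Python. The example below is a minimal in-memory illustration, assuming SHA-256 as the checksum; `VersionedStore` and its methods are hypothetical names and do not correspond to any cloud provider's API.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest used here as the integrity checksum."""
    return hashlib.sha256(data).hexdigest()


class VersionedStore:
    """Keeps every write as a new version, each with its own checksum."""

    def __init__(self) -> None:
        self._versions = {}  # name -> list of (digest, data)

    def put(self, name: str, data: bytes) -> int:
        """Store a new version and return its version number."""
        history = self._versions.setdefault(name, [])
        history.append((sha256_of(data), data))
        return len(history) - 1

    def get(self, name: str, version: int = -1) -> bytes:
        """Return a version (latest by default) after verifying its checksum."""
        digest, data = self._versions[name][version]
        if not hmac.compare_digest(sha256_of(data), digest):
            raise ValueError(f"integrity check failed for {name}")
        return data
```

Because earlier versions are never overwritten, an accidental overwrite can be undone by reading a previous version, and a checksum mismatch reveals corrupted or tampered data at read time.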