WP_Cloud Analytics - Mapping Requirements to Technology

W H I T E P A P E R

www.persistent.com

10.2.2 AWS Components in detail

10.2.2.1 AWS Data Pipeline

AWS Data Pipeline is a web service that helps to reliably process and move data between different AWS compute
and storage services, as well as on-premises data sources, at specified intervals. With AWS Data Pipeline, we can
regularly access data where it’s stored, transform and process it at scale, and efficiently transfer the results to
AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR. AWS Data Pipeline
is a reliable, easy-to-use, flexible, scalable, and transparent web service for data processing and data transfer.
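As a rough illustration of moving data between storage locations at a specified interval, the sketch below builds a small pipeline definition in the object shape that the boto3 `put_pipeline_definition` call accepts (`id`, `name`, and a list of `fields` holding either a `stringValue` or a `refValue`). The helper names, object ids, and S3 paths are hypothetical, and a real definition would also need a Default object carrying IAM roles and a log location, which is omitted here.

```python
def pipeline_object(object_id, refs=(), **fields):
    """Build one pipeline object; keys listed in `refs` point at other objects."""
    return {
        "id": object_id,
        "name": object_id,
        "fields": [
            {"key": key, ("refValue" if key in refs else "stringValue"): value}
            for key, value in fields.items()
        ],
    }


def build_copy_pipeline(source_path, dest_path, period="1 day"):
    """Return pipeline objects for a scheduled S3-to-S3 copy (illustrative only)."""
    schedule = pipeline_object(
        "DailySchedule", type="Schedule", period=period,
        startAt="FIRST_ACTIVATION_DATE_TIME",
    )
    # Hypothetical EC2 worker that runs the copy and then shuts down.
    worker = pipeline_object(
        "CopyWorker", refs=("schedule",), type="Ec2Resource",
        terminateAfter="1 Hour", schedule="DailySchedule",
    )
    source = pipeline_object(
        "SourceData", refs=("schedule",), type="S3DataNode",
        filePath=source_path, schedule="DailySchedule",
    )
    dest = pipeline_object(
        "DestData", refs=("schedule",), type="S3DataNode",
        filePath=dest_path, schedule="DailySchedule",
    )
    copy = pipeline_object(
        "CopyStep", refs=("input", "output", "runsOn", "schedule"),
        type="CopyActivity", input="SourceData", output="DestData",
        runsOn="CopyWorker", schedule="DailySchedule",
    )
    return [schedule, worker, source, dest, copy]
```

Assuming a pipeline has already been created, the returned list could then be passed as `pipelineObjects` to `put_pipeline_definition` on a boto3 `datapipeline` client.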

10.2.2.2 AWS Lambda

AWS Lambda is a compute service that lets users run code without provisioning or managing servers. AWS
Lambda executes code only when needed and scales automatically, from a few requests per day to thousands
per second. AWS Lambda can be used to run code in response to events, such as changes to data in an
Amazon S3 bucket or an Amazon DynamoDB table; to run code in response to HTTP requests through Amazon
API Gateway; or to invoke code through API calls made with the AWS SDKs.

Lambda performs operational and administrative activities for users, including capacity provisioning, scaling, high
availability, monitoring fleet health, applying security patches, deploying the code, running a web service front
end, and monitoring and logging the user’s functions. Supported runtimes include Python, Java, and C#
(through .NET Core).
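To make the event-driven model concrete, here is a minimal sketch of a Python handler that could be attached to an S3 trigger. It assumes the standard S3 notification layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`); the function name and the returned summary are illustrative choices, not something the paper prescribes.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the bucket and key of every object named in an S3 event."""
    changed = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3_info["object"]["key"])
        changed.append({"bucket": bucket, "key": key})

    # Anything printed here ends up in the function's log output.
    print(f"Received {len(changed)} S3 object event(s)")
    return {"count": len(changed), "objects": changed}
```

The handler does no provisioning or scaling work of its own; it only describes what to do with one event, and Lambda decides when and how many copies of it to run.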

References

http://docs.aws.amazon.com/lambda/latest/dg/welcome.html

10.2.2.3 Amazon Redshift

Amazon Redshift is a fast, powerful, fully managed, petabyte-scale data warehouse service in the cloud.
Amazon Redshift significantly lowers the cost of a data warehouse and makes it easy to analyze large amounts
of data very quickly. Its features include optimization for data warehousing workloads, petabyte scale,
automated backups, encryption, network isolation, and fault tolerance.

10.2.2.4 Amazon RDS

Amazon Relational Database Service (Amazon RDS) is a distributed relational database service by Amazon
Web Services. It is a web service running in the cloud, designed to simplify the setup, operation, and scaling
of a relational database for use in applications. Complex administration processes such as patching the database
software, backing up databases, and enabling point-in-time recovery are managed automatically. Storage
and compute resources can be scaled with a single API call. Amazon RDS provides six familiar database
engines to choose from: Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle, and Microsoft SQL
Server.
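The single-call scaling mentioned above can be sketched with the boto3 `modify_db_instance` operation, which accepts a new instance class and allocated storage in one request. The wrapper function, instance identifier, and sizes below are hypothetical; the client is passed in so the sketch does not assume any particular credentials or region setup.

```python
def scale_db_instance(rds_client, instance_id, instance_class, storage_gib,
                      apply_immediately=False):
    """Change compute class and storage of an RDS instance in one API call."""
    return rds_client.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        DBInstanceClass=instance_class,      # new compute size
        AllocatedStorage=storage_gib,        # new storage size in GiB
        # When False, the change waits for the next maintenance window.
        ApplyImmediately=apply_immediately,
    )


# Example usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   scale_db_instance(boto3.client("rds"), "analytics-db", "db.m5.large", 200)
```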

References