
to what is anticipated in production, with a mix of concurrent users accessing different types of reports with varying query workloads or resource needs. For example, in a retail data warehouse the workload could be 5 regional managers looking at inventory and sales snapshots of aggregated data and week-on-week performance, while 25 category managers look at detailed sales records for the last day and inventory forecast needs for the next week.

5. Develop a plan for ongoing, regular performance monitoring. Collect system and database statistics to understand how system usage and performance may be changing over time, and to assess whether future data growth can be handled with the hardware and software deployed (see the monitoring sketch after this list).

6. In general, prefer processing as close to the database as possible, rather than techniques outside of the database. At the same time, prefer high-level tools for faster development cycles. The sweet spot is tools that are high level, that contain the functionality you need, and that generate efficient code running close to the data (see the push-down sketch after this list).
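As a concrete illustration of point 5, below is a minimal sketch of a recurring performance probe in Java. The JDBC URL, credentials, and probe query are hypothetical placeholders; in practice the probe would time representative report queries from the actual workload and log the results for trend analysis.

// Minimal sketch, assuming a JDBC-accessible warehouse.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.time.Instant;

public class PerfProbe {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:teradata://dw-host/DATABASE=retail_dw"; // hypothetical
        String probeSql = "SELECT COUNT(*) FROM daily_sales";      // hypothetical
        try (Connection con = DriverManager.getConnection(url, "monitor", "secret");
             Statement st = con.createStatement()) {
            long start = System.nanoTime();
            try (ResultSet rs = st.executeQuery(probeSql)) {
                while (rs.next()) { /* drain the result */ }
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // Append to a time-series log; comparing these numbers
            // week over week reveals degradation as data grows.
            System.out.printf("%s probe_elapsed_ms=%d%n", Instant.now(), elapsedMs);
        }
    }
}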
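For point 6, the contrast below shows why running work inside the database matters. Both methods compute the same weekly total; the table and column names are illustrative. The first pushes the aggregation into the database so only one result row crosses the network; the second ships every detail row to the client just to sum it there.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PushDownExample {

    // Preferred: the database scans, filters, and aggregates;
    // only the single result row crosses the network.
    static double weeklySalesInDb(Connection con) throws SQLException {
        String sql = "SELECT SUM(amount) FROM sales"
                   + " WHERE sale_date >= CURRENT_DATE - 7";
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            rs.next();
            return rs.getDouble(1);
        }
    }

    // Anti-pattern: every detail row is pulled to the client
    // and summed in application code.
    static double weeklySalesInClient(Connection con) throws SQLException {
        String sql = "SELECT amount FROM sales"
                   + " WHERE sale_date >= CURRENT_DATE - 7";
        double total = 0;
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                total += rs.getDouble(1);
            }
        }
        return total;
    }
}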

7.3.1.2 Specific solutions

1. ETL and BI Performance - Real-time insights for Network & Subscriber Intelligence

Requirements/Challenges

a. PSL was tasked with developing an analytics solution for network assurance, which was deployed for one of the biggest telecom carriers in the USA.

b. The product features were:
i. Real-time monitoring and reporting of mobile and wireline network performance, service performance, and customer experience
ii. Troubleshooting of mobile and wireline network performance, service performance, and customer experience problems

c. The input data was call data records (CDRs) and data detail records (DDRs) collected in real time across dispersed networks and consolidated into a centralized server. The challenges were to collect a very large data set (16 billion+ records per day), load it into a data warehouse, and compute and display real-time statistics about the network and subscribers on a chunk of this data set, with an end-to-end SLA of 5 minutes' delay and reports displayed in less than 20 seconds.
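As a sanity check on these figures: 16 billion records per day spread over 288 five-minute windows is roughly 55 million records per window, or about 185,000 records per second sustained, which roughly matches the per-interval volume quoted in the solution below.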

Solution

a. PSL designed a data pipeline to meet the above SLAs and throughput requirements.

b. Data Collection - As the data moved to the central server, configurable polling agents diverted the traffic to a distributed cluster running a set of Java agents to process the data. The uncompressed data volume per 5-minute interval was ~50 million records from all networks, i.e., 16 billion records/day. A simplified polling-agent sketch follows below.
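The sketch below assumes incoming record files land in a spool directory on the central server. The paths, file pattern, poll interval, and worker count are all hypothetical; the real agents also handled enrichment and forwarding, which is elided here.

import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PollingAgent {
    public static void main(String[] args) {
        Path spool = Paths.get("/data/spool");      // hypothetical landing directory
        Path claimed = Paths.get("/data/claimed");  // hypothetical work directory
        ExecutorService workers = Executors.newFixedThreadPool(8);

        ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();
        poller.scheduleAtFixedRate(() -> {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(spool, "*.rec")) {
                for (Path f : files) {
                    // Atomic move claims the file so no two agents process it twice.
                    Path target = claimed.resolve(f.getFileName());
                    Files.move(f, target, StandardCopyOption.ATOMIC_MOVE);
                    workers.submit(() -> process(target));
                }
            } catch (Exception e) {
                e.printStackTrace(); // in production: alert and retry
            }
        }, 0, 10, TimeUnit.SECONDS);
    }

    static void process(Path file) {
        // Pre-processing and hand-off to the ETL cluster would happen here.
        System.out.println("processing " + file);
    }
}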

c. Data enriching and raw data load - The customer used a cluster of 40 blade servers to run the ETL system. The enrichment processes (Java agents) ran on each of the cluster servers to pre-process data through 112 processes running in parallel, piping multiple input data files into a single one and compressing it. These agents loaded files into dynamically created staging tables of a data warehouse implemented on an MPP database, Teradata Server (v12, 4-node, 180 AMPs), along with logs of what files