

WHITE PAPER
© 2017 Persistent Systems Ltd. All rights reserved.
www.persistent.com
to what is anticipated in production, with a mix of concurrent users accessing different types of reports with
varying query workloads or resource needs. For example, for a retail data warehouse the workload could be
five regional managers looking at inventory and sales snapshots of aggregated data and week-on-week
performance, while 25 category managers look at detailed sales records for the last day and inventory
forecast needs for the next week.
5. Develop a plan to conduct ongoing performance monitoring regularly. Collect system and database
statistics to understand how system usage and performance may be changing over time, and to assess whether
future data growth can be handled with the hardware and software deployed.
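A minimal sketch of such a monitoring loop, assuming a SQLite warehouse and an illustrative history table (the statistics a production DBMS exposes would be far richer — query plans, spool usage, CPU skew, and so on):

```python
import sqlite3
import time

def record_stats(conn, history_conn):
    """Capture a point-in-time snapshot of table sizes and a probe-query timing.
    The history schema and probe query here are illustrative assumptions."""
    history_conn.execute(
        "CREATE TABLE IF NOT EXISTS perf_history "
        "(captured_at REAL, table_name TEXT, row_count INTEGER, probe_seconds REAL)")
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    for (table_name,) in tables:
        rows = conn.execute(f"SELECT COUNT(*) FROM {table_name}").fetchone()[0]
        t0 = time.perf_counter()
        conn.execute(f"SELECT * FROM {table_name} LIMIT 100").fetchall()
        probe_seconds = time.perf_counter() - t0
        history_conn.execute("INSERT INTO perf_history VALUES (?, ?, ?, ?)",
                             (time.time(), table_name, rows, probe_seconds))
    history_conn.commit()

# Demo: snapshot a toy warehouse twice to build a trend line.
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE sales (amount REAL)")
hist = sqlite3.connect(":memory:")
record_stats(wh, hist)
wh.executemany("INSERT INTO sales VALUES (?)", [(1.0,)] * 500)  # simulated data growth
record_stats(wh, hist)
trend = hist.execute("SELECT row_count FROM perf_history "
                     "WHERE table_name = 'sales' ORDER BY rowid").fetchall()
print(trend)  # [(0,), (500,)] -- row counts over time show the growth
```

Scheduled regularly (cron, an agent, or the DBMS's own workload repository), snapshots like these make it possible to extrapolate growth and decide hardware upgrades before users notice degradation.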
6. In general, prefer processing as close to the database as possible, rather than using techniques outside of the
database. At the same time, prefer high-level tools for faster development cycles. The sweet spot is
tools that are high level, that contain the functionality you need, and that generate efficient code running
close to the data.
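To illustrate the difference, the sketch below aggregates the same data twice — once in the application after pulling every row across the boundary, and once pushed down into the database. The table and queries are illustrative assumptions, with SQLite used for self-containment:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 1.0), ("south", 2.0)] * 1000)

# Away from the data: ship every row to the application, then aggregate there.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()  # 2000 rows cross the boundary
totals_app = {}
for region, amount in rows:
    totals_app[region] = totals_app.get(region, 0.0) + amount

# Close to the data: push the aggregation into the database; only 2 rows come back.
totals_db = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())

print(sorted(totals_db.items()))  # [('north', 1000.0), ('south', 2000.0)]
assert totals_app == totals_db    # same answer, far less data movement
```

On a warehouse scale the pushed-down version avoids moving millions of rows over the network and lets the database's optimizer and parallelism do the work.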
7.3.1.2 Specific solutions
1. ETL and BI Performance - Real-time insights for Network & Subscriber Intelligence
—
Requirements/Challenges
a. PSL was tasked with developing an analytics solution for network assurance, which was deployed for one of the
biggest telecom carriers in the USA.
b. The product features were:
i. Real-time monitoring and reporting of mobile and wireline network performance, service performance
and customer experience
ii. Troubleshooting of mobile and wireline network performance, service performance and customer
experience problems
c. The input data consisted of call data records (CDRs) and data detail records (DDRs) collected in real time across
dispersed networks and consolidated into a centralized server. The challenges were: collect a very large
(16 billion+ records per day) data set, load it into a data warehouse, and compute and display real-time
statistics about the network and subscribers on a chunk of this data set, with an end-to-end SLA of a 5-minute
delay and reports displayed in less than 20 seconds.
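A quick back-of-envelope check of these throughput numbers:

```python
# Back-of-envelope check of the stated throughput numbers.
records_per_day = 16_000_000_000
windows_per_day = 24 * 60 // 5            # 288 five-minute windows in a day
records_per_window = records_per_day / windows_per_day
print(f"{records_per_window / 1e6:.1f} million records per 5-minute window")
# -> 55.6 million, consistent with the ~50 million per 5 minutes in the solution description
```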
—
Solution
a. PSL designed a data pipeline to meet the above SLAs and throughput requirements.
b. Data Collection - As the data moved to the central server, configurable polling agents
diverted the traffic to a distributed cluster running a set of Java agents that processed the data. The
uncompressed data volume per 5-minute window was ~50 million records from all networks, i.e. 16 billion records/day.
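A simplified sketch of this polling-and-dispatch step (the production system used Java agents; the file names, directory layout, and round-robin policy below are illustrative assumptions):

```python
import queue
import tempfile
from itertools import cycle
from pathlib import Path

def poll_and_dispatch(inbox, workers, seen):
    """One polling pass: divert any not-yet-seen input files round-robin to the
    processing workers (a simplified stand-in for the distributed agent cluster)."""
    targets = cycle(workers)
    dispatched = 0
    for path in sorted(inbox.glob("*.cdr")):
        if path.name in seen:
            continue
        next(targets).put(path)   # divert this file's traffic to a cluster worker
        seen.add(path.name)
        dispatched += 1
    return dispatched

# Demo: a temp directory stands in for the central collection server.
inbox = Path(tempfile.mkdtemp())
for i in range(5):
    (inbox / f"batch_{i}.cdr").write_text("dummy CDR payload")
workers = [queue.Queue() for _ in range(2)]
seen = set()
first = poll_and_dispatch(inbox, workers, seen)
second = poll_and_dispatch(inbox, workers, seen)  # nothing new arrived
print(first, second)  # 5 0
```

The real agents additionally tracked delivery and failure per file so that no 5-minute window was silently dropped; the essential shape — poll, dedupe, spread across workers — is what the sketch shows.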
c. Data enriching and raw data load - The customer used a cluster of 40 blade servers to run the ETL
system. The enrichment processes (Java agents) ran on each of the cluster servers to pre-process data
through 112 processes running in parallel, piping multiple input data files into a single file and
compressing it. These agents loaded the files into dynamically created staging tables of a data warehouse
implemented on an MPP Teradata database server (v12, 4-node, 180 AMPs), along with logs of what files