
W H I T E P A P E R

© 2017 Persistent Systems Ltd. All rights reserved.

www.persistent.com

were loaded, how many bytes, number of rows, etc. PSL used Teradata's FASTLOAD (bulk data loading) utility to process this large volume of data.
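A bulk load like this is typically driven by a FastLoad control script. The sketch below (in Python, since the paper shows no code) assembles such a script and feeds it to the `fastload` utility; every name in it – host, credentials, table, columns, file path – is a hypothetical placeholder, and the control-script keywords follow Teradata's documented FastLoad syntax:

```python
import subprocess

def build_fastload_script(host, user, password, table, err_db, data_file):
    """Assemble a FastLoad control script for a simple two-column load.
    All names here (host, user, table, file path) are illustrative."""
    return f"""LOGON {host}/{user},{password};
BEGIN LOADING {table} ERRORFILES {err_db}.load_err1, {err_db}.load_err2;
SET RECORD VARTEXT ",";
DEFINE event_ts (VARCHAR(26)), payload (VARCHAR(256)) FILE={data_file};
INSERT INTO {table} VALUES (:event_ts, :payload);
END LOADING;
LOGOFF;
"""

def run_fastload(script_text):
    """Feed the script to the fastload binary on stdin (requires the
    Teradata Tools and Utilities to be installed on the ETL host)."""
    return subprocess.run(["fastload"], input=script_text,
                          capture_output=True, text=True)
```

In practice one script would be generated per staging table, with the error tables inspected after each run.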

d. Data Processing and Target Load – On the Teradata server, an E-L-T approach was followed, running multiple stored procedures to create aggregation tables per 5-minute window and cut down the data volume. The stored procedures also handled late-arriving data and rebuilds of the aggregate tables. A Kimball-style dimensional data model was used, along with multi-level partitioning and indexing techniques to speed up reporting queries. Aggregate tables were created at various grains (hourly, daily), with storage for up to 365 days. PSL achieved fresh-data load rates of 1.5 to 2 TB/hour with this configuration. The data warehouse grew by 90 TB/week; the 5-minute aggregation data was expired every week, while the daily/weekly aggregates were retained and expired after 365 days.
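The windowing and expiry logic described above can be sketched as follows. This is a minimal in-memory illustration, not the actual stored-procedure code: events are truncated to their 5-minute window, late arrivals simply land in their (already existing) window, and windows older than the retention period are dropped, mirroring the weekly expiry of the 5-minute aggregates:

```python
from datetime import datetime, timedelta
from collections import defaultdict

def window_start(ts: datetime) -> datetime:
    """Truncate a timestamp to its 5-minute window boundary."""
    return ts - timedelta(minutes=ts.minute % 5,
                          seconds=ts.second,
                          microseconds=ts.microsecond)

def aggregate_5min(events):
    """Roll raw (timestamp, value) events up to per-window sums.
    A late-arriving event is added into its existing window,
    which is what the rebuild of aggregate tables achieves."""
    agg = defaultdict(float)
    for ts, value in events:
        agg[window_start(ts)] += value
    return dict(agg)

def expire(agg, now, retention=timedelta(days=7)):
    """Drop windows older than the retention period (the paper
    expires 5-minute aggregates weekly)."""
    cutoff = now - retention
    return {w: v for w, v in agg.items() if w >= cutoff}
```

The real implementation did this in SQL stored procedures against partitioned tables; the grouping key is the same idea.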

e. Data Visualizations – SAP BusinessObjects (3.x, 4.x) was used, with a metadata model created and the aggregate-awareness feature enabled. Aggregate join indexes were used to further speed up queries by reducing IO on the database side. Dashboards were refreshed every 5 minutes, and the response time for most reports was in the 10–15 second range.
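The essence of aggregate awareness is routing a query to the coarsest table that can still answer it, since coarser tables hold far fewer rows and so cost far less IO. A toy version of that routing decision (grain names and table names are hypothetical, not the actual universe configuration):

```python
# Fine-to-coarse aggregate tables, keyed by grain in minutes.
# Names are illustrative, not the actual schema.
AGG_TABLES = {5: "agg_5min", 60: "agg_hourly", 1440: "agg_daily"}

def pick_table(requested_minutes: int) -> str:
    """Choose the coarsest aggregate whose grain evenly divides the
    requested reporting period; coarser tables mean fewer rows
    scanned, which is the point of aggregate awareness."""
    candidates = [(grain, table) for grain, table in AGG_TABLES.items()
                  if requested_minutes % grain == 0]
    if not candidates:
        raise ValueError("no aggregate grain divides the requested period")
    return max(candidates)[1]  # largest qualifying grain wins
```

A daily report thus never touches the 5-minute base aggregates, which is how sub-15-second responses stay feasible on a 90 TB/week warehouse.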

2. ETL and BI Performance – Workforce Analytics

Requirements/Challenges

a. PSL was tasked with developing a multi-tenant HR workforce analytics solution covering Time & Attendance and Scheduling/Absence Management products, for customers from multiple industry verticals such as Healthcare, Retail, and Manufacturing.

b. The product features were:

i. Real-time visibility into workforce absence and time/attendance tracking

ii. Guided discovery to find outliers, patterns, trends, and relationships

iii. Labor-cost optimization

c. The input data was time/attendance and scheduling data coming from both MS SQL Server and Oracle-based OLTP systems (multiple DB versions to be supported), and the target could be MS SQL Server or Oracle (again, multiple DB versions to be supported). The solution was hosted on the Amazon AWS public cloud, or could also be deployed on-premises.

d. The largest data volume handled was 2.5 TB, for a retailer in the USA.

e. The ETL performance requirement was to process incremental data in near real-time with a configurable delay (2 to 5 minutes), and to load the historical data for the past 2 years within 4 hours. Many ETL calculations and business rules were involved, which slowed down the end-to-end ETL process. There was also a need for reports to be sliced and diced by 45 pre-defined time-period ranges and custom groupings. Report refresh response time was very critical (< 10 seconds) on the web as well as on mobile devices (tablets, iPhones).
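Pre-defined time-period ranges of this kind are usually resolved from a name to a concrete (start, end) date pair relative to "today". A minimal sketch, showing only a few hypothetical range names (the actual 45 ranges are not enumerated in the paper):

```python
from datetime import date, timedelta

def period_range(name: str, today: date):
    """Resolve a named reporting period to an inclusive (start, end)
    date pair. Range names here are illustrative examples only."""
    if name == "yesterday":
        y = today - timedelta(days=1)
        return y, y
    if name == "last_7_days":
        return today - timedelta(days=7), today
    if name == "month_to_date":
        return today.replace(day=1), today
    if name == "year_to_date":
        return today.replace(month=1, day=1), today
    raise ValueError(f"unknown period: {name}")
```

Resolving the range up front lets the report issue a simple BETWEEN predicate, which the partitioning scheme can prune efficiently.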

Solution

a. PSL used a date/timestamp-based strategy for change data capture on the data source tables.
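Timestamp-based CDC works by remembering a per-table watermark and extracting only rows modified since it. A minimal sketch, assuming a `last_modified_ts` column (the column and table names are illustrative, not the product's actual schema):

```python
from datetime import datetime

def build_cdc_query(table: str, last_watermark: datetime) -> str:
    """Incremental-extract query for a source table carrying a
    last-modified timestamp column (column name is hypothetical)."""
    return (f"SELECT * FROM {table} "
            f"WHERE last_modified_ts > '{last_watermark.isoformat(sep=' ')}'")

def advance_watermark(rows, old_watermark):
    """New watermark = max modified timestamp seen in this batch,
    or the old watermark if the batch was empty."""
    return max([r["last_modified_ts"] for r in rows] + [old_watermark])
```

Persisting the watermark per table after each successful load is what makes the 2-to-5-minute incremental cycle restartable.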

b. PSL developed the ETL in Talend Enterprise Studio (5.5) with an E-L-T approach, pushing queries down to the database so that they execute natively, saving time.
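The pushdown idea is that instead of extracting rows into the ETL engine, transforming them, and re-inserting them, the tool emits a single set-based statement that the database runs in place. A sketch of what such a generated statement might look like (table and column names are illustrative):

```python
def pushdown_load(target: str, source: str, predicate: str) -> str:
    """E-L-T pushdown: emit one INSERT..SELECT that aggregates inside
    the database, rather than moving rows through the ETL server.
    All identifiers here are hypothetical examples."""
    return (f"INSERT INTO {target} "
            f"SELECT emp_id, work_date, SUM(hours_worked) "
            f"FROM {source} "
            f"WHERE {predicate} "
            f"GROUP BY emp_id, work_date")
```

One round trip replaces a per-row extract/insert loop, which is where the time savings come from.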