were loaded, how many bytes, number of rows, etc. PSL used Teradata's FASTLOAD (bulk data loading)
utility to load this large volume of data.
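As an illustration of the bulk-loading step, the sketch below drives the FastLoad client from Python. It assumes the Teradata Tools and Utilities fastload executable is on the PATH; the host, credentials, file path, staging table and column names are hypothetical placeholders, not details from this engagement:

    # Sketch only: driving Teradata FastLoad from Python via the command-line client.
    # Host, credentials, file path, staging table and columns are hypothetical.
    import subprocess
    import tempfile

    FASTLOAD_SCRIPT = """
    LOGON tdprod/etl_user,etl_password;
    DATABASE stage_db;
    SET RECORD VARTEXT ",";
    DEFINE
        event_ts   (VARCHAR(26)),
        source_id  (VARCHAR(20)),
        metric_val (VARCHAR(20))
    FILE = /data/incoming/batch_0001.csv;
    BEGIN LOADING stage_db.events_stg ERRORFILES stage_db.events_err1, stage_db.events_err2;
    INSERT INTO stage_db.events_stg VALUES (:event_ts, :source_id, :metric_val);
    END LOADING;
    LOGOFF;
    """

    def run_fastload(script_text: str) -> None:
        # FastLoad reads its control script from stdin, requires an empty target
        # table, and reports rows and bytes loaded in its log output.
        with tempfile.NamedTemporaryFile("w", suffix=".fld", delete=False) as f:
            f.write(script_text)
            script_path = f.name
        with open(script_path) as script:
            subprocess.run(["fastload"], stdin=script, check=True)

    if __name__ == "__main__":
        run_fastload(FASTLOAD_SCRIPT)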
d. Data Processing and target load - On the Teradata server, an E-L-T approach was followed, which ran
multiple stored procedures to create aggregation tables per 5-minute window to cut down on the data volume.
The stored procedures also handled late-arriving data and the rebuild of aggregate tables. A Kimball-style
dimensional data model was followed, along with multi-level partitioning and indexing techniques, to
speed up reporting queries. Aggregate tables were created at various grains (hourly, daily), with storage for
up to 365 days. PSL achieved fresh data load rates of 1.5 to 2 TB/hour with this configuration. The data
warehouse volume was 90 TB per week; the 5-minute aggregation data was expired every week, while the
daily/weekly aggregates were retained and expired after 365 days.
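The 5-minute windowing at the heart of these aggregations can be illustrated with a short Python sketch. This is illustrative only, not the Teradata stored-procedure code from the engagement, and the event/metric record layout is hypothetical:

    # Illustrative Python only: in the engagement this logic lived in Teradata
    # stored procedures. The (event_ts, metric) record layout is hypothetical.
    from collections import defaultdict
    from datetime import datetime

    def five_min_bucket(ts: datetime) -> datetime:
        # Truncate a timestamp to the start of its 5-minute window.
        return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)

    def build_5min_aggregates(rows):
        # rows: iterable of (event_ts, metric) tuples.
        # Late-arriving rows simply fall into their original (older) window, so
        # re-running this for the affected windows rebuilds those aggregate rows.
        agg = defaultdict(lambda: {"row_count": 0, "metric_sum": 0.0})
        for event_ts, metric in rows:
            window = agg[five_min_bucket(event_ts)]
            window["row_count"] += 1
            window["metric_sum"] += metric
        return dict(agg)

    if __name__ == "__main__":
        sample = [(datetime(2017, 3, 1, 10, 3, 12), 4.2),
                  (datetime(2017, 3, 1, 10, 6, 5), 1.0)]
        print(build_5min_aggregates(sample))  # one row each for 10:00 and 10:05

Because late-arriving rows land in their original window, recomputing only the affected windows is enough to refresh published aggregates, which is the same idea the stored procedures applied.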
e. Data Visualizations - SAP BusinessObjects (3.x, 4.x) was used, with a metadata model created and the
aggregate-awareness feature enabled. Aggregate join indexes were used to further speed up queries by
reducing I/O on the database side. Dashboards were refreshed every 5 minutes, and the response time for
most reports was in the 10-15 second range.
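The aggregate join index technique mentioned above can be sketched as follows. The DDL is a generic Teradata aggregate join index, executed here through the teradatasql driver purely as an example client; the database, table and column names are assumptions for illustration:

    # Sketch only: a generic Teradata aggregate join index, executed through the
    # teradatasql driver. Database, table and column names are assumptions.
    import teradatasql

    AJI_DDL = """
    CREATE JOIN INDEX dw.daily_usage_aji AS
    SELECT event_dt, source_id,
           SUM(metric_val) AS metric_sum,
           COUNT(*)        AS row_cnt
    FROM dw.fact_events_5min
    GROUP BY event_dt, source_id
    """

    # Once defined, the optimizer can rewrite matching detail-level queries to read
    # the pre-aggregated index instead of the fact table, cutting I/O.
    with teradatasql.connect(host="td-prod", user="etl_user", password="etl_password") as con:
        with con.cursor() as cur:
            cur.execute(AJI_DDL)

With such an index in place, the optimizer can answer aggregate-aware report queries from the pre-aggregated structure rather than the detail fact table, which is what reduces I/O for the BusinessObjects dashboards.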
2. ETL and BI Performance – Workforce Analytics
— Requirements/Challenges
a. PSL was tasked with developing a multi-tenant HR workforce analytics solution, including Time & Attendance
and Scheduling & Absence Management products, for customers from multiple industry verticals such as
Healthcare, Retail, and Manufacturing.
b. The product features were:
i. Real-time visibility into workforce absence and time/attendance tracking
ii. Guided discovery to find outliers, patterns, trends, and relationships
iii. Labor-cost optimization
c. The input data was time/attendance and scheduling data coming from both an MS SQL Server and an
Oracle-based OLTP system (multiple DB versions to be supported), and the target could be MS SQL
Server or Oracle (again, multiple DB versions to be supported). The solution was hosted on the Amazon AWS
public cloud or could also be deployed on premises.
d. The largest data volume handled was 2.5 TB, for a retailer in the USA.
e. The ETL performance requirement was to process incremental data in near real-time with a
configurable delay (2 to 5 minutes) and to load the historical data for the past 2 years within 4 hours. There were
many ETL calculations and business rules involved, which slowed down the end-to-end ETL process.
There was also a need for reports to be sliced and diced by 45 pre-defined time-period ranges and
custom groupings. Report refresh response time was very critical (< 10 seconds) on the web
as well as on mobile devices (tablets, iPhones).
— Solution
a. PSL used a date/timestamp-based strategy for change data capture on the data source tables (sketched below).
b. PSL developed the ETL in Talend Enterprise Studio (5.5) with an E-L-T approach, pushing queries down to the
database so they execute natively and save time; a simplified sketch of both steps follows.
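The two ideas in items a and b, timestamp-based change data capture and E-L-T push-down, are sketched below in driver-agnostic Python (PEP 249 style). The actual jobs were built with Talend ELT components; the table and column names here are hypothetical:

    # Driver-agnostic sketch (PEP 249 connections). The real implementation used
    # Talend Enterprise Studio ELT components; tables, columns and the "?" bind
    # style (qmark; Oracle drivers use named binds) are illustrative assumptions.
    from datetime import datetime

    def extract_changed_rows(src_conn, last_watermark: datetime):
        # Timestamp-based CDC: pull only rows modified since the previous run,
        # then advance the stored watermark to the max last_modified_ts seen.
        cur = src_conn.cursor()
        cur.execute(
            "SELECT punch_id, employee_id, punch_ts, last_modified_ts "
            "FROM timecard_punch WHERE last_modified_ts > ?",
            (last_watermark,),
        )
        return cur.fetchall()

    def pushdown_daily_rollup(tgt_conn):
        # E-L-T push-down: the transformation runs as one set-based statement
        # inside the target database rather than row by row on the ETL server.
        cur = tgt_conn.cursor()
        cur.execute(
            "INSERT INTO fact_attendance_daily (employee_id, work_date, hours_worked) "
            "SELECT employee_id, CAST(punch_ts AS DATE), SUM(duration_hours) "
            "FROM stg_timecard_punch "
            "GROUP BY employee_id, CAST(punch_ts AS DATE)"
        )
        tgt_conn.commit()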