WP_Data Management Best Practices

W H I T E P A P E R

www.persistent.com

—

Aggregations: relational or OLAP?

Aggregates can have a very significant effect on performance, in some

cases speeding queries by a factor of 100 or even 1000. However, deciding what to aggregate is not

necessarily easy. User input is critical, as well a means to monitor slow running queries. And, as mentioned in

chapter 4, the choice between deploying relational aggregation tables or OLAP cubes is not easy.

OLAP cube technology have the following advantages over relational:

—

They have far superior analytic capabilities. There are several aspects to this:

—

The definition of calculations, both calculated members and measures, defined on robust query

languages such as MDX, is very powerful.

—

The management of hierarchies is also advanced: for instance, parent-child, ragged dimensional

hierarchies are traversed and used to aggregate measures with powerful and elegant semantics

(whereas this is painful to do in SQL, as mentioned at the beginning of this section).

—

The management of aggregation rules for semi-additive measures, such as inventory balances, and

non-additive measures, such as ratios, is much easier to specify (in relational this must be dealt

directly in the aggregation code).

—

Cubes deliver better and more predictable performance than relational aggregations, with less tuning

effort. With relational implementations, determining the optimal set of aggregations to implement is a

difficult problem. OLAP databases provide more assistance than relational technologies in this area

–even though it is still true that the performance and scalability of OLAP databases varies widely from one

product to another.

—

As mentioned already, security is more powerful on OLAPas it meshes well with parent-child semantics.

The list of disadvantages includes:

—

OLAP is non-standard, non-portable technology, and development expertise is scarcer as compared with SQL

and fragmented by vendor.

—

A Type 1 SCD change to a dimension may cause all the cubes using that dimension to rebuild, and this may

take a very long time.Aggregates defined on a Type 1 dimension attribute should be allowed only under special

circumstances (e.g., updates are rare, cubes depending on it very few and not too large).

Reasons why relational DBMSs won’t be entirely replaced by OLAPengines in a DW include:

—

Loading OLAP cubes from non-dimensional data schemas is difficult, as most OLAP systems do not directly

address the thorny problems of conforming dimension and fact data, data cleansing and management of

constraints such as referential integrity. For these reasons, Kimball recommends OLAP supported by

underlying dimensional relational schemas.

—

When the system is very large, detailed level transactional fact tables are kept in relational models and

aggregate data is stored in OLAP cubes, because OLAP vendors may have size limits not present in relational

implementations (although OLAPdatabases can scale tomultiple terabytes these days).

—

Finally, relational technology is more mature for data management aspects such as backup/restore and

archiving.