

W H I T E P A P E R
© 2017 Persistent Systems Ltd. All rights reserved. 50
www.persistent.com
—
Multidimensional schema to NoSQL - Column Family.
The basis for the logical model in NoSQL systems is
unchanged. However, the physical model for column family based systems like Cassandra and HBase will
differ. With these systems, usually a single table for the Fact will contain a de-normalized structure. The
Dimensions attributes become part of column families of the Fact table. This is the way to achieve join-based
filtering for these databases. In addition to being present in the de-normalized structure, the dimensions are
also separately retained as tables. Performance of equality driven queries on dimensions and range based
queries on dimensions need to be carefully evaluated.
This approach works well in case of few dimensions.
A NoSQL system is applicable in multiple business
domains like network access, perimeter/physical security, and data related to audio-video analysis.
—
Multidimensional schema to NoSQL - Document Oriented.
In case of document oriented NoSQL
databases, new “dimensional attributes” (who, why, what, where, when) get added as JSON elements, to the
main “document” fact table. The “document” fact table stores the “how” (many, much, often). We don’t see
document databases playing the role of a data warehouse system, as often.
5.4.5 Self-Service and Agility
Self Service Dimension Modeling and ETL seem a little distant currently. Some of the difficult problems in this field
relate to inferring a relationship, and then cascading maintenance of updates or deletes automatically. Tracking
history and propagating changes, or merging keys related to entities (and their associated measures and results) are
all difficult to achieve. In section
we show an example of the limitations of today’s self-service data preparation
6.3.4tools when it comes to maintaining relationships across datasets (e.g., referential integrity). When data is deleted or
updated, the relationships with other datasets typically need to bemaintained by hand.
Agile software development methods can be applied to dimensional modeling in the following areas:
—
During Sprint Planning.
The Bus Matrix is an effective framework to help come up with the backlog and
plan the sprints.
Clusters of the common dimensions (X axis of the bus matrix), or dimensions in order of
importance (in terms of priority/criticality of business processes) could be a guiding principle for selecting
activities per sprint. Make sure you involve business users. In addition, experience has shown that the first
iteration should deliver the maximum number of key dimensions and facts. This ensures that the designs for
the data model and BI semantic layer are validated with actual data and reports early, minimizing refactoring
later in the project.
—
During Development.
Agile BI/DW & ETL development can take place provided the Data Modeler, Data
Architect and ETL Architect all work in separate sprints
, with the modeler working at least 1-2 sprints in
advance of the ETL Architect. Some of this is because the Modeling activity involves a lot of discussions and
meetings, it is not always possible to complete this activity in the first few days of the sprint, and it is better to
dedicate a full sprint advantage to modeling. The Data Architect must (in conjunction with the Data Modeler)
design the physical schema and carry out some scripting/tests to validate the structures. The ETL Architect
can then take over in the following sprint.
—
After Deployment:
—
In our experience, the work carried out after deployment largely relates to the performance tuning of the
presentation layer to address issues related to the BI layer.
—
Other than the performance, the addition of newer attributes is the other big set of changes, post
deployment.
—
Major requirements about historization or changes to the bus matrix can be driven the same way in a
series of development sprints.