

W H I T E P A P E R
© 2017 Persistent Systems Ltd. All rights reserved. 92
www.persistent.com
Analytic application. Prebuilt data access application that contains powerful analysis algorithms based on domain expertise
(e.g., data mining algorithms), in addition to normal database queries.
BI applications. The value-add analytics within the DW/BI system. They include the entire range of data access methods, from
ad hoc queries to standardized reports to analytic applications.
Customer 360. An application that combines customer data from the various (external) touch points that a customer may use
to contact an organization with the internal data sources that trace which products they purchase, how they receive service and
support, etc., giving a complete picture of how they interact with the organization.
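As a rough illustration of the combination step described above, the following minimal sketch groups records from three sources under one customer key. The source names, the field names (`customer_id`, `channel`, `product`, `issue`) and the sample data are all hypothetical, and the sketch assumes every source already shares a clean customer identifier, which is rarely true in practice.

```python
from collections import defaultdict


def build_customer_360(touch_points, purchases, support_tickets):
    """Group records from three hypothetical sources under one customer key."""
    profiles = defaultdict(
        lambda: {"contacts": [], "purchases": [], "tickets": []}
    )
    # External touch points: how the customer contacted the organization.
    for event in touch_points:
        profiles[event["customer_id"]]["contacts"].append(event["channel"])
    # Internal sources: what the customer bought and what support they received.
    for order in purchases:
        profiles[order["customer_id"]]["purchases"].append(order["product"])
    for ticket in support_tickets:
        profiles[ticket["customer_id"]]["tickets"].append(ticket["issue"])
    return dict(profiles)


# Hypothetical sample data, for illustration only.
TOUCH_POINTS = [
    {"customer_id": 1, "channel": "web"},
    {"customer_id": 1, "channel": "phone"},
    {"customer_id": 2, "channel": "email"},
]
PURCHASES = [{"customer_id": 1, "product": "router"}]
TICKETS = [{"customer_id": 2, "issue": "billing"}]

PROFILES = build_customer_360(TOUCH_POINTS, PURCHASES, TICKETS)
```

Each entry of `PROFILES` then holds one customer's contacts, purchases and support tickets side by side. Matching records that lack a common identifier is the hard part, and this sketch leaves it out.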
Data Silos. An enterprise has data silos when data is stored redundantly by different areas of the organization, with each area
mandating its own policies and processes. This leads to inconsistent data definitions, formats and data values, which makes it very
hard to understand and use key business entities that are common across these silos. In the first, classical version, the area
generating a data silo was a local facility or a department within an enterprise. ERP systems were then introduced to help alleviate
this problem (among several others). However, ERPs deal only with internal company data and provide only partial management
of customer or supplier data; that is handled by other packaged applications such as CRMs and SRMs. These generate
today's modern version of data silos.
MPP Databases. The MPP acronym stands for “Massively Parallel Processing”. These databases are best described as
providing a SQL interface and a relational database management system (RDBMS) running on a cluster of servers networked
together by a high-speed interconnect, where the cluster forms a shared-nothing architecture, i.e., each server has its own CPU,
memory and disk, which it does not share with any other server in the cluster. Through the database software and high-speed
interconnects, the system functions as a whole and can scale as new servers are added to the cluster (this form of extending
capacity is known as scale-out). This approach is used by MPP database systems like Teradata, Greenplum, Vertica, Netezza,
ParAccel, and others. Why do MPP databases running on a shared-nothing cluster work well for data warehouses? For two main
reasons:
1. Relational queries are ideally suited to parallel execution; they are decomposed into uniform (relational algebra)
operations applied to uniform streams of data. By partitioning data across disk storage units attached directly to each
processor, an operator can often be split into many independent operators each working on a part of the data. This
partitioned data and execution gives partitioned parallelism. Each operator produces a new relation, so the operators
can be composed into highly parallel dataflow graphs. By streaming the output of one operator into the input of another
operator, the two operators can work in series giving pipelined parallelism.
2. Shared-nothing architectures scale well up to hundreds and even thousands of processors that do not interfere with one
another. As we will see below, this does not happen with single machines with parallel processors (SMPs), where there is
an interference effect.
A very good introduction to the subject is the classic 1992 paper by David DeWitt and Jim Gray [24], from which we took the two
paragraphs above, and which is still strikingly relevant.
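The partitioned and pipelined parallelism described in the first reason above can be sketched in a few lines of Python. This is a toy, single-process simulation under stated assumptions, not how any particular MPP product works: the operator names (`scan`, `select`, `partial_sum`, `merge`), the `region` and `amount` columns and the sample rows are hypothetical, and the "nodes" are plain lists processed one after another rather than separate servers.

```python
from collections import defaultdict


def hash_partition(rows, key, n_nodes):
    """Spread rows over n_nodes partitions by hashing the key column."""
    partitions = [[] for _ in range(n_nodes)]
    for row in rows:
        partitions[hash(row[key]) % n_nodes].append(row)
    return partitions


def scan(partition):
    """Stream the rows stored on one node."""
    for row in partition:
        yield row


def select(rows, predicate):
    """Relational selection; consumes its input stream row by row."""
    for row in rows:
        if predicate(row):
            yield row


def partial_sum(rows, group_key, measure):
    """Per-node aggregation over the rows that reach it."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[measure]
    return dict(totals)


def merge(partials):
    """Combine the per-node partial results into the final relation."""
    result = defaultdict(int)
    for partial in partials:
        for group, total in partial.items():
            result[group] += total
    return dict(result)


def run_query(rows, n_nodes=4):
    # Partitioned parallelism: the same operator chain runs on each partition.
    # Pipelined parallelism: scan feeds select feeds partial_sum via generators.
    partitions = hash_partition(rows, "region", n_nodes)
    partials = [
        partial_sum(select(scan(p), lambda r: r["amount"] > 0), "region", "amount")
        for p in partitions
    ]
    return merge(partials)


# Hypothetical sample data, for illustration only.
SALES = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
    {"region": "west", "amount": -2},
    {"region": "north", "amount": 3},
]
```

Calling `run_query(SALES, n_nodes)` gives the same totals for any number of partitions, which is the property that lets a shared-nothing system add servers without changing query results. In a real cluster the per-partition chains would run concurrently on different servers.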
OLAP, OLAP database, or engine. OLAP stands for “Online Analytical Processing” and is a set of principles that provide a
framework for answering multi-dimensional, analytical queries. An OLAP database or engine is one that organizes data natively per
a dimensional model, in cubes (as opposed to relational tables), where data (measures) are categorized by dimensions. OLAP
cubes are often pre-aggregated across dimensions to answer multi-dimensional, analytical queries swiftly and predictably.
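To make the idea of pre-aggregation concrete, here is a minimal sketch that computes the sum of a measure for every combination of dimension values once, so that later queries become dictionary lookups. It is an illustrative assumption, not the storage layout of any actual OLAP engine; the function names, the `product`, `region` and `year` dimensions, the `sales` measure and the sample facts are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(facts, dimensions, measure):
    """Pre-aggregate the measure over every subset of the dimensions."""
    cube = defaultdict(int)
    for fact in facts:
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                # The cell key is the set of (dimension, value) pairs fixed here;
                # the empty set is the grand total.
                cell = frozenset((d, fact[d]) for d in dims)
                cube[cell] += fact[measure]
    return dict(cube)


def query(cube, **filters):
    """Answer an aggregate query with one lookup; empty cells return 0."""
    return cube.get(frozenset(filters.items()), 0)


# Hypothetical sample facts, for illustration only.
FACTS = [
    {"product": "A", "region": "east", "year": 2016, "sales": 10},
    {"product": "A", "region": "west", "year": 2016, "sales": 4},
    {"product": "B", "region": "east", "year": 2017, "sales": 6},
]

CUBE = build_cube(FACTS, ["product", "region", "year"], "sales")
```

With this cube, `query(CUBE, product="A")` and `query(CUBE, region="east", year=2017)` each cost a single lookup regardless of how many facts were loaded. The trade-off is that the number of stored cells grows quickly with the number of dimensions.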
SMP Databases. Traditional databases work well on small to medium database sizes (up to a few tens of terabytes) on Symmetric
Multi-Processing (SMP) machines, which are tightly coupled multiprocessor systems where processors can run in parallel, are
connected using a common bus, are managed by a single operating system, and share I/O devices and memory resources. SMPs
follow the scale-up approach, where additional capacity is obtained by getting a bigger machine. These days, SMPs come with 4
to 64 processors.