Data
warehouses and OLAP tools are based on a multidimensional data model. This
model views data in the form of a data cube. A data cube allows data to
be modelled and viewed in multiple dimensions. It is defined by dimensions and facts.
Dimensions
are the perspectives or entities with respect to which can organization wants
to keep records. For example, a sales data warehouse may keep records of
a store’s sales with respect to the dimensions time, item, branch,
and location. These dimensions allow the store to keep track of things
like monthly sales of items and the branches and locations at which the items
were sold. Each dimension may have a table associated with it, called a dimension table, which further
describes the dimension.
Facts are
numerical measures. These are the quantities by which we want to analyze
relationships between dimensions. A multidimensional data model is typically
organized around a central theme, like sales, for instance. This theme
is represented by a fact table.
Example:
A
2-D view of sales data according to the dimensions time and item,
where the sales are from branches located in the city of Vancouver. The measure
displayed is dollars sold (in thousands).
A
3-D view of sales data, according to the dimensions time, item,
and location. The measure displayed is dollars sold (in
thousands).
A
3-D data cube representation of the data in above table, according to the
dimensions time, item, and location. The measure displayed
is dollars sold (in thousands).
A 4-D data cube
representation of sales data, according to the dimensions time, item,
location, and supplier. The measure displayed is dollars sold (in
thousands). For improved readability, only some of the cube values are shown.
We
may display any n-D data as a series of (n - 1) - D “cubes.” The
data cube is a metaphor for multidimensional data storage. The actual physical
storage of such data may differ from its logical representation. The important
thing to remember is that data cubes are n-dimensional and do not
confine data to 3-D.
Lattice of
cuboids, making up a 4-D data cube for the dimensions time, item,
location, and supplier. Each cuboid represents a different degree
of summarization.
Given a set of
dimensions, we can generate a cuboid for each of the possible subsets of the
given dimensions. The result would form a lattice of cuboids, each
showing the data at a different level of summarization, or group by. The
lattice of cuboids is then referred to as a data cube.
The
cuboid that holds the lowest level of summarization is called the base cuboid.
The 0-D cuboid, which holds the highest level of summarization, is called the apex
cuboid. In our example, this is the total sales, or dollars sold, summarized over
all four dimensions. The apex cuboid is typically denoted by all.
No comments:
Post a Comment