Schemas of the Multidimensional Data Model (MDDM)
The
entity-relationship data model is commonly used in the design of relational
databases, where a database schema consists of a set of entities and the
relationships between them. Such a data model is appropriate for on-line
transaction processing. A data warehouse, however, requires a concise,
subject-oriented schema that facilitates on-line data analysis.
The
most popular data model for a data warehouse is a multidimensional model. Such
a model can exist in the form of a star schema, a snowflake schema, or a fact
constellation schema. Let’s look at each of these schema types.
Star schema:
The most common modelling paradigm is the star schema, in which the data
warehouse contains
- A large central table (fact table) containing the bulk of the data, with no redundancy, and
- A set of smaller attendant tables (dimension tables), one for each dimension. The schema graph resembles a star, with the dimension tables displayed in a radial pattern around the central fact table.
Notice
that in the star schema, each dimension is represented by only one table, and
each table contains a set of attributes. This constraint may introduce some
redundancy. For example, “Ludhiana” and “Jalandhar” are both
cities in Punjab. Entries for such cities in the location dimension
table will repeat the province and country values, creating redundancy among those attributes.
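The star layout described above can be sketched as SQL DDL run through Python's sqlite3 module. This is a minimal illustration, assuming a sales subject; all table and column names (sales_fact, time_dim, item_dim, location_dim, and their attributes) are illustrative, not a prescribed layout.

```python
import sqlite3

# One central fact table plus one table per dimension: a star schema sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT,
                           quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT,
                           brand TEXT, type TEXT, supplier_type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT,
                           city TEXT, province_or_state TEXT, country TEXT);

-- Central fact table: one foreign key per dimension plus the numeric measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
""")
```

Note that each dimension is a single, unnormalized table: city, province, and country all sit together in location_dim, which is exactly where the redundancy discussed above comes from.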
Snowflake schema: The snowflake schema is a
variant of the star schema model, where some dimension tables are normalized,
thereby further splitting the data into additional tables. The resulting schema
graph forms a shape similar to a snowflake.
The major
difference between the snowflake and star schema models is that the dimension
tables of the snowflake model may be kept in normalized form to reduce
redundancies. Such a table is easy to maintain and saves storage space.
However, this saving of space is negligible in comparison to the typical
magnitude of the fact table. Furthermore, the snowflake structure can reduce
the effectiveness of browsing, since more joins will be needed to execute a
query. Consequently, the system performance may be adversely impacted. Hence,
although the snowflake schema reduces redundancy, it is not as popular as the
star schema in data warehouse design.
A
snowflake schema for sales is given in this figure. Here, the sales fact
table is identical to that of the star schema. The main difference between the
two schemas is in the definition of dimension tables. The single dimension
table for item in the star schema is normalized in the snowflake schema,
resulting in new item and supplier tables.
For
example, the item dimension table now contains the attributes item
key, item name, brand, type, and supplier key, where supplier key
is linked to the supplier dimension table, containing supplier
key and supplier type information. Similarly, the single dimension
table for location in the star schema can be normalized into two new
tables: location and city. The city key in the new location
table links to the city dimension.
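Continuing the same sqlite3 sketch, the snowflake variant normalizes the item and location dimensions into extra tables. Again, every name here is an illustrative assumption following the description above.

```python
import sqlite3

# Snowflake sketch: item and location are split into normalized sub-tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item_dim (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    brand        TEXT,
    type         TEXT,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key)
);
CREATE TABLE city_dim (city_key INTEGER PRIMARY KEY, city TEXT,
                       province_or_state TEXT, country TEXT);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    street       TEXT,
    city_key     INTEGER REFERENCES city_dim(city_key)
);
""")
```

Each city now appears once in city_dim, eliminating the repeated province and country values, but at the cost of an extra join whenever a query needs the full location.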
Fact constellation:
Sophisticated applications may require multiple fact tables to share dimension
tables. This kind of schema can be viewed as a collection of stars, and hence
is called a galaxy schema or a fact constellation.
A
fact constellation schema is shown in this figure. This schema specifies two
fact tables, sales and shipping. The sales table
definition is identical to that of the star schema. The shipping table
has five dimensions, or keys: item key, time key, shipper key, from location,
and to location, and two measures: cost and units shipped.
A fact constellation schema allows dimension tables to be shared between fact
tables. For example, the dimension tables for time, item, and location
are shared between both the sales and shipping fact tables.
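The sharing of dimension tables between the two fact tables can be sketched in the same sqlite3 style. The shipping table below follows the five keys and two measures listed above; the dimension attributes are trimmed for brevity, and shipper_dim is an assumed name.

```python
import sqlite3

# Fact constellation sketch: two fact tables sharing dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY);
CREATE TABLE shipper_dim  (shipper_key INTEGER PRIMARY KEY, shipper_name TEXT);

CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
-- Five keys (item, time, shipper, from/to location) and two measures.
CREATE TABLE shipping_fact (
    item_key      INTEGER REFERENCES item_dim(item_key),
    time_key      INTEGER REFERENCES time_dim(time_key),
    shipper_key   INTEGER REFERENCES shipper_dim(shipper_key),
    from_location INTEGER REFERENCES location_dim(location_key),
    to_location   INTEGER REFERENCES location_dim(location_key),
    cost          REAL,
    units_shipped INTEGER
);
""")
```

Both fact tables reference the same time_dim, item_dim, and location_dim rows, which is what makes the collection of stars a constellation rather than two independent schemas.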
In
data warehousing, there is a distinction between a data warehouse and a data
mart. A data warehouse collects information about subjects that span the entire
organization, such as customers, items, sales, assets, and personnel,
and thus its scope is enterprise-wide. For a data warehouse, the fact
constellation schema is commonly used, since it can model multiple,
interrelated subjects.
A
data mart, on the other hand, is a departmental subset of the data warehouse that
focuses on selected subjects, and thus its scope is department-wide. For
data marts, the star or snowflake schema is commonly used, since
both are geared toward modelling single subjects, although the star schema is
more popular and efficient.
Measures: Their Categorization and
Computation
Note
that a multidimensional point in the data cube space can be defined by a
set of dimension-value pairs, for example, time = “Q1”, location =
“Vancouver”, item = “computer”. A data
cube measure is a numerical function that can be evaluated at a particular
point in the data cube space.
Measures can be organized into three categories, based on the kind of
aggregate function used: distributive, algebraic, and holistic.
Distributive:
An aggregate function is distributive if it can be computed in a
distributed manner. That is, suppose the data are partitioned into n sets. We
apply the function to each partition, resulting in n aggregate values.
If the result derived by applying the function to the n aggregate values
is the same as that derived by applying the function to the entire data set
(without partitioning), the function can be computed in a distributed manner.
For example, count() can be computed for a data cube by first partitioning the
cube into a set of subcubes, computing count() for each subcube, and then
summing up the counts obtained for each subcube. Hence, count() is a
distributive aggregate function. For the same reason, sum(), min(), and max() are
distributive aggregate functions. A measure is distributive if it is
obtained by applying a distributive aggregate function.
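The partition argument above can be checked directly in a few lines of Python. This is just a sketch on a made-up list: the result of applying sum(), count(), or max() to per-partition results matches applying the function to the whole data set.

```python
# Distributive check: aggregate each partition, then aggregate the
# per-partition results, and compare with aggregating the full data set.
data = [3, 1, 4, 1, 5, 9, 2, 6]
partitions = [data[:3], data[3:6], data[6:]]

assert sum(sum(p) for p in partitions) == sum(data)      # sum() is distributive
assert sum(len(p) for p in partitions) == len(data)      # count() is distributive
assert max(max(p) for p in partitions) == max(data)      # max() is distributive
assert min(min(p) for p in partitions) == min(data)      # min() is distributive
```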
Algebraic:
An aggregate function is algebraic if it can be computed by an algebraic
function with M arguments (where M is a bounded positive
integer), each of which is obtained by applying a distributive aggregate
function. For example, avg() (average) can be computed by sum()/count(), where
both sum() and count() are distributive
aggregate
functions. Similarly, it can be shown that min_N() and max_N() (which find the N
minimum and N maximum values, respectively, in a given set) and
standard_deviation() are algebraic aggregate functions. A measure is algebraic if
it is obtained by applying an algebraic aggregate function.
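The avg() case can be sketched the same way: each partition reports a bounded subaggregate, here the pair (sum, count) with M = 2, and the exact average is recovered from those pairs alone.

```python
# Algebraic check: avg() is not distributive (averaging per-partition
# averages is wrong in general), but it is computable from the
# distributive subaggregates sum() and count().
data = [3, 1, 4, 1, 5, 9, 2, 6]
partitions = [data[:3], data[3:6], data[6:]]

subaggs = [(sum(p), len(p)) for p in partitions]   # one (sum, count) per partition
total_sum = sum(s for s, _ in subaggs)
total_count = sum(c for _, c in subaggs)
avg = total_sum / total_count

assert avg == sum(data) / len(data)                # matches the exact average
```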
Holistic:
An aggregate function is holistic if there is no constant bound on the
storage size needed to describe a subaggregate. That is, there does not exist
an algebraic function with M arguments (where M is a constant)
that characterizes the computation. Common examples of holistic functions
include median(), mode(), and rank(). A measure is holistic if it is
obtained by applying a holistic aggregate function.
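A small counterexample makes the contrast with the distributive case concrete. With this deliberately chosen partitioning of a made-up list, combining per-partition medians gives a different answer than the true median, illustrating why no bounded subaggregate suffices for median().

```python
import statistics

# Holistic counterexample: the median of per-partition medians is not,
# in general, the median of the whole data set.
data = [1, 2, 3, 4, 100, 101]
partitions = [[1, 2, 100], [3, 4, 101]]

per_partition = [statistics.median(p) for p in partitions]   # [2, 4]
naive = statistics.median(per_partition)                     # 3.0
exact = statistics.median(data)                              # 3.5

assert naive != exact   # the shortcut fails; median() needs the full data
```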