Data discretization techniques can be used to reduce the number of values for a
given continuous attribute by dividing the range of the attribute into
intervals. Interval labels can then be used to replace actual data values. Replacing
numerous values of a continuous attribute by a small number of interval labels
thereby reduces and simplifies the original data. This leads to a concise,
easy-to-use, knowledge-level representation of mining results.
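As a rough illustration of replacing values with interval labels, the sketch below uses equal-width binning. This is only one of several possible discretization methods, and the function name `equal_width_labels`, the sample ages, and the choice of three intervals are all assumptions made for the example.

```python
def equal_width_labels(values, k):
    """Replace each numeric value with the label of its equal-width interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: every value is identical, so one interval suffices.
        return [f"[{lo:g}, {hi:g}]"] * len(values)
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(k + 1)]
    labels = []
    for v in values:
        # Index of the interval containing v; the maximum goes in the last one.
        i = min(int((v - lo) / width), k - 1)
        # Only the last interval is closed on the right.
        closing = "]" if i == k - 1 else ")"
        labels.append(f"[{edges[i]:g}, {edges[i + 1]:g}{closing}")
    return labels


# Made-up sample data: ten distinct ages.
ages = [13, 15, 22, 25, 33, 35, 45, 52, 70, 73]
labels = equal_width_labels(ages, 3)
print(len(set(ages)), "distinct values ->", len(set(labels)), "interval labels")
```

Ten distinct ages collapse into three interval labels here, which is the kind of reduction the paragraph above describes.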
A concept hierarchy for a given numerical attribute defines a discretization of
the attribute.
· Concept hierarchies can be used to
reduce the data by collecting and replacing low-level concepts (such as
numerical values for the attribute age) with higher-level concepts (such
as youth, middle-aged, or senior).
· Although detail is lost by such data
generalization, the generalized data may be more meaningful and easier to
interpret. This contributes to a consistent representation of data mining
results among multiple mining tasks, which is a common requirement.
· In addition, mining on a reduced data
set requires fewer input/output operations and is more efficient than mining on
a larger, ungeneralized data set.
Because of these benefits, discretization techniques and
concept hierarchies are typically applied before data mining as a
pre-processing step, rather than during mining.
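The age example can be sketched as a small pre-processing step that maps each low-level value to a higher-level concept. The cut points 30 and 60 are hypothetical; the text names the concepts youth, middle-aged, and senior but does not say where their boundaries lie.

```python
from collections import Counter

# Hypothetical boundaries (assumption): [0, 30) youth, [30, 60) middle-aged,
# 60 and above senior.
AGE_HIERARCHY = [
    (0, 30, "youth"),
    (30, 60, "middle-aged"),
    (60, float("inf"), "senior"),
]


def generalize_age(age):
    """Map a numeric age to its higher-level concept."""
    for low, high, concept in AGE_HIERARCHY:
        if low <= age < high:
            return concept
    raise ValueError(f"age out of range: {age}")


# Made-up sample data, generalized once before any mining is done.
ages = [13, 15, 22, 25, 33, 35, 45, 52, 70, 73]
generalized = [generalize_age(a) for a in ages]
concept_counts = Counter(generalized)
print(concept_counts)
```

The individual ages are no longer recoverable from `generalized`, which is the loss of detail mentioned above, but the result has only three values to mine and report on.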
Data Cube Generation
A data cube allows data to be modeled and viewed in multiple dimensions. It is defined
by dimensions and facts.
In general terms, dimensions are the perspectives or entities with respect to
which an organization wants to keep records. For example, a store may
create a sales data warehouse in order to keep records of the store’s
sales with respect to the dimensions time, item, branch,
and location. These dimensions allow the store to keep track of things
like monthly sales of items and the branches and locations at which the items were sold.
Each dimension may have a table associated with it, called a dimension table, which
further describes the dimension.
A multidimensional data model is typically organized around a central theme, like
sales, for instance. This theme is represented by a fact table. Facts
are numerical measures. Think of them as the quantities by which we want to analyze
relationships between dimensions.
The fact table contains the names of the facts, or measures, as well as keys to
each of the related dimension tables.
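To make the relationship between the fact table and the dimension tables concrete, here is a minimal in-memory sketch of the sales example. The table contents, key values, and the measure names `units_sold` and `dollars_sold` are invented for illustration; a real warehouse would keep these in database tables.

```python
from collections import defaultdict

# Dimension tables: key -> descriptive attributes (all values are made up).
time_dim = {1: {"month": "Jan"}, 2: {"month": "Feb"}}
item_dim = {10: {"name": "laptop"}, 11: {"name": "phone"}}
branch_dim = {100: {"name": "B1"}, 101: {"name": "B2"}}
location_dim = {1000: {"city": "Vancouver"}, 1001: {"city": "Toronto"}}

# Fact table: each row holds one key per dimension plus the numeric measures.
sales_fact = [
    {"time_key": 1, "item_key": 10, "branch_key": 100, "location_key": 1000,
     "units_sold": 3, "dollars_sold": 3000.0},
    {"time_key": 1, "item_key": 10, "branch_key": 101, "location_key": 1001,
     "units_sold": 1, "dollars_sold": 1000.0},
    {"time_key": 2, "item_key": 11, "branch_key": 100, "location_key": 1000,
     "units_sold": 2, "dollars_sold": 1200.0},
    {"time_key": 1, "item_key": 11, "branch_key": 101, "location_key": 1001,
     "units_sold": 4, "dollars_sold": 2400.0},
]


def monthly_sales_by_item(facts):
    """Total dollars_sold per (month, item), resolving keys via the dimension tables."""
    totals = defaultdict(float)
    for row in facts:
        month = time_dim[row["time_key"]]["month"]
        item = item_dim[row["item_key"]]["name"]
        totals[(month, item)] += row["dollars_sold"]
    return dict(totals)


monthly = monthly_sales_by_item(sales_fact)
print(monthly)
```

The fact rows carry only keys and measures; the descriptive detail (month names, item names, cities) lives in the dimension tables and is looked up when a question such as "monthly sales of items" is asked.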
Although we usually think of cubes as 3-D geometric structures, in data warehousing the
data cube is n-dimensional.
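One way to picture an n-dimensional cube is as a set of cells, each addressed by one value per dimension. The sketch below assumes the four dimensions from the store example and made-up sales figures; the helper `aggregate` is a hypothetical name, not part of any particular warehouse product.

```python
from collections import defaultdict

DIMENSIONS = ("time", "item", "branch", "location")

# Each cell is addressed by four coordinates, so this cube is 4-D, not 3-D.
# The measure stored in each cell is dollars sold (illustrative values).
cells = {
    ("Q1", "laptop", "B1", "Vancouver"): 3000.0,
    ("Q1", "laptop", "B2", "Toronto"): 1000.0,
    ("Q2", "phone", "B1", "Vancouver"): 1200.0,
    ("Q1", "phone", "B2", "Toronto"): 2400.0,
}


def aggregate(cube, keep):
    """Sum the measure over every dimension that is not listed in keep."""
    idx = [DIMENSIONS.index(d) for d in keep]
    out = defaultdict(float)
    for coords, value in cube.items():
        out[tuple(coords[i] for i in idx)] += value
    return dict(out)


by_time_item = aggregate(cells, ("time", "item"))
grand_total = aggregate(cells, ())
print(by_time_item)
print(grand_total)
```

Adding a fifth dimension would only lengthen the coordinate tuples; nothing in the structure depends on there being exactly three.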