DWDM \ Knowledge Discovery in Databases
Knowledge Discovery in Databases (KDD) is the process of extracting knowledge from structured data sources, unstructured data sources, or both.
Data Preprocessing
Data Cleaning | Missing Data, Noisy Data |
Data Reduction | 1. Attribute Subset Selection (attributes). 2. Numerosity Reduction (storage). 3. Dimensionality Reduction (compression). |
Data Transformation | 1. Smoothing (remove noise). 2. Aggregation (summary). 3. Generalization (low level to high level). 4. Normalization (scaling). 5. Attribute Construction (create attributes). |
Data Cleaning (Missing Data, Noisy Data)
Missing Data | 1. Ignore the tuples. 2. Fill the missing values with the attribute mean or the most probable value. |
Noisy Data | 1. Binning method (create bins): the sorted data is divided into segments called bins, and the values in each bin are smoothed by a method such as bin means, bin medians, or bin boundaries. 2. Regression (function): the data is smoothed by fitting it to a regression function. The regression may be linear (one independent variable) or multiple (several independent variables). 3. Clustering (groups): the data is grouped into clusters, and values that fall outside the clusters are treated as outliers. |
Note:
Missing or noisy data can result from faulty data collection, data entry errors, and similar causes.
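The two simplest cleaning steps above, filling missing values with the attribute mean and smoothing by bin means, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `prices` list, and the bin size of 5 are hypothetical, and missing values are assumed to be marked with `None`.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("no known values to compute a mean from")
    attribute_mean = sum(known) / len(known)
    return [attribute_mean if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size items,
    and replace every value with the mean of its bin."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Hypothetical attribute with one missing value (None).
prices = [4, 8, None, 15, 21, 21, 24, 25, 28, 34]

filled = fill_missing_with_mean(prices)
smoothed = smooth_by_bin_means(filled, bin_size=5)

print(filled)
print(smoothed)
```

Bin medians or bin boundaries could be substituted for the mean inside `smooth_by_bin_means` without changing the overall structure.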
Data Reduction
Technique | Handles | Working |
Data Mining | Huge amounts of data | Hard to work with huge amounts of data |
Data Reduction | Used within data mining | Easier to work with less data; in addition, it increases storage efficiency and reduces data storage and analysis costs |
Data Reduction Steps
S. No | Data Reduction Step | Details |
1 | Attribute Subset Selection | All attributes except the highly relevant ones are discarded. |
2 | Numerosity Reduction | A model of the data is stored instead of the data itself. |
3 | Dimensionality Reduction | The data size is reduced using encoding mechanisms (lossy or lossless). If the original data can be retrieved exactly after decompression, the reduction is lossless; otherwise it is lossy. Methods of dimensionality reduction include wavelet transforms and Principal Component Analysis (PCA). |
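As a rough illustration of numerosity reduction ("store a model of the data, not the data"), the sketch below fits a straight line to a small data set by least squares and keeps only the two model parameters. The function names and the `hours` and `scores` values are hypothetical, and linear regression is just one possible choice of model.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def reconstruct(model, xs):
    """Approximate the original y values from the stored model."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


# Hypothetical data: five (hours, score) pairs.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 57, 58, 61]

# Only these two numbers need to be stored instead of all the scores.
model = fit_line(hours, scores)
approx = reconstruct(model, hours)

print(model)
print(approx)
```

The reconstructed values only approximate the original scores, so this kind of reduction is lossy in the sense described in the table.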
Data Transformation in Data Mining
Data transformation is the process of converting data from one format into another format that is more appropriate for data mining.
Data Transformation Strategies
Data Transformation Strategy | Details |
Smoothing | The process of removing noise from the data. |
Aggregation | Summary or aggregation operations are applied to the data. |
Generalization | Low-level data is replaced with higher-level data by climbing concept hierarchies. |
Normalization | Attribute data is scaled so that it falls within a small specified range, such as 0.0 to 1.0. |
Attribute Construction | New attributes are constructed from the given set of attributes. |
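Normalization is the easiest of these strategies to show concretely. The sketch below uses min-max scaling, one common way to map an attribute into a range such as 0.0 to 1.0; the function name, its parameters, and the `incomes` values are hypothetical and chosen only for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min
    and the largest maps to new_max."""
    old_min, old_max = min(values), max(values)
    if old_min == old_max:
        raise ValueError("all values are equal; the range is undefined")
    span = old_max - old_min
    return [
        (v - old_min) / span * (new_max - new_min) + new_min
        for v in values
    ]


# Hypothetical income attribute.
incomes = [12000, 35000, 58000, 73600, 98000]

normalized = min_max_normalize(incomes)
print(normalized)
```

Passing different `new_min` and `new_max` arguments rescales the same attribute into another target range, for example -1.0 to 1.0.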