Data preprocessing (data cleaning): CDPP
CDPP (Ceresdata Data PreProcessing)
Data cleaning concept
• Data from external sources often contains “dirty data”, that is, data with defects such as missing values and noise.
• Dirty data distorts the information obtained from the data, disrupts the operation of the data mining system, and ultimately affects management decisions.
Data cleaning content
• Incomplete data, erroneous data, duplicate data…
Methods
• Missing data
• Ignore the record, fill with a global constant (such as NULL), fill with the attribute mean, or fill with the most likely value (estimated by regression or decision tree induction)…
• Wrong data
• Binning, clustering, linear regression, combined human and computer inspection…
• Duplicate data
• Use correlation analysis to detect redundant attribute values and semantics across different data sources
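As a rough illustration of the methods listed above, the following Python sketch shows attribute-mean filling for missing data, smoothing by bin means for noisy data, and a Pearson correlation check as one simple form of correlation analysis. It is not CDPP code: the function names, the equal-frequency binning choice, and the sample values are assumptions made only for this example.

```python
from math import sqrt
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to estimate from, so leave unchanged
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size items, and
    replace every value with the mean of its bin (equal-frequency binning)."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.
    A value near +1 or -1 suggests that one attribute may be redundant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(fill_missing_with_mean([10, None, 30]))
    print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
    print(pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]))
```

In this sketch, two attributes from different sources whose correlation is close to +1 or -1 would be flagged for review as possible duplicates; the threshold for "close" is left to the user.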