Data preprocessing (data cleaning): CDPP
CDPP (Ceresdata Data PreProcessing)
Data cleaning concept
• Data from external sources often contains "dirty data", that is, data with defects such as missing values and noise.
• Dirty data distorts the information extracted from the data, interferes with the operation of the data mining system, and ultimately affects management decisions.
Data cleaning content
• Incomplete data, erroneous data, duplicate data…
Methods
• Missing data
• Ignore the record, fill with a global constant (such as NULL), fill with the attribute mean, or fill with the most likely value (inferred with regression tools or decision tree induction)…
• Wrong data
• Binning, clustering, linear regression, combined computer and human inspection…
• Duplicate data
• Detect duplicates through correlation analysis, and reconcile the attribute values and semantics of different data sources
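As a rough illustration of three of the methods listed above (attribute mean filling, smoothing by binning, and removal of duplicate records), here is a minimal Python sketch. It is not CDPP's implementation or API: the function names `fill_with_mean`, `smooth_by_bin_means`, and `drop_duplicates`, and the sample values, are hypothetical and chosen only for this example. It also assumes missing values are represented as `None` and that duplicates are exact matches on the chosen key fields.

```python
from statistics import mean


def fill_with_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    if not known:
        # Nothing to average over, so return the data unchanged.
        return list(values)
    attribute_mean = mean(known)
    return [attribute_mean if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Smooth noisy values with equal-frequency binning.

    The values are sorted, split into consecutive bins of `bin_size`
    items, and every value is replaced by the mean of its bin.
    The result is returned in sorted order.
    """
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        current_bin = ordered[start:start + bin_size]
        smoothed.extend([mean(current_bin)] * len(current_bin))
    return smoothed


def drop_duplicates(records, key_fields):
    """Keep the first record for each distinct combination of key fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


if __name__ == "__main__":
    print(fill_with_mean([1, None, 3]))
    print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
    print(drop_duplicates(
        [{"id": 1, "name": "a"}, {"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
        ["id", "name"],
    ))
```

Exact-match deduplication is the simplest case; reconciling records whose values or semantics differ between sources would need matching rules that this sketch does not attempt.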
