Distributed Parallel Database: CDPD
Physical deployment spans multiple regions.
Key technologies:
- WAN-based cross-region deployment
- Global data tablespace
- Local storage access to data, with no cross-node aggregation or synchronization
- Global metadata consistency
- Scheduling and distribution of SQL request tasks
- Data parallelism
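To illustrate how data-local storage and SQL task distribution can avoid cross-node aggregation, the Python sketch below hash-partitions rows across nodes and runs a `SUM(amount) GROUP BY key` query as one task per node. The node names, the CRC32 placement rule, and the functions `owner`, `load`, `local_task`, and `run_query` are hypothetical; this is not CDPD's actual implementation, and threads merely simulate nodes running in parallel. Because every row with a given key is stored on exactly one node, the partial results cover disjoint keys, so the coordinator only unions them and never combines aggregates across nodes.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical nodes, one per region.
NODES = ["region-a", "region-b", "region-c"]


def owner(key: str) -> str:
    """Stable placement: the same key always maps to the same node.
    crc32 is used because Python's built-in hash() of str varies between runs."""
    return NODES[zlib.crc32(key.encode()) % len(NODES)]


def load(rows):
    """Store each (key, amount) row on the node that owns its key."""
    partitions = defaultdict(list)
    for key, amount in rows:
        partitions[owner(key)].append((key, amount))
    return partitions


def local_task(partition):
    """Runs on one node and reads only that node's local rows."""
    totals = defaultdict(int)
    for key, amount in partition:
        totals[key] += amount
    return dict(totals)


def run_query(partitions):
    """Coordinator: dispatch one task per node in parallel, then union the results."""
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        partials = pool.map(local_task, (partitions.get(n, []) for n in NODES))
    result = {}
    for part in partials:
        result.update(part)  # keys are disjoint across nodes, so nothing is overwritten
    return result


rows = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3)]
print(run_query(load(rows)))
```

The design choice that makes this work is partitioning on the grouping key; a query grouped by a different column would need a data exchange between nodes, which this sketch does not model.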
Distributed Data Storage
CNHC: Hadoop-based NFS storage
The CeresData NFS Hadoop Connector (CNHC) lets Hadoop run on a single copy of the data kept on NFS storage:
- High reliability at low cost
- Read performance: single-node performance improved 3x
- Supports out-of-order reads and writes of data
- Query performance
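The cost benefit of keeping a single copy can be made concrete with simple arithmetic. HDFS keeps three replicas of each block by default (`dfs.replication = 3`), whereas the connector keeps one copy on NFS and, presumably, relies on the NFS storage system's own redundancy for reliability. In the Python sketch below, the function name is illustrative and the 25% redundancy overhead (for example, RAID 5 with four data disks per parity disk) is a hypothetical figure, not a CeresData specification.

```python
def raw_capacity_tb(logical_tb: float, copies: int, overhead: float = 0.0) -> float:
    """Raw disk capacity needed to hold `logical_tb` of data.

    copies:   number of full replicas kept by the file system.
    overhead: extra fraction added by storage-side redundancy
              (e.g. RAID parity); the value used below is an assumption.
    """
    return logical_tb * copies * (1 + overhead)


# 100 TB of logical data stored two ways.
hdfs_tb = raw_capacity_tb(100, copies=3)                # HDFS default replication factor
nfs_tb = raw_capacity_tb(100, copies=1, overhead=0.25)  # hypothetical RAID 5 (4+1) overhead
print(f"HDFS: {hdfs_tb} TB raw, NFS single copy: {nfs_tb} TB raw")
```

Under these assumptions the single-copy layout needs 125 TB of raw disk instead of 300 TB; the actual saving depends on the redundancy scheme of the NFS storage in use.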
Data Preprocessing (Data Cleaning): CDPP
CDPP (CeresData Data PreProcessing)
Data cleaning concept: data from external sources often contains dirty data, that is, data with missing values (vacancies), noise, and other defects. Dirty data distorts the information obtained from the data and undermines data mining.
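Two of the defects named above, missing values and noise, can be handled by a minimal cleaning pass such as the Python sketch below. It fills missing entries (`None`) with the median of the observed values and replaces any value lying more than `k` median absolute deviations (MAD) from the median. The median/MAD rule, the threshold `k = 3.0`, and the function name `clean` are assumptions chosen for illustration; the source does not state which methods CDPP uses.

```python
from statistics import median


def clean(values, k=3.0):
    """Fill missing values and suppress noisy outliers in one numeric column.

    - Missing entries (None) are replaced by the median of the observed values.
    - A value is treated as noise if it lies more than k * MAD from the median,
      where MAD is the median absolute deviation of the observed values;
      such values are also replaced by the median.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no observed values")
    med = median(present)
    filled = [med if v is None else v for v in values]
    mad = median([abs(v - med) for v in present])
    if mad == 0:
        # All observed values agree with the median; nothing can be flagged as noise.
        return filled
    return [med if abs(v - med) > k * mad else v for v in filled]


readings = [10.0, 11.0, None, 10.5, 250.0, 9.5]
print(clean(readings))
```

The median and MAD are used instead of the mean and standard deviation because they are barely affected by the outliers being removed. Replacing bad values rather than dropping them keeps the column aligned with other columns of the same rows.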