Data mining is the process of selecting, exploring, and modeling large amounts of data to uncover previously unknown, non-obvious patterns. The methodology comprises five stages: sampling, exploration, modification, modeling, and evaluation.
Sampling is desirable if the data set is too large to process in a reasonable time, or if the analyst wants to avoid generalization problems by partitioning the data into separate sets, one for building the model and one for validating it. Exploration and modification refer to reviewing the data to improve understanding of it and to transforming it into a form suitable for analysis. The next stage is modeling, in which the data is analyzed using traditional statistical methods as well as non-traditional methods such as neural networks and decision trees. Finally, in the evaluation stage, the models and their results are assessed against a common criterion.
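The sampling stage described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `sample_and_partition`, the 30% validation share, and the fixed seed are all assumptions made for the example.

```python
import random

def sample_and_partition(records, sample_size, validation_fraction=0.3, seed=0):
    """Draw a random sample, then split it into training and validation sets.

    All parameter names and defaults are hypothetical choices for this sketch.
    """
    rng = random.Random(seed)
    # Sample without replacement; cap at the number of available records.
    sample = rng.sample(records, min(sample_size, len(records)))
    # The sample is already in random order, so a simple slice is a random split.
    n_validation = round(len(sample) * validation_fraction)
    validation = sample[:n_validation]
    training = sample[n_validation:]
    return training, validation

# Hypothetical data: 10,000 record identifiers standing in for a large data set.
records = list(range(10_000))
training, validation = sample_and_partition(records, sample_size=1_000)
print(len(training), len(validation))  # 700 300
```

The model is built only on `training`; `validation` is held back so that the evaluation stage measures how well the model generalizes to data it has not seen.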
Data mining relies on three basic groups of tools. The first is description and visualization, which help the analyst understand the data set and discover its hidden patterns. The second is association and clustering: association identifies variables that go together, while clustering groups objects so that objects in the same group are similar to one another and objects in different groups are dissimilar. The third is classification and estimation: classification predicts a target variable that is categorical in nature, whereas estimation predicts a target variable that is metric (numeric) in nature.
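The difference between classification and estimation can be shown with one small sketch. The k-nearest-neighbours technique used here is chosen only for illustration and is not named in the text above; the function names and the toy data are hypothetical. The same neighbour lookup serves both tasks, and only the final step differs: a majority vote for a categorical target, an average for a numeric one.

```python
from collections import Counter
from statistics import mean

def nearest_targets(train, x, k):
    """Return the targets of the k training points closest to x."""
    by_distance = sorted(train, key=lambda pair: abs(pair[0] - x))
    return [target for _, target in by_distance[:k]]

def classify(train, x, k=3):
    """Classification: the target is categorical, so take a majority vote."""
    return Counter(nearest_targets(train, x, k)).most_common(1)[0][0]

def estimate(train, x, k=3):
    """Estimation: the target is numeric, so average the neighbours' values."""
    return mean(nearest_targets(train, x, k))

# Hypothetical toy data: (income in thousands, target value).
risk_labels = [(20, "high"), (25, "high"), (30, "high"),
               (60, "low"), (70, "low"), (80, "low")]
spending = [(20, 1.0), (25, 1.5), (30, 2.0),
            (60, 5.0), (70, 6.0), (80, 7.0)]

print(classify(risk_labels, 28))  # high
print(estimate(spending, 28))     # 1.5
```

Both calls use the same three neighbours (incomes 30, 25, and 20); the output is a category label in the first case and a number in the second.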