The fourth industrial revolution (Industry 4.0) has driven rapid development in data collection and storage technology across many fields, and as a result organizations now accumulate very large databases. Much of the collected data, however, is rarely examined, because it is too voluminous and tedious to review. Decisions that should be based on data are therefore often made on the intuition of decision makers instead. This gap is what gave rise to data mining techniques.
Related terms that are often used for data mining include knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archaeology, data dredging, information harvesting, and business intelligence.
Data mining is the process of extracting patterns from data using techniques from statistics, mathematics, artificial intelligence, and machine learning. The result of this process is valuable information. Data are values, variables, or statements derived from facts that have not yet been processed. Whenever we collect data, it is stored, and the patterns in the stored data can be examined to anticipate what is likely to happen. Data mining is therefore used for data analysis, because it is no longer feasible to analyze data without automating its extraction: there is too much data, its dimensionality is too high, and it is too complex to analyze manually.
The results of data mining are often integrated with decision support systems (DSS). For example, in business applications the information generated by data mining can be integrated with campaign management tools so that effective marketing promotions can be implemented and tested. Such integration requires a post-processing step that ensures only valid and useful results are passed to the DSS. One post-processing task is visualization, which enables analysts to explore the data and the mining results from various perspectives. Statistical measures and hypothesis-testing methods can also be used during post-processing to eliminate spurious data mining results.
Because it can process large databases, data mining is useful for automating the discovery of predictive information in them, identifying the patterns in a database in as much detail as possible, and uncovering important hidden characteristics. These capabilities make data mining valuable for critical strategic decisions.
Data mining serves several useful functions for processing databases: association, classification, clustering, forecasting, and sequencing. Association identifies relationships between events that occur at the same time, such as the items in a shopping basket. Classification derives definitions of the characteristics of a group, for example customers who have switched to a competitor. Clustering identifies groups of goods or products that share particular characteristics. Forecasting estimates future values based on patterns in large data sets, such as forecasting market demand. Finally, sequencing identifies relationships that unfold over a period of time, such as customers who visit a supermarket repeatedly.
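The association function is usually quantified with two measures for a rule of the form "customers who buy A also buy B": support (how often A and B appear together) and confidence (how often B appears given A). The plain-Python sketch below computes both; the function names and the basket data are invented for illustration and do not come from any particular library.

```python
# Minimal association sketch: support and confidence over shopping baskets.
# The baskets below are invented illustration data.

def support(baskets, itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= set(basket))
    return hits / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    both = support(baskets, set(antecedent) | set(consequent))
    base = support(baskets, antecedent)
    return both / base if base else 0.0


baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]

print(support(baskets, {"bread", "butter"}))       # 0.5
print(confidence(baskets, {"bread"}, {"butter"}))  # about 0.667
```

Here two of the four baskets contain both bread and butter (support 0.5), and two of the three bread baskets also contain butter (confidence 2/3).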
Data mining proceeds in seven sequential stages, as follows:
1. Data cleaning
Data cleaning is the process of removing noise and inconsistent or irrelevant data. In general, the data obtained, whether from a company's database or from experimental results, contains imperfect entries such as missing values, invalid values, or simple typos. In addition, some attributes are not relevant to the hypothesis being investigated, and such irrelevant data is also better removed. Data cleaning also affects the performance of data mining techniques, because it reduces the amount and complexity of the data to be processed.
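A minimal sketch of these cleaning steps, assuming records arrive as Python dictionaries and that only the hypothetical attributes `customer_id`, `age`, and `city` are relevant to the analysis. It drops records with missing or implausible values, removes irrelevant attributes, normalizes whitespace and capitalization, and removes duplicates. Genuine spelling typos (e.g. "Jakrta") are not corrected here; that would need a lookup table or fuzzy matching.

```python
# Hypothetical attributes assumed relevant to the analysis; everything else is dropped.
RELEVANT = ("customer_id", "age", "city")


def clean(records):
    cleaned, seen = [], set()
    for rec in records:
        # Drop records with a missing value in any relevant attribute.
        if any(rec.get(k) in (None, "") for k in RELEVANT):
            continue
        # Drop invalid entries: an age outside an assumed plausible range.
        if not 0 < rec["age"] < 120:
            continue
        # Keep only the relevant attributes (removes e.g. "fax").
        row = {k: rec[k] for k in RELEVANT}
        # Normalize whitespace and capitalization, e.g. "jakarta " -> "Jakarta".
        row["city"] = row["city"].strip().title()
        # Remove duplicates that only become visible after normalization.
        key = tuple(row[k] for k in RELEVANT)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


records = [
    {"customer_id": 1, "age": 34, "city": "jakarta ", "fax": "-"},
    {"customer_id": 1, "age": 34, "city": "Jakarta"},   # duplicate after cleaning
    {"customer_id": 2, "age": None, "city": "Bandung"},  # missing value
    {"customer_id": 3, "age": -5, "city": "Surabaya"},   # invalid value
    {"customer_id": 4, "age": 51, "city": "bandung"},
]

print(clean(records))
```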
2. Data integration
Data integration is the merging of data from multiple databases into one new database. The data needed for data mining frequently comes not from a single database but from several databases or text files. Integration is carried out on attributes that identify unique entities, such as name, product type, and customer number. It must be done carefully, because errors in data integration can produce distorted results and even lead to misleading actions later. For example, if integration based on product type mistakenly combines products from different categories, the analysis will find correlations between products that do not actually exist.
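The sketch below joins two hypothetical sources, a customer table and an order list, on an assumed shared key `customer_no`. To reflect the caution above, it normalizes the key before matching, refuses to continue if the key is not unique, and sets aside orders with no matching customer instead of guessing a match. These are design choices for illustration, not a standard procedure.

```python
def integrate(customers, orders):
    """Join order rows to customer rows on the shared key `customer_no`."""
    by_key = {}
    for c in customers:
        key = c["customer_no"].strip().upper()  # normalize the key before matching
        if key in by_key:
            # A non-unique key would silently merge different entities.
            raise ValueError(f"duplicate customer_no {key!r}: key is not unique")
        by_key[key] = c

    merged, unmatched = [], []
    for o in orders:
        key = o["customer_no"].strip().upper()
        c = by_key.get(key)
        if c is None:
            unmatched.append(o)  # keep for manual inspection
        else:
            merged.append({**c, **o, "customer_no": key})
    return merged, unmatched


customers = [
    {"customer_no": "C001", "name": "Ani"},
    {"customer_no": "C002", "name": "Budi"},
]
orders = [
    {"customer_no": "c001 ", "product": "laptop"},
    {"customer_no": "C003", "product": "mouse"},
]

merged, unmatched = integrate(customers, orders)
print(merged)
print(unmatched)
```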
3. Data selection
Not all of the data in a database is usually needed, so only the data suitable for the analysis is retrieved. For example, a market basket analysis that examines customers' purchasing tendencies does not need the customer's name; the customer ID is enough.
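Selection amounts to keeping only the needed columns and, optionally, only the relevant rows. A minimal sketch, assuming transactions stored as dictionaries; the field names and the extra row filter on a hypothetical `store` field are illustrative assumptions.

```python
def select(records, columns, keep=lambda record: True):
    """Keep only the rows accepted by `keep` and only the listed columns."""
    return [{col: rec[col] for col in columns} for rec in records if keep(rec)]


transactions = [
    {"customer_id": 101, "customer_name": "Ani", "store": "A", "items": ["bread", "milk"]},
    {"customer_id": 102, "customer_name": "Budi", "store": "B", "items": ["eggs"]},
    {"customer_id": 103, "customer_name": "Citra", "store": "A", "items": ["bread", "butter"]},
]

# Market basket analysis needs the customer ID and the items, not the name.
# The store filter is a hypothetical example of selecting rows as well.
basket_data = select(transactions, ["customer_id", "items"],
                     keep=lambda rec: rec["store"] == "A")
print(basket_data)
```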
4. Data transformation
In this stage, data is converted or combined into a format suitable for data mining. Some data mining methods require a specific data format before they can be applied. For example, some standard methods of association analysis and clustering accept only categorical input, so continuous numerical data must first be divided into several intervals. This particular kind of data transformation is often called discretization.
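Dividing a numerical attribute into intervals can be sketched with Python's standard `bisect` module. The age bands below are hypothetical cut points chosen for illustration, not a standard; each interval includes its lower bound.

```python
import bisect


def discretize(value, cut_points, labels):
    """Return the label of the interval containing `value`.

    `cut_points` must be sorted ascending; `labels` needs len(cut_points) + 1
    entries. A value equal to a cut point falls into the higher interval.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect.bisect_right(cut_points, value)]


# Hypothetical age bands: <25, 25-44, 45-64, 65+.
AGE_CUTS = [25, 45, 65]
AGE_LABELS = ["young", "adult", "middle_aged", "senior"]

ages = [18, 25, 44, 70]
print([discretize(a, AGE_CUTS, AGE_LABELS) for a in ages])
# ['young', 'adult', 'adult', 'senior']
```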
5. Mining process
This is the most important stage: data mining techniques are applied to extract potentially useful patterns that reveal valuable, hidden knowledge in the data.
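As one concrete instance of a mining technique, the sketch below finds item pairs that appear together in at least a minimum fraction of shopping baskets. It is a brute-force pair count, not the full Apriori algorithm (no candidate pruning, no itemsets larger than two), and the baskets and threshold are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {(item_a, item_b): support} for pairs meeting `min_support`."""
    counts = Counter()
    for basket in baskets:
        # Sort so that ("bread", "butter") and ("butter", "bread") count as one pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "eggs"],
]

print(frequent_pairs(baskets, min_support=0.5))
# {('bread', 'butter'): 0.5}
```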
6. Pattern evaluation
This stage identifies which of the discovered patterns are interesting enough to become knowledge. The results of the data mining techniques, in the form of distinctive patterns and prediction models, are evaluated to assess whether the hypotheses have actually been met. If the results do not fit the hypothesis, several alternatives are available: feed the results back to improve the data mining process, try another, more appropriate data mining method, or accept the results as an unexpected finding that might still be useful.
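For association rules, one common way to evaluate whether a pattern is interesting is to check its confidence and its lift, where lift above 1 means the items occur together more often than if they were independent. A minimal sketch under that assumption; the thresholds `min_conf=0.6` and `min_lift=1.0` and the basket data are illustrative choices, not fixed rules.

```python
def evaluate_rule(baskets, antecedent, consequent, min_conf=0.6, min_lift=1.0):
    """Return (confidence, lift, keep) for the rule antecedent -> consequent."""
    n = len(baskets)
    a, c = set(antecedent), set(consequent)
    sup_a = sum(a <= set(b) for b in baskets) / n
    sup_c = sum(c <= set(b) for b in baskets) / n
    sup_ac = sum((a | c) <= set(b) for b in baskets) / n
    conf = sup_ac / sup_a if sup_a else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = conf / sup_c if sup_c else 0.0
    keep = conf >= min_conf and lift > min_lift
    return conf, lift, keep


baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "eggs"],
]

print(evaluate_rule(baskets, {"bread"}, {"butter"}))  # keep is True
print(evaluate_rule(baskets, {"milk"}, {"bread"}))    # keep is False
```

A rule that fails the check is not necessarily wrong; as the text notes, it may prompt a change of method or be kept as an unexpected result.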
7. Knowledge presentation
This stage uses visualization and presentation techniques to show users the knowledge obtained and the methods used to obtain it. The final step of the data mining process is to formulate a decision or action from the analysis results. This sometimes involves people who do not understand data mining, so presenting the results as knowledge that everyone can understand is a necessary stage of the process. Visualization can also help communicate the results of data mining in this presentation.
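Presenting results to non-specialists can be as simple as turning evaluated rules into plain sentences, ranked by strength, with a crude text bar for confidence. The sketch below assumes rules are given as hypothetical `(antecedent, consequent, confidence, lift)` tuples; a real report would more likely use a charting tool.

```python
def report(rules):
    """Render (antecedent, consequent, confidence, lift) rules as plain sentences."""
    lines = []
    # Strongest associations (highest lift) first.
    for ante, cons, conf, lift in sorted(rules, key=lambda r: r[3], reverse=True):
        bar = "#" * round(conf * 20)  # 20 characters = 100% confidence
        lines.append(
            f"Customers who buy {', '.join(sorted(ante))} also buy "
            f"{', '.join(sorted(cons))}: {conf:.0%} of the time (lift {lift:.2f}) {bar}"
        )
    return "\n".join(lines)


rules = [
    ({"bread"}, {"butter"}, 2 / 3, 4 / 3),
    ({"milk"}, {"eggs"}, 0.5, 2.0),
]

print(report(rules))
```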