Data mining is the identification of various kinds of information (not known a priori) through targeted extraction from large databases, whether single or multiple (in the latter case, more accurate results are obtained by cross-referencing the data of the individual databases).
The techniques and strategies applied in data mining are largely automated and consist of software and algorithms designed for a specific purpose. Currently, the most widely used are neural networks, decision trees, clustering, and association analysis. Data mining is applied in the most varied fields: economic, scientific, operational, etc.
To fully understand what data mining is, beyond the technical definitions, however accurate, it may be helpful to start from its purposes, providing some examples. Let’s take the following questions:
The answers to these questions, or to some of them, may be contained in the databases. The problem is that they are there in an unintelligible form. No one today could manually process big data, that is, the vast and heterogeneous masses of data contained in data warehouses, in a reasonable time.
This is where data mining comes into play: it finds associations, anomalies, and recurring patterns, and therefore ultimately information, within the data. Above all, thanks to the high parallelism of the computing resources used (alongside highly specialized operators), it does so far more efficiently than a human operator analyzing the data manually.
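To make the idea of finding associations concrete, here is a minimal sketch in Python of a very simple form of association analysis: it counts how often pairs of items occur together and reports their support (the share of transactions containing both items) and confidence (how often the second item appears when the first one does). The basket data, the function name `pair_rules`, and the 0.5 support threshold are all invented for illustration. Real tools use far more scalable algorithms.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates inside one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical shopping baskets, invented for this example.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]

for antecedent, consequent, support, confidence in pair_rules(baskets):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data set, every basket that contains butter also contains bread, so the rule "butter -> bread" has a confidence of 1.0, while the reverse rule is weaker.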
In short, data mining takes “cryptic” information, scattered without apparent order in a database (textual, multimedia, mixed data, etc.), and turns it into knowledge that can be used for various purposes. The whole process is called KDD (an acronym for Knowledge Discovery in Databases), and it does not end with the actual data mining procedure.
The KDD sequence has several steps, the main ones being:
The main tasks for data mining are:
Depending on the goal, the tools used for data mining can change, and the various methods are often combined.
A neural network is a program that mimics, in some respects, the functioning of a biological neural network. It is equipped with instructions and a learning algorithm that allow it to improve with experience, expanding its ability to solve certain types of problems.
A supervised neural network is trained by providing both inputs (problems) and outputs (solutions). By detecting the associations between them, it learns to produce correct results autonomously. An unsupervised neural network, by contrast, is trained only with inputs consisting of selected types of data. By examining them, the network learns to grasp similarities and differences and to group the data accordingly. Thanks to their high parallel computing capacity, both categories of neural networks can process big data efficiently, carrying out classification, association, and clustering.
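As a rough illustration of supervised learning, the sketch below trains a single artificial neuron (a perceptron) on inputs paired with their known outputs. It is far simpler than the networks used in real data mining. The function names, the learning rate, the number of epochs, and the choice of the logical AND as the problem to learn are all assumptions made for this example.

```python
def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Learn weights and a bias from (inputs, target) pairs with the perceptron rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            prediction = 1 if activation > 0 else 0
            error = target - prediction  # 0 when the neuron is already right
            # Nudge each weight in the direction that reduces the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, inputs):
    """Apply the trained neuron to a new input."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


# Inputs (problems) paired with outputs (solutions): the logical AND.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_samples)
for inputs, target in and_samples:
    print(inputs, "expected", target, "predicted", predict(weights, bias, inputs))
```

The neuron is never told the rule itself: it only sees examples and corrects its weights whenever its answer differs from the provided solution.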
A decision tree is a graph in which classification starts from the root (representing the whole training set) and proceeds along a path of successive choices. Each internal node splits the data into subsets, and its branches are the alternatives that lead, eventually, to the leaves (the results or classes). A correctly implemented decision tree must be of adequate size, that is, not excessive: too many variables would make an otherwise fast and efficient algorithm chaotic and slow. In data mining, decision trees are used for segmentation, classification, regression, and time-series analysis.
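The structure just described can be sketched in a few lines of Python. The tree below is written by hand rather than learned from a training set, and its attributes (`income`, `has_guarantor`) and class labels are hypothetical, chosen only to show how a record travels from the root through the branches to a leaf.

```python
# Hypothetical hand-built tree: internal nodes test one attribute,
# leaves hold a class label.
tree = {
    "attribute": "income",
    "branches": {
        "high": {"label": "approve"},
        "low": {
            "attribute": "has_guarantor",
            "branches": {
                True: {"label": "approve"},
                False: {"label": "reject"},
            },
        },
    },
}


def classify(node, record):
    """Follow the branch matching the record at each node until a leaf is reached."""
    while "label" not in node:
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node["label"]


def depth(node):
    """Number of decisions on the longest path from this node to a leaf."""
    if "label" in node:
        return 0
    return 1 + max(depth(child) for child in node["branches"].values())


applicant = {"income": "low", "has_guarantor": False}
print(classify(tree, applicant))
print(depth(tree))
```

A helper such as `depth` gives a rough way to keep an eye on the size of the tree, since each extra level adds another decision to every classification that passes through it.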
The fields of application of data mining are innumerable but can be grouped into some macro-categories. The main ones are:
In the vast field of marketing, the main applications of data mining concern:
In the financial sphere, data mining applies, among other things, to:
In the scientific field, too, data mining is used in a great many sectors and is particularly relevant in:
The downside of data mining is its potential to violate privacy. Take, for example, the careful segmentation of target consumers for marketing purposes. It is one of the achievements of data mining, but the side effect is that profiling exposes an individual’s characteristics without the individual being aware of it and, therefore, without their having given consent.
The two sides of the coin cannot be separated. Put simply, the more you know about an individual, the better you can push them towards a particular purchase. This knowledge-gathering process therefore amounts to 360° observation, ranging from purchasing habits to information on financial assets, from the individual’s psychology to sexual practices, from ethnicity to religious belief, and so on. Everything is useful for marketing purposes.