Definition and fundamentals of data mining. What is it & how does it work?

Understanding and applying data mining:
how it works and the benefits and challenges

Written by Thomas
October 16, 2020 - 10 min reading time
Table of contents
  1. Data Mining Definition and Application Examples
  2. How does Data Mining work? Methods
  3. The Data Mining Process
  4. Advantages and problems of data mining
  5. Conclusion

Big data, business intelligence, data mining and many other similar terms have been the talk of the town for some time. But what exactly does the term data mining actually mean? In this article, you will learn about the benefits and challenges of data mining, what it can achieve and how it is typically used in projects.

What is data mining? (Definition)

Data mining refers to the systematic, computer-aided and highly automated application of statistical algorithms to very large data sets (big data) in order to recognise correlations, patterns, trends and connections. The results are then transferred into usable data structures and made available for further processing.

In a narrower sense, data mining is the analysis step of the "knowledge discovery in databases" (KDD) process, which is aimed at identifying new relationships in existing data sets. In practice, however, the two terms are often used interchangeably to describe not only the actual analysis but also the preparation of the data (e.g. in data warehouses), as well as the evaluation and interpretation of the results.

Data mining is a branch of business intelligence (BI) and is also closely linked to predictive analytics, i.e. the prediction of future situations based on existing data.

Application examples

Data mining is mainly used to analyse existing data sets, to recognise patterns and to make decisions based on the results.

The aim is to make practical predictions about the future, to recognise emerging trends early, to confirm or disprove assumptions about correlations and to improve business processes.

Specific use cases include determining the creditworthiness of customers, calculating available credit limits, discovering purchase patterns and trends (shopping basket analysis such as "product Y is often bought together with product X"), evaluating the connection between diseases and the effectiveness of treatments in drug development, or detecting fraud, for example based on the patterns of credit card transactions.

How does data mining work?

Depending on the application and the task at hand, data mining software tools employ different algorithms, machine learning and AI to extract information from the data. In practice, a distinction is made between the following mining methods, each of which pursues a specific goal:

Data mining methods

Detection of outliers/anomalies:

This method is aimed at detecting unusual data records, such as outliers or data errors, that require further investigation. Where possible, data errors or unusable anomalies that would impair the results are then excluded from further analysis. In some cases, however, it is precisely these outliers that need to be identified (e.g. when detecting fraud).
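As a rough illustration of outlier detection, the following sketch flags values that lie far outside the interquartile range (IQR). The IQR rule is only one of many possible techniques and is chosen here as an assumption; the function name `iqr_outliers`, the factor `k=1.5` and the transaction amounts are all hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical card transaction amounts; 950.0 is the planted anomaly.
amounts = [12.5, 9.9, 14.2, 11.0, 13.7, 10.4, 950.0, 12.1]
print(iqr_outliers(amounts))  # [950.0]
```

Whether a flagged value is then excluded as a data error or investigated as possible fraud depends on the goal of the analysis.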

Cluster analysis/clustering:

In cluster analysis, the aim is to group data records on the basis of their similarities, without relying on any previously known structure in the data.
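To make the idea of grouping by similarity more concrete, here is a minimal sketch of k-means, one common clustering algorithm (the article does not prescribe a specific one). The starting centroids are simply the first `k` points, which is a simplification made for reproducibility; the customer figures are invented.

```python
import math

def k_means(points, k, iterations=10):
    """Very small k-means: returns one cluster index per point."""
    centroids = list(points[:k])  # naive, deterministic start (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

# Hypothetical customers: (orders per year, average basket value)
customers = [(2, 20.0), (3, 25.0), (40, 210.0), (2, 22.0), (38, 190.0), (42, 205.0)]
print(k_means(customers, k=2))  # [0, 0, 1, 0, 1, 1]
```

No labels are supplied in advance: the two groups (occasional and frequent buyers) emerge from the distances alone. In practice the features would usually be scaled first so that no single column dominates the distance.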

Classification:

Classification means the allocation of data to certain higher-level classes, e.g. the classification of emails as spam or the division of customers into risk groups according to their creditworthiness.
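The credit-risk example could be sketched with a k-nearest-neighbours classifier, which assigns a new record the majority class of the most similar known records. This is just one of many classification methods and is an assumption here; the function `knn_classify`, the features and the labelled customers are hypothetical.

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """Predict the majority label among the k nearest training records."""
    nearest = sorted(training, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled customers: (income, open debts), both in thousand EUR
training = [
    ((65, 5), "low risk"), ((72, 8), "low risk"), ((58, 3), "low risk"),
    ((30, 25), "high risk"), ((28, 30), "high risk"), ((35, 22), "high risk"),
]
print(knn_classify(training, (60, 6)))   # low risk
print(knn_classify(training, (31, 24)))  # high risk
```

Unlike clustering, classification needs records whose class is already known in order to assign new records to those classes.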

Association rule learning:

Association rule learning is used to identify connections and dependencies in the data. One such example is the classic shopping basket analysis, i.e. identifying which products are often purchased in combination with one another.
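The strength of a rule such as "customers who buy bread also buy butter" is commonly expressed by two measures, support and confidence. The sketch below computes both for a handful of invented shopping baskets; the helper names are hypothetical, and a real tool would additionally search for candidate rules automatically (for example with the Apriori algorithm).

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions)

def support(transactions, items):
    """Share of all transactions that contain `items`."""
    return count(transactions, items) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Share of transactions with `antecedent` that also contain `consequent`.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)

# Hypothetical shopping baskets
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread"},
    {"milk", "butter"},
    {"bread", "butter", "milk"},
]
print(support(baskets, {"bread", "butter"}))       # 0.6
print(confidence(baskets, {"bread"}, {"butter"}))  # 0.75
```

Here bread and butter appear together in 60 % of all baskets, and 75 % of the baskets containing bread also contain butter.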

Regression analysis:

The purpose of regression analysis is to identify relationships between variables, such as the influence of price and customer purchasing power on sales volume.
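A minimal sketch of the simplest case, a straight line fitted by ordinary least squares, is shown below. For brevity it uses price as the only influencing factor, which is a simplification of the example above; the function `fit_line` and the figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: unit price in EUR and units sold per week
prices = [10.0, 12.0, 14.0, 16.0, 18.0]
units_sold = [200, 180, 165, 140, 120]

slope, intercept = fit_line(prices, units_sold)
print(slope, intercept)        # -10.0 301.0
print(slope * 15 + intercept)  # estimated units sold at a price of 15 EUR: 151.0
```

In this invented data set, each additional euro in price is associated with roughly ten fewer units sold. Adding purchasing power as a second factor would require multiple regression.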

The data mining process (explanation of the process)

As a rule, data mining projects follow the Cross-Industry Standard Process for Data Mining (CRISP-DM), which a number of well-known industrial companies developed as part of an EU-funded project. The aim was to create a standardised process model for data mining that could be applied to any data set.

The process model is based on six phases, some of which have to be run several times:

Fig. 1: Data mining definition: the process diagram shows the relationships between the different phases of CRISP-DM. Illustration by Kenneth Jensen, based on the IBM SPSS Modeler CRISP-DM Guide [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

Phase 1: business understanding

Phase 1 involves the definition of the objectives and business requirements, in order to determine the specific goals and how they are to be achieved.

Phase 2: data understanding (selection of relevant data)

Once the objectives and the procedure have been determined, the existing data can be analysed. In addition, this phase includes an examination of the data quality and an assessment of whether the quality is sufficient for the stated objectives. Should this not be the case, the objectives and requirements may need to be revised.

Phase 3: data preparation

As soon as the objectives and the data are available, the data can be prepared for evaluation. Data preparation is usually the phase that takes the most time.

Phase 4: modelling (selection and application of methods)

Based on the prepared data, one or more data models can be created by selecting and applying one or more data mining methods. During the modelling phase it often becomes apparent that the preparation of the data needs to be adapted in order to apply the selected methods.

Phase 5: evaluation (assessment and interpretation of results)

After the data models have been created, they are evaluated to determine whether the stated objectives have been achieved. Either the most suitable model is selected or – if the results prove unsatisfactory – phase 1 is repeated to revise the objectives and requirements.

Phase 6: deployment (making the results available)

At the end of the process, the findings are processed and made available in a suitable format.
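The iterative character of the six phases can be sketched as a small transition table. The step-back rules below only encode the feedback loops mentioned in the phase descriptions above; the names `Phase`, `STEP_BACK` and `next_phase` are hypothetical, and CRISP-DM itself does not define any code.

```python
from enum import Enum

class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELLING = 4
    EVALUATION = 5
    DEPLOYMENT = 6

# Feedback loops taken from the phase descriptions above.
STEP_BACK = {
    Phase.DATA_UNDERSTANDING: Phase.BUSINESS_UNDERSTANDING,  # data quality insufficient
    Phase.MODELLING: Phase.DATA_PREPARATION,                 # preparation must be adapted
    Phase.EVALUATION: Phase.BUSINESS_UNDERSTANDING,          # results unsatisfactory
}

def next_phase(current, satisfied):
    """Return the phase to run next, or None once deployment is complete."""
    if not satisfied:
        # Step back if a loop is defined, otherwise repeat the current phase.
        return STEP_BACK.get(current, current)
    if current is Phase.DEPLOYMENT:
        return None
    return Phase(current.value + 1)

print(next_phase(Phase.MODELLING, satisfied=True))    # Phase.EVALUATION
print(next_phase(Phase.EVALUATION, satisfied=False))  # Phase.BUSINESS_UNDERSTANDING
```

A project therefore rarely passes through the phases exactly once; the number of loops depends on the data and the objectives.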

The benefits and drawbacks of using data mining

Benefits

Decision-making:

The evaluation of the data as well as the correlations and insights that have been obtained can be used to discover trends, predict future developments and thus support the management in making decisions.

Efficiency gains:

Efficient analysis of large amounts of data and the information extracted from them can be used to gain a competitive advantage, while the detection of process errors and issues leads to cost reductions.

Business process improvements:

Data mining can be used to confirm or disprove assumptions about problems in business processes and to uncover process weaknesses. Over the years, this has given rise to the special field of process mining, which focuses specifically on the analysis and optimisation of business processes.

Problems and challenges

Highly qualified data mining experts are required:

Having powerful tools is one thing – using them properly is another. In order to obtain valuable and accurate results with data mining, it is essential that the relevant software is operated by specialists who need to understand the source data to be able to prepare them correctly. Similarly, they also need to be able to assess whether the patterns, connections, interrelationships and results provided by the software are generally accurate and relevant.

Poor data quality:

As with all evaluation methods, the quality of the data is a decisive prerequisite for obtaining valid results. Errors and incomplete records inevitably degrade the quality of the results, which at worst may prove entirely false. Relying on such results may in turn lead to wrong decisions.
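A very simple completeness check, run before any analysis, can already reveal part of this problem. The sketch below reports the share of missing values per field; the function `missing_share`, the field names and the records are hypothetical, and real data quality checks would also cover duplicates, value ranges and consistency.

```python
def missing_share(records, fields):
    """Share of records in which each field is absent, None or an empty string."""
    total = len(records)
    return {
        field: sum(rec.get(field) in (None, "") for rec in records) / total
        for field in fields
    }

# Hypothetical customer records with gaps
customers = [
    {"id": 1, "income": 52000, "postcode": "10115"},
    {"id": 2, "income": None, "postcode": "80331"},
    {"id": 3, "income": 47000, "postcode": ""},
    {"id": 4, "postcode": "20095"},
]
print(missing_share(customers, ["income", "postcode"]))
# {'income': 0.5, 'postcode': 0.25}
```

If half of the income values are missing, as in this invented example, any model that relies on income should be treated with caution.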

Privacy & security:

The collection of large amounts of data inevitably comes with privacy and security risks. The data sets may contain a lot of user-related data that should not be used or linked to one another. On the other hand, the process also creates opportunities for identifying security risks and breaches and subsequently remedying them.

Conclusion

For companies, data mining can bring about a significant improvement in their operations. They can gain new insights by evaluating the growing volumes of data they collect. Many software products already incorporate BI and thus also data mining, and yet companies worldwide often use them without harnessing the full potential for improvement that they offer. The data mining trend is set to continue, especially in connection with process mining, as it gives companies the opportunity to dramatically optimise their business processes and achieve enormous cost savings.
