Data Mining Process

Processes serve the purpose of providing people with a clear step-by-step procedures to accomplish a task. In many ways a process serves as a shortcut in solving a problem. As data mining is a complex situation with an endless number of problems there have been developed several processes for completing a data mining project. In this post we will look at the Cross-Industry Standard Process for Data Mining (CRISP-DM).

CRISP-DM

The CRISP-DM is an iterative process that has the following steps…

  1. Organizational understanding
  2. Data understanding
  3. Data preparation
  4. Modeling
  5. Evaluation
  6. Deployment

We will look at each step briefly

  1. Organizational Understanding

Step 1 involves assessing the current goals of the organization and the current context. This information is then used to in deciding goals or research questions for data mining. Data mining needs to be done with a sense of purpose and not just to see what’s out there. Organizational understanding is similar to the introduction section of a research paper in which you often include the problem, questions, and even the intended audience of the research

2. Data Understanding

Once a purpose and questions have been developed for data mining, it is necessary to determine what it will take to answer the questions. Specifically, the data scientist assess the data requirements, description, collection, and assesses data quality. In many ways, data understanding is similar to methodology of a standard research paper in which you assess how you will answer the research questions.

It is particularly common to go back and forth between steps one and two. Organizational understanding influences data understanding which influences data under standing.

3. Data Preparation

Data preparation involves cleaning the data. Another term for this is data mugging. This is the main part of an analysis in data mining. Often the data comes in a very messy way with information spread all over the place and incoherently. This requires the researcher to carefully deal with this problem.

4. Modeling

A model provides a numerical explanation of something in the data. How this is done depends on the type of analysis that is used. As you develop various models you are arriving at various answers to your research questions. It is common to move back and forth between step 3 and 4 as the preparation affects the modeling and the type of modeling you may want to develop may influence data preparation. The results of this step can also be seen as being similar to the results section of an empirical paper.

5. Evaluation

Evaluation is about comparing the results of the study with the original questions. In many ways it is about determining the appropriateness of the answers to the research questions. This experience leads to ideas for additional research. As such, this step is similar to the discussion section of a research paper.

6. Deployment

The last step is when the results are actually used for decision-making or action. If the results indicate that a company should target people under 25 then this is what they do as an example.

Conclusion

The CRISP-DM process is a useful way to begin the data mining experience. The primary goal of data mining is providing evidence for making decisions and or taking action. This end goal has shaped the development of a clear process for taking action.

Advertisements

One thought on “Data Mining Process

  1. Pingback: Data Mining Process | Education and Research | ...

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s