Data Mining

Data mining, also known as knowledge discovery in data (KDD), is most commonly defined as the process of searching large data sets for patterns and trends and turning those findings into business insights and predictions. Data mining goes beyond search: it uses data to estimate future probabilities and produce actionable analyses.

Phases of Data Mining

  • Business Understanding
  • Data Analysis
  • Data Acquisition
  • Data Cleansing
  • Data Preparation
  • Data Modelling
  • Data Transformation
  • Data Classification
  • Data Forecasting
  • Data Reporting


Data Mining Process Models

CRISP-DM (Cross-Industry Standard Process for Data Mining) is a widely used data mining process model that offers a well-organized method for structuring a data mining project.

SEMMA (Sample, Explore, Modify, Model, Assess), developed by SAS Institute, lets users apply visual and exploratory techniques to select and transform the predictive variables and then construct models using those variables.
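To make the five SEMMA steps concrete, here is a minimal sketch in plain Python. It is not SAS's implementation; the function name `semma`, the (x, y) row layout, and the choice of a least-squares line as the model are all assumptions made for illustration.

```python
import random
from statistics import mean

def semma(rows, sample_size, seed=0):
    """Run a toy Sample/Explore/Modify/Model/Assess pass over (x, y) rows."""
    # Sample: draw a reproducible random subset of the data.
    sample = random.Random(seed).sample(rows, sample_size)

    # Explore: compute simple summary statistics of the sample.
    mean_x = mean(x for x, _ in sample)
    mean_y = mean(y for _, y in sample)

    # Modify: center x so the intercept of the fitted line equals mean_y.
    centered = [(x - mean_x, y) for x, y in sample]

    # Model: fit y = slope * (x - mean_x) + mean_y by least squares.
    denom = sum(xc * xc for xc, _ in centered)
    if denom == 0:
        raise ValueError("all sampled x values are identical")
    slope = sum(xc * (y - mean_y) for xc, y in centered) / denom

    # Assess: mean absolute error of the fitted line on the full data set.
    mae = mean(abs(y - (slope * (x - mean_x) + mean_y)) for x, y in rows)
    return slope, mae

# Hypothetical data that follows y = 2x + 1 exactly.
rows = [(x, 2 * x + 1) for x in range(20)]
slope, mae = semma(rows, sample_size=10)
print(slope, mae)
```

In a real project each step would be far richer (stratified sampling, visual exploration, feature engineering, model comparison), but the order of the steps stays the same.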

Data Mining Challenges

Mining various types of knowledge in databases - Different users have different requirements and may be interested in different types of knowledge. As a result, data mining must cover a wide range of knowledge discovery tasks.

Interactive knowledge mining at multiple levels of abstraction - The data mining process must be interactive, because interactivity lets users focus the search for patterns by providing and refining data mining requests based on the returned results.

Background knowledge - This can be used to guide the discovery process and to express discovered patterns not only in concise terms but also at multiple levels of abstraction.

Ad-hoc data mining and data mining query languages - A data mining query language that allows users to describe ad-hoc mining tasks should be integrated with a data warehouse query language and optimized for efficient and flexible data mining.

Data mining results presentation and visualization - Once patterns are identified, they must be expressed in high-level languages and visual representations. Users should be able to easily understand these representations.

Handling noisy or incomplete data - Data cleaning methods that can handle noise and incomplete objects while mining data regularities are required. Without data cleaning methods, the accuracy of discovered patterns will be low.
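As an illustration of what such a cleaning step might look like, the sketch below fills missing values with the median and clips outliers to the usual 1.5 × IQR fences. The function name `clean_column`, the use of `None` to mark missing entries, and the sample data are assumptions for this example; real pipelines choose imputation and outlier rules per column.

```python
from statistics import median, quantiles

def clean_column(values):
    """Fill missing entries (None) with the median and clip outliers to the IQR fences."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        # Too few observations to estimate quartiles; only fill the gaps.
        fill = present[0] if present else None
        return [fill if v is None else v for v in values]

    mid = median(present)
    q1, _, q3 = quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread

    cleaned = []
    for v in values:
        if v is None:
            v = mid                                # impute missing value
        cleaned.append(min(max(v, low), high))     # clip noisy extremes
    return cleaned

# Hypothetical age column with two missing values and one implausible entry.
ages = [23, 25, None, 24, 26, 250, 22, None, 27]
print(clean_column(ages))
```

Clipping keeps the row in the data set while limiting its influence; dropping the row or flagging it for review are equally reasonable choices depending on the task.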

Pattern evaluation - This refers to the problem of interestingness. Discovered patterns may be uninteresting because they either represent common knowledge or lack novelty, so each pattern needs to be evaluated for how interesting it actually is.
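One common way to score how interesting a pattern is, at least for association rules, is to compute its support, confidence, and lift. The sketch below is a minimal example under that assumption; the function name `rule_metrics` and the basket data are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    if not transactions:
        raise ValueError("transactions must not be empty")
    a, c = set(antecedent), set(consequent)
    n = len(transactions)

    count_a = sum(1 for t in transactions if a <= set(t))
    count_c = sum(1 for t in transactions if c <= set(t))
    count_both = sum(1 for t in transactions if (a | c) <= set(t))

    support = count_both / n                                 # how often the rule applies
    confidence = count_both / count_a if count_a else 0.0    # P(consequent | antecedent)
    lift = confidence / (count_c / n) if count_c else 0.0    # relative to chance
    return support, confidence, lift

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
print(support, confidence, lift)
```

A lift close to 1 suggests the two item sets co-occur about as often as chance would predict, so the rule adds little new knowledge even when its confidence looks high.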

Data Mining Tools

  • IBM SPSS
  • Amazon EMR
  • SAS
  • Oracle Data Mining
  • KNIME
  • RapidMiner
  • Orange
  • QlikView
  • SSDT
