Data Mining
Mohammad Rafi Aamiri
Cloud Technical Architect | Customer Connect - Rubellite Level | Health Coach
Data mining, also known as knowledge discovery in data (KDD), is most commonly defined as the process of searching large sets of data for patterns and trends and turning those findings into business insights and predictions. Data mining goes beyond the search process: it uses data to evaluate future probabilities and develop actionable analyses.
Phases of Data Mining
Data Mining Process Models
The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a widely used process model that offers a well-organized method for carrying out a data mining project. It divides the work into six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
SEMMA (Sample, Explore, Modify, Model, Assess), developed by SAS Institute, lets users apply visual and exploratory techniques to select and transform the predictive variables and then construct models from those variables.
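The five SEMMA steps can be sketched as a small pipeline. The example below is only an illustration under stated assumptions: the toy data, the function names, the mean imputation in the Modify step, and the least-squares line in the Model step are all hypothetical choices made here, not part of SAS's tooling.

```python
import random
import statistics

# Hypothetical toy data: (x, y) pairs where y = 2x + 1 and some x values are missing.
rows = [(None if i % 7 == 3 else float(i), 2.0 * i + 1.0) for i in range(20)]

def sample(rows, fraction, seed=0):
    """Sample: draw a random subset of the rows."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)

def explore(rows):
    """Explore: summarise the input variable, ignoring missing values."""
    xs = [x for x, _ in rows if x is not None]
    return {
        "n": len(rows),
        "missing_x": len(rows) - len(xs),
        "mean_x": statistics.mean(xs),
    }

def modify(rows, fill_value):
    """Modify: replace missing inputs with a fill value (here, the mean)."""
    return [(fill_value if x is None else x, y) for x, y in rows]

def model(rows):
    """Model: fit y = a*x + b by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    a = sxy / sxx
    return a, my - a * mx

def assess(rows, params):
    """Assess: mean absolute error of the fitted line on the given rows."""
    a, b = params
    return statistics.mean(abs(y - (a * x + b)) for x, y in rows)

if __name__ == "__main__":
    subset = sample(rows, fraction=0.8)
    summary = explore(subset)
    cleaned = modify(subset, summary["mean_x"])
    params = model(cleaned)
    print(summary, params, assess(cleaned, params))
```

In a real project each step would be far richer (visual exploration, variable selection, a held-out set for assessment); the point here is only the order in which the steps hand data to one another.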
Data Mining Challenges
Mining various types of knowledge in databases - Different users have different requirements and are interested in different types of knowledge. As a result, data mining must cover a wide range of knowledge discovery tasks.
Interactive knowledge mining at multiple levels of abstraction - The data mining process must be interactive, because interactivity lets users focus the search for patterns, issuing and refining data mining requests based on the results returned.
Background knowledge - Background knowledge can be used to guide the discovery process and to express discovered patterns not only in concise terms but also at multiple levels of abstraction.
Ad-hoc data mining and data mining query languages - A data mining query language that allows users to describe ad-hoc mining tasks should be integrated with a data warehouse query language and optimized for efficient and flexible data mining.
Data mining results presentation and visualization - Once patterns are identified, they must be expressed in high-level languages and visual representations. Users should be able to easily understand these representations.
Handling noisy or incomplete data - Data cleaning methods that can handle noise and incomplete objects while mining data regularities are required. Without data cleaning methods, the accuracy of discovered patterns will be low.
Pattern evaluation - This refers to how interesting the discovered patterns are. Many discovered patterns are uninteresting because they either represent common knowledge or lack novelty, so the patterns must be evaluated to separate the useful ones from the rest.
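One common way to approach pattern evaluation is to score each pattern with an interestingness measure. The sketch below computes support, confidence, and lift for an association rule. These measures are an assumption added here for illustration (the list above does not name any specific measure), and the transactions and function names are hypothetical.

```python
# Hypothetical market-basket transactions, one set of items per purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence relative to the consequent's baseline support.

    A lift close to 1 means the rule says little beyond what is already
    known from how common the consequent is; values well above or below 1
    indicate a positive or negative association.
    """
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )

if __name__ == "__main__":
    rule = ({"bread"}, {"milk"})
    print("support:", support(transactions, rule[0] | rule[1]))
    print("confidence:", confidence(transactions, *rule))
    print("lift:", lift(transactions, *rule))
```

In this toy data the rule "bread implies milk" has fairly high confidence, yet its lift is below 1 because milk is already common on its own. This is the kind of pattern an evaluation step would flag as uninteresting.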
Data Mining Tools