Data Mining at a Glance
Anand kumar
Ex - Thomson Reuters | Ex - ValueLabs Techno-Functional Leader | Driving Innovation & Digital Transformation | Expert in Bridging Technology & Business for Scalable Solutions | Empowering Teams for Excellence
What is data mining? By definition, data mining is the process of discovering patterns in large data sets using methods at the intersection of machine learning, statistics, and database systems. Put another way, data mining is the process of sorting through large data sets to identify patterns and establish relationships that help solve problems through data analysis.
Data mining is a process used to extract usable information from a larger set of raw data. It involves analyzing patterns in large batches of data using one or more techniques. It has a great impact in many fields, such as healthcare, market and business analysis, education, manufacturing engineering, customer relationship management, fraud detection, intrusion detection, lie detection, financial banking, customer segmentation, research analysis, criminal investigation, and bioinformatics. By applying data mining, a business can learn more about its customers, develop more effective strategies for various business functions, and in turn use its resources in a more optimal and insightful way.
Data mining is also called KDD (Knowledge Discovery in Databases), although strictly speaking data mining is one step within the broader KDD process.
There are a few steps involved in data mining:
- Data cleaning removes noise and inconsistencies from the raw data
- Data integration combines data from various sources
- Data selection identifies the data relevant to the analysis from the available data
- Data transformation converts the data into a form suitable for mining, for example normalizing values or converting categorical data to numerical data (e.g., converting yes/no values to 1/0)
- Data mining: intelligent methods are applied to extract patterns, chosen according to the purpose of the analysis
- Knowledge presentation is the final stage, in which the mined knowledge is presented to users, for example through reports or visualizations
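The steps above can be sketched as a tiny Python pipeline. This is a minimal, hypothetical illustration using only the standard library: the two record sources, their field names, and the "which product does each membership group buy" question are all made up for the example, and real projects would typically use dedicated tools for each stage.

```python
from collections import Counter

# Hypothetical raw data from two sources that must be integrated.
crm_records = [
    {"id": 1, "age": "34", "member": "yes"},
    {"id": 2, "age": None, "member": "no"},    # missing value (noise)
    {"id": 3, "age": "51", "member": "Yes "},  # inconsistent formatting
    {"id": 4, "age": "29", "member": "no"},
]
sales_records = {1: "laptop", 3: "laptop", 4: "phone"}  # customer id -> product


def clean(records):
    """Data cleaning: drop records with missing values and normalize text."""
    return [
        {**r, "member": r["member"].strip().lower()}
        for r in records
        if r["age"] is not None
    ]


def integrate(records, sales):
    """Data integration: join CRM records with sales data on customer id."""
    return [{**r, "bought": sales[r["id"]]} for r in records if r["id"] in sales]


def select(records):
    """Data selection: keep only the attributes relevant to the question."""
    return [{"member": r["member"], "bought": r["bought"]} for r in records]


def transform(records):
    """Data transformation: encode the categorical yes/no field as 1/0."""
    return [{**r, "member": 1 if r["member"] == "yes" else 0} for r in records]


def mine(records):
    """Data mining: count how often each (member, product) pattern occurs."""
    return Counter((r["member"], r["bought"]) for r in records)


def present(patterns):
    """Knowledge presentation: turn the counts into readable statements."""
    return [
        f"member={m} bought {product}: {n} time(s)"
        for (m, product), n in sorted(patterns.items())
    ]


report = present(mine(transform(select(integrate(clean(crm_records), sales_records)))))
for line in report:
    print(line)
```

Running it prints "member=0 bought phone: 1 time(s)" and "member=1 bought laptop: 2 time(s)". Each stage is a separate function so that any one of them can be swapped out without touching the others.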
Data science has three major segments: data architecture, machine learning, and analytics.
Broadly, data mining techniques can be divided into supervised and unsupervised learning.
Supervised Learning: We supervise the model, i.e., we teach it by training it on labeled examples (inputs paired with known outcomes) so that it can predict the outcome for new data
- Classification: predicting the category (class label) of a record, learned from labeled data
- Regression: predicting a numeric value or trend from previously labeled data
Unsupervised Learning: We do not supervise the model; it works on its own to discover information. Machine learning algorithms draw conclusions from unlabeled data. These algorithms are generally harder to apply and evaluate, because there are no labels to check the results against
- Clustering: finding patterns by grouping similar unlabeled records
For simple understanding: supervised learning uses labeled data, and unsupervised learning uses unlabeled data.
A few top data mining techniques/algorithms:
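Decision Trees: A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. In data mining, decision trees are learned from data and used for classification and regression by repeatedly splitting the records on their attribute values.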
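A minimal sketch of how a classification tree can be learned from data, assuming a CART-style greedy approach that picks, at each node, the attribute = value test with the lowest weighted Gini impurity. The weather-style records and the helper names (`gini`, `best_split`, `build_tree`, `predict`) are hypothetical; production libraries add numeric thresholds, pruning, and many other refinements.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature index, value) test with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for v in sorted({r[f] for r in rows}):  # sorted for deterministic tie-breaking
            left = [y for r, y in zip(rows, labels) if r[f] == v]
            right = [y for r, y in zip(rows, labels) if r[f] != v]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, v), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively until a node is pure, no split helps, or max_depth is hit."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, v = split
    match = [(r, y) for r, y in zip(rows, labels) if r[f] == v]
    rest = [(r, y) for r, y in zip(rows, labels) if r[f] != v]
    return (
        f,
        v,
        build_tree([r for r, _ in match], [y for _, y in match], depth + 1, max_depth),
        build_tree([r for r, _ in rest], [y for _, y in rest], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf; internal nodes are (feature, value, yes, no)."""
    while isinstance(tree, tuple):
        f, v, yes_branch, no_branch = tree
        tree = yes_branch if row[f] == v else no_branch
    return tree


# Hypothetical records: (outlook, windy) -> play outside?
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "no"),
        ("rainy", "yes"), ("overcast", "no"), ("overcast", "yes")]
labels = ["yes", "no", "yes", "no", "yes", "yes"]

tree = build_tree(rows, labels)
print(tree)                             # (1, 'no', 'yes', (0, 'overcast', 'yes', 'no'))
print(predict(tree, ("sunny", "yes")))  # no
```

Read the learned tree as: "if it is not windy, play; otherwise play only if the outlook is overcast."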
Random Forest: Random forests, or random decision forests, are an ensemble learning method for classification, regression, and other tasks. They work by constructing a multitude of decision trees at training time and outputting the class that is the mode of the individual trees' classes (classification) or the mean of their predictions (regression). Random decision forests correct for decision trees' habit of overfitting to their training set.
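The idea of "many trees plus a vote" can be sketched in plain Python. This sketch assumes numeric features, Gini-based threshold splits, bootstrap samples of the rows, and a random subset of about sqrt(number of features) features tried at each split (a common default, used here as an assumption). The data points and function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(X, y, rng, n_features, depth=0, max_depth=4):
    """Grow one tree, trying only a random subset of features at each split."""
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # leaf: majority class
    best, best_score = None, gini(y)
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted({x[f] for x in X}):
            left = [i for i, x in enumerate(X) if x[f] <= t]
            right = [i for i, x in enumerate(X) if x[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if score < best_score:
                best, best_score = (f, t, left, right), score
    if best is None:
        return Counter(y).most_common(1)[0][0]
    f, t, left, right = best
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], rng, n_features, depth + 1, max_depth),
            grow_tree([X[i] for i in right], [y[i] for i in right], rng, n_features, depth + 1, max_depth))


def tree_predict(tree, x):
    """Internal nodes are (feature, threshold, left, right); leaves are class labels."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def fit_forest(X, y, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n_features = max(1, int(len(X[0]) ** 0.5))  # assumed default: about sqrt(#features)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng, n_features))
    return forest


def forest_predict(forest, x):
    """Classification output: the mode (majority vote) of the individual trees."""
    return Counter(tree_predict(tree, x) for tree in forest).most_common(1)[0][0]


# Hypothetical 2-D data with two well-separated classes.
X = [(1, 1), (1, 2), (2, 1), (2, 2), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]
forest = fit_forest(X, y, n_trees=25, seed=42)
print(forest_predict(forest, (0, 0)), forest_predict(forest, (7, 7)))  # A B
```

Because every tree sees a slightly different sample and feature subset, individual trees make different mistakes, and the vote smooths them out. That is where the reduction in overfitting comes from.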
Association Rule Mining: Association rule mining is a procedure that aims to find frequently occurring patterns, correlations, or associations in datasets stored in various kinds of databases, such as relational databases, transactional databases, and other forms of repositories.
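Association rules are usually scored by support (how often the items appear together across all transactions) and confidence (how often the right-hand item appears when the left-hand item does). The sketch below computes both for item pairs only. The shopping baskets and the thresholds are hypothetical, and algorithms such as Apriori extend the same idea to larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk", "eggs"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for pair rules passing both thresholds."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

For example, "butter -> bread" gets confidence 1.00 because every basket with butter also contains bread. "butter -> milk" is frequent enough (support 0.40) but is dropped because its confidence (2/3) is below the threshold.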
Linear Regression: Linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables).
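For a single explanatory variable, the ordinary least squares fit has a closed form: slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2) and intercept = mean_y - slope * mean_x. Here is a minimal Python sketch using made-up data points:

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one explanatory variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: explanatory variable xs, response ys.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(f"y = {intercept:.2f} + {slope:.2f} * x")  # y = 1.09 + 1.97 * x
print(round(intercept + slope * 6, 2))           # prediction for x = 6: 12.91
```

With several explanatory variables the same least-squares idea applies, but it is usually solved with matrix methods or a numerical library.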
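K-Means Clustering: k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It partitions the observations into k clusters, with each observation belonging to the cluster with the nearest mean (centroid).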
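The classic way to compute k-means is Lloyd's algorithm. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, until nothing changes. Below is a minimal sketch with hypothetical 2-D points. For simplicity it seeds the centroids with the first k points, whereas practical implementations usually use random restarts or k-means++ initialization.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and update steps until stable."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic init: first k points
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabeled 2-D points forming two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 0.5)], [(8, 8), (9, 8.5), (8.5, 9)]]
print(centroids)
```

Even though both starting centroids lie in the lower-left group, a few iterations are enough for one of them to migrate to the upper-right group.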
Naïve Bayes (for probability-based predictions from data): In machine learning, naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
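Here is a minimal sketch of a categorical naive Bayes classifier. It assumes add-one (Laplace) smoothing so that an attribute value never seen with a class does not zero out that class. It picks the class c that maximizes log P(c) plus the sum of log P(x_i | c), and treating the features as independent given the class is exactly where the "naive" assumption enters. The weather-style records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def predict_naive_bayes(model, row, alpha=1.0):
    """Return the class with the highest smoothed log-posterior score."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        log_p = math.log(n_c / total)  # prior log P(c)
        for i, v in enumerate(row):
            count = feature_counts[(i, c)][v]
            # Laplace-smoothed log P(x_i | c); features assumed independent given c.
            log_p += math.log((count + alpha) / (n_c + alpha * len(values[i])))
        scores[c] = log_p
    return max(scores, key=scores.get)


# Hypothetical records: (outlook, temperature) -> play outside?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool"),
        ("overcast", "hot"), ("overcast", "cool"), ("sunny", "cool"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes", "yes", "yes", "yes", "no"]

model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("sunny", "hot")))      # no
print(predict_naive_bayes(model, ("overcast", "cool")))  # yes
```

Working in log space avoids multiplying many small probabilities together, which can underflow on real data with many features.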
Neural Networks (for clustering and classification): Neural networks are one family of learning algorithms used within machine learning. They consist of different layers of connected neurons that analyze and learn from data. ... Neural networks learn by assigning and adjusting weights on the connections between neurons each time the network processes data.
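The weight-learning idea can be seen in the smallest possible network: a single artificial neuron (a perceptron) with two input connections and a bias. The sketch below uses the classic perceptron learning rule on the logical AND function and is only an illustration. Real neural networks stack many neurons in layers and usually train them with backpropagation and gradient descent.

```python
def step(z):
    """Step activation: the neuron 'fires' (outputs 1) only if z is positive."""
    return 1 if z > 0 else 0


def perceptron_predict(w, b, x):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    return step(w[0] * x[0] + w[1] * x[1] + b)


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: adjust the weights after every misclassified sample."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - perceptron_predict(w, b, x)
            if error:
                # Nudge each connection weight (and the bias) toward the target.
                w[0] += lr * error * x[0]
                w[1] += lr * error * x[1]
                b += lr * error
                mistakes += 1
        if mistakes == 0:  # every sample classified correctly: stop early
            break
    return w, b


# Logical AND: output 1 only when both inputs are 1.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_samples)
print(w, b)                                                   # [2.0, 1.0] -2.0
print([perceptron_predict(w, b, x) for x, _ in and_samples])  # [0, 0, 0, 1]
```

The weights change only when the neuron gets a sample wrong, which mirrors the idea that the network adjusts its connections each time it processes data. A single neuron like this can only learn linearly separable patterns such as AND, which is one reason real networks use multiple layers.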
I hope this helps you understand the buzz around "data mining", "data science", and "machine learning". Though it looks simple, the more I read, the more complex and interesting it became to learn.
Note: You can visit Dataaspirant - Data Science Portal for beginners for further learning.