
Data Mining at a Glance

Anand kumar

Ex - Thomson Reuters | Ex - ValueLabs Techno-Functional Leader | Driving Innovation & Digital Transformation | Expert in Bridging Technology & Business for Scalable Solutions | Empowering Teams for Excellence

Published: September 30, 2018

What is data mining? By definition, data mining is the process of discovering patterns in large data sets using methods at the intersection of machine learning, statistics, and database systems. Put another way, it is the process of sorting through large data sets to identify patterns and establish relationships in order to solve problems through data analysis.

Data mining is also defined as a process used to extract usable data from a larger set of raw data. It involves analyzing data patterns in large batches of data using one or more techniques. It has a great impact in many fields, including healthcare, market analysis, education, manufacturing engineering, customer relationship management, fraud detection, intrusion detection, lie detection, banking and finance, customer segmentation, research analysis, criminal investigation, and bioinformatics. By applying this approach, a business can learn more about its customers, develop more effective strategies for its various business functions, and in turn use its resources more efficiently and with better insight.

Data mining is also called KDD (Knowledge Discovery in Databases), although strictly speaking it is one step within the broader KDD process.

There are a few steps involved in data mining:

Data cleaning removes noise and inconsistencies from the raw data.
Data integration combines data from various sources.
Data selection identifies the data relevant to the analysis from the available data.
Data transformation converts the data into a form suitable for mining, for example by normalizing values or encoding categorical data as numbers (e.g., converting yes/no to 1/0).
Data mining applies pattern-discovery methods chosen according to the purpose of the analysis.
Knowledge presentation is the final stage, in which the discovered knowledge is presented to the user.
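To make the cleaning and transformation steps a little more concrete, here is a minimal sketch in plain Python. The records, the field names (`age`, `subscribed`), and the helper functions `clean` and `transform` are all made up for illustration; a real project would more likely use a data library and far richer rules.

```python
# Hypothetical survey records; None marks a missing value.
raw_records = [
    {"age": 34, "subscribed": "yes"},
    {"age": None, "subscribed": "no"},   # incomplete, removed during cleaning
    {"age": 51, "subscribed": "No"},     # inconsistent casing
    {"age": 27, "subscribed": " YES "},  # inconsistent casing and spacing
]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Data transformation: encode the yes/no field as 1/0."""
    encoding = {"yes": 1, "no": 0}
    return [
        {**r, "subscribed": encoding[r["subscribed"].strip().lower()]}
        for r in records
    ]

prepared = transform(clean(raw_records))
print(prepared)
```

Dropping incomplete records is only one possible cleaning policy; filling in missing values is another common choice.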

Data science has three major segments: data architecture, machine learning, and analytics.

Data mining methods can broadly be divided into supervised and unsupervised learning.

Supervised Learning: We supervise the model, that is, we teach it with labeled training data so that it can predict the outcome for new data.

Classification: Assigning data to known classes, learned from labeled examples
Regression: Predicting a numeric value or trend from previously labeled data
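As a small illustration of learning from labeled data, the sketch below classifies a new point by copying the label of its closest training example. This 1-nearest-neighbor approach is not one of the techniques discussed in this article; it is used here only because it is about the shortest supervised classifier there is. The data values and the function name `nearest_neighbor_label` are invented.

```python
import math

# Labeled training data: (features, label). Values are made up for illustration.
training_data = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((5.0, 5.0), "high"),
    ((6.0, 5.5), "high"),
]

def nearest_neighbor_label(features):
    """Return the label of the closest training example (1-nearest neighbor)."""
    nearest = min(training_data, key=lambda pair: math.dist(pair[0], features))
    return nearest[1]

print(nearest_neighbor_label((1.2, 1.4)))  # a point near the "low" examples
print(nearest_neighbor_label((5.5, 5.0)))  # a point near the "high" examples
```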

Unsupervised Learning: We do not supervise the model; it works on its own to discover information. It uses machine learning algorithms that draw conclusions from unlabeled data. Its algorithms tend to be more difficult than those of supervised learning.

Clustering: Finding patterns and grouping unlabeled data

Put simply: supervised learning works with labeled data, and unsupervised learning works with unlabeled data.

A few top data mining techniques/algorithms:

Decision Trees: A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
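Following the definition above, a decision tree can be evaluated by working backward from the leaves: chance nodes take a probability-weighted average and decision nodes take the best option. The sketch below assumes a tiny made-up tree; the node layout (`type`, `options`, `outcomes`, `cost`, `value`) and all the numbers are hypothetical.

```python
# Hypothetical decision: launch a product or do nothing.
# Payoffs, costs, and probabilities are invented for illustration.
tree = {
    "type": "decision",
    "options": {
        "launch": {
            "type": "chance",
            "cost": 20,
            "outcomes": [
                (0.5, {"type": "payoff", "value": 100}),  # strong demand
                (0.5, {"type": "payoff", "value": 10}),   # weak demand
            ],
        },
        "do_nothing": {"type": "payoff", "value": 0},
    },
}

def expected_value(node):
    """Evaluate a node by rolling back from the leaves."""
    if node["type"] == "payoff":
        return node["value"]
    if node["type"] == "chance":
        # Probability-weighted average of the outcomes, minus the resource cost.
        average = sum(p * expected_value(child) for p, child in node["outcomes"])
        return average - node.get("cost", 0)
    # Decision node: the decision maker picks the best option.
    return max(expected_value(child) for child in node["options"].values())

def best_option(node):
    """Name of the option with the highest expected value at a decision node."""
    return max(node["options"], key=lambda name: expected_value(node["options"][name]))

print(best_option(tree), expected_value(tree))
```

Note that in machine learning the term usually refers to trees learned from data for classification or regression; this sketch only covers the decision-support reading given above.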

Random Forest: Random forests, or random decision forests, are an ensemble learning method for classification, regression, and other tasks. They operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.
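The "mode of the classes" and "mean prediction" parts can be shown in a few lines. This sketch covers only the final aggregation step and assumes the individual trees have already been trained; the per-tree predictions and the function names `forest_classify` and `forest_regress` are made up.

```python
from statistics import mean, mode

# Hypothetical predictions from five already-trained trees for one input.
class_votes = ["spam", "ham", "spam", "spam", "ham"]
regression_outputs = [3.0, 4.0, 5.0, 4.0, 4.0]

def forest_classify(votes):
    """Classification: return the most common class among the trees."""
    return mode(votes)

def forest_regress(outputs):
    """Regression: return the mean of the trees' predictions."""
    return mean(outputs)

print(forest_classify(class_votes))
print(forest_regress(regression_outputs))
```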

Association Rule Mining: Association rule mining is a procedure that aims to find frequently occurring patterns, correlations, or associations in datasets held in various kinds of databases, such as relational databases, transactional databases, and other forms of repositories.
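Two standard measures for judging a rule such as "bread → milk" are support (how often the items appear together) and confidence (how often the rule holds when its left-hand side is present). Here is a minimal sketch over a handful of invented shopping baskets; it only scores a given rule and does not search for rules the way a full algorithm would.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Share of transactions with the antecedent that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "milk"}))       # bread and milk bought together
print(confidence({"bread"}, {"milk"}))  # how often bread buyers also buy milk
```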

Linear Regression: Linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables).
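For the simplest case of one explanatory variable, the line can be fitted with ordinary least squares in a few lines of plain Python. The function name `fit_line` and the sample points are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points that lie exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```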

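K-Means Clustering: k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It partitions the observations into k clusters, with each observation belonging to the cluster whose mean is nearest.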
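A rough sketch of the usual iterative procedure (often called Lloyd's algorithm) is shown below: assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat. The points, the hand-picked starting centroids, and the fixed iteration count are all assumptions made to keep the example small and deterministic.

```python
import math

def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
# Starting centroids are chosen by hand here; real use typically picks them at random.
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```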

Naïve Bayes: Used for probability estimates and predictions based on data. In machine learning, naive Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong independence assumptions between the features.
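The independence assumption means the score for a class is just the class prior multiplied by one probability per feature. Below is a minimal sketch of one member of the family (a Bernoulli-style classifier over word presence, with Laplace smoothing). The tiny training set and the names `score` and `classify` are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical labeled messages: (set of words present, label).
training = [
    ({"free", "win"}, "spam"),
    ({"free", "offer"}, "spam"),
    ({"meeting", "report"}, "ham"),
    ({"report", "offer"}, "ham"),
]
vocabulary = set().union(*(words for words, _ in training))

label_counts = Counter(label for _, label in training)
word_counts = defaultdict(Counter)  # word_counts[label][word] = messages containing word
for words, label in training:
    word_counts[label].update(words)

def score(words, label):
    """P(label) times the product of P(word present or absent | label)."""
    probability = label_counts[label] / len(training)
    for word in vocabulary:
        # Laplace smoothing (+1 / +2) avoids zero probabilities for unseen words.
        p_present = (word_counts[label][word] + 1) / (label_counts[label] + 2)
        probability *= p_present if word in words else 1 - p_present
    return probability

def classify(words):
    """Pick the label with the highest score."""
    return max(label_counts, key=lambda label: score(words, label))

print(classify({"free", "win"}))
print(classify({"meeting", "report"}))
```

The scores are not normalized, so they are only meaningful relative to each other.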

Neural Networks: Used for clustering and classification. Neural networks are one of the learning algorithms used within machine learning. They consist of different layers for analyzing and learning from data. ... Neural networks learn and attribute weights to the connections between the different neurons each time the network processes data.
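To see what "attributing weights each time the network processes data" can look like, here is a sketch of a single artificial neuron (a perceptron) learning the logical AND function. It is not a layered network, only the smallest building block, and the update rule, epoch count, and names such as `neuron_output` are choices made for this illustration.

```python
# Truth table for logical AND: (inputs, target).
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights = [0, 0]
bias = 0

def neuron_output(inputs):
    """Fire (1) when the weighted sum of the inputs exceeds zero."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# Perceptron learning rule: after each sample, shift the weights and bias
# in the direction that reduces the error on that sample.
for _ in range(20):
    for inputs, target in samples:
        error = target - neuron_output(inputs)
        for i, x in enumerate(inputs):
            weights[i] += error * x
        bias += error

print([neuron_output(inputs) for inputs, _ in samples])
```

Real networks stack many such units in layers and usually train them with gradient-based methods rather than this simple rule.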

I hope this helps you understand the buzz around "data mining", "data science", and "machine learning". Though it looks simple at first, the more I read, the more complex and interesting it became to learn.

Note: For more learning, you can visit Dataaspirant, a data science portal for beginners.
