
Learning some Machine Learning (while learning from it...)

Nir Yosha

Lover of all things security | 20+ years experience

Published: November 13, 2015

Machine learning is ingrained in our day-to-day life. It is part of our spam filters, the voice command interpretation on our smartphones and every search on Google. Chances are good that machine learning has been helping you along somewhere in your life.

But what is machine learning?

Machine learning exists at the intersection of computer science and statistics.

That might be true... but it is a little too deep to begin with. Let's start with some basic machine learning concepts:

Problem Definition

Before looking at machine learning models, or even starting data collection, one should clearly define the problem that needs to be solved. Remember that eventually a computer program will look into the data, measure values and predict some results. A clear problem definition will prevent using the wrong machine learning tools or data set.

Clean / Transform Data

Preparing and understanding the data set before using it is always important. It becomes critical when dealing with machine learning and big data. Each small mistake can have a huge impact on the expected results. More data does not always mean better results.


Since the sample size affects the computation resource requirements, there are times when more data helps; there are times when it doesn't.

Analyzing, preparing and cleaning the data is a big portion of what eventually results in good machine learning...
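To make the cleanup step a bit more concrete, here is a minimal sketch in plain Python. It assumes the data set is a list of numeric rows where a missing value is marked as None; the function name clean_and_scale and the sample rows are made up for illustration. It drops incomplete rows and then rescales each column to zero mean and unit spread, which is one common preparation step, not the only one.

```python
from statistics import mean, pstdev

def clean_and_scale(rows):
    """Drop rows with missing values, then standardize each column."""
    # Keep only rows where every field is present (None marks a missing value).
    complete = [row for row in rows if all(v is not None for v in row)]
    # Transpose so we can work column by column.
    columns = list(zip(*complete))
    scaled_columns = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        # A constant column has zero spread; map it to zeros instead of dividing by 0.
        scaled_columns.append([(v - mu) / sigma if sigma else 0.0 for v in col])
    # Transpose back to rows.
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical sample: (height in cm, weight in kg), with one missing weight.
raw = [[170.0, 65.0], [180.0, None], [160.0, 55.0], [175.0, 75.0]]
cleaned = clean_and_scale(raw)
print(len(cleaned), "rows kept out of", len(raw))
```

Whether dropping incomplete rows is acceptable depends on the problem definition; in some data sets filling in the gaps is the better choice.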

Pick up the right tool

Machine learning is a group of tools and techniques for multiple types of problems. Picking the right tool is essential once the problem and the existing data set are well defined. Remember, machine learning is always about estimation and making the best guess, so don't expect perfect results. There is always a margin of error, noise, correlation coefficients and other factors to account for... Machine learning is also about trial and error.

Here are a few technical terms you should be aware of:

Supervised learning - When data points have "labels" assigned to them to "teach" the expected output per input. The algorithm will train on the labeled data and predict labels for new inputs.

Unsupervised learning - No labels are available for training the algorithm, leaving it on its own to find structure in its input.
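As a small illustration of supervised learning, the sketch below uses a nearest-neighbor rule: a new input simply gets the label of the closest labeled example. This is just one simple technique picked for brevity, and the feature values and the "spam" / "ham" labels are invented for the example.

```python
import math

def predict_label(train, point):
    """Return the label of the labeled training example closest to `point`."""
    _, nearest_label = min(
        train, key=lambda example: math.dist(example[0], point)
    )
    return nearest_label

# Hypothetical labeled data: (features, label) pairs.
labeled = [
    ((1.0, 1.0), "spam"),
    ((1.2, 0.8), "spam"),
    ((5.0, 5.0), "ham"),
    ((4.8, 5.3), "ham"),
]

print(predict_label(labeled, (0.9, 1.1)))
print(predict_label(labeled, (5.1, 4.9)))
```

The labels are what make this supervised: without them the function would have nothing to return.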

Classification vs Regression

Classification is used when the result should be one of n values, and each wrong prediction is equally wrong. For example, if you're trying to classify images of items, identifying a cat as a house isn't any better or worse than identifying a dog as a house. With classification, the output variable takes class labels (in our example: house, cat, dog...).

Machine learning focuses on prediction, based on known properties learned from the training data.

Regression is used when there's some sense of distance between the values. For example, if the actual value of a market stock is $150 and you predicted it to be $149.4, that's a pretty good prediction, while $10 is a much worse prediction. With regression, the output variable takes continuous values (in our example, what the market stock value should be).
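The difference shows up in how a wrong answer is scored. The sketch below reuses the examples from the text; the two function names are illustrative, and the absolute error is only one of several possible distance measures for regression.

```python
def zero_one_loss(actual, predicted):
    """Classification: a prediction is either right (0) or wrong (1)."""
    return 0 if actual == predicted else 1

def absolute_error(actual, predicted):
    """Regression: the further away the prediction, the larger the error."""
    return abs(actual - predicted)

# Classification: both mistakes cost exactly the same.
print(zero_one_loss("house", "cat"), zero_one_loss("house", "dog"))

# Regression: $149.4 is a near miss, $10 is far off.
print(round(absolute_error(150.0, 149.4), 2), absolute_error(150.0, 10.0))
```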

Clustering

Similar to classification, in clustering we're trying to group the data, except that the data is not labelled beforehand. Clustering looks at correlations to see whether the data can be divided into groups based on similarity.

Since clustering has no predefined labels, it uses unsupervised learning methods for the training period.
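One well-known clustering technique is k-means, which is not named in this article but makes a handy illustration. The sketch below runs it on one-dimensional data; the sample values, the initial centers and the fixed iteration count are all assumptions made for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """k-means on 1-D data; `centers` holds the initial guesses."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, groups = kmeans_1d(values, centers=[0.0, 5.0])
print(groups)
```

Note that no labels are passed in anywhere: the two groups emerge only from how close the values are to each other.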

Dimensionality Reduction

Many modern machine learning problems use many dimensions of data and build predictions using many coefficients. Dimensionality reduction simplifies processing by mapping the data into a lower-dimensional space. In many cases, non-numeric values should be converted to numeric values before the dimensionality reduction phase.
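A tiny example of mapping data into a lower-dimensional space: the sketch below takes two-dimensional points and keeps a single coordinate per point, measured along the direction in which the data varies the most. This is the idea behind principal component analysis (PCA), a technique chosen here for illustration and not named in the article; the closed-form angle works only for the two-dimensional case, and the function name is made up.

```python
import math
from statistics import mean

def project_to_1d(points):
    """Project 2-D points onto their direction of greatest variance."""
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 scatter matrix (scaling does not change the direction).
    sxx = sum(x * x for x, _ in centered)
    syy = sum(y * y for _, y in centered)
    sxy = sum(x * y for x, y in centered)
    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # One number per point: its position along that direction.
    return [x * ux + y * uy for x, y in centered]

# Points lying on the line y = 2x: one coordinate is enough to describe them.
coords = project_to_1d([(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print([round(c, 3) for c in coords])
```

Real data rarely sits exactly on a line, so some information is lost in the projection; the goal is to lose as little as possible.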

We should stop here... (it's getting out of control already :-)). Many smart people keep on developing machine learning algorithms and spend their entire lifetime just studying what's already there...

Present the results / Predict

Validate

Nothing much to say about that. Whatever results you get, validate those on real "outside" data. Internal testing can't replace an actual production environment.
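Before real "outside" data is available, a common first approximation is to hold back part of the data and never show it to the model during training. The sketch below assumes labeled rows of the form (feature, label) and a deliberately naive threshold rule; all names and the synthetic data are illustrative, and a hold-out set is still internal testing, not a replacement for production data.

```python
import random

def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows and set aside a fraction the model never trains on."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(predict, labeled_rows):
    """Share of rows for which the prediction matches the known label."""
    hits = sum(1 for feature, label in labeled_rows if predict(feature) == label)
    return hits / len(labeled_rows)

# Synthetic data: values of 50 and above are labeled "high".
data = [(x, "high" if x >= 50 else "low") for x in range(100)]
train, test = holdout_split(data)

# "Train" a naive rule: the midpoint between the two classes seen in training.
max_low = max(x for x, label in train if label == "low")
min_high = min(x for x, label in train if label == "high")
threshold = (max_low + min_high) / 2

def predict(x):
    return "high" if x >= threshold else "low"

print("held-out accuracy:", accuracy(predict, test))
```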

Simplify

In most cases, machine learning "consumers" are not data scientists and have no expertise in statistics or even knowledge of the data set. The only thing they want is an answer. Examples of results could be "77 with a 30% margin of error", "10/90 ratio", "True / False" and so on. Try playing with the results and present them in simple English words rather than convoluted formulas or meaningless numbers.

Improve Results

Consider where the model does not work well or which parts of the problem the model does not answer. Go back to the initial problem definition and compare it with the results. Most machine learning algorithms can accept reinforcement and parameter adjustments to improve the results of future predictions.
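Adjusting a parameter can be as simple as trying a few candidate values and keeping the one that scores best on data with known answers. The sketch below tunes a single decision threshold; the scores, labels and candidate values are invented, and real models usually have more parameters and need a more careful search.

```python
def accuracy(threshold, labeled):
    """Share of (score, is_spam) pairs classified correctly at this threshold."""
    hits = sum(1 for score, is_spam in labeled if (score >= threshold) == is_spam)
    return hits / len(labeled)

def tune_threshold(candidates, validation):
    """Pick the candidate threshold with the best validation accuracy."""
    return max(candidates, key=lambda t: accuracy(t, validation))

# Hypothetical validation data: (model score, true label).
validation = [
    (0.1, False), (0.3, False), (0.45, False),
    (0.6, True), (0.8, True), (0.9, True),
]

best = tune_threshold([0.2, 0.5, 0.7], validation)
print("best threshold:", best)
```

To keep the check honest, the data used for tuning should be separate from the data used for the final validation.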

Reporting

As most fortune-tellers know, presenting the prediction is as important as the prediction itself. Visualization is one of the best ways to present machine learning results. Interactive reports with dashboards and drill-down capabilities allow a better understanding of the results.

Before teaching us anything, machine learning should "learn". As such, the problem definition, data cleanup, model usage and presentation should all be well implemented. The results can be nothing less than amazing.
