How Microsoft Is Using Machine Learning To Secure Its Software Development Cycle
Anushka Visapure
Solution-Oriented DevOps Engineer || Skilled in Kubernetes | Terraform | Ansible | Docker | Git and GitHub | GitHub Action || Expanding Capabilities in AWS | GCP | Linux.
What Is Machine Learning?
Machine learning is the concept that a computer program can learn and adapt to new data without human intervention. It is a field of artificial intelligence (AI) in which a computer's built-in algorithms stay current as the data around them changes.
Key Takeaways:
1. Machine learning is an area of artificial intelligence (AI) based on the idea that a computer program can learn and adapt to new data without human intervention.
2. A complex algorithm or source code is built into a computer that allows the machine to identify data and build predictions around the data it identifies.
3. Machine learning is useful for parsing the immense amount of information that is consistently and readily available in the world to assist in decision making.
4. Machine learning can be applied in a variety of areas, such as investing, advertising, lending, organizing news, fraud detection, and more.
Some machine learning methods:
Machine learning algorithms are often categorized as supervised or unsupervised.
Supervised machine learning algorithms can apply what has been learned in the past to new data using labeled examples to predict future events. Starting from the analysis of a known training dataset, the learning algorithm produces an inferred function to make predictions about the output values. The system is able to provide targets for any new input after sufficient training. The learning algorithm can also compare its output with the correct, intended output and find errors in order to modify the model accordingly.
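To make this concrete, here is a minimal supervised-learning sketch in Python (purely illustrative; the dataset and classifier choice are assumptions for the example, not anything from the article):

```python
# A minimal supervised-learning sketch (illustrative only).
# A classifier is trained on labeled examples, then predicts labels for new
# data and is scored against the known, intended outputs.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                       # labeled training examples
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                             # learn from labeled data

predictions = model.predict(X_test)                     # targets for new inputs
print("accuracy:", accuracy_score(y_test, predictions))  # compare with intended output
```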
In contrast, unsupervised machine learning algorithms are used when the information used to train is neither classified nor labeled. Unsupervised learning studies how systems can infer a function to describe a hidden structure from unlabeled data. The system does not figure out the right output; instead, it explores the data and draws inferences that describe hidden structures in the unlabeled data.
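A correspondingly minimal unsupervised sketch (again illustrative, using synthetic data assumed for the example):

```python
# A minimal unsupervised-learning sketch (illustrative only).
# No labels are given; k-means explores the data and infers hidden structure
# by grouping similar points into clusters.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels discarded

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)        # structure inferred from unlabeled data
print(cluster_ids[:10])                    # cluster assignment for the first points
```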
Semi-supervised machine learning algorithms fall somewhere in between supervised and unsupervised learning, since they use both labeled and unlabeled data for training – typically a small amount of labeled data and a large amount of unlabeled data. Systems that use this method can considerably improve learning accuracy. Usually, semi-supervised learning is chosen when labeling the data requires skilled and expensive resources, while acquiring unlabeled data generally requires few additional resources.
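A small sketch of the semi-supervised setup, assuming scikit-learn's self-training wrapper and hiding most labels from a standard dataset:

```python
# A minimal semi-supervised sketch (illustrative only): a small amount of
# labeled data plus a large amount of unlabeled data (marked with -1).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(0)
y_partial = y.copy()
mask = rng.rand(len(y)) < 0.8          # hide roughly 80% of the labels
y_partial[mask] = -1                   # -1 marks an unlabeled example

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)                # learns from both labeled and unlabeled data
print("labeled examples used initially:", (y_partial != -1).sum())
```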
Reinforcement learning is a method that interacts with its environment by producing actions and discovering errors or rewards. Trial-and-error search and delayed reward are its most relevant characteristics. This method allows machines and software agents to automatically determine the ideal behavior within a specific context in order to maximize performance. Simple reward feedback, known as the reinforcement signal, is required for the agent to learn which action is best.
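A tiny illustrative sketch of reinforcement learning, assuming a made-up five-cell corridor environment where the reward sits only at the right end:

```python
# A minimal reinforcement-learning sketch (illustrative only): tabular
# Q-learning on a 5-cell corridor. The agent earns a reward only at the
# right end, so it must discover the best action by trial and error.
import random

n_states, actions = 5, [-1, +1]        # move left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration

for episode in range(300):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy: mostly exploit the best-known action, sometimes explore
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda x: Q[(s, x)])
        s_next = min(max(s + a, 0), n_states - 1)
        reward = 1.0 if s_next == n_states - 1 else 0.0   # delayed reward
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next

# Learned policy: the agent should move right (+1) from every cell.
print({s: max(actions, key=lambda x: Q[(s, x)]) for s in range(n_states - 1)})
```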
Machine learning enables analysis of massive quantities of data. While it generally delivers faster, more accurate results in order to identify profitable opportunities or dangerous risks, it may also require additional time and resources to train it properly. Combining machine learning with AI and cognitive technologies can make it even more effective in processing large volumes of information.
What Is Artificial Intelligence (AI)?
Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions. The term may also be applied to any machine that exhibits traits associated with a human mind such as learning and problem-solving.
The ideal characteristic of artificial intelligence is its ability to rationalize and take actions that have the best chance of achieving a specific goal.
Algorithms often play a very important part in the structure of artificial intelligence, where simple algorithms are used in simple applications, while more complex ones help frame strong artificial intelligence.
Key Takeaways:
1. Artificial intelligence refers to the simulation of human intelligence in machines.
2. The goals of artificial intelligence include learning, reasoning, and perception.
3. AI is being used across different industries, including finance and healthcare.
4. Weak AI tends to be simple and single-task oriented, while strong AI carries out tasks that are more complex and human-like.
AI is the broader concept of creating intelligent machines that can simulate human thinking and behavior, whereas machine learning is an application or subset of AI that allows machines to learn from data without being explicitly programmed.
An artificial intelligence system does not need to be pre-programmed; instead, it uses algorithms that can work with their own intelligence, such as reinforcement learning algorithms and deep learning neural networks. AI is used in many places, such as Siri, Google's AlphaGo, and chess-playing programs.
Types of Artificial Intelligence:
Artificial intelligence can be divided into various types along two main lines of categorization: one based on capabilities and one based on functionality.
AI type-1: Based on Capabilities
1. Weak AI or Narrow AI:
- Narrow AI is a type of AI that can perform a dedicated task with intelligence. It is the most common and currently available form of AI in the world of artificial intelligence.
- Narrow AI cannot perform beyond its field or limitations, as it is only trained for one specific task; hence it is also termed weak AI. Narrow AI can fail in unpredictable ways if pushed beyond its limits.
- Apple Siri is a good example of Narrow AI; it operates within a limited, pre-defined range of functions.
- IBM's Watson supercomputer also comes under Narrow AI, as it uses an Expert system approach combined with Machine learning and natural language processing.
- Some examples of Narrow AI are playing chess, purchase suggestions on e-commerce sites, self-driving cars, speech recognition, and image recognition.
2. General AI:
- General AI is a type of intelligence that could perform any intellectual task as efficiently as a human.
- The idea behind general AI is to make a system that could be smarter and think like a human on its own.
- Currently, no such system exists that qualifies as general AI and can perform any task as well as a human.
- Researchers worldwide are now focused on developing machines with general AI.
- Systems with general AI are still under research, and developing them will take a great deal of effort and time.
3. Super AI:
- Super AI is a level of system intelligence at which machines could surpass human intelligence and perform any task better than a human, thanks to cognitive properties. It is conceived as an outcome of general AI.
- Some key characteristics of super AI include the ability to think, reason, solve puzzles, make judgments, plan, learn, and communicate on its own.
- Super AI is still a hypothetical concept of artificial intelligence; developing such systems in the real world remains a world-changing task.
AI type-2: Based on Functionality
1. Reactive Machines
- Purely reactive machines are the most basic types of Artificial Intelligence.
- Such AI systems do not store memories or past experiences for future actions.
- These machines only focus on current scenarios and react to them with the best possible action.
- IBM's Deep Blue system is an example of reactive machines.
- Google's AlphaGo is also an example of reactive machines.
2. Limited Memory
- Limited memory machines can store past experiences or some data for a short period of time.
- These machines can use stored data for a limited time period only.
- Self-driving cars are one of the best examples of Limited Memory systems. These cars can store the recent speed of nearby cars, the distance to other cars, speed limits, and other information needed to navigate the road.
3. Theory of Mind
- Theory of Mind AI should understand human emotions, people, and beliefs, and be able to interact socially like humans.
- Machines of this type have not yet been developed, but researchers are making considerable efforts and progress toward them.
4. Self-Awareness
- Self-aware AI is the future of artificial intelligence. These machines will be super intelligent and will have their own consciousness, sentiments, and self-awareness.
- These machines will be smarter than the human mind.
- Self-aware AI does not yet exist in reality; it is still a hypothetical concept.
Machine Learning Case Studies – Power that is beyond imagination!
Machine Learning is hyped as the “next big thing” and is being put into practice by many businesses. It has also achieved a prominent role in areas of computer science such as information retrieval, database consistency, and spam detection.
How Microsoft Is Using Machine Learning To Secure Its Software Development Cycle
Tech giant Microsoft recently built a machine learning classification system that aims to secure the software development lifecycle. The system helps classify bugs as security or non-security and critical or non-critical, with a level of accuracy akin to that of security experts.
Software developers at Microsoft address a large number of issues and vulnerabilities. More than 45,000 developers generate nearly 30,000 bugs per month, which get stored across 100+ AzureDevOps and GitHub repositories. The tech giant is looking to mitigate these vulnerabilities.
Since 2001, the tech giant has collected 13 million work items and bugs. According to sources, Microsoft spends an estimated $150,000 per issue to mitigate bugs and vulnerabilities.
However, according to the developers, since there are more than 45,000 developers already working to address the problem, applying more human resources to better label and prioritise the bugs is not possible.
To build the machine learning model, the tech giant used the 13 million work items and bugs collected over two decades to train it. They stated, “We used that data to develop a process and machine learning model that correctly distinguishes between security and non-security bugs 99% of the time, and accurately identifies the critical, high priority security bugs 97% of the time.”
Behind The Classification System
Since large volumes of semi-curated data are adequate for machine learning tasks, the data science and security teams at the tech giant came together to build the supervised machine learning system.
In supervised learning, a machine learning model learns how to classify data from pre-labelled data. The ML developers at the tech giant fed the model a large number of bugs, some labelled as security issues and others not.
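As a hedged illustration of the general idea (not Microsoft's actual model, whose internals are not described here), a bug-title classifier might look like this, with an entirely hypothetical hand-made dataset:

```python
# A hedged sketch of supervised bug classification (NOT Microsoft's model).
# Bug titles and labels below are purely hypothetical examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

titles = [
    "Buffer overflow in auth parser",        # hypothetical security bugs
    "SQL injection via search endpoint",
    "Button misaligned on settings page",    # hypothetical non-security bugs
    "Typo in error message string",
]
labels = ["security", "security", "non-security", "non-security"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(titles, labels)                      # learn from pre-labelled bugs
print(clf.predict(["Heap overflow in image decoder"]))
```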
To make the machine learning classification system perform like a security expert, the training data was initially approved by the security experts before it was fed to the machine learning model.
To build a machine learning model that yields maximum accuracy, the developers followed an approach comprising five processes:
- Data collection: The developers identified all the data types and sources and evaluated their quality
- Data curation and approval: Security experts reviewed the data and confirmed that the labels were correct
- Modelling and evaluation: A data modelling technique was selected, the model was trained, and its performance was evaluated
- Evaluation of the model in production: Security experts evaluated the model in production by monitoring the average number of bugs and manually reviewing a random sampling of bugs
- Automated re-training: The developers then set up automated re-training to make sure the bug modelling system keeps pace with the ever-evolving products at Microsoft (a minimal sketch of this idea follows the list)
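Purely as an assumption about what such a re-training gate could look like (Microsoft has not published these pipeline details), here is a minimal sketch: re-fit a fresh copy of the model on old plus newly curated data, and promote it only if validation accuracy does not regress:

```python
# A hedged sketch of an automated re-training gate (an assumption, not
# Microsoft's published pipeline). A fresh copy of the model is re-fit on
# old + newly curated bugs and promoted only if validation accuracy holds.
from sklearn.base import clone
from sklearn.metrics import accuracy_score

def retrain(model, X_old, y_old, X_new, y_new, X_val, y_val, baseline):
    """Re-fit on the combined data; promote only if accuracy >= baseline."""
    candidate = clone(model).fit(list(X_old) + list(X_new),
                                 list(y_old) + list(y_new))
    score = accuracy_score(y_val, candidate.predict(X_val))
    # Keep the existing model if the candidate regresses on validation data.
    return candidate if score >= baseline else model
```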
How It Works
The ML developers used statistical sampling to provide security experts with a manageable amount of data to review. To classify bugs accurately, they used a two-step machine learning model operation.
In the first step, the machine learning model learns how to classify security and non-security bugs. In the second step, it applies severity labels such as critical, important, and low-impact to the security bugs.
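A hedged sketch of this two-step operation, reusing the same kind of text pipeline as the earlier sketch (the training data below is hypothetical; Microsoft's real features and models are not public):

```python
# A hedged sketch of a two-step bug classifier (an illustration, not
# Microsoft's actual system). Step 1 separates security from non-security
# bugs; step 2 grades severity, and runs only on the bugs flagged as security.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 1: security vs non-security (tiny hypothetical training data).
step1 = make_pipeline(TfidfVectorizer(), LogisticRegression())
step1.fit(
    ["Buffer overflow in parser", "XSS in comment form",
     "Dark mode toggle broken", "Docs typo in README"],
    ["security", "security", "non-security", "non-security"],
)

# Step 2: severity labels, trained only on security bugs (also hypothetical).
step2 = make_pipeline(TfidfVectorizer(), LogisticRegression())
step2.fit(
    ["Remote code execution in service", "Info leak in verbose log",
     "Weak default password policy"],
    ["critical", "low-impact", "important"],
)

def classify(bug_titles):
    """First separate security bugs, then grade only those for severity."""
    out = []
    for title in bug_titles:
        if step1.predict([title])[0] == "security":
            out.append((title, "security", step2.predict([title])[0]))
        else:
            out.append((title, "non-security", None))
    return out

print(classify(["Heap overflow in image decoder", "Tooltip overlaps button"]))
```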
Wrapping Up
By applying this machine learning classification system, the developers can now accurately classify which work items are security bugs 99% of the time. The model also shows a 97% accuracy rate when labelling critical and non-critical security bugs.
The developers stated, “This level of accuracy gives us the confidence that we are catching more security vulnerabilities before they are exploited.” They added, “In the coming months, we will open-source our methodology to GitHub.”
THANK YOU FOR READING !!!