
Using Machine Learning to Build Better Machine Learning: How Tech Giants are Relying on AutoML to Build Better Neural Network Architectures

Jesus Rodriguez

CEO of IntoTheBlock, Co-Founder of LayerLens, Faktory, and NeuralFabric, Founder of The Sequence AI Newsletter, Guest Lecturer at Columbia, Guest Lecturer at Wharton Business School, Investor, Author.

Published: January 16, 2019

Automated machine learning (AutoML) is becoming one of the most popular topics in modern data science applications. Often, people see AutoML as a mechanism to use out-of-the-box machine learning models without the need for sophisticated data science knowledge. While this argument makes sense in theory, the reality is a bit different. In the current stage of artificial intelligence (AI), most real-world applications require some level of machine learning knowledge. The scenarios that you can solve with a vanilla API like the Watson Developer Cloud or Microsoft Cognitive Services are very basic and represent only a small percentage of the broader spectrum of machine learning scenarios. If that’s the case, then we should ask what the real value of AutoML is.

In our experience at Invector Labs, the best way to think about AutoML is as a helper or enabler of machine learning applications that complements the skills of data scientists. More specifically, we believe the most viable case for AutoML in the real world is to help data scientists select and architect the right machine learning model for a given problem.

The Challenge

Model selection is one of the most difficult aspects of building a machine learning solution. Somewhat ironically, for all the science and math baked into machine learning applications, model selection remains a highly subjective task that relies on the opinions of experts. For any given scenario, the number of machine learning models that can solve it is incredibly large, so how can we really know if we are using the best model for the job? Even worse, even if we have selected the right machine learning technique, how can we be sure we have the right neural network architecture in place? And once we have settled on a specific architecture, how can we know the correct hyperparameter configuration? Those questions haunt data scientists throughout the entire lifecycle of a machine learning application. Furthermore, the more accuracy a machine learning problem demands, the more time is spent in the model selection process.

Not surprisingly, the process of selecting and architecting a machine learning model is an extremely time-consuming exercise that never delivers an exact answer. Paradoxically, this is the type of problem in which machine learning excels, so can we get creative and model the process of selecting a machine learning architecture as a machine learning problem in and of itself?

AutoML to the Rescue

Model search is one of the use cases that seems like a perfect fit for AutoML. Given a dataset, a set of optimization metrics and some constraints in terms of time or resources, AutoML methods should be able to evaluate tens of thousands of neural network architectures and return the best one found. While an effective data science team might be able to evaluate a dozen models for a given problem, an AutoML method can search through tens of thousands of architectures in a relatively manageable time.
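To make the idea concrete, here is a minimal sketch of a budgeted model search using plain random sampling. Everything in it is an assumption for illustration: the search space, the `evaluate` function (a made-up score standing in for real training and validation), and the budget values. Production AutoML systems typically use far smarter search strategies than uniform sampling.

```python
import random
import time

# Hypothetical search space: each candidate is a small set of architecture choices.
SEARCH_SPACE = {
    "layers": [1, 2, 3, 4],
    "units": [16, 32, 64, 128],
    "learning_rate": [0.1, 0.01, 0.001],
}


def sample_architecture(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def evaluate(config):
    """Stand-in for training and validating a model (made-up score, higher is better).

    A real system would train the candidate and return a validation metric.
    """
    return (
        -abs(config["layers"] - 3)
        - abs(config["units"] - 64) / 64
        - abs(config["learning_rate"] - 0.01)
    )


def random_search(max_trials, time_budget_s, seed=0):
    """Evaluate sampled candidates until the trial or time budget runs out."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    best_config, best_score = None, float("-inf")
    trials = 0
    while trials < max_trials and time.monotonic() < deadline:
        config = sample_architecture(rng)
        score = evaluate(config)
        trials += 1
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score, trials


best_config, best_score, trials = random_search(max_trials=200, time_budget_s=5.0)
print(f"best of {trials} trials: {best_config} (score {best_score:.3f})")
```

The two stopping conditions mirror the constraints mentioned above: a cap on the number of candidates and a cap on wall-clock time.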

Using machine learning to build better machine learning models seems like something out of an Iron Man movie. Is this really happening in the real world? Absolutely! Here are three of my favorite high-profile case studies for AutoML in mission-critical applications.

Salesforce.com TransmogrifAI: The Brain Behind Einstein

Salesforce.com’s Einstein is one of the most widely adopted machine learning applications worldwide. Ultimately, Einstein solves a series of machine learning scenarios, such as sales forecasting or lead prioritization, which are omnipresent in sales and marketing applications. However, what makes Einstein unique is that its machine learning models are able to operate across completely different Salesforce.com configurations in a self-service manner. Each customer might have completely different sales and marketing schemas, and yet Einstein can still do the job.

The magic behind Salesforce’s Einstein is powered by an open source framework called TransmogrifAI. Conceptually, TransmogrifAI is an AutoML-based framework for creating machine learning models against structured datasets (rows and columns). More specifically, TransmogrifAI leverages AutoML in five fundamental areas of a machine learning workflow:

· Feature Inference: Extracting features from a given dataset.

· Transmogrification: Transforming features into numeric values.

· Feature Validation: Reducing dimensions, identifying potential bias, etc.

· Model Selection: Conducting search across thousands of potential models.

· Hyperparameter Optimization: Tuning the hyperparameter configuration.
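The five stages above can be sketched as a toy pipeline in pure Python. This is only an illustration of the workflow and does not use TransmogrifAI's own API: the function names, the sample rows, and the one-feature threshold "model" are all hypothetical, and the validation step here only drops constant columns rather than checking for bias.

```python
def infer_feature_types(rows):
    """Feature inference: label each column numeric or categorical from its values."""
    types = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        is_numeric = all(isinstance(v, (int, float)) for v in values)
        types[name] = "numeric" if is_numeric else "categorical"
    return types


def transmogrify(rows, types):
    """Transmogrification: turn every column into numbers (one-hot for categoricals)."""
    categories = {
        name: sorted({row[name] for row in rows})
        for name, kind in types.items()
        if kind == "categorical"
    }
    vectors = []
    for row in rows:
        vec = {}
        for name, kind in types.items():
            if kind == "numeric":
                vec[name] = float(row[name])
            else:
                for cat in categories[name]:
                    vec[f"{name}={cat}"] = 1.0 if row[name] == cat else 0.0
        vectors.append(vec)
    return vectors


def validate_features(vectors):
    """Feature validation: drop constant columns, which carry no signal."""
    keep = [n for n in vectors[0] if len({v[n] for v in vectors}) > 1]
    return [{n: v[n] for n in keep} for v in vectors]


def select_model(vectors, labels):
    """Model selection and hyperparameter search over one-feature threshold rules.

    The feature is the "model" choice; the threshold is its hyperparameter.
    """
    best = (None, None, -1.0)
    for feature in vectors[0]:
        for threshold in sorted({v[feature] for v in vectors}):
            preds = [v[feature] >= threshold for v in vectors]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best[2]:
                best = (feature, threshold, acc)
    return best


# Made-up lead data: did the deal close?
rows = [
    {"deal_size": 10, "region": "EU", "source": "web"},
    {"deal_size": 50, "region": "US", "source": "web"},
    {"deal_size": 70, "region": "US", "source": "web"},
    {"deal_size": 20, "region": "EU", "source": "web"},
]
labels = [False, True, True, False]

types = infer_feature_types(rows)
features = validate_features(transmogrify(rows, types))
best_feature, best_threshold, best_accuracy = select_model(features, labels)
print(best_feature, best_threshold, best_accuracy)
```

The point of the sketch is the shape of the workflow: every stage is automated from the raw rows, so the same code can run against a customer schema it has never seen before.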

Given the Salesforce.com footprint, TransmogrifAI might be considered one of the largest AutoML applications in the world.

Azure ML: Helping Developers Select the Right Machine Learning Model

Last year, Microsoft Research conducted an experiment that combined AutoML and probabilistic programming to automate model selection. The results were captured in a very popular research paper and represented a breakthrough in performance. Within a matter of months, the AutoML approach pioneered by the Microsoft Research team was implemented in Microsoft’s hallmark machine learning product: Azure ML.

The latest release of Azure ML leverages AutoML to streamline model selection. The platform includes an AutoML service that regularly recommends new machine learning pipelines to evaluate for a specific problem. The pipeline is executed in the customer’s Azure ML instance, while the AutoML service only sees the results and uses them to make better recommendations.
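The split described here, where a service recommends and the customer executes, can be sketched as a simple suggest-and-report loop. This is not the Azure ML SDK: the class, the pipeline names, the scores, and the "prefer the best-performing family" heuristic are all hypothetical, chosen only to show that the recommender can improve its suggestions while seeing nothing but reported scores.

```python
class PipelineRecommender:
    """Service side: sees pipeline names and reported scores, never the data."""

    def __init__(self, candidates):
        self.untried = list(candidates)
        self.results = {}

    def suggest(self):
        """Recommend the next pipeline, favouring the family of the best result so far."""
        if not self.untried:
            return None
        if self.results:
            best_name = max(self.results, key=self.results.get)
            best_family = best_name.split(":")[0]
            for name in self.untried:
                if name.startswith(best_family + ":"):
                    self.untried.remove(name)
                    return name
        return self.untried.pop(0)

    def report(self, name, score):
        """Record the score the customer measured for a suggested pipeline."""
        self.results[name] = score


def run_locally(name, private_scores):
    """Customer side: stand-in for training the pipeline on data the service never sees."""
    return private_scores[name]


def search(recommender, private_scores, budget):
    """Alternate between suggestion and local execution for `budget` rounds."""
    for _ in range(budget):
        name = recommender.suggest()
        if name is None:
            break
        recommender.report(name, run_locally(name, private_scores))
    if not recommender.results:
        return None
    return max(recommender.results.items(), key=lambda item: item[1])


# Made-up pipelines ("family:variant") and the scores they would get locally.
candidates = ["linear:ridge", "tree:gbm", "linear:lasso", "tree:forest"]
private_scores = {
    "linear:ridge": 0.71,
    "tree:gbm": 0.88,
    "linear:lasso": 0.69,
    "tree:forest": 0.84,
}

recommender = PipelineRecommender(candidates)
best = search(recommender, private_scores, budget=3)
print(best)
```

Note that `run_locally` is the only function that touches `private_scores`; the recommender's view is limited to what `report` passes back.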

The AutoML implementation in the Azure ML stack is one of the most complete I’ve ever seen. The current version supports classification and regression model recommendation on numeric and text data, with support for automatic feature generation (including missing-value imputation, encoding, normalization and heuristics-based features), feature transformation and feature selection. Developers can use AutoML through the Python SDK or via Jupyter Notebooks.

Waymo: Automated Model Selection for Self-Driving Vehicles

A self-driving vehicle is something like a big group of machine learning models on four wheels. Machine learning enables all the intelligent features of self-driving vehicles, like helping cars see their surroundings, make sense of the world, predict how others will behave, and decide their next best move. Alphabet’s subsidiary Waymo is at the forefront of self-driving vehicle technologies and, as a result, is constantly innovating in machine learning.

Recently, the Waymo engineering team published a detailed blog post on how they are leveraging AutoML to automate model selection in different machine learning applications. Specifically, the Waymo team leverages an AutoML technique known as NAS cells, which has proven to be very effective in image analysis algorithms.

At Waymo, AutoML is used to explore hundreds of different NAS cell combinations within a convolutional neural network (CNN) architecture, training and evaluating models for Waymo’s LiDAR segmentation task. The experiment has been producing CNN architectures that perform with 20–30% less latency and an 8–10% lower error rate than hand-crafted models.
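When candidates are judged on two metrics at once, such as latency and error rate, one common way to compare them is a Pareto front: keep every architecture that no other candidate beats on both metrics. The sketch below illustrates that idea only. The candidate names and numbers are invented, and nothing here is taken from Waymo's actual selection procedure.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    latency_ms: float
    error_rate: float


def dominates(a, b):
    """a dominates b if it is no worse on both metrics and strictly better on one."""
    no_worse = a.latency_ms <= b.latency_ms and a.error_rate <= b.error_rate
    strictly_better = a.latency_ms < b.latency_ms or a.error_rate < b.error_rate
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Invented numbers for a hand-crafted baseline and three searched architectures.
candidates = [
    Candidate("hand_crafted", latency_ms=10.0, error_rate=0.10),
    Candidate("cell_a", latency_ms=8.0, error_rate=0.09),
    Candidate("cell_b", latency_ms=7.0, error_rate=0.12),
    Candidate("cell_c", latency_ms=12.0, error_rate=0.11),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

In this made-up example, `cell_a` beats the hand-crafted model on both metrics, while `cell_b` survives because it is the fastest even though its error rate is higher; picking between the two is a product decision rather than a search result.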

As you can see from these examples, AutoML is establishing itself as one of the important elements of highly scalable machine learning architectures. The tools and frameworks for leveraging AutoML in model search are getting better and are becoming available to mainstream developers. While there are other great use cases for AutoML, model selection remains the one driving the biggest benefits in real-world machine learning applications.
