TransmogrifAI

360DigiTMG

We don’t just train, we transform by making a POSITIVE impact on your CAREER!

Published: July 7, 2023

TransmogrifAI is an open-source machine learning automation framework created by Salesforce to simplify the machine learning workflow. It is built on top of Apache Spark and is designed to work with big data. The name refers to the comic strip Calvin and Hobbes, in which Calvin uses a transmogrifier to transform himself into different creatures.

What is TransmogrifAI?

TransmogrifAI provides a unified API for data cleaning, feature engineering, and model training. Its automated feature engineering derives new features from the input data, and its feature selection keeps the most useful ones, which reduces the risk of overfitting. TransmogrifAI also tunes hyperparameters automatically to improve model performance.

Why use TransmogrifAI?

TransmogrifAI makes it easier to build and deploy machine learning models. It provides a simple, unified API that abstracts away the complexity of the underlying machine learning algorithms, along with a range of tools for data cleaning, feature engineering, and model training. Because it runs on Apache Spark, it can distribute work across a cluster and handle datasets that are too large for a single machine.

How does TransmogrifAI work?

TransmogrifAI works by automating many of the steps in the machine learning workflow. It uses automated feature engineering to create new features from the input data, feature selection to keep the best features for the model, and automatic hyperparameter tuning to optimize model performance. All of this runs on top of Apache Spark, which is what lets it handle big data. The stages of the workflow are described below.

The TransmogrifAI Workflow

Feature Inference: The first step in any machine learning process is data preparation. A data scientist collects all relevant data and joins, combines, and aggregates different data sources to extract raw signals that could have predictive power. The extracted signals are then placed into a flexible data structure, commonly known as a data frame, where they can be manipulated further. Although these data structures are simple and easy to manipulate, they do not protect data scientists against consequential errors such as incorrect assumptions about types or nulls in the data. TransmogrifAI addresses this by representing each signal as a feature. Features are strongly typed, and TransmogrifAI supports a rich and extensible hierarchy of feature types. In addition to allowing user-specified types, TransmogrifAI also infers types on its own. For example, if it detects that a low-cardinality text feature is actually a categorical feature in disguise, it records it as such and treats it accordingly. Strongly typed features allow developers to catch most errors at compile time rather than at runtime. They are also key to automating the type-specific preprocessing that is common in machine learning pipelines.
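The inference idea can be illustrated with a small sketch. This is plain Python rather than TransmogrifAI's actual Scala API, and the rule it applies is an assumption made for illustration: the function name `infer_feature_type` and the `max_categories` threshold are hypothetical, and the type names are borrowed only loosely from the feature type hierarchy.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InferredType:
    name: str       # "Real", "PickList" (categorical), or "Text"
    nullable: bool  # True if at least one value was missing


def infer_feature_type(values: Sequence[Optional[str]],
                       max_categories: int = 10) -> InferredType:
    """Guess a feature type for one raw column of strings.

    max_categories is a hypothetical threshold chosen for this sketch,
    not a TransmogrifAI default.
    """
    present = [v for v in values if v is not None and v != ""]
    nullable = len(present) < len(values)

    def is_number(s: str) -> bool:
        try:
            float(s)
            return True
        except ValueError:
            return False

    if present and all(is_number(v) for v in present):
        return InferredType("Real", nullable)
    # Text with only a few distinct values is treated as categorical.
    if present and len(set(present)) <= max_categories:
        return InferredType("PickList", nullable)
    return InferredType("Text", nullable)


state_column = ["CA", "NY", "CA", None, "TX"]
inferred = infer_feature_type(state_column)
```

Here `inferred` records both that the column behaves like a categorical and that it contains nulls, which are the two kinds of assumptions a plain data frame would leave unchecked.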

[Figure: The TransmogrifAI feature type hierarchy]

Transmogrification (a.k.a. automated feature engineering): While strongly typed features are very helpful for reasoning about your data and minimizing downstream errors, ultimately all features need to be transformed into a numerical representation that exposes the patterns in the data in a way that machine learning algorithms can easily exploit. This process is known as feature engineering. There are endless ways to transform the feature types in the figure above, and doing it the right way is the art of data science.

As an example, let's ask ourselves how we would go about transforming a US state (e.g. CA, NY, TX) into a number. The simplest approach might be to map each state to a number between 1 and 50. The problem with this encoding is that it does not store any information about the geographical proximity of the states. However, proximity can be an important property when modelling purchasing behaviour. Another option is to encode each state by the distance between its center and the center of the country. This would solve the first problem, but would still not encode information about whether the states are in the north, south, west, or east of the country. This was a simple illustration for one feature; imagine doing this for hundreds or thousands of them! What makes this process particularly challenging is that there is no single correct way, and successful approaches depend heavily on the problem we are trying to optimize.
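To make the trade-off concrete, the sketch below compares two common encodings of a state column: an integer code and a one-hot vector. It is generic Python for illustration, not TransmogrifAI code, and the helper names `ordinal_encode` and `one_hot_encode` are hypothetical.

```python
from typing import Dict, List, Sequence


def ordinal_encode(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct value to an integer 1..K, in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)), start=1)}


def one_hot_encode(values: Sequence[str]) -> List[List[int]]:
    """Emit one 0/1 indicator column per distinct value, in sorted order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


states = ["CA", "NY", "TX", "CA"]
codes = ordinal_encode(states)     # {'CA': 1, 'NY': 2, 'TX': 3}
vectors = one_hot_encode(states)   # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

The integer codes suggest that NY sits "between" CA and TX, an ordering that means nothing geographically. The one-hot vectors remove that false ordering, but they treat every pair of states as equally far apart, so neither encoding captures proximity.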

TransmogrifAI comes with a myriad of techniques for all supported feature types, from phone numbers, email addresses, and geographic locations to text data. These transformations are not just about getting data into a format that algorithms can use. TransmogrifAI also optimizes the transformations to make it easier for machine learning algorithms to learn from the data. For example, it can transform a numeric feature such as age into the age buckets that are most appropriate for a particular problem: age buckets for the fashion industry may differ from those for wealth management.

Automated Feature Validation: Feature engineering can lead to an explosion in the dimensionality of the data, and high-dimensional data is often full of problems. For example, the usage of particular fields may change over time, so models trained on those fields may perform poorly on fresh data. Another big (and often overlooked) problem is hindsight bias, or data leakage. This occurs when the training examples contain information that will not actually be available at prediction time. The result is models that look amazing on paper but are useless in practice. Consider a dataset of deals where the task is to predict which deals are likely to close. Imagine a field in this dataset called "Deal Amount" that is populated only after the deal is closed. A model trained naively on this data would treat the field as highly predictive, because every closed deal has it filled in. In reality, the field will never be filled for a deal that is still open, so the model will perform poorly on exactly those deals where predictions matter. TransmogrifAI's automated feature validation is particularly useful for maintaining sanity when dealing with high-dimensional, unfamiliar data that can be fraught with hindsight bias. It applies a range of statistical tests based on feature types and additionally uses feature lineage to detect and remove such bias.
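One simple test in this spirit checks whether the mere presence of a value in a field almost perfectly predicts the label. The sketch below is an illustrative assumption, not TransmogrifAI's actual validation logic: the function names and the 0.95 threshold are hypothetical, and the sample rows are made up for the example.

```python
from math import sqrt
from typing import Optional, Sequence


def fill_label_correlation(field: Sequence[Optional[float]],
                           label: Sequence[int]) -> float:
    """Pearson correlation between 'field is filled' (0/1) and a 0/1 label."""
    n = len(label)
    if n == 0:
        return 0.0
    filled = [0 if v is None else 1 for v in field]
    mean_f = sum(filled) / n
    mean_y = sum(label) / n
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(filled, label))
    var_f = sum((f - mean_f) ** 2 for f in filled)
    var_y = sum((y - mean_y) ** 2 for y in label)
    if var_f == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal either way
    return cov / sqrt(var_f * var_y)


def looks_leaky(field: Sequence[Optional[float]],
                label: Sequence[int],
                threshold: float = 0.95) -> bool:
    """Flag a field whose fill pattern is almost a copy of the label."""
    return abs(fill_label_correlation(field, label)) >= threshold


# Made-up rows: "Deal Amount" is filled only for closed deals.
closed = [1, 0, 1, 0, 0, 1]
deal_amount = [1200.0, None, 560.0, None, None, 980.0]
region_code = [1.0, 2.0, None, 3.0, None, 1.0]

leaky_deal_amount = looks_leaky(deal_amount, closed)
leaky_region_code = looks_leaky(region_code, closed)
```

With these rows the check flags `deal_amount` and leaves `region_code` alone. A production check would need more than this single test, since leakage can also hide in the values themselves rather than in the fill pattern.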

Automated Model Selection: The final stage of the data scientist's process involves applying machine learning algorithms to the prepared data to create a predictive model. There are many different algorithms to try, each with a number of knobs that can be tweaked to varying degrees. Finding the right algorithm and setting the parameters can mean the difference between a powerful model and one that is no better than a coin toss.

TransmogrifAI also automatically deals with the problem of imbalanced data by appropriately sampling the data and recalibrating predictions to match the true priors. There is often a significant gap between the performance of the best and worst models a data scientist trains on the data, so it pays to explore the space of possible models.

Hyperparameter Optimization: Underlying all of the stages above is a hyperparameter optimization layer. The term usually refers to the tunable parameters of machine learning algorithms, but in reality every stage above comes with knobs that matter. The sampling rate for dealing with imbalanced data, for example, is one such knob. Tuning all of these parameters can be overwhelming for a data scientist, but it can make the difference between a great model and one that is essentially a random number generator. This is why TransmogrifAI comes with techniques for automatically tuning these hyperparameters and a framework for extending to more advanced tuning techniques.
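A minimal way to picture such a layer is an exhaustive grid search that treats pipeline knobs and model knobs alike. The sketch below is generic Python, not TransmogrifAI's tuner. The knob names in `grid` are hypothetical, and `toy_score` is a stand-in for a real cross-validated metric.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple

Params = Dict[str, Any]


def grid_search(grid: Dict[str, List[Any]],
                score: Callable[[Params], float]
                ) -> Tuple[Optional[Params], float]:
    """Score every combination of knob values and return the best one."""
    names = list(grid)
    best_params: Optional[Params] = None
    best_score = float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical knobs from three different stages of a pipeline.
grid = {
    "max_pivot_categories": [10, 20, 50],   # feature engineering
    "negative_keep_rate": [0.1, 0.5, 1.0],  # imbalance handling
    "regularization": [0.01, 0.1],          # model training
}


def toy_score(p: Params) -> float:
    # Stand-in for a cross-validated metric; it peaks at (20, 0.5, 0.1).
    return -(abs(p["max_pivot_categories"] - 20) / 10
             + abs(p["negative_keep_rate"] - 0.5)
             + abs(p["regularization"] - 0.1))


best, best_value = grid_search(grid, toy_score)
```

Grid search is the simplest option and its cost grows multiplicatively with every knob added, which is one reason a framework that can be extended to more advanced tuning techniques is useful.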

Benefits of using TransmogrifAI:

TransmogrifAI simplifies the machine learning workflow by automating many of its steps. This means that data scientists can focus on the more creative aspects of machine learning, such as framing the problem and interpreting the results. TransmogrifAI provides a range of tools for data cleaning, feature engineering, and model training. Because it runs on Apache Spark, it can also distribute work across a cluster and handle datasets that are too large for a single machine.

Conclusion:

TransmogrifAI is a machine learning automation framework designed to simplify the machine learning workflow. It provides a unified API for data cleaning, feature engineering, and model training, creates new features from the input data automatically, and tunes hyperparameters to optimize model performance. Because it is built on top of Apache Spark, it can handle big data. By automating many of the steps in the workflow, it lets data scientists focus on the more creative aspects of machine learning.
