Modern machine learning through simulation systems.

Harpreet Singh

Founder @ Brahma AI Systems | AI, Data Sciences, Autonomous Algorithms

Published: October 28, 2023

Statistics and machine learning methods can be thought of as algorithms that learn patterns from historical data, subject to certain assumptions. For instance, OLS regression assumes a linear relationship between the covariates (features) and the dependent variable, no perfect multicollinearity among the covariates, and independent, identically distributed residuals (typically also assumed to be normal for inference). The upside of these assumptions: we get closed-form solutions to such problems.
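To make the closed-form claim concrete, here is a minimal sketch that fits OLS by solving the normal equations (XᵀX)β = Xᵀy in pure Python. The function names and the noise-free toy data are illustrative assumptions; in practice one would use a numerical library, and a QR-based least-squares solver is numerically safer than forming XᵀX explicitly.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b] so row operations act on both sides at once.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        # Choose the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear features?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_fit(features, targets):
    """Closed-form OLS: solve (X^T X) beta = X^T y, with an intercept column."""
    # Prepend a column of ones so beta[0] is the intercept.
    x = [[1.0] + list(row) for row in features]
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(x, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
features = [(0, 1), (1, 0), (2, 3), (3, 1), (4, 4), (5, 2)]
targets = [1 + 2 * a + 3 * b for a, b in features]
beta = ols_fit(features, targets)
print([round(v, 6) for v in beta])  # prints [1.0, 2.0, 3.0]
```

No iteration or tuning is involved: one linear solve recovers the coefficients, which is exactly the convenience that the linearity and residual assumptions buy.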

OLS Regression - Shamelessly generated by ChatGPT from somewhere on the internet

Present-day machine learning methods for structured data have helped break free from the tyranny of these assumptions, and they also address challenges such as multicollinearity, high dimensionality and latent factors.

For instance, random forests accommodate non-linear relationships and high-dimensional feature spaces under somewhat less constraining assumptions, namely that the ensemble is built from weakly correlated decision trees that learn relatively independent patterns.
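The idea of weakly correlated trees can be sketched with a deliberately simplified random forest in pure Python: each tree is a depth-one "stump" trained on a bootstrap sample of the rows and on a randomly chosen subset of features, and the forest predicts by majority vote. The bootstrap samples and random feature subsets are what decorrelate the trees. The function names, the toy data and the use of stumps instead of full-depth trees are illustrative assumptions, not how production libraries grow their trees.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_indices):
    """Best single-threshold split (a depth-one tree) over the given features."""
    best = None
    for f in feature_indices:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # The sampled feature is constant here: fall back to the majority class.
        majority = Counter(labels).most_common(1)[0][0]
        return (0, 0, float("inf"), majority, majority)
    return best


def fit_random_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw n rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Random feature subset per tree, which further decorrelates the trees.
        features = rng.sample(range(d), n_features)
        forest.append(fit_stump(sample_rows, sample_labels, features))
    return forest


def predict(forest, row):
    # Each stump votes; the forest returns the majority class.
    votes = [left if row[f] <= t else right for _, f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy data: two features, class 1 when the first feature is >= 10.
rows = [(x, 19 - x) for x in range(20)]
labels = [1 if x >= 10 else 0 for x in range(20)]
forest = fit_random_forest(rows, labels)
print(predict(forest, (19, 0)), predict(forest, (0, 19)))  # prints: 1 0
```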

Random Forests: Generated Using Midjourney

XGBoost, a library implementing gradient-boosted trees, is another such successful method: in practical business settings (as opposed to academia), it can solve most problems to a reasonable degree of effectiveness.
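The core mechanism behind gradient-boosted trees can be sketched in a few lines: start from a constant prediction, then repeatedly fit a small tree to the current residuals (for squared loss these are the negative gradients) and add a shrunken copy of it to the model. This is plain gradient boosting with depth-one trees on one feature, written for illustration; it does not reproduce XGBoost's API or its additions such as regularized, second-order split finding. The names and toy data are assumptions.

```python
import math


def fit_regression_stump(xs, residuals):
    """Least-squares depth-one tree on a single numeric feature."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)  # round 0: predict the mean everywhere
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient equals the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_value, right_value = fit_regression_stump(xs, residuals)
        stumps.append((t, left_value, right_value))
        # Take a small, shrunken step in the direction of the new stump.
        preds = [p + learning_rate * (left_value if x <= t else right_value)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [i / 2 for i in range(40)]           # hypothetical 1-D feature
ys = [math.sin(x) + 0.1 * x for x in xs]  # non-linear target
model = gradient_boost(xs, ys)
baseline = mse((sum(ys) / len(ys), 0.0, []), xs, ys)  # mean-only model
print(f"mean-only MSE: {baseline:.4f}  boosted MSE: {mse(model, xs, ys):.4f}")
```

The small learning rate trades more rounds for a smoother fit; each round can only lower the training error, which is why boosting is usually paired with early stopping on held-out data.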

Gradient Boosted Trees: Generated Using Midjourney

Neural networks, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs) and their myriad avatars, made it easier still to learn patterns from massive datasets, especially non-tabular data such as images, audio and text.

One problem common to all of these algorithms is generalization, i.e. how to make predictions on data, events or features that the algorithm has never seen before. A related problem is that of unknown "unknowns": I do not know what I do not know, so I cannot solve for it.
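A tiny illustration of the generalization problem: a model that only memorizes observed patterns has nothing to say about inputs outside the range it was trained on. The sketch below uses a 1-nearest-neighbour regressor, chosen only because it is short (tree-based models such as random forests also predict a constant beyond their training range); the data and function name are hypothetical.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """1-nearest-neighbour regression: copy the target of the closest training point."""
    i = min(range(len(train_x)), key=lambda k: abs(train_x[k] - x))
    return train_y[i]


train_x = list(range(11))           # only inputs 0..10 were ever observed
train_y = [2 * x for x in train_x]  # the true relationship is y = 2x

print(nearest_neighbour_predict(train_x, train_y, 4))   # 8: inside the observed range, correct
print(nearest_neighbour_predict(train_x, train_y, 20))  # 20: the true value is 40
```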

Large language models (LLMs), diffusion models and flow models have made major strides on the generalization problem for text and images. Well-known tools such as ChatGPT, Stable Diffusion and Midjourney provide ample examples of this: they generate novel text, images, audio and video. These models work effectively because they do not just learn the patterns that exist in the data; rather, they learn the underlying unobserved drivers that generate the observed patterns. By manipulating the behavior of these unobserved drivers, novel information can be generated.
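The idea of learning an unobserved driver and then manipulating it can be shown with a deliberately tiny latent-variable model, far simpler than LLMs or diffusion models: assume each observation is x = mu + sigma * z, where z is a hidden standard-normal factor. Once mu and sigma are estimated, sampling z yields new plausible observations, and pushing z to an extreme yields a scenario beyond anything observed. The "weekly demand" framing and all numbers are hypothetical.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical observed data: weekly demand driven by an unobserved factor z ~ N(0, 1).
observed = [100 + 15 * rng.gauss(0, 1) for _ in range(500)]

# "Learn" the generative model x = mu + sigma * z by estimating its parameters.
mu = statistics.fmean(observed)
sigma = statistics.stdev(observed)


def generate(z):
    """Map a value of the latent driver z to an observation."""
    return mu + sigma * z


# Sampling z from its assumed prior produces new, plausible observations.
new_samples = [generate(rng.gauss(0, 1)) for _ in range(5)]
# Setting z to an extreme value produces a stress scenario outside the observed data.
stress_scenario = generate(5.0)
print(f"mu={mu:.1f} sigma={sigma:.1f} stress={stress_scenario:.1f} max observed={max(observed):.1f}")
```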

The next frontier in this space is to bring the same capability to the unknown "unknowns" problem for structured business data. The world of images and text has billions of learnable data points, which makes it comparatively easy to gather enough data to train generative models. However, when one evaluates a business process, say sourcing coffee for a company like Starbucks, there is only one version of the past that can be used to train a learning algorithm.

Hence, to make the statistical and machine learning models in use today generalize, one needs a way to generate alternate realities, i.e. alternate past datasets. A system that takes existing data, combines it with existing business domain knowledge, and simulates reliable alternate realities can be very useful for evaluating scenarios the business has not encountered in the past. By combining such simulations with generative machine learning techniques, existing analytics can be made much more robust. One instance where this learning paradigm is already used successfully is the generation of synthetic training data for self-driving cars.
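One simple way to turn a single observed history into many alternate ones is a bootstrap: resample the observed day-to-day price changes to build new price paths, then layer domain knowledge on top, here a hypothetical rare supply shock (for example a frost) that lifts prices. This is a minimal sketch, not the author's system; the price series is a synthetic stand-in, and the shock probability and size are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_alternate_histories(observed_prices, n_paths, shock_prob, shock_size, seed=0):
    """Resample the observed daily returns (a bootstrap) to build alternate price paths,
    optionally injecting a hypothetical supply shock taken from domain knowledge."""
    rng = random.Random(seed)
    returns = [b / a - 1 for a, b in zip(observed_prices, observed_prices[1:])]
    paths = []
    for _ in range(n_paths):
        price = observed_prices[0]
        path = [price]
        for _ in returns:
            price *= 1 + rng.choice(returns)
            # Assumed domain knowledge: a rare event (e.g. a frost) lifts prices by shock_size.
            if rng.random() < shock_prob:
                price *= 1 + shock_size
            path.append(price)
        paths.append(path)
    return paths


# Hypothetical stand-in for the single observed year of daily coffee prices (USD/lb).
rng = random.Random(1)
observed = [2.0]
for _ in range(249):
    observed.append(observed[-1] * (1 + rng.gauss(0, 0.01)))

baseline_paths = simulate_alternate_histories(observed, n_paths=500, shock_prob=0.0, shock_size=0.0)
shocked_paths = simulate_alternate_histories(observed, n_paths=500, shock_prob=0.002, shock_size=0.2)
# Cost of buying one unit per day along each alternate history.
baseline_costs = [sum(p) for p in baseline_paths]
shocked_costs = [sum(p) for p in shocked_paths]

print(f"observed cost: {sum(observed):.1f}")
print(f"95th percentile, no shock:   {statistics.quantiles(baseline_costs, n=20)[-1]:.1f}")
print(f"95th percentile, with shock: {statistics.quantiles(shocked_costs, n=20)[-1]:.1f}")
```

The single observed cost becomes one point in a distribution of plausible costs, so a sourcing decision can be judged against tail outcomes that never actually happened.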

Simulations to model user behavior and their interactions.

Simulations, however, are not simple to do. One needs to consider the reliability of the simulation, the computational cost of running simulations at scale, and how to learn from the massive amounts of data that the simulations generate. This brings us to the next frontier of machine learning: using simulations to generalize existing machine-learning-based decision support systems. By doing so, existing analytics can be leveraged to respond to unknowns that have not been observed in the past, and to discover new optimal solutions that were simply never considered because only a single version of the past was available.
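The tension between reliability and computational cost can be quantified for Monte Carlo simulation: the standard error of an estimated probability shrinks like 1/√n, so halving the error takes four times as many simulation runs. The sketch uses an event with a known answer (two dice summing to 10 or more, probability 1/6) so the estimate can be checked; the function name is illustrative.

```python
import math
import random


def monte_carlo_probability(n, seed=0):
    """Estimate P(two dice sum to >= 10) from n simulated rolls, with its standard error."""
    rng = random.Random(seed)
    hits = sum(rng.randint(1, 6) + rng.randint(1, 6) >= 10 for _ in range(n))
    p = hits / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p, standard_error


# Each 4x increase in simulation runs roughly halves the standard error.
for n in (1_000, 4_000, 16_000, 64_000):
    p, se = monte_carlo_probability(n)
    print(f"n={n:>6}  estimate={p:.4f}  std. error={se:.4f}")
```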

An exciting time to be in machine learning, indeed!

Alok Ranjan

Co-founder at WalkingTree and Qritrim | Generative AI, AI/ML and Product Engineering

1 yr

While the whole article was an excellent read, I was excited to read about synthetic data and the use of simulation in machine learning. I look forward to seeing your next article in this series. Also, part of the work of Ashish Kapoor's team relates to the "Generation of data & evaluation of scenarios through simulation." While they focus on robot intelligence, it can be scaled to many other areas.

Shilpi Sharma

Builder| Technologist | Investor | Advisor

1 yr

Great summary. Do you think on this journey of working with unknown "unknowns", we would have to first codify systems with known "unknowns" that will be outcome of permutation & combinations of known attributes of a process and/or interactions of multiple processes? And then those systems will be able to generate new attributes that can imagine an evolved process/interactions etc. to truly simulate unknown "unknowns"? Would love to hear more about it in rest of the series.