The Best ML Tool I’ve Used

Shanif Dhanani

I'm helping GTM teams find and analyze data in seconds with AI — no SQL, no fuss.

Published: October 21, 2022

About 18 months ago, the engineering team at Apteo switched from creating machine learning models in Python to using BigQuery ML.

That move has changed our lives.

Well, no, but it has made a huge impact on our ability to move quickly, serve and train models at scale, and get some nice features like A.I. explainability with minimal effort.

My ML origins

When I first started creating basic ML algorithms, I was a junior in college and we were using a tool called “S”, in an IDE that looked a bit like RStudio. A few years passed, and I started using MATLAB and R. Then I moved on to Python, which made life a lot easier.

A lot of the ML projects I’ve done have had to be productionized, so it made sense to create them in a widely used, object-oriented language that just happened to have some great ML tools.

During the early days of Apteo, we created a lot of reusable code for data pipelines, transformations, model creation, and model evaluation. While I’m still proud of some of the highly generic code we created to make wide recurrent networks that could understand natural language, it was a good amount of work for a small team to undertake.

And dealing with large amounts of data was a real problem.

After a few pivots, we found ourselves having to create new ML models again, but this time we wanted to reduce the time it took us to iterate and deliver new product.

My cofounder found us BigQuery, Google’s amazing data warehouse that handles large amounts of data with minimal latency and, as an added bonus, has built-in ML capabilities.

We started building our new models on that and it has been a real boon to productivity.

Why I like it

There are a few really nice things about BigQuery ML. First, it’s made it easy to create new models using SQL. Using a standard language that I’m familiar with, I can easily spin up a new training session and create a model, fast.
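To make that concrete, here is a minimal sketch of what a BigQuery ML training statement looks like. The Python helper only assembles the SQL string; the dataset, table, and column names are hypothetical placeholders rather than anything from Apteo's setup, and `logistic_reg` is just one of the model types BigQuery ML offers.

```python
def create_model_sql(model_name: str, source_table: str, label_column: str,
                     model_type: str = "logistic_reg") -> str:
    """Build a BigQuery ML CREATE MODEL statement as a plain string.

    All names passed in are illustrative; nothing is sent to BigQuery here.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(model_type = '{model_type}', "
        f"input_label_cols = ['{label_column}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


# Hypothetical names, used only to show the shape of the statement.
example_sql = create_model_sql(
    "my_dataset.churn_model", "my_dataset.training_data", "churned"
)
print(example_sql)
```

Running the printed statement in the BigQuery console (or through a client library) is what actually kicks off training; the whole "training session" is that single query.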

Unsurprisingly, that brings me to my second point. Whereas it might have taken us 12 hours to train a model in a custom-built Python application, BigQuery can do the same thing in 35–45 minutes. Not having to wait to see the results of a model is huge, especially when you’re trying to iterate fast or, worst case, when there’s a bug in the dataset creation process.

It’s also nice because it has several built-in features that you’d normally have to handle yourself when training a new model, including implicit data transformations (one-hot encoding, standardization, etc.), built-in evaluation metrics, and explainability metrics.
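As a rough illustration of the evaluation and explainability features, the sketch below builds two queries using BigQuery ML's `ML.EVALUATE` and `ML.EXPLAIN_PREDICT` table functions. The model and table names are hypothetical, and the helpers only produce SQL text; the exact metrics returned depend on the model type.

```python
def evaluate_sql(model_name: str, eval_table: str) -> str:
    """Query that returns BigQuery ML's built-in evaluation metrics."""
    return (
        f"SELECT * FROM ML.EVALUATE(MODEL `{model_name}`, "
        f"(SELECT * FROM `{eval_table}`))"
    )


def explain_predict_sql(model_name: str, input_table: str, top_k: int = 3) -> str:
    """Query that returns predictions plus per-row feature attributions."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return (
        f"SELECT * FROM ML.EXPLAIN_PREDICT(MODEL `{model_name}`, "
        f"(SELECT * FROM `{input_table}`), "
        f"STRUCT({top_k} AS top_k_features))"
    )


# Hypothetical model and table names.
print(evaluate_sql("my_dataset.churn_model", "my_dataset.holdout_data"))
print(explain_predict_sql("my_dataset.churn_model", "my_dataset.new_customers"))
```

Neither query needs any feature preprocessing in the `SELECT`: the transformations applied during training are stored with the model and reapplied automatically.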

It can also be used to serve models (as long as you don’t need sub-second latencies or real-time predictions). We use it to serve our models, creating batch prediction jobs, where we store the results into new BigQuery datasets which we can then use for aggregation, analysis, or serving key results to our end users.
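A batch prediction job of this kind can be sketched as a single statement that writes `ML.PREDICT` output into a new table. This is an assumed shape, not Apteo's actual pipeline, and every name below is a hypothetical placeholder.

```python
def batch_predict_sql(model_name: str, input_table: str, output_table: str) -> str:
    """Build a statement that stores ML.PREDICT results in a new table."""
    return (
        f"CREATE OR REPLACE TABLE `{output_table}` AS\n"
        f"SELECT * FROM ML.PREDICT(MODEL `{model_name}`, "
        f"(SELECT * FROM `{input_table}`))"
    )


# Hypothetical names; the output table can then be queried for
# aggregation or analysis like any other BigQuery table.
print(batch_predict_sql(
    "my_dataset.churn_model",
    "my_dataset.customers_to_score",
    "my_predictions.churn_scores",
))
```

Assuming the `google-cloud-bigquery` package and credentials are available, a statement like this could be submitted with `bigquery.Client().query(sql).result()` and scheduled to run on whatever cadence the predictions need.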

What I’m hoping for next

While it’s an awesome tool, and, as you probably guessed, the best all-around ML tool I’ve used, there are some things that I’m hoping to see in the future.

First, even though it scales to large datasets, it wasn’t able to handle datasets as large as I would have imagined. The last time I used it, it errored out on a dataset of 100M records, which, while a lot, isn’t really that much in the world of machine learning.

Second, building DNNs in it isn’t as robust as using something like TensorFlow. It’s hard to configure each individual layer in the model (though you could reasonably argue that most models shouldn’t necessarily be configured with a ton of layers). It also doesn’t support wide recurrent networks at the time of this writing, nor does it have built-in embeddings for NLP. All of these would be nice to have, but they’re also things I can easily live without for now.

I’d also love to see the cost lowered… while it has saved us a ton of development time, we pay for it in terms of money, and there have been a few times where I’ve been told to use reserved instances of BQML (rather than the on-demand version, which is what we have now). Suffice it to say, the reservations they offer are much pricier than what I’d be looking to pay for at the moment.

All in all, it’s awesome that Google has provided such a nice tool for data science, an area where we all know better tooling is badly needed. I highly recommend it for your next project.
