Artificial Intelligence on Google Cloud Platform
Srivatsan Srinivasan
Chief Data Scientist | Gen AI | AI Advocate | YouTuber (bit.ly/AIEngineering)
There is little doubt that the future of AI is on the cloud. The cloud, along with the data that fuels business knowledge, brings a new degree of accessibility to AI technology
Why cloud in general for AI?
Scale - Instant access to hundreds of compute instances
Speed - Easy availability of specialized devices (GPUs/TPUs) that help accelerate AI development
Cloud AI APIs - A quick jump start on complex tasks rather than building from scratch. For cases like speech-to-text or language translation, an enterprise may also lack the data to build models as accurate as those available in the cloud
Cloud AutoML - Train high-quality models specific to business needs with citizen data scientists or even business users
Cloud Bursting - With advances in hybrid cloud, start small in a local data center and use the cloud to scale AI compute
In this article we focus on AI and related services offered by Google Cloud Platform. Let us start by looking at the Google Cloud AI building blocks following the recent announcements at Next '19
Some of the key new additions to Google Cloud Platform's AI capabilities were the introduction of AI Platform, which enables seamless creation of end-to-end machine learning pipelines; AutoML Tables, which automatically builds and deploys machine learning models on structured data; and support for new ML algorithms in BigQuery ML
Below is a summary of the AI capabilities added or enhanced as part of the Google Next '19 announcements
If data scientists are the lifeblood of today's data-driven enterprise, then data engineers are the veins carrying clean blood for machine learning algorithms to be useful
AI development and training is a relatively small fraction of the end-to-end machine learning life cycle. Data ingestion, data engineering, feature engineering, data analysis and validation, model performance monitoring, and model deployment are where a typical data engineer and data scientist spend 90+% of their time.
While this article focuses on AI capability, let us quickly look at how different but integrated Google services make end-to-end ML possible
An interesting aspect of these services is how well they integrate with each other to create a seamless pipeline. One example is TensorFlow Transform, which uses full passes over the input data during model training and exports the resulting transformations as a TensorFlow graph that runs on single instances during serving. This prevents training-serving skew, as the same transformations are applied during both stages
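To make the training-serving skew point concrete, here is a minimal sketch in plain Python. It does not use the TensorFlow Transform API; the function names `analyze` and `transform` and the sample values are assumptions made for illustration. The idea it shows is the same: statistics are computed once with a full pass over the training data, and the serving path reuses those saved statistics instead of recomputing them.

```python
from statistics import mean, pstdev


def analyze(training_values):
    """Full pass over the training data: compute scaling statistics once."""
    return {"mean": mean(training_values), "std": pstdev(training_values)}


def transform(value, stats):
    """Scale one value using only the statistics saved at training time."""
    std = stats["std"] or 1.0  # guard against a constant column
    return (value - stats["mean"]) / std


# Hypothetical training column (for example, pageviews per session).
training_pageviews = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Training stage: analyze the full dataset, then transform every row.
stats = analyze(training_pageviews)
train_features = [transform(v, stats) for v in training_pageviews]

# Serving stage: a single instance goes through the same function
# with the same saved statistics, so there is no skew.
serving_feature = transform(9.0, stats)
```

Because the serving path never recomputes the mean or standard deviation from the single incoming row, a value seen in training and the same value seen in serving always map to the same feature.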
Let us now discuss the key GCP capabilities recently announced at Google Next
AutoML Tables
AutoML Tables enables an entire team of data scientists, analysts, and developers to automatically build and deploy state-of-the-art machine learning models on structured data at massively increased speed and scale. It automates or simplifies most aspects of the ML workflow:
- Detects schema and class distribution
- Helps detect missing values and outliers
- Provides a codeless interface, making it easier for a wide range of personas, not only data scientists, to build models
- Enables seamless deployment of machine learning models
- Enables model interpretation using model output and a feature importance graph
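The first two checks in the list above can be pictured with a small sketch. This is not how AutoML Tables is implemented; it is a standard-library illustration under assumed rules (a column is numeric if all present values are numbers, and an outlier is a value more than three median absolute deviations from the median). The function names and sample data are hypothetical.

```python
from collections import Counter
from statistics import median


def profile_column(values):
    """Summarize one column: missing count, inferred type, simple outliers."""
    present = [v for v in values if v is not None]
    profile = {"missing": len(values) - len(present)}
    numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    profile["type"] = "numeric" if numeric else "categorical"
    if numeric:
        med = median(present)
        # Median absolute deviation: a robust measure of spread.
        mad = median([abs(v - med) for v in present])
        profile["outliers"] = [
            v for v in present if mad > 0 and abs(v - med) > 3 * mad
        ]
    return profile


def class_distribution(labels):
    """Return the fraction of rows that fall into each label value."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


# Hypothetical sample data.
pageviews = [1, 2, 2, 3, None, 3, 4, 250]
labels = [0, 0, 0, 1, 0, 0, 0, 0]

print(profile_column(pageviews))
print(class_distribution(labels))
```

A skewed class distribution or a column with many missing values is the kind of signal a user would want surfaced before training, which is what the automated checks aim to provide without code.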
The diagram below summarizes the simplicity of AutoML Tables. Once you have your model dataset, most of the activity is UI-guided with minimal or no coding
AutoML Tables also supports automated feature engineering for most data types
Currently it runs the algorithms below against the input dataset, based on the selected configuration parameters
- Logistic and Linear Regression
- Feedforward Deep Neural Network
- Wide and Deep NN
- Gradient Boosted Decision Trees (GBDT)
- DNN + GBDT Trees
Depending on the complexity of the data, it might also run Neural + Tree Architecture Search
BigQuery ML
BigQuery ML brings ML to the data. Models are trained and accessed in BigQuery using SQL. BigQuery ML democratizes the use of ML by empowering data analysts, the primary data warehouse users, to build and run models using existing business intelligence tools and spreadsheets.
While AutoML Tables requires no knowledge of ML, BigQuery ML requires a basic understanding of ML.
Below is a nice illustration of the different ML capabilities along with their user personas. Note that Cloud ML Engine is now called AI Platform Training
Building a model in BigQuery ML is a simple 3-step process
Step 1: Create Model
-- Train a logistic regression model; the label is 1 when the session had a transaction
CREATE MODEL `bqml_tutorial.sample_model`
OPTIONS(model_type='logistic_reg') AS
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, "") AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, "") AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
Step 2: Evaluate the created model
-- Evaluate on sessions from a later date range than the one used for training
SELECT
  *
FROM
  ML.EVALUATE(MODEL `bqml_tutorial.sample_model`, (
    SELECT
      IF(totals.transactions IS NULL, 0, 1) AS label,
      IFNULL(device.operatingSystem, "") AS os,
      device.isMobile AS is_mobile,
      IFNULL(geoNetwork.country, "") AS country,
      IFNULL(totals.pageviews, 0) AS pageviews
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE
      _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
Step 3: Predict using the final model
-- Predict purchases per session, then report the 10 countries with the most predicted purchases
SELECT
  country,
  SUM(predicted_label) AS total_predicted_purchases
FROM
  ML.PREDICT(MODEL `bqml_tutorial.sample_model`, (
    SELECT
      IFNULL(device.operatingSystem, "") AS os,
      device.isMobile AS is_mobile,
      IFNULL(totals.pageviews, 0) AS pageviews,
      IFNULL(geoNetwork.country, "") AS country
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE
      _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10
Model performance and metrics can be tracked in the BigQuery UI. The UI provides details such as the confusion matrix, ROC curve, and precision and recall, among others
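For readers less familiar with these metrics, the sketch below shows how precision, recall, accuracy, and F1 follow from the four cells of a binary confusion matrix. The function name and the counts are hypothetical and are not output from the model above.

```python
def binary_metrics(tp, fp, fn, tn):
    """Derive common classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual positives, how many were found
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical counts for a purchase-prediction model on 1,000 sessions.
metrics = binary_metrics(tp=30, fp=10, fn=20, tn=940)
print(metrics)
```

With a rare positive class such as purchases, accuracy can look high even when recall is modest, which is why the confusion matrix and the precision and recall figures are worth checking together.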
Finally, most of the steps below happen behind the scenes during the 3-step model creation
Below is the algorithm support and roadmap as highlighted at Google Next '19
AI Platform
AI Platform enables seamless creation of end-to-end ML pipelines, from ingesting data to preparing, discovering, training, and deploying ML models. The images below summarize the AI Platform end-to-end process
AI Platform comes with managed notebook instances that are integrated with BigQuery, Cloud Dataproc, and Cloud Dataflow, making it easy to go from data ingestion to pre-processing and exploration, and eventually to model training and deployment
AI Platform supports Kubeflow, which lets you build portable ML pipelines that you can run on-premises or on Google Cloud without significant code changes. Below are the services available as part of AI Platform that help build an end-to-end machine learning pipeline
You also have access to AI technologies like TensorFlow and TensorFlow Extended (TFX) tools as you deploy your AI applications to production. If you want to know more about TFX, check my multi-part series on this topic
Keep watching for future installments of the TFX series.
There were a few other announcements in the AI space. Here is a quick rundown of the key ones
AutoML Natural Language - Custom entity extraction lets you identify custom fields in input text
AutoML Vision - Object detection detects multiple objects and provides bounding box coordinates
Cloud AI Solutions - Introduced Recommendations AI and Document Understanding AI
Document Understanding AI enables companies to digitize, classify, and extract knowledge from documents. It also helps organize and store knowledge graphs and other extracted data for easy search, query, consumption, and actionable insights
A nice representation of the Document Understanding AI solution architecture is below
A few other products with new enhancements or capabilities are highlighted below. You can check the references section below for more information on the newly added features
References