TensorFlow Serving API & gRPC
https://www.graphcore.ai/posts/getting-started-with-tensorflow-serving-for-ipu

To serve models for production applications, one can use either a REST API or gRPC. gRPC is a high-performance, binary, strongly typed protocol built on HTTP/2, while REST is a simpler, text-based, stateless style that uses HTTP with JSON or XML.

Here are some differences between gRPC and REST:

  • Protocol: gRPC uses HTTP/2 for transport, while REST typically uses HTTP/1.1.
  • Data format: gRPC employs Protocol Buffers for serialisation, while REST usually leverages JSON or XML.
  • API design: gRPC is based on the RPC (Remote Procedure Call) paradigm, while REST follows the architectural constraints of the Representational State Transfer model.
  • Streaming: gRPC supports bidirectional streaming, whereas REST is limited to request-response communication patterns.
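To make the data-format difference concrete, here is a minimal sketch (using random data purely for illustration) that serializes the same batch of inputs once as JSON, as the REST API would send it, and once as a binary TensorProto, as gRPC would; the Protocol Buffers payload is typically several times smaller:

import json

import numpy as np
import tensorflow as tf

# A batch of three 28x28 "images" (random values, for illustration only)
batch = np.random.rand(3, 28, 28).astype(np.float32)

# REST-style payload: JSON text
json_payload = json.dumps({"instances": batch.tolist()}).encode("utf-8")

# gRPC-style payload: a binary TensorProto (a Protocol Buffers message)
proto_payload = tf.make_tensor_proto(batch).SerializeToString()

print(len(json_payload), len(proto_payload))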

TensorFlow Serving receives client requests, runs them through the ML model, and returns the predictions to the client. It is a flexible, high-performance serving system for machine learning models, designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments while keeping the same server architecture and APIs. It provides out-of-the-box integration with TensorFlow models but can be extended to serve other types of models. More information can be found in the official TensorFlow Serving documentation.

The TensorFlow Serving client API can be installed on Ubuntu by using pip:

pip install tensorflow-serving-api        
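Note that this pip package only provides the client-side Python API. The tensorflow_model_server binary used later is typically installed from the TensorFlow Serving apt repository (the commands below follow the official installation instructions; newer distributions may require the keyring-based alternative to apt-key):

echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install tensorflow-model-server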

Build & Save a model

Let's assume that we trained an ML model like the following (the imports and the MNIST data split are added here as an assumption, so that the snippet is self-contained):

import os
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Assumed data preparation: MNIST scaled to [0, 1], with a channel dimension
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()
X_train_full = X_train_full[..., np.newaxis] / 255.0
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_new = X_test[:3, ..., np.newaxis] / 255.0  # a few test images used for predictions below

np.random.seed(42)
tf.random.set_seed(42)

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28, 1]),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=1e-2),
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))

and we make some predictions:

np.round(model.predict(X_new), 2)        

The prediction response is an array of class probabilities rounded to two decimals, with one row of ten values per input image.

Now we can save the model:

# Generate the model version
model_version = "0001"
model_name = "my_mnist_model"
model_path = os.path.join(model_name, model_version)
model_path

# Save the model
tf.saved_model.save(model, model_path)        
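As an optional sanity check (an addition here, not required for serving), the exported model can be loaded back and its serving signature inspected:

# Load the SavedModel back and look up the default serving signature
saved = tf.saved_model.load(model_path)
serve_fn = saved.signatures["serving_default"]
print(serve_fn.structured_outputs)  # maps the output name to a 10-class tensor spec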

Saving the model creates a directory with all the necessary files. If we want to re-run the export, the previous directory has to be removed first, and for that we can use shutil. Shutil stands for "shell utilities" and provides a comprehensive set of functions for file and directory operations: copying, moving, renaming, and deleting files and directories with a user-friendly and efficient interface.

First, we will need to import the module:

import shutil

# Delete a previous export so tf.saved_model.save can start from a clean directory
shutil.rmtree(model_name, ignore_errors=True)  # ignore_errors skips the error if the directory does not exist

And we can explore the tree of the directory:

# Walk the exported directory and print an indented tree of its contents
for root, dirs, files in os.walk(model_name):
    indent = '    ' * root.count(os.sep)
    print('{}{}/'.format(indent, os.path.basename(root)))
    for filename in files:
        print('{}{}'.format(indent + '    ', filename))

The output shows the standard SavedModel layout: the version directory 0001 inside my_mnist_model, containing saved_model.pb together with the assets and variables subdirectories.

And by using "saved_model_cli", we can check the signatures:

!saved_model_cli show --dir {model_path} --tag_set serve        

The output lists the SignatureDef keys available in the model, in this case serving_default.
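To also see the input and output tensor names, dtypes, and shapes of that signature, the same tool accepts a --signature_def flag:

!saved_model_cli show --dir {model_path} --tag_set serve --signature_def serving_default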

Use TensorFlow Serving

To start, the server requires the base directory of the model:

os.environ["MODEL_DIR"] = os.path.split(os.path.abspath(model_path))[0]        

And we run the server in the background, exposing the REST API on port 8501:

%%bash --bg
nohup tensorflow_model_server \
     --rest_api_port=8501 \
     --model_name=my_mnist_model \
     --model_base_path="${MODEL_DIR}" >server.log 2>&1        
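If anything goes wrong, the server's output is captured in server.log, so tailing the file shows whether the model version was loaded successfully:

!tail server.log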

And we can check the listening ports:

!lsof -i -P -n | grep LISTEN        

The output will show tensorflow_model_server listening on ports 8500 (the gRPC default) and 8501 (REST).

Because the REST API uses JSON, we need to encode the input data in JSON format:

import json

input_data_json = json.dumps({
    "signature_name": "serving_default",
    "instances": X_new.tolist(),
})

And now we can use TensorFlow Serving's REST API to make predictions:

import requests

# Plain HTTP, not HTTPS: the model server does not terminate TLS by default
SERVER_URL = 'http://localhost:8501/v1/models/my_mnist_model:predict'
response = requests.post(SERVER_URL, data=input_data_json)
response.raise_for_status()  # raise an exception in case of error
response = response.json()

y_proba = np.array(response["predictions"])
y_proba.round(2)

and the output will be the same array of rounded class probabilities that model.predict returned locally.
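To turn those probabilities into predicted class labels, one more line suffices:

y_pred = np.argmax(y_proba, axis=1)  # index of the highest probability for each image
y_pred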

Use gRPC API

Using the gRPC API requires loading the serving API's prediction module and building a PredictRequest:

from tensorflow_serving.apis.predict_pb2 import PredictRequest

# Build the request: model name, signature, and the input tensor as a TensorProto
request = PredictRequest()
request.model_spec.name = model_name
request.model_spec.signature_name = "serving_default"
input_name = model.input_names[0]  # name of the model's input tensor
request.inputs[input_name].CopyFrom(tf.make_tensor_proto(X_new))

and then we open a gRPC channel and call the prediction service:

import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel('localhost:8500')
predict_service = prediction_service_pb2_grpc.PredictionServiceStub(channel)
response = predict_service.Predict(request, timeout=10.0)        
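gRPC calls raise grpc.RpcError when something fails (server not running, wrong model name, exceeded deadline); a minimal sketch of handling this could look as follows:

try:
    response = predict_service.Predict(request, timeout=10.0)
except grpc.RpcError as e:
    print(e.code(), e.details())  # e.g. StatusCode.UNAVAILABLE if the server is unreachable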

and by evaluating:

response

we will see that the result is a PredictResponse protocol buffer, which carries the model outputs as TensorProto messages.

We can also convert the response into a NumPy array:

output_name = model.output_names[0]
outputs_proto = response.outputs[output_name]
y_proba = tf.make_ndarray(outputs_proto)
y_proba.round(2)        

and the output will again be the rounded class probabilities, matching the REST API result.

The REST API is the most common choice, but you should weigh the advantages and disadvantages of both options: REST is simpler to use and easier to debug, while gRPC's binary serialization is more efficient, especially for large inputs such as image tensors.


#tensorflowserving #restapi #grpc #machinelearning