TensorFlow Serving API & gRPC
Andrew Antonopoulos
Senior Solutions Architect at Sony Professional Solutions Europe
To serve models for production applications, you can use a REST API or gRPC. gRPC is a high-performance, binary, strongly typed protocol built on HTTP/2, while REST is a simpler, text-based, stateless style that uses HTTP with JSON or XML payloads.
Here are some of the key differences between gRPC and REST:
- Payload: gRPC exchanges binary Protocol Buffers, while REST typically exchanges human-readable JSON or XML.
- Transport: gRPC runs over HTTP/2, with multiplexing and streaming, while REST commonly runs over HTTP/1.1.
- Contract: gRPC is strongly typed through .proto service definitions, while REST contracts are looser and documented separately.
- Performance: gRPC's binary encoding generally means smaller payloads and lower latency, while REST is easier to debug and can be called from any HTTP client, including browsers.
TensorFlow Serving loads the ML model, receives client requests, and returns predictions from the back end. It is a flexible, high-performance serving system for machine learning models, designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments while keeping the same server architecture and APIs. It provides out-of-the-box integration with TensorFlow models but can be extended to serve other types of models. More information can be found in the official TensorFlow Serving documentation.
The TensorFlow Serving Python client API can be installed on Ubuntu using pip (the tensorflow_model_server binary itself is installed separately, for example from Google's APT repository or as a Docker image):
pip install tensorflow-serving-api
Build & Save a model
Let's assume that we trained an ML model like the following:
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Make the run reproducible
np.random.seed(42)
tf.random.set_seed(42)

# Simple fully connected classifier for 28x28 grayscale images
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28, 1]),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=keras.optimizers.SGD(learning_rate=1e-2),
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))
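The arrays X_train, y_train, X_valid, y_valid and the new samples X_new are not defined in the snippet above; they are assumed to come from a dataset such as MNIST. One possible way to prepare them is the following (an illustrative sketch, not part of the original code):

# Illustrative data preparation, assuming the MNIST dataset from keras.datasets
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()

# Scale pixel values to [0, 1] and add a channel dimension: (28, 28, 1)
X_train_full = X_train_full[..., np.newaxis].astype(np.float32) / 255.
X_test = X_test[..., np.newaxis].astype(np.float32) / 255.

# Hold out the first 5,000 training images for validation
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

# A few "new" images to send to the model server later
X_new = X_test[:3]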
and we make some predictions:
np.round(model.predict(X_new), 2)
And the prediction response is the following:
Now we can save the model:
import os

# Generate the model version and path, e.g. my_mnist_model/0001
model_version = "0001"
model_name = "my_mnist_model"
model_path = os.path.join(model_name, model_version)
model_path
# Save the model in TensorFlow's SavedModel format
tf.saved_model.save(model, model_path)
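As a quick sanity check (not part of the original article), the exported SavedModel can be loaded back and its serving signature inspected:

# Reload the exported SavedModel and list its signatures (optional check)
loaded_model = tf.saved_model.load(model_path)
print(list(loaded_model.signatures.keys()))  # typically ['serving_default']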
Saving the model creates a directory with all the necessary files. If a directory with the same name already exists, for example from a previous run, it can be removed first with shutil. Shutil stands for "shell utilities" and provides a comprehensive set of functions for file and directory operations: whether you need to copy, move, rename, or delete files and directories, the shutil module offers user-friendly and efficient functions for the job.
First, we will need to import the module:
import shutil

# Remove any existing copy of the model directory
# (run this before saving a new version, otherwise it deletes the files just saved)
shutil.rmtree(model_name, ignore_errors=True)
And we can explore the tree of the directory:
# Walk the model directory and print its tree structure
for root, dirs, files in os.walk(model_name):
    indent = '    ' * root.count(os.sep)
    print('{}{}/'.format(indent, os.path.basename(root)))
    for filename in files:
        print('{}{}'.format(indent + '    ', filename))
And the output will be:
And by using "saved_model_cli", we can check the signatures:
!saved_model_cli show --dir {model_path} --tag_set serve
And the output will be:
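To see the full input and output specification of the default serving signature, saved_model_cli can also be pointed at a specific signature (shown here with the same model path as above):

!saved_model_cli show --dir {model_path} --tag_set serve --signature_def serving_default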
Use TensorFlow Serving
To start, the server needs the model's base directory, which we store in an environment variable:
os.environ["MODEL_DIR"] = os.path.split(os.path.abspath(model_path))[0]
And run the server in the background on port 8501:
%%bash --bg
nohup tensorflow_model_server \
--rest_api_port=8501 \
--model_name=my_mnist_model \
--model_base_path="${MODEL_DIR}" >server.log 2>&1
And we can check the listening ports:
!lsof -i -P -n | grep LISTEN
The output will show ports 8500 (gRPC) and 8501 (REST API):
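Besides checking the listening ports, we can confirm that the model loaded correctly by querying TensorFlow Serving's REST model status endpoint (a small extra check, assuming the server started above):

import requests

# Ask the server for the status of the deployed model (REST API on port 8501)
status = requests.get("http://localhost:8501/v1/models/my_mnist_model")
status.raise_for_status()
status.json()  # the model version state should be reported as "AVAILABLE"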
Because the REST API uses JSON, we need to serialize the input data as JSON:
import json

# Build the JSON payload expected by the REST predict endpoint
input_data_json = json.dumps({
    "signature_name": "serving_default",
    "instances": X_new.tolist(),
})
And now we can use TensorFlow Serving's REST API to make predictions:
import requests

# The server was started without TLS, so we use plain HTTP on the REST port
SERVER_URL = 'http://localhost:8501/v1/models/my_mnist_model:predict'
response = requests.post(SERVER_URL, data=input_data_json)
response.raise_for_status()  # raise an exception in case of error
response = response.json()
y_proba = np.array(response["predictions"])
y_proba.round(2)
and the output will be:
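To turn these class probabilities into predicted labels, we can take the index of the highest probability for each instance (a small addition to the original example):

# Pick the most probable class for each image sent to the server
y_pred = np.argmax(y_proba, axis=1)
y_pred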
Use gRPC API
Using the gRPC API requires loading the serving API's prediction module and building a PredictRequest:
from tensorflow_serving.apis.predict_pb2 import PredictRequest

# Build the prediction request: model name, signature, and input tensor
request = PredictRequest()
request.model_spec.name = model_name
request.model_spec.signature_name = "serving_default"
input_name = model.input_names[0]
request.inputs[input_name].CopyFrom(tf.make_tensor_proto(X_new))
and then create the gRPC channel and stub, and send the request:
import grpc
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Open an insecure channel to the gRPC port (8500) and call Predict
channel = grpc.insecure_channel('localhost:8500')
predict_service = prediction_service_pb2_grpc.PredictionServiceStub(channel)
response = predict_service.Predict(request, timeout=10.0)
and by inspecting the response object:
response
we will get the output:
We can also convert the output tensor proto to a NumPy array:
# Extract the output tensor and convert the protobuf to a NumPy array
output_name = model.output_names[0]
outputs_proto = response.outputs[output_name]
y_proba = tf.make_ndarray(outputs_proto)
y_proba.round(2)
and the output will be:
The most common approach is the REST API, but you should weigh the advantages and disadvantages of both options: REST is simpler to use and debug, while gRPC is more efficient for large payloads and high-throughput serving.
#tensorflowserving #restapi #grpc #machinelearning