MLOps for recommenders - Deploying Recommender System in Production
Krishna Yogi Kolluru
Data Science Architect | ML | GenAI | Speaker | ex-Microsoft | ex- Credit Suisse | IIT - NUS Alumni | AWS & Databricks Certified Data Engineer | T2 Skilled worker
With apps like Netflix (movie recommendation), Amazon (product recommendation), Spotify (music recommendation), YouTube (video recommendation), and many more, machine learning algorithms have already become part of our everyday lives without us even realising it. These services use machine learning to serve customised recommendations that adjust to your past behaviour and preferences.
MLOps (Machine Learning Operations) is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently. This blog explains why and how MLOps plays a critical role in any ML application. We build a simple recommendation system application and outline the ways MLOps can help with the problems we might encounter when deploying it in production.
ML Recommendation Systems
For this blog, consider a scenario where you are tasked with building an ML application that recommends similar products. The job of the ML system is to recommend N similar images based on an input image.
To demonstrate how MLOps can provide value to this ML recommendation system, let’s take a step back and build a simple end-to-end ML system that provides recommendations. We’ll use the Fashion product images dataset, which contains 44,000 images and their corresponding categories. Using this dataset, we’ll apply some ML magic to recommend similar products to a user. This ML magic will learn a representation from the dataset such that similar images cluster close together in an embedding space.
The task here is to train a deep-learning model on the image dataset. Since every image has a label, we can train a Convolutional Neural Network (CNN) classification model that takes an image as input and predicts its label as output. To do this, we create a script that pre-processes the dataset into the format the model requires, splits the dataset, and trains the model.
The pseudo-code for the training script looks like the following:
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers


def split_dataset(whole_dataframe):
    """Split the entire dataframe into train and test dataframes."""
    # split dataframe into 80% train - 20% test ratio
    train_df, test_df = train_test_split(whole_dataframe, test_size=0.2)
    return train_df, test_df


def create_dataset(train_df, test_df):
    """Create datasets from dataframes."""
    # use tf.keras.preprocessing.image.ImageDataGenerator.flow_from_dataframe
    # to read each pandas dataframe
    # and output an iterator over batches of (images, labels) for each dataframe
    return train_ds, test_ds


def train_model(train_ds, test_ds, num_classes):
    """Train and evaluate a model."""
    # create a CNN classification model
    model = tf.keras.Sequential([
        ...,
        layers.Dense(256, activation='relu'),
        layers.Dense(num_classes, activation='softmax')
    ])
    # create an optimizer and loss function
    # perform training and evaluation on datasets
    # monitor metrics like loss and accuracy
    # save and return best performing model
    return model
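The pseudo-code fixes the 80/20 ratio but leaves open how the split treats the category labels. The following is a minimal, dependency-free sketch that assumes a stratified split is wanted, so that each category keeps roughly the same share in train and test. The function name `stratified_split`, the seed, and the toy labels are illustrative assumptions, not part of the original script; in practice the `stratify` argument of `sklearn.model_selection.train_test_split` serves the same purpose.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_ratio=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly test_ratio of each class in test."""
    # group row indices by their class label
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # seeded so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_ratio)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# toy labels: 10 shirts followed by 5 shoes (hypothetical data)
labels = ["shirt"] * 10 + ["shoe"] * 5
train_idx, test_idx = stratified_split(labels)
```

With a plain random split, a rare category could end up entirely in one partition; grouping by label first avoids that for any category large enough to be split at the chosen ratio.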
Once we have trained the model and are satisfied with its performance on the test dataset, we are ready to extract the embeddings. To extract embeddings from a trained model, especially a CNN architecture with two dense layers at the end, we take the output of the penultimate dense layer for a given input image. The penultimate, fully connected layer typically has a higher dimensionality (in this case 256) than the final dense layer (in this case equal to the number of classes), which means that it captures more information.
Hence, we use the penultimate layer instead of the final layer of the model for extracting embeddings. This is not the only approach: the choice of layer varies across architectures (e.g. encoder-decoder models, or models that use pooling layers), and there are several variations for extracting embeddings.
The pseudo-code to extract and store embeddings looks like this:
import pickle

import tensorflow as tf

# model is the trained classifier returned by train_model
embed_model = tf.keras.models.Model(inputs=model.input,
                                    outputs=model.layers[-2].output)


def extract_embeddings(img_path, embed_model):
    """Extract embeddings for a particular input."""
    # preprocess input
    # pass input to model to get embeddings as output
    # normalize the embeddings
    return normalized_embeddings


def save_image_paths_to_file(img_paths):
    """Save image paths to pickle file."""
    # create a list of absolute path to all images
    # save absolute path to images in a pickle file
    with open("img_files.pkl", "wb") as f:
        pickle.dump(img_files, f)


def save_embeddings_to_file(img_paths, embed_model):
    """Save all embeddings for entire dataset to pickle file."""
    # create a list containing embeddings of all images
    # save all embeddings of all images in a pickle file
    with open("embeddings.pkl", "wb") as f:
        pickle.dump(embeddings, f)
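The "normalize the embeddings" step is not specified further in the pseudo-code. A minimal sketch, assuming it means L2 normalisation (scaling each embedding to unit length), is shown below. The helper names are illustrative. The sketch also checks a useful consequence: for unit-length vectors the squared Euclidean distance equals 2 − 2·cos(a, b), so a Euclidean nearest-neighbour search ranks items the same way cosine similarity would.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# toy 2-dimensional "embeddings" (real ones would have 256 dimensions)
a = l2_normalize([3.0, 4.0])
b = l2_normalize([1.0, 0.0])

# for unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
lhs = squared_euclidean(a, b)
rhs = 2 - 2 * cosine_similarity(a, b)
```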
Now to the final part of our ML application, which is to create a recommendation engine. To get similar recommendations, we use the nearest neighbours approach. The code to get recommendations looks like the following:
import pickle

from sklearn.neighbors import NearestNeighbors

# load the saved embeddings of entire dataset
with open("embeddings.pkl", "rb") as f:
    embeddings = pickle.load(f)
# load saved path to images of entire dataset
with open("img_files.pkl", "rb") as f:
    img_files = pickle.load(f)


def recommend_similar_images(input_embeddings, embeddings):
    """Recommend 5 nearest neighbours for given input feature."""
    neighbours = NearestNeighbors(n_neighbors=5, algorithm='brute', metric='euclidean')
    neighbours.fit(embeddings)
    # kneighbors expects a 2D array of queries and returns one row per query
    distances, indices = neighbours.kneighbors([input_embeddings])
    # there is a single query, so return its row of 5 indices
    return indices[0]


def recommendation_engine(input_img_path):
    """Recommend and display 5 similar images to input image."""
    # show input image (display_image is a helper that is not defined here)
    display_image(input_img_path)
    # extract embeddings for input image
    out_embeddings = extract_embeddings(input_img_path, embed_model)
    # get indexes of 5 similar images
    similar_img_indexes = recommend_similar_images(out_embeddings, embeddings)
    # show 5 similar images to input image
    for ind in similar_img_indexes:
        display_image(img_files[ind])
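To make the nearest neighbours step concrete, here is a minimal pure-Python sketch of what a brute-force search does conceptually: compute the Euclidean distance from the query to every stored embedding and keep the k smallest. It is a stand-in for illustration, not scikit-learn's implementation, and the function name `k_nearest` and the toy vectors are assumptions.

```python
import heapq
import math


def k_nearest(query, vectors, k=5):
    """Return indices of the k vectors closest to query (Euclidean), nearest first."""
    # pair every distance with its index so the index survives the selection
    distances = ((math.dist(query, vec), idx) for idx, vec in enumerate(vectors))
    return [idx for _, idx in heapq.nsmallest(k, distances)]


# toy 2-dimensional embeddings (hypothetical data)
toy_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
nearest = k_nearest([0.9, 0.1], toy_embeddings, k=2)
```

Each query touches every stored embedding, so the cost grows linearly with the catalogue size. That is acceptable for a dataset of this size, but it is one of the things to watch once the system serves production traffic.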
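To summarise our simple ML recommendation application: we train a CNN classifier on the labelled images, extract an embedding for every image from the penultimate dense layer, store those embeddings, and recommend the images whose embeddings are the nearest neighbours of the input image's embedding.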
MLOps in ML Recommendation Systems
Here is a detailed breakdown of each step for deploying an ML recommendation system that provides 5 similar images as output when given an input image. This type of recommendation system is often used in applications like e-commerce, content discovery, or image similarity search.
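1. Infrastructure Setup:
- Provision the server cluster and database that the system will run on.
- Install the required machine learning frameworks and configure hardware resources.
- Configure the system for high availability.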
# Pseudocode for infrastructure setup
create_server_cluster()
setup_database()
install_machine_learning_frameworks()
configure_hardware_resources()
configure_high_availability()
2. Data Collection and Preprocessing:
- Collect a diverse and comprehensive dataset of images to train your recommendation model.
- Preprocess the images, which may involve resizing, normalizing, and augmenting the data.
- Extract image features using deep learning techniques (e.g., Convolutional Neural Networks) to represent images effectively.
# Pseudocode for data collection and preprocessing
collect_image_dataset()
preprocess_images(dataset)
extract_image_features(dataset)
3. Model Training:
- Develop and train a machine learning model, such as a neural network, to learn image embeddings that capture image similarity.
- Fine-tune the model using a suitable loss function (e.g., triplet loss) to optimize for similarity comparisons.
- Validate the model’s performance on a held-out dataset to ensure it’s learning relevant features.
# Pseudocode for model training
define_neural_network_architecture()
train_model(training_data)
fine_tune_model(loss_function)
validate_model(validation_data)
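The triplet loss mentioned in this step can be written down in a few lines. The sketch below uses the common form max(d(anchor, positive) − d(anchor, negative) + margin, 0) with Euclidean distance; the margin of 0.2 and the toy vectors are illustrative assumptions, and some formulations use squared distances instead.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0) with Euclidean d."""
    d_pos = math.dist(anchor, positive)  # distance to an image of the same kind
    d_neg = math.dist(anchor, negative)  # distance to an image of a different kind
    return max(d_pos - d_neg + margin, 0.0)


anchor, positive = [0.0, 0.0], [0.0, 1.0]
# negative is far from the anchor: the triplet is already satisfied, so no loss
easy = triplet_loss(anchor, positive, negative=[3.0, 4.0])
# negative is almost as close as the positive: the margin is violated
hard = triplet_loss(anchor, positive, negative=[0.0, 1.1])
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, which is what pushes similar images together and dissimilar ones apart in the embedding space.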
4. Model Deployment:
- Create an API for the trained model that accepts an input image and returns the top 5 similar images based on the learned embeddings.
- Choose an appropriate deployment framework or technology, such as Docker containers or serverless computing platforms.
- Set up load balancing and auto-scaling to handle increased traffic.
# Pseudocode for model deployment
api = create_api_endpoint()
choose_deployment_technology()
deploy_model(api)
set_up_load_balancing()
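As a rough illustration of the API in this step, the sketch below shows the request-handling logic separated from any particular web framework: it validates a JSON body and returns a status code with a JSON response. The field names (`image_path`, `similar`), the `handle_recommend` function, and the `fake_recommend` stand-in are all hypothetical; a real service would plug in the recommendation engine and wire this function to whichever framework or serverless platform is chosen.

```python
import json


def handle_recommend(request_body, recommend_fn, top_k=5):
    """Validate a JSON request and return (status_code, json_response_body).

    recommend_fn is a hypothetical callable: image path -> list of similar image paths.
    """
    try:
        payload = json.loads(request_body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body must be valid JSON"})

    image_path = payload.get("image_path") if isinstance(payload, dict) else None
    if not isinstance(image_path, str) or not image_path:
        return 400, json.dumps({"error": "image_path (string) is required"})

    try:
        results = recommend_fn(image_path)[:top_k]
    except FileNotFoundError:
        return 404, json.dumps({"error": "image not found"})
    return 200, json.dumps({"image_path": image_path, "similar": results})


def fake_recommend(image_path):
    """Stand-in for the real recommendation engine, for illustration only."""
    return [f"img_{i}.jpg" for i in range(10)]


status, body = handle_recommend('{"image_path": "query.jpg"}', fake_recommend)
```

Keeping the handler free of framework code makes it easy to unit test, which becomes useful again in the CI/CD step.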
5. Monitoring:
- Implement monitoring for the deployed system, tracking key metrics like response time, query throughput, and model accuracy.
- Use logging and alerting mechanisms to be notified of any anomalies or issues.
- Continuously collect user feedback to improve the recommendation system.
# Pseudocode for monitoring
while True:
    response_time = measure_response_time()
    accuracy = measure_model_accuracy()
    resource_usage = monitor_resource_usage()
    log_metrics(response_time, accuracy, resource_usage)
    if is_anomaly(response_time, accuracy, resource_usage):
        alert_team()
    collect_user_feedback()
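One possible way to implement the `is_anomaly` check for response time is to track a rolling window of recent latencies and alert when the 95th percentile exceeds a threshold. The sketch below is a minimal version of that idea; the class name, the window size, and the 200 ms threshold are illustrative assumptions rather than recommended values.

```python
from collections import deque


class LatencyMonitor:
    """Keep the most recent latencies and flag when the 95th percentile is too high."""

    def __init__(self, window=100, p95_threshold_ms=200.0):
        self.samples = deque(maxlen=window)  # old samples fall off automatically
        self.p95_threshold_ms = p95_threshold_ms

    def record(self, latency_ms):
        self.samples.append(latency_ms)

    def p95(self):
        """Nearest-rank 95th percentile; requires at least one recorded sample."""
        ordered = sorted(self.samples)
        rank = max(1, -(-95 * len(ordered) // 100))  # ceil(0.95 * n) in integers
        return ordered[rank - 1]

    def is_anomaly(self):
        return bool(self.samples) and self.p95() > self.p95_threshold_ms


monitor = LatencyMonitor(window=20, p95_threshold_ms=200.0)
for _ in range(18):
    monitor.record(50.0)
healthy = monitor.is_anomaly()

# two slow requests push the 95th percentile over the threshold
monitor.record(900.0)
monitor.record(950.0)
degraded = monitor.is_anomaly()
```

A percentile is used instead of the mean so that a small number of very slow requests is noticed even when most requests are fast.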
6. Scaling:
- Monitor system load and scale resources as needed to handle increased demand.
- Implement horizontal scaling by adding more servers or containers, and use load balancing to distribute incoming requests.
- Consider using content delivery networks (CDNs) to cache and serve images to reduce server load.
# Pseudocode for scaling
while True:
    if is_traffic_high():
        scale_resources_up()
    elif is_traffic_low():
        scale_resources_down()
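The scale-up and scale-down decision can also be made proportional to load instead of a simple high/low switch. The sketch below assumes a single load metric (for example, requests per second per replica) with a target value, and clamps the result between a minimum and maximum replica count. The function name, parameters, and numbers are illustrative; the proportional rule is similar in spirit to the one used by horizontal autoscalers such as the Kubernetes Horizontal Pod Autoscaler.

```python
import math


def desired_replicas(current_replicas, current_load, target_load,
                     min_replicas=1, max_replicas=10):
    """Proportional scaling: replicas grow or shrink with load relative to the target."""
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    wanted = math.ceil(current_replicas * current_load / target_load)
    # never go below the minimum or above the maximum replica count
    return max(min_replicas, min(max_replicas, wanted))


# load per replica is 1.5x the target: grow from 4 to 6 replicas
scale_up = desired_replicas(current_replicas=4, current_load=90, target_load=60)
# load per replica is a quarter of the target: shrink from 4 to 1 replica
scale_down = desired_replicas(current_replicas=4, current_load=15, target_load=60)
# load would call for 40 replicas, but the maximum caps it at 10
capped = desired_replicas(current_replicas=8, current_load=300, target_load=60)
```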
7. Continuous Integration and Deployment (CI/CD):
- Set up a CI/CD pipeline to automate the testing and deployment of changes to the recommendation system.
- Include unit tests, integration tests, and model evaluation as part of the pipeline.
- Ensure that updates are thoroughly tested before being deployed to the production environment.
# Pseudocode for CI/CD
while code_changes:
    run_unit_tests()
    run_integration_tests()
    evaluate_model()
    if all_tests_pass():
        deploy_changes_to_production()
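The `evaluate_model()` step needs a concrete metric and a pass/fail rule to act as a gate in the pipeline. The sketch below is one possible approach, under the assumption that a recommendation counts as relevant when it shares the query image's category: it computes precision at k and lets a candidate model through only if it is not meaningfully worse than the current baseline. The function names, the tolerance, and the example scores are hypothetical.

```python
def precision_at_k(recommended_labels, query_label, k=5):
    """Fraction of the top-k recommendations that share the query's category."""
    top_k = recommended_labels[:k]
    if not top_k:
        return 0.0
    return sum(label == query_label for label in top_k) / len(top_k)


def passes_quality_gate(candidate_score, baseline_score, tolerance=0.01):
    """Allow deployment only if the candidate is not meaningfully worse than the baseline."""
    return candidate_score >= baseline_score - tolerance


# categories of the 5 images recommended for a "shirt" query (hypothetical data)
score = precision_at_k(["shirt", "shirt", "shoe", "shirt", "shirt"], "shirt")

ok_to_deploy = passes_quality_gate(candidate_score=score, baseline_score=0.78)
blocked = passes_quality_gate(candidate_score=0.70, baseline_score=0.78)
```

In a real pipeline the score would be averaged over a held-out set of query images rather than computed for a single query.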
8. Security: Implement security measures to protect user data and the system:
- Secure API endpoints with authentication and authorization mechanisms.
- Encrypt sensitive data at rest and in transit.
- Regularly update and patch system dependencies to address vulnerabilities.
- Implement rate limiting and access controls to prevent abuse.
# Pseudocode for security
implement_authentication()
implement_authorization()
encrypt_data()
update_system_dependencies()
implement_rate_limiting()
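Rate limiting is the most algorithmic item in this list, so here is a minimal sketch of one common technique, the token bucket. Each client gets a bucket that refills at a fixed rate; a request is allowed only if a token is available. The class name, parameters, and the injectable clock (used here to make the example deterministic) are illustrative assumptions, and a production system would more likely rely on the rate limiting offered by its API gateway or load balancer.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# a fake clock keeps the example deterministic
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(3)]  # third request exceeds the burst size
fake_now[0] = 1.0                           # one second later, one token has refilled
after_wait = bucket.allow()
```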
9. Maintenance:
- Regularly update the recommendation model to adapt to changing user preferences and new data.
- Address and fix bugs as they arise.
- Maintain documentation and version control for all components of the system.
# Pseudocode for maintenance
while system_running:
    update_recommendation_model()
    if bugs_detected():
        fix_bugs()
    maintain_documentation()
    maintain_version_control()
By following these steps and embracing MLOps practices, you can deploy and maintain a robust recommendation system that efficiently provides 5 similar images for a given input image in a production environment.
Advantages of MLOps
Conclusion
As organizations continue to harness the power of recommendation systems to provide personalized experiences to their users and customers, a well-executed deployment strategy and the integration of MLOps principles become essential components of success. By following these best practices and embracing MLOps, businesses can confidently deliver recommendation engines that meet the demands of a rapidly evolving digital landscape, ensuring that their systems are not only reliable and efficient but also poised for continuous improvement and innovation.