CROPLAND's top picks from rstudio::conf 2022: Machine Learning, A.I. and MLOps
Besides Shiny, A.I. is of course another big topic at a data science conference like rstudio::conf (as if you would expect anything different, right?). When looking at the talks, we were especially inspired by developments that could contribute to shorter development cycles and easier management of models in production. Two talks stood out for this use case.
Julia Silge & Max Kuhn’s keynote on the tidymodels family of packages brings an interesting overview of the philosophy behind, as well as recent developments in, machine learning/A.I. in R. From our perspective, there were two key takeaways in the talk. First off, the introduction of new routines for tuning models. Tuning refers to the practice of selecting settings or parameters for models, e.g., for a neural network this could be the number of hidden layers, the activation functions used in each layer, or the number of neurons per layer.
One particular type of method that deserves attention is “racing”, which is provided by the finetune package. Racing is a more efficient variant of grid search, the practice of iterating over a pre-defined grid of candidate parameter combinations in search of the settings that give the best results. As with grid search, we need to specify our search space beforehand, but as evaluation proceeds, the finetune package automatically drops the parameter combinations that are unlikely to produce good results. In the use case demonstrated in the talk, this resulted in a 92% reduction in the total number of model evaluations required to find the optimum. We think this is an interesting development: applying racing will not only speed up development cycles for computation-intensive models, it will also reduce overall costs and the final model’s carbon footprint, a serious topic in A.I. nowadays.
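As a rough illustration of how racing fits into a tidymodels workflow, here is a hedged sketch using tune_race_anova() from finetune, dropped in where you would otherwise call tune_grid(). The toy dataset and model are our own placeholders, not the use case from the talk.

```r
library(tidymodels)
library(finetune)

set.seed(123)

# Toy data and resamples for illustration only
data(two_class_dat, package = "modeldata")
folds <- vfold_cv(two_class_dat, v = 5)

# Single-hidden-layer network with two tunable hyperparameters
mlp_spec <- mlp(hidden_units = tune(), penalty = tune(), epochs = 500) %>%
  set_engine("nnet") %>%
  set_mode("classification")

mlp_wflow <- workflow() %>%
  add_formula(Class ~ .) %>%
  add_model(mlp_spec)

# Racing evaluates all candidates on the first few resamples, then
# eliminates combinations that are statistically unlikely to win.
race_results <- tune_race_anova(
  mlp_wflow,
  resamples = folds,
  grid = 20,
  control = control_race(verbose_elim = TRUE)
)

show_best(race_results, metric = "roc_auc")
```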
Secondly, their talk introduces the vetiver package, which is designed to provide tools to version, deploy, and monitor models in R and Python. This is the main topic of Isabel Zimmerman’s talk, which is also covered in the next bit. So, make sure to read on below!
Isabel Zimmerman’s talk zooms in on the vetiver package for R and Python. Vetiver provides tools for versioning, deploying, and monitoring models built with packages such as tidymodels, torch, tensorflow, keras, or scikit-learn.
As a first step, you take a trained model and convert it to a vetiver model. A vetiver model collects all the information needed to store, version, and deploy the trained model. Next, you can use the pins package in R or Python to upload (or, rather, “pin”) your vetiver model to a centralized repository (a “board”) running on RStudio Connect. When you need to update the model, you simply send a newer version to the board, and the pins package keeps track of the different versions. From this versioned model, you can easily create a REST API to serve predictions (without actually writing the Plumber/FastAPI code yourself).
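A condensed sketch of that workflow in R might look like the following. The lm() model, the temporary pin board, and the port are placeholders of our own; in production you would typically point pins at an RStudio Connect board instead.

```r
library(vetiver)
library(pins)
library(plumber)

# Placeholder model trained on built-in data
cars_fit <- lm(mpg ~ wt + cyl, data = mtcars)

# 1. Wrap the trained model as a vetiver model
v <- vetiver_model(cars_fit, model_name = "cars_mpg")

# 2. Pin (and version) it on a board; a temporary board stands in here
#    for an RStudio Connect board
board <- board_temp(versioned = TRUE)
board %>% vetiver_pin_write(v)

# 3. Serve predictions through an auto-generated Plumber API
pr() %>%
  vetiver_api(v) %>%
  pr_run(port = 8080)
```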
As a huge bonus, the API automatically generates a visual documentation page (in the OpenAPI format), which describes the endpoints and the expected data structure and offers an easy way of interacting with the model. As if this wasn’t enough already, vetiver also includes an easy-to-use method for investigating model performance: by supplying new data together with the model’s predictions, you can evaluate how the model performs as new data comes in. These performance metrics can be stored alongside your versioned models on a pins board.
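For the monitoring side, a hedged sketch using vetiver_compute_metrics() and vetiver_pin_metrics() could look like this. The new_data tibble with a date column, observed values, and predictions is entirely made up for illustration.

```r
library(vetiver)
library(dplyr)

# Made-up monitoring data: dates, observed outcomes, and model predictions
new_data <- tibble::tibble(
  date  = as.Date("2022-08-01") + 0:29,
  mpg   = rnorm(30, mean = 20, sd = 4),
  .pred = rnorm(30, mean = 20, sd = 4)
)

# Compute rolling performance metrics (default yardstick metrics) per week
metrics <- new_data %>%
  vetiver_compute_metrics(
    date_var = date,
    period = "week",
    truth = mpg,
    estimate = .pred
  )

# Store the metrics on a pins board, next to the versioned model
board <- pins::board_temp(versioned = TRUE)
vetiver_pin_metrics(board, metrics, metrics_pin_name = "cars_mpg_metrics")
```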