Build a Machine Learning Model in Rust
Machine learning is an intriguing concept in computer programming, involving the use of data to train a computer program to perform tasks. During this process, the program learns from data by identifying patterns, reducing the necessity for programmers to hard code rules in certain applications.
Languages like Python and R are widely used for learning and executing machine learning tasks, but they do have their limitations. Some machine learning applications may require operations to be performed with high speed and computational efficiency.
Enter Rust, a powerful and efficient programming language. While Rust's machine learning ecosystem is not as mature as Python's or R's, the language itself is well suited to applications that demand speed and efficiency. This tutorial is for Rust programmers looking to delve into machine learning, as well as for machine learning engineers who want to explore what Rust has to offer.
Prerequisites
Before you begin, ensure you have Rust and Cargo installed and are comfortable with basic Rust syntax.
What is Machine Learning?
Machine learning involves creating software models that can understand patterns within data. Training a model means providing it with data so it can learn to recognize these patterns. This training is at the heart of machine learning, enabling the model to learn from the data and perform specific tasks.
After training, a model can make inferences from new data. These inferences usually fall into two categories: classifications and predictions. A predictive model uses current data to forecast future events, results, or outcomes, offering valuable insights. Conversely, a classification model categorizes objects or concepts based on the data, aiding in organizing and understanding diverse datasets.
The machine learning process is cyclical and iterative, involving stages of data input, model training, and output generation. This process allows models to continuously refine their understanding of data patterns, improving their ability to predict or classify effectively.
The following diagram is a basic overview of the machine learning process:
What is a Decision Tree?
A decision tree algorithm is one of the most straightforward machine learning algorithms, providing a clear and intuitive sense of what machine learning entails. Unlike many other algorithms, it offers a transparent and easy-to-understand visualization of the decision-making process.
A decision tree is used for both classification and regression tasks and is structured like a tree, consisting of the following components:
Root Node: The top node representing the entire dataset, which splits into subsets.
Internal Nodes: Nodes representing the features of the dataset and testing their values.
Leaf Nodes: Terminal nodes that predict the outcome (class labels or continuous values).
Branches: Paths that connect nodes, representing the decision rules applied to features.
Here is an example dataset that classifies four animals based on their properties:
A model recognizes the pattern(s) in the table, then creates a tree with this structure: the "Is wild?" feature is the root node, each branch then splits on the "Has round pupils?" feature, and the leaf nodes classify each animal.
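A tree of this shape is small enough to write out by hand. The following is a minimal sketch using only the standard library; the Animal and Node types are hypothetical, and the four animal names are placeholders chosen for illustration, not the rows of the article's table.

```rust
// Hand-built decision tree over the two yes/no features named in the text.
// Assumption: the animal labels are illustrative placeholders.

/// The two features of one animal.
struct Animal {
    is_wild: bool,
    has_round_pupils: bool,
}

/// A node is either a leaf holding a class label, or an internal node
/// that tests one feature and follows the matching branch.
enum Node {
    Leaf(&'static str),
    Split {
        feature: fn(&Animal) -> bool,
        yes: Box<Node>,
        no: Box<Node>,
    },
}

impl Node {
    /// Walk from this node down to a leaf and return its label.
    fn classify(&self, animal: &Animal) -> &'static str {
        match self {
            Node::Leaf(label) => *label,
            Node::Split { feature, yes, no } => {
                if feature(animal) {
                    yes.classify(animal)
                } else {
                    no.classify(animal)
                }
            }
        }
    }
}

fn is_wild(a: &Animal) -> bool {
    a.is_wild
}

fn has_round_pupils(a: &Animal) -> bool {
    a.has_round_pupils
}

/// Root tests "Is wild?"; both branches then test "Has round pupils?".
fn build_tree() -> Node {
    Node::Split {
        feature: is_wild,
        yes: Box::new(Node::Split {
            feature: has_round_pupils,
            yes: Box::new(Node::Leaf("lion")),
            no: Box::new(Node::Leaf("fox")),
        }),
        no: Box::new(Node::Split {
            feature: has_round_pupils,
            yes: Box::new(Node::Leaf("dog")),
            no: Box::new(Node::Leaf("cat")),
        }),
    }
}

fn main() {
    let tree = build_tree();
    let animal = Animal { is_wild: true, has_round_pupils: false };
    println!("classified as: {}", tree.classify(&animal));
}
```

A trained model builds this kind of structure from the data instead of having it written by hand; the traversal in classify is the part that stays the same.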
Getting Started
There are several tools available for creating machine learning applications in Rust. While all of them are great, this tutorial will focus on using Linfa, a toolkit similar to the popular Python machine learning library, scikit-learn.
In this section, you’ll learn how to set up a Rust project for machine learning. The process is relatively simple and involves the following steps:
1. Create a New Project
First, create a new project called ml-project by running cargo new ml-project.
2. Add Dependencies
Next, paste the following dependencies into the Cargo.toml file of your ml-project, under [dependencies]:
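A [dependencies] section along these lines matches the packages described in this step. The version numbers and the iris feature flag are assumptions based on the Linfa 0.7 release line; check crates.io for the versions current when you read this.

```toml
[dependencies]
# Versions are assumptions; adjust them to the releases you actually use.
linfa = "0.7"
linfa-trees = "0.7"
# The "iris" feature enables the bundled iris dataset.
linfa-datasets = { version = "0.7", features = ["iris"] }
```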
3. Build the Dependencies
Finally, run cargo build to download and compile the dependencies.
Explanation of Dependencies:
· linfa: The base package for Linfa machine learning models.
· linfa-trees: A sub-package specifically for building decision tree models.
· linfa-datasets: A package that provides pre-prepared datasets.
The linfa-datasets package is optional. If you prefer to prepare your own dataset, you can skip this dependency and follow the next section on dataset preparation.
How to Prepare the Dataset
Most machine learning models used in projects are trained with external data, not the data provided by the toolkit. In this section, you’ll learn how to prepare your own dataset from a CSV file.
First, you need to obtain a dataset. If you don’t have one, you can download a dataset from Kaggle. For this tutorial, we'll use the heart disease dataset. The heart disease dataset looks like this:
In this dataset, the target field indicates whether a person has heart disease: 1 means they have heart disease, and 0 means they do not. The rest of the fields contain details about each person. A model can learn from this dataset and predict if a person has heart disease.
Once you have downloaded the dataset, extract the CSV file into your project’s src folder.
To prepare a dataset, you’ll need to add the csv and ndarray packages to your project. Open Cargo.toml, and add the following under [dependencies]:
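The two entries could look like the fragment below. The version numbers are assumptions; in particular, ndarray should match the version your Linfa release depends on, or the array types will not line up.

```toml
# Added under the existing [dependencies] section.
# Versions are assumptions; adjust them to the releases you actually use.
csv = "1.3"
ndarray = "0.15"
```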
Now, run cargo build to download the packages, and you are ready to go.
In the following steps, we’ll build a get_dataset function. This function reads the heart.csv file, parses its content, prepares a dataset, and returns the prepared dataset. Let’s get started!
First, import the necessary packages:
Next, write the get_dataset function in main.rs:
Finalizing the Dataset Preparation
To complete the dataset preparation, you need to define several helper functions and integrate them into your get_dataset function. Here’s how you can do it step-by-step:
Define the Helper Functions:
Implement the get_dataset Function:
Complete the main Function:
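As a rough illustration of how the helper functions, get_dataset, and main fit together, here is a dependency-free sketch that uses only the standard library. It is not the tutorial's Linfa code: the HeartDataset struct is a hypothetical stand-in for Linfa's dataset type, the helper names (get_headers, get_data, get_records, get_targets) are chosen for illustration, the CSV text is passed in as a string instead of being read from heart.csv, and the sample rows are invented. The sketch assumes the target is the last column and that every row has as many fields as the header.

```rust
use std::num::ParseFloatError;

/// Hypothetical container: one feature row per record plus one target each.
#[derive(Debug)]
struct HeartDataset {
    feature_names: Vec<String>,
    records: Vec<Vec<f32>>,
    targets: Vec<i32>,
}

/// Split the first CSV line into column names.
fn get_headers(csv_text: &str) -> Vec<String> {
    csv_text
        .lines()
        .next()
        .unwrap_or("")
        .split(',')
        .map(|h| h.trim().to_owned())
        .collect()
}

/// Parse every remaining non-empty line into a row of f32 values.
fn get_data(csv_text: &str) -> Result<Vec<Vec<f32>>, ParseFloatError> {
    csv_text
        .lines()
        .skip(1)
        .filter(|line| !line.trim().is_empty())
        .map(|line| {
            line.split(',')
                .map(|field| field.trim().parse::<f32>())
                .collect::<Result<Vec<f32>, _>>()
        })
        .collect()
}

/// Keep the feature columns (everything before the target column).
fn get_records(data: &[Vec<f32>], target_index: usize) -> Vec<Vec<f32>> {
    data.iter().map(|row| row[..target_index].to_vec()).collect()
}

/// Keep only the target column, converted to an integer label.
fn get_targets(data: &[Vec<f32>], target_index: usize) -> Vec<i32> {
    data.iter().map(|row| row[target_index] as i32).collect()
}

/// Parse the CSV text and assemble the dataset.
fn get_dataset(csv_text: &str) -> Result<HeartDataset, ParseFloatError> {
    let headers = get_headers(csv_text);
    let data = get_data(csv_text)?;
    // Assumption: the target is the last column.
    let target_index = headers.len() - 1;
    let feature_names = headers[..target_index].to_vec();
    Ok(HeartDataset {
        feature_names,
        records: get_records(&data, target_index),
        targets: get_targets(&data, target_index),
    })
}

fn main() {
    // Invented sample rows, not taken from the real heart disease dataset.
    let sample = "age,sex,chol,target\n63,1,233,1\n41,0,204,0\n";
    let dataset = get_dataset(sample).expect("sample rows are numeric");
    println!("{:?}", dataset);
}
```

Returning a Result instead of unwrapping inside the helpers lets main decide how to handle a malformed row. In the tutorial's setup, the csv crate would take over the parsing and the result would be wrapped in Linfa's own dataset type.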
Explanation of the get_dataset Function
1. Initialize the Reader:
2. Extract Headers and Data:
3. Calculate the Target Index:
4. Get Features from Headers:
5. Retrieve Records and Targets:
6. Build and Return the Dataset:
7. Running the Program
Write the main function as shown above and run the program with cargo run.
You should see the dataset printed in the output:
How to Create a Decision Tree Model
In this section, I’ll show you how to create a decision tree model and train it. The dataset I’ll use is the iris dataset provided by linfa-datasets.
The iris dataset contains a record of the sepal length, sepal width, petal length, and petal width of several irises, and classifies each record according to number-labeled species.
The code for the model is simple. Open the main.rs file, and paste the following into it:
Here's an explanation:
Import the Necessary Packages:
Fetch the Dataset and Split into Training and Testing Data:
Initialize the Model and Train it with the Training Data:
Use the Testing Data to Make Predictions:
Compare the Predictions with the Actual Values:
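The same split, train, predict, and compare workflow can be tried without any crates. The sketch below is a stand-in under stated assumptions, not Linfa's algorithm: it trains a one-split decision stump (a decision tree with a single internal node) on one feature, picks the threshold with the fewest training errors, and holds out the last quarter of the samples for testing. The Sample and Stump types and all measurements are invented for illustration.

```rust
// Std-only sketch of the split / fit / predict / compare workflow.
// Assumptions: all measurements are invented, and a one-split decision
// stump stands in for a full decision tree.

/// One labelled sample: a single feature value and its class (0 or 1).
#[derive(Clone, Copy)]
struct Sample {
    petal_length: f32,
    species: usize,
}

/// A trained stump: values below `threshold` get `below`, the rest `above`.
struct Stump {
    threshold: f32,
    below: usize,
    above: usize,
}

/// Most common class among the samples (ties and empty input give class 0).
fn majority(samples: &[Sample]) -> usize {
    let ones = samples.iter().filter(|s| s.species == 1).count();
    if ones * 2 > samples.len() { 1 } else { 0 }
}

impl Stump {
    /// Try the midpoint between each pair of neighbouring training values
    /// and keep the threshold that misclassifies the fewest training samples.
    /// Returns None when there are fewer than two samples.
    fn fit(train: &[Sample]) -> Option<Stump> {
        let mut values: Vec<f32> = train.iter().map(|s| s.petal_length).collect();
        values.sort_by(|a, b| a.total_cmp(b));

        let mut best: Option<(usize, Stump)> = None;
        for pair in values.windows(2) {
            let threshold = (pair[0] + pair[1]) / 2.0;
            let (lo, hi): (Vec<Sample>, Vec<Sample>) = train
                .iter()
                .copied()
                .partition(|s| s.petal_length < threshold);
            let stump = Stump { threshold, below: majority(&lo), above: majority(&hi) };
            let errors = train
                .iter()
                .filter(|s| stump.predict(s.petal_length) != s.species)
                .count();
            if best.as_ref().map_or(true, |(fewest, _)| errors < *fewest) {
                best = Some((errors, stump));
            }
        }
        best.map(|(_, stump)| stump)
    }

    /// Classify one feature value.
    fn predict(&self, petal_length: f32) -> usize {
        if petal_length < self.threshold { self.below } else { self.above }
    }
}

/// Invented samples; the last quarter is held out for testing.
fn samples() -> Vec<Sample> {
    vec![
        Sample { petal_length: 1.4, species: 0 },
        Sample { petal_length: 1.3, species: 0 },
        Sample { petal_length: 1.5, species: 0 },
        Sample { petal_length: 4.7, species: 1 },
        Sample { petal_length: 4.5, species: 1 },
        Sample { petal_length: 4.9, species: 1 },
        Sample { petal_length: 1.6, species: 0 },
        Sample { petal_length: 4.0, species: 1 },
    ]
}

fn main() {
    let data = samples();
    // Split into training data (first 75%) and testing data (last 25%).
    let (train, test) = data.split_at(data.len() * 3 / 4);
    // Train the model on the training data only.
    let model = Stump::fit(train).expect("need at least two training samples");
    // Predict on the held-out data and compare with the actual labels.
    for s in test {
        println!("predicted: {}, actual: {}", model.predict(s.petal_length), s.species);
    }
}
```

Keeping the test samples out of fit is the important design point: the comparison only says something about the model if it has not already seen those samples.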
If you run the program with cargo run, you’ll get the predicted category and the actual category in the terminal as output:
From the above, you can see that this model is 100% accurate. This won’t always be the case for all machine learning models. If you shuffle the dataset before training the model, the model may not be as accurate anymore.
The goal of machine learning is to be as accurate as possible. In practice, 100% accuracy is rarely achievable.
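Accuracy here means the share of predictions that match the actual labels. A minimal sketch of that calculation follows; the accuracy function and the label values are hypothetical and not part of Linfa.

```rust
/// Fraction of predictions that match the actual labels, from 0.0 to 1.0.
/// Returns None when the slices are empty or differ in length.
fn accuracy(predicted: &[usize], actual: &[usize]) -> Option<f32> {
    if predicted.is_empty() || predicted.len() != actual.len() {
        return None;
    }
    let correct = predicted
        .iter()
        .zip(actual)
        .filter(|(p, a)| p == a)
        .count();
    Some(correct as f32 / predicted.len() as f32)
}

fn main() {
    // Invented labels: four of the five predictions match.
    let predicted = [0, 1, 2, 2, 1];
    let actual = [0, 1, 2, 1, 1];
    match accuracy(&predicted, &actual) {
        Some(a) => println!("accuracy: {:.0}%", a * 100.0),
        None => println!("nothing to compare"),
    }
}
```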
Conclusion
In this tutorial, you learned a little about machine learning, and you also saw how to create a decision tree model using Rust.
Machine learning models in Linfa follow a similar process in building and training. So, all you need to do to use other types of models is to learn about each one, and you are good to go.