Running AI/ML in the client - an example (ONNX and Transformers.js)
Arijit Mondal
Software Engineer - Frontend @ Microsoft | Tech Lead & Senior Dev | React - Node.js - TypeScript - .NET Core - Azure.
This article continues from the theory we covered in Demystifying running ML in the client - concepts. Here, we will build a React/Node.js application that summarizes an input text using a small language model.
The choice
Transformers.js uses ONNX Runtime to run models in the browser. It is designed to be functionally equivalent to Hugging Face's transformers Python library.
As we saw in the previous article, there are a few other choices, but I find Transformers.js lets you "dip your toe" into AI: it wraps ONNX Runtime and does much of the heavy lifting behind its API, so you can ease into unfamiliar territory. That is the single reason I chose it for this article. If you prefer to use ONNX Runtime directly, or something like Web LLM, that will also work for learning the same concepts.
If you do not want to set up your own project and just want to run the example directly, skip to the "Running the example project" section below.
The setup
Before proceeding, make sure you have brushed up on your React and TypeScript knowledge and read the previous article on the theory. I used VS Code and the Edge browser for developing and testing this.
First, let's create a React TypeScript project. If you aren't sure how to do this, download my create-vite-react-app-cli template from npm and use it to scaffold the project. Run npm install to download the initial dependencies.
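If you would rather not use my template, you can scaffold an equivalent project with Vite directly; at the time of writing, the standard command looks like this (the project name is just an example):

```bash
# Scaffold a React + TypeScript project with Vite, then install dependencies.
npm create vite@latest web-aiml-app -- --template react-ts
cd web-aiml-app
npm install
```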
Next, let's install the remaining dependencies. If using npm:
npm install @xenova/transformers lodash
npm install --save-dev @types/lodash @types/node
lodash isn't strictly necessary here; I used its debounce for some input optimizations.
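As an illustration, here is a minimal sketch of how debounce can be wired into a React component. The hook name and wait time are my own choices, not necessarily how the example project does it:

```tsx
import { useMemo } from "react";
import debounce from "lodash/debounce";

// Returns a debounced version of `update`, so a burst of keystrokes only
// triggers one state update after the user pauses typing.
function useDebouncedUpdate(update: (text: string) => void, waitMs = 300) {
  return useMemo(() => debounce(update, waitMs), [update, waitMs]);
}
```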
At this point if you run the app with `npm run dev` you should see the default vite react template UI.
Minimal UI
Now that we have everything we need, let's add some basic UI elements. The goal is to produce a summary from an input, so we need an input box, a button, and an output box to show the summary. I used a textarea for the multiline input and output.
Note: my example code includes some minor extras such as a character count, conditionally rendered output, and a loading UI. You can ignore these for this exercise and just add the minimal three controls - input, a submit button, and output - as sketched below.
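Here is a minimal sketch of those three controls, assuming a parent supplies the summary text and a submit callback. The component and prop names are mine, not the example project's exact markup:

```tsx
import { useState } from "react";

type Props = {
  summary: string;
  onSummarize: (text: string) => void;
};

// Minimal three-control UI: multiline input, submit button, read-only output.
export function SummarizerUI({ summary, onSummarize }: Props) {
  const [inputText, setInputText] = useState("");
  return (
    <div>
      <textarea
        value={inputText}
        onChange={(e) => setInputText(e.target.value)}
        placeholder="Paste text to summarize"
      />
      <button onClick={() => onSummarize(inputText)}>Summarize</button>
      <textarea value={summary} readOnly placeholder="Summary appears here" />
    </div>
  );
}
```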
Transformers.js initialization
To summarize with Transformers.js, we need to instantiate an object from pipeline - in this case, more specifically, a SummarizationPipeline:
await pipeline(pipelineType, modelName)
To get a SummarizationPipeline, we pass "summarization" as the pipelineType. We will use a DistilBART variant model for summarization here; you can choose another model if you prefer.
import { pipeline } from "@xenova/transformers";
const summarizer = await pipeline("summarization", "Xenova/distilbart-cnn-6-6");
A good practice here is to create the pipeline instance only once instead of recreating it on every re-render. Note that you can also pass additional configuration while creating the pipeline, via a third PretrainedOptions parameter (not demonstrated in the example code).
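One common way to create it only once is a lazily initialized, module-level singleton. This is a minimal sketch of that idea (the function name and the type cast are my own, not the example project's code):

```ts
import { pipeline, type SummarizationPipeline } from "@xenova/transformers";

let summarizerPromise: Promise<SummarizationPipeline> | null = null;

// Create the pipeline on first use and reuse the same promise afterwards,
// so the model is only downloaded and initialized once.
export function getSummarizer(): Promise<SummarizationPipeline> {
  if (!summarizerPromise) {
    summarizerPromise = pipeline(
      "summarization",
      "Xenova/distilbart-cnn-6-6"
    ) as Promise<SummarizationPipeline>;
  }
  return summarizerPromise;
}
```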
Initialization - explanations
Here we used a hosted pretrained model from Hugging Face; you can see the ONNX model here. This, together with the precompiled WASM, enables us to compute on the client side itself and get summarization inference in this example. Taking a step back: the deployment forms explained earlier will be downloaded to your browser and run locally on the client machine (via WASM, or WebGPU where supported) to summarize on the client itself, without any server-side processing.
At this stage it is also possible to customize further for a specific goal: compile and convert a custom model and reference it in the initialization step.
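For reference, the Transformers.js repository includes a conversion script for turning a Hugging Face model into ONNX for use with the library. At the time of writing, the documented invocation is along these lines (treat the exact flags as an assumption and check the repo):

```bash
# Run from a clone of the transformers.js repository; converts and quantizes
# the given Hugging Face model to ONNX for use with Transformers.js.
python -m scripts.convert --quantize --model_id <your_model_id>
```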
Transformers.js compute (running with an input text)
In this stage, when the button is clicked, we run the pipeline initialized earlier with the text entered in the input box, and then set the output box with the result we get from the pipeline.
const result = await summarizer(inputText);
Note: I have arbitrarily restricted the inputText length to 1000 characters in the example code.
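Putting the pieces together, a minimal click handler could look like the sketch below. It assumes the getSummarizer helper from the earlier snippet and setSummary/setIsComputing state setters; these names are illustrative, not the example project's exact code:

```ts
async function handleSummarize(inputText: string) {
  setIsComputing(true);
  try {
    const summarizer = await getSummarizer();
    const result = await summarizer(inputText);
    // The pipeline returns an array of outputs; summary_text holds the summary.
    const [{ summary_text }] = result as Array<{ summary_text: string }>;
    setSummary(summary_text);
  } finally {
    setIsComputing(false);
  }
}
```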
React.js part - explanations
The basic idea is to store the prompt/input text in a state variable on every change of the input, and then, on submit button click, call the Transformers.js compute part mentioned above to get a response.
In the example code, I have used a couple of custom hooks and a reducer hook to manage the state from one place. You can take a look in the folder src/common/hooks.
case "promptUpdate": {
return {
...state,
prompts: [payload.newPrompt],
};
}
case "resultReceived": {
return {
...state,
generatedText: payload.response,
isComputing: false,
};
}
Furthermore, since I am using a worker in my example, I manage the interaction between the worker and the main thread from the custom hook src/useAppWorker.ts:
workerRef.current.onmessage = (event) => {
  // The worker posts back either a result or an error.
  if (event.data.result) {
    processSummaryOutput(event.data.result);
  } else {
    const error = event.data.error;
    dispatch({
      type: "errorReceived",
      payload: { errorMessage: `${error?.name}: ${error?.message}` },
    });
  }
};
Alternatively, you can skip the worker for this part and simply update state with the result on submit button click.
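For completeness, the worker side of that message protocol could look like the following sketch. This is my own illustration of a dedicated web worker that matches the onmessage handler above; the file name and message shape are assumptions, not the example project's exact code:

```ts
/// <reference lib="webworker" />
// Hypothetical src/worker.ts: run the pipeline off the main thread and post
// either { result } or { error } back, matching the handler above.
import { pipeline } from "@xenova/transformers";

const summarizerPromise = pipeline("summarization", "Xenova/distilbart-cnn-6-6");

self.onmessage = async (event: MessageEvent<{ text: string }>) => {
  try {
    const summarizer = await summarizerPromise;
    const result = await summarizer(event.data.text);
    self.postMessage({ result });
  } catch (error) {
    self.postMessage({ error });
  }
};
```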
Deployment forms - files
Once you run the app, you can inspect the Network tab to see the ONNX model, configuration, and precompiled WASM files being loaded. You can also inspect the browser's application data and Cache Storage to see these files later.
Running the example project
First, clone it from GitHub and install the dependencies:
git clone https://github.com/ArijitCloud/web-aiml-app.git
cd web-aiml-app && npm install
Then, simply run it with:
npm run dev
Now open the localhost URL you see in your terminal. The UI should look like the published version here: https://clientside-ai-summarizer.vercel.app/
Finally, step through the different sections of the code in the debugger to understand it more deeply.
What Next?
From here we can explore in multiple directions. One would be to set up a new small language model, host it locally, and use it with the Transformers.js code. Another would be to fine-tune the same model with more specific training data and see whether we can get the desired inferences. Alternatively, we can explore the Web LLM way of running a model on the client side and deep-dive in that direction.
In this guide, we've walked through the process of running AI/ML models in client-side applications. We used the deployment forms explained in the previous article to successfully run summarization inference on the client side based on user input. I encourage you to explore further, experiment with different models, and build more complex applications that leverage the power of AI/ML directly in the browser or on other client-side devices.