Geek out time: try LLM and Embeddings on Nvidia NIM with Node.js

(Also on Constellar tech blog: https://medium.com/the-constellar-digital-technology-blog/geek-out-time-try-llm-and-embeddings-on-nvidia-nim-with-node-js-1d1da6c3945c )

Nvidia NIM was rolled out a while ago, but I hadn't seen it in action yet. In the past few weeks, I've played with the OpenAI API and local LLMs, and out of curiosity I wanted to see how Nvidia NIM works. This weekend, I tried calling a Llama model and an embedding model on Nvidia NIM using Node.js.

Step 1: Register an Nvidia NIM account

Registration for Nvidia NIM is straightforward and free for developers.

Step 2: Select an LLM Model from “Models” and generate the test code

For the testing, I chose “llama3-70b-instruct”. You will see the playground, where an API key can be generated.
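Rather than hardcoding the key in scripts, one option is to read it from an environment variable; a minimal sketch, assuming you export a variable named NVIDIA_API_KEY (the name is just an example):

// Read the API key from an environment variable instead of hardcoding it.
// NVIDIA_API_KEY is an example name; export it in your shell first:
//   export NVIDIA_API_KEY="your key here"
const apiKey = process.env.NVIDIA_API_KEY;
if (!apiKey) {
  throw new Error('NVIDIA_API_KEY is not set');
}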

For the LLM integration, NIM also generates Node.js code to run on your local machine.

Script for Llama Model (appNIM.js)

import OpenAI from 'openai';

// NIM endpoints are OpenAI-compatible, so the official openai client works
const openai = new OpenAI({
  apiKey: 'YOUR_API_KEY', // paste the key generated in the playground
  baseURL: 'https://integrate.api.nvidia.com/v1',
});

async function main() {
  try {
    const completion = await openai.chat.completions.create({
      model: "meta/llama3-70b-instruct",
      messages: [{ "role": "user", "content": "who are you?" }],
      temperature: 0.5,
      top_p: 1,
      max_tokens: 1024,
      stream: true, // stream tokens back as they are generated
    });
    // Print each streamed chunk as it arrives
    for await (const chunk of completion) {
      process.stdout.write(chunk.choices[0]?.delta?.content || '');
    }
  } catch (error) {
    console.error('Error during completion:', error);
  }
}
main();

Step 3: Run the script

Run “npm start”, and you will see the response from Llama on NIM.
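For “npm start” to work, the project needs a package.json with a start script and "type": "module", since the script uses ES-module imports. A minimal sketch; the package name and version pin are placeholders:

{
  "name": "nim-demo",
  "type": "module",
  "scripts": {
    "start": "node appNIM.js"
  },
  "dependencies": {
    "openai": "^4.0.0"
  }
}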

Step 4: Select the Embedding Model from “Retrieval”

It is quite similar to selecting the LLM. NVIDIA's embed-qa-4 is a GPU-accelerated model for generating text embeddings, designed specifically for question-answering (QA) retrieval tasks. It is part of NVIDIA's suite of models that enhance retrieval-augmented generation (RAG) applications, and it is currently in preview.

Unlike the LLM integration, no Node.js code is generated, so we have to write it ourselves.

Step 5: Write the script for calling the Embedding Model

Based on the reference curl command below,

curl -X POST https://ai.api.nvidia.com/v1/retrieval/nvidia/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer API_Key" \
  -d '{
    "input": ["What is the capital of France?"],
    "model": "NV-Embed-QA",
    "input_type": "query",
    "encoding_format": "float",
    "truncate": "NONE"
  }'

we can write the Node.js script appNIMEmbedding.js:

import fetch from 'node-fetch'; // npm install node-fetch (Node 18+ has fetch built in)
async function main() {
  const url = 'https://ai.api.nvidia.com/v1/retrieval/nvidia/embeddings';
  const apiKey = 'YOUR_API_KEY'; // paste the key generated in the playground
  const requestPayload = {
    input: ["What is the capital of France?"],
    model: "NV-Embed-QA",
    input_type: "query",
    encoding_format: "float",
    truncate: "NONE"
  };
  try {
    const response = await fetch(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${apiKey}`
      },
      body: JSON.stringify(requestPayload)
    });
    if (!response.ok) {
      const errorText = await response.text();
      throw new Error(`Request failed with status ${response.status}: ${errorText}`);
    }
    const data = await response.json();
    console.log('Response:', data.data[0].embedding);
  } catch (error) {
    console.error('Error:', error.message);
  }
}
main();

Step 6: Run the script

node appNIMEmbedding.js

The embedding of “What is the capital of France?” is returned.
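In a RAG pipeline, this query embedding would be compared against embeddings of your documents (with NV-Embed-QA, documents are typically embedded with input_type set to "passage" rather than "query") and the closest matches retrieved. A minimal cosine-similarity sketch; the short vectors below are placeholders, not real model output:

// Cosine similarity between two embedding vectors of equal length
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Example with tiny placeholder vectors (real embeddings have hundreds of dimensions)
const queryEmbedding = [0.1, 0.3, 0.5];
const passageEmbedding = [0.2, 0.25, 0.55];
console.log('Similarity:', cosineSimilarity(queryEmbedding, passageEmbedding));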

Wrapping Up

There you go! We got responses from both the Llama model and the embedding model on NIM using Node.js. Pretty cool. Nvidia NIM is easy to get started with and free for developers now. Give it a try. Have fun!
