Geek out time: try LLM and Embeddings on Nvidia NIM with Node.js

(Also on Constellar tech blog: https://medium.com/the-constellar-digital-technology-blog/geek-out-time-try-llm-and-embeddings-on-nvidia-nim-with-node-js-1d1da6c3945c )

Nvidia NIM was rolled out a while ago, but I hadn't seen it in action yet. In the past few weeks, I've played with the OpenAI API and local LLMs, and out of curiosity I wanted to see how Nvidia NIM works. This weekend, I tried calling a Llama model and an embedding model on Nvidia NIM using Node.js.

Step 1: Register an Nvidia NIM account

Registration for Nvidia NIM is straightforward and free for developers.

Step 2: Select an LLM Model from “Models” and generate the test code

For the testing, I chose “llama3-70b-instruct”. You will see the playground, where an API key can be generated.
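Rather than hardcoding the key in scripts, one option is to read it from an environment variable; a minimal sketch, assuming you export a variable named NVIDIA_API_KEY (the name is just an example):

// Read the API key from an environment variable instead of hardcoding it.
// NVIDIA_API_KEY is an example name; export it in your shell first:
//   export NVIDIA_API_KEY="your key here"
const apiKey = process.env.NVIDIA_API_KEY;
if (!apiKey) {
  throw new Error('NVIDIA_API_KEY is not set');
}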

For the LLM integration, NIM also generates Node.js code to run on your local machine.

Script for Llama Model (appNIM.js)

import OpenAI from 'openai';

// NIM endpoints are OpenAI-compatible, so the official openai client works
const openai = new OpenAI({
  apiKey: 'YOUR_API_KEY', // paste the key generated in the playground
  baseURL: 'https://integrate.api.nvidia.com/v1',
});

async function main() {
  try {
    const completion = await openai.chat.completions.create({
      model: "meta/llama3-70b-instruct",
      messages: [{ "role": "user", "content": "who are you?" }],
      temperature: 0.5,
      top_p: 1,
      max_tokens: 1024,
      stream: true, // stream tokens back as they are generated
    });
    // Print each streamed chunk as it arrives
    for await (const chunk of completion) {
      process.stdout.write(chunk.choices[0]?.delta?.content || '');
    }
  } catch (error) {
    console.error('Error during completion:', error);
  }
}
main();

Step 3: Run the script

Run “npm start”, and you will see the response from Llama on NIM.
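For “npm start” to work, the project needs a package.json with a start script and "type": "module", since the script uses ES-module imports. A minimal sketch; the package name and version pin are placeholders:

{
  "name": "nim-demo",
  "type": "module",
  "scripts": {
    "start": "node appNIM.js"
  },
  "dependencies": {
    "openai": "^4.0.0"
  }
}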

Step 4: Select the Embedding Model from “Retrieval”

It is quite similar to selecting the LLM. NVIDIA's embed-qa-4 is a GPU-accelerated model for generating text embeddings, designed specifically for question-answering (QA) retrieval tasks. It is part of NVIDIA's suite of models that enhance retrieval-augmented generation (RAG) applications, and it is currently in preview.

Unlike the LLM integration, no Node.js code is generated, so we have to write it ourselves.

Step 5: Write the script for calling the Embedding Model

Based on the reference curl command below,

curl -X POST https://ai.api.nvidia.com/v1/retrieval/nvidia/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer API_Key" \
  -d '{
    "input": ["What is the capital of France?"],
    "model": "NV-Embed-QA",
    "input_type": "query",
    "encoding_format": "float",
    "truncate": "NONE"
  }'

we can write the Node.js script appNIMEmbedding.js:

import fetch from 'node-fetch'; // npm install node-fetch (Node 18+ has fetch built in)
async function main() {
  const url = 'https://ai.api.nvidia.com/v1/retrieval/nvidia/embeddings';
  const apiKey = 'YOUR_API_KEY'; // paste the key generated in the playground
  const requestPayload = {
    input: ["What is the capital of France?"],
    model: "NV-Embed-QA",
    input_type: "query",
    encoding_format: "float",
    truncate: "NONE"
  };
  try {
    const response = await fetch(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${apiKey}`
      },
      body: JSON.stringify(requestPayload)
    });
    if (!response.ok) {
      const errorText = await response.text();
      throw new Error(`Request failed with status ${response.status}: ${errorText}`);
    }
    const data = await response.json();
    console.log('Response:', data.data[0].embedding);
  } catch (error) {
    console.error('Error:', error.message);
  }
}
main();

Step 6: Run the script

node appNIMEmbedding.js

The embedding of “What is the capital of France?” is returned.
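In a RAG pipeline, this query embedding would be compared against embeddings of your documents (with NV-Embed-QA, documents are typically embedded with input_type set to "passage" rather than "query") and the closest matches retrieved. A minimal cosine-similarity sketch; the short vectors below are placeholders, not real model output:

// Cosine similarity between two embedding vectors of equal length
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Example with tiny placeholder vectors (real embeddings have hundreds of dimensions)
const queryEmbedding = [0.1, 0.3, 0.5];
const passageEmbedding = [0.2, 0.25, 0.55];
console.log('Similarity:', cosineSimilarity(queryEmbedding, passageEmbedding));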

Wrapping Up

There you go! We got responses from both the Llama model and the embedding model on NIM using Node.js. Pretty cool. Nvidia NIM is easy to get started with and free for developers now. Give it a try. Have fun!
