A Serverless WebAssembly LLM (LLAMA/GPT) Interface for Go
In this article, I’ll go over how to use the LLM (or GPT) capabilities of 8ws. I’m assuming that you already know how to create a project and a function on a Taubyte-based Cloud Computing Network. If not, please refer to Taubyte’s Documentation.
LLAMA Satellite
Our Cloud Computing Network provides LLM capabilities through what we call a Satellite. It does so by exporting llama.cpp capabilities to the Taubyte Virtual Machine, which powers Serverless Functions (or DFunctions, as per Taubyte’s terminology). The source code for the Satellite can be found here.
LLAMA SDK
Satellites export low-level functions that aren’t very intuitive to use directly. Fortunately, it’s possible to address that with a user-friendly SDK. As of today, we offer a Go SDK. The source code can be found here.
Get Ready
Before proceeding, let’s ensure you have a project and a DFunction ready to go. If not, please refer to “Create a Function”.
Let’s Code!
A good practice is to clone your code locally using git or the tau command-line. Make sure you have Go installed, then run:
go get github.com/samyfodil/taubyte-llama-satellite
Our Basic Function
If you followed the steps from Taubyte’s Documentation, your basic function should look something like this:
package lib

import (
	"github.com/taubyte/go-sdk/event"
)

//export ping
func ping(e event.Event) uint32 {
	h, err := e.HTTP()
	if err != nil {
		return 1
	}
	h.Write([]byte("PONG"))
	return 0
}
Let’s modify it so it uses the POST body as the prompt. Note: I’ve changed the function’s name to predict. Ensure this change is reflected in your configuration by setting the entry point to predict and modifying the method from GET to POST.
package lib

import (
	"io"

	"github.com/taubyte/go-sdk/event"
)

//export predict
func predict(e event.Event) uint32 {
	h, err := e.HTTP()
	if err != nil {
		return 1
	}
	defer h.Body().Close()

	// Use the POST body as the prompt.
	prompt, err := io.ReadAll(h.Body())
	if err != nil {
		panic(err)
	}

	// We don't use prompt yet; the blank assignment keeps the compiler
	// happy until we wire it up to the Satellite in the next step.
	_ = prompt

	return 0
}
Predict
Now let’s feed the prompt to the Satellite using the SDK’s Predict function:
package lib

import (
	"io"

	"github.com/samyfodil/taubyte-llama-satellite/sdk"
	"github.com/taubyte/go-sdk/event"
)

//export predict
func predict(e event.Event) uint32 {
	h, err := e.HTTP()
	if err != nil {
		return 1
	}
	defer h.Body().Close()

	prompt, err := io.ReadAll(h.Body())
	if err != nil {
		panic(err)
	}

	// Submit the prompt to the Satellite; p is a handle to the queued prediction.
	p, err := sdk.Predict(
		string(prompt),
	)
	if err != nil {
		panic(err)
	}

	// We'll collect tokens from p in the next section.
	_ = p

	return 0
}
This code submits a prediction request to the Satellite, which queues it (predictions are resource-intensive, especially on the GPU) and returns a prediction handle.
Just like when interacting with any LLM, you can customize the request. The options mirror llama.cpp’s sampling and processing parameters: WithTopK limits sampling to the k most likely tokens, WithTopP (nucleus sampling) restricts it to the smallest set of tokens whose cumulative probability exceeds p, and WithBatch sets how many tokens are processed per batch. For example:
p, err := sdk.Predict(
	string(prompt),
	sdk.WithTopK(90),
	sdk.WithTopP(0.86),
	sdk.WithBatch(512),
)
You can find all possible options here.
Get Tokens
After submitting a prediction to the Satellite, you need to collect tokens. You can do so by calling p.Next(), which will block until a new token is available or the prediction is completed or canceled. Note that you can use NextWithTimeout if you’d like to set a deadline.
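Here’s a minimal sketch of how a deadline might look inside a token loop. I’m assuming NextWithTimeout accepts a time.Duration and returns the same (token, error) pair as Next; check the SDK source for the exact signature:
// Assumed signature: NextWithTimeout(time.Duration) (string, error).
// Remember to add "time" to your imports.
token, err := p.NextWithTimeout(5 * time.Second)
if err == io.EOF {
	// prediction completed
} else if err != nil {
	// deadline exceeded, or the prediction failed
	panic(err)
}
h.Write([]byte(token))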
Now, let’s wrap up our function:
package lib

import (
	"io"

	"github.com/samyfodil/taubyte-llama-satellite/sdk"
	"github.com/taubyte/go-sdk/event"
)

//export predict
func predict(e event.Event) uint32 {
	h, err := e.HTTP()
	if err != nil {
		return 1
	}
	defer h.Body().Close()

	// Use the POST body as the prompt.
	prompt, err := io.ReadAll(h.Body())
	if err != nil {
		panic(err)
	}

	// Submit the prompt to the Satellite.
	p, err := sdk.Predict(
		string(prompt),
		sdk.WithTopK(90),
		sdk.WithTopP(0.86),
		sdk.WithBatch(512),
	)
	if err != nil {
		panic(err)
	}

	// Stream tokens back to the client as they arrive.
	for {
		token, err := p.Next()
		if err == io.EOF {
			break // prediction complete
		} else if err != nil {
			panic(err)
		}
		h.Write([]byte(token))
		h.Flush() // push this token to the client right away
	}

	return 0
}
The call to h.Flush() will send the token to the client (browser) immediately. If you’d like to recreate the AI typing experience provided by ChatGPT, you can use something like:
await axios({
	method: "post",
	data: prompt,
	url: "<URL>",
	onDownloadProgress: (progressEvent) => {
		// Note: responseText is cumulative (all text received so far), so keep
		// track of what you've already consumed if you only want new tokens.
		const chunk = progressEvent.currentTarget.responseText;
		gotToken(chunk);
	},
})
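If you’d rather exercise the endpoint from Go, here’s a minimal streaming client sketch using only the standard library. The URL is a placeholder; substitute your DFunction’s endpoint:
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	// Placeholder: replace with your DFunction's URL.
	resp, err := http.Post("https://<URL>", "text/plain", strings.NewReader("Once upon a time"))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Read in small chunks so tokens appear as the server flushes them.
	buf := make([]byte, 256)
	for {
		n, err := resp.Body.Read(buf)
		if n > 0 {
			fmt.Print(string(buf[:n]))
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
	}
}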
Conclusion
In this guide, we’ve walked through how to leverage the LLM (or GPT) capabilities provided by 8ws on a Taubyte-based Cloud Computing Network. We’ve explored the concept of a LLAMA Satellite and its role in exporting LLM capabilities to the Taubyte Virtual Machine. Furthermore, we’ve discussed the importance and functionality of the LLAMA SDK, which makes interacting with the Satellite’s low-level functions more intuitive.
We’ve gone through a practical example of how to use these tools in a Taubyte project, specifically demonstrating how to fetch tokens and use the Predict method. We’ve also shown how you can fine-tune your requests to the SDK and manage the tokens returned by the Satellite. By the end of the guide, you should be equipped to create a serverless function on Taubyte that can generate predictions from user-provided prompts, similar to how AI like ChatGPT works.
Harnessing the power of Taubyte and the LLAMA Satellite, you’re now ready to incorporate large language model capabilities into your projects, bringing a new level of interactivity and AI-driven responses to your applications.
If you’d like to see these tools in action, check out chat.keenl.ink, a practical implementation of the principles outlined in this guide. It’s a great demonstration of the interactive possibilities these technologies provide. Happy coding!