Using Groq from Semantic Kernel
András Velvárt
AI Tinkerer. Head of AI at HexaIO. AR/VR, HoloLens Consultant. 17 times Microsoft MVP. CEO at Response Ltd. Pluralsight and LinkedIn Learning Author, International Speaker.
What is Groq?
Groq (www.groq.com) is an engine for running Large Language Models at insane speeds: text generation is 20-50 times faster than with the traditional approach of using dedicated GPUs in a data center.
Groq achieves this by using a combination of streamlined hardware and special software. You can learn more about how and why it works on their explanation page - but for now, it is enough to know that Groq can be insanely fast (and therefore efficient and cheap).
Groq can run a number of open source AI models, such as Llama 3, which is the model used in the code later in this article.
What is Semantic Kernel?
Semantic Kernel is Microsoft's approach to an AI middleware, and it is the best way to create real-world, production and enterprise-ready AI applications in C#. The Python and Java modules are also being developed.
The Problem
Semantic Kernel supports language models running on OpenAI's Services or Azure OpenAI. It does not explicitly provide support for any other service, such as Groq.
The Solution
Fortunately, Groq offers an OpenAI-compatible service endpoint at the base URL:
https://api.groq.com/openai/v1
If only we could trick Semantic Kernel into using this endpoint instead of the standard OpenAI endpoints, we should be good. There are some limitations compared to the full OpenAI / Azure OpenAI Service, but the basics should work.
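To make "OpenAI compatible" a bit more concrete, here is a minimal sketch of a raw chat completion request built against that base URL, without Semantic Kernel. It assumes the endpoint follows the usual OpenAI convention of a /chat/completions path and Bearer-token authentication; the class and method names (GroqRequestSketch, BuildChatRequest) are hypothetical and only serve the illustration. The request is built but not sent.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Hypothetical helper, for illustration only.
public static class GroqRequestSketch
{
    // Base URL quoted in the article.
    public const string BaseUrl = "https://api.groq.com/openai/v1";

    // Builds (but does not send) an OpenAI-style chat completion request.
    // Assumption: the path and payload shape follow the OpenAI chat completions convention.
    public static HttpRequestMessage BuildChatRequest(string apiKey, string model, string userMessage)
    {
        var payload = new
        {
            model,
            messages = new[] { new { role = "user", content = userMessage } }
        };

        var request = new HttpRequestMessage(HttpMethod.Post, $"{BaseUrl}/chat/completions")
        {
            Content = new StringContent(JsonSerializer.Serialize(payload), Encoding.UTF8, "application/json")
        };

        // The API key travels as a Bearer token, as it does with OpenAI.
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        return request;
    }
}
```

Sending the result through a plain HttpClient (for example with SendAsync) would be the next step. The point is that only the host and path prefix differ from an OpenAI call, which is what makes the URL trick possible.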
The Code
Semantic Kernel doesn't allow us to change the base URL of the service it uses, but it does allow injecting a custom HttpClient that it will use for the requests. So, if we can somehow hijack the HTTP calls and change the URLs, we should be fine.
We can do this by creating the HttpClient with a custom delegating handler, and using that for the OpenAI ChatCompletion service:
// Assumes kernelBuilder comes from Kernel.CreateBuilder() and key holds your Groq API key.
HttpClient httpClient = new(new CustomDelegatingHandler());
kernelBuilder.AddOpenAIChatCompletion("llama3-70b-8192", key, httpClient: httpClient);
The actual CustomDelegatingHandler looks like this:
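// Redirects every request aimed at the OpenAI endpoint to Groq's OpenAI-compatible endpoint.
// Uses a C# 12 primary constructor; the inner HttpClientHandler performs the actual network call.
public class CustomDelegatingHandler() : DelegatingHandler(new HttpClientHandler())
{
    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // RequestUri is nullable, so guard before rewriting it.
        if (request.RequestUri is not null)
        {
            // Swap the OpenAI base URL for the Groq one; the rest of the path stays untouched.
            request.RequestUri = new Uri(request.RequestUri.ToString().Replace("https://api.openai.com/v1", "https://api.groq.com/openai/v1"));
        }
        return await base.SendAsync(request, cancellationToken);
    }
}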
The key here is that we intercept every HTTP request and replace "api.openai.com/v1" with "api.groq.com/openai/v1" in the URL.
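If you want to convince yourself that the rewrite works before spending any API calls, one option is to run the same replacement against a stub instead of the network. The sketch below is not part of the original sample: RewritingHandler is a hypothetical variant of the handler above that accepts its inner handler as a parameter, and RecordingHandler and RewriteCheck are made-up helpers that simply record the URL that would have been called.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical variant of the handler above: same rewrite, but the inner handler
// is passed in so that a stub can stand in for the network.
public class RewritingHandler(HttpMessageHandler inner) : DelegatingHandler(inner)
{
    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        if (request.RequestUri is not null)
        {
            request.RequestUri = new Uri(request.RequestUri.ToString()
                .Replace("https://api.openai.com/v1", "https://api.groq.com/openai/v1"));
        }
        return base.SendAsync(request, cancellationToken);
    }
}

// Hypothetical stub: records the URI it receives and answers 200 OK without any network I/O.
public class RecordingHandler : HttpMessageHandler
{
    public Uri? LastUri { get; private set; }

    protected override Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        LastUri = request.RequestUri;
        return Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK));
    }
}

public static class RewriteCheck
{
    // Sends a GET for the given URL through the rewriting handler and
    // returns the URI that reached the stub.
    public static async Task<Uri?> SendThroughAsync(string url)
    {
        var recorder = new RecordingHandler();
        using var client = new HttpClient(new RewritingHandler(recorder));
        using var response = await client.GetAsync(url);
        return recorder.LastUri;
    }
}
```

Calling RewriteCheck.SendThroughAsync("https://api.openai.com/v1/chat/completions") should hand back a URI on api.groq.com/openai/v1, while URLs that do not start with the OpenAI base pass through unchanged. Taking the inner handler as a parameter is a design choice made here purely for testability; the handler in the article hardcodes HttpClientHandler, which is fine for real use.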
If you want to see a more complex example, I have published a complete chat sample with Semantic Kernel and Groq at https://gist.github.com/vbandi/c598232952729a1828374fb76943cfcd