Run AI Coding Assistants locally on your machine, without using the cloud, or leaking information
AI being creative: title image created with dreamshaper-8 using invoke.ai

A step-by-step guide to setting up AI coding assistants that run without a cloud subscription

TL;DR

With the availability of ollama and the Continue extension for VSCode/VSCodium and JetBrains, it is now possible to install and run state-of-the-art coding assistants powered by Large Language Models. This is not only free of charge, it also addresses the data privacy issue: no code or data will leave your computer.

Introduction

Obviously, Generative AI (GenAI) and Large Language Models (LLMs) are all the hype these days. One of the most beneficial use cases for me is coding assistants: models trained on a wealth of source code addressing all kinds of problems, written in numerous programming languages.

I find this is where a possible lack of precision and accuracy doesn't necessarily play such a big role, as I use these models to support coding repetitive tasks, like endless cascades of if/elseif/.../else code blocks, or a pattern I find myself writing a lot, like calling a RESTful interface.

The assistants make proposals as I type, and I only concur with the AI if I feel comfortable with the proposal --- a classic example of human in the loop: the ultimate decision is with me, not the AI. I still use sites such as Stack Overflow a lot; there is often some discussion or contextual information that helps me understand the question asked, or the solution proposed.

One of the concerns regarding GenAI is that these models often run in the cloud, provided as a service at some cost, and with a nagging feeling about just what data may be transmitted to the AI provider. There are, of course, subscriptions which promise higher (or full) levels of privacy, but wouldn't it be better to run these assistants locally, on your machine, with no connection to the cloud?

Not only would that save subscription costs (with the effort of getting a purchase order through a corporate purchasing system often being higher than the actual cost), it would also solve the data privacy issue: you could work on a super secret, highly valuable software project without any concern about privacy leaks.

So, let's install a state-of-the-art model, IBM's Granite, on your local machine and connect it to a coding environment (VSCodium), all without any data privacy concerns. It should work on machines with 16 GB RAM; more is better.

Download and install ollama

Ollama is an application, available from https://ollama.com/, which takes out the coding effort otherwise required to download and execute open source LLMs from e.g. Hugging Face. It is an easy-to-use tool which offers a command line AI chat frontend but, mainly, an API that can be used to trigger responses from the LLM.

After installing ollama, we download the two models that Continue uses by default. In a terminal or command window, enter the following commands to download llama3 and starcoder2:3b:

% ollama pull starcoder2:3b         

This will download the starcoder2 model and associated files onto your machine.

ollama progress showing the download progress of the starcoder2:3b model

Next, download the llama3 model:

% ollama pull llama3        
ollama progress showing the download progress of the llama3 model

Be aware of the size of the downloads: 1.7 GB and 4.7 GB, respectively. They may take some time to complete (and will use disk space inside the .ollama folder under your home folder).
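
To verify the downloads, you can list the models ollama has stored locally:

% ollama list

Both llama3 and starcoder2:3b should appear in the output, along with their sizes.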

Downloading the models before installing the extension avoids warning and error messages that otherwise show up during installation while the extension tries to connect to these models.
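
As an aside, the API mentioned above is plain HTTP on local port 11434, so you can already talk to a downloaded model from any language, independent of the IDE integration. Here is a minimal sketch in Python, assuming the default ollama port and the llama3 model pulled above (the prompt text is just an example):

import json
import urllib.request

# Ask the local ollama server for a one-shot, non-streaming
# completion from the llama3 model downloaded above.
payload = json.dumps({
    "model": "llama3",
    "prompt": "Explain a Python list comprehension in one sentence.",
    "stream": False,
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())["response"])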

Download Continue Extension for VS Code/VSCodium

Me, I am using VSCodium, which is built from the VS Code source code minus the telemetry and Microsoft-specific tweaks that are applied to the VS Code binary distribution; see https://vscodium.com/#why-does-this-exist for a description. It makes little sense to me to worry about personal data privacy if, at the same time, the IDE transmits "something" "somewhere".

Both tools behave identically and can use the same extensions. We will use Continue, which adds the interactive AI and tab completion functionality by connecting to ollama.

So, in VSCode or VSCodium (I will use the term IDE from now on), select the extensions icon on the left (the four squares with the top right one floating), then enter Continue in the search field. Choose Install.

Searching and finding the Continue extension for VSCode/VSCodium

The Continue extension recommends dragging its icon to the right-hand side of the window; this is a personal choice, and you can always change it later. After that, it asks which model setup you want to use: a service in the cloud, or a local service. Choose "local models", then "continue".

Setting up Continue to use local models

After that, the assistant would guide you through installing ollama; we already did this in our initial steps and also downloaded the default models. I found that not doing it that way may result in warning popups showing up until the ollama setup has finished and the two models have been downloaded.

Checking that ollama is installed and working and that all initially required local models are downloaded

Click continue.

And you are all set!

What to do with that code assistant

Well, you now have a Generative AI Large Language Model (GenAI LLM) on your machine, in your dev environment, ready to help you. Best of all, you did not have to pay anything for it, and, even better, it will not talk back to some anonymous cloud server which may, or may not, record what you do.

So, what can you do with it?

What it can do... Tab completion

Tab completion is my favourite feature. It scans the vicinity of the code you are editing and, based on what you type, makes intelligent (and I mean intelligent) suggestions for what the code you are about to type could look like.

My favourite feature... Tab completion while writing code

In the above demo, it understands that the record dictionary has a field datetime_date, so it suggests this field as the criterion for an if statement inside the loop over the key/value entries of that record.

This, to me, is the sweet spot of coding assistants. A lot of code follows certain patterns, with the variations often being only the names of variables or fields; the logic is very similar in many, many cases.

Also, this is only a suggestion; if you type code that ends up with a different logic, tab completion will pick this up and may adapt to that logic.
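
Since the demo is a screen recording, here is a hypothetical reconstruction in Python of the pattern described above (the contents of record are made up; only the shape of the suggestion matters):

# A record dictionary with a datetime_date field, as in the demo.
record = {"datetime_date": "2024-05-06", "status": "open", "amount": 42}

# Loop over the key/value entries; the if condition is the kind of
# suggestion tab completion makes after seeing the field name above.
for key, value in record.items():
    if key == "datetime_date":
        print(f"record dated {value}")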

What it can do... Write code for me

You can use the prompt to describe the desired functionality. Adding more context (like specifying that this should be done in Python and use command line arguments) results in a truly neat solution:

An interesting option to get started on a piece of functionality by asking the assistant for a starting point
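
Screenshots do not reproduce well in text, so here is a hedged illustration of the kind of starting point such a prompt can yield. Assume the request was "write a Python script that counts the lines in a file, taking the file name as a command line argument" (the task is invented for illustration):

import argparse

# Read the file name from the command line, as the prompt requested.
parser = argparse.ArgumentParser(description="Count the lines in a file.")
parser.add_argument("filename", help="path to the file to count")
args = parser.parse_args()

# Count the lines and report the result.
with open(args.filename) as f:
    line_count = sum(1 for _ in f)
print(f"{args.filename}: {line_count} lines")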

You can continue asking questions, as the model remembers the context; for example, ask which additional packages need to be installed to run the above code.

A really useful follow-up, which is sometimes overlooked in answers found on the internet: how can I make this work for me?

I found that the answers provided differ significantly between models and the task at hand.

What it can do... What does this code do?

You can select a piece of code and then, in the Continue window on the right, ask any question about that code, like: what does it do? If you trust the results, it is safe to use.

Explaining functionality of code

What it can do... Is this code safe?

This one is actually very tricky; the answers are not always correct, but this could be very, very helpful. The example code is vulnerable to what is called "SQL injection", one of the most commonly exploited vulnerabilities (although, in real applications, this would not be as obvious as in this example). The smaller Granite model (which we will install in a minute, scroll down for instructions) thinks the code is safe, which is not the case:

The 3-billion-parameter model thinks this code is safe, which it isn't

The largest model, though, arrives at the correct conclusion, explains what SQL injection exploits could look like, and suggests a safe solution using SQLAlchemy.

So, this is a really exciting feature, but --- be cautious, "safe" may not be safe. Using the large model may help, as may trying a variety of other models. Do not expect the AI assistants to always be correct when they state code is safe, but they may be able to spot a vulnerability you had forgotten.
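
For readers unfamiliar with the vulnerability, here is a minimal sketch of the pattern in question (not the exact snippet from the screenshots): the same lookup once built by string interpolation, which is injectable, and once parameterized, which is safe:

import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (name TEXT, role TEXT)")
user_input = "alice' OR '1'='1"  # crafted input an attacker could supply

# Vulnerable: the input becomes part of the SQL text itself, so the
# injected OR clause makes the WHERE condition match every row.
query = f"SELECT role FROM users WHERE name = '{user_input}'"
unsafe_rows = connection.execute(query).fetchall()

# Safe: a parameterized query keeps the input as data, never as SQL.
safe_rows = connection.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)
).fetchall()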

Adding IBM Granite

You can switch coding assistant models. I was particularly interested in using IBM's recently announced open source code models (https://research.ibm.com/blog/granite-code-models-open-source). These are also available on GitHub at https://github.com/ibm-granite/granite-code-models and come in multiple flavours, model sizes, and capabilities, ranging from 3 billion to 34 billion parameters and 2 GB to 20 GB file size. The large models will require fast machines with lots of memory; a Mac Studio M1 Ultra with 128 GB RAM does run the largest model in a usable fashion, but it feels like watching a ticker tape.

Download IBM Granite

Similar to what we did initially, download the model of your choice by typing

% ollama pull granite-code        

in a command shell/terminal window. See https://ollama.com/library/granite-code for available model sizes. To download the largest model (around 20 GB), enter

% ollama pull granite-code:34b        

For a model of that size you will need a machine with lots of RAM; Granite 34B works on a 64 GB machine on my end.
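
Before wiring it into the IDE, you can sanity-check the download by querying the model once directly from the terminal:

% ollama run granite-code "write a hello world program in python"

The model should print a short Python program and return you to the shell.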

Configure Continue to also offer IBM Granite

In the Continue window, click on the gear symbol at the lower right. This will open the extension's config file.

The default configuration of Continue

This file contains the model provider (we use ollama), the title for the model, and the model name (which you will need to pull using ollama). We edit the file from its initial default, which contains the entries for Llama3 and Starcoder2:

{
  "models": [
    {
      "title": "Llama 3",
      "provider": "ollama",
      "model": "llama3"
    },
    {
      "title": "Ollama",
      "provider": "ollama",
      "model": "AUTODETECT"
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Starcoder 3b",
    "provider": "ollama",
    "model": "starcoder2:3b"
  },
  "allowAnonymousTelemetry": true,
  "embeddingsProvider": {
    "provider": "transformers.js"
  }
}        

Let's add the Granite model to the first section, labelled "models", like so:

{
  "models": [
    {
      "title": "Llama 3",
      "provider": "ollama",
      "model": "llama3"
    },
    {
      "title": "Ollama",
      "provider": "ollama",
      "model": "AUTODETECT"
    },
    {
      "title": "IBM Granite 3b",
      "provider": "ollama",
      "model": "granite-code:3b"
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Starcoder 3b",
    "provider": "ollama",
    "model": "starcoder2:3b"
  },
  "allowAnonymousTelemetry": false,
  "embeddingsProvider": {
    "provider": "transformers.js"
  }
}        

Make sure to add a comma after the closing curly brace of the entry reading AUTODETECT. As we are concerned about privacy, consider changing "allowAnonymousTelemetry": true to read "allowAnonymousTelemetry": false. Continue.dev explain their telemetry use openly at https://docs.continue.dev/telemetry; you may consider leaving the feature on after you have satisfied yourself that no sensitive data would be transmitted.

After you save the file, the model selection box (the small up/down selection box in the extension window on the right) contains the newly defined entry IBM Granite 3b:

Continue configured for IBM Granite (and telemetry disabled)

...and you are good to go!

Summary

Data democratization is close to my heart, and thanks to a thriving open source community building tools like ollama and Continue, and to IBM's open source policy, it is now possible to use GenAI LLMs locally, without a subscription and without leaking sensitive internal IP. It also allows testing multiple models to find the one that suits your needs best.

Also, while I am not affiliated in any way with the Continue.dev team, they also offer enterprise functionality which could be attractive for some users, including the ability to fine-tune the model on an internal code base.

Lastly, a locally deployed coding assistant can even be used on a laptop in airplane mode.

About Me

I love working with data, gaining insights from it that I can share, and explainable machine learning/AI. I think that, currently, AI is really suited to helping and inspiring humans when used correctly.

Except for the title image and the coding assistant screenshots, I did not use AI to write this article.

