Embracing Open LLMs

TL;DR: As we enter the era of Large Language Models, the landscape is evolving rapidly, with the hyperscalers leading the way in developing and commercialising powerful LLMs. Undoubtedly, their primary goal is to extend these capabilities to other cloud-platform-centric services to boost margins. However, the rise of open-source LLMs is poised to challenge the giants by offering greater control over data privacy, effectively addressing biases, ensuring offline access, and enabling integrations without lock-in.

Premise

While proprietary LLMs, such as Google's latest Gemini and OpenAI's GPT-4, maintain a dominant market position, the recent surge in open-source models is turning heads. Meta's release of Llama2 kickstarted this trend, with other players quickly following suit to keep pace. Notable examples include Mistral's recent bold move to share an 8x7b MoE model via a torrent link, and the earlier Yi-34B, a bilingual LLM based on the Llama2 architecture released by 01.ai, whose founder is none other than Kai-Fu Lee, the ex-Apple/Microsoft/Google executive turned AI-focused VC.

As enterprises seek to leverage LLMs, they now face a complex decision when integrating LLM technology into their operations. While the hyperscalers offer comprehensive tools and services such as AWS Bedrock & SageMaker, GCP Vertex AI, and Azure AI, there is an underlying tension regarding data privacy and control. Open-source LLMs present a compelling alternative: self-hosting capability, data sovereignty, extensive customisation, and, more importantly, assurance that corporate data never inadvertently contributes to the training of competing LLMs.

Outlook

Looking ahead, as LLM technology advances to power autonomous agents capable of real-life tasks (a vision endorsed by Bill Gates), integration with existing business services is set to explode. This will bring to the forefront critical security concerns that developers have been grappling with on a smaller scale, often manually managing functions and calls in isolation. The ability to self-host these AI technologies through Open LLMs amplifies the benefits: control over data privacy, full control over unrestricted integrations with business services, and arguably better damage control by minimising the involvement of third parties.

The autonomous agent tech is already here as part of the LangChain framework, starting with direct function calls, and there's no shortage of effort in the open-source community; look at the growing curated lists of AI agents, or the most-starred GitHub projects in this domain. Other commercial offerings like Anyscale, which also leverage Open LLMs, are expected to pop up. It's a matter of time before an open standard/format for Agents is established to handle sophisticated, complex business processes instead of merely asking for today's weather, paving the road for mass adoption now that today's Open LLMs have reached sufficient accuracy in understanding intent. The next frontier will undoubtedly involve democratised Open LLM-powered agents (LLM as orchestrator, Agent as executor), and one cannot deny that there exists a non-zero possibility of open source outcompeting Big Tech in this area. While I'm writing this, a sense of "Open Source is SO BACK" kicks in.

Effortless Consumption of Open LLMs

To illustrate the simplicity of running open-source LLMs, the following shows a quick 10-minute effort to deploy your own self-hosted GPT using readily available tools, without delving into full-blown data science, mastering PyTorch and Transformers, or tuning CUDA (although the powerful aspects always hide in the details). With this simple setup, you can carry your own GPT around during the holiday season. Not only do you break away from the mainstream GPT crowd, you can also assure friends and family that your questions never flow into proprietary LLMs, so you can be bold in asking anything: the data stays contained.

Tools needed: Docker, Ollama, ollama-webui (all open source), and ngrok (free tier is sufficient here)

The reproducible steps are as follows:

1. Deployment

  • Provision an Ubuntu Virtual Machine (I'm using Multipass on an Apple MBP M2), as sketched below. As LLMs are resource-intensive, allocate the right amount of memory based on the chosen LLM. A rule of thumb: for a 7b model, allocate at least 8GB of RAM; for a 13b model, at least 16GB; for a 70b model and above, at least 64GB.
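
A minimal sketch, assuming Multipass is installed on the host; the VM name and sizes are illustrative, and exact flag names can vary between Multipass versions:

# Create an Ubuntu VM sized for a 7b-13b model (adjust --memory per the rule of thumb above)
multipass launch --name openllm --cpus 4 --memory 16G --disk 40G

# Open a shell in the new VM
multipass shell openllm
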
  • If you are running a VM in the cloud, create a network security group that allows port 22 (SSH) from your source IP; that's the only inbound port you need to open in the firewall.
  • Install Docker:

# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

# Install the latest version:
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin        
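
Optionally, verify the installation and allow your user to run Docker without sudo (a standard post-install step; log out and back in for the group change to take effect):

sudo docker run hello-world     # prints a confirmation message from the hello-world image
sudo usermod -aG docker $USER   # add the current user to the docker group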

  • Install Ollama:

curl -fsSL https://ollama.ai/install.sh | sh
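
A quick sanity check, assuming the installer registered the systemd service as expected (the API's root endpoint replies with a plain "Ollama is running"):

systemctl status ollama       # service should be active (running)
curl http://localhost:11434   # expect: Ollama is running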

  • Modify the Ollama service bind address to 0.0.0.0:

sudo mkdir -p /etc/systemd/system/ollama.service.d
echo '[Service]' | sudo tee -a /etc/systemd/system/ollama.service.d/environment.conf
echo 'Environment="OLLAMA_HOST=0.0.0.0:11434"' | sudo tee -a /etc/systemd/system/ollama.service.d/environment.conf
sudo systemctl daemon-reload
sudo systemctl restart ollama
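
To confirm the new bind address took effect, hit the API from outside the VM; <vm-ip> below is a placeholder for your VM's address:

curl http://<vm-ip>:11434   # expect: Ollama is running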

  • Install ollama-webui using Docker:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway --name ollama-webui --restart always ghcr.io/ollama-webui/ollama-webui:main        
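
Before exposing anything publicly, a quick local check that the WebUI container is up and answering on port 3000:

docker ps --filter name=ollama-webui   # container STATUS should show Up
curl -I http://localhost:3000          # expect an HTTP 200 response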

  • Register for an ngrok account, and obtain your ngrok Authtoken.
  • Install the snap package manager (package snapd; it comes preinstalled on most Ubuntu releases), then install & configure ngrok:

sudo apt install snapd
sudo snap install ngrok
ngrok config add-authtoken <your-authtoken-here>

  • Start the ngrok tunnel, exposing port 3000 as it's used by ollama-webui. We can make use of ngrok's built-in security features, i.e. secure tunnel, HTTPS, and authentication. Use OAuth for simple identity security: you can either allow an entire domain (so you can share access with co-workers or family) or limit access to yourself.

ngrok http 3000 --oauth google --oauth-allow-domain your-domain.com

# or

ngrok http 3000 --oauth google --oauth-allow-email [email protected]        

ngrok                                                           (Ctrl+C to quit)

Build better APIs with ngrok. Early access: ngrok.com/early-access

Session Status                online
Account                       [email protected] (Plan: Free)
Update                        update available (version 3.5.0, Ctrl-U to update)
Version                       3.3.5
Region                        Japan (jp)
Latency                       8ms
Web Interface                 https://127.0.0.1:4040
Forwarding                    https://3772-2400-8905-00-f03c-94ff-fe24-b6e.ngrok-free.app -> https://localhost:3000

Connections                   ttl     opn     rt1     rt5     p50     p90
                              25      0       0.00    0.02    0.19    6.64

HTTP Requests
-------------

POST /ollama/api/generate
POST /ollama/api/chat                 200 OK
GET  /user.png                        200 OK
GET  /ollama/api/tags                 200 OK

2. Set up through Ollama WebUI

  • Access the newly set-up Ollama WebUI and log in via the OAuth prompt
  • Go to Settings > Models
  • Pull your preferred Open LLM from the extensive model catalogue in the Ollama library. Here are some popular choices: llama2 (general purpose, released by Meta), mistral (general purpose, in its 7b version), mixtral (general purpose, 8x7B MoE), llava (tailored for vision assistance), yi (general bilingual LLM), codellama (tuned for coding domains based on Llama2, released by Meta).
  • If none of these models meets your needs, visit HuggingFace to explore over 1,000 other open-source LLMs and import the GGUF into Ollama, as sketched below. Get ready to be amazed!
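
A minimal sketch of both routes: pulling from the catalogue via the CLI is equivalent to pulling through the WebUI, and a GGUF downloaded from HuggingFace can be registered through a Modelfile (the GGUF file name and model name below are hypothetical):

# Pull a catalogue model from the CLI (same result as Settings > Models)
ollama pull mistral

# Import a downloaded GGUF by pointing a Modelfile at it
cat > Modelfile <<'EOF'
FROM ./example-7b.Q4_K_M.gguf
EOF
ollama create example-7b -f Modelfile
ollama run example-7b "Hello there!"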

3. Interacting with Open LLMs

Below is a series of prompts across various LLMs.

  • Mistral 7b: Lazy prompt, lazy output

[Screenshot: Mistral 7B]

  • Mixtral 8x7B: Policy framing, anyone? LLM doing the best it can:

[Screenshot: Mixtral 8x7B MoE, the strongest open-weight model among all LLMs]

Have to give credit to the nuances it can provide, e.g.: "Malaysia has abundant sunlight", "Evaluate the potential for wind energy in coastal areas, especially in Eastern Malaysia", "Collaborate with ASEAN neighbors to share best practices, technology, and resources in renewable energy development.", "These recommendations prioritize practical solutions that leverage Malaysia's natural resources and regional partnerships while addressing financial constraints.". This is beyond ChatGPT 3.5 level.

  • Codellama: Decent code generation, nothing too impressive, but the debugging skill is excellent

[Screenshot: Codellama 7b]

  • Yi:34b: the best bilingual (English, Chinese) LLM so far

[Screenshot: Yi:34b-chat, available for commercial use]

  • Let's experiment with a system prompt for Mixtral; you can configure this in Settings > General. The prompt I used:

SYSTEM """
You are the Santa Claus, acting as an assistant. You are to bring joy and happiness to everyone including children of all ages.
"""        
[Screenshot: Mixtral 8x7B]

There's obviously an element of confusion in the user prompt; the model didn't generate a list of Christmas-related snacks. Is that good?
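
The same system prompt can also be baked into a reusable model through an Ollama Modelfile rather than the WebUI settings. A minimal sketch (the model name santa is arbitrary, and mixtral must already be pulled):

# Wrap the base model with the system prompt
cat > Modelfile <<'EOF'
FROM mixtral
SYSTEM """
You are the Santa Claus, acting as an assistant. You are to bring joy and happiness to everyone including children of all ages.
"""
EOF

ollama create santa -f Modelfile
ollama run santa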

  • One can also explore the Open LLM Leaderboard on HuggingFace and try out the top-ranking models for a firsthand experience. You can't help but be impressed by the development cycles and the effort poured into competing for the highest rank. Some models are genuinely exceptional, while others may aim to game the scoring system; it's our responsibility to identify the real gems and state-of-the-art models. Happy prompting!
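
Beyond chatting in the WebUI, the same models can be scripted against Ollama's REST API on port 11434; a minimal sketch using the generate endpoint:

# One-shot completion; set "stream": true for token-by-token output
curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'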

Reflection

A week ago, I shared my spontaneous view that open-source LLMs will have a more profound impact than proprietary LLMs.

This view is not solely based on the claim made by a Google researcher in a leaked memo ("We Have No Moat, And Neither Does OpenAI"), but rather on the idea that the collective effectiveness and efficiency of thousands of decentralised teams, working collaboratively in open-source communities towards a common goal, can surpass the capabilities of individual industry giants, with micro-level progress accumulating into substantial advancements.

Putting aside its impact on enterprises, unrestricted and democratised Open LLMs have a profound effect on individual growth. With Open LLMs, one could speed up hobbyist projects 10x, e.g. smart homes or a personal health app supported by a trusted coding assistant, or even take on a bigger side project like improving crop yields for a family farm, all without relying on commercial GPTs. The most explosive benefits emerge when the outcomes are intricately linked to physical activities, amplifying the transformative power of democratised Open LLMs in shaping not just the digital landscape but the very fabric of livelihood, community, and the broader ecosystem.

With Open LLMs at everyone's disposal, builders will build faster, and this flywheel effect will only accelerate innovation in this space. I strongly believe there's never been a better time to build.

Wishing you a very happy festive season and Happy New Year!

Disclaimer

The views expressed herein are my own and do not reflect the official stance of any organization. The technology discussed is rapidly evolving, and as such, this article serves as a snapshot of the current landscape at the time of writing.




