New Open-source LLM: Google Gemma
Intro
Google has recently released its newest open-source AI models: Gemma 2B and 7B. These are competitors to other open-source models like Llama-2 and Mistral, and they appear superior when compared via benchmarks. Google has also made sure to provide many options for deploying these models, including a special partnership with Nvidia.
Google released the model in two sizes: a 2B and a 7B parameter version. On benchmarks, the 7B Gemma beats Llama-2 13B. Overall, Gemma appears clearly superior to models with similar parameter counts.
Deployment
Unlike Llama and Mistral, however, Google seems to be focusing much more on the deployment of these models, whether in the cloud or locally. Gemma is already available on Google Cloud via Vertex AI or Kubernetes (GKE). There are also pre-made Kaggle and Colab notebooks with a Gemma quickstart.
When it comes to local deployment, Google teamed up with Nvidia to bring Gemma to the "Chat with RTX" app, which lets users of Nvidia's 30- and 40-series GPUs run Gemma locally with minimal setup. Gemma is also supported in TensorRT-LLM and the NeMo framework if you want to run it on Nvidia GPUs more directly.
Using Gemma
As mentioned before, there are a couple of ways to actually use Gemma. Since it's such a small model, I think its most practical use is going to be running it locally. This is extremely easy to do with Hugging Face's transformers library:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch, time

# Load the instruction-tuned 2B checkpoint in bfloat16
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", torch_dtype=torch.bfloat16)

input_text = "Write me a poem"
input_ids = tokenizer(input_text, return_tensors="pt")  # add .to("cuda") here (and move the model) to run on GPU

t1 = time.time()
outputs = model.generate(**input_ids, max_length=200)
t2 = time.time()

print(tokenizer.decode(outputs[0]))
print("Time: ", t2 - t1, "s")
print("Time per token: ", (t2 - t1) / len(outputs[0]), "s/token")
print("Tokens per second: ", len(outputs[0]) / (t2 - t1), "token/s")
On a CPU, generation came out to around 2 tokens per second and used only about 2 gigabytes of RAM. This will obviously vary with whatever hardware you have.
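Since google/gemma-2b-it is the instruction-tuned checkpoint, you can also format prompts with the tokenizer's built-in chat template instead of passing raw text. Here is a minimal sketch, reusing the tokenizer and model loaded above:

# Format a user message with the model's chat template (instruction-tuned checkpoint)
messages = [{"role": "user", "content": "Write me a poem"}]
chat_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
chat_outputs = model.generate(chat_ids, max_new_tokens=200)
print(tokenizer.decode(chat_outputs[0], skip_special_tokens=True))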
Another great option is to grab a tool like LM Studio that sets all of this up for you. It provides a clean front end and uses the llama.cpp project on the backend. llama.cpp is an incredibly optimized library for running large language models: it supports quantized models, a wide variety of LLMs, and splitting work between CPU and GPU.
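If you would rather script against llama.cpp yourself instead of using LM Studio's UI, the llama-cpp-python bindings offer a similar workflow. This is a rough sketch; the GGUF filename is just a placeholder for whichever quantized Gemma file you download:

from llama_cpp import Llama

# Load a quantized Gemma GGUF file (placeholder filename - use whatever
# quantization you downloaded, e.g. a 4-bit variant to keep memory low)
llm = Llama(
    model_path="./gemma-2b-it.Q4_K_M.gguf",
    n_ctx=2048,       # context window
    n_gpu_layers=0,   # set > 0 to offload some layers to the GPU if you have one
)

output = llm("Write me a poem", max_tokens=200)
print(output["choices"][0]["text"])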
Conclusion
Google has made some groundbreaking moves in the AI field. They were already at the forefront of AI research, so it's natural that they are coming to the forefront of commercial AI as well. There have been, however, several controversies around their image generation. Companies like Google pay very close attention to bias within their training data in order to keep that bias from being reflected in the model. Those anti-bias measures ended up backfiring when their models started generating images of Black Nazis and other inaccurate depictions. Although eliminating bias is a noble cause, these kinds of historical inaccuracies are practically a form of racism themselves. Especially with open-source models, this is going to be an increasingly important issue that these companies need to address.