Meta Llama 3: A Deep Dive & Zuck's Long Game
Llama = Large Language Model from Meta AI, or a cyber-llama like this one :)

Anyone who has followed my work for a while knows I was not a fan of Facebook or Mark Zuckerberg long before that stance became trendy. However, after spending hours over the past few days deep-diving into Meta's Llama 3, I must admit I'm impressed by both its technical merits and the company's long-term play in the global AI race.

Llama 3

Meta's Llama 3 is the latest iteration of the company's advanced artificial intelligence model. It is a language model trained on a diverse range of open data. Llama 3 is a transformer-based model, which means it uses self-attention mechanisms to generate human-like text based on the input it receives. This release features pre-trained and instruction-fine-tuned language models with 8B and 70B parameters that can support a broad range of use cases. Currently, Llama 3 supports only English, but in the coming months Meta plans to add support for more than 30 languages and release a 400B-parameter model (still in training), making Llama 3 a fully multilingual, multimodal LLM.
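For readers new to the term, the "self-attention" mechanism mentioned above can be sketched in a few lines of NumPy. This is a toy illustration of the core computation, not Meta's implementation: every token scores its similarity to every other token, and the output for each token is a softmax-weighted mix of all the value vectors.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])       # similarity of every token to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ v                            # each output is a weighted mix of values

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((4, d))                   # 4 tokens, 8-dim embeddings
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): one contextualized vector per token
```

A real transformer stacks many of these layers, splits them into multiple heads, and adds feed-forward blocks, but the mixing step is the same idea.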

Llama 3 is designed to understand and generate human language in a way that is as natural and accurate as possible. It can answer questions, write essays, summarize text, translate languages, and generate creative content like poetry or stories. It's also capable of following a conversation or a line of thought over multiple exchanges, making it a powerful tool for chatbots and virtual assistants.
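That multi-turn ability is driven by how the conversation is serialized into the prompt. Per Meta's model card, the instruct models use special tokens to delimit each turn; here is a minimal formatter sketch of that layout (written from the published format, so treat the details as something to verify against the official docs before relying on it):

```python
def format_llama3_prompt(messages):
    """Render a chat history into the Llama 3 Instruct prompt format.

    Each message is a dict with 'role' ('system', 'user', or 'assistant')
    and 'content'. The trailing assistant header cues the model to respond.
    """
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Leave an open assistant turn for the model to complete.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

p = format_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize Llama 3 in one sentence."},
])
```

In practice you would let a library's chat-templating utility do this for you, but seeing the raw layout makes it clear how the model keeps track of who said what across turns.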

The key areas I'm most impressed with are:

  • Performance
  • Emphasis on efficiency instead of size
  • Safety, security, responsibility (who? Meta??)
  • Most importantly - it is Open Source!

Let's break them down.

Performance

OpenAI's GPT-4, or Generative Pretrained Transformer 4, was arguably the most advanced large language model before Llama 3 entered the room. Like Llama 3, GPT-4 is a transformer-based model that uses machine learning to generate human-like text. Both models are capable of impressive feats of natural language understanding and generation. The key difference between the two, however, is size vs efficiency. OpenAI, so far, has been pursuing larger and larger models with more and more parameters, which certainly aligns with NVIDIA's interests. I've been saying that you don't get to Mars by building taller and taller buildings on Earth. As with other things in life, bigger doesn't always mean better. Llama 3 managed to pack far more knowledge, with improved reasoning (hard to do), into dense 8B and 70B models. While everyone else is scaling sparse MoEs (Mixture of Experts), Meta worked on efficiency and effectiveness. Improvements in the post-training procedures substantially reduced false refusal rates, improved alignment, and increased diversity in model responses. The dev community also saw improved capabilities like reasoning, code generation, and instruction following, making Llama 3 more steerable. <GitHub Evaluation Details>
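It's worth seeing where the "8B" label actually comes from. Using the dimensions from Meta's published 8B config (hidden size 4096, 32 layers, grouped-query attention with 8 KV heads, SwiGLU intermediate size 14336, vocabulary 128256), a back-of-the-envelope count lands almost exactly on 8 billion. The dimensions below are assumptions pulled from the model card, not an official accounting:

```python
# Rough parameter count for a dense Llama-3-8B-style transformer.
hidden, layers, vocab = 4096, 32, 128256
n_heads, n_kv_heads = 32, 8           # grouped-query attention: fewer K/V heads
ffn = 14336                           # SwiGLU intermediate size
head_dim = hidden // n_heads          # 128

kv_dim = n_kv_heads * head_dim        # 1024: K/V projections shrink under GQA
attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q and O full-size; K and V reduced
mlp = 3 * hidden * ffn                # gate, up, and down projections (SwiGLU)
per_layer = attn + mlp + 2 * hidden   # plus two RMSNorm weight vectors

embeddings = 2 * vocab * hidden       # input embedding + untied output head
total = layers * per_layer + embeddings + hidden  # + final norm

print(f"{total / 1e9:.2f}B parameters")  # prints "8.03B parameters"
```

The same arithmetic also shows why GQA matters for efficiency: with full multi-head K/V projections the attention block would be roughly 60% larger per layer, with a correspondingly bigger KV cache at inference time.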

In contrast, GPT-3 had 175B parameters. While OpenAI has never disclosed the parameter count for GPT-4, it is widely speculated to exceed 1 trillion, with massive implications for GPU prices and GPU producers' share prices.

Source: Meta

This is not to say that being GPU-rich isn't important anymore. I wish we were heading in that direction, but not yet. Still, Llama 3 demonstrated that organizations can be GPU-rich but utilize them inefficiently by obsessing over size instead of innovating on efficiency.

Safety, Security, Responsibility

Typing these three words about Facebook's holding company slightly disorientates me. Perhaps Yann LeCun is leading a different culture in the AI arm of the organization, or Zuckerberg is finally growing up? Either way, Meta has consistently made the right decisions in its AI development. Llama 3 has made considerable efforts toward the safety, security, and responsible use of the open-source model. How to handle ethical and safety considerations is now a differentiator between OpenAI, which is supposed to be open, and Meta. Both companies have tried to ensure their models are used responsibly, but their approaches differ. OpenAI has implemented measures to prevent GPT-4 from generating harmful or inappropriate content, but it's a proprietary model with no disclosure of the model details or the training data. We would have to trust the bros at the top to do the right thing. Llama 3 is open source, so Meta has focused on making it understand and respect user boundaries at the system level. Along with Llama 3, Meta also released Llama Guard 2, Code Shield, and CyberSec Eval 2. In addition, your data doesn't have to leave your machine: privacy protection from Meta, if one can imagine it.

Source: Meta

<Paper: CyberSecEval2>

<Meta Llama 3 Responsible Use Guide (RUG)>

Jailbreaks aren't as prevalent for Llama 3 as for GPT-4 (and it's definitely MUCH better than Gemini), yet Meta achieved that while reducing censorship compared to Llama 2 in order to cut false refusals.

Source: Ollama


Zuck's Long Game

On the surface, Meta's strategy for making Llama 3 open source is part of the company's broader commitment to transparency and collaboration in the field of artificial intelligence. By making Llama 3 open source, Meta aims to allow researchers and developers worldwide to use and improve upon their model.

Open-sourcing Llama 3 also allows for greater scrutiny of the model. This means that the wider AI community can identify and address any potential biases in the model. It also allows for the development of new applications and use cases for the model that the original developers may not have anticipated.

Beneath the surface, making Llama 3 open source is a win-win strategy for Meta. Llama 3's performance and efficiency make it hard to justify using GPT-4. Even if OpenAI releases GPT-5, there is a high chance it plateaus, as scaling up model size produces diminishing returns. Moreover, Llama 3 400B is still in training, and its performance might be in the same ballpark if it tracks the 8B and 70B trend. By making Llama 3 open source, Zuck might have killed OpenAI. OpenAI has about $2B in revenue and is most likely loss-making. Meta makes over $100B in gross profit and can outspend it on talent and compute by a factor of at least 10. Further, few human beings on this planet understand the importance of building a large, sticky online community better than Mark Zuckerberg. Based on his track record, everyone should be skeptical when Zuck offers something for free. For now, the winners are application developers and API hosts.

Centralized AI

Meta's Llama 3 represents a significant leap forward in AI language models. The decision to make Llama 3 open-source is the right decision for advancing the field in general, but I can't say this unequivocally, given Facebook and the founder's history. However, the advent of such powerful models also raises important questions about the centralization of AI. While open-source models like Llama 3 are a step in the right direction, the control and development of these models are still largely in the hands of a few tech giants.

Frank Herbert, the author of Dune, one of my two favorite sci-fi book series of all time, wrote this in the 1960s, and it's even more relevant today:

“Once men turned their thinking over to machines in the hope that this would set them free. But that only permitted other men with machines to enslave them.”


Useful links:

<Get started> <GitHub: Model Details> <Meta 24k GPU Cluster> <Meta Llama 3 Responsible Use Guide (RUG)> <Paper: CyberSecEval2>


Joseph A S.

Onboarding cloud native and AI workload on Ampere ARM-native Infrastructure

6 months

"Llama 3 makes generative AI accessible. It's a very big deal. Llama democratizes generative AI." https://www.dhirubhai.net/posts/joez280_a-fireside-chat-between-jensen-huang-and-activity-7189148890279854080-cLgW

Henri Hagenow

CEO AIME GmbH, GPU-Cloud, Machine Learning, AI Consulting | UX-Expert

7 months

You can now deploy your own Llama 3 and operate it with the new AIME API Server! https://www.dhirubhai.net/feed/update/urn:li:activity:7189255261956579328


Great article Jen. Always impressed how you can simplify the most complex things and save me and others time.

Mokena Makeka

Curator/Director of the Civic Projects Lab | Special Advisor to the Vice President

7 months

What strikes me is the speed/rate of change in this field, which suggests that static or even historic models of understanding are ill equipped to model the future potential of this area. The field is so young, we don't even have the benefit of meaningful hindsight or data points to convincingly shape future trajectories. There are, however, a number of humanistic reasons why we must all still remain curious and engaged (politicians, academics, civil society). For example, how the digital divide continues to accelerate: what or who are the casualties of war, and the new territories their successes lay claim to? These are some of the questions I have, and they excite me beyond measure. Thank you for keeping your finger on the pulse of innovation. "The world is changing very fast. Big will not beat small anymore. It will be the fast beating the slow" - Rupert Murdoch.

Adrien Ramelet

Formateur IA Générative et outils numériques - Directeur d’agence Dynabuy

7 months

The quality of the images generated is very far from Midjourney, but objectively I am impressed by the speed of execution of the prompts! The text generation is very fast, and the source links at the end of the answer are great, I think.
