Article - The Rapidly Evolving Landscape of Large Language Models

by Torsten Reidt

In this article, Torsten Reidt, a member of our CactAI team, examines three of the most prominent large language models (LLMs), highlighting their pros and cons, complexity, costs, and deployment models.

Large Language Models (LLMs) are transforming the business landscape, automating customer service, and unlocking predictive insights from vast datasets. As pioneers like Meta, OpenAI, and Google push the boundaries of AI innovation, the LLM landscape is evolving at an unprecedented pace. With breakthroughs and announcements emerging daily, staying ahead of the curve is essential.

In this post, we’ll look more closely at three leading LLMs: Meta’s recently released Llama3 70B, OpenAI’s GPT-4, and Mistral AI’s Mixtral 8x22B. We compare their key facts, capabilities, and limitations to help you judge which one can best win new business, enhance customer engagement, or transform company operations.

Before we start with a short description of each LLM, let’s take a moment to familiarize ourselves with some key terms in the area of LLMs:

  • Open source: free access to, modification of, and distribution of the code/model, with collaborative development.
  • Closed source: proprietary code/model, restricted access, and modifications only by the owner.
  • Context window: the maximum amount of input text that a model can process and consider when generating a response or making predictions.
  • Parameters: the learnable variables (weights) of an LLM. In general, the number of parameters defines the size of the LLM. As of now, the more parameters a model has, the more capacity it has to learn and represent complex patterns in the data.
  • Function calling: the ability of the LLM to detect whether external tools (API calls, custom functions…) are necessary to fulfil the given task and, ultimately, to call the needed tools.
  • Open weight: only the pre-trained parameters (weights) of the LLM are released. This allows others to use the model for inference and fine-tuning. However, the training code, original dataset, and full training methodology are typically not provided.
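To make the function-calling entry more concrete, here is a minimal sketch of the application side. It assumes the model replies with a JSON object that names a tool and its arguments. The JSON shape, the tool `get_order_status`, and the helper `dispatch_tool_call` are hypothetical and do not reflect any specific vendor's API.

```python
import json

# Hypothetical tool the application exposes to the model.
def get_order_status(order_id: str) -> str:
    # A real system would query a database or an API here.
    return f"Order {order_id} has been shipped."

# Registry mapping tool names (as the model would emit them) to Python callables.
TOOLS = {"get_order_status": get_order_status}

def dispatch_tool_call(model_output: str) -> str:
    """Run the tool requested in the model output, or return the text unchanged."""
    try:
        request = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain-text answer, no tool needed
    if not isinstance(request, dict):
        return model_output
    tool_name = request.get("tool")
    if not isinstance(tool_name, str) or tool_name not in TOOLS:
        return model_output
    return TOOLS[tool_name](**request.get("arguments", {}))

# Assumed model reply that asks for an external tool.
reply = '{"tool": "get_order_status", "arguments": {"order_id": "A-1001"}}'
print(dispatch_tool_call(reply))
```

In a real integration, the tool result would usually be passed back to the model so that it can phrase the final answer for the user.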

With that out of the way, here is a short description of the contenders:

OpenAI GPT-4 (March 14, 2023)

GPT-4 is not the latest model available from OpenAI but has been chosen for this comparison due to data availability. The model can be accessed via the OpenAI API. There is no official information on the model architecture or parameter count, but unofficial sources suggest a Mixture-of-Experts architecture (a combination of multiple specialized sub-models) with a total of 1.8T parameters. GPT-4 is optimized for English but can ingest text in other languages and respond accordingly. At release it featured a context window of 8k tokens, with a 32k-token variant also available.

Mistral AI Mixtral 8x22B

The latest open-source model from Mistral AI. It is a sparse Mixture-of-Experts model with a total of 141B parameters. It is fluent in English, French, Italian, German, and Spanish and has a context window of 64k tokens.
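The idea behind a sparse Mixture-of-Experts layer can be sketched in a few lines: a router scores every expert for the current token, and only the best-scoring experts are evaluated. The toy below uses plain Python, eight stand-in experts, and top-2 routing. The expert functions and router scores are invented for illustration and are unrelated to the actual Mixtral weights.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_moe(x, router_scores, experts, top_k=2):
    """Evaluate only the top_k experts and mix their outputs."""
    # Indices of the experts with the highest router scores.
    ranked = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize the router scores over the chosen experts only.
    weights = softmax([router_scores[i] for i in chosen])
    mixed = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return mixed, chosen

# Eight toy "experts": each one just scales its input differently.
experts = [lambda x, k=k: x * (k + 1) for k in range(8)]
# Made-up router scores for a single token.
router_scores = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]

output, used = sparse_moe(3.0, router_scores, experts)
print(used, output)
```

Because only two of the eight experts run per token in this sketch, the compute per token is much lower than the total parameter count suggests, which is the main appeal of the sparse design.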

Meta Llama3 70B

As already mentioned, this is one of the latest models of the Meta Llama family. It has 70B parameters and a context window of 8k tokens. Around 5% of the training data is non-English, covering 30 languages, so the model has some multilingual capabilities.
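The difference between an 8k and a 64k context window becomes tangible when you check whether a document fits. The sketch below uses a rough rule of thumb of about four characters per token for English text. This ratio is an assumption, as are the helper names. Exact counts depend on each model's tokenizer.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the 4-characters-per-token ratio is an assumption."""
    return int(len(text) / chars_per_token) + 1

def fits_context(text: str, context_window: int, reserved_for_answer: int = 512) -> bool:
    """Check whether the prompt plus a reserved answer budget fits the window."""
    return estimate_tokens(text) + reserved_for_answer <= context_window

document = "word " * 10_000  # 50,000 characters of dummy text
for name, window in {"8k window": 8_000, "64k window": 64_000}.items():
    print(name, fits_context(document, window))
```

For production use, the estimate should be replaced by the tokenizer that ships with the chosen model.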

How are Large Language Models evaluated?

Evaluating Large Language Models (LLMs) objectively can be a complex task, as it requires assessing their performance across various aspects. However, there are some common benchmarks which are often used to rank the models relative to each other.

  • MMLU (Massive Multitask Language Understanding) assesses the multitask accuracy of the models across a broad range of subjects.
  • AGIEval assesses the models in the context of human-centric standardized exams, such as math exams.
  • BIG-bench (Beyond the Imitation Game Benchmark) focuses on tasks that are believed to be beyond the capabilities of current language models.
  • ARC-Challenge (AI2 Reasoning Challenge, Challenge Set) assesses reasoning on grade-school-level science questions.
  • DROP (Discrete Reasoning Over Paragraphs) assesses reading comprehension capabilities.

Here is how the three models performed on the mentioned benchmarks:

Now that the numbers are clear, we should always choose the model that performs best on the majority of the benchmarks, right? Well, the answer is not that simple.

The use case is decisive

While a high score on benchmark tests can indicate how well a model generalizes to unseen data for a given task, your use case is the most important consideration for model selection. Consider, for example, the following two situations:

  • The task you want the model to perform well on differs from the benchmark tasks.
  • Your data differs from the data used in the benchmark, for example because it is in a language other than English.
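One practical way to act on these two points is to score the candidate models on a small labeled sample of your own data. The following sketch shows the idea. The two "models" are stand-in Python functions and the German support tickets are invented. In practice each function would wrap a call to the respective LLM.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (input text, expected label)

def accuracy(model: Callable[[str], str], examples: List[Example]) -> float:
    """Fraction of examples where the model output matches the expected label."""
    hits = sum(1 for text, expected in examples
               if model(text).strip().lower() == expected)
    return hits / len(examples)

# Tiny labeled sample standing in for your own data (here: German support tickets).
examples: List[Example] = [
    ("Meine Rechnung ist falsch.", "billing"),
    ("Die App startet nicht.", "technical"),
    ("Ich will mein Abo beenden.", "billing"),
    ("Login funktioniert nicht.", "technical"),
]

# Stand-in "models"; real ones would send the text to an LLM and return its answer.
def model_a(text: str) -> str:
    return "billing" if "Rechnung" in text or "Abo" in text else "technical"

def model_b(text: str) -> str:
    return "technical"

candidates: Dict[str, Callable[[str], str]] = {"model_a": model_a, "model_b": model_b}
scores = {name: accuracy(fn, examples) for name, fn in candidates.items()}
print(scores)
```

Even a few dozen representative examples can reveal differences that public benchmark scores do not show.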

In both cases, there might be other models available which could perform better for your problem setting. In addition to the performance-related criteria, other aspects are worth considering when choosing the LLM. Let’s revisit the three LLMs selected for this article, along with a practical example for each.

GPT-4:

  • Business case: e-commerce platform
  • Use case: A highly conversational and engaging customer chatbot in the English language.

Mixtral 8x22B:

  • Business case: Online translation platform that needs to support various languages, including some lesser-resourced languages.
  • Use case: Backbone for translation due to multilingual capabilities. Can be fine-tuned due to its open-source nature.

Llama3 70B:

  • Business case: Large-scale text classification system that needs to process millions of documents daily
  • Use case: Document classification at scale, benefiting from Llama3 70B’s efficient architecture, optimized performance, and cost-effectiveness. Can be fine-tuned if needed.

Licensing and Cost (Inference)

GPT-4 is a closed-source LLM, and its use via the OpenAI API is billed at a fixed price per 1M tokens. Since Llama3 70B and Mixtral 8x22B are open-source models, there is no inherent cost per token; instead, the cost depends on how the models are deployed. For comparison, deployment options priced per 1M tokens have been selected.
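The arithmetic behind the two pricing models can be sketched as follows. All prices, volumes, and throughput figures are placeholder assumptions for illustration, not quotes from any provider, and the function names are hypothetical.

```python
def api_cost(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` tokens at a fixed price per 1M tokens."""
    return tokens / 1_000_000 * price_per_million

def self_hosted_price_per_million(hourly_rate: float, tokens_per_second: float) -> float:
    """Effective price per 1M tokens for an instance billed by the hour.

    Assumes the instance is busy the whole time; idle hours raise the real price.
    """
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate / tokens_per_hour * 1_000_000

# Placeholder numbers for illustration only.
monthly_tokens = 50_000_000
print(api_cost(monthly_tokens, price_per_million=10.0))
print(self_hosted_price_per_million(hourly_rate=5.0, tokens_per_second=500))
```

The second function makes the key trade-off visible: a self-hosted model only becomes cheap per token when the hardware is well utilized.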

Deployment

The choice between cloud and on-premises deployment for LLMs should be guided by the specific needs and capabilities of the organization, balancing factors like cost, control, scalability, and security. Each deployment option comes with its distinct advantages and challenges. This section applies only to Mixtral 8x22B and Llama3 70B, as OpenAI’s GPT-4 is a closed-source model.

Cloud-based platforms

Deploying LLMs in the cloud involves utilizing the computational power and resources of a cloud service provider. This approach offers scalability, as businesses can easily adjust their usage based on demand without the need for upfront investment in physical hardware. Cloud deployment also ensures that updates and maintenance are managed by the provider, reducing the IT burden on the company. However, this model depends heavily on internet connectivity and can raise concerns regarding data security and privacy, as sensitive information is processed and stored off-site.

An LLM can be trained, deployed, or hosted on several platforms, such as:

  • Amazon SageMaker
  • Google Cloud AI Platform
  • Microsoft Azure Machine Learning

Which one to pick depends, among other things, on the existing infrastructure and tools in your company. If you already use Amazon for other applications, you may not want to add a different provider for the deployment of the LLM. Other points to consider are preferred frameworks or specific needs.

The cost of deployment depends on many factors, such as availability or data volume. The main cost driver, however, is the size of the chosen LLM, which ultimately determines the hardware (GPU) needed.

A rough estimate for inference with the Meta Llama3 70B model, quantized to 4 bits (that is, reduced in size at the cost of some accuracy), is about $5 per hour on the Amazon SageMaker “ml.g4dn.12xlarge” instance. This instance provides four NVIDIA T4 GPUs with a combined 64 GB of GPU memory and can be used for inference. For fine-tuning or training the LLM, a more powerful instance should be used.
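Why quantization matters for hardware sizing follows from simple arithmetic: the memory needed for the weights is roughly the number of parameters times the bits per parameter, divided by eight. The sketch below counts the weights only. Activations and the attention cache need additional memory, so treat the result as a lower bound. The function name is hypothetical.

```python
def weight_memory_gb(num_parameters: float, bits_per_parameter: int) -> float:
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bits_per_parameter / 8 / 1e9

llama3_70b = 70e9  # parameter count taken from the model name
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(llama3_70b, bits):.0f} GB")
```

At 4 bits, the weights of a 70B model need about 35 GB instead of about 140 GB at 16 bits, which is what brings the model within reach of a single multi-GPU instance.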

On-premises deployment

On-premises deployment involves setting up the LLM infrastructure within a company’s local environment. This approach gives organizations full control over their data, enhancing security and compliance with regulations, particularly critical for industries like healthcare and finance. On-premises solutions also allow for customization that might be necessary for specific organizational needs. The drawbacks include higher initial costs for hardware and infrastructure, as well as the need for ongoing maintenance and technical support, which can be resource-intensive.

A typical deep learning workstation starts at around €7,000. Such a workstation is often equipped with two consumer-grade GPUs, although the right configuration depends on the actual requirements and the purpose of the deployment (training or inference). It is also essential to consider the software and overall configuration, as well as ongoing maintenance and upgrade needs, to ensure optimal performance.

Data Privacy and Security

Both data privacy (the rights and governance surrounding personal data) and data security (the measures and technologies used to protect data from unauthorized access, breaches, and theft) are foundational to building trust in technological systems. They require ongoing attention and adaptation to evolving threats and regulatory landscapes. Prioritizing both is essential for safeguarding the rights and interests of all stakeholders in the digital ecosystem. A deployed LLM should be treated like any other deployed application with regard to unauthorized access, data breaches, and cyber threats. In addition, data privacy deserves a close look: some providers use user input for training purposes, which could lead to unwanted data leakage.

Conclusion

The landscape of Large Language Models (LLMs) is rapidly evolving, with new functionalities emerging almost daily. Each model comes with its unique strengths and weaknesses. We have compared three prominent LLMs to illustrate key considerations for leveraging their powerful capabilities in driving digital transformation, enhancing customer experiences, and uncovering hidden insights. As the AI landscape advances, it’s evident that those who effectively utilize LLMs will gain a competitive edge.

At Cactus, our dedicated CactAI team is enthusiastic about exploring the optimal AI solutions for your unique business needs. We partner with you to identify the Large Language Model that best fits your specific use case, accelerating your business growth and enhancing your operational efficiency.

Let us help you harness the full potential of AI to drive innovation and achieve competitive advantages in your industry.
