Today I am interested in the generative AI battle between major technology companies like Google, Meta, OpenAI, Baidu, and Microsoft. With the ChatGPT craze showing no signs of cooling down, just yesterday Meta announced a new model called LLaMA.
According to Mark Zuckerberg, "LLaMA is a high-performance open-source language model from META AI - FAIR" (Meta AI, 2023). The name is an acronym for "Large Language Model Meta AI," and the model will be licensed noncommercially to researchers and organizations affiliated with governments, civil society, and academia (Malik & Paul, 2023). It was trained on large publicly available datasets, without any proprietary or inaccessible data, covering more than 20 languages that use the Latin and Cyrillic alphabets and totaling more than a trillion tokens.
From my research, LLaMA (Touvron et al., 2023) is a collection of foundation language models ranging from 7 billion to 65 billion parameters (Meta AI, 2023). META explained that "the 13 billion parameter LLaMA version could perform better than GPT-3, the predecessor to the large language model used to develop the ChatGPT chatbot. The 65 billion parameter LLaMA version could 'compete' with Google's Chinchilla70B and PaLM-540B models, which are even larger than the model Google used to introduce the Bard chatbot recently" (Malik & Paul, 2023).
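To put those parameter counts in perspective, here is a back-of-the-envelope estimate (my own illustration, not from any of the cited sources) of how much memory each model's weights alone would occupy in 16-bit precision, at 2 bytes per parameter. It helps explain why a 13-billion-parameter model matching GPT-3 matters so much:

```python
# Rough weight-memory estimate: parameter count x 2 bytes (fp16).
# Ignores activations, optimizer state, and serving overhead.
models = {
    "LLaMA-7B": 7e9,
    "LLaMA-13B": 13e9,
    "LLaMA-65B": 65e9,
    "GPT-3 (175B)": 175e9,
}

for name, n_params in models.items():
    gigabytes = n_params * 2 / 1e9  # bytes -> GB
    print(f"{name}: ~{gigabytes:,.0f} GB of fp16 weights")
```

By this estimate, LLaMA-13B's weights fit in roughly 26 GB, versus about 350 GB for GPT-3, so a smaller model of comparable quality is far more accessible to researchers with modest hardware.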
Speaking of large language models (LLMs), an LLM is "a deep learning model trained on large amounts of text data to predict the next word in a sentence or a paragraph" (Dilmegani, 2023). LLMs come with advantages and disadvantages such as the following (a short code sketch after this list illustrates the next-word-prediction idea):
- Natural language generation: LLMs can generate natural language with high accuracy, which improves performance and speed on language tasks such as machine translation, sentiment analysis, and text summarization.
- Multitasking: A single LLM can be trained to perform a variety of tasks, from sentiment analysis to machine translation to speech recognition, so it can be applied widely and usefully across many fields.
- High accuracy: Because LLMs are built on newer architectures such as the Transformer, they often achieve higher accuracy than traditional models.
- Large training-data requirements: As mentioned above, an LLM needs a huge amount of data for training. Without enough data, the model may fall short of its potential and may overfit.
- High computational costs: Training an LLM requires large computing resources, both hardware and software, which can be a significant barrier for researchers and small to medium-sized businesses with limited budgets.
- Difficult to understand and prone to failure: LLMs are susceptible to weaknesses such as "forgetting" and "catastrophic interference," which can lead to inaccurate results in different situations. They can also generate incorrect or inappropriate text when the training data is insufficient, deficient, or biased.
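To make the "predict the next word" definition concrete, here is a minimal sketch using the Hugging Face transformers library. It loads the small GPT-2 model as a stand-in, since LLaMA's weights are gated behind META's research license; the model choice and prompt are illustrative assumptions of mine, not anything from the cited sources:

```python
# A causal language model assigns a score (logit) to every possible
# next token given a prompt; text generation repeats this one step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# The distribution over the *next* token lives at the last position.
next_token_logits = logits[0, -1]
top = torch.topk(next_token_logits, k=5)
for token_id, score in zip(top.indices, top.values):
    print(repr(tokenizer.decode(int(token_id))), f"logit={score:.2f}")
```

Everything a ChatGPT-style system does is built on repeating this step: pick a next token, append it to the prompt, and predict again.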
However, according to a META spokesperson, the LLaMA model surpasses many competitors thanks to "cleaner" data and "structural improvements" in the model, which improve the stability of the training process (Malik & Paul, 2023).
- Malik, Y., & Paul, K. (2023, February 25). Meta heats up Big Tech's AI arms race with new language model. Reuters. Retrieved February 26, 2023, from https://www.reuters.com/.../meta-launch-ai-language.../
- Meta AI. (2023, February 24). Introducing LLaMA: A foundational, 65-billion-parameter large language model. Retrieved February 26, 2023, from https://ai.facebook.com/.../large-language-model-llama.../
- Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023, February 24). LLaMA: Open and efficient foundation language models. Meta Research. Retrieved February 26, 2023, from https://research.facebook.com/.../llama-open-and.../
- Dilmegani, C. (2023, February 16). Large language model training in 2023. AIMultiple. Retrieved February 26, 2023, from https://research.aimultiple.com/large-language-model.../