LLM Parameters explained

Anyone reading about LLMs regularly encounters one term: parameters. For example, Grok-1 has 314B parameters, while GPT-4 is reported (though not officially confirmed) to have about 1.8T.

We are obsessed with the idea that bigger is better, which may not be entirely true, as DeepSeek shows. DeepSeek-V3 has 671B parameters, far fewer than the figure reported for GPT-4, yet it has outperformed GPT-4 in many respects.

Before jumping to LLMs, let's try to understand the basics.

The term model comes from the realm of statistics, where a model is just a mathematical function describing data.

Say you want to predict some variable y depending on another variable x with a linear model. In this case, you would write your model as

    y = a * x + b

with two parameters, a and b, and an input x.

In layman’s language, a and b are two dials that can be adjusted to predict the value of y for a given input x.
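The two dials can be made concrete with a minimal Python sketch. The function name linear_model and the numbers are illustrative assumptions, not something from a particular library:

```python
def linear_model(x, a, b):
    """Predict y from input x using two parameters: slope a and intercept b."""
    return a * x + b

# The same input x gives different predictions when the two dials are turned.
print(linear_model(3.0, a=2.0, b=1.0))   # 7.0
print(linear_model(3.0, a=0.5, b=-1.0))  # 0.5
```

Training a model amounts to searching for the values of a and b that make these predictions match the observed data as closely as possible.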

ML models are essentially the same, with one change: instead of two single numbers, the parameters are grouped into vectors and matrices holding many numbers, which lets them capture complexity. For example, a vector representing “Apple” will encapsulate the concept of fruit, its similarities to and differences from other fruits, types of apples, Apple as the iPhone maker, Apple as a stock listed on the stock exchange, etc. I hope you get the picture.

Any practical ML model consists of many such linear equations with many parameters, usually organized into multi-dimensional vectors and matrices. Each individual number in them counts as one parameter.

y1 = a11 * x1 + a12 * x2 + a13 * x3 + … + b1

y2 = a21 * x1 + a22 * x2 + a23 * x3 + … + b2

…

yn = an1 * x1 + an2 * x2 + an3 * x3 + … + bn
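The system of equations above can be sketched in plain Python. The names linear_layer and count_parameters and the sample values are hypothetical, chosen only to show that every coefficient and every offset counts as one parameter:

```python
def linear_layer(x, A, b):
    """Compute y_i = sum_j A[i][j] * x[j] + b[i] for every output i."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]

def count_parameters(A, b):
    """Each entry of A and each entry of b is one parameter."""
    return sum(len(row) for row in A) + len(b)

# Hypothetical example: 2 outputs (y1, y2) and 3 inputs (x1, x2, x3).
A = [[1.0, 0.0, 2.0],
     [0.5, 1.0, 0.0]]
b = [1.0, -1.0]
x = [2.0, 4.0, 6.0]

print(linear_layer(x, A, b))   # [15.0, 4.0]
print(count_parameters(A, b))  # 8
```

With n outputs and m inputs there are n * m coefficients plus n offsets, which is why parameter counts grow so quickly as models get wider.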

But here is the central question: what is a parameter in the LLM context?

Most of the literature across the internet says that parameters are dials and levers that fine-tune the model’s understanding and generation (text, picture, audio, and video). This statement does not specify which dials and levers. Like any complex machine, LLMs have several classes of dials and levers. Consider a simpler machine, a car. It has dials and levers to control speed, lights, temperature, seat adjustment, etc. Similarly, the parameters in an LLM fall into classes of dials and levers. The parameters in an LLM include two classes:

Weights – Technically speaking, a “weight” is the multiplication factor applied to an input of a neuron, where that input is the output of a neuron in the previous neural layer.

For non-technical readers, a weight specifies the importance of the connection between one neuron and another, out of billions of such connections.

Collectively, weights essentially represent the knowledge embedded in a model.

Biases – A bias is a constant term added to a neuron's weighted sum of inputs, like b in the linear model above. The bias term in a linear regression model reflects the assumption that the straight-line function does not necessarily go through zero, and this assumption affects the type of functions that the model can learn. Like weights, biases are adjusted by the learning algorithm (e.g. gradient descent with back-propagation) during training.

Biases act as offsets that shift a neuron's output independently of its inputs.
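To see how weights and biases add up to a parameter count, here is a small sketch for a fully connected network. The helper count_dense_parameters and the layer sizes are illustrative assumptions. Real LLMs use more elaborate layers, but the bookkeeping follows the same idea:

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of neurons per layer, input layer first.
    """
    # One weight per connection between consecutive layers.
    weights = sum(n_in * n_out
                  for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    # One bias per neuron in every layer except the input layer.
    biases = sum(layer_sizes[1:])
    return weights, biases

weights, biases = count_dense_parameters([4, 8, 2])
print(weights, biases, weights + biases)  # 48 10 58
```

Even in this tiny network, weights far outnumber biases, and the gap widens as layers get larger.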

More parameters, more power?

An increase in the number of parameters generally means a more powerful model that can handle more complex tasks. Put simply, more parameters mean more neurons and/or more layers. However, some architectural approaches break this cozy relationship. For example, the Mixture of Experts (MoE) approach upsets this applecart, because only a subset of the parameters is used for any given input. Also, the size of the embedding vectors, the amount of data used in training, and how focused the training data is are other factors that affect an LLM’s power to resolve complex contextual problems.

An increasing number of parameters also affects power consumption, computational resources, and efficiency (the time taken to respond to a prompt).
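The way MoE decouples model size from per-token work can be sketched with simple arithmetic. The function and all numbers below are hypothetical and describe a deliberately simplified MoE layout, not any specific model:

```python
def moe_parameter_counts(shared, per_expert, num_experts, experts_per_token):
    """Return (total, active) parameter counts for a simplified MoE model.

    shared: parameters every token uses (e.g. attention, embeddings)
    per_expert: parameters in a single expert
    experts_per_token: how many experts the router selects for each token
    """
    total = shared + per_expert * num_experts
    active = shared + per_expert * experts_per_token
    return total, active

# Hypothetical model: 10B shared parameters, 16 experts of 5B each, 2 used per token.
total, active = moe_parameter_counts(
    shared=10_000_000_000,
    per_expert=5_000_000_000,
    num_experts=16,
    experts_per_token=2,
)
print(total, active)  # 90000000000 20000000000
```

In this made-up case the model stores 90B parameters but uses only 20B for any one token. DeepSeek-V3 is reported to work along similar lines, activating roughly 37B of its 671B parameters per token.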
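A rough sense of the resource cost comes from the memory needed just to hold the weights. The sketch below assumes a fixed number of bytes per parameter for a few common numeric precisions. It ignores activations, caches, and other runtime overhead, so real requirements are higher:

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAMETER = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_parameters, precision="fp16"):
    """Approximate memory to store the weights alone, in gigabytes (10^9 bytes)."""
    return num_parameters * BYTES_PER_PARAMETER[precision] / 1e9

# A 314B-parameter model, the size quoted for Grok-1.
print(weight_memory_gb(314e9, "fp16"))  # 628.0
print(weight_memory_gb(314e9, "int8"))  # 314.0
```

At 16-bit precision, a 314B-parameter model needs on the order of 628 GB for its weights alone, which is far more than a single typical accelerator holds.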

Conclusion

The parameter count of an LLM is like the size of your toolbox: the bigger the toolbox, the better equipped you are to handle any challenge.

But if your business proposition is to generate customized configurations for warehouse robots, why would you care that Apple is a fruit or that Apple is the iPhone maker?
