LLM Parameters explained
Tushar Jain
Transformational Servant Leader | Innovating at the Intersection of Tech, Strategy & Operations Driving Success by Effective & Performance Orientation | Passionate about Transforming Ideas into Company Success
Anyone reading about LLMs encounters one term regularly: parameters. For example, Grok-1 has 314B parameters, while GPT-4 is widely reported, though not officially confirmed, to have about 1.8T.
We are obsessed with the idea that bigger is better, which may not be entirely true, as DeepSeek shows. DeepSeek-V3 has only 671B parameters, yet it has reportedly outperformed GPT-4 in many respects.
Before jumping to LLMs, let’s try to understand the basics.
The term model comes from the realm of statistics, where a model is just a mathematical function describing data.
Say you want to predict some variable y depending on another variable x with a linear model. In this case, you would write your model as
    y = a * x + b
with two parameters, a and b, and an input x.
In layman’s language, a and b are two dials that can be adjusted to predict the value of y for a given input x.
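As a minimal sketch of the "two dials" idea, the linear model can be written as a small Python function. The function name and the dial settings below are hypothetical, chosen only for illustration.

```python
def linear_model(x, a, b):
    """Predict y from the input x using the two parameters a and b."""
    return a * x + b


x = 3.0

# Same input, two different settings of the "a" dial (illustrative values).
print(linear_model(x, a=2.0, b=1.0))  # 2.0 * 3.0 + 1.0 = 7.0
print(linear_model(x, a=0.5, b=1.0))  # 0.5 * 3.0 + 1.0 = 2.5
```

Training a model amounts to searching for the dial settings that make such predictions match the observed data as closely as possible.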
ML models are essentially the same, with one change: the single numbers a and b become whole vectors and matrices of numbers, so that the model can capture far more complexity. For example, a vector representing “Apple” will encapsulate the concept of a fruit, its similarities to and differences from other fruits, the varieties of apple, Apple as the iPhone maker, Apple as a stock listed on the stock exchange, and so on. I hope you get the picture.
Any practical ML model consists of many such linear equations with many parameters. Each individual number (each a and each b) is one parameter, and the parameters are organized into multi-dimensional vectors and matrices.
y1 = a11 * x1 + a12 * x2 + a13 * x3 + ... + b1
y2 = a21 * x1 + a22 * x2 + a23 * x3 + ... + b2
...
yn = an1 * x1 + an2 * x2 + an3 * x3 + ... + bn
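The system of equations above can be sketched in a few lines of Python. This is only an illustration: the function name `linear_layer` and all the numbers are assumptions, with three inputs and two outputs picked to keep the arithmetic easy to follow.

```python
def linear_layer(x, A, b):
    """Compute y_i = sum_j A[i][j] * x[j] + b[i] for every output i."""
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(A, b)
    ]


# Illustrative values: A holds the a_ij coefficients, b the b_i offsets.
A = [
    [1.0, 0.0, 2.0],  # a11, a12, a13
    [0.5, 1.0, 0.0],  # a21, a22, a23
]
b = [0.5, -1.0]       # b1, b2
x = [1.0, 2.0, 3.0]   # x1, x2, x3

y = linear_layer(x, A, b)
print(y)  # [7.5, 1.5]
```

In this toy case A has 6 entries and b has 2, so the model has 8 parameters in total.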
But here is the central question: what is a parameter in the LLM context?
Most of the literature across the internet says that parameters are dials and levers that fine-tune the model’s understanding and generation of text, pictures, audio, and video. This statement does not specify which dials and levers. Like any complex machine, an LLM has several classes of dials and levers. Consider a simpler machine, a car. It has dials and levers for speed, lights, temperature control, seat adjustment, and so on. Similarly, the parameters of an LLM fall into classes of dials and levers. There are two such classes:
Weights – Technically speaking, a “weight” is the multiplication factor applied to an input of a neuron, where that input is the output of a neuron in the previous layer.
For non-technical readers, a weight specifies the importance of the connection from one neuron to another, out of billions of such connections.
Collectively, the weights represent the knowledge, so to speak, embedded in a model.
Biases – A bias is a constant added to the weighted sum of a neuron’s inputs. It plays the same role as b in the linear regression model: by including it, one assumes that the straight-line function does not necessarily go through zero, and this assumption affects the type of functions that the model can learn. In that sense, the bias term guides the learning algorithm (e.g. gradient descent with back-propagation) towards a specific set of solutions.
Biases act as baseline offsets that shift a neuron’s output regardless of its input. Like weights, they are learned during training rather than fixed beforehand.
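To see how weights and biases add up to a parameter count, here is a rough sketch for a small fully connected network. It assumes every neuron in a layer is connected to every neuron in the previous layer and has one bias. The layer sizes are hypothetical and far smaller than anything in a real LLM.

```python
def dense_layer_parameters(n_inputs, n_neurons):
    """Return (weights, biases) for one fully connected layer."""
    weights = n_inputs * n_neurons  # one weight per input-to-neuron connection
    biases = n_neurons              # one bias per neuron
    return weights, biases


# Hypothetical tiny network: 4 inputs, a hidden layer of 8 neurons, 2 outputs.
layer_sizes = [4, 8, 2]

total_parameters = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    weights, biases = dense_layer_parameters(n_in, n_out)
    print(f"{n_in} -> {n_out}: {weights} weights + {biases} biases")
    total_parameters += weights + biases

print("total parameters:", total_parameters)  # (32 + 8) + (16 + 2) = 58
```

The counts quoted for LLMs come from the same kind of bookkeeping, applied to much larger and more varied layers.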
More parameters, more power?
Generally speaking, increasing the number of parameters yields a more powerful model that can handle more complex tasks. Put simply, more parameters mean more neurons, more layers, or both. However, some architectural approaches break this cozy relationship. For example, the Mixture of Experts (MoE) approach upsets this applecart, because only a subset of the parameters is used for any given token. The size of the embedding vectors, the amount of training data, and how focused that data is are other factors that affect an LLM’s power to resolve complex contextual problems.
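The MoE point can be made concrete with some simple arithmetic. The sketch below uses a deliberately simplified picture: some parameters are shared by every token, and each token is routed to only top_k of the n_experts expert blocks. All names and sizes are hypothetical and do not describe any real model. For a sense of scale, DeepSeek reports that V3 activates roughly 37B of its 671B parameters per token.

```python
def moe_parameter_counts(shared, per_expert, n_experts, top_k):
    """Return (total, active) parameter counts for a simplified MoE model.

    Assumption: every token uses all shared parameters plus exactly
    top_k of the n_experts expert blocks.
    """
    total = shared + n_experts * per_expert
    active = shared + top_k * per_expert
    return total, active


# Hypothetical sizes, not taken from any real model.
total, active = moe_parameter_counts(
    shared=10_000_000_000,
    per_expert=5_000_000_000,
    n_experts=16,
    top_k=2,
)
print(f"total:  {total / 1e9:.0f}B")   # 10B + 16 * 5B = 90B
print(f"active: {active / 1e9:.0f}B")  # 10B + 2 * 5B = 20B
```

The headline parameter count describes how much the model stores, while the active count is closer to how much work it does per token.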
A growing parameter count also increases power consumption and computational resource requirements, and it reduces efficiency by lengthening the time taken to respond to a prompt.
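One way to get a feel for the resource cost is to estimate the memory needed just to hold the weights: parameter count times bytes per parameter. The sketch below is a back-of-the-envelope calculation only. It ignores activations, caches, and any training state, and the precision labels are common storage formats rather than a claim about how any particular model is actually served.

```python
# Bytes needed to store one parameter at a given numeric precision.
BYTES_PER_PARAMETER = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(n_parameters, precision="fp16"):
    """Estimate the memory (in GB, 1 GB = 1e9 bytes) to store the weights."""
    return n_parameters * BYTES_PER_PARAMETER[precision] / 1e9


grok1_parameters = 314_000_000_000  # the 314B figure quoted earlier

for precision in ("fp32", "fp16", "int8"):
    gb = weight_memory_gb(grok1_parameters, precision)
    print(f"{precision}: {gb:.0f} GB")
```

Even at one byte per parameter, a 314B-parameter model needs hundreds of gigabytes for its weights alone, which is why the parameter count feeds directly into hardware and energy costs.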
Conclusion
The parameter count of an LLM is like the size of your toolbox: the bigger the toolbox, the better equipped you are to handle any challenge. But a bigger toolbox is not always what the job calls for.
If your business proposition is to generate custom configurations for warehouse robots, why do you care whether Apple is a fruit or the iPhone maker?