LLM Parameters explained

Tushar Jain

Transformational Servant Leader | Innovating at the Intersection of Tech, Strategy & Operations Driving Success by Effective & Performance Orientation | Passionate about Transforming Ideas into Company Success

Published: February 24, 2025

Anyone reading about LLMs regularly encounters one term: parameters. For example, Grok-1 has 314B parameters, while GPT-4 is reported, though not officially confirmed, to have about 1.8T.

We are obsessed with the idea that bigger is better, which may not be entirely true, as DeepSeek has shown. DeepSeek-V3 has only 671B parameters, yet it has reportedly outperformed GPT-4 in many respects.

Before jumping to LLMs, let's try to understand the basics.

The term "model" comes from the realm of statistics, where a model is just a mathematical function describing data.

Say you want to predict some variable y depending on another variable x with a linear model. In this case, you would write your model as

y = a * x + b

with two parameters, a and b, and an input x.

In layman’s language, a and b are two dials that can be adjusted to predict the value of y for a given input x.
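The "two dials" idea can be sketched in a few lines of Python. This is only an illustration: the function name `linear_model` and the numeric values are assumptions made up for the example, not something taken from a real model.

```python
def linear_model(x: float, a: float, b: float) -> float:
    """Predict y from input x using two parameters: a (slope) and b (intercept)."""
    return a * x + b


# Same input, different dial settings, different predictions.
# The values below are hypothetical and chosen only for illustration.
x = 3.0
print(linear_model(x, a=2.0, b=1.0))   # 7.0
print(linear_model(x, a=0.5, b=-1.0))  # 0.5
```

Training a model amounts to searching for the settings of a and b that make the predictions match the observed data as closely as possible.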

ML models are essentially the same, with one change: instead of a single a and a single b, the model holds many numbers, grouped into vectors and matrices, to capture complexity. For example, a vector representing “Apple” will encapsulate the concept of fruit, its similarities to and differences from other fruits, types of apples, Apple as the iPhone maker, Apple as a stock listed on the stock exchange, and so on. I hope you get the picture.

Any practical ML model consists of many such linear equations with many parameters. Each parameter is a single number, and the parameters are grouped into multi-dimensional vectors and matrices.

y1 = a11 * x1 + a12 * x2 + a13 * x3 + … + b1

y2 = a21 * x1 + a22 * x2 + a23 * x3 + … + b2

…

yn = an1 * x1 + an2 * x2 + an3 * x3 + … + bn
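The system of equations above can be written compactly as a table of coefficients A and a list of offsets b. The following is a minimal sketch in plain Python, assuming a tiny hypothetical case with 3 inputs and 2 outputs. The helper names `linear_layer` and `count_parameters` are invented for this example.

```python
def linear_layer(x, A, b):
    """Compute y_i = sum_j A[i][j] * x[j] + b[i] for every output i."""
    return [
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(A, b)
    ]


def count_parameters(A, b):
    """Every entry of A and every entry of b is one parameter."""
    return sum(len(row) for row in A) + len(b)


# Hypothetical values: 2 outputs, 3 inputs.
A = [[1.0, 0.0, 2.0],
     [0.5, 1.0, 0.0]]
b = [1.0, -1.0]
x = [2.0, 4.0, 3.0]

y = linear_layer(x, A, b)
print(y)                       # [9.0, 4.0]
print(count_parameters(A, b))  # 8
```

With n outputs and m inputs, such a set of equations has n * m coefficients plus n offsets, which is how parameter counts grow so quickly as models get wider.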

But here is the central question: what is a parameter in the LLM context?

Most of the literature on the internet says that parameters are the dials and levers that fine-tune a model's understanding and generation (of text, pictures, audio, and video). That statement does not specify which dials and levers. Like any complex machine, an LLM has several classes of dials and levers. Consider a simpler machine: a car. It has dials and levers for speed, lights, temperature control, seat adjustment, and so on. Similarly, the parameters of an LLM are one particular class of dials and levers, and they fall into two groups:

Weights – Technically speaking, a “weight” is the multiplication factor applied to an input of a neuron, where that input is the output of a neuron in the previous layer.

For non-technical readers, a weight specifies the importance of one connection between two neurons, out of billions of such connections.

Collectively, the weights encode the “knowledge” embedded in a model.

Biases – A bias is the constant term added to a neuron's weighted inputs, like b in y = a * x + b. The bias term in a linear regression model is a good illustration: it lets the straight line miss the origin rather than forcing it through zero, and this affects the type of functions that the model can learn.

Biases act as baseline offsets: they set a neuron's output when all of its inputs are zero and, like weights, they are adjusted during training (e.g. by gradient descent with back-propagation).
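To make the two classes concrete, here is a rough sketch that counts weights and biases in a small fully connected network. It is a simplification: real LLMs are built from transformer layers, which also contain attention and embedding matrices, so the function `count_dense_parameters` and the layer sizes below are purely hypothetical.

```python
def count_dense_parameters(layer_sizes):
    """Return (weights, biases) for a fully connected network.

    layer_sizes lists the number of neurons per layer, input layer first.
    Between two consecutive layers there is one weight per connection
    and one bias per neuron in the receiving layer.
    """
    weights = sum(
        n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )
    biases = sum(layer_sizes[1:])
    return weights, biases


# Hypothetical toy network: 4 inputs -> 8 hidden neurons -> 2 outputs.
weights, biases = count_dense_parameters([4, 8, 2])
print(weights, biases, weights + biases)  # 48 10 58
```

Even in this toy case the weights far outnumber the biases, which is why the bulk of a model's parameter count comes from its weights.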

More parameters, more power?

Generally speaking, an increase in the number of parameters yields a more powerful model that can handle more complex tasks. Simplistically, more parameters mean more neurons and/or more layers. However, some architectural approaches break this cozy relationship. For example, the Mixture of Experts (MoE) approach upsets this applecart: only a subset of the model's parameters is activated for each token, so total parameter count and per-token compute no longer move together. The size of the embedding vectors, the amount of training data, and how focused that data is are other factors that affect an LLM's ability to resolve complex contextual problems.

An increasing number of parameters also drives up power consumption and computational resource needs, and it affects latency (the time taken to respond to a prompt).
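The gap between stored and active parameters in a Mixture of Experts design can be sketched with simple arithmetic. This is a toy model under stated assumptions: the function `moe_parameter_counts` and every number below are hypothetical and do not describe any particular LLM.

```python
def moe_parameter_counts(shared, per_expert, num_experts, experts_per_token):
    """Return (total, active) parameter counts for a toy MoE model.

    shared: parameters every token passes through
    per_expert: parameters in a single expert
    num_experts: experts stored in the model
    experts_per_token: experts the router selects for each token
    """
    total = shared + per_expert * num_experts
    active = shared + per_expert * experts_per_token
    return total, active


# Hypothetical numbers chosen only for illustration.
total, active = moe_parameter_counts(
    shared=10_000_000, per_expert=5_000_000, num_experts=16, experts_per_token=2
)
print(total, active)              # 90000000 20000000
print(round(active / total, 3))   # 0.222
```

In this sketch the model stores 90M parameters but uses only about a fifth of them for any given token, which is why headline parameter counts alone say little about compute cost or response time.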

Conclusion

The parameter count of an LLM is like the size of your toolbox: the bigger the toolbox, the better equipped you are to handle a wide range of challenges.

But if your business proposition is to generate custom configurations for warehouse robots, why do you care whether Apple is a fruit or the iPhone maker?

