LLM complexity
Esau Rodriguez Sicilia
Chief Technology Officer at Scope Better | Leading Tech Innovations | Ex Octopus, Ex Triller
I went to university long ago, and back then neural networks were very far from where they are now. In recent years, computing capacity has increased so much that it has become possible to build far more complex networks, which in turn has driven more research on the topic and the really impressive results we are all accustomed to nowadays with GPT-4, Claude, etc. Right now the key players have created humongous networks that are incredibly useful.
I didn't learn much about neural networks at university, so I've been trying to get up to speed recently. I normally learn by doing, but in this case I had to understand the underlying concepts before I could start doing. I began by watching the excellent content from 3Blue1Brown.
The series covers the theoretical details with the right balance of precision and simplicity, which is really good. However, as I said before, I personally tend to learn by doing rather than by reading, listening or watching. Luckily for me, LLMs are a hot topic right now, so it was easy to find good content with a more practical approach. The great content by Andrej Karpathy came to the rescue; the next video is where things clicked together for me.
I've been playing with nanoGPT since then. It's small enough that you can run it on your laptop on either the CPU or the GPU, and it runs on Apple Silicon using the MPS backend. From doing this, it became very apparent how costly it is to train and run a neural network for LLM purposes. To achieve decent results you need several layers, several attention heads and many training iterations, and the number of calculations needed grows very quickly, to the point that you need to train for a very long time even with this toy-like network.
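To get a feel for how quickly the numbers blow up, here is a rough back-of-the-envelope sketch. The 12 · n_layer · d_model² formula is a common approximation for the transformer blocks, and the configurations are illustrative values of my own, not the exact nanoGPT presets.

```python
# Rough parameter count for a GPT-style model, to show how quickly size
# grows with depth and width. 12 * n_layer * d_model^2 approximates the
# attention + feed-forward weights; the configs below are illustrative only.

def approx_params(n_layer: int, d_model: int, vocab_size: int = 50257) -> int:
    blocks = 12 * n_layer * d_model ** 2   # attention + MLP weights per stack
    embeddings = vocab_size * d_model      # token embedding table
    return blocks + embeddings

for n_layer, d_model in [(6, 384), (12, 768), (24, 1024), (48, 1600)]:
    millions = approx_params(n_layer, d_model) / 1e6
    print(f"{n_layer:2d} layers, d_model={d_model:4d} -> ~{millions:,.0f}M parameters")
```

Doubling the depth and width a couple of times is enough to go from a toy model to something in the billions of parameters, and the training compute grows along with it.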
I've been counting FLOPS ever since. It's striking to see researchers suggesting roughly $10k machine configurations for their local work, to learn that the price of a single A100 is around that mark too, and to realise that companies like OpenAI are using hundreds if not thousands of them.
Another author whose content I love is Shaw Talebi, and in the next video he quotes a paper estimating the compute needed to train a model based on the number of parameters it uses.
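As a hedged sketch of that kind of estimate: a common rule of thumb from the scaling-law literature (not necessarily the exact formula in the paper he quotes) is roughly 6 FLOPs per parameter per training token, which lands in the same ballpark as the figures below.

```python
# Training-compute rule of thumb: ~6 FLOPs per parameter per training token
# (forward + backward pass). A common scaling-law approximation, not
# necessarily the exact formula from the quoted paper.

def training_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

# Illustrative inputs: a 1.8T-parameter model trained on ~12T tokens
# (both are public estimates, not confirmed figures).
print(f"{training_flops(1.8e12, 12e12):.2e} FLOPs")   # ~1.30e+26
```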
For context, GPT-4o is estimated to have around 1.8 trillion parameters, so according to the table in the video it would require over 1.27e+26 FLOPs to train. The consumer laptop I'm currently using can do around 5 TFLOPS, that is 5e+12 FLOPs per second. Let's forget about memory for now. If I were to train a model like that on my computer, it would take me nearly 300 million days. There is much better-equipped hardware for this, but that hardware is not cheap. A company in the LLM space fighting for ever bigger networks will need an incredible amount of money to be able to train them.
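Spelled out, with those figures taken at face value (the table's ~1.27e+26 FLOPs and my laptop's ~5 TFLOPS) and the very generous assumption that the laptop runs at full throughput the whole time:

```python
# Hypothetical training time on a ~5 TFLOPS consumer laptop, using the
# figures above and assuming full, uninterrupted throughput.
total_flops = 1.27e26        # estimated training compute for a ~1.8T-param model
laptop_flops = 5e12          # ~5 TFLOPS sustained (optimistic)

days = total_flops / laptop_flops / 86_400
print(f"~{days:,.0f} days (~{days / 365:,.0f} years)")   # ~294 million days
```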
A state-of-the-art H100 can do up to around 1,000 TFLOPS, so many of them would still be required to complete the training in a reasonable time, and each of them costs over $40,000.
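A hedged sketch of the same arithmetic from the cluster side, with hypothetical numbers: how many H100s would be needed to finish in 90 days, assuming (unrealistically) peak throughput and perfect linear scaling?

```python
# Hypothetical H100 cluster sizing for the same job, assuming peak throughput
# and perfect linear scaling -- both optimistic assumptions.
total_flops = 1.27e26
h100_flops = 1e15            # ~1,000 TFLOPS per H100 at low precision (peak)
target_days = 90
h100_price = 40_000          # rough USD price per card

n_gpus = total_flops / (h100_flops * target_days * 86_400)
print(f"~{n_gpus:,.0f} H100s, ~${n_gpus * h100_price / 1e6:,.0f}M in GPUs alone")
```

That comes out at roughly sixteen thousand GPUs and several hundred million dollars of hardware, before electricity, networking and the engineers to run it.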
Running an LLM is not cheap either. You might be able to use much more commodity hardware, but memory requirements grow with the number of parameters the network has, so even if you could run it on a CPU, a lot of memory would be needed. You can use quantization and similar tricks, but the requirements are still high.
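To make the memory point concrete, here is a quick sketch of the memory needed just to hold the weights at different precisions (activations, KV cache and runtime overhead come on top of this), for a few illustrative model sizes:

```python
# Approximate memory just to store the weights at different precisions.
# Activations, KV cache and runtime overhead are not included, so real
# usage is higher. Model sizes are illustrative.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(n_params: float, precision: str) -> float:
    return n_params * BYTES_PER_PARAM[precision] / 1e9

for n_params in (7e9, 70e9, 1.8e12):
    row = ", ".join(f"{p}: {weight_memory_gb(n_params, p):,.0f} GB"
                    for p in BYTES_PER_PARAM)
    print(f"{n_params / 1e9:,.0f}B params -> {row}")
```

Even aggressively quantized, a trillion-parameter-class model simply does not fit on consumer hardware; only small models do.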
Probably the most recognizable company in AI right now is OpenAI. They have been very successful at creating a lot of awareness and interest in the space, and according to reports they are even monetising it quite well, but they are potentially in trouble. They are reportedly expected to lose $5B this year, and they will need to raise again within as little as 12 months to keep operating. Building cool technology and building a great business are not always the same thing.
One could expect Moore's Law to come to the rescue: in a few years, computing capacity would increase so much that these networks could be built on commodity hardware. The problem is that the networks themselves are growing significantly in size too, so the gains in computing power will likely be eaten up by the ever-increasing number of parameters.
I do think we will need to start looking at making networks not only bigger but also more efficient. How can we achieve the same quality of output with a smaller number of parameters? That, for me, is the key question.
I look at this as if we are currently living in the era of muscle cars: very big engines, not very efficient. Over time we got cars with similar capabilities in terms of acceleration and speed, but with much smaller, more efficient engines. There is work being done on this front, for instance Liquid Neural Networks. They are task-specific but produce pretty interesting results.