SanGuo GPT - Update 9/17/2023

Not many updates recently; just a few minor changes.

  • Calculate perplexity in training and generation.

I started with a perplexity score in training, but it turned out to be just the exp() of the cross-entropy loss. That's not very useful, as I'm already logging cross-entropy losses.
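To illustrate the point, here is a minimal PyTorch sketch with made-up logits and targets (not the repo's actual training code): the "training perplexity" is literally one exp() away from the loss that's already logged.

```python
import torch
import torch.nn.functional as F

# Made-up logits and targets, just for illustration.
logits = torch.randn(4, 10)           # (batch, vocab_size)
targets = torch.randint(0, 10, (4,))  # (batch,)

# Mean negative log-likelihood over the batch.
loss = F.cross_entropy(logits, targets)

# Training perplexity is just the exponential of that loss.
perplexity = torch.exp(loss)
```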

Perplexity in generation is more interesting:

perplexity = exp(-(1/N) * sum_{i=1..N} log p(w_i | w_1, ..., w_{i-1}))

So basically, as I generate the text, I accumulate the log() of each generated token's probability, take the negative average at the end, and apply exp().
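The accumulation can be sketched in pure Python (the per-token probabilities below are made up; in the real loop, each one comes from the model's softmax output for the token actually sampled):

```python
import math

# Hypothetical probabilities the model assigned to each generated token.
token_probs = [0.5, 0.25, 0.125, 0.5]

# Accumulate log-probabilities as tokens are generated...
log_prob_sum = sum(math.log(p) for p in token_probs)

# ...then take the negative average and exponentiate.
perplexity = math.exp(-log_prob_sum / len(token_probs))
print(f"Perplexity of generation: {perplexity:.4f}")  # prints 3.3636
```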

Now it can report the perplexity after generating a sequence:

Loading model from checkpoints/sanguogpt-v0.1.pth
Loading token map file from c2i.json and i2c.json
Using mps device
30 tokens generated in 4.497 seconds, avg 6.671 tokens/sec.
Perplexity of generation: 3.5431
 禅位汉统恭王寻张鲁行七十余营

  却说晋王司马炎奔入宫赴曹        

  • Visualize the embeddings of the tokens

I added code in training loop to log the embedding tables periodically, so that I can see the progress along the way. I then project the embeddings onto 3-D space and visualize it in TensorBoard.
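A minimal sketch of that logging step, using TensorBoard's `add_embedding` (the variable names and the 64-dimensional random table are placeholders, not the repo's actual identifiers; in training, the real embedding table and character vocabulary would be passed in):

```python
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/sanguogpt")

vocab = ["却", "说", "晋", "王"]               # illustrative character vocabulary
embedding_table = torch.randn(len(vocab), 64)  # stand-in for the learned table

# Log the table with per-token labels; TensorBoard's projector reduces it
# to 3-D (PCA / t-SNE / UMAP) for visualization. Calling this periodically
# with increasing global_step shows the embeddings' progress over training.
writer.add_embedding(embedding_table, metadata=vocab, global_step=999)
writer.close()
```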

Embeddings at the initial state (should be random).
Embeddings after 999 steps.

I don't really see much difference between the two embedding tables - they both look pretty random to me. Maybe it's because I only tried 1000 steps?

Code repo is here, feel free to play with it: https://github.com/zhoupingjay/sanguo-gpt

