SanGuo GPT - Update 9/17/2023

Not many updates recently; just a few minor changes.

  • Calculate perplexity in training and generation.

I started with a perplexity score in training, but it turned out to be just the exp() of the cross-entropy loss. That's not very useful, as I'm already logging cross-entropy losses.
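To illustrate the point, here is a minimal PyTorch sketch with made-up logits and targets (not the repo's actual training code): the "training perplexity" is literally one exp() away from the loss that's already logged.

```python
import torch
import torch.nn.functional as F

# Made-up logits and targets, just for illustration.
logits = torch.randn(4, 10)           # (batch, vocab_size)
targets = torch.randint(0, 10, (4,))  # (batch,)

# Mean negative log-likelihood over the batch.
loss = F.cross_entropy(logits, targets)

# Training perplexity is just the exponential of that loss.
perplexity = torch.exp(loss)
```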

Perplexity in generation is more interesting:

perplexity = exp(-(1/N) * sum_{i=1..N} log p(w_i | w_1, ..., w_{i-1}))

So basically, as I generate the text, I accumulate the log() of each generated token's probability, take the negative average at the end, and apply exp().
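The accumulation can be sketched in pure Python (the per-token probabilities below are made up; in the real loop, each one comes from the model's softmax output for the token actually sampled):

```python
import math

# Hypothetical probabilities the model assigned to each generated token.
token_probs = [0.5, 0.25, 0.125, 0.5]

# Accumulate log-probabilities as tokens are generated...
log_prob_sum = sum(math.log(p) for p in token_probs)

# ...then take the negative average and exponentiate.
perplexity = math.exp(-log_prob_sum / len(token_probs))
print(f"Perplexity of generation: {perplexity:.4f}")  # prints 3.3636
```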

Now it can report the perplexity after generating a sequence:

Loading model from checkpoints/sanguogpt-v0.1.pth
Loading token map file from c2i.json and i2c.json
Using mps device
30 tokens generated in 4.497 seconds, avg 6.671 tokens/sec.
Perplexity of generation: 3.5431
 禅位汉统恭王寻张鲁行七十余营

  却说晋王司马炎奔入宫赴曹        

  • Visualize the embeddings of the tokens

I added code in training loop to log the embedding tables periodically, so that I can see the progress along the way. I then project the embeddings onto 3-D space and visualize it in TensorBoard.
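A minimal sketch of that logging step, using TensorBoard's `add_embedding` (the variable names and the 64-dimensional random table are placeholders, not the repo's actual identifiers; in training, the real embedding table and character vocabulary would be passed in):

```python
import torch
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/sanguogpt")

vocab = ["却", "说", "晋", "王"]               # illustrative character vocabulary
embedding_table = torch.randn(len(vocab), 64)  # stand-in for the learned table

# Log the table with per-token labels; TensorBoard's projector reduces it
# to 3-D (PCA / t-SNE / UMAP) for visualization. Calling this periodically
# with increasing global_step shows the embeddings' progress over training.
writer.add_embedding(embedding_table, metadata=vocab, global_step=999)
writer.close()
```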

Embeddings at the initial state (should be random).
Embeddings after 999 steps.

I don't really see much difference between the two embedding tables - they both look pretty random to me. Maybe it's because I only tried 1000 steps?

Code repo is here, feel free to play with it: https://github.com/zhoupingjay/sanguo-gpt

