Course: Introduction to Large Language Models



Comparing LLMs

- [Instructor] How do we even compare large language models? That's a great question. I don't think we have a perfect answer yet, but we have made great progress over the last few months. Usually, we focus only on how good a model is at a task, but that doesn't tell us whether the same model also generates false information. So instead of looking at just one metric, a Stanford University research team proposed HELM, or the Holistic Evaluation of Language Models. With HELM, the Stanford team worked together with the main large language model providers and was able to benchmark the models across a variety of datasets, getting a more holistic view of model performance. HELM is a living benchmark and should change as new models are released. I'll just cover the first couple of benchmarks, and you can explore the rest further if you're interested. So let me just go ahead and scroll down a little bit. So here, each row…
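To make the idea of holistic evaluation concrete, here is a minimal Python sketch of scoring a model on more than one axis at once. This does not use the real HELM API or its metrics; the `Example` record, the `factually_wrong` flag, and the `evaluate` function are hypothetical stand-ins chosen purely for illustration.

```python
# A minimal, hypothetical sketch of multi-metric ("holistic") evaluation.
# Nothing here is the real HELM API; the model outputs and the
# `factually_wrong` flag are illustrative stand-ins only.

from dataclasses import dataclass

@dataclass
class Example:
    prompt: str
    reference: str          # expected answer
    model_output: str       # what the model actually produced
    factually_wrong: bool   # flag from some external fact-checking step

def evaluate(examples: list[Example]) -> dict[str, float]:
    """Score a model on more than one axis, not just task accuracy."""
    n = len(examples)
    accuracy = sum(e.model_output == e.reference for e in examples) / n
    fabrication_rate = sum(e.factually_wrong for e in examples) / n
    return {"accuracy": accuracy, "fabrication_rate": fabrication_rate}

if __name__ == "__main__":
    results = evaluate([
        Example("2+2=", "4", "4", factually_wrong=False),
        Example("Capital of France?", "Paris", "Paris", factually_wrong=False),
        Example("Who wrote Hamlet?", "Shakespeare", "Marlowe", factually_wrong=True),
    ])
    print(results)  # {'accuracy': 0.666..., 'fabrication_rate': 0.333...}
```

A model can look strong on one metric and weak on another, which is exactly why HELM reports many metrics side by side rather than a single score.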
