When LLMs are Super Confident


OpenAI’s Prover-Verifier Games aim to make LLM outputs not only accurate but also easy to follow and verify, enhancing trust and usability.

OpenAI recently said that it is using the Prover-Verifier Games technique to enhance the legibility of LLM outputs, particularly for grade-school math problems, by training smaller verifiers to judge the correctness of solutions, thus making the outputs more understandable for humans.

In other words, the OpenAI paper shows that teaching a small LLM to double-check the work of a bigger LLM is like having a diligent student explain their math homework to a tutor, ensuring the solutions are clear and easy for everyone to understand.

"One way to increase confidence in the outputs of LLMs is to support them with reasoning that is clear and easy to check — a property we call legibility," said OpenAI, saying that this makes complex AI outputs more trustworthy and comprehensible.

Agree to Disagree?

The next time you see multiple response options in your output when using ChatGPT (powered by GPT-4o or more advanced models), remember that OpenAI is quite literally playing games, whether through RLHF or what it calls Prover-Verifier Games.

“Our algorithm iteratively trains small verifiers to predict solution correctness, 'helpful' provers to produce correct solutions that the verifier accepts, and 'sneaky' provers to produce incorrect solutions that fool the verifier,” shared OpenAI, highlighting its innovative training method, inspired by the Prover-Verifier Game, which aims to improve both the robustness of the verifier and the clarity of the solutions generated by the LLM.
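
The loop described in that quote can be pictured as a three-player training game. Below is a minimal, hypothetical sketch of one round; the Prover and Verifier classes, their update methods, and the is_correct check are illustrative placeholders standing in for the paper's trained models, not OpenAI's code.

```python
# Toy sketch of one prover-verifier training round (illustrative placeholders only).
import random

class Verifier:
    """Small model trained to predict whether a solution is correct."""
    def score(self, solution: str) -> float:
        return random.random()              # placeholder for a learned correctness score
    def update(self, solution: str, label: bool) -> None:
        pass                                # placeholder for a supervised training step

class Prover:
    """Large model that writes solutions in a 'helpful' or 'sneaky' role."""
    def solve(self, problem: str, role: str) -> str:
        return f"[{role}] step-by-step solution to: {problem}"   # placeholder generation
    def reinforce(self, solution: str, reward: float) -> None:
        pass                                # placeholder for an RL update on the reward

def is_correct(problem: str, solution: str) -> bool:
    return True                             # placeholder: compare against the known answer

def training_round(prover: Prover, verifier: Verifier, problems: list[str]) -> None:
    for problem in problems:
        helpful = prover.solve(problem, role="helpful")
        sneaky = prover.solve(problem, role="sneaky")

        # The verifier learns to separate correct from incorrect solutions.
        verifier.update(helpful, is_correct(problem, helpful))
        verifier.update(sneaky, is_correct(problem, sneaky))

        # The helpful prover is rewarded for correct solutions the verifier accepts;
        # the sneaky prover is rewarded for incorrect solutions that still fool it.
        accepts = lambda s: verifier.score(s) > 0.5
        prover.reinforce(helpful, float(is_correct(problem, helpful) and accepts(helpful)))
        prover.reinforce(sneaky, float(not is_correct(problem, sneaky) and accepts(sneaky)))
```

The adversarial pressure from the sneaky role is what pushes the verifier to become robust, while rewarding the helpful role keeps the solutions both correct and readable.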

OpenAI said the Prover-Verifier approach enhances the legibility of AI outputs, thus aiding human oversight.

Otherwise, a model focusing solely on correctness may produce complex and unintelligible solutions, underscoring the need for methods that balance accuracy with clarity.

In a recent paper titled ‘Accuracy is Not All You Need’, researchers from Microsoft Research in Bangalore revealed that current LLM compression methods, like quantisation, can alter model behaviour in ways that traditional accuracy metrics fail to detect, and proposed KL-Divergence and flips metrics for a more comprehensive evaluation.
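
As a rough illustration of what those two metrics measure, the sketch below compares a baseline model with its compressed (for instance, quantised) version: KL-divergence between the two models' next-token probability distributions, and the "flips" rate, i.e. answers whose correctness changes after compression. The array and list inputs are made-up examples for illustration, not the paper's code.

```python
import numpy as np

def kl_divergence(p_baseline: np.ndarray, p_compressed: np.ndarray, eps: float = 1e-12) -> float:
    """KL(P_baseline || P_compressed) over a next-token probability distribution."""
    p = p_baseline + eps
    q = p_compressed + eps
    return float(np.sum(p * np.log(p / q)))

def flip_rate(baseline_correct: list[bool], compressed_correct: list[bool]) -> float:
    """Fraction of examples whose correctness changes after compression."""
    flips = sum(b != c for b, c in zip(baseline_correct, compressed_correct))
    return flips / len(baseline_correct)

# Both models answer 2 of 4 questions correctly (identical accuracy),
# yet half the answers flipped, which an accuracy metric alone would never show.
print(flip_rate([True, True, False, False], [False, True, False, True]))  # 0.5
```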

Works in progress

Implementing Prover-Verifier Games comes with challenges and limitations. OpenAI said that the technique relies on having a dataset with known correct answers, which may not always be available, particularly in more complex or less well-defined domains.

Ergo, the partnerships. Last month, TIME and OpenAI entered a multi-year strategic partnership to integrate the former’s journalism with the latter’s products like ChatGPT, expanding global access to reliable information, following similar collaborations with Le Monde, Prisa Media, Vox Media, The Atlantic, and News Corp.

Dataset diversity and difficulty. OpenAI conducted the empirical study on a relatively simple dataset (grade-school math problems), which may not fully capture the challenges of applying this method to more complex or diverse datasets.

Initialisation with human-written math derivations. OpenAI said that the initial high performance of the prover might be due to its pre-training on human-written math data, which may not be representative of more generalised AI applications.

Lastly, OpenAI also said that training the AI system to produce legible solutions might limit its performance, and suggested an alternative approach that separates solution generation from the explanation process to avoid this limitation.

In another update, Google DeepMind introduced FLAMe, a family of foundational autorater models trained on 5 million human judgments, which outperform existing proprietary models on quality assessment tasks and are designed to reduce the challenges and costs of human evaluation of LLM outputs.

AWS recently unveiled AuditLLM, a novel tool for auditing large language models using a multi-probe approach, designed to streamline auditing and provide a comprehensive audit trail.

What’s next?

OpenAI believes that this new technique shows promising results for establishing trust in LLM outputs “even if they become more capable than humans in the future”.

Further, OpenAI said it hopes this work will inspire future research on semi-supervised and unsupervised setups for improving legibility to human judges with few or no ground-truth labels.

Read the full story here.
