Explore the Future with Gen AI: Your Weekly Passport to Innovation!

We are back with another exciting edition, ready to dive into the fascinating world of GenAI and the tech trends shaping our future.

LLMs as Reasoning Agents

Large Language Models (LLMs) have garnered significant attention for their remarkable performance across various natural language understanding tasks. However, their abilities as reasoning agents have come under scrutiny due to a surprising phenomenon known as the "Reversal Curse."

The "Reversal Curse" in large language models (LLMs) is a phenomenon where these models fail to generalise from "A is B" statements to "B is A". This failure of generalisation is surprising and highlights a fundamental gap in the logical deduction capabilities of LLMs.

Example:

For instance, if an LLM is trained on the sentence "Olaf Scholz was the ninth Chancellor of Germany", it will not automatically be able to answer the question "Who was the ninth Chancellor of Germany?" correctly.

This is because the model does not generalise the learned information in the reversed direction. The likelihood of the model providing the correct answer is no higher than for a random name, indicating a lack of understanding of the relationship between the entities involved. This echoes research published by Microsoft and the Allen Institute a few years ago, which found a strong correlation between a model's accuracy on a particular number and that number's frequency in the pre-training data.

To understand this phenomenon, an experiment was conducted on real-world knowledge using a dataset of celebrity questions, in which models were required to identify the reverse relationships between celebrities and their parents. The models' performance on this task was also poor, indicating that the Reversal Curse is not limited to synthetic facts but extends to real-world knowledge.

Findings:

  1. These experiments were conducted on various LLMs, including GPT-3 and Llama-1 (including its 7B variant). The results are consistent across different model sizes and families, suggesting that the Reversal Curse is robust.
  2. The statistical analysis, including paired t-tests and Kolmogorov-Smirnov tests, was used to compare the log probabilities of the correct name and a random name. The results of these tests further support the existence of the Reversal Curse.
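The paired t-test mentioned above can be sketched in a few lines. This is a minimal illustration, not the paper's actual evaluation code: the log probabilities below are simulated with hypothetical numbers, since under the Reversal Curse the correct name receives no higher log probability than a random one.

```python
import math
import random

def paired_t_statistic(xs, ys):
    """Paired t-test statistic for matched samples xs and ys."""
    assert len(xs) == len(ys) and len(xs) > 1
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    se = math.sqrt(var / n)                              # standard error of the mean
    return mean / se

# Simulated (hypothetical) log probabilities: if the model has no preference
# for the correct name over a random name, both come from the same distribution
# and the t-statistic fluctuates around zero under the null hypothesis.
random.seed(0)
correct_logprobs = [random.gauss(-9.0, 1.0) for _ in range(100)]
random_logprobs = [random.gauss(-9.0, 1.0) for _ in range(100)]

t = paired_t_statistic(correct_logprobs, random_logprobs)
print(f"t-statistic: {t:.3f}")
```

A large positive t-statistic would mean the model systematically prefers the correct name; a value near zero is what the Reversal Curse predicts for the reversed direction.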

Proposed Solution:

  1. Leveraging human feedback on reasoning errors can substantially enhance the reasoning abilities of LLMs.
  2. By curating datasets with fine-grained error corrections, the models' performance and calibration on challenging multi-hop reasoning can be improved, addressing the challenge of nonsensical explanations and building trust with users.

Future work:

The Reversal Curse raises important questions about the generalisation abilities of LLMs. Despite their impressive performance on many tasks, these models seem to struggle with basic logical deduction. The findings suggest that memorisation is standing in for reasoning from first principles.

Future research should focus on understanding the underlying mechanisms of the Reversal Curse and exploring ways to mitigate its effects. This could potentially lead to the development of more effective and reliable language models.

Sources: https://aws.amazon.com/blogs/machine-learning/improve-multi-hop-reasoning-in-llms-by-learning-from-rich-human-feedback/

https://www.dhirubhai.net/pulse/towards-agi-improving-llms-reasoning-nazar-trilisky/

https://arxiv.org/abs/2308.09267

https://github.com/atfortes/LLM-Reasoning-Papers

https://bootcamp.uxdesign.cc/improving-the-reasoning-capability-of-llms-diverse-7468eea3499b

https://medium.com/aiguys/paper-review-llm-reversal-curse-41545faf15f4


LLM for Legal Reasoning

Much like judges follow a structured process to render judgments, there's a machine learning model called Legal Judgment Prediction (LJP) that predicts legal case outcomes. LJP uses case-specific factors like crime type, victim-defendant relationship, and evidence to predict the defendant's guilt or innocence, offering a computational approach akin to how judges make decisions.

However, it's important to recognize that machine learning models have their limitations in the legal context.

  • It predicts the defendant's guilt without providing explanations.
  • It needs to be trained on large datasets annotated by experts.

To address these limitations, large language models (LLMs) bring complex reasoning abilities to legal judgment prediction. Unlike conventional LJP models, LLMs that predict the judgment along with the relevant law articles and a justification significantly enhance explainability. However, simply querying an LLM (GPT-3 in this case) with zero-shot prompting is not found to be sufficient. It is important to teach LLMs to generate responses with intermediate reasoning steps using only a few examples.

Solution and Methodology

Legal Syllogism Prompting: Teaching Large Language Models for Legal Judgement Prediction

  • Legal Syllogism Prompting (LoT) emerges as a valuable tool for enhancing the capabilities of large language models (LLMs) in legal judgment prediction.
  • The dataset used for this purpose is the Chinese AI and Law challenge dataset (CAIL2018), encompassing cases with factual descriptions and legal judgments, consisting of law articles, charges, and prison terms.

Let's delve into the evaluation of different prompting techniques with LLM:

Zero-Shot Prompting Without Chain of Thought

  • This method directly requires the model to output the charges.
  • This baseline model predicted whether the defendant is guilty or not with an accuracy of 30%.

Chain-of-Thought (CoT) Prompting

  • CoT incorporates the prompt, "Let us think step by step," to guide the model's thought process.
  • CoT is a technique employed by LLMs to generate a logical chain of thought leading to the answer to a given question.
  • Surprisingly, the accuracy of this model was found to be lower than that of the baseline model. We think this is because the intermediate reasoning steps generated by Zero-shot CoT do not conform to legal reasoning.

Legal Syllogism (LoT) Prompting

When given a legal syllogism prompt, and without any learning or fine-tuning, LoT surpasses the other two methods on the sampled CAIL2018 dataset, as it allows the LLM to access and apply legal syllogism knowledge more effectively.
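The three prompting styles compared above can be sketched as simple template builders. This is an illustrative reconstruction, not the paper's exact prompt wording, and the case facts are a hypothetical example:

```python
FACTS = "The defendant stole a mobile phone worth 3,000 yuan from a shop."  # hypothetical case

def zero_shot_prompt(facts: str) -> str:
    """Baseline: ask directly for the judgment, no reasoning guidance."""
    return f"Case facts: {facts}\nIs the defendant guilty, and of what charge?"

def cot_prompt(facts: str) -> str:
    """Zero-shot Chain-of-Thought: append the step-by-step trigger phrase."""
    return zero_shot_prompt(facts) + "\nLet us think step by step."

def lot_prompt(facts: str) -> str:
    """Legal syllogism prompting: structure the reasoning as major premise
    (law article), minor premise (case facts), and conclusion (judgment)."""
    return (
        f"Case facts: {facts}\n"
        "Major premise: state the applicable law article.\n"
        "Minor premise: map the case facts onto the elements of that article.\n"
        "Conclusion: give the charge and judgment that follow from the premises."
    )

for build in (zero_shot_prompt, cot_prompt, lot_prompt):
    print(build.__name__, "->", build(FACTS).splitlines()[0])
```

The design point is that LoT does not change the model at all; it only constrains the shape of the intermediate reasoning so that it conforms to legal syllogism rather than free-form chain of thought.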

The following advantages were observed:

  • Sensitivity: LoT is sensitive to the context and the legal aspects of a case.
  • Explainability: LoT provides explanations that are more consistent with legal reasoning.
  • Selectivity: LoT's major premise helps in selectivity by ensuring that only relevant legal facts and principles are considered in the reasoning process.
  • Generality: LoT's mechanism allows it to provide predictions for both law articles and judgments within the same framework.

Limitation of LoT

  • Practical Reasoning: Though legal syllogism involves deductive reasoning from premises to a conclusion, practical reasoning, where judges interpret law and reconstruct facts, is also crucial. Research on LLMs' practical reasoning abilities is essential.
  • Context Length: LLMs have a limit on the number of tokens they can process. If the intermediate reasoning steps consume too many tokens, the model may be unable to produce the complete output.
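The context-length limitation can be guarded against with a simple budget check before sending a prompt. The limit and the characters-per-token ratio below are assumptions for illustration (a real system should count tokens with the model's own tokenizer):

```python
MAX_CONTEXT_TOKENS = 4096   # assumed model context limit
RESERVED_FOR_OUTPUT = 1024  # leave room for the intermediate steps and judgment

def estimate_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text.
    This is a common rule of thumb, not an exact count."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str) -> bool:
    """True if the prompt plus the reserved output budget fits the limit."""
    return estimate_tokens(prompt) + RESERVED_FOR_OUTPUT <= MAX_CONTEXT_TOKENS

short_case = "Case facts: the defendant stole a phone."
long_case = "x" * 50_000  # an over-long case description
print(fits_in_context(short_case))  # True
print(fits_in_context(long_case))   # False
```

When the check fails, the usual options are to truncate or summarise the case facts, or to split the syllogism across multiple calls.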

Source: https://arxiv.org/pdf/2307.08321.pdf


Open Source LLMs vs. Proprietary LLMs

Folks interested in developing Generative AI tools have two primary choices when deciding what to build upon: open-source or proprietary large language models (LLMs).

  • Chatbots like OpenAI's ChatGPT and Google's Bard are built on private, proprietary LLMs: systems trained on large amounts of data that learn to generate text. Examples: GPT-3.5, GPT-4
  • Open source, on the other hand, is computer code that can be freely used and modified by anyone on the internet. Open-source LLMs allow developers to download that code and fine-tune the models for specific tasks using their own data. Examples: Llama 2, Falcon-40B

Considering the pros and cons for both, the choice of LLMs can depend on performance, business needs and priorities:

  1. Time to market: proprietary LLMs (OpenAI etc.) are advisable initially, as they are ready to use.
  2. Performance: proprietary LLMs are currently better at certain tasks, such as translating languages and generating text.
  3. Use case: for text classification and question answering, proprietary LLMs are easy to use and deploy, requiring no infrastructure.
  4. Style tuning: open-source LLMs can be fine-tuned to proprietary styles and language tones.
  5. Cost-effectiveness: open-source LLMs are affordable compared to OpenAI's offerings.
  6. Data security: open-source LLMs can be deployed on-premises, giving more control over your data and security.

Source: https://www.dhirubhai.net/pulse/llm-economics-which-cheaper-deploy-open-source-llms-openai-nawaz


ChatGPT: A Technical Journey

ChatGPT has evolved from a research prototype to a versatile and powerful language model with the ability to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way, all while connected to the internet for real-time access to information.

ChatGPT's technical journey has been marked by the following milestones:

  • November 2022: Research preview released, showcasing ChatGPT's ability to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.
  • March 2023: ChatGPT upgraded to GPT-4, a more capable large language model that can generate more realistic and coherent text.
  • September 2023: OpenAI enabled ChatGPT to access real-world information through web browsing (via Bing) and introduced ChatGPT powered by GPT-4, equipped with enhanced text generation, translation, creative writing, question answering, and voice/image processing capabilities.

Take a closer look at the infographic highlighting ChatGPT's evolution into a versatile and powerful language model.

Sources: https://techcrunch.com/2023/09/28/chatgpt-everything-to-know-about-the-ai-chatbot/

https://help.openai.com/en/articles/6825453-chatgpt-release-notes#h_026e1e86a4


Recent Business News

Novo Nordisk and Valo Health to Use AI to Develop Treatments for Cardiometabolic Diseases

Novo Nordisk partners with Valo Health (a privately held U.S. health tech company), investing $60 million upfront and up to $2.7 billion based on milestones, to harness AI for cardiometabolic disease treatment. This collaboration signifies AI's potential to advance drug development.

Vice Health and Wellness Using AI to Change the Way We Lose Weight and Manage Addiction

Vice Health and Wellness Inc. (Vice) is leveraging AI and nutraceutical technology to develop "Vice Versa AI," an AI-driven platform aimed at providing personalized and sustainable weight loss and obesity management solutions. The AI app aims to revolutionize weight loss by tailoring programs to individual users based on data, enhancing adherence and success rates. The global weight loss and management market is poised for significant growth, making Vice's AI solutions strategically positioned in this expanding sector.

Amazon moves into healthcare generative AI with $4B investment

Amazon is investing $4 billion in Anthropic, an AI company, to bolster its healthcare-focused generative AI efforts. Amazon plans to incorporate Anthropic's AI assistant, Claude, into Amazon Bedrock, enhancing drug development and healthcare services. This move escalates competition in the healthcare AI sector, with Microsoft also making substantial investments in the field.

SAP Announces New Generative AI Assistant Joule

SAP is integrating Joule, an AI system, across its cloud enterprise portfolio to provide contextual insights. This AI technology enhances productivity and business outcomes securely. Joule will be integrated into various SAP applications and platforms, improving user experiences by offering intelligent responses to questions or problems in plain language. It will first be available with SAP SuccessFactors and SAP S/4HANA Cloud. Joule aligns with SAP's broader strategy for an enterprise AI ecosystem.

XYB partners with Google Cloud for generative AI

XYB, a coreless banking platform, is teaming up with Google Cloud to integrate generative AI. This collaboration aims to expedite the development of innovative financial products and streamline processes for banks, non-banks, and fintech firms. XYB's coreless banking platform powered by generative AI will enable hyper-personalized products and foster industry innovation, addressing market gaps and accelerating the provision of banking services.


If you are looking for Generative AI Solutions, check out our offerings at www.perpetualblock.io

