Why LLMs Hallucinate; GraphGPT; Inside Microsoft’s small LLM; Deploy Tiny Llama on AWS EC2; Fine-Tune LLM using PyTorch; and More
Danny Butvinik
Chief Data Scientist | 100K+ Followers | FinCrime | Writer | Author of AI Vanguard Newsletter
Editor's Paper Recommendations
Why LLMs Hallucinate and How to Get (Evidential) Closure: Perceptual, Intensional, and Extensional Learning for Faithful Natural Language Generation: We show that LLMs hallucinate because their output is not constrained to be synonymous with claims for which they have evidence—a condition that we call evidential closure. Information about the truth or falsity of sentences is not statistically identified in the standard neural probabilistic language model setup and cannot be conditioned on to generate new strings. We then show how to constrain LLMs to produce output satisfying evidential closure. A multimodal LLM must learn about the external world (perceptual learning); it must learn a mapping from strings to states of the world (extensional learning); and, to achieve fluency when generalizing beyond a body of evidence, it must learn mappings from strings to their synonyms (intensional learning). The output of a unimodal LLM must be synonymous with strings in a validated evidence set. Finally, we show a heuristic method called Learn-Babble-Prune that ensures the output from an LLM is faithful by pruning output that is not synonymous with claims for which the LLM has evidence.
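To make the Learn-Babble-Prune idea concrete, here is a minimal, hypothetical sketch of an evidence-filtering loop: sample several candidate generations ("babble"), then keep only those judged synonymous with a claim in a validated evidence set ("prune"). The function names (`generate_candidates`, `is_synonymous`) and the toy evidence set are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a Learn-Babble-Prune-style filter (illustrative only).
# Assumptions: `generate_candidates` wraps an LLM sampler, `is_synonymous`
# is any paraphrase/entailment check, and `evidence_set` holds validated claims.

from typing import Callable, Iterable, List

def learn_babble_prune(
    generate_candidates: Callable[[str, int], List[str]],  # "babble": sample k outputs
    is_synonymous: Callable[[str, str], bool],              # paraphrase / entailment check
    evidence_set: Iterable[str],                            # validated claims ("learn")
    prompt: str,
    k: int = 10,
) -> List[str]:
    """Keep only generations synonymous with some evidenced claim ("prune")."""
    candidates = generate_candidates(prompt, k)
    evidence = list(evidence_set)
    return [c for c in candidates if any(is_synonymous(c, e) for e in evidence)]

# Toy usage with stand-in components:
if __name__ == "__main__":
    evidence = ["The Eiffel Tower is in Paris."]
    babble = lambda prompt, k: ["The Eiffel Tower is in Paris.",
                                "The Eiffel Tower is in Rome."]
    same = lambda a, b: a.strip().lower() == b.strip().lower()  # crude synonymy proxy
    print(learn_babble_prune(babble, same, evidence, "Where is the Eiffel Tower?"))
```

In practice the synonymy check would be an entailment or paraphrase model rather than string equality, but the filtering structure stays the same.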
GraphGPT: Graph Instruction Tuning for Large Language Models: Graph Neural Networks (GNNs) have advanced graph structure understanding via recursive information exchange and aggregation among graph nodes. To improve model robustness, self-supervised learning (SSL) has emerged as a promising approach for data augmentation. However, existing methods for generating pre-trained graph embeddings often rely on fine-tuning with specific downstream task labels, which limits their usability in scenarios where labeled data is scarce or unavailable. To address this, our research focuses on advancing the generalization capabilities of graph models in challenging zero-shot learning scenarios. Inspired by the success of large language models (LLMs), we aim to develop a graph-oriented LLM that achieves high generalization across diverse downstream datasets and tasks, even without any information from the downstream graph data. This work presents the GraphGPT framework, which aligns LLMs with graph structural knowledge through a graph instruction tuning paradigm. Our framework incorporates a text-graph grounding component to establish a connection between textual information and graph structures, and we propose a dual-stage instruction tuning paradigm with a lightweight graph-text alignment projector. This paradigm explores how self-supervised graph structural signals and task-specific graph instructions can guide LLMs toward understanding complex graph structures and adapting to diverse downstream tasks. Our framework is evaluated on supervised and zero-shot graph learning tasks, demonstrating superior generalization and outperforming state-of-the-art baselines.
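As a rough illustration of what a lightweight graph-text alignment projector can look like, the PyTorch sketch below linearly projects node embeddings from a frozen graph encoder into the LLM's token-embedding space so they can be prepended to an instruction's text embeddings. The dimensions, class name, and splicing step are assumptions for illustration, not GraphGPT's released code.

```python
# Illustrative sketch (not GraphGPT's released code): project frozen GNN node
# embeddings into the LLM token-embedding space so "graph tokens" can be
# spliced in front of an instruction's text tokens.

import torch
import torch.nn as nn

class GraphTextProjector(nn.Module):
    def __init__(self, graph_dim: int = 256, llm_dim: int = 4096):
        super().__init__()
        # Lightweight alignment projector: a single linear map is the simplest choice.
        self.proj = nn.Linear(graph_dim, llm_dim)

    def forward(self, node_embeddings: torch.Tensor) -> torch.Tensor:
        # node_embeddings: (num_nodes, graph_dim) from a frozen graph encoder
        return self.proj(node_embeddings)  # (num_nodes, llm_dim)

# Toy usage: prepend projected graph tokens to text-token embeddings.
graph_emb = torch.randn(32, 256)                   # 32 nodes from a frozen GNN
text_emb = torch.randn(1, 128, 4096)               # (batch, seq_len, llm_dim)
projector = GraphTextProjector()
graph_tokens = projector(graph_emb).unsqueeze(0)   # (1, 32, llm_dim)
inputs_embeds = torch.cat([graph_tokens, text_emb], dim=1)  # fed to the LLM
print(inputs_embeds.shape)  # torch.Size([1, 160, 4096])
```

Keeping only the projector trainable is what makes this kind of alignment "lightweight": the graph encoder and the LLM can stay frozen during the first tuning stage.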
Solving the multiplication problem of a large language model system using a graph-based method: The generative pre-trained transformer (GPT)-based chatbot software ChatGPT possesses excellent natural language processing capabilities but is inadequate for solving arithmetic problems, especially multiplication. Its GPT structure uses a computational graph for multiplication, which has limited accuracy beyond simple multiplication operations. We developed a graph-based multiplication algorithm that emulates human-like numerical operations by incorporating a 10^k operator, where k represents the maximum power of 10 in the larger of the two input numbers. Our proposed algorithm attained 100% accuracy on 1,000,000 large-number multiplication tasks, effectively solving the multiplication challenge of GPT-based and other large language models. Our work highlights the importance of blending simple human insights into the design of artificial intelligence algorithms. Keywords: graph-based multiplication; ChatGPT; multiplication problem.
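The Python sketch below illustrates the general idea of decomposing each factor into digit × 10^k terms and combining exact partial products, the kind of human-style procedure the paper emulates; it is an illustrative stand-in, not the authors' graph algorithm.

```python
# Illustrative sketch of human-style long multiplication via a 10^k decomposition
# (not the paper's exact graph algorithm): each factor is split into
# digit * 10^k terms, partial products are computed exactly, then summed.

def decompose(n: int) -> list:
    """Return (digit, k) pairs such that n == sum(digit * 10**k)."""
    return [(int(d), k) for k, d in enumerate(reversed(str(n))) if d != "0"]

def graph_style_multiply(a: int, b: int) -> int:
    total = 0
    for da, ka in decompose(a):
        for db, kb in decompose(b):
            # Each node of the "computation graph" is a single-digit product
            # shifted by 10^(ka + kb) -- an operation small models handle reliably.
            total += (da * db) * 10 ** (ka + kb)
    return total

# Sanity check against Python's exact integer multiplication.
assert graph_style_multiply(12345, 6789) == 12345 * 6789
print(graph_style_multiply(12345, 6789))
```

The point is that the language model only needs to orchestrate many trivially small operations rather than produce a long product in one step.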
Meet SingleStore Pro Max, the Powerhouse Edition
In the rapidly changing landscape of AI and real-time analytics, the foundation of your applications—the data platform—is no longer an optional frill but a must-have. It's the springboard for innovation, the hidden force behind every breakthrough application.
Introducing SingleStore Pro Max: The Powerhouse Edition
--
Are you looking to advertise a product, job opening, or event to an audience of over 40,000 AI researchers and engineers? Please reach out to us on LinkedIn to explore your options.
Enjoy the newsletter? Help us make it bigger and better by sharing it with colleagues and friends.
--
Industry Insights
Growth Zone
Expert Advice
Your exploration of AI's cutting edge is impressive, highlighting the importance of understanding both the capabilities and limitations of current technologies. It's clear you're committed not only to keeping up with AI advancements but also to practical applications and leadership in uncertain times. Generative AI could significantly enhance the quality of your work, offering efficient solutions to synthesize complex information, automate content creation, and optimize analytics. I'd love to show you how generative AI can streamline your processes and elevate your newsletter's insights. Let's chat about the potential it holds for you – book a call with us to explore the possibilities! Cindy
Digital Transformation | Data-Driven | Digital Twin | AI Enthusiast | Servant Leader | Strategic Vision
Good one Danny Butvinik. Applying evidential closure to LLMs can significantly increase reliability and precision, which is crucial for high-stakes environments. However, this approach might restrict the model's ability to innovate and creatively solve problems, potentially missing more efficient or novel solutions. Balancing accuracy with creative problem-solving is essential, especially in dynamic fields. For example, in my own exploration I was amazed when an LLM, fed with a specification, generated code that comes close to what any human could achieve.
6x LinkedIn Top Voice | Sr AWS AI ML Solution Architect at IBM | Generative AI Expert | Author - Hands-on Time Series Analytics with Python | IBM Quantum ML Certified | 12+ Years in AI | MLOps | IIMA | 100k+ Followers
Nice one Danny Butvinik. Considering the potential for "hallucinations" in LLMs and the limitations of "small LLMs," how can businesses implement these technologies responsibly and effectively in real-world applications while managing uncertainty and risk?