Humanity’s Last Exam, a new benchmark, shows how far AI still lags behind human expertise. A new benchmark called “Humanity’s Last Exam” has been introduced to evaluate large language models (LLMs) with 3,000 challenging questions spanning a wide range of subjects, including mathematics. Developed by nearly 1,000 subject-matter experts from over 500 institutions worldwide, the benchmark aims to assess LLMs at the frontier of human knowledge. Notably, current state-of-the-art LLMs score poorly on it, highlighting a significant gap between their capabilities and expert human performance. Paper: https://lnkd.in/edN3uW9A #AI #LLM #Benchmark #ArtificialIntelligence #HumanExpertise
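The headline numbers behind benchmarks like this usually come from a simple aggregate: the fraction of questions the model answers correctly. A minimal sketch of that scoring step (the function name and toy data are illustrative, not from the paper):

```python
# Hypothetical sketch of benchmark scoring: exact-match accuracy
# over a set of questions with a known answer key.
def accuracy(predictions, answer_key):
    """Fraction of (question_id, answer) pairs matching the key."""
    correct = sum(1 for qid, ans in predictions if answer_key.get(qid) == ans)
    return correct / len(predictions)

# Toy example (not real benchmark data): 1 of 3 answers is correct.
key = {"q1": "42", "q2": "blue", "q3": "7"}
preds = [("q1", "42"), ("q2", "red"), ("q3", "9")]
print(round(accuracy(preds, key), 3))  # → 0.333
```

Real evaluations add answer normalization and, for free-form responses, model- or human-graded matching, but the reported "low accuracy" is this kind of ratio.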
Dan Hendrycks shared this 3 weeks ago.