Bard vs. ChatGPT; Jina Embeddings 2; Text2Structure; Does GPT-4 Pass the Turing Test?; Transformer As Graph2Graph; and More.
Photo by Author using DALL-E



Editor's Paper Recommendations

From Text to Structure: Using Large Language Models to Support the Development of Legal Expert Systems: Encoding legislative text in a formal representation is an important prerequisite to different tasks in AI and law. For example, rule-based expert systems focused on legislation can support laypeople in understanding how legislation applies to them and provide them with helpful context and information. However, analyzing legislation and other sources to encode it in the desired formal representation can be time-consuming and a bottleneck in developing such systems. Here, we investigate to what degree large language models (LLMs), such as GPT-4, can automatically extract structured representations from the legislation. We use LLMs to create pathways from legislation, according to the JusticeBot methodology for legal decision support systems, evaluate the pathways, and compare them to manually created pathways. The results are promising, with 60% of generated pathways being rated as equivalent or better than manually created ones in a blind comparison. The approach suggests a promising path to leverage the capabilities of LLMs to ease the costly development of systems based on symbolic approaches that are transparent and explainable.
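
For readers who want to experiment with the idea, here is a minimal sketch: prompting GPT-4 to turn a single provision into a structured decision pathway. The JSON schema and the sample provision are illustrative stand-ins, not the paper's JusticeBot format, which is considerably richer.

```python
# Hedged sketch: asking an LLM to convert a legislative provision into a
# simplified yes/no decision pathway. The schema below is hypothetical.
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROVISION = (
    "A tenant may terminate the lease if the dwelling becomes unfit for "
    "habitation, provided the tenant notifies the landlord in writing."
)

prompt = f"""Convert the provision below into a JSON decision pathway with:
- "questions": a list of yes/no questions a layperson can answer
- "outcome_if_all_yes" and "outcome_otherwise": short plain-language outcomes

Provision: {PROVISION}
Return only JSON."""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)

# Assumes the model returns bare JSON as instructed.
pathway = json.loads(response.choices[0].message.content)
print(json.dumps(pathway, indent=2))
```

In the paper's setup, such generated pathways are then evaluated against manually created ones; the sketch only covers the extraction step.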

Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents: Text embedding models have emerged as powerful tools for transforming sentences into fixed-sized feature vectors that encapsulate semantic information. While these models are essential for tasks like information retrieval, semantic clustering, and text re-ranking, most existing open-source models, especially those built on architectures like BERT, struggle to represent lengthy documents and often resort to truncation. One common approach to mitigate this challenge involves splitting documents into smaller paragraphs for embedding. However, this strategy results in a much larger set of vectors, consequently leading to increased memory consumption and computationally intensive vector searches with elevated latency. To address these challenges, we introduce Jina Embeddings 2, an open-source text embedding model capable of accommodating up to 8192 tokens. This model is designed to transcend the conventional 512-token limit and adeptly process long documents. Jina Embeddings 2 achieves state-of-the-art performance on a range of embedding-related tasks in the MTEB benchmark and matches the performance of OpenAI's proprietary ada-002 model. Additionally, our experiments indicate that an extended context can enhance performance in tasks such as NarrativeQA.
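
As a quick illustration of what the long context buys you, a single call can embed a multi-thousand-token document without chunking. The model id and the encode() helper below follow the model's Hugging Face distribution; treat the exact loading details as an assumption.

```python
# Hedged sketch: one embedding for a long document, no paragraph splitting.
import numpy as np
from transformers import AutoModel  # pip install transformers

model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-base-en", trust_remote_code=True
)

# Stand-in for a document of several thousand tokens (up to 8192 supported).
long_document = "The quarterly report discusses retrieval latency. " * 400
query = "What does the report say about latency?"

doc_vec, query_vec = model.encode([long_document, query])

# Cosine similarity between the query and the whole document.
cos = float(
    np.dot(doc_vec, query_vec)
    / (np.linalg.norm(doc_vec) * np.linalg.norm(query_vec))
)
print(f"similarity: {cos:.3f}")
```

Compare this with the chunking workaround the abstract describes: splitting the same document into 16 paragraphs would mean 16 vectors to store and search instead of one.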

Does GPT-4 Pass the Turing Test?: We evaluated GPT-4 in a public online Turing Test. The best-performing GPT-4 prompt passed in 41% of games, outperforming baselines set by ELIZA (27%) and GPT-3.5 (14%) but falling short of chance and the baseline set by human participants (63%). Participants' decisions were based mainly on linguistic style (35%) and socio-emotional traits (27%), supporting the idea that intelligence is insufficient to pass the Turing Test. Participants' demographics, including education and familiarity with LLMs, did not predict the detection rate. This suggests that even those who deeply understand and interact with systems frequently may be susceptible to deception. Despite known limitations as a test of intelligence, we argue that the Turing Test continues to be relevant as an assessment of naturalistic communication and deception. AI models that can masquerade as humans could have widespread societal consequences, and we analyze the effectiveness of different strategies and criteria for judging human likeness.

Transformers as Graph-to-Graph Models: We argue that transformers are essentially graph-to-graph models, with sequences being a special case. Attention weights are functionally equivalent to graph edges. Our Graph-to-Graph Transformer architecture makes this ability explicit by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions, thereby integrating explicit graphs into the latent graphs learned by pre-trained Transformers. Adding iterative graph refinement provides a joint embedding of input, output, and latent graphs, allowing non-autoregressive graph prediction to optimize the complete graph without any bespoke pipeline or decoding strategy. Empirical results show that this architecture achieves state-of-the-art accuracies for modeling various linguistic structures, integrating very effectively with the latent linguistic representations learned by pretraining.
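
A toy sketch helps make the core mechanism concrete: edge-label embeddings enter the attention logits, and output edges are scored with an attention-like bilinear function over pairs of token states. This is a simplified illustration of the idea, not the paper's exact architecture.

```python
# Hedged sketch: relation-aware attention where an explicit input graph
# biases the latent attention graph, plus pairwise output-edge scoring.
import torch
import torch.nn.functional as F

n, d, num_edge_types = 5, 16, 3
x = torch.randn(n, d)                              # token representations
edges = torch.randint(0, num_edge_types, (n, n))   # input graph: label per pair

Wq, Wk, Wv = (torch.nn.Linear(d, d) for _ in range(3))
edge_emb = torch.nn.Embedding(num_edge_types, d)   # one embedding per label

q, k, v = Wq(x), Wk(x), Wv(x)
e = edge_emb(edges)                                # (n, n, d) edge features

# Attention logits: standard dot product plus a query-edge interaction term,
# so the explicit graph is injected into the attention weight computation.
logits = (q @ k.T + torch.einsum("id,ijd->ij", q, e)) / d ** 0.5
attn = F.softmax(logits, dim=-1)
out = attn @ v

# Predicting output edges with an attention-like bilinear score per pair.
scorer = torch.nn.Bilinear(d, d, num_edge_types)
src = out.unsqueeze(1).expand(n, n, d).reshape(-1, d)
dst = out.unsqueeze(0).expand(n, n, d).reshape(-1, d)
edge_scores = scorer(src, dst).view(n, n, num_edge_types)
print(out.shape, edge_scores.shape)  # (5, 16) and (5, 5, 3)
```

Iterating this step, re-feeding the predicted edges as the next round's input graph, gives the flavor of the iterative graph refinement the abstract describes.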

--

Are you looking to advertise a product, job opening, or event to an audience of over 40,000 AI researchers and engineers? Please reach out to us on LinkedIn to explore your options.

Enjoy the newsletter? Help us make it bigger and better by sharing it with colleagues and friends.

--

Industry Insights

Growth Zone


Expert Advice



Harel Wilner

NLP Developer and Prompt Engineer | Building NLP Models to Optimize Personal Branding | LLM | PyTorch | spaCy

7 months

The results of GPT-4's performance in the Turing Test, outperforming ELIZA and GPT-3.5, raise interesting questions about what truly constitutes 'passing' such a test in the era of advanced LLMs.


Exciting updates in AI and analytics! Can't wait to dig in.


Great insights on the latest developments in AI, ML, DL, and analytics!

AAMIR KHAN

Full Stack Data Scientist | Quantitative Analysis | Entrepreneur | AI Researcher | Consultant

7 months

Exciting lineup! Always fascinated by advancements in AI and ML. Looking forward to diving into the latest trends and insights shared in this newsletter. Keep up the great work!

Aleeza Ishwal

Let me Boost your Business with Artificial Intelligence | AI Solutions | AI Product Owner | Business Automation | Business Strategy

7 months

Sounds like an informative read!
