AI Newsletter
Ievgen Gorovyi
Founder & CEO @ It-Jim | AI Expert | PhD, Computer Vision | GenAI | AI Consulting
Another week, another batch of cool updates in the world of AI!
Anthropic's Claude Tools & New Models
Anthropic just gave Claude a new ability to interact with your computer, automating tasks like form-filling by “seeing” your desktop. Through periodic screenshots, Claude identifies and completes each step autonomously, moving closer to true AI desktop assistants. Plus, their new models—Claude 3.5 Sonnet and Haiku—show performance boosts, even outpacing GPT-4 in some tasks.
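As a rough illustration of how the screenshot-driven loop is wired up, here is a sketch of the request parameters for Anthropic's computer-use beta. The tool type `computer_20241022` and the beta header were the identifiers at the October 2024 launch; check the current docs before relying on them. No API call is made here, only the request is assembled.

```python
# Sketch: declaring Anthropic's computer-use tool (beta identifiers as of
# the October 2024 launch -- verify against current docs).
# We only build the request parameters; no network call is made.

def build_computer_use_request(task: str) -> dict:
    """Assemble kwargs for a computer-use message request."""
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "tools": [
            {
                "type": "computer_20241022",   # screenshot / mouse / keyboard tool
                "name": "computer",
                "display_width_px": 1024,
                "display_height_px": 768,
            }
        ],
        "messages": [{"role": "user", "content": task}],
        # The request must also carry the header:
        #   anthropic-beta: computer-use-2024-10-22
    }

request = build_computer_use_request("Fill in the sign-up form on screen.")
```

In the real loop, Claude's tool-use responses (click, type, screenshot) are executed by your own harness, and each fresh screenshot is fed back as the next observation.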
Microsoft’s Autonomous Agents in Copilot Studio
Microsoft has upgraded Copilot Studio with autonomous agents that respond to real-time triggers across business systems, running tasks without human input. These agents are designed to handle work like lead research, customer support, and supplier tracking. For instance, a sales qualification agent can prioritize outreach to top leads, while a supplier communications agent tracks performance to prevent delays.
Meta's Spirit LM and Quantized Llama Models
Meta’s Spirit LM is a language model that takes both text and audio input, outputting responses in either format, opening possibilities for more seamless multimodal interactions. Their quantized Llama models, designed to run on mobile devices, use a compressed format that retains quality while reducing model size, making powerful AI more accessible on-the-go.
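To make the "compressed format that retains quality" idea concrete, here is a toy symmetric int8 quantization in NumPy. This is an illustration of the general size/accuracy trade-off, not Meta's actual scheme (which uses quantization-aware training with LoRA adaptors).

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~= scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

size_ratio = q.nbytes / w.nbytes        # int8 uses 1 byte vs 4 for float32
max_err = np.abs(w - w_hat).max()       # rounding error is at most scale / 2
```

The weights shrink to a quarter of their float32 size, while the worst-case reconstruction error stays bounded by half the quantization step.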
IBM Granite 3 Models for Enterprise
IBM’s Granite 3 models are enterprise-tailored, supporting tasks like document classification, retrieval, and summarization at lower costs. The new Granite 3.0-based watsonx Code Assistant supports the C, C++, Go, Java, and Python languages, with new application modernization capabilities for enterprise Java applications. IBM said the assistant has yielded 90% faster code documentation for certain tasks within its software development business. The models are designed for customization with internal company data, allowing organizations to enhance specific workflows with efficiency and ease.
OpenAI’s Advanced Voice Mode and Notable Exit
OpenAI expanded its advanced voice mode to Plus users across Europe and other regions, enhancing the interactivity of conversational AI. Additionally, OpenAI’s Senior Advisor for AGI, Miles Brundage, announced his departure, sparking discussions around the future of AI and its ethical readiness. Brundage emphasized that the tools available publicly are close to what research labs are working on, indicating transparency in the advancement of AGI.
Runway's Act-One
Runway’s Act-One transforms video animation by mapping real human expressions and voice onto animated characters with impressive accuracy. It’s as simple as recording a video of yourself and selecting an animated character to apply your expressions to, from lip-syncing to nuanced facial emotions. This tool is a game-changer for content creators, enabling production-quality animations without the need for costly equipment or large production teams.
Mochi 1: Open-Source AI Video Generator
Mochi 1, developed by Genmo, is a groundbreaking open-source video generation model under the Apache 2.0 license, allowing full modification for commercial use. Designed to run on consumer hardware (with a powerful GPU), Mochi 1 can create both realistic and animated videos, making it a versatile choice for developers and businesses alike. As an open-source tool, it opens up possibilities for customization and improvement from the developer community, and it has already gained significant traction on platforms like Hugging Face.
HaiperAI 2.0: High-Quality Video Generation
HaiperAI’s new video generator, Haiper 2.0, delivers strong performance in video creation, supporting resolutions up to 4K and 60 fps. This tool enables users to produce videos using text-to-video and image-to-video formats, creating everything from animated dances to expressive animal clips. While not yet a leader compared to models like Gen-3, it’s an affordable, accessible tool for experimenting with video generation, and it includes 300 free credits to start.
Stable Diffusion 3.5
Stability AI just released Stable Diffusion 3.5, and it’s a solid improvement over the last version, resolving prior issues like distorted images. The new model, which can be downloaded for free, offers both high-resolution “large” and faster “turbo” versions, so users can choose between quality and speed. Available on platforms like Hugging Face, Stable Diffusion 3.5 runs on consumer hardware, making powerful image generation accessible to a wider audience.
Midjourney’s Editor and Retexturing Tool
Midjourney now lets users upload their own images and apply creative changes using AI. With the new image editor, users can mask out areas, add new objects, or completely change textures. Retexturing allows users to keep the image structure while adding fresh, stylized layers, like turning a simple scene into a colorful, psychedelic world.
Traditional Information Retrieval (IR) systems—think web search engines and recommenders—rely on rigid, single-step filtering methods. But since 2022, the rapid development of large language models (LLMs) has been reshaping IR. Enter Agentic Information Retrieval (Agentic IR), a transformative approach that leverages AI agents to expand IR tasks, interactivity, and personalization.
Key Takeaways:
• Adaptive Architecture: Unlike static traditional IR, Agentic IR utilizes AI agents for continuous observation, reasoning, and action, adapting to multi-step tasks and a variety of user scenarios.
• Enhanced Techniques: With methods like prompt engineering, retrieval-augmented generation, and multi-agent systems, Agentic IR offers a more dynamic, user-tailored information flow.
• Practical Applications: From life assistance to business analytics and coding support (like GitHub Copilot), Agentic IR systems deliver personalized, proactive insights, moving beyond simple query-answering.
• The Road Ahead: Agentic IR is set to redefine information entry points, making digital interaction more intuitive and responsive. Its challenges, including memory and tool integration, underscore the complexity of this leap.
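The observe/reason/act loop that distinguishes Agentic IR from single-step retrieval can be sketched in a few lines. Everything here is an illustrative stand-in: a keyword lookup plays the role of a search tool, and a fixed list of sub-goals plays the role of an LLM planner.

```python
# Toy sketch of the agentic IR loop: plan sub-goals, act (retrieve),
# observe results, repeat until the task is covered. A real system would
# use an LLM for planning and a genuine search backend for retrieval.

DOCS = {
    "flights": "Cheapest flights to Lisbon depart on Tuesdays.",
    "hotels": "Downtown Lisbon hotels average 120 EUR per night.",
}

def retrieve(query: str) -> str:
    """Single-step retrieval: return the best keyword match."""
    hits = [text for key, text in DOCS.items() if key in query.lower()]
    return hits[0] if hits else "no result"

def agentic_search(task: str, max_steps: int = 3) -> list[str]:
    """Multi-step loop: work through sub-goals, accumulating observations."""
    memory: list[str] = []
    pending = ["flights", "hotels"]   # sub-goals a planner LLM would produce
    for _ in range(max_steps):
        if not pending:
            break                      # reasoning step: task is complete
        sub_goal = pending.pop(0)      # action step: pick the next sub-query
        memory.append(retrieve(f"{task} {sub_goal}"))  # observe the result
    return memory

notes = agentic_search("plan a trip to Lisbon")
```

The key contrast with traditional IR is the memory that persists across steps: each retrieval can inform the next query rather than answering one shot in isolation.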
Cohere has released Aya Expanse, its newest state-of-the-art multilingual family of models. These models mark a significant leap in bridging language gaps, delivering unparalleled performance across 23 languages while outperforming other leading open-weight models in the field.
Key Highlights:
• Dual Power Options: Aya Expanse is available in 8B and 32B parameter sizes, offering scalable options for researchers and developers globally. The 8B model democratizes access, while the 32B model pushes the frontier in multilingual AI.
• Unmatched Performance: Aya Expanse 32B outshines competitors like Gemma 2 27B, Mixtral 8x22B, and even Llama 3.1 70B. With win rates from 51.8% to 76.6% across languages, it sets a new benchmark for multilingual excellence.
• Innovative Training Techniques: Aya Expanse combines several breakthrough training strategies.
• Advancing Multilingual AI: Aya Expanse builds on Cohere's Aya collection (the largest multilingual dataset to date, with 513 million examples) and Aya-101, a model covering 101 languages. Together, they push forward the mission to expand multilingual capabilities in LLMs.
In a new theoretical exploration, researchers from Michigan State University and Amazon present an upgraded approach to Few-Shot Chain-of-Thought (CoT) prompting, which strengthens reasoning in large language models (LLMs) by incorporating both correct and incorrect reasoning paths during demonstrations.
Key Insights:
Experimental Results:
Models like GPT-3.5 Turbo, GPT-4o-mini, Gemini Pro, and DeepSeek 67B were tested across various datasets (e.g., Disambiguation QA, Tracking Shuffled Objects, GSM8K). Results showed that models exposed to a combination of correct and incorrect reasoning paths outperformed those with only correct paths, with performance gains reaching up to 6.60%.
This method shows promise in training LLMs to think more robustly and handle reasoning tasks with greater accuracy by learning from both right and wrong approaches.
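The core idea, demonstrations that pair correct with incorrect reasoning, amounts to prompt construction and can be sketched directly. The demo content below is ours, not from the paper; the structure (a flawed rationale marked as such next to the correct one) is the technique.

```python
# Sketch of a contrastive few-shot CoT prompt: each demonstration shows a
# correct rationale alongside an incorrect one labeled as such, so the model
# sees what good and bad reasoning look like. Demo content is illustrative.

DEMOS = [
    {
        "question": "If 3 pens cost $6, how much do 5 pens cost?",
        "correct": "Each pen costs 6 / 3 = $2, so 5 pens cost 5 * 2 = $10.",
        "incorrect": "3 pens cost $6, so 5 pens cost 6 + 5 = $11.",
    },
]

def build_contrastive_prompt(demos: list[dict], question: str) -> str:
    parts = []
    for d in demos:
        parts.append(f"Q: {d['question']}")
        parts.append(f"Correct reasoning: {d['correct']}")
        parts.append(f"Incorrect reasoning (do not imitate): {d['incorrect']}")
    parts.append(f"Q: {question}")
    parts.append("A: Let's think step by step.")
    return "\n".join(parts)

prompt = build_contrastive_prompt(DEMOS, "If 4 books cost $20, how much do 7 cost?")
```

The prompt is then sent to the model as usual; only the demonstration format changes, which is what makes the gains in the paper cheap to obtain.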
The paper presents an in-depth review of data generation techniques critical to the development and lifecycle of large language models (LLMs), with a focus on the pressing challenges of data scarcity and quality as demand for ever-larger datasets grows. It categorizes techniques into two primary approaches, data augmentation and data synthesis, and examines their roles across key phases, from pre-training and fine-tuning to preference alignment and application.
The authors highlight several pressing issues associated with synthetic data, including privacy, where synthetic data may inadvertently contain identifiable elements, thus complicating anonymity efforts. Data watermarking is explored as a possible solution, though it raises concerns about balancing privacy with traceability.
In terms of performance, synthetic and augmented data are noted to have limitations in generalization and domain adaptability, as they may lack the variability of real-world data, leading to challenges such as overfitting and domain misalignment. These challenges impact the transferability of LLMs across tasks and applications, with notable examples in sensitive fields like healthcare.
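To ground the augmentation/synthesis distinction: augmentation derives variants from existing examples, while synthesis generates new ones from scratch (typically with an LLM). Below is a minimal sketch of two classic augmentation operations; the function names are ours.

```python
import random

# Minimal data-augmentation sketch: derive training variants of a seed
# sentence via word dropout and adjacent-word swap. These are illustrative
# examples of the "augmentation" family; "synthesis" would instead generate
# wholly new examples, e.g. by prompting an LLM.

def word_dropout(text: str, p: float, rng: random.Random) -> str:
    """Randomly drop each word with probability p (keep at least one word)."""
    words = text.split()
    kept = [w for w in words if rng.random() > p]
    return " ".join(kept) if kept else words[0]

def word_swap(text: str, rng: random.Random) -> str:
    """Swap one random pair of adjacent words."""
    words = text.split()
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

rng = random.Random(0)                  # fixed seed for reproducibility
seed = "the model failed to load the checkpoint"
augmented = [word_dropout(seed, 0.2, rng), word_swap(seed, rng)]
```

The overfitting risk the survey flags is visible even here: every variant stays inside the vocabulary and structure of the seed, so augmentation alone cannot supply genuinely new variability.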
Anthropic's recent work digs into feature steering as a technique to manage social biases in LLMs like Claude 3 Sonnet. By tweaking specific features in the model, the team explored how to adjust outputs without reducing model quality. Here's what they learned:
• Promising Initial Results
• Challenges & Off-Target Effects
• A Bias-Neutralizing Feature
Takeaway: Feature steering shows potential, but there's work ahead. The results call for more research to make steering techniques reliable and ready for real-world application.
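Mechanically, feature steering boils down to shifting a hidden activation along a chosen feature direction by some coefficient. Anthropic steers features found by sparse autoencoders inside Claude 3 Sonnet; the NumPy sketch below is a generic stand-in for that core operation, with all names ours.

```python
import numpy as np

# Toy activation-steering sketch: add a scaled unit "feature direction" to a
# hidden state. The real work steers SAE features inside Claude 3 Sonnet;
# this only illustrates the arithmetic of the intervention.

def steer(hidden: np.ndarray, direction: np.ndarray, coeff: float) -> np.ndarray:
    """Shift activations along a unit feature direction by `coeff`."""
    unit = direction / np.linalg.norm(direction)
    return hidden + coeff * unit

rng = np.random.default_rng(0)
hidden = rng.standard_normal(16)      # stand-in for a residual-stream vector
direction = rng.standard_normal(16)   # stand-in for a learned feature vector

steered = steer(hidden, direction, coeff=4.0)

# The projection onto the feature direction grows by exactly `coeff`,
# while the rest of the activation is untouched.
unit = direction / np.linalg.norm(direction)
delta = float((steered - hidden) @ unit)
```

The off-target effects the study reports arise because real features are not perfectly orthogonal to everything else the model cares about, so pushing on one direction moves other behaviors too.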
Recent research reveals that large language models (LLMs) often reflect the ideological perspectives of their creators, challenging the notion of "neutral" AI. This study, led by researchers from Ghent University and the Public University of Navarre, analyzed popular LLMs like Google's Gemini-Pro and Mistral to uncover ideological stances embedded in their outputs. Here are some key insights:
• Language and Cultural Variance: LLMs generate distinct responses based on language and cultural context. For example, models prompted in Chinese show more favorable views toward Chinese values, while Western LLMs align more with Western ideologies, even when prompted in English.
• Differing Model Ideologies:
• Regulatory Implications: The study suggests that enforcing "neutrality" in LLMs could be problematic, as neutrality is culturally defined. Instead, transparency in model design and encouraging diverse, local LLMs could support ideological balance and prevent monopolization in AI.
This paper dives into reasoning enhancement in OpenAI's o1 model, focusing on complex tasks like mathematics, coding, and commonsense reasoning. The authors benchmark o1 against test-time compute methods such as Best-of-N (BoN), Step-wise BoN, Self-Refine, and Agent Workflow, with GPT-4o as a backbone.
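Best-of-N is the simplest of these test-time compute baselines: sample N candidate answers and keep the one a scoring function prefers. The sketch below uses toy candidates and a toy scorer in place of real model samples and a reward model.

```python
# Sketch of Best-of-N (BoN): generate N candidates, score them, keep the
# best. Candidates and the scorer are illustrative stand-ins -- a real
# system would sample N generations and score with a reward model or
# self-consistency voting.

def best_of_n(candidates: list[str], score) -> str:
    """Return the highest-scoring candidate."""
    return max(candidates, key=score)

# Pretend these came from N sampled generations for the same question.
candidates = ["answer: 40", "answer: 42", "answer: 7"]

def toy_score(ans: str) -> float:
    # Prefer answers close to a known reference value (42).
    return -abs(int(ans.split(":")[1]) - 42)

best = best_of_n(candidates, toy_score)   # -> "answer: 42"
```

Step-wise BoN applies the same select-the-best idea at each intermediate reasoning step rather than once over whole answers, trading more compute for finer-grained control.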
Key findings indicate:
Noteworthy papers (October 2024)
About us:
We also have an amazing team of AI engineers with:
We are here to help you maximize efficiency with your available resources.
Reach out when:
Have doubts or many questions about AI in your business? Get in touch!