Latest in AI News

AI Model Releases and Developments

  • Jamba 1.5 Launch: @AI21Labs released Jamba 1.5, a hybrid SSM-Transformer MoE model available in Mini (52B total / 12B active) and Large (398B total / 94B active) versions. Key features include a 256K context window, multilingual support, and optimized performance for long-context tasks.

  • Claude 3 Updates: @AnthropicAI added LaTeX rendering support for Claude 3, enhancing its ability to display mathematical equations and expressions. Prompt caching is now available for Claude 3 Opus as well.

  • Dracarys Release: @bindureddy announced Dracarys, an open-source LLM fine-tuned for coding tasks, available in 70B and 72B versions. It shows significant improvements in coding performance compared to other open-source models.

  • Mistral Nemo Minitron 8B: This model demonstrates superior performance to Llama 3.1 8B and Mistral 7B on the Hugging Face Open LLM Leaderboard, suggesting the potential benefits of pruning and distilling larger models.
  • Phi-3.5 and Flexora: @rohanpaul_ai praised the Phi-3.5 model's safety and performance, and separately noted that Flexora's adaptive layer selection outperformed existing baselines.
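Pruning and distilling a larger model, as with the Minitron-style release above, starts by removing low-importance weights. A toy sketch of the simplest variant, magnitude pruning, using only the standard library (real pipelines prune whole layers or heads and then distill to recover quality):

```python
# Toy magnitude pruning: zero out the smallest-magnitude weights.
# Illustrative only -- Minitron-style pruning is structured (layers/heads)
# and is followed by distillation from the larger teacher model.

def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest |w| set to 0.0."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

pruned = magnitude_prune([0.9, -0.05, 0.4, 0.01, -0.7, 0.02], sparsity=0.5)
print(pruned)  # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```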

AI Research and Techniques

  • Prompt Optimization: @jxmnop discussed the challenges of prompt optimization, highlighting the complexity of finding optimal prompts in vast search spaces and the surprising effectiveness of simple algorithms like AutoPrompt/GCG.

  • Hybrid Architectures: @tri_dao noted that hybrid Mamba / Transformer architectures work well, especially for long context and fast inference.

  • Classifier-Free Diffusion Guidance: @sedielem shared insights from recent papers questioning prevailing assumptions about classifier-free diffusion guidance.
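The guidance rule those papers re-examine is compact enough to state in code. A minimal numeric sketch, with plain lists standing in for denoiser outputs:

```python
# Classifier-free guidance: extrapolate from the unconditional prediction
# toward the conditional one by a guidance scale w:
#   guided = uncond + w * (cond - uncond)
# w = 1 recovers the conditional prediction (up to float rounding);
# w > 1 over-emphasizes the condition, which is where the questioned
# assumptions about image quality vs. diversity come in.

def cfg(uncond, cond, w):
    return [u + w * (c - u) for u, c in zip(uncond, cond)]

uncond = [0.2, -0.1]
cond = [0.6, 0.3]
print(cfg(uncond, cond, 1.0))  # ~= [0.6, 0.3], plain conditional
print(cfg(uncond, cond, 3.0))  # ~= [1.4, 1.1], amplified guidance
```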

AI Applications and Tools

  • Spellbook Associate: @scottastevenson announced the launch of Spellbook Associate, an AI agent for legal work capable of breaking down projects, executing tasks, and adapting plans.

  • Cosine Genie: @swyx highlighted a podcast episode discussing the value of fine-tuning GPT-4o for code, resulting in the top-performing coding agent according to various benchmarks.

  • LlamaIndex 0.11: @llama_index released version 0.11 with new features including Workflows replacing Query Pipelines and a 42% smaller core package.

  • MLX Hub: A new command-line tool for searching, downloading, and managing MLX models from the Hugging Face Hub, as announced by @awnihannun.
  • uv Virtual Environments: uv virtual environments offer rapid installation and dependency management. @reach_vb showcased how uv creates lightweight virtual environments quickly.
  • LangChain and LangSmith Updates: Resource tags in LangSmith help efficiently manage projects, datasets, and deployments. @LangChainAI introduced these enhancements for better workspace organization.
  • Multi-Agent Systems in Qdrant and LangChain: Multi-agent role-playing and semantic caching in Qdrant make AI systems more robust. @iqdotgraph shared how these integrations aim to enhance data processing and retrieval workflows.
  • Codegen: A new tool for programmatically analyzing and manipulating codebases was introduced. @mathemagic1an highlighted its ability to safely transform code at scale, visualize complex code structures, and support AI-assisted development.

  • Claude Usage: @alexalbert__ shared a day-long log of using Claude for various tasks, demonstrating its versatility in everyday scenarios like recipe creation, email management, and content writing.

  • Metamate: An internal AI assistant for Meta employees was discussed. @soumithchintala mentioned its capabilities in building custom agents for team-specific knowledge and systems.

AI Development and Industry Trends

  • Challenges in AI Agents: @RichardSocher highlighted the difficulty of achieving high accuracy across multi-step workflows in AI agents, comparing it to the last-mile problem in self-driving cars.

  • Open-Source vs. Closed-Source Models: @bindureddy noted that most open-source fine-tunes deteriorate overall performance while improving on narrow dimensions, emphasizing the achievement of Dracarys in improving overall performance.

  • AI Regulation: @jackclarkSF shared a letter to Governor Newsom about SB 1047, discussing the costs and benefits of the proposed AI regulation bill.

  • AI Hardware: Discussion on the potential of combining resources from multiple devices for home AI workloads, as mentioned by @rohanpaul_ai.
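Socher's last-mile point about agents has a simple arithmetic core: per-step reliability compounds multiplicatively across a workflow, so even high single-step accuracy erodes quickly. A quick illustration:

```python
# Why multi-step agent workflows are hard: success compounds per step.
# At 95% reliability per step, a 20-step workflow succeeds only ~36% of the time.

def workflow_success(step_accuracy, n_steps):
    return step_accuracy ** n_steps

for steps in (1, 5, 10, 20):
    print(f"{steps:>2} steps @ 95%/step -> {workflow_success(0.95, steps):.1%}")
```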


1. AI Model Releases and Benchmarks

  • Jamba 1.5 Jumps Ahead in Long Context: AI21 Labs launched Jamba 1.5 Mini (12B active/52B total) and Jamba 1.5 Large (94B active/398B total), built on the new SSM-Transformer architecture, offering a 256K effective context window and claiming to be 2.5X faster on long contexts than competitors. Jamba 1.5 Large achieved a score of 65.4 on Arena Hard, outperforming models like Llama 3.1 70B and 405B. The models are available for immediate download on Hugging Face and support deployment across major cloud platforms.
  • Grok 2 Grabs Second Place in LMSYS Arena: Grok 2 and its mini variant have been added to the LMSYS leaderboard, with Grok 2 currently ranked #2, surpassing GPT-4o (May) and tying with Gemini in overall performance. The model excels particularly in math and ranks highly across other areas, including hard prompts, coding, and instruction-following, showcasing its broad capabilities in various AI tasks.
  • SmolLM: Tiny But Mighty Language Models: SmolLM, a series of small language models in sizes 135M, 360M, and 1.7B parameters, has been released, trained on the meticulously curated Cosmo-Corpus dataset, which includes Cosmopedia v2 and Python-Edu. The models have shown promising results compared to others in their size categories, potentially offering efficient alternatives for various NLP tasks.
  • LM Studio 0.3.0 Drops Major Updates: LM Studio 0.3.0 introduces a revamped UI with enhanced chat organization, automatic context handling, and multi-model loading capabilities, significantly improving performance for local models. Despite the improvements, users reported bugs in model loading and system prompts, urging others to report issues as they arise.
  • Mistral Nemo 12B Fine-Tuning on 8GB GPU: Mistral Nemo 12B can be fine-tuned on an 8GB GPU, specifically the RTX 4050, making it accessible for testing and prototyping. This wider accessibility opens up possibilities for more engineers to rapidly iterate and test models without needing high-end hardware.
  • Flux: Black Forest Labs' FLUX model, developed by former Stable Diffusion team members, is gaining traction:
      • Low VRAM Flux: New technique allows running Flux on GPUs with as little as 3-4GB of VRAM.
      • GGUF quantization: Successfully applied to Flux, offering significant model compression with minimal quality loss.
      • NF4 Flux v2: Refined version with improved quantization, higher precision, and reduced computational overhead.
      • Union ControlNet: Alpha version released for the FLUX.1 dev model, combining multiple control modes.
      • New Flux LoRAs and checkpoints released, including RPG v6, Flat Color Anime v3.1, Aesthetic LoRA, and Impressionist Landscape.
      • FLUX64: LoRA trained on old game graphics.
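The GGUF and NF4 items above rest on the same idea: store each weight in a few bits and rescale on the fly. A toy uniform 4-bit quantizer (not the actual NF4 scheme, which uses a normal-distribution codebook and per-block scales) shows the round trip:

```python
# Toy 4-bit (16-level) uniform quantization of a weight block.
# Real NF4/GGUF formats use non-uniform codebooks and per-block scales,
# but the memory math is the same: 4 bits per weight instead of 16/32.

def quantize_4bit(weights):
    scale = max(abs(w) for w in weights) / 7  # map onto signed ints -7..7
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.61, -0.32, 0.05, -0.88]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)                        # [5, -3, 0, -7]
print(f"max error {max_err:.3f}")  # small, but nonzero: precision traded for memory
```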


2. AI Development Tools and Frameworks

  • Aider 0.52.0 Adds Shell Power to AI Coding: Aider 0.52.0 introduces shell command execution, allowing users to launch browsers, install dependencies, run tests, and more directly within the tool, enhancing its capabilities for AI-assisted coding. The release also includes improvements like ~ expansion for /read and /drop commands, a new /reset command to clear chat history, and a switch to gpt-4o-2024-08-06 as the default OpenAI model. Notably, Aider autonomously generated 68% of the code for this release.
  • Cursor Raises $60M for AI-Powered Coding: Cursor announced a $60M funding round from investors including Andreessen Horowitz, Jeff Dean, and founders of Stripe and GitHub, positioning itself as the leading AI-powered code editor. The company aims to revolutionize software development with features like instant answers, mechanical refactors, and AI-powered background coders, with the ambitious goal of eventually writing all the world's software.
  • LangChain Levels Up SQL Query Generation: The LangChain Python Documentation outlines strategies to improve SQL query generation using create_sql_query_chain, focusing on how the SQL dialect impacts prompts. It covers formatting schema information into prompts using SQLDatabase.get_context and building few-shot examples to assist the model, aiming to enhance the accuracy and relevance of generated SQL queries.
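The kind of prompt such a chain assembles — dialect up front, schema context, then few-shot examples — can be sketched without the LangChain dependency. All names below are illustrative, not LangChain's API:

```python
# Sketch of the prompt a SQL-generation chain assembles: dialect, schema
# context, and few-shot examples. Pure-Python illustration; LangChain's
# create_sql_query_chain builds a similar prompt internally.

def build_sql_prompt(dialect, schema, examples, question):
    shots = "\n\n".join(
        f"Question: {ex['question']}\nSQL: {ex['sql']}" for ex in examples
    )
    return (
        f"You write {dialect} queries.\n\n"
        f"Schema:\n{schema}\n\n"
        f"{shots}\n\n"
        f"Question: {question}\nSQL:"
    )

prompt = build_sql_prompt(
    dialect="SQLite",
    schema="CREATE TABLE users (id INTEGER, name TEXT, signup_date TEXT);",
    examples=[{"question": "How many users are there?",
               "sql": "SELECT COUNT(*) FROM users;"}],
    question="List the five most recent signups.",
)
print(prompt)
```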


3. AI Research and Technical Advancements

  • Mamba Slithers into Transformer Territory: The Mamba 2.8B model, a transformers-compatible language model, has been released, offering an alternative architecture to traditional transformer models. Users need to install transformers from the main branch until version 4.39.0 is released, along with causal_conv_1d and mamba-ssm for optimized CUDA kernels, potentially offering improved efficiency in certain NLP tasks.
  • AutoToS: Automating the Thought of Search: A new paper titled "AutoToS: Automating Thought of Search" proposes automating the "Thought of Search" (ToS) method for planning with LLMs, achieving 100% accuracy on evaluated domains with minimal feedback iterations. The approach involves defining search spaces with code and guiding LLMs to generate sound and complete search components through feedback from unit tests, potentially advancing the field of AI-driven planning and problem-solving.
  • Multimodal LLM Skips the ASR Middle Man: A researcher shared work on a multimodal LLM that directly understands both text and speech without a separate Automatic Speech Recognition (ASR) stage, built by extending Meta's Llama 3 model with a multimodal projector. This approach allows for faster responses compared to systems that combine separate ASR and LLM components, potentially opening new avenues for more efficient and integrated multimodal AI systems.
  • ERP Prompts (Score: 87, Comments: 20): The post discusses advanced techniques for erotic roleplay (ERP) with AI models, focusing on creating detailed character profiles and enhancing immersion. It provides specific prompts for generating complex characters with unique traits, backstories, and intimate details, as well as techniques like "Inner Monologue" and "Freeze Frame" to deepen the roleplaying experience. The author emphasizes the importance of building anticipation and crafting realistic interactions, encouraging users to provide detailed inputs to elicit more engaging responses from AI models. Users discussed formatting techniques for inner monologue, with suggestions including wrapping the monologue in brackets or using HTML comments in SillyTavern; these methods allow characters to have hidden thoughts that influence future token generations. Interest was expressed in the author's creative writing setup for non-erotic content, with requests for a detailed post on the topic. Users also inquired about recommended AI models for erotic roleplay, with one mentioning Midnight Miqu 1.5 70B. Several comments praised the author's writing style and creativity, with one user stating they'd "rather get it on with you than any well-prompted, well-stacked, well-misbehaved LLM." Users also requested additional prompts and techniques for their own AI-assisted writing endeavors.
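The "Thought of Search" idea behind AutoToS is to define the search space with two code components — a successor function and a goal test — and let an ordinary search algorithm do the planning. A hand-written example of such components on a toy water-jug puzzle (in AutoToS an LLM would generate them and unit-test feedback would refine them):

```python
from collections import deque

# Thought-of-Search style planning: the search space is defined by code
# (successor function + goal test), here hand-written for a toy water-jug
# puzzle with jug capacities 4 and 3.

def successors(state):
    a, b = state
    moves = [(4, b), (a, 3), (0, b), (a, 0)]   # fill or empty either jug
    pour_ab = min(a, 3 - b)
    moves.append((a - pour_ab, b + pour_ab))   # pour first jug into second
    pour_ba = min(b, 4 - a)
    moves.append((a + pour_ba, b - pour_ba))   # pour second jug into first
    return moves

def is_goal(state):
    return state[0] == 2  # measure exactly 2 units in the first jug

def bfs(start):
    """Plain BFS driven entirely by the two components above."""
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if is_goal(path[-1]):
            return path
        for nxt in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None

plan = bfs((0, 0))
print(plan)  # a shortest sequence of states ending with 2 units in the first jug
```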


4. AI Industry News and Events

  • Autogen Lead Departs Microsoft for New Venture: The lead of the Autogen project left Microsoft in May 2024 to found autogen-ai, a new open-source company that is currently raising funds. This move signals potential new developments in the Autogen ecosystem and highlights the dynamic nature of AI talent movement in the industry.
  • NVIDIA AI Summit India Announced: The NVIDIA AI Summit India is set for October 23-25, 2024, at Jio World Convention Centre in Mumbai, featuring a fireside chat with Jensen Huang and over 50 sessions on AI, robotics, and more. The event aims to connect NVIDIA with industry leaders and partners, showcasing transformative work in generative AI, large language models, industrial digitalization, supercomputing, and robotics.
  • California's AI Regulation Spree: California is set to vote on 20+ AI regulation bills this week, covering various aspects of AI deployment and innovation in the state. These bills could significantly reshape the regulatory landscape for AI companies and researchers operating in California, potentially setting precedents for other states and countries.


5. AI Safety and Ethics Discussions

  • AI Burnout Sparks Industry Concern: Discussions in the AI community have raised alarms about the potential for AI burnout, particularly in intense frontier labs, with concerns that the relentless pursuit of progress could lead to unsustainable work practices. Members likened AI powerusers to a "spellcasting class", suggesting that increased AI model power could intensify demands on these users, potentially exacerbating burnout issues in the field.
  • AI Capabilities and Risks Demo-Jam Hackathon: An AI Capabilities and Risks Demo-Jam Hackathon launched with a $2000 prize pool, encouraging participants to create demos that bridge the gap between AI research and public understanding of AI safety challenges. The event aims to showcase potential AI-driven societal changes and convey AI safety challenges in compelling ways, with top projects offered the chance to join Apart Labs for further research opportunities.
  • Twitter's AI Discourse Intensity Questioned: A recent tweet by Greg Brockman showing 97 hours of coding work in a week sparked discussions about the intensity of AI discourse on Twitter and its potential disconnect from reality. Community members expressed unease with the high-pressure narratives often shared on social media platforms, questioning whether such intensity is sustainable or beneficial for the AI field's long-term health.
  • AI Engineer Meetup in London: The first AI Engineer London Meetup is scheduled for September 12th, featuring speakers like @maximelabonne and Chris Bull. Participants are encouraged to register to connect with fellow AI Engineers.
  • Infinite Generative Youtube Development: A team is seeking developers for their Infinite Generative Youtube platform, gearing up for a closed beta launch. They are looking for passionate developers to join this innovative project.

OpenAccess AI Collective

  • Mistral Fine-Tuning is Crack: A member remarked that Mistral's large fine-tuning is 'crack', indicating exceptional performance but providing no further details.
  • Jamba 1.5: Faster Inference and Long-Context Capabilities: AI21's Jamba 1.5 models offer up to 2.5X faster inference than similar models and enhanced long-context capabilities, aiming at business applications with features like function calling and structured output. These models are released under the Jamba Open Model License.
  • Phi 3.5 Mini: Exploding Gradients: A user reported experiencing exploding gradients with the microsoft/Phi-3.5-mini-instruct model, persisting even after lowering the learning rate to 1e-15. Attempts to fix it included switching optimizers to paged_adamw_8bit.
  • Flash Attention Performance Troubles: A user encountered errors while trying to use Flash Attention for faster training but resolved the issue by switching to eager attention. This indicates that Flash Attention may not be fully compatible with the model.
  • Accelerate Adds fp8 Support: Accelerate has added support for fp8, indicating potential integration with Axolotl, although integration points remain uncertain. Discussion revolved around exploring how to effectively incorporate this new support.
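For exploding gradients like the Phi 3.5 Mini report above, the usual first remedy alongside lowering the learning rate or switching optimizers is global-norm gradient clipping. A pure-Python sketch of what torch.nn.utils.clip_grad_norm_ does, with a flat list standing in for the gradient tensors:

```python
import math

# Global-norm gradient clipping: if the gradient's overall norm exceeds a
# cap, rescale every component by the same factor so the update direction
# is preserved but its magnitude is bounded.

def clip_by_global_norm(grads, max_norm):
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]

grads = [300.0, -400.0]  # global norm 500: "exploding"
clipped = clip_by_global_norm(grads, max_norm=1.0)
print(clipped)  # ~= [0.6, -0.8]: direction preserved, norm rescaled to 1.0
```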


Meanwhile in AI Engineer land, the Gorilla team updated the Berkeley Function Calling Leaderboard (now commonly known as BFCL) to BFCL V2 • Live, adding 2251 "live, user-contributed function documentation and queries, avoiding the drawbacks of dataset contamination and biased benchmarks." They also note that multiple functions > parallel functions:

a very high demand for the feature of having to intelligently choose between functions (multiple functions) and lower demand for making parallel function calls in a single turn (parallel functions)

The dataset weights were adjusted accordingly:

Depth and breadth of function calling is also an important hyperparameter - the dataset now includes rare function documentations that contain 10+ function options or a complex function with 10+ nested parameters.

GPT-4 dominates the new leaderboard, but the open source Functionary Llama 3-70B finetune from Kai notably beats Claude.
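The high-demand "multiple functions" setting means picking the single right tool from several candidates rather than calling many in parallel. A toy scorer isolates that selection step, with keyword overlap standing in for the model's judgment (real BFCL entries carry full JSON-schema function documentation, and these function names are made up for illustration):

```python
# Toy "multiple functions" selection: choose the one best tool for a query.
# Keyword overlap stands in for the LLM's judgment; the function names and
# keyword docs below are invented for illustration.

FUNCTIONS = {
    "get_weather": "current weather temperature forecast city",
    "get_stock_price": "stock share price ticker market",
    "convert_currency": "convert currency exchange rate usd eur",
}

def choose_function(query):
    words = set(query.lower().split())
    # Pick the function whose keyword doc overlaps the query the most.
    return max(FUNCTIONS, key=lambda name: len(words & set(FUNCTIONS[name].split())))

print(choose_function("what is the weather forecast in Paris"))  # get_weather
print(choose_function("convert 100 usd to eur"))                 # convert_currency
```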

AI LLM Model Leaderboard


