Towards AI #103: Apple integrates GenAI
Towards AI
Making AI accessible to all with our courses, blogs, tutorials, books & community.
Also, Qwen2, “Kling” text-to-video, Buffer of Thoughts, and more!
What happened this week in AI by Louie
While the week started with some impressive new model releases from China (the open Qwen2 LLM and the Kling text-to-video model), anticipation was always building towards Apple’s WWDC keynote and AI announcements. As with any GenAI production use case, Apple had to decide which features to build into its products first and how to implement them, weighing many tradeoffs. These include each feature’s user benefits relative to its risks (such as hallucinations and reputational damage from viral failure cases). Do we prioritize capability or latency and cost (which can involve choosing between on-device and cloud models of various sizes)? Do we use open-source models, in-house models, or external closed models? How do we balance user privacy and data security against ease of use and the potential collection of future training and human feedback data? Towards AI can help with these questions, by the way, with our customized Generative AI courses and consultancy!
Apple has chosen to start with three tiers of intelligence for different features: 1) a small on-device, in-house LLM with roughly 3B parameters, 2) a larger, server-based in-house LLM (which appears to perform a little above GPT-3.5 level) running inference on Apple silicon with many new privacy and security features, and 3) a ChatGPT integration with Siri for access to the capabilities of GPT-4o. Many of Apple’s first features are geared around smarter search (including semantic understanding of media), prioritization of alerts and emails, transcription, summarization, writing, and image tools. There are also hints of more agentic capabilities, with Siri able to take actions in and across apps.
Why should you care?
With Apple’s 1 billion users and often trend-setting products, we think Apple’s AI choices are important for the direction of the whole industry. While the integration of ChatGPT into Siri seems like a big win for OpenAI, we do not think the relationship is exclusive. Apple stated it would later integrate Google’s Gemini model as well, and we think its “App Intents API” and ability to connect third-party apps to Siri will likely lead to an open playground of third-party LLMs and products being integrated to various degrees. At the same time, data security and privacy for the often highly personal data stored on your iPhones and Macs are much easier to manage with on-device models or Apple’s private cloud models (though we still expect skepticism about how safe your data is there), so we expect pressure towards vertical integration for many features and capabilities. In any case, we think Apple’s late entry into Generative AI and its long-overdue revamp of Siri will create many opportunities for AI and LLM developers going forward!
Louie Peters, Towards AI Co-founder and CEO
Hottest News
The Qwen2 series is an advancement over Qwen1.5, introducing five enhanced models with new features such as support for 27 additional languages and improved coding and mathematics capabilities. The standout model, Qwen2-72B, offers strong safety performance and can handle contexts of up to 128K tokens. The models are available on Hugging Face and ModelScope.
Mistral introduced mistral-finetune, a codebase for developers who want to fine-tune Mistral’s open-source models on their own infrastructure. It is built on the LoRA training paradigm. Mistral also launched serverless fine-tuning on la Plateforme, which users can try by registering.
OpenAI is reinstating its robotics division, focusing on AI models for robotic applications in collaboration with external robotics companies. This marks a strategic pivot from building hardware in-house to powering humanoid robots through partnerships, as evidenced by investments in companies like Figure AI.
A group of current and former employees from prominent artificial intelligence companies, including OpenAI and Google DeepMind, have issued an open letter calling for increased transparency and protections for whistleblowers within the AI industry. The letter, which calls for a “right to warn about artificial intelligence,” is one of the most public statements about the dangers of AI.
Chinese short-video app Kuaishou has launched a text-to-video service similar to OpenAI’s Sora. The Kling AI model, currently in a trial stage, can turn text into video clips up to 2 minutes long at 1080p resolution and supports various aspect ratios.
Five 5-minute reads/videos to keep you learning
The article introduces a competition that challenges participants to merge multiple fine-tuned LLMs to improve their performance and adaptability to novel tasks. Competitors will use pre-trained expert models with up to 8 billion parameters from the Hugging Face Model Hub, available under research-friendly licenses. By reusing existing models, the competition aims to reduce the cost and difficulty of training LLMs from scratch.
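The competition does not prescribe a merging method, but a minimal sketch of one common baseline, task arithmetic, shows what combining fine-tuned models can mean in practice. It assumes all experts were fine-tuned from the same base model and that weights are held as NumPy arrays in a dict; the function name `task_arithmetic_merge` and the toy weights are hypothetical.

```python
import numpy as np


def task_arithmetic_merge(base, finetuned, scale):
    """Merge fine-tuned checkpoints into a shared base model via task arithmetic.

    base: dict mapping parameter name -> np.ndarray (pre-trained weights)
    finetuned: list of dicts with the same keys and shapes as `base`
    scale: weight applied to the summed task vectors
    """
    merged = {}
    for name, base_w in base.items():
        # A "task vector" is the difference between a fine-tuned model and the base.
        task_sum = sum(ft[name] - base_w for ft in finetuned)
        merged[name] = base_w + scale * task_sum
    return merged


# Toy example: two hypothetical "experts" fine-tuned from the same base.
base = {"w": np.array([1.0, 1.0])}
experts = [{"w": np.array([2.0, 1.0])}, {"w": np.array([1.0, 3.0])}]
merged = task_arithmetic_merge(base, experts, scale=1 / len(experts))
print(merged["w"])  # → [1.5 2. ]
```

With `scale = 1/n` this reduces to simple weight averaging of the experts; larger scales push the merged model further along each expert's task direction.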
We know AI models hallucinate, but Michael Townsend Hicks, James Humphries, and Joe Slater of the University of Glasgow argue that these inaccuracies are better understood as “bullshit.” This article explains their reasoning.
It is essential to train AI models to have good character traits and to keep these traits as they become larger and more capable. This article from Anthropic explains how it crafted the personality of its Claude model, using “character training” to instill traits such as curiosity, thoughtfulness, and openness to diverse viewpoints.
Researchers used sparse autoencoders to decompose GPT-4’s internal representations into 16 million human-interpretable features, making the model’s computations easier to understand. In this post, OpenAI explains the approach; it has also shared a paper detailing its experiments and methods.
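To illustrate what a sparse autoencoder does, here is a minimal NumPy sketch of a TopK sparse autoencoder forward pass: it encodes an activation vector into a much larger latent space, keeps only the k largest latents, and reconstructs the input from them. The toy sizes, random weights, and helper name are assumptions for illustration; this is not OpenAI’s released code, and a real SAE would be trained to minimize the reconstruction error on activations collected from the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed); the GPT-4 SAE described in the post had ~16M latents.
d_model, n_latents, k = 16, 64, 4

W_enc = rng.normal(scale=0.1, size=(d_model, n_latents))
W_dec = W_enc.T.copy()  # initialize the decoder as the encoder's transpose
b_pre = np.zeros(d_model)  # bias subtracted before encoding, added back after decoding


def topk_sae_forward(x):
    """Encode an activation vector into at most k active latents, then reconstruct it."""
    pre = (x - b_pre) @ W_enc  # latent pre-activations, shape (n_latents,)
    z = np.zeros_like(pre)
    top = np.argpartition(pre, -k)[-k:]  # indices of the k largest pre-activations
    z[top] = np.maximum(pre[top], 0.0)  # keep only those (ReLU'd); all others stay 0
    x_hat = z @ W_dec + b_pre  # reconstruction, shape (d_model,)
    return z, x_hat


x = rng.normal(size=d_model)
z, x_hat = topk_sae_forward(x)
mse = np.mean((x - x_hat) ** 2)  # training would minimize this reconstruction error
```

Each nonzero entry of `z` is a candidate “feature”; interpretability work then inspects which inputs most strongly activate each latent.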
This article introduces RapidIn, a framework designed to efficiently estimate the influence of training data on large language models (LLMs) by compressing gradient vectors into low-dimensional representations called RapidGrads. RapidIn addresses the scalability and computational-efficiency challenges of handling massive datasets.
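The core idea of compressing gradients so that influence can be estimated cheaply can be sketched with a plain Gaussian random projection followed by cosine-similarity scoring between a test gradient and each training gradient. This is an assumption-laden stand-in: RapidIn’s own RapidGrad compression differs in detail, and the helper names and synthetic gradients below are hypothetical.

```python
import numpy as np


def make_projection(grad_dim, sketch_dim, seed=0):
    """Hypothetical helper: a fixed Gaussian random projection matrix."""
    rng = np.random.default_rng(seed)
    # Scaling by 1/sqrt(sketch_dim) approximately preserves inner products (JL lemma).
    return rng.normal(size=(grad_dim, sketch_dim)) / np.sqrt(sketch_dim)


def compress(grads, proj):
    """Map per-example gradients of shape (n, grad_dim) to sketches of shape (n, sketch_dim)."""
    return grads @ proj


def influence_scores(train_sketches, test_sketch):
    """Cosine similarity between one test-gradient sketch and every training sketch."""
    train_unit = train_sketches / np.linalg.norm(train_sketches, axis=1, keepdims=True)
    test_unit = test_sketch / np.linalg.norm(test_sketch)
    return train_unit @ test_unit


rng = np.random.default_rng(1)
train_grads = rng.normal(size=(100, 5_000))  # synthetic stand-ins for per-example gradients
test_grad = train_grads[42].copy()  # a test example whose gradient matches training example 42

proj = make_projection(grad_dim=5_000, sketch_dim=128)
train_sketches = compress(train_grads, proj)  # computed once and cached
scores = influence_scores(train_sketches, compress(test_grad[None, :], proj)[0])
most_influential = int(np.argmax(scores))
```

The training sketches only need to be computed once; each new test query then costs a single small matrix-vector product instead of a pass over full-size gradients.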
Repositories & Tools
1. Vectorize, built for RAG, turns unstructured data into optimized vector search indexes.
2. Spreadsheet is all you need: a nanoGPT pipeline packed in a spreadsheet created to understand how GPT works.
3. Replicate allows you to run and fine-tune open-sourced AI models using an API.
4. transformers.js lets you run Hugging Face Transformers models directly in your browser.
5. Build your own X is a compilation of well-written, step-by-step guides for re-creating technologies like AR, bots, and torrent clients from scratch.
Top Papers of The Week
This paper argues that open-endedness, the ability to create new, learnable ‘artifacts’, is the key to achieving artificial superhuman intelligence (ASI). It provides a concrete formal definition of open-endedness through the lens of novelty and learnability. It also examines the safety implications of generally capable open-ended AI.
Seed-TTS encompasses advanced autoregressive and non-autoregressive text-to-speech models capable of generating human-like speech with emotional variability, speaker similarity, and naturalness. It also demonstrates end-to-end speech generation and editing with a fully diffusion-based architecture.
This paper introduces Buffer of Thoughts (BoT), a thought-augmented reasoning approach for improving the accuracy, efficiency, and robustness of large language models. It uses a meta-buffer to store a series of informative high-level thoughts, called thought-templates, then retrieves a relevant thought-template for each problem and adaptively instantiates it with specific reasoning structures.
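The retrieve-then-instantiate loop can be sketched in a few lines of Python. In this minimal sketch, word-overlap cosine similarity stands in for the embedding-based retrieval an actual system would use, and plain string formatting stands in for LLM-driven instantiation; the class and function names and both example templates are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class ThoughtTemplate:
    name: str
    description: str  # the kind of problem this template solves (used for retrieval)
    steps: str  # high-level reasoning outline with {placeholders}


def _bag_of_words(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MetaBuffer:
    """Stores thought-templates and returns the best match for a new problem."""

    def __init__(self):
        self.templates = []

    def add(self, template):
        self.templates.append(template)

    def retrieve(self, problem):
        query = _bag_of_words(problem)
        return max(self.templates, key=lambda t: _cosine(query, _bag_of_words(t.description)))


def instantiate(template, **slots):
    """Fill the template's placeholders with problem-specific details."""
    return template.steps.format(**slots)


buffer = MetaBuffer()
buffer.add(ThoughtTemplate(
    "linear_equation",
    "solve a linear equation for an unknown variable",
    "1. Move terms containing {var} to one side. 2. Divide both sides by the coefficient of {var}.",
))
buffer.add(ThoughtTemplate(
    "word_sorting",
    "sort a list of words alphabetically",
    "1. List the words: {words}. 2. Compare them letter by letter and output them in order.",
))

template = buffer.retrieve("Solve 3x + 5 = 20 for the variable x")
prompt = instantiate(template, var="x")
```

The instantiated outline would then be passed to the LLM alongside the problem, so the model reasons along a reusable high-level structure instead of starting from scratch.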
This paper analyzes the structured relationship between Transformers and state-space models (SSMs) using matrix analysis, introducing a theoretical framework that connects the two. It also presents an improved architecture, Mamba-2, whose core layer is 2-8 times faster than its predecessor’s while remaining competitive with Transformers on language modeling.
This paper proposes MASA, a novel method for robust instance association learning, capable of matching any objects within videos across diverse domains without tracking labels. MASA learns instance-level correspondence through exhaustive data transformations and leverages object segmentation from the Segment Anything Model (SAM).
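The connection between SSMs and attention can be checked numerically for the simplest case: a single-channel SSM whose state transition is a scalar decay a_t times the identity (the restricted structure Mamba-2’s layer builds on). Its linear-time recurrence gives the same output as multiplying the input by a causal, decay-masked matrix L ∘ (C Bᵀ), which has the same shape as masked attention with C playing the role of queries and B of keys. This is a toy sketch with naive loops, not the paper’s efficient chunked algorithm; the function names and symbols are assumptions.

```python
import numpy as np


def ssm_recurrent(a, B, C, x):
    """Linear-time scan: h_t = a_t * h_{t-1} + B_t * x_t,  y_t = C_t . h_t."""
    T, N = B.shape
    h = np.zeros(N)
    y = np.zeros(T)
    for t in range(T):
        h = a[t] * h + B[t] * x[t]
        y[t] = C[t] @ h
    return y


def ssm_as_masked_attention(a, B, C, x):
    """Quadratic form: y = (L * (C B^T)) x, with L[i, j] = a_{j+1} * ... * a_i for j <= i."""
    T = len(a)
    L = np.zeros((T, T))
    for i in range(T):
        for j in range(i + 1):
            L[i, j] = np.prod(a[j + 1 : i + 1])  # empty product (i == j) is 1
    scores = C @ B.T  # analogous to Q K^T, with C ~ queries and B ~ keys
    return (L * scores) @ x  # causal, decay-weighted "attention" applied to x


rng = np.random.default_rng(0)
T, N = 6, 4
a = rng.uniform(0.5, 1.0, size=T)  # per-step scalar decay
B = rng.normal(size=(T, N))
C = rng.normal(size=(T, N))
x = rng.normal(size=T)
assert np.allclose(ssm_recurrent(a, B, C, x), ssm_as_masked_attention(a, B, C, x))
```

Setting every a_t to 1 turns L into a plain lower-triangular mask of ones, i.e. causal linear attention; this duality is what lets the same layer be computed either as a scan or as matrix multiplications.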
Quick Links
1. Stability AI has launched Stable Audio Open, an AI model that generates sound from text descriptions and is geared towards non-commercial use. The model was trained on around 486,000 royalty-free samples from free music libraries such as Freesound and the Free Music Archive.
2. AI-powered search startup Perplexity is facing accusations of plagiarizing content from news outlets like Forbes, CNBC, and Bloomberg through its Perplexity Pages feature. While Perplexity includes small logos linking to the sources, the posts do not mention the publications by name.
3. Hugging Face and Pollen Robotics created an open-source robot. Pollen designed the humanoid robot and partnered with Hugging Face to train it to do various household tasks and safely interact with humans and dogs.