The AI Canvas Newsletter #14
Applied Data Science Partners
Predict, optimise and automate your business with our end-to-end data science solutions driving measurable value.
The AI Canvas: Your weekly palette of inspiration, insights, and innovation in the world of AI.
Written by Oli Wilkins.
Google's Gemini 1.5 and the Leap in AI Contextual Understanding
Google's latest AI model, Gemini 1.5, delivers a substantial performance improvement and introduces a long-context window that can process up to 1 million tokens. This lets the model analyse and reason over large amounts of information across data types, from extensive codebases to lengthy videos. The model also uses a Mixture-of-Experts (MoE) architecture, which makes it more efficient to train and serve. It is currently available in limited preview to developers and enterprise customers.
Find out more on the announcement page.
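Google has not published Gemini 1.5's internals beyond describing it as a Mixture-of-Experts model, so the plain-Python sketch below shows only the general idea. A gate scores every expert, only the top-k experts run on a given input, and their outputs are mixed using the gate's renormalised probabilities. This sparse activation is what lets MoE models grow their total parameter count without a proportional increase in compute per input. The function names, the linear gate and the toy experts are all hypothetical illustrations, not Gemini's design.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    experts: Sequence[Expert],
    gate_weights: Sequence[Vector],
    k: int = 2,
) -> Tuple[Vector, List[int]]:
    """Route x to the top-k experts picked by a linear gate and mix their outputs.

    Hypothetical toy layer: each expert maps a vector to a vector of the same size.
    """
    # One gating score per expert: dot product of x with that expert's gate vector.
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    # Sparse activation: keep only the k highest-scoring experts.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalise gate probabilities over the chosen experts only.
    probs = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for p, i in zip(probs, chosen):
        y = experts[i](x)  # unchosen experts are never evaluated
        out = [o + p * y_j for o, y_j in zip(out, y)]
    return out, chosen


# Toy usage with three hypothetical experts on a 2-dimensional input.
experts: List[Expert] = [
    lambda v: [2.0 * a for a in v],  # expert 0 doubles its input
    lambda v: [a + 1.0 for a in v],  # expert 1 adds one
    lambda v: [0.0 for _ in v],      # expert 2 outputs zeros
]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
out, chosen = moe_forward([1.0, 2.0], experts, gate_weights, k=2)
print(chosen)  # [1, 0]: expert 2 scores lowest and is skipped
print(out)
```

In production MoE layers the experts are typically feed-forward networks and the gate is trained jointly with them, often with an auxiliary loss that balances load across experts. The toy lambdas here simply make the routing easy to follow.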
Sora: The AI That Crafts Videos from Text Descriptions
Sora is an AI model from OpenAI that creates videos from text prompts, producing scenes that range from realistic cityscapes to imaginative animations. The model is being tested by visual artists and red teamers. It can generate videos up to a minute long and focuses on following the details of the user's instructions closely. Despite its capabilities, Sora still struggles with aspects of physical simulation and temporal consistency, which are being refined.
Check out OpenAI’s announcement here.
Stable Diffusion 3: Text-to-Image AI Evolves
Stable Diffusion 3, Stability AI's latest text-to-image model, improves performance on multi-subject prompts, image quality and the spelling of text within images. The model is currently in early preview, with a waitlist open for sign-ups. It comes in sizes ranging from 800M to 8B parameters, offering users a choice between scalability and quality.
Find out more here.
Mistral Large: A Competitor to GPT-4 with Multilingual Prowess
Mistral AI has introduced Mistral Large, a language model positioned as a close rival to GPT-4. It offers advanced reasoning and multilingual support for English, French, Spanish, German, and Italian. Mistral Large is available on La Plateforme and Azure and gives developers features such as JSON output formatting and function calling. Mistral AI has also released Mistral Small, a more efficient model for latency-sensitive tasks.
Read more here.
Gemma: Google's New Open Models
Google has unveiled Gemma, a family of open models designed to help developers and researchers build AI responsibly. The release includes three components:
- lightweight Gemma 2B and 7B models
- a Responsible Generative AI Toolkit for building safer applications
- support across major AI frameworks and hardware platforms
Gemma models are optimised for performance and safety, and their terms of use permit responsible commercial usage.
Read more here.
BASE TTS: Amazon's Pioneering Text-to-Speech Model
Amazon has developed BASE TTS, a text-to-speech (TTS) model that produces strikingly natural-sounding speech. The system is trained on around 100,000 hours of public-domain speech data, which lets it handle complex sentences with ease. The result is computer-generated voices that are more engaging and easier to understand.
Have a listen to the samples here.
Genie: Crafting Interactive Worlds from Images
Genie, from Google DeepMind, is a foundation world model that can create interactive, playable environments from image prompts such as photographs and sketches. It is trained on internet videos without action labels. From these, Genie learns to identify controllable elements and infer consistent latent actions, paving the way for endless virtual world generation and the development of generalist AI agents.
Have a read here.
Meta’s V-JEPA: Enhancing Machine Perception with Self-Supervised Video Analysis
Meta has released the Video Joint Embedding Predictive Architecture (V-JEPA), a significant advance in machine intelligence. The model uses self-supervised learning to interpret complex interactions within videos. Released under a non-commercial licence, it gives researchers a new tool for improving AI's grasp of the physical world, promising more efficient learning and adaptability across a variety of tasks without the need for extensive labelled data.
Have a read here.
Technical Reads
Thinking about High-Quality Human Data – Lilian Weng
“High-quality data is the fuel for modern data deep learning model training. Most of the task-specific labeled data comes from human annotation, such as classification task or RLHF labeling (which can be constructed as classification format) for LLM alignment training. Lots of ML techniques in the post can help with data quality, but fundamentally human data collection involves attention to details and careful execution.”
“The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions and limitations.”
“It is increasingly viable to use synthetic data for pretraining, instruction-tuning, and preference-tuning. Synthetic data refers to data generated via a model or simulated environment, instead of naturally occurring on the internet or annotated by humans.”
Neural network training makes beautiful fractals - Jascha Sohl-Dickstein
“My five year old daughter came home from kindergarten a few months ago, and told my partner and I that math was stupid (!). We have since been working (so far successfully) to make her more excited about all things math, and more proud of her math accomplishments. One success we've had is that she is now very interested in fractals in general, and in particular enjoys watching deep zoom videos into Mandelbrot and Mandelbulb fractal sets, and eating romanesco broccoli. My daughter's interest has made me think a lot about fractals, and about the ways in which fractals relate to a passion of mine, which is artificial neural networks.”
“Exploring the crucial step of identifying specific needs before selecting AI tools, ensuring technology serves as a solution, not just innovation.”
Projects and Code
“Tech demo gives anyone with an RTX GPU the power of a personalized GPT chatbot.”
“Temporian is an open-source Python library for preprocessing and feature engineering temporal data for machine learning applications”
“Open-source observability for your LLM application, based on OpenTelemetry.”
“Detect file content types with deep learning”
Learning
“In this lecture we build from scratch the Tokenizer used in the GPT series from OpenAI. In the process, we will see that a lot of weird behaviors and problems of LLMs actually trace back to tokenization. We'll go through a number of these issues, discuss why tokenization is at fault, and why someone out there ideally finds a way to delete this stage entirely.”
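The tokenizers in the GPT series are based on byte-pair encoding (BPE). The sketch below is plain Python written in the spirit of the lecture, not the lecture's actual code. It starts from UTF-8 bytes (ids 0 to 255), repeatedly replaces the most frequent adjacent pair with a new id, and then replays the learned merges in order when encoding. The function names are hypothetical. Real GPT tokenizers also pre-split text with a regex pattern and handle special tokens, and this sketch omits both.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def pair_counts(ids: List[int]) -> Counter:
    # Count how often each adjacent pair of token ids occurs.
    return Counter(zip(ids, ids[1:]))


def merge(ids: List[int], pair: Pair, new_id: int) -> List[int]:
    # Replace every non-overlapping occurrence of `pair`, scanning left to right.
    out: List[int] = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int) -> Dict[Pair, int]:
    # Start from raw UTF-8 bytes (ids 0-255); each merge mints a new id from 256 up.
    ids = list(text.encode("utf-8"))
    merges: Dict[Pair, int] = {}
    for step in range(num_merges):
        counts = pair_counts(ids)
        if not counts:
            break  # nothing left to merge
        best = max(counts, key=counts.get)  # most frequent pair (first seen wins ties)
        new_id = 256 + step
        ids = merge(ids, best, new_id)
        merges[best] = new_id
    return merges


def encode(text: str, merges: Dict[Pair, int]) -> List[int]:
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        # Apply the earliest-learned merge that is present; stop when none applies.
        pair = min(pair_counts(ids), key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break
        ids = merge(ids, pair, merges[pair])
    return ids


def decode(ids: List[int], merges: Dict[Pair, int]) -> str:
    # Rebuild each token's bytes; dicts keep insertion order, so parts exist first.
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), idx in merges.items():
        vocab[idx] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


merges = train_bpe("aaabdaaabac", num_merges=3)
tokens = encode("aaabdaaabac", merges)
print(tokens)  # [258, 100, 258, 97, 99]
print(decode(tokens, merges))  # aaabdaaabac
```

On the classic example string `aaabdaaabac`, three merges compress 11 bytes into 5 tokens, and `decode` recovers the original text exactly. Quirks such as odd handling of rare words or arithmetic tend to come from exactly this kind of frequency-driven merging.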
Business and Trends
Quick Links
Don't miss your weekly dose of cutting-edge AI innovations with The AI Canvas newsletter!
Subscribe now to ensure you never miss out on these transformative insights.
Looking for more specialised consultancy? At ADSP, we’re a team of data experts who build AI products with purpose.
We deliver data science projects for companies who want to harness the power that AI can bring to their organisation. Get in touch at [email protected].
Stay tuned with The AI Canvas podcast for in-depth episodes exploring Generative AI's transformative role across various industries.