Voxel51's Filtered Views Newsletter — April 26, 2024
Welcome to Voxel51’s bi-weekly digest of the latest trending AI, machine learning and computer vision news, events and resources! Subscribe to the email version.
The Industry Pulse
AI Wings of Fury: AI Agent-Powered Fighter Jet Takes to the Skies
Ok, that was kind of a clickbaity headline, but it’s pretty much what happened!
The US Air Force Test Pilot School and the Defense Advanced Research Projects Agency (DARPA) have successfully installed AI agents in the X-62A VISTA aircraft as part of the Air Combat Evolution (ACE) program.
The teams made over 100,000 lines of flight-critical software changes across 21 test flights, culminating in the first-ever AI-vs-human within-visual-range dogfights. The breakthrough demonstrates that AI can be used safely in aerospace applications, paving the way for future advances. The X-62A VISTA will continue to serve as a research platform for advancing autonomous AI systems in aerospace.
Looks like AI has finally passed with ‘flying’ colors in aerospace! I highly recommend checking out this YouTube video that DARPA put out to learn more.
Phi-3: where less is more, and ‘mini’ means maximum impact!
Microsoft has overshadowed the Llama-3 launch with their latest line of small language models (SLMs) – Phi-3!
The Phi-3 family includes three models: phi-3-mini with 3.8 billion parameters, phi-3-small with 7 billion, and phi-3-medium with 14 billion. The phi-3-small and phi-3-medium models compete with or outperform GPT-3.5 across all benchmarks, including the multi-turn MT-Bench (and by a decent margin). They’re not so strong on TriviaQA, owing to their limited capacity to store “factual knowledge,” but honestly, that’s not a particularly interesting benchmark anyway.
What is interesting, though, is how they curated their dataset. They built training data around simple, easy-to-understand words, the kind a 4-year-old could follow, and created synthetic datasets called “TinyStories” and “CodeTextbook” using high-quality data generated by larger language models. This supposedly makes the models less likely to give wrong or inappropriate answers.
Microsoft’s Phi-3 SLMs are proof that sometimes, smaller is smarter.
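If you want to kick the tires on phi-3-mini yourself, here’s a minimal sketch using Hugging Face transformers. It assumes the microsoft/Phi-3-mini-4k-instruct checkpoint and the loading flags that applied at launch; check the model card for current details.

```python
# Minimal sketch: chatting with phi-3-mini via Hugging Face transformers.
# Assumes the "microsoft/Phi-3-mini-4k-instruct" checkpoint; see the model
# card for the latest recommended loading options.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # 3.8B params fits on a single modern GPU
    device_map="auto",
    trust_remote_code=True,       # the repo shipped custom modeling code at launch
)

messages = [{"role": "user", "content": "Explain RAG in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```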
Another one from Microsoft: VASA-1!
VASA-1 is a Microsoft Research project that generates realistic talking faces in real-time based on audio input.
The system generates facial movements and expressions that sync precisely with the audio input, and it does so in real time, crafting animations on the fly as the audio is spoken. The results are uncannily lifelike, down to intricate skin texture, facial features, and nuanced expressions.
Seriously, this thing is a trip. Go check out the website. None of the images are of real people, but the lip-audio synchronization, expressive facial movements, and natural head motions fooled me.
Use of generative AI tools
While generative AI has generated a lot of hype and excitement, Gartner’s recent survey of over 500 business leaders found that only 24% are currently using or piloting generative AI tools. The majority, 59%, are still exploring or evaluating the technology. The top use cases are software/application development, creative/design work, and research and development. Barriers to adoption include governance policies, ROI uncertainty, and a lack of skills. Gartner predicts 30% of organizations will use generative AI by 2025.
GitHub Gems
COCONut is a modernized large-scale segmentation dataset that improves upon COCO in annotation quality, consistency across segmentation tasks, and scale, and it introduces a challenging new validation set.
Dataset Highlights
Construction
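If you want to poke around COCO-style data like this yourself, here’s a minimal sketch of loading COCO-format segmentation annotations into FiftyOne for browsing. The paths are placeholders, and depending on which COCONut annotation files you grab, you may need to convert panoptic-format files to COCO’s instance JSON first.

```python
# Minimal sketch: browsing COCO-format segmentation annotations in FiftyOne.
# Paths are placeholders for wherever you downloaded the images/annotations.
import fiftyone as fo

dataset = fo.Dataset.from_dir(
    dataset_type=fo.types.COCODetectionDataset,
    data_path="/path/to/coconut/images",
    labels_path="/path/to/coconut/annotations.json",
    label_types=["segmentations"],  # import masks, not just bounding boxes
    name="coconut-sample",
)

# Launch the FiftyOne App to inspect masks sample by sample
session = fo.launch_app(dataset)
session.wait()
```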
Good Reads
This week’s good read is Nathan Lambert’s slides from his guest lecture session for Stanford CS25, titled Aligning Open Language Models.
Honestly, I wouldn’t normally recommend slides as a good read (because that’s weird)…but this is an exception for three reasons:
In the slides, he showcases the rapid progress in aligning open language models, driven by innovative techniques, community efforts, and the availability of open-source resources. He then briefly traces the evolution of LMs from Claude Shannon’s early work in 1948 to the emergence of transformers in 2017 and the subsequent release of influential models like GPT-1, BERT, and GPT-3. He observes that GPT-3’s rise in 2020, with its surprising capabilities and potential harms, highlighted the need for aligning LMs with human values and intentions.
Nathan also provides a brief history of the following:
Good Listens
The latest episode of the Practical AI podcast discusses AI’s rapid advancement and the shift toward multimodal AI models in 2024. Multimodal AI has developed through advances in model architectures, training datasets, and key research insights. Unified models can now translate between data types: generating images from text, answering questions about images, and translating between languages. Larger and more capable models are expected to emerge in the coming years.
The hosts dive into the history and key developments that enabled today’s multimodal AI systems:
The hosts also discuss Udio, a new AI tool that generates complete songs from text prompts, including music, lyrics and vocals. This raises questions about AI and creativity:
The ability of AI to operate across multiple modalities is rapidly advancing and will likely continue to accelerate in the coming years. Those who align themselves with these new creative tools and capabilities may be best positioned in this fast-changing landscape.
Good Research
Evaluating retrieval-augmented generation (RAG) systems, which combine information retrieval and language generation, has been challenging due to the reliance on extensive human annotations. A recent research paper introduces ARES, a novel framework that addresses this issue with an automated, data-efficient, and robust evaluation approach.
We’ll summarize the paper using the PACES method (Problem, Approach, Claim, Evaluation, Substantiation).
Problem
Traditional methods for evaluating the quality of a RAG system’s generated responses rely heavily on expensive and time-consuming human annotations, which can introduce subjectivity and inconsistency into the evaluation process. To address this, the authors propose ARES, an automated evaluation framework that leverages synthetic data generation and machine learning techniques to provide reliable, data-efficient assessments of RAG system performance.
Approach
The ARES approach has four components:
Claim
The paper’s main claim is that ARES provides an effective and efficient framework for evaluating RAG systems without relying heavily on human annotations.
The authors argue that ARES can accurately assess the performance of RAG systems in terms of context relevance, answer faithfulness, and answer relevance, while significantly reducing the need for time-consuming and expensive human evaluations.
They claim that ARES has the potential to become a standard evaluation framework for RAG systems, enabling researchers and practitioners to assess and compare the performance of different RAG architectures more effectively.
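To make the judge-based idea concrete, here’s a toy sketch of scoring a system’s outputs along one of those dimensions. This is our own illustration, not the paper’s code: ARES fine-tunes lightweight LM judges on synthetic data, which we replace here with a trivial stub.

```python
# Toy illustration of judge-based RAG evaluation (not the ARES implementation).
# ARES fine-tunes lightweight LM judges; a trivial stub stands in for one here.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RAGOutput:
    query: str
    context: str  # retrieved passage
    answer: str   # generated response

# A judge maps one (query, context, answer) triple to a 0/1 decision for a
# single dimension: context relevance, answer faithfulness, or answer relevance.
Judge = Callable[[RAGOutput], int]

def stub_context_relevance_judge(out: RAGOutput) -> int:
    # Hypothetical stand-in: call the context "relevant" if it shares any word
    # with the query. A real judge would be a trained classifier.
    query_words = set(out.query.lower().split())
    context_words = set(out.context.lower().split())
    return int(bool(query_words & context_words))

def score_system(outputs: List[RAGOutput], judge: Judge) -> float:
    """Fraction of a system's outputs the judge marks positive on one dimension."""
    return sum(judge(o) for o in outputs) / len(outputs)

outputs = [
    RAGOutput("capital of France", "Paris is the capital of France.", "Paris"),
    RAGOutput("speed of light", "Bananas are yellow.", "About 3e8 m/s."),
]
print(score_system(outputs, stub_context_relevance_judge))  # 0.5
```

Per-dimension scores like these, aggregated per system and calibrated against a small set of human labels, are what allow ARES to rank systems without large-scale annotation.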
Evaluation
ARES is evaluated on various datasets and shown to rank RAG systems accurately with only limited human annotation. That said, testing ARES on a broader range of datasets and RAG system architectures would be valuable.
Substantiation
The evaluation results substantiate the paper’s main claim that ARES is an effective and efficient framework for evaluating RAG systems. The high correlation between ARES rankings and human judgments across different datasets and evaluation dimensions supports the claim that ARES can provide reliable assessments of RAG system performance while requiring significantly fewer human annotations than traditional approaches.
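To picture what “high correlation between rankings” means in practice, here’s a small, self-contained illustration (with made-up scores, not numbers from the paper) using Kendall’s tau, a standard rank-correlation statistic:

```python
# Illustrative only: measuring rank agreement between an automated evaluator
# and human judges with Kendall's tau. Scores below are invented for the demo.
from scipy.stats import kendalltau

# Hypothetical quality scores for five RAG systems from each source
automated_scores = [0.81, 0.64, 0.77, 0.52, 0.90]
human_scores     = [0.78, 0.60, 0.80, 0.50, 0.88]

tau, p_value = kendalltau(automated_scores, human_scores)
print(f"Kendall's tau = {tau:.2f} (p = {p_value:.3f})")
# tau near 1.0 means both sources rank the systems nearly identically
```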
The ablation studies and robustness experiments further strengthen the validity of the proposed framework.
Upcoming Events
Check out these upcoming AI, machine learning and computer vision events! View the full calendar and register for an event.