Hi! Here’s your Wednesday, October 2, 2024, edition of AI in the News. I could call it the Product Hunt of AI because it’s packed with product launches! I’ve also added a new section at the end with my curated list of useful tools.
- Tools for proofreading text and transcribing audio are “very handy.”
- Features for generating summaries of web articles or removing objects from a photo “were so hit or miss that they should be ignored.”
- “This is all to say that Apple Intelligence is worth watching over the next few years to see whether it evolves into a must-have product, but that it’s not a compelling reason to splurge on new hardware.”
- Nvidia has launched the NVLM 1.0 family of open-source AI models, including the 72 billion parameter NVLM-D-72B, which competes with proprietary models like GPT-4.
- The NVLM-D-72B model excels in both visual and textual tasks, improving text-only performance by an average of 4.3 points after multimodal training.
- The NVLM project also introduces innovative architectural designs, including a hybrid approach that combines different multimodal processing techniques.
- Axios focuses on the release of OpenAI’s speech-to-speech engine to developers. “The move paves the way for a wave of AI apps that offer conversational voice interfaces.”
- The Financial Times underscores the importance of agents in OpenAI’s strategy. “OpenAI anticipates that AI-powered assistants will become mainstream by 2025, with Kevin Weil stating, ‘We want to make it possible to interact with AI in all of the ways that you interact with another human being.’”
- Microsoft is enhancing its Copilot AI with voice, vision, and a more encouraging personality, aiming to create a friendly AI helper. An experimental feature called Copilot Vision will allow the AI to see users’ screens and respond to cursor actions.
- Mustafa Suleyman, CEO of Microsoft AI, states, “AI companions now see what we see, hear what we hear, and speak in the same language that we use to communicate with one another.”
Can AI Keep Our Secrets?
- Workplace AI tools are increasingly recording and sharing sensitive information without the discretion a human assistant would exercise, leading to potential privacy violations and uncomfortable situations.
- While companies like Otter and Zoom provide options for users to manage sharing settings, many employees are unaware of these features. Privacy advocate Naomi Brockwell emphasized, “The technology is proliferating so fast, and people haven’t really internalized how invasive it is.”
- Experts argue that companies must take responsibility for educating employees about AI tools and their risks.
- Users are increasingly sharing personal details with chatbots like ChatGPT, often revealing more than they would on traditional platforms. OpenAI CEO Sam Altman noted, “I have been positively surprised about how willing people are to share very personal details with an LLM.”
- The potential for monetizing chat data raises ethical questions. Companies may use sensitive information to generate targeted ads, continuing a trend of extracting user secrets for profit.
Deep Dive
- Utility companies are facing heightened risks from extreme weather events, such as hurricanes and wildfires, which are becoming more frequent and intense due to climate change.
- They are exploring AI-driven technologies to better predict storms and enhance grid resilience.
- A researcher highlights the need for cooperation “to make sure the methods and models are the same, then make all that data accessible, especially for smaller utilities with fewer resources, like rural cooperatives.”
See also: IBM and NASA recently released an open-source AI model for weather and climate applications
Briefly noted
- A Reddit user prompted Google’s NotebookLM AI to create a podcast from a document filled with the words “poop” and “fart,” resulting in a surprisingly insightful discussion on art and meaning.
Tools you can use
- OpenAI’s new transcription model, Whisper Turbo, is a distilled version of the Large v3 model: 8x faster and 40% more VRAM-efficient. You can run it 100% locally in your browser, keeping your data private. Try it here; there’s also a minimal local-usage sketch after this list.
- Google released FRAMES, a new dataset designed to push RAG systems to their limits. It features 824 tough questions that require information from 2-15 Wikipedia articles, covering everything from history to health. Check it out here; a quick loading sketch also follows this list.
- NVIDIA’s new vision-language model, NVEagle, is available in 7B and 13B variants, with improved visual perception from a mixture-of-experts (MoE) approach to vision encoders. Explore it here.
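If you want to try the Whisper Turbo weights outside the browser demo, here’s a minimal local sketch using Hugging Face’s `transformers` pipeline. The `openai/whisper-large-v3-turbo` checkpoint name and the `audio.mp3` path are assumptions on my part; you’ll also need `pip install torch transformers` and `ffmpeg` available locally.

```python
# Hedged sketch: local transcription with the Whisper Turbo weights via transformers.
# Assumes `pip install torch transformers` and ffmpeg on PATH; "audio.mp3" is a placeholder file.
import torch
from transformers import pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",  # assumed Hugging Face checkpoint name
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    device=device,
)

# chunk_length_s lets the pipeline handle audio longer than Whisper's 30-second window
result = asr("audio.mp3", chunk_length_s=30, return_timestamps=True)
print(result["text"])
```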
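And if you’d rather poke at FRAMES programmatically than browse it, here’s a quick loading sketch with the `datasets` library. The `google/frames-benchmark` repo name is an assumption, so point it at wherever the dataset is actually hosted.

```python
# Hedged sketch: loading the FRAMES benchmark with Hugging Face datasets.
# Assumes `pip install datasets` and that the data is hosted as "google/frames-benchmark".
from datasets import load_dataset

frames = load_dataset("google/frames-benchmark")
print(frames)  # lists the available splits and their row counts (~824 questions total)

split = next(iter(frames.values()))
example = split[0]
print(example)  # one record: the question, its answer, and the Wikipedia articles it draws on
```

From there, you could feed each question into your own RAG pipeline and score its answers against the dataset’s references.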
If you enjoy what you read, don’t hesitate to share the love with your friends and give us a shout-out when you spread the word!