AI & Startups: April 1 - April 7

Gen AI Product News

OpenAI unveils AI voice cloning tool

Image source: OpenAI

OpenAI has unveiled a preview of Voice Engine, a model that can clone human voices from a 15-second audio sample and generate natural-sounding speech.

The details:

  • The model is able to preserve the accent and emotions of the original speaker in generated speech.
  • Voice Engine is currently being tested by a small group of trusted partners, including AI startup HeyGen.
  • OpenAI has implemented safety measures like watermarking and proactive monitoring to prevent misuse.
  • The company revealed it first developed the tech in late 2022 and has been using it to power voices in its text-to-speech API and ChatGPT.

Microsoft and OpenAI plan $100B supercomputer

Image source: Getty Images

Microsoft and OpenAI are planning a $100 billion data center project, nicknamed "Stargate," to house a supercomputer with millions of AI chips to power OpenAI's next-gen models.

The details:

  • The Stargate project could cost over $100 billion, making it 100x more expensive than today's largest data centers.
  • Microsoft would likely finance the project, which executives aim to launch as soon as 2028.
  • Stargate is reportedly designed to support AI chips from various manufacturers, lessening the companies' reliance on Nvidia's GPUs.

Apple’s ReALM ‘outperforms’ GPT-4

Image source: Apple

In a new research paper, Apple researchers introduced ReALM, a new AI system that can understand on-screen tasks, conversational context, and background processes.

The details:

  • ReALM uses a new approach of converting screen info to text — allowing it to bypass bulky image recognition parameters for more efficient on-device AI.
  • The model takes into account both what's on the user's screen and what tasks are active.
  • According to the paper, Apple's larger ReALM models substantially outperformed GPT-4, despite having fewer parameters.

Example use case: While scrolling through a website, a user who wants to call a business could tell Siri to “call the business,” and Siri would be able to “see” the phone number on the page and call it directly.
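As a rough illustration of the screen-to-text idea, the sketch below flattens a few on-screen elements into a tagged text block of the kind a language model could read. The ScreenEntity type, the tagging format, and the regex-based phone lookup are all assumptions made for this example; none of them is taken from Apple's paper, where the model itself resolves the reference.

```python
import re
from dataclasses import dataclass


@dataclass
class ScreenEntity:
    """Hypothetical on-screen element: its text plus its top-left position."""
    text: str
    x: int
    y: int


def screen_to_text(entities):
    """Serialize entities in reading order (top to bottom, left to right)."""
    ordered = sorted(entities, key=lambda e: (e.y, e.x))
    return "\n".join(f"[{i}] {e.text}" for i, e in enumerate(ordered))


# Simple pattern for US-style phone numbers, for illustration only.
PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]\d{4}")


def find_phone_numbers(screen_text):
    """Stand-in for the model: pull phone-like strings out of the text."""
    return PHONE_PATTERN.findall(screen_text)


# Elements are listed out of order on purpose; serialization sorts them.
screen = [
    ScreenEntity("Call us: (555) 010-4242", x=10, y=300),
    ScreenEntity("Joe's Pizza", x=10, y=20),
    ScreenEntity("Open 11am to 10pm", x=10, y=60),
]

prompt_context = screen_to_text(screen)
print(prompt_context)
print(find_phone_numbers(prompt_context))
```

The point of the text form is that a request like “call the business” can be answered from plain text alone, without running an image model over a screenshot.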

OpenAI adds image editing to DALL-E 3

Image source: OpenAI

OpenAI has introduced a new feature that allows users to edit images generated by DALL-E 3 directly within ChatGPT, providing a more streamlined way to customize AI-generated images.

The details:

  • The DALL-E editor enables users to select specific areas of an image and prompt changes.
  • Users can add, remove, or modify objects and characteristics within a selected region of an AI-generated image.
  • The editor is accessible via the web interface and the ChatGPT mobile app, with slight variations in the editing process between platforms.

Stability AI launches Stable Audio 2.0

Image source: Stability AI

Stability AI just released Stable Audio 2.0, an updated AI audio generation model that can create high-quality songs up to three minutes long with a coherent structure from a single text prompt.

The details:

  • The new model introduces audio-to-audio generation, allowing users to upload and transform their own audio samples using prompts.
  • 2.0 also offers expanded sound effect generation and style transfer capabilities, providing more creative control for artists and musicians.
  • The model was trained exclusively on a licensed dataset from AudioSparx, with opt-out requests honored to ensure fair creator compensation.

Apple explores home robots

Image source: Midjourney

Apple is reportedly exploring the development of personal home robots as a potential "next big thing" after the company's electric vehicle project fizzled out earlier this year.

The details:

  • Apple engineers are working on a mobile robot that can follow users around the home.
  • The tech giant scrapped its decade-long ‘Project Titan’ EV effort in February, with robotics work now shifting to home devices.
  • The company is currently advertising for robotics-related roles on its website, seeking ML researchers and engineers.

Google considers AI search paywall

Image source: Google

Google is reportedly considering charging for new ‘premium’ AI-powered search features, marking the first time the company would put any of its core search engine products behind a paywall.

The details:

  • Google is developing tech to deploy AI-enhanced search as part of its premium subscription services, which already include Gemini access.
  • The traditional search engine would remain free, but certain AI-powered search enhancements would be limited to subscribers.
  • AI is costly to run compared to Google's current search model, potentially threatening the company's $175B search advertising cash cow.

Gen AI VC

Chip Startup SiMa.ai Raises $70 Million to Quicken AI on Cars and Robots

Bioptimus raises $35 million seed round to develop AI foundational model focused on biology

Startup Manifold secures $15M for its AI-based clinical research platform

Spatial AI biomarker startup Nucleai raises $14 million led by Merck’s VC arm

SaaS entrepreneur Raisinghani’s new AI venture SiftHub nabs $5.5M to boost sales efficiency

Ailytics raises US$2.7M to power next generation scenario-based AI monitoring for heavy industries

Vodex.ai powers up with $2 million seed investment for Gen AI Sales Boost

Supersimple closes on $2M pre-seed round to deliver big data insights with explainable AI

Inner AI raises $2M and launches AI platform

AI Agents

Open-sourced AI software developer agent

Image source: SWE-agent

Researchers from Princeton NLP have developed SWE-agent, an open-source system that turns GPT-4 into an AI software engineering agent that can autonomously solve issues in GitHub repositories.

The details:

  • SWE-agent achieves accuracy similar to that of Devin (a recently viral AI agent) on the SWE-bench benchmark, resolving 12.29% of issues autonomously.
  • The agent has an average task completion time of 93 seconds.
  • The system interacts with a specialized terminal, allowing it to open and search files, edit specific lines, and write and execute tests.
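To give a feel for what "search files" and "edit specific lines" commands might look like, here is a minimal sketch of two such helpers working on an in-memory source string. The names search_lines and edit_lines are hypothetical and are not SWE-agent's actual command set; the example only shows the general shape of a line-addressed editing interface followed by a quick check.

```python
def search_lines(source, term):
    """Return (line_number, line) pairs whose text contains term."""
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), start=1)
        if term in line
    ]


def edit_lines(source, start, end, replacement):
    """Replace lines start..end (1-indexed, inclusive) with replacement text."""
    lines = source.splitlines()
    if not 1 <= start <= end <= len(lines):
        raise ValueError(f"invalid range {start}-{end} for {len(lines)} lines")
    lines[start - 1:end] = replacement.splitlines()
    return "\n".join(lines)


# A toy "issue": add() subtracts instead of adding.
buggy = "def add(a, b):\n    return a - b\n"

# Locate the faulty line, then rewrite only that line.
hits = search_lines(buggy, "return")
line_number = hits[0][0]
fixed = edit_lines(buggy, line_number, line_number, "    return a + b")

# Run a small check against the patched code.
namespace = {}
exec(fixed, namespace)
print(namespace["add"](2, 3))
```

Addressing edits by line range keeps each change small and easy to verify, which is one plausible reason an agent-facing terminal would favor it over rewriting whole files.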

AI Research

Octopus brings smarter AI agents to mobile

Image source: Stanford

Stanford researchers just introduced Octopus v2, a new framework for on-device AI agents that outperforms GPT-4 in accuracy by fine-tuning language models with special functional tokens.

The details:

  • Octopus v2 uses supported functions as special tokens and fine-tunes on a small dataset to learn when to use each function.
  • With just 100 training samples, the model achieved 98% accuracy in selecting the right function, surpassing GPT-4.
  • The optimized 2B parameter model runs on-device without

Anthropic uncovers 'many-shot jailbreaking'

Image source: Anthropic

Anthropic researchers have discovered a new "jailbreaking" technique called "many-shot jailbreaking" that can evade the safety guardrails of large language models (LLMs) by exploiting expanded context windows.

The details:

  • Many-shot jailbreaking involves inserting a series of simulated dialogues into the input to exploit LLMs' in-context learning abilities.
  • The likelihood of eliciting a harmful response increases with the number of dialogues (or "shots") included in the prompt.
  • The effectiveness of many-shot jailbreaking is related to the process of "in-context learning," where LLMs learn using the prompt context.
  • Anthropic has informed other AI researchers and companies about this vulnerability and is actively working on mitigation strategies.

