PSA: AI detectors are "neither accurate nor reliable"
Jan Tissler
CONTENTMEISTER – AI Content Strategy, AI Content Creation, Generative AI Workshops and Trainings, German Content, Translation and Transcreation.
There are many services that claim to be able to recognize AI text with "99% accuracy" - without providing any proof. At the same time, there are services that claim to adapt AI texts in such a way that no detector can recognize them.
Both cannot be true at the same time.
Granted: I can recognize an unmodified text from GPT 3.5 without any tool. Certain phrases are common. The structure always seems the same. I've seen articles here and there on the net where I was "99%" sure: This is copy-pasted 1:1 from this (now obsolete) AI.
But we don't talk much about GPT 3.5 anymore. A tool like Claude 3 Opus has a very good, surprisingly human writing style.
This improvement in quality leads to an effect Gizmodo describes in an article worth reading: AI detectors flag text as AI-generated if, for example, it is too well-written.
Yes, really: Correct grammar and punctuation are now suspect. A professional writing style and content structure according to well-known style guides? That can only be AI.
Of course, this is complete nonsense.
The scientific study "Testing of detection tools for AI-generated text" came to a similar conclusion after examining 12 services:
"The researchers conclude that the available detection tools are neither accurate nor reliable."
It doesn't get any clearer than that.
Another problem: Even using helpful AI-powered tools could lead to mislabeling. Think of services that find mistakes in a text, improve wording, and so on. Are these improved and corrected texts suddenly "AI-generated"? Certainly not.
In other words, there is a high probability that AI recognition will be completely wrong. AI texts will slip through undetected while human-written texts will be falsely labeled as AI work.
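To make the two kinds of error concrete, here is a small back-of-the-envelope sketch in Python. All numbers in it are made-up assumptions chosen for illustration; they are not measurements of any real detector, and the function name `detector_outcomes` is hypothetical.

```python
# Illustrative only: every rate below is an assumption, not a measured
# value for any real AI detector.

def detector_outcomes(ai_texts, human_texts, detection_rate, false_alarm_rate):
    """Expected results when a detector checks a mixed pile of texts.

    detection_rate:   share of AI texts the detector correctly flags
    false_alarm_rate: share of human texts the detector wrongly flags
    """
    caught_ai = round(ai_texts * detection_rate)
    missed_ai = ai_texts - caught_ai
    wrongly_flagged_humans = round(human_texts * false_alarm_rate)
    flagged = caught_ai + wrongly_flagged_humans
    # Of everything flagged as "AI", how much really was AI?
    share_flagged_actually_ai = caught_ai / flagged if flagged else 0.0
    return {
        "caught_ai": caught_ai,
        "missed_ai": missed_ai,
        "wrongly_flagged_humans": wrongly_flagged_humans,
        "share_flagged_actually_ai": share_flagged_actually_ai,
    }


if __name__ == "__main__":
    # Hypothetical scenario: 1,000 submissions, 100 of them AI-written.
    result = detector_outcomes(
        ai_texts=100,
        human_texts=900,
        detection_rate=0.6,     # assumed
        false_alarm_rate=0.1,   # assumed
    )
    for name, value in result.items():
        print(f"{name}: {value}")
```

Under these assumed rates, 40 AI texts slip through, 90 human-written texts are flagged, and only 40 percent of all flagged texts were actually written by an AI. Different assumptions give different numbers, but both error types remain as long as neither rate is perfect.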
So don't rely on these vendors' promises.
Instead, consider why the use of AI is seen as a problem in the first place. Then you can address that issue and discuss it with the content team.
Because one thing is clear: even if an AI delivers excellent copy, a human should always have the last word. It should also be clear when and for what purpose AI tools can be used, where their weaknesses lie, and what they are generally not well suited for.
T O O L S
New version and features for ChatGPT alternative Claude
Anthropic's new language model Claude 3.5 Sonnet is causing a stir in the AI community. It reportedly outperforms previous models such as GPT-4 in benchmark tests and impresses users with its performance. It can handle complex tasks such as game or web development.
Despite weaknesses in simple cognitive tasks, Claude 3.5 Sonnet shows the pace of development in the area of large language models, putting pressure on the competition.
In addition to Claude 3.5 Sonnet, Anthropic introduces the new "Artifacts" feature. This allows users to interact directly with the results of their queries. Whether it's designs, emails or other content: They can now be viewed and edited directly in the Claude app.
Anthropic also introduces the "Projects" feature. It allows teams to collect and organize relevant documents, code and knowledge in one central place. With a context window of 200,000 tokens (the equivalent of a 500-page book), Claude can now process and understand large amounts of organization-specific information.
These innovations point to Anthropic's vision to evolve Claude from a pure chatbot to a comprehensive work tool for businesses. The goal seems to be to bring knowledge, documents and ongoing work together in one place, similar to Notion or Slack.
New AI models for video: Luma Dream Machine and Runway Gen-3 Alpha
Luma AI has introduced "Dream Machine", a new AI system for video generation. Unlike similar systems such as those from OpenAI ("Sora"), Dream Machine is available for free for everyone to use. Users can create 5-second video clips simply by entering text. However, the quality of the results is not always convincing. The startup itself lists the weaknesses of Dream Machine on the product page. Source: VentureBeat
But that's not all: Runway, a pioneer in the field of AI-generated videos, is introducing its latest model, Gen-3 Alpha. It enables realistic 10-second video clips with precise details and a wide range of expressions. The model, which is available to paying subscribers, promises faster generation times and is designed to support all existing modes such as text-to-video and image-to-video. Runway emphasizes that Gen-3 Alpha is based on a new infrastructure for multimodal training and represents a step toward "general world models" (see glossary below). Source: VentureBeat
Other tools in brief
ElevenLabs has released a new tool that allows video creators to quickly and easily add sound effects to their clips. The app analyzes uploaded videos and suggests different sound effects that can be integrated directly into the videos via an interface.
A new iOS app from ElevenLabs allows users to listen to an AI-generated audio version of articles, books or documents on the go.
Genspark is a new AI-powered search engine that uses generative AI to create summaries of search results, similar to Google's AI Overviews or Arc Search, but claims to achieve higher quality through specialized models.
Augie Studio introduces a new AI platform for creating social media videos easily and at scale. The platform includes features such as AI-powered script, voice-over, and image creation, as well as editing tools to customize videos.
Meta is releasing a series of new AI models for audio, text and watermarks. Meta is also making two sizes of its Chameleon multimodal text model available for research. These models can be used to perform tasks that require both visual and textual understanding, such as image annotation.
Microsoft has unveiled Florence-2, a versatile AI model that can handle various image processing tasks with a single, unified approach. Available under an MIT license, the model appears to outperform larger specialized models in areas such as image annotation and object recognition, despite its compact size, and could help companies save on investments in separate task-specific models.
Stability AI has released Stable Diffusion 3 Medium, a smaller version of its image generation model that can run on PCs with as little as 5GB of VRAM. According to Stability AI, the model offers comparable quality to the larger version and could therefore be an attractive option for users with limited resources.
Former Meta engineers Fryderyk Wiatrowski and Peter Albert have developed an AI agent called Jace that is designed to perform tasks in the browser independently, such as booking a hotel.
According to Nvidia, their new Nemotron-4 340B open language model will revolutionize the generation of synthetic data and enable companies to develop custom AI models.
Kong Inc. has released its "AI Gateway", a platform designed to make it easier for enterprises to control and securely deploy generative AI in various cloud environments. According to Kong, the gateway enables the integration and management of various AI technologies through a single interface and provides security features to prevent misuse through manipulated inputs (prompts) to AI models.
LiveBench is a new benchmark for large language models developed by a team of scientists. Unlike existing benchmarks, it uses constantly updated questions from current sources and automatically scores the answers based on objective criteria. The team has taken special care to avoid the risk of "contamination", where the training data of a language model contains the test data of a benchmark. This means that the results of the benchmark should actually reflect the model's abilities in new situations, and not just its ability to reproduce already known content.
A new benchmark test from Sierra shows that even advanced language models such as GPT-4o still struggle with more complicated tasks in everyday scenarios, achieving a success rate of less than 50 percent. The test, called TAU-bench, is designed to help developers evaluate the performance of AI agents in realistic situations, taking into account factors such as multiple interactions and complex tasks.
N E W S
AI search engine Perplexity under fire
AI startup Perplexity is under fire. Several media outlets have accused the company of copying content from websites without permission. Especially controversial: Perplexity is said to have bypassed blocks meant to prevent this.
Perplexity CEO Aravind Srinivas denies the accusations. He speaks of misunderstandings and refers vaguely to the use of third-party providers. At the same time, he emphasizes that old rules for web crawlers need to be reconsidered in the age of AI. Critics see this as an attempt to shirk responsibility. Some publishers are threatening legal action.
The debate over Perplexity highlights a fundamental problem: how do AI companies handle third-party content? Perplexity now plans to share profits with some publishers. However, it is questionable whether this will be enough to calm the waters.
Sources: Pixel Envy, Wired, Axios, Fast Company
Music labels sue music AI start-ups Suno and Udio
The major music labels Universal Music Group, Sony Music Entertainment and Warner Records have sued the AI companies Suno and Udio. They accuse them of having used copyrighted works on a massive scale for their AI music generators without permission.
The lawsuits were filed by the Recording Industry Association of America (RIAA) in Boston and New York. The labels are demanding damages of up to 150,000 US dollars per work.
Suno and Udio allow users to create songs by entering text. According to the complaints, their AI models can produce deceptively real imitations of well-known artists and songs.
The defendant companies reject the allegations and emphasize that their technology is designed to generate new content.
Sources: The Verge, VentureBeat
More news in brief
Ilya Sutskever, co-founder of OpenAI, has launched a new startup called Safe Superintelligence Inc. He is joined by Daniel Levy and Daniel Gross. The founders see "superintelligence" as the most important technical problem of our time. Such systems would go far beyond the capabilities of the human brain. Sutskever was previously part of the superalignment team at OpenAI, which focused on controlling powerful AI systems. Source: VentureBeat
ChatGPT company OpenAI acquires two startups: Rockset and Multi. Rockset, known for its real-time analytics database, will strengthen OpenAI's infrastructure. The five-person team at Multi, specialists in screen sharing and collaboration, will support the development of the ChatGPT desktop application. Industry experts are speculating about possible new features such as AI computer control or enhanced teamwork.
Speaking of OpenAI, according to a report by The Information, the startup is experiencing rapid growth in revenue. Annual revenue is said to have doubled in the last six months to $3.4 billion. Most of the revenue comes from subscriptions to ChatGPT and from fees paid by developers. Microsoft also pays OpenAI a portion of the revenue from the sale of language models through its Azure cloud platform. OpenAI disputes these figures.
And OpenAI for the third time: The much-hyped new "voice mode" for ChatGPT has been delayed by at least a month. It should now be available in late July or early August, initially for a small group of users only.
Researchers have found a way to dramatically improve the energy efficiency of large language models without sacrificing performance. Using their system, a language model with billions of parameters can be run on as little as 13 watts. The researchers have also developed proprietary hardware that further maximizes energy savings.
Meta's labeling of images as "Made with AI" is causing confusion and criticism. The company wants to identify photos on its platforms that have been created using AI tools, but the automatic label does not work reliably. Many users report that photos that have only been edited with AI are also being labeled. Photographers criticize that simple image editing should not be equated with AI-generated content. Meta says it is working to improve the labeling.
AI startup Stability AI, which specializes in image generation and is known for its Stable Diffusion tool, has been acquired by a group of investors led by former Facebook president Sean Parker after financial difficulties. The investors are providing $80 million to recapitalize the company. The goal is to develop a viable business model for Stability AI under the leadership of new CEO Prem Akkaraju. Paid versions of the AI models for businesses and tools for integrating the technology are planned.
Apple is delaying the introduction of new AI features in Europe. The company cites privacy and security concerns related to the interoperability requirements of the Digital Markets Act (DMA). Experts see this as a sign that companies are taking competition laws more seriously. Apple promises it is working on a solution to make the features available to EU customers.
Many AI models that power chatbots are advertised as "open source," but their code and training data are not fully released. A new study shows that many large companies describe their models as "open weights", meaning that researchers can use them, but have no access to the underlying data and can't make fundamental changes to them. The lack of transparency in the training data is a particular obstacle. Smaller companies and research groups, on the other hand, rely on truly open source models that enable the advancement of AI research.
Google's AI research lab DeepMind has developed a new technology called V2A that can automatically generate appropriate soundtracks, sound effects, and even dialogue for videos. While V2A seems promising, DeepMind admits that the quality of the audio generated is not yet perfect. For now, it is not generally available.
Despite numerous AI integrations into applications such as Salesforce or Adobe Photoshop, significant sales have apparently not yet materialized, Bloomberg reports. Many companies are still unsure about appropriate pricing models. Meanwhile, hardware and cloud vendors are benefiting a lot more from the AI boom.
Amazon is apparently planning a major relaunch of its Alexa service, which has been losing money for years. According to Reuters, the company is planning to equip Alexa with a new AI system that will be offered in two tiers. A premium version could be available for around $5 per month and enable tasks such as composing emails or ordering food. Whether this move will make Alexa profitable remains to be seen, as competition in the AI space is fierce and users expect Alexa to be free of charge.
G O O D   R E A D S
AI as a tool, not a replacement for critical thinking
The article "Turning the Tables on AI" proposes an innovative approach to artificial intelligence. Instead of using AI as a substitute for your own thinking, the author argues for using it as a tool to promote critical thinking. He provides practical tips on how to use ChatGPT as an idea generator, question poser, and editor without giving up your own creativity and authorship.
Understanding the AI hype cycle
In his article for VentureBeat, Samir Kumar, co-founder of Touring Capital, analyzes the current AI hype cycle. He cautions against jumping to conclusions and reminds us of previous technology waves, such as the smartphone revolution. Kumar emphasizes that the first innovators are often not the long-term winners. He advises founders and investors to pay particular attention to data strategies, regulatory foresight, and cybersecurity when it comes to AI startups. According to Kumar, the quality and quantity of training data is critical to the success of AI models. He expects more innovation, but cautions against exaggerated expectations and recommends listening to researchers and developers.
AI vs copyright
This article by Tim O'Reilly discusses the complex copyright issues surrounding the training and use of AI. He argues that instead of litigation, a solution must be found that benefits both AI developers and creators. O'Reilly suggests that AI companies should respect copyrights, provide attribution, and pay for results rather than training.
C U R I O U S   F I N D
ChatGPT as a craftsman
Reddit user Zaryatta76 asked ChatGPT for an illustrated guide to mounting a TV on the wall. As you can see above, the AI has a rather unusual idea. Perhaps this is a veiled petition from the chatbot to leave our many screens behind and enjoy some fresh air instead?
Or maybe the image AI is simulating illustrated instructions without really understanding their content.
One of the two.
G L O S S A R Y
General World Model
A general world model is an ambitious concept in artificial intelligence. The goal is to create an AI system that can understand and simulate the world as comprehensively as a human. Imagine a virtual assistant that can not only generate text or images, but also understand, predict, and respond to complex real-world situations. For example, it could assess the consequences of an action, interpret social interactions, or find creative solutions to a variety of problems. Such a model would be a digital representation of our world and how it works, making it a powerful but challenging goal for AI research.