Tested: Generative AI by iStock
Still from a video showing Mona Lisa rapping, generated with Microsoft's VASA-1

There are many AI image generators. But they often have one problem: copyright. First, it is not always clear where the training material comes from. Second, it is not certain to what extent you can get into legal trouble with such images.

Such image generators are often out of the question for companies and other organizations because they are (or seem to be) too risky to implement.

This is where the opportunity lies for vendors whose business is directly threatened by AI image generators: stock photo platforms.

I have now been able to test "Generative AI by iStock", a collaboration between image giant Getty Images and hardware specialist Nvidia. It promises "built-in legal protection". I received credits for 100 generated images.

To judge the quality of the generator in detail, I would have to test it longer and more intensively. My impression is that it is on par with what is currently expected. In other words, good results can be achieved, sometimes with more effort and sometimes with less. Typical problems such as deformed hands also show up at times.

Useful functions include "Refine" to change certain parts of an image, and "Extend" to create, for example, a wide-format result from a square original. These functions are also available for images from the existing iStock catalog. I think this is a good idea because I often find that an image almost fits, but I don't have the resources to change it.

I liked the interface of the image generator. It makes it easy to select the image format or color mood. A helpful feature is the ability to use an already successful image as a starting point for other variations with just one click.

As usual, the starting point is a prompt. With iStock, there are input fields for the main subject, an action, the environment, and visual aesthetics. This ensures that the desired result is described with sufficient precision. A negative prompt is also available to avoid certain elements in the results.
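To make the idea of separate input fields concrete, here is a minimal sketch of how such a structured prompt could be assembled into a single text prompt. This is purely illustrative: the class `StructuredPrompt`, its field names, and the comma-joined output format are assumptions for this example and do not describe how iStock's generator actually works internally.

```python
from dataclasses import dataclass


@dataclass
class StructuredPrompt:
    """Hypothetical container mirroring the input fields described above."""
    subject: str           # main subject of the image
    action: str = ""       # what the subject is doing
    environment: str = ""  # where the scene takes place
    aesthetics: str = ""   # visual style, color mood, etc.
    negative: str = ""     # elements to avoid (kept separate from the main prompt)

    def to_text(self) -> str:
        # Join the non-empty positive fields into one comma-separated prompt.
        parts = [self.subject, self.action, self.environment, self.aesthetics]
        return ", ".join(p.strip() for p in parts if p.strip())


prompt = StructuredPrompt(
    subject="a barista",
    action="pouring latte art",
    environment="in a sunlit cafe",
    aesthetics="warm tones, shallow depth of field",
    negative="text, logos",
)
print(prompt.to_text())
```

Splitting the description into fields like this nudges the user to think about each aspect of the image, while empty fields are simply skipped.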

However, it is not always easy for non-experts to fill in these fields. When I go looking for stock photos, I often have only a vague idea of what I want. I take a step-by-step approach to the end result and draw inspiration from what I find on the platform.

When asked about this, Grant Farhall, Chief Product Officer at Getty Images and iStock, explained that they have introduced a "lookbook" for this purpose. It brings together examples of motifs and styles that can serve as inspiration.

In May, companies will also be able to define specifications so that the images generated will be more in line with the internal style guide.

Bottom line

Even in this early version, iStock's image generator is an interesting offering for businesses. It is a useful addition to the stock photo platform. A video AI has also been announced, in collaboration with the startup Runway.


T O O L S

Meta's impressive ChatGPT alternative Llama 3

Meta presents Llama 3, the latest generation of its language models, which is freely available for download. The models are said to surpass the performance of many competitors and can even compete with some of the best proprietary models. Llama 3 is said to excel at multiple-choice questions, programming tasks, and mathematical problems. In addition to the models themselves, Meta is presenting a standalone chatbot based on Llama 3 that is positioned as a direct competitor to ChatGPT and others. This chatbot integrates Meta's image generator "Imagine". Read more about it on VentureBeat.

ChatGPT update brings "memory" feature and temporary chats

ChatGPT has received several updates. One of the most interesting is the "memory" feature, which allows ChatGPT to remember information that users communicate to it. For example, you can store details about yourself or your company that the chatbot can access when needed. Other new features include the ability to have temporary chats that are deleted after 30 days and improved chat history management.

Other tools in brief

With its new AI model AdaKWS, speech recognition specialist aiOla claims to convert speech into text correctly, even when it contains technical jargon. The model achieves an accuracy of 94.6%, better than OpenAI's Whisper.

Microsoft's VASA-1 can make human portraits sing and talk. It only needs a still image and an audio file with speech to generate moving lips, matching facial expressions and head movements. Microsoft emphasizes that this is a research demonstration only, with no plans to bring it to market.

London-based Synthesia introduces "Expressive Avatars," a new generation of AI avatars that adapt their facial expressions, gestures, and tone of voice to the context of the spoken content. This makes it possible to create more realistic and emotional AI videos for marketing, training or patient communication.

VideoGigaGAN outperforms previous methods of video upscaling, creating videos with a high level of detail and consistency. The approach is based on the GigaGAN image upscaler and solves its video processing problems through special techniques that result in sharper and smoother videos. Source: Hacker News

Microsoft introduces Phi-3 Mini, its smallest AI model to date, which can compete with models such as GPT-3.5 despite its small size, making it ideal for companies with smaller data sets and limited budgets.

Adobe integrates its Firefly AI image generator directly into Photoshop, allowing users to create images using text input and then edit them using familiar Photoshop tools.

Apple releases OpenELM, a set of small, freely available AI models that can run directly on devices like laptops or smartphones and perform tasks such as text generation efficiently. While not industry-leading in performance, OpenELM seems to provide a solid foundation for future research and development in on-device AI.

The Amazon Q AI chatbot is now generally available to help businesses with tasks such as knowledge discovery, software development, and data analysis.

Anthropic, creator of ChatGPT competitor Claude, introduces the "Team" business solution. It includes access to the three latest Claude models, has increased usage limits, admin tools and invoice management, and offers longer context windows for uploading large documents for editing. In addition, an iPhone app is now available.

Snowflake introduces Arctic, a new open language model designed specifically for complex enterprise tasks such as generating SQL queries and code or following instructions.

Cohere releases a toolkit to accelerate the development of generative AI applications in the enterprise. With pre-built applications and easy-to-implement components, the toolkit aims to reduce development time.

Salesforce Einstein Copilot, an AI-powered tool for enterprise, is now generally available and is designed to help sales teams be more productive through generated text and automated actions. Einstein Copilot can break down and execute complex tasks, such as identifying the best sales opportunities and creating email drafts.

OpenVoice allows users to realistically clone voices in different languages and accents, and even control emotions and speaking styles. The latest version, OpenVoice V2, offers improved audio quality, native support for multiple languages, and is available free for commercial use. Source: Hacker News


N E W S

Mysterious chatbot appears and disappears again

A mysterious chatbot called "gpt2-chatbot" caused a stir among experts this week. Its capabilities appeared to be on a par with GPT-4, but its origin remained unknown. After a short time, the bot disappeared again, presumably due to the high level of interest. Speculation about the developer is rampant, with OpenAI, Google and Anthropic among the names mentioned. It remains to be seen who is behind the "gpt2-chatbot", and whether it is a new AI model or even a new developer.

More news in brief

Researchers at Meta and the University of Southern California have developed Megalodon, a new architecture for AI models. It allows language models to process significantly larger amounts of text without using a lot of memory.

More than half of Americans have already tried generative AI, with the majority (82%) seeing it as enriching their creativity and simplifying their lives. The technology is especially popular for personal projects (81%) and research and brainstorming (64%). These are the findings of a study by Adobe Analytics.

Researchers at DeepMind have found that large language models can learn new skills through hundreds or even thousands of examples in the prompt, without the need to fine-tune the model. This method enables companies to quickly prototype and develop AI applications.

Although the majority of IT managers worldwide rate artificial intelligence as a top priority, a recent survey shows that most companies are not yet adequately prepared to use it. The main obstacles are a lack of IT infrastructure and unclear guidelines for the ethical use of AI.

The Austrian privacy organization noyb has filed a complaint against OpenAI with the Austrian data protection authority because ChatGPT disseminates false information about individuals, which potentially violates the GDPR.

Google DeepMind develops "Gecko", a new standard to evaluate the capabilities of AI image generators. It is designed to help better understand the strengths and weaknesses of AI models and drive their development.

AI-powered search tool Perplexity closed a new round of funding and is valued at more than $1 billion. With the new capital and a new Enterprise Pro Plan, the company plans to expand globally and offer its secure, AI-powered search service to enterprises.


G O O D   R E A D S

Cyc: An Almost Forgotten AI Project

For four decades, the Cyc project has been working on giving machines the ability to reason. To this end, researchers have built a vast knowledge network consisting of millions of concepts and rules. Cyc can draw conclusions, answer questions, and even reconcile conflicting information from different domains. Despite these impressive achievements, Cyc is largely forgotten today. New approaches based on machine learning have revolutionized AI research. But the Cyc project is not giving up, and is looking for ways to combine its strengths with those of systems like ChatGPT. Whether the time of rule-based systems like Cyc will return remains to be seen.

The Economic Impact of Generative AI

In his report, "The Economic Impact of Generative AI," MIT researcher Andrew McAfee examines the potential impact of AI tools on the economy. While generative AI may change some jobs or even make them obsolete, the author predicts an overall increase in the demand for workers, albeit in new occupational areas. Comprehensive training initiatives will be essential to manage the change.

Ethical issues surrounding seemingly human AI applications

The increasing "humanization" of AI systems raises ethical and legal questions. In an interesting article on VentureBeat, James Thomason warns of the downsides of human-like AI. He is particularly concerned about the use of AI in sensitive areas such as therapy and education, where human empathy and understanding are essential. Thomason urges tech companies to develop ethical guidelines and be transparent with users to maintain trust and avoid legal risks.


C U R I O U S   F I N D

How ChatGPT imagines the inside of a volcano

Illustration created by ChatGPT showing the inside of a volcano, including a temple and waterfalls

A Redditor asked ChatGPT to create an illustration showing a cross-section of the active volcano Mount Rainier in the US state of Washington. The result can be seen above, including a temple and a waterfall. Other users also tried, with even more curious results...


G L O S S A R Y

Edge AI

Today, most AI applications run in the cloud on powerful, specialized computers in data centers. But experts say this will not always be the case. There will also be an increasing number of small models that run directly on users' devices, from PCs to smartphones.

This is called edge AI, after the concept of edge computing. It is meant to signal that the real work is done at the "edge" of the network, not centrally.

This is made possible by advances in chip technology and new methods for developing powerful AI models that require far fewer resources. One approach, for example, is to train a model for a well-defined task rather than as a jack-of-all-trades like ChatGPT.

One advantage is that such locally running AI models do not require internet access and therefore work as usual, even when mobile phone reception is poor. In addition, users' data remains on their devices and is not sent to the cloud for processing.

