Multimodal AI meets architecture & design
Multimodal AI refers to systems that can process and understand multiple modes of data such as natural language, images, videos, and speech. Its applications range from conversational AI to computer vision and speech recognition.
The increased adoption of multimodal AI is likely to bring significant changes to various industries and aspects of human life.
Tech companies are making strides in multimodal AI
Tech companies are now making strides in multimodal AI to improve search and content generation.
Today, an AI model trained on video data makes predictions about video, a model trained on text makes predictions about text, and so on.
To go beyond this, multimodal AI research aims to be more holistic: a single AI model that conceptualizes information across multiple types of data, such as text, 2D images, and video, before making a prediction.
For example, in early 2021, OpenAI trained an AI model called DALL-E to generate images based on a text prompt.
In the image below, the model generates avocado-shaped armchairs from exactly that prompt.
In April 2022, OpenAI released DALL-E 2, which improves the original model's output image resolution by 4x.
In May 2022, Google launched Imagen, a text-to-image project that reportedly outperforms OpenAI's model in terms of the quality of the generated images, as well as the alignment between the input (text) and output (AI-generated image).
Earlier this year, Meta published a paper called "Omnivore: A Single Model for Many Visual Modalities." The paper details an AI model that, when trained to recognize 2D images of pumpkins, can also recognize pumpkins in videos or 3D images without requiring additional training for the latter two media types.
Multimodal AI is growing beyond academic research labs to find practical applications. Google, for instance, is using multimodal AI to improve search. In the future, a user could take a photo of their hiking boots and ask a query like, "Can I use these to hike Mt. Fuji?" The search engine would recognize the image, mine information on the web about Mt. Fuji from text, image, and video data, and connect the dots to provide a relevant answer.
Multimodal AI research is poised to go beyond corporate research labs to power the next era of search and content generation, among other applications.
But where does the money flow?
Sequoia Capital released an interesting piece on this brave new world a few months ago, and seems to be going all in.
Below is the most recent mapping of the space. As you can see, no segment is expected to be left untouched by the impact.
What can be expected in architecture & design?
Clearly, AI has the potential to augment and assist architects and designers in their work, but I believe it is unlikely to replace them in the near future. While AI can generate designs, it is still limited (and to some extent, is bound to remain limited) in its ability to understand and incorporate the nuances and complexities of human preferences, cultural context, and ethical considerations.
I mean, every ChatGPT answer follows a thesis, an antithesis, and a conclusion. That's not artistic expression; that's just curated resources about the so-called 'state of the art'.
The same goes for generative AI solutions in architecture & design. While feeding on words, prompts, or images, they simply 'generate'; they don't 'create'. Yes, DALL-E 2, Midjourney, or Models Lab (formerly Stable Diffusion API) are fun to 'generate' with, but that's not architectural or design creation per se. It's just a digital asset born from the ashes of other assets and training models.
Architects and designers bring unique creative visions and critical thinking skills to the design process, and their expertise and judgment are needed to evaluate the outputs generated by AI algorithms. While AI can serve as a tool for architects and designers, helping them to save time and improve their efficiency, the final decisions and creative direction of the design process will still be left to human architects and designers.
Truth be told, and that's good news: architecture and design are not purely technical fields. They require a deep understanding of human behavior, culture, and aesthetics. AI algorithms may never fully grasp these aspects of design, and human architects and designers will likely continue to play a critical role in shaping our built environment.
Should architects & designers turn to blockchain to protect their IP?
Most certainly, blockchain technology has the potential to help protect IP in various industries. By creating a secure and decentralized digital ledger, blockchain allows for tamper-proof records of the ownership, provenance, and transfer of IP rights.
For architects and designers, using blockchain to manage IP rights is bound to be the way to go. It can provide a reliable and transparent way to keep track of creations and ensure that creators are properly credited and compensated for their work. It would also simplify the process of licensing and selling designs, as well as prevent unauthorized use or infringement of their IP. That's a big issue given the current turmoil surrounding Getty Images, Models Lab (formerly Stable Diffusion API), Adobe, and others: software providers feeding on the intelligence of others to train AI-powered models without protecting the original rights.
Sure, blockchain is a relatively new technology, and its applications for IP protection are still evolving. There will also be some challenges to overcome, such as ensuring the accuracy and completeness of the information recorded on the blockchain, as well as the issue of interoperability between different blockchain systems.
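To make the "tamper-proof record" idea concrete, here is a minimal, illustrative sketch of a hash-chained design registry. All names (`DesignRegistry`, `register`, `verify`) are hypothetical, and a real system would anchor these records on an actual blockchain network rather than in a single Python list; the point is only to show why editing any past entry is detectable.

```python
import hashlib
import json
import time

def fingerprint(design_bytes: bytes) -> str:
    """Content hash that uniquely identifies a design file."""
    return hashlib.sha256(design_bytes).hexdigest()

class DesignRegistry:
    """Append-only ledger: each record stores the previous record's hash,
    so altering any entry breaks every link that follows it."""

    def __init__(self):
        self.chain = []

    def register(self, author: str, design_bytes: bytes) -> dict:
        """Append an ownership record for a design file."""
        prev_hash = self.chain[-1]["record_hash"] if self.chain else "0" * 64
        record = {
            "author": author,
            "design_fingerprint": fingerprint(design_bytes),
            "timestamp": time.time(),
            "prev_hash": prev_hash,
        }
        # Hash the record's own contents so later edits are detectable.
        record["record_hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self.chain.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash and link; returns False if any record was tampered with."""
        prev = "0" * 64
        for record in self.chain:
            if record["prev_hash"] != prev:
                return False
            body = {k: v for k, v in record.items() if k != "record_hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if record["record_hash"] != expected:
                return False
            prev = record["record_hash"]
        return True
```

Registering a design stores only its fingerprint, not the file itself, which is how a registry can prove priority and authorship without exposing the work.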
But that's the way forward.
That is, right here and right now, a massive use case for enterprise blockchain: architects, designers, and AI engines collaborating.