Use of AI in Audio Post
Redefining Artistic and Technical Boundaries in Sound Design for Film
Introduction
The rising popularity of AI as a means of improving performance on specific tasks is undeniable. It cannot go unnoticed, carrying as it does a reputation for unlocking seemingly limitless possibilities while threatening a variety of occupations across a range of industries.
Machine Learning in audio processing has a rich history, with applications ranging from speech recognition to music generation. The field continues to advance rapidly, and new plugins with AI capabilities are regularly introduced to the market.
“One cannot fully understand these concepts by adopting a unilateral view of its study; only through the combined efforts of many different disciplines will the concept become attainable” (Miranda, 1995, p.60). Nevertheless, I am interested in a narrower aspect of this evolution: the use of AI's explicit, knowledge-based intelligent systems in sound design for the screen. By an intelligent system, we mean one that collaborates with the user and offers useful degrees of automated reasoning to ease laborious and time-consuming activities (e.g. determining the proper stream of synthesis parameters for each desired individual sound).
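Miranda's example of automated reasoning, choosing synthesis parameters from a description of the desired sound, can be pictured with a toy sketch. Assuming a hypothetical rule set (every descriptor and parameter name below is illustrative, not drawn from Miranda's system), a minimal knowledge-based mapping might look like this in Python:

```python
# Toy illustration of a knowledge-based sound design assistant:
# perceptual descriptors are mapped to synthesis parameters by
# explicit rules, sparing the designer a manual parameter search.
# All descriptor names and parameter ranges here are hypothetical.

RULES = {
    "bright":     {"filter_cutoff_hz": 8000, "harmonics": 12},
    "dark":       {"filter_cutoff_hz": 1200, "harmonics": 4},
    "percussive": {"attack_s": 0.005, "decay_s": 0.2},
    "pad":        {"attack_s": 0.8,   "decay_s": 2.5},
}

def suggest_parameters(descriptors):
    """Merge the parameter suggestions for each requested descriptor."""
    params = {}
    for word in descriptors:
        params.update(RULES.get(word, {}))
    return params

print(suggest_parameters(["dark", "pad"]))
# {'filter_cutoff_hz': 1200, 'harmonics': 4, 'attack_s': 0.8, 'decay_s': 2.5}
```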
“Sound in audiovisual media does not merely complement images. Instead, the two channels together engage audio-vision, a special mode of perception that transforms both seeing and hearing” (Chion, 1994, p.3). This description of sound design for the screen is itself very human, irrespective of the technical nature of the process: the perception of an audio professional is required to instigate the art of sound design. “We know that mental imagery is commonly evoked by auditory phenomena. Mental imagery activates areas of the brain related to memory and perception and evokes an emotional response, affecting engagement and impacting consumer perception and choice. In other words, mental imagery influences the way a stimulus is processed and perceived, and it follows then that we should attempt to understand what mental imagery our sound design evokes in users” (Collins and Johnston, 2023, p.125). It is interesting to observe how this human skill translates into a machine's understanding of the job, in both a technical and an artistic sense. What is the future scope for enhancing this technology's existing performance, in terms of both practical utility and its impact on the experience of the media concerned (film, television, video games and VR)?
By investigating the historical relevance of AI in sound design for the screen, its recent developments, the ethical considerations surrounding its use by sound designers and other audio professionals, and its artistic and technical utility, we can analyze the scope for its future. For this investigation into applying machine learning and AI techniques to creative ends, such as sound design for film, I have chosen to cover the period after the end of the 20th century, as the industry had until then largely followed established practices and traditions. Professionals relied on conventional methods and tools, and there may have been resistance to adopting AI due to unfamiliarity, scepticism about its effectiveness, or the limited processing capabilities of professional devices before the early 2000s.
Theoretical and Historical Analysis
Machine learning has a long history, and the convergence of AI and audio processing technologies has driven the evolution of its use in screen sound.
In the late 1990s, researchers led by Professor Barry Vercoe explored how machine learning algorithms could be used to create, interpret, and modify audio signals. The research remains significant because it sheds light on machine learning's ability to understand the nuances of audio signals and alter them meaningfully. This groundbreaking work served as a basis for later advances in AI-driven audio processing, inspiring scientists, engineers, and industry professionals to push the boundaries of what was achievable in sound design and synthesis, as seen for example in AIVA (Artificial Intelligence Virtual Artist) and Google Magenta.
In the early 2000s, processing power increased, opening the door to more sophisticated and real-time sound design applications. The post-production process in the film industry was streamlined by third-party solutions such as iZotope's RX series, which included tools for audio restoration and repair. While not exclusively AI-based at the time, RX employed sophisticated algorithms for noise reduction, spectral repair, and audio enhancement, designed to intelligently analyze and process audio signals and achieve high-quality results in restoration and editing tasks. A similar development was the expansion of virtual Foley and sound libraries that used rule-based and categorization systems to generate realistic sound effects. Audio processing tools like these, with their interactive and adaptive capabilities, paved the way for more complex innovations in audio middleware and processing tools.
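The actual algorithms in tools like RX are proprietary, but the core idea behind this kind of noise reduction, estimating a noise floor and attenuating time-frequency bins that fall below it, can be sketched in a few lines. The STFT settings and threshold multiplier below are arbitrary assumptions, not iZotope's values:

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_gate(audio, sr, noise_clip, reduction=0.9):
    """Crude spectral gating: estimate a noise floor from a noise-only
    clip, then attenuate any time-frequency bin that falls below it.
    All thresholds here are illustrative, not what commercial tools use."""
    # Average magnitude spectrum of the noise-only clip = noise floor.
    _, _, noise_spec = stft(noise_clip, fs=sr, nperseg=1024)
    noise_floor = np.mean(np.abs(noise_spec), axis=1, keepdims=True)

    f, t, spec = stft(audio, fs=sr, nperseg=1024)
    mask = np.abs(spec) > noise_floor * 1.5      # keep bins above the floor
    gated = spec * (mask + (1 - mask) * (1 - reduction))
    _, cleaned = istft(gated, fs=sr, nperseg=1024)
    return cleaned
```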
The inclusion of AI-powered features in DAWs, such as Avid's Pro Tools incorporating machine learning algorithms for intelligent pitch correction, streamlined tasks like vocal processing, sound effects generation, dialogue enhancement, and even the creation of musical scores. Audio Design Desk, an AI-powered DAW, delivered new features such as an AI-driven sample browser and automatic video synchronization, letting the designer audition multiple sample options against the required region. Google Magenta Studio's integration with Ableton Live allows musicians to incorporate AI-assisted music creation directly into their digital audio workstations, offering plugins for generating melodies, harmonies, and rhythms as a creative tool for artists. All of this demonstrates the trend of incorporating AI into DAWs to expedite sound creation and scoring for video and film production.
We can establish that AI's path in screen sound design has been one of consistent expansion: from the pioneering groundwork laid by researchers to the widespread use of AI-powered tools, and from early neural network research at the MIT Media Lab in the late 1990s to the integration of powerful machine learning algorithms into digital audio workstations in the 2000s. The history of AI in sound design exemplifies a relentless drive for innovation. This convergence of AI and audio processing expedited old procedures, saving time in the editing phase and enabling quicker decisions by reducing the need for iterative passes, which in turn opened up new creative opportunities.
For a better understanding of the real industry examples cited later in this document, I have categorized these new developments as follows (a brief code sketch of the first category appears after the list):

- Automated audio processing: AI programs that automate traditional audio processing operations such as mixing, mastering, spatial audio processing, and dynamic processing.
- AI-powered audio exploration: algorithms that analyze large collections of sounds, detect patterns, and create innovative textures, tones, and effects that human composers may not have intuitively arrived at; a frontier where technology and creativity meet to push the limits of what is sonically possible.
- AI-driven audio restoration: algorithms that analyze and manipulate audio data to address challenges such as noise reduction, equalization, and overall improvement in fidelity.
- Real-time processing: systems that process information, make decisions, and interact with users or environments in near-instantaneous time frames.
- Machine perception: tasks such as object recognition, speech recognition, and content analysis.
- AI-driven signal processing: techniques that provide unique ways to modify, synthesize, and adapt audio material, resulting in a more immersive and dynamic cinematic or multimedia experience.
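As a concrete, if deliberately simplified, instance of the first category, the sketch below automates one mixing decision: gain staging to a target level. The target level and the use of plain RMS are my assumptions; commercial AI mixing tools weigh far more factors.

```python
import numpy as np

def normalize_rms(audio, target_dbfs=-20.0):
    """Automated gain staging: measure the RMS level of a signal and
    apply the gain needed to hit a target level. A real AI mixing tool
    does far more; this only shows the automated-decision idea."""
    rms = np.sqrt(np.mean(audio ** 2))
    current_dbfs = 20 * np.log10(max(rms, 1e-12))  # guard against silence
    gain = 10 ** ((target_dbfs - current_dbfs) / 20)
    return audio * gain
```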
Recent Case Studies
Two fascinating examples that successfully incorporated the aforementioned technology into the audio post-production process are The Irishman (2019) and Prospect (2018). The following is a detailed analysis of the techniques used and how they contributed to each film's vision.
In Martin Scorsese's The Irishman (2019), the incorporation of artificial intelligence proved vital, stretching beyond its traditional application in visual effects into the world of audio engineering. This comprehensive application of AI, particularly in dialogue editing and restoration, demonstrated its ability to address the complex issues raised by large-scale visual alterations while maintaining auditory coherence.
A noteworthy issue in the film's production was the extensive use of visual effects to de-age the actors. While it fulfilled its aesthetic goals, this transformative visual technique carried the risk of inconsistency with the original audio recordings. To address this possible dissonance, AI methods were used to synchronize and improve the audio components and make the older actors sound younger where the story required it. The AI-driven dialogue editing most likely relied on advanced algorithms capable of detecting and correcting differences in the tonal and emotive characteristics of the actors' voices across different ages. This strategic integration aimed to foster a seamless auditory experience, mitigating any perceptible discordance resulting from the visual alterations.
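None of the production's actual tooling is public, so purely as an illustration of the signal-processing side of such voice adjustment, the sketch below applies a small upward pitch shift with librosa's off-the-shelf shifter. The file name and the two-semitone value are hypothetical; a real de-aging pipeline would involve far subtler, likely learned, transformations.

```python
import librosa
import soundfile as sf

# Load a dialogue take and shift it up two semitones, a crude
# stand-in for the subtle voice transformations a production
# like The Irishman would actually have needed.
audio, sr = librosa.load("dialogue_take.wav", sr=None)  # hypothetical file
younger = librosa.effects.pitch_shift(audio, sr=sr, n_steps=2.0)
sf.write("dialogue_take_shifted.wav", younger, sr)
```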
Furthermore, the use of AI in audio restoration on The Irishman demonstrates a sophisticated application of the technology. Here, AI systems might perform tasks like noise reduction, equalization, and overall audio improvement. These techniques, based on machine learning concepts, sought to mitigate any inadvertent degradation induced during the complex post-production procedures. The overall goal was to elevate the auditory quality and fidelity of the film's soundtrack.
Another example is the science fiction film Prospect (2018), a thoroughly immersive auditory experience. The film's sound team collaborated with Mach1, a spatial audio technology company, to augment and shape the spatial soundtrack using AI algorithms.
The film's dynamic and immersive soundtrack smoothly adapts to its ever-changing locales, ranging from the peaceful vastness of an alien planet's surface to the tense, confined interiors of a spacecraft. Traditional sound design methods would have struggled to achieve the necessary versatility and precision.
Mach1's spatial audio technology, powered by AI, allowed for the construction of a three-dimensional audio experience that responded in real time to the viewer's perspective and on-screen action. The AI algorithms examined visual cues and ambient aspects before adjusting the spatial properties of the audio to match the developing narrative and the viewer's point of view. This not only increased suspense and realism but also produced a more intimate link between the audience and the story's unfolding events.
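Mach1's own decoding is defined by its spatial format and SDK, which I am not reproducing here. Purely as a hedged illustration of the orientation-responsive idea, this sketch crossfades front and rear stems by the listener's head yaw, using a simple equal-power law of my own choosing:

```python
import numpy as np

def orientation_mix(front, rear, yaw_radians):
    """Blend front and rear audio stems according to the listener's
    head yaw: at yaw=0 the front stem dominates, and facing backwards
    (yaw=pi) the rear stem dominates. Equal-power crossfade, so
    front_gain**2 + rear_gain**2 == 1 and loudness stays constant."""
    front_gain = np.cos(np.abs(yaw_radians) / 2)
    rear_gain = np.sin(np.abs(yaw_radians) / 2)
    return front_gain * front + rear_gain * rear
```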
The AI-driven spatial audio in Prospect (2018) showed how technology can intelligently enrich a film's aural landscape, delivering levels of immersion and adaptability that traditional sound design methods struggle to reach. This example demonstrates AI's transformative impact on sound design, contributing to a more engaging and dynamic cinematic experience.
The collaboration between film sound teams and AI companies demonstrates the industry's shared commitment to realizing the revolutionary potential of artificial intelligence. This strategic relationship recognizes AI not merely as a technological tool but as a driving force in defining the future of filmmaking.
This continued cooperation demonstrates a shared desire to investigate the full range of AI's possibilities for improving spatial audio and overall cinematic soundscapes. By integrating AI, the film industry expects not only increased productivity but also a reimagining of storytelling possibilities and audience engagement. The combination of AI and traditional filmmaking techniques represents a new era in which creativity is enhanced, production procedures are expedited, and audiences feel a greater sense of immersion and connection with cinematic storylines. As these collaborations progress, they have the potential to uncover further innovations, solidifying AI's role as a catalyst for revolutionary advances in the art and technology of filmmaking.
Personal Experience
I recently used Supertone's GOYO Beta to isolate dialogue throughout a short film. The entire sequence consisted of a couple talking about their failed relationship in a restaurant. Ideally, I wanted to isolate the dialogue so that I could manipulate or mix it independently; adding the other components separately, such as ambience, sound effects and Foley, would raise the quality of the overall film. To isolate the dialogue in such a case I would normally start by noise-profiling all the dialogue between the two characters and applying a high-pass filter to attenuate low-frequency noise. I would also have needed a room-tone recording to fill any gaps between lines, and spectral editing tools to reduce or eliminate intrusive frequencies while preserving the dialogue. After manually cutting or attenuating the clicks and pops, I would have arrived at the desired result: pre-processed, isolated dialogue with as little background noise as possible.
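For comparison, the first steps of that manual chain, a high-pass filter followed by a noise-profile-based downward expander, might be sketched as below. The cutoff, threshold multiplier and duck amount are assumptions; a real session involves far more judgment.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def highpass(audio, sr, cutoff_hz=80.0):
    """Attenuate low-frequency rumble below the dialogue band."""
    sos = butter(4, cutoff_hz, btype="highpass", fs=sr, output="sos")
    return sosfilt(sos, audio)

def duck_below_noise_floor(audio, noise_profile, window=1024):
    """Crude noise gate: duck any window quieter than the floor
    measured from a noise-only clip, approximating the manual
    clip-by-clip attenuation described above."""
    floor = np.sqrt(np.mean(noise_profile ** 2))
    out = audio.copy()
    for start in range(0, len(audio) - window, window):
        seg = audio[start:start + window]
        if np.sqrt(np.mean(seg ** 2)) < floor * 1.2:  # threshold: assumption
            out[start:start + window] *= 0.1          # roughly -20 dB duck
    return out
```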
With GOYO Beta, by contrast, it was just a matter of dragging and dropping the plug-in. It efficiently analyzed the characteristics of the given input and, after processing it through its algorithm, separated the room reverb, dialogue and ambience (background noise in our case). The interface provided three knobs controlling the levels of room reverb, dialogue and ambience.
Just like that, I had accomplished the task in a tiny fraction of the time the manual process usually takes.
Ethical Considerations
The widespread use of AI technologies in sound creation may generate concerns about job displacement among human sound designers and professionals, so it is vital to strike a balance between the efficiency gains brought by AI and the maintenance of employment. Upskilling and reskilling programs can help professionals adapt to changing conditions.
AI algorithms used in sound design may unintentionally perpetuate biases present in their training data, leading to skewed audio processing results; however, careful curation of training data, regular audits of algorithms for bias, and the implementation of fairness and transparency standards can help address these concerns. Promoting transparency and explainability in AI systems also builds trust: designing systems that give insight into how decisions are made allows people to better understand and assess the technology.
There is disagreement about whether AI-powered technologies will replace or restrict human creativity in sound creation. The emphasis on AI as a tool for cooperation and enhancement, rather than a replacement for human innovation, might help alleviate worries. Encouraging creative workers to actively design and lead AI outcomes can foster healthy teamwork.
It can be difficult to determine ownership and rights to AI-generated work, which might lead to intellectual property issues. Clear contractual agreements and legal frameworks must be developed to specify ownership and rights in AI-generated audio output.
The use of AI technologies may inadvertently exclude people or groups who lack access to, or knowledge of, the technology. Promoting inclusion through education and training programs, and making AI tools more user-friendly, can help solve these accessibility challenges.
Conclusion and Potential Future of AI in Sound Design
The study suggests an optimistic future characterized by collaboration between sound designers and AI technologies, underlining the possibility of a synergistic interaction. Rather than viewing AI as a threat to conventional creative positions, the focus is on using the distinct strengths of both human experts and intelligent systems.
This collaborative approach transcends disciplines, picturing a future in which professionals in computer science, psychology, and audio engineering work together to create AI systems that combine technical accuracy with a grasp of human perception in sound design. The fusion of multiple knowledge bases is expected to result in complete sound design techniques that handle both the technical complexities of audio processing and the nuances of human auditory perception.
In this interdisciplinary landscape, the research suggests that academia will play a crucial role, with experts from various domains collaborating to create AI systems that are not only technically proficient but also attuned to the nuanced demands of human-centric sound design. This collaborative effort is expected to result in advanced AI tools that are more adept at capturing the complexities of human hearing and delivering tailored solutions for creative endeavours in audio production.
Furthermore, the future outlook includes significant advancements in neural audio synthesis. The research anticipates that AI systems will refine their ability to generate highly realistic and detailed audio, going beyond mere replication to simulate a wide range of acoustic environments and musical instruments. This progress hints at the potential for creating truly immersive and authentic soundscapes, enhancing the overall quality of audio content in various media.
As technology advances, neural audio synthesis is poised to reach new heights, providing sound designers with a suite of strong tools to push the boundaries of creativity and produce exceptional aural experiences. The collaboration of human specialists with AI technologies is projected to reshape the landscape of sound design, opening up new avenues for innovation and artistic expression.
Finally, AI will help make media content more accessible and inclusive for people with diverse abilities.
Reference list
Brandt, C. and Zaillian, S., 2019. The Irishman [online]. IMDb. Available from: https://www.imdb.com/title/tt1302006/.
Caldwell, C. and Earl, Z., 2018. Prospect [online]. IMDb. Available from: https://www.imdb.com/title/tt7946422/.
Collins, K. and Johnston, H., 2023. A Free Verbalization Method of Evaluating Sound Design: the Effectiveness of Artificially Intelligent Natural Language Processing Methods and Tools. A Free Verbalization Method of Evaluating Sound Design, 1 (1).
Rocchesso, D., 2003. Introduction to Sound Processing. Firenze: Mondo Estremo.
Miranda, E. R., 1995. An Artificial Intelligence Approach to Sound Design. Computer Music Journal, 19 (2), 59.
Mishra, A., 2023. Understanding Google Magenta: An Overview of Google’s Open-Source Music and Art Project [online]. Medium. Available from: https://medium.com/@abhishekmishra13k/understanding-google-magenta-an-overview-of-googles-open-source-music-and-art-project-48ea9ee80024.
Chion, M., 2019. Audio-Vision: Sound on Screen. Second edition. Translated by C. Gorbman, foreword by W. Murch [online]. New York: Columbia University Press. Available from: https://cup.columbia.edu/book/audio-vision-sound-on-screen/9780231185899 [Accessed 22 Jan 2024].