Language, translation and AI
Jørgen Christian Wind Nielsen
Master of Arts - MA at University of Roskilde
A day about AI and language and communication, using audiovisual translation and more.
Themes
The conference offered knowledge, insights, debate & networking – key focal points were:
Language & technology in practice and in theory.
AV translation, translation and technology: end user quality & research.
Future role of human intelligence: creators, editors or proofreaders?
Ownership of AI content.
Culture is language and language is culture: how do Tech Giants handle that?
Opening
Jan Pedersen, moderator. "AI is the saviour, or it will destroy us all. Future of leisure or destruction?" Thank you to organizers.
Lone Garde: Promote cultural diversity and competitiveness: Creative Europe, Desk Denmark. Special funding for translation and subtitling. Balance culture and technology.
Amalie Foss: Welcome to all organizers, audience online and audience present. How does technology affect language, positively or negatively? "Work with quality and refuse not to work with quality". Thank you to sponsors.
"Speak the same language in terms of agreement on where we want to go with our culture and language."
Anders Søgaard:
Language and its relation to technology. Make information accessible to as many people as possible and in as many formats as possible. Large Language Models: Under the hood. A mathematical function. Which language models are good? Find the right functions. Estimate the values of the model's parameters. Trained on input and output, which is all language. Neural networks. The business of estimating a model that takes you from one language to another. Trained on continuation of text. Download text from the internet and start training. Fiddling around with the parameters. Getting a copy of the internet in return. Memorizing all the words in Umberto Eco's library. Not possible for a human being. It's easier if you know something about the language, and know the languages. Having a good model would be smart. Lists of concepts, strings of text, points in a coordinate system. Clouds of concepts as points over the head. Computer vision cloud. Models trained on language data and models trained on image data arrive at a similar understanding.
Amazing democratic potential if you don't speak a language.
Concerns: Copyright, mainstream bias (quality), evaluation. Example of mainstream bias: asked "Who won the World Cup?", the model assumes that I meant men's football.
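The idea of a model "trained on continuation of text" can be illustrated with a toy example. This is only a minimal sketch, assuming a simple word-pair (bigram) counter rather than the neural networks discussed in the talk; the function names and the tiny corpus are invented for illustration. It also shows where mainstream bias comes from: the model always returns the most frequent continuation it has seen.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return followers


def predict_next(model, word):
    """Return the most frequent continuation seen in training, or None."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical training text; a real model is trained on vastly more data.
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept ."
model = train_bigram_model(corpus)

print(predict_next(model, "the"))  # the majority continuation wins
print(predict_next(model, "dog"))  # a word never seen in training
```

A word that never appeared in the training text has no continuation at all, which is one way to picture why small languages and rare usages are served less well.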
Limecraft, Belgium. Maarten Verwaest online: Language technology in practice. Using AI - What Works and What Doesn't (Yet?)? Presentation of the company. AI transcription. AI subtitling. Translation and adjustment. Giving a note on live subtitling. AI incomplete, inaccurate, not usable. Giving contact details. Anyone welcome. Sharing slides in pdf afterwards.
Peter Bjerre Rosa, DR, Danish Public Service Broadcaster: Guidelines for subtitling. Cutting edge technology seen from inside DR. Three takeaways: Bottom-up approach to AI. AI can handle anything ... decently. Just not very well. Try it! Don't be discouraged by the autopilot. Workshops with robots. Custom GPT and Whisper and Speechmatics. Not scientific research. List of 10 disciplines. "Translate an episode of a famous sitcom based on the English subtitles": black/back wordplay. Good at analyzing, bad at being creative. Proofreading. Proofread a text with intentional errors. It found 17 out of 20 errors. Missed a simple typo. Can we trust it? Live subtitling of live interview on TV. Doing it in real time affects quality. Traditional subtitles are presented as a sentence, AI subtitles word by word. Some tasks worth investigating. Report available. Cherry-picking promising tools. Custom-made apps. 2024 Olympic Games: From internship to real job, ideal case, massive event. Robot got one job. The alternative was no subtitling. 10,000 athlete names. Noisy environment. Very specific sports terminology. The computer knows when it is on shaky ground. "It went as we expected". No speaker diarization: Who says what. Great tool, in the hands of a professional. A hybrid model would perhaps work best. Fake news, credibility: A pilot in the cockpit is recommended. Good feedback from users.
Peter Juel Henrichsen, Danish Language Council: The Central Word Register and its use in language technology. How do we work to enhance Danish language technology? Dictionaries and text are language resources, not images. These resources are expensive and hard to get. Everything on the web is obsolete and too little. Examples of translation errors from Danish. Lack of context may be a problem. Pauses may be a problem. Understanding sentiment may be a problem. Chatbots lack understanding of the situation. Funny examples were given. Lemmas and grammatical forms. Want to make the register freely accessible and to include all Danish dictionaries. Indexation of the existing dictionary and the central register. Online alert of changes in Danish writing rules. The Danish Central Word Register may connect the various resources. Danish Giga Word Corpus will have an index to the central register. How do we link back to Google Translate, and others? Danish companies start using the register references, which will eventually update the machine translation programs.
Companies often worry about their brand and tone of voice, but apparently not when it comes to interactive chatbots, I add.
Maarit Koponen, University of Eastern Finland: Research in AV Translation, Translation and Technology. Technology in translation workflows. Edit and correct machine translation output. ISO standard 18587:2017. Is machine translation related to AI? 2004-2010: Machine translation output was awful, now much better. Post-editing. Discussion of machine translation workflow. AV translation is different. Subtitle translation workflow, the audio aspect. Worked with Finnish National Broadcaster. MeMAD project, Methods for Managing Audiovisual Data. Tested automated speech recognition. Productivity is the main reason for using the technology. Time needed to translate a given amount of text. Amount of text translated within a given amount of time. Increased productivity as a rationale for using the technology. @memadproject / memad.eu. Post-editing: Cognitive effort is going up. Post-editing can be slower than "from scratch" translation (Terribile 2023). Depends also on language pairs. Average productivity values can be misleading. Automatic reduction of counting is unfair. Impact on users (Koponen et al. 2020a).
Language in the human-machine translation. A more human machine translation. Lilt Translation System. Language models can give ideas. Technology can aid and augment translators' work also in the AV field - but usefulness is not guaranteed. Member of FIT translation technology standing committee. Quote from FIT 2022. The agency of the professionals to decide which tools to use.
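The point that average productivity values can be misleading, and that results depend on the language pair, can be made concrete with a small calculation. This is a sketch only: every number below is invented for illustration and none of it comes from the talk or the cited studies; the pair names and the `relative_gain` helper are hypothetical.

```python
from statistics import mean

# Hypothetical throughput in words per hour; invented for illustration only.
throughput = {
    "pair_a": {"from_scratch": 400, "post_editing": 360},
    "pair_b": {"from_scratch": 400, "post_editing": 520},
    "pair_c": {"from_scratch": 400, "post_editing": 560},
}


def relative_gain(pair):
    """Relative change in throughput when post-editing instead of translating from scratch."""
    return (pair["post_editing"] - pair["from_scratch"]) / pair["from_scratch"]


gains = {name: relative_gain(values) for name, values in throughput.items()}
average_gain = mean(gains.values())
slower_pairs = [name for name, gain in gains.items() if gain < 0]

print(f"average gain: {average_gain:.0%}")
print("pairs where post-editing is slower:", slower_pairs)
```

With these made-up figures the average suggests a clear gain, while one language pair is actually slower with post-editing. A rate set from the average alone would be unfair to the translators working in that pair.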
Dr. Ana Guerberof-Arenas, University of Groningen: Quality in AV Translation from the point of view of the end user: Basic creativity concepts and instruments. Creativity and automation in literary and AV translation. Reception studies. Creative process and technology: INCREC Project. Funded by Horizon program. Definitions of creativity. Changes over time. Photography was not considered an art in the beginning. Can technology be creative? Poetry translation. Maybe a little flat, but can give access. Human creativity supported by machines. Robot creativity supported by humans?
Not all creativity is good. Malevolent creativity. Not always beneficial. Atom bomb.
Multiple angles to study creativity. Why, who, what, how, where or when? Circumstances, how do they affect your creativity. The 4 P framework. Now upgraded to 7. Can technology kill or help translators' creativity? "Creativity is born out of adversity". "Units of creative potential".
Benjamin Boe Rasmussen, the Danish Actors Guild: The Human Role in the Age of AI Driven Media. Challenges: Revolution and revelation, it comes with a price. Case examples. A voice taken and cloned from various jobs. The vocal identity had been stolen. Illegal use of voice. A voice bank selling cloned voices? Cannot make money from the voice anymore. Vocal brand. Fight for copyright of voices and pictures. Work with Copyright Alliance in Denmark. The voices are on the internet. Tech companies can harvest them unlawfully. "Our voice is our work". The tech companies have created the infrastructures for our societies. We need a change of attitudes.
Academics have a similar contractual problem. Translators have the same problem.
Christian Dølpher, Legal Consultant, Danish Union of Journalists: Who Owns AI Content? Member of the Copyright and AI Working Group of the Danish Ministry of Culture. Discussing issues related to copyright. Originality is required for a work to be copyright protected. Copyright belongs only to people, not to machines. Referring to copyright law. Copyright is challenged by AI. Same legal basis as always. The Text and Data Mining (TDM) regulations were not created for the purpose of training AI. Uncertain reach of the TDM rules in the EU copyright directive and the EU AI Act.
What to do: Don't transfer your contractual rights. Opt out of AI/TDM if possible. Influence lawmakers! Opt out of the job? Do it and protest in writing! Make some noise!
Tesh Sidi, Tech engineer specialised in Big Data, member of the Spanish Parliament for SUMAR: Culture is Language and Language is Culture - How in the world can tech giants handle that?
The political voice. Talks about AI in the Spanish Parliament. Algorithms form our perception of the world, like babies believe that the images that they see are the real world. Important how we use the data. How can society benefit? Beginning was commitment to human rights, feminism, etc. We learn in the environment where we are. Supervision of data is important. Data may be used for other purposes. Who owns the data? Who chooses how to train the data? We cannot build data without bias, but we can train them better. Ethical principles. Remember the powers behind these companies. We believe we are free and can share. The idea of individual freedom. The idea of freedom and transparency is just an idea. Everyone in social media believes they are activists and that their views are important. The companies don't need our bodies, they need our minds. Information regimes. We live in an information regime. The market doesn't regulate itself. The companies can't regulate themselves. We need a public alternative. Algorithms start regulating the productivity of the employees. A lot of chatbots and apps are being introduced. The use of data will affect still more areas of society. Started an agency to bring civil society together to control data. Each region in Spain owns the data of the citizens of the region. Data belongs to the citizens. Hate speech is a serious problem. Digital platforms can increase prices without explaining. Use data to make better legislation. AI and data serving society. Data is a public good! Open future! AI-colonization. Is AI going to improve our jobs? Is AI going to reduce the time spent on our jobs?
We need to talk about downsizing instead of productivity. People escape from the heat in Madrid and go North, and citizens in the North start spreading fake news about the dangers in the North, like sharks in the sea, to make tourists go away. Environmental costs! You need to regulate. Everything else is regulated. Regulate the use of algorithms. Problems in the physical world can be multiplied on the internet. Powerful tech people make subjective decisions about what is allowed and what is hate speech.
Closing accounts is solving small problems, not giving solutions.
Conclusions and Recommendations
Can we make a statement on the basis of this meeting? Hire a human! Human in the loop! Why do we have to have the machines do what we do? Why are we in the situation that we have to do the cleaning up? What do the professionals need? Human work with technology in the loop! Why do we centralize technology development? Why don't computers correct the human translation? Shift money to other actors! The Augmented Translator Manifesto from AVTE.
Cost reduction is important to companies, translating into various languages and maintaining quality. Access to smaller languages comes with a cost, but it is more democratic. Justified use of technology.
Summarizing challenges:
Copyright issue, for training and for ownership.
Quality: It's not good enough
Creativity
Mainstream bias
Environmental impact
Lack of regulation
It is a good tool, a great tool
Human in the loop, putting the human in the driver's seat. The augmented translator
Increase quality through variety of tools
Human centered translation
Workflows, fragmentation
Philosophy and humanities
Photos: EsF / KS Language & AI - 12/9-2024 https://photos.app.goo.gl/6dyFzZueh8SmS4AR6
Attending the conference was FREE, thanks to the support of:
Dansk Journalistforbund - Medier & Kommunikation, Creative Europe MEDIA Desk Denmark, Nordisk Film & TV Fond, Subline, Svensk Medietext, Norsk audiovisuell oversetterforening, Forum for Billedmedieoversættere, AudioVisual Translators Europe