Automatic writing – the rise and rise of AI journalism

(This article is based on an analysis written for Roland Berger Consulting)

If you think disruption of the media business is old news, think again. It’s only just getting started, and the largest threat or opportunity is looming within sight: robot writers.

The Washington Post. The Wall Street Journal. The New York Times. They are, without doubt, the big names in US media history, and, according to Warren Buffett, their future is in doubt if they can’t adapt and find new sources of income. It isn’t a profound prediction: he’s not the first to say it, and it has been said often enough in the last few years. The advent of the internet has been a major threat to the media landscape and business. As Buffett says, alternative digital sources of income are vital to a publishing house’s survival.

Old models and old certainties are no guarantees. Alongside the disruption caused by the internet, past success no longer ensures a favorable future. The credibility of "media brands" is becoming less and less important to the majority of consumers. Alongside the NYT, the WaPo and the WSJ are the London Times, the Financial Times, Le Monde and Japan's Asahi Shimbun, all of which might be able to retain their significance as media brands for a few more years, but even they will have to make huge investments to survive in the face of new, technology-driven media giants. For the 3.9 billion people around the world still living without internet access today, traditional media brands won’t necessarily play a part in their lives once they come online.

 Add to these changing circumstances a new threat. Or rather an opportunity for the media business. It isn’t simply about how the media is delivered – online, print, mobile – but is about how it is generated. How it is written. How it is produced.

According to a recent BBC news report, some ninety percent of all news content will, by 2022, be written by "robots". Digitization and the rapidly increasing amount of data available are making it possible for large parts of current reporting to be created by a computer: the weather, football and stock markets have been the first areas in which "natural language generation" programs have been able to deliver good, readable stories. Today, a computer at the Norwegian NTB news agency even writes a large proportion of its election reporting. That said, editor-in-chief Mads Yngve Storvik stresses that he can’t see a robot being able to conduct an interview any time soon. At US news agency Associated Press, a computer produces 10,000 economic and baseball reports every month. Under its new owner, Jeff Bezos, the billionaire founder of Amazon, the Washington Post is rapidly developing a new Content Management System (CMS) which has put automated content generation at its heart from the very start.

Robot journalism will likely lead to thousands of media job losses around the world. That need not mean all is lost for journalists who can see a way through. Investigative stories (such as the Panama Papers-style reports), outstanding portraits and profiles – the kind of content that differentiates a publication from its competitors – could thrive in this new media age. No matter how good, a stock market report will never win any journalism prizes, whether it was written by a robot or a human.

But can the news media become fully automated? The dream might be a day-to-day, high-frequency and personalized news business handled by computers that never need to stop for breaks, but will it always need the human touch? What about the questions of judgment and tone?

While machine learning already works well when it comes to replicating the tone of very specific media (tabloid, serious, B2B), artificial intelligence still finds it difficult to summarize the most relevant messages from documents and data if a human has not previously provided examples to explain what the key findings might be. Until now, software has not had the world knowledge to realize that a share price drop of more than 3% in a day would normally be unusual for a stock market heavyweight like VW, but that it may well occur in conjunction with new revelations surrounding, say, the diesel scandal. Now, however, in such cases, software can look for a suitable quote from analysts on precisely this price trend and can incorporate it perfectly, both in terms of content and language. This would have been unthinkable just a year ago.
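To make the VW example concrete, here is a minimal sketch of how a program might flag an unusual daily fall and attach an analyst quote. It is an illustration only, not the method of any vendor named in this article: the 3% threshold comes from the example above, while the names `DailyMove`, `describe_move` and the `analyst_quotes` lookup are hypothetical.

```python
from dataclasses import dataclass

# Threshold taken from the article's VW example; a real system would tune it per stock.
UNUSUAL_DROP_PCT = 3.0


@dataclass
class DailyMove:
    ticker: str
    previous_close: float
    close: float

    @property
    def change_pct(self) -> float:
        """Percentage change from the previous close to today's close."""
        return (self.close - self.previous_close) / self.previous_close * 100


def describe_move(move: DailyMove, analyst_quotes: dict[str, str]) -> str:
    """Build a short report; add context only when the fall is unusually sharp."""
    change = move.change_pct
    text = f"{move.ticker} closed at {move.close:.2f}, a change of {change:+.1f}%."
    if change <= -UNUSUAL_DROP_PCT:
        text += " This is an unusually sharp fall for a heavyweight stock."
        # Hypothetical lookup: quotes are assumed to be keyed by ticker symbol.
        quote = analyst_quotes.get(move.ticker)
        if quote:
            text += f' One analyst commented: "{quote}"'
    return text


if __name__ == "__main__":
    quotes = {"VW": "The market is reacting to the latest diesel revelations."}
    print(describe_move(DailyMove("VW", 100.0, 96.0), quotes))
```

The hard part the article describes, knowing which quote is actually relevant, is reduced here to a dictionary lookup; that step is where the machine learning would sit.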

The robot journalist is getting ready for the next step in its evolution, but its human colleagues still have to tell it which subjects are worth writing about and what data should be used. That could soon change. It is already relatively simple to use algorithms to work out topics automatically by taking into account the frequency of keywords in internet searches and crosschecking each potential theme against the intensity of discussion of the event on social media. The current topics that are shaping opinions can thus be identified with a great degree of reliability, so now the robot simply has to track down pictures from image databases using the perfect keywords. Fully automated videos could also be created on many such topics in this way.
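The topic selection described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the two input dictionaries (search counts and social media mentions per topic) and the function name `rank_topics` are hypothetical, and multiplying the two normalized signals is just one possible way to express the crosscheck.

```python
def rank_topics(search_counts, social_mentions, top_n=3):
    """Rank topics by combining search frequency with social media intensity."""
    # Normalize each signal to the range 0..1 so neither dominates by scale.
    max_search = max(search_counts.values(), default=0) or 1
    max_social = max(social_mentions.values(), default=0) or 1

    scores = {}
    for topic, count in search_counts.items():
        search_score = count / max_search
        social_score = social_mentions.get(topic, 0) / max_social
        # Multiplying means a topic must be strong on both signals to rank highly.
        scores[topic] = search_score * social_score

    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    # Made-up figures for illustration only.
    searches = {"election": 900, "diesel scandal": 600, "weather": 1000}
    mentions = {"election": 800, "diesel scandal": 1000, "weather": 50}
    print(rank_topics(searches, mentions, top_n=2))
```

In this made-up data the weather is searched for most often but barely discussed, so it drops out of the ranking, which is the point of crosschecking one signal against the other.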

There is another significant reason for developing media content that is produced at high frequency and rapidly updated. Thomas Scialom, a researcher at the French natural language generation start-up Récital, puts his finger on it: "Mobile media use means that there are ever fewer visual aids to help readers understand content, and the amount of information is also limited by the size of the screen. At the same time, the time spent reading content on a mobile is also falling." Shorter information, written in the tone of the target audience, with content personalized for the target group too – that's not something a human writer can produce, but AI journalists can. For example, why should someone have to read a report about all NASDAQ values if they only need a report on trends for Apple shares? Could a graphic provide that information?

Automated text generation can do more: the report on Apple shares can not only summarize historic trends, but also provide rankings – is it only Apple shares that are under pressure today, or is the trend also affecting China's Tencent? Is Amazon on the up and Alibaba plummeting? Does it have something to do with the start of the vacation in the USA or China? Has disappointing economic data been released? The more sources that are used, the better a text will be than a graphic.

 Anglo-German business news agency dpa-afx was one of the first to develop a template solution several years ago: it was simply a case of filling in the gaps in pre-written sentences with new data. The sentences provided today are much more varied and sophisticated, but this basic principle is actually only being replaced very slowly. Hamburg-based computer linguist Dr. Patrick McCrae explains, "The perfect solution for content automation must consequently not only have extensive opportunities for language diversification, but also have the ability to incorporate powerful analysis. Text generation will not progress far beyond the dynamic filling in of gaps in the texts unless artificial intelligence is involved. Truly interesting texts with diverse content will be created if surprising, non-trivial findings can be extracted from the relevant data sources. That's exactly why we need artificial intelligence."
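The basic gap-filling principle can be sketched as follows. This is not the dpa-afx system, whose details are not described here; the template sentences, the `TEMPLATES` table and the function `fill_template` are hypothetical and only show the idea of pre-written sentences with slots for new data.

```python
import random

# Several pre-written sentences per situation give some variety in the output.
TEMPLATES = {
    "up": [
        "{name} shares rose {pct:.1f}% to {price:.2f} euros.",
        "{name} gained {pct:.1f}%, closing at {price:.2f} euros.",
    ],
    "down": [
        "{name} shares fell {pct:.1f}% to {price:.2f} euros.",
        "{name} lost {pct:.1f}%, closing at {price:.2f} euros.",
    ],
    "flat": ["{name} shares were unchanged at {price:.2f} euros."],
}


def fill_template(name, price, change_pct, rng=None):
    """Pick a sentence matching the price direction and fill in the gaps."""
    rng = rng or random.Random()
    if change_pct > 0:
        direction = "up"
    elif change_pct < 0:
        direction = "down"
    else:
        direction = "flat"
    template = rng.choice(TEMPLATES[direction])
    # The sign is carried by the verb in the sentence, so pass the absolute value.
    return template.format(name=name, pct=abs(change_pct), price=price)


if __name__ == "__main__":
    print(fill_template("VW", 150.0, -3.2))
```

A sketch like this can vary its wording, but it cannot notice anything the template author did not anticipate, which is the limitation McCrae describes.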

 Here’s a powerful example: one German digital publisher can generate a view of the monthly employment and training markets for 411 regions across Germany at the touch of a button, focusing on different professional groups or levels of education, if desired. With each analytical pass the software makes discoveries in the mountain of data that an editor would have only spotted by chance, if at all.

However, there is a great challenge in this relatively new world of automation that is causing all the providers, including Google and IBM Watson, to thrash their disks. McCrae sums it up succinctly: "Automated text comprehension without any limitations in terms of subject is a problem that science has yet to solve."

It would be a kind of perpetual media motion if new texts could be created from an incalculable number of free-form texts. It would be the salvation of all news agencies if it were possible to start with one single text created by software and turn it into 120 different versions for 120 newspaper customers, or if one new summary could be written automatically based on dozens of archive articles about Donald Trump. Personal and custom-written replies would be possible in the field of customer communication, without the need to first store hundreds of potential answers to hundreds of potential questions in table form. However, when will software be able to fully understand free-form text, including irony, sarcasm or annotations? Scialom puts the problem quite simply: "The biggest challenge is currently getting machines to understand unstructured data." Once this challenge has been addressed, only a small band of journalists will be necessary, but they will likely be writing highly specialized and unique contributions for which readers around the world will be willing to pay a decent price. And we will just have to wait to see which publications will have survived for them to write in.

Wolfgang Zehrt, Berlin (January 2018)


