Prospects for a lie-detecting algorithm to apply to online news
In the world of mathematical propositions, some are provable (true) and some are disprovable (false). But, troublingly, a vast number are neither provable nor disprovable. It isn't just that no one has yet had the patience or wisdom to resolve these limbo-land propositions; within any consistent formal system rich enough to express arithmetic, it is impossible to prove or disprove them. That, itself, is a proposition that has been proven: Gödel's incompleteness theorem.[1]
The incompleteness of demonstrable truth and falsehood is an inauspicious start to the task of devising a way to find and destroy fake news. I mean news that really is fake and not just inconvenient to someone with a bullhorn. Nevertheless, we proceed because we must. We may not be able to get to the bottom of everything, but we may still be able to call out some howlers and reaffirm some useful, valid messages.
Where do you turn when humans can’t be trusted?
To focus our efforts, let’s look at the type of news that is most contentious: news about politics. This is one area where government regulation doesn’t sit well. We’re happy to have the government regulate product safety or pollution guidelines, but we are alarmed by the idea that the government would decide what people can say about the government itself. We can all see the regulator has a conflict of interest.
When you get right down to it, no one is really neutral when expressing an opinion about the government of the day. This is part of the attraction of getting a computer algorithm to help, if it can be done. There is some reason for hope.
Why I think computers could do this
One example is the problem of redistricting: drawing the boundaries of electorates. The history of redistricting skulduggery is long and rich. From rotten boroughs in England to gerrymandering in Massachusetts, many clever ruses have been concocted to allow minorities of voters to achieve parliamentary and congressional majorities. In those cases, the problem was that the strategic decisions about boundary design were made by the same politicians who stood to gain by them. Fortunately, a mathematical solution to this conflict-of-interest problem is already available, if there is a will to use it. Scientifically minded people have recognised for some time that fair redistricting could be done by a computer program. For any geographic area there is an essentially unique way of drawing electoral boundaries with an equal number of voters in each district, provided the district shapes are kept as convex as possible (more like balloons or polygons, less like salamanders or writhing snakes). That is a math problem that can be solved without even asking which party the people in each district are likely to support. It is objective. The human element, and the conflict of interest, can be removed by computer.
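To give a flavour of what such a program looks like, here is a simplified sketch in Python (my own toy illustration of the splitting idea, not the specific method referred to above): it recursively slices a set of voter locations into equally sized, compact districts by always cutting along the longer axis of the area being split. All names and data are invented for the example.

```python
# A minimal sketch of automated districting: recursively split voter locations
# into districts of (nearly) equal population, always cutting along the longer
# axis of the current bounding box so the resulting districts stay compact.

def split_into_districts(voters, n_districts):
    """voters: list of (x, y) points; returns a list of n_districts voter lists."""
    if n_districts == 1:
        return [voters]
    # Decide how many districts go on each side of the cut.
    left_districts = n_districts // 2
    right_districts = n_districts - left_districts
    # Cut along the longer axis of the bounding box to keep shapes compact.
    xs = [p[0] for p in voters]
    ys = [p[1] for p in voters]
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(voters, key=lambda p: p[axis])
    # Split so each side holds voters in proportion to its district count.
    cut = len(ordered) * left_districts // n_districts
    return (split_into_districts(ordered[:cut], left_districts) +
            split_into_districts(ordered[cut:], right_districts))

if __name__ == "__main__":
    import random
    random.seed(0)
    voters = [(random.random(), random.random()) for _ in range(1000)]
    districts = split_into_districts(voters, 8)
    print([len(d) for d in districts])  # roughly equal populations
```

Notice that nothing in the procedure asks, or even knows, how anyone intends to vote.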
Another example is the method Google uses to find internet sites that provide objective, high-quality information about any topic of interest. The PageRank algorithm cuts through all the noise, chatter and misdirection to find authoritative sources, which is an impressive achievement in the present disinformation age. A key point here is that Google relies on the topology of the web to determine the reliability of information, not on subjective judgements (or paid advertising). It is objective in the sense that the ranking doesn't even look at the information itself.
It’s a bit like the military concept of signal intelligence—understanding an enemy’s command and control structure simply by tracing the way signals propagate within their network in response to a spy-plane incursion. You don’t know what they’re saying, but you do know who they’re saying it to. The chain of messages reveals the chain of command.
Analogously, the link-analysis algorithms in this family (PageRank and its close cousin HITS, which introduced this particular framing) look for and evaluate authorities (mavens) and hubs (connectors or referrers) on a topic. They do this purely by mapping the hyperlinks from one site to another. Authorities are sites that have lots of inbound links. Hubs are sites that have lots of outbound links.
But what's a good hub or authority? A good authority is one that is pointed to by good hubs, and a good hub is one that points to good authorities. Scientific citation indexes rely on the same concept.[2] If this chicken-and-egg problem sounds too hard to crack, mathematics has the answer, or rather two answers. This problem is really no harder than the problem of solving a system of n equations in n variables. In linear algebra, one way of solving this is to invert an n × n matrix. Of course, when n is more than 100 billion pages, that type of computation is out of the question.
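For concreteness, the standard textbook form of the PageRank equations (not spelled out above) makes the system explicit. The score of page p depends on the scores of the pages q that link to it, where d is the damping factor, n the number of pages and L(q) the number of outbound links on page q; one such equation per page gives n equations in n unknowns:

$$
PR(p) = \frac{1-d}{n} + d \sum_{q \to p} \frac{PR(q)}{L(q)}
$$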
However, the other answer from mathematics is recursion: start with an approximation, calculate all the page ranks, then update the approximation and re-calculate all the page ranks. Iterate until the page ranks converge to stable values. Part of Page and Brin's genius was that they found a way to solve this complex system of simultaneous equations using a fairly simple recursive technique that is well within a modern computer's capability and that converges quickly.
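A minimal sketch of that iterative idea in Python (a toy illustration under simplifying assumptions, not Google's production code; the tiny link graph at the bottom is invented):

```python
# Power-iteration sketch of PageRank: start every page with the same score,
# repeatedly redistribute scores along outbound links, and stop when the
# scores no longer change. `links` must list every page as a key.

def pagerank(links, damping=0.85, tol=1e-9, max_iter=100):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}               # initial approximation
    for _ in range(max_iter):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outbound in links.items():
            if not outbound:                          # dangling page: spread evenly
                for q in pages:
                    new_rank[q] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outbound)
                for q in outbound:
                    new_rank[q] += share
        if max(abs(new_rank[p] - rank[p]) for p in pages) < tol:
            return new_rank                           # converged
        rank = new_rank
    return rank

if __name__ == "__main__":
    toy_web = {"hub": ["a", "b", "c"], "a": ["b"], "b": ["a"], "c": []}
    for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

Each pass simply redistributes score along outbound links; for a handful of pages it converges in a few dozen iterations, and the same recursion scales to web-sized graphs when implemented with sparse matrices.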
Given that success, it does not seem too much of a stretch to think that a similar type of algorithm can help us to find truthful news, or at least to evaluate the reliability of information from any particular source. A reliable news outlet is really little different from a good authority on the web. A reliable media guide is really little different from a good hub.
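To make that analogy concrete, here is a purely hypothetical sketch: a hubs-and-authorities style iteration applied to a made-up graph of which news sources cite which. Every outlet name below is a placeholder, and a real system would need far richer signals than raw citations.

```python
# Hypothetical sketch: score news outlets as "authorities" and media guides as
# "hubs" with a HITS-style iteration over an invented citation graph.

def hits(links, iterations=50):
    """links: dict mapping each source to the sources it cites or links to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # A good authority is cited by good hubs...
        auth = {n: sum(hub[s] for s, ts in links.items() if n in ts) for n in nodes}
        # ...and a good hub cites good authorities.
        hub = {n: sum(auth[t] for t in links.get(n, [])) for n in nodes}
        # Normalise so the scores stay bounded.
        auth_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        hub_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {n: v / auth_norm for n, v in auth.items()}
        hub = {n: v / hub_norm for n, v in hub.items()}
    return hub, auth

if __name__ == "__main__":
    citations = {
        "media_guide": ["wire_service", "daily_paper"],
        "daily_paper": ["wire_service"],
        "rumour_blog": ["rumour_blog_2"],
        "rumour_blog_2": ["rumour_blog"],
    }
    hub, auth = hits(citations)
    print(max(auth, key=auth.get))   # the most authoritative outlet
    print(max(hub, key=hub.get))     # the best hub / media guide
```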
For a third example that looks further into the deep space of tech possibility, there may be a variant of blockchain techniques that could help to verify the chain of custody of certain high-value news. This technology is already being used to validate a host of official records, including but not limited to bank transactions. As the unit costs fall, the scope of possible applications of the method widens. Perhaps real news will one day be blockchain-validated back to its original sources, while fake news is red-flagged automatically because its certificates are invalid. In an era when deepfake videos are becoming a problem, this type of solution could come into its own.
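As a rough sketch of the chain-of-custody idea (a plain hash chain, which is a simplification of blockchain-style validation rather than any particular product), each record can store a hash of its predecessor, so that tampering anywhere upstream invalidates every later certificate:

```python
# Minimal hash-chain sketch: every record's hash covers its own content plus
# the hash of the previous record, so altering any earlier entry breaks
# verification of everything downstream.

import hashlib
import json

def add_record(chain, content):
    """Append a record whose hash covers both its content and its predecessor."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"content": content, "prev_hash": prev_hash}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)
    return chain

def verify(chain):
    """Re-derive every hash; return False if any link has been tampered with."""
    prev_hash = "0" * 64
    for record in chain:
        body = {"content": record["content"], "prev_hash": record["prev_hash"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if record["prev_hash"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True

if __name__ == "__main__":
    chain = []
    add_record(chain, "original photo, source X, 2025-01-01")
    add_record(chain, "cropped for publication by outlet Y")
    print(verify(chain))                 # True: custody chain intact
    chain[0]["content"] = "doctored photo"
    print(verify(chain))                 # False: tampering is detectable
```

A real system would add digital signatures and a distributed ledger so that no single party could quietly rewrite the chain, but the detection principle is the same.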
Clearly, there's a long way to go and much to debate concerning the limits of free speech. What I have hoped to show in this brief note is that the many conflicts of interest could plausibly be addressed through intelligent use of computer technology and mathematics. Many of the ideas that would be needed already exist and have been implemented in relevant settings.
As to the objectivity of computers, I don’t pretend that algorithms are value-neutral. It’s just that you can usually see what the assumptions are and what values are embedded in them. You can then make your own choices about whether to accept the advice and how to correct for any inherent biases.
The examples in this article are explained in more detail in my book "Economics even a President could understand", published on Amazon in January 2020.
https://www.amazon.com.au/dp/B084C7RXYH
(See Chapter 3, "Botcoin", on blockchain methods; Chapter 6, "Are politicians indispensable?", on solving the gerrymander problem; and Chapter 7, "Where's Walter?", on how Google's methods can help us find reliable news.)
An audiobook version is also available through Audible and other outlets.
[1] Space does not permit me to do justice to Gödel's work here. However, an excellent explanation for the non-mathematician is available in Douglas Hofstadter's 1979 masterpiece "Gödel, Escher, Bach: An Eternal Golden Braid". It is a long book, but full of diagrams, word puzzles and other thought-provoking stuff (including translations of Lewis Carroll's poem "Jabberwocky" into French and German). Hofstadter is a professor of computer science.
[2] That is no coincidence. The PageRank algorithm had its origins in the analysis of scientific citation networks.