Finding Fake News in Nebraska with Hoaxy
Fake news has been around as long as journalism. The Great Moon Hoax of 1835, Yellow Journalism, you name it: lies and misinformation have long been used to sell stories and, most importantly, make money. The internet, despite some stern op-eds, did not invent fake news, and it did not invent the human desire to read it. It simply harnessed an already valuable market.
It did, however, make it much more deliverable to mass audiences.
This is no small feat. From the era of mass media in the post-war world through today, the ability to disseminate a message at larger scales has been the dream. Our information economy thrives on it; we need data, we need shares, we need retweets, we need likes. We need information.
What gets tricky is when we decide we need facts.
There is a difference, for example, between misinformation and disinformation. Misinformation is sharing false information or material without knowing it's false. Disinformation is purposefully sharing false information, or sharing information with the intention to mislead or obscure the truth. Both are actively present in information and network environments, and both can prove equally harmful.
From 2016 on, fake news dissemination on social media networks has come under renewed scrutiny. The election of Donald Trump crystallized the growing problem of foreign influence in elections, as well as the viral, contagion-like spread of memes and fake news stories designed to sway target populations. Despite this renewed focus, fake news continues. Every day.
But we’ve also gotten better at visualizing it.
Enter Hoaxy.
Hoaxy is a web-based tool that visualizes the spread of articles online. It tracks shared links and stories from low-credibility sources and independent fact-checkers, with data going back to 2016. Hoaxy also includes a bot score from a companion tool called Botometer. Botometer checks the activity of a Twitter account and gives it a score from 0 to 5, with 0 being human (or verified human) and 5 being very likely a bot. Various factors help determine whether an account is a bot: its engagement with others, its quotes of others, who retweets or replies to it, and so on. Many bots stand out by appearing overnight, tweeting nothing of their own, and simply retweeting political or incendiary posts. They lack context and nuance, and do not display the human characteristics usually attached to real Twitter accounts.
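Those signals suggest a crude heuristic. To be clear, Botometer's real model is a trained machine-learning classifier over a large set of account features; the sketch below is only a toy illustration of the kinds of signals described above, with made-up thresholds and weights of my own:

```python
# Toy bot-likelihood heuristic. This is NOT Botometer's actual model;
# it only illustrates the signals mentioned in the text (account age,
# pure-retweet behavior, lack of original content and engagement).

def bot_score(account_age_days, original_tweets, retweets, replies_received):
    """Return a rough 0-5 score: 0 looks human, 5 looks very bot-like."""
    score = 0.0
    if account_age_days < 30:                  # appeared "overnight"
        score += 1.5
    total = original_tweets + retweets
    if total > 0 and retweets / total > 0.9:   # almost nothing but retweets
        score += 2.0
    if original_tweets == 0:                   # no content of its own
        score += 1.0
    if replies_received == 0:                  # nobody engages with it
        score += 0.5
    return min(score, 5.0)

# A brand-new account that only retweets scores at the bot end:
print(bot_score(account_age_days=3, original_tweets=0,
                retweets=500, replies_received=0))  # 5.0
```

A real classifier would learn these weights from labeled accounts rather than hard-coding them, but the intuition is the same: bot-like accounts stack up several of these signals at once.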
Hoaxy’s main mode is to chart and map tweet diffusions and their sharing structure. The structure of tweet diffusion is decentralized, i.e., there is no single core from which all other accounts receive their information. As Paul Baran illustrated in his 1964 RAND work on distributed communications, networks take different structures depending on their information source. A broadcast model is centralized: think of television. The Twitter platform itself is centralized, at least for the moment, with individuals staying within the same platform. Distributed describes the internet, with its nodes, servers, routers, and exchange of packets. The decentralized network is Twitter’s users, with many different users independently amplifying and spreading messages that are then picked up by others and retweeted in kind. It models a contagion or a virus. See figure below.
Users, of course, can be and frequently are bots. When we say bots, we don’t mean human androids sitting at a computer retweeting political nonsense. Bots are fundamentally simple to program and deploy on Twitter. Twitter exposes a developer API, and Python libraries such as Tweepy make it easy to access. A bot, i.e., an automated Twitter account programmed to retweet certain things (such as particular sources, hashtags, or other identifiers), can be written in a few dozen lines of Python. Thousands, and potentially millions, of such bots exist on Twitter. As soon as they’re suspended or deleted from the platform, more appear.
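To give a sense of how little code this takes, here is a sketch. The retweet-decision logic is plain Python; the Tweepy wiring in `run_bot` assumes Tweepy v4 with real developer credentials, and the target hashtags are hypothetical examples of my own, not taken from any actual bot:

```python
import re

# Sketch of a minimal retweet bot. should_retweet() is the whole "brain":
# retweet anything matching a target hashtag, regardless of content.

TARGET_HASHTAGS = {"#nebraska", "#politics"}  # hypothetical targets

def should_retweet(text):
    """Return True if the tweet text mentions any target hashtag."""
    tags = set(re.findall(r"#\w+", text.lower()))
    return bool(tags & TARGET_HASHTAGS)

def run_bot(consumer_key, consumer_secret, access_token, access_secret):
    # Requires real credentials; shown for illustration only.
    import tweepy
    auth = tweepy.OAuth1UserHandler(consumer_key, consumer_secret,
                                    access_token, access_secret)
    api = tweepy.API(auth)
    for tweet in tweepy.Cursor(api.search_tweets, q="#nebraska").items(50):
        if should_retweet(tweet.text):
            api.retweet(tweet.id)

print(should_retweet("Breaking: chaos in #Nebraska tonight"))  # True
```

Note what is missing: no judgment of truth, no original content, no context. That absence is exactly what Botometer-style detectors key on.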
A bot retweet can be shared by other bots in its network, leading to an environment with a message entirely amplified by programmed bots. This spread and information diffusion can also reach humans directly, as seen in the 1lucyhannah cluster. There are many blue nodes, indicating likely human interaction with programmed bot tweets. See figure below.
For this project, I wanted to see which articles published online had then been shared independently among Twitter’s users. I searched for “Nebraska” (a term that doesn’t generate much traffic on Hoaxy in general). I picked an article entitled “Watch: Antifa Protesters Harass, Assault Police Supporters in Nebraska,” published on Breitbart on November 23, 2020. It received quite a lot of tweets and retweets.
It’s also been debunked.
In June 2020, at the height of the protests over the murder of George Floyd, a claim circulated that Antifa had posted an ad on Craigslist offering to pay protesters to cause as much chaos and destruction as possible. The ad said they’d pay up to 1,000 people $25 per hour to pose as protesters in Lincoln and Omaha during the BLM protests. Snopes ruled the ad fake on June 11. The white nationalist group Identity Evropa was revealed to be behind it, deliberately misleading the public with fake ads and tweets purporting to come from “Antifa”.
In media studies, this is referred to as black propaganda: propaganda created by the propagandizer but presented as though it came from a source inside the targeted group. White propaganda, by contrast, is openly acknowledged by the propagandizer as coming from its actual source. There is also gray propaganda, whose source is ambiguous or deliberately unattributed. Various white nationalist groups have made a habit of creating black propaganda with the intention of implicating Antifa in criminal or riotous acts. This also occurred after the Capitol attack in January.
Nevertheless, the article was disseminated and shared widely as fact. In this version, Antifa paid people to cause violent disruption and attack police. Retweet patterns reveal likely extensive bot amplification of this claim. Above, you can see the clustered, influential structure of the tweet diffusion. Hubs, such as 1lucyhannah, tweet the article directly; it is then retweeted by the nodes connected by outgoing edges in the graph. Red nodes sit on the bot end of the Botometer spectrum; blue nodes sit toward the human end. Green and yellow are indeterminate, and could go either way.
Another cluster stems from Breitbart’s own tweet of the article. Here you can see one of the potential problems with Hoaxy’s algorithm: Breitbart isn’t a bot, though it’s rated red on the Botometer scale. Distinguishing bot behavior from human behavior that looks just like it is what makes bot mapping in an online environment such a challenge. Think of it as a reverse Turing test. The third cluster, smaller than the first two, belongs to an account called HamiltonStrick1. Clicking on the account shows the equivalent of a 404, meaning it’s quite likely the account was indeed a bot and has since been removed.
When a story is tweeted, its spread via retweets is called a diffusion. Diffusions, and the rapid spread of misinformation and disinformation, can lead to an information cascade, when a story takes off so fast that it cascades down like a waterfall, shared and disseminated quickly across networks, spread by bots to humans and vice versa. This diffusion network takes blooming shapes, as shown in the figure below. The lone two at the bottom are independent tweets (direct from source, not retweets) that got no traction or retweets of their own.
Before long, a fake story is spreading in clusters. This story, however, stayed bound by its echo chamber and did not lead to a viral cascade, as evidenced by the figure to the right. The story had a low reproduction (R) rate. When R is greater than 1, each person who receives the message will, on average, spread it to more than one additional person; each recipient then does the same, leading to exponential growth. This exponential growth is the coveted virality, an epidemic of a story. This story had an R rate far below 1, meaning it started with seed cases and burned itself out before passing the threshold and causing an outbreak.
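The arithmetic behind that threshold is easy to sketch. In a simple branching-process model (my assumption here, not anything Hoaxy computes), where each sharer passes the story to R others on average, the expected number of new sharers in generation g is seeds × R^g, so the cascade either decays or explodes depending on whether R is below or above 1:

```python
# Expected cascade sizes in a simple branching-process model of sharing.
# Each sharer passes the story to R new people on average.

def expected_cascade(R, seeds=5, generations=10):
    """Expected number of new sharers in each generation: seeds * R**g."""
    return [seeds * R**g for g in range(generations + 1)]

subcritical = expected_cascade(R=0.3)   # burns out: each generation shrinks
viral       = expected_cascade(R=2.0)   # exponential growth: each doubles

print(round(sum(subcritical), 2))  # total stays close to the seed count
print(viral[10])                   # generation 10 alone: 5 * 2**10
```

With R = 0.3, the whole cascade totals barely more than its five seeds; with R = 2, the tenth generation alone has over five thousand sharers. That is the entire difference between a story that dies in its echo chamber and a viral outbreak.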
The clusters represent networks that follow each other. The story did not get much viral traction across Twitter as a whole, nor did it break free of its echo chamber and cross into mainstream accounts, or accounts unconnected with the seed and cluster accounts. The people who shared it, from bots to users, likely don’t know (or particularly care) about the truth of the story. What matters is that it supports prior political beliefs. Few larger hubs shared it, meaning fewer nodes picked it up for retweet.
One of the fundamental challenges of combating fake news is response time. It can take hours, and occasionally days, to organize a factual rebuttal of a claim that has gained significant traction online. By the time facts are out, the claim has been spread, and those susceptible to it have seen it. Rebuttals often also lend credence to the claim in the first place, making it seem as though it was potentially factual enough to require a response. For those inclined to believe it, no factual response matters. The story has been told.
Here is a diffusion network of the Snopes fact-checked follow-up about the story:
Far less traction than the initial article.
It remains debated whether fact-checking erroneous articles is always destined to fall on deaf ears. After all, echo chambers with preferential attachment exist across the political spectrum. Those inclined to see or hear what they want won’t be swayed by articles shared by accounts they don’t follow or like. Belief in political values and ideals is often stronger than any factual correction. Research has shown that fake, emotional stories spread far faster and more widely than real ones. As an industry, fake news is closer to fiction publishing: stories designed to capture attention and entertain rather than educate. It appeals to our sense of the world, and plays to its audience and its strengths.
For this article, the narrative was the most important facet. Antifa was deemed secretly responsible for the ills of the BLM protests and property damage: an organized campaign by the Left, the declared enemy of the Right. The story petered out for lack of evidence, having achieved its purpose before disappearing into the online mire, never to be heard from again in a significant way. New stories exist, after all.
Content is closely linked in our online world, but it comes with a price: there is far more of it than at any time before. The barrier to entry for broadcasting a message, any message, to thousands or millions of people has never been lower. Twitter accounts made up entirely of bots can diffuse information to large or scattered populations automatically based on simple Python scripts. But with so much information, and so much overload, the trick isn’t just in amplifying something but separating the signal from the noise.
And there is a lot of noise.
As for Hoaxy as a platform, it works very well as a research project in its beta form. It’s always being refined, and rightfully so. Its bot detection could use some work; its ability to find and study articles could also be better honed. Several other Nebraska articles turned up there, but I’ll save those for future pieces. None of them achieved significant traction either, but they remain interesting examples of attempts.
Hoaxy has many potential applications, and I’d love to see where it goes in the future. It has a wide variety of uses, especially for tracking misinformation and disinformation in the COVID or vaccine era. I used it to monitor and track information campaigns during our vaccine outreach at Nebraska DHHS, to try and find tweets and fake news before they took root. Hoaxy also lets you search through live tweets rather than just articles, and it’s a helpful feature for seeing how much traction something is potentially getting online.
The Holy Grail of social network and media creation remains the ability to predict (and thus engineer) virality for posts. Despite some good attempts, I suspect this ability will remain elusive for the foreseeable future. Like disease epidemics, we’re just never quite sure what will spread past an R rate of 1, though we’re getting great at tracking it once it does. Hoaxy will likely provide more valuable research opportunities for this in the future.