Propaganda Network Analysis
The use of social media platforms to spread rumors and propaganda amid a pandemic constitutes a global digital threat that needs to be recognized and managed.
[Audience: Technical Cyber and Intelligence. Reading time: 7.5 minutes] Ideas and influence travel fast on social media platforms. The ability to influence narrative and perception has arguably become the most leveraged form of power.
During the ongoing COVID pandemic, the Twitter social media platform has been weaponized by some actors to achieve geopolitical goals, which I briefly covered in a previous article. There I discussed the government-linked Twitter accounts @zlj517 and @spokespersonchn, which leverage the Twitter platform to skew pandemic-related conversations around the globe with their one-way megaphones.
In this article, I gather data and apply Social Network Analysis (SNA) techniques to investigate the COVID-19 pandemic propaganda network created by @zlj517 https://twitter.com/zlj517. As of this writing, the same network also carries warnings to the United States to back off from meddling with developments in Hong Kong.
Approach
For SNA, we first create a network (network = nodes + edges), where the nodes represent the people and organizations following @zlj517 (their Twitter handles) and the edges represent the follower relationships between those handles. Briefly, the steps for creating the network are as follows:
Starting from the target handle @zlj517:
- Crawl the followers of the target (first-hop neighbors).
- Crawl the followers of those followers (second-hop neighbors).
- Make each first-hop neighbor a node of the network.
- For each node, find the intersection of the first- and second-hop neighbors and add edges accordingly.
Data is saved in two separate files, one for the first-hop network and one for the second-hop network.
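The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the article: the function name, the input shapes (a list of first-hop handles and a dictionary of each handle's followers), and the toy handles are all hypothetical.

```python
def build_follower_network(first_hop, second_hop):
    """Build an undirected, simple follower network.

    first_hop:  iterable of handles that follow the target (assumed input shape).
    second_hop: dict mapping each first-hop handle to the handles that follow it.
    Returns (nodes, edges); each edge is a frozenset of two handles.
    """
    nodes = set(first_hop)
    edges = set()
    for handle in nodes:
        followers = set(second_hop.get(handle, ()))
        # Keep only followers that are themselves first-hop neighbors.
        for other in followers & nodes:
            if other != handle:  # a simple graph has no self-loops
                edges.add(frozenset((handle, other)))
    return nodes, edges


# Hypothetical toy data, not taken from the real crawl.
first_hop = ["alice", "bob", "carol", "dave"]
second_hop = {
    "alice": {"bob", "zed"},    # "zed" is not a first-hop neighbor, so it is dropped
    "bob": {"alice", "carol"},
    "carol": set(),
    "dave": {"dave"},           # self-follow is ignored
}
nodes, edges = build_follower_network(first_hop, second_hop)
print(len(nodes), len(edges))
```

Storing each edge as a frozenset means that a mutual follow and a one-way follow both collapse into a single undirected edge.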
Method for collecting data
For data collection, I crawled the first-level network of @zlj517 and, separately, its second-hop neighbors. Initial data exploration can be limited to select fields or extended to the maximum available, the 90 fields listed below. For this initial exploration, I used first- and second-level handles and collected 14K rows of the most recent activity data on May 23, 2020. For a detailed analysis, one could attempt to extract every available column for each handle.
Twitter Data Features - Max Possible
user_id, status_id, created_at, screen_name, text, source, display_text_width, reply_to_status_id, reply_to_user_id, reply_to_screen_name,
is_quote, is_retweet, favorite_count, retweet_count, quote_count, reply_count, hashtags, symbols, urls_url, urls_t.co,
urls_expanded_url, media_url, media_t.co, media_expanded_url, media_type, ext_media_url, ext_media_t.co, ext_media_expanded_url, ext_media_type, mentions_user_id,
mentions_screen_name, lang, quoted_status_id, quoted_text, quoted_created_at, quoted_source, quoted_favorite_count, quoted_retweet_count, quoted_user_id, quoted_screen_name,
quoted_name, quoted_followers_count, quoted_friends_count, quoted_statuses_count, quoted_location, quoted_description, quoted_verified, retweet_status_id, retweet_text, retweet_created_at,
retweet_source, retweet_favorite_count, retweet_retweet_count, retweet_user_id, retweet_screen_name, retweet_name, retweet_followers_count, retweet_friends_count, retweet_statuses_count, retweet_location,
retweet_description, retweet_verified, place_url, place_name, place_full_name, place_type, country, country_code, geo_coords, coords_coords,
bbox_coords, status_url, name, location, description, url, protected, followers_count, friends_count, listed_count,
statuses_count, favourites_count, account_created_at, verified, profile_url, profile_expanded_url, account_lang, profile_banner_url, profile_background_url, profile_image_url
Research Hypotheses and Questions
There are two types of networks. Sociocentric analysis covers the whole network and creates one network; egocentric analysis covers personal networks and creates many standalone networks.
In this use case, we can examine the activities of both the person and the organization represented by the persona, so both sociocentric and egocentric analysis can be applied.
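To make the distinction concrete, the sketch below extracts an egocentric network (one account plus its direct neighbors) from a sociocentric one. The adjacency-dictionary representation, the function name, and the toy handles are assumptions made for illustration only.

```python
def ego_network(adj, ego):
    """Return the subgraph induced by `ego` and its direct neighbors.

    adj: dict mapping each node to the set of its neighbors (undirected).
    """
    members = {ego} | set(adj.get(ego, ()))
    # Keep only the ties that run between members of the ego network.
    return {node: set(adj.get(node, ())) & members for node in members}


# Hypothetical whole (sociocentric) network.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
ego = ego_network(adj, "a")
print(sorted(ego))
```

Running the same function once per account of interest yields the "many standalone networks" of the egocentric view, while `adj` itself is the single sociocentric network.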
Security analysts performing qualitative or quantitative analysis could then ask several types of questions in an SNA exercise, such as:
- What does the network look like, and who are the influential users in it?
- What does the community structure look like?
- Can we detect spam or bot activity in the network?
- How is information being diffused through the network?
- Are there any attributed activities?
- Are there signs of ban evasion?
- Can we detect any coordinated activity?
- Are there "fake accounts" or "manufactured account activity"?
- Who are the most influential users engaged with the network's mission, as opposed to those who act as information processing nodes but are not engaged?
- Is the flow of information from the target to his network mediated by a few individuals who act as filters and amplifiers, or is the flow natural?
In this exercise, I extract a small set of network data and explore it to gain insights. This is not meant to be an exhaustive analysis, but it provides enough detail for OSINT practitioners who could employ SNA techniques to achieve Threat Intelligence objectives against the threat of internet-wide propaganda.
@zlj517's Network
The visualization below shows the social network of Lijian Zhao 赵立坚, Chinese Government Spokesman from the Information Department, Foreign Ministry, China, as of the date on which I collected the raw data.
Based on the data crawling strategy described earlier in this article, I crawled @zlj517's two-hop neighbors and built a network using Python code (posted on GitHub). The nodes in the network represent two-hop neighbors of @zlj517, while an edge (u, v) exists between two followers if u and v follow each other or one of them follows the other. Note that the network is undirected and simple and does not contain bidirectional or directed edges. An interesting observation: the network has 16,619 nodes and 16,307 edges, which makes it sparse, with an average degree of 1.96. Surprisingly, the diameter of the network is 28. See visualization.
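The average degree quoted above follows directly from the node and edge counts, since every undirected edge contributes to the degree of two nodes. The short check below is only a worked calculation; the function names are illustrative, and the density figure is an extra measure not reported in the article.

```python
def average_degree(num_nodes, num_edges):
    # Each undirected edge adds 1 to the degree of both of its endpoints.
    return 2 * num_edges / num_nodes


def density(num_nodes, num_edges):
    # Fraction of all possible node pairs that are actually connected.
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


n, m = 16619, 16307  # node and edge counts reported in the article
avg = average_degree(n, m)
dens = density(n, m)
print(round(avg, 2))  # 1.96
print(dens)
```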
Before diving deeper into network analysis, it is necessary to understand the network and make sense of it statistically. To that end, we run basic statistical algorithms on the network and gather a few metrics.
Looking at these statistics, we see that the network is sparse and, surprisingly, has a diameter of 28. The diameter of a network is the length of the shortest path between its two most distant nodes. In the context of social networks especially, this observation clearly contradicts the idea of six degrees of separation.
A second interesting observation is that the network does not follow a power-law degree distribution, which contradicts a well-known and heavily studied property of social networks. It is very surprising that the average degree is ~2, which means that each user has on average about 2 follower-followee connections in this network. This is rather strange! Aren't there groups of people who know each other and also follow the same user?
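One rough way to inspect the degree distribution is to count how many nodes have each degree and fit a straight line on log-log axes. The sketch below does this with the standard library only. The function names and toy edge list are hypothetical, and a least-squares slope is a crude heuristic, not a rigorous power-law test and not necessarily the method behind the article's observation.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each degree value to the number of nodes having that degree.

    Nodes with no edges never appear in an edge list, so degree 0 is not counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def loglog_slope(dist):
    """Least-squares slope of log(count) against log(degree), or None."""
    points = [(math.log(k), math.log(c)) for k, c in dist.items()]
    if len(points) < 2:
        return None
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx if sxx else None


# Hypothetical toy edges: a hub "h" with four followers, two of whom are linked.
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]
dist = degree_distribution(edges)
slope = loglog_slope(dist)
print(dict(dist), slope)
```

A distribution that is close to a power law shows up as an approximately straight, downward-sloping line on these axes; a strongly curved or flat pattern suggests otherwise.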
This effect is also visible in the number of connected components, which is 573. Because these findings contradict well-established concepts about social networks, it is worth investigating the network to figure out the reasons. (Twitter's Trust and Safety teams may wish to investigate whether the network of @zlj517 meets the terms of the platform.)
Influential Users in @zlj517's network
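Connected components and diameter can both be computed with breadth-first search. The sketch below is a minimal standard-library illustration on a hypothetical toy graph. It assumes an adjacency dictionary of neighbor sets and measures the diameter within the largest component, which is one common convention when a network is disconnected.

```python
from collections import deque


def connected_components(adj):
    """Return a list of node sets, one per connected component."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        components.append(component)
    return components


def eccentricity(adj, source):
    """Longest shortest-path distance from `source` to any reachable node."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return max(dist.values())


def diameter(adj, component):
    # Runs one BFS per node; fine for a sketch, slow for very large graphs.
    return max(eccentricity(adj, node) for node in component)


# Hypothetical toy graph: a path a-b-c-d plus a separate pair e-f.
adj = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"},
    "e": {"f"}, "f": {"e"},
}
components = connected_components(adj)
giant = max(components, key=len)
print(len(components), diameter(adj, giant))  # 2 3
```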
[OpenOrd and ForceAtlas layout algorithms were used in the above viz]
Finding the influential users (IUs) of a given network is an important activity for threat intelligence and security analysts, because IUs help to categorize the behavior of the node of interest. To this end, I studied and explored the influential users of @zlj517's network. See the IU visualization in the giant component.
I used the PageRank (PR) algorithm to find the IUs of the network. It is evident that @zlj517's network contains many IUs (nodes with circles in the center). Given @zlj517's position in the PRC, this may not be a surprising observation or an oddity; however, we can zoom into the IUs and examine them closely for any clues.
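For readers unfamiliar with PageRank, the sketch below implements the standard power-iteration form on an undirected adjacency dictionary, treating each edge as a link in both directions. It is an illustration only: the function names, the damping factor of 0.85, the stopping tolerance, and the toy graph are assumptions, not the settings used for the visualizations.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on an undirected graph.

    adj: dict mapping each node to the set of its neighbors.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by isolated nodes is redistributed uniformly.
        dangling = sum(rank[node] for node in nodes if not adj[node])
        new_rank = {}
        for node in nodes:
            # Each neighbor shares its rank equally among its own neighbors.
            incoming = sum(rank[nb] / len(adj[nb]) for nb in adj[node])
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


def top_k(rank, k=10):
    """Return the k highest-ranked nodes, best first."""
    return sorted(rank, key=rank.get, reverse=True)[:k]


# Hypothetical toy graph: one hub connected to three leaves.
adj = {
    "hub": {"l1", "l2", "l3"},
    "l1": {"hub"}, "l2": {"hub"}, "l3": {"hub"},
}
rank = pagerank(adj)
print(top_k(rank, 1))  # ['hub']
```

On a real follower graph, `top_k(rank, 10)` would give the ten candidates for manual profile review.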
I chose the top 10 influential users from the network and investigated their profiles manually. The visualization of the top 10 IUs is shown below.
Handle @MrDines59367817 stands out as an IU; however, this profile has gone missing from Twitter or been deactivated. It is the top node, with the highest PR score, highlighted in the above visualization. Other influential users include https://twitter.com/YongzuoLi and https://twitter.com/ankittyagi03, as shown in the network below.
There are some strange things happening in the network. The user with the highest PR score, @MrDines59367817 https://twitter.com/MrDines59367817, is following 1038 users yet has only 23 followers, all of this since the account was created in May 2020. This is a bit odd. The most influential account has since gone missing.
Second, I notice that there is at least one first-hop follower, @ArpanaRaj12, who joined Twitter in May 2020 and has all of a sudden started following diplomatic accounts, news channels, and Chinese studies professors. This account may belong to a real person, but the timing of the account creation and the follower profile do appear odd. It may have been created to aid or support the information transportation machinery or propaganda mission. There are other similar observations that appear off and odd.
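The manual observations above (a very new account that follows far more users than follow it) can be turned into a simple screening rule. The sketch below is a hypothetical heuristic. The thresholds of 60 days and a 10:1 ratio are arbitrary assumptions, and the exact creation day is a placeholder, since the article only states May 2020. A flag is a prompt for manual review, not evidence of inauthenticity.

```python
from datetime import date


def flag_account(friends_count, followers_count, created, collected,
                 max_age_days=60, min_ratio=10.0):
    """Flag new accounts that follow many more users than follow them.

    friends_count / followers_count correspond to the Twitter fields of the
    same names; `created` corresponds to account_created_at.
    """
    age_days = (collected - created).days
    ratio = friends_count / max(followers_count, 1)  # avoid division by zero
    return age_days <= max_age_days and ratio >= min_ratio


collected = date(2020, 5, 23)  # data collection date given in the article

# Counts reported for @MrDines59367817; the creation day is a placeholder.
suspicious = flag_account(1038, 23, date(2020, 5, 1), collected)

# Hypothetical long-standing account with many followers.
ordinary = flag_account(100, 5000, date(2015, 1, 1), collected)

print(suspicious, ordinary)  # True False
```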
Another strange observation is a hybrid name that combines a male Indian first name with a Chinese last name: Naveen Chen. See below. This is clearly a fake account, and it shows a pattern that aligns with other fake accounts that were created around the same time and show a similar pattern of followership.
The highest influencer in @zlj517's network appears to be an Indian male who is following over 1000 China-connected persons or organizations. Most likely this is a paid freelancer hailing from India and working for PRC media. https://twitter.com/MrDines59367817
[Updated on May 30, 2020] Following publication of this article, the influential user network has undergone significant change. The top IU's account appears suspended as of May 30, 2020.
Insights and Takeaways
The network of @zlj517 is interspersed with Chinese and non-Chinese government officials, journalists, and a sub-network of high-influence accounts that were created en masse as recently as May 2020. There is a set of newly created (April-May 2020) Twitter accounts with no followers. One of the most influential users in @zlj517's network is @MrDines59367817, whose account was suspended by Twitter for violation of Twitter platform rules.
The network is transporting information focused on projecting a positive and benevolent image of China while blaming the United States for the pandemic. Additionally, the network is being used to warn countries not to meddle in the internal affairs of Hong Kong.
Exploratory data analysis indicates that persons from India and Pakistan have embedded themselves in the China-led propaganda network. Their activity appears to be focused on increasing "follows" rather than retweets.
Author motivation
Data analysis, Threat Intelligence, Social Network Analysis (SNA), Graph Theory, Applied Mathematics.
References
All data and code used for this analysis are posted on GitHub. Citations will be updated.
Article Updates and Revisions
May 30, 2020 - Twitter suspends Influential User Mr Dines (Dinesh Sharma) https://twitter.com/MrDines59367817
June 01, 2020 - Upon request of viewers, the data, code, and references used to generate the visualizations and analysis are posted as open source. https://propagandanetworkanalysis.github.io/
June 11, 2020 - Twitter discloses a first-of-its-kind PRC state-backed information campaign connected to 32,242 accounts.