When the Bad is Good and the Good is Bad: Understanding Cyber Social Health through Online Behavioral Change
Amit Sheth
NCR Chair & Prof; Founding Director, AI Institute at University of South Carolina
This is a prepublished version of (cite as): Ugur Kursuncu, Hemant Purohit, Nitin Agarwal, Amit Sheth, When the Bad is Good and the Good is Bad: Understanding Cyber Social Health through Online Behavioral Change, IEEE Internet Computing, January/February 2021. DOI: 10.1109/MIC.2020.3045232 Published Version
Online platforms have facilitated the exchange of harmful content at an unprecedented scale in the form of disinformation, hate speech, cyberbullying, and extremism. Its insidious impact has been observed in events related to health, disaster recovery, elections, finance, climate communication, and terrorism. These trends have led to the rising prominence of computational social media analytics in academia, public health, politics, and homeland security. In the pursuit of understanding online malevolent behavior, research in social media analytics has seen significant development of advanced techniques [1]. Yet, it has been challenging to detect, monitor, counter, and overcome the malevolent behavior of ill-intentioned actors due to the complex nature of social media and other large-scale socio-technical infrastructures [2]. Because harmful content is rich in subjectivity and emotion, it is challenging to understand individual messages in terms of their features and the human decision-making processes that foster their diffusion. The meaning of language varies depending on the source's intent and the state of the target's belief system, allowing the bad to be perceived as good due to positive social construction, and vice versa [3]. Additional complexity arises when bad actors coordinate their actions to harm other individuals, with goals ranging from manipulation to harassment. This becomes even more dangerous when malicious groups or state actors orchestrate their actions [4], involving bots [5] and human actors, to disseminate misinformation and persuade individuals of its truthfulness, threatening individuals, our society, and democratic institutions at large [6]. These efforts have had serious negative consequences. Recent examples demonstrating the urgency to address this scourge include: (i) successful campaigns spreading mis/disinformation about human health (e.g., COVID-19 - see https://cosmos.ualr.edu/covid-19, Zika), (ii) cyberbullying, harassment, and hate speech by individuals and groups [7], and (iii) extremist groups (e.g., ISIS, White Supremacy) spreading their propaganda.
In spite of significant progress in technologies to fight negative uses of social media, it has been challenging to detect, monitor, counter, and overcome malevolent behavior by ill-intentioned actors.
Such information distorts an individual's existing beliefs in memory, challenging the human tendency to avoid conflict with those beliefs. Moreover, repeated exposure to such content helps consolidate these beliefs in memory. A carefully constructed sequence of harmful content persuades the target, causing a change in behavior. Researchers can characterize individuals based on different facets and dimensions, such as their intentions, biases, socio-cultural affinity, and motivations, by learning representations from thick data [8]. For instance, the individual who propagates misinformation will be represented differently from the recipient targeted by this misinformation. At the same time, it is crucial to ground the design of computational models in well-established social theories concerning the neural, psycho-linguistic, and cognitive processes of human decision-making. The insights derived from this thick data modeling approach will contextualize big data analytics at a larger scale and provide deeper insights. (See the section on thick data modeling.)
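As a minimal, purely illustrative sketch of such facet-based characterization (the field names, scales, and thresholds below are assumptions made for exposition, not the authors' model), a propagator and a targeted recipient could be represented and distinguished as follows:

```python
from dataclasses import dataclass

@dataclass
class ActorProfile:
    """Hypothetical 'thick data' facets of an online individual, each on a 0-1 scale."""
    intent_to_persuade: float      # e.g., inferred from calls to action in posts
    topical_bias: float            # skew toward a single narrative
    sociocultural_affinity: float  # alignment with a community's discourse
    susceptibility: float          # openness to belief change, e.g., prior exposure

def role_signature(p: ActorProfile) -> str:
    """Toy heuristic separating a likely propagator from a likely targeted recipient."""
    if p.intent_to_persuade > 0.7 and p.topical_bias > 0.6:
        return "likely propagator"
    if p.susceptibility > 0.6:
        return "likely targeted recipient"
    return "undetermined"

# Example: a source account vs. the account it repeatedly engages with
source = ActorProfile(0.9, 0.8, 0.7, 0.2)
target = ActorProfile(0.2, 0.3, 0.5, 0.8)
print(role_signature(source), "|", role_signature(target))
```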
Harmful content flows through online communities, where its diffusion occurs through complex, dynamic interactions. What constitutes harmful content, cyberbullying, or misinformation varies based on an individual's existing belief system as well as on how the content diffuses. Hence, modeling the cognitive processes that drive individuals' decision-making and actions requires a multi-dimensional understanding of messages, including politics, religion, and hate, among other dimensions. Here, experimental evaluation of theories, techniques, and models/algorithms, together with the provenance of information and the perceived trust in its source, is fundamental at each stage of diffusion (see the section on the diffusion of new harmful content for more). Human behavior changes upon exposure to harmful content online through a process in which information persuades a person to take an action. This behavioral change may occur gradually at cognitive, neural, and social levels. Figure 1 illustrates a conceptual design for modeling this behavioral change through cumulative measurements at these levels. In the figure, the innermost layer tasks (e.g., mitigation, deriving actionable insights) leverage the power of the outer layers (e.g., explainability, disposition) to give insights into the mitigation of misinformation [9]. These (online/offline) measurements performed at cognitive, neural, and social levels (e.g., fMRI, state signals, and social media metrics) provide context for a richer and more sensible analysis. This approach can be applied to model specific use cases: (i) cyberbullying and/or harassment leading to mental health issues, (ii) extremist propaganda leading to radicalization, and (iii) misinformation and disinformation leading to damage to one's own or others' health and well-being.
Figure 1: Conceptual design that demonstrates modeling at cognitive, neural, and social levels for cumulative measurements in prediction, explainability, and mitigation of misinformation.
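To make the layered design in Figure 1 concrete, the following sketch fuses per-level measurements into a single behavioral-change indicator; the signal names, normalization, and weights are illustrative assumptions rather than measurements proposed in the article.

```python
# Sketch: fuse cognitive, neural, and social measurements (each normalized to
# [0, 1]) into one behavioral-change indicator, echoing the layers in Figure 1.

def behavioral_change_score(cognitive: float, neural: float, social: float,
                            weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted fusion of per-level measurements."""
    w_c, w_n, w_s = weights
    return w_c * cognitive + w_n * neural + w_s * social

# e.g., cognitive: stance shift in posted text; neural: an (offline) fMRI-derived
# engagement proxy; social: change in interaction metrics after exposure
score = behavioral_change_score(cognitive=0.55, neural=0.40, social=0.70)
print(f"estimated behavioral-change indicator: {score:.2f}")
```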
In the following sections, we describe thick data modeling and its utility for understanding content, its flow in a network, trust and provenance factors, and the diffusion of harmful content. We then discuss how the insights from these analyses provide richer context for big data analysis, especially for combating malicious attacks online. Before briefly introducing the accepted papers, we also discuss the fairness of these approaches, ethical considerations, and their implications for society.
Thick Data Modeling for Understanding Message Content
Deriving insights from big data alone may lead to overlooking important details, given the complexity of human behavior online. Hence, a thick data modeling approach that analyzes cognitive, neural, and social dimensions will provide a contextual understanding for big data analysis [10, 11]. Understanding message content through multi-dimensional models requires operationalizing abstract models of behavior along contextual dimensions, such as culture, politics, and psychology, rendering them computationally accessible. Further, modeling both new and existing online information environments mandates a holistic approach that represents misinformation in a way that reveals its effects on existing memory and cognitive processes. An individual or a group attempts to change the belief system of its target, harass or bully the target, and in some cases incite the target to carry out an action. This can be a benign action, such as buying a product, or a destructive one, such as radicalizing a disoriented or lonely individual into a violent extremist. The human brain perceives online information in light of existing beliefs, which in turn shapes patterns in behavior, language, and cognitive processes. For instance, even if an individual does not adopt a piece of misinformation because of existing beliefs, its influence persists (the Liar's Dividend) [12].
As harmful content is usually subjective and context dependent, assessing the actual meaning of a concept is crucial for reliable analysis. Different semantics of concepts affect an individual's decision-making, which mandates learning such representations from structured and unstructured factual knowledge resources as priors. For instance, in the context of Islamist extremism, the meaning of "jihad" can be propagated as harming others in the name of religion; in the context of the religion of Islam, on the other hand, it can mean the self-struggle to become a better person or a fight in self-defense [13]. Hence, distinguishing these semantic differences in the assessment of such narratives and incorporating prior knowledge into contemporary models is essential. A language model built from a large corpus alone is unlikely to provide as clear a context as a knowledge graph or ontology can [14].
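The sketch below illustrates the general idea of using prior knowledge to contextualize an ambiguous term; the tiny sense dictionary stands in for a real knowledge graph or ontology and is an assumption made here for exposition.

```python
# Sketch: disambiguate a term against knowledge-graph-style senses instead of
# relying on corpus statistics alone. SENSES is a hypothetical mini ontology.

SENSES = {
    "jihad": {
        "extremist framing": {"violence", "attack", "enemy", "weapon"},
        "religious framing": {"self-struggle", "faith", "betterment", "defense"},
    }
}

def contextual_sense(term, context_tokens):
    """Pick the sense whose related concepts overlap most with the message context."""
    best_sense, best_overlap = "unknown", 0
    for sense, related in SENSES.get(term, {}).items():
        overlap = len(related & set(context_tokens))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

message = ["our", "jihad", "is", "self-struggle", "and", "faith"]
print(contextual_sense("jihad", message))  # -> "religious framing"
```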
Flow of Harmful Information, Trust, and Provenance
Gradually minimizing the spread of incorrect beliefs by introducing corrective information has been found useful [15]. Sources of corrective information include mainstream news media, certain government sources, socio-cultural transcripts, religious treatises (e.g., passages from the Quran or the Bible), trusted collective intelligence (e.g., Wikipedia), and trusted data sources (e.g., USAfacts.org). Other debunking approaches [16], such as rebuttal, factual elaboration, identifying intentions, and trust- and provenance-based information credibility, have been effective as well. The propagation of misinformation across different platforms makes it challenging to measure trust in, and capture provenance for, information. Modeling trust and provenance requires explainability as well as assessing the veracity of information and its impact on decision making. However, since messages between source and target may employ different features over time, each stage of diffusion offers unique content characteristics along different contextual dimensions. Hence, we need to better understand how a message carrying misinformation is adopted and/or propagated by an individual. For instance, recent studies [17] of online radicalization by ISIS via persuasive tactics and strategies demonstrate the need for such an approach. These studies observed that neuro-linguistic, cognitive, and behavioral changes were largely contextualized by religious, hate, and ideological dimensions. As these problems require incorporating social science theories of human behavior, thick data modeling can be an essential approach for deciphering these complex patterns.
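As one hedged illustration of how trust and provenance signals could be combined (the trusted-source list, weights, and provenance format below are assumptions for exposition, not an established method from the cited studies), consider:

```python
# Sketch: score the credibility of a shared claim by combining (i) whether its
# cited source is a trusted corrective source and (ii) how much of its
# cross-platform provenance chain is verifiable.

TRUSTED_SOURCES = {"usafacts.org", "who.int", "wikipedia.org"}  # assumed list

def credibility_score(source_domain, provenance_chain):
    """provenance_chain: ordered shares, each a dict {'platform': str, 'verified': bool}."""
    source_trust = 1.0 if source_domain in TRUSTED_SOURCES else 0.3
    if not provenance_chain:
        return 0.5 * source_trust  # no traceable provenance at all
    verified = sum(1 for hop in provenance_chain if hop["verified"])
    traceability = verified / len(provenance_chain)
    return 0.6 * source_trust + 0.4 * traceability

chain = [{"platform": "twitter", "verified": True},
         {"platform": "facebook", "verified": False}]
print(round(credibility_score("usafacts.org", chain), 2))
```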
The Diffusion of New Harmful Content
As harmful content flows through online social networks, we need a framework to conceptually model the dynamics of its diffusion. Prior research [18] describes the diffusion of new information in five stages: (i) acquiring new knowledge: exposure to new information; (ii) persuasion: a favorable attitude is formed; (iii) adoption: the result of persuasion; (iv) taking action: acting on the adopted information (e.g., propagating it through the network); and (v) confirmation: reinforcing the information via the outcome of the action. This process is gradual and primarily influenced by contextual information in online communities. As the information environment changes upon exposure to new information, the contextual predictive factors vary at each stage. Hence, developing theories of how the diffusion of misinformation takes place online will require a specific focus on the causal chain of events that triggers the transition to the next stage. To understand this diffusion process, which is sometimes orchestrated by groups, there is a pressing need to develop robust mechanisms that capture such discourse online and derive insights using novel approaches that go beyond current statistical and network science approaches. These models mostly depend on existing datasets that contain inherent biases and lack the most current information. This creates a further challenge: developing methods for automated, dynamic assessment of harmful narratives that keep pace with rapidly unfolding events.
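The five stages above can be read as a simple probabilistic state machine; the sketch below simulates how far an individual advances for one piece of content, with transition probabilities that are purely hypothetical and would in practice be conditioned on the stage-specific contextual factors discussed here.

```python
import random

# Sketch: the five-stage diffusion process as a probabilistic state machine.
# P_ADVANCE values are hypothetical placeholders, not estimated parameters.

STAGES = ["exposure", "persuasion", "adoption", "action", "confirmation"]
P_ADVANCE = {"exposure": 0.6, "persuasion": 0.4, "adoption": 0.5, "action": 0.7}

def simulate_individual(seed=None):
    """Return the furthest stage one individual reaches for one piece of content."""
    rng = random.Random(seed)
    stage = STAGES[0]
    for nxt in STAGES[1:]:
        if rng.random() < P_ADVANCE[stage]:
            stage = nxt
        else:
            break
    return stage

print([simulate_individual(seed=i) for i in range(5)])
```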
Contextualized Big Data Analysis to Combat Malicious Behaviors
The measurements described above mostly concern smaller-scale data, and their outcomes may likewise translate to smaller effects. On the other hand, the information derived from these outcomes provides a contextual understanding of the larger problem at hand. Specifically, to understand online behavioral change and its reflection at the level of society, we need to design complex system studies that test the feasibility and efficacy of possible combating approaches. This design will require formally characterizing information environments to understand malicious attacks or campaigns and modeling gradual diffusion processes to understand persuasive harmful dissemination campaigns. As described earlier, we can test hypotheses about the driving factors of these attacks and the diffusion process using smaller data. Such a complex system study will provide insights into how, and more importantly why, a particular information adoption process does or does not work. This will further inform counter-tactics and strategies.
Fairness and Ethical Considerations
These analyses have implications for sensitive issues affecting individuals as well as society; hence, it is imperative to develop fair algorithms and models that take ethical considerations into account. Bias is usually inherent in the data or introduced in the processing, algorithmic design, or evaluation phases [19]. As researchers, we have a very limited understanding of the implications of the computational models that we design for harmful online behaviors. Thus, the very models we design might inadvertently reinforce and amplify these biases. The impact of such pitfalls in fair algorithmic design is elevated when a model is deployed in an application used by millions of people. For instance, a recent study [20] showed how bias in verified data for extremism might lead to potentially unfair social discrimination against innocent individuals. While removing biases from data in their entirety may not be possible, we need to explore solutions that mitigate bias and promote fairness in the model.
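A simple audit of the kind implied here could compare error rates of a harmful-content classifier across groups; in the sketch below, the records and group labels are synthetic, and a real audit would require held-out, demographically annotated evaluation data.

```python
# Sketch: compare false-positive rates of a harmful-content classifier across
# groups; a persistent gap flags potentially unfair outcomes for one group.

def false_positive_rate(records, group):
    """records: dicts with 'group', 'label' (true class), and 'pred' (model output)."""
    negatives = [r for r in records if r["group"] == group and r["label"] == 0]
    if not negatives:
        return 0.0
    return sum(r["pred"] == 1 for r in negatives) / len(negatives)

records = [
    {"group": "A", "label": 0, "pred": 1}, {"group": "A", "label": 0, "pred": 0},
    {"group": "B", "label": 0, "pred": 0}, {"group": "B", "label": 0, "pred": 0},
]
gap = false_positive_rate(records, "A") - false_positive_rate(records, "B")
print(f"false-positive-rate gap (A - B): {gap:.2f}")
```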
<Please go to the published version for introduction to nine articles in the special issues edited by Ugur Kursuncu, Hemant Purohit, Nitin Agarwal, and Amit Sheth.>
Acknowledgment: Figure 1 emerged during interactions that also involved Huan Liu, Amanuel Alambo, Amit Almor, Douglas Wedell, Marco Valtorta, and Valerie Shalin.