Measuring Social Quality during the COVID-19 pandemic based on mental health & addiction: Insights from semantic analysis of News & Social Media*
Amit Sheth
NCR Chair & Prof; Founding Director, AI Institute at University of South Carolina
The big idea: Using my team’s core artificial intelligence expertise in automating the use of knowledge graphs and machine learning for measuring mental health and social ills such as prescription drug and substance issues and gender-based violence, we can monitor social media to capture evolving human experience that changes faster than conventional survey-based approaches. At the Artificial Intelligence Institute of the University of South Carolina (AIISC), my team of AI researchers has processed more than 700 million social media posts since mid-March, complemented by more than 700 thousand unique news articles. I find troubling indicators of a growing, anticipated mental health crisis that is specific to different states.
Why it matters: Far more people will suffer the consequences of COVID-19 than the actual number of positive cases. This crisis is affecting human health, finances and the economy, and society as a whole in an unprecedented manner. We examine big data for indicators of a possible epidemic of clinical depression, growing panic/anxiety concerns, and worsening substance use disorders. Advance warnings help policy-makers prepare for growing needs. However, scaling up mental health services to meet a surge will require innovation in service provision, as the United States was already failing to meet the demand before this crisis.
How we do our work: Analyzing social media cannot rely on conventional syntactic, keyword-based processing. Online conversations contain short informal text, diverse opinions, ambiguity, misinformation, and disinformation, written by users of a specific age and background in a particular location, with topic drift due to the changing real-world situation. Our prior work on the use patterns of different forms of cannabis and synthetic cannabinoids illustrates the challenge of ambiguity. A keyword-based collection of posts related to “spice” (a street name for synthetic marijuana or cannabinoids) picks up irrelevant references to “pumpkin spice latte” that must be ruled out with the use of domain knowledge (improving precision). Our work on prescription drug and substance abuse illustrates the need for general concepts to capture word differences between users. Buprenorphine is an opioid used to treat opioid use disorder. In our Drug Abuse Ontology (DAO), there are 29 occurrences of alternative terms, such as bupe or bupy (slang) and sub (for the brand names Subutex or Suboxone). Identifying all of these is critical for good coverage (i.e., recall). Domain knowledge represented as a knowledge graph or ontology, combined with natural language processing and machine learning techniques, enabled us to solve these problems using semantic processing. The resulting analysis allowed us to make discoveries, such as loperamide abuse, which led to follow-on toxicology studies and resulted in an FDA warning.
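The precision and recall points above can be illustrated with a minimal sketch. It is not the AIISC pipeline: the tiny lexicon, the exclusion cues, and the function names (`CONCEPT_TERMS`, `EXCLUDE_CONTEXT`, `match_concepts`) are hypothetical stand-ins for what a real ontology such as DAO would supply, and the ambiguous term "sub" is deliberately left out because simple token matching cannot disambiguate it.

```python
import re

# Hypothetical mini-lexicon: concept -> surface forms (slang, brand names).
# A real ontology such as DAO would supply far more terms per concept.
CONCEPT_TERMS = {
    "buprenorphine": {"buprenorphine", "bupe", "bupy", "subutex", "suboxone"},
    "synthetic_cannabinoid": {"spice"},
}

# Hypothetical context cues suggesting a non-drug sense of an ambiguous term.
EXCLUDE_CONTEXT = {
    "synthetic_cannabinoid": {"pumpkin", "latte", "recipe", "cinnamon"},
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_concepts(text):
    """Return the set of concepts mentioned in text.

    Synonym sets raise recall (slang maps to one concept); exclusion
    cues raise precision (off-topic senses are dropped).
    """
    tokens = set(tokenize(text))
    found = set()
    for concept, terms in CONCEPT_TERMS.items():
        has_term = bool(tokens & terms)
        off_topic = bool(tokens & EXCLUDE_CONTEXT.get(concept, set()))
        if has_term and not off_topic:
            found.add(concept)
    return found


print(match_concepts("Anyone tried bupe for withdrawal?"))
print(match_concepts("Pumpkin spice latte season is here!"))
```

A bare keyword filter on "spice" would keep the latte post and miss the "bupe" post; the concept-level match does the opposite in both cases.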
To understand overall social well-being, we define a Social Quality Index (SQI), which is an empirical measure based on the levels of depression, anxiety, substance abuse, and addiction. We associate this measure with any geographic area at any time of interest. Our curated knowledge graph supporting semantic processing is based on DSM-5, a manual developed by the American Psychiatric Association and used to train mental health professionals; the Unified Medical Language System (UMLS), which integrates medical terminology, classification, and coding standards; DBpedia, for broader concepts based on Wikipedia; and others. We apply this index not only to social media but also to news, which can help to explain the social media results we gathered. We also compare our results to more traditional survey-based data by the Substance Abuse and Mental Health Administration (MHA), which are considered a gold standard.
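The article does not give the SQI formula, so the following is only an illustrative sketch of how a composite indicator could be attached to a region and a time window. Everything in it is an assumption: the equal weighting of the four components, the input shape, and the name `distress_share_by_region`. In this sketch a lower score would correspond to better social quality.

```python
from collections import defaultdict

# The four components named in the text; equal weighting is an assumption.
COMPONENTS = ("depression", "anxiety", "substance_abuse", "addiction")


def distress_share_by_region(posts):
    """Aggregate labeled posts into a per-(region, week) score.

    posts: iterable of (region, week, labels), where labels is the set of
    components detected in that post (e.g. by a semantic matcher).
    Returns {(region, week): score}, the mean over components of the
    fraction of posts flagged for that component. Illustrative only;
    this is not the published SQI formula.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for region, week, labels in posts:
        key = (region, week)
        totals[key] += 1
        for component in COMPONENTS:
            if component in labels:
                hits[key][component] += 1
    return {
        key: sum(hits[key][c] / n for c in COMPONENTS) / len(COMPONENTS)
        for key, n in totals.items()
    }


sample = [
    ("MI", 1, {"depression"}),
    ("MI", 1, set()),
    ("GA", 1, {"addiction", "substance_abuse"}),
]
print(distress_share_by_region(sample))
```

Keying the result by region and week is what lets the same measure be compared across states and tracked over time.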
What did we find? We determined a state-specific evolution of SQI content across news that is not predicted by general state mental health rankings. Because news is multilingual (e.g., Spanish news, Middle Eastern news outlets), the team combined state-of-the-art neural-attention-based semantic sentence parsing with a multilingual health-related knowledge graph of 5.5 million concepts to provide a uniform, language-independent representation of news content for ~700K unique news articles gathered from NEWS API, GDELT, and web crawling from January 2020 to March 2020. We were able to tabulate ~120K unique entities and ~105K events across 780 DBpedia categories and 225 concepts in UMLS, DSM-5, PHQ-9, and DAO, which allowed us to characterize the impact of social distancing, isolation, fear, and panic across different states. The resulting SQI for January news data correlates modestly with the gold-standard MHA state rankings (p ≈ 0.1)[1]. However, this correlation declines further with March data, indicating the disruption of MHA mental illness-based state rankings created from gold-standard survey data. There is also a marked change in the SQI rankings between January and March news stories (p < .0001), documenting more disruptions in the components of SQI. For example, New Hampshire moved from a moderately compromised ranking to the worst in the nation, with threats to addiction recovery, paranoia, and spikes in unemployment. Louisiana moved from moderate social quality to fifth-worst in the nation, consistent with reporting of new cases of depression and anxiety. In contrast, news stories from Pennsylvania and Maryland suggest a modest improvement during the measured time period but still show poor SQI.
Thus, January results had a higher correlation with gold standard data, but by March there was significant divergence as the pandemic had significantly different effects on different states (Figure 2). Figure 3 provides insight via the changes in phrases or concepts affecting depression-related concepts.
Figure 2: The gold-standard MHA ranking and news analysis phrase clouds of mental health topics for January and March for the state of Oregon. Oregon moved from the third quartile to the first quartile (i.e., among the worst states in social quality, for anecdotes, see), and the corresponding March news phrase clouds indicate depression and substance abuse topics.
Figure 3: Week-by-Week Timeline of change in the severity of depression concepts in Oregon from global news articles for the month of March. The phrase clouds show a move from more generic terms, such as “feel worse” or “low spirits” to more specific terms, such as “suicide intent” and “major depression” related to more acute mental health and addiction.
Social Media results: We have analyzed > 700 million tweets since March 14, 2020, establishing location using geo-codes and user profile content. As with the news, we calculated the SQI for social media by state. Preliminary analysis suggests only a modest correlation between the data from the week of March 14-20 (week 1) and the MHA gold standard (p ≈ 0.1), likely because our data already reflect the impact of the pandemic. The correlation falls off over the weeks of March 21-27 (week 2) and March 28 - April 3 (week 3) (p > 0.1), potentially due to environmental influences. This pattern is expected because the static and stale MHA rankings cannot account for the dynamic influence of the coronavirus on the public. The public response on social media to the outbreak changes over time, and the difference between weeks 1 and 2 is significant; that shift coincided with increasingly stark warnings of an impending escalation in the crisis and its outcomes, such as inadequate hospital capacity and deaths. We have begun to examine trends over the three weeks of data analyzed so far. There is a significant national worsening of SQI between weeks 1 and 2 (p < .01), which appears to stabilize for the remaining weeks. There are significant correlations (p < .05) between the depression indicators in our SQI and a state-specific COVID-19 severity index for the first two weeks of data. These initial results support the continued use of SQI over social media as a real-time indicator of public mental health decline in response to COVID-19.
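Comparisons like the ones above, between an SQI-based state ordering and a survey-based ordering, are commonly made with a rank correlation. The article does not say which statistic was used, so the sketch below assumes Spearman's rank correlation and uses made-up scores for illustration. The helper names `ranks` and `spearman` are hypothetical, and the p-value computation is omitted.

```python
def ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks.

    Assumes x and y have equal length and neither is constant.
    """
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Made-up scores for five states, for illustration only.
survey_rank = [1, 2, 3, 4, 5]
week1_score = [0.10, 0.12, 0.11, 0.20, 0.25]
print(spearman(survey_rank, week1_score))
```

A value near 1 means the two orderings largely agree; a value drifting toward 0 over successive weeks would match the "falls off" pattern described in the text.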
Figure 4. Darker colors in (a,b,c) indicate better social quality conditions. States becoming lighter, especially between weeks 1 and 2, correspond to worsening social quality and include emerging hotspots in California, Michigan, New York, Virginia, Georgia, and Florida. (d) reflects the more prevalent topics and issues related to COVID-19: shortages of medical supplies, struggles of businesses, public services, and the response from the government.
The SQI declined in week 2 for many states (see Figure 4). This was the week when confirmed cases spiked sharply and serious public health measures, namely social distancing and business closures, began. SQI improved in week 3, although not back to week 1 levels. Social media content suggests that the public was trying to comprehend the seriousness of the outbreak in the first week. During the second week, people doubted the adequacy of current containment and mitigation efforts. The overall SQI improvement in the third week may reflect population resilience regarding the outbreak and its implications, encouragement to stay home and practice social distancing, dissemination of helpful information on COVID-19 (e.g., how it transmits, necessary precautions to take), and perhaps impending government actions (see Figure 4.d).
On the other hand, some states’ SQI worsened continuously throughout the three weeks, primarily Michigan and Georgia, as these states became emerging hotspots. Potential explanations for the decline include the financial impact on businesses, the government response, and documented shortages of medical supplies such as ventilators, drugs (e.g., chloroquine, hydroxychloroquine), and gear for medical professionals. However, the effects differ by state. Depression-related social media chatter from Michigan persisted throughout the three weeks (Figure 5), while addiction and substance use related conversations are more prevalent in Georgia (Figure 6). Among the substances for addiction, alcohol is prominent, followed by pain pills and illicit drugs such as cocaine and meth.
Figure 5: As the SQI worsened in Michigan, depressive chatter persisted on social media and moved toward more severe concepts.
Figure 6: As the SQI worsened in Georgia, addiction and substance use (and not mental health) were bigger contributors (the figure shows that the addiction-related chatter, specifically alcohol, pain pills and illicit drugs such as cocaine and meth, became more prevalent).
Coming Soon: (if you have read this far, please revisit next week when we will have more to share!)
- Analysis of domestic violence (gender-based violence)
- GenZ vs Millennials:
  - comparison of SQI by age categories, and the contributing factors;
  - the impact of school closings on GenZ;
  - the impact of business closings on Millennials
Next steps to address what we don’t know: Our analysis of social media provides a foundation for predicting growing mental health care needs. These predictions are based on both changes in the environment and the human response to those changes. However, the true relationship between such threats in the environment and human response is unknown. Decades of psychological research document the absence of linear relationships between physical changes in the environment (like noise levels or brightness) and human response. We see hints of non-linearity in how social sensing changes over time. One outcome of the unfortunate COVID-19 event is the data to quantify the relationship between this kind of environmental threat and human response. The far-reaching implications of such an analysis include the development of more agile social indicators and alarms that will help policy-makers detect and prepare for emergent threats and help responders react more quickly to the impact of rapidly evolving crises.
* The AIISC team includes Manas Gaur, Vedant Khandelwal, Ugur Kursuncu, Vishal Pallagani, Valerie Shalin, Amit Sheth, and several others in a supportive role.
[1] P values represent the probability of such a result by chance, where smaller values indicate that chance is not likely responsible. For example, p = .05 is less likely the result of chance agreement than p = .10 but both of these are less likely the result of chance than p = .25.
Relevant Articles:
- Shades of Knowledge-Infused Learning for Enhancing Deep Learning
- Importance of background knowledge in Context Modeling
- Sentinels of Breach: Lexical Choice as a Measure of Urgency in Social Media
- Mapping social media to clinically grounded mental health categories in DSM-5 for a comprehensive understanding of mental illness
- Assessment of severity of mental illness from social media
- Identifying Personal Communication and Sentiment in Drug-Related Tweets
Prior projects that inform this project (you can find more relevant research on these pages):
● Modeling Social Behavior for Healthcare Utilization in Depression (NIMH R01)
● eDrugTrends: Trending Social media analysis to monitor cannabis and synthetic cannabinoid use (NIDA R01)