What is the current thinking in complexity science, applied mathematics, and computational social science on analysis of social media

Tina D Purnat

Health Expert in Digital, Policy, Tech & Social Determinants

Published: February 4, 2021

This is one of eleven primers that the organizing team of the WHO infodemiology conference (June/July 2020) prepared to feed into multidisciplinary discussions in working groups that were discussing a public health research agenda. The primer is not intended to be an exhaustive review of the literature, but rather a rapid review and a starting point for discussion. I will be publishing the primers over the course of the next few weeks. I hope you find them useful as well. Thank you to colleagues from the Demand for Immunization Team at US CDC for participating in the preparation of the primers.

Definitions and Key Concepts

Complexity science: An emerging multidisciplinary field for understanding complex physical, biological, and social systems. It acknowledges the limitations of traditional reductionist approaches used to understand complex systems (e.g. standard statistical methods based on averaging of a system’s many components). It provides an alternative framework by integrating the network of relationships between components within and between systems, and by accounting for uncertainty [1],[2].
Emergent behavior: A term used in complexity science to describe a system’s behaviors that arise from the relationships between its components rather than from the components themselves [2].
Computational social science: A new interdisciplinary field that studies human behavior and social interactions through the analysis of “big data” without relying heavily on traditional research methods used in social and behavioral sciences (e.g. surveys of narrowly defined populations). It exists at the intersection of varied disciplines, including social sciences, computer and information science, physics, and mathematics [3].
Big data: A type of data that are high volume, high velocity (speed of data in and out), and high variety in terms of range of data types and sources [4].
Unstructured data: A type of data that lacks the structural organization usually needed for analysis, including text, images, video, and audio data. Unstructured data constitute 95% of big data [5].
Machine learning: The automated detection of meaningful patterns in data using computational programs that can “learn” and “train” themselves based on existing datasets. Examples of technologies that use machine learning include search engines and face detection for digital cameras [6].

Leveraging Emerging Insights from New Sciences and Big Data

In the three related fields of complexity science, computational social science, and applied mathematics, innovative methods that were traditionally not available to social and behavioral scientists are being used to analyze large volumes of unstructured data obtained from social media. A common characteristic of these methods is that many of them apply recent advances in artificial intelligence and machine learning [7]. For example, natural language processing (NLP), which refers to a range of computational techniques used for automatic analysis of human language, has been used to detect online hate speech and fake news [8],[9],[10],[11]. Further, convolutional neural networks (CNNs), a class of machine learning models originally designed for processing image data, can also be applied to the analysis of visually driven social media such as Instagram [12]. Leveraging these computational tools, researchers are able to explore unprecedentedly high volumes of unstructured data on a global scale.
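To make the idea of NLP-based detection more concrete, here is a minimal sketch of a text classifier. It assumes a naive Bayes model over word counts, which is only one simple technique among many and is not the method of any study cited here. The example posts, the labels "suspect" and "ordinary", and all function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on whitespace; real NLP pipelines do much more.
    return text.lower().split()

def train_naive_bayes(examples):
    """Fit word counts per label from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of posts
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}

def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    total_posts = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_posts in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_posts / total_posts)  # log prior
        for token in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero out a label.
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data, invented purely for illustration.
TOY_POSTS = [
    ("miracle cure doctors hate this secret", "suspect"),
    ("secret cure they do not want you to know", "suspect"),
    ("shocking miracle remedy cures everything", "suspect"),
    ("health ministry publishes vaccination schedule", "ordinary"),
    ("clinic opening hours updated for the holiday", "ordinary"),
    ("study reports vaccination coverage by region", "ordinary"),
]

model = train_naive_bayes(TOY_POSTS)
print(predict(model, "secret miracle cure"))
print(predict(model, "vaccination schedule for the clinic"))
```

Six posts are far too few to learn anything meaningful; the point is only to show the shape of the pipeline (tokenize, count, score). The surveys cited above describe the much richer models used in practice.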

Understanding offline human interactions and behaviors based on data extracted from the online world is another approach common to the three fields, particularly complexity science. For example, through the analysis of Twitter and credit card shopping data, complexity scientists demonstrated that online interactions were segregated by income just as physical interactions were [13]. Similarly, by using machine learning to analyze spatiotemporal metadata associated with Twitter posts (i.e. time posted and geolocation of users), it is possible to investigate the dynamics of illegal wildlife trade taking place physically [14]. By triangulating data sources and types, these disciplines provide insight into the nature of both offline and online worlds.

The frameworks and methods discussed above have been employed in various studies looking at public health information and misinformation, ranging from predicting the veracity of online rumors about the 2014 Ebola epidemic [15] to NLP-based analyses of over 200,000 online posts to examine pregnant women’s information-seeking behaviors [16]. Computational methods can also be used to identify and assess risks of negative health outcomes. An AI system using a CNN successfully estimated the risk of alcohol abuse based on images and texts people had shared on Instagram [17]. Likewise, a machine learning model trained on 44,000 child electronic health records identified children at risk of not being vaccinated [18]. Understanding online social networks is another way in which these disciplines contribute to public health. For example, a group of complexity scientists analyzed a global pool of around three billion Facebook users to provide a system-level understanding of the contention surrounding pro-, anti-, and undecided vaccination views [19]. They mapped the online ecology of clusters (i.e. Facebook pages and their members) holding differing vaccination views and concluded that anti-vaccination clusters are highly entangled with undecided clusters, while pro-vaccination clusters are more peripheral. They also used mathematical formulae to predict the conditions needed to prevent the spread of anti-vaccination narratives, including manipulating the rate at which links between sets of clusters are created [19].
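As a rough illustration of what "mapping clusters and their links" can mean in code, the sketch below counts links between clusters grouped by stance. It is not the method of the study in [19]; the cluster names, stance labels, and links are all hypothetical, chosen only to mimic the qualitative pattern described above.

```python
from collections import Counter

# Hypothetical clusters (e.g. pages) labelled by stance; invented for illustration.
STANCE = {
    "A1": "anti", "A2": "anti",
    "U1": "undecided", "U2": "undecided", "U3": "undecided",
    "P1": "pro", "P2": "pro",
}

# Hypothetical undirected links between clusters (e.g. one page linking to another).
LINKS = [
    ("A1", "U1"), ("A1", "U2"), ("A2", "U2"), ("A2", "U3"),
    ("A1", "A2"), ("P1", "P2"), ("P1", "U3"),
]

def links_between_stances(stance, links):
    """Count links for each unordered pair of stances."""
    counts = Counter()
    for a, b in links:
        # Sort the pair so ("undecided", "anti") and ("anti", "undecided") match.
        pair = tuple(sorted((stance[a], stance[b])))
        counts[pair] += 1
    return counts

counts = links_between_stances(STANCE, LINKS)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

In this toy network the anti-vaccination clusters share four links with undecided clusters while the pro-vaccination clusters share one, which is the kind of "entangled versus peripheral" contrast the text describes. A real analysis would work with far larger graphs and would also track how links change over time.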

A number of COVID-19-related studies and research protocols informed by the three disciplines are starting to be published [20]. For example, NLP approaches are being used to identify the main topics of COVID-19-related posts shared by Twitter users [21] and to understand their perceptions of mitigation policies [22]. Others have approached the topic with a broader scope. Using a range of computational tools, a group of researchers performed a comparative analysis of more than 8 million comments and posts collected from five social media platforms (Twitter, Instagram, YouTube, Reddit and Gab) [23]. Based on this analysis, they developed a model for characterizing the “reproduction numbers” of information for each platform. There are more opportunities for research in these areas, especially because researchers are openly sharing social media data sets. For example, there is a public repository of Twitter data containing more than 123 million tweets related to COVID-19, which is actively being updated on a weekly basis [24].
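The notion of a "reproduction number" for information borrows from epidemic modelling. The sketch below assumes a textbook SIR (susceptible, infected, recovered) model, in which the basic reproduction number is the ratio of the transmission rate to the recovery rate. It does not reproduce the fitting procedure of the study in [23]; the parameter values and function names are invented for illustration.

```python
def basic_reproduction_number(beta, gamma):
    """R0 in the textbook SIR model: transmission rate over recovery rate."""
    return beta / gamma

def simulate_sir(beta, gamma, s0=0.99, i0=0.01, steps=200, dt=0.1):
    """Euler-step simulation; returns the 'infected' fraction over time.

    Here 'infected' stands for users actively sharing a piece of content.
    """
    s, i = s0, i0
    infected = [i]
    for _ in range(steps):
        new_infections = beta * s * i * dt  # susceptible users who start sharing
        new_recoveries = gamma * i * dt     # sharing users who lose interest
        s -= new_infections
        i += new_infections - new_recoveries
        infected.append(i)
    return infected

# Hypothetical parameters: one spreading scenario (R0 > 1), one fading (R0 < 1).
spreading = simulate_sir(beta=0.3, gamma=0.1)
fading = simulate_sir(beta=0.05, gamma=0.1)

print("R0 spreading:", basic_reproduction_number(0.3, 0.1))
print("R0 fading:", basic_reproduction_number(0.05, 0.1))
print("peak share, spreading:", max(spreading))
print("peak share, fading:", max(fading))
```

When the ratio is above 1, the share of users spreading the content grows before it declines; when it is below 1, the content fades from the start. Comparing such estimates across platforms is what makes the reproduction number useful as a summary measure.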

To summarize, given the importance of social media in infodemiology, complexity science, computational social science, and applied mathematics will all be essential because they equip researchers with tools necessary for analyzing unstructured data on a global scale. Future research can look into ways of leveraging these tools beyond analysis and explore how they can inform interventions that would address issues associated with the COVID-19 infodemic.

[1] New England Complex Systems Institute. (2019). Research. New England Complex Systems Institute. https://necsi.edu/research

[2] Siegenfeld, A. F., & Bar-Yam, Y. (2020). An Introduction to Complex Systems Science and its Applications. ArXiv:1912.05088 [Physics]. https://arxiv.org/abs/1912.05088

[3] Cornell University. (2020). Computational Social Science. Masters in Computational Social Science. https://as.cornell.edu/block/computational-social-sciences

[4] Chen, S.-H., & Yu, T. (2018). Big Data in Computational Social Sciences and Humanities: An Introduction. In S.-H. Chen (Ed.), Big Data in Computational Social Science and Humanities (pp. 1–25). Springer International Publishing. https://doi.org/10.1007/978-3-319-95465-3_1

[5] Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), 137–144. https://doi.org/10.1016/j.ijinfomgt.2014.10.007

[6] Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.

[7] Columbia University. (2020). Computational Social Science | Data Science Institute. https://www.datascience.columbia.edu/computational-social-science

[8] FakerFact. (2017). FakerFact. About FakerFact. https://www.fakerfact.org/

[9] Oshikawa, R., Qian, J., & Wang, W. Y. (2020). A Survey on Natural Language Processing for Fake News Detection. ArXiv:1811.00770 [Cs]. https://arxiv.org/abs/1811.00770

[10] Schmidt, A., & Wiegand, M. (2017). A Survey on Hate Speech Detection using Natural Language Processing. Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, 1–10. https://doi.org/10.18653/v1/W17-1101

[11] Young, T., Hazarika, D., Poria, S., & Cambria, E. (2018). Recent Trends in Deep Learning Based Natural Language Processing. ArXiv:1708.02709 [Cs]. https://arxiv.org/abs/1708.02709

[12] Lopez Pinaya, W. H., Vieira, S., Garcia-Dias, R., & Mechelli, A. (2020). Chapter 10—Convolutional neural networks. In A. Mechelli & S. Vieira (Eds.), Machine Learning (pp. 173–191). Academic Press. https://doi.org/10.1016/B978-0-12-815739-8.00010-9

[13] Morales, A. J., Dong, X., Bar-Yam, Y., & ‘Sandy’ Pentland, A. (2019). Segregation and polarization in urban areas. Royal Society Open Science, 6(10), 190573. https://doi.org/10.1098/rsos.190573

[14] Minin, E. D., Fink, C., Hiippala, T., & Tenkanen, H. (2019). A framework for investigating illegal wildlife trade on social media with machine learning. Conservation Biology, 33(1), 210–213. https://doi.org/10.1111/cobi.13104

[15] Vosoughi, S., Mohsenvand, M. ‘Neo,’ & Roy, D. (2017). Rumor Gauge: Predicting the Veracity of Rumors on Twitter. ACM Transactions on Knowledge Discovery from Data, 11(4), 1–36. https://doi.org/10.1145/3070644

[16] Wexler, A., Davoudi, A., Weissenbacher, D., Choi, R., O’Connor, K., Cummings, H., & Gonzalez-Hernandez, G. (2020). Pregnancy and health in the age of the Internet: A content analysis of online “birth club” forums. PloS One, 15(4), e0230947. https://doi.org/10.1371/journal.pone.0230947

[17] Hassanpour, S., Tomita, N., DeLise, T., Crosier, B., & Marsch, L. A. (2019). Identifying substance use risk based on deep neural networks and Instagram social media data.

[18] Bell, A., Rich, A., Teng, M., Orešković, T., Bras, N. B., Mestrinho, L., Golubovic, S., Pristas, I., & Zejnilovic, L. (2019). Proactive advising: A machine learning driven approach to vaccine hesitancy. 2019 IEEE International Conference on Healthcare Informatics (ICHI), 1–6. https://doi.org/10.1109/ICHI.2019.8904616

[19] Johnson, N. F., Velásquez, N., Restrepo, N. J., Leahy, R., Gabriel, N., El Oud, S., Zheng, M., Manrique, P., Wuchty, S., & Lupu, Y. (2020). The online competition between pro- and anti-vaccination views. Nature, 1–4. https://doi.org/10.1038/s41586-020-2281-1

[20] Bullock, J., Luccioni, A., Pham, K. H., Lam, C. S. N., & Luengo-Oroz, M. (2020). Mapping the Landscape of Artificial Intelligence Applications against COVID-19. ArXiv:2003.11336 [Cs]. https://arxiv.org/abs/2003.11336

[21] Abd-Alrazaq, A., Alhuwail, D., Househ, M., Hamdi, M., & Shah, Z. (2020). Top Concerns of Tweeters During the COVID-19 Pandemic: Infoveillance Study. Journal of Medical Internet Research, 22(4), e19016. https://doi.org/10.2196/19016

[22] Lopez, C. E., Vasu, M., & Gallemore, C. (2020). Understanding the perception of COVID-19 policies by mining a multilanguage Twitter dataset. ArXiv:2003.10359 [Cs]. https://arxiv.org/abs/2003.10359

[23] Cinelli, M., Quattrociocchi, W., Galeazzi, A., Valensise, C. M., Brugnoli, E., Schmidt, A. L., Zola, P., Zollo, F., & Scala, A. (2020). The COVID-19 Social Media Infodemic. ArXiv:2003.05004 [Nlin, Physics:Physics]. https://arxiv.org/abs/2003.05004

[24] Chen, E., Lerman, K., & Ferrara, E. (2020). Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set. JMIR Public Health and Surveillance, 6(2), e19273. https://doi.org/10.2196/19273

Katherine Bond, ScD

Connecting ideas and people to improve health

4 years ago

This is a really helpful synthesis and exciting emergent field! Thank you, Tina D Purnat. Bob Spoer, you may be interested if you haven't seen it.
