Culturomics: Reflections on the Potential of Big Data Discourse Analysis Methods for Identifying Research Trends

Mulugeta Zewdu (Bu Saleh)

Independent Researcher at Independent Researcher on Common Cause system at Part-time-researcher

Published: November 14, 2018

Vered Silber-Varod, The Open University of Israel

Yoram Eshet-Alkalai, The Open University of Israel

Nitza Geri, The Open University of Israel

Online Journal of Applied Knowledge Management: A Publication of the International Institute for Applied Knowledge Management, Volume 4, Issue 1, 2016

Abstract

This study examines the potential of big data discourse analysis (i.e., culturomics) to produce valuable knowledge and suggests a mixed methods model for improving the effectiveness of culturomics. We argue that the importance and extent of using qualitative methods to complement quantitative ones depend on the scope of the analyzed data (i.e., the volume of the data and the period it spans). We demonstrate the merit of a mixed methods approach for culturomic analyses in the context of identifying research trends by analyzing changes over a period of 15 years (2000-2014) in the terms used in the research literature on learning technologies. The dataset was based on Google Scholar search query results. Three perspectives of analysis are presented: (1) curves describing five main types of relative frequency trends (i.e., rising; stable; fall; rise and fall; rise and stable); (2) the top key terms identified for each year; and (3) a comparison of data from three datasets, which demonstrates the scope dimension of the mixed methods model for big data discourse analysis. This paper contributes to both theory and practice by providing a methodological approach that, by integrating quantitative and qualitative research methods, makes it possible to draw insightful patterns and trends from culturomics.

Keywords: Culturomics, quantitative methods, discourse analysis, big data, textual analytics, learning technologies, mixed methods model for big data discourse analysis.

Introduction

Big data discourse analysis is a current trend in research and practice that aims to extract valuable knowledge from the abundant digital textual data available in the information age. In the context of identifying cultural trends, this approach was first termed culturomics in 2010 (Bohannon, 2010; Michel et al., 2011). Culturomics focuses on the quantitative exploration of massive datasets of digital text. In this study, we examine the potential of culturomics and suggest a mixed methods model for big data discourse analysis, which holds that quantitative methods alone are not sufficient for effective big data discourse analysis and that, in order to identify meaningful trends and patterns, qualitative methods should be applied alongside quantitative ones. The potential of culturomics, as well as the need for a mixed methods approach, is demonstrated empirically in this paper in the context of identifying research trends. We analyzed terminology related to the learning technologies research field over a period of fifteen years (2000-2014), using a dataset based on the search query results of Google Scholar, a freely accessible, large-scale diachronic database. Studies on the evolution of new research disciplines are usually conducted by experts in the field who have retrospective insights into it (inter alia, Belanger & Jordan, 2000; Laurillard, 2013).

Another conventional method for studying the evolution of a discipline is to obtain data through surveys (e.g., Hart, 2007-2014, an annual Learning Tools Survey compiled from the votes of 1,038 learning professionals from 61 countries). Both methods require the participation of experts and rely on human knowledge. Rather than employing either a quantitative or a qualitative approach for investigating discourse trends, we propose a mixed methods model for big data discourse analysis. The model provides a methodological approach for selecting the appropriate combination of quantitative and qualitative methods as a function of the data scope, which encompasses both the amount of data and the period it spans.

Theoretical background

The massive data-driven approach emerged during the 1990s from the field of corpus linguistics (Sinclair, 1991, 2004) as a standard methodology for learning about linguistic phenomena from textual (written or spoken) data (Webber, 2008). The main advantage of corpus linguistics is that it helps reveal patterns and features of a certain theme in big datasets, allowing a broader perspective on entire large corpora or texts rather than on specific phenomena within them (Rayson, 2008).

The term culturomics, the quantitative exploration of massive datasets of digital text, was coined by Michel et al. (2011). They demonstrated its utility by creating a dataset of trillions of words from 15 million books in Google Books and making it accessible to the public, so that researchers can search for patterns of linguistic change over time in the frequency distribution of words or phrases. Michel et al. (2011) claimed that applying unique text-analysis methods to massive corpora makes it possible to identify and measure cultural patterns of change, which are reflected in the language choices represented in the texts.

N-gram analysis is one of the main computational text mining techniques that culturomics uses. An n-gram is a sequence of n words. N-gram analysis is based on calculating the relative frequency with which a certain n-gram appears in a dataset (for a description of n-gram analysis, see Soper & Turel, 2012). Michel and Lieberman Aiden (2011) demonstrated the value of the culturomics approach in a historical study of flu epidemics, showing that a search for the term "influenza" produced a graph whose peaks correlated with flu epidemics that killed large numbers of people throughout history.
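To make the idea of n-gram relative frequency concrete, here is a minimal Python sketch. It assumes a toy corpus of plain-text snippets grouped by year and naive lowercase whitespace tokenization; the function names (`ngrams`, `relative_ngram_frequency`) and the sample sentences are hypothetical illustrations, not the tooling used by Michel et al. (2011) or Soper and Turel (2012).

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous run of n tokens (the n-grams of the sequence)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_ngram_frequency(texts_by_year: dict[int, list[str]], phrase: str) -> dict[int, float]:
    """Relative frequency of `phrase` per year: its count divided by the total
    number of n-grams of the same length in that year's texts."""
    target = tuple(phrase.lower().split())
    n = len(target)
    result: dict[int, float] = {}
    for year, texts in texts_by_year.items():
        counts: Counter = Counter()
        for text in texts:
            # Naive lowercase whitespace tokenization; punctuation is not stripped.
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result


# Toy sentences invented for illustration only.
corpus = {
    1918: ["the influenza epidemic spread", "influenza cases rose"],
    1920: ["the war ended", "trade resumed slowly"],
}
print(relative_ngram_frequency(corpus, "influenza"))
```

Dividing by the number of n-grams of the same length, rather than by raw word count, keeps frequencies of phrases of different lengths on comparable scales; this is one common convention, assumed here.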

Similarly, Bohannon (2011a) demonstrated the power of culturomics in his study of the Science Hall of Fame (SHoF), a pantheon of the most famous scientists of the past two centuries. The common criteria for including scientists in the SHoF are based on conventional, rigorous metrics of scientific excellence, such as the number of citations and journals' impact factors (Bar-Ilan, 2008). Bohannon (2011a) suggested culturomics as an alternative method for measuring a scientist's impact, by weighing the cultural footprint of scientists across societies and throughout history.

One of the more advanced uses of frequency analysis of massive texts is to measure "how fast does the bubble burst?" (Michel & Lieberman Aiden, 2011); in other words, what is the life span of a scientific trend or a buzzword before it disappears from the research literature? Michel and Lieberman Aiden (2011) found that the "bubble" bursts faster with each passing year, and claimed: "We are losing interest in the past more rapidly." Furthermore, the absence of certain terms or people during a period, or after a certain point in time, as revealed by a culturomic analysis, might provide insights into cultural phenomena (Bohannon, 2010). Following the rise of culturomics, it has been applied in many other disciplines (Tahmasebi et al., 2015), such as history, literature, culture, and social studies (e.g., Hills & Adelman, 2015; Willems, 2013), accounting (Ahlawat & Ahlawat, 2013), and physics (Perc, 2013).
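The paragraph does not specify how the speed of a "bursting bubble" is measured. One simple operationalization, assumed here rather than taken from Michel and Lieberman Aiden (2011), is the number of years from a term's peak relative frequency until it first falls to half of that peak. The function name `years_to_half_peak` and the sample series are hypothetical.

```python
from typing import Optional


def years_to_half_peak(freq_by_year: dict[int, float]) -> Optional[int]:
    """Years from a term's peak until its relative frequency first falls to
    half of the peak value or lower; None if that never happens in the data."""
    years = sorted(freq_by_year)
    # max() returns the earliest year if several years share the peak value.
    peak_year = max(years, key=lambda y: freq_by_year[y])
    half_peak = freq_by_year[peak_year] / 2
    for year in years:
        if year > peak_year and freq_by_year[year] <= half_peak:
            return year - peak_year
    return None


# Hypothetical yearly relative frequencies of a buzzword.
buzzword = {2000: 0.1, 2001: 0.4, 2002: 0.3, 2003: 0.15, 2004: 0.1}
print(years_to_half_peak(buzzword))  # 2
```

Under this reading, a shrinking value across successive buzzwords would correspond to bubbles that burst faster.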

Attempts have been made in various disciplines to apply a variety of quantitative computational methods to the massive volumes of available data in order to make informed predictions of trends (e.g., Radinsky, Agichtein, Gabrilovich, & Markovitch, 2011; Radinsky & Horvitz, 2013). Soper and Turel (2012) performed a culturomic analysis of the contents of Communications of the ACM, the monthly magazine of the Association for Computing Machinery (ACM), for the years 2000-2010, and demonstrated how it can be used to quantitatively explore changes over time in the identity and culture of an institution. Soper, Turel, and Geri (2014) used culturomics to explore the theoretical core of the information systems field. Other computational methods that have been used to identify trends of change in research fields include citation analysis, co-citation analysis, bibliographic coupling, and co-word analysis (Van Den Besselaar & Heimeriks, 2006).

The field of learning technologies undergoes frequent changes as new technologies emerge, are experimented with, and are adopted or abandoned in practice. These changes are reflected in the themes of publications in this research field (Geri, Blau, Caspi, Kalman, Silber-Varod, & Eshet-Alkalai, 2015). Raban and Gordon (2015) used bibliometric analysis methods to identify trends of change in the field of learning technologies over a period of five decades. While the quantitative methods that Raban and Gordon (2015) applied revealed some insights, some of their findings demonstrate the need for a complementary qualitative evaluation. Silber-Varod, Eshet-Alkalai, and Geri (2016) used a data-driven approach to analyze the discourse of all the papers published in the proceedings of the Chais Conference for the Study of Innovation and Learning Technologies during the years 2006-2014. The Chais Conference is held annually in Israel. By culturomic standards, Silber-Varod et al.'s (2016) corpus, which included 730,000 words (in Hebrew and in English) from 553 articles, is small. Their study demonstrated the potential of such an analysis by showing that Israeli researchers of learning technologies are concerned primarily with pedagogical aspects of technology adoption and use, and by identifying prominent key terms.

However, it also exemplified the limitations of quantitative methods in identifying trends over relatively short periods, and a profound qualitative interpretation of the results was required to gain valuable insights. Visualization tools, such as Wordle, have also made it possible to illustrate such trends more clearly (Viegas, Wattenberg, & Feinberg, 2009). A well-known example of a visualization tool that represents a prediction of technological change over time is Gartner's Hype Cycle of emerging technologies (Fenn, 1995), in which innovative technologies are plotted on a graph according to their maturity and their predicted assimilation or disappearance (Linden & Fenn, 2003).

Drawing on Gartner's Hype Cycle (Fenn, 1999) and on the implications of our culturomic study of learning technologies discourse (Silber-Varod et al., 2016), we hypothesize that the terminology of the learning technologies field would show five main patterns of change over time: rising; stable; fall; rise and fall; rise and stable. We argue that these patterns reflect the maturity status of the terms.

The rising curve reflects terminology in the positive hype phase. The fall curve reflects terminology in the negative hype phase. The rise and fall curve reflects terminology at the disillusionment point (Fenn, 1999), and the rise and stable curve reflects a plateau of productivity. Since our data spanned the years 2000-2014, we would expect to find the rising curve for terms that represent technologies and research trends that emerged in recent years, and the fall curve for older terms. In the stable curve, we would expect to find the oldest core terms of the research discipline, and in the rise and fall curve, hype terms that rapidly lost their relevance. Nevertheless, fully interpreting the findings would require additional qualitative analysis, as our mixed methods model for big data discourse analysis suggests.
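The paper does not state how each term's curve was assigned to one of the five types. As a hedged illustration only, the Python sketch below splits a term's yearly relative frequencies into early, middle, and late thirds and maps the direction of change between them to the five labels. The tolerance `tol=0.2` (a 20% change counts as a rise or a fall) and all names are hypothetical choices, not the authors' procedure.

```python
from statistics import mean


def _direction(before: float, after: float, tol: float) -> str:
    """'up', 'down' or 'flat', depending on the change relative to `before`."""
    if after > before * (1 + tol):
        return "up"
    if after < before * (1 - tol):
        return "down"
    return "flat"


# (early->middle, middle->late) movement mapped to the five hypothesized curve types.
CURVE_TYPES = {
    ("flat", "flat"): "stable",
    ("up", "up"): "rising",
    ("flat", "up"): "rising",
    ("down", "down"): "fall",
    ("flat", "down"): "fall",
    ("down", "flat"): "fall",
    ("up", "down"): "rise and fall",
    ("up", "flat"): "rise and stable",
}


def classify_curve(freqs: list[float], tol: float = 0.2) -> str:
    """Label a chronological series of yearly relative frequencies."""
    if len(freqs) < 3:
        raise ValueError("need at least three yearly values")
    k = len(freqs) // 3
    early = mean(freqs[:k])
    middle = mean(freqs[k:len(freqs) - k])
    late = mean(freqs[len(freqs) - k:])
    shape = (_direction(early, middle, tol), _direction(middle, late, tol))
    # A fall followed by a rise matches none of the five hypothesized types.
    return CURVE_TYPES.get(shape, "unclassified")


print(classify_curve([0, 0, 0, 0, 0, 0, 1, 2, 3]))  # rising
print(classify_curve([1, 1, 1, 5, 5, 5, 1, 1, 1]))  # rise and fall
```

The first example mimics a term whose rise begins during the examined period; such a rule only flags candidates, and the qualitative reading the model calls for would still be needed to interpret them.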

A mixed methods model for big data discourse analysis

The ubiquitous presence of digital textual information, much of it available online, calls for appropriate methods and tools for content analysis that can provide valuable knowledge. The information systems field has faced similar challenges of extracting knowledge from data for several decades. The common four-level hierarchy of knowledge extraction comprises, from bottom to top, Data, Information, Knowledge, and Wisdom; it is known as the DIKW hierarchy, the "Knowledge Pyramid," or the "Information Hierarchy" (Ackoff, 1989; Rowley, 2007; Zeleny, 1987). At the lower levels of this pyramid (data and information), mainly automatic quantitative tools are used to extract the relevant content required for decision-making.

However, at the higher levels of the pyramid (i.e., knowledge and wisdom), human intervention may be essential (Alavi & Leidner, 2001; Davenport & Prusak, 1998). Similar concerns have been raised regarding the interpretation of the findings of quantitative culturomic analyses (Bohannon, 2011b), emphasizing the need for human judgment.

Just as extracting knowledge and wisdom requires some degree of human intervention, we suggest that qualitative methods should be systematically integrated into culturomic analyses. We propose a mixed methods model for big data discourse analysis that represents the interaction between the data scope, which is the amount of data and the period it spans, and the type of appropriate analysis on a quantitative-qualitative scale. The larger the scope of a database, the more likely it is that quantitative methods will provide meaningful insights, and the less need there will be for complementary qualitative analyses. Table 1 presents a schematic description of the mixed methods model for big data discourse analysis. The scope construct, with its two dimensions (size of the dataset and period analyzed), provides a useful systematic approach for planning culturomic analyses. Moreover, the mixed methods model makes it possible to perform valuable culturomic analyses of databases for which one or both dimensions of scope limit the effectiveness of quantitative methods.

Table 1. A Mixed Methods Model for Big Data Discourse Analysis

Size of the dataset / Period analyzed | Long period of time | Short period of time
Relatively small | Mixed methods | Mainly qualitative methods
Relatively big | Mainly quantitative methods | Mixed methods
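Table 1 can be read as a simple decision rule. The sketch below encodes it in Python; because the model does not define numeric thresholds for a "relatively big" dataset or a "long" period, the function takes both dimensions as yes/no judgments that the researcher makes. The function name `recommended_methods` is hypothetical.

```python
def recommended_methods(dataset_is_big: bool, period_is_long: bool) -> str:
    """Return the Table 1 cell for a dataset's scope (size x period)."""
    if dataset_is_big and period_is_long:
        return "Mainly quantitative methods"
    if not dataset_is_big and not period_is_long:
        return "Mainly qualitative methods"
    # One scope dimension favors quantitative analysis, the other does not.
    return "Mixed methods"


for big in (True, False):
    for long_period in (True, False):
        print(f"big={big}, long={long_period}: {recommended_methods(big, long_period)}")
```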

_____________________________________________________

Methodology

Search query design - To identify trends and patterns of change over time in terminology related to learning technologies research, we used the freely accessible diachronic database of Google Scholar. Search engines are among the most important tools that Internet users rely on. Popular search engines, such as google.com and yahoo.com, are powered by information retrieval, data mining, and machine learning algorithms. According to Chen, Kiciman, and Wang (2008), conventional search engines return deterministic results. Such determinism yields dependable and reproducible results.

We compiled a list of 186 search terms. A similar number of search queries (200) was used by Segev, Aviv, and Baram-Tsabari (2014) to analyze temporal patterns of scientific information seeking on Google and Wikipedia over a period of 264 weeks between December 2007 and December 2012. The list of 186 search terms was based on two sources:

Keywords used in the 553 papers presented at the annual Chais Conference for the Study of Innovation and Learning Technologies, over the years 2006-2014, as reported in Silber-Varod et al. (2016).
The Directory of Learning & Performance Tools (Hart, 2007-2014), which lists over 2,000 terms related to learning and working in education and at the workplace. Terms were selected from the webpage "Top 100 Tools 2007-2014", which indicates the rating of each term since the first survey in 2007. From this list, we extracted the generic terms (not actual labels).

We cross-validated the tools and the keywords from both lists, thus creating a new list of 186 terms. Each of the 186 terms was searched twice according to the following setup:

1. Each query was performed using quotation marks around the term (e.g., "Facebook group") in order to retrieve only the exact phrase. In addition, a set of sub-terms was used as anchors for all the queries to place the results in the context of the learning technologies field. The anchor sub-terms were "learning technologies" OR "learning technology"; that is, the query results were papers that included at least one of these terms.

2. The queries were performed per year, from 2000 to 2014. In addition to these 15 years, we searched for all results up to 1999. Thus, each term had results for 16 periods, all with the set of anchor terms. In total, 2,976 search queries (186 terms × 16 periods) were carried out.

3. To validate the data, we repeated the search queries for a subset of 24 terms six months later. Although the absolute numbers changed (the second run returned more results), the difference was minor and, more importantly, the relative frequencies did not change at all. This validation helps to offset the limitations of Google Scholar as a research tool.
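The query setup in steps 1 and 2 can be sketched as an enumeration of (query, first year, last year) entries. The exact query strings the authors typed, and how the per-year restriction was applied in Google Scholar, are not reported, so the string format below is an assumption; the sketch only builds the query plan and does not contact any search service. The names `ANCHOR`, `PERIODS`, `build_query`, and `query_plan` are hypothetical.

```python
from typing import Optional

# Assumed rendering of the anchor sub-terms described in step 1.
ANCHOR = '"learning technologies" OR "learning technology"'

# Sixteen periods per term: everything up to 1999, then each year 2000-2014.
# A first year of None means "no lower bound".
PERIODS = [(None, 1999)] + [(year, year) for year in range(2000, 2015)]


def build_query(term: str) -> str:
    """Exact-phrase term followed by the anchor sub-terms."""
    return f'"{term}" {ANCHOR}'


def query_plan(terms: list[str]) -> list[tuple[str, Optional[int], int]]:
    """One (query, first_year, last_year) entry per term and period."""
    return [(build_query(term), start, end) for term in terms for start, end in PERIODS]


plan = query_plan(["Facebook group", "Moodle"])
print(len(plan))  # 32
print(plan[0])
```

With the full list of 186 terms, the plan has 186 × 16 = 2,976 entries, matching the number of queries reported in step 2.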

Limitations of Google Scholar search results - For the purpose of the present study, the advantages of Google Scholar lie in its massive volume of data, with each item identified by its year of publication, and in the free access it provides to all. However, search engines have some limitations, and several methodological aspects should be clarified:

Indeterminacy of date: Per-year queries are not 100% accurate, since some documents (i.e., articles) are not assigned to a specific year. In some cases, documents dated decades ago are assigned to recent years because a new platform or website publishes them. Thus, the assignment of papers to years is not fully accurate. Nevertheless, our empirical study suggests that these deviations are negligible.
Indeterminacy of context: Although the queries were carried out with anchor terms, the context of the search term is still vague to some extent, and not all papers in the results belong to the learning technologies field. The search term may appear only in a paper's reference list, or in an article that focuses on another research field.
Indeterminacy of results: The purpose of search engines is to give the user quick results for an input query. Therefore, even if a query returns tens of thousands of results, the probability that an average user will look at every one of them is practically zero. For our purposes, however, it is important to be aware that even tens of thousands of results should not be considered the "whole population." Thus, our knowledge of the full size of the query population is limited.

Nevertheless, although the results are not exact with regard to the counts of terms, their context, and the examined period, they reveal satisfactory trends, as demonstrated by the validity check described above (point 3).

Relative frequency and keyness analysis - The opening of the Internet to public use in the early 1990s resulted in a general exponential increase in all sorts of published content. Since the number of academic papers published each year keeps growing, a rising curve would be the most expected pattern for the raw frequency of almost any term. Therefore, following prior culturomic analyses (Soper & Turel, 2012), we used the relative frequencies of terms.

After recording the absolute frequency of each of the 2,976 query results, we calculated the relative frequency of each term t in each year X according to the following formula:

$$RF(t, X) = \frac{\text{number of search results for term } t \text{ in year } X}{\text{total number of all search results in year } X}$$
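A minimal Python sketch of this normalization, assuming that "all search results in year X" means the sum of the results of every term in the list for that year. The counts are invented for illustration and the function name `relative_frequencies` is hypothetical. The sketch also shows why a rerun that scales every count by the same factor leaves relative frequencies untouched, as in the validation check of point 3.

```python
def relative_frequencies(counts: dict[str, dict[int, int]]) -> dict[str, dict[int, float]]:
    """counts[term][year] = number of search results for that term in that year.
    Returns each term's share of all search results recorded for that year."""
    years = sorted({year for by_year in counts.values() for year in by_year})
    totals = {year: sum(by_year.get(year, 0) for by_year in counts.values()) for year in years}
    return {
        term: {year: by_year.get(year, 0) / totals[year] if totals[year] else 0.0 for year in years}
        for term, by_year in counts.items()
    }


# Invented counts, for illustration only (not the study's data).
counts = {"Moodle": {2004: 10, 2005: 40}, "e-learning": {2004: 90, 2005: 160}}
print(relative_frequencies(counts))

# Doubling every count (a proportional change in absolute numbers)
# leaves the relative frequencies unchanged.
doubled = {t: {y: 2 * n for y, n in by_year.items()} for t, by_year in counts.items()}
print(relative_frequencies(doubled) == relative_frequencies(counts))  # True
```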

Using relative frequencies allowed us to contextualize the number of search results within our own database and to create graphs that show trends of change over time in the research field of learning technologies, as represented by the distribution of major terms. Furthermore, we tracked unique occurrences of wording and terminology in each year. For this purpose, we used an analysis based on comparing each year's frequencies with a reference corpus. The reference corpus consisted of the previous years' frequencies, as this allows unique occurrences of a term in a given year to be identified (Rayson, Berridge, & Francis, 2004; Künstler, Maiwald, & Saage, 2008). The comparison was carried out using log-likelihood calculations (Anthony, 2011; Rayson, 2008), which make it possible to compare frequencies in a given corpus with those in a reference corpus. Keyness is a method that uses the log-likelihood ratio statistic to compare frequencies and then ranks terms by the significance of their differences. Thus, a high log-likelihood value represents high keyness of a certain term in a certain year, while lower values represent low keyness.
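The keyness step can be sketched in Python with the widely used two-corpus log-likelihood formula from corpus linguistics: with expected counts E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), LL = 2(a ln(a/E1) + b ln(b/E2)). Several details are assumptions rather than the authors' documented procedure: the "corpus size" for a period is taken as the total search results of all listed terms in that period, the reference corpus is the sum of all earlier periods, and a term counts as "growing" only if its share is higher in the study year than in the reference. The function names and counts are hypothetical.

```python
import math


def log_likelihood(a: int, c: int, b: int, d: int) -> float:
    """Two-corpus log-likelihood (G2): the term occurs a times in a study corpus
    of size c and b times in a reference corpus of size d."""
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:  # a term with zero count contributes nothing (0 * ln 0 := 0)
        ll += a * math.log(a / expected_study)
    if b > 0:
        ll += b * math.log(b / expected_ref)
    return 2 * ll


def top_growing_terms(counts: dict[str, dict[int, int]], year: int, k: int = 3) -> list[tuple[str, float]]:
    """Rank terms that are over-represented in `year` relative to all earlier years."""
    study = {t: by_year.get(year, 0) for t, by_year in counts.items()}
    reference = {t: sum(n for y, n in by_year.items() if y < year) for t, by_year in counts.items()}
    c, d = sum(study.values()), sum(reference.values())
    if c == 0 or d == 0:
        return []  # no data for this year, or no earlier years to compare against
    scored = [
        (t, log_likelihood(study[t], c, reference[t], d))
        for t in counts
        if study[t] / c > reference[t] / d  # keep only "growing" terms
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Invented counts, for illustration only (not the study's data).
counts = {
    "Wiki": {2004: 5, 2005: 5, 2006: 50},
    "e-learning": {2004: 50, 2005: 50, 2006: 50},
    "Internet": {2004: 45, 2005: 45, 2006: 20},
}
print(top_growing_terms(counts, 2006))
```

In this toy example only "Wiki" is over-represented in 2006, so it is the only growing term returned.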

Our research design yields systematic results from Google Scholar while minimizing the aforementioned limitations of search engines. Because our method is applied within a given research field, and the analysis is based on relative frequencies calculated from a given database, we contend that the word frequency observations represent accurate trends.

Results

The results presented in this section demonstrate the potential of quantitative big data discourse analysis for identifying research trends. We present three general perspectives of analysis. The first perspective uses curves to describe five main types of relative frequency patterns found throughout the years. The second perspective provides the top key terms for each year. These two analyses are based on the dataset of our Google Scholar query results. The third perspective demonstrates the scope dimension of the mixed methods model for big data discourse analysis, with data from three datasets.

Relative frequency curves

The rising curve is interesting for its starting point, i.e., whether the rise is evident from the first examined year (e.g., "Blended learning" in Figure 1) or begins during the examined period (e.g., "Facebook group" in Figure 1). However, in the following years a rising curve may turn into a rise and fall curve or a rise and stable curve.

Key Terms and Their Relation to the Emergence of Technologies

The keyness analysis is shown in, where we summarized the top three growing key terms (out of the 186 terms in the database) according to Log-Likelihood calculations. It can be seen in that "Internet" was a relatively key term in 2000, and "e-learning" emerged from 2000 until 2006. Learning Management System (LMS) was one of the top key terms in 2003, and Moodle started its emergence in 2005. The term Wiki was among the key terms from 2006 until 2011. In 2011, “social networks” was one of the top key terms, and in 2013, research on Massive Open Online Courses (MOOCs) emerged prominently.

Following the results in, we performed an exploratory study aimed at identifying the gap between the introduction of new technologies or systems and their emergence in the research discourse. We looked for the time the technologies related to the terms in emerged, and found the following: With regard to the Internet (key term in 2000), by the end of 1994 the total number of websites was still relatively small, but many notable websites were already active. Wikipedia (key term in 2006) was launched in 2001. Moodle's (key term in 2005) first version was released in 2002, Facebook (key term in 2011) was launched in 2004, and the first MOOC (key term in 2013) was launched in 2008. Calculating the gap between the appearance of each technology and its rise in research, as represented by the keyness results, we found that it took between three and seven years for a technology to emerge prominently in the learning technologies research field.
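The three-to-seven-year range can be checked directly from the years stated in this paragraph. The short Python computation below takes 1994 as the reference year for the Internet, which is an interpretation of "by the end of 1994"; the variable names are illustrative.

```python
# (year the technology appeared, year it was a top key term), as stated in the text.
milestones = {
    "Internet": (1994, 2000),
    "Wikipedia": (2001, 2006),
    "Moodle": (2002, 2005),
    "Facebook": (2004, 2011),
    "MOOC": (2008, 2013),
}

lags = {name: key_year - launch_year for name, (launch_year, key_year) in milestones.items()}
print(lags)  # {'Internet': 6, 'Wikipedia': 5, 'Moodle': 3, 'Facebook': 7, 'MOOC': 5}
print(min(lags.values()), max(lags.values()))  # 3 7
```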

The mixed methods model for big data discourse analysis suggests that as the scope of a database increases, the presented perspective broadens; hence, it is more likely that quantitative methods will provide meaningful results. As the scope decreases, researchers should be aware of the narrower perspective, and the need for complementary qualitative analyses to interpret the results increases.

Discussion

This paper introduced a novel mixed methods model for big data discourse analysis and contributed to both theory and practice by providing a methodological approach that, by integrating quantitative and qualitative research methods, enables knowledge to be gained from big data discourse analysis. Furthermore, the paper demonstrated the potential, as well as the limitations, of culturomics by applying it to identify trends in the learning technologies research field during the years 2000-2014. This paper focused on the methodological aspects of the suggested model. Therefore, further study, both quantitative and qualitative, is required to elaborate the analysis, interpret the results, and gain a deeper understanding of research trends in the learning technologies field.

The implications of the mixed methods model for big data discourse analysis go beyond the demonstrated application of identifying research trends. The mixed methods model could be applied to other sorts of research that involve textual data, as well as to practical applications, such as analyzing commercial textual databases. The five types of term curves (rising; stable; fall; rise and fall; rise and stable), along with the keyness analysis, may also serve as useful tools for identifying trends.

Conclusions

Culturomics (Bohannon, 2010; Michel et al., 2011) is an emerging research field that relies on quantitative analysis methods. This paper suggests that systematically adding a qualitative aspect to a culturomic analysis may considerably improve the potential for gaining insightful findings from big data discourse analysis, and it provides an approach for selecting the appropriate mix of quantitative and qualitative methods. We have empirically shown the potential of culturomics for identifying research trends. Then, by comparing several corpora, we demonstrated that corpora of different scope require a different mix of quantitative and qualitative research methods. Furthermore, the mixed methods model for big data discourse analysis increases the possibilities for performing effective culturomic analyses of databases that differ in their scope.

Science Magazine / DIGITAL DATA

Google Opens Books to New Cultural Studies

JOHN BOHANNON

In March 2007, a young man with dark, curly hair and a Brooklyn accent knocked on the door of Peter Norvig, the head of research at Google in Mountain View, California. It was Erez Lieberman Aiden, a mathematician doing a Ph.D. in genomics at Harvard University, and he wanted some data. Specifically, Lieberman Aiden wanted access to Google Books, the company’s ambitious (and controversial) project to digitally scan every page of every book ever published. By analyzing the growth, change, and decline of published words over the centuries, the mathematician argued, it should be possible to rigorously study the evolution of culture on a grand scale. “I didn’t think the idea was crazy,” recalls Norvig. “We were doing the scanning anyway, so we would have the data.”

The first explorations of the Google Books data are now on display in a study published online this week by Science (www.sciencemag.org/content/early/2010/12/16/science.1199644.abstract). The researchers have revealed 500,000 English words missed by all dictionaries, tracked the rise and fall of ideologies and famous people, and, perhaps most provocatively, identified possible cases of political suppression unknown to historians. “The ambition is enormous,” says Nicholas Dames, a literary scholar at Columbia University. The project almost didn’t get off the ground because of the legal uncertainty surrounding Google Books. Most of its content is protected by copyright, and the entire project is currently under attack by a class action lawsuit from book publishers and authors. Norvig admits he had concerns about the legality of sharing the digital books, which cannot be distributed without compensating the authors. But Lieberman Aiden had an idea.

By converting the text of the scanned books into a single, massive “n-gram” database—a map of the context and frequency of words across history—scholars could do quantitative research on the tomes without actually reading them. That was enough to persuade Norvig. Lieberman Aiden teamed up with fellow Harvard Ph.D. student Jean-Baptiste Michel. The pair were already exploring ways to study written language with mathematical techniques borrowed from evolutionary biology.

Their 2007 study of the evolution of English verbs, for example, made the cover of Nature. But they had never contended with the amount of data that Google Books offered. It currently includes 2 trillion words from 15 million books, about 12% of every book in every language published since the Gutenberg Bible in 1450. By comparison, the human genome is a mere 3-billion-letter poem.

Michel took on the task of creating the software tools to explore the data. For the analysis, they pulled in a dozen more researchers, including Harvard linguist Steven Pinker. The first surprise, says Pinker, is that books contain “a huge amount of lexical dark matter.” Even after excluding proper nouns, more than 50% of the words in the n-gram database do not appear in any published dictionary. Widely used words such as “deletable” and obscure ones like “slenthem” (a type of musical instrument) slipped below the radar of standard references. By the research team’s estimate, the size of the English language has nearly doubled over the past century, to more than 1 million words. And vocabulary seems to be growing faster now than ever before. It was also possible to measure the cultural influence of individual people across the centuries.

For example, notes Pinker, tracking the ebb and flow of “Sigmund Freud” and “Charles Darwin” reveals an ongoing intellectual shift: Freud has been losing ground, and Darwin finally overtook him in 2005.

Analysis of the n-gram database can also reveal patterns that have escaped the attention of historians. Aviva Presser Aiden led an analysis of the names of people that appear in German books in the first half of the 20th century. (She is a medical student at Harvard and the wife of Erez Lieberman Aiden.) A large number of artists and academics of this era are known to have been censored during the Nazi period, for being either Jewish or “degenerate,” such as the painter Pablo Picasso. Indeed, the n-gram trace of their names in the German corpus plummets during that period, while it remains steady in the English corpus. Once the researchers had identified this signature of political suppression, they analyzed the “fame trace” of all people mentioned in German books across the same period, ranking them with a “suppression index.” They sent a sample of those names to a historian in Israel for validation. Over 80% of the people identified by the suppression index are known to have been censored (for example, because their names were on blacklists), proving that the technique works. But more intriguing, there is now a list of people who may have been victims of suppression unknown to history.

“This is a wake-up call to the humanities that there is a new style of research that can complement the traditional styles,” says Jon Orwant, a computer scientist and director of digital humanities initiatives at Google. In a nod to data-intensive genomics, Michel and Lieberman Aiden call this nascent field “culturomics.” Humanities scholars are reacting with a mix of excitement and frustration. If the available tools can be expanded beyond word frequency, “it could become extremely useful,” says Geoffrey Nunberg, a linguist at the University of California, Berkeley. “But calling it ‘culturomics’ is arrogant.” Nunberg dismisses most of the study’s analyses as “almost embarrassingly crude.” Although he applauds the current study, Dames has a score of other analyses he would like to perform on the Google Books corpus that are not yet possible with the n-gram database. For example, a search of the words in the vicinity of “God” could reveal “semantic shifts” over history, Dames says. But the current database only reveals the five-word neighborhood around any given term. Orwant says that both the available data and analytical tools will expand: “We’re going to make this as open-source as possible.” With the study’s publication, Google is releasing the n-gram database for public use. The current version is available at www.culturomics.org.
