
Reflections on quantified data: #ScholarStrike in the context of COVID-19

Although the COVID-19 pandemic created a truly shared global context for the first time in years, it soon began to coexist with the local reality of each country. Twitter, as expected, was no stranger to this, and certain hashtags soon appeared that reflect this “localization” of the pandemic (for example, #coronacrisis in Argentina, in reference to the financial collapse resulting from a long lockdown and a weak economy inherited from the previous government). However, other hashtags less directly tied to the public health situation were soon resignified, or even emerged, within this context. In the United States, this was the case for #BlackLivesMatter and #ScholarStrike.

In this post we look into the particularities of the latter, following the approach we proposed in our previous post (“What can academic journals tell us about COVID-19 and Education?”): using quantitative analysis platforms developed by third parties (in the previous post we used AVOBMAT) to perform a text mining analysis, while evaluating the functionalities and limitations of each tool. #ScholarStrike seemed ideal to analyze with a “tailor-made” tool, since it is a hashtag that had a strong presence for a limited time (before the initiative, during it and a few days after).

For those unaware of the news from the U.S., Scholar Strike was an action and teach-in at universities that sought to recognize and raise awareness of the increasing number of deaths of African Americans and other minorities due to the excessive use of violence and force by the American police. For two days, September 8 and 9, professors, university staff, students and even administrators stepped away from their regular duties and classes to participate in classes (in some cases open to the public) on racial injustice, police surveillance and racism in the United States. Canadian universities held their own Scholar Strike on September 9 and 10. More information on the actions can be found on the official Scholar Strike site, as well as on its YouTube channel, where different scholars posted examples of teach-ins and other resources. The official site also includes a list of textual and audiovisual resources that could be used in the classes, as well as information on the media coverage of the Scholar Strike. Scholar Strike Canada also created an official website, which includes details of the programmed activities, resources, and links to the organizations that supported the initiative.

Our goal was to perform a text mining analysis on this hashtag, while also looking for terminological overlaps with directly related hashtags, such as #BlackLivesMatter, and with others more closely connected to the COVID-19 crisis.

To do this, we used two commercial Twitter text mining platforms: Brand24 and Audiense. Brand24’s official site describes the platform as a “web and social media monitoring tool with powerful analytics”. The tool looks for keywords provided by the user and analyzes them on different levels. It is mostly oriented towards brand analysis and the use of data in digital marketing. Audiense, on the other hand, as described on its official page, “provides detailed insights about any audience to drive your social marketing strategy with actionable and enriched real-time data to deliver genuine business results”. As the official descriptions make clear, both tools were developed for business use, although they can, of course, be adapted to any type of social media research.

Working with these platforms is almost the opposite of what we have been doing in this project. When working with our own database, we filter and curate the data and then analyze it through different tools and methods (term frequency, topic modeling). Here, the filters we can give the platform are few (we can choose the social media platform and set the date range). It is the platform itself that produces a series of daily results, which it also interprets automatically in the form of percentages, visualizations and infographics.
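For readers curious about what the filter-then-count side of that contrast involves, here is a minimal Python sketch of a date filter, a simple curation step and a term-frequency count. It assumes tweets are available as (date, text) pairs, and the stopword list and sample tweets are hypothetical; it is not the code behind our database or behind any of the platforms discussed here.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical, deliberately tiny stopword list; real projects use fuller,
# language-specific lists (especially for a bilingual English/Spanish corpus).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "are", "rt"}


def term_frequencies(tweets, start, end, top_n=10):
    """Filter (date, text) pairs to [start, end], drop verbatim duplicates,
    then count the remaining lowercase word tokens."""
    seen = set()
    counts = Counter()
    for day, text in tweets:
        if not (start <= day <= end):
            continue  # filtering: outside the chosen date range
        normalized = text.lower().strip()
        if normalized in seen:
            continue  # curation: skip exact duplicates (e.g. copy-pasted tweets)
        seen.add(normalized)
        # ASCII-only tokenizer; Spanish text would need a Unicode-aware pattern.
        tokens = re.findall(r"[a-z]+", normalized)
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


# Hypothetical sample data, for illustration only.
sample = [
    (date(2020, 9, 8), "Teaching on racial injustice for #ScholarStrike"),
    (date(2020, 9, 8), "Teaching on racial injustice for #ScholarStrike"),
    (date(2020, 9, 9), "Police violence and racial justice #ScholarStrike"),
    (date(2020, 8, 1), "An unrelated tweet outside the date range"),
]
print(term_frequencies(sample, date(2020, 9, 1), date(2020, 9, 13), top_n=3))
```

Every step here (which dates, which duplicates, which stopwords) is a visible choice made by the researcher; on platforms like Brand24 or Audiense the equivalent choices are made out of sight.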

We used Brand24 and Audiense in their 7-day trial versions. Broadly speaking, Brand24 proved far superior to Audiense. We performed the same searches on both, and the first thing we noticed was that Audiense’s results were heavily biased: all the tweets that we collected via the #ScholarStrike hashtag were negative, and all came from Trump supporters or the president himself.

Figure 1. Audiense report on #ScholarStrike.

Brand24, on the other hand, returned the data in a more neutral way. Once the platform finishes performing the search, it automatically sends an email to the project admin, and the user can choose to download a report. Data can be reviewed in the ‘Mentions’ tab, which is meant to let the user work on the data: from direct and Boolean search, through tagging, advanced filtering and deleting irrelevant mentions, to sentiment, which can be either assessed by the machine or changed manually, like so:

Figure 2. Mentions tab, Brand24.

Now let’s take a deeper look at the narrative that this platform offers for the search on #ScholarStrike.

We ran the first hashtag search on September 13, and Brand24 performed a retrospective search covering the previous 30 days (Aug 14, 2020 to Sept 13, 2020). Twenty-four hours after we set up the search, it allowed us to download a report and an infographic. The first report shows that, in general, the sentiment about the strike was positive (44 positive mentions against 21 negative ones):

Figure 3. Summary of #ScholarStrike mentions on social media from Brand24.

Since #ScholarStrike was an action that lasted just a couple of days, the mentions are naturally concentrated in that period, but it is remarkable how they grew on the third day after it started:

Figure 4. Graph of the volume of #ScholarStrike mentions on social media throughout the month of September.
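A volume graph like the one above essentially plots mentions counted per day. As a rough sketch (we do not know how Brand24 bins or deduplicates its mentions), daily counts and the peak day can be computed like this in Python; the mention dates below are hypothetical and are not Brand24 output.

```python
from collections import Counter
from datetime import date


def daily_volume(mention_dates):
    """Count mentions per calendar day, returned in chronological order."""
    return sorted(Counter(mention_dates).items())


def peak_day(mention_dates):
    """Return the (day, count) pair with the highest number of mentions."""
    return max(daily_volume(mention_dates), key=lambda pair: pair[1])


# Hypothetical mention dates, for illustration only (not Brand24 output).
mentions = (
    [date(2020, 9, 8)] * 3
    + [date(2020, 9, 9)] * 5
    + [date(2020, 9, 10)] * 9
)
for day, count in daily_volume(mentions):
    print(day.isoformat(), "#" * count)  # a crude text-mode bar chart
print("peak:", peak_day(mentions))
```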

Then, the platform gives us a visualization of the most salient terms across all social media.

Figure 5. Set of most salient terms in social media within the context of #ScholarStrike exchange.

Understandably, professor and teaching are key terms, since the action took place in that field, but, as we said at the beginning of the post, the intertwining with the Black Lives Matter movement is visible in terms such as racial, issues, September, police, injustice and black. It is interesting, although expected given its political use, that of the two most popular social network platforms, Facebook and Twitter, it is the latter that stands out. Another notable term is Butler. Out of context, Butler could be associated with the philosopher and theorist Judith Butler (widely cited for her thesis on the performativity of gender), who has also intervened actively in the Black Lives Matter movement through her publications in different media outlets and on social media. However, this term actually refers to Anthea Butler, professor of Religious Studies and Africana Studies at the University of Pennsylvania, who was one of the organizers of the Scholar Strike.

Next, the platform shows us the most active and the most recent users in terms of their activity on Twitter:

Figure 6. Most popular users and recent mentions in Twitter.

It is difficult to know whether the tool ranks the most popular users by number of tweets or by retweets. Judging from Figures 7 and 8, the calculation seems to be based on mentions, which appear to determine a user’s degree of influence on Twitter.
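To make this ambiguity concrete, the Python sketch below ranks the same set of tweets in two ways: by tweets authored and by mentions received. It is only an illustration of why the choice matters, not a reconstruction of Brand24’s metric; the tweet structure (an "author" and a "text" field) and the handles are hypothetical placeholders.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def rank_by_tweets(tweets):
    """Rank users by the number of tweets they authored."""
    return Counter(t["author"] for t in tweets).most_common()


def rank_by_mentions(tweets):
    """Rank users by how often other users mention them."""
    counts = Counter()
    for t in tweets:
        for handle in MENTION.findall(t["text"]):
            if handle.lower() != t["author"].lower():  # ignore self-mentions
                counts[handle] += 1
    return counts.most_common()


# Hypothetical tweets; the handles are placeholders, not real accounts.
tweets = [
    {"author": "user_a", "text": "Day one of teach-ins #ScholarStrike"},
    {"author": "user_a", "text": "Day two, more resources #ScholarStrike"},
    {"author": "user_a", "text": "Thanks to everyone who joined"},
    {"author": "user_b", "text": "Great session by @user_c #ScholarStrike"},
    {"author": "user_d", "text": "Following @user_c and @user_b today"},
]
print(rank_by_tweets(tweets))
print(rank_by_mentions(tweets))
```

In this toy example the most prolific account (user_a) does not appear at all in the mention-based ranking, while user_c, who never tweets, comes out on top: the same data can tell two quite different stories about influence.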

One thing that struck us, however, was the user ISASaxonists, a group of medievalists specializing in Anglo-Saxon medieval literature (Figure 6).

Figure 7. Most active public profiles on Twitter related to #ScholarStrike.

Figure 8. Most influential public profiles on Twitter.

Lastly, the platform shows the most used hashtags (and related to each other):

Figure 9. Most mentioned hashtags on Twitter, from the #ScholarStrike search.

#ScholarStrike, #BlackLivesMatter and #Covid are expected hashtags. Once again, the interesting thing here is the #medievaltwitter hashtag, in 13th place, which, although the platform does not make it explicit, is presumably related, for example, to the user ISASaxonists. If this is the case, it would be worth asking whether both #medievaltwitter and the tweets of ISASaxonists relate to the accusations made in 2019 against the International Society of Anglo-Saxonists for its failure to address issues of racism, sexism, diversity and inclusion within Anglo-Saxon studies. Part of this discussion was published in academic journals in the U.S. during September 2019.
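Brand24 does not document how it relates hashtags to each other, but one common way to build a list like this is co-occurrence counting: for every tweet that contains the seed hashtag, count the other hashtags that appear alongside it. A minimal Python sketch of that technique, with hypothetical sample tweets:

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def cooccurring_hashtags(texts, seed="scholarstrike"):
    """Count hashtags appearing in the same tweet as the seed hashtag.

    Hashtags are lowercased, and each tag counts at most once per tweet.
    """
    seed = seed.lower()
    counts = Counter()
    for text in texts:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if seed in tags:
            counts.update(tags - {seed})
    return counts.most_common()


# Hypothetical tweets, for illustration only.
texts = [
    "Teach-in today #ScholarStrike #BlackLivesMatter",
    "Remote classes and #ScholarStrike #Covid #BlackLivesMatter",
    "Medieval studies joins in #ScholarStrike #medievaltwitter",
    "No seed hashtag here #Covid",
]
print(cooccurring_hashtags(texts))
```

Note that the last tweet does not count towards #Covid, because it lacks the seed hashtag. Keeping track of which tweets produced each co-occurrence would also make it possible to check a hypothesis like the link between #medievaltwitter and a specific account.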

Overall, exploring the context of #ScholarStrike with the Brand24 platform allowed us to confirm some previous assumptions (its relationship with hashtags such as #BlackLivesMatter and #Covid), but it also surfaced other hashtags that a non-academic user would hardly expect, such as #medievaltwitter, as well as hashtags that appeared only faintly at first but gained impact in the following weeks, amid the electoral race, such as #bidenharris2020.

Gimena del Rio/Marisol Fila


What can academic journals tell us about COVID-19 and Education?

The Covid situation has put new terms into our everyday vocabulary, such as pandemic or infodemic. According to Wiktionary, the latter can be defined as:

Blend of information + epidemic


infodemic (plural infodemics)

  1. (informal) An excessive amount of information concerning a problem such that the solution is made more difficult.
  2. (informal) A wide and rapid spread of misinformation.

One good way of surviving the infodemic is analyzing data. AVOBMAT (Analysis and Visualization of Bibliographic Metadata and Texts) is a text mining research tool that was primarily designed for digital humanities research. It is a powerful digital toolkit for analyzing and visualizing bibliographic metadata and texts. AVOBMAT added a COVID-19 dataset to its new text mining research tool: a resource of over 138,000 scholarly articles (sadly, only in English), including over 69,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. Before delving into the sea of Twitter to see what is happening in relation to the pandemic and education (Higher Ed, remote teaching, etc.), we thought we should build a framework that could support and inform our hypotheses. We used AVOBMAT to explore what scientific journals published between 2019 and 2020 on these topics.

First, we ran a general Lucene query: we set the period (2019 and 2020) and chose some general words such as “syllabus”, “education” and “Coronavirus” (covering not only COVID-19 but all coronavirus diseases). The search returned 298 articles (all of them, of course, in English).
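For readers unfamiliar with Lucene, a query like this is essentially a Boolean AND over terms plus a date range. The Python sketch below mimics that logic over a small list of articles; it is an approximation, not AVOBMAT’s implementation. The field names ("year", "text") and the articles are hypothetical, and the plain word tokenizer is a simplification of Lucene’s analyzers (for instance, there is no stemming, so "coronaviruses" would not match "coronavirus").

```python
import re

# Roughly the Lucene query: syllabus AND education AND coronavirus AND year:[2019 TO 2020]
# ("year" is a hypothetical field name, not AVOBMAT's actual schema).


def tokens(text):
    """Lowercase word tokens; a simplification of Lucene's analyzers."""
    return set(re.findall(r"\w+", text.lower()))


def boolean_and_search(articles, terms, first_year, last_year):
    """Keep articles within the year range whose text contains every term."""
    wanted = {t.lower() for t in terms}
    return [
        a for a in articles
        if first_year <= a["year"] <= last_year and wanted <= tokens(a["text"])
    ]


# Hypothetical articles, for illustration only.
articles = [
    {"title": "A", "year": 2020, "text": "Coronavirus and medical education: a syllabus"},
    {"title": "B", "year": 2018, "text": "Coronavirus education syllabus"},
    {"title": "C", "year": 2019, "text": "Education during outbreaks"},
]
hits = boolean_and_search(articles, ["syllabus", "education", "Coronavirus"], 2019, 2020)
print([a["title"] for a in hits])
```

Article B is excluded by the date range and article C because it lacks two of the terms. Swapping "Coronavirus" for another term, or adding more terms, narrows or shifts the result set in the same way.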

Next, we wanted to see what this general search could tell us from a closer, though still distant, perspective. We chose the WordCloud visualization option, and this was the result:

WordCloud in AVOBMAT

Something we expected, and the cloud confirmed, was the presence of references to cities and countries (Wuhan, Hubei, China, Vellingiri) and to specific months (December, February, March). Since the situation in the US was not critical until April, we found a strong presence of the East. However, it is curious that other countries such as Italy, Spain and the United Kingdom, all of which were in a concerning situation in early 2020, were missing. We could explain these results by arguing that academic writing and publishing were slow to respond to this new context, and perhaps also that there was little interest in the topics we were looking at (syllabus, education, coronavirus). However, the real explanation lies in the coronaviruses themselves, specifically SARS-CoV (2002-2003) and MERS-CoV (2012 to the present). These earlier coronaviruses mainly affected countries in the East rather than the West, which explains the appearance of some of the cities mentioned above. In fact, it was not until March 2020 that outlets such as Inside Higher Ed and The Chronicle of Higher Education started publishing articles about Covid-19 and Higher Ed in the US. Earlier publications, from January and February 2020, discussed new challenges for Higher Ed in China, South Korea or Europe (Italy, Spain, UK) (see, for instance, the search we did on Inside Higher Ed).

All in all, it is really interesting that in this cloud education is related to medicine (healthcare, pharmacists, emergency, quarantine, transmission) and, obviously, to face, mask… and Google. And it is not only physical medicine that is referred to here: terms such as psychiatrist and mental also appear.

Keyword in Context (KWIC) in AVOBMAT. Education.

Finally, if we do a very close reading and analyze the metadata returned by the general search, we find that in most of the articles the term education refers to one of the variables that the researchers used to study the disease. For instance, in “A County-level Dataset for Informing the United States’ Response to COVID-19” by Benjamin D. Killeen et al. (2020), the authors state that they have used “300 variables that summarize population estimates, demographics, ethnicity, housing, education, employment and income, climate, transit scores, and healthcare system-related metrics.” In other cases, education appears as part of the name of a ministry: in the case of Iran, for example, the work of the Ministry of Health and Medical Education is widely cited.
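The Keyword in Context (KWIC) view used for this close reading is a classic concordance technique: every occurrence of a keyword is displayed with a few words of context on each side. A minimal Python sketch of the technique, applied to the Killeen et al. passage quoted above (the window size and tokenizer are our own illustrative choices, not AVOBMAT’s settings):

```python
import re


def kwic(text, keyword, window=3):
    """Return (left, keyword, right) triples for every occurrence of keyword,
    with up to `window` tokens of context on each side."""
    tokens = re.findall(r"[\w-]+", text)  # words, keeping hyphenated compounds
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# The passage quoted from Killeen et al. (2020).
passage = (
    "300 variables that summarize population estimates, demographics, ethnicity, "
    "housing, education, employment and income, climate, transit scores, and "
    "healthcare system-related metrics."
)
for left, kw, right in kwic(passage, "education"):
    print(f"{left:>35} [{kw}] {right}")
```

Here the single hit is ("demographics ethnicity housing", "education", "employment and income"): even this narrow window is enough to show that education is being used as a demographic variable rather than as a topic of study.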

Journals visualization in AVOBMAT

Therefore, it’s not easy to understand what this cloud is telling us. 

If we run a similar Lucene query, replacing Coronavirus with Covid-19 and keeping education and syllabus, we find 458 articles that show us these words:

WordCloud in AVOBMAT

Of course, places (Hubei, Wuhan, China) and months (January, February, March) are still there. Terms related to mental illness are also present (psychiatrist, mental), but quarantine now has a synonym that has been widely used in anglophone countries: lockdown. We also find words similar to Google (for example, Internet), newcomers such as WhatsApp, and others related to our new life, such as online, distance and telemedicine.

But what about education in the sense of teaching and learning? We refined our search using terms such as teaching, universities, learning, students and COVID-19. As a result, we got 199 articles in which these were the most used words:

WordCloud in AVOBMAT

Gathering versus lockdown, moodle, moocs, distance and gym give us a very realistic picture of the education scenario these days. The metadata visualization also shows that these topics are approached from the Medical Sciences, and it gives us a detailed picture of our global COVID-19 situation.

Journal visualization in AVOBMAT

As we suspected, most of the articles published about COVID-19 and its bearing on education, Higher Education, etc. are studies in the Medical Sciences. On the one hand, as expected, this is a dominant discipline in a pandemic context; on the other, it also shows how the Medical Sciences have sped up the usually slow pace of academic writing. Of course, we are not accounting for all publications on these topics, as many harvesting services from other regions are not included in the AVOBMAT service. Nevertheless, it gives us the big picture we need before moving, in our next post, to what tweets are saying on these topics. More distant and close reading coming soon!

Marisol Fila and Gimena del Rio Riande


COVID-19 and Higher Ed. A Look From the Digital Humanities

2020 opened with the news of a new disease. In a couple of weeks it became a global pandemic, and we have all been concerned with it ever since. Higher education is no exception: in the last few months, we have seen how discussions on the pandemic have reached the syllabi.

From Humanities to Sciences, all disciplines are having discussions on causes, local and global consequences, history, politics… all about COVID-19. Aligned with the spirit of our project, we believe that Digital Humanities can help us to grasp what, how, and where these topics are discussed in Higher Ed.

Over the next few months, we will be posting analyses and visualizations of the way syllabi are reacting to the global pandemic, and from which perspectives. Since we are relying on publicly available sources, our initial corpus will be composed of syllabi from the US, but we aim to open it up to Latin America as new material becomes available. Stay tuned!

Undergraduate Course Syllabi | National Communication Association

Project resources

Interested in knowing more or collaborating with our project?

Our main platform is this WordPress site hosted at the University of Miami where we will be posting resources on Covid-19 data, from a humanistic and linguistic perspective, and documenting our work.

All our data are stored in our GitHub repository, where in the near future we will provide a list of datasets related to the pandemic and a bilingual Twitter corpus in English and Spanish, focused especially on the South Florida area and Miami. We also use GitHub to document the development of the project, and we write blog posts about our work on this site.

We also have a Zotero library, where you can join us and add any reading you might find interesting.

Also, all our tweets are under the hashtag #DHCOVID.


Hello, world

Digital Humanities can help us understand data from a humanistic perspective, and this seems particularly true at a time when data about Covid-19 seem to be everywhere, generated in overwhelmingly large volumes. Data in a social, humanistic and human context need to be critically analyzed. Over one year (May 2020 to May 2021), Digital Narratives of Covid-19 (DHCOVID) will explore the narratives behind the data about the coronavirus pandemic in academic literature and social networks, using quantitative and qualitative DH approaches.

DHCOVID is a bilingual project: it focuses on data in English and Spanish and brings together scholars from the University of Miami (USA) and CONICET (Argentina). However, the project is open to collaboration and co-working with researchers from Digital Humanities or any other discipline interested in Covid-19 data.

The project has been funded by the College of Arts and Sciences at the University of Miami.

We will update this site with posts about our research and also disseminate the news on Twitter (#DHCOVID). Thanks for joining us!