The COVID-19 Twitter Corpus

Welcome to our corpus! You can browse tweets as .txt files organized by day, location, and language. Choose among our seven locations (South Florida, Spain, Mexico, Peru, Argentina, Colombia, Ecuador), two languages (English and Spanish), and any day after April 24, 2020.

Each file name encodes the date, language, and location. For example, “dhcovid_texts_2021-06-23_en_fl.txt” contains English-language tweets from South Florida posted on June 23, 2021. The advantage of this collection is that it spares you the hassle of hydrating tweet IDs: the tweets themselves are provided as plain .txt files that you can read directly and that open in almost any software or operating system.
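The naming convention can also be handled programmatically. The Python sketch below assumes every daily file follows the same `dhcovid_texts_<YYYY-MM-DD>_<language>_<location>.txt` pattern as the example above; that pattern is inferred from a single file name, and the helpers `parse_filename` and `build_filename` are hypothetical, not part of the corpus or the Coveet scripts.

```python
import re
from datetime import date
from typing import NamedTuple

# Hypothetical pattern inferred from the one documented example,
# "dhcovid_texts_2021-06-23_en_fl.txt". Only the South Florida code ("fl")
# is known, so any lowercase location code is accepted.
FILENAME_RE = re.compile(
    r"^dhcovid_texts_(\d{4})-(\d{2})-(\d{2})_([a-z]{2})_([a-z]+)\.txt$"
)


class DailyFile(NamedTuple):
    day: date
    language: str  # two-letter language code, e.g. "en"
    location: str  # location code, e.g. "fl" for South Florida


def parse_filename(name: str) -> DailyFile:
    """Split a daily file name into its date, language, and location parts."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name!r}")
    year, month, day, lang, loc = m.groups()
    return DailyFile(date(int(year), int(month), int(day)), lang, loc)


def build_filename(day: date, language: str, location: str) -> str:
    """Inverse of parse_filename: compose a daily file name from its parts."""
    return f"dhcovid_texts_{day.isoformat()}_{language}_{location}.txt"
```

For instance, `parse_filename("dhcovid_texts_2021-06-23_en_fl.txt")` yields the date 2021-06-23, the language `en`, and the location `fl`. Because only the South Florida code appears on this page, the regular expression accepts any lowercase location code rather than a fixed list.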

Examples of our daily tweet collections.

Additionally, if you scroll down the page, you will find collections by month, by year, and for the entire corpus, which give a broader view of our dataset.

All tweets so far by language and location
Annual collections in 2020
Monthly datasets by location and language

Once you open a file (take the April 24 EN-FL file, for example), you will see the text of each tweet on its own line. You can then use Voyant, our Coveet scripts, or other text analysis tools to explore the data further.
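Because each line holds one tweet, a file can be loaded and summarized with a few lines of code before you move on to a dedicated tool. This is a minimal Python sketch, assuming the files are UTF-8 encoded (an assumption, not something stated on this page); `read_tweets`, `top_words`, and the simple tokenizer are hypothetical helpers for illustration only.

```python
from __future__ import annotations

import re
from collections import Counter
from pathlib import Path

# Assumption: UTF-8 text with one tweet per line; blank lines are skipped.
def read_tweets(path: str | Path) -> list[str]:
    """Return the non-empty lines of a daily file, one tweet each."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# Deliberately simple, hypothetical tokenizer: word characters with an
# optional leading # or @ and an optional apostrophe suffix.
TOKEN_RE = re.compile(r"[#@]?\w+(?:'\w+)?")


def top_words(tweets: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Count lowercased tokens across all tweets and return the n most common."""
    counts = Counter(
        token.lower() for tweet in tweets for token in TOKEN_RE.findall(tweet)
    )
    return counts.most_common(n)
```

Calling `top_words(read_tweets("dhcovid_texts_2021-06-23_en_fl.txt"))` would list the ten most frequent tokens in that day's English-language South Florida tweets. Python's `\w` matches Unicode letters, so accented Spanish words such as "años" stay intact.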

To learn more about our data collection and corpus curation processes, please read our post “A Twitter Dataset for Digital Narratives.” Happy exploring!