Text Data Mining: Data Repositories

Data Repositories

Awesome Public Datasets

See the Natural Language category for a list of text corpora and n-grams for text analysis.

COVID-19 Open Research Dataset (CORD-19)

A free resource of over 47,000 scholarly articles, including over 36,000 with full text, about COVID-19 and the coronavirus family of viruses.

Data.gov

Catalog of hundreds of thousands of public data sets created at the city, state, and federal levels.

Inter-University Consortium for Political and Social Research (ICPSR)

ICPSR receives, processes, and distributes data on social phenomena in countries across the world. It maintains a data archive on topics in the social and behavioral sciences, including specialized collections in education, aging, criminal justice, substance abuse, terrorism, and other fields. The archive includes survey data, census records, election returns, economic data, and legislative records.

Programmable Web API Directory

Search over 15,000 APIs, or browse by categories.

re3data.org

Browse and search thousands of disciplinary, generalist, and institutional data repositories that include textual data.

reddit Datasets

A subreddit for sharing and discussing datasets.