Text Data Mining: Tools

Tools We Can Help You With

NVivo

NVivo is a licensed software program for qualitative research. Specifically, it is used to analyze unstructured text, audio, video, and image data, including (but not limited to) interviews, focus groups, surveys, social media, and journal articles. NVivo is available for both Windows and macOS; however, the macOS version lacks some features of the Windows version (see the comparison). NVivo is available to UCSB affiliates upon request (limited licenses). Contact the Interdisciplinary Research Collaboratory for more information about licensing and training.

OpenRefine

OpenRefine is a powerful tool with a graphical user interface for data cleaning, transformation, and analysis. It imports a variety of data formats and provides a spreadsheet-like environment for clustering, manipulating, and reconciling data. Check for upcoming OpenRefine Workshops at the Interdisciplinary Research Collaboratory webpage or reach out to rds@ucsb.library.edu for consultations.

 

Python

Python is a powerful programming language that, among other things, can help researchers extract textual data and analyze corpora. UCSB offers workshops and consultations on Python. Check for upcoming Python Workshops at the Interdisciplinary Research Collaboratory webpage or reach out to rds@ucsb.library.edu for consultations.
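As a rough illustration of the kind of corpus analysis Python makes possible, the sketch below counts word frequencies across a tiny corpus using only the standard library. The two sample sentences and the function names (tokenize, word_frequencies) are made up for this example; a real project would load its own texts and would likely need more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical two-document corpus, invented for illustration only.
corpus = [
    "Text mining helps researchers explore large collections of text.",
    "Researchers use Python to clean text and count words.",
]


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(documents):
    """Count how often each token appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


frequencies = word_frequencies(corpus)

# Show the most frequent tokens in the corpus.
print(frequencies.most_common(3))
```

Word counts like these are usually only a starting point; later steps often include removing common stop words or comparing frequencies between documents.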

 

R

R is a free software environment for statistical computing and graphics. There are a number of packages you may use for TDM (e.g., tidytext, stringr, openNLP, tm). Check for upcoming R Workshops at the Interdisciplinary Research Collaboratory webpage or reach out to rds@ucsb.library.edu for consultations.

 

Voyant

Voyant is an open-source, web-based application for performing text analysis. It supports scholarly reading and interpretation of texts or corpora, particularly by scholars in the digital humanities, but also by students and the general public. It can be used to analyze online texts or corpora uploaded by users. To learn more, contact Research Data Services (rds@ucsb.library.edu).

 

Social Media Data

 

Brandwatch

Brandwatch (formerly Crimson Hexagon) is a web-based library of social media posts (updated in real time) and a social media analysis software platform. Historical Twitter data is available going back 10 years from the access date.

 

Twarc

Twarc is a command-line tool and Python library for collecting and archiving Twitter JSON data via the Twitter API. It has separate commands (twarc and twarc2) for working with the older v1.1 API and the newer v2 API and Academic Access (respectively). 

Contact collaboratory@library.ucsb.edu or visit the Social Media Data Libguide for support and more information about these tools. 

Tools You May Explore Yourself

Constellate

Constellate is a project of ITHAKA JSTOR Labs. 

This beta project provides a Jupyter Notebooks-based dashboard and tutorials for analyzing the contents of the JSTOR platform, along with a growing body of other digitized content contributed by project collaborators.

 

Gephi

Gephi is a free and open-source visualization and exploration tool for all kinds of graphs and networks. 

 

Lexos

Lexos is a web-based tool to help you explore your favorite corpus of digitized texts. Our primary motivation is to help you find the explorer spirit as you apply computational and statistical probes to your favorite collection of texts. Lexos provides a workflow of effective practices so you are mindful of the many decisions made in your experimental methods.

 

VOSviewer

VOSviewer is a software tool for constructing and visualizing bibliometric/citation networks. These networks may, for instance, include journals, researchers, or individual publications, and they can be constructed based on citation, bibliographic coupling, co-citation, or co-authorship relations. VOSviewer also offers text mining functionality that can be used to construct and visualize co-occurrence networks of important terms extracted from a body of scientific literature.
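To make the idea of a term co-occurrence network more concrete, here is a minimal Python sketch that counts how many documents each pair of terms shares. It illustrates the general technique only and does not reproduce how VOSviewer itself extracts or weights terms. The term lists and the function name cooccurrence_counts are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical term lists, one per document (e.g. keywords from abstracts).
documents = [
    ["text mining", "citation", "network"],
    ["citation", "network", "journal"],
    ["text mining", "network"],
]


def cooccurrence_counts(term_lists):
    """Count how many documents each unordered pair of terms shares."""
    pairs = Counter()
    for terms in term_lists:
        # sorted(set(...)) drops duplicates and gives each pair a stable order.
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs


edges = cooccurrence_counts(documents)

# Each pair is an edge in the network; the count is its weight.
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

The resulting weighted pairs could be exported as an edge list and loaded into a network visualization tool for layout and exploration.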

 

HTRC Analytics

HathiTrust Research Center Analytics supports large-scale computational analysis of the works in the HathiTrust Digital Library to facilitate non-profit and educational research.