Hathi Trust Data Mining: Home

This guide describes how to get started doing Text Data Mining (TDM) on Hathi Trust's 16+ million scanned and OCR'd books.

Hathi Trust Research Center

The Hathi Trust Research Center (HTRC) provides algorithms and computing environments that allow you to search and analyze 16+ million books and journals dating back to 1700. This is much more than full-text searching! The service is available to any UCSB researcher with a UCSBnetID.

You can use web-based, click-and-run tools on collections of volumes that you create, download metadata and word counts for every page of the corpus to explore on your own computer, or gain access to a secure virtual environment to run your own analysis and visualization tools.
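As a rough illustration of the "explore on your own computer" option, the sketch below adds up per-page word counts into totals for a whole volume. The dictionary layout (`pages`, `tokens`, `seq`) and the function name `volume_word_counts` are simplified assumptions made for this example, not the actual HTRC file format; check the HTRC documentation for the real structure of the downloadable files.

```python
from collections import Counter


def volume_word_counts(volume):
    """Sum per-page token counts into one Counter for the whole volume.

    `volume` is a hypothetical, simplified structure: a dict with a
    "pages" list, where each page has a "tokens" dict of word -> count.
    """
    totals = Counter()
    for page in volume["pages"]:
        for token, count in page["tokens"].items():
            # Fold case so "The" and "the" are counted together.
            totals[token.lower()] += count
    return totals


# Toy stand-in for a downloaded file; this is made-up illustration data.
sample_volume = {
    "id": "example.0001",
    "pages": [
        {"seq": 1, "tokens": {"The": 2, "whale": 1}},
        {"seq": 2, "tokens": {"the": 3, "sea": 2, "whale": 4}},
    ],
}

totals = volume_word_counts(sample_volume)
print(totals.most_common(2))
```

Because only counts are involved, this kind of analysis works without the full text of each page, which is why it can be done on your own computer.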

The HTRC keeps extensive documentation, but the best way to get started is to sign up for an account and experiment.

[Figure: pie chart of publication dates for Hathi Trust books]


Jon Jablonski
UCSB Library
University of California Santa Barbara
Santa Barbara, CA 93106
preferred pronouns: he/him/his