LibGuides: Text Data Mining: Home

Purpose

This research guide offers guidance and support to members of the UCSB community who are interested in text data mining (TDM) of both open resources and resources that the UCSB Library subscribes to. Please use the tabs above to find more information and resources.

Introduction

Text data mining (TDM), or simply text mining, is the research process of deriving high-quality information and insights from text. Before you can mine text, however, you must first identify an appropriate data source and extract its data. This often involves large corpora, such as the text of books, journal articles, blogs and forums, and social media posts.

Text data mining of the Library’s subscription databases can be a long and complicated process. We strongly encourage you to contact our TDM Team (tdm@library.ucsb.edu) for guidance and assistance, so that your research plans comply with legal and licensing requirements. The team can also help you obtain data from various sources, negotiate with vendors for permission to text mine their content, and identify ways to preserve your own research data. Unauthorized use of programming tools such as Python, Selenium, web crawlers, and bots to scrape search results or journal content from resources licensed by the UCSB Library violates many of our licenses and can result in access being shut off for the entire university.

Please contact the UCSB Library for support in working with our vendors, so that what you plan to do with potentially copyrighted texts complies with legal and licensing requirements.

Text Data Mining Support

The UCSB Library provides:

A research guide covering the TDM process, contact information, and links to other resources
Librarian consultations for guidance on TDM options
Review of license agreements on file, and consultation with the California Digital Library
Negotiation with vendors to propose general text mining provisions in licenses for Library resources

The UCSB Library cannot assist researchers with:

Providing additional funds for TDM
Guaranteeing secure storage for text-mined vendor data
Providing licenses for individual text mining projects. Researchers must negotiate these licenses directly with vendors; the only exception is when a vendor requests an addendum to the library-wide license.
Enforcing user behavior and handling of vendor data

Library's Text Data Mining Team

Text Data Mining Team: tdm@library.ucsb.edu