This research guide is intended to offer guidance and support to members of the UCSB community who are interested in text data mining (TDM) of both open resources and those that the UCSB Library subscribes to. Please navigate the tabs above for more information and resources.
Text data mining (TDM), or simply text mining, is a research process for deriving high-quality information and insights from text. Before you can mine text, however, you must first identify an appropriate data source and extract its data. Often, this involves large corpora such as the text of books, journal articles, blogs, forums, and social media.
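As a rough illustration of what "deriving information from text" can look like in practice, the following minimal Python sketch counts word frequencies across a tiny in-memory corpus. The sample sentences and the helper names `tokenize` and `term_frequencies` are hypothetical and chosen only for illustration; a real project would load texts from an open or properly licensed source and would likely use more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical in-memory corpus for illustration only; real texts would come
# from an open or properly licensed source.
corpus = [
    "Text mining derives information from text.",
    "Mining text at scale requires a licensed or open corpus.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(documents):
    """Count how often each token appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


frequencies = term_frequencies(corpus)
# Show the three most frequent tokens with their counts.
print(frequencies.most_common(3))
```

Word counts like these are only a starting point; they show the general shape of the workflow (collect text, break it into tokens, summarize) rather than any particular research method.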
Mining text from the Library’s subscription databases can be a long and complicated process. We highly encourage you to contact our TDM Team (tdm@library.ucsb.edu) for guidance and assistance to ensure that your research plans comply with legal and licensing requirements. The team can also help you obtain data from various sources, negotiate with vendors for permission to mine their content, and identify ways to preserve your own research data. Unauthorized use of programming tools such as Python, Selenium, web crawlers, or bots to scrape search results or journal content from resources licensed by UCSB Library violates many of our licenses and can result in access being shut down for the entire university.
Please get in touch with UCSB Library for support working with our vendors to ensure that what you are planning to do with potentially copyrighted texts complies with legal and licensing requirements.
The support provided by UCSB Library:
The UCSB Library cannot assist researchers with: