Text data mining (TDM) is an umbrella term that includes a wide range of methods and technologies for analyzing and processing semi-structured and unstructured text data to conduct semantic analysis, identify patterns and trends, and discover meaningful relationships to inform research. The main goal of TDM is to uncover previously unknown meaningful information in existing text corpora. In other words, TDM is the process of deriving information from machine-read resources through data extraction, recombination, and association.
Check the TDM Glossary to get familiar with some related terms.
Text data mining (TDM) methods can be applied to numerous academic research contexts to help you answer a variety of research questions. A few examples of TDM applications are listed below:
One of the most common uses of text data mining is the analysis of social media data to examine public opinion and discourse, for example to find out what group X thinks of topic Y. One way to approach this question is to mine tweets, blog posts, and other user-generated content (UGC) from social media and perform sentiment analysis on it.
You might want to explore citation patterns and trends within and across research domains by mining the bibliographies of papers.
Scholars interested in image recognition research may want to train a machine learning classifier. Mining data from a source (e.g., Flickr or Instagram) can provide a large volume of training data.
Linguists might explore text corpora to investigate trends in language usage, or to automatically classify the genre of a set of publications.
You might examine government records in bulk to analyze patterns in investments made over the years or shifts in political parties.
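To make the social media example above more concrete, here is a minimal sketch of lexicon-based sentiment scoring in Python. It is only an illustration: the word lists, the sample posts, and the function names (tokenize, sentiment_score, label) are hypothetical, and a real project would likely rely on a curated lexicon or a trained model rather than a handful of hand-picked words.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, for illustration only.
# Real studies would use a curated sentiment resource.
POSITIVE = {"good", "great", "love", "helpful"}
NEGATIVE = {"bad", "poor", "hate", "confusing"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return the number of positive words minus the number of negative words."""
    counts = Counter(tokenize(text))
    positive_hits = sum(counts[word] for word in POSITIVE)
    negative_hits = sum(counts[word] for word in NEGATIVE)
    return positive_hits - negative_hits


def label(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up sample posts standing in for mined social media content.
posts = [
    "I love the new library hours, great change!",
    "The search page is confusing and the results are bad.",
    "The workshop is on Tuesday.",
]

labels = [label(post) for post in posts]
for post, post_label in zip(posts, labels):
    print(f"{post_label:>8}: {post}")
```

Counting words against fixed lists is easy to inspect, but it ignores negation, sarcasm, and context, so it is best treated as a first pass rather than a finished analysis.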
Not sure how TDM methods can serve your research purposes? Contact us: tdm@ucsb.library.ucsb.edu