Text Data Mining: Citations & Metadata

Citations & Metadata Sources

Elsevier (ScienceDirect)

Researchers can text mine UCSB-subscribed journals and books published by Elsevier on the ScienceDirect full-text platform. Sign up for a developer account to use the Elsevier APIs for non-commercial purposes, and make sure to query the API from UCSB IPs to ensure full access.
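As a rough illustration of what a request to the Elsevier APIs might look like, the sketch below builds the URL and headers for retrieving one article by DOI. It assumes the Article Retrieval endpoint and the X-ELS-APIKey header as described in the Elsevier developer documentation; the helper name build_article_request and the DOI shown are hypothetical placeholders, and the request itself would still need to be sent from a UCSB IP address.

```python
from urllib.parse import quote

# Assumed endpoint: Elsevier Article Retrieval API, lookup by DOI.
ARTICLE_BY_DOI = "https://api.elsevier.com/content/article/doi/"


def build_article_request(doi, api_key, accept="text/xml"):
    """Return (url, headers) for fetching one article by DOI.

    The API key comes from an Elsevier developer account; the Accept
    header selects the response format.
    """
    url = ARTICLE_BY_DOI + quote(doi, safe="/")
    headers = {"X-ELS-APIKey": api_key, "Accept": accept}
    return url, headers


# Placeholder DOI and key, for illustration only.
url, headers = build_article_request("10.1016/j.example.2020.01.001", "YOUR_API_KEY")
```

The returned pair can be passed to any HTTP client, for example urllib.request.Request(url, headers=headers).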

Library of Congress: 25 million bibliographic metadata records

The Library of Congress (LOC) has released 25 million MARC records for free bulk download. MARC (MAchine-Readable Cataloging) is an international metadata standard for representing and communicating bibliographic and related information.
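To give a sense of how MARC records are laid out, here is a minimal sketch of a reader for the binary MARC exchange format (ISO 2709) using only the Python standard library. It assumes the downloaded files are binary MARC in UTF-8; an XML download would need a different parser. The function names are illustrative, and indicators are dropped for brevity.

```python
FIELD_END = b"\x1e"   # field terminator
SUBFIELD = b"\x1f"    # subfield delimiter


def iter_marc_records(stream):
    """Yield raw records from a binary file object.

    Leader positions 00-04 hold the total record length as five digits.
    """
    while True:
        head = stream.read(5)
        if len(head) < 5:
            return
        length = int(head.decode("ascii"))
        yield head + stream.read(length - 5)


def parse_marc_record(raw):
    """Parse one binary MARC record into {tag: [field, ...]}.

    Control fields (001-009) become strings; data fields become lists of
    (subfield_code, text) pairs.
    """
    base = int(raw[12:17].decode("ascii"))   # leader 12-16: base address of data
    directory = raw[24:base - 1]             # 12-byte entries, minus the terminator
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:])
        data = raw[base + start:base + start + length].rstrip(FIELD_END)
        if tag < "010":
            value = data.decode("utf-8", errors="replace")
        else:
            # The first chunk holds the two indicator bytes; skip it.
            value = [
                (chunk[:1].decode("ascii"), chunk[1:].decode("utf-8", errors="replace"))
                for chunk in data.split(SUBFIELD)[1:]
                if chunk
            ]
        fields.setdefault(tag, []).append(value)
    return fields
```

With a downloaded file opened in binary mode, each record from iter_marc_records can be passed to parse_marc_record, and the title statement would then appear under tag "245". For production work a dedicated MARC library is likely a better choice than this hand-rolled reader.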

New York Times APIs

The Article Search API provides access to headlines, abstracts, lead paragraphs, and more (but NOT full-text articles) from the New York Times, from 1851 to present.
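A minimal sketch of how a query to the Article Search API might be assembled and its results summarized, using only the Python standard library. It assumes the version 2 endpoint and the q, begin_date, end_date, page, and api-key parameters from the New York Times developer documentation; the helper names are illustrative, and an API key from a developer account is required.

```python
from urllib.parse import urlencode

# Assumed endpoint: Article Search API, version 2.
NYT_ARTICLE_SEARCH = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_article_search_url(query, api_key, begin_date=None, end_date=None, page=0):
    """Return a request URL; dates are strings in YYYYMMDD form."""
    params = {"q": query, "api-key": api_key, "page": page}
    if begin_date:
        params["begin_date"] = begin_date
    if end_date:
        params["end_date"] = end_date
    return f"{NYT_ARTICLE_SEARCH}?{urlencode(params)}"


def summarize_docs(payload):
    """Pull headline, abstract, and lead paragraph from a decoded JSON response."""
    docs = (payload.get("response") or {}).get("docs") or []
    return [
        {
            "headline": (doc.get("headline") or {}).get("main"),
            "abstract": doc.get("abstract"),
            "lead_paragraph": doc.get("lead_paragraph"),
            "pub_date": doc.get("pub_date"),
        }
        for doc in docs
    ]
```

The URL can be fetched with any HTTP client and the JSON body passed to summarize_docs. Results are paged, so a full harvest would step the page parameter while respecting the rate limits attached to the API key.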

Open Academic Graph

Downloadable citation datasets drawn from two large academic graphs: Microsoft Academic Graph (MAG) and AMiner.

PubMed and NLM: Data Guide

A guide to using E-utilities, the NCBI API, to access citation data for medical journal literature in PubMed and other NCBI databases, including the National Library of Medicine Catalog, MeSH, Gene, and PMC (PubMed Central).
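As a minimal sketch of a typical E-utilities workflow, the helpers below build an ESearch URL to find PubMed IDs for a query and an EFetch URL to retrieve the matching citation records. The endpoint paths and the db, term, retmax, retmode, id, and api_key parameters follow the NCBI documentation; the function names are illustrative, and the API key is optional.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="pubmed", retmax=20, api_key=None):
    """Return an ESearch URL that asks for matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def extract_ids(payload):
    """Return the ID list from a decoded ESearch JSON response."""
    return list((payload.get("esearchresult") or {}).get("idlist") or [])


def build_efetch_url(ids, db="pubmed"):
    """Return an EFetch URL that retrieves full records for the IDs as XML."""
    params = {"db": db, "id": ",".join(ids), "retmode": "xml"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"
```

A search for "text mining[Title]" would go through build_esearch_url, then extract_ids on the decoded response, then build_efetch_url on the resulting IDs. The same pattern applies to other NCBI databases by changing the db argument.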