Chem 184/284 Lecture 1: Overview of the Organization of Information
Why a full-quarter course on chemical information?
Because the subject is HUGE...
Chemistry strictly defined is large, and it overlaps into physics, biology, medicine, pharmaceutics, geology, materials engineering, forensics, etc.
For example: Chemical Abstracts Service
Indexes journal titles (over 10,000 per year currently; 50,000 different titles since 1907), plus patents from 63 patent-issuing authorities, conferences, reports, dissertations, preprints, etc.
Adds over 3,000 document records and 15,000 substance records per day.
From 1876 to the present, there are over 50 million abstracts of documents, over 203 million substances (over 155 million simple organic and inorganic substances, plus over 87 million biosequences) and over 107 million organic reactions indexed (as of Aug. 2019).
Similarly, the Reaxys database indexes over 118 million organic, organometallic and inorganic compounds, 59 million documents, and over 49 million reactions (as of Aug. 2019).
In many areas of chemistry, notably synthesis, the older literature is as relevant as the newest literature.
In many areas of chemistry, the patent literature is as important as the more familiar journal literature.
Because the subject is COMPLEX...
Chemists are interested in information which cannot be readily defined merely by keywords, such as ranges of numeric data, sets of substances with particular structural features, reactions with specified reactants and/or products and/or reaction conditions, or macromolecules (both biomolecules and synthetics) with particular sequences of structural units.
The terminology of chemistry, especially chemical nomenclature, is incredibly complex.
The patent segment of the literature is often written in terminology obscure even to trained chemists.
Because the tools available for chemists are RAPIDLY EVOLVING...
Only a couple of decades ago, there was very little on the Internet of interest to chemists. Now, traditional journals and databases have been reinvented for the World Wide Web, and new resources have sprung up.
Even resources available in electronic form for decades are becoming available in new forms, often both more sophisticated and more user-friendly. Free and open-access databases, articles and data sets have become an important part of the scholarly enterprise in chemistry.
RESULT: The chemical researcher can benefit from learning in some detail both the HOW of searching chemical information and the WHY of the ways in which it is organized.
Information as a physical entity
Information can be treated as a thermodynamic system, subject to entropy (as described by Claude Shannon, one of the founders of information science).
The organization of raw data turns it into information -- the better organized, the more value added.
Organization can be added at many levels...including the ultimate user.
End-user information processing puts the information in its final form for use. To use a chemical analogy, each end user puts some of his or her effort in as a new task each time -- a stoichiometric use of time and skill.
Information professionals try to create organization in ways that can be used by many people -- a catalytic process.
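As a rough illustration of the entropy idea above: Shannon's entropy, H = sum of p * log2(1/p), gives the average number of bits needed to say which category an item falls in. The sketch below is a toy under stated assumptions. The category labels and the two collections are made up, and it is not offered as a measure of how well any real body of literature is organized.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    # H = sum over categories of p * log2(1/p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Hypothetical collections of eight documents each (labels are invented).
evenly_spread = ["organic", "inorganic", "physical", "analytical"] * 2
single_category = ["organic"] * 8

print(shannon_entropy(evenly_spread))    # 2.0
print(shannon_entropy(single_category))  # 0.0
```

A collection spread evenly over four categories gives 2 bits of uncertainty per item; one in which every item is known to sit in a single category gives 0 bits.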
Types of scientific literature
PRIMARY -- The original publication of data: journals, patents, technical reports, conferences, dissertations, preprints, some books.
SECONDARY -- Publications which provide access to the primary literature: reviews, indexes, abstracts, data collections, etc.
Approaches to organizing the scientific literature
Classification and Data Collection -- physically grouping related data by some common element.
Indexing -- creating pointers to the original literature based on some piece of information in the original, e.g. author names or subject terms.
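The idea of an index as a set of pointers can be sketched in a few lines of Python. This is a minimal illustration only: the records, author names and subject terms are hypothetical, and build_index is a helper invented for this example, not part of any real indexing service.

```python
from collections import defaultdict

# Hypothetical records: (record id, author, subject terms).
records = [
    (1, "Smith", ["catalysis", "palladium"]),
    (2, "Jones", ["catalysis", "zeolites"]),
    (3, "Smith", ["zeolites"]),
]

def build_index(records, key):
    """Map each term produced by key(record) to the ids of matching records."""
    index = defaultdict(list)
    for rec in records:
        for term in key(rec):
            index[term].append(rec[0])  # the pointer back to the original
    return dict(index)

# Two indexes over the same records, built from different pieces of information.
author_index = build_index(records, key=lambda r: [r[1]])
subject_index = build_index(records, key=lambda r: r[2])

print(author_index["Smith"])       # [1, 3]
print(subject_index["catalysis"])  # [1, 2]
```

The records themselves are never moved or regrouped; each index simply points back to them, which is what distinguishes indexing from classification.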
Classification & Data Collection
Libraries use classification schemes to group related books together for browsing by subject. In the Library of Congress system, chemistry materials fall under QD.
Data collections bring information from various primary sources for easier location, e.g. the CRC handbook series.
Indexing for Subject Access
Some indexes use keywords from the original; others use standard subject vocabularies.
In US libraries, terms are assigned from the Library of Congress Subject Headings (LCSH).
MEDLINE (PubMed) uses the Medical Subject Headings (MeSH).
Chemical Abstracts uses its Index Guide, which in STN online is called the CA Lexicon.
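What a standard subject vocabulary does can be sketched as a lookup from the many terms a searcher might use to one preferred heading. The table below is hypothetical and written only for illustration; its entries are not copied from LCSH, MeSH or the CA Index Guide, and normalize is an invented helper.

```python
# Hypothetical thesaurus: entry term -> preferred heading.
# Illustrative only; not taken from LCSH, MeSH or the CA Index Guide.
PREFERRED = {
    "vitamin c": "ascorbic acid",
    "l-ascorbic acid": "ascorbic acid",
    "ascorbic acid": "ascorbic acid",
    "aspirin": "acetylsalicylic acid",
    "acetylsalicylic acid": "acetylsalicylic acid",
}

def normalize(term):
    """Return the preferred heading for a term, or None if it is not listed."""
    return PREFERRED.get(term.strip().lower())

print(normalize("Vitamin C"))    # ascorbic acid
print(normalize(" aspirin "))    # acetylsalicylic acid
print(normalize("willow bark"))  # None
```

A keyword index would file "vitamin C" and "ascorbic acid" separately; a controlled vocabulary brings them together under one heading, at the cost of someone having to maintain the table.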
Tradeoffs in information access
All information organization and retrieval involves tradeoffs. Information professionals refer to this as specificity vs. collation or relevance (also called precision) vs. recall, but in simple terms, it is the tradeoff between maximizing the retrieval of useful data and minimizing the retrieval of useless data.
Specific headings avoid the irrelevant.
General headings bring like items together.
Searching narrowly avoids having to look at irrelevant items at the cost of missing some relevant material.
Searching broadly helps ensure that nothing is missed, but may require later screening to eliminate irrelevancies.
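The narrow vs. broad tradeoff can be put in numbers using two standard measures: precision (the fraction of retrieved items that are relevant) and recall (the fraction of relevant items that are retrieved). The sketch below uses made-up record ids and assumes the full set of relevant items is known in advance, which is rarely true in a real search.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved record ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: records 1-4 are the ones actually wanted.
relevant = {1, 2, 3, 4}
narrow_search = {1, 2}                    # nothing irrelevant, but half is missed
broad_search = {1, 2, 3, 4, 5, 6, 7, 8}   # nothing missed, but half is noise

print(precision_recall(narrow_search, relevant))  # (1.0, 0.5)
print(precision_recall(broad_search, relevant))   # (0.5, 1.0)
```

The narrow search returns only useful records but misses two of the four; the broad search finds all four but leaves four irrelevant records to screen out afterward.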
Information Users, Information Professionals and the Quest for Knowledge
The information user brings a perceived need or needs. Sometimes the information professional can help define what is really needed.
The information professional can suggest how best to meet the needs of the user with the available technology.
Other information professionals develop the tools and technologies for searching.
The information user has to set priorities based on the ultimate objective and the time, labor and money available for searching.
Together, they evolve the strategy needed for extracting needed information from the universe of scientific publication.
In this course, you will learn about specific tools and how to use them, and also how to generally develop a strategy for finding scientific information.