Lecture 13 - CHEM 184/284 (Chemical Literature) - Huber - Winter 2025 - LibGuides at University of California, Santa Barbara

Lecture 13: SciFinder, Part 3 - Exploring Substances: Substance Displays, Name Searching, Molecular Formula Searching, Property Searching, Biosequence Searching

SciFinder

Searching for Substances

SciFinder substance search screen

This is the opening screen for substance searching in SciFinder. In the CAS databases, chemical substances include: simple organic and inorganic substances, polymers, biomacromolecules (such as proteins and nucleic acids), metals and alloys, mixtures and more. Each substance receives its own CAS Registry Number (about which more below), including isotopically-labeled substances, stereoisomers, salts and ions of differing charges.
As indicated, you can use the search box to search by:
- Chemical Name - This includes trade names (e.g. Teflon), generic drug names (e.g. ibuprofen), common chemical names (acetone), acronyms (EDTA), systematic chemical names, and CAS inverted chemical names.
  - If you search by a single word chemical name, you may also retrieve substances in which your term is a part of the name. If you wish to retrieve only the single substance, put the word in quotes.
  - Chemical name searching may not retrieve all the variations on a substance, such as stereochemical variants, isotopically-labeled substances, the mineral version of a salt, etc.
  - You can use the asterisk wildcard to truncate single word names, and use quotation marks to enclose phrases.
  - You can enter multiple chemical names at the same time to find multiple substances. They must be separated by a space, not by commas or any other punctuation.
- CAS Registry Numbers - This is the Chemical Abstracts Service ID number for substances. Like a Social Security number, or a UCSB perm number, the number contains no information about its subject. It is purely an identifier.
- Recently, SciFinder added the ability to search functional groups in the basic substance search. This lets you identify groups of substances which contain the desired functional group.
- You may also search document identifiers, such as patent numbers or DOIs, to retrieve the substances found in that document.
- Note: At present, you cannot directly search other chemical identifiers, such as SMILES strings or InChI numbers. You can use InChI keys to search in SciFinder (see the drop-down menu of search options). You can, however, use SMILES or InChI identifiers in the structure drawing tool to generate a starting structure for searching. See Lecture 14.
- Document identifiers - Patent numbers, DOIs, PubMed IDs and CAS Accession Numbers can be used to retrieve the substances contained in the document identified.
You may also enter a DOI for a document, PubMed ID, CAS Accession Number (CAN) or a patent number, and retrieve the substances indexed in that document or patent.
To the right of the search box is the Draw button. Clicking it opens the SciFinder-n structure drawing tool, for finding substances by chemical structure. This will be discussed extensively in Lecture 14.
Below the search box is the link for Advanced Search, which will be discussed in detail below. You can use it to search by molecular formula, substance properties, experimental spectra and other fields. You can add multiple advanced search fields if desired. Unlike the advanced search fields in Reference searching, fields in Substance searching are automatically combined with AND.

Searching by Chemical Name; Substance Answer Sets

Entering a chemical name (in the example below, the trade name "aspirin") opens a dropdown menu of possible substances whose names start with aspirin.

SciFinder chemical name search for aspirin

Below is a substance search using four common names of over-the-counter analgesic and anti-inflammatory drugs. Note how the names are separated by space, not commas or other punctuation.

SciFinder substance search for four analgesics by chemical name

Note that in SciFinder-n, you can use Boolean operators, wildcards and parentheses in Substance name searching. Wildcard searching only truncates the specific term to which it is applied, not the whole of a complex name. However, the terms you enter may be searched within names. Pay close attention to your results sets to determine whether you are retrieving the answers you expect to get from your name search.
Below are the results of that search.

SciFinder substance answer set for four analgesics, part 1

SciFinder substance answer set for four analgesics, part 2

Looking at the display above, notice:
- Since we searched by chemical name, the name(s) searched are the names displayed in the brief record(s), e.g. Acetaminophen.
- Just below the description of the search terms are the buttons to retrieve References, Reactions and Suppliers associated with the (selected) substances in the answer set.
- Further to the right are the icons for Combine Answer Sets, Download, Add to Project and Save Options. For Substances, the download options are Excel, PDF, RTF and SDF. SDF stands for Structure-Data File. Note that there are limits on how many substances you can download; for most formats it's 1000 records, for Excel files it's 100 records at a time. If you have a larger answer set, you'll need to break it up into smaller chunks.
- To the immediate right of the Filter Behavior header is the number of substances in the answer set (4).
- Further to the right is the drop-down Sort menu. The default sort is Relevance. Also available are CAS Registry Number (RN), Molecular Formula, Molecular Weight, Number of References (all in ascending or descending order) and Number of Suppliers (descending order only). Sorting by CAS RN is essentially sorting by when the substance was added to the Registry database: the larger the number, the more recent the addition.
- To the right of that is the Record View drop-down menu. The default is Partial; the option is Full.
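The download limits noted in the list above (1000 records for most formats, 100 for Excel) mean that a large answer set has to be exported in batches. Below is a minimal Python sketch of that batching, assuming you already have the answer set as a list. The `batches` helper and the `DOWNLOAD_LIMITS` table are hypothetical names used for illustration; they are not part of SciFinder.

```python
from typing import Iterator, Sequence

# Per-download record limits as stated in this lecture (hypothetical lookup table).
DOWNLOAD_LIMITS = {"excel": 100, "pdf": 1000, "rtf": 1000, "sdf": 1000}

def batches(records: Sequence, fmt: str) -> Iterator[Sequence]:
    """Yield consecutive slices of `records` no larger than the limit for `fmt`."""
    size = DOWNLOAD_LIMITS[fmt.lower()]
    for start in range(0, len(records), size):
        yield records[start:start + size]

# Stand-in for an answer set of 2500 substance records.
answer_set = list(range(2500))
sdf_batches = list(batches(answer_set, "sdf"))
excel_batches = list(batches(answer_set, "excel"))
```

With these limits, the 2500-record stand-in needs three SDF downloads but twenty-five Excel downloads.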
To the left are the Filter options for Substances. As with References, you may opt to either Filter by or Exclude a given parameter. Again, only options that are relevant to your answer set appear. As with References, the top five possibilities display. If there are more, click on See More to get up to 10 answers, or a full table.
- Search within Results - Lets you open the structure drawing tool to search within the answer set for a particular structure or substructure and require or exclude that structure. Note that with small answer sets (that is, almost anything less than the full Registry file), even small structure fragments can be successfully searched.
- Reaction Role - What roles in reactions does a substance play? Product, Reactant, Reagent, Catalyst, Solvent. If a substance appears in a given role in even one reaction, it will be listed here.
- Reference Role - This is the counterpart to the Substance Role filter for References. If a substance has a given role in at least one reference, it will appear here. To view the table of all Reference Roles for substances in the answer set, click the View all link.
- Life Science Data - Which substances in the set have Structure-Activity Relationship or ADMET data (note, this option may go away when the CAS Life Sciences product is launched).
- Commercial Availability - Whether or not suppliers are available for a substance
- Number of Components - Salts, mixtures, copolymers, alloys, etc. will have more than one component.
- Molecular Weight - Lets you specify a range of molecular weights to filter your results.
- logP - You can filter by a range of the octanol/water partition coefficients.
- Stereochemistry - Is there at least one stereochemical center in the answer structure?
- Element - What elements are present in a substance in the set?
- Functional Groups - You may filter by (or exclude) a number of common organic functional groups, such as "Carboxylic acid" or "Amide"
- Aromatic Rings - The number of aromatic rings present in the substance
- Substance Class - Examples: Organic/Inorganic Small Molecule, Polymer, Biosequence, Mixture, etc.
- Isotopes - Are there any isotopically-labelled substances in the answer set?
- Metals - Do any of the substances in the answer set contain metals?
- Experimental Property - Lists the experimental properties (not the values of the properties) available for substances in the answer set.
- Experimental Spectrum - Lists the types of experimental spectra available for substances in the answer set.
- GHS Hazard Statements - Lists what hazard statements are available for the substance(s)
- Bioactivity Indicator - Lists biological activities that have been studied for substances in the answer set.
- Target Indicator - Lists biological targets (e.g. enzymes) for which substances in the answer set have been studied.
- Regulatory Data by Country/Region - Is regulatory information available in the database for substances in the answer set, broken down geographically.
- Regulatory Data by List - Same as the above, but broken down by list, such as EINECS or NIOSH.
- Reference Availability - Whether or not there are literature references available for the substance. While most CAS Registry substances have literature references, some, such as some substances registered for regulatory agencies, do not.
Below the filter list is Filter Content Report, which generates an Excel spreadsheet of selected filter data for this answer set.
On the right are the brief records for the substances in the answer set. Note that right-clicking on links here, as elsewhere in SciFinder-n, will open a new tab or window containing the linked information. This can be handy for moving back and forth between an answer set and individual answers.
- At the top of each record is a set of three dots. Clicking on them gives the options to Save the substance record, or to add it to a Project. Projects allow you to gather results from multiple different searches into a single saved location in your account.
- Just below that is the CAS Registry Number (or CAS RN) for the substance. Clicking on the RN will take you to the full record for the substance (see below for examples).
- To the right of the RN is an Expand link, which displays a more extensive brief record (see below for the aspirin record). This view adds key physical property data (where available) and a link to the experimental properties and spectra table for the substance.

SciFinder-n expanded view of aspirin brief record

Next you see the 2-D structure of the substance. Stereochemical bonds, if any, are indicated. Note that structures are only displayed if there is a known structure for the substance, and it contains 255 or fewer non-hydrogen atoms. Thus, most biosequences do not have a displayable 2-D structure. If you click on the structure, you get a pop-up "quick view" of the substance record (see below). The quick view includes
- the CAS RN,
- a brief name,
- the 2-D structure diagram, and links to
- Get Substance Detail (the full substance record),
- Get Life Science Data, the SAR and ADMET data, if any,
- Get Reactions (all reactions in which the substance is indexed),
- Synthesize (all reactions in which the substance is a product),
- Start Retrosynthetic Analysis (see Lecture 15 for a discussion),
- Get References (all references in which the substance is indexed), and
- Get Suppliers (all suppliers in the database for the substance.)
- The Edit Structure link opens the structure drawing tool and enters the substance's structure as a starting point for creating a new structure search.
- The -, Reset and + buttons affect the displayed size of the structure diagram.
- The Download icon lets you download the structure as an image, molecular structure file, or SMILES string.

SciFinder-n Quick View of the substance aspirin

Returning to the brief record, below the structure are:
- Molecular Formula in Hill order. Hill order is carbon, then hydrogen, then all other elements in alphabetical order. If carbon is not present, then all elements, including hydrogen, are in alphabetical order.
- Substance Name - in this case, the name which was used as a search term.
- Links to References, Reactions and Suppliers in which the substance appears.
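The Hill order rule described in the list above is easy to express in code. The following is a minimal Python sketch, assuming the formula is already available as a dictionary of element counts; `hill_formula` is a hypothetical helper name, not a SciFinder function, and the sketch does not handle the special formats used for polymers, salts or mixtures.

```python
from typing import Dict

def hill_formula(counts: Dict[str, int]) -> str:
    """Return a molecular formula string in Hill order from element counts."""
    if "C" in counts:
        # Carbon first, then hydrogen, then every other element alphabetically.
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # No carbon: strictly alphabetical, hydrogen included.
        order = sorted(counts)
    # A count of 1 is written without a subscript.
    return "".join(el + (str(counts[el]) if counts[el] != 1 else "") for el in order)

aspirin = hill_formula({"O": 4, "H": 8, "C": 9})          # carbon present
sulfuric_acid = hill_formula({"S": 1, "O": 4, "H": 2})    # no carbon
```

Here `aspirin` comes out as `C9H8O4`, while `sulfuric_acid` comes out as `H2O4S` because, without carbon, hydrogen simply takes its alphabetical place.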

CAS Registry Numbers

CAS Registry Numbers were first assigned to substances by Chemical Abstracts Service in the 1960s when they created a computerized database of substances to aid their indexers in determining whether a substance in a document they were indexing had previously appeared in the literature. CAS RNs are of the form: xxx-xx-x where the first number is 2-7 digits long, the second number is always two digits long and the third number is a check digit generated by an algorithm from the previous digits in such a way that most common mistakes in entering an RN would generate an invalid RN, rather than the RN for the wrong substance.
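The lecture does not spell out the check digit algorithm. The sketch below uses the commonly published rule: working from right to left over the digits before the check digit, multiply the first digit by 1, the next by 2, and so on, then take the sum modulo 10. The function names are hypothetical and the regular expression simply encodes the xxx-xx-x shape described above.

```python
import re

def cas_check_digit(body_digits: str) -> int:
    """Compute the check digit from the digits that precede it (hyphens removed)."""
    # The rightmost digit gets weight 1, the next weight 2, and so on.
    total = sum(int(d) * w for w, d in enumerate(reversed(body_digits), start=1))
    return total % 10

def is_valid_cas_rn(rn: str) -> bool:
    """Check both the xxx-xx-x format and the check digit of a CAS RN string."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", rn)
    if match is None:
        return False
    first, second, check = match.groups()
    return cas_check_digit(first + second) == int(check)

aspirin_ok = is_valid_cas_rn("50-78-2")      # aspirin
water_ok = is_valid_cas_rn("7732-18-5")      # water
transposed_ok = is_valid_cas_rn("50-87-2")   # two digits of aspirin's RN swapped
```

The transposed aspirin number fails the check, which illustrates how a typical typing mistake yields an invalid RN rather than a different substance.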

Every unique chemical substance gets its own RN, including stereoisomers, isotopically-labeled substances, mixtures, etc. One exception to this is that polymers which only differ in chain length or molecular weight do not get different RNs, nor do plastics which differ only in how they were processed. This is a long-standing CAS indexing policy, somewhat to the regret of scientists working in the plastics industry.

Note that CAS RNs are purely identification numbers, and do not convey any information about the structure or properties of the substances they represent. Most RNs are assigned by indexers in the course of indexing documents. Some are assigned at the request of chemical manufacturers or government agencies, and represent substances which have no published references. Note, too, that CAS RNs are the property of Chemical Abstracts Service and are not in the public domain. Reaction to this led to the creation of the InChI system (International Chemical Identifier) as an alternative which would be freely available to anyone.

Substance Detail (Full Substance Records)

Below is the substance detail for aspirin.

SciFinder full substance record for aspirin, part 1

SciFinder full substance record for aspirin, part 2

From the top:
- CAS Registry Number
- Links to References, Reactions, Suppliers for this substance. To the right, the Download, E-mail, and Save options
- 2-D Structure Diagram (Clicking on the structure gives the same pop-up window as clicking on the structure in the brief record shown above.)
- Molecular Formula in Hill order. To the right of that are the standard Safety Warning Symbols that apply to the substance. Clicking on them takes you to the GHS Hazard Statements section below.
- CAS Systematic Chemical Name (in inverted order)
- Key Physical Properties (the properties shown vary depending on the substance. These are fairly typical for a common organic molecule.)
- Then a series of drop-down lists, beginning with:
  - Other Names and Identifiers - These include the canonical SMILES string, InChI number and InChI key, where available, and any trade, generic and other chemical names used for the substance. This list can be VERY long - polyethylene has over 1000 names in its list!
  - Experimental Properties - These are given in tabular form, divided into tabbed sections by type of property. These sections will vary depending on what is available for the substance. Each property may or may not give actual numeric values, and may or may not have conditions associated with it (such as pressure for boiling points). All will have a link to the reference from which the property information was obtained.
  - Experimental Spectra - These too are listed in tabular form. If a spectrum listed says View, then the spectrum itself is available in SciFinder-n. Click on the link to get the spectrum, with source detail. The spectrum may be scaled up and down in size, or shifted left to right or up and down by clicking and dragging for better viewing. The spectrum may be freely downloaded as a JPG image. The SciFinder-n spectra do not, in general, give peak assignments. If the spectrum does not say View, then it will link to the SciFinder-n record for the source document.
  - Pharmacological Data - Includes data by receptor and organism
  - ADME (Absorption, Distribution, Metabolism and Excretion Data)
  - Toxicity Data
    - The three above sets of data are from the CAS Life Sciences product, and may cease to be available in SciFinder-n once CAS Life Sciences launches. The data in each category are in tabular form, and may be sorted by any of the data categories. All provide source references for each data entry, linked to the CAS record for the document.
  - Predicted Properties - This table of properties is calculated from the chemical structure with software created by ACDLabs and licensed by CAS. Among the tabbed lists for aspirin, you will see one labeled "Lipinski". These are named for Christopher Lipinski, who, while at Pfizer, described a set of criteria (the "Rule of Five") which could be used to estimate whether a given chemical is likely to be orally active as a drug. These involve molecular weight, the numbers of hydrogen bond donors and acceptors, and the relative solubility in water vs. organic solvents (logP).
  - Predicted Spectra - These are also generated by ACDLabs software, and may be downloaded like the experimental spectra mentioned above.
  - Bioactivity Indicators - A hierarchical list of the broad and narrow bioactivities described in the literature for the substance. Each bioactivity has the number of current documents containing the information. Clicking on the name of the bioactivity creates a Reference list of the relevant documents, which may then be manipulated like any other SciFinder-n reference list.
  - Target Indicators - Hierarchical list of the proteins (including enzymes) with which the substance has been shown to interact. Like the bioactivity indicators, the number of papers is shown for each protein target, and clicking on the link generates a Reference list of those papers. Note: For a widely-tested drug like aspirin, this list is VERY LONG!
  - Regulatory Information - Lists the names under which the substance is known in national regulations, the countries which regulate it, and the names of the documents in which the regulation appears. This information is derived from the CAS database, CHEMLIST. Note that SciFinder-n does not contain, or link to, the actual regulatory documents.
  - GHS Hazard Statements - GHS is an acronym for Globally Harmonized System. GHS label hazard statements are phrases that describe the nature of hazardous products and the degree of hazard. They're further organized and identified by GHS hazard statement codes, or H codes. You can learn more about the GHS hazard statements at the List of GHS Hazard Statements and How to Choose site. Users can filter the GHS table by code, hazard statement, class, and/or source to find the right information.
  - Additional Details - (not visible in the image above) Includes a list of Document Types in which the substance is referenced, Substance Classes to which it belongs, and Deleted CAS Registry Numbers. Deleted RNs occur when an indexer identified a substance in a document as a new substance, assigned an RN, and it is later found to be the same as a previously known substance. The newer RN is then deleted from the Registry file. However, since it is still attached to the original document(s), SciFinder-n automatically searches all the deleted RNs when you search for references to a substance - so you don't miss out on anything!
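The "Lipinski" tab mentioned under Predicted Properties relates to the Rule of Five, which is commonly stated as four thresholds: molecular weight no more than 500, logP no more than 5, no more than 5 hydrogen bond donors and no more than 10 hydrogen bond acceptors. The Python sketch below checks a set of property values against those thresholds. The class and function names are hypothetical, the thresholds are the commonly cited ones rather than anything taken from SciFinder, and the input values are illustrative only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LipinskiInput:
    molecular_weight: float  # g/mol
    log_p: float             # octanol/water partition coefficient
    h_donors: int            # hydrogen bond donors
    h_acceptors: int         # hydrogen bond acceptors

def lipinski_violations(s: LipinskiInput) -> List[str]:
    """Return a description of each Rule of Five threshold that is exceeded."""
    violations = []
    if s.molecular_weight > 500:
        violations.append("molecular weight > 500")
    if s.log_p > 5:
        violations.append("logP > 5")
    if s.h_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if s.h_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    return violations

# Illustrative values only: a small molecule and a large, greasy one.
small = lipinski_violations(LipinskiInput(180.16, 1.2, 1, 4))
large = lipinski_violations(LipinskiInput(900.0, 6.5, 7, 12))
```

In practice the rule is usually applied loosely (a single violation is often tolerated), so returning the list of violations is more useful than a bare yes or no.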

Sample Records for Other Classes of Substances

Polymers/Plastics

Note that when a substance answer set contains one or more polymers, an additional Filter appears: Polymer Class. This field describes the type of polymer present, such as: Polystyrene, Polyacrylic, Polyamide, Polyether, Fluoropolymer, etc.

SciFinder full substance record for styrene-butadiene copolymer

Above is the Substance Detail for a styrene-butadiene copolymer.
At top is the CAS Registry Number for the polymer shown. Immediately below are icons showing (and linking to) the number of References, Reactions and Suppliers currently available for the substance.
Note how the two monomers are treated as individual components of the polymer. Some polymers are graphically described with the structural repeating units (SRU) instead.
Note the molecular formula gives the two monomers in descending order of molecular formula in Hill order, enclosed in parentheses with an x subscript. Molecular formulas for SRU polymers use an n subscript. Both the n and x indicate an indeterminate length polymer.
Below that are the number of components in the polymer and the Polymer Class Terms which apply.
Note how the systematic name is written. Copolymers use "copolymer with"; homopolymers use "homopolymer". There are also Registry Numbers for "block" and "graft" polymers.
If you open the drop-down list, note how the Experimental Properties include categories relevant to plastics, such as Flow and Diffusion and Mechanical.

Biosequence

See below the substance record for human insulin, with the Sequence Details section expanded:

SciFinder substance record for human insulin, part 1

SciFinder substance record for human insulin, part 2

Note how there is no structure diagram given. Human insulin has more than 255 non-hydrogen atoms, so the Registry Record cannot record a 2D structure for the molecule. However, see the Sequence Details section.
Note that below the name, the record identifies the substance as a Protein/Peptide Sequence, gives the total sequence length (in amino acids) as well as the lengths of the two sub-chains, identifies the protein as Multichain, and provides a link to Related Sequences. Since CAS assigns separate Registry records for each distinct protein or polynucleotide, as well as distinguishing by source organism, there are many other "insulin" records besides the one we retrieved. Clicking on the link creates a Substance answer set of all the related sequences.
In Sequence Details, you get the amino acid sequences for each subchain, as well as information on the number, types and locations of modifications to the chains (in this case the Cys-Cys bridges between the two subunits). The sequences are given in standard one-letter codes for each amino acid, familiar to protein chemists. Polynucleotide chains use the standard A, T, C, G, and U.

Alloys and Tabular Inorganics

SciFinder-n full substance record for Monel alloy, part 1

SciFinder-n full substance record for Monel alloy, part 2

Above is the Substance Detail for a Registry record for Monel alloy.
Note the tabular composition display. This is common for metal alloys, as well as some other types of nonstoichiometric inorganic substances, like the high-temperature superconducting perovskites. The first column lists the components (usually elements, though occasionally metal oxides), the second column the molar percentage (or range of percentages) of each component, and the third column the CAS RN for the component.
Below that is the molecular formula in Hill order. Notice that there are no subscripts given for the elements. Below that is the number of components, and the CAS systematic name. The element with the highest molar percentage is considered the "base" of the alloy. The percentage ranges for it and the other elements are included in the name.
Again, note that the Experimental Properties categories given are ones appropriate for a metal alloy, such as Electrical and Mechanical.

Salts

SciFinder full substance record for sodium sulfate

As mentioned in Lecture 110, the CAS Registry system has a unique system of nomenclature for inorganic oxyacids and organic acids. Originally created to group salts of such acids alphabetically with the parent acid in printed indexes, this system treats the names, molecular formulas and structure diagrams of salts as derivatives of the parent acid.
In the example above, disodium hydrogen phosphate is treated as a mixture of phosphoric acid with two sodium atoms. Note the effect on the molecular formula, index name and structure diagram.

Mixtures

SciFinder-n full substance record for metformin-glipizide mixture

SciFinder-n full substance record for metformin-glipizide mixture, part 2

Above is the Substance Detail for a mixture of metformin and glipizide (two drugs often used for the treatment of hyperglycemia, that is, high blood sugar, in humans.)
Note that in addition to the mixture Registry Number, the structures and Registry Numbers for each component of the mixture are given.
The molecular formula gives the Hill order molecular formula of each component in descending alphabetical order. Note that no specific ratios of the two components are given. (CAS now has a database, Formulus, that gives detailed information on formulations, including states of matter, coatings and the like, for drugs and agrochemicals. It is a separate product on the SciFinder Discovery platform, aimed at industrial users. However, when you retrieve references for substances which are in formulations, you will see a Filter for formulation information appear. This can be useful in identifying which documents have detailed formulation information in their full text. Formulus will be discussed in detail in Lecture 15.)
Typically, for mixtures there is no experimental or predicted property information given, but for mixtures used as drugs or agrochemicals, there is frequently bioactivity indicator data.

Advanced Substance Search (Molecular Formulas, Substance Properties, Experimental Spectra)

SciFinder advanced substance search drop-down menu

Just below the main keyword search window in Substance Search is a link to the Advanced Substance Search (see image above). Note that some of the options listed have arrows by the name. This indicates sub-menus for that option. See below.
Advanced search fields include:
- Molecular Formula - Molecular formulas are generally entered in Hill order. Polymers, salts and mixtures have special formats.
- CAS Registry Number. Note the arrow opening Substance RN and Component RN options.
- Chemical Identifier. Note the arrow opening Chemical Name and InChI Key options.
- Document Identifier - Searching by document identifier (DOI, PubMed ID, CAN, etc.) will retrieve all substances indexed in the selected document.
- Patent Identifier - Searching by patent identifier (patent number, application number) will retrieve all substances indexed in the selected patent.
- Experimental Spectra
  - Values are in ppm. Currently, searchable experimental spectra include:
    - Proton NMR
    - Carbon-13 NMR
    - Nitrogen-15 NMR
    - Fluorine-19 NMR
    - Phosphorus-31 NMR
  - You may enter specific peaks in ppm, or ranges of ppm. Examples are given.
- Life Science Data - only one chemical identifier or disease may be searched at a time
  - Target
  - Ligand
  - Disease
- Biological
  - Bioconcentration Factor (predicted) - specific values or ranges
  - Median Lethal Dose (experimental) in mg/kg specific values or ranges. Note that you cannot specify the organism.
- Chemical Properties
  - Koc (predicted)
  - LogD (predicted)
  - LogP (predicted)
  - Mass Intrinsic Solubility (predicted) g/L
  - Mass Solubility (predicted) g/L
  - Molar Intrinsic Solubility (predicted) mol/L
  - Molar Solubility (predicted) mol/L
  - Molecular Weight
  - pKa (predicted)
  - Vapor Pressure (predicted) Torr
- Density
  - Density - can search both experimental and predicted values, or experimental values only g/cm3
  - Molar Volume (predicted) cm3/mol
- Electrical (experimental only)
  - Electrical conductance - S
  - Electrical conductivity - S/cm
  - Electrical resistance - ohm
  - Electrical resistivity - ohm*cm
- Lipinski (predicted only) - These are properties used to predict potential drug applications.
  - Freely Rotatable Bonds
  - Hydrogen Acceptors
  - Hydrogen Donor/Acceptor Sum
  - Hydrogen Donors
- Magnetic
  - Magnetic Moment (experimental) - muB
- Mechanical
  - Tensile Strength (experimental) - MPa
- Optical and Scattering
  - Optical Rotatory Power (experimental) - degrees
  - Refractive Index (experimental)
- Structure Related
  - Molar Surface Area (predicted) - A2
- Thermal
  - Boiling Point (experimental and predicted, or experimental only) - deg C. Note that SciFinder-n does not allow you to specify the pressure at which the boiling point is measured.
  - Enthalpy of Vaporization - kJ/mol
  - Flash Point - deg C
  - Glass Transition Temperature - deg C
  - Melting Point - deg C
Note that new searchable properties are added from time to time. Also note that for property values, units are specified. (In STNext, the searcher can choose which unit system to use. SciFinder does not have any units conversion facility.)
Also note, unlike Reaxys and STNext, SciFinder does not allow you to specify conditions, such as pressure for boiling points.
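Because SciFinder has no units conversion facility, values in other units have to be converted to the listed units (Torr, deg C, g/cm3) before they are entered. The following Python sketch shows a few such conversions using standard conversion factors; the function names are hypothetical and the selection of units is only an example.

```python
# 1 atm = 760 Torr = 101325 Pa, so one Torr is 101325/760 Pa.
PA_PER_TORR = 101325.0 / 760.0

def pa_to_torr(pressure_pa: float) -> float:
    """Convert a pressure in pascals to Torr (the unit used for Vapor Pressure)."""
    return pressure_pa / PA_PER_TORR

def kelvin_to_celsius(temp_k: float) -> float:
    """Convert a temperature in kelvin to deg C (used for boiling and melting points)."""
    return temp_k - 273.15

def kg_m3_to_g_cm3(density: float) -> float:
    """Convert a density in kg/m3 to g/cm3 (the unit used for Density)."""
    return density / 1000.0

one_atm_in_torr = pa_to_torr(101325.0)
boiling_water_c = kelvin_to_celsius(373.15)
water_density = kg_m3_to_g_cm3(1000.0)
```

When searching a range, convert both ends of the range and consider widening it slightly to allow for rounding in the source data.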
Advanced substance search fields may be combined with structure drawing searching. See Lecture 14 for more information.

Supplier Searching

SciFinder-n supplier search screen

Supplier searches are available as direct searches in SciFinder-n, though supplier information is always linked from a Registry Record where available. You may search by chemical name, CAS Registry Number or structure (for structure searching, see Lecture 14).
Entering a chemical name opens a dropdown menu of possible search terms beginning with those letters.
Below, see some of the results for a supplier search for "iron pentacarbonyl".

SciFinder-n supplier search results for iron pentacarbonyl, part 1

SciFinder-n supplier search results for iron pentacarbonyl, part 2

SciFinder-n supplier search results for iron pentacarbonyl, part 3

At the upper right are Download and E-mail options. You may download the records in an Excel file or PDF. Note that there are no Save or Alert options for supplier searching.
Alongside the Filter Behavior header is the total number of suppliers in the answer set. To the right is the drop-down Sort menu. The default is Relevance, but you can also sort by Supplier name (alphabetically or reverse alphabetically), Ships within (that is, how quickly the supplier can ship the desired substance), and Purity.
On the left are the Filter options, with the choice of Filter by or Exclude:
- Preferred Suppliers - This is a preference you can set up in your account, creating a list of preferred suppliers.
- Supplier - Displays the top five suppliers by frequency of appearance. You can click the See All link to get a full list.
- Purity - 99%+, 95-98% or 90-94%
- Quantity - Milligrams, grams, kilograms or bulk.
- Ships Within - 1, 2, 4, or 8 weeks.
- Stock Status - Maintained in stock, Typically in stock, or Synthesized on demand.
- Order from Supplier - indicates the presence of a link to the supplier's website.
- Country/Region
Below that is a Filter Content Report button, to create an Excel spreadsheet of selected filter values for this answer set.
In the brief records:
- Supplier name and country
- CAS Registry Number and Substance name (may contain purity information.)
- Purity (if available)
- Purchasing details - Includes Link to supplier (if available) and Quantities available
- Availability - In stock, etc.
- If you click on the link in the supplier name field, you get the Supplier Detail. See below for example.

SciFinder-n supplier detail for Strem Chemical, part 1

SciFinder-n supplier detail for Strem Chemical, part 2

SciFinder-n supplier detail for Strem Chemical, part 3

Supplier Detail includes:
- Contact Information: Name, Website URL, E-mail address, Phone number(s). Next to the name are click buttons to set this supplier as "Preferred" or "Non-preferred".
- Substance information: CAS RN, chemical name and structure diagram (if available)
- Item details - includes name, order number, quantities, when price was last updated, and link to supplier's website (where available)
- Additional Contact Information - such as mailing address, fax number if available.

Sequence Searching (see also SciFiner Sequence Search Help)

Though the CAS REGISTRY File has long contained records for biosequences (proteins and polynucleotides), Now there is Sequence searching in SciFinder-n, across a file containing the Registry biosequences, plus over 550 millionn additional sequences from the patent literature, plus the contents of the NCBI protein and nucleic acid databases, for a total of over one billion searchable biosequences.

Clicking the "Search CAS Sequences" icon on the SciFinder-n homepage will open a new tab with the Search CAS Sequences screen below.

SciFinder-n sequence search screen

SciFinder-n's sequence searching currently has three search options:
- The first (see above image) uses BLAST (Basic Local Alignment Search Tool), a common program for searching biosequences in many databases. It is essentially a similarity search for biosequences. Sequences are described using the IUPAC standard single-letter codes for amino acids or nucleotides. You can find the IUPAC nucleotide and amino acid abbreviations at https://www.bioinformatics.org/sms/iupac.html
- The second, CDR (Complementarity Determining Regions), is used for finding the particular protein sequences in antibodies or T-cell receptors that bind to antigens.
- The third, Motif, searches for short patterns in DNA, RNA, or proteins with queries enabled for additional variability.
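Before pasting a sequence into the search box, it can help to check that it uses only the IUPAC single-letter codes. The following is a minimal Python sketch of such a check, not part of SciFinder-n. The function names are hypothetical, and the alphabets are assumed to be the 20 standard amino acid letters and the nucleotide letters plus ambiguity codes listed on the IUPAC page linked above.

```python
# IUPAC single-letter codes. The nucleotide set includes the ambiguity codes
# (R, Y, S, W, K, M, B, D, H, V, N); the amino acid set is the 20 standard residues.
NUCLEOTIDE_CODES = set("ACGTURYSWKMBDHVN")
AMINO_ACID_CODES = set("ACDEFGHIKLMNPQRSTVWY")


def clean(sequence):
    """Remove whitespace and line breaks, and convert to upper case."""
    return "".join(sequence.split()).upper()


def invalid_letters(sequence, alphabet):
    """Return the sorted letters in the sequence that are not in the alphabet."""
    return sorted(set(clean(sequence)) - alphabet)


def guess_sequence_type(sequence):
    """Guess whether a sequence is nucleotide or protein from its letters.

    A short peptide made only of letters such as A, C, G and T is
    indistinguishable from DNA, so treat the result as a hint only.
    """
    letters = set(clean(sequence))
    if not letters:
        return "empty"
    if letters <= NUCLEOTIDE_CODES:
        return "nucleotide"
    if letters <= AMINO_ACID_CODES:
        return "protein"
    return "unknown"


print(guess_sequence_type("atg gcc ctg tgg"))
print(guess_sequence_type("MALWMRLLPL"))
print(invalid_letters("ATGXCC", NUCLEOTIDE_CODES))
```

A check like this only catches stray characters. It cannot tell you whether the sequence is biologically meaningful, and you still need to select nucleotide or protein yourself on the search screen.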

BLAST Searching

You may enter your sequence or sequences for searching either by typing in the sequence, or by uploading a .txt file with one or more sequences for searching. You may enter a maximum of 100 sequences at a time.
Note that on the right-hand side, you can select whether you are searching for a nucleotide or protein sequence. You also specify whether you are searching within the nucleotide or protein sequences. This is because every nucleotide sequence translates into a protein sequence, and vice versa. So, you can enter a nucleotide sequence and find all the protein records that contain the peptide sequence that the nucleotide sequence would translate into.
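To make the nucleotide-to-protein relationship concrete, here is a minimal Python sketch that translates a DNA or RNA sequence into single-letter amino acid codes using the standard genetic code. It illustrates the general idea only; how SciFinder-n performs the translation (for example, which reading frames it considers) is not described here, and the function name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate a nucleotide sequence in its first reading frame.

    RNA input is accepted (U is treated as T). Codons containing
    ambiguity codes are reported as "X". Trailing bases that do not
    fill a complete codon are ignored.
    """
    seq = "".join(nucleotides.split()).upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCCTGTGGATGCGC"))  # MALWMR
```

The reverse direction is not unique: most amino acids are encoded by more than one codon, so a single peptide corresponds to many possible nucleotide sequences.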
Use the drop-down menu to set the upper limit on the number of sequences retrieved. The default is 100; the range is 10 to 20,000.
Then, click the Start Sequence Search button to begin searching. Note that BLAST searching can be time-consuming. You will generally not receive an answer set immediately, but will be notified when it appears in your results history list.
The Advanced Sequence Search drop-down menu lets you fine-tune your search parameters. The display will vary depending on which combination of nucleotide/protein and within nucleotide/protein you have selected. Below is an example for the protein/within protein combination.

SciFinder-n advanced biosequence search menu

Sequence BLAST Search Results

Note: BLAST sequence searching can be time-consuming! Unlike general reference, substance and reaction searching, a BLAST search in SciFinder-n can take hours. So, SciFinder-n creates a record in your search history which displays "Searching" while the search proceeds. While it is searching, you may go on to do other SciFinder-n tasks, or even log off and return later. When the search is complete, you'll see a display like the one below.
The search history gives the date and time the search was initiated, a description of the type of search done, the sequence(s) searched, and links to View Results or Edit Search and redo it.
Note that while the search is saved in your Search History, you cannot, at present, create Search Alerts for Biosequence searches as you can for Reference, Substance or Reaction searches.

SciFinder-n biosequence search history link

Click on the View Results button to go to a display like the one below.
Below the sequences header is the number of sequences retrieved (the maximum was defined by you in setting up the search, but there may be fewer). At the right is a drop-down menu with a choice of Expanded (default) or Collapsed display.
At the left is the search description, followed by a link to the Bioscape Analysis tool (see below), and the filters for BLAST search results.
- E-value - Stands for Expect value. It represents the number of matches of this quality that would be expected to turn up by random chance; the lower the value, the more significant the match.
- Query coverage - What percent of the query sequence is included in the subject sequence
- Subject coverage - What percent of the subject sequence is included in the query sequence
- Sequence identity - What percentage of the two sequences are identical; essentially the product of the two above percentages.
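As a rough illustration of how the coverage and identity filters relate to an alignment, here is a minimal Python sketch. It assumes commonly used definitions (coverage as the aligned fraction of each full sequence, identity as matching positions over the alignment length); the exact formulas SciFinder-n uses are not given here and may differ. The function name and inputs are hypothetical.

```python
def alignment_stats(query_aln, subject_aln, query_len, subject_len):
    """Compute coverage and identity percentages for one pairwise alignment.

    query_aln and subject_aln are the two aligned strings, of equal length,
    with "-" marking gaps. query_len and subject_len are the lengths of the
    full, unaligned sequences.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have the same length")

    # Positions where both sequences carry the same residue (gaps never match).
    matches = sum(
        1 for q, s in zip(query_aln, subject_aln) if q == s and q != "-"
    )
    # Residues of each sequence that take part in the alignment.
    aligned_query = sum(1 for q in query_aln if q != "-")
    aligned_subject = sum(1 for s in subject_aln if s != "-")

    return {
        "query_coverage": 100.0 * aligned_query / query_len,
        "subject_coverage": 100.0 * aligned_subject / subject_len,
        "identity": 100.0 * matches / len(query_aln),
    }


# A 6-residue query aligned against part of a 12-residue subject.
stats = alignment_stats("MALWMR", "MALWTR", query_len=6, subject_len=12)
print(stats)
```

In this example the whole query is aligned (100% query coverage), but only half of the subject takes part (50% subject coverage), and five of the six aligned positions match.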
The individual results record:
- Shows the alignment of the query and target sequences. At right is the number of matches (amino acids in this example) and mismatches.
- Below that are three tabs:
  - Assignment - Gives the BLAST score, the E-value and the matching sequences
  - Subject - Gives the subject sequence's length, and its full sequence.
  - References - Gives the bibliographic references for the source patent(s) of the subject sequences. At the right is a References link which will take you to the SciFinder-n Reference records for the patents in question, including full family detail and PatentPak links, if available. If the sequence is from the NCBI databases, you'll find a link to the NCBI record for the sequence, which will contain more detailed information.
For more on results, see the Sequence Results Page - BLAST help page, https://scifinder-n.cas.org/help/#t=Working_with_Search_Results%2FBiosequences%2FBLAST%2FBiosequences_Page.htm

SciFinder-n BLAST results for human insulin protein sequence, part 1

SciFinder-n BLAST results for insulin sequence, part 2

SciFinder-n BLAST search results for insulin sequence, part 3

Bioscape Analysis

Bioscape is a visualization tool for the results of BLAST sequence searches in SciFinder-n. Clicking on the Create Bioscape Analysis button opens a new window/tab with the results of the Bioscape analysis. An example is shown below.

SciFinder-n Bioscape for BLAST search results

At the left are two toggle buttons. The top button opens a bar graph of the Sequence Similarity of the sequences displayed, and allows you to select a range of similarities to display. The second, Search, allows you to search for patents by keywords in the title or claims, or by Legal status: Undetermined, Active, Inactive or Pending.
The button at middle center lets you select the view of the sequence field.
The "peaks" represent sequences. Their distribution represents similarity of the sequence to the query sequence, and to one another. The "height" of each peak represents the number of patents containing the sequences.
Clicking on a "peak" opens a box showing the sequence length, the number of patents and the CAS Registry Number or numbers associated with the sequence. Clicking on that link will take you to the Substance record or records for that sequence. This is the most direct way to go from the Sequence search to the Substance records, and from there to other References besides the initial patents that may contain the sequence in question.
At lower right, clicking on the camera icon lets you capture a .PNG image of the Bioscape display.
For more information on Bioscape Analysis, see: https://scifinder-n.cas.org/help/#t=Working_with_Search_Results/Biosequences/Bioscape_Analysis.htm

Biosequence CDR Searching

SciFinder-n sequence search using CDR

CDR searching applies only to protein sequences, unlike BLAST or Motif searching, which can also apply to nucleotide sequences.
You can enter up to three CDR sequences for searching together.
Limit options allow you to specify the maximum number of retrievals (which can save search time).
CDR results are returned to your Search History, just as for BLAST results. The display features are essentially the same as for BLAST searching above.

Sequences Motif Searching

SciFinder-n sequence search by Motif

Motif searching is very similar to BLAST searching. You can search either protein or nucleotide sequences against either the nucleotide or protein sequence collections. You may use Advanced Biosequence Search to focus your search. The main difference is that since Motif looks at short sequences only, there is no function for uploading sequences; all must be keyed in directly.
Motif search results are sent to your search history just as in BLAST, and the results display has the same basic features as for BLAST results above.
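To illustrate what searching for a short pattern with built-in variability means, here is a minimal Python sketch that expands IUPAC nucleotide ambiguity codes into a regular expression and scans a sequence for matches. This is a generic illustration, not SciFinder-n's motif syntax, which is not described here; the function names are hypothetical.

```python
import re

# Each IUPAC nucleotide code expanded to the bases it can stand for.
IUPAC_NUCLEOTIDE_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Convert a motif written in IUPAC codes to a regular expression.

    Raises KeyError if the motif contains a letter that is not an
    IUPAC nucleotide code.
    """
    return "".join(IUPAC_NUCLEOTIDE_REGEX[letter] for letter in motif.upper())


def find_motif(motif, sequence):
    """Return (position, matched text) for every occurrence of the motif.

    Positions are 0-based. A lookahead is used so that overlapping
    occurrences are all reported.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    target = sequence.upper().replace("U", "T")
    return [(m.start(), m.group(1)) for m in pattern.finditer(target)]


# W stands for A or T, so TATAWA matches both TATAAA and TATATA.
print(find_motif("TATAWA", "GGTATAAAGGTATATACC"))
```

Here a single six-letter motif retrieves two different concrete sequences, which is the kind of variability a motif query allows.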

This work by Charles F. Huber is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Based on a work at guides.library.ucsb.edu
