CHEM 184/284 (Chemical Literature) - Huber - Winter 2022: Lecture 13

A two-credit course in the techniques and tools for effectively searching the literature of chemistry, biochemistry, chemical engineering and related fields.

Lecture 13: SciFinder, Part 3 - Exploring Substances: Substance Displays, Name Searching, Molecular Formula Searching, Property Searching, Biosequence Searching

Searching for Substances

SciFinder-n search for substances opening screen

  • This is the opening screen for substance searching in SciFinder-n. In the CAS databases, chemical substances include: simple organic and inorganic substances, polymers, biomacromolecules (such as proteins and nucleic acids), metals and alloys, mixtures and more. Each substance receives its own CAS Registry Number (about which more below), including isotopically-labeled substances, stereoisomers, salts and ions of differing charges.
  • As indicated, you can use the search box to search by:
    • Chemical Name - This includes trade names (e.g. Teflon), generic drug names (e.g. ibuprofen), common chemical names (acetone), acronyms (EDTA), systematic chemical names, and CAS inverted chemical names.
      • If you search by a single-word chemical name, you may also retrieve substances in which your term is a part of the name. If you wish to retrieve only the single substance, put the word in quotes.
      • Chemical name searching may not retrieve all the variations on a substance, such as stereochemical variants, isotopically-labeled substances, the mineral version of a salt, etc.
      • You can use the asterisk wildcard to truncate single-word names, and use quotation marks to enclose phrases.
      • You can enter multiple chemical names at the same time to find multiple substances. They must be separated by a space, not by commas or any other punctuation.
    • CAS Registry Numbers - This is the Chemical Abstracts Service ID number for substances. Like a Social Security number, or a UCSB perm number, the number contains no information about its subject. It is purely an identifier.
    • Note: At present, you cannot directly search other chemical identifiers, such as SMILES strings, InChI strings or InChI keys, in SciFinder-n. You can, however, use SMILES or InChI identifiers in the structure drawing tool to generate a starting structure for searching. See Lecture 14.
    • Document identifiers - Patent numbers, DOIs, PubMed IDs and CAS Accession Numbers can be used to retrieve the substances contained in the document identified.
  • You may also enter a DOI for a document or a patent number, and retrieve the substances indexed in that document or patent.
  • To the right of the search box is the Draw button. Clicking it opens the SciFinder-n structure drawing tool, for finding substances by chemical structure. This will be discussed extensively in Lecture 14.
  • Below the search box is the link for Advanced Search, which will be discussed in detail below. You can use it to search by CAS Registry Number, chemical name, document or patent identifier, experimental spectra and substance properties. You can add multiple advanced search fields if desired. Unlike the advanced search fields in Reference searching, fields in Substance searching are automatically combined with AND.
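
The name-entry rules above (names separated by spaces, phrases in quotation marks, a single word quoted to match only that name) can be sketched as a small text helper. This is only an illustration of how to assemble the text typed into the search box: `build_name_query` is a hypothetical name, and nothing here calls SciFinder-n itself.

```python
def build_name_query(names, exact=False):
    """Join chemical names into one search-box string.

    Rules taken from the notes above: names are separated by spaces
    (no commas), and multi-word names are enclosed in quotation marks.
    With exact=True, single-word names are quoted as well.
    """
    parts = []
    for name in names:
        name = name.strip()
        if not name:
            continue  # skip empty entries
        if " " in name or exact:
            parts.append(f'"{name}"')
        else:
            parts.append(name)
    return " ".join(parts)


print(build_name_query(["aspirin", "ibuprofen", "acetic acid"]))
```

Passing `exact=True` quotes every name, which matches the advice to put a single word in quotes when only that one substance is wanted.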

Searching by Chemical Name; Substance Answer Sets

SciFinder-n substance search using chemical names

  • Above is a substance search using four common names of over-the-counter analgesic and anti-inflammatory drugs.
  • Note that in SciFinder-n, you can use Boolean operators, wildcards and parentheses in Substance name searching. Wildcard searching only truncates the specific term to which it is applied, not the whole of a complex name. However, the terms you enter may be searched within names. Pay close attention to your result sets to determine whether you are retrieving the answers you expect to get from your name search.
  • Below are the results of that search.

SciFinder-n substance answer set, part 1

SciFinder-n substance answer set, part 2

SciFinder-n substance answer set, part 3

  • Looking at the display above, notice:
  • To the immediate right of the Substances header is the number of substances in the answer set (4)
  • Further to the right is the drop-down Sort menu. The default sort is Relevance. Also available are CAS Registry Number (RN), Molecular Formula, Molecular Weight, Number of References (all in ascending or descending order) and Number of Suppliers (descending order only). Sorting by CAS RN is essentially sorting by when the substance was added to the Registry database: the larger the number, the more recent the addition.
  • To the right of that is the Record View drop-down menu. Default is Partial; option is Full.
  • Below that, left to right, are tabs for retrieving References, Reactions or Suppliers associated with selected records or the entire answer set.
  • To the right are the icons for Download, E-mail and Save and Alerts. For Substances, the download options are Excel, PDF, RTF and SDF. SDF stands for Structure-Data File, a standard format for exchanging chemical structures and associated data. Note that there are limits on how many substances you can download; for most formats it's 1000 records, for Excel files it's 100 records at a time. If you have a larger answer set, you'll need to break it up into smaller chunks.
  • To the left are the Filter options for Substances. As with References, you may opt to either Filter by or Exclude a given parameter. Again, only options that are relevant to your answer set appear. As with References, the top five possibilities display. If there are more, click on See More to get up to 10 answers, or a full table.
    • Commercial Availability - whether or not suppliers are available for a substance
    • Reaction Role - What roles in reactions does a substance play? Product, Reactant, Reagent, Catalyst, Solvent. If a substance appears in a given role in even one reaction, it will be listed here.
    • Reference Role - This is the counterpart to the Substance Role filter for References. If a substance has a given role in at least one reference, it will appear here. To view the table of all Reference Roles for substances in the answer set, click the View all link.
    • Stereochemistry - Is there at least one stereochemical center in the answer structure?
    • Number of Components - Salts, mixtures, copolymers, alloys, etc. will have more than one component.
    • Substance Class - Examples: Organic/Inorganic Small Molecule, Polymer, Biosequence, Mixture, etc.
    • Isotopes - Are there any isotopically-labelled substances in the answer set?
    • Metals - Do any of the substances in the answer set contain metals?
    • Molecular Weight - Lets you specify a range of molecular weights to filter your results.
    • Experimental Property - Lists the experimental properties (not the values of the properties) available for substances in the answer set.
    • Experimental Spectrum -  Lists the types of experimental spectra available for substances in the answer set.
    • Regulatory Data by Country/Region - Is regulatory information available in the database for substances in the answer set, broken down geographically.
    • Regulatory Data by List - Same as the above, but broken down by list, such as EINECS or NIOSH.
    • Bioactivity Indicator - Lists biological activities that have been studied for substances in the answer set.
    • Target Indicator - Lists biological targets (e.g. enzymes) for which substances in the answer set have been studied.
    • Search Within Results - Lets you open the structure drawing tool to search within the answer set for a particular structure or substructure and require or exclude that structure. Note that with small answer sets (that is, almost anything less than the full Registry file), even small structure fragments can be successfully searched.
  • Below the filter list is Filter Content Report, which generates an Excel spreadsheet of selected filter data for this answer set.
  • On the right are the brief records for the substances in the answer set. Note that right-clicking on links here, as elsewhere in SciFinder-n, will open a new tab or window containing the linked information. This can be handy for moving back and forth between an answer set and individual answers.
    • At the top of each record is the CAS Registry Number (or CAS RN) for the substance. Clicking on the RN will take you to the full record for the substance (see below for examples).
    • To the right of the RN is an Expand link, which displays a more extensive brief record (see below for the aspirin record). This view adds key physical property data (where available) and a link to the experimental properties and spectra table for the substance.

SciFinder-n expanded view of aspirin brief record

  • Next you see the 2-D structure of the substance. Stereochemical bonds, if any, are indicated. Note that structures are only displayed if there is a known structure for the substance, and it contains 255 or fewer non-hydrogen atoms. Thus, most biosequences do not have a displayable 2-D structure. If you click on the structure, you get a pop-up "quick view" of the substance record (see below). The quick view includes the CAS RN, a brief name, the structure diagram, and links to Substance Detail (the full substance record), Reactions (all reactions in which the substance is indexed), Synthesize (all reactions in which the substance is a product), Start Retrosynthetic Analysis (see Lecture 15 for a discussion), References (all references in which the substance is indexed), and Suppliers (all suppliers in the database for the substance). The Edit Structure link opens the structure drawing tool and enters the substance's structure as a starting point for creating a new structure search.

SciFinder-n substance quick view for aspirin

  • Returning to the brief record, below the structure are:
    • Molecular Formula in Hill order. Hill order is carbon, then hydrogen, then all other elements in alphabetical order. If carbon is not present, then all elements, including hydrogen, are in alphabetical order.
    • Substance Name - in this case, the name which was used as a search term.
    • Links to References, Reactions and Suppliers in which the substance appears.
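
The Hill order rule described above is easy to express in code. The following is a minimal sketch, assuming the element counts are already known; `hill_formula` is an illustrative name, not part of any CAS tool.

```python
def hill_formula(counts):
    """Return a molecular formula string in Hill order.

    counts maps element symbols to atom counts,
    e.g. {"C": 9, "H": 8, "O": 4} for aspirin.
    """
    symbols = sorted(counts)  # plain alphabetical order
    if "C" in counts:
        # Carbon first, then hydrogen, then everything else alphabetically.
        rest = [s for s in symbols if s not in ("C", "H")]
        ordered = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # No carbon: strictly alphabetical, hydrogen included.
        ordered = symbols
    # A count of 1 is written without a subscript.
    return "".join(s + (str(counts[s]) if counts[s] != 1 else "") for s in ordered)


print(hill_formula({"O": 4, "H": 8, "C": 9}))   # aspirin
print(hill_formula({"S": 1, "O": 4, "H": 2}))   # sulfuric acid (no carbon)
```

The second call shows the no-carbon case, where hydrogen takes its alphabetical place among the other elements rather than a fixed second position.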

CAS Registry Numbers

     CAS Registry Numbers were first assigned to substances by Chemical Abstracts Service in the 1960s when they created a computerized database of substances to aid their indexers in determining whether a substance in a document they were indexing had previously appeared in the literature. CAS RNs are of the form: xxx-xx-x where the first number is 2-7 digits long, the second number is always two digits long and the third number is a check digit generated by an algorithm from the previous digits in such a way that most common mistakes in entering an RN would generate an invalid RN, rather than the RN for the wrong substance.
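
     As an illustration of the check digit, here is a short sketch of the published CAS rule: working from right to left through the digits before the check digit, multiply the first by 1, the next by 2, and so on, add the products, and take the last digit of the sum. The function names are illustrative only.

```python
import re


def cas_check_digit(body_digits):
    """Compute the check digit from the digits before the final hyphen."""
    total = sum(int(digit) * weight
                for weight, digit in enumerate(reversed(body_digits), start=1))
    return total % 10


def is_valid_cas_rn(rn):
    """Check the xxx-xx-x format and the check digit of a CAS RN string."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", rn)
    if not match:
        return False
    body = match.group(1) + match.group(2)
    return cas_check_digit(body) == int(match.group(3))


print(is_valid_cas_rn("50-78-2"))   # aspirin
print(is_valid_cas_rn("50-87-2"))   # two digits transposed
```

     For aspirin (50-78-2) the sum is 8x1 + 7x2 + 0x3 + 5x4 = 42, giving check digit 2. Swapping the 7 and 8 gives a sum of 43 and a check digit of 3, so the mistyped number is rejected rather than matched to some other substance.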

     Every unique chemical substance gets its own RN, including stereoisomers, isotopically-labeled substances, mixtures, etc. One exception to this is that polymers which only differ in chain length or molecular weight do not get different RNs, nor do plastics which differ only in how they were processed. This is a long-standing CAS indexing policy, somewhat to the regret of scientists working in the plastics industry.

     Note that CAS RNs are purely identification numbers, and do not convey any information about the structure or properties of the substances they represent. Most RNs are assigned by indexers in the course of indexing documents. Some are assigned at the request of chemical manufacturers or government agencies, and represent substances which have no published references. Note, too, that CAS RNs are the property of Chemical Abstracts Service and are not in the public domain. Reaction to this led to the creation of the InChI system (International Chemical Identifier) as an alternative which would be freely available to anyone.

Substance Detail (Full Substance Records)

Below is the substance detail for aspirin.

SciFinder-n substance detail for aspirin, part 1

 

SciFinder-n substance detail for aspirin, part 2

  • From the top: 
    • Links to References, Reactions, Suppliers for this substance.
    • CAS Registry Number
    • 2-D Structure Diagram (Clicking on the structure gives the same pop-up window as clicking on the structure in the brief record shown above.)
    • Molecular Formula in Hill order
    • CAS Systematic Chemical Name (in inverted order)
    • Key Physical Properties (properties shown vary depending on the substance. These are fairly typical for a common organic molecule.)
    • Then a series of drop-down lists, beginning with Other Names and Identifiers. These include the canonical SMILES string, where available, and any trade, generic and other chemical names used for the substance. This list can be VERY long - polyethylene has over 1000 names in its list!
    • Experimental Properties - These are given in tabular form, divided into tabbed sections by type of property. These sections will vary depending on what is available for the substance. Each property may or may not give actual numeric values, and may or may not have conditions associated with them (such as pressure for boiling points). All will have a link to the reference from which the property information was obtained.
    • Experimental Spectra - These too are listed in tabular form. If a spectrum listed says View then the spectrum itself is available in SciFinder-n. Click on the link to get the spectrum, with source detail. The spectrum may be scaled up or down in size, or shifted left to right or up and down by clicking and dragging for better viewing. The spectrum may be freely downloaded as a JPG image. The SciFinder-n spectra do not, in general, give peak assignments. If the spectrum does not say View, then it will link to the SciFinder-n record for the source document.
    • Predicted Properties - This table of properties is calculated from the chemical structure with software created by ACDLabs and licensed by CAS. Among the tabbed lists for aspirin, you will see one labeled "Lipinski". This is named for Christopher Lipinski, who, while at Pfizer, described a set of rules (the "Rule of Five") which could be used to estimate whether a given chemical is likely to be orally active as a drug. These involve molecular weight, the numbers of hydrogen bond donors and acceptors, and the relative solubility in water vs. organic solvents.
    • Predicted Spectra - These are also generated by ACDLabs software, and may be downloaded like the experimental spectra mentioned above.
    • Bioactivity Indicators - A hierarchical list of the broad and narrow bioactivities described in the literature for the substance. Each bioactivity has the number of current documents containing the information. Clicking on the name of the bioactivity creates a Reference list of the relevant documents, which may then be manipulated like any other SciFinder-n reference list.
    • Target Indicators - Hierarchical list of the proteins (including enzymes) with which the substance has been shown to interact. Like the bioactivity indicators, the number of papers is shown for each protein target, and clicking on the link generates a Reference list of those papers. Note: For a widely-tested drug like aspirin, this list is VERY LONG!
    • Regulatory Information - Lists the names under which the substance is known in national regulations, the countries which regulate it, and the names of the documents in which the regulation appears. This information is derived from the CAS database, CHEMLIST. Note that SciFinder-n does not contain, or link to, the actual regulatory documents.
    • Additional Details - (not visible in the image above) Includes a list of Document Types in which the substance is referenced, Substance Classes to which it belongs, and Deleted CAS Registry Numbers. Deleted RNs occur when an indexer identified a substance in a document as a new substance, assigned a RN, and it is later found to be the same as a previously known substance. The newer RN is then deleted from the Registry file. However, since it is still attached to the original document(s), SciFinder-n automatically searches all the deleted RNs when you search for references to a substance - so you don't miss out on anything!
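
To make the "Lipinski" tab under Predicted Properties more concrete, here is a minimal sketch of a Rule of Five check. The thresholds are the commonly cited ones (molecular weight over 500, log P over 5, more than 5 hydrogen bond donors, more than 10 hydrogen bond acceptors). The function name and the aspirin values are illustrative assumptions, not values copied from SciFinder-n.

```python
def lipinski_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Return a list of the Rule of Five criteria a compound violates."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500")
    if log_p > 5:
        violations.append("log P > 5")
    if h_donors > 5:
        violations.append("hydrogen bond donors > 5")
    if h_acceptors > 10:
        violations.append("hydrogen bond acceptors > 10")
    return violations


# Approximate, illustrative values for aspirin (C9H8O4).
aspirin = lipinski_violations(mol_weight=180.16, log_p=1.2,
                              h_donors=1, h_acceptors=4)
print(aspirin if aspirin else "no violations")
```

A compound is usually considered a reasonable oral drug candidate if it breaks no more than one of these criteria. The rule is a screening guideline, not a guarantee of activity.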

Sample Records for Other Classes of Substances

Polymers/Plastics

SciFinder-n substance detail for styrene-butadiene copolymer

SciFinder-n substance detail for styrene-butadiene copolymer, part 2

  • Above is the Substance Detail for a styrene-butadiene copolymer, with the Experimental Properties section expanded.
  • Note how the two monomers are treated as individual components of the polymer. Some polymers are graphically described with the structural repeating units (SRU) instead.
  • Note the molecular formula gives the two monomers in descending order of molecular formula in Hill order, enclosed in parentheses with an x subscript. Molecular formulas for SRU polymers use an n subscript. Both the n and x indicate an indeterminate length polymer.
  • Note how the systematic name is written. Copolymers use "copolymer with"; homopolymers use "homopolymer". There are also Registry Numbers for "block" and "graft" polymers.
  • Note how the Experimental Properties include categories relevant to plastics, such as Flow and Diffusion and Mechanical.

Biosequence

SciFinder-n substance detail for human insulin, part 1

SciFinder-n substance detail for human insulin, part 2

  • Above is the Substance Detail for a Registry Number for human insulin, with the Sequence Details, a section unique to biosequence records, expanded.
  • Note how there is no structure diagram given. Human insulin has more than 255 non-hydrogen atoms, so the Registry Record cannot record a 2D structure for the molecule. However, see the Sequence Details section.
  • Note that below the name, it is identified as a Protein/Peptide Sequence, with the total sequence length (in amino acids) as well as the lengths of the two sub-chains; the protein is identified as Multichain, and there is a link to Related Sequences. Since CAS assigns separate Registry records for each distinct protein or polynucleotide, as well as distinguishing by source organism, there are many other "insulin" records besides the one we retrieved. Clicking on the link creates a Substance answer set of all the related sequences.
  • In Sequence Details, you get the amino acid sequences for each subchain, as well as information on the number, types and locations of modifications to the chains (in this case the Cys-Cys bridges between the two subunits). The sequences are given in standard one-letter codes for each amino acid, familiar to protein chemists. Polynucleotide chains use the standard A, T, C, G and U.
  • Be aware that the Substance search described here does not search biosequences by sequence, subsequence or similarity. For that, use Biosequence Searching in SciFinder-n, described at the end of this lecture. The STN version of the Registry database also allows sequence and subsequence searching, with gaps, wildcards and so forth, as well as BLAST similarity searching. BLAST searching is also available in public sequence databanks like those at the National Center for Biotechnology Information (NCBI).

Alloys and Tabular Inorganics

SciFinder-n substance detail for monel alloy, part 1

SciFinder-n substance detail for monel alloy, part 2

  • Above is the Substance Detail for a Registry record for monel alloy, with the Experimental Properties section expanded.
  • Note the tabular composition display. This is common for metal alloys, as well as some other types of nonstoichiometric inorganic substances, like the high-temperature superconducting perovskites. The first column lists the components (usually elements, though occasionally metal oxides), the second column the molar percentage (or range of percentages) of each component, and the third column the CAS RN for the component.
  • Below that is the molecular formula in Hill order. Notice that there are no subscripts given for the elements. Below that is the number of components, and the CAS systematic name. The element with the highest molar percentage is considered the "base" of the alloy. The percentage ranges for it and the other elements are included in the name.
  • Again, note that the Experimental Properties categories given are ones appropriate for a metal alloy, such as Electrical and Mechanical.

Mixtures

SciFinder-n substance detail for metformin-glipizide mixture

  • Above is the Substance Detail for a mixture of metformin and glipizide (two drugs used for the treatment of type 2 diabetes in humans).
  • Note that in addition to the mixture Registry Number, the structures and Registry Numbers for each component of the mixture are given.
  • The molecular formula gives the Hill order molecular formula of each component in descending alphabetical order. Note that no specific ratios of the two components are given. (CAS now has a database, Formulus, that gives detailed information on formulations, including states of matter, coatings and the like, for drugs and agrochemicals. It is a separate product from SciFinder-n, aimed at industrial users. However, when you retrieve references for substances which are in formulations, you will see a Filter for formulation information appear. This can be useful in identifying which documents have detailed formulation information in their full text.)
  • Typically, for mixtures there is no experimental or predicted property information given, but for mixtures used as drugs or agrochemicals, there is frequently bioactivity indicator data.

Advanced Substance Search (Molecular Formulas, Substance Properties, Experimental Spectra)

SciFinder-n advanced substance search, part 1

SciFinder-n advanced substance search, part 2

  • Just below the main keyword search window in Substance Search is a link to the Advanced Substance Search (see image above), including Molecular Formula search, Substance Property search and Experimental Spectra search.
  • Advanced search fields include:
    • CAS Registry Number
    • Chemical Name
    • Document Identifier - Searching by document identifier will retrieve all substances indexed in the selected document.
    • Patent Identifier - Searching by patent identifier will retrieve all substances indexed in the selected patent.
    • Experimental Spectra
      • Currently, searchable experimental spectra include:
        • Proton NMR
        • Carbon-13 NMR
        • Nitrogen-15 NMR
        • Fluorine-19 NMR
        • Phosphorus-31 NMR
      • You may enter specific peaks in ppm, or ranges of ppm. Examples are given.
    • Biological
      • Bioconcentration Factor (predicted) - specific values or ranges
      • Median Lethal Dose (experimental) in mg/kg - specific values or ranges. Note that you cannot specify the organism.
    • Chemical Properties
      • Koc (predicted)
      • LogD (predicted)
      • LogP (predicted)
      • Mass Intrinsic Solubility (predicted)
      • Mass Solubility (predicted)
      • Molar Intrinsic Solubility (predicted)
      • Molar Solubility (predicted)
      • Molecular Weight
      • pKa (predicted)
      • Vapor Pressure (predicted)
    • Density
      • Density - can search both experimental and predicted values, or experimental values only
      • Molar Volume (predicted)
    • Electrical (experimental only)
      • Electrical conductance
      • Electrical conductivity
      • Electrical resistance
      • Electrical resistivity
    • Lipinski (predicted only)
      • Freely Rotatable Bonds
      • Hydrogen Acceptors
      • Hydrogen Donor/Acceptor Sum
      • Hydrogen Donors
    • Magnetic
      • Magnetic Moment (experimental)
    • Mechanical
      • Tensile Strength (experimental)
    • Optical and Scattering
      • Optical Rotatory Power (experimental)
      • Refractive Index (experimental)
    • Structure Related
      • Molar Surface Area (predicted)
    • Thermal
      • Boiling Point (experimental and predicted, or experimental only). Note that SciFinder-n does not allow you to specify the pressure at which the bp is measured.
      • Enthalpy of Vaporization
      • Flash Point
      • Glass Transition Temperature
      • Melting Point
  • Note that new searchable properties are added from time to time. Also note that for property values, units are specified. SciFinder-n does not have any units conversion facility.
  • Advanced search fields may be combined with structure drawing searching. See Lecture 14 for more information.
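
Because SciFinder-n does no units conversion, values reported in other units have to be converted by hand before they are entered in a property field. The sketch below assumes, purely for illustration, that the field expects degrees Celsius; check the units shown next to the field you are using. `to_celsius` is a hypothetical helper name.

```python
def to_celsius(value, unit):
    """Convert a temperature in C, K or F to degrees Celsius."""
    unit = unit.strip().upper()
    if unit == "C":
        return value
    if unit == "K":
        return value - 273.15
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unsupported temperature unit: {unit!r}")


# A melting point range reported as 275-279 F, converted for entry as a range.
low, high = to_celsius(275, "F"), to_celsius(279, "F")
print(f"{low:.1f} to {high:.1f} C")
```

Converting both ends of a range and searching the range, rather than a single value, leaves room for rounding and for differences between reported measurements.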

Supplier Searching

SciFinder-n supplier search screen

  • Supplier searches are available as direct searches in SciFinder-n, though supplier information is always linked from a Registry Record where available. You may search by chemical name, CAS Registry Number or structure (for structure searching, see Lecture 14).
  • Below, see some of the results for a supplier search for "iron pentacarbonyl".

SciFinder-n supplier search results for aspirin, part 1

SciFinder-n supplier results for aspirin, part 2

 

  • Alongside the Suppliers header is the total number of suppliers in the answer set. To the right is the drop-down Sort menu. The default is Relevance, but you can also sort by Supplier name (alphabetically or reverse alphabetically), Ships within (that is, how quickly the supplier can ship the desired substance), and Purity. Just below that are the Download and E-mail options. You may download the records in an Excel file or PDF.
  • On the left are the Filter options:
    • Preferred Suppliers - This is a preference you can set up in your account, creating a list of preferred suppliers.
    • Supplier - Displays the top five suppliers by frequency of appearance. You can click the See All link to get a full list.
    • Purity - 99%+, 95-98% or 90-94%
    • Quantity - Milligrams, grams, kilograms or bulk.
    • Ships Within - 1, 2, 4, or 8 weeks.
    • Stock Status - Maintained in stock, Typically in stock, or Synthesized on demand.
    • Order from Supplier - indicates the presence of a link to the supplier's website.
    • Country/Region
  • Below that is a Filter Content Report button, to create an Excel spreadsheet of selected filter values for this answer set.
  • In the brief records:
    • Supplier name and country
    • CAS Registry Number and Substance name (may contain purity information.)
    • Purity (if available)
    • Purchasing details - Includes Link to supplier (if available) and Quantities available
    • Availability - In stock, etc.
    • If you click on the link in the supplier name field, you get the Supplier Detail. See below for an example.

SciFinder-n supplier detail for iron pentacarbonyl from Aldrich

 

SciFinder-n supplier detail for iron pentacarbonyl from Aldrich, part 2

  • Supplier Detail includes:
    • Contact Information: Name, Website URL, E-mail address, Phone number(s). Next to the name are buttons to set this supplier as "Preferred" or "Non-preferred".
    • Substance information: CAS RN, chemical name and structure diagram (if available)
    • Item details - includes name, order number, quantities, when the price was last updated, and a link to the supplier's website (where available)
    • Additional Contact Information - such as mailing address, fax number if available.

Biosequence Searching

The CAS REGISTRY File has long contained records for biosequences (proteins and polynucleotides). Now there is Biosequence searching in SciFinder-n, across a file containing the Registry biosequences, plus over 550 million additional sequences from the patent literature, plus the contents of the NCBI protein and nucleic acid databases, for a total of over one billion searchable biosequences.

SciFinder-n biosequence search window

  • SciFinder-n's biosequence searching currently has three search options:
    • The first (see above image) uses BLAST (Basic Local Alignment Search Tool), a common program for searching biosequences in many databases. It is essentially a similarity search for biosequences. Sequences are described using the IUPAC standard single-letter codes for amino acids or nucleotides. You can find the IUPAC nucleotide and amino acid abbreviations at https://www.bioinformatics.org/sms/iupac.html
    • The second, CDR (Complementarity Determining Regions), is used for finding antibody or T-cell receptor sequences by the regions with which they bind to their targets.
    • The third, Motif, searches for short patterns in DNA, RNA, or proteins with queries enabled for additional variability.
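
Before pasting or uploading sequences, it can help to check that they contain only the single-letter codes mentioned above. The following is a minimal sketch: the nucleotide set holds the IUPAC bases plus ambiguity codes, and the amino acid set holds only the 20 standard residues (some tools also accept ambiguity codes such as B, Z and X, which this sketch would flag). The names are illustrative.

```python
NUCLEOTIDE_CODES = set("ACGTURYSWKMBDHVN")
AMINO_ACID_CODES = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_characters(sequence, alphabet):
    """Return the sorted characters in sequence that are not in alphabet.

    Whitespace and line breaks are ignored; letters are upper-cased first.
    """
    cleaned = "".join(sequence.split()).upper()
    return sorted(set(cleaned) - alphabet)


print(invalid_characters("atgc nnrg", NUCLEOTIDE_CODES))   # clean sequence
print(invalid_characters("ATGXZ", NUCLEOTIDE_CODES))       # two bad letters
```

An empty list means every character is a recognized code. Anything returned is worth checking for typing errors before a long search is started.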

BLAST Searching

  • You may enter your sequence or sequences for searching either by typing in the sequence, or by uploading a .txt file with one or more sequences for searching. You may enter a maximum of 100 sequences at a time.
  • Note on the right hand side, you can select whether you are searching for a nucleotide or protein sequence. You also specify whether you are searching within the nucleotide or protein sequences. This is because every nucleotide sequence translates into a protein sequence, and vice versa. So, you can enter a nucleotide sequence and find all the protein records that contain the peptide sequence that the nucleotide sequence would translate into.
  • Use the drop-down menu to set the upper limit on the number of sequences retrieved. The default is 100; the range is 10 to 20,000.
  • Then, click the Start Biosequence Search button to begin searching. Note that BLAST searching can be time-consuming. You will generally not receive an answer set immediately, but will be notified when it appears in your results history list.
  • The Advanced Biosequence Search drop-down menu lets you fine-tune your search parameters. The display will vary depending on which combination of nucleotide/protein and within nucleotide/protein you have selected. Below is an example for the protein/within protein combination.

SciFinder-n advanced biosequence search menu
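
To show what "translating" a nucleotide query into a protein sequence means, here is a minimal sketch using the standard genetic code. It reads a single reading frame from the first base and is not a description of how SciFinder-n itself performs the comparison; the function name is illustrative.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nucleotides):
    """Translate a DNA or RNA sequence into one-letter amino acid codes.

    Stop codons are shown as "*"; codons containing ambiguity codes
    are shown as "X". Trailing bases that do not fill a codon are ignored.
    """
    seq = "".join(nucleotides.split()).upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGC"))
```

Because a codon is three bases long, the same nucleotide sequence can be read in three frames on each strand. Tools that compare nucleotide queries against protein records generally consider all of them, which this one-frame sketch does not.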

Biosequence BLAST Search Results

  • Note: BLAST sequence searching can be time-consuming! Unlike general reference, substance and reaction searching, a BLAST search in SciFinder-n can take hours. So, SciFinder-n creates a record in your search history which displays "Searching" while the search proceeds. While it is searching, you may go on to do other SciFinder-n tasks, or even log off and return later. When the search is complete, you'll see a display like the one below.
  • The search history gives the date and time the search was initiated, a description of the type of search done, the sequence(s) searched, and links to View Results or Edit Search and redo it.
  • Note that while the search is saved in your Search History, you cannot, at present, create Search Alerts for Biosequence searches as you can for Reference, Substance or Reaction searches.

SciFinder-n biosequence search history link

 

  • Click on the View Results button to go to a display like the one below.
  • Next to the Biosequences header is the number of sequences retrieved (the maximum was defined by you in setting up the search, but there may be fewer). At the right is a drop-down menu with a choice of Expanded (default) or Collapsed display.
  • At the left is the search description, followed by a link to the Bioscape Analysis tool (see below), and the filters for BLAST search results.
    • E-value - Stands for Expect value. It represents the number of matches at least this good that would be expected by random chance; the smaller the E-value, the more significant the match.
    • Query coverage - What percent of the query sequence is included in the subject sequence
    • Subject coverage - What percent of the subject sequence is included in the query sequence
    • Sequence identity - What percentage of the two sequences are identical; essentially the product of the two above percentages.
  • The individual results record:
    • Shows the alignment of the query and target sequences. At right is the number of matches (amino acids in this example) and mismatches.
    • Below that are three tabs:
      • Assignment - Gives the BLAST score, the E-value and the matching sequences
      • Subject - Gives the subject sequence's length, and its full sequence.
      • References - Gives the bibliographic references for the source patent(s) of the subject sequences. At the right is a References link which will take you to the SciFinder-n Reference records for the patents in question, including full family detail and PatentPak links, if available. If the sequence is from the NCBI databases, you'll find a link to the NCBI record for the sequence, which will contain more detailed information.
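
To make the filter values above more concrete, here is a minimal Python sketch of how E-value, coverage and identity are commonly computed for a single alignment under the standard BLAST conventions. It is an illustration only: the Alignment class, the function names and the example sequences are hypothetical, the E-value formula is the textbook Karlin-Altschul estimate (real BLAST uses corrected "effective" lengths), and SciFinder-n may calculate its own filter values differently.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One query/subject alignment (hypothetical container, not a SciFinder-n type)."""
    query_aligned: str    # aligned part of the query; '-' marks a gap
    subject_aligned: str  # aligned part of the subject; same length as query_aligned
    query_length: int     # full length of the query sequence
    subject_length: int   # full length of the subject sequence


def alignment_stats(aln: Alignment) -> dict:
    """Compute matches, coverage and identity (assumed standard BLAST definitions)."""
    if len(aln.query_aligned) != len(aln.subject_aligned):
        raise ValueError("aligned segments must have equal length")
    columns = len(aln.query_aligned)
    # A match is an alignment column where both residues are identical and not gaps.
    matches = sum(
        q == s and q != "-"
        for q, s in zip(aln.query_aligned, aln.subject_aligned)
    )
    query_residues = sum(c != "-" for c in aln.query_aligned)
    subject_residues = sum(c != "-" for c in aln.subject_aligned)
    return {
        "matches": matches,
        "mismatches_or_gaps": columns - matches,
        "query_coverage_pct": 100.0 * query_residues / aln.query_length,
        "subject_coverage_pct": 100.0 * subject_residues / aln.subject_length,
        "identity_pct": 100.0 * matches / columns,
    }


def expect_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Textbook estimate E = m * n * 2**(-bit score); ignores edge-effect corrections."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Made-up example: an 8-residue alignment with one mismatch.
example = Alignment("MKTAYIAK", "MKTGYIAK", query_length=10, subject_length=16)
stats = alignment_stats(example)
```

In this made-up example, 8 of the 10 query residues and 8 of the 16 subject residues are aligned, and 7 of the 8 aligned positions match. The E-value estimate also shows why the same score is less significant in a larger database: doubling database_length doubles the expected number of chance hits.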

SciFinder-n biosequence display, part 1

SciFinder-n biosequence display, part 2

SciFinder-n biosequence display, part 3

 

Bioscape Analysis

Bioscape is a visualization tool for the results of BLAST Biosequence searches in SciFinder-n. Clicking on the Create Bioscape Analysis button opens a new window/tab with the results of the Bioscape analysis. An example is shown below.

SciFinder-n Bioscape analysis of biosequence search results

  • At the left are two toggle buttons. The top button opens a bar graph of the Sequence Similarity of the sequences displayed, and allows you to select a range of similarities to display. The second, Search, allows you to search for patents by keywords in the title or claims, or by legal status: Undetermined, Active, Inactive or Pending.
  • The button at middle center lets you select the view of the sequence field.
  • The "peaks" represent sequences. Their distribution represents the similarity of the sequences to the query sequence, and to one another. The "height" of each peak represents the number of patents containing the sequence.
  • Clicking on a "peak" opens a box showing the sequence length, the number of patents and the CAS Registry Number or numbers associated with the sequence. Clicking on that link will take you to the Substance record or records for that sequence. This is the most direct way to go from the Biosequence search to the Substance records, and from there to other References besides the initial patents that may contain the sequence in question.
  • At lower right, clicking on the camera icon lets you capture a .PNG image of the Bioscape display.
  • For more information on Bioscape Analysis, see: https://scifinder-n.cas.org/help/#t=Working_with_Search_Results/Biosequences/Bioscape_Analysis.htm

Biosequence CDR Searching

SciFinder-n biosequence CDR search window

  • CDR searching applies only to protein sequences, unlike BLAST or Motif searching, which can also apply to nucleotide sequences.
  • You can enter up to three CDR sequences for searching together.
  • Limit options allow you to specify the maximum number of retrievals (which can save search time).
  • CDR results are returned to your Search History, just as for BLAST results. The display features are essentially the same as for BLAST searching above.

Biosequence Motif Searching

SciFinder-n biosequence Motif search window

  • Motif searching is very similar to BLAST searching. You can search either protein or nucleotide sequences against either the nucleotide or protein sequence collections. You may use Advanced Biosequence Search to focus your search. The main difference is that since Motif looks at short sequences only, there is no function for uploading sequences. All must be keyed in directly.
  • Motif search results are sent to your search history just as in BLAST, and the results display has the same basic features as for BLAST results above.
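
As a rough illustration of what a motif search does, the Python sketch below scans a protein sequence for a short pattern. It uses the well-known PROSITE N-glycosylation pattern N-{P}-[ST]-{P} written as a regular expression. The function name and the example sequence are invented for this illustration, and SciFinder-n's own motif query syntax may differ, so consult the SciFinder-n help before keying in a real query.

```python
import re
from typing import List, Tuple

# PROSITE PS00001 (N-glycosylation site): N-{P}-[ST]-{P}, i.e. Asn, then any
# residue except Pro, then Ser or Thr, then any residue except Pro.
# Wrapping the pattern in a lookahead lets overlapping hits be reported too.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence: str, pattern: "re.Pattern[str]" = N_GLYCOSYLATION) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched residues) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


# Made-up 15-residue sequence, used only for illustration.
hits = find_motif("MKNASAANPTGNGSL")
```

Here the asparagine at position 8 is followed by proline, so it is not reported, while the sites starting at positions 3 and 12 are. A short pattern like this will match many unrelated sequences by chance, which is one reason to narrow a motif search with the advanced options.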

© 2022 Charles F. Huber

Creative Commons License
This work by Charles F. Huber is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Based on a work at guides.library.ucsb.edu

Screenshots of SciFinder-n are copyright © 2020 by the American Chemical Society and are used under fair use for educational purposes only.


Copyright © 2008-2019 The Regents of the University of California, All Rights Reserved.