Specimen Modeling and Relations

Background

The Allen Institute aims to support the FAIR principles (findable, accessible, interoperable, reusable), which cover the publication, sharing, dissemination, and exploration of data and datasets. This requires that data and metadata be interoperable across the datasets housed at AIBS or at collaborating institutions and centers. To this end, we require that datasets submitted through AIBS undergo some standardization before publication. Preferably, this requirement is met before submission, which means submitters need to know in advance the format the data must take and the schema the data must adhere to.

This document describes an initial attempt at defining such a schema for specimens, and for the processes that involve specimens, as they relate to the data and datasets of interest to AIBS, so that we can enable publication, searching, sharing, and exploration.

Overview

This schema is designed to be general enough to support multiple specimen types as well as multiple types of assays (in which specimens function as participants).

Metadata Integration

To enable integration, search, and comparison of data, we recommend annotating data with ontology terms. Ontology terms for metadata should use an OBO-format identifier, which is a CURIE consisting of a prefix, a colon, and an ID string. If no appropriate term exists, the user can request that we create a new one. Not every request will be granted, however: new terms are regulated by the ontology curators, and curatorial efforts are constrained by both practical and theoretical considerations.

For example, when submitting data about interneurons, the appropriate term CURIE is CL:0000099 (interneuron). The prefix indicates which ontology the term belongs to, and the string following the prefix is a numerical sequence that is unique within that ontology.
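As a minimal sketch of how a submitter might check annotations before submission, the snippet below splits a CURIE such as CL:0000099 into its prefix and ID string. The names `Curie`, `parse_curie`, and `to_obo_iri`, and the regular expression itself, are assumptions made for illustration and are not part of the AIBS schema. The IRI form shown follows the usual OBO Foundry PURL convention and may not apply to ontologies hosted elsewhere (EFO, for instance).

```python
import re
from typing import NamedTuple

# Assumed shape: an alphanumeric prefix, a colon, then a purely numeric ID.
CURIE_PATTERN = re.compile(r"^(?P<prefix>[A-Za-z][A-Za-z0-9_]*):(?P<local_id>\d+)$")


class Curie(NamedTuple):
    """Hypothetical container for the two parts of a CURIE."""
    prefix: str    # which ontology the term belongs to, e.g. "CL"
    local_id: str  # numeric string kept as text so leading zeros survive


def parse_curie(text: str) -> Curie:
    """Split 'CL:0000099' into ('CL', '0000099') or raise ValueError."""
    match = CURIE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a prefix:number CURIE: {text!r}")
    return Curie(match["prefix"], match["local_id"])


def to_obo_iri(curie: Curie) -> str:
    """Expand to an OBO Foundry style PURL (an assumption for non-OBO prefixes)."""
    return f"http://purl.obolibrary.org/obo/{curie.prefix}_{curie.local_id}"


if __name__ == "__main__":
    term = parse_curie("CL:0000099")
    print(term.prefix, term.local_id)
    print(to_obo_iri(term))
```

Keeping the ID as a string rather than an integer matters here, because converting 0000099 to a number would drop the leading zeros and produce a different identifier.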

General Requirements

Because each assay type and specimen type has slightly different features and characteristics, the requirements may vary by specimen type or assay type.

Organisms

Data regarding organisms should be annotated using the NCBI organismal classification (NCBITaxon). Each species has a unique ID in NCBI, which is why it is preferred over a common name or binomial name. Currently, AIO supports a subset of these species, listed in the table below. If a species is missing, users can request that we add its NCBI species ID.

| Common Name | NCBITaxon ID | Binomial Name |
|---|---|---|
| human | NCBITaxon:9606 | Homo sapiens |
| mouse | NCBITaxon:10090 | Mus musculus |
| grivet | NCBITaxon:9534 | Chlorocebus aethiops |
| arctic ground squirrel | NCBITaxon:9999 | Urocitellus parryii |
| nine-banded armadillo | NCBITaxon:9361 | Dasypus novemcinctus |
| common chimpanzee | NCBITaxon:9598 | Pan troglodytes |
| domesticated ferret | NCBITaxon:1353796 | Mustela furo |
| western gorilla | NCBITaxon:9593 | Gorilla gorilla |
| southern pig-tailed macaque | NCBITaxon:9545 | Macaca nemestrina |
| gray short-tailed opossum | NCBITaxon:13616 | Monodelphis domestica |
| Nancy Ma's night monkey | NCBITaxon:37293 | Aotus nancymaae |
| brown rat/common rat | NCBITaxon:10116 | Rattus norvegicus |
| crab-eating macaque | NCBITaxon:9541 | Macaca fascicularis |
| European rabbit | NCBITaxon:9986 | Oryctolagus cuniculus |
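
The supported-species list above can be treated as a lookup from a common or binomial name to its NCBITaxon ID. The following is a rough sketch only: the rows are copied from the table, while `SUPPORTED_ORGANISMS`, `build_index`, and `taxon_id_for` are hypothetical names, not an existing AIBS API.

```python
# Rows copied from the table above: (common names, NCBITaxon CURIE, binomial name).
SUPPORTED_ORGANISMS = [
    (("human",), "NCBITaxon:9606", "Homo sapiens"),
    (("mouse",), "NCBITaxon:10090", "Mus musculus"),
    (("grivet",), "NCBITaxon:9534", "Chlorocebus aethiops"),
    (("arctic ground squirrel",), "NCBITaxon:9999", "Urocitellus parryii"),
    (("nine-banded armadillo",), "NCBITaxon:9361", "Dasypus novemcinctus"),
    (("common chimpanzee",), "NCBITaxon:9598", "Pan troglodytes"),
    (("domesticated ferret",), "NCBITaxon:1353796", "Mustela furo"),
    (("western gorilla",), "NCBITaxon:9593", "Gorilla gorilla"),
    (("southern pig-tailed macaque",), "NCBITaxon:9545", "Macaca nemestrina"),
    (("gray short-tailed opossum",), "NCBITaxon:13616", "Monodelphis domestica"),
    (("Nancy Ma's night monkey",), "NCBITaxon:37293", "Aotus nancymaae"),
    (("brown rat", "common rat"), "NCBITaxon:10116", "Rattus norvegicus"),
    (("crab-eating macaque",), "NCBITaxon:9541", "Macaca fascicularis"),
    (("European rabbit",), "NCBITaxon:9986", "Oryctolagus cuniculus"),
]


def build_index(rows):
    """Map every lower-cased common or binomial name to its NCBITaxon CURIE."""
    index = {}
    for common_names, taxon_id, binomial in rows:
        for name in (*common_names, binomial):
            index[name.lower()] = taxon_id
    return index


_INDEX = build_index(SUPPORTED_ORGANISMS)


def taxon_id_for(name: str) -> str:
    """Return the NCBITaxon CURIE for a supported species name."""
    key = " ".join(name.split()).lower()  # collapse whitespace, ignore case
    if key not in _INDEX:
        raise KeyError(f"{name!r} is not in the supported list; its NCBI species ID would need to be requested")
    return _INDEX[key]


if __name__ == "__main__":
    print(taxon_id_for("Mus musculus"))  # NCBITaxon:10090
```

Resolving names to IDs at submission time reflects the point made above: the ID is unambiguous, whereas a common name such as "brown rat" or "common rat" can vary for the same species.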

Specimens

A specimen is a material entity that has the specimen role, where a specimen role is simply a particular intent to use that entity in an investigation. Since specimens have no restriction in terms of specific material form (i.e., they can be anything at all so long as there is intent to use them in an investigation), specimens are typically categorized in terms of their material form. For example, here is a list of common specimen types:

| Label | Ontology ID | Definition | Notes/Synonyms |
|---|---|---|---|
| cell specimen | OBI:0001468 | A specimen primarily composed of a cell or cells collected from a multicellular organism or a cell culture. | |
| single cell specimen | EFO:0007831 | A sample specimen consisting of exactly one cell. | |
| neuroblast | CL:0000031 | A cell that will develop into a neuron, often after a migration phase. | |
| neuron | CL:0000540 | The basic cellular unit of nervous tissue. Each neuron consists of a body, an axon, and dendrites. Their purpose is to receive, conduct, and transmit impulses in the nervous system. | |
| GABAergic neuron | CL:0000617 | A neuron that uses GABA as a vesicular neurotransmitter. | synonym: GABA-ergic neuron |
| hippocampal neuron | CL:0002608 | A neuron of the hippocampus. | |
| interneuron | CL:0000099 | Most generally, any neuron which is not motor or sensory. Interneurons may also refer to neurons whose axons remain within a particular brain region, as contrasted with projection neurons, which have axons projecting to other brain regions. | |
| pyramidal neuron | CL:0000598 | Pyramidal neurons have a pyramid-shaped soma with a single axon, a large apical dendrite, and multiple basal dendrites. The apex and the apical dendrite typically point toward the pial surface, with the other dendrites and the axon emerging from the base. The axons may have local collaterals but also project outside their region. Pyramidal neurons are found in the cerebral cortex, the hippocampus, and the amygdala. | |
| peripheral neuron | CL:0000111 | A neuron that is part of a nerve found outside the central nervous system. | |
| sensory neuron | CL:0000101 | Any neuron having a sensory function; an afferent neuron conveying sensory impulses. | |

Assays

Since any material entity can function as a specimen (and, in effect, be a specimen), it is important to track specimens according to the process in which they participate. All specimens participate in an investigation, but the specific part of the investigation in which they participate may also be of interest to a researcher. In most cases in neuroscience, specimens participate in an investigation through an assay. An assay is a planned process with the objective of producing information about the material entity that is the evaluant, by physically examining it or its proxies. In this case, the material entities we call "specimens" serve as the evaluants of an assay. The result is some information (a data item or data set) that is about the evaluant (the specimen). The material entity may also be changed and used in another assay, but we ignore that case for now.
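
One way to picture the relations described above (a specimen is the evaluant of an assay, and the assay yields data that is about that specimen) is the small sketch below. The class and field names (`Specimen`, `Assay`, `DataItem`, `record_output`) and the example identifiers are invented for illustration and should not be read as the schema itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Specimen:
    """A material entity bearing the specimen role (hypothetical record)."""
    specimen_id: str
    type_curie: str      # material form, e.g. "CL:0000099" (interneuron)
    organism_curie: str  # source species, e.g. "NCBITaxon:10090" (mouse)


@dataclass(frozen=True)
class DataItem:
    """Information produced by an assay; it is about the evaluant."""
    data_id: str
    about: Specimen


@dataclass
class Assay:
    """A planned process that examines one evaluant and yields data items."""
    assay_id: str
    evaluant: Specimen
    outputs: List[DataItem] = field(default_factory=list)

    def record_output(self, data_id: str) -> DataItem:
        # Every output is linked back to the specimen that was examined.
        item = DataItem(data_id=data_id, about=self.evaluant)
        self.outputs.append(item)
        return item


if __name__ == "__main__":
    cell = Specimen("specimen-001", "CL:0000099", "NCBITaxon:10090")
    assay = Assay("assay-001", evaluant=cell)
    result = assay.record_output("dataset-001")
    print(result.about.specimen_id)  # specimen-001
```

The sketch deliberately leaves out the case where a specimen is changed and reused in a later assay, mirroring the scope stated in the text.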