Class: Organism
A material entity that is a living or once-living individual. Organism instances represent the biological identity of what is in a sample, not the sample itself. Sub-species identity (strain, cultivar, lab isolate) is captured by slots on this class rather than via a separate Strain subclass.
URI: nmdc:Organism
```mermaid
classDiagram
class Organism
click Organism href "../Organism"
MaterialEntity <|-- Organism
click MaterialEntity href "../MaterialEntity"
Organism : alternative_identifiers
Organism : classified_as
Organism --> "*" NcbiTaxon : classified_as
click NcbiTaxon href "../NcbiTaxon"
Organism : description
Organism : estimated_size
Organism : gc_content
Organism --> "0..1" QuantityValue : gc_content
click QuantityValue href "../QuantityValue"
Organism : id
Organism : isolate_name
Organism : name
Organism : organism_genus
Organism : organism_species
Organism : ref_biomaterial
Organism --> "0..1" TextValue : ref_biomaterial
click TextValue href "../TextValue"
Organism : strain_name
Organism : type
```
Inheritance
- NamedThing
    - MaterialEntity
        - Organism
Slots
| Name | Cardinality and Range | Description | Inheritance |
|---|---|---|---|
| classified_as | * NcbiTaxon | Taxonomic classification of this organism | direct |
| organism_genus | 0..1 String | Genus of the organism | direct |
| organism_species | 0..1 String | Species of the organism | direct |
| strain_name | 0..1 String | Strain or cultivar name of the organism | direct |
| isolate_name | 0..1 String | Isolate or mutant name | direct |
| estimated_size | 0..1 String | Estimated genome size, as integer base pairs | direct |
| gc_content | 0..1 QuantityValue | Estimated GC content as a percentage | direct |
| ref_biomaterial | 0..1 TextValue | Reference for the organism, preferentially a DOI when a primary publication o... | direct |
| id | 1 Uriorcurie | A unique identifier for a thing | NamedThing |
| name | 0..1 String | A human readable label for an entity | NamedThing |
| description | 0..1 String | a human-readable description of a thing | NamedThing |
| alternative_identifiers | * Uriorcurie | A list of alternative identifiers for the entity | NamedThing |
| type | 1 Uriorcurie | the class_uri of the class that has been instantiated | NamedThing |
Usages
| used by | used in | type | used |
|---|---|---|---|
| Database | organism_set | range | Organism |
| OrganismSample | expected_organism | range | Organism |
Comments
- Organism instances are stored in organism_set. An Organism is not a sample — it is the biological entity that an OrganismSample is expected to contain, linked via expected_organism. Sub-species identity (strain_name, isolate_name) is captured directly on Organism.
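The linkage described in this comment can be sketched in Python. The sketch assumes that records are plain dicts and that `expected_organism` holds an Organism `id` rather than an inlined copy. The Usages table only says that its range is Organism, so the by-id reference is an assumption. The function names and all ids are hypothetical.

```python
from typing import Optional


def index_organisms(organism_set: list[dict]) -> dict[str, dict]:
    """Map Organism id -> record, ignoring entries whose type is not nmdc:Organism."""
    return {o["id"]: o for o in organism_set if o.get("type") == "nmdc:Organism"}


def expected_organism_of(sample: dict, organisms_by_id: dict[str, dict]) -> Optional[dict]:
    """Return the Organism a sample is expected to contain, or None if it cannot be resolved.

    Assumes expected_organism holds an Organism id rather than an inlined copy.
    """
    ref = sample.get("expected_organism")
    if ref is None:
        return None
    return organisms_by_id.get(ref)


organism_set = [
    {"id": "nmdc:orgn-11-abc123", "type": "nmdc:Organism", "strain_name": "PV-4"},  # hypothetical
]
sample = {  # hypothetical OrganismSample record
    "id": "nmdc:example-sample-1",
    "type": "nmdc:OrganismSample",
    "expected_organism": "nmdc:orgn-11-abc123",
}
resolved = expected_organism_of(sample, index_organisms(organism_set))
print(resolved["strain_name"])  # PV-4
```

The sample and the Organism stay separate records, and the sample only points at the Organism. This matches the comment that an Organism is not a sample. Filtering on `type` follows the `type` slot's note that the field is needed for polymorphic collections.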
See Also
- https://github.com/microbiomedata/nmdc-schema/issues/2959
- https://github.com/microbiomedata/nmdc-schema/issues/2803
- https://github.com/microbiomedata/nmdc-schema/issues/2971
Identifier and Mapping Information
Schema Source
- from schema: https://w3id.org/nmdc/nmdc
Mappings
| Mapping Type | Mapped Value |
|---|---|
| exact | COB:0000022 |
LinkML Source
Direct
```yaml
name: Organism
description: A material entity that is a living or once-living individual. Organism
  instances represent the biological identity of what is in a sample, not the sample
  itself. Sub-species identity (strain, cultivar, lab isolate) is captured by slots
  on this class rather than via a separate Strain subclass.
notes:
- 'DEBATED — `estimated_size` and `gc_content` placement on Organism. Montana argues
  these are analyte properties measured during sample QC (like concentration or absorbance)
  rather than stable organism properties, and belong only in submission-schema. Counterargument:
  genome size and GC% are reproducible biological properties of the organism that
  are useful for downstream data integration. Keeping on Organism pending resolution.'
comments:
- Organism instances are stored in organism_set. An Organism is not a sample — it
  is the biological entity that an OrganismSample is expected to contain, linked via
  expected_organism. Sub-species identity (strain_name, isolate_name) is captured
  directly on Organism.
from_schema: https://w3id.org/nmdc/nmdc
see_also:
- https://github.com/microbiomedata/nmdc-schema/issues/2959
- https://github.com/microbiomedata/nmdc-schema/issues/2803
- https://github.com/microbiomedata/nmdc-schema/issues/2971
exact_mappings:
- COB:0000022
is_a: MaterialEntity
slots:
- classified_as
- organism_genus
- organism_species
- strain_name
- isolate_name
- estimated_size
- gc_content
- ref_biomaterial
slot_usage:
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:orgn-{id_shoulder}-{id_blade}$'
      interpolated: true
  classified_as:
    name: classified_as
    description: 'Taxonomic classification of this organism. Narrowed from the global
      OntologyClass range (defined on the slot itself) to NcbiTaxon, since organism
      identity at NMDC is anchored to NCBI Taxonomy. Per #3016 — the broader pattern
      is to narrow `classified_as` to `NcbiTaxon` on all organism-oriented classes
      via slot_usage.'
    range: NcbiTaxon
  estimated_size:
    name: estimated_size
    description: Estimated genome size, as integer base pairs. Reuses MIxS estimated_size
      (MIXS:0000024). The JGI isolate field reports in megabases (Mb); values must
      be converted to the MIxS integer-bp representation before validation and storage.
      The submission portal should auto-populate the "bp" suffix and enforce integer
      input.
    structured_aliases:
    - literal_form: Estimated Genome Size (Mb)
      predicate: BROAD_SYNONYM
      contexts:
      - https://jgi.doe.gov/isolate-submission-form/v19
    structured_pattern:
      syntax: ^[0-9]+ bp$
      interpolated: false
  ref_biomaterial:
    name: ref_biomaterial
    description: Reference for the organism, preferentially a DOI when a primary publication
      or genome report exists; PMID and URL are also accepted per the MIxS ref_biomaterial
      pattern (`{PMID}|{DOI}|{URL}`). Reuses MIxS ref_biomaterial (MIXS:0000025).
    comments:
    - The MIxS pattern accepts DOI, PMID, or URL. DOI is preferred when available
      — it gives a stable reference to the publication or genome report. See the `associated_dois`
      pattern elsewhere in the NMDC schema for DOI-structured alternatives.
    - JGI "Reference Genome" submissions sometimes carry non-publication identifiers
      such as IMG or Phytozome IDs, which do not match the MIxS pattern. Those are
      out of scope for this slot and should be captured separately (see `gold_organism_identifiers`
      and `insdc_nucleotide_identifiers` for genome / assembly references).
    - The MIxS name ref_biomaterial may be renamed in a future MIxS release. See ongoing
      MIxS renaming work.
    examples:
    - description: DOI form (preferred when a primary publication exists)
      object:
        type: nmdc:TextValue
        has_raw_value: doi:10.1016/j.syapm.2018.01.009
    - description: PubMed ID form
      object:
        type: nmdc:TextValue
        has_raw_value: PMID:24296464
    - description: URL form (e.g. NCBI Genome record)
      object:
        type: nmdc:TextValue
        has_raw_value: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_000016065.1/
    structured_aliases:
    - literal_form: Reference Genome
      predicate: RELATED_SYNONYM
      contexts:
      - https://jgi.doe.gov/isolate-submission-form/v19
class_uri: nmdc:Organism
```
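The `estimated_size` usage above requires that JGI values in megabases be converted to integer base pairs before validation. Below is a minimal Python sketch of that conversion. It assumes 1 Mb = 1,000,000 bp and half-up rounding to a whole base pair; the schema states neither. The function name is hypothetical. The regular expression is the slot's `structured_pattern` syntax.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# structured_pattern syntax for estimated_size, copied from the slot_usage above.
ESTIMATED_SIZE_PATTERN = re.compile(r"^[0-9]+ bp$")
BP_PER_MB = 1_000_000  # assumes decimal megabases


def mb_to_estimated_size(mb: str) -> str:
    """Convert a JGI 'Estimated Genome Size (Mb)' entry to the MIxS '{integer} bp' form.

    Decimal arithmetic avoids binary floating-point error; rounding half-up to a
    whole base pair is an assumption, since the schema does not specify a rule.
    """
    value = Decimal(mb.strip())
    if value < 0:
        raise ValueError(f"genome size cannot be negative: {mb!r}")
    bp = int((value * BP_PER_MB).quantize(Decimal(1), rounding=ROUND_HALF_UP))
    serialized = f"{bp} bp"
    # Safety net: confirm the output matches the slot's pattern before it is stored.
    if not ESTIMATED_SIZE_PATTERN.match(serialized):
        raise ValueError(f"unexpected serialization: {serialized!r}")
    return serialized


print(mb_to_estimated_size("0.3"))  # 300000 bp
print(mb_to_estimated_size("4.6"))  # 4600000 bp
```

Appending the `bp` suffix in code, rather than asking submitters to type it, follows the description's suggestion that the portal auto-populate the suffix and enforce integer input.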
Induced
```yaml
name: Organism
description: A material entity that is a living or once-living individual. Organism
  instances represent the biological identity of what is in a sample, not the sample
  itself. Sub-species identity (strain, cultivar, lab isolate) is captured by slots
  on this class rather than via a separate Strain subclass.
notes:
- 'DEBATED — `estimated_size` and `gc_content` placement on Organism. Montana argues
  these are analyte properties measured during sample QC (like concentration or absorbance)
  rather than stable organism properties, and belong only in submission-schema. Counterargument:
  genome size and GC% are reproducible biological properties of the organism that
  are useful for downstream data integration. Keeping on Organism pending resolution.'
comments:
- Organism instances are stored in organism_set. An Organism is not a sample — it
  is the biological entity that an OrganismSample is expected to contain, linked via
  expected_organism. Sub-species identity (strain_name, isolate_name) is captured
  directly on Organism.
from_schema: https://w3id.org/nmdc/nmdc
see_also:
- https://github.com/microbiomedata/nmdc-schema/issues/2959
- https://github.com/microbiomedata/nmdc-schema/issues/2803
- https://github.com/microbiomedata/nmdc-schema/issues/2971
exact_mappings:
- COB:0000022
is_a: MaterialEntity
slot_usage:
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:orgn-{id_shoulder}-{id_blade}$'
      interpolated: true
  classified_as:
    name: classified_as
    description: 'Taxonomic classification of this organism. Narrowed from the global
      OntologyClass range (defined on the slot itself) to NcbiTaxon, since organism
      identity at NMDC is anchored to NCBI Taxonomy. Per #3016 — the broader pattern
      is to narrow `classified_as` to `NcbiTaxon` on all organism-oriented classes
      via slot_usage.'
    range: NcbiTaxon
  estimated_size:
    name: estimated_size
    description: Estimated genome size, as integer base pairs. Reuses MIxS estimated_size
      (MIXS:0000024). The JGI isolate field reports in megabases (Mb); values must
      be converted to the MIxS integer-bp representation before validation and storage.
      The submission portal should auto-populate the "bp" suffix and enforce integer
      input.
    structured_aliases:
    - literal_form: Estimated Genome Size (Mb)
      predicate: BROAD_SYNONYM
      contexts:
      - https://jgi.doe.gov/isolate-submission-form/v19
    structured_pattern:
      syntax: ^[0-9]+ bp$
      interpolated: false
  ref_biomaterial:
    name: ref_biomaterial
    description: Reference for the organism, preferentially a DOI when a primary publication
      or genome report exists; PMID and URL are also accepted per the MIxS ref_biomaterial
      pattern (`{PMID}|{DOI}|{URL}`). Reuses MIxS ref_biomaterial (MIXS:0000025).
    comments:
    - The MIxS pattern accepts DOI, PMID, or URL. DOI is preferred when available
      — it gives a stable reference to the publication or genome report. See the `associated_dois`
      pattern elsewhere in the NMDC schema for DOI-structured alternatives.
    - JGI "Reference Genome" submissions sometimes carry non-publication identifiers
      such as IMG or Phytozome IDs, which do not match the MIxS pattern. Those are
      out of scope for this slot and should be captured separately (see `gold_organism_identifiers`
      and `insdc_nucleotide_identifiers` for genome / assembly references).
    - The MIxS name ref_biomaterial may be renamed in a future MIxS release. See ongoing
      MIxS renaming work.
    examples:
    - description: DOI form (preferred when a primary publication exists)
      object:
        type: nmdc:TextValue
        has_raw_value: doi:10.1016/j.syapm.2018.01.009
    - description: PubMed ID form
      object:
        type: nmdc:TextValue
        has_raw_value: PMID:24296464
    - description: URL form (e.g. NCBI Genome record)
      object:
        type: nmdc:TextValue
        has_raw_value: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_000016065.1/
    structured_aliases:
    - literal_form: Reference Genome
      predicate: RELATED_SYNONYM
      contexts:
      - https://jgi.doe.gov/isolate-submission-form/v19
attributes:
  classified_as:
    name: classified_as
    description: 'Taxonomic classification of this organism. Narrowed from the global
      OntologyClass range (defined on the slot itself) to NcbiTaxon, since organism
      identity at NMDC is anchored to NCBI Taxonomy. Per #3016 — the broader pattern
      is to narrow `classified_as` to `NcbiTaxon` on all organism-oriented classes
      via slot_usage.'
    comments:
    - 'Taxonomy-oriented uses (e.g. on Organism) should point to NcbiTaxon instances.
      OrganismSample reaches taxonomy indirectly via expected_organism.classified_as.
      The global range stays OntologyClass; narrowing to NcbiTaxon via slot_usage
      is tracked in #3016.'
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://github.com/microbiomedata/nmdc-schema/issues/2959
    narrow_mappings:
    - biolink:in_taxon
    rank: 1000
    alias: classified_as
    owner: Organism
    domain_of:
    - Organism
    range: NcbiTaxon
    multivalued: true
    inlined: true
    inlined_as_list: true
  organism_genus:
    name: organism_genus
    description: Genus of the organism.
    comments:
    - Free-text submitter-provided genus name. For an ontology-grounded classification,
      use `classified_as` with a NcbiTaxon instance on the parent Organism class.
    examples:
    - value: Shewanella
      description: GOLD organism_v2 Go0000189 (Shewanella loihica PV-4, queried 2026-04-21)
    - value: Ruegeria
      description: GOLD organism_v2 Go0000514 (Ruegeria pomeroyi DSS-3, queried 2026-04-21)
    - value: Campylobacter
      description: GOLD organism_v2 (Go0000058, queried 2026-04-14)
    in_subset:
    - jgi_isolate
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
    - literal_form: Genus
      predicate: EXACT_SYNONYM
      contexts:
      - https://jgi.doe.gov/isolate-submission-form/v19
    rank: 1000
    alias: organism_genus
    owner: Organism
    domain_of:
    - Organism
    range: string
  organism_species:
    name: organism_species
    description: Species of the organism.
    comments:
    - Free-text submitter-provided species name. For an ontology-grounded classification,
      use `classified_as` with a NcbiTaxon instance on the parent Organism class.
    examples:
    - value: loihica
      description: GOLD organism_v2 Go0000189 (Shewanella loihica PV-4, queried 2026-04-21)
    - value: pomeroyi
      description: GOLD organism_v2 Go0000514 (Ruegeria pomeroyi DSS-3, queried 2026-04-21)
    - value: sp.
      description: GOLD organism_v2.species (n=37 records, queried 2026-04-30) — used
        when the isolate has not yet been assigned a species name
    in_subset:
    - jgi_isolate
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
    - literal_form: Species
      predicate: EXACT_SYNONYM
      contexts:
      - https://jgi.doe.gov/isolate-submission-form/v19
    rank: 1000
    alias: organism_species
    owner: Organism
    domain_of:
    - Organism
    range: string
  strain_name:
    name: strain_name
    description: Strain or cultivar name of the organism.
    comments:
    - 'Microbial strain identifiers and plant cultivar names (governed by the International
      Code of Nomenclature for Cultivated Plants, ICNCP) are nomenclaturally distinct,
      but this slot accepts both for now to match the JGI Isolate (NA) v19 form''s
      combined "Strain or cultivar" field. A separate `cultivar_name` slot may be
      added if a plant-specific use case emerges; see #3056.'
    - MIxS `subspecf_gen_lin` (MIXS:0000020) covers this concept along with cultivar,
      serovar, biotype, ecotype, and other sub-species lineage types in a single slot
      using a rank-prefix encoding (e.g. "strain:PV-4"). NMDC splits the concept into
      separate slots; this slot covers the strain rank specifically.
    examples:
    - value: PV-4
      description: GOLD organism_v2 Go0000189 (Shewanella loihica PV-4, queried 2026-04-21)
    - value: DSS-3
      description: GOLD organism_v2 Go0000514 (Ruegeria pomeroyi DSS-3, queried 2026-04-21)
    - value: DSM 6724
      description: GOLD organism_v2 Dictyoglomus turgidum (Go0000002, queried 2026-04-14)
    in_subset:
    - jgi_isolate
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
    - literal_form: Strain or cultivar
      predicate: EXACT_SYNONYM
      contexts:
      - https://jgi.doe.gov/isolate-submission-form/v19
    related_mappings:
    - MIXS:0000020
    rank: 1000
    alias: strain_name
    owner: Organism
    domain_of:
    - Organism
    range: string
  isolate_name:
    name: isolate_name
    description: Isolate or mutant name.
    comments:
    - MIxS `subspecf_gen_lin` (MIXS:0000020) covers this concept along with strain,
      cultivar, serovar, biotype, ecotype, and other sub-species lineage types in
      a single slot using a rank-prefix encoding. NMDC uses a separate slot for the
      isolate rank specifically.
    examples:
    - value: Bd21-3
      description: GOLD dw_sample_taxonomy_info.isolate (n=260 records, queried 2026-04-30)
        — Brachypodium distachyon Bd21-3 reference accession
    - value: MR164
      description: GOLD dw_sample_taxonomy_info.isolate (n=555 records, queried 2026-04-30)
    - value: Isolate
      description: GOLD dw_sample_taxonomy_info.isolate (n=918 records, queried 2026-04-30)
        — generic placeholder used when no specific mutant/isolate name is recorded
    in_subset:
    - jgi_isolate
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
    - literal_form: Isolate
      predicate: EXACT_SYNONYM
      contexts:
      - https://jgi.doe.gov/isolate-submission-form/v19
    related_mappings:
    - MIXS:0000020
    rank: 1000
    alias: isolate_name
    owner: Organism
    domain_of:
    - Organism
    range: string
  estimated_size:
    name: estimated_size
    annotations:
      Expected_value:
        tag: Expected_value
        value: number of base pairs
    description: Estimated genome size, as integer base pairs. Reuses MIxS estimated_size
      (MIXS:0000024). The JGI isolate field reports in megabases (Mb); values must
      be converted to the MIxS integer-bp representation before validation and storage.
      The submission portal should auto-populate the "bp" suffix and enforce integer
      input.
    title: estimated size
    examples:
    - value: 300000 bp
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
    - literal_form: Estimated Genome Size (Mb)
      predicate: BROAD_SYNONYM
      contexts:
      - https://jgi.doe.gov/isolate-submission-form/v19
    rank: 1000
    keywords:
    - size
    string_serialization: '{integer} bp'
    slot_uri: MIXS:0000024
    alias: estimated_size
    owner: Organism
    domain_of:
    - Organism
    range: string
    structured_pattern:
      syntax: ^[0-9]+ bp$
      interpolated: false
  gc_content:
    name: gc_content
    annotations:
      storage_units:
        tag: storage_units
        value: '%'
    description: Estimated GC content as a percentage.
    comments:
    - Expected `has_numeric_value` range is 0–100 (percentage units).
    examples:
    - description: GOLD project Gp0000189 (Shewanella loihica PV-4, queried 2026-04-21)
      object:
        type: nmdc:QuantityValue
        has_numeric_value: 54.0
        has_unit: '%'
    - description: GOLD project Gp0000514 (Ruegeria pomeroyi DSS-3, queried 2026-04-21)
      object:
        type: nmdc:QuantityValue
        has_numeric_value: 64.0
        has_unit: '%'
    in_subset:
    - jgi_isolate
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
    - literal_form: GC Content %
      predicate: EXACT_SYNONYM
      contexts:
      - https://jgi.doe.gov/isolate-submission-form/v19
    rank: 1000
    alias: gc_content
    owner: Organism
    domain_of:
    - Organism
    range: QuantityValue
  ref_biomaterial:
    name: ref_biomaterial
    description: Reference for the organism, preferentially a DOI when a primary publication
      or genome report exists; PMID and URL are also accepted per the MIxS ref_biomaterial
      pattern (`{PMID}|{DOI}|{URL}`). Reuses MIxS ref_biomaterial (MIXS:0000025).
    title: reference for biomaterial
    comments:
    - The MIxS pattern accepts DOI, PMID, or URL. DOI is preferred when available
      — it gives a stable reference to the publication or genome report. See the `associated_dois`
      pattern elsewhere in the NMDC schema for DOI-structured alternatives.
    - JGI "Reference Genome" submissions sometimes carry non-publication identifiers
      such as IMG or Phytozome IDs, which do not match the MIxS pattern. Those are
      out of scope for this slot and should be captured separately (see `gold_organism_identifiers`
      and `insdc_nucleotide_identifiers` for genome / assembly references).
    - The MIxS name ref_biomaterial may be renamed in a future MIxS release. See ongoing
      MIxS renaming work.
    examples:
    - description: DOI form (preferred when a primary publication exists)
      object:
        type: nmdc:TextValue
        has_raw_value: doi:10.1016/j.syapm.2018.01.009
    - description: PubMed ID form
      object:
        type: nmdc:TextValue
        has_raw_value: PMID:24296464
    - description: URL form (e.g. NCBI Genome record)
      object:
        type: nmdc:TextValue
        has_raw_value: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_000016065.1/
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
    - literal_form: Reference Genome
      predicate: RELATED_SYNONYM
      contexts:
      - https://jgi.doe.gov/isolate-submission-form/v19
    rank: 1000
    slot_uri: MIXS:0000025
    alias: ref_biomaterial
    owner: Organism
    domain_of:
    - Organism
    range: TextValue
    structured_pattern:
      syntax: ^({PMID}|{DOI}|{URL})$
      interpolated: true
      partial_match: true
  id:
    name: id
    description: A unique identifier for a thing. Must be either a CURIE shorthand
      for a URI or a complete URI
    notes:
    - 'abstracted pattern: prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?'
    - a minimum length of 3 characters is suggested for typecodes, but 1 or 2 characters
      will be accepted
    - typecodes must correspond 1:1 to a class in the NMDC schema. this will be checked
      via per-class id slot usage assertions
    - minting authority shoulders should probably be enumerated and checked in the
      pattern
    examples:
    - value: nmdc:mgmag-00-x012.1_7_c1
      description: https://github.com/microbiomedata/nmdc-schema/pull/499#discussion_r1018499248
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
    - literal_form: workflow_execution_id
      predicate: NARROW_SYNONYM
      contexts:
      - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    - literal_form: data_object_id
      predicate: NARROW_SYNONYM
      contexts:
      - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    rank: 1000
    identifier: true
    alias: id
    owner: Organism
    domain_of:
    - NamedThing
    range: uriorcurie
    required: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
    structured_pattern:
      syntax: '{id_nmdc_prefix}:orgn-{id_shoulder}-{id_blade}$'
      interpolated: true
  name:
    name: name
    description: A human readable label for an entity
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: name
    owner: Organism
    domain_of:
    - PersonValue
    - NamedThing
    - Protocol
    range: string
  description:
    name: description
    description: a human-readable description of a thing
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    slot_uri: dcterms:description
    alias: description
    owner: Organism
    domain_of:
    - ImageValue
    - NamedThing
    - Protocol
    range: string
  alternative_identifiers:
    name: alternative_identifiers
    description: A list of alternative identifiers for the entity.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: alternative_identifiers
    owner: Organism
    domain_of:
    - NamedThing
    - MetaboliteIdentification
    range: uriorcurie
    multivalued: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,\(\)\=\#]*$
  type:
    name: type
    description: the class_uri of the class that has been instantiated
    notes:
    - makes it easier to read example data files
    - required for polymorphic MongoDB collections
    examples:
    - value: nmdc:Biosample
    - value: nmdc:Study
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://github.com/microbiomedata/nmdc-schema/issues/1048
    - https://github.com/microbiomedata/nmdc-schema/issues/1233
    - https://github.com/microbiomedata/nmdc-schema/issues/248
    structured_aliases:
    - literal_form: workflow_execution_class
      predicate: NARROW_SYNONYM
      contexts:
      - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    rank: 1000
    slot_uri: rdf:type
    designates_type: true
    alias: type
    owner: Organism
    domain_of:
    - EukEval
    - FunctionalAnnotationAggMember
    - PeptideQuantification
    - ProteinQuantification
    - GenomeFeature
    - FunctionalAnnotation
    - AttributeValue
    - NamedThing
    - OntologyRelation
    - FailureCategorization
    - Protocol
    - CreditAssociation
    - Doi
    - ProvenanceMetadata
    - MobilePhaseSegment
    - PortionOfSubstance
    - MagBin
    - MetaboliteIdentification
    range: uriorcurie
    required: true
class_uri: nmdc:Organism
```
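The `ref_biomaterial` pattern accepts a PMID, a DOI, or a URL, with DOI preferred. IMG or Phytozome identifiers are out of scope for this slot. The sketch below classifies a TextValue by form. The three regular expressions are simplified stand-ins chosen for illustration; they are not the MIxS `{DOI}`, `{PMID}` and `{URL}` definitions, which may be stricter. Applying the check to `has_raw_value` is also an assumption.

```python
import re
from typing import Optional

# Simplified stand-ins for the MIxS {DOI}, {PMID} and {URL} settings. These are
# assumptions for illustration; the authoritative expressions are defined by MIxS.
REFERENCE_FORMS = [
    ("DOI", re.compile(r"^doi:10\.\d{4,9}/\S+$")),
    ("PMID", re.compile(r"^PMID:\d+$")),
    ("URL", re.compile(r"^https?://\S+$")),
]


def reference_form(text_value: dict) -> Optional[str]:
    """Classify a ref_biomaterial TextValue as 'DOI', 'PMID' or 'URL', else None."""
    raw = str(text_value.get("has_raw_value", "")).strip()
    for form, pattern in REFERENCE_FORMS:  # listed in order of preference
        if pattern.match(raw):
            return form
    return None  # e.g. IMG or Phytozome ids, which belong in other slots


examples = [
    "doi:10.1016/j.syapm.2018.01.009",
    "PMID:24296464",
    "https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_000016065.1/",
]
for raw in examples:
    print(reference_form({"type": "nmdc:TextValue", "has_raw_value": raw}))
```

The three raw values are the slot's own examples and classify as DOI, PMID and URL in turn.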
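The `gc_content` comment gives an expected `has_numeric_value` range of 0 to 100. The `storage_units` annotation is `%`. Below is a hypothetical Python check for a QuantityValue held as a dict, exercised on the slot's 54.0 and 64.0 examples. The function name is illustrative and is not part of any NMDC library.

```python
import math


def gc_content_problems(qv: dict) -> list[str]:
    """Check a gc_content QuantityValue against the slot's comment and storage_units."""
    problems = []
    if qv.get("type") != "nmdc:QuantityValue":
        problems.append("type should be nmdc:QuantityValue")
    if qv.get("has_unit") != "%":
        problems.append("has_unit should be '%' (the storage_units annotation)")
    value = qv.get("has_numeric_value")
    # bool is a subclass of int, so exclude it explicitly; NaN is not a usable percentage.
    if isinstance(value, bool) or not isinstance(value, (int, float)) or math.isnan(value):
        problems.append("has_numeric_value should be a number")
    elif not 0 <= value <= 100:
        problems.append("has_numeric_value should be within 0-100")
    return problems


pv4 = {"type": "nmdc:QuantityValue", "has_numeric_value": 54.0, "has_unit": "%"}
print(gc_content_problems(pv4))  # []
# A fraction recorded with a non-percent unit is flagged on the unit, not the range.
print(gc_content_problems({"type": "nmdc:QuantityValue", "has_numeric_value": 0.54, "has_unit": "fraction"}))
```

A range check alone cannot tell a GC fraction of 0.54 from a genuine 0.54%. For that reason the sketch also insists on the `%` unit instead of guessing a conversion.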
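The `strain_name` and `isolate_name` comments describe MIxS `subspecf_gen_lin` as a single slot with a rank-prefix encoding such as "strain:PV-4", which NMDC splits into separate slots. The sketch below performs that split. It assumes one "rank:name" value per string. Cultivar maps to `strain_name` because that slot currently accepts cultivar names. Ranks with no dedicated slot on this class, such as serovar, biotype and ecotype, are returned unmapped rather than guessed. The mapping table and function name are hypothetical.

```python
from typing import Optional

# Hypothetical mapping from MIxS subspecf_gen_lin rank prefixes to NMDC Organism slots.
RANK_TO_SLOT = {
    "strain": "strain_name",
    "cultivar": "strain_name",  # strain_name currently accepts cultivar names too (#3056)
    "isolate": "isolate_name",
}


def split_subspecf_gen_lin(value: str) -> tuple[Optional[str], str]:
    """Split a 'rank:name' value; returns (NMDC slot, or None if unmapped, and the name)."""
    rank, sep, name = value.partition(":")  # split on the first colon only
    if not sep or not rank.strip() or not name.strip():
        raise ValueError(f"expected 'rank:name', got {value!r}")
    return RANK_TO_SLOT.get(rank.strip().lower()), name.strip()


print(split_subspecf_gen_lin("strain:PV-4"))      # ('strain_name', 'PV-4')
print(split_subspecf_gen_lin("serovar:Example"))  # (None, 'Example')
```

Splitting on the first colon only keeps names that contain spaces, such as "DSM 6724", intact.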
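The `id` slot combines a generic CURIE `pattern` with the note's abstracted pattern, `prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?`. Organism narrows this further to typecode `orgn` with no version or suffix. The sketch below parses ids along those lines. `GENERIC_ID` is copied from the slot's `pattern`. The character classes in `ID_PARTS` are assumptions, because `{id_shoulder}` and `{id_blade}` are defined elsewhere in the schema. The id `nmdc:orgn-11-abc123` is hypothetical.

```python
import re
from typing import Optional

# Generic id pattern, copied from the id slot above.
GENERIC_ID = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$")

# One reading of prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?
# The character classes for each part are assumptions, not the schema's definitions.
ID_PARTS = re.compile(
    r"^(?P<prefix>[a-z]+):(?P<typecode>[a-z]+)-(?P<shoulder>[0-9a-z]+)-(?P<blade>[0-9a-z]+)"
    r"(?:\.(?P<version>[0-9]+))?(?:_(?P<seqsuffix>[0-9a-z_]+))?$"
)


def parse_nmdc_id(identifier: str) -> Optional[dict]:
    """Split an id into its named parts, or return None if it does not fit."""
    if not GENERIC_ID.match(identifier):
        return None
    match = ID_PARTS.match(identifier)
    return match.groupdict() if match else None


def is_organism_id(identifier: str) -> bool:
    """Organism ids use typecode 'orgn' and end at the blade (no version or suffix)."""
    parts = parse_nmdc_id(identifier)
    return (
        parts is not None
        and parts["typecode"] == "orgn"
        and parts["version"] is None
        and parts["seqsuffix"] is None
    )


print(parse_nmdc_id("nmdc:mgmag-00-x012.1_7_c1"))
print(is_organism_id("nmdc:orgn-11-abc123"))  # True (hypothetical id)
```

Running the slot's own example through the parser yields blade `x012`, version `1` and sequence suffix `7_c1`. Under the Organism rule, that id would be rejected because it carries a version and a suffix.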