Class: NucleotideSequencing
A DataGeneration in which the sequence of DNA or RNA molecules is generated.
URI: nmdc:NucleotideSequencing
classDiagram
class NucleotideSequencing
click NucleotideSequencing href "../NucleotideSequencing"
DataGeneration <|-- NucleotideSequencing
click DataGeneration href "../DataGeneration"
NucleotideSequencing : add_date
NucleotideSequencing : alternative_identifiers
NucleotideSequencing : analyte_category
NucleotideSequencing --> "1" AnalyteCategoryEnum : analyte_category
click AnalyteCategoryEnum href "../AnalyteCategoryEnum"
NucleotideSequencing : associated_studies
NucleotideSequencing --> "1..*" Study : associated_studies
click Study href "../Study"
NucleotideSequencing : description
NucleotideSequencing : end_date
NucleotideSequencing : gold_sequencing_project_identifiers
NucleotideSequencing : has_failure_categorization
NucleotideSequencing --> "*" FailureCategorization : has_failure_categorization
click FailureCategorization href "../FailureCategorization"
NucleotideSequencing : has_input
NucleotideSequencing --> "1..*" NamedThing : has_input
click NamedThing href "../NamedThing"
NucleotideSequencing : has_output
NucleotideSequencing --> "*" DataObject : has_output
click DataObject href "../DataObject"
NucleotideSequencing : id
NucleotideSequencing : insdc_bioproject_identifiers
NucleotideSequencing : insdc_experiment_identifiers
NucleotideSequencing : instrument_used
NucleotideSequencing --> "*" Instrument : instrument_used
click Instrument href "../Instrument"
NucleotideSequencing : mod_date
NucleotideSequencing : name
NucleotideSequencing : ncbi_project_name
NucleotideSequencing : principal_investigator
NucleotideSequencing --> "0..1" PersonValue : principal_investigator
click PersonValue href "../PersonValue"
NucleotideSequencing : processing_institution
NucleotideSequencing --> "0..1" ProcessingInstitutionEnum : processing_institution
click ProcessingInstitutionEnum href "../ProcessingInstitutionEnum"
NucleotideSequencing : protocol_link
NucleotideSequencing --> "0..1" Protocol : protocol_link
click Protocol href "../Protocol"
NucleotideSequencing : qc_comment
NucleotideSequencing : qc_status
NucleotideSequencing --> "0..1" StatusEnum : qc_status
click StatusEnum href "../StatusEnum"
NucleotideSequencing : start_date
NucleotideSequencing : target_gene
NucleotideSequencing --> "0..1" TextValue : target_gene
click TextValue href "../TextValue"
NucleotideSequencing : target_subfragment
NucleotideSequencing --> "0..1" TextValue : target_subfragment
click TextValue href "../TextValue"
NucleotideSequencing : type
Inheritance
- NamedThing
  - PlannedProcess
    - DataGeneration
      - NucleotideSequencing
Slots
Name | Cardinality and Range | Description | Inheritance |
---|---|---|---|
gold_sequencing_project_identifiers | * ExternalIdentifier | identifiers for corresponding sequencing project in GOLD | direct |
insdc_bioproject_identifiers | * ExternalIdentifier | identifiers for corresponding project in INSDC Bioproject | direct |
insdc_experiment_identifiers | * ExternalIdentifier | | direct |
ncbi_project_name | 0..1 String | | direct |
target_gene | 0..1 TextValue | Targeted gene or locus name for marker gene studies | direct |
target_subfragment | 0..1 TextValue | Name of subfragment of a gene or locus | direct |
add_date | 0..1 String | The date on which the information was added to the database | DataGeneration |
analyte_category | 1 AnalyteCategoryEnum | The type of analyte(s) that were measured in the data generation process and ... | DataGeneration |
associated_studies | 1..* Study | The study associated with a resource | DataGeneration |
instrument_used | * Instrument | What instrument was used during DataGeneration or MaterialProcessing | DataGeneration |
mod_date | 0..1 String | The last date on which the database information was modified | DataGeneration |
principal_investigator | 0..1 PersonValue | Principal Investigator who led the study and/or generated the dataset | DataGeneration |
has_input | 1..* NamedThing or Biosample or ProcessedSample | An input to a process | PlannedProcess |
has_output | * DataObject | An output from a process | PlannedProcess |
processing_institution | 0..1 ProcessingInstitutionEnum | The organization that processed the sample | PlannedProcess |
protocol_link | 0..1 Protocol | | PlannedProcess |
start_date | 0..1 String | The date on which any process or activity was started | PlannedProcess |
end_date | 0..1 String | The date on which any process or activity was ended | PlannedProcess |
qc_status | 0..1 StatusEnum | Stores information about the result of a process (ie the process of sequencin... | PlannedProcess |
qc_comment | 0..1 String | Slot to store additional comments about laboratory or workflow output | PlannedProcess |
has_failure_categorization | * FailureCategorization | | PlannedProcess |
id | 1 Uriorcurie | A unique identifier for a thing | NamedThing |
name | 0..1 String | A human readable label for an entity | NamedThing |
description | 0..1 String | a human-readable description of a thing | NamedThing |
alternative_identifiers | * Uriorcurie | A list of alternative identifiers for the entity | NamedThing |
type | 1 Uriorcurie | the class_uri of the class that has been instantiated | NamedThing |
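The cardinalities in the Slots table can be read as a checklist: a slot marked `1` or `1..*` must be present, and a `1..*` slot must hold at least one value. The following is a minimal Python sketch of that reading, not the NMDC validation tooling. The function name `missing_required`, the identifiers in `example`, and the `analyte_category` value `metagenome` are illustrative assumptions that do not appear on this page.

```python
# Required slots taken from the Slots table (cardinality "1" or "1..*").
# The boolean records whether the slot is multivalued ("1..*").
REQUIRED_SLOTS = {
    "id": False,                 # 1 Uriorcurie
    "type": False,               # 1 Uriorcurie
    "analyte_category": False,   # 1 AnalyteCategoryEnum
    "associated_studies": True,  # 1..* Study
    "has_input": True,           # 1..* Biosample or ProcessedSample
}


def missing_required(record: dict) -> list:
    """Return the required slots that are absent or empty in `record`."""
    problems = []
    for slot, multivalued in REQUIRED_SLOTS.items():
        value = record.get(slot)
        if multivalued:
            # A "1..*" slot needs a non-empty list.
            if not isinstance(value, list) or len(value) == 0:
                problems.append(slot)
        elif value in (None, ""):
            problems.append(slot)
    return problems


# Hypothetical record: every identifier below is made up for illustration.
example = {
    "id": "nmdc:dgns-00-abc123",
    "type": "nmdc:NucleotideSequencing",
    "analyte_category": "metagenome",  # assumed enum value, not listed on this page
    "associated_studies": ["nmdc:sty-00-abc123"],
    "has_input": ["nmdc:bsm-00-abc123"],
}

print(missing_required(example))
print(missing_required({"id": "nmdc:dgns-00-abc123"}))
```

All other slots in the table are optional (`0..1` or `*`), so the sketch ignores them.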
Comments
- For example data generated from an Illumina or Pacific Biosciences instrument.
Identifier and Mapping Information
Schema Source
- from schema: https://w3id.org/nmdc/nmdc
Mappings
Mapping Type | Mapped Value |
---|---|
self | nmdc:NucleotideSequencing |
native | nmdc:NucleotideSequencing |
LinkML Source
Direct
name: NucleotideSequencing
description: A DataGeneration in which the sequence of DNA or RNA molecules is generated.
comments:
- For example data generated from an Illumina or Pacific Biosciences instrument.
from_schema: https://w3id.org/nmdc/nmdc
is_a: DataGeneration
slots:
- gold_sequencing_project_identifiers
- insdc_bioproject_identifiers
- insdc_experiment_identifiers
- ncbi_project_name
- target_gene
- target_subfragment
slot_usage:
  id:
    name: id
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$'
      interpolated: true
class_uri: nmdc:NucleotideSequencing
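The `structured_pattern` above is a template: with `interpolated: true`, each `{name}` placeholder is replaced by a schema-level setting before the result is used as a regular expression. The Python sketch below shows that substitution under stated assumptions. The three values in `SETTINGS` are not shown on this page and are assumed here for illustration, and `interpolate` and `is_sequencing_id` are hypothetical helper names, not part of LinkML.

```python
import re

# Assumed setting values; the real ones live in the schema's settings block,
# which is not reproduced on this page.
SETTINGS = {
    "id_nmdc_prefix": r"^(nmdc)",
    "id_shoulder": r"([0-9][a-z]{0,6}[0-9])",
    "id_blade": r"([A-Za-z0-9]{1,})",
}

# Copied verbatim from slot_usage.id.structured_pattern.syntax.
SYNTAX = "{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$"


def interpolate(syntax: str, settings: dict) -> str:
    """Replace each {name} placeholder with its setting in a single pass."""
    def substitute(match):
        # Unknown placeholders are left untouched.
        return settings.get(match.group(1), match.group(0))
    return re.sub(r"\{([A-Za-z_]+)\}", substitute, syntax)


ID_PATTERN = re.compile(interpolate(SYNTAX, SETTINGS))


def is_sequencing_id(identifier: str) -> bool:
    """True if the identifier fits the interpolated id pattern."""
    return ID_PATTERN.match(identifier) is not None


print(ID_PATTERN.pattern)
print(is_sequencing_id("nmdc:dgns-11-abc123"))   # hypothetical id with typecode dgns
print(is_sequencing_id("nmdc:bsm-11-abc123"))    # bsm is not an accepted typecode here
```

A single-pass `re.sub` is used instead of `str.format` so that regex quantifiers such as `{0,6}` inside the setting values are never mistaken for placeholders.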
Induced
name: NucleotideSequencing
description: A DataGeneration in which the sequence of DNA or RNA molecules is generated.
comments:
- For example data generated from an Illumina or Pacific Biosciences instrument.
from_schema: https://w3id.org/nmdc/nmdc
is_a: DataGeneration
slot_usage:
id:
name: id
structured_pattern:
syntax: '{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$'
interpolated: true
attributes:
gold_sequencing_project_identifiers:
name: gold_sequencing_project_identifiers
description: identifiers for corresponding sequencing project in GOLD
examples:
- value: https://bioregistry.io/gold:Gp0108335
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
is_a: omics_processing_identifiers
mixins:
- gold_identifiers
alias: gold_sequencing_project_identifiers
owner: NucleotideSequencing
domain_of:
- NucleotideSequencing
range: external_identifier
multivalued: true
pattern: ^gold:Gp[0-9]+$
insdc_bioproject_identifiers:
name: insdc_bioproject_identifiers
description: identifiers for corresponding project in INSDC Bioproject
comments:
- these are distinct IDs from INSDC SRA/ENA project identifiers, but are usually(?)
one to one
examples:
- value: https://bioregistry.io/bioproject:PRJNA366857
description: Avena fatua rhizosphere microbial communities - H1_Rhizo_Litter_2
metatranscriptome
from_schema: https://w3id.org/nmdc/nmdc
see_also:
- https://www.ncbi.nlm.nih.gov/bioproject/
- https://www.ddbj.nig.ac.jp/bioproject/index-e.html
aliases:
- NCBI bioproject identifiers
- DDBJ bioproject identifiers
rank: 1000
is_a: study_identifiers
mixins:
- insdc_identifiers
alias: insdc_bioproject_identifiers
owner: NucleotideSequencing
domain_of:
- NucleotideSequencing
- Study
range: external_identifier
multivalued: true
pattern: ^bioproject:PRJ[DEN][A-Z][0-9]+$
insdc_experiment_identifiers:
name: insdc_experiment_identifiers
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
is_a: external_database_identifiers
mixins:
- insdc_identifiers
alias: insdc_experiment_identifiers
owner: NucleotideSequencing
domain_of:
- NucleotideSequencing
- DataObject
range: external_identifier
multivalued: true
pattern: ^insdc.sra:(E|D|S)RX[0-9]{6,}$
ncbi_project_name:
name: ncbi_project_name
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: ncbi_project_name
owner: NucleotideSequencing
domain_of:
- NucleotideSequencing
range: string
target_gene:
name: target_gene
annotations:
expected_value:
tag: expected_value
value: gene name
description: Targeted gene or locus name for marker gene studies
title: target gene
examples:
- value: 16S rRNA, 18S rRNA, nif, amoA, rpo
from_schema: https://w3id.org/nmdc/nmdc
aliases:
- target gene
rank: 1000
is_a: sequencing field
string_serialization: '{text}'
slot_uri: MIXS:0000044
alias: target_gene
owner: NucleotideSequencing
domain_of:
- NucleotideSequencing
range: TextValue
multivalued: false
target_subfragment:
name: target_subfragment
annotations:
expected_value:
tag: expected_value
value: gene fragment name
description: Name of subfragment of a gene or locus. Important to e.g. identify
special regions on marker genes like V6 on 16S rRNA
title: target subfragment
examples:
- value: V6, V9, ITS
from_schema: https://w3id.org/nmdc/nmdc
aliases:
- target subfragment
rank: 1000
is_a: sequencing field
string_serialization: '{text}'
slot_uri: MIXS:0000045
alias: target_subfragment
owner: NucleotideSequencing
domain_of:
- NucleotideSequencing
range: TextValue
multivalued: false
add_date:
name: add_date
description: The date on which the information was added to the database.
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: add_date
owner: NucleotideSequencing
domain_of:
- Biosample
- DataGeneration
range: string
analyte_category:
name: analyte_category
description: "The type of analyte(s) that were measured in the data generation\
\ process and analyzed\n in the Workflow Chain\n"
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: analyte_category
owner: NucleotideSequencing
domain_of:
- DataGeneration
range: AnalyteCategoryEnum
required: true
associated_studies:
name: associated_studies
description: The study associated with a resource.
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: associated_studies
owner: NucleotideSequencing
domain_of:
- Biosample
- DataGeneration
range: Study
required: true
multivalued: true
structured_pattern:
syntax: '{id_nmdc_prefix}:(sty)-{id_shoulder}-{id_blade}$'
interpolated: true
instrument_used:
name: instrument_used
description: What instrument was used during DataGeneration or MaterialProcessing.
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: instrument_used
owner: NucleotideSequencing
domain_of:
- MaterialProcessing
- DataGeneration
range: Instrument
multivalued: true
mod_date:
name: mod_date
description: The last date on which the database information was modified.
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: mod_date
owner: NucleotideSequencing
domain_of:
- Biosample
- DataGeneration
range: string
principal_investigator:
name: principal_investigator
description: Principal Investigator who led the study and/or generated the dataset.
from_schema: https://w3id.org/nmdc/nmdc
aliases:
- PI
rank: 1000
alias: principal_investigator
owner: NucleotideSequencing
domain_of:
- Study
- DataGeneration
range: PersonValue
has_input:
name: has_input
description: An input to a process.
from_schema: https://w3id.org/nmdc/nmdc
aliases:
- input
rank: 1000
alias: has_input
owner: NucleotideSequencing
domain_of:
- PlannedProcess
range: NamedThing
required: true
multivalued: true
structured_pattern:
syntax: '{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$'
interpolated: true
any_of:
- range: Biosample
- range: ProcessedSample
has_output:
name: has_output
description: An output from a process.
from_schema: https://w3id.org/nmdc/nmdc
aliases:
- output
rank: 1000
alias: has_output
owner: NucleotideSequencing
domain_of:
- PlannedProcess
range: DataObject
multivalued: true
structured_pattern:
syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
interpolated: true
processing_institution:
name: processing_institution
description: The organization that processed the sample.
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: processing_institution
owner: NucleotideSequencing
domain_of:
- PlannedProcess
range: ProcessingInstitutionEnum
protocol_link:
name: protocol_link
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: protocol_link
owner: NucleotideSequencing
domain_of:
- PlannedProcess
- Study
range: Protocol
start_date:
name: start_date
description: The date on which any process or activity was started
todos:
- add date string validation pattern
comments:
- We are using string representations of dates until all components of our ecosystem
can handle ISO 8601 dates
- The date should be formatted as YYYY-MM-DD
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: start_date
owner: NucleotideSequencing
domain_of:
- PlannedProcess
range: string
end_date:
name: end_date
description: The date on which any process or activity was ended
todos:
- add date string validation pattern
comments:
- We are using string representations of dates until all components of our ecosystem
can handle ISO 8601 dates
- The date should be formatted as YYYY-MM-DD
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: end_date
owner: NucleotideSequencing
domain_of:
- PlannedProcess
range: string
qc_status:
name: qc_status
description: Stores information about the result of a process (e.g., the process
of sequencing a library may have a qc_status of 'fail' if not enough data
was generated)
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: qc_status
owner: NucleotideSequencing
domain_of:
- PlannedProcess
range: StatusEnum
qc_comment:
name: qc_comment
description: Slot to store additional comments about laboratory or workflow output.
For workflow output, it may describe the particular workflow stage that failed
(e.g., failed at call-stage due to a malformed fastq file).
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: qc_comment
owner: NucleotideSequencing
domain_of:
- PlannedProcess
range: string
has_failure_categorization:
name: has_failure_categorization
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: has_failure_categorization
owner: NucleotideSequencing
domain_of:
- PlannedProcess
range: FailureCategorization
multivalued: true
inlined: true
inlined_as_list: true
id:
name: id
description: A unique identifier for a thing. Must be either a CURIE shorthand
for a URI or a complete URI
notes:
- 'abstracted pattern: prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?'
- a minimum length of 3 characters is suggested for typecodes, but 1 or 2 characters
will be accepted
- typecodes must correspond 1:1 to a class in the NMDC schema. this will be checked
via per-class id slot usage assertions
- minting authority shoulders should probably be enumerated and checked in the
pattern
examples:
- value: nmdc:mgmag-00-x012.1_7_c1
description: https://github.com/microbiomedata/nmdc-schema/pull/499#discussion_r1018499248
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
identifier: true
alias: id
owner: NucleotideSequencing
domain_of:
- NamedThing
range: uriorcurie
required: true
pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
structured_pattern:
syntax: '{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$'
interpolated: true
name:
name: name
description: A human readable label for an entity
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: name
owner: NucleotideSequencing
domain_of:
- PersonValue
- NamedThing
- Protocol
range: string
description:
name: description
description: a human-readable description of a thing
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
slot_uri: dcterms:description
alias: description
owner: NucleotideSequencing
domain_of:
- ImageValue
- NamedThing
range: string
alternative_identifiers:
name: alternative_identifiers
description: A list of alternative identifiers for the entity.
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: alternative_identifiers
owner: NucleotideSequencing
domain_of:
- MetaboliteIdentification
- NamedThing
range: uriorcurie
multivalued: true
pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
type:
name: type
description: the class_uri of the class that has been instantiated
notes:
- replaces legacy nmdc:type slot
- makes it easier to read example data files
- required for polymorphic MongoDB collections
examples:
- value: nmdc:Biosample
- value: nmdc:Study
from_schema: https://w3id.org/nmdc/nmdc
see_also:
- https://github.com/microbiomedata/nmdc-schema/issues/1048
- https://github.com/microbiomedata/nmdc-schema/issues/1233
- https://github.com/microbiomedata/nmdc-schema/issues/248
rank: 1000
slot_uri: rdf:type
designates_type: true
alias: type
owner: NucleotideSequencing
domain_of:
- EukEval
- FunctionalAnnotationAggMember
- MobilePhaseSegment
- PortionOfSubstance
- MagBin
- MetaboliteIdentification
- PeptideQuantification
- ProteinQuantification
- GenomeFeature
- FunctionalAnnotation
- AttributeValue
- NamedThing
- FailureCategorization
- Protocol
- CreditAssociation
- Doi
range: uriorcurie
required: true
class_uri: nmdc:NucleotideSequencing
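The induced source above attaches a `pattern` to each of the three external identifier slots, and the comments on `start_date` and `end_date` ask for `YYYY-MM-DD` strings while the todos note that no validation pattern exists yet. The Python sketch below applies those constraints to candidate values. It is an illustration under stated assumptions, not the schema's own validator: `invalid_values` and `is_yyyy_mm_dd` are hypothetical helper names, and the `SRX` accessions used below are made up.

```python
import re
from datetime import date

# Patterns copied verbatim from the induced slot definitions.
SLOT_PATTERNS = {
    "gold_sequencing_project_identifiers": r"^gold:Gp[0-9]+$",
    "insdc_bioproject_identifiers": r"^bioproject:PRJ[DEN][A-Z][0-9]+$",
    "insdc_experiment_identifiers": r"^insdc.sra:(E|D|S)RX[0-9]{6,}$",
}


def invalid_values(slot: str, values: list) -> list:
    """Return the values that do not satisfy the slot's pattern."""
    pattern = re.compile(SLOT_PATTERNS[slot])
    return [value for value in values if pattern.search(value) is None]


def is_yyyy_mm_dd(text: str) -> bool:
    """True if `text` is a real calendar date written as YYYY-MM-DD."""
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text) is None:
        return False
    year, month, day = (int(part) for part in text.split("-"))
    try:
        date(year, month, day)  # rejects impossible dates such as 2021-02-30
    except ValueError:
        return False
    return True


print(invalid_values(
    "gold_sequencing_project_identifiers",
    ["gold:Gp0108335", "https://bioregistry.io/gold:Gp0108335"],
))
print(invalid_values("insdc_experiment_identifiers", ["insdc.sra:SRX123456", "insdc.sra:SRX123"]))
print(is_yyyy_mm_dd("2021-02-28"), is_yyyy_mm_dd("2021-02-30"))
```

Because the patterns are anchored with `^`, the sketch accepts the CURIE form (`gold:Gp0108335`) and rejects the same identifier written as a bioregistry.io URL, which is how the `examples` entries above are written.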