Class: NucleotideSequencing

A DataGeneration in which the sequence of DNA or RNA molecules is generated.

classDiagram
    class NucleotideSequencing
    click NucleotideSequencing href "../NucleotideSequencing"
    DataGeneration <|-- NucleotideSequencing
    click DataGeneration href "../DataGeneration"
    NucleotideSequencing : add_date
    NucleotideSequencing : alternative_identifiers
    NucleotideSequencing : analyte_category
    NucleotideSequencing --> "1" AnalyteCategoryEnum : analyte_category
    click AnalyteCategoryEnum href "../AnalyteCategoryEnum"
    NucleotideSequencing : associated_studies
    NucleotideSequencing --> "1..*" Study : associated_studies
    click Study href "../Study"
    NucleotideSequencing : description
    NucleotideSequencing : end_date
    NucleotideSequencing : gold_sequencing_project_identifiers
    NucleotideSequencing : has_failure_categorization
    NucleotideSequencing --> "*" FailureCategorization : has_failure_categorization
    click FailureCategorization href "../FailureCategorization"
    NucleotideSequencing : has_input
    NucleotideSequencing --> "1..*" NamedThing : has_input
    click NamedThing href "../NamedThing"
    NucleotideSequencing : has_output
    NucleotideSequencing --> "*" DataObject : has_output
    click DataObject href "../DataObject"
    NucleotideSequencing : id
    NucleotideSequencing : insdc_bioproject_identifiers
    NucleotideSequencing : insdc_experiment_identifiers
    NucleotideSequencing : instrument_used
    NucleotideSequencing --> "*" Instrument : instrument_used
    click Instrument href "../Instrument"
    NucleotideSequencing : mod_date
    NucleotideSequencing : name
    NucleotideSequencing : ncbi_project_name
    NucleotideSequencing : principal_investigator
    NucleotideSequencing --> "0..1" PersonValue : principal_investigator
    click PersonValue href "../PersonValue"
    NucleotideSequencing : processing_institution
    NucleotideSequencing --> "0..1" ProcessingInstitutionEnum : processing_institution
    click ProcessingInstitutionEnum href "../ProcessingInstitutionEnum"
    NucleotideSequencing : protocol_link
    NucleotideSequencing --> "0..1" Protocol : protocol_link
    click Protocol href "../Protocol"
    NucleotideSequencing : qc_comment
    NucleotideSequencing : qc_status
    NucleotideSequencing --> "0..1" StatusEnum : qc_status
    click StatusEnum href "../StatusEnum"
    NucleotideSequencing : start_date
    NucleotideSequencing : target_gene
    NucleotideSequencing --> "0..1" TextValue : target_gene
    click TextValue href "../TextValue"
    NucleotideSequencing : target_subfragment
    NucleotideSequencing --> "0..1" TextValue : target_subfragment
    click TextValue href "../TextValue"
    NucleotideSequencing : type

Inheritance

NamedThing
- PlannedProcess
  - DataGeneration
    - NucleotideSequencing

Slots

Name	Cardinality and Range	Description	Inheritance
gold_sequencing_project_identifiers	* ExternalIdentifier	identifiers for corresponding sequencing project in GOLD	direct
insdc_bioproject_identifiers	* ExternalIdentifier	identifiers for corresponding project in INSDC Bioproject	direct
insdc_experiment_identifiers	* ExternalIdentifier		direct
ncbi_project_name	0..1 String		direct
target_gene	0..1 TextValue	Targeted gene or locus name for marker gene studies	direct
target_subfragment	0..1 TextValue	Name of subfragment of a gene or locus	direct
add_date	0..1 String	The date on which the information was added to the database	DataGeneration
analyte_category	1 AnalyteCategoryEnum	The type of analyte(s) that were measured in the data generation process and ...	DataGeneration
associated_studies	1..* Study	The study associated with a resource	DataGeneration
instrument_used	* Instrument	What instrument was used during DataGeneration or MaterialProcessing	DataGeneration
mod_date	0..1 String	The last date on which the database information was modified	DataGeneration
principal_investigator	0..1 PersonValue	Principal Investigator who led the study and/or generated the dataset	DataGeneration
has_input	1..* NamedThing or Biosample or ProcessedSample	An input to a process	PlannedProcess
has_output	* DataObject	An output from a process	PlannedProcess
processing_institution	0..1 ProcessingInstitutionEnum	The organization that processed the sample	PlannedProcess
protocol_link	0..1 Protocol		PlannedProcess
start_date	0..1 String	The date on which any process or activity was started	PlannedProcess
end_date	0..1 String	The date on which any process or activity was ended	PlannedProcess
qc_status	0..1 StatusEnum	Stores information about the result of a process (e.g. the process of sequencin...	PlannedProcess
qc_comment	0..1 String	Slot to store additional comments about laboratory or workflow output	PlannedProcess
has_failure_categorization	* FailureCategorization		PlannedProcess
id	1 Uriorcurie	A unique identifier for a thing	NamedThing
name	0..1 String	A human readable label for an entity	NamedThing
description	0..1 String	a human-readable description of a thing	NamedThing
alternative_identifiers	* Uriorcurie	A list of alternative identifiers for the entity	NamedThing
type	1 Uriorcurie	the class_uri of the class that has been instantiated	NamedThing
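
The cardinalities in the table above imply that five slots must always be present: those marked `1` or `1..*`. The following is a minimal sketch of a presence check, not the schema's own validator. The function name `missing_required_slots` and the example record are hypothetical; every identifier and the `analyte_category` value in the record are placeholders chosen for illustration.

```python
# Slots whose cardinality in the Slots table is "1" or "1..*".
REQUIRED_SLOTS = ("id", "type", "analyte_category", "associated_studies", "has_input")


def missing_required_slots(record):
    """Return the required slot names that are absent or empty in `record`."""
    return [slot for slot in REQUIRED_SLOTS if record.get(slot) in (None, "", [])]


# Hypothetical record: all identifiers and the enum value are placeholders.
example = {
    "id": "nmdc:dgns-00-000001",
    "type": "nmdc:NucleotideSequencing",
    "analyte_category": "metagenome",  # assumed AnalyteCategoryEnum value
    "associated_studies": ["nmdc:sty-00-000001"],
    "has_input": ["nmdc:bsm-00-000001"],
}

print(missing_required_slots(example))
print(missing_required_slots({"id": "nmdc:dgns-00-000001"}))
```

This only checks presence. Range and pattern constraints (for example that `has_input` refers to a Biosample or ProcessedSample) need a real LinkML validation step.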

Comments

For example, data generated from an Illumina or Pacific Biosciences instrument.

Identifier and Mapping Information

Schema Source

from schema: https://w3id.org/nmdc/nmdc

Mappings

Mapping Type	Mapped Value
self	nmdc:NucleotideSequencing
native	nmdc:NucleotideSequencing

LinkML Source

Direct

name: NucleotideSequencing
description: A DataGeneration in which the sequence of DNA or RNA molecules is generated.
comments:
- For example data generated from an Illumina or Pacific Biosciences instrument.
from_schema: https://w3id.org/nmdc/nmdc
is_a: DataGeneration
slots:
- gold_sequencing_project_identifiers
- insdc_bioproject_identifiers
- insdc_experiment_identifiers
- ncbi_project_name
- target_gene
- target_subfragment
slot_usage:
  id:
    name: id
    domain_of:
    - NamedThing
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$'
      interpolated: true
class_uri: nmdc:NucleotideSequencing
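
The `structured_pattern` above is marked `interpolated: true`, so the `{...}` placeholders are expanded from schema settings before the pattern is used as a regular expression. The sketch below illustrates that expansion by hand. The setting values in `ASSUMED_SETTINGS` are assumptions, since this page does not list them, and the helper names `interpolate` and `is_valid_id` are hypothetical.

```python
import re

# ASSUMED setting values: this page does not define them, so treat these as
# placeholders to be replaced with the values from the schema's settings.
ASSUMED_SETTINGS = {
    "id_nmdc_prefix": "^(nmdc)",
    "id_shoulder": "([0-9][a-z]{0,6}[0-9])",
    "id_blade": "([A-Za-z0-9]{1,})",
}

# Syntax string copied from the slot_usage for `id`.
ID_SYNTAX = "{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$"


def interpolate(syntax, settings):
    """Replace each {name} placeholder with the matching setting value."""
    return re.sub(r"\{([A-Za-z_]\w*)\}", lambda m: settings[m.group(1)], syntax)


ID_PATTERN = re.compile(interpolate(ID_SYNTAX, ASSUMED_SETTINGS))


def is_valid_id(value):
    """Check a candidate id against the interpolated pattern."""
    return ID_PATTERN.search(value) is not None


# Hypothetical identifiers, for illustration only.
print(is_valid_id("nmdc:dgns-11-abc123"))   # dgns typecode
print(is_valid_id("nmdc:omprc-11-abc123"))  # omprc typecode
print(is_valid_id("nmdc:bsm-11-abc123"))    # typecode not allowed for this class
```

Under these assumed settings only the `dgns` and `omprc` typecodes are accepted, which is the part of the constraint that this class adds on top of the generic `id` slot.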

Induced

name: NucleotideSequencing
description: A DataGeneration in which the sequence of DNA or RNA molecules is generated.
comments:
- For example data generated from an Illumina or Pacific Biosciences instrument.
from_schema: https://w3id.org/nmdc/nmdc
is_a: DataGeneration
slot_usage:
  id:
    name: id
    domain_of:
    - NamedThing
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$'
      interpolated: true
attributes:
  gold_sequencing_project_identifiers:
    name: gold_sequencing_project_identifiers
    description: identifiers for corresponding sequencing project in GOLD
    examples:
    - value: https://bioregistry.io/gold:Gp0108335
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: omics_processing_identifiers
    mixins:
    - gold_identifiers
    alias: gold_sequencing_project_identifiers
    owner: NucleotideSequencing
    domain_of:
    - NucleotideSequencing
    range: external_identifier
    multivalued: true
    pattern: ^gold:Gp[0-9]+$
  insdc_bioproject_identifiers:
    name: insdc_bioproject_identifiers
    description: identifiers for corresponding project in INSDC Bioproject
    comments:
    - these are distinct IDs from INSDC SRA/ENA project identifiers, but are usually(?)
      one to one
    examples:
    - value: https://bioregistry.io/bioproject:PRJNA366857
      description: Avena fatua rhizosphere microbial communities - H1_Rhizo_Litter_2
        metatranscriptome
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://www.ncbi.nlm.nih.gov/bioproject/
    - https://www.ddbj.nig.ac.jp/bioproject/index-e.html
    aliases:
    - NCBI bioproject identifiers
    - DDBJ bioproject identifiers
    rank: 1000
    is_a: study_identifiers
    mixins:
    - insdc_identifiers
    alias: insdc_bioproject_identifiers
    owner: NucleotideSequencing
    domain_of:
    - NucleotideSequencing
    - Study
    range: external_identifier
    multivalued: true
    pattern: ^bioproject:PRJ[DEN][A-Z][0-9]+$
  insdc_experiment_identifiers:
    name: insdc_experiment_identifiers
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: external_database_identifiers
    mixins:
    - insdc_identifiers
    alias: insdc_experiment_identifiers
    owner: NucleotideSequencing
    domain_of:
    - NucleotideSequencing
    - DataObject
    range: external_identifier
    multivalued: true
    pattern: ^insdc.sra:(E|D|S)RX[0-9]{6,}$
  ncbi_project_name:
    name: ncbi_project_name
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: ncbi_project_name
    owner: NucleotideSequencing
    domain_of:
    - NucleotideSequencing
    range: string
  target_gene:
    name: target_gene
    annotations:
      expected_value:
        tag: expected_value
        value: gene name
    description: Targeted gene or locus name for marker gene studies
    title: target gene
    examples:
    - value: 16S rRNA, 18S rRNA, nif, amoA, rpo
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - target gene
    rank: 1000
    is_a: sequencing field
    string_serialization: '{text}'
    slot_uri: MIXS:0000044
    alias: target_gene
    owner: NucleotideSequencing
    domain_of:
    - NucleotideSequencing
    range: TextValue
    multivalued: false
  target_subfragment:
    name: target_subfragment
    annotations:
      expected_value:
        tag: expected_value
        value: gene fragment name
    description: Name of subfragment of a gene or locus. Important to e.g. identify
      special regions on marker genes like V6 on 16S rRNA
    title: target subfragment
    examples:
    - value: V6, V9, ITS
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - target subfragment
    rank: 1000
    is_a: sequencing field
    string_serialization: '{text}'
    slot_uri: MIXS:0000045
    alias: target_subfragment
    owner: NucleotideSequencing
    domain_of:
    - NucleotideSequencing
    range: TextValue
    multivalued: false
  add_date:
    name: add_date
    description: The date on which the information was added to the database.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: add_date
    owner: NucleotideSequencing
    domain_of:
    - Biosample
    - DataGeneration
    range: string
  analyte_category:
    name: analyte_category
    description: "The type of analyte(s) that were measured in the data generation\
      \ process and analyzed\n  in the Workflow Chain\n"
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: analyte_category
    owner: NucleotideSequencing
    domain_of:
    - DataGeneration
    range: AnalyteCategoryEnum
    required: true
  associated_studies:
    name: associated_studies
    description: The study associated with a resource.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: associated_studies
    owner: NucleotideSequencing
    domain_of:
    - Biosample
    - DataGeneration
    range: Study
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(sty)-{id_shoulder}-{id_blade}$'
      interpolated: true
  instrument_used:
    name: instrument_used
    description: What instrument was used during DataGeneration or MaterialProcessing.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: instrument_used
    owner: NucleotideSequencing
    domain_of:
    - MaterialProcessing
    - DataGeneration
    range: Instrument
    multivalued: true
  mod_date:
    name: mod_date
    description: The last date on which the database information was modified.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: mod_date
    owner: NucleotideSequencing
    domain_of:
    - Biosample
    - DataGeneration
    range: string
  principal_investigator:
    name: principal_investigator
    description: Principal Investigator who led the study and/or generated the dataset.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - PI
    rank: 1000
    alias: principal_investigator
    owner: NucleotideSequencing
    domain_of:
    - Study
    - DataGeneration
    range: PersonValue
  has_input:
    name: has_input
    description: An input to a process.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: has_input
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: NamedThing
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$'
      interpolated: true
    any_of:
    - range: Biosample
    - range: ProcessedSample
  has_output:
    name: has_output
    description: An output from a process.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: has_output
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: DataObject
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  processing_institution:
    name: processing_institution
    description: The organization that processed the sample.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: processing_institution
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: ProcessingInstitutionEnum
  protocol_link:
    name: protocol_link
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: protocol_link
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    - Study
    range: Protocol
  start_date:
    name: start_date
    description: The date on which any process or activity was started
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: start_date
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: string
  end_date:
    name: end_date
    description: The date on which any process or activity was ended
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: end_date
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: string
  qc_status:
    name: qc_status
    description: Stores information about the result of a process (e.g. the process
      of sequencing a library may have a qc_status of 'fail' if not enough data
      was generated)
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_status
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: StatusEnum
  qc_comment:
    name: qc_comment
    description: Slot to store additional comments about laboratory or workflow output.
      For workflow output it may describe the particular workflow stage that failed
      (e.g. failed at call-stage due to a malformed fastq file).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_comment
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: string
  has_failure_categorization:
    name: has_failure_categorization
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: has_failure_categorization
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: FailureCategorization
    multivalued: true
    inlined: true
    inlined_as_list: true
  id:
    name: id
    description: A unique identifier for a thing. Must be either a CURIE shorthand
      for a URI or a complete URI
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    identifier: true
    alias: id
    owner: NucleotideSequencing
    domain_of:
    - NamedThing
    range: uriorcurie
    required: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$'
      interpolated: true
  name:
    name: name
    description: A human readable label for an entity
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: name
    owner: NucleotideSequencing
    domain_of:
    - PersonValue
    - NamedThing
    - Protocol
    range: string
  description:
    name: description
    description: a human-readable description of a thing
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    slot_uri: dcterms:description
    alias: description
    owner: NucleotideSequencing
    domain_of:
    - ImageValue
    - NamedThing
    range: string
  alternative_identifiers:
    name: alternative_identifiers
    description: A list of alternative identifiers for the entity.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: alternative_identifiers
    owner: NucleotideSequencing
    domain_of:
    - MetaboliteIdentification
    - NamedThing
    range: uriorcurie
    multivalued: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
  type:
    name: type
    description: the class_uri of the class that has been instantiated
    notes:
    - replaces legacy nmdc:type slot
    - makes it easier to read example data files
    - required for polymorphic MongoDB collections
    examples:
    - value: nmdc:Biosample
    - value: nmdc:Study
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://github.com/microbiomedata/nmdc-schema/issues/1048
    - https://github.com/microbiomedata/nmdc-schema/issues/1233
    - https://github.com/microbiomedata/nmdc-schema/issues/248
    rank: 1000
    slot_uri: rdf:type
    designates_type: true
    alias: type
    owner: NucleotideSequencing
    domain_of:
    - EukEval
    - FunctionalAnnotationAggMember
    - MobilePhaseSegment
    - PortionOfSubstance
    - MagBin
    - MetaboliteIdentification
    - PeptideQuantification
    - ProteinQuantification
    - GenomeFeature
    - FunctionalAnnotation
    - AttributeValue
    - NamedThing
    - FailureCategorization
    - Protocol
    - CreditAssociation
    - Doi
    range: uriorcurie
    required: true
class_uri: nmdc:NucleotideSequencing
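
Several induced attributes above carry a plain `pattern`, and the comments on `start_date` and `end_date` ask for `YYYY-MM-DD` strings without yet enforcing them. The sketch below shows one way to check both by hand. The regular expressions are copied from the attributes above; the helper names `invalid_identifiers` and `is_yyyy_mm_dd` are hypothetical, and the date check is an illustration of the stated convention, not a pattern defined by the schema.

```python
import re
from datetime import datetime

# Patterns copied from the induced attribute definitions.
SLOT_PATTERNS = {
    "gold_sequencing_project_identifiers": r"^gold:Gp[0-9]+$",
    "insdc_bioproject_identifiers": r"^bioproject:PRJ[DEN][A-Z][0-9]+$",
    "insdc_experiment_identifiers": r"^insdc.sra:(E|D|S)RX[0-9]{6,}$",
}


def invalid_identifiers(slot, values):
    """Return the values that do not match the pattern declared for `slot`."""
    pattern = re.compile(SLOT_PATTERNS[slot])
    return [value for value in values if not pattern.search(value)]


def is_yyyy_mm_dd(value):
    """Return True if `value` is a real calendar date written as YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime tolerates unpadded fields such as "2021-3-5"; the round trip
    # rejects them so that only zero-padded dates pass.
    return parsed.strftime("%Y-%m-%d") == value


# CURIE forms of the examples listed above, plus hypothetical failing values.
print(invalid_identifiers("gold_sequencing_project_identifiers",
                          ["gold:Gp0108335", "gold:Gs0001"]))
print(invalid_identifiers("insdc_bioproject_identifiers",
                          ["bioproject:PRJNA366857"]))
print(is_yyyy_mm_dd("2021-03-15"), is_yyyy_mm_dd("15/03/2021"))
```

Note that the patterns are written against the CURIE form of each identifier (for example `gold:Gp0108335`), so the check is applied to that form here.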