Class: NucleotideSequencing

A DataGeneration in which the sequence of DNA or RNA molecules is generated.

classDiagram
  class NucleotideSequencing
  DataGeneration <|-- NucleotideSequencing
  NucleotideSequencing : alternative_identifiers
  NucleotideSequencing : analyte_category
  NucleotideSequencing --> "1" NucleotideSequencingEnum : analyte_category
  NucleotideSequencing : associated_studies
  NucleotideSequencing --> "1..*" Study : associated_studies
  NucleotideSequencing : description
  NucleotideSequencing : end_date
  NucleotideSequencing : gold_sequencing_project_identifiers
  NucleotideSequencing : has_failure_categorization
  NucleotideSequencing --> "*" FailureCategorization : has_failure_categorization
  NucleotideSequencing : has_input
  NucleotideSequencing --> "1..*" Sample : has_input
  NucleotideSequencing : has_output
  NucleotideSequencing --> "*" DataObject : has_output
  NucleotideSequencing : id
  NucleotideSequencing : insdc_bioproject_identifiers
  NucleotideSequencing : insdc_experiment_identifiers
  NucleotideSequencing : instrument_instance_specifier
  NucleotideSequencing : instrument_used
  NucleotideSequencing --> "*" Instrument : instrument_used
  NucleotideSequencing : name
  NucleotideSequencing : ncbi_project_name
  NucleotideSequencing : principal_investigator
  NucleotideSequencing --> "0..1" PersonValue : principal_investigator
  NucleotideSequencing : processing_institution
  NucleotideSequencing --> "0..1" ProcessingInstitutionEnum : processing_institution
  NucleotideSequencing : protocol_link
  NucleotideSequencing --> "0..1" Protocol : protocol_link
  NucleotideSequencing : provenance_metadata
  NucleotideSequencing --> "0..1" ProvenanceMetadata : provenance_metadata
  NucleotideSequencing : qc_comment
  NucleotideSequencing : qc_status
  NucleotideSequencing --> "0..1" StatusEnum : qc_status
  NucleotideSequencing : start_date
  NucleotideSequencing : type

Inheritance

NamedThing
- PlannedProcess
  - DataEmitterProcess
    - DataGeneration
      - NucleotideSequencing

Slots

Name	Cardinality and Range	Description	Inheritance
gold_sequencing_project_identifiers	* ExternalIdentifier	identifiers for corresponding sequencing project in GOLD	direct
insdc_bioproject_identifiers	* ExternalIdentifier	identifiers for corresponding project in INSDC Bioproject	direct
insdc_experiment_identifiers	* ExternalIdentifier		direct
ncbi_project_name	0..1 String		direct
analyte_category	1 NucleotideSequencingEnum	The type of analyte(s) that were measured in the data generation process	DataGeneration
associated_studies	1..* Study	The study associated with a resource	DataGeneration
instrument_used	* Instrument	What instrument was used during DataGeneration or MaterialProcessing	DataGeneration
principal_investigator	0..1 PersonValue	Principal Investigator who led the study and/or generated the dataset	DataGeneration
instrument_instance_specifier	0..1 String	A unique value that identifies an individual instrument instance, such as a serial number or similar identifiers assigned by the manufacturer or user	DataGeneration
provenance_metadata	0..1 ProvenanceMetadata	Provenance metadata for this DataGeneration, including when the record was added to and last modified in the NMDC database	DataGeneration
has_input	1..* Sample	An input to a process	PlannedProcess
has_output	* DataObject	An output from a process	PlannedProcess
processing_institution	0..1 ProcessingInstitutionEnum	The organization that processed the sample	PlannedProcess
protocol_link	0..1 Protocol		PlannedProcess
start_date	0..1 String	The date on which any process or activity was started	PlannedProcess
end_date	0..1 String	The date on which any process or activity was ended	PlannedProcess
qc_status	0..1 StatusEnum	Stores information about the result of a process (e.g., the process of sequencing a library may have a qc_status of 'fail' if not enough data was generated)	PlannedProcess
qc_comment	0..1 String	Slot to store additional comments about laboratory or workflow output	PlannedProcess
has_failure_categorization	* FailureCategorization		PlannedProcess
id	1 Uriorcurie	A unique identifier for a thing	NamedThing
name	0..1 String	A human readable label for an entity	NamedThing
description	0..1 String	a human-readable description of a thing	NamedThing
alternative_identifiers	* Uriorcurie	A list of alternative identifiers for the entity	NamedThing
type	1 Uriorcurie	the class_uri of the class that has been instantiated	NamedThing
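Five slots in the table above are required: id, type, analyte_category (cardinality 1) and associated_studies, has_input (cardinality 1..*). The sketch below shows what a minimal record might look like as a plain Python dict (the JSON-like shape such records typically take) together with a small helper that reports missing required slots. It is only an illustration, not a substitute for validating against the schema with LinkML tooling. All identifiers are hypothetical, and "metagenome" is assumed, not confirmed here, to be a permissible NucleotideSequencingEnum value.

```python
from typing import Any

# Required slots from the "Cardinality and Range" column above, mapped to
# whether the slot is multivalued (1 -> False, 1..* -> True).
REQUIRED_SLOTS = {
    "id": False,
    "type": False,
    "analyte_category": False,
    "associated_studies": True,
    "has_input": True,
}


def missing_required_slots(record: dict[str, Any]) -> list[str]:
    """Return required slots that are absent, empty, or (if multivalued) not a non-empty list."""
    missing = []
    for slot, multivalued in REQUIRED_SLOTS.items():
        value = record.get(slot)
        if value is None or value == "":
            missing.append(slot)
        elif multivalued and (not isinstance(value, list) or len(value) == 0):
            missing.append(slot)
    return missing


# Hypothetical record: every identifier below is invented for illustration.
record = {
    "id": "nmdc:dgns-11-abc123",
    "type": "nmdc:NucleotideSequencing",
    "analyte_category": "metagenome",  # assumed NucleotideSequencingEnum value
    "associated_studies": ["nmdc:sty-11-xyz789"],
    "has_input": ["nmdc:bsm-11-def456"],
    "gold_sequencing_project_identifiers": ["gold:Gp0108335"],
}

print(missing_required_slots(record))
print(missing_required_slots({"id": "nmdc:dgns-11-abc123", "has_input": []}))
```

The second call reports type, analyte_category, associated_studies and has_input, because has_input is present but empty.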

Usages

used by	used in	type	used
MetagenomeSequencing	was_informed_by	range	NucleotideSequencing
MetagenomeAnnotation	was_informed_by	range	NucleotideSequencing
MetagenomeAssembly	was_informed_by	range	NucleotideSequencing
MetatranscriptomeAssembly	was_informed_by	range	NucleotideSequencing
MetatranscriptomeAnnotation	was_informed_by	range	NucleotideSequencing
MetatranscriptomeExpressionAnalysis	was_informed_by	range	NucleotideSequencing
MagsAnalysis	was_informed_by	range	NucleotideSequencing
ReadQcAnalysis	was_informed_by	range	NucleotideSequencing
ReadBasedTaxonomyAnalysis	was_informed_by	range	NucleotideSequencing
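Every class in the usages table points back at a NucleotideSequencing through its was_informed_by slot. The sketch below finds the downstream records for one sequencing run. Because this page does not state the cardinality of was_informed_by, the helper accepts either a single string or a list (an assumption). The function name and all record ids are hypothetical.

```python
from typing import Any, Iterable


def informed_by(records: Iterable[dict[str, Any]], sequencing_id: str) -> list[dict[str, Any]]:
    """Return records whose was_informed_by slot references sequencing_id.

    was_informed_by may be a single string or a list of strings here, since
    its cardinality is not given on this page (assumption).
    """
    matches = []
    for rec in records:
        ref = rec.get("was_informed_by")
        refs = ref if isinstance(ref, list) else [ref]
        if sequencing_id in refs:
            matches.append(rec)
    return matches


# Hypothetical workflow records; all ids are invented for illustration.
workflows = [
    {"id": "nmdc:wfrqc-11-aaa111.1", "type": "nmdc:ReadQcAnalysis",
     "was_informed_by": "nmdc:dgns-11-abc123"},
    {"id": "nmdc:wfmgas-11-bbb222.1", "type": "nmdc:MetagenomeAssembly",
     "was_informed_by": ["nmdc:dgns-11-abc123"]},
    {"id": "nmdc:wfmgan-11-ccc333.1", "type": "nmdc:MetagenomeAnnotation",
     "was_informed_by": "nmdc:dgns-11-zzz999"},
]

print([w["type"] for w in informed_by(workflows, "nmdc:dgns-11-abc123")])
```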

Comments

For example, data generated from an Illumina or Pacific Biosciences instrument.

Identifier and Mapping Information

Schema Source

from schema: https://w3id.org/nmdc/nmdc

Mappings

Mapping Type	Mapped Value

LinkML Source

Direct

name: NucleotideSequencing
description: A DataGeneration in which the sequence of DNA or RNA molecules is generated.
comments:
- For example, data generated from an Illumina or Pacific Biosciences instrument.
from_schema: https://w3id.org/nmdc/nmdc
is_a: DataGeneration
slots:
- gold_sequencing_project_identifiers
- insdc_bioproject_identifiers
- insdc_experiment_identifiers
- ncbi_project_name
slot_usage:
  id:
    name: id
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$'
      interpolated: true
  analyte_category:
    name: analyte_category
    range: NucleotideSequencingEnum
class_uri: nmdc:NucleotideSequencing
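The slot_usage for id above narrows identifiers to the dgns or omprc typecode through an interpolated structured_pattern: each {name} placeholder is replaced by a schema-level setting before the result is used as a regular expression. The setting values are not shown on this page, so the ones below are hypothetical stand-ins. The sketch illustrates the substitution step and the check it enables, not the exact patterns LinkML generates for this schema.

```python
import re

# Hypothetical values for the schema-level settings the placeholders refer to;
# the real values are defined elsewhere in the NMDC schema.
SETTINGS = {
    "id_nmdc_prefix": "nmdc",
    "id_shoulder": "[0-9][a-z]{0,6}[0-9]",
    "id_blade": "[A-Za-z0-9]+",
}

# The structured_pattern syntax from the slot_usage for `id` above.
SYNTAX = "{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$"


def interpolate(syntax: str, settings: dict[str, str]) -> str:
    """Replace each {name} placeholder with the matching settings value."""
    # re.sub does not rescan inserted text, so the braces in "{0,6}" are not
    # treated as another placeholder; a function replacement also avoids
    # backslash-escape processing of the inserted value.
    return re.sub(r"\{(\w+)\}", lambda m: settings[m.group(1)], syntax)


ID_REGEX = re.compile(interpolate(SYNTAX, SETTINGS))


def is_valid_nucleotide_sequencing_id(value: str) -> bool:
    # The syntax ends with "$" but has no leading "^", so match() is used
    # to anchor at the start of the string as well.
    return ID_REGEX.match(value) is not None


for candidate in ["nmdc:dgns-11-abc123", "nmdc:omprc-11-xyz789", "nmdc:sty-11-abc123"]:
    print(candidate, is_valid_nucleotide_sequencing_id(candidate))
```

Under these assumed settings, the first two candidates pass and the study-style id (sty typecode) is rejected.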

Induced

name: NucleotideSequencing
description: A DataGeneration in which the sequence of DNA or RNA molecules is generated.
comments:
- For example, data generated from an Illumina or Pacific Biosciences instrument.
from_schema: https://w3id.org/nmdc/nmdc
is_a: DataGeneration
slot_usage:
  id:
    name: id
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$'
      interpolated: true
  analyte_category:
    name: analyte_category
    range: NucleotideSequencingEnum
attributes:
  gold_sequencing_project_identifiers:
    name: gold_sequencing_project_identifiers
    description: identifiers for corresponding sequencing project in GOLD
    examples:
    - value: gold:Gp0108335
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: omics_processing_identifiers
    mixins:
    - gold_identifiers
    alias: gold_sequencing_project_identifiers
    owner: NucleotideSequencing
    domain_of:
    - NucleotideSequencing
    range: external_identifier
    multivalued: true
    pattern: ^gold:Gp[0-9]+$
  insdc_bioproject_identifiers:
    name: insdc_bioproject_identifiers
    description: identifiers for corresponding project in INSDC Bioproject
    comments:
    - these are distinct IDs from INSDC SRA/ENA project identifiers, but are usually
      (though perhaps not always) one to one
    examples:
    - value: bioproject:PRJNA366857
      description: Avena fatua rhizosphere microbial communities - H1_Rhizo_Litter_2
        metatranscriptome
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://www.ncbi.nlm.nih.gov/bioproject/
    - https://www.ddbj.nig.ac.jp/bioproject/index-e.html
    aliases:
    - NCBI bioproject identifiers
    - DDBJ bioproject identifiers
    rank: 1000
    is_a: study_identifiers
    mixins:
    - insdc_identifiers
    alias: insdc_bioproject_identifiers
    owner: NucleotideSequencing
    domain_of:
    - NucleotideSequencing
    - Study
    range: external_identifier
    multivalued: true
    pattern: ^bioproject:PRJ[DEN][A-Z][0-9]+$
  insdc_experiment_identifiers:
    name: insdc_experiment_identifiers
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: external_database_identifiers
    mixins:
    - insdc_identifiers
    alias: insdc_experiment_identifiers
    owner: NucleotideSequencing
    domain_of:
    - NucleotideSequencing
    - DataObject
    range: external_identifier
    multivalued: true
    pattern: ^insdc.sra:(E|D|S)RX[0-9]{6,}$
  ncbi_project_name:
    name: ncbi_project_name
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: ncbi_project_name
    owner: NucleotideSequencing
    domain_of:
    - NucleotideSequencing
    range: string
  analyte_category:
    name: analyte_category
    description: 'The type of analyte(s) that were measured in the data generation
      process

      '
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: analyte_category
    owner: NucleotideSequencing
    domain_of:
    - DataGeneration
    range: NucleotideSequencingEnum
    required: true
  associated_studies:
    name: associated_studies
    description: The study associated with a resource.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: associated_studies
    owner: NucleotideSequencing
    domain_of:
    - DataGeneration
    - Biosample
    range: Study
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(sty)-{id_shoulder}-{id_blade}$'
      interpolated: true
  instrument_used:
    name: instrument_used
    description: What instrument was used during DataGeneration or MaterialProcessing.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: instrument_used
    owner: NucleotideSequencing
    domain_of:
    - DataGeneration
    - MaterialProcessing
    range: Instrument
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:inst-{id_shoulder}-{id_blade}$'
      interpolated: true
  principal_investigator:
    name: principal_investigator
    description: Principal Investigator who led the study and/or generated the dataset.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - PI
    rank: 1000
    alias: principal_investigator
    owner: NucleotideSequencing
    domain_of:
    - Study
    - DataGeneration
    range: PersonValue
  instrument_instance_specifier:
    name: instrument_instance_specifier
    description: A unique value that identifies an individual instrument instance,
      such as a serial number or similar identifiers assigned by the manufacturer
      or user.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: instrument_instance_specifier
    owner: NucleotideSequencing
    domain_of:
    - DataGeneration
    range: string
  provenance_metadata:
    name: provenance_metadata
    description: Provenance metadata for this DataGeneration, including when the record
      was added to and last modified in the NMDC database.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: provenance_metadata
    owner: NucleotideSequencing
    domain_of:
    - Study
    - DataGeneration
    - Biosample
    range: ProvenanceMetadata
  has_input:
    name: has_input
    description: An input to a process.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - input
    rank: 1000
    alias: has_input
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: Sample
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$'
      interpolated: true
  has_output:
    name: has_output
    description: An output from a process.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - output
    rank: 1000
    alias: has_output
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: DataObject
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  processing_institution:
    name: processing_institution
    description: The organization that processed the sample.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: processing_institution
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: ProcessingInstitutionEnum
  protocol_link:
    name: protocol_link
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: protocol_link
    owner: NucleotideSequencing
    domain_of:
    - Configuration
    - PlannedProcess
    - Study
    range: Protocol
  start_date:
    name: start_date
    description: The date on which any process or activity was started
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: start_date
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: string
  end_date:
    name: end_date
    description: The date on which any process or activity was ended
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: end_date
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: string
  qc_status:
    name: qc_status
    description: Stores information about the result of a process (e.g., the process
      of sequencing a library may have a qc_status of 'fail' if not enough data
      was generated)
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_status
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: StatusEnum
  qc_comment:
    name: qc_comment
    description: Slot to store additional comments about laboratory or workflow output.
      For workflow output, it may describe the particular workflow stage that failed
      (e.g., failed at call-stage due to a malformed FASTQ file).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_comment
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: string
  has_failure_categorization:
    name: has_failure_categorization
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: has_failure_categorization
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    range: FailureCategorization
    multivalued: true
    inlined: true
    inlined_as_list: true
  id:
    name: id
    description: A unique identifier for a thing. Must be either a CURIE shorthand
      for a URI or a complete URI
    notes:
    - 'abstracted pattern: prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?'
    - a minimum length of 3 characters is suggested for typecodes, but 1 or 2 characters
      will be accepted
    - typecodes must correspond 1:1 to a class in the NMDC schema. this will be checked
      via per-class id slot usage assertions
    - minting authority shoulders should probably be enumerated and checked in the
      pattern
    examples:
    - value: nmdc:mgmag-00-x012.1_7_c1
      description: https://github.com/microbiomedata/nmdc-schema/pull/499#discussion_r1018499248
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
    - literal_form: workflow_execution_id
      predicate: NARROW_SYNONYM
      contexts:
      - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    - literal_form: data_object_id
      predicate: NARROW_SYNONYM
      contexts:
      - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    rank: 1000
    identifier: true
    alias: id
    owner: NucleotideSequencing
    domain_of:
    - NamedThing
    range: uriorcurie
    required: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dgns|omprc)-{id_shoulder}-{id_blade}$'
      interpolated: true
  name:
    name: name
    description: A human readable label for an entity
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: name
    owner: NucleotideSequencing
    domain_of:
    - PersonValue
    - NamedThing
    - Protocol
    range: string
  description:
    name: description
    description: a human-readable description of a thing
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    slot_uri: dcterms:description
    alias: description
    owner: NucleotideSequencing
    domain_of:
    - ImageValue
    - NamedThing
    - Protocol
    range: string
  alternative_identifiers:
    name: alternative_identifiers
    description: A list of alternative identifiers for the entity.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: alternative_identifiers
    owner: NucleotideSequencing
    domain_of:
    - NamedThing
    - MetaboliteIdentification
    range: uriorcurie
    multivalued: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,\(\)\=\#]*$
  type:
    name: type
    description: the class_uri of the class that has been instantiated
    notes:
    - makes it easier to read example data files
    - required for polymorphic MongoDB collections
    examples:
    - value: nmdc:Biosample
    - value: nmdc:Study
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://github.com/microbiomedata/nmdc-schema/issues/1048
    - https://github.com/microbiomedata/nmdc-schema/issues/1233
    - https://github.com/microbiomedata/nmdc-schema/issues/248
    structured_aliases:
    - literal_form: workflow_execution_class
      predicate: NARROW_SYNONYM
      contexts:
      - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    rank: 1000
    slot_uri: rdf:type
    designates_type: true
    alias: type
    owner: NucleotideSequencing
    domain_of:
    - EukEval
    - FunctionalAnnotationAggMember
    - PeptideQuantification
    - ProteinQuantification
    - GenomeFeature
    - FunctionalAnnotation
    - AttributeValue
    - NamedThing
    - OntologyRelation
    - FailureCategorization
    - Protocol
    - CreditAssociation
    - Doi
    - ProvenanceMetadata
    - MobilePhaseSegment
    - PortionOfSubstance
    - MagBin
    - MetaboliteIdentification
    range: uriorcurie
    required: true
class_uri: nmdc:NucleotideSequencing
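The induced attributes above attach regular expressions to the three external-identifier slots, and the start_date/end_date comments ask for YYYY-MM-DD strings (the schema itself still lists a date validation pattern as a todo). The sketch below applies the identifier patterns exactly as written and adds a hypothetical date check for that convention. The gold and bioproject values come from the examples above; the SRA experiment value is invented and deliberately has too few digits.

```python
import re
from datetime import date

# Regexes copied verbatim from the `pattern` entries in the induced YAML.
# The "." in "insdc.sra" is unescaped there, so it matches any character.
SLOT_PATTERNS = {
    "gold_sequencing_project_identifiers": r"^gold:Gp[0-9]+$",
    "insdc_bioproject_identifiers": r"^bioproject:PRJ[DEN][A-Z][0-9]+$",
    "insdc_experiment_identifiers": r"^insdc.sra:(E|D|S)RX[0-9]{6,}$",
}


def pattern_errors(record: dict) -> list[str]:
    """List every identifier value that does not match its slot's pattern."""
    errors = []
    for slot, regex in SLOT_PATTERNS.items():
        values = record.get(slot, [])
        if isinstance(values, str):  # tolerate a bare string for a multivalued slot
            values = [values]
        for value in values:
            if re.search(regex, value) is None:
                errors.append(f"{slot}: {value!r}")
    return errors


def is_yyyy_mm_dd(value: str) -> bool:
    """Hypothetical check for the YYYY-MM-DD convention in the start_date/end_date comments."""
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value) is None:
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2024-02-30
    except ValueError:
        return False
    return True


record = {
    "gold_sequencing_project_identifiers": ["gold:Gp0108335"],
    "insdc_bioproject_identifiers": ["bioproject:PRJNA366857"],
    "insdc_experiment_identifiers": ["insdc.sra:SRX12345"],  # invented; only 5 digits
}
print(pattern_errors(record))
print(is_yyyy_mm_dd("2024-03-15"), is_yyyy_mm_dd("2024-3-15"))
```

The regular-expression test runs first so that strings which only loosely resemble a date (for example, without zero padding) are rejected before the calendar check.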