Class: DataGeneration

The methods and processes used to generate omics data from a biosample or organism.

Note

This is an abstract class and should not be instantiated directly.
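
Because the class is abstract, a record's `type` should name a concrete subclass rather than `nmdc:DataGeneration` itself. The following is a minimal sketch, assuming records are plain dictionaries and that the two subclasses shown in the class diagram are the only concrete ones; `CONCRETE_TYPES` and `is_concrete_data_generation` are hypothetical names, not part of any NMDC tooling.

```python
# Hypothetical helper, not part of the NMDC schema tooling.
# Assumes the concrete subclasses are the two shown in the class diagram.
CONCRETE_TYPES = {"nmdc:NucleotideSequencing", "nmdc:MassSpectrometry"}


def is_concrete_data_generation(record: dict) -> bool:
    """Return True if the record's `type` names a concrete DataGeneration subclass."""
    return record.get("type") in CONCRETE_TYPES


print(is_concrete_data_generation({"type": "nmdc:NucleotideSequencing"}))  # True
print(is_concrete_data_generation({"type": "nmdc:DataGeneration"}))        # False
```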

URI: nmdc:DataGeneration

classDiagram
    class DataGeneration
    click DataGeneration href "../DataGeneration"
    PlannedProcess <|-- DataGeneration
    click PlannedProcess href "../PlannedProcess"
    DataGeneration <|-- NucleotideSequencing
    click NucleotideSequencing href "../NucleotideSequencing"
    DataGeneration <|-- MassSpectrometry
    click MassSpectrometry href "../MassSpectrometry"
    DataGeneration : add_date
    DataGeneration : alternative_identifiers
    DataGeneration : analyte_category
    DataGeneration --> "1" AnalyteCategoryEnum : analyte_category
    click AnalyteCategoryEnum href "../AnalyteCategoryEnum"
    DataGeneration : associated_studies
    DataGeneration --> "1..*" Study : associated_studies
    click Study href "../Study"
    DataGeneration : description
    DataGeneration : end_date
    DataGeneration : has_failure_categorization
    DataGeneration --> "*" FailureCategorization : has_failure_categorization
    click FailureCategorization href "../FailureCategorization"
    DataGeneration : has_input
    DataGeneration --> "1..*" NamedThing : has_input
    click NamedThing href "../NamedThing"
    DataGeneration : has_output
    DataGeneration --> "*" DataObject : has_output
    click DataObject href "../DataObject"
    DataGeneration : id
    DataGeneration : instrument_used
    DataGeneration --> "*" Instrument : instrument_used
    click Instrument href "../Instrument"
    DataGeneration : mod_date
    DataGeneration : name
    DataGeneration : principal_investigator
    DataGeneration --> "0..1" PersonValue : principal_investigator
    click PersonValue href "../PersonValue"
    DataGeneration : processing_institution
    DataGeneration --> "0..1" ProcessingInstitutionEnum : processing_institution
    click ProcessingInstitutionEnum href "../ProcessingInstitutionEnum"
    DataGeneration : protocol_link
    DataGeneration --> "0..1" Protocol : protocol_link
    click Protocol href "../Protocol"
    DataGeneration : qc_comment
    DataGeneration : qc_status
    DataGeneration --> "0..1" StatusEnum : qc_status
    click StatusEnum href "../StatusEnum"
    DataGeneration : start_date
    DataGeneration : type

Inheritance

  • PlannedProcess
      • DataGeneration
          • NucleotideSequencing
          • MassSpectrometry

Slots

Name | Cardinality and Range | Description | Inheritance
add_date | 0..1 String | The date on which the information was added to the database | direct
analyte_category | 1 AnalyteCategoryEnum | The type of analyte(s) that were measured in the data generation process and ... | direct
associated_studies | 1..* Study | The study associated with a resource | direct
instrument_used | * Instrument | What instrument was used during DataGeneration or MaterialProcessing | direct
mod_date | 0..1 String | The last date on which the database information was modified | direct
principal_investigator | 0..1 PersonValue | Principal Investigator who led the study and/or generated the dataset | direct
has_input | 1..* NamedThing or Biosample or ProcessedSample | An input to a process | PlannedProcess
has_output | * DataObject | An output from a process | PlannedProcess
processing_institution | 0..1 ProcessingInstitutionEnum | The organization that processed the sample | PlannedProcess
protocol_link | 0..1 Protocol | | PlannedProcess
start_date | 0..1 String | The date on which any process or activity was started | PlannedProcess
end_date | 0..1 String | The date on which any process or activity was ended | PlannedProcess
qc_status | 0..1 StatusEnum | Stores information about the result of a process (ie the process of sequencin... | PlannedProcess
qc_comment | 0..1 String | Slot to store additional comments about laboratory or workflow output | PlannedProcess
has_failure_categorization | * FailureCategorization | | PlannedProcess
id | 1 Uriorcurie | A unique identifier for a thing | NamedThing
name | 0..1 String | A human readable label for an entity | NamedThing
description | 0..1 String | a human-readable description of a thing | NamedThing
alternative_identifiers | * Uriorcurie | A list of alternative identifiers for the entity | NamedThing
type | 1 Uriorcurie | the class_uri of the class that has been instantiated | NamedThing
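
The cardinalities in the Slots listing translate into two simple rules: slots marked `1` or `1..*` must be present, and slots marked `*` or `1..*` hold lists. The sketch below applies those rules to a record held as a plain dictionary. The function name `check_cardinality` and the identifier and enum values in the example record are placeholders chosen for illustration, not values confirmed by this page.

```python
# Cardinalities copied from the Slots listing: "1" and "1..*" are required,
# "*" and "1..*" are multivalued. The checker itself is a hypothetical sketch.
REQUIRED = {"id", "type", "analyte_category", "associated_studies", "has_input"}
MULTIVALUED = {
    "associated_studies", "has_input", "has_output", "instrument_used",
    "has_failure_categorization", "alternative_identifiers",
}


def check_cardinality(record: dict) -> list[str]:
    """Return a list of human-readable cardinality problems (empty if none)."""
    problems = []
    for slot in sorted(REQUIRED):
        value = record.get(slot)
        if value is None or value == []:
            problems.append(f"missing required slot: {slot}")
    for slot in sorted(MULTIVALUED):
        if slot in record and not isinstance(record[slot], list):
            problems.append(f"slot should be a list: {slot}")
    return problems


# Placeholder values: the ids and the analyte category are illustrative only.
record = {
    "id": "nmdc:dgns-11-abc123",
    "type": "nmdc:NucleotideSequencing",
    "analyte_category": "metagenome",
    "associated_studies": ["nmdc:sty-11-abc123"],
    "has_input": ["nmdc:bsm-11-abc123"],
}
print(check_cardinality(record))  # []
```

This only checks presence and list shape; range checks (for example, that `analyte_category` is a permitted `AnalyteCategoryEnum` value) would need the enum definitions, which are not shown here.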

Usages

used by | used in | type | used
Database | data_generation_set | range | DataGeneration
MetagenomeAnnotation | was_informed_by | range | DataGeneration
WorkflowExecution | was_informed_by | range | DataGeneration
MetagenomeAssembly | was_informed_by | range | DataGeneration
MetatranscriptomeAssembly | was_informed_by | range | DataGeneration
MetatranscriptomeAnnotation | was_informed_by | range | DataGeneration
MetatranscriptomeExpressionAnalysis | was_informed_by | range | DataGeneration
MagsAnalysis | was_informed_by | range | DataGeneration
MetagenomeSequencing | was_informed_by | range | DataGeneration
ReadQcAnalysis | was_informed_by | range | DataGeneration
ReadBasedTaxonomyAnalysis | was_informed_by | range | DataGeneration
MetabolomicsAnalysis | was_informed_by | range | DataGeneration
MetaproteomicsAnalysis | was_informed_by | range | DataGeneration
NomAnalysis | was_informed_by | range | DataGeneration
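
Since workflow records point back to DataGeneration records through `was_informed_by`, and `Database` collects them in `data_generation_set`, one practical use of these relationships is a dangling-reference check. The sketch below assumes the database is a plain dictionary; `data_generation_set` comes from the listing above, while `workflow_execution_set`, the function name, and all ids are assumptions made for illustration.

```python
# Hypothetical referential-integrity check over a dict shaped like a Database.
# "data_generation_set" comes from the Usages listing; "workflow_execution_set"
# is an assumed name for wherever workflow records are stored.
def dangling_was_informed_by(database, workflow_key="workflow_execution_set"):
    """Return was_informed_by ids that match no record in data_generation_set."""
    known = {dg["id"] for dg in database.get("data_generation_set", [])}
    dangling = []
    for workflow in database.get(workflow_key, []):
        refs = workflow.get("was_informed_by", [])
        if isinstance(refs, str):  # tolerate single-valued and list-valued forms
            refs = [refs]
        dangling.extend(ref for ref in refs if ref not in known)
    return dangling


# Placeholder ids, for illustration only.
database = {
    "data_generation_set": [{"id": "nmdc:dgns-11-abc123"}],
    "workflow_execution_set": [
        {"id": "nmdc:wfrqc-11-abc123.1", "was_informed_by": ["nmdc:dgns-11-abc123"]},
        {"id": "nmdc:wfrqc-11-def456.1", "was_informed_by": ["nmdc:dgns-11-missing"]},
    ],
}
print(dangling_was_informed_by(database))  # ['nmdc:dgns-11-missing']
```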

Aliases

  • OmicsProcessing
  • assay
  • omics assay
  • sequencing project
  • experiment

Identifier and Mapping Information

Schema Source

  • from schema: https://w3id.org/nmdc/nmdc

Mappings

Mapping Type | Mapped Value
self | nmdc:DataGeneration
native | nmdc:DataGeneration
broad | OBI:0000070, ISA:Assay

LinkML Source

Direct

name: DataGeneration
description: The methods and processes used to generate omics data from a biosample
  or organism.
alt_descriptions:
  embl.ena:
    source: embl.ena
    description: An experiment contains information about a sequencing experiment
      including library and instrument details.
in_subset:
- sample subset
from_schema: https://w3id.org/nmdc/nmdc
aliases:
- OmicsProcessing
- assay
- omics assay
- sequencing project
- experiment
broad_mappings:
- OBI:0000070
- ISA:Assay
is_a: PlannedProcess
abstract: true
slots:
- add_date
- analyte_category
- associated_studies
- instrument_used
- mod_date
- principal_investigator
slot_usage:
  has_input:
    name: has_input
    domain_of:
    - PlannedProcess
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$'
      interpolated: true
    any_of:
    - range: Biosample
    - range: ProcessedSample
  associated_studies:
    name: associated_studies
    domain_of:
    - Biosample
    - DataGeneration
    range: Study
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(sty)-{id_shoulder}-{id_blade}$'
      interpolated: true
  has_output:
    name: has_output
    domain_of:
    - PlannedProcess
    range: DataObject
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
class_uri: nmdc:DataGeneration
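
The `structured_pattern` entries above are templates: with `interpolated: true`, each `{name}` placeholder is replaced by a regular-expression fragment defined in the schema's settings. Those settings are not shown in this section, so the sketch below uses assumed values for `id_nmdc_prefix`, `id_shoulder`, and `id_blade` purely to illustrate the mechanism; `compile_pattern` is a hypothetical helper, and the ids tested are placeholders.

```python
import re

# Assumed values for the interpolation settings; the real definitions live in
# the schema's settings and are not shown in this section.
SETTINGS = {
    "id_nmdc_prefix": "nmdc",
    "id_shoulder": r"[0-9][a-z]{0,6}[0-9]",
    "id_blade": r"[A-Za-z0-9]+",
}

# Syntax strings copied from the slot_usage entries above.
SYNTAX = {
    "has_input": "{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$",
    "associated_studies": "{id_nmdc_prefix}:(sty)-{id_shoulder}-{id_blade}$",
    "has_output": "{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$",
}


def compile_pattern(slot: str) -> re.Pattern:
    """Substitute each {placeholder} with its regex fragment and compile."""
    pattern = SYNTAX[slot]
    for key, value in SETTINGS.items():
        pattern = pattern.replace("{" + key + "}", value)
    return re.compile("^" + pattern)


has_input = compile_pattern("has_input")
print(bool(has_input.match("nmdc:bsm-11-abc123")))  # True
print(bool(has_input.match("nmdc:sty-11-abc123")))  # False
```

The second call fails because `has_input` only admits the `bsm` and `procsm` typecodes, which mirrors the `any_of` restriction to `Biosample` and `ProcessedSample`.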

Induced

name: DataGeneration
description: The methods and processes used to generate omics data from a biosample
  or organism.
alt_descriptions:
  embl.ena:
    source: embl.ena
    description: An experiment contains information about a sequencing experiment
      including library and instrument details.
in_subset:
- sample subset
from_schema: https://w3id.org/nmdc/nmdc
aliases:
- OmicsProcessing
- assay
- omics assay
- sequencing project
- experiment
broad_mappings:
- OBI:0000070
- ISA:Assay
is_a: PlannedProcess
abstract: true
slot_usage:
  has_input:
    name: has_input
    domain_of:
    - PlannedProcess
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$'
      interpolated: true
    any_of:
    - range: Biosample
    - range: ProcessedSample
  associated_studies:
    name: associated_studies
    domain_of:
    - Biosample
    - DataGeneration
    range: Study
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(sty)-{id_shoulder}-{id_blade}$'
      interpolated: true
  has_output:
    name: has_output
    domain_of:
    - PlannedProcess
    range: DataObject
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
attributes:
  add_date:
    name: add_date
    description: The date on which the information was added to the database.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: add_date
    owner: DataGeneration
    domain_of:
    - Biosample
    - DataGeneration
    range: string
  analyte_category:
    name: analyte_category
    description: "The type of analyte(s) that were measured in the data generation\
      \ process and analyzed\n  in the Workflow Chain\n"
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: analyte_category
    owner: DataGeneration
    domain_of:
    - DataGeneration
    range: AnalyteCategoryEnum
    required: true
  associated_studies:
    name: associated_studies
    description: The study associated with a resource.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: associated_studies
    owner: DataGeneration
    domain_of:
    - Biosample
    - DataGeneration
    range: Study
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(sty)-{id_shoulder}-{id_blade}$'
      interpolated: true
  instrument_used:
    name: instrument_used
    description: What instrument was used during DataGeneration or MaterialProcessing.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: instrument_used
    owner: DataGeneration
    domain_of:
    - MaterialProcessing
    - DataGeneration
    range: Instrument
    multivalued: true
  mod_date:
    name: mod_date
    description: The last date on which the database information was modified.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: mod_date
    owner: DataGeneration
    domain_of:
    - Biosample
    - DataGeneration
    range: string
  principal_investigator:
    name: principal_investigator
    description: Principal Investigator who led the study and/or generated the dataset.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - PI
    rank: 1000
    alias: principal_investigator
    owner: DataGeneration
    domain_of:
    - Study
    - DataGeneration
    range: PersonValue
  has_input:
    name: has_input
    description: An input to a process.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: has_input
    owner: DataGeneration
    domain_of:
    - PlannedProcess
    range: NamedThing
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(bsm|procsm)-{id_shoulder}-{id_blade}$'
      interpolated: true
    any_of:
    - range: Biosample
    - range: ProcessedSample
  has_output:
    name: has_output
    description: An output from a process.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: has_output
    owner: DataGeneration
    domain_of:
    - PlannedProcess
    range: DataObject
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  processing_institution:
    name: processing_institution
    description: The organization that processed the sample.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: processing_institution
    owner: DataGeneration
    domain_of:
    - PlannedProcess
    range: ProcessingInstitutionEnum
  protocol_link:
    name: protocol_link
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: protocol_link
    owner: DataGeneration
    domain_of:
    - PlannedProcess
    - Study
    range: Protocol
  start_date:
    name: start_date
    description: The date on which any process or activity was started
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: start_date
    owner: DataGeneration
    domain_of:
    - PlannedProcess
    range: string
  end_date:
    name: end_date
    description: The date on which any process or activity was ended
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: end_date
    owner: DataGeneration
    domain_of:
    - PlannedProcess
    range: string
  qc_status:
    name: qc_status
    description: Stores information about the result of a process (e.g., the process
      of sequencing a library may have a qc_status of 'fail' if not enough data
      was generated)
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_status
    owner: DataGeneration
    domain_of:
    - PlannedProcess
    range: StatusEnum
  qc_comment:
    name: qc_comment
    description: Slot to store additional comments about laboratory or workflow output.
      For workflow output it may describe the particular workflow stage that failed
      (e.g., failed at call-stage due to a malformed fastq file).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_comment
    owner: DataGeneration
    domain_of:
    - PlannedProcess
    range: string
  has_failure_categorization:
    name: has_failure_categorization
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: has_failure_categorization
    owner: DataGeneration
    domain_of:
    - PlannedProcess
    range: FailureCategorization
    multivalued: true
    inlined: true
    inlined_as_list: true
  id:
    name: id
    description: A unique identifier for a thing. Must be either a CURIE shorthand
      for a URI or a complete URI
    notes:
    - 'abstracted pattern: prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?'
    - a minimum length of 3 characters is suggested for typecodes, but 1 or 2 characters
      will be accepted
    - typecodes must correspond 1:1 to a class in the NMDC schema. this will be checked
      via per-class id slot usage assertions
    - minting authority shoulders should probably be enumerated and checked in the
      pattern
    examples:
    - value: nmdc:mgmag-00-x012.1_7_c1
      description: https://github.com/microbiomedata/nmdc-schema/pull/499#discussion_r1018499248
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    identifier: true
    alias: id
    owner: DataGeneration
    domain_of:
    - NamedThing
    range: uriorcurie
    required: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
  name:
    name: name
    description: A human readable label for an entity
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: name
    owner: DataGeneration
    domain_of:
    - PersonValue
    - NamedThing
    - Protocol
    range: string
  description:
    name: description
    description: a human-readable description of a thing
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    slot_uri: dcterms:description
    alias: description
    owner: DataGeneration
    domain_of:
    - ImageValue
    - NamedThing
    range: string
  alternative_identifiers:
    name: alternative_identifiers
    description: A list of alternative identifiers for the entity.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: alternative_identifiers
    owner: DataGeneration
    domain_of:
    - MetaboliteIdentification
    - NamedThing
    range: uriorcurie
    multivalued: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
  type:
    name: type
    description: the class_uri of the class that has been instantiated
    notes:
    - replaces legacy nmdc:type slot
    - makes it easier to read example data files
    - required for polymorphic MongoDB collections
    examples:
    - value: nmdc:Biosample
    - value: nmdc:Study
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://github.com/microbiomedata/nmdc-schema/issues/1048
    - https://github.com/microbiomedata/nmdc-schema/issues/1233
    - https://github.com/microbiomedata/nmdc-schema/issues/248
    rank: 1000
    slot_uri: rdf:type
    designates_type: true
    alias: type
    owner: DataGeneration
    domain_of:
    - EukEval
    - FunctionalAnnotationAggMember
    - MobilePhaseSegment
    - PortionOfSubstance
    - MagBin
    - MetaboliteIdentification
    - PeptideQuantification
    - ProteinQuantification
    - GenomeFeature
    - FunctionalAnnotation
    - AttributeValue
    - NamedThing
    - FailureCategorization
    - Protocol
    - CreditAssociation
    - Doi
    range: uriorcurie
    required: true
class_uri: nmdc:DataGeneration
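
Two string-level constraints in the induced definition lend themselves to a quick check: the `pattern` on `id` and `alternative_identifiers`, and the YYYY-MM-DD convention noted in the comments on `start_date` and `end_date`. The sketch below copies the regular expression verbatim from the definition; the date check is only one possible reading of the "add date string validation pattern" todo, and both function names are hypothetical.

```python
import re
from datetime import date

# Pattern copied from the `id` and `alternative_identifiers` attributes above.
URIORCURIE = re.compile(
    r"^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$"
)


def is_valid_id(value: str) -> bool:
    """Check a value against the identifier pattern."""
    return URIORCURIE.match(value) is not None


def is_valid_date(value: str) -> bool:
    """Check the YYYY-MM-DD convention noted for start_date and end_date."""
    # Enforce the exact shape first, then let datetime reject impossible dates.
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


print(is_valid_id("nmdc:mgmag-00-x012.1_7_c1"))  # True (example value from the schema)
print(is_valid_id("no-colon-here"))              # False
print(is_valid_date("2021-02-28"))               # True
print(is_valid_date("2021-02-30"))               # False
```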