Class: MetagenomeAssembly
A workflow execution activity that converts sequencing reads into an assembled metagenome.
classDiagram
  class MetagenomeAssembly
  click MetagenomeAssembly href "../MetagenomeAssembly"
    WorkflowExecution <|-- MetagenomeAssembly
      click WorkflowExecution href "../WorkflowExecution"
  MetagenomeAssembly : alternative_identifiers
  MetagenomeAssembly : asm_score
  MetagenomeAssembly : contig_bp
  MetagenomeAssembly : contigs
  MetagenomeAssembly : ctg_l50
  MetagenomeAssembly : ctg_l90
  MetagenomeAssembly : ctg_logsum
  MetagenomeAssembly : ctg_max
  MetagenomeAssembly : ctg_n50
  MetagenomeAssembly : ctg_n90
  MetagenomeAssembly : ctg_powsum
  MetagenomeAssembly : description
  MetagenomeAssembly : end_date
  MetagenomeAssembly : ended_at_time
  MetagenomeAssembly : execution_resource
      MetagenomeAssembly --> "0..1" ExecutionResourceEnum : execution_resource
    click ExecutionResourceEnum href "../ExecutionResourceEnum"
  MetagenomeAssembly : gap_pct
  MetagenomeAssembly : gc_avg
  MetagenomeAssembly : gc_std
  MetagenomeAssembly : git_url
  MetagenomeAssembly : has_failure_categorization
      MetagenomeAssembly --> "*" FailureCategorization : has_failure_categorization
    click FailureCategorization href "../FailureCategorization"
  MetagenomeAssembly : has_input
      MetagenomeAssembly --> "1..*" DataObject : has_input
    click DataObject href "../DataObject"
  MetagenomeAssembly : has_output
      MetagenomeAssembly --> "*" DataObject : has_output
    click DataObject href "../DataObject"
  MetagenomeAssembly : id
  MetagenomeAssembly : insdc_assembly_identifiers
  MetagenomeAssembly : name
  MetagenomeAssembly : num_aligned_reads
  MetagenomeAssembly : num_input_reads
  MetagenomeAssembly : processing_institution
      MetagenomeAssembly --> "1" ProcessingInstitutionEnum : processing_institution
    click ProcessingInstitutionEnum href "../ProcessingInstitutionEnum"
  MetagenomeAssembly : processing_institution_workflow_metadata
  MetagenomeAssembly : protocol_link
      MetagenomeAssembly --> "0..1" Protocol : protocol_link
    click Protocol href "../Protocol"
  MetagenomeAssembly : qc_comment
  MetagenomeAssembly : qc_status
      MetagenomeAssembly --> "0..1" StatusEnum : qc_status
    click StatusEnum href "../StatusEnum"
  MetagenomeAssembly : scaf_bp
  MetagenomeAssembly : scaf_l50
  MetagenomeAssembly : scaf_l90
  MetagenomeAssembly : scaf_l_gt50k
  MetagenomeAssembly : scaf_logsum
  MetagenomeAssembly : scaf_max
  MetagenomeAssembly : scaf_n50
  MetagenomeAssembly : scaf_n90
  MetagenomeAssembly : scaf_n_gt50k
  MetagenomeAssembly : scaf_pct_gt50k
  MetagenomeAssembly : scaf_powsum
  MetagenomeAssembly : scaffolds
  MetagenomeAssembly : start_date
  MetagenomeAssembly : started_at_time
  MetagenomeAssembly : type
  MetagenomeAssembly : version
  MetagenomeAssembly : was_informed_by
      MetagenomeAssembly --> "1..*" NucleotideSequencing : was_informed_by
    click NucleotideSequencing href "../NucleotideSequencing"
Inheritance
- NamedThing
    - PlannedProcess
        - DataEmitterProcess
            - WorkflowExecution
                - MetagenomeAssembly
Slots
| Name | Cardinality and Range | Description | Inheritance | 
|---|---|---|---|
| asm_score | 0..1 Float | A score for comparing metagenomic assembly quality from the same sample | direct | 
| scaffolds | 0..1 Float | Total sequence count of all scaffolds | direct | 
| scaf_logsum | 0..1 Float | The sum of the (length*log(length)) of all scaffolds, times some constant | direct | 
| scaf_powsum | 0..1 Float | Powersum of all scaffolds is the same as logsum except that it uses the sum o... | direct | 
| scaf_max | 0..1 Float | Maximum scaffold length | direct | 
| scaf_bp | 0..1 Float | Total size in bp of all scaffolds | direct | 
| scaf_n50 | 0..1 Float | Given a set of scaffolds, each with its own length, the N50 count is defined ... | direct | 
| scaf_n90 | 0..1 Float | Given a set of scaffolds, each with its own length, the N90 count is defined ... | direct | 
| scaf_l50 | 0..1 Float | Given a set of scaffolds, the L50 is defined as the sequence length of the sh... | direct | 
| scaf_l90 | 0..1 Float | The L90 statistic is less than or equal to the L50 statistic; it is the lengt... | direct | 
| scaf_n_gt50k | 0..1 Float | Total sequence count of scaffolds greater than 50 KB | direct | 
| scaf_l_gt50k | 0..1 Float | Total size in bp of all scaffolds greater than 50 KB | direct | 
| scaf_pct_gt50k | 0..1 Float | Total sequence size percentage of scaffolds greater than 50 KB | direct | 
| contigs | 0..1 Float | Total sequence count of all contigs | direct | 
| contig_bp | 0..1 Float | Total size in bp of all contigs | direct | 
| ctg_n50 | 0..1 Float | Given a set of contigs, each with its own length, the N50 count is defined as... | direct | 
| ctg_l50 | 0..1 Float | Given a set of contigs, the L50 is defined as the sequence length of the shor... | direct | 
| ctg_n90 | 0..1 Float | Given a set of contigs, each with its own length, the N90 count is defined as... | direct | 
| ctg_l90 | 0..1 Float | The L90 statistic is less than or equal to the L50 statistic; it is the lengt... | direct | 
| ctg_logsum | 0..1 Float | The sum of the (length*log(length)) of all contigs, times some constant | direct | 
| ctg_powsum | 0..1 Float | Powersum of all contigs is the same as logsum except that it uses the sum of ... | direct | 
| ctg_max | 0..1 Float | Maximum contig length | direct | 
| gap_pct | 0..1 Float | The gap size percentage of all scaffolds | direct | 
| gc_std | 0..1 Float | Standard deviation of GC content of all contigs | direct | 
| gc_avg | 0..1 Float | Average of GC content of all contigs | direct | 
| num_input_reads | 0..1 Float | The number of input reads used for assembly | direct | 
| num_aligned_reads | 0..1 Float | The number of input reads aligned to assembled contigs | direct | 
| insdc_assembly_identifiers | * ExternalIdentifier |  | direct | 
| ended_at_time | 0..1 String |  | WorkflowExecution | 
| execution_resource | 0..1 ExecutionResourceEnum | The computing resource or facility where the workflow was executed | WorkflowExecution | 
| git_url | 1 String | The URL that points to the exact GitHub location of a workflow | WorkflowExecution | 
| started_at_time | 1 String |  | WorkflowExecution | 
| version | 0..1 String | The NMDC release tag for a given workflow release used for data processing | WorkflowExecution | 
| was_informed_by | 1..* NucleotideSequencing | The primary DataGeneration subclass that the WorkflowExecution subclass depen... | WorkflowExecution | 
| processing_institution_workflow_metadata | 0..1 String | Information about how workflow results were generated when the processing is ... | WorkflowExecution | 
| has_input | 1..* DataObject | An input to a process | PlannedProcess | 
| has_output | * DataObject | An output from a process | PlannedProcess | 
| processing_institution | 1 ProcessingInstitutionEnum | The organization that processed the sample | PlannedProcess | 
| protocol_link | 0..1 Protocol |  | PlannedProcess | 
| start_date | 0..1 String | The date on which any process or activity was started | PlannedProcess | 
| end_date | 0..1 String | The date on which any process or activity was ended | PlannedProcess | 
| qc_status | 0..1 StatusEnum | Stores information about the result of a process (ie the process of sequencin... | PlannedProcess | 
| qc_comment | 0..1 String | Slot to store additional comments about laboratory or workflow output | PlannedProcess | 
| has_failure_categorization | * FailureCategorization |  | PlannedProcess | 
| id | 1 Uriorcurie | A unique identifier for a thing | NamedThing | 
| name | 0..1 String | A human readable label for an entity | NamedThing | 
| description | 0..1 String | a human-readable description of a thing | NamedThing | 
| alternative_identifiers | * Uriorcurie | A list of alternative identifiers for the entity | NamedThing | 
| type | 1 Uriorcurie | the class_uri of the class that has been instantiated | NamedThing | 
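The contiguity slots in the table above treat N50/N90 as counts and L50/L90 as lengths. A minimal sketch of how such values could be derived from a list of scaffold or contig lengths follows. It assumes that "genome size" means the total assembled length, that the logarithm is natural, and that the unspecified constant is 1.0. The function names and the sample lengths are illustrative only; they are not part of the NMDC schema or of any assembler.

```python
import math
from typing import Sequence, Tuple


def count_and_length_at(lengths: Sequence[int], percent: int) -> Tuple[int, int]:
    """Return (count, length) at `percent` of the total assembled length.

    Sequences are taken longest first. `count` is the smallest number of
    sequences whose summed length reaches the target (N50/N90 as used on this
    page); `length` is the shortest sequence in that set (L50/L90).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return count, length
    raise AssertionError("unreachable: the full set always reaches the target")


def logsum(lengths: Sequence[int], constant: float = 1.0) -> float:
    """Sum of length*log(length), times an assumed constant (natural log)."""
    return constant * sum(n * math.log(n) for n in lengths)


def powsum(lengths: Sequence[int], power: float = 0.25) -> float:
    """Sum of length*(length**power); power defaults to 0.25 as described."""
    return sum(n * n ** power for n in lengths)


scaffold_lengths = [80, 70, 50, 40, 30, 20, 10]  # made-up lengths, total 300
scaf_n50, scaf_l50 = count_and_length_at(scaffold_lengths, 50)
scaf_n90, scaf_l90 = count_and_length_at(scaffold_lengths, 90)
print(scaf_n50, scaf_l50, scaf_n90, scaf_l90)  # 2 70 5 30
```

With this convention the L90 length can never exceed the L50 length, because reaching 90% of the total always requires at least as many sequences as reaching 50%.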
Comments
- instances of this class may use a de novo assembly strategy in most or all cases relevant to NMDC
Identifier and Mapping Information
Schema Source
- from schema: https://w3id.org/nmdc/nmdc
Mappings
| Mapping Type | Mapped Value | 
|---|---|
LinkML Source
Direct
name: MetagenomeAssembly
description: A workflow execution activity that converts sequencing reads into an
  assembled metagenome.
comments:
- instances of this class may use a de novo assembly strategy in most or all cases
  relevant to NMDC
from_schema: https://w3id.org/nmdc/nmdc
is_a: WorkflowExecution
slots:
- asm_score
- scaffolds
- scaf_logsum
- scaf_powsum
- scaf_max
- scaf_bp
- scaf_n50
- scaf_n90
- scaf_l50
- scaf_l90
- scaf_n_gt50k
- scaf_l_gt50k
- scaf_pct_gt50k
- contigs
- contig_bp
- ctg_n50
- ctg_l50
- ctg_n90
- ctg_l90
- ctg_logsum
- ctg_powsum
- ctg_max
- gap_pct
- gc_std
- gc_avg
- num_input_reads
- num_aligned_reads
- insdc_assembly_identifiers
slot_usage:
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfmgas-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  was_informed_by:
    name: was_informed_by
    range: NucleotideSequencing
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
class_uri: nmdc:MetagenomeAssembly
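The `structured_pattern` entries above are templates: with `interpolated: true`, each `{name}` placeholder is replaced by a value from the schema's settings before the result is used as a regular expression. The sketch below illustrates that substitution. The settings values are assumptions made for this example (the settings section is not shown on this page), and `interpolate` is a hypothetical helper rather than a LinkML API.

```python
import re
from typing import Dict

# Assumed placeholder values; the real ones are defined in the schema's
# settings, which this page does not list.
ASSUMED_SETTINGS: Dict[str, str] = {
    "id_nmdc_prefix": r"^(nmdc)",
    "id_shoulder": r"([0-9][a-z]{0,6}[0-9])",
    "id_blade": r"([A-Za-z0-9]{1,})",
    "id_version": r"(\.[0-9]{1,})",
}

ID_SYNTAX = "{id_nmdc_prefix}:wfmgas-{id_shoulder}-{id_blade}{id_version}$"
WAS_INFORMED_BY_SYNTAX = "{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$"


def interpolate(syntax: str, settings: Dict[str, str]) -> str:
    """Replace every {name} placeholder in `syntax` with settings[name]."""
    # A function replacement is inserted literally, so braces and backslashes
    # inside the settings values are left untouched.
    return re.sub(r"\{(\w+)\}", lambda m: settings[m.group(1)], syntax)


id_regex = re.compile(interpolate(ID_SYNTAX, ASSUMED_SETTINGS))
informed_by_regex = re.compile(interpolate(WAS_INFORMED_BY_SYNTAX, ASSUMED_SETTINGS))

# Made-up identifiers, used only to exercise the patterns.
print(bool(id_regex.search("nmdc:wfmgas-11-abc123.1")))       # True
print(bool(id_regex.search("nmdc:wfmgas-11-abc123")))         # False, no version
print(bool(informed_by_regex.search("nmdc:dgns-11-xyz789")))  # True
```

Under these assumed settings, a `MetagenomeAssembly` id must carry the `wfmgas` typecode and a version suffix, while `was_informed_by` accepts only `omprc` or `dgns` typecodes.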
Induced
name: MetagenomeAssembly
description: A workflow execution activity that converts sequencing reads into an
  assembled metagenome.
comments:
- instances of this class may use a de novo assembly strategy in most or all cases
  relevant to NMDC
from_schema: https://w3id.org/nmdc/nmdc
is_a: WorkflowExecution
slot_usage:
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfmgas-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  was_informed_by:
    name: was_informed_by
    range: NucleotideSequencing
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
attributes:
  asm_score:
    name: asm_score
    description: A score for comparing metagenomic assembly quality from the same sample.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: asm_score
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaffolds:
    name: scaffolds
    description: Total sequence count of all scaffolds.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaffolds
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_logsum:
    name: scaf_logsum
    description: The sum of the (length*log(length)) of all scaffolds, times some
      constant. The score increases as contiguity increases.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_logsum
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_powsum:
    name: scaf_powsum
    description: Powersum of all scaffolds is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_powsum
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_max:
    name: scaf_max
    description: Maximum scaffold length.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_max
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_bp:
    name: scaf_bp
    description: Total size in bp of all scaffolds.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_bp
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_n50:
    name: scaf_n50
    description: Given a set of scaffolds, each with its own length, the N50 count
      is defined as the smallest number of scaffolds whose length sum makes up half
      of genome size.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_n50
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_n90:
    name: scaf_n90
    description: Given a set of scaffolds, each with its own length, the N90 count
      is defined as the smallest number of scaffolds whose length sum makes up 90%
      of genome size.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_n90
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_l50:
    name: scaf_l50
    description: Given a set of scaffolds, the L50 is defined as the sequence length
      of the shortest scaffold at 50% of the total genome length.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_l50
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_l90:
    name: scaf_l90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all scaffolds of that length or longer
      contains at least 90% of the sum of the lengths of all scaffolds.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_l90
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_n_gt50k:
    name: scaf_n_gt50k
    description: Total sequence count of scaffolds greater than 50 KB.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_n_gt50k
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_l_gt50k:
    name: scaf_l_gt50k
    description: Total size in bp of all scaffolds greater than 50 KB.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_l_gt50k
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_pct_gt50k:
    name: scaf_pct_gt50k
    description: Total sequence size percentage of scaffolds greater than 50 KB.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_pct_gt50k
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  contigs:
    name: contigs
    description: Total sequence count of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: contigs
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  contig_bp:
    name: contig_bp
    description: Total size in bp of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: contig_bp
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_n50:
    name: ctg_n50
    description: Given a set of contigs, each with its own length, the N50 count is
      defined as the smallest number of contigs whose length sum makes up half of
      genome size.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_n50
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_l50:
    name: ctg_l50
    description: Given a set of contigs, the L50 is defined as the sequence length
      of the shortest contig at 50% of the total genome length.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_l50
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_n90:
    name: ctg_n90
    description: Given a set of contigs, each with its own length, the N90 count is
      defined as the smallest number of contigs whose length sum makes up 90% of genome
      size.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_n90
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_l90:
    name: ctg_l90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all contigs of that length or longer
      contains at least 90% of the sum of the lengths of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_l90
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_logsum:
    name: ctg_logsum
    description: The sum of the (length*log(length)) of all contigs, times some
      constant. The score increases as contiguity increases.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_logsum
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_powsum:
    name: ctg_powsum
    description: Powersum of all contigs is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_powsum
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_max:
    name: ctg_max
    description: Maximum contig length.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_max
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  gap_pct:
    name: gap_pct
    description: The gap size percentage of all scaffolds.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: gap_pct
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  gc_std:
    name: gc_std
    description: Standard deviation of GC content of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: gc_std
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  gc_avg:
    name: gc_avg
    description: Average of GC content of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: gc_avg
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  num_input_reads:
    name: num_input_reads
    description: The number of input reads used for assembly.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: num_input_reads
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  num_aligned_reads:
    name: num_aligned_reads
    description: The number of input reads aligned to assembled contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: num_aligned_reads
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  insdc_assembly_identifiers:
    name: insdc_assembly_identifiers
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: assembly_identifiers
    mixins:
    - insdc_identifiers
    alias: insdc_assembly_identifiers
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: external_identifier
    multivalued: true
    pattern: ^insdc.sra:[A-Z]+[0-9]+(\.[0-9]+)?$
  ended_at_time:
    name: ended_at_time
    notes:
    - 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
      It may not be complete, but it is good enough for now.'
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - prov:endedAtTime
    rank: 1000
    alias: ended_at_time
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
    pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
  execution_resource:
    name: execution_resource
    description: The computing resource or facility where the workflow was executed.
    examples:
    - value: NERSC-Cori
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: execution_resource
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: ExecutionResourceEnum
  git_url:
    name: git_url
    description: The URL that points to the exact GitHub location of a workflow.
    examples:
    - value: https://github.com/microbiomedata/mg_annotation/releases/tag/0.1
    - value: https://github.com/microbiomedata/metaMS/blob/master/metaMS/gcmsWorkflow.py
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: git_url
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
    required: true
  started_at_time:
    name: started_at_time
    notes:
    - 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
      It may not be complete, but it is good enough for now.'
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - prov:startedAtTime
    rank: 1000
    alias: started_at_time
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
    required: true
    pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
  version:
    name: version
    description: The NMDC release tag for a given workflow release used for data processing.
      If workflows are processed externally, as denoted by processing_institution,
      this value represents the best mapping between a processing institution's (e.g.,
      JGI) workflow metadata and an NMDC tagged release.
    examples:
    - value: v1.2.0
    from_schema: https://w3id.org/nmdc/nmdc
    broad_mappings:
    - NCIT:C182117
    rank: 1000
    alias: version
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
  was_informed_by:
    name: was_informed_by
    description: The primary DataGeneration subclass that the WorkflowExecution subclass
      depends on.
    comments:
    - For version 1 of the proteomics workflow there are input files both from the
      NucleotideSequencing and MassSpectrometry, the MassSpectrometry record is considered
      the primary class to reference.
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
      was_informed_by:
        literal_form: was_informed_by
        predicate: EXACT_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    narrow_mappings:
    - prov:wasInformedBy
    rank: 1000
    alias: was_informed_by
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: NucleotideSequencing
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
  processing_institution_workflow_metadata:
    name: processing_institution_workflow_metadata
    description: Information about how workflow results were generated when the processing
      is done by an external organization (e.g., JGI) such as software tool name and
      version or pipeline name and version.
    examples:
    - value: metaspades v. 3.15.2
    - value: IMG Annotation Pipeline v.5.0.25
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - NCIT:C165211
    rank: 1000
    alias: processing_institution_workflow_metadata
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
  has_input:
    name: has_input
    description: An input to a process.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - input
    rank: 1000
    alias: has_input
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: DataObject
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  has_output:
    name: has_output
    description: An output from a process.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - output
    rank: 1000
    alias: has_output
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: DataObject
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  processing_institution:
    name: processing_institution
    description: The organization that processed the sample.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: processing_institution
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: ProcessingInstitutionEnum
    required: true
  protocol_link:
    name: protocol_link
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: protocol_link
    owner: MetagenomeAssembly
    domain_of:
    - Configuration
    - PlannedProcess
    - Study
    range: Protocol
  start_date:
    name: start_date
    description: The date on which any process or activity was started
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: start_date
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: string
  end_date:
    name: end_date
    description: The date on which any process or activity was ended
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: end_date
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: string
  qc_status:
    name: qc_status
    description: Stores information about the result of a process (e.g., the process
      of sequencing a library may have a qc_status of 'fail' if not enough data
      was generated)
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_status
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: StatusEnum
  qc_comment:
    name: qc_comment
    description: Slot to store additional comments about laboratory or workflow output.
      For workflow output it may describe the particular workflow stage that failed
      (e.g., failed at call-stage due to a malformed fastq file).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_comment
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: string
  has_failure_categorization:
    name: has_failure_categorization
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: has_failure_categorization
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: FailureCategorization
    multivalued: true
    inlined: true
    inlined_as_list: true
  id:
    name: id
    description: A unique identifier for a thing. Must be either a CURIE shorthand
      for a URI or a complete URI
    notes:
    - 'abstracted pattern: prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?'
    - a minimum length of 3 characters is suggested for typecodes, but 1 or 2 characters
      will be accepted
    - typecodes must correspond 1:1 to a class in the NMDC schema. this will be checked
      via per-class id slot usage assertions
    - minting authority shoulders should probably be enumerated and checked in the
      pattern
    examples:
    - value: nmdc:mgmag-00-x012.1_7_c1
      description: https://github.com/microbiomedata/nmdc-schema/pull/499#discussion_r1018499248
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
      workflow_execution_id:
        literal_form: workflow_execution_id
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
      data_object_id:
        literal_form: data_object_id
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    rank: 1000
    identifier: true
    alias: id
    owner: MetagenomeAssembly
    domain_of:
    - NamedThing
    range: uriorcurie
    required: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfmgas-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  name:
    name: name
    description: A human readable label for an entity
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: name
    owner: MetagenomeAssembly
    domain_of:
    - PersonValue
    - NamedThing
    - Protocol
    range: string
  description:
    name: description
    description: a human-readable description of a thing
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    slot_uri: dcterms:description
    alias: description
    owner: MetagenomeAssembly
    domain_of:
    - ImageValue
    - NamedThing
    - Protocol
    range: string
  alternative_identifiers:
    name: alternative_identifiers
    description: A list of alternative identifiers for the entity.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: alternative_identifiers
    owner: MetagenomeAssembly
    domain_of:
    - MetaboliteIdentification
    - NamedThing
    range: uriorcurie
    multivalued: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,\(\)\=\#]*$
  type:
    name: type
    description: the class_uri of the class that has been instantiated
    notes:
    - makes it easier to read example data files
    - required for polymorphic MongoDB collections
    examples:
    - value: nmdc:Biosample
    - value: nmdc:Study
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://github.com/microbiomedata/nmdc-schema/issues/1048
    - https://github.com/microbiomedata/nmdc-schema/issues/1233
    - https://github.com/microbiomedata/nmdc-schema/issues/248
    structured_aliases:
      workflow_execution_class:
        literal_form: workflow_execution_class
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    rank: 1000
    slot_uri: rdf:type
    designates_type: true
    alias: type
    owner: MetagenomeAssembly
    domain_of:
    - EukEval
    - FunctionalAnnotationAggMember
    - PeptideQuantification
    - ProteinQuantification
    - MobilePhaseSegment
    - PortionOfSubstance
    - MagBin
    - MetaboliteIdentification
    - GenomeFeature
    - FunctionalAnnotation
    - AttributeValue
    - NamedThing
    - OntologyRelation
    - FailureCategorization
    - Protocol
    - CreditAssociation
    - Doi
    range: uriorcurie
    required: true
class_uri: nmdc:MetagenomeAssembly
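The induced definition above marks seven slots as `required: true` and attaches literal `pattern` values to a few others. As a rough illustration of what those constraints imply for a record, the following sketch checks a plain dictionary against them. It is not a substitute for LinkML validation: `find_problems` is a hypothetical helper, the example identifiers and URL are placeholders, and `JGI` is only assumed to be a permissible `processing_institution` value.

```python
import re
from typing import Any, Dict, List

# Slots marked `required: true` in the induced definition.
REQUIRED_SLOTS = (
    "id", "type", "git_url", "started_at_time",
    "was_informed_by", "has_input", "processing_institution",
)

# ISO-8601 pattern copied from started_at_time / ended_at_time.
ISO_8601 = r"^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$"

SLOT_PATTERNS = {
    "insdc_assembly_identifiers": re.compile(r"^insdc.sra:[A-Z]+[0-9]+(\.[0-9]+)?$"),
    "started_at_time": re.compile(ISO_8601),
    "ended_at_time": re.compile(ISO_8601),
}


def find_problems(record: Dict[str, Any]) -> List[str]:
    """List missing required slots and values that fail a literal pattern."""
    problems = []
    for slot in REQUIRED_SLOTS:
        if record.get(slot) in (None, "", []):
            problems.append(f"missing required slot: {slot}")
    for slot, pattern in SLOT_PATTERNS.items():
        if slot not in record:
            continue
        value = record[slot]
        # Multivalued slots hold lists; treat single values the same way.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if not pattern.search(str(item)):
                problems.append(f"{slot}: {item!r} does not match pattern")
    return problems


record = {  # placeholder values throughout
    "id": "nmdc:wfmgas-11-abc123.1",
    "type": "nmdc:MetagenomeAssembly",
    "git_url": "https://example.org/workflow",
    "started_at_time": "2021-01-10T00:00:00+00:00",
    "was_informed_by": ["nmdc:dgns-11-xyz789"],
    "has_input": ["nmdc:dobj-11-in0001"],
    "processing_institution": "JGI",  # assumed enum value
    "insdc_assembly_identifiers": ["insdc.sra:SRZ123456"],
}
print(find_problems(record))  # []
```

Removing `git_url` or setting `started_at_time` to free text such as `yesterday` would each add one entry to the returned list.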