
Class: MetagenomeAssembly

A workflow execution activity that converts sequencing reads into an assembled metagenome.

URI: nmdc:MetagenomeAssembly

[Class diagram: MetagenomeAssembly is a WorkflowExecution. Its object-valued slots point to ExecutionResourceEnum (execution_resource, 1), FailureCategorization (has_failure_categorization, *), DataObject (has_input, 1..*; has_output, *), ProcessingInstitutionEnum (processing_institution, 0..1), Protocol (protocol_link, 0..1), StatusEnum (qc_status, 0..1), and NucleotideSequencing (was_informed_by, 1..*). All slots are listed in the Slots table.]

Inheritance

Slots

| Name | Cardinality and Range | Description | Inheritance |
| --- | --- | --- | --- |
| asm_score | 0..1 Float | A score for comparing metagenomic assembly quality from the same sample | direct |
| scaffolds | 0..1 Float | Total sequence count of all scaffolds | direct |
| scaf_logsum | 0..1 Float | The sum of the (length*log(length)) of all scaffolds, times some constant | direct |
| scaf_powsum | 0..1 Float | Powersum of all scaffolds is the same as logsum except that it uses the sum o... | direct |
| scaf_max | 0..1 Float | Maximum scaffold length | direct |
| scaf_bp | 0..1 Float | Total size in bp of all scaffolds | direct |
| scaf_n50 | 0..1 Float | Given a set of scaffolds, each with its own length, the N50 count is defined ... | direct |
| scaf_n90 | 0..1 Float | Given a set of scaffolds, each with its own length, the N90 count is defined ... | direct |
| scaf_l50 | 0..1 Float | Given a set of scaffolds, the L50 is defined as the sequence length of the sh... | direct |
| scaf_l90 | 0..1 Float | The L90 statistic is less than or equal to the L50 statistic; it is the lengt... | direct |
| scaf_n_gt50k | 0..1 Float | Total sequence count of scaffolds greater than 50 KB | direct |
| scaf_l_gt50k | 0..1 Float | Total size in bp of all scaffolds greater than 50 KB | direct |
| scaf_pct_gt50k | 0..1 Float | Total sequence size percentage of scaffolds greater than 50 KB | direct |
| contigs | 0..1 Float | Total sequence count of all contigs | direct |
| contig_bp | 0..1 Float | Total size in bp of all contigs | direct |
| ctg_n50 | 0..1 Float | Given a set of contigs, each with its own length, the N50 count is defined as... | direct |
| ctg_l50 | 0..1 Float | Given a set of contigs, the L50 is defined as the sequence length of the shor... | direct |
| ctg_n90 | 0..1 Float | Given a set of contigs, each with its own length, the N90 count is defined as... | direct |
| ctg_l90 | 0..1 Float | The L90 statistic is less than or equal to the L50 statistic; it is the lengt... | direct |
| ctg_logsum | 0..1 Float | The sum of the (length*log(length)) of all contigs, times some constant | direct |
| ctg_powsum | 0..1 Float | Powersum of all contigs is the same as logsum except that it uses the sum of ... | direct |
| ctg_max | 0..1 Float | Maximum contig length | direct |
| gap_pct | 0..1 Float | The gap size percentage of all scaffolds | direct |
| gc_std | 0..1 Float | Standard deviation of GC content of all contigs | direct |
| gc_avg | 0..1 Float | Average of GC content of all contigs | direct |
| num_input_reads | 0..1 Float | The sequence count number of input reads for assembly | direct |
| num_aligned_reads | 0..1 Float | The sequence count number of input reads aligned to assembled contigs | direct |
| insdc_assembly_identifiers | 0..1 String | | direct |
| ended_at_time | 0..1 String | | WorkflowExecution |
| execution_resource | 1 ExecutionResourceEnum | The computing resource or facility where the workflow was executed | WorkflowExecution |
| git_url | 1 String | The URL that points to the exact GitHub location of a workflow | WorkflowExecution |
| started_at_time | 1 String | | WorkflowExecution |
| version | 0..1 String | The NMDC release tag for a given workflow release used for data processing | WorkflowExecution |
| was_informed_by | 1..* NucleotideSequencing | The primary DataGeneration subclass that the WorkflowExecution subclass depen... | WorkflowExecution |
| processing_institution_workflow_metadata | 0..1 String | Information about how workflow results were generated when the processing is ... | WorkflowExecution |
| has_input | 1..* DataObject | An input to a process | PlannedProcess |
| has_output | * DataObject | An output from a process | PlannedProcess |
| processing_institution | 0..1 ProcessingInstitutionEnum | The organization that processed the sample | PlannedProcess |
| protocol_link | 0..1 Protocol | | PlannedProcess |
| start_date | 0..1 String | The date on which any process or activity was started | PlannedProcess |
| end_date | 0..1 String | The date on which any process or activity was ended | PlannedProcess |
| qc_status | 0..1 StatusEnum | Stores information about the result of a process (e.g., the process of sequencin... | PlannedProcess |
| qc_comment | 0..1 String | Slot to store additional comments about laboratory or workflow output | PlannedProcess |
| has_failure_categorization | * FailureCategorization | | PlannedProcess |
| id | 1 Uriorcurie | A unique identifier for a thing | NamedThing |
| name | 0..1 String | A human readable label for an entity | NamedThing |
| description | 0..1 String | a human-readable description of a thing | NamedThing |
| alternative_identifiers | * Uriorcurie | A list of alternative identifiers for the entity | NamedThing |
| type | 1 Uriorcurie | the class_uri of the class that has been instantiated | NamedThing |
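
The N50/N90 and L50/L90 slots are easy to mix up, so a small worked example may help. The sketch below follows the wording of the slot descriptions on this page, where the N statistic is a count of sequences and the L statistic is a length. The function name `nx_lx` and the sample lengths are made up for illustration; this is not the code NMDC workflows use to populate these slots.

```python
from typing import Iterable, Tuple


def nx_lx(lengths: Iterable[int], percent: int) -> Tuple[int, int]:
    """Return (N count, L length) for `percent` of the total assembly size.

    N is the smallest number of sequences whose summed length reaches
    `percent` of the total; L is the length of the shortest sequence in
    that set. Integer arithmetic avoids floating-point threshold issues.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    if not ordered:
        raise ValueError("need at least one sequence length")
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running * 100 >= percent * total:
            return count, length
    return len(ordered), ordered[-1]


# Hypothetical scaffold lengths in bp (total 300).
scaffold_lengths = [100, 80, 60, 40, 20]
scaf_n50, scaf_l50 = nx_lx(scaffold_lengths, 50)
scaf_n90, scaf_l90 = nx_lx(scaffold_lengths, 90)
```

With these lengths the two longest scaffolds already cover half of the total, so `scaf_n50` is a count of 2 and `scaf_l50` is 80 bp, while `scaf_l90` (40 bp) is no larger than `scaf_l50`, as the L90 description states. Many other tools report N50 as a length and L50 as a count; the sketch deliberately follows this page's wording instead.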

Comments

  • instances of this class may use a de novo assembly strategy in most or all cases relevant to NMDC

Identifier and Mapping Information

Schema Source

Mappings

Mapping Type Mapped Value

LinkML Source

Direct

name: MetagenomeAssembly
description: A workflow execution activity that converts sequencing reads into an
  assembled metagenome.
comments:
- instances of this class may use a de novo assembly strategy in most or all cases
  relevant to NMDC
from_schema: https://w3id.org/nmdc/nmdc
is_a: WorkflowExecution
slots:
- asm_score
- scaffolds
- scaf_logsum
- scaf_powsum
- scaf_max
- scaf_bp
- scaf_n50
- scaf_n90
- scaf_l50
- scaf_l90
- scaf_n_gt50k
- scaf_l_gt50k
- scaf_pct_gt50k
- contigs
- contig_bp
- ctg_n50
- ctg_l50
- ctg_n90
- ctg_l90
- ctg_logsum
- ctg_powsum
- ctg_max
- gap_pct
- gc_std
- gc_avg
- num_input_reads
- num_aligned_reads
- insdc_assembly_identifiers
slot_usage:
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfmgas-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  was_informed_by:
    name: was_informed_by
    range: NucleotideSequencing
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
class_uri: nmdc:MetagenomeAssembly
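
The `structured_pattern` entries above are templates: with `interpolated: true`, each `{placeholder}` is replaced by a regular-expression fragment defined elsewhere in the schema. Those fragments are not shown on this page, so the values in `ASSUMED_SETTINGS` below are stand-ins chosen for illustration and may differ from the real NMDC settings. The sketch only shows how the interpolation and matching could work.

```python
import re

# ASSUMPTION: stand-in fragments for the placeholders. The real definitions
# live in the NMDC schema settings and are not shown on this page.
ASSUMED_SETTINGS = {
    "id_nmdc_prefix": r"^(nmdc)",
    "id_shoulder": r"([0-9][a-z]{0,6}[0-9])",
    "id_blade": r"([A-Za-z0-9]{1,})",
    "id_version": r"(\.[0-9]{1,})",
}


def interpolate(syntax: str, settings: dict) -> re.Pattern:
    """Substitute each {placeholder} in a structured_pattern syntax string."""
    pattern = syntax
    for key, fragment in settings.items():
        pattern = pattern.replace("{" + key + "}", fragment)
    return re.compile(pattern)


# Syntax strings copied from the slot_usage block above.
ID_PATTERN = interpolate(
    "{id_nmdc_prefix}:wfmgas-{id_shoulder}-{id_blade}{id_version}$",
    ASSUMED_SETTINGS,
)
WAS_INFORMED_BY_PATTERN = interpolate(
    "{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$",
    ASSUMED_SETTINGS,
)

# Hypothetical identifiers, made up for this example.
print(bool(ID_PATTERN.match("nmdc:wfmgas-11-abc123.1")))
print(bool(WAS_INFORMED_BY_PATTERN.match("nmdc:dgns-11-xyz789")))
```

Under these assumed fragments, a MetagenomeAssembly `id` needs the `wfmgas` typecode and a version suffix, and each `was_informed_by` value needs an `omprc` or `dgns` typecode.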

Induced

name: MetagenomeAssembly
description: A workflow execution activity that converts sequencing reads into an
  assembled metagenome.
comments:
- instances of this class may use a de novo assembly strategy in most or all cases
  relevant to NMDC
from_schema: https://w3id.org/nmdc/nmdc
is_a: WorkflowExecution
slot_usage:
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfmgas-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  was_informed_by:
    name: was_informed_by
    range: NucleotideSequencing
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
attributes:
  asm_score:
    name: asm_score
    description: A score for comparing metagenomic assembly quality from the same sample.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: asm_score
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaffolds:
    name: scaffolds
    description: Total sequence count of all scaffolds.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaffolds
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_logsum:
    name: scaf_logsum
    description: The sum of the (length*log(length)) of all scaffolds, times some
      constant. The score increases as contiguity increases.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_logsum
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_powsum:
    name: scaf_powsum
    description: Powersum of all scaffolds is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_powsum
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_max:
    name: scaf_max
    description: Maximum scaffold length.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_max
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_bp:
    name: scaf_bp
    description: Total size in bp of all scaffolds.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_bp
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_n50:
    name: scaf_n50
    description: Given a set of scaffolds, each with its own length, the N50 count
      is defined as the smallest number of scaffolds whose length sum makes up half
      of genome size.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_n50
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_n90:
    name: scaf_n90
    description: Given a set of scaffolds, each with its own length, the N90 count
      is defined as the smallest number of scaffolds whose length sum makes up 90%
      of genome size.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_n90
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_l50:
    name: scaf_l50
    description: Given a set of scaffolds, the L50 is defined as the sequence length
      of the shortest scaffold at 50% of the total genome length.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_l50
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_l90:
    name: scaf_l90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all scaffolds of that length or longer
      contains at least 90% of the sum of the lengths of all scaffolds.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_l90
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_n_gt50k:
    name: scaf_n_gt50k
    description: Total sequence count of scaffolds greater than 50 KB.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_n_gt50k
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_l_gt50k:
    name: scaf_l_gt50k
    description: Total size in bp of all scaffolds greater than 50 KB.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_l_gt50k
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_pct_gt50k:
    name: scaf_pct_gt50k
    description: Total sequence size percentage of scaffolds greater than 50 KB.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_pct_gt50k
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  contigs:
    name: contigs
    description: Total sequence count of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: contigs
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  contig_bp:
    name: contig_bp
    description: Total size in bp of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: contig_bp
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_n50:
    name: ctg_n50
    description: Given a set of contigs, each with its own length, the N50 count is
      defined as the smallest number of contigs whose length sum makes up half of
      genome size.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_n50
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_l50:
    name: ctg_l50
    description: Given a set of contigs, the L50 is defined as the sequence length
      of the shortest contig at 50% of the total genome length.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_l50
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_n90:
    name: ctg_n90
    description: Given a set of contigs, each with its own length, the N90 count is
      defined as the smallest number of contigs whose length sum makes up 90% of genome
      size.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_n90
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_l90:
    name: ctg_l90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all contigs of that length or longer
      contains at least 90% of the sum of the lengths of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_l90
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_logsum:
    name: ctg_logsum
    description: The sum of the (length*log(length)) of all contigs, times some constant.
      The score increases as contiguity increases.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_logsum
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_powsum:
    name: ctg_powsum
    description: Powersum of all contigs is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_powsum
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_max:
    name: ctg_max
    description: Maximum contig length.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_max
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  gap_pct:
    name: gap_pct
    description: The gap size percentage of all scaffolds.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: gap_pct
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  gc_std:
    name: gc_std
    description: Standard deviation of GC content of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: gc_std
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  gc_avg:
    name: gc_avg
    description: Average of GC content of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: gc_avg
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  num_input_reads:
    name: num_input_reads
    description: The sequence count number of input reads for assembly.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: num_input_reads
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  num_aligned_reads:
    name: num_aligned_reads
    description: The sequence count number of input reads aligned to assembled contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: num_aligned_reads
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  insdc_assembly_identifiers:
    name: insdc_assembly_identifiers
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: assembly_identifiers
    mixins:
    - insdc_identifiers
    alias: insdc_assembly_identifiers
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: string
    pattern: ^insdc.sra:[A-Z]+[0-9]+(\.[0-9]+)?$
  ended_at_time:
    name: ended_at_time
    notes:
    - 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
      It may not be complete, but it is good enough for now.'
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - prov:endedAtTime
    rank: 1000
    alias: ended_at_time
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
    pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
  execution_resource:
    name: execution_resource
    description: The computing resource or facility where the workflow was executed.
    examples:
    - value: NERSC-Cori
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: execution_resource
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: ExecutionResourceEnum
    required: true
  git_url:
    name: git_url
    description: The URL that points to the exact GitHub location of a workflow.
    examples:
    - value: https://github.com/microbiomedata/mg_annotation/releases/tag/0.1
    - value: https://github.com/microbiomedata/metaMS/blob/master/metaMS/gcmsWorkflow.py
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: git_url
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
    required: true
  started_at_time:
    name: started_at_time
    notes:
    - 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
      It may not be complete, but it is good enough for now.'
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - prov:startedAtTime
    rank: 1000
    alias: started_at_time
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
    required: true
    pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
  version:
    name: version
    description: The NMDC release tag for a given workflow release used for data processing.
      If workflows are processed externally, as denoted by processing_institution,
      this value represents the best mapping between a processing institution's (e.g.,
      JGI) workflow metadata and an NMDC tagged release.
    examples:
    - value: v1.2.0
    from_schema: https://w3id.org/nmdc/nmdc
    broad_mappings:
    - NCIT:C182117
    rank: 1000
    alias: version
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
  was_informed_by:
    name: was_informed_by
    description: The primary DataGeneration subclass that the WorkflowExecution subclass
      depends on.
    comments:
    - For version 1 of the proteomics workflow there are input files both from the
      NucleotideSequencing and MassSpectrometry, the MassSpectrometry record is considered
      the primary class to reference.
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
      was_informed_by:
        literal_form: was_informed_by
        predicate: EXACT_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    narrow_mappings:
    - prov:wasInformedBy
    rank: 1000
    alias: was_informed_by
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: NucleotideSequencing
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
  processing_institution_workflow_metadata:
    name: processing_institution_workflow_metadata
    description: Information about how workflow results were generated when the processing
      is done by an external organization (e.g., JGI), such as software tool name and
      version or pipeline name and version.
    examples:
    - value: metaspades v. 3.15.2
    - value: IMG Annotation Pipeline v.5.0.25
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - NCIT:C165211
    rank: 1000
    alias: processing_institution_workflow_metadata
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
  has_input:
    name: has_input
    description: An input to a process.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - input
    rank: 1000
    alias: has_input
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: DataObject
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  has_output:
    name: has_output
    description: An output from a process.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - output
    rank: 1000
    alias: has_output
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: DataObject
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  processing_institution:
    name: processing_institution
    description: The organization that processed the sample.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: processing_institution
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: ProcessingInstitutionEnum
  protocol_link:
    name: protocol_link
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: protocol_link
    owner: MetagenomeAssembly
    domain_of:
    - Configuration
    - PlannedProcess
    - Study
    range: Protocol
  start_date:
    name: start_date
    description: The date on which any process or activity was started
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: start_date
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: string
  end_date:
    name: end_date
    description: The date on which any process or activity was ended
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: end_date
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: string
  qc_status:
    name: qc_status
    description: Stores information about the result of a process (e.g., the process
      of sequencing a library may have a qc_status of 'fail' if not enough data
      was generated).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_status
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: StatusEnum
  qc_comment:
    name: qc_comment
    description: Slot to store additional comments about laboratory or workflow output.
      For workflow output it may describe the particular workflow stage that failed
      (e.g., "Failed at call-stage due to a malformed fastq file").
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_comment
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: string
  has_failure_categorization:
    name: has_failure_categorization
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: has_failure_categorization
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: FailureCategorization
    multivalued: true
    inlined: true
    inlined_as_list: true
  id:
    name: id
    description: A unique identifier for a thing. Must be either a CURIE shorthand
      for a URI or a complete URI
    notes:
    - 'abstracted pattern: prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?'
    - a minimum length of 3 characters is suggested for typecodes, but 1 or 2 characters
      will be accepted
    - typecodes must correspond 1:1 to a class in the NMDC schema. this will be checked
      via per-class id slot usage assertions
    - minting authority shoulders should probably be enumerated and checked in the
      pattern
    examples:
    - value: nmdc:mgmag-00-x012.1_7_c1
      description: https://github.com/microbiomedata/nmdc-schema/pull/499#discussion_r1018499248
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
      workflow_execution_id:
        literal_form: workflow_execution_id
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
      data_object_id:
        literal_form: data_object_id
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    rank: 1000
    identifier: true
    alias: id
    owner: MetagenomeAssembly
    domain_of:
    - NamedThing
    range: uriorcurie
    required: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfmgas-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  name:
    name: name
    description: A human readable label for an entity
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: name
    owner: MetagenomeAssembly
    domain_of:
    - PersonValue
    - NamedThing
    - Protocol
    range: string
  description:
    name: description
    description: a human-readable description of a thing
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    slot_uri: dcterms:description
    alias: description
    owner: MetagenomeAssembly
    domain_of:
    - ImageValue
    - NamedThing
    range: string
  alternative_identifiers:
    name: alternative_identifiers
    description: A list of alternative identifiers for the entity.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: alternative_identifiers
    owner: MetagenomeAssembly
    domain_of:
    - MetaboliteIdentification
    - NamedThing
    range: uriorcurie
    multivalued: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,\(\)\=\#]*$
  type:
    name: type
    description: the class_uri of the class that has been instantiated
    notes:
    - makes it easier to read example data files
    - required for polymorphic MongoDB collections
    examples:
    - value: nmdc:Biosample
    - value: nmdc:Study
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://github.com/microbiomedata/nmdc-schema/issues/1048
    - https://github.com/microbiomedata/nmdc-schema/issues/1233
    - https://github.com/microbiomedata/nmdc-schema/issues/248
    structured_aliases:
      workflow_execution_class:
        literal_form: workflow_execution_class
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    rank: 1000
    slot_uri: rdf:type
    designates_type: true
    alias: type
    owner: MetagenomeAssembly
    domain_of:
    - EukEval
    - FunctionalAnnotationAggMember
    - PeptideQuantification
    - ProteinQuantification
    - MobilePhaseSegment
    - PortionOfSubstance
    - MagBin
    - MetaboliteIdentification
    - GenomeFeature
    - FunctionalAnnotation
    - AttributeValue
    - NamedThing
    - OntologyRelation
    - FailureCategorization
    - Protocol
    - CreditAssociation
    - Doi
    range: uriorcurie
    required: true
class_uri: nmdc:MetagenomeAssembly
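
To make the induced definition more concrete, here is a rough sketch of what a record might look like, with a check for the slots marked `required: true` above. Every identifier and URL in `example` is hypothetical, and `check_record` is an illustrative helper, not part of any NMDC tooling. A real validator would also enforce ranges, enums, and the ISO-8601 timestamp pattern.

```python
import re

# Slots marked `required: true` in the induced definition above.
REQUIRED_SLOTS = (
    "id", "type", "execution_resource", "git_url",
    "started_at_time", "was_informed_by", "has_input",
)
# Pattern copied verbatim from the insdc_assembly_identifiers slot.
INSDC_PATTERN = re.compile(r"^insdc.sra:[A-Z]+[0-9]+(\.[0-9]+)?$")


def check_record(record: dict) -> list:
    """Return a list of human-readable problems; empty means none found."""
    problems = [
        f"missing required slot: {slot}"
        for slot in REQUIRED_SLOTS
        if not record.get(slot)
    ]
    if record.get("type") not in (None, "nmdc:MetagenomeAssembly"):
        problems.append("type should be nmdc:MetagenomeAssembly")
    insdc = record.get("insdc_assembly_identifiers")
    if insdc is not None and not INSDC_PATTERN.match(insdc):
        problems.append("insdc_assembly_identifiers does not match the slot pattern")
    return problems


# Hypothetical record: identifiers, URL, and statistics are made up.
example = {
    "id": "nmdc:wfmgas-11-abc123.1",
    "type": "nmdc:MetagenomeAssembly",
    "execution_resource": "NERSC-Cori",  # example value given on this page
    "git_url": "https://example.org/hypothetical/assembly-workflow",
    "started_at_time": "2021-01-10T00:00:00+00:00",
    "was_informed_by": ["nmdc:dgns-11-xyz789"],
    "has_input": ["nmdc:dobj-11-in0001"],
    "scaffolds": 1200.0,
    "scaf_bp": 3400000.0,
}

print(check_record(example))
```

Multivalued slots such as `was_informed_by` and `has_input` are written as lists, and the assembly statistics are floats because every `metagenome_assembly_parameter` slot has `range: float`.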