Class: MetagenomeAssembly
A workflow execution activity that converts sequencing reads into an assembled metagenome.
Inheritance
- NamedThing
- PlannedProcess
- DataEmitterProcess
- WorkflowExecution
- MetagenomeAssembly
Slots
Name | Cardinality and Range | Description | Inheritance |
---|---|---|---|
asm_score | 0..1 Float | A score for comparing metagenomic assembly quality from the same sample | direct |
scaffolds | 0..1 Float | Total sequence count of all scaffolds | direct |
scaf_logsum | 0..1 Float | The sum of the (length*log(length)) of all scaffolds, times some constant | direct |
scaf_powsum | 0..1 Float | Powersum of all scaffolds is the same as logsum except that it uses the sum o... | direct |
scaf_max | 0..1 Float | Maximum scaffold length | direct |
scaf_bp | 0..1 Float | Total size in bp of all scaffolds | direct |
scaf_n50 | 0..1 Float | Given a set of scaffolds, each with its own length, the N50 count is defined ... | direct |
scaf_n90 | 0..1 Float | Given a set of scaffolds, each with its own length, the N90 count is defined ... | direct |
scaf_l50 | 0..1 Float | Given a set of scaffolds, the L50 is defined as the sequence length of the sh... | direct |
scaf_l90 | 0..1 Float | The L90 statistic is less than or equal to the L50 statistic; it is the lengt... | direct |
scaf_n_gt50k | 0..1 Float | Total sequence count of scaffolds greater than 50 KB | direct |
scaf_l_gt50k | 0..1 Float | Total size in bp of all scaffolds greater than 50 KB | direct |
scaf_pct_gt50k | 0..1 Float | Total sequence size percentage of scaffolds greater than 50 KB | direct |
contigs | 0..1 Float | Total sequence count of all contigs | direct |
contig_bp | 0..1 Float | Total size in bp of all contigs | direct |
ctg_n50 | 0..1 Float | Given a set of contigs, each with its own length, the N50 count is defined as... | direct |
ctg_l50 | 0..1 Float | Given a set of contigs, the L50 is defined as the sequence length of the shor... | direct |
ctg_n90 | 0..1 Float | Given a set of contigs, each with its own length, the N90 count is defined as... | direct |
ctg_l90 | 0..1 Float | The L90 statistic is less than or equal to the L50 statistic; it is the lengt... | direct |
ctg_logsum | 0..1 Float | The sum of the (length*log(length)) of all contigs, times some constant | direct |
ctg_powsum | 0..1 Float | Powersum of all contigs is the same as logsum except that it uses the sum of ... | direct |
ctg_max | 0..1 Float | Maximum contig length | direct |
gap_pct | 0..1 Float | The gap size percentage of all scaffolds | direct |
gc_std | 0..1 Float | Standard deviation of GC content of all contigs | direct |
gc_avg | 0..1 Float | Average of GC content of all contigs | direct |
num_input_reads | 0..1 Float | The number of input reads used for assembly | direct |
num_aligned_reads | 0..1 Float | The number of input reads aligned to assembled contigs | direct |
insdc_assembly_identifiers | 0..1 String | | direct |
ended_at_time | 0..1 String | | WorkflowExecution |
execution_resource | 1 ExecutionResourceEnum | The computing resource or facility where the workflow was executed | WorkflowExecution |
git_url | 1 String | The URL that points to the exact GitHub location of a workflow | WorkflowExecution |
started_at_time | 1 String | | WorkflowExecution |
version | 0..1 String | The NMDC release tag for a given workflow release used for data processing | WorkflowExecution |
was_informed_by | 1..* NucleotideSequencing | The primary DataGeneration subclass that the WorkflowExecution subclass depen... | WorkflowExecution |
processing_institution_workflow_metadata | 0..1 String | Information about how workflow results were generated when the processing is ... | WorkflowExecution |
has_input | 1..* DataObject | An input to a process | PlannedProcess |
has_output | * DataObject | An output from a process | PlannedProcess |
processing_institution | 0..1 ProcessingInstitutionEnum | The organization that processed the sample | PlannedProcess |
protocol_link | 0..1 Protocol | | PlannedProcess |
start_date | 0..1 String | The date on which any process or activity was started | PlannedProcess |
end_date | 0..1 String | The date on which any process or activity was ended | PlannedProcess |
qc_status | 0..1 StatusEnum | Stores information about the result of a process (i.e., the process of sequencin... | PlannedProcess |
qc_comment | 0..1 String | Slot to store additional comments about laboratory or workflow output | PlannedProcess |
has_failure_categorization | * FailureCategorization | | PlannedProcess |
id | 1 Uriorcurie | A unique identifier for a thing | NamedThing |
name | 0..1 String | A human readable label for an entity | NamedThing |
description | 0..1 String | a human-readable description of a thing | NamedThing |
alternative_identifiers | * Uriorcurie | A list of alternative identifiers for the entity | NamedThing |
type | 1 Uriorcurie | the class_uri of the class that has been instantiated | NamedThing |
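The N/L and logsum/powsum slots can be illustrated with a small sketch. It follows the wording of the slot descriptions on this page, where N50/N90 are sequence counts and L50/L90 are sequence lengths (some other tools use the opposite naming). The function names are hypothetical, and the natural logarithm and the constant of 1.0 in `logsum` are assumptions, since the descriptions only say "times some constant".

```python
import math
from typing import List, Tuple


def count_and_length_at(lengths: List[int], fraction: float) -> Tuple[int, int]:
    """Return (N, L) at the given fraction of the total assembly size.

    N is the smallest number of sequences whose summed length reaches the
    target; L is the length of the shortest sequence in that set.
    """
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return count, length
    raise ValueError("lengths must be non-empty")


def logsum(lengths: List[int], constant: float = 1.0) -> float:
    """Sum of length*log(length); log base and constant are assumptions."""
    return constant * sum(n * math.log(n) for n in lengths)


def powsum(lengths: List[int], power: float = 0.25) -> float:
    """Sum of length*(length^P), with the default P=0.25 from the schema."""
    return sum(n * n ** power for n in lengths)


scaffold_lengths = [100, 80, 60, 40, 20]  # made-up lengths for illustration
scaf_n50, scaf_l50 = count_and_length_at(scaffold_lengths, 0.5)
scaf_n90, scaf_l90 = count_and_length_at(scaffold_lengths, 0.9)
```

With these made-up lengths the total is 300 bp, so the two longest scaffolds (100 + 80) cover the 50% target and the four longest cover the 90% target; L90 comes out no larger than L50, as the `scaf_l90` description states.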
Comments
- instances of this class may use a de novo assembly strategy in most or all cases relevant to NMDC
Identifier and Mapping Information
Schema Source
- from schema: https://w3id.org/nmdc/nmdc
Mappings
Mapping Type | Mapped Value |
---|---|
LinkML Source
Direct
name: MetagenomeAssembly
description: A workflow execution activity that converts sequencing reads into an
  assembled metagenome.
comments:
- instances of this class may use a de novo assembly strategy in most or all cases
  relevant to NMDC
from_schema: https://w3id.org/nmdc/nmdc
is_a: WorkflowExecution
slots:
- asm_score
- scaffolds
- scaf_logsum
- scaf_powsum
- scaf_max
- scaf_bp
- scaf_n50
- scaf_n90
- scaf_l50
- scaf_l90
- scaf_n_gt50k
- scaf_l_gt50k
- scaf_pct_gt50k
- contigs
- contig_bp
- ctg_n50
- ctg_l50
- ctg_n90
- ctg_l90
- ctg_logsum
- ctg_powsum
- ctg_max
- gap_pct
- gc_std
- gc_avg
- num_input_reads
- num_aligned_reads
- insdc_assembly_identifiers
slot_usage:
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfmgas-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  was_informed_by:
    name: was_informed_by
    range: NucleotideSequencing
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
class_uri: nmdc:MetagenomeAssembly
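The two `structured_pattern` entries in `slot_usage` are templates: with `interpolated: true`, each `{placeholder}` is replaced by a regular-expression fragment defined in the schema settings. Those settings do not appear on this page, so the expansions in the sketch below are assumptions chosen for illustration, and `compile_structured_pattern` is a hypothetical helper rather than a LinkML API.

```python
import re

# Assumed placeholder expansions (NOT taken from this page); the authoritative
# values live in the schema's settings.
ASSUMED_SETTINGS = {
    "id_nmdc_prefix": r"^(nmdc)",
    "id_shoulder": r"([0-9][a-z]{0,6}[0-9])",
    "id_blade": r"([A-Za-z0-9]{1,})",
    "id_version": r"(\.[0-9]{1,})",
}

# Syntax strings copied from the slot_usage section.
ID_SYNTAX = "{id_nmdc_prefix}:wfmgas-{id_shoulder}-{id_blade}{id_version}$"
WAS_INFORMED_BY_SYNTAX = "{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$"


def compile_structured_pattern(syntax: str, settings: dict) -> "re.Pattern[str]":
    """Substitute each {placeholder} with its fragment and compile the result."""
    pattern = syntax
    for name, fragment in settings.items():
        pattern = pattern.replace("{" + name + "}", fragment)
    return re.compile(pattern)


id_re = compile_structured_pattern(ID_SYNTAX, ASSUMED_SETTINGS)
informed_by_re = compile_structured_pattern(WAS_INFORMED_BY_SYNTAX, ASSUMED_SETTINGS)

# Made-up identifiers, used only to exercise the patterns.
print(bool(id_re.match("nmdc:wfmgas-11-abc123.1")))
print(bool(informed_by_re.match("nmdc:dgns-11-xyz789")))
```

Under these assumed expansions, the `wfmgas` typecode is what ties an identifier to this class, while `was_informed_by` accepts only the `omprc` and `dgns` typecodes.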
Induced
name: MetagenomeAssembly
description: A workflow execution activity that converts sequencing reads into an
  assembled metagenome.
comments:
- instances of this class may use a de novo assembly strategy in most or all cases
  relevant to NMDC
from_schema: https://w3id.org/nmdc/nmdc
is_a: WorkflowExecution
slot_usage:
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfmgas-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  was_informed_by:
    name: was_informed_by
    range: NucleotideSequencing
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
attributes:
  asm_score:
    name: asm_score
    description: A score for comparing metagenomic assembly quality from the same
      sample.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: asm_score
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaffolds:
    name: scaffolds
    description: Total sequence count of all scaffolds.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaffolds
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_logsum:
    name: scaf_logsum
    description: The sum of the (length*log(length)) of all scaffolds, times some
      constant. The score increases with contiguity.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_logsum
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_powsum:
    name: scaf_powsum
    description: Powersum of all scaffolds is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_powsum
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_max:
    name: scaf_max
    description: Maximum scaffold length.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_max
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_bp:
    name: scaf_bp
    description: Total size in bp of all scaffolds.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_bp
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_n50:
    name: scaf_n50
    description: Given a set of scaffolds, each with its own length, the N50 count
      is defined as the smallest number of scaffolds whose length sum makes up half
      of genome size.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_n50
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_n90:
    name: scaf_n90
    description: Given a set of scaffolds, each with its own length, the N90 count
      is defined as the smallest number of scaffolds whose length sum makes up 90%
      of genome size.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_n90
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_l50:
    name: scaf_l50
    description: Given a set of scaffolds, the L50 is defined as the sequence length
      of the shortest scaffold at 50% of the total genome length.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_l50
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_l90:
    name: scaf_l90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all scaffolds of that length or longer
      contains at least 90% of the sum of the lengths of all scaffolds.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_l90
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_n_gt50k:
    name: scaf_n_gt50k
    description: Total sequence count of scaffolds greater than 50 KB.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_n_gt50k
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_l_gt50k:
    name: scaf_l_gt50k
    description: Total size in bp of all scaffolds greater than 50 KB.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_l_gt50k
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  scaf_pct_gt50k:
    name: scaf_pct_gt50k
    description: Total sequence size percentage of scaffolds greater than 50 KB.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: scaf_pct_gt50k
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  contigs:
    name: contigs
    description: Total sequence count of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: contigs
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  contig_bp:
    name: contig_bp
    description: Total size in bp of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: contig_bp
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_n50:
    name: ctg_n50
    description: Given a set of contigs, each with its own length, the N50 count is
      defined as the smallest number of contigs whose length sum makes up half of
      genome size.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_n50
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_l50:
    name: ctg_l50
    description: Given a set of contigs, the L50 is defined as the sequence length
      of the shortest contig at 50% of the total genome length.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_l50
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_n90:
    name: ctg_n90
    description: Given a set of contigs, each with its own length, the N90 count is
      defined as the smallest number of contigs whose length sum makes up 90% of genome
      size.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_n90
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_l90:
    name: ctg_l90
    description: The L90 statistic is less than or equal to the L50 statistic; it
      is the length for which the collection of all contigs of that length or longer
      contains at least 90% of the sum of the lengths of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_l90
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_logsum:
    name: ctg_logsum
    description: The sum of the (length*log(length)) of all contigs, times some constant.
      The score increases with contiguity.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_logsum
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_powsum:
    name: ctg_powsum
    description: Powersum of all contigs is the same as logsum except that it uses
      the sum of (length*(length^P)) for some power P (default P=0.25).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_powsum
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  ctg_max:
    name: ctg_max
    description: Maximum contig length.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: ctg_max
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  gap_pct:
    name: gap_pct
    description: The gap size percentage of all scaffolds.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: gap_pct
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  gc_std:
    name: gc_std
    description: Standard deviation of GC content of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: gc_std
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  gc_avg:
    name: gc_avg
    description: Average of GC content of all contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: gc_avg
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  num_input_reads:
    name: num_input_reads
    description: The number of input reads used for assembly.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: num_input_reads
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  num_aligned_reads:
    name: num_aligned_reads
    description: The number of input reads aligned to assembled contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: metagenome_assembly_parameter
    alias: num_aligned_reads
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: float
  insdc_assembly_identifiers:
    name: insdc_assembly_identifiers
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: assembly_identifiers
    mixins:
    - insdc_identifiers
    alias: insdc_assembly_identifiers
    owner: MetagenomeAssembly
    domain_of:
    - MetagenomeAssembly
    - MetatranscriptomeAssembly
    range: string
    pattern: ^insdc.sra:[A-Z]+[0-9]+(\.[0-9]+)?$
  ended_at_time:
    name: ended_at_time
    notes:
    - 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
      It may not be complete, but it is good enough for now.'
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - prov:endedAtTime
    rank: 1000
    alias: ended_at_time
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
    pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
  execution_resource:
    name: execution_resource
    description: The computing resource or facility where the workflow was executed.
    examples:
    - value: NERSC-Cori
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: execution_resource
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: ExecutionResourceEnum
    required: true
  git_url:
    name: git_url
    description: The URL that points to the exact GitHub location of a workflow.
    examples:
    - value: https://github.com/microbiomedata/mg_annotation/releases/tag/0.1
    - value: https://github.com/microbiomedata/metaMS/blob/master/metaMS/gcmsWorkflow.py
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: git_url
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
    required: true
  started_at_time:
    name: started_at_time
    notes:
    - 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
      It may not be complete, but it is good enough for now.'
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - prov:startedAtTime
    rank: 1000
    alias: started_at_time
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
    required: true
    pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
  version:
    name: version
    description: The NMDC release tag for a given workflow release used for data processing.
      If workflows are processed externally, as denoted by processing_institution,
      this value represents the best mapping between a processing institution's (e.g.,
      JGI) workflow metadata and an NMDC tagged release.
    examples:
    - value: v1.2.0
    from_schema: https://w3id.org/nmdc/nmdc
    broad_mappings:
    - NCIT:C182117
    rank: 1000
    alias: version
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
  was_informed_by:
    name: was_informed_by
    description: The primary DataGeneration subclass that the WorkflowExecution subclass
      depends on.
    comments:
    - For version 1 of the proteomics workflow there are input files from both
      NucleotideSequencing and MassSpectrometry; the MassSpectrometry record is considered
      the primary class to reference.
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
      was_informed_by:
        literal_form: was_informed_by
        predicate: EXACT_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    narrow_mappings:
    - prov:wasInformedBy
    rank: 1000
    alias: was_informed_by
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: NucleotideSequencing
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
  processing_institution_workflow_metadata:
    name: processing_institution_workflow_metadata
    description: Information about how workflow results were generated when the processing
      is done by an external organization (e.g., JGI) such as software tool name and
      version or pipeline name and version.
    examples:
    - value: metaspades v. 3.15.2
    - value: IMG Annotation Pipeline v.5.0.25
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - NCIT:C165211
    rank: 1000
    alias: processing_institution_workflow_metadata
    owner: MetagenomeAssembly
    domain_of:
    - WorkflowExecution
    range: string
  has_input:
    name: has_input
    description: An input to a process.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - input
    rank: 1000
    alias: has_input
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: DataObject
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  has_output:
    name: has_output
    description: An output from a process.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - output
    rank: 1000
    alias: has_output
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: DataObject
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  processing_institution:
    name: processing_institution
    description: The organization that processed the sample.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: processing_institution
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: ProcessingInstitutionEnum
  protocol_link:
    name: protocol_link
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: protocol_link
    owner: MetagenomeAssembly
    domain_of:
    - Configuration
    - PlannedProcess
    - Study
    range: Protocol
  start_date:
    name: start_date
    description: The date on which any process or activity was started
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: start_date
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: string
  end_date:
    name: end_date
    description: The date on which any process or activity was ended
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: end_date
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: string
  qc_status:
    name: qc_status
    description: Stores information about the result of a process (i.e., the process
      of sequencing a library may have a qc_status of 'fail' if not enough data
      was generated)
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_status
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: StatusEnum
  qc_comment:
    name: qc_comment
    description: Slot to store additional comments about laboratory or workflow output.
      For workflow output it may describe the particular workflow stage that failed
      (e.g., failed at call-stage due to a malformed fastq file).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_comment
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: string
  has_failure_categorization:
    name: has_failure_categorization
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: has_failure_categorization
    owner: MetagenomeAssembly
    domain_of:
    - PlannedProcess
    range: FailureCategorization
    multivalued: true
    inlined: true
    inlined_as_list: true
  id:
    name: id
    description: A unique identifier for a thing. Must be either a CURIE shorthand
      for a URI or a complete URI
    notes:
    - 'abstracted pattern: prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?'
    - a minimum length of 3 characters is suggested for typecodes, but 1 or 2 characters
      will be accepted
    - typecodes must correspond 1:1 to a class in the NMDC schema. this will be checked
      via per-class id slot usage assertions
    - minting authority shoulders should probably be enumerated and checked in the
      pattern
    examples:
    - value: nmdc:mgmag-00-x012.1_7_c1
      description: https://github.com/microbiomedata/nmdc-schema/pull/499#discussion_r1018499248
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
      workflow_execution_id:
        literal_form: workflow_execution_id
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
      data_object_id:
        literal_form: data_object_id
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    rank: 1000
    identifier: true
    alias: id
    owner: MetagenomeAssembly
    domain_of:
    - NamedThing
    range: uriorcurie
    required: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfmgas-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  name:
    name: name
    description: A human readable label for an entity
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: name
    owner: MetagenomeAssembly
    domain_of:
    - PersonValue
    - NamedThing
    - Protocol
    range: string
  description:
    name: description
    description: a human-readable description of a thing
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    slot_uri: dcterms:description
    alias: description
    owner: MetagenomeAssembly
    domain_of:
    - ImageValue
    - NamedThing
    range: string
  alternative_identifiers:
    name: alternative_identifiers
    description: A list of alternative identifiers for the entity.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: alternative_identifiers
    owner: MetagenomeAssembly
    domain_of:
    - MetaboliteIdentification
    - NamedThing
    range: uriorcurie
    multivalued: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,\(\)\=\#]*$
  type:
    name: type
    description: the class_uri of the class that has been instantiated
    notes:
    - makes it easier to read example data files
    - required for polymorphic MongoDB collections
    examples:
    - value: nmdc:Biosample
    - value: nmdc:Study
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://github.com/microbiomedata/nmdc-schema/issues/1048
    - https://github.com/microbiomedata/nmdc-schema/issues/1233
    - https://github.com/microbiomedata/nmdc-schema/issues/248
    structured_aliases:
      workflow_execution_class:
        literal_form: workflow_execution_class
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    rank: 1000
    slot_uri: rdf:type
    designates_type: true
    alias: type
    owner: MetagenomeAssembly
    domain_of:
    - EukEval
    - FunctionalAnnotationAggMember
    - PeptideQuantification
    - ProteinQuantification
    - MobilePhaseSegment
    - PortionOfSubstance
    - MagBin
    - MetaboliteIdentification
    - GenomeFeature
    - FunctionalAnnotation
    - AttributeValue
    - NamedThing
    - OntologyRelation
    - FailureCategorization
    - Protocol
    - CreditAssociation
    - Doi
    range: uriorcurie
    required: true
class_uri: nmdc:MetagenomeAssembly
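As a rough illustration of how the induced constraints apply to a record, the sketch below checks the slots marked `required: true` and the `pattern` on `insdc_assembly_identifiers`. It is a hand-rolled check under stated assumptions, not a substitute for schema validation: `check_record` is a hypothetical helper, and every value in `example` is made up except `NERSC-Cori` and the `git_url`, which come from the schema's own examples.

```python
import re

# Slots marked `required: true` in the induced definition.
REQUIRED_SLOTS = (
    "id", "type", "execution_resource", "git_url",
    "started_at_time", "was_informed_by", "has_input",
)

# Pattern copied from the insdc_assembly_identifiers attribute.
INSDC_ASSEMBLY_RE = re.compile(r"^insdc.sra:[A-Z]+[0-9]+(\.[0-9]+)?$")


def check_record(record: dict) -> list:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for slot in REQUIRED_SLOTS:
        # Treat absent, empty-string, and empty-list values as missing.
        if record.get(slot) in (None, "", []):
            problems.append(f"missing required slot: {slot}")
    insdc = record.get("insdc_assembly_identifiers")
    if insdc is not None and not INSDC_ASSEMBLY_RE.match(insdc):
        problems.append(f"insdc_assembly_identifiers does not match pattern: {insdc}")
    return problems


# Hypothetical record for illustration only.
example = {
    "id": "nmdc:wfmgas-11-abc123.1",
    "type": "nmdc:MetagenomeAssembly",
    "execution_resource": "NERSC-Cori",
    "git_url": "https://github.com/microbiomedata/mg_annotation/releases/tag/0.1",
    "started_at_time": "2021-01-10T00:00:00+00:00",
    "was_informed_by": ["nmdc:omprc-11-xyz789"],
    "has_input": ["nmdc:dobj-11-in0001"],
    "insdc_assembly_identifiers": "insdc.sra:SRR1234567",
}

problems = check_record(example)
```

The check deliberately ignores ranges, enumerations, and the ISO-8601 and identifier patterns; a schema-aware validator would be needed to enforce those.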