Class: Metagenome-Assembled Genome analysis activity (MagsAnalysis)
A workflow execution activity that uses computational binning tools to group assembled contigs into genomes
URI: nmdc:MagsAnalysis
classDiagram
class MagsAnalysis
click MagsAnalysis href "../MagsAnalysis"
WorkflowExecution <|-- MagsAnalysis
click WorkflowExecution href "../WorkflowExecution"
MagsAnalysis : alternative_identifiers
MagsAnalysis : binned_contig_num
MagsAnalysis : description
MagsAnalysis : end_date
MagsAnalysis : ended_at_time
MagsAnalysis : execution_resource
MagsAnalysis --> "1" ExecutionResourceEnum : execution_resource
click ExecutionResourceEnum href "../ExecutionResourceEnum"
MagsAnalysis : git_url
MagsAnalysis : has_failure_categorization
MagsAnalysis --> "*" FailureCategorization : has_failure_categorization
click FailureCategorization href "../FailureCategorization"
MagsAnalysis : has_input
MagsAnalysis --> "1..*" DataObject : has_input
click DataObject href "../DataObject"
MagsAnalysis : has_output
MagsAnalysis --> "*" DataObject : has_output
click DataObject href "../DataObject"
MagsAnalysis : id
MagsAnalysis : img_identifiers
MagsAnalysis : input_contig_num
MagsAnalysis : low_depth_contig_num
MagsAnalysis : mags_list
MagsAnalysis --> "*" MagBin : mags_list
click MagBin href "../MagBin"
MagsAnalysis : name
MagsAnalysis : processing_institution
MagsAnalysis --> "0..1" ProcessingInstitutionEnum : processing_institution
click ProcessingInstitutionEnum href "../ProcessingInstitutionEnum"
MagsAnalysis : processing_institution_workflow_metadata
MagsAnalysis : protocol_link
MagsAnalysis --> "0..1" Protocol : protocol_link
click Protocol href "../Protocol"
MagsAnalysis : qc_comment
MagsAnalysis : qc_status
MagsAnalysis --> "0..1" StatusEnum : qc_status
click StatusEnum href "../StatusEnum"
MagsAnalysis : start_date
MagsAnalysis : started_at_time
MagsAnalysis : too_short_contig_num
MagsAnalysis : type
MagsAnalysis : unbinned_contig_num
MagsAnalysis : version
MagsAnalysis : was_informed_by
MagsAnalysis --> "1..*" NucleotideSequencing : was_informed_by
click NucleotideSequencing href "../NucleotideSequencing"
Inheritance
- NamedThing
    - PlannedProcess
        - DataEmitterProcess
            - WorkflowExecution
                - MagsAnalysis
Slots
Name | Cardinality and Range | Description | Inheritance |
---|---|---|---|
binned_contig_num | 0..1 Integer | Number of contigs that ended up in a medium or high quality bin | direct |
input_contig_num | 0..1 Integer | Total number of input contigs | direct |
low_depth_contig_num | 0..1 Integer | Number of contigs which were excluded from binning for depth of coverage | direct |
mags_list | * MagBin | Contains detailed information about each metagenome-assembled genome | direct |
too_short_contig_num | 0..1 Integer | Number of contigs which were excluded from binning for length | direct |
unbinned_contig_num | 0..1 Integer | Number of contigs which did not end up in a medium or high quality bin | direct |
img_identifiers | * ExternalIdentifier | A list of identifiers that relate the biosample to records in the IMG databas... | direct |
ended_at_time | 0..1 String | | WorkflowExecution |
execution_resource | 1 ExecutionResourceEnum | The computing resource or facility where the workflow was executed | WorkflowExecution |
git_url | 1 String | The url that points to the exact github location of a workflow | WorkflowExecution |
started_at_time | 1 String | | WorkflowExecution |
version | 0..1 String | The NMDC release tag for a given workflow release used for data processing | WorkflowExecution |
was_informed_by | 1..* NucleotideSequencing | The primary DataGeneration subclass that the WorkflowExecution subclass depen... | WorkflowExecution |
processing_institution_workflow_metadata | 0..1 String | Information about how workflow results were generated when the processing is ... | WorkflowExecution |
has_input | 1..* DataObject | An input to a process | PlannedProcess |
has_output | * DataObject | An output from a process | PlannedProcess |
processing_institution | 0..1 ProcessingInstitutionEnum | The organization that processed the sample | PlannedProcess |
protocol_link | 0..1 Protocol | | PlannedProcess |
start_date | 0..1 String | The date on which any process or activity was started | PlannedProcess |
end_date | 0..1 String | The date on which any process or activity was ended | PlannedProcess |
qc_status | 0..1 StatusEnum | Stores information about the result of a process (ie the process of sequencin... | PlannedProcess |
qc_comment | 0..1 String | Slot to store additional comments about laboratory or workflow output | PlannedProcess |
has_failure_categorization | * FailureCategorization | | PlannedProcess |
id | 1 Uriorcurie | A unique identifier for a thing | NamedThing |
name | 0..1 String | A human readable label for an entity | NamedThing |
description | 0..1 String | a human-readable description of a thing | NamedThing |
alternative_identifiers | * Uriorcurie | A list of alternative identifiers for the entity | NamedThing |
type | 1 Uriorcurie | the class_uri of the class that has been instantiated | NamedThing |
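
The cardinalities in the Slots table imply that seven slots must be present on every MagsAnalysis record (those with cardinality `1` or `1..*`). The following is a minimal sketch of a presence check, not the schema's own validator. The helper name `missing_required_slots` and all identifier values in `example_record` are hypothetical and exist only for illustration; the `execution_resource` and `git_url` values reuse examples given in the slot definitions.

```python
from __future__ import annotations

# Slots with cardinality 1 or 1..* in the Slots table.
REQUIRED_SLOTS = (
    "id",
    "type",
    "execution_resource",
    "git_url",
    "started_at_time",
    "was_informed_by",
    "has_input",
)

# Slots with cardinality 1..* must be non-empty lists.
REQUIRED_LISTS = ("was_informed_by", "has_input")


def missing_required_slots(record: dict) -> list[str]:
    """Return the names of required slots that are absent, empty, or mis-shaped."""
    missing = []
    for slot in REQUIRED_SLOTS:
        value = record.get(slot)
        if value is None or value == "" or value == []:
            missing.append(slot)
        elif slot in REQUIRED_LISTS and not isinstance(value, list):
            # A multivalued slot given as a scalar is treated as missing.
            missing.append(slot)
    return missing


# Hypothetical record: the identifiers below are made up for illustration.
example_record = {
    "id": "nmdc:wfmag-11-abc123.1",
    "type": "nmdc:MagsAnalysis",
    "execution_resource": "NERSC-Cori",
    "git_url": "https://github.com/microbiomedata/mg_annotation/releases/tag/0.1",
    "started_at_time": "2021-01-10T00:00:00+00:00",
    "was_informed_by": ["nmdc:omprc-11-xyz789"],
    "has_input": ["nmdc:dobj-11-def456"],
}

print(missing_required_slots(example_record))
```

This only checks presence and list shape. Range checks (for example, that `execution_resource` is a permissible value of ExecutionResourceEnum) would need the full schema.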
Identifier and Mapping Information
Schema Source
- from schema: https://w3id.org/nmdc/nmdc
Mappings
Mapping Type | Mapped Value |
---|---|
LinkML Source
Direct
name: MagsAnalysis
description: A workflow execution activity that uses computational binning tools to
group assembled contigs into genomes
title: Metagenome-Assembled Genome analysis activity
from_schema: https://w3id.org/nmdc/nmdc
is_a: WorkflowExecution
slots:
- binned_contig_num
- input_contig_num
- low_depth_contig_num
- mags_list
- too_short_contig_num
- unbinned_contig_num
- img_identifiers
slot_usage:
id:
name: id
required: true
structured_pattern:
syntax: '{id_nmdc_prefix}:wfmag-{id_shoulder}-{id_blade}{id_version}$'
interpolated: true
img_identifiers:
name: img_identifiers
maximum_cardinality: 1
was_informed_by:
name: was_informed_by
range: NucleotideSequencing
structured_pattern:
syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
interpolated: true
class_uri: nmdc:MagsAnalysis
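
The `structured_pattern` entries above are templates: with `interpolated: true`, each `{name}` placeholder is replaced by a regex fragment defined elsewhere in the schema before matching. This page does not show those fragments, so the values in `SETTINGS` below are assumptions chosen for illustration and should be replaced with the schema's real definitions. The sketch shows only the substitution mechanics and how the resulting patterns would be applied.

```python
import re

# ASSUMPTION: these fragments stand in for the schema's own definitions of
# the placeholders, which are not shown on this page.
SETTINGS = {
    "id_nmdc_prefix": r"^(nmdc)",
    "id_shoulder": r"([0-9][a-z]{0,6}[0-9])",
    "id_blade": r"([A-Za-z0-9]{1,})",
    "id_version": r"(\.[0-9]{1,})",
}

# Syntax strings copied from the slot_usage block above.
ID_SYNTAX = "{id_nmdc_prefix}:wfmag-{id_shoulder}-{id_blade}{id_version}$"
WAS_INFORMED_BY_SYNTAX = "{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$"


def interpolate(syntax: str) -> "re.Pattern":
    """Replace each {name} placeholder with its regex fragment and compile."""
    pattern = syntax
    # str.replace is used instead of str.format because the fragments
    # themselves contain braces (regex quantifiers such as {0,6}).
    for name, fragment in SETTINGS.items():
        pattern = pattern.replace("{" + name + "}", fragment)
    return re.compile(pattern)


ID_PATTERN = interpolate(ID_SYNTAX)
WAS_INFORMED_BY_PATTERN = interpolate(WAS_INFORMED_BY_SYNTAX)

# Hypothetical identifiers, made up for illustration.
print(bool(ID_PATTERN.match("nmdc:wfmag-11-abc123.1")))
print(bool(WAS_INFORMED_BY_PATTERN.match("nmdc:omprc-11-xyz789")))
```

Under these assumed fragments, a MagsAnalysis `id` needs the `wfmag` typecode and a trailing version, while `was_informed_by` accepts only the `omprc` or `dgns` typecodes and no version suffix.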
Induced
name: MagsAnalysis
description: A workflow execution activity that uses computational binning tools to
group assembled contigs into genomes
title: Metagenome-Assembled Genome analysis activity
from_schema: https://w3id.org/nmdc/nmdc
is_a: WorkflowExecution
slot_usage:
id:
name: id
required: true
structured_pattern:
syntax: '{id_nmdc_prefix}:wfmag-{id_shoulder}-{id_blade}{id_version}$'
interpolated: true
img_identifiers:
name: img_identifiers
maximum_cardinality: 1
was_informed_by:
name: was_informed_by
range: NucleotideSequencing
structured_pattern:
syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
interpolated: true
attributes:
binned_contig_num:
name: binned_contig_num
description: Number of contigs that ended up in a medium or high quality bin.
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: binned_contig_num
owner: MagsAnalysis
domain_of:
- MagsAnalysis
range: integer
minimum_value: 0
input_contig_num:
name: input_contig_num
description: Total number of input contigs.
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: input_contig_num
owner: MagsAnalysis
domain_of:
- MagsAnalysis
range: integer
minimum_value: 0
low_depth_contig_num:
name: low_depth_contig_num
description: Number of contigs which were excluded from binning for depth of coverage.
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: low_depth_contig_num
owner: MagsAnalysis
domain_of:
- MagsAnalysis
range: integer
minimum_value: 0
mags_list:
name: mags_list
description: Contains detailed information about each metagenome-assembled genome.
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: mags_list
owner: MagsAnalysis
domain_of:
- MagsAnalysis
range: MagBin
multivalued: true
inlined: true
inlined_as_list: true
too_short_contig_num:
name: too_short_contig_num
description: Number of contigs which were excluded from binning for length.
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: too_short_contig_num
owner: MagsAnalysis
domain_of:
- MagsAnalysis
range: integer
minimum_value: 0
unbinned_contig_num:
name: unbinned_contig_num
description: Number of contigs which did not end up in a medium or high quality
bin.
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: unbinned_contig_num
owner: MagsAnalysis
domain_of:
- MagsAnalysis
range: integer
minimum_value: 0
img_identifiers:
name: img_identifiers
description: A list of identifiers that relate the biosample to records in the
IMG database.
title: IMG Identifiers
todos:
- add is_a or mixin modeling, like other external_database_identifiers
- what class would IMG records belong to?! Are they Studies, Biosamples, or something
else?
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
is_a: external_database_identifiers
alias: img_identifiers
owner: MagsAnalysis
domain_of:
- MetagenomeAnnotation
- Biosample
- MetatranscriptomeAnnotation
- MetatranscriptomeExpressionAnalysis
- MagsAnalysis
range: external_identifier
multivalued: true
pattern: ^img\.taxon:[a-zA-Z0-9_][a-zA-Z0-9_\/\.]*$
maximum_cardinality: 1
ended_at_time:
name: ended_at_time
notes:
- 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
It may not be complete, but it is good enough for now.'
from_schema: https://w3id.org/nmdc/nmdc
mappings:
- prov:endedAtTime
rank: 1000
alias: ended_at_time
owner: MagsAnalysis
domain_of:
- WorkflowExecution
range: string
pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
execution_resource:
name: execution_resource
description: The computing resource or facility where the workflow was executed.
examples:
- value: NERSC-Cori
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: execution_resource
owner: MagsAnalysis
domain_of:
- WorkflowExecution
range: ExecutionResourceEnum
required: true
git_url:
name: git_url
description: The url that points to the exact github location of a workflow.
examples:
- value: https://github.com/microbiomedata/mg_annotation/releases/tag/0.1
- value: https://github.com/microbiomedata/metaMS/blob/master/metaMS/gcmsWorkflow.py
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: git_url
owner: MagsAnalysis
domain_of:
- WorkflowExecution
range: string
required: true
started_at_time:
name: started_at_time
notes:
- 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
It may not be complete, but it is good enough for now.'
from_schema: https://w3id.org/nmdc/nmdc
mappings:
- prov:startedAtTime
rank: 1000
alias: started_at_time
owner: MagsAnalysis
domain_of:
- WorkflowExecution
range: string
required: true
pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
version:
name: version
description: The NMDC release tag for a given workflow release used for data processing.
If workflows are processed externally, as denoted by processing_institution,
this value represents the best mapping between a processing institution's (e.g.,
JGI) workflow metadata and an NMDC tagged release.
examples:
- value: v1.2.0
from_schema: https://w3id.org/nmdc/nmdc
broad_mappings:
- NCIT:C182117
rank: 1000
alias: version
owner: MagsAnalysis
domain_of:
- WorkflowExecution
range: string
was_informed_by:
name: was_informed_by
description: The primary DataGeneration subclass that the WorkflowExecution subclass
depends on.
comments:
- For version 1 of the proteomics workflow, there are input files from both
NucleotideSequencing and MassSpectrometry; the MassSpectrometry record is considered
the primary class to reference.
from_schema: https://w3id.org/nmdc/nmdc
structured_aliases:
was_informed_by:
literal_form: was_informed_by
predicate: EXACT_SYNONYM
contexts:
- https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
narrow_mappings:
- prov:wasInformedBy
rank: 1000
alias: was_informed_by
owner: MagsAnalysis
domain_of:
- WorkflowExecution
range: NucleotideSequencing
required: true
multivalued: true
structured_pattern:
syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
interpolated: true
processing_institution_workflow_metadata:
name: processing_institution_workflow_metadata
description: Information about how workflow results were generated when the processing
is done by an external organization (e.g., JGI) such as software tool name and
version or pipeline name and version.
examples:
- value: metaspades v. 3.15.2
- value: IMG Annotation Pipeline v.5.0.25
from_schema: https://w3id.org/nmdc/nmdc
mappings:
- NCIT:C165211
rank: 1000
alias: processing_institution_workflow_metadata
owner: MagsAnalysis
domain_of:
- WorkflowExecution
range: string
has_input:
name: has_input
description: An input to a process.
from_schema: https://w3id.org/nmdc/nmdc
aliases:
- input
rank: 1000
alias: has_input
owner: MagsAnalysis
domain_of:
- PlannedProcess
range: DataObject
required: true
multivalued: true
structured_pattern:
syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
interpolated: true
has_output:
name: has_output
description: An output from a process.
from_schema: https://w3id.org/nmdc/nmdc
aliases:
- output
rank: 1000
alias: has_output
owner: MagsAnalysis
domain_of:
- PlannedProcess
range: DataObject
multivalued: true
structured_pattern:
syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
interpolated: true
processing_institution:
name: processing_institution
description: The organization that processed the sample.
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: processing_institution
owner: MagsAnalysis
domain_of:
- PlannedProcess
range: ProcessingInstitutionEnum
protocol_link:
name: protocol_link
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: protocol_link
owner: MagsAnalysis
domain_of:
- Configuration
- PlannedProcess
- Study
range: Protocol
start_date:
name: start_date
description: The date on which any process or activity was started
todos:
- add date string validation pattern
comments:
- We are using string representations of dates until all components of our ecosystem
can handle ISO 8601 dates
- The date should be formatted as YYYY-MM-DD
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: start_date
owner: MagsAnalysis
domain_of:
- PlannedProcess
range: string
end_date:
name: end_date
description: The date on which any process or activity was ended
todos:
- add date string validation pattern
comments:
- We are using string representations of dates until all components of our ecosystem
can handle ISO 8601 dates
- The date should be formatted as YYYY-MM-DD
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: end_date
owner: MagsAnalysis
domain_of:
- PlannedProcess
range: string
qc_status:
name: qc_status
description: Stores information about the result of a process (e.g., the process
of sequencing a library may have a qc_status of 'fail' if not enough data
was generated)
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: qc_status
owner: MagsAnalysis
domain_of:
- PlannedProcess
range: StatusEnum
qc_comment:
name: qc_comment
description: Slot to store additional comments about laboratory or workflow output.
For workflow output it may describe the particular workflow stage that failed
(e.g., failed at call-stage due to a malformed fastq file).
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: qc_comment
owner: MagsAnalysis
domain_of:
- PlannedProcess
range: string
has_failure_categorization:
name: has_failure_categorization
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: has_failure_categorization
owner: MagsAnalysis
domain_of:
- PlannedProcess
range: FailureCategorization
multivalued: true
inlined: true
inlined_as_list: true
id:
name: id
description: A unique identifier for a thing. Must be either a CURIE shorthand
for a URI or a complete URI
notes:
- 'abstracted pattern: prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?'
- a minimum length of 3 characters is suggested for typecodes, but 1 or 2 characters
will be accepted
- typecodes must correspond 1:1 to a class in the NMDC schema. this will be checked
via per-class id slot usage assertions
- minting authority shoulders should probably be enumerated and checked in the
pattern
examples:
- value: nmdc:mgmag-00-x012.1_7_c1
description: https://github.com/microbiomedata/nmdc-schema/pull/499#discussion_r1018499248
from_schema: https://w3id.org/nmdc/nmdc
structured_aliases:
workflow_execution_id:
literal_form: workflow_execution_id
predicate: NARROW_SYNONYM
contexts:
- https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
data_object_id:
literal_form: data_object_id
predicate: NARROW_SYNONYM
contexts:
- https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
rank: 1000
identifier: true
alias: id
owner: MagsAnalysis
domain_of:
- NamedThing
range: uriorcurie
required: true
pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
structured_pattern:
syntax: '{id_nmdc_prefix}:wfmag-{id_shoulder}-{id_blade}{id_version}$'
interpolated: true
name:
name: name
description: A human readable label for an entity
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: name
owner: MagsAnalysis
domain_of:
- PersonValue
- NamedThing
- Protocol
range: string
description:
name: description
description: a human-readable description of a thing
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
slot_uri: dcterms:description
alias: description
owner: MagsAnalysis
domain_of:
- ImageValue
- NamedThing
range: string
alternative_identifiers:
name: alternative_identifiers
description: A list of alternative identifiers for the entity.
from_schema: https://w3id.org/nmdc/nmdc
rank: 1000
alias: alternative_identifiers
owner: MagsAnalysis
domain_of:
- MetaboliteIdentification
- NamedThing
range: uriorcurie
multivalued: true
pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,\(\)\=\#]*$
type:
name: type
description: the class_uri of the class that has been instantiated
notes:
- makes it easier to read example data files
- required for polymorphic MongoDB collections
examples:
- value: nmdc:Biosample
- value: nmdc:Study
from_schema: https://w3id.org/nmdc/nmdc
see_also:
- https://github.com/microbiomedata/nmdc-schema/issues/1048
- https://github.com/microbiomedata/nmdc-schema/issues/1233
- https://github.com/microbiomedata/nmdc-schema/issues/248
structured_aliases:
workflow_execution_class:
literal_form: workflow_execution_class
predicate: NARROW_SYNONYM
contexts:
- https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
rank: 1000
slot_uri: rdf:type
designates_type: true
alias: type
owner: MagsAnalysis
domain_of:
- EukEval
- FunctionalAnnotationAggMember
- PeptideQuantification
- ProteinQuantification
- MobilePhaseSegment
- PortionOfSubstance
- MagBin
- MetaboliteIdentification
- GenomeFeature
- FunctionalAnnotation
- AttributeValue
- NamedThing
- OntologyRelation
- FailureCategorization
- Protocol
- CreditAssociation
- Doi
range: uriorcurie
required: true
class_uri: nmdc:MagsAnalysis
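
Several constraints in the induced attributes above can be checked with plain Python: `minimum_value: 0` on the five contig-count slots, the `pattern` and `maximum_cardinality: 1` on `img_identifiers`, and the `YYYY-MM-DD` format requested in the comments on `start_date` and `end_date`. The following is a rough sketch under stated assumptions, not the schema's validator; all function names are hypothetical. The `contig_accounting_gap` helper additionally assumes that the four outcome counts partition the input contigs, which this page does not state.

```python
import re
from datetime import datetime

# Pattern copied verbatim from the img_identifiers attribute.
IMG_PATTERN = re.compile(r"^img\.taxon:[a-zA-Z0-9_][a-zA-Z0-9_\/\.]*$")

# The five integer slots that declare minimum_value: 0.
COUNT_SLOTS = (
    "binned_contig_num",
    "input_contig_num",
    "low_depth_contig_num",
    "too_short_contig_num",
    "unbinned_contig_num",
)


def check_counts(record):
    """Return the count slots whose value is present but not an integer >= 0."""
    bad = []
    for slot in COUNT_SLOTS:
        if slot not in record:
            continue  # every count slot is optional (0..1)
        value = record[slot]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            bad.append(slot)
    return bad


def check_img_identifiers(values):
    """Return a list of problems found in an img_identifiers value list."""
    problems = []
    if len(values) > 1:
        problems.append("maximum_cardinality is 1")
    problems.extend(
        f"pattern mismatch: {v}" for v in values if not IMG_PATTERN.match(v)
    )
    return problems


def is_yyyy_mm_dd(value):
    """True if value is a zero-padded, calendar-valid YYYY-MM-DD string."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def contig_accounting_gap(record):
    """Return input minus the four outcome counts, or None if any is missing.

    ASSUMPTION (not stated in the schema): binned, unbinned, low-depth and
    too-short contigs together account for every input contig.
    """
    if any(slot not in record for slot in COUNT_SLOTS):
        return None
    accounted = (
        record["binned_contig_num"]
        + record["unbinned_contig_num"]
        + record["low_depth_contig_num"]
        + record["too_short_contig_num"]
    )
    return record["input_contig_num"] - accounted
```

A non-zero result from `contig_accounting_gap` is best treated as a prompt to inspect the record rather than as a validation failure, since the partition assumption may not hold for every workflow version.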