
Class: Metagenome-Assembled Genome analysis activity (MagsAnalysis)

A workflow execution activity that uses computational binning tools to group assembled contigs into genomes

URI: nmdc:MagsAnalysis

classDiagram
    class MagsAnalysis
    click MagsAnalysis href "../MagsAnalysis"
    WorkflowExecution <|-- MagsAnalysis
    click WorkflowExecution href "../WorkflowExecution"
    MagsAnalysis : alternative_identifiers
    MagsAnalysis : binned_contig_num
    MagsAnalysis : description
    MagsAnalysis : end_date
    MagsAnalysis : ended_at_time
    MagsAnalysis : execution_resource
    MagsAnalysis --> "1" ExecutionResourceEnum : execution_resource
    click ExecutionResourceEnum href "../ExecutionResourceEnum"
    MagsAnalysis : git_url
    MagsAnalysis : has_failure_categorization
    MagsAnalysis --> "*" FailureCategorization : has_failure_categorization
    click FailureCategorization href "../FailureCategorization"
    MagsAnalysis : has_input
    MagsAnalysis --> "1..*" DataObject : has_input
    click DataObject href "../DataObject"
    MagsAnalysis : has_output
    MagsAnalysis --> "*" DataObject : has_output
    click DataObject href "../DataObject"
    MagsAnalysis : id
    MagsAnalysis : img_identifiers
    MagsAnalysis : input_contig_num
    MagsAnalysis : low_depth_contig_num
    MagsAnalysis : mags_list
    MagsAnalysis --> "*" MagBin : mags_list
    click MagBin href "../MagBin"
    MagsAnalysis : name
    MagsAnalysis : processing_institution
    MagsAnalysis --> "0..1" ProcessingInstitutionEnum : processing_institution
    click ProcessingInstitutionEnum href "../ProcessingInstitutionEnum"
    MagsAnalysis : processing_institution_workflow_metadata
    MagsAnalysis : protocol_link
    MagsAnalysis --> "0..1" Protocol : protocol_link
    click Protocol href "../Protocol"
    MagsAnalysis : qc_comment
    MagsAnalysis : qc_status
    MagsAnalysis --> "0..1" StatusEnum : qc_status
    click StatusEnum href "../StatusEnum"
    MagsAnalysis : start_date
    MagsAnalysis : started_at_time
    MagsAnalysis : too_short_contig_num
    MagsAnalysis : type
    MagsAnalysis : unbinned_contig_num
    MagsAnalysis : version
    MagsAnalysis : was_informed_by
    MagsAnalysis --> "1..*" NucleotideSequencing : was_informed_by
    click NucleotideSequencing href "../NucleotideSequencing"

Inheritance

Slots

| Name | Cardinality and Range | Description | Inheritance |
| --- | --- | --- | --- |
| binned_contig_num | 0..1 Integer | Number of contigs that ended up in a medium or high quality bin | direct |
| input_contig_num | 0..1 Integer | Total number of input contigs | direct |
| low_depth_contig_num | 0..1 Integer | Number of contigs which were excluded from binning for depth of coverage | direct |
| mags_list | * MagBin | Contains detailed information about each metagenome-assembled genome | direct |
| too_short_contig_num | 0..1 Integer | Number of contigs which were excluded from binning for length | direct |
| unbinned_contig_num | 0..1 Integer | Number of contigs which did not end up in a medium or high quality bin | direct |
| img_identifiers | * ExternalIdentifier | A list of identifiers that relate the biosample to records in the IMG databas... | direct |
| ended_at_time | 0..1 String | | WorkflowExecution |
| execution_resource | 1 ExecutionResourceEnum | The computing resource or facility where the workflow was executed | WorkflowExecution |
| git_url | 1 String | The url that points to the exact github location of a workflow | WorkflowExecution |
| started_at_time | 1 String | | WorkflowExecution |
| version | 0..1 String | The NMDC release tag for a given workflow release used for data processing | WorkflowExecution |
| was_informed_by | 1..* NucleotideSequencing | The primary DataGeneration subclass that the WorkflowExecution subclass depen... | WorkflowExecution |
| processing_institution_workflow_metadata | 0..1 String | Information about how workflow results were generated when the processing is ... | WorkflowExecution |
| has_input | 1..* DataObject | An input to a process | PlannedProcess |
| has_output | * DataObject | An output from a process | PlannedProcess |
| processing_institution | 0..1 ProcessingInstitutionEnum | The organization that processed the sample | PlannedProcess |
| protocol_link | 0..1 Protocol | | PlannedProcess |
| start_date | 0..1 String | The date on which any process or activity was started | PlannedProcess |
| end_date | 0..1 String | The date on which any process or activity was ended | PlannedProcess |
| qc_status | 0..1 StatusEnum | Stores information about the result of a process (ie the process of sequencin... | PlannedProcess |
| qc_comment | 0..1 String | Slot to store additional comments about laboratory or workflow output | PlannedProcess |
| has_failure_categorization | * FailureCategorization | | PlannedProcess |
| id | 1 Uriorcurie | A unique identifier for a thing | NamedThing |
| name | 0..1 String | A human readable label for an entity | NamedThing |
| description | 0..1 String | a human-readable description of a thing | NamedThing |
| alternative_identifiers | * Uriorcurie | A list of alternative identifiers for the entity | NamedThing |
| type | 1 Uriorcurie | the class_uri of the class that has been instantiated | NamedThing |
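
The cardinalities listed for the slots above can be turned into a quick pre-submission check. The following is a minimal sketch, not the official NMDC validator. The helper name `check_mags_analysis` and every sample value (identifiers, URL) are hypothetical; only the slot names, the required slots (cardinality 1 or 1..*), and the integer range of the contig counters come from this page, along with the `minimum_value: 0` constraint that the induced definitions place on those counters.

```python
# Slots whose cardinality is 1 or 1..* on this page (must be present).
REQUIRED_SLOTS = (
    "id", "type", "execution_resource", "git_url",
    "started_at_time", "was_informed_by", "has_input",
)
# Multivalued slots with a lower bound of 1 (must be non-empty lists).
NON_EMPTY_LISTS = ("was_informed_by", "has_input")
# Optional integer counters declared directly on MagsAnalysis.
COUNT_SLOTS = (
    "input_contig_num", "binned_contig_num", "unbinned_contig_num",
    "too_short_contig_num", "low_depth_contig_num",
)


def check_mags_analysis(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for slot in REQUIRED_SLOTS:
        if slot not in record:
            problems.append(f"missing required slot: {slot}")
    for slot in NON_EMPTY_LISTS:
        value = record.get(slot)
        if value is not None and (not isinstance(value, list) or not value):
            problems.append(f"{slot} must be a non-empty list")
    for slot in COUNT_SLOTS:
        value = record.get(slot)
        if value is None:
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"{slot} must be a non-negative integer")
    return problems


# Hypothetical record; identifiers and values are illustrative only.
example = {
    "id": "nmdc:wfmag-00-abc123.1",
    "type": "nmdc:MagsAnalysis",
    "execution_resource": "NERSC-Cori",
    "git_url": "https://example.org/workflow",
    "started_at_time": "2021-08-05T14:48:00Z",
    "was_informed_by": ["nmdc:omprc-00-abc123"],
    "has_input": ["nmdc:dobj-00-abc123"],
    "input_contig_num": 1000,
    "binned_contig_num": 120,
}
print(check_mags_analysis(example))
```

This check covers presence and basic typing only. Enum membership, identifier patterns, and the ranges of referenced classes are left to a schema-aware validator.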

Identifier and Mapping Information

Schema Source

Mappings

Mapping Type Mapped Value

LinkML Source

Direct

name: MagsAnalysis
description: A workflow execution activity that uses computational binning tools to
  group assembled contigs into genomes
title: Metagenome-Assembled Genome analysis activity
from_schema: https://w3id.org/nmdc/nmdc
is_a: WorkflowExecution
slots:
- binned_contig_num
- input_contig_num
- low_depth_contig_num
- mags_list
- too_short_contig_num
- unbinned_contig_num
- img_identifiers
slot_usage:
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfmag-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  img_identifiers:
    name: img_identifiers
    maximum_cardinality: 1
  was_informed_by:
    name: was_informed_by
    range: NucleotideSequencing
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
class_uri: nmdc:MagsAnalysis
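
The `structured_pattern` entries above are templates: with `interpolated: true`, each `{name}` placeholder is replaced by a regular-expression fragment defined in the schema's settings, which this page does not show. The sketch below illustrates the mechanism only. The fragment values in `ASSUMED_SETTINGS` are assumptions chosen for illustration and may differ from the schema's actual settings, and `interpolate` is a hypothetical helper rather than a LinkML API.

```python
import re

# ASSUMPTION: placeholder fragments for illustration; the real values live in
# the schema's settings and are not shown on this page.
ASSUMED_SETTINGS = {
    "id_nmdc_prefix": r"^(nmdc)",
    "id_shoulder": r"([0-9][a-z]{0,6}[0-9])",
    "id_blade": r"([A-Za-z0-9]{1,})",
    "id_version": r"(\.[0-9]{1,})",
}

# Syntax strings copied from the slot_usage block above.
ID_SYNTAX = "{id_nmdc_prefix}:wfmag-{id_shoulder}-{id_blade}{id_version}$"
WAS_INFORMED_BY_SYNTAX = "{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$"


def interpolate(syntax: str, settings: dict) -> "re.Pattern":
    """Replace each {name} placeholder with its regex fragment and compile."""
    pattern = syntax
    for name, fragment in settings.items():
        pattern = pattern.replace("{" + name + "}", fragment)
    return re.compile(pattern)


id_pattern = interpolate(ID_SYNTAX, ASSUMED_SETTINGS)
informed_by_pattern = interpolate(WAS_INFORMED_BY_SYNTAX, ASSUMED_SETTINGS)

# Hypothetical identifiers, used only to exercise the patterns.
print(bool(id_pattern.match("nmdc:wfmag-00-abc123.1")))
print(bool(id_pattern.match("nmdc:wfmgan-00-abc123.1")))
print(bool(informed_by_pattern.match("nmdc:omprc-11-xyz789")))
```

Plain string replacement is used instead of `str.format` so that regex quantifiers such as `{0,6}` inside the fragments are never mistaken for placeholders.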

Induced

name: MagsAnalysis
description: A workflow execution activity that uses computational binning tools to
  group assembled contigs into genomes
title: Metagenome-Assembled Genome analysis activity
from_schema: https://w3id.org/nmdc/nmdc
is_a: WorkflowExecution
slot_usage:
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfmag-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  img_identifiers:
    name: img_identifiers
    maximum_cardinality: 1
  was_informed_by:
    name: was_informed_by
    range: NucleotideSequencing
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
attributes:
  binned_contig_num:
    name: binned_contig_num
    description: Number of contigs that ended up in a medium or high quality bin.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: binned_contig_num
    owner: MagsAnalysis
    domain_of:
    - MagsAnalysis
    range: integer
    minimum_value: 0
  input_contig_num:
    name: input_contig_num
    description: Total number of input contigs.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: input_contig_num
    owner: MagsAnalysis
    domain_of:
    - MagsAnalysis
    range: integer
    minimum_value: 0
  low_depth_contig_num:
    name: low_depth_contig_num
    description: Number of contigs which were excluded from binning for depth of coverage.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: low_depth_contig_num
    owner: MagsAnalysis
    domain_of:
    - MagsAnalysis
    range: integer
    minimum_value: 0
  mags_list:
    name: mags_list
    description: Contains detailed information about each metagenome-assembled genome.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: mags_list
    owner: MagsAnalysis
    domain_of:
    - MagsAnalysis
    range: MagBin
    multivalued: true
    inlined: true
    inlined_as_list: true
  too_short_contig_num:
    name: too_short_contig_num
    description: Number of contigs which were excluded from binning for length.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: too_short_contig_num
    owner: MagsAnalysis
    domain_of:
    - MagsAnalysis
    range: integer
    minimum_value: 0
  unbinned_contig_num:
    name: unbinned_contig_num
    description: Number of contigs which did not end up in a medium or high quality
      bin.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: unbinned_contig_num
    owner: MagsAnalysis
    domain_of:
    - MagsAnalysis
    range: integer
    minimum_value: 0
  img_identifiers:
    name: img_identifiers
    description: A list of identifiers that relate the biosample to records in the
      IMG database.
    title: IMG Identifiers
    todos:
    - add is_a or mixin modeling, like other external_database_identifiers
    - what class would IMG records belong to?! Are they Studies, Biosamples, or something
      else?
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: external_database_identifiers
    alias: img_identifiers
    owner: MagsAnalysis
    domain_of:
    - MetagenomeAnnotation
    - Biosample
    - MetatranscriptomeAnnotation
    - MetatranscriptomeExpressionAnalysis
    - MagsAnalysis
    range: external_identifier
    multivalued: true
    pattern: ^img\.taxon:[a-zA-Z0-9_][a-zA-Z0-9_\/\.]*$
    maximum_cardinality: 1
  ended_at_time:
    name: ended_at_time
    notes:
    - 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
      It may not be complete, but it is good enough for now.'
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - prov:endedAtTime
    rank: 1000
    alias: ended_at_time
    owner: MagsAnalysis
    domain_of:
    - WorkflowExecution
    range: string
    pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
  execution_resource:
    name: execution_resource
    description: The computing resource or facility where the workflow was executed.
    examples:
    - value: NERSC-Cori
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: execution_resource
    owner: MagsAnalysis
    domain_of:
    - WorkflowExecution
    range: ExecutionResourceEnum
    required: true
  git_url:
    name: git_url
    description: The url that points to the exact github location of a workflow.
    examples:
    - value: https://github.com/microbiomedata/mg_annotation/releases/tag/0.1
    - value: https://github.com/microbiomedata/metaMS/blob/master/metaMS/gcmsWorkflow.py
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: git_url
    owner: MagsAnalysis
    domain_of:
    - WorkflowExecution
    range: string
    required: true
  started_at_time:
    name: started_at_time
    notes:
    - 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
      It may not be complete, but it is good enough for now.'
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - prov:startedAtTime
    rank: 1000
    alias: started_at_time
    owner: MagsAnalysis
    domain_of:
    - WorkflowExecution
    range: string
    required: true
    pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
  version:
    name: version
    description: The NMDC release tag for a given workflow release used for data processing.
      If workflows are processed externally, as denoted by processing_institution,
      this value represents the best mapping between a processing institution's (e.g.,
      JGI) workflow metadata and a NMDC tagged release.
    examples:
    - value: v1.2.0
    from_schema: https://w3id.org/nmdc/nmdc
    broad_mappings:
    - NCIT:C182117
    rank: 1000
    alias: version
    owner: MagsAnalysis
    domain_of:
    - WorkflowExecution
    range: string
  was_informed_by:
    name: was_informed_by
    description: The primary DataGeneration subclass that the WorkflowExecution subclass
      depends on.
    comments:
    - For version 1 of the proteomics workflow there are input files both from the
      NucleotideSequencing and MassSpectrometry, the MassSpectrometry record is considered
      the primary class to reference.
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
      was_informed_by:
        literal_form: was_informed_by
        predicate: EXACT_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    narrow_mappings:
    - prov:wasInformedBy
    rank: 1000
    alias: was_informed_by
    owner: MagsAnalysis
    domain_of:
    - WorkflowExecution
    range: NucleotideSequencing
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
  processing_institution_workflow_metadata:
    name: processing_institution_workflow_metadata
    description: Information about how workflow results were generated when the processing
      is done by an external organization (e.g., JGI) such as software tool name and
      version or pipeline name and version.
    examples:
    - value: metaspades v. 3.15.2
    - value: IMG Annotation Pipeline v.5.0.25
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - NCIT:C165211
    rank: 1000
    alias: processing_institution_workflow_metadata
    owner: MagsAnalysis
    domain_of:
    - WorkflowExecution
    range: string
  has_input:
    name: has_input
    description: An input to a process.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - input
    rank: 1000
    alias: has_input
    owner: MagsAnalysis
    domain_of:
    - PlannedProcess
    range: DataObject
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  has_output:
    name: has_output
    description: An output from a process.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - output
    rank: 1000
    alias: has_output
    owner: MagsAnalysis
    domain_of:
    - PlannedProcess
    range: DataObject
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  processing_institution:
    name: processing_institution
    description: The organization that processed the sample.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: processing_institution
    owner: MagsAnalysis
    domain_of:
    - PlannedProcess
    range: ProcessingInstitutionEnum
  protocol_link:
    name: protocol_link
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: protocol_link
    owner: MagsAnalysis
    domain_of:
    - Configuration
    - PlannedProcess
    - Study
    range: Protocol
  start_date:
    name: start_date
    description: The date on which any process or activity was started
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: start_date
    owner: MagsAnalysis
    domain_of:
    - PlannedProcess
    range: string
  end_date:
    name: end_date
    description: The date on which any process or activity was ended
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: end_date
    owner: MagsAnalysis
    domain_of:
    - PlannedProcess
    range: string
  qc_status:
    name: qc_status
    description: Stores information about the result of a process (e.g., the process
      of sequencing a library may have a qc_status of 'fail' if not enough data
      was generated)
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_status
    owner: MagsAnalysis
    domain_of:
    - PlannedProcess
    range: StatusEnum
  qc_comment:
    name: qc_comment
    description: Slot to store additional comments about laboratory or workflow output.
      For workflow output it may describe the particular workflow stage that failed.
      (e.g., failed at call-stage due to a malformed fastq file).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_comment
    owner: MagsAnalysis
    domain_of:
    - PlannedProcess
    range: string
  has_failure_categorization:
    name: has_failure_categorization
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: has_failure_categorization
    owner: MagsAnalysis
    domain_of:
    - PlannedProcess
    range: FailureCategorization
    multivalued: true
    inlined: true
    inlined_as_list: true
  id:
    name: id
    description: A unique identifier for a thing. Must be either a CURIE shorthand
      for a URI or a complete URI
    notes:
    - 'abstracted pattern: prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?'
    - a minimum length of 3 characters is suggested for typecodes, but 1 or 2 characters
      will be accepted
    - typecodes must correspond 1:1 to a class in the NMDC schema. this will be checked
      via per-class id slot usage assertions
    - minting authority shoulders should probably be enumerated and checked in the
      pattern
    examples:
    - value: nmdc:mgmag-00-x012.1_7_c1
      description: https://github.com/microbiomedata/nmdc-schema/pull/499#discussion_r1018499248
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
      workflow_execution_id:
        literal_form: workflow_execution_id
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
      data_object_id:
        literal_form: data_object_id
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    rank: 1000
    identifier: true
    alias: id
    owner: MagsAnalysis
    domain_of:
    - NamedThing
    range: uriorcurie
    required: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfmag-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  name:
    name: name
    description: A human readable label for an entity
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: name
    owner: MagsAnalysis
    domain_of:
    - PersonValue
    - NamedThing
    - Protocol
    range: string
  description:
    name: description
    description: a human-readable description of a thing
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    slot_uri: dcterms:description
    alias: description
    owner: MagsAnalysis
    domain_of:
    - ImageValue
    - NamedThing
    range: string
  alternative_identifiers:
    name: alternative_identifiers
    description: A list of alternative identifiers for the entity.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: alternative_identifiers
    owner: MagsAnalysis
    domain_of:
    - MetaboliteIdentification
    - NamedThing
    range: uriorcurie
    multivalued: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,\(\)\=\#]*$
  type:
    name: type
    description: the class_uri of the class that has been instantiated
    notes:
    - makes it easier to read example data files
    - required for polymorphic MongoDB collections
    examples:
    - value: nmdc:Biosample
    - value: nmdc:Study
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://github.com/microbiomedata/nmdc-schema/issues/1048
    - https://github.com/microbiomedata/nmdc-schema/issues/1233
    - https://github.com/microbiomedata/nmdc-schema/issues/248
    structured_aliases:
      workflow_execution_class:
        literal_form: workflow_execution_class
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    rank: 1000
    slot_uri: rdf:type
    designates_type: true
    alias: type
    owner: MagsAnalysis
    domain_of:
    - EukEval
    - FunctionalAnnotationAggMember
    - PeptideQuantification
    - ProteinQuantification
    - MobilePhaseSegment
    - PortionOfSubstance
    - MagBin
    - MetaboliteIdentification
    - GenomeFeature
    - FunctionalAnnotation
    - AttributeValue
    - NamedThing
    - OntologyRelation
    - FailureCategorization
    - Protocol
    - CreditAssociation
    - Doi
    range: uriorcurie
    required: true
class_uri: nmdc:MagsAnalysis
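
Several induced slots above carry plain `pattern` constraints that can be tried out directly. This is a minimal sketch, assuming Python's `re` module interprets these expressions the same way as whichever validator is used with the schema (regex dialects can differ). The two patterns are copied from the `started_at_time`/`ended_at_time` and `img_identifiers` definitions; the function names and sample values are made up for illustration.

```python
import re

# ISO-8601 pattern from started_at_time / ended_at_time, split across
# adjacent raw string literals only for readability.
ISO_8601 = re.compile(
    r"^([\+-]?\d{4}(?!\d{2}\b))"
    r"((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?"
    r"|W([0-4]\d|5[0-2])(-?[1-7])?"
    r"|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))"
    r"([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?"
    r"(\17[0-5]\d([\.,]\d+)?)?"
    r"([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$"
)

# Pattern from the induced img_identifiers definition.
IMG_IDENTIFIER = re.compile(r"^img\.taxon:[a-zA-Z0-9_][a-zA-Z0-9_\/\.]*$")


def is_valid_timestamp(value: str) -> bool:
    """True if the value matches the schema's ISO-8601 pattern."""
    return ISO_8601.match(value) is not None


def is_valid_img_identifier(value: str) -> bool:
    """True if the value matches the img.taxon identifier pattern."""
    return IMG_IDENTIFIER.match(value) is not None


# Illustrative values only.
print(is_valid_timestamp("2021-08-05T14:48:00Z"))
print(is_valid_timestamp("not-a-date"))
print(is_valid_img_identifier("img.taxon:3300012345"))
```

As the schema's own note says, the ISO-8601 expression "may not be complete", so a passing match is a format check rather than proof that the timestamp is a real calendar date.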