
Class: Read quality control analysis activity (ReadQcAnalysis)

A workflow execution activity that performs quality control on raw Illumina reads, including quality trimming, artifact removal, linker trimming, adapter trimming, spike-in removal, and human/cat/dog/mouse/microbe contaminant removal.

URI: nmdc:ReadQcAnalysis

classDiagram
    class ReadQcAnalysis
    click ReadQcAnalysis href "../ReadQcAnalysis"
    WorkflowExecution <|-- ReadQcAnalysis
    click WorkflowExecution href "../WorkflowExecution"
    ReadQcAnalysis : alternative_identifiers
    ReadQcAnalysis : description
    ReadQcAnalysis : end_date
    ReadQcAnalysis : ended_at_time
    ReadQcAnalysis : execution_resource
    ReadQcAnalysis --> "1" ExecutionResourceEnum : execution_resource
    click ExecutionResourceEnum href "../ExecutionResourceEnum"
    ReadQcAnalysis : git_url
    ReadQcAnalysis : has_failure_categorization
    ReadQcAnalysis --> "*" FailureCategorization : has_failure_categorization
    click FailureCategorization href "../FailureCategorization"
    ReadQcAnalysis : has_input
    ReadQcAnalysis --> "1..*" DataObject : has_input
    click DataObject href "../DataObject"
    ReadQcAnalysis : has_output
    ReadQcAnalysis --> "*" DataObject : has_output
    click DataObject href "../DataObject"
    ReadQcAnalysis : id
    ReadQcAnalysis : input_base_count
    ReadQcAnalysis : input_read_bases
    ReadQcAnalysis : input_read_count
    ReadQcAnalysis : name
    ReadQcAnalysis : output_base_count
    ReadQcAnalysis : output_read_bases
    ReadQcAnalysis : output_read_count
    ReadQcAnalysis : processing_institution
    ReadQcAnalysis --> "0..1" ProcessingInstitutionEnum : processing_institution
    click ProcessingInstitutionEnum href "../ProcessingInstitutionEnum"
    ReadQcAnalysis : processing_institution_workflow_metadata
    ReadQcAnalysis : protocol_link
    ReadQcAnalysis --> "0..1" Protocol : protocol_link
    click Protocol href "../Protocol"
    ReadQcAnalysis : qc_comment
    ReadQcAnalysis : qc_status
    ReadQcAnalysis --> "0..1" StatusEnum : qc_status
    click StatusEnum href "../StatusEnum"
    ReadQcAnalysis : start_date
    ReadQcAnalysis : started_at_time
    ReadQcAnalysis : type
    ReadQcAnalysis : version
    ReadQcAnalysis : was_informed_by
    ReadQcAnalysis --> "1..*" NucleotideSequencing : was_informed_by
    click NucleotideSequencing href "../NucleotideSequencing"

Inheritance

Slots

| Name | Cardinality and Range | Description | Inheritance |
| --- | --- | --- | --- |
| input_base_count | 0..1 Float | The nucleotide base count number of input reads for QC analysis | direct |
| input_read_bases | 0..1 Float | TODO | direct |
| input_read_count | 0..1 Float | The sequence count number of input reads for QC analysis | direct |
| output_base_count | 0..1 Float | After QC analysis nucleotide base count number | direct |
| output_read_bases | 0..1 Float | TODO | direct |
| output_read_count | 0..1 Float | After QC analysis sequence count number | direct |
| ended_at_time | 0..1 String | | WorkflowExecution |
| execution_resource | 1 ExecutionResourceEnum | The computing resource or facility where the workflow was executed | WorkflowExecution |
| git_url | 1 String | The url that points to the exact github location of a workflow | WorkflowExecution |
| started_at_time | 1 String | | WorkflowExecution |
| version | 0..1 String | The NMDC release tag for a given workflow release used for data processing | WorkflowExecution |
| was_informed_by | 1..* NucleotideSequencing | The primary DataGeneration subclass that the WorkflowExecution subclass depen... | WorkflowExecution |
| processing_institution_workflow_metadata | 0..1 String | Information about how workflow results were generated when the processing is ... | WorkflowExecution |
| has_input | 1..* DataObject | An input to a process | PlannedProcess |
| has_output | * DataObject | An output from a process | PlannedProcess |
| processing_institution | 0..1 ProcessingInstitutionEnum | The organization that processed the sample | PlannedProcess |
| protocol_link | 0..1 Protocol | | PlannedProcess |
| start_date | 0..1 String | The date on which any process or activity was started | PlannedProcess |
| end_date | 0..1 String | The date on which any process or activity was ended | PlannedProcess |
| qc_status | 0..1 StatusEnum | Stores information about the result of a process (ie the process of sequencin... | PlannedProcess |
| qc_comment | 0..1 String | Slot to store additional comments about laboratory or workflow output | PlannedProcess |
| has_failure_categorization | * FailureCategorization | | PlannedProcess |
| id | 1 Uriorcurie | A unique identifier for a thing | NamedThing |
| name | 0..1 String | A human readable label for an entity | NamedThing |
| description | 0..1 String | a human-readable description of a thing | NamedThing |
| alternative_identifiers | * Uriorcurie | A list of alternative identifiers for the entity | NamedThing |
| type | 1 Uriorcurie | the class_uri of the class that has been instantiated | NamedThing |
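
The cardinalities in the slot listing above can be read as a simple presence check: slots marked `1` or `1..*` are required, and slots marked `*` or `1..*` hold lists. The following is a minimal sketch of that reading, not the NMDC validator. The function name `check_slots` and every identifier value in the example record are hypothetical; only the slot names, the cardinalities, and the `NERSC-Cori` example value come from this page.

```python
# Slots listed with cardinality 1 or 1..* on this page (required).
REQUIRED_SLOTS = {
    "id", "type", "execution_resource", "git_url",
    "started_at_time", "was_informed_by", "has_input",
}

# Slots listed with cardinality * or 1..* on this page (multivalued).
MULTIVALUED_SLOTS = {
    "was_informed_by", "has_input", "has_output",
    "alternative_identifiers", "has_failure_categorization",
}


def check_slots(record: dict) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for slot in sorted(REQUIRED_SLOTS):
        value = record.get(slot)
        if value is None or value == "" or value == []:
            problems.append(f"missing required slot: {slot}")
    for slot in sorted(MULTIVALUED_SLOTS):
        if slot in record and not isinstance(record[slot], list):
            problems.append(f"slot should be a list: {slot}")
    return problems


# Hypothetical record: the identifiers and git_url are made up for illustration.
example = {
    "id": "nmdc:wfrqc-11-abc123.1",
    "type": "nmdc:ReadQcAnalysis",
    "execution_resource": "NERSC-Cori",
    "git_url": "https://example.org/hypothetical/workflow",
    "started_at_time": "2021-01-10T00:00:00+00:00",
    "was_informed_by": ["nmdc:dgns-11-abc123"],
    "has_input": ["nmdc:dobj-11-abc123"],
    "input_read_count": 1000.0,
    "output_read_count": 950.0,
}

print(check_slots(example))
```

This check covers presence and list shape only. Ranges, enumerations, and identifier patterns would still need a schema-aware validator.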

Identifier and Mapping Information

Schema Source

Mappings

Mapping Type Mapped Value

LinkML Source

Direct

name: ReadQcAnalysis
description: A workflow execution activity that performs quality control on raw Illumina
  reads including quality trimming, artifact removal, linker trimming, adapter trimming,
  spike-in removal, and human/cat/dog/mouse/microbe contaminant removal
title: Read quality control analysis activity
from_schema: https://w3id.org/nmdc/nmdc
is_a: WorkflowExecution
slots:
- input_base_count
- input_read_bases
- input_read_count
- output_base_count
- output_read_bases
- output_read_count
slot_usage:
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfrqc-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  was_informed_by:
    name: was_informed_by
    range: NucleotideSequencing
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
class_uri: nmdc:ReadQcAnalysis
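
The `structured_pattern` entries above are marked `interpolated: true`, so each `{name}` placeholder is filled in from a schema-level setting before the result is used as a regular expression. Those settings are not shown on this page. The sketch below assumes placeholder values for them to show how the interpolation could work. The `SETTINGS` dictionary, the `interpolate` helper, and the sample identifiers are all hypothetical.

```python
import re

# ASSUMPTION: illustrative sub-patterns only; the real values live in the
# schema's settings, which are not reproduced on this page.
SETTINGS = {
    "id_nmdc_prefix": r"^(nmdc)",
    "id_shoulder": r"([0-9][a-z]{0,6}[0-9])",
    "id_blade": r"([A-Za-z0-9]{1,})",
    "id_version": r"(\.[0-9]{1,})",
}

# Syntax strings copied from the slot_usage block above.
ID_SYNTAX = "{id_nmdc_prefix}:wfrqc-{id_shoulder}-{id_blade}{id_version}$"
WAS_INFORMED_BY_SYNTAX = "{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$"


def interpolate(syntax: str, settings: dict) -> str:
    """Replace each {name} placeholder with its sub-pattern from settings."""
    return re.sub(r"\{(\w+)\}", lambda m: settings[m.group(1)], syntax)


id_pattern = re.compile(interpolate(ID_SYNTAX, SETTINGS))
informed_by_pattern = re.compile(interpolate(WAS_INFORMED_BY_SYNTAX, SETTINGS))

# Hypothetical identifiers, made up for illustration.
print(bool(id_pattern.search("nmdc:wfrqc-11-abc123.1")))
print(bool(informed_by_pattern.search("nmdc:dgns-11-abc123")))
```

Under these assumed settings, the `wfrqc` typecode ties an `id` to this class, and `was_informed_by` accepts only the `omprc` and `dgns` typecodes.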

Induced

name: ReadQcAnalysis
description: A workflow execution activity that performs quality control on raw Illumina
  reads including quality trimming, artifact removal, linker trimming, adapter trimming,
  spike-in removal, and human/cat/dog/mouse/microbe contaminant removal
title: Read quality control analysis activity
from_schema: https://w3id.org/nmdc/nmdc
is_a: WorkflowExecution
slot_usage:
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfrqc-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  was_informed_by:
    name: was_informed_by
    range: NucleotideSequencing
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
attributes:
  input_base_count:
    name: input_base_count
    description: The nucleotide base count number of input reads for QC analysis.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: read_qc_analysis_statistic
    alias: input_base_count
    owner: ReadQcAnalysis
    domain_of:
    - ReadQcAnalysis
    range: float
  input_read_bases:
    name: input_read_bases
    description: 'TODO      '
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: input_read_bases
    owner: ReadQcAnalysis
    domain_of:
    - ReadQcAnalysis
    range: float
  input_read_count:
    name: input_read_count
    description: The sequence count number of input reads for QC analysis.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: read_qc_analysis_statistic
    alias: input_read_count
    owner: ReadQcAnalysis
    domain_of:
    - ReadQcAnalysis
    range: float
  output_base_count:
    name: output_base_count
    description: After QC analysis nucleotide base count number.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: read_qc_analysis_statistic
    alias: output_base_count
    owner: ReadQcAnalysis
    domain_of:
    - ReadQcAnalysis
    range: float
  output_read_bases:
    name: output_read_bases
    description: TODO
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: output_read_bases
    owner: ReadQcAnalysis
    domain_of:
    - ReadQcAnalysis
    range: float
  output_read_count:
    name: output_read_count
    description: After QC analysis sequence count number.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: read_qc_analysis_statistic
    alias: output_read_count
    owner: ReadQcAnalysis
    domain_of:
    - ReadQcAnalysis
    range: float
  ended_at_time:
    name: ended_at_time
    notes:
    - 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
      It may not be complete, but it is good enough for now.'
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - prov:endedAtTime
    rank: 1000
    alias: ended_at_time
    owner: ReadQcAnalysis
    domain_of:
    - WorkflowExecution
    range: string
    pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
  execution_resource:
    name: execution_resource
    description: The computing resource or facility where the workflow was executed.
    examples:
    - value: NERSC-Cori
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: execution_resource
    owner: ReadQcAnalysis
    domain_of:
    - WorkflowExecution
    range: ExecutionResourceEnum
    required: true
  git_url:
    name: git_url
    description: The url that points to the exact github location of a workflow.
    examples:
    - value: https://github.com/microbiomedata/mg_annotation/releases/tag/0.1
    - value: https://github.com/microbiomedata/metaMS/blob/master/metaMS/gcmsWorkflow.py
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: git_url
    owner: ReadQcAnalysis
    domain_of:
    - WorkflowExecution
    range: string
    required: true
  started_at_time:
    name: started_at_time
    notes:
    - 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
      It may not be complete, but it is good enough for now.'
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - prov:startedAtTime
    rank: 1000
    alias: started_at_time
    owner: ReadQcAnalysis
    domain_of:
    - WorkflowExecution
    range: string
    required: true
    pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
  version:
    name: version
    description: The NMDC release tag for a given workflow release used for data processing.
      If workflows are processed externally, as denoted by processing_institution,
      this value represents the best mapping between a processing institution's (e.g.,
      JGI) workflow metadata and a NMDC tagged release.
    examples:
    - value: v1.2.0
    from_schema: https://w3id.org/nmdc/nmdc
    broad_mappings:
    - NCIT:C182117
    rank: 1000
    alias: version
    owner: ReadQcAnalysis
    domain_of:
    - WorkflowExecution
    range: string
  was_informed_by:
    name: was_informed_by
    description: The primary DataGeneration subclass that the WorkflowExecution subclass
      depends on.
    comments:
    - For version 1 of the proteomics workflow there are input files both from the
      NucleotideSequencing and MassSpectrometry, the MassSpectrometry record is considered
      the primary class to reference.
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
      was_informed_by:
        literal_form: was_informed_by
        predicate: EXACT_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    narrow_mappings:
    - prov:wasInformedBy
    rank: 1000
    alias: was_informed_by
    owner: ReadQcAnalysis
    domain_of:
    - WorkflowExecution
    range: NucleotideSequencing
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
  processing_institution_workflow_metadata:
    name: processing_institution_workflow_metadata
    description: Information about how workflow results were generated when the processing
      is done by an external organization (e.g., JGI) such as software tool name and
      version or pipeline name and version.
    examples:
    - value: metaspades v. 3.15.2
    - value: IMG Annotation Pipeline v.5.0.25
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - NCIT:C165211
    rank: 1000
    alias: processing_institution_workflow_metadata
    owner: ReadQcAnalysis
    domain_of:
    - WorkflowExecution
    range: string
  has_input:
    name: has_input
    description: An input to a process.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - input
    rank: 1000
    alias: has_input
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: DataObject
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  has_output:
    name: has_output
    description: An output from a process.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - output
    rank: 1000
    alias: has_output
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: DataObject
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  processing_institution:
    name: processing_institution
    description: The organization that processed the sample.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: processing_institution
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: ProcessingInstitutionEnum
  protocol_link:
    name: protocol_link
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: protocol_link
    owner: ReadQcAnalysis
    domain_of:
    - Configuration
    - PlannedProcess
    - Study
    range: Protocol
  start_date:
    name: start_date
    description: The date on which any process or activity was started
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: start_date
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: string
  end_date:
    name: end_date
    description: The date on which any process or activity was ended
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: end_date
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: string
  qc_status:
    name: qc_status
    description: Stores information about the result of a process (e.g., the process
      of sequencing a library may have a qc_status of 'fail' if not enough data
      was generated)
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_status
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: StatusEnum
  qc_comment:
    name: qc_comment
    description: Slot to store additional comments about laboratory or workflow output.
      For workflow output it may describe the particular workflow stage that failed.
      (ie Failed at call-stage due to a malformed fastq file).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_comment
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: string
  has_failure_categorization:
    name: has_failure_categorization
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: has_failure_categorization
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: FailureCategorization
    multivalued: true
    inlined: true
    inlined_as_list: true
  id:
    name: id
    description: A unique identifier for a thing. Must be either a CURIE shorthand
      for a URI or a complete URI
    notes:
    - 'abstracted pattern: prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?'
    - a minimum length of 3 characters is suggested for typecodes, but 1 or 2 characters
      will be accepted
    - typecodes must correspond 1:1 to a class in the NMDC schema. this will be checked
      via per-class id slot usage assertions
    - minting authority shoulders should probably be enumerated and checked in the
      pattern
    examples:
    - value: nmdc:mgmag-00-x012.1_7_c1
      description: https://github.com/microbiomedata/nmdc-schema/pull/499#discussion_r1018499248
    from_schema: https://w3id.org/nmdc/nmdc
    structured_aliases:
      workflow_execution_id:
        literal_form: workflow_execution_id
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
      data_object_id:
        literal_form: data_object_id
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    rank: 1000
    identifier: true
    alias: id
    owner: ReadQcAnalysis
    domain_of:
    - NamedThing
    range: uriorcurie
    required: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfrqc-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  name:
    name: name
    description: A human readable label for an entity
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: name
    owner: ReadQcAnalysis
    domain_of:
    - PersonValue
    - NamedThing
    - Protocol
    range: string
  description:
    name: description
    description: a human-readable description of a thing
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    slot_uri: dcterms:description
    alias: description
    owner: ReadQcAnalysis
    domain_of:
    - ImageValue
    - NamedThing
    range: string
  alternative_identifiers:
    name: alternative_identifiers
    description: A list of alternative identifiers for the entity.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: alternative_identifiers
    owner: ReadQcAnalysis
    domain_of:
    - MetaboliteIdentification
    - NamedThing
    range: uriorcurie
    multivalued: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,\(\)\=\#]*$
  type:
    name: type
    description: the class_uri of the class that has been instantiated
    notes:
    - makes it easier to read example data files
    - required for polymorphic MongoDB collections
    examples:
    - value: nmdc:Biosample
    - value: nmdc:Study
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://github.com/microbiomedata/nmdc-schema/issues/1048
    - https://github.com/microbiomedata/nmdc-schema/issues/1233
    - https://github.com/microbiomedata/nmdc-schema/issues/248
    structured_aliases:
      workflow_execution_class:
        literal_form: workflow_execution_class
        predicate: NARROW_SYNONYM
        contexts:
        - https://bitbucket.org/berkeleylab/jgi-jat/macros/nmdc_metadata.yaml
    rank: 1000
    slot_uri: rdf:type
    designates_type: true
    alias: type
    owner: ReadQcAnalysis
    domain_of:
    - EukEval
    - FunctionalAnnotationAggMember
    - PeptideQuantification
    - ProteinQuantification
    - MobilePhaseSegment
    - PortionOfSubstance
    - MagBin
    - MetaboliteIdentification
    - GenomeFeature
    - FunctionalAnnotation
    - AttributeValue
    - NamedThing
    - OntologyRelation
    - FailureCategorization
    - Protocol
    - CreditAssociation
    - Doi
    range: uriorcurie
    required: true
class_uri: nmdc:ReadQcAnalysis
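
The `started_at_time` and `ended_at_time` slots above constrain their string values with an ISO-8601 regular expression. As a minimal sketch, that pattern can be applied with Python's `re` module as shown below. The pattern is copied from the listing and split across lines only for readability. The helper name `is_valid_timestamp` and the sample values are hypothetical, and other regex engines may treat the backreferences differently.

```python
import re

# Pattern copied from the started_at_time / ended_at_time slots above;
# adjacent raw strings are concatenated into one expression.
ISO_8601_PATTERN = re.compile(
    r"^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?"
    r"|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))"
    r"([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?"
    r"(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$"
)


def is_valid_timestamp(value: str) -> bool:
    """Return True if value satisfies the slot's ISO-8601 pattern."""
    return ISO_8601_PATTERN.search(value) is not None


# Hypothetical sample values.
for sample in ["2021-01-10T00:00:00+00:00", "2021-01-10", "2021-13-10", "yesterday"]:
    print(sample, is_valid_timestamp(sample))
```

The slot notes say the pattern may not be complete, so a passing value is not guaranteed to be a meaningful date.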