Class: Read quality control analysis activity (ReadQcAnalysis)
A workflow execution activity that performs quality control on raw Illumina reads including quality trimming, artifact removal, linker trimming, adapter trimming, spike-in removal, and human/cat/dog/mouse/microbe contaminant removal
URI: nmdc:ReadQcAnalysis
classDiagram
class ReadQcAnalysis
click ReadQcAnalysis href "../ReadQcAnalysis"
WorkflowExecution <|-- ReadQcAnalysis
click WorkflowExecution href "../WorkflowExecution"
ReadQcAnalysis : alternative_identifiers
ReadQcAnalysis : description
ReadQcAnalysis : end_date
ReadQcAnalysis : ended_at_time
ReadQcAnalysis : execution_resource
ReadQcAnalysis --> "1" ExecutionResourceEnum : execution_resource
click ExecutionResourceEnum href "../ExecutionResourceEnum"
ReadQcAnalysis : git_url
ReadQcAnalysis : has_failure_categorization
ReadQcAnalysis --> "*" FailureCategorization : has_failure_categorization
click FailureCategorization href "../FailureCategorization"
ReadQcAnalysis : has_input
ReadQcAnalysis --> "1..*" NamedThing : has_input
click NamedThing href "../NamedThing"
ReadQcAnalysis : has_output
ReadQcAnalysis --> "*" NamedThing : has_output
click NamedThing href "../NamedThing"
ReadQcAnalysis : id
ReadQcAnalysis : input_base_count
ReadQcAnalysis : input_read_bases
ReadQcAnalysis : input_read_count
ReadQcAnalysis : name
ReadQcAnalysis : output_base_count
ReadQcAnalysis : output_read_bases
ReadQcAnalysis : output_read_count
ReadQcAnalysis : processing_institution
ReadQcAnalysis --> "0..1" ProcessingInstitutionEnum : processing_institution
click ProcessingInstitutionEnum href "../ProcessingInstitutionEnum"
ReadQcAnalysis : protocol_link
ReadQcAnalysis --> "0..1" Protocol : protocol_link
click Protocol href "../Protocol"
ReadQcAnalysis : qc_comment
ReadQcAnalysis : qc_status
ReadQcAnalysis --> "0..1" StatusEnum : qc_status
click StatusEnum href "../StatusEnum"
ReadQcAnalysis : start_date
ReadQcAnalysis : started_at_time
ReadQcAnalysis : type
ReadQcAnalysis : version
ReadQcAnalysis : was_informed_by
ReadQcAnalysis --> "1" DataGeneration : was_informed_by
click DataGeneration href "../DataGeneration"
Inheritance
- NamedThing
    - PlannedProcess
        - WorkflowExecution
            - ReadQcAnalysis
Slots
Name | Cardinality and Range | Description | Inheritance |
---|---|---|---|
input_base_count | 0..1 Float | The nucleotide base count number of input reads for QC analysis | direct |
input_read_bases | 0..1 Float | TODO | direct |
input_read_count | 0..1 Float | The sequence count number of input reads for QC analysis | direct |
output_base_count | 0..1 Float | After QC analysis nucleotide base count number | direct |
output_read_bases | 0..1 Float | TODO | direct |
output_read_count | 0..1 Float | After QC analysis sequence count number | direct |
ended_at_time | 0..1 String | | WorkflowExecution |
execution_resource | 1 ExecutionResourceEnum | The computing resource or facility where the workflow was executed | WorkflowExecution |
git_url | 1 String | The url that points to the exact github location of a workflow | WorkflowExecution |
started_at_time | 1 String | | WorkflowExecution |
version | 0..1 String | | WorkflowExecution |
was_informed_by | 1 DataGeneration | | WorkflowExecution |
has_input | 1..* NamedThing | An input to a process | PlannedProcess |
has_output | * NamedThing | An output from a process | PlannedProcess |
processing_institution | 0..1 ProcessingInstitutionEnum | The organization that processed the sample | PlannedProcess |
protocol_link | 0..1 Protocol | | PlannedProcess |
start_date | 0..1 String | The date on which any process or activity was started | PlannedProcess |
end_date | 0..1 String | The date on which any process or activity was ended | PlannedProcess |
qc_status | 0..1 StatusEnum | Stores information about the result of a process (ie the process of sequencin... | PlannedProcess |
qc_comment | 0..1 String | Slot to store additional comments about laboratory or workflow output | PlannedProcess |
has_failure_categorization | * FailureCategorization | | PlannedProcess |
id | 1 Uriorcurie | A unique identifier for a thing | NamedThing |
name | 0..1 String | A human readable label for an entity | NamedThing |
description | 0..1 String | a human-readable description of a thing | NamedThing |
alternative_identifiers | * Uriorcurie | A list of alternative identifiers for the entity | NamedThing |
type | 1 Uriorcurie | the class_uri of the class that has been instantiated | NamedThing |
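
As a rough illustration of the cardinalities in the table above, the following Python sketch checks a candidate record for the slots whose cardinality is `1` or `1..*`. The helper name `missing_required_slots` and every value in the example record (identifiers, URL, timestamp, counts) are hypothetical and chosen only for illustration; this is not the NMDC validation tooling, which works from the LinkML schema itself.

```python
# Slots listed with cardinality 1 or 1..* in the Slots table.
REQUIRED_SLOTS = (
    "id",
    "type",
    "execution_resource",
    "git_url",
    "started_at_time",
    "was_informed_by",
    "has_input",
)


def missing_required_slots(record: dict) -> list[str]:
    """Return the required slot names that are absent or empty in `record`."""
    missing = []
    for slot in REQUIRED_SLOTS:
        value = record.get(slot)
        if value is None or value == "" or value == []:
            missing.append(slot)
    return missing


# Hypothetical record: all values below are made up for illustration.
example = {
    "id": "nmdc:wfrqc-11-abc123.1",
    "type": "nmdc:ReadQcAnalysis",
    "execution_resource": "NERSC-Cori",
    "git_url": "https://example.org/hypothetical/read-qc-workflow",
    "started_at_time": "2021-01-01T00:00:00Z",
    "was_informed_by": "nmdc:omprc-11-abc123",
    "has_input": ["nmdc:dobj-11-abc123"],
    "input_read_count": 1000.0,
    "output_read_count": 900.0,
}

print(missing_required_slots(example))
```

A presence check like this covers cardinality only; ranges, enum membership, and identifier patterns still need schema-driven validation.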
Identifier and Mapping Information
Schema Source
- from schema: https://w3id.org/nmdc/nmdc
Mappings
Mapping Type | Mapped Value |
---|---|
self | nmdc:ReadQcAnalysis |
native | nmdc:ReadQcAnalysis |
LinkML Source
Direct
name: ReadQcAnalysis
description: A workflow execution activity that performs quality control on raw Illumina
  reads including quality trimming, artifact removal, linker trimming, adapter trimming,
  spike-in removal, and human/cat/dog/mouse/microbe contaminant removal
title: Read quality control analysis activity
from_schema: https://w3id.org/nmdc/nmdc
is_a: WorkflowExecution
slots:
- input_base_count
- input_read_bases
- input_read_count
- output_base_count
- output_read_bases
- output_read_count
slot_usage:
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfrqc-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  was_informed_by:
    name: was_informed_by
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
class_uri: nmdc:ReadQcAnalysis
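
The `structured_pattern` entries above are templates: each `{...}` placeholder stands for a regular-expression fragment defined in the schema's settings, which are not shown on this page. The Python sketch below illustrates how such a template could be expanded and applied. The fragment values in `ASSUMED_SETTINGS` are assumptions made for illustration, not the schema's actual settings, and `interpolate` is a hypothetical helper rather than a LinkML API.

```python
import re

# ASSUMPTION: stand-in fragments for the schema settings referenced by the
# placeholders. The real values live in the schema and may differ.
ASSUMED_SETTINGS = {
    "id_nmdc_prefix": r"^(nmdc)",
    "id_shoulder": r"([0-9][a-z]{0,6}[0-9])",
    "id_blade": r"([A-Za-z0-9]{1,})",
    "id_version": r"(\.[0-9]{1,})",
}

# Templates copied from the slot_usage section above.
ID_SYNTAX = "{id_nmdc_prefix}:wfrqc-{id_shoulder}-{id_blade}{id_version}$"
WAS_INFORMED_BY_SYNTAX = "{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$"


def interpolate(syntax: str, settings: dict) -> re.Pattern:
    """Replace each {name} placeholder with its fragment and compile the result."""
    pattern = syntax
    for name, fragment in settings.items():
        # Plain string replacement avoids clashing with regex quantifier braces.
        pattern = pattern.replace("{" + name + "}", fragment)
    return re.compile(pattern)


id_re = interpolate(ID_SYNTAX, ASSUMED_SETTINGS)
was_informed_by_re = interpolate(WAS_INFORMED_BY_SYNTAX, ASSUMED_SETTINGS)

# Hypothetical identifiers, checked against the assumed fragments only.
for candidate in ("nmdc:wfrqc-11-abc123.1", "nmdc:wfrqc-11-abc123"):
    print(candidate, bool(id_re.match(candidate)))
for candidate in ("nmdc:omprc-11-abc123", "nmdc:wfrqc-11-abc123"):
    print(candidate, bool(was_informed_by_re.match(candidate)))
```

Whether a given identifier passes depends entirely on the real settings; the point of the sketch is the two-step shape (interpolate, then match), including the `wfrqc` typecode required for `id` and the `omprc` or `dgns` typecodes required for `was_informed_by`.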
Induced
name: ReadQcAnalysis
description: A workflow execution activity that performs quality control on raw Illumina
  reads including quality trimming, artifact removal, linker trimming, adapter trimming,
  spike-in removal, and human/cat/dog/mouse/microbe contaminant removal
title: Read quality control analysis activity
from_schema: https://w3id.org/nmdc/nmdc
is_a: WorkflowExecution
slot_usage:
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfrqc-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  was_informed_by:
    name: was_informed_by
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
attributes:
  input_base_count:
    name: input_base_count
    description: The nucleotide base count number of input reads for QC analysis.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: read_qc_analysis_statistic
    alias: input_base_count
    owner: ReadQcAnalysis
    domain_of:
    - ReadQcAnalysis
    range: float
  input_read_bases:
    name: input_read_bases
    description: 'TODO '
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: input_read_bases
    owner: ReadQcAnalysis
    domain_of:
    - ReadQcAnalysis
    range: float
  input_read_count:
    name: input_read_count
    description: The sequence count number of input reads for QC analysis.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: read_qc_analysis_statistic
    alias: input_read_count
    owner: ReadQcAnalysis
    domain_of:
    - ReadQcAnalysis
    range: float
  output_base_count:
    name: output_base_count
    description: After QC analysis nucleotide base count number.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: read_qc_analysis_statistic
    alias: output_base_count
    owner: ReadQcAnalysis
    domain_of:
    - ReadQcAnalysis
    range: float
  output_read_bases:
    name: output_read_bases
    description: TODO
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: output_read_bases
    owner: ReadQcAnalysis
    domain_of:
    - ReadQcAnalysis
    range: float
  output_read_count:
    name: output_read_count
    description: After QC analysis sequence count number.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: read_qc_analysis_statistic
    alias: output_read_count
    owner: ReadQcAnalysis
    domain_of:
    - ReadQcAnalysis
    range: float
  ended_at_time:
    name: ended_at_time
    notes:
    - 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
      It may not be complete, but it is good enough for now.'
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - prov:endedAtTime
    rank: 1000
    alias: ended_at_time
    owner: ReadQcAnalysis
    domain_of:
    - WorkflowExecution
    range: string
    pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
  execution_resource:
    name: execution_resource
    description: The computing resource or facility where the workflow was executed.
    examples:
    - value: NERSC-Cori
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: execution_resource
    owner: ReadQcAnalysis
    domain_of:
    - WorkflowExecution
    range: ExecutionResourceEnum
    required: true
  git_url:
    name: git_url
    description: The url that points to the exact github location of a workflow.
    examples:
    - value: https://github.com/microbiomedata/mg_annotation/releases/tag/0.1
    - value: https://github.com/microbiomedata/metaMS/blob/master/metaMS/gcmsWorkflow.py
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: git_url
    owner: ReadQcAnalysis
    domain_of:
    - WorkflowExecution
    range: string
    required: true
  started_at_time:
    name: started_at_time
    notes:
    - 'The regex for ISO-8601 format was taken from here: https://www.myintervals.com/blog/2009/05/20/iso-8601-date-validation-that-doesnt-suck/
      It may not be complete, but it is good enough for now.'
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - prov:startedAtTime
    rank: 1000
    alias: started_at_time
    owner: ReadQcAnalysis
    domain_of:
    - WorkflowExecution
    range: string
    required: true
    pattern: ^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$
  version:
    name: version
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: version
    owner: ReadQcAnalysis
    domain_of:
    - WorkflowExecution
    range: string
  was_informed_by:
    name: was_informed_by
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - prov:wasInformedBy
    rank: 1000
    alias: was_informed_by
    owner: ReadQcAnalysis
    domain_of:
    - WorkflowExecution
    range: DataGeneration
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(omprc|dgns)-{id_shoulder}-{id_blade}$'
      interpolated: true
  has_input:
    name: has_input
    description: An input to a process.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - input
    rank: 1000
    alias: has_input
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: NamedThing
    required: true
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  has_output:
    name: has_output
    description: An output from a process.
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - output
    rank: 1000
    alias: has_output
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: NamedThing
    multivalued: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
      interpolated: true
  processing_institution:
    name: processing_institution
    description: The organization that processed the sample.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: processing_institution
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: ProcessingInstitutionEnum
  protocol_link:
    name: protocol_link
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: protocol_link
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    - Study
    range: Protocol
  start_date:
    name: start_date
    description: The date on which any process or activity was started
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: start_date
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: string
  end_date:
    name: end_date
    description: The date on which any process or activity was ended
    todos:
    - add date string validation pattern
    comments:
    - We are using string representations of dates until all components of our ecosystem
      can handle ISO 8601 dates
    - The date should be formatted as YYYY-MM-DD
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: end_date
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: string
  qc_status:
    name: qc_status
    description: Stores information about the result of a process (ie the process
      of sequencing a library may have for qc_status of 'fail' if not enough data
      was generated)
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_status
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: StatusEnum
  qc_comment:
    name: qc_comment
    description: Slot to store additional comments about laboratory or workflow output.
      For workflow output it may describe the particular workflow stage that failed.
      (ie Failed at call-stage due to a malformed fastq file).
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: qc_comment
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: string
  has_failure_categorization:
    name: has_failure_categorization
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: has_failure_categorization
    owner: ReadQcAnalysis
    domain_of:
    - PlannedProcess
    range: FailureCategorization
    multivalued: true
    inlined: true
    inlined_as_list: true
  id:
    name: id
    description: A unique identifier for a thing. Must be either a CURIE shorthand
      for a URI or a complete URI
    notes:
    - 'abstracted pattern: prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?'
    - a minimum length of 3 characters is suggested for typecodes, but 1 or 2 characters
      will be accepted
    - typecodes must correspond 1:1 to a class in the NMDC schema. this will be checked
      via per-class id slot usage assertions
    - minting authority shoulders should probably be enumerated and checked in the
      pattern
    examples:
    - value: nmdc:mgmag-00-x012.1_7_c1
      description: https://github.com/microbiomedata/nmdc-schema/pull/499#discussion_r1018499248
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    identifier: true
    alias: id
    owner: ReadQcAnalysis
    domain_of:
    - NamedThing
    range: uriorcurie
    required: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
    structured_pattern:
      syntax: '{id_nmdc_prefix}:wfrqc-{id_shoulder}-{id_blade}{id_version}$'
      interpolated: true
  name:
    name: name
    description: A human readable label for an entity
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: name
    owner: ReadQcAnalysis
    domain_of:
    - PersonValue
    - NamedThing
    - Protocol
    range: string
  description:
    name: description
    description: a human-readable description of a thing
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    slot_uri: dcterms:description
    alias: description
    owner: ReadQcAnalysis
    domain_of:
    - ImageValue
    - NamedThing
    range: string
  alternative_identifiers:
    name: alternative_identifiers
    description: A list of alternative identifiers for the entity.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: alternative_identifiers
    owner: ReadQcAnalysis
    domain_of:
    - MetaboliteIdentification
    - NamedThing
    range: uriorcurie
    multivalued: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,\(\)\=\#]*$
  type:
    name: type
    description: the class_uri of the class that has been instantiated
    notes:
    - replaces legacy nmdc:type slot
    - makes it easier to read example data files
    - required for polymorphic MongoDB collections
    examples:
    - value: nmdc:Biosample
    - value: nmdc:Study
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://github.com/microbiomedata/nmdc-schema/issues/1048
    - https://github.com/microbiomedata/nmdc-schema/issues/1233
    - https://github.com/microbiomedata/nmdc-schema/issues/248
    rank: 1000
    slot_uri: rdf:type
    designates_type: true
    alias: type
    owner: ReadQcAnalysis
    domain_of:
    - EukEval
    - FunctionalAnnotationAggMember
    - PeptideQuantification
    - ProteinQuantification
    - MobilePhaseSegment
    - PortionOfSubstance
    - MagBin
    - MetaboliteIdentification
    - GenomeFeature
    - FunctionalAnnotation
    - AttributeValue
    - NamedThing
    - OntologyRelation
    - FailureCategorization
    - Protocol
    - CreditAssociation
    - Doi
    range: uriorcurie
    required: true
class_uri: nmdc:ReadQcAnalysis
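
The `pattern` shared by `started_at_time` and `ended_at_time` in the induced source above can be tried out directly. The sketch below copies that expression verbatim into Python's `re` module; the helper name `is_valid_timestamp` and the sample strings are hypothetical. Other regex engines may treat the backreferences and lookaheads differently, so this shows only how the pattern behaves in Python.

```python
import re

# Pattern copied verbatim from the started_at_time / ended_at_time slots.
ISO_8601_PATTERN = re.compile(
    r"^([\+-]?\d{4}(?!\d{2}\b))((-?)((0[1-9]|1[0-2])(\3([12]\d|0[1-9]|3[01]))?|W([0-4]\d|5[0-2])(-?[1-7])?|(00[1-9]|0[1-9]\d|[12]\d{2}|3([0-5]\d|6[1-6])))([T\s]((([01]\d|2[0-3])((:?)[0-5]\d)?|24\:?00)([\.,]\d+(?!:))?)?(\17[0-5]\d([\.,]\d+)?)?([zZ]|([\+-])([01]\d|2[0-3]):?([0-5]\d)?)?)?)?$"
)


def is_valid_timestamp(value: str) -> bool:
    """Return True if `value` matches the slot pattern under Python's re engine."""
    return ISO_8601_PATTERN.match(value) is not None


# Hypothetical sample values for illustration.
for sample in ("2021-01-01T00:00:00Z", "2021-01-01", "2021-13-01", "yesterday"):
    print(sample, is_valid_timestamp(sample))
```

As the slot notes say, the expression may not cover every ISO 8601 form, so a match means the string is well-formed under this pattern, not that it is a sensible timestamp for a workflow run.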