Class: Metagenome Sequence Data (Non-Interleaved) (MetagenomeSequencingNonInterleavedDataInterface)

Interface for non-interleaved metagenome sequencing data, i.e. paired-end reads supplied as separate read 1 and read 2 FASTQ files.

URI: nmdc_sub_schema:MetagenomeSequencingNonInterleavedDataInterface

classDiagram
  class MetagenomeSequencingNonInterleavedDataInterface
  DhMultiviewCommonColumnsMixin <|-- MetagenomeSequencingNonInterleavedDataInterface
  MetagenomeSequencingNonInterleavedDataInterface : analysis_type
  MetagenomeSequencingNonInterleavedDataInterface --> "1..*" AnalysisTypeEnum : analysis_type
  MetagenomeSequencingNonInterleavedDataInterface : insdc_bioproject_identifiers
  MetagenomeSequencingNonInterleavedDataInterface : insdc_experiment_identifiers
  MetagenomeSequencingNonInterleavedDataInterface : model
  MetagenomeSequencingNonInterleavedDataInterface --> "1" IlluminaInstrumentModelEnum : model
  MetagenomeSequencingNonInterleavedDataInterface : processing_institution
  MetagenomeSequencingNonInterleavedDataInterface --> "0..1" ProcessingInstitutionEnum : processing_institution
  MetagenomeSequencingNonInterleavedDataInterface : protocol_link
  MetagenomeSequencingNonInterleavedDataInterface : read_1_md5_checksum
  MetagenomeSequencingNonInterleavedDataInterface : read_1_url
  MetagenomeSequencingNonInterleavedDataInterface : read_2_md5_checksum
  MetagenomeSequencingNonInterleavedDataInterface : read_2_url
  MetagenomeSequencingNonInterleavedDataInterface : samp_name
  MetagenomeSequencingNonInterleavedDataInterface : source_mat_id

Inheritance

MetagenomeSequencingNonInterleavedDataInterface [ DhMultiviewCommonColumnsMixin]
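To make the mixin relationship concrete, here is a minimal Python sketch that models it with plain dataclass inheritance. It is hand-written for illustration and is not the code LinkML generates; the enum-valued slots (`analysis_type`, `model`, `processing_institution`) are simplified to strings, and the empty-string defaults are an assumption made so that the classes can be constructed one field at a time.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class DhMultiviewCommonColumnsMixin:
    # Columns contributed by the mixin (illustrative stand-in, not generated code).
    analysis_type: List[str] = field(default_factory=list)  # 1..* AnalysisTypeEnum
    samp_name: str = ""                                     # 1 String
    source_mat_id: Optional[str] = None                     # 0..1 String


@dataclass
class MetagenomeSequencingNonInterleavedDataInterface(DhMultiviewCommonColumnsMixin):
    # Direct slots; each needs a default because the inherited fields have defaults.
    read_1_url: str = ""                                # 1 String
    read_1_md5_checksum: Optional[str] = None           # 0..1 String
    read_2_url: str = ""                                # 1 String
    read_2_md5_checksum: Optional[str] = None           # 0..1 String
    model: str = ""                                     # 1 IlluminaInstrumentModelEnum
    processing_institution: Optional[str] = None        # 0..1 ProcessingInstitutionEnum
    protocol_link: Optional[str] = None                 # 0..1 String
    insdc_bioproject_identifiers: Optional[str] = None  # 0..1 String
    insdc_experiment_identifiers: Optional[str] = None  # 0..1 String


if __name__ == "__main__":
    record = MetagenomeSequencingNonInterleavedDataInterface(
        samp_name="sample_1",  # hypothetical values throughout
        analysis_type=["metagenomics"],
        read_1_url="https://example.org/sample_1_R1.fastq.gz",
        read_2_url="https://example.org/sample_1_R2.fastq.gz",
    )
    # Inherited mixin fields are listed first, then the class's own slots.
    print([f.name for f in fields(record)])
```

Dataclass inheritance keeps the base-class fields first, mirroring how the mixin columns are shared by every DataHarmonizer view that includes them.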

Slots

Name	Cardinality and Range	Description	Inheritance
read_1_url	1 String	URL for the FASTQ file of read 1 of a read pair	direct
read_1_md5_checksum	0..1 String	MD5 checksum of the file in "read 1 FASTQ"	direct
read_2_url	1 String	URL for the FASTQ file of read 2 of a read pair	direct
read_2_md5_checksum	0..1 String	MD5 checksum of the file in "read 2 FASTQ"	direct
model	1 IlluminaInstrumentModelEnum	The model of the Illumina sequencing instrument used to generate the data	direct
processing_institution	0..1 ProcessingInstitutionEnum	The organization that processed the sample	direct
protocol_link	0..1 String	A URL to a description of the sequencing protocol used to generate the data	direct
insdc_bioproject_identifiers	0..1 String	Identifiers for the corresponding project in INSDC BioProject	direct
insdc_experiment_identifiers	0..1 String	If multiple identifiers are provided, separate them with a semicolon; the number of identifiers must match the number of sequencing files	direct
analysis_type	1..* AnalysisTypeEnum	Select all data types associated with or available for this biosample	DhMultiviewCommonColumnsMixin
samp_name	1 String	A local identifier or name for the material sample collected	DhMultiviewCommonColumnsMixin
source_mat_id	0..1 String	A globally unique identifier assigned to the biological sample	DhMultiviewCommonColumnsMixin
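The cardinalities in the table above can be checked mechanically. The sketch below is an illustrative Python function (the name `cardinality_errors` is hypothetical) that works on a record held as a plain dict keyed by slot name. It is not the LinkML validator: it checks only presence and single- versus multi-valued shape, not ranges, enum membership, or patterns.

```python
from typing import Any, Dict, List

# Cardinalities copied from the Slots table; a lower bound of 1 means required,
# and a trailing "*" means the slot is multivalued.
CARDINALITY: Dict[str, str] = {
    "read_1_url": "1",
    "read_1_md5_checksum": "0..1",
    "read_2_url": "1",
    "read_2_md5_checksum": "0..1",
    "model": "1",
    "processing_institution": "0..1",
    "protocol_link": "0..1",
    "insdc_bioproject_identifiers": "0..1",
    "insdc_experiment_identifiers": "0..1",
    "analysis_type": "1..*",
    "samp_name": "1",
    "source_mat_id": "0..1",
}


def cardinality_errors(record: Dict[str, Any]) -> List[str]:
    """Return human-readable cardinality problems for a dict-shaped record."""
    errors = []
    for slot, card in CARDINALITY.items():
        value = record.get(slot)
        empty = value is None or value == "" or value == []
        multivalued = card.endswith("*")
        if card.startswith("1") and empty:
            errors.append(f"{slot}: required ({card})")
        if isinstance(value, list) and not multivalued:
            errors.append(f"{slot}: single-valued but got a list")
        if multivalued and value is not None and not isinstance(value, list):
            errors.append(f"{slot}: expected a list")
    # Keys that are not slots of this class at all.
    unknown = sorted(set(record) - set(CARDINALITY))
    errors.extend(f"{slot}: not a slot of this class" for slot in unknown)
    return errors
```

For example, a record that omits `read_2_url` and supplies an empty `analysis_type` list would be reported as missing both required values.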

Usages

used by	used in	type	used
SampleData	metagenome_sequencing_non_interleaved_data	range	MetagenomeSequencingNonInterleavedDataInterface

Identifier and Mapping Information

Schema Source

from schema: https://example.com/nmdc_submission_schema

Mappings

Mapping Type	Mapped Value
self	nmdc_sub_schema:MetagenomeSequencingNonInterleavedDataInterface
native	nmdc_sub_schema:MetagenomeSequencingNonInterleavedDataInterface

LinkML Source

Direct

name: MetagenomeSequencingNonInterleavedDataInterface
description: Interface for non-interleaved metagenome sequencing data
title: Metagenome Sequence Data (Non-Interleaved)
from_schema: https://example.com/nmdc_submission_schema
mixins:
- DhMultiviewCommonColumnsMixin
slots:
- read_1_url
- read_1_md5_checksum
- read_2_url
- read_2_md5_checksum
- model
- processing_institution
- protocol_link
- insdc_bioproject_identifiers
- insdc_experiment_identifiers
slot_usage:
  model:
    name: model
    description: The model of the Illumina sequencing instrument used to generate
      the data.
    title: instrument model
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 2
    owner: Instrument
    domain_of:
    - Instrument
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: IlluminaInstrumentModelEnum
    required: true
  processing_institution:
    name: processing_institution
    description: The organization that processed the sample.
    title: processing institution
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 3
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: ProcessingInstitutionEnum
  protocol_link:
    name: protocol_link
    description: A URL to a description of the sequencing protocol used to generate
      the data.
    title: protocol
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 4
    owner: NucleotideSequencing
    domain_of:
    - Configuration
    - PlannedProcess
    - Study
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: string
    multivalued: false
  insdc_bioproject_identifiers:
    name: insdc_bioproject_identifiers
    description: identifiers for corresponding project in INSDC Bioproject
    title: INSDC bioproject identifier
    comments:
    - these are distinct IDs from INSDC SRA/ENA project identifiers, but are usually(?)
      one to one
    examples:
    - value: bioproject:PRJNA366857
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://www.ncbi.nlm.nih.gov/bioproject/
    - https://www.ddbj.nig.ac.jp/bioproject/index-e.html
    aliases:
    - NCBI bioproject identifiers
    - DDBJ bioproject identifiers
    rank: 5
    is_a: study_identifiers
    mixins:
    - insdc_identifiers
    owner: NucleotideSequencing
    domain_of:
    - NucleotideSequencing
    - Study
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: string
    multivalued: false
    pattern: ^bioproject:PRJ[DEN][A-Z][0-9]+$
  insdc_experiment_identifiers:
    name: insdc_experiment_identifiers
    description: If multiple identifiers are provided, separate them with a semicolon.
      The number of identifiers must match the number of sequencing files.
    title: INSDC experiment identifiers
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 6
    is_a: external_database_identifiers
    mixins:
    - insdc_identifiers
    owner: NucleotideSequencing
    domain_of:
    - NucleotideSequencing
    - DataObject
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: string
    multivalued: false
    pattern: ^insdc.sra:(E|D|S)RX[0-9]{6,}$
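As a quick check of the two identifier patterns in the slot_usage above, the following Python sketch applies them with the standard `re` module; the helper name `matches` is hypothetical. Two properties of the patterns as written show up in the examples: the `insdc_experiment_identifiers` pattern is anchored to a single identifier, so a semicolon-separated list like the one its description mentions does not match, and its unescaped `.` matches any character rather than only a literal dot.

```python
import re

# Patterns copied verbatim from the slot_usage above.
BIOPROJECT_PATTERN = re.compile(r"^bioproject:PRJ[DEN][A-Z][0-9]+$")
EXPERIMENT_PATTERN = re.compile(r"^insdc.sra:(E|D|S)RX[0-9]{6,}$")


def matches(pattern: re.Pattern, value: str) -> bool:
    # Both patterns carry ^...$ anchors, so search() checks the whole value.
    return pattern.search(value) is not None


if __name__ == "__main__":
    print(matches(BIOPROJECT_PATTERN, "bioproject:PRJNA366857"))  # schema's own example
    print(matches(EXPERIMENT_PATTERN, "insdc.sra:SRX123456"))
    # A semicolon-separated list fails the single-identifier pattern.
    print(matches(EXPERIMENT_PATTERN, "insdc.sra:SRX123456;insdc.sra:SRX123457"))
```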

Induced

name: MetagenomeSequencingNonInterleavedDataInterface
description: Interface for non-interleaved metagenome sequencing data
title: Metagenome Sequence Data (Non-Interleaved)
from_schema: https://example.com/nmdc_submission_schema
mixins:
- DhMultiviewCommonColumnsMixin
slot_usage:
  model:
    name: model
    description: The model of the Illumina sequencing instrument used to generate
      the data.
    title: instrument model
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 2
    owner: Instrument
    domain_of:
    - Instrument
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: IlluminaInstrumentModelEnum
    required: true
  processing_institution:
    name: processing_institution
    description: The organization that processed the sample.
    title: processing institution
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 3
    owner: NucleotideSequencing
    domain_of:
    - PlannedProcess
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: ProcessingInstitutionEnum
  protocol_link:
    name: protocol_link
    description: A URL to a description of the sequencing protocol used to generate
      the data.
    title: protocol
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 4
    owner: NucleotideSequencing
    domain_of:
    - Configuration
    - PlannedProcess
    - Study
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: string
    multivalued: false
  insdc_bioproject_identifiers:
    name: insdc_bioproject_identifiers
    description: identifiers for corresponding project in INSDC Bioproject
    title: INSDC bioproject identifier
    comments:
    - these are distinct IDs from INSDC SRA/ENA project identifiers, but are usually(?)
      one to one
    examples:
    - value: bioproject:PRJNA366857
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://www.ncbi.nlm.nih.gov/bioproject/
    - https://www.ddbj.nig.ac.jp/bioproject/index-e.html
    aliases:
    - NCBI bioproject identifiers
    - DDBJ bioproject identifiers
    rank: 5
    is_a: study_identifiers
    mixins:
    - insdc_identifiers
    owner: NucleotideSequencing
    domain_of:
    - NucleotideSequencing
    - Study
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: string
    multivalued: false
    pattern: ^bioproject:PRJ[DEN][A-Z][0-9]+$
  insdc_experiment_identifiers:
    name: insdc_experiment_identifiers
    description: If multiple identifiers are provided, separate them with a semicolon.
      The number of identifiers must match the number of sequencing files.
    title: INSDC experiment identifiers
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 6
    is_a: external_database_identifiers
    mixins:
    - insdc_identifiers
    owner: NucleotideSequencing
    domain_of:
    - NucleotideSequencing
    - DataObject
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: string
    multivalued: false
    pattern: ^insdc.sra:(E|D|S)RX[0-9]{6,}$
attributes:
  read_1_url:
    name: read_1_url
    description: URL for FASTQ file of read 1 of a pair of reads.
    title: read 1 FASTQ
    comments:
    - If multiple runs were performed, separate each URL with a semi-colon.
    - External data URLs should be available for at least a year. If you would like
      NMDC to submit your data to an appropriate raw data repository on your behalf,
      please contact us at microbiomedata.science@gmail.com.
    from_schema: https://example.com/nmdc_submission_schema
    rank: 10
    alias: read_1_url
    owner: MetagenomeSequencingNonInterleavedDataInterface
    domain_of:
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    slot_group: data_files_section
    range: string
    required: true
    multivalued: false
    pattern: ^https://[^\s;]+(?:\s*;\s*https://[^\s;]+)*$
  read_1_md5_checksum:
    name: read_1_md5_checksum
    description: MD5 checksum of file in "read 1 FASTQ".
    title: read 1 FASTQ MD5
    comments:
    - If multiple runs were performed, separate each checksum with a semi-colon. The
      number of checksums should match the number of URLs in "read 1 FASTQ".
    from_schema: https://example.com/nmdc_submission_schema
    rank: 11
    alias: read_1_md5_checksum
    owner: MetagenomeSequencingNonInterleavedDataInterface
    domain_of:
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    slot_group: data_files_section
    range: string
    multivalued: false
    pattern: ^[a-fA-F0-9]{32}(?:\s*;\s*[a-fA-F0-9]{32})*$
  read_2_url:
    name: read_2_url
    description: URL for FASTQ file of read 2 of a pair of reads.
    title: read 2 FASTQ
    comments:
    - If multiple runs were performed, separate each URL with a semi-colon.
    - External data URLs should be available for at least a year. If you would like
      NMDC to submit your data to an appropriate raw data repository on your behalf,
      please contact us at microbiomedata.science@gmail.com.
    from_schema: https://example.com/nmdc_submission_schema
    rank: 12
    alias: read_2_url
    owner: MetagenomeSequencingNonInterleavedDataInterface
    domain_of:
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    slot_group: data_files_section
    range: string
    required: true
    multivalued: false
    pattern: ^https://[^\s;]+(?:\s*;\s*https://[^\s;]+)*$
  read_2_md5_checksum:
    name: read_2_md5_checksum
    description: MD5 checksum of file in "read 2 FASTQ".
    title: read 2 FASTQ MD5
    comments:
    - If multiple runs were performed, separate each checksum with a semi-colon. The
      number of checksums should match the number of URLs in "read 2 FASTQ".
    from_schema: https://example.com/nmdc_submission_schema
    rank: 13
    alias: read_2_md5_checksum
    owner: MetagenomeSequencingNonInterleavedDataInterface
    domain_of:
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    slot_group: data_files_section
    range: string
    multivalued: false
    pattern: ^[a-fA-F0-9]{32}(?:\s*;\s*[a-fA-F0-9]{32})*$
  model:
    name: model
    description: The model of the Illumina sequencing instrument used to generate
      the data.
    title: instrument model
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 2
    alias: model
    owner: MetagenomeSequencingNonInterleavedDataInterface
    domain_of:
    - Instrument
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: IlluminaInstrumentModelEnum
    required: true
  processing_institution:
    name: processing_institution
    description: The organization that processed the sample.
    title: processing institution
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 3
    alias: processing_institution
    owner: MetagenomeSequencingNonInterleavedDataInterface
    domain_of:
    - PlannedProcess
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: ProcessingInstitutionEnum
  protocol_link:
    name: protocol_link
    description: A URL to a description of the sequencing protocol used to generate
      the data.
    title: protocol
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 4
    alias: protocol_link
    owner: MetagenomeSequencingNonInterleavedDataInterface
    domain_of:
    - Configuration
    - PlannedProcess
    - Study
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: string
    multivalued: false
  insdc_bioproject_identifiers:
    name: insdc_bioproject_identifiers
    description: identifiers for corresponding project in INSDC Bioproject
    title: INSDC bioproject identifier
    comments:
    - these are distinct IDs from INSDC SRA/ENA project identifiers, but are usually(?)
      one to one
    examples:
    - value: bioproject:PRJNA366857
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://www.ncbi.nlm.nih.gov/bioproject/
    - https://www.ddbj.nig.ac.jp/bioproject/index-e.html
    aliases:
    - NCBI bioproject identifiers
    - DDBJ bioproject identifiers
    rank: 5
    is_a: study_identifiers
    mixins:
    - insdc_identifiers
    alias: insdc_bioproject_identifiers
    owner: MetagenomeSequencingNonInterleavedDataInterface
    domain_of:
    - NucleotideSequencing
    - Study
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: string
    multivalued: false
    pattern: ^bioproject:PRJ[DEN][A-Z][0-9]+$
  insdc_experiment_identifiers:
    name: insdc_experiment_identifiers
    description: If multiple identifiers are provided, separate them with a semicolon.
      The number of identifiers must match the number of sequencing files.
    title: INSDC experiment identifiers
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 6
    is_a: external_database_identifiers
    mixins:
    - insdc_identifiers
    alias: insdc_experiment_identifiers
    owner: MetagenomeSequencingNonInterleavedDataInterface
    domain_of:
    - NucleotideSequencing
    - DataObject
    - MetagenomeSequencingNonInterleavedDataInterface
    - MetagenomeSequencingInterleavedDataInterface
    - MetatranscriptomeSequencingNonInterleavedDataInterface
    - MetatranscriptomeSequencingInterleavedDataInterface
    slot_group: sequencing_section
    range: string
    multivalued: false
    pattern: ^insdc.sra:(E|D|S)RX[0-9]{6,}$
  analysis_type:
    name: analysis_type
    description: Select all the data types associated or available for this biosample
    title: analysis/data type
    comments:
    - MIxS:investigation_type was included as a `see_also` but that term doesn't resolve
      any more
    examples:
    - value: metagenomics; metabolomics; metaproteomics
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 3
    alias: analysis_type
    owner: MetagenomeSequencingNonInterleavedDataInterface
    domain_of:
    - Biosample
    - DhMultiviewCommonColumnsMixin
    slot_group: sample_id_section
    range: AnalysisTypeEnum
    required: true
    recommended: false
    multivalued: true
  samp_name:
    name: samp_name
    annotations:
      expected_value:
        tag: expected_value
        value: text
    description: A local identifier or name for the material sample collected.
      Refers to the original material collected or to any derived sub-samples.
    title: sample name
    comments:
    - It can have any format, but we suggest that you make it concise, unique and
      consistent within your lab, and as informative as possible.
    examples:
    - value: Rock core CB1178(5-6) from NSW
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - sample name
    rank: 1
    is_a: investigation field
    string_serialization: '{text}'
    slot_uri: MIXS:0001107
    identifier: true
    alias: samp_name
    owner: MetagenomeSequencingNonInterleavedDataInterface
    domain_of:
    - Biosample
    - DhMultiviewCommonColumnsMixin
    slot_group: sample_id_section
    range: string
    required: true
    multivalued: false
  source_mat_id:
    name: source_mat_id
    annotations:
      expected_value:
        tag: expected_value
        value: 'for cultures of microorganisms: identifiers for two culture collections;
          for other material a unique arbitrary identifier'
    description: A globally unique identifier assigned to the biological sample.
    title: source material identifier
    todos:
    - Currently, the comments say to use UUIDs. However, if we implement assigning
      NMDC identifiers with the minter we don't need to require a GUID. It can be an
      optional field to fill out only if they already have a resolvable ID.
    notes:
    - The source material IS the Globally Unique ID
    comments:
    - Identifiers must be prefixed. Possible FAIR prefixes are IGSNs (http://www.geosamples.org/getigsn),
      NCBI biosample accession numbers, ARK identifiers (https://arks.org/). These
      IDs enable linking to derived analytes and subsamples. If you have not assigned
      FAIR identifiers to your samples, you can generate UUIDs (https://www.uuidgenerator.net/).
    examples:
    - value: IGSN:AU1243
    - value: UUID:24f1467a-40f4-11ed-b878-0242ac120002
    from_schema: https://w3id.org/nmdc/nmdc
    aliases:
    - source material identifiers
    rank: 2
    is_a: nucleic acid sequence source field
    string_serialization: '{text}:{text}'
    slot_uri: MIXS:0000026
    alias: source_mat_id
    owner: MetagenomeSequencingNonInterleavedDataInterface
    domain_of:
    - Biosample
    - DhMultiviewCommonColumnsMixin
    slot_group: sample_id_section
    range: string
    multivalued: false
    pattern: '[^\:\n\r]+\:[^\:\n\r]+'
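The `read_1_url` and `read_1_md5_checksum` comments above ask for semicolon-separated lists when multiple runs were performed, with one checksum per URL; `read_2_url` and `read_2_md5_checksum` use the same patterns. Below is a minimal Python sketch of those checks, assuming the checksum field may be absent and that counts are compared only when both fields are well-formed. The function names `split_list` and `check_read_files` are hypothetical.

```python
import re
from typing import List, Optional

# Patterns copied from the read_1_url / read_1_md5_checksum attributes
# (the read_2_* attributes use the same patterns).
URL_PATTERN = re.compile(r"^https://[^\s;]+(?:\s*;\s*https://[^\s;]+)*$")
MD5_PATTERN = re.compile(r"^[a-fA-F0-9]{32}(?:\s*;\s*[a-fA-F0-9]{32})*$")


def split_list(value: str) -> List[str]:
    """Split a semicolon-separated field, trimming whitespace around each item."""
    return [part.strip() for part in value.split(";")]


def check_read_files(url: str, md5: Optional[str] = None) -> List[str]:
    """Check one read's URL field and optional checksum field."""
    errors = []
    url_ok = URL_PATTERN.search(url) is not None
    if not url_ok:
        errors.append("URL field does not match the https:// list pattern")
    if md5 is not None:
        if MD5_PATTERN.search(md5) is None:
            errors.append("checksum field does not match the MD5 list pattern")
        elif url_ok and len(split_list(md5)) != len(split_list(url)):
            # One checksum is expected per URL (one per sequencing run).
            errors.append("number of checksums differs from number of URLs")
    return errors
```

Usage: `check_read_files("https://example.org/run1_R1.fastq.gz; https://example.org/run2_R1.fastq.gz", "d41d8cd98f00b204e9800998ecf8427e")` reports a count mismatch, because two URLs are paired with a single checksum.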