Class: DataObject

An object that primarily consists of symbols that represent information. Files, records, and omics data are examples of data objects.

URI: nmdc:DataObject

classDiagram
    class DataObject
    click DataObject href "../DataObject"
    InformationObject <|-- DataObject
    click InformationObject href "../InformationObject"
    DataObject : alternative_identifiers
    DataObject : compression_type
    DataObject : data_category
    DataObject --> "0..1" DataCategoryEnum : data_category
    click DataCategoryEnum href "../DataCategoryEnum"
    DataObject : data_object_type
    DataObject --> "0..1" FileTypeEnum : data_object_type
    click FileTypeEnum href "../FileTypeEnum"
    DataObject : description
    DataObject : file_size_bytes
    DataObject : id
    DataObject : in_manifest
    DataObject --> "*" Manifest : in_manifest
    click Manifest href "../Manifest"
    DataObject : insdc_experiment_identifiers
    DataObject : md5_checksum
    DataObject : name
    DataObject : type
    DataObject : url
    DataObject : was_generated_by
    DataObject --> "0..1" WorkflowExecution : was_generated_by
    click WorkflowExecution href "../WorkflowExecution"

Inheritance

Slots

| Name | Cardinality and Range | Description | Inheritance |
| --- | --- | --- | --- |
| compression_type | 0..1 String | If provided, specifies the compression type | direct |
| data_category | 0..1 DataCategoryEnum | The category of the file, such as instrument data from data generation or pro... | direct |
| data_object_type | 0..1 FileTypeEnum | The type of file represented by the data object | direct |
| file_size_bytes | 0..1 Bytes | Size of the file in bytes | direct |
| insdc_experiment_identifiers | * ExternalIdentifier | | direct |
| md5_checksum | 0..1 String | MD5 checksum of file (pre-compressed) | direct |
| url | 0..1 String | | direct |
| was_generated_by | 0..1 WorkflowExecution or DataGeneration | | direct |
| in_manifest | * Manifest | one or more combinations of other DataObjects that can be analyzed together | direct |
| id | 1 Uriorcurie | A unique identifier for a thing | NamedThing |
| name | 1 String | A human readable label for an entity | NamedThing |
| description | 1 String | a human-readable description of a thing | NamedThing |
| alternative_identifiers | * Uriorcurie | A list of alternative identifiers for the entity | NamedThing |
| type | 1 Uriorcurie | the class_uri of the class that has been instantiated | NamedThing |
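
To make the slot listing above more concrete, the following Python sketch builds a DataObject-shaped record for a local file and reports any slots of cardinality 1 that are missing. It is an illustration only, not part of the NMDC tooling: the helper names `describe_file` and `missing_required`, the identifier `nmdc:dobj-11-abc123`, and the file contents are hypothetical. The page describes md5_checksum as "pre-compressed"; this sketch simply hashes the bytes found on disk, which is an assumption about how that note should be read.

```python
import hashlib
import os
import tempfile

# Slots listed with cardinality 1 for DataObject.
REQUIRED_SLOTS = ("id", "name", "description", "type")


def describe_file(path, record_id, name, description, url=None):
    """Build a DataObject-shaped dict for a local file (hypothetical helper)."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            md5.update(chunk)
    record = {
        "id": record_id,
        "name": name,
        "description": description,
        # type holds the class_uri of the instantiated class.
        "type": "nmdc:DataObject",
        "file_size_bytes": os.path.getsize(path),
        # ASSUMPTION: checksum of the bytes on disk, whatever their compression.
        "md5_checksum": md5.hexdigest(),
    }
    if url is not None:
        record["url"] = url
    return record


def missing_required(record):
    """Return the required slot names that are absent or empty."""
    return [slot for slot in REQUIRED_SLOTS if not record.get(slot)]


# Usage with a throwaway file; the identifier is a made-up example.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ACGT\n")
example = describe_file(
    tmp.name, "nmdc:dobj-11-abc123", "reads.fastq", "Example raw reads file"
)
os.remove(tmp.name)
print(missing_required(example))  # prints []
```

A check like this covers only slot presence. Range and pattern constraints would still need a schema-aware validator.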

Usages

| used by | used in | type | used |
| --- | --- | --- | --- |
| NucleotideSequencing | has_output | range | DataObject |
| MassSpectrometry | has_output | range | DataObject |
| CalibrationInformation | calibration_object | range | DataObject |
| Database | data_object_set | range | DataObject |
| DataGeneration | has_output | range | DataObject |

Identifier and Mapping Information

Schema Source

Mappings

| Mapping Type | Mapped Value |
| --- | --- |
| self | nmdc:DataObject |
| native | nmdc:DataObject |

LinkML Source

Direct

name: DataObject
description: An object that primarily consists of symbols that represent information.   Files,
  records, and omics data are examples of data objects.
from_schema: https://w3id.org/nmdc/nmdc
is_a: InformationObject
slots:
- compression_type
- data_category
- data_object_type
- file_size_bytes
- insdc_experiment_identifiers
- md5_checksum
- url
- was_generated_by
- in_manifest
slot_usage:
  name:
    name: name
    required: true
  description:
    name: description
    required: true
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:dobj-{id_shoulder}-{id_blade}$'
      interpolated: true
  was_generated_by:
    name: was_generated_by
    structured_pattern:
      syntax: ^{id_nmdc_prefix}:(wfmag|wfmb|wfmgan|wfmgas|wfmsa|wfmp|wfmt|wfmtan|wfmtas|wfnom|wfrbt|wfrqc)-{id_shoulder}-{id_blade}{id_version}$|^{id_nmdc_prefix}:(omprc|dgms|dgns)-{id_shoulder}-{id_blade}$
      interpolated: true
class_uri: nmdc:DataObject
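
The `structured_pattern` entries above are marked `interpolated: true`, meaning the `{...}` placeholders are filled in from named settings before the result is used as a regular expression. Those settings are not shown on this page, so the values in the sketch below are placeholders assumed for illustration and may differ from the ones the schema defines. The function `interpolate` is likewise a hypothetical stand-in for what LinkML tooling does, not its actual API.

```python
import re

# ASSUMPTION: illustrative setting values; the real ones live in the schema,
# not on this page.
SETTINGS = {
    "id_nmdc_prefix": r"(nmdc)",
    "id_shoulder": r"([0-9][a-z]{0,6}[0-9])",
    "id_blade": r"([A-Za-z0-9]{1,})",
    "id_version": r"(\.[0-9]{1,})",
}

# Syntax strings copied from the slot_usage block.
ID_SYNTAX = "{id_nmdc_prefix}:dobj-{id_shoulder}-{id_blade}$"
WAS_GENERATED_BY_SYNTAX = (
    "^{id_nmdc_prefix}:(wfmag|wfmb|wfmgan|wfmgas|wfmsa|wfmp|wfmt|wfmtan|wfmtas"
    "|wfnom|wfrbt|wfrqc)-{id_shoulder}-{id_blade}{id_version}$"
    "|^{id_nmdc_prefix}:(omprc|dgms|dgns)-{id_shoulder}-{id_blade}$"
)


def interpolate(syntax, settings):
    """Replace {setting_name} placeholders, leaving regex quantifiers alone."""
    names = "|".join(re.escape(name) for name in settings)
    # Only braces wrapping a known setting name are substituted, so
    # quantifiers such as {0,6} inside the setting values are untouched.
    return re.sub(
        r"\{(" + names + r")\}",
        lambda match: settings[match.group(1)],
        syntax,
    )


id_regex = re.compile(interpolate(ID_SYNTAX, SETTINGS))
generated_by_regex = re.compile(interpolate(WAS_GENERATED_BY_SYNTAX, SETTINGS))

# The identifiers below are made-up examples.
print(bool(id_regex.fullmatch("nmdc:dobj-11-abc123")))
print(bool(generated_by_regex.fullmatch("nmdc:wfmgan-11-abc123.1")))
print(bool(generated_by_regex.fullmatch("nmdc:dgns-11-xyz789")))
```

Under these assumed settings, the two alternatives of the was_generated_by pattern differ in that the workflow typecodes (`wf...`) are followed by `{id_version}`, while the `omprc`, `dgms`, and `dgns` typecodes are not.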

Induced

name: DataObject
description: An object that primarily consists of symbols that represent information.   Files,
  records, and omics data are examples of data objects.
from_schema: https://w3id.org/nmdc/nmdc
is_a: InformationObject
slot_usage:
  name:
    name: name
    required: true
  description:
    name: description
    required: true
  id:
    name: id
    required: true
    structured_pattern:
      syntax: '{id_nmdc_prefix}:dobj-{id_shoulder}-{id_blade}$'
      interpolated: true
  was_generated_by:
    name: was_generated_by
    structured_pattern:
      syntax: ^{id_nmdc_prefix}:(wfmag|wfmb|wfmgan|wfmgas|wfmsa|wfmp|wfmt|wfmtan|wfmtas|wfnom|wfrbt|wfrqc)-{id_shoulder}-{id_blade}{id_version}$|^{id_nmdc_prefix}:(omprc|dgms|dgns)-{id_shoulder}-{id_blade}$
      interpolated: true
attributes:
  compression_type:
    name: compression_type
    description: If provided, specifies the compression type
    todos:
    - consider setting the range to an enum
    examples:
    - value: gzip
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: compression_type
    owner: DataObject
    domain_of:
    - DataObject
    range: string
  data_category:
    name: data_category
    description: The category of the file, such as instrument data from data generation
      or processed data from a workflow execution.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: data_category
    owner: DataObject
    domain_of:
    - DataObject
    range: DataCategoryEnum
  data_object_type:
    name: data_object_type
    description: The type of file represented by the data object.
    examples:
    - value: FT ICR-MS Analysis Results
    - value: GC-MS Metabolomics Results
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: data_object_type
    owner: DataObject
    domain_of:
    - DataObject
    range: FileTypeEnum
  file_size_bytes:
    name: file_size_bytes
    description: Size of the file in bytes
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: file_size_bytes
    owner: DataObject
    domain_of:
    - DataObject
    range: bytes
  insdc_experiment_identifiers:
    name: insdc_experiment_identifiers
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    is_a: external_database_identifiers
    mixins:
    - insdc_identifiers
    alias: insdc_experiment_identifiers
    owner: DataObject
    domain_of:
    - NucleotideSequencing
    - DataObject
    range: external_identifier
    multivalued: true
    pattern: ^insdc.sra:(E|D|S)RX[0-9]{6,}$
  md5_checksum:
    name: md5_checksum
    description: MD5 checksum of file (pre-compressed)
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: md5_checksum
    owner: DataObject
    domain_of:
    - DataObject
    range: string
  url:
    name: url
    notes:
    - See issue 207 - this clashes with the mixs field
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: url
    owner: DataObject
    domain_of:
    - ImageValue
    - Protocol
    - DataObject
    range: string
  was_generated_by:
    name: was_generated_by
    from_schema: https://w3id.org/nmdc/nmdc
    mappings:
    - prov:wasGeneratedBy
    rank: 1000
    alias: was_generated_by
    owner: DataObject
    domain_of:
    - FunctionalAnnotationAggMember
    - FunctionalAnnotation
    - DataObject
    range: WorkflowExecution
    structured_pattern:
      syntax: ^{id_nmdc_prefix}:(wfmag|wfmb|wfmgan|wfmgas|wfmsa|wfmp|wfmt|wfmtan|wfmtas|wfnom|wfrbt|wfrqc)-{id_shoulder}-{id_blade}{id_version}$|^{id_nmdc_prefix}:(omprc|dgms|dgns)-{id_shoulder}-{id_blade}$
      interpolated: true
    any_of:
    - range: WorkflowExecution
    - range: DataGeneration
  in_manifest:
    name: in_manifest
    description: one or more combinations of other DataObjects that can be analyzed
      together
    comments:
    - A DataObject can be part of multiple manifests, for example, a DataObject could
      be part of a manifest for a single run of an instrument and a manifest for technical
      replicates of a single sample.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: in_manifest
    owner: DataObject
    domain_of:
    - DataObject
    range: Manifest
    multivalued: true
    structured_pattern:
      syntax: ^{id_nmdc_prefix}:manif-{id_shoulder}-{id_blade}$
      interpolated: true
  id:
    name: id
    description: A unique identifier for a thing. Must be either a CURIE shorthand
      for a URI or a complete URI
    notes:
    - 'abstracted pattern: prefix:typecode-authshoulder-blade(.version)?(_seqsuffix)?'
    - a minimum length of 3 characters is suggested for typecodes, but 1 or 2 characters
      will be accepted
    - typecodes must correspond 1:1 to a class in the NMDC schema. this will be checked
      via per-class id slot usage assertions
    - minting authority shoulders should probably be enumerated and checked in the
      pattern
    examples:
    - value: nmdc:mgmag-00-x012.1_7_c1
      description: https://github.com/microbiomedata/nmdc-schema/pull/499#discussion_r1018499248
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    identifier: true
    alias: id
    owner: DataObject
    domain_of:
    - NamedThing
    range: uriorcurie
    required: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,]*$
    structured_pattern:
      syntax: '{id_nmdc_prefix}:dobj-{id_shoulder}-{id_blade}$'
      interpolated: true
  name:
    name: name
    description: A human readable label for an entity
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: name
    owner: DataObject
    domain_of:
    - PersonValue
    - NamedThing
    - Protocol
    range: string
    required: true
  description:
    name: description
    description: a human-readable description of a thing
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    slot_uri: dcterms:description
    alias: description
    owner: DataObject
    domain_of:
    - ImageValue
    - NamedThing
    range: string
    required: true
  alternative_identifiers:
    name: alternative_identifiers
    description: A list of alternative identifiers for the entity.
    from_schema: https://w3id.org/nmdc/nmdc
    rank: 1000
    alias: alternative_identifiers
    owner: DataObject
    domain_of:
    - MetaboliteIdentification
    - NamedThing
    range: uriorcurie
    multivalued: true
    pattern: ^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,\(\)\=\#]*$
  type:
    name: type
    description: the class_uri of the class that has been instantiated
    notes:
    - replaces legacy nmdc:type slot
    - makes it easier to read example data files
    - required for polymorphic MongoDB collections
    examples:
    - value: nmdc:Biosample
    - value: nmdc:Study
    from_schema: https://w3id.org/nmdc/nmdc
    see_also:
    - https://github.com/microbiomedata/nmdc-schema/issues/1048
    - https://github.com/microbiomedata/nmdc-schema/issues/1233
    - https://github.com/microbiomedata/nmdc-schema/issues/248
    rank: 1000
    slot_uri: rdf:type
    designates_type: true
    alias: type
    owner: DataObject
    domain_of:
    - EukEval
    - FunctionalAnnotationAggMember
    - PeptideQuantification
    - ProteinQuantification
    - MobilePhaseSegment
    - PortionOfSubstance
    - MagBin
    - MetaboliteIdentification
    - GenomeFeature
    - FunctionalAnnotation
    - AttributeValue
    - NamedThing
    - OntologyRelation
    - FailureCategorization
    - Protocol
    - CreditAssociation
    - Doi
    range: uriorcurie
    required: true
class_uri: nmdc:DataObject
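
Unlike the interpolated `structured_pattern` entries, the plain `pattern` values in the induced definition can be used as regular expressions directly. The sketch below applies two of them, copied verbatim from insdc_experiment_identifiers and alternative_identifiers, to candidate values. The helper `invalid_values` and all sample identifiers are hypothetical, and a schema-aware validator would normally do this work.

```python
import re

# Patterns copied verbatim from the induced slot definitions.
INSDC_EXPERIMENT = re.compile(r"^insdc.sra:(E|D|S)RX[0-9]{6,}$")
ALTERNATIVE_ID = re.compile(
    r"^[a-zA-Z0-9][a-zA-Z0-9_\.]+:[a-zA-Z0-9_][a-zA-Z0-9_\-\/\.,\(\)\=\#]*$"
)


def invalid_values(values, pattern):
    """Return the values that do not match the given compiled pattern."""
    return [value for value in values if pattern.match(value) is None]


# Made-up sample values for a multivalued slot.
experiments = [
    "insdc.sra:SRX1234567",  # experiment accession, at least 6 digits
    "insdc.sra:SRR1234567",  # SRR is not accepted by (E|D|S)RX
    "insdc.sra:ERX12345",    # only 5 digits
]
print(invalid_values(experiments, INSDC_EXPERIMENT))
# prints ['insdc.sra:SRR1234567', 'insdc.sra:ERX12345']

print(invalid_values(["img.taxon:2724679250", "no-colon"], ALTERNATIVE_ID))
# prints ['no-colon']
```

Note that the `.` in `insdc.sra` is unescaped in the pattern as published, so it matches any single character; a value such as `insdc_sra:ERX123456` would also pass this check.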