Skip to content

Validate JSON and Fetch JSON

Let's dive in and get acquainted with the NMDC Runtime API.

Validate JSON

Already? Yes. Let's do this. Here is a tiny nmdc:Database JSON object:

{"biosample_set": [{"id": 42}]}

This represents a set of nmdc:Biosample objects. There is just one, with an id of 42.

Let's validate it. Go to the POST /metadata/json:validate endpoint at https://api.microbiomedata.org/docs and click "Try it out":

Try it Out

Now, copy the above JSON object, paste it into the Request body field, and hit Execute:

Paste Body and Execute

This gives us a response where the result is "errors". Looks like a biosample id needs to be a string value, and we are missing required properties. We also get a display of a curl command to reproduce the request on the command line:

Validation Response

Let's see what a "valid" response looks like. The GET /nmdcschema/{collection_name}/{doc_id} endpoint allows us to get the NMDC-schema-validated JSON object for one of the NMDC metadata collections:

Get one valid

For example, https://api.microbiomedata.org/nmdcschema/biosample_set/gold:Gb0115217 is

{
  "location": "groundwater-surface water interaction zone in Washington, USA",
  "env_medium": {
    "has_raw_value": "ENVO:01000017"
  },
  "depth2": {
    "has_raw_value": "1.0",
    "has_numeric_value": 1,
    "has_unit": "meter"
  },
  "env_broad_scale": {
    "has_raw_value": "ENVO:01000253"
  },
  "alternative_identifiers": [
    "img.taxon:3300042741"
  ],
  "ecosystem": "Engineered",
  "ecosystem_category": "Artificial ecosystem",
  "id": "gold:Gb0115217",
  "env_local_scale": {
    "has_raw_value": "ENVO:01000621"
  },
  "community": "microbial communities",
  "mod_date": "2021-06-17",
  "ecosystem_subtype": "Unclassified",
  "INSDC_biosample_identifiers": [
    "biosample:SAMN06343863"
  ],
  "description": "Sterilized sand packs were incubated back in the ground and collected at time point T2.",
  "collection_date": {
    "has_raw_value": "2014-09-23"
  },
  "ecosystem_type": "Sand microcosm",
  "sample_collection_site": "sand microcosm",
  "name": "Sand microcosm microbial communities from a hyporheic zone in Columbia River, Washington, USA - GW-RW T2_23-Sept-14",
  "lat_lon": {
    "has_raw_value": "46.37228379 -119.2717467",
    "latitude": 46.37228379,
    "longitude": -119.2717467
  },
  "specific_ecosystem": "Unclassified",
  "identifier": "GW-RW T2_23-Sept-14",
  "GOLD_sample_identifiers": [
    "gold:Gb0115217"
  ],
  "add_date": "2015-05-28",
  "habitat": "sand microcosm",
  "type": "nmdc:Biosample",
  "depth": {
    "has_raw_value": "0.5",
    "has_numeric_value": 0.5,
    "has_unit": "meter"
  },
  "part_of": [
    "gold:Gs0114663"
  ],
  "ncbi_taxonomy_name": "sediment metagenome",
  "geo_loc_name": {
    "has_raw_value": "USA: Columbia River, Washington"
  }
}

Now, copying and paste the above into the request body for POST /metadata/json:validate. Remember, the body needs to be a nmdc:Database object, in this case with a single member of the biosample_set collection, so copy and paste the {"biosample_set": [ and ]} parts to book-end the document JSON:

{"biosample_set": [
"PASTE_JSON_DOCUMENT_HERE"
]}

Now, when you execute the request, the response body will be

{
  "result": "All Okay!"
}

Hooray!

Get a List of NMDC-Schema-Compliant Documents

The GET /nmdcschema/{collection_name} endpoint allows you to get a filtered list of documents from one of the NMDC Schema collections:

List from collections

The collection_name must be one defined for a nmdc:Database, in the form expected by the JSON Schema, nmdc.schema.json. This typically means that any spaces in the name should be entered as underscores (_) instead.

The filter, if provided, is a JSON document in the form of the MongoDB Query Language. For example, the filter {"part_of": "gold:Gs0114663"} on collection_name biosample_set will list biosamples that are part of the gold:Gs0114663 study:

List from collection, with filter

When I execute that query, I use the default max_page_size of 20, meaning at most 20 documents are returned at a time. A much larger max_page_size is fine for programs/scripts, but can make your web browser less responsive when using the interactive documentation.

The response body for our request has two fields, resources and next_page_token:

{
  "resources": [
    ...
  ],
  "next_page_token": "nmdc:sys0s8f846"
}

resources is a list of documents. next_page_token is a value you can plug into a subsequent request as the page_token parameter:

List from collection, with filter and page token

This will return the next page of results. You do need to keep the other request parameters the same. In this way, you can page through and retrieve all documents that match a given filter (or no filter) for a given collection. Page tokens are ephemeral: once you use one in a request, it is removed from the system's memory.