
Improving the Search API

A standard operating procedure (SOP) for search API maintainers/contributors.

The "Search API" may be thought of as the part of the Runtime API that is read-only and specific to the needs of (meta)data consumers, who use it to retrieve NMDC data products that conform to the NMDC schema.

The "Data Management API", on the other hand, comprises all other aspects of the Runtime API, that is, the read-write parts, which also serve the needs of data producers and data stewards.

Endpoints

Endpoints for the Search API are defined in the nmdc_runtime.api.endpoints.find module.

To add an endpoint, you will likely use the FindRequest and FindResponse models in nmdc_runtime.api.models.util, as they are the shared request/response models used by existing Search API endpoints.

You will also likely use the find_resources helper function from nmdc_runtime.api.endpoints.util. Improvements to this helper function will improve all existing Search API endpoints.
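To illustrate why one shared helper benefits every endpoint, here is a minimal, self-contained sketch. It does not use the real FindRequest or find_resources; FindRequestSketch, parse_filter, and find_resources_sketch are hypothetical stand-ins, and the comma-separated "attribute:value" filter syntax and page/per_page fields are assumptions made for the example. An in-memory list replaces the MongoDB collection.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FindRequestSketch:
    # Hypothetical stand-in for FindRequest; field names are assumptions.
    filter: Optional[str] = None
    page: int = 1
    per_page: int = 25


def parse_filter(filter_str: Optional[str]) -> dict[str, str]:
    """Turn 'attr:value,attr2:value2' into an equality filter dict.

    Splits each pair at its first colon, so values such as 'nmdc:sty-1' survive.
    """
    if not filter_str:
        return {}
    criteria = {}
    for pair in filter_str.split(","):
        attr, _, value = pair.partition(":")
        criteria[attr.strip()] = value.strip()
    return criteria


def find_resources_sketch(
    req: FindRequestSketch, docs: list[dict[str, Any]]
) -> dict[str, Any]:
    """Filter (top-level equality only) and paginate; every endpoint calls this."""
    criteria = parse_filter(req.filter)
    matches = [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]
    start = (req.page - 1) * req.per_page
    return {
        "meta": {"count": len(matches), "page": req.page, "per_page": req.per_page},
        "results": matches[start : start + req.per_page],
    }
```

Because each list endpoint delegates to the helper, a change such as supporting a new comparison operator in parse_filter would apply to all of them at once.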

Adding a dedicated endpoint for a particular resource collection may be as simple as copying the code for a representative pair of endpoints, such as GET /studies and GET /studies/{study_id}, and changing names accordingly.
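The list/by-ID pair pattern can be sketched as follows. This is a hypothetical, framework-free illustration, not the code in nmdc_runtime.api.endpoints.find: DB, NotFound, and make_endpoint_pair are invented names, an in-memory dict stands in for MongoDB, and NotFound stands in for an HTTP 404. In the real module you would copy and rename the route functions; the factory here only makes explicit that the two handlers differ by collection name alone.

```python
from typing import Any, Callable

# Hypothetical in-memory stand-in for the MongoDB collections.
DB: dict[str, list[dict[str, Any]]] = {
    "study_set": [{"id": "nmdc:sty-1", "name": "Soil study"}],
    "biosample_set": [{"id": "nmdc:bsm-1", "ecosystem": "Environmental"}],
}


class NotFound(Exception):
    """Stands in for an HTTP 404 response."""


def make_endpoint_pair(
    collection_name: str,
) -> tuple[Callable[..., dict], Callable[[str], dict]]:
    """Build a list handler and a by-ID handler for one collection."""

    def list_resources(**criteria: Any) -> dict:
        # Analogous to GET /studies: every document matching exact criteria.
        docs = DB[collection_name]
        results = [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]
        return {"meta": {"count": len(results)}, "results": results}

    def get_resource(resource_id: str) -> dict:
        # Analogous to GET /studies/{study_id}: one document, or "404".
        for doc in DB[collection_name]:
            if doc["id"] == resource_id:
                return doc
        raise NotFound(f"{resource_id} not found in {collection_name}")

    return list_resources, get_resource


list_studies, get_study = make_endpoint_pair("study_set")
list_biosamples, get_biosample = make_endpoint_pair("biosample_set")
```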

Index-backed Filter Attributes

To ensure that a particular attribute (slot) of a collection's entities is indexed, add it to the entity_attributes_to_index dictionary in the nmdc_runtime.api.models.util module. Each key of that dictionary is the name of an entity collection, e.g. biosample_set, and each value is the set of attributes, e.g. ecosystem and collection_date.has_raw_value, for which an index will be ensured.
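The shape of that dictionary can be sketched like this. The excerpt is hypothetical (the real dictionary has more entries), and "ecosystem_category" is used purely as an illustrative slot name; index_specs is an invented helper that flattens the mapping into one (collection, attribute) pair per index. Because MongoDB accepts dot notation for embedded fields, collection_date.has_raw_value refers to the has_raw_value field nested under collection_date.

```python
# Hypothetical excerpt of entity_attributes_to_index: collection -> attributes.
entity_attributes_to_index: dict[str, set[str]] = {
    "biosample_set": {"ecosystem", "collection_date.has_raw_value"},
}

# To index another biosample attribute, add it to that collection's set.
# "ecosystem_category" is an illustrative slot name only.
entity_attributes_to_index["biosample_set"].add("ecosystem_category")


def index_specs(mapping: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Flatten the mapping into sorted (collection, attribute) pairs, one per index."""
    return sorted(
        (collection, attr)
        for collection, attrs in mapping.items()
        for attr in attrs
    )
```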

Currently, because of a limitation of MongoDB, the database that backs the API, a single collection can have no more than 64 indexes, and the default index on _id counts toward that limit.
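Before adding attributes, it can help to check how much of that budget a collection has left. The following is a hypothetical check, not part of nmdc_runtime: remaining_index_budget counts one index per listed attribute, one for _id, and any indexes created by other means, which the caller must supply via other_indexes (an assumption, since the mapping alone cannot know about them).

```python
from typing import Optional

MAX_INDEXES_PER_COLLECTION = 64  # MongoDB's per-collection limit


def remaining_index_budget(
    mapping: dict[str, set[str]],
    other_indexes: Optional[dict[str, int]] = None,
) -> dict[str, int]:
    """Return how many more indexes each collection in the mapping can take.

    Raises ValueError if any collection would exceed the limit.
    """
    other_indexes = other_indexes or {}
    budget = {}
    for collection, attrs in mapping.items():
        # One per attribute, +1 for the mandatory _id index, + any others.
        used = len(attrs) + 1 + other_indexes.get(collection, 0)
        if used > MAX_INDEXES_PER_COLLECTION:
            raise ValueError(
                f"{collection}: {used} indexes exceeds {MAX_INDEXES_PER_COLLECTION}"
            )
        budget[collection] = MAX_INDEXES_PER_COLLECTION - used
    return budget
```

For example, a biosample_set entry with two attributes leaves 61 slots (64 minus 2 attributes minus _id).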

When the API server code is redeployed, the ensure_indexes startup hook in the nmdc_runtime.api.main module runs; it fetches entity_attributes_to_index and ensures that the corresponding indexes exist.
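A minimal sketch of what such a hook might do, assuming pymongo, where db[name] returns a Collection and Collection.create_index accepts a single key name and builds an ascending index. This is not the actual ensure_indexes implementation; ensure_indexes_sketch is a hypothetical name. MongoDB does not recreate an index that already exists with the same specification, which is what makes running it on every deploy safe.

```python
from typing import Any


def ensure_indexes_sketch(db: Any, mapping: dict[str, set[str]]) -> None:
    """Create one ascending single-field index per (collection, attribute) pair.

    db is assumed to behave like a pymongo Database: db[name] returns an object
    with a create_index(key) method.
    """
    for collection_name, attrs in mapping.items():
        for attr in sorted(attrs):  # sorted for a deterministic creation order
            db[collection_name].create_index(attr)
```

With a live database this would be called as, e.g., ensure_indexes_sketch(MongoClient(...)["nmdc"], entity_attributes_to_index), where the database name is an assumption.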