Using MongoDB filters

This guide covers Python-first filtering with NMDC API Utilities. Filters use MongoDB query syntax and can be passed directly as JSON strings or built with helper methods.

Quick Start

[1]:

from nmdc_api_utilities import BiosampleSearch

client = BiosampleSearch()
results = client.get_record_by_filter('{"id": "nmdc:bsm-11-006pnx90"}')

print(f"Records found: {len(results)}")
if results:
    print(f"First record ID: {results[0].get('id')}")

Records found: 1
First record ID: nmdc:bsm-11-006pnx90

Filter Formats

MongoDB-style JSON filter strings are accepted by get_record_by_filter and get_records.

Examples:

Exact match: {"id": "nmdc:sty-11-8fb6t785"}
Case-insensitive partial match: {"name": {"$regex": "forest", "$options": "i"}}
Multiple criteria (implicit AND): {"ecosystem_category": "Plants", "lat_lon": {"$exists": true}}
Nested field (dot notation): {"env_broad_scale.has_raw_value": "Forest biome"}
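
The filter strings above can also be generated from Python dictionaries with the standard-library json module, which handles quoting and converts Python's True to JSON's true. The following is a minimal sketch that only builds strings and does not call the NMDC API; the variable names are illustrative.

```python
import json

# Each filter is written as a plain Python dict, then serialized to JSON.
exact = {"id": "nmdc:sty-11-8fb6t785"}
partial = {"name": {"$regex": "forest", "$options": "i"}}
multiple = {"ecosystem_category": "Plants", "lat_lon": {"$exists": True}}
nested = {"env_broad_scale.has_raw_value": "Forest biome"}

for label, query in [
    ("exact", exact),
    ("partial", partial),
    ("multiple", multiple),
    ("nested", nested),
]:
    # json.dumps emits double-quoted keys and lowercase true/false.
    print(f"{label}: {json.dumps(query)}")
```

Each resulting string should be suitable for passing wherever a filter string is expected, for example to get_record_by_filter.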

Supported MongoDB Operators

$regex for pattern matching
$options for regex options (for example, "i")
$exists for field presence
$in for matching a field against any value in a supplied array
$gte and $lte for range filters
$and and $or for compound logic
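
As an illustration of how these operators combine, the sketch below builds one compound filter string. The field name lat_lon.latitude and the value "Terrestrial" are assumptions made for the example and should be checked against the schema documentation; the code only constructs the string and does not query the API.

```python
import json

# Field names and values here are illustrative assumptions, not confirmed schema facts.
compound = {
    "$and": [
        # $in: the field may equal any value in the array.
        {"ecosystem_category": {"$in": ["Plants", "Terrestrial"]}},
        # $gte / $lte: inclusive numeric range.
        {"lat_lon.latitude": {"$gte": 30, "$lte": 50}},
        # $or: at least one of the nested conditions must hold.
        {
            "$or": [
                {"name": {"$regex": "forest", "$options": "i"}},
                {"env_broad_scale.has_raw_value": "Forest biome"},
            ]
        },
    ]
}

filter_str = json.dumps(compound)
print(filter_str)
```

Writing the filter as a nested dict first keeps the brackets balanced and makes it easier to add or remove conditions than editing a long JSON string by hand.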

Direct Filter Usage

[2]:

from nmdc_api_utilities import BiosampleSearch

client = BiosampleSearch()

filter_str = '{"name": {"$regex": "forest", "$options": "i"}}'
records = client.get_record_by_filter(filter_str)

print(f"Matching biosamples: {len(records)}")

Matching biosamples: 25

Build Filters Programmatically

Use DataProcessing.build_filter to create filters from Python dictionaries. By default, it builds case-insensitive regex filters and escapes special characters.

[3]:

from nmdc_api_utilities import BiosampleSearch, DataProcessing

client = BiosampleSearch()
dp = DataProcessing()

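# Default behavior: case-insensitive regex filter, with special characters such as "(" escaped.
filter_str = dp.build_filter({"name": "GC-MS (2009)"})
records = client.get_record_by_filter(filter_str)

# exact_match=True requests a strict equality filter instead of a regex.
exact_filter = dp.build_filter({"ecosystem_category": "Plants"}, exact_match=True)
exact_records = client.get_record_by_filter(exact_filter)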

print(f"Regex-style matches: {len(records)}")
print(f"Exact matches: {len(exact_records)}")

Regex-style matches: 0
Exact matches: 25

Attribute-Based Query Helper

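For single-attribute lookups, use get_record_by_attribute, which takes an attribute name and value instead of a filter string. Pass exact_match=True to require strict equality.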

[4]:

from nmdc_api_utilities import StudySearch

client = StudySearch()

partial = client.get_record_by_attribute(
    attribute_name="name",
    attribute_value="tropical soil",
)

exact = client.get_record_by_attribute(
    attribute_name="ecosystem_category",
    attribute_value="Plants",
    exact_match=True,
)

print(f"Partial matches: {len(partial)}")
print(f"Exact matches: {len(exact)}")

Partial matches: 1
Exact matches: 1

Troubleshooting

Filter returns no results:

Confirm field names against schema documentation.
Try a $regex filter with "$options": "i" instead of strict equality.
Confirm the selected collection class matches your target records.
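
One way to try the regex suggestion is to rewrite an exact-match filter into a case-insensitive one before retrying the query. The helper below, loosen_filter, is hypothetical and not part of NMDC API Utilities; it is a sketch that assumes Python's re.escape produces escapes the server-side regex engine also accepts.

```python
import json
import re


def loosen_filter(filter_str: str) -> str:
    """Turn top-level string equality tests into case-insensitive regex tests."""
    exact = json.loads(filter_str)
    loosened = {}
    for field, value in exact.items():
        if isinstance(value, str):
            # re.escape keeps characters such as "(" or "." literal in the pattern.
            loosened[field] = {"$regex": re.escape(value), "$options": "i"}
        else:
            # Operator expressions and non-string values are left untouched.
            loosened[field] = value
    return json.dumps(loosened)


print(loosen_filter('{"ecosystem_category": "Plants"}'))
# {"ecosystem_category": {"$regex": "Plants", "$options": "i"}}
```

If the loosened filter returns records where the strict one did not, the stored value likely differs in case or contains extra text around the search term.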

JSON syntax errors:

Use double quotes for keys and string values.
Validate JSON structure before passing filter strings.
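
A filter string can be checked locally with the standard-library json module before it is sent. The helper name validate_filter is hypothetical; this is a minimal sketch that only verifies JSON syntax and that the top level is an object, not that the field names or operators are valid.

```python
import json


def validate_filter(filter_str: str) -> dict:
    """Parse a filter string, raising ValueError if it is not a JSON object."""
    try:
        parsed = json.loads(filter_str)
    except json.JSONDecodeError as err:
        # Report where parsing failed so the typo is easy to locate.
        raise ValueError(
            f"Invalid filter JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err
    if not isinstance(parsed, dict):
        raise ValueError("A filter must be a JSON object")
    return parsed


# Double-quoted keys and values parse cleanly.
validate_filter('{"id": "nmdc:bsm-11-006pnx90"}')

# Single quotes are not valid JSON and are rejected.
try:
    validate_filter("{'id': 'nmdc:bsm-11-006pnx90'}")
except ValueError as err:
    print(err)
```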

Special character issues:

Prefer build_filter for automatic escaping.
If writing raw regex filters, escape special characters carefully.
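
To see why escaping matters, the sketch below uses Python's re module as a local stand-in for the server-side regex engine (an assumption; the two engines are similar but not identical). It checks the name "GC-MS (2009)" with and without escaping, then builds a raw filter string by hand.

```python
import json
import re

name = "GC-MS (2009)"

# Unescaped, the parentheses form a regex group, so the literal text does not match.
print(re.search(name, name))  # None

# Escaped, every special character is treated literally.
pattern = re.escape(name)
print(re.search(pattern, name) is not None)  # True

# json.dumps adds the extra backslashes that JSON string escaping requires.
filter_str = json.dumps({"name": {"$regex": pattern, "$options": "i"}})
print(filter_str)
```

Building the filter through json.dumps avoids having to double the backslashes manually, which is a common source of errors in hand-written regex filter strings.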