Using MongoDB filters

This guide covers Python-first filtering with NMDC API Utilities. Filters use MongoDB query syntax and can be passed directly or built with helper methods.


Quick Start

[1]:
from nmdc_api_utilities import BiosampleSearch

client = BiosampleSearch()
results = client.get_record_by_filter('{"id": "nmdc:bsm-11-006pnx90"}')

print(f"Records found: {len(results)}")
if results:
    print(f"First record ID: {results[0].get('id')}")
Records found: 1
First record ID: nmdc:bsm-11-006pnx90

Filter Formats

MongoDB-style JSON filter strings are accepted by get_record_by_filter and get_records.

Examples:

  • Exact match: {"id": "nmdc:sty-11-8fb6t785"}

  • Case-insensitive partial match: {"name": {"$regex": "forest", "$options": "i"}}

  • Multiple criteria (implicit AND): {"ecosystem_category": "Plants", "lat_lon": {"$exists": true}}

  • Nested field (dot notation): {"env_broad_scale.has_raw_value": "Forest biome"}

Supported MongoDB Operators

  • $regex for pattern matching

  • $options for regex flags (for example, "i" for case-insensitive matching)

  • $exists for field presence

  • $in for matching a field against any value in a given list

  • $gte and $lte for range filters

  • $and and $or for compound logic
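
The operators above can be combined in one filter. The following is a minimal sketch that writes the filter as a Python dictionary and serializes it with json.dumps, so that quoting and booleans (True becomes true) come out as valid JSON. The field names and values used here (ecosystem_category values, depth.has_numeric_value) are illustrative assumptions, not confirmed schema paths; check them against the schema before use.

```python
import json

# Illustrative field names and values only; verify them against the NMDC schema.
in_filter = {"ecosystem_category": {"$in": ["Plants", "Terrestrial"]}}

# Range filter: both bounds are inclusive with $gte / $lte.
range_filter = {"depth.has_numeric_value": {"$gte": 0, "$lte": 10}}

# Compound logic: the $in clause AND (name matches "forest" OR lat_lon is present).
compound_filter = {
    "$and": [
        in_filter,
        {
            "$or": [
                {"name": {"$regex": "forest", "$options": "i"}},
                {"lat_lon": {"$exists": True}},
            ]
        },
    ]
}

# json.dumps emits double-quoted keys and lowercase true/false.
filter_str = json.dumps(compound_filter)
range_str = json.dumps(range_filter)
print(filter_str)
print(range_str)
```

The resulting strings can then be passed wherever a filter string is accepted, for example as the argument to get_record_by_filter.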

Direct Filter Usage

[2]:
from nmdc_api_utilities import BiosampleSearch

client = BiosampleSearch()

filter_str = '{"name": {"$regex": "forest", "$options": "i"}}'
records = client.get_record_by_filter(filter_str)

print(f"Matching biosamples: {len(records)}")
Matching biosamples: 25

Build Filters Programmatically

Use DataProcessing.build_filter to create filter strings from Python dictionaries. By default, it builds case-insensitive regex filters and escapes special characters; pass exact_match=True for strict equality.

[3]:
from nmdc_api_utilities import BiosampleSearch, DataProcessing

client = BiosampleSearch()
dp = DataProcessing()

filter_str = dp.build_filter({"name": "GC-MS (2009)"})
records = client.get_record_by_filter(filter_str)

exact_filter = dp.build_filter({"ecosystem_category": "Plants"}, exact_match=True)
exact_records = client.get_record_by_filter(exact_filter)

print(f"Regex-style matches: {len(records)}")
print(f"Exact matches: {len(exact_records)}")
Regex-style matches: 0
Exact matches: 25

Attribute-Based Query Helper

For straightforward single-attribute lookups, use get_record_by_attribute. It matches partially by default; pass exact_match=True for strict equality.

[4]:
from nmdc_api_utilities import StudySearch

client = StudySearch()

partial = client.get_record_by_attribute(
    attribute_name="name",
    attribute_value="tropical soil",
)

exact = client.get_record_by_attribute(
    attribute_name="ecosystem_category",
    attribute_value="Plants",
    exact_match=True,
)

print(f"Partial matches: {len(partial)}")
print(f"Exact matches: {len(exact)}")
Partial matches: 1
Exact matches: 1

Pagination and Performance

  • Use max_page_size to cap the number of records returned per page during iterative exploration.

  • Use all_pages=True only when full export is required.

  • Use narrow filters and limit the returned fields with the fields parameter where possible.

[5]:
from nmdc_api_utilities.biosample_search import BiosampleSearch

client = BiosampleSearch()

records = client.get_records(
    filter='{"ecosystem_category": "Plants"}',
    fields="id,name,lat_lon",
    max_page_size=50,
    all_pages=False,
)

print(f"Page records fetched: {len(records)}")
Page records fetched: 50

Troubleshooting

Filter returns no results:

  • Confirm field names against the schema documentation.

  • Try a case-insensitive $regex match (with "$options": "i") instead of strict equality.

  • Confirm the selected collection class matches your target records.

JSON syntax errors:

  • Use double quotes for keys and string values.

  • Validate JSON structure before passing filter strings.
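
One way to validate a filter string before sending it is to parse it locally with Python's json module. The sketch below is an assumption about how you might do this; validate_filter is a hypothetical helper, not part of NMDC API Utilities.

```python
import json


def validate_filter(filter_str: str) -> dict:
    """Parse a filter string; raise ValueError with the error position if invalid.

    Hypothetical helper for illustration, not a library function.
    """
    try:
        parsed = json.loads(filter_str)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"Invalid filter JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err
    # A MongoDB-style filter is a JSON object, not a list or a bare value.
    if not isinstance(parsed, dict):
        raise ValueError("A filter must be a JSON object")
    return parsed


# Double-quoted keys and values parse cleanly.
validate_filter('{"id": "nmdc:bsm-11-006pnx90"}')

# Single quotes are not valid JSON, so this is rejected locally.
try:
    validate_filter("{'id': 'nmdc:bsm-11-006pnx90'}")
except ValueError as err:
    print(err)
```

Catching the error locally gives a line and column for the problem before any request is made.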

Special character issues:

  • Prefer build_filter for automatic escaping.

  • If writing raw regex filters, escape special characters carefully.
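
If you do write a raw regex filter, the standard library's re.escape can handle the escaping. This is a sketch under the assumption that Python's escaping rules are acceptable to the server-side regex engine for common punctuation; it is not a description of how build_filter works internally.

```python
import json
import re

raw_value = "GC-MS (2009)"

# Unescaped, the parentheses form a regex group, so the pattern
# would match "GC-MS 2009" rather than the literal text.
escaped = re.escape(raw_value)

# Build the filter from a dict so the backslashes are JSON-encoded correctly.
regex_filter = json.dumps({"name": {"$regex": escaped, "$options": "i"}})
print(regex_filter)
```

Serializing with json.dumps matters here: each backslash in the pattern must be doubled inside a JSON string, which is easy to get wrong by hand.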