Using MongoDB filters
This guide covers Python-first filtering with NMDC API Utilities. Filters use MongoDB query syntax and can be passed directly or built with helper methods.
Quick Start
[1]:
from nmdc_api_utilities import BiosampleSearch
client = BiosampleSearch()
results = client.get_record_by_filter('{"id": "nmdc:bsm-11-006pnx90"}')
print(f"Records found: {len(results)}")
if results:
    print(f"First record ID: {results[0].get('id')}")
Records found: 1
First record ID: nmdc:bsm-11-006pnx90
Filter Formats
MongoDB-style JSON filter strings are accepted by get_record_by_filter and get_records.
Examples:
Exact match:
{"id": "nmdc:sty-11-8fb6t785"}
Case-insensitive partial match:
{"name": {"$regex": "forest", "$options": "i"}}
Multiple criteria (implicit AND):
{"ecosystem_category": "Plants", "lat_lon": {"$exists": true}}
Nested field (dot notation):
{"env_broad_scale.has_raw_value": "Forest biome"}
Supported MongoDB Operators
$regex for pattern matching
$options for regex options (for example, "i")
$exists for field presence
$in for matching any value in an array
$gte and $lte for range filters
$and and $or for compound logic
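One way to avoid quoting mistakes when combining these operators is to write the filter as a Python dictionary and serialize it with json.dumps. The sketch below is illustrative only: the field depth.has_numeric_value and the value "Terrestrial" are assumptions made for the example and do not come from this guide, so check them against the schema documentation before use.

```python
import json

# Match any of several values with $in ("Terrestrial" is an assumed value).
in_filter = {"ecosystem_category": {"$in": ["Plants", "Terrestrial"]}}

# Range filter with $gte and $lte. The field name is an assumption.
range_filter = {"depth.has_numeric_value": {"$gte": 0, "$lte": 10}}

# Compound logic: an $or clause nested inside $and.
compound_filter = {
    "$and": [
        {"lat_lon": {"$exists": True}},
        {
            "$or": [
                {"name": {"$regex": "forest", "$options": "i"}},
                {"env_broad_scale.has_raw_value": "Forest biome"},
            ]
        },
    ]
}

# json.dumps emits double-quoted keys and converts Python True to JSON true.
filter_strings = [json.dumps(f) for f in (in_filter, range_filter, compound_filter)]
for filter_str in filter_strings:
    print(filter_str)
```

Each resulting string has the same shape as the filter strings passed to get_record_by_filter elsewhere in this guide.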
Direct Filter Usage
[2]:
from nmdc_api_utilities import BiosampleSearch
client = BiosampleSearch()
filter_str = '{"name": {"$regex": "forest", "$options": "i"}}'
records = client.get_record_by_filter(filter_str)
print(f"Matching biosamples: {len(records)}")
Matching biosamples: 25
Build Filters Programmatically
Use DataProcessing.build_filter to create filters from Python dictionaries. By default, it builds case-insensitive regex filters and escapes special characters.
[3]:
from nmdc_api_utilities import BiosampleSearch, DataProcessing
client = BiosampleSearch()
dp = DataProcessing()
filter_str = dp.build_filter({"name": "GC-MS (2009)"})
records = client.get_record_by_filter(filter_str)
exact_filter = dp.build_filter({"ecosystem_category": "Plants"}, exact_match=True)
exact_records = client.get_record_by_filter(exact_filter)
print(f"Regex-style matches: {len(records)}")
print(f"Exact matches: {len(exact_records)}")
Regex-style matches: 0
Exact matches: 25
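To see why escaping matters for a value such as "GC-MS (2009)", the following standalone sketch approximates an escaped, case-insensitive regex filter using only the standard library. It is not the implementation of build_filter, whose exact output is not shown in this guide, and the helper name sketch_regex_filter is hypothetical. Python's re.escape is used for illustration; MongoDB applies its own regex engine, so treat the escaped pattern as an approximation.

```python
import json
import re


def sketch_regex_filter(field: str, value: str) -> str:
    """Hypothetical helper: build an escaped, case-insensitive regex filter string."""
    # re.escape backslash-escapes regex metacharacters such as ( ) and -.
    pattern = re.escape(value)
    return json.dumps({field: {"$regex": pattern, "$options": "i"}})


value = "GC-MS (2009)"

# Unescaped, the parentheses form a group, so the pattern would match
# "GC-MS 2009" but not the literal text "GC-MS (2009)".
unescaped_hit = re.search(value, value, re.IGNORECASE)

# Escaped, the pattern matches the literal text.
escaped_hit = re.search(re.escape(value), value, re.IGNORECASE)

print(unescaped_hit is None)
print(escaped_hit is not None)
print(sketch_regex_filter("name", value))
```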
Attribute-Based Query Helper
For straightforward attribute lookups, use get_record_by_attribute.
[4]:
from nmdc_api_utilities import StudySearch
client = StudySearch()
partial = client.get_record_by_attribute(
    attribute_name="name",
    attribute_value="tropical soil",
)
exact = client.get_record_by_attribute(
    attribute_name="ecosystem_category",
    attribute_value="Plants",
    exact_match=True,
)
print(f"Partial matches: {len(partial)}")
print(f"Exact matches: {len(exact)}")
Partial matches: 1
Exact matches: 1
Pagination and Performance
Use max_page_size to tune result size for iterative exploration.
Use all_pages=True only when full export is required.
Use narrow filters and projection fields where possible.
[5]:
from nmdc_api_utilities.biosample_search import BiosampleSearch
client = BiosampleSearch()
records = client.get_records(
    filter='{"ecosystem_category": "Plants"}',
    fields="id,name,lat_lon",
    max_page_size=50,
    all_pages=False,
)
print(f"Page records fetched: {len(records)}")
Page records fetched: 50
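The trade-off between fetching a single page and running a full export can be pictured with a generic paging loop. The sketch below does not call the NMDC API and is not how the library implements paging: fetch_page, the integer token scheme, and the in-memory data are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Page = Tuple[List[Dict], Optional[int]]

# Hypothetical in-memory data standing in for a remote collection.
FAKE_RECORDS = [{"id": f"rec-{i}"} for i in range(120)]


def fetch_page(token: Optional[int], page_size: int) -> Page:
    """Hypothetical fetcher: return one page of records and the next token."""
    start = token or 0
    end = start + page_size
    next_token = end if end < len(FAKE_RECORDS) else None
    return FAKE_RECORDS[start:end], next_token


def iter_records(
    fetch: Callable[[Optional[int], int], Page],
    page_size: int,
    all_pages: bool,
) -> Iterator[Dict]:
    """Yield records from the first page only, or from every page."""
    token: Optional[int] = None
    while True:
        records, token = fetch(token, page_size)
        yield from records
        # Stop after one page unless a full export was requested.
        if not all_pages or token is None:
            break


first_page = list(iter_records(fetch_page, page_size=50, all_pages=False))
everything = list(iter_records(fetch_page, page_size=50, all_pages=True))
print(len(first_page), len(everything))
```

With a page size of 50, the single-page call stops after one request, while the full export needs three requests to collect all 120 stand-in records.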
Troubleshooting
Filter returns no results:
Confirm field names against schema documentation.
Try a $regex filter with "$options": "i" instead of strict equality.
Confirm the selected collection class matches your target records.
JSON syntax errors:
Use double quotes for keys and string values.
Validate JSON structure before passing filter strings.
Special character issues:
Prefer build_filter for automatic escaping.
If writing raw regex filters, escape special characters carefully.
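A filter string can be checked locally before it is sent. The following is a minimal sketch using the standard json module; the helper name validate_filter is hypothetical and is not part of NMDC API Utilities.

```python
import json


def validate_filter(filter_str: str) -> dict:
    """Hypothetical helper: parse a filter string or raise a readable error."""
    try:
        parsed = json.loads(filter_str)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"Invalid filter JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
        ) from exc
    if not isinstance(parsed, dict):
        raise ValueError("A filter must be a JSON object")
    return parsed


good = validate_filter('{"name": {"$regex": "forest", "$options": "i"}}')

# Single-quoted keys and values are valid Python but not valid JSON.
try:
    validate_filter("{'name': 'forest'}")
    single_quotes_ok = True
except ValueError as err:
    single_quotes_ok = False
    print(err)
```

Running a check like this first surfaces quoting mistakes with a line and column number instead of an empty result set or a request error.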