sample_annotator package

Submodules

sample_annotator.report_model module

class sample_annotator.report_model.AnnotationMultiSampleReport(reports: List[AnnotationReport] | None = None)

Bases: object

Combined report for a set of samples

all_outputs() → List[Dict[str, Any]]
as_dataframe()
reports: List[AnnotationReport] = None
class sample_annotator.report_model.AnnotationReport(messages: List[Message] | None = None, package: PackageCombo | None = None, input: Dict[str, Any] | None = None, output: Dict[str, Any] | None = None, sample_id: str | None = None)

Bases: object

Annotation report for a single sample

add_message(*args, **kwargs)
annotation_sufficiency_score = 0.0
as_dataframe()
input: Dict[str, Any] = None
max_severity()
messages: List[Message] = None
messages_by_category() → Dict
output: Dict[str, Any] = None
package: PackageCombo = None
passes()
sample_id: str = None
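The aggregate methods of AnnotationReport (max_severity, messages_by_category, passes) are listed here without descriptions. The following is a minimal sketch of how such aggregates could be computed. It uses stand-in classes (Msg, Report) instead of the real ones, and the empty-report severity of 0 and the pass threshold are assumptions, not documented behavior.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Msg:
    """Hypothetical stand-in for Message (only the fields used here)."""
    description: str
    severity: int = 1
    category: str = "unclassified"


@dataclass
class Report:
    """Hypothetical stand-in for AnnotationReport."""
    messages: List[Msg] = field(default_factory=list)

    def max_severity(self) -> int:
        # Assumption: an empty report has severity 0.
        return max((m.severity for m in self.messages), default=0)

    def messages_by_category(self) -> Dict[str, List[Msg]]:
        # Group messages under their category value.
        grouped: Dict[str, List[Msg]] = defaultdict(list)
        for m in self.messages:
            grouped[m.category].append(m)
        return dict(grouped)

    def passes(self, threshold: int = 3) -> bool:
        # Assumption: a report passes when no message reaches the threshold.
        return self.max_severity() < threshold


report = Report([Msg("depth has no unit", 2, "units"),
                 Msg("unknown key 'foo'", 1, "unknown-field"),
                 Msg("unit 'mtr' not recognized", 3, "units")])
```

With these stand-ins, the report above groups two messages under "units" and does not pass, because one message reaches the assumed threshold.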
class sample_annotator.report_model.Category(value)

Bases: Enum

An enumeration.

BadNull = 'bad-null'
ControlledVocabulary = 'controlled-vocabulary'
Core = 'core'
Geo = 'geo'
Identifier = 'identifier'
Inapplicable = 'inapplicable'
MeasurementSyntax = 'measurement-syntax'
MissingCore = 'missing-core'
Unclassified = 'unclassified'
Units = 'units'
UnknownField = 'unknown-field'
static list()
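The return value of the static list() method is not documented here. As a sketch only, a static helper of this kind on an Enum often returns the member values; the example below uses an abridged copy of the members listed above and assumes that behavior.

```python
from enum import Enum
from typing import List


class Category(Enum):
    """Abridged copy of the documented members."""
    Unclassified = "unclassified"
    Units = "units"
    BadNull = "bad-null"

    @staticmethod
    def list() -> List[str]:
        # Assumption: list() returns member values, not member names.
        return [c.value for c in Category]


values = Category.list()
by_value = Category("bad-null")  # lookup by value is standard Enum behavior
```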
class sample_annotator.report_model.Message(description: str | None = None, severity: int = 1, was_repaired: bool | None = None, category: Category = Category.Unclassified, field: str | None = None)

Bases: object

Individual report message

as_dict() → Dict
category: Category = 'unclassified'
description: str = None
field: str = None
severity: int = 1
was_repaired: bool = None
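To illustrate the shape of a message, the sketch below mirrors the documented fields and defaults in a stand-in dataclass (MessageSketch, a hypothetical name). The exact keys returned by the real as_dict() are not documented here, so the plain field-to-value mapping is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class MessageSketch:
    """Stand-in mirroring the documented fields and defaults of Message."""
    description: Optional[str] = None
    severity: int = 1
    was_repaired: Optional[bool] = None
    category: str = "unclassified"  # the real field is a Category enum
    field: Optional[str] = None

    def as_dict(self) -> Dict[str, Any]:
        # Assumption: a plain field-name -> value mapping.
        return asdict(self)


msg = MessageSketch(description="null value normalized",
                    was_repaired=True, category="bad-null", field="depth")
row = msg.as_dict()
```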
class sample_annotator.report_model.PackageCombo(environmental_package: str | None = None, checklist: str | None = None)

Bases: object

Tuple of environmental package and checklist

checklist: str = None
environmental_package: str = None

sample_annotator.sample_annotator module

class sample_annotator.sample_annotator.SampleAnnotator(target_class: ClassDefinition | None = None, geoengine: GeoEngine = GeoEngine(googlemaps_api_key=None), measurement_engine: MeasurementEngine = MeasurementEngine(), schema: SampleSchema = SampleSchema(object=None))

Bases: object

TODO

annotate(sample: Dict[str, Any], study: Dict[str, Any] | None = None) → AnnotationReport

Annotate a sample

Returns an AnnotationReport object that includes a transformed sample representation, plus reports of all errors and warnings found and any repairs made.

Performs a sequential series of tidying activities.
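The sequential structure can be sketched as a list of steps that each receive the sample and a shared log, much as the documented tidy_* methods each receive the sample and the report. The step functions and annotate_sketch below are hypothetical stand-ins, and the order of steps in the real annotate() is not stated in this reference.

```python
from typing import Any, Callable, Dict, List, Tuple

Sample = Dict[str, Any]
Step = Callable[[Sample, List[str]], None]


def strip_whitespace(sample: Sample, log: List[str]) -> None:
    """Hypothetical step: trim string values in place."""
    for key, value in sample.items():
        if isinstance(value, str) and value != value.strip():
            sample[key] = value.strip()
            log.append(f"trimmed whitespace in {key}")


def lowercase_keys(sample: Sample, log: List[str]) -> None:
    """Hypothetical step: lower-case every key in place."""
    for key in list(sample):
        if key != key.lower():
            sample[key.lower()] = sample.pop(key)
            log.append(f"renamed {key} to {key.lower()}")


def annotate_sketch(sample: Sample, steps: List[Step]) -> Tuple[Sample, List[str]]:
    """Run each step in order on a copy; return the result and the log."""
    out: Sample = dict(sample)
    log: List[str] = []
    for step in steps:
        step(out, log)
    return out, log


original = {"Depth": " 10 m "}
out, log = annotate_sketch(original, [strip_whitespace, lowercase_keys])
```

Working on a copy keeps the input available for comparison with the transformed output, which matches the documented report carrying both input and output fields.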

annotate_all(samples: List[Dict[str, Any]], study: Dict[str, Any] | None = None) → AnnotationMultiSampleReport

Annotate a list of samples

geoengine: GeoEngine = GeoEngine(googlemaps_api_key=None)
infer_package(sample: Dict[str, Any], report: AnnotationReport)

Infers the environmental package / checklist combination, either from directly asserted fields or by other means

measurement_engine: MeasurementEngine = MeasurementEngine()
perform_geolocation_inference(sample: Dict[str, Any], report: AnnotationReport)

Performs inference using geolocation information

perform_inference(sample: Dict[str, Any], report: AnnotationReport)

Performs machine learning inference

perform_text_mining(sample: Dict[str, Any], report: AnnotationReport)

Performs text mining

schema: SampleSchema = SampleSchema(object=None)
target_class: ClassDefinition = None
tidy_enumerations(sample: Dict[str, Any], report: AnnotationReport)

Tidies enumeration fields

tidy_keys(sample: Dict[str, Any], report: AnnotationReport)

Performs tidying on all keys/fields/slots in the sample dictionary

  • uses mappings, e.g. between MIxS 5 and MIxS 6

  • performs case normalization
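The two operations listed above can be sketched as follows. KEY_MAP is a hypothetical rename table with a made-up entry (it is not taken from the actual MIxS mappings), and tidy_keys_sketch is a stand-in that returns notes instead of writing to an AnnotationReport.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical old-name -> new-name table; not taken from the MIxS mappings.
KEY_MAP = {"old_field": "new_field"}


def tidy_keys_sketch(sample: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy with normalized keys plus notes on what changed."""
    tidied: Dict[str, Any] = {}
    notes: List[str] = []
    for key, value in sample.items():
        new_key = key.strip().lower().replace(" ", "_")  # case normalization
        new_key = KEY_MAP.get(new_key, new_key)          # version mapping
        if new_key != key:
            notes.append(f"{key} -> {new_key}")
        tidied[new_key] = value
    return tidied, notes


tidied, notes = tidy_keys_sketch({"Old Field": 1, "depth": "5 m"})
```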

tidy_measurements(sample: Dict[str, Any], report: AnnotationReport)

Tidies measurement fields
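What tidying a measurement involves is not specified here. One plausible reading, sketched below under that assumption, is splitting a value such as "10m" into a number and a unit and rewriting it in a uniform "<value> <unit>" form. The pattern and both function names are hypothetical and do not reflect the MeasurementEngine API.

```python
import re
from typing import Optional, Tuple

# Assumption: a measurement is a number followed by an optional unit string.
MEASUREMENT = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*([^\d\s].*?)?\s*$")


def parse_measurement(text: str) -> Optional[Tuple[float, Optional[str]]]:
    """Split '10m' or '10.5 mg/L' into (value, unit); None if no match."""
    match = MEASUREMENT.match(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


def tidy_measurement(text: str) -> str:
    """Rewrite a parseable measurement as '<value> <unit>'; else unchanged."""
    parsed = parse_measurement(text)
    if parsed is None or parsed[1] is None:
        return text
    value, unit = parsed
    return f"{value:g} {unit}"
```

Values that do not parse, or that carry no unit, are returned unchanged so that a separate check can report them.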

tidy_nulls(sample: Dict[str, Any], report: AnnotationReport)

Normalizes null values to the EBI standard missing-value terms

https://ena-docs.readthedocs.io/en/latest/submit/samples/missing-values.html
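Null normalization of this kind can be sketched as a lookup from common null spellings to a standard term. The synonym table and the choice of target for each spelling are illustrative assumptions; the target strings should be checked against the page linked above, and tidy_nulls_sketch is a hypothetical stand-in, not the method itself.

```python
from typing import Any, Dict

# Illustrative table only. The targets should be checked against the
# missing-value terms on the page linked above.
NULL_SYNONYMS = {
    "": "not provided",
    "na": "not applicable",
    "n/a": "not applicable",
    "none": "not provided",
    "null": "not provided",
    "unknown": "not provided",
}


def tidy_nulls_sketch(sample: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy in which null-like values use the standard terms."""
    out: Dict[str, Any] = {}
    for key, value in sample.items():
        if value is None:
            out[key] = "not provided"
        elif isinstance(value, str):
            # Compare case-insensitively; keep non-null strings unchanged.
            out[key] = NULL_SYNONYMS.get(value.strip().lower(), value)
        else:
            out[key] = value
    return out


cleaned = tidy_nulls_sketch({"depth": "N/A", "ph": None, "temp": "12 C", "n": 0})
```

Note that the numeric 0 is kept as a real value; only None and recognized null spellings are rewritten.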

validate_identifier(sample: Dict[str, Any], report: AnnotationReport)

sample_annotator.sample_utils module

sample_annotator.sample_utils.create_tests(samples: List[Dict[str, Any]])

Takes normalized samples and uses them to create tests

Module contents