sample_annotator package
Submodules
sample_annotator.report_model module
- class sample_annotator.report_model.AnnotationMultiSampleReport(reports: List[AnnotationReport] | None = None)
Bases: object
Multi-report of a set of samples
- all_outputs() List[Dict[str, Any]]
- as_dataframe()
- reports: List[AnnotationReport] = None
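As a rough illustration of how a multi-sample report might gather per-sample results, the sketch below uses stand-in dataclasses (`ReportSketch` and `MultiReportSketch` are hypothetical names, not the real classes). That `all_outputs()` returns each report's `output` dictionary in report order is an assumption drawn from the signature alone.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ReportSketch:
    """Hypothetical stand-in for AnnotationReport (only two fields kept)."""
    sample_id: Optional[str] = None
    output: Optional[Dict[str, Any]] = None


@dataclass
class MultiReportSketch:
    """Hypothetical stand-in for AnnotationMultiSampleReport."""
    reports: List[ReportSketch] = field(default_factory=list)

    def all_outputs(self) -> List[Optional[Dict[str, Any]]]:
        # Assumption: one output dictionary per report, in report order.
        return [r.output for r in self.reports]


multi = MultiReportSketch(reports=[
    ReportSketch(sample_id="s1", output={"depth": "5 m"}),
    ReportSketch(sample_id="s2", output={"depth": "10 m"}),
])
outputs = multi.all_outputs()
```

The stand-in uses `default_factory=list` rather than a `None` default so that an empty multi-report can be iterated without a null check; the documented class defaults `reports` to `None`.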
- class sample_annotator.report_model.AnnotationReport(messages: List[Message] | None = None, package: PackageCombo | None = None, input: Dict[str, Any] | None = None, output: Dict[str, Any] | None = None, sample_id: str | None = None)
Bases: object
Annotation report for a single sample
- add_message(*args, **kwargs)
- annotation_sufficiency_score = 0.0
- as_dataframe()
- input: Dict[str, Any] = None
- max_severity()
- messages_by_category() Dict
- output: Dict[str, Any] = None
- package: PackageCombo = None
- passes()
- sample_id: str = None
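The member names above suggest a report that accumulates messages and then summarizes them. The sketch below is one possible reading, written with hypothetical stand-ins (`MessageSketch`, `ReportSketch`). The behavior of `max_severity()`, `messages_by_category()`, and `passes()`, including the `threshold` parameter, is assumed for illustration and is not confirmed by this reference.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MessageSketch:
    """Hypothetical stand-in for Message."""
    description: str
    severity: int = 1
    category: str = "unclassified"


@dataclass
class ReportSketch:
    """Hypothetical stand-in for AnnotationReport."""
    messages: List[MessageSketch] = field(default_factory=list)

    def add_message(self, *args, **kwargs) -> None:
        # Forward the arguments to the message constructor and store the result.
        self.messages.append(MessageSketch(*args, **kwargs))

    def max_severity(self) -> int:
        # Assumption: a report with no messages has severity 0.
        return max((m.severity for m in self.messages), default=0)

    def messages_by_category(self) -> Dict[str, List[MessageSketch]]:
        # Assumption: messages are grouped into lists keyed by category value.
        grouped = defaultdict(list)
        for m in self.messages:
            grouped[m.category].append(m)
        return dict(grouped)

    def passes(self, threshold: int = 3) -> bool:
        # Assumption: the report passes while the worst message stays
        # below a hypothetical severity threshold.
        return self.max_severity() < threshold


report = ReportSketch()
report.add_message("depth lacks a unit", severity=2, category="units")
report.add_message("unknown field 'foo'", category="unknown-field")
```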
- class sample_annotator.report_model.Category(value)
Bases: Enum
An enumeration.
- BadNull = 'bad-null'
- ControlledVocabulary = 'controlled-vocabulary'
- Core = 'core'
- Geo = 'geo'
- Identifier = 'identifier'
- Inapplicable = 'inapplicable'
- MeasurementSyntax = 'measurement-syntax'
- MissingCore = 'missing-core'
- Unclassified = 'unclassified'
- Units = 'units'
- UnknownField = 'unknown-field'
- static list()
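A string-valued enumeration with a static `list()` helper can be sketched as follows. `CategorySketch` is a hypothetical three-member subset of the values listed above, and the assumption that `list()` returns the string values in definition order is not confirmed by this reference.

```python
from enum import Enum
from typing import List


class CategorySketch(Enum):
    """Hypothetical subset of the documented Category values."""
    Units = "units"
    Geo = "geo"
    Unclassified = "unclassified"

    @staticmethod
    def list() -> List[str]:
        # Assumption: returns the string values in definition order.
        return [c.value for c in CategorySketch]


# Enum members can be looked up from their string value.
by_value = CategorySketch("geo")
values = CategorySketch.list()
```

Looking a member up by value, as in `CategorySketch("geo")`, is standard `Enum` behavior and is handy when category strings arrive from serialized reports.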
- class sample_annotator.report_model.Message(description: str | None = None, severity: int = 1, was_repaired: bool | None = None, category: Category = Category.Unclassified, field: str | None = None)
Bases: object
Individual report message
- as_dict() Dict
- description: str = None
- field: str = None
- severity: int = 1
- was_repaired: bool = None
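To show how a message with these fields might be flattened for tabular output, here is a minimal sketch. `MessageSketch` and `CategorySketch` are hypothetical stand-ins that mirror the documented field names and defaults; that `as_dict()` emits the category as its string value is an assumption.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Any, Dict, Optional


class CategorySketch(Enum):
    """Hypothetical two-member subset of Category."""
    Unclassified = "unclassified"
    Units = "units"


@dataclass
class MessageSketch:
    """Hypothetical stand-in for Message, same field names and defaults."""
    description: Optional[str] = None
    severity: int = 1
    was_repaired: Optional[bool] = None
    category: CategorySketch = CategorySketch.Unclassified
    field: Optional[str] = None

    def as_dict(self) -> Dict[str, Any]:
        # Copy every dataclass field, then replace the enum member with
        # its string value (assumed) so the record is plain data.
        record = {f.name: getattr(self, f.name) for f in fields(self)}
        record["category"] = self.category.value
        return record


msg = MessageSketch(
    description="unit missing for depth",
    severity=2,
    was_repaired=True,
    category=CategorySketch.Units,
    field="depth",
)
row = msg.as_dict()
```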
sample_annotator.sample_annotator module
- class sample_annotator.sample_annotator.SampleAnnotator(target_class: ClassDefinition | None = None, geoengine: GeoEngine = GeoEngine(googlemaps_api_key=None), measurement_engine: MeasurementEngine = MeasurementEngine(), schema: SampleSchema = SampleSchema(object=None))
Bases: object
TODO
- annotate(sample: Dict[str, Any], study: Dict[str, Any] | None = None) AnnotationReport
Annotate a sample
Returns an AnnotationReport object that includes a transformed sample representation, reports of all errors and warnings found, and any repairs made.
Performs a sequential series of tidy activities. Each report
- annotate_all(samples: List[Dict[str, Any]], study: Dict[str, Any] | None = None) AnnotationMultiSampleReport
Annotate a list of samples
- geoengine: GeoEngine = GeoEngine(googlemaps_api_key=None)
- infer_package(sample: Dict[str, Any], report: AnnotationReport)
Infer the environment package / checklist combo, either from directly asserted fields, or other means
- measurement_engine: MeasurementEngine = MeasurementEngine()
- perform_geolocation_inference(sample: Dict[str, Any], report: AnnotationReport)
Performs inference using geolocation information
- perform_inference(sample: Dict[str, Any], report: AnnotationReport)
Performs machine learning inference
- perform_text_mining(sample: Dict[str, Any], report: AnnotationReport)
Performs text mining
- schema: SampleSchema = SampleSchema(object=None)
- target_class: ClassDefinition = None
- tidy_enumerations(sample: Dict[str, Any], report: AnnotationReport)
Tidies enumeration fields
- tidy_keys(sample: Dict[str, Any], report: AnnotationReport)
Performs tidying on all keys/fields/slots in the sample dictionary
Uses mappings (e.g. between MIxS 5 and MIxS 6) and performs case normalization.
- tidy_measurements(sample: Dict[str, Any], report: AnnotationReport)
Tidies measurement fields
- tidy_nulls(sample: Dict[str, Any], report: AnnotationReport)
Normalizes null values to the EBI standard missing-value terms; see
https://ena-docs.readthedocs.io/en/latest/submit/samples/missing-values.html
- validate_identifier(sample: Dict[str, Any], report: AnnotationReport)
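The `(sample, report)` signature shared by the tidy methods suggests a pipeline in which each step edits the sample in place and logs what it changed. The sketch below illustrates that pattern with two steps only. It does not use the real `SampleAnnotator`: the `KEY_ALIASES` and `NULL_ALIASES` tables, the plain-list report, and the step order are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical lookup tables; the real mappings are not shown in this reference.
KEY_ALIASES = {"latitude_longitude": "lat_lon"}
NULL_ALIASES = {"": "not provided", "n/a": "not applicable", "na": "not applicable"}

Sample = Dict[str, Any]
Report = List[str]  # stand-in for AnnotationReport: a plain list of messages


def tidy_keys(sample: Sample, report: Report) -> None:
    """Case-normalize keys, then apply alias mappings."""
    for key in list(sample):  # snapshot, since keys are renamed in the loop
        new_key = key.strip().lower().replace(" ", "_")
        new_key = KEY_ALIASES.get(new_key, new_key)
        if new_key != key:
            sample[new_key] = sample.pop(key)
            report.append(f"renamed key {key!r} to {new_key!r}")


def tidy_nulls(sample: Sample, report: Report) -> None:
    """Replace ad hoc null spellings with a standard missing-value term."""
    for key, value in sample.items():
        if value is None:
            token = ""
        elif isinstance(value, str):
            token = value.strip().lower()
        else:
            continue  # numbers and other non-string values are left alone
        if token in NULL_ALIASES:
            sample[key] = NULL_ALIASES[token]
            report.append(f"normalized null value in {key!r}")


STEPS: List[Callable[[Sample, Report], None]] = [tidy_keys, tidy_nulls]


def annotate(sample: Sample) -> Tuple[Sample, Report]:
    """Run every step in order on a copy, so the input stays untouched."""
    tidied = dict(sample)
    report: Report = []
    for step in STEPS:
        step(tidied, report)
    return tidied, report


raw = {"Latitude Longitude": "38.98 -77.11", "Depth": "N/A", "ph": 7.2}
tidied, messages = annotate(raw)
```

Keeping each step as a separate function with the same signature means steps can be reordered or tested in isolation, which may be why the documented methods share the `(sample, report)` shape.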
sample_annotator.sample_utils module
- sample_annotator.sample_utils.create_tests(samples: List[Dict[str, Any]])
Takes normalized samples and uses them to create tests