Postprocess
- class ingredient_parser.en.postprocess.PostProcessor(sentence: str, labelled_tokens: list[LabelledToken], custom_units: dict[str, str], separate_names: bool = True, discard_isolated_stop_words: bool = True, string_units: bool = False, volumetric_units_system: str = 'us_customary', foundation_foods: bool = False)
Recipe ingredient sentence PostProcessor class.
Performs the postprocessing needed on the tokens, labels and label scores produced by the CRF model tagger, turning them into a coherent structure of parsed information.
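One step this kind of postprocessing typically involves is merging consecutive tokens that share a label into a single text fragment with an overall confidence. The following is a minimal sketch of that idea, not the library's implementation: the `Token` class, the `group_by_label` helper and the label names (`QTY`, `UNIT`, `NAME`, `PREP`) are assumptions for illustration, and averaging the token scores is just one plausible way to combine confidences.

```python
from dataclasses import dataclass
from itertools import groupby


# Hypothetical stand-in for a labelled token; the real LabelledToken
# class in ingredient_parser may have different fields.
@dataclass
class Token:
    text: str
    label: str
    score: float


def group_by_label(tokens: list[Token]) -> list[tuple[str, str, float]]:
    """Merge runs of adjacent tokens that share a label (illustrative helper).

    Returns one (label, joined text, mean score) tuple per run.
    """
    groups = []
    # groupby only merges *adjacent* items with equal keys.
    for label, run in groupby(tokens, key=lambda t: t.label):
        items = list(run)
        text = " ".join(t.text for t in items)
        confidence = sum(t.score for t in items) / len(items)
        groups.append((label, text, round(confidence, 4)))
    return groups
```

Because runs are merged only when adjacent, a label that reappears later in the sentence starts a new fragment rather than being appended to the earlier one.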
- Attributes:
- sentence
str Original ingredient sentence.
- labelled_tokens
list[LabelledToken] List of labelled tokens for the original ingredient sentence.
- custom_units
dict[str,str] Dict of custom units as plural: singular pairs.
- separate_names
bool, optional. If True and the sentence contains multiple alternative ingredients, return an IngredientText object for each ingredient name; otherwise return a single IngredientText object. Default is True.
- discard_isolated_stop_words
bool, optional. If True, isolated stop words are discarded from the name, preparation or comment fields. Default is True.
- string_units
bool, optional. If True, return all IngredientAmount units as strings. If False, convert IngredientAmount units to pint.Unit objects where possible. Default is False.
- imperial_units
bool, optional. If True, use imperial units instead of US customary units for pint.Unit objects for the following units: fluid ounce, cup, pint, quart, gallon. Default is False, which results in US customary units being used. This has no effect if string_units=True.
- foundation_foods
bool, optional. If True, populate the foundation_foods field of ParsedIngredient. Default is False, in which case the foundation_foods field is an empty list.
- consumed
list[int] List of indices of tokens consumed as part of postprocessing the tokens and labels.
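The unit-related attributes (custom_units, string_units and imperial_units) can be illustrated with a small hypothetical helper. This sketch assumes a tiny built-in plural-to-singular table and represents a resolved unit as a pint-style name string rather than a real pint.Unit, with imperial variants given an `imperial_` prefix. `normalise_unit`, `DEFAULT_UNITS` and `VOLUMETRIC` are illustrative names, not part of ingredient_parser.

```python
# Hypothetical plural -> singular table; a real parser would ship a larger one.
DEFAULT_UNITS = {
    "cups": "cup",
    "pints": "pint",
    "quarts": "quart",
    "gallons": "gallon",
    "fluid ounces": "fluid ounce",
    "grams": "gram",
}

# Units whose size differs between the US customary and imperial systems.
VOLUMETRIC = {"fluid ounce", "cup", "pint", "quart", "gallon"}


def normalise_unit(
    unit: str,
    custom_units: dict[str, str],
    string_units: bool = False,
    imperial_units: bool = False,
) -> str:
    """Hypothetical helper: resolve a unit to a pint-style name string."""
    if string_units:
        # Keep the unit exactly as written; no system conversion applies.
        return unit
    # Custom plural: singular pairs are merged last, so they extend or
    # override the defaults.
    lookup = {**DEFAULT_UNITS, **custom_units}
    singular = lookup.get(unit.lower(), unit.lower())
    name = singular.replace(" ", "_")
    if imperial_units and singular in VOLUMETRIC:
        # Assumed naming: imperial variants carry an "imperial_" prefix.
        return "imperial_" + name
    return name
```

Merging custom_units over the defaults lets user-supplied pairs take precedence, which is one reasonable rule; the actual class may resolve conflicts differently.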
- property parsed: ParsedIngredient
Return parsed ingredient data.
- Returns:
ParsedIngredient Object containing structured data from the sentence.
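Because parsed is a property, the structured result is produced when the attribute is accessed rather than passed in by the caller. The sketch below shows that pattern on a much-reduced, hypothetical class (`TinyPostProcessor`, `SimpleParsed` and `STOP_WORDS` are illustrative names). It takes already-grouped (label, text) fragments and applies a discard_isolated_stop_words-style filter, assuming "isolated" means a fragment made up only of stop words.

```python
from dataclasses import dataclass, field

# Assumed, illustrative subset of stop words.
STOP_WORDS = {"a", "an", "and", "of", "or", "the"}


@dataclass
class SimpleParsed:
    """Hypothetical, much-reduced stand-in for ParsedIngredient."""

    sentence: str
    fields: dict[str, list[str]] = field(default_factory=dict)


class TinyPostProcessor:
    """Hypothetical class exposing a `parsed` property over (label, text) fragments."""

    def __init__(
        self,
        sentence: str,
        fragments: list[tuple[str, str]],
        discard_isolated_stop_words: bool = True,
    ) -> None:
        self.sentence = sentence
        self.fragments = fragments
        self.discard_isolated_stop_words = discard_isolated_stop_words

    @property
    def parsed(self) -> SimpleParsed:
        # Built fresh on each access from the stored fragments.
        result = SimpleParsed(sentence=self.sentence)
        for label, text in self.fragments:
            words = text.lower().split()
            # Assumption: a fragment is "isolated" if every word is a stop
            # word (an empty fragment also satisfies this and is dropped).
            if self.discard_isolated_stop_words and all(w in STOP_WORDS for w in words):
                continue
            result.fields.setdefault(label, []).append(text)
        return result
```

With the fragments for "2 cups of flour, sifted", a lone "of" labelled COMMENT is dropped by default and kept when discard_isolated_stop_words=False.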