Postprocess

class ingredient_parser.en.postprocess.PostProcessor(sentence: str, labelled_tokens: list[LabelledToken], custom_units: dict[str, str], separate_names: bool = True, discard_isolated_stop_words: bool = True, string_units: bool = False, volumetric_units_system: str = 'us_customary', foundation_foods: bool = False)

Recipe ingredient sentence PostProcessor class.

Performs the postprocessing required on the sentence tokens, their labels and their scores after they have been tagged by the CRF model, in order to return a coherent structure of parsed information.
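As a rough illustration of one step such postprocessing can involve, the sketch below merges runs of consecutive tokens that share a label into a single text field and averages their confidence scores. The `Token` dataclass, the `group_by_label` helper and the label names are hypothetical stand-ins chosen for the example; they are not the library's `LabelledToken` class or the `PostProcessor` implementation.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Token:
    # Hypothetical stand-in for a labelled token: text, CRF label and score.
    text: str
    label: str
    score: float


def group_by_label(tokens: list[Token]) -> list[tuple[str, str, float]]:
    """Merge runs of consecutive tokens that share a label.

    Returns a (label, joined text, mean score) tuple for each run.
    """
    groups = []
    for label, run in groupby(tokens, key=lambda t: t.label):
        run = list(run)
        text = " ".join(t.text for t in run)
        score = sum(t.score for t in run) / len(run)
        groups.append((label, text, score))
    return groups


# Hypothetical tagged tokens for "2 cups plain flour, sifted" (punctuation omitted).
tokens = [
    Token("2", "QTY", 0.99),
    Token("cups", "UNIT", 0.98),
    Token("plain", "NAME", 0.90),
    Token("flour", "NAME", 0.96),
    Token("sifted", "PREP", 0.94),
]

for label, text, score in group_by_label(tokens):
    print(label, text, round(score, 2))
```

Because `itertools.groupby` only merges adjacent items, a label that reappears later in the sentence starts a new group rather than extending the earlier one.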

Attributes:
sentence : str

Original ingredient sentence.

labelled_tokens : list[LabelledToken]

List of labelled tokens for original ingredient sentence.

custom_units : dict[str, str]

Dict of custom units as plural: singular pairs.

separate_names : bool, optional

If True and the sentence contains multiple alternative ingredients, return an IngredientText object for each ingredient name; otherwise, return a single IngredientText object. Default is True.

discard_isolated_stop_words : bool, optional

If True, isolated stop words are discarded from the name, preparation or comment fields. Default is True.

string_units : bool, optional

If True, return all IngredientAmount units as strings. If False, convert IngredientAmount units to pint.Unit objects where possible. Default is False.

imperial_units : bool, optional

If True, use imperial units instead of US customary units for pint.Unit objects for the following units: fluid ounce, cup, pint, quart, gallon. Default is False, which results in US customary units being used. This has no effect if string_units=True.

foundation_foods : bool, optional

If True, populate the foundation_foods field of ParsedIngredient. Default is False, in which case the foundation_foods field is an empty list.

consumed : list[int]

List of indices of tokens consumed as part of postprocessing the tokens and labels.
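To illustrate the plural: singular structure of `custom_units`, here is a minimal sketch of how such a mapping could be used to normalise a unit token. The `normalise_unit` helper, its case-insensitive lookup and the example entries are assumptions for illustration, not the library's behaviour.

```python
def normalise_unit(unit: str, custom_units: dict[str, str]) -> str:
    """Return the singular form of unit if it is a known plural.

    Hypothetical helper: the lookup is case-insensitive, and units that
    are not in the mapping are returned unchanged.
    """
    return custom_units.get(unit.lower(), unit)


# Hypothetical custom units, given as plural: singular pairs.
custom_units = {"handfuls": "handful", "knobs": "knob"}

print(normalise_unit("Handfuls", custom_units))
print(normalise_unit("knob", custom_units))
print(normalise_unit("cups", custom_units))
```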
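The choice between imperial and US customary units matters because fluid ounce, cup, pint, quart and gallon differ in size between the two systems. The sketch below computes each unit's volume in millilitres from the exact fluid ounce definitions (US: 1/128 of a 231 cubic inch gallon; imperial: 1/160 of a 4.54609 L gallon). The `FL_OZ_PER_UNIT` table and `volume_ml` helper are illustrative only and not part of the library, and the imperial cup is assumed here to be half an imperial pint (10 imperial fluid ounces).

```python
US_FL_OZ_ML = 29.5735295625     # exact: 1/128 of a US gallon (231 cubic inches)
IMPERIAL_FL_OZ_ML = 28.4130625  # exact: 1/160 of an imperial gallon (4.54609 L)

# Fluid ounces per unit as (US customary, imperial).
# Assumption: the imperial cup is taken as half an imperial pint (10 fl oz).
FL_OZ_PER_UNIT = {
    "fluid ounce": (1, 1),
    "cup": (8, 10),
    "pint": (16, 20),
    "quart": (32, 40),
    "gallon": (128, 160),
}


def volume_ml(unit: str, imperial: bool = False) -> float:
    """Return the volume of one unit in millilitres for the chosen system."""
    us_oz, imperial_oz = FL_OZ_PER_UNIT[unit]
    if imperial:
        return imperial_oz * IMPERIAL_FL_OZ_ML
    return us_oz * US_FL_OZ_ML


for unit in FL_OZ_PER_UNIT:
    us = volume_ml(unit)
    imp = volume_ml(unit, imperial=True)
    print(f"{unit:12} US {us:9.2f} mL  imperial {imp:9.2f} mL  ratio {imp / us:.3f}")
```

The imperial fluid ounce is about 4% smaller than the US one, while the imperial pint, quart and gallon are about 20% larger, so picking the wrong system shifts a recipe's volumes noticeably.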

property parsed: ParsedIngredient

Return parsed ingredient data.

Returns:
ParsedIngredient

Object containing structured data parsed from the sentence.