PostProcess#
- class ingredient_parser.postprocess.CompositeIngredientAmount(amounts: list[IngredientAmount], join: str)[source]#
Dataclass for a composite ingredient amount. This is an amount comprising more than one IngredientAmount object e.g. “1 lb 2 oz” or “1 cup plus 1 tablespoon”.
- amounts#
List of IngredientAmount objects that make up the composite amount. The order in this list is the order they appear in the sentence.
- Type:
list[IngredientAmount]
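As a rough illustration of how a composite amount groups its parts, the sketch below uses hypothetical stand-in dataclasses (AmountSketch and CompositeAmountSketch are not library classes, and the describe method is an assumption added for the example). It only shows how an ordered list of amounts and a join string could be combined.

```python
from dataclasses import dataclass


@dataclass
class AmountSketch:
    """Hypothetical stand-in for IngredientAmount (subset of fields only)."""

    quantity: float
    unit: str
    text: str


@dataclass
class CompositeAmountSketch:
    """Hypothetical stand-in for CompositeIngredientAmount."""

    amounts: list[AmountSketch]
    join: str

    def describe(self) -> str:
        # Assumption: an empty join means the parts are simply adjacent,
        # as in "1 lb 2 oz"; otherwise the join word sits between them.
        separator = f" {self.join} " if self.join else " "
        # The list order is the order the amounts appear in the sentence.
        return separator.join(amount.text for amount in self.amounts)


composite = CompositeAmountSketch(
    amounts=[AmountSketch(1, "lb", "1 lb"), AmountSketch(2, "oz", "2 oz")],
    join="",
)
print(composite.describe())
```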
- class ingredient_parser.postprocess.IngredientAmount(quantity: float | str, unit: str | Unit, text: str, confidence: float, starting_index: dataclasses.InitVar[int], APPROXIMATE: bool = False, SINGULAR: bool = False, RANGE: bool = False, MULTIPLIER: bool = False)[source]#
Dataclass for holding a parsed ingredient amount.
On instantiation, the unit is made plural if necessary.
- quantity#
Parsed ingredient quantity, as a float where possible, otherwise a string. If the amount is a range, this is the lower limit of the range.
- quantity_max#
If the amount is a range, this is the upper limit of the range. Otherwise, this is the same as the quantity field. This is set automatically depending on the type of quantity.
- unit#
Unit of parsed ingredient quantity. If the unit is recognised in the pint unit registry, a pint.Unit object is used.
- confidence#
Confidence of parsed ingredient amount, between 0 and 1. This is the average confidence of all tokens that contribute to this object.
- Type:
float
- APPROXIMATE#
When True, indicates that the amount is approximate. Default is False.
- Type:
bool, optional
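The relationship between quantity, quantity_max, the starting_index InitVar and the averaged confidence can be sketched with a stand-in dataclass. This is not the library's implementation: AmountSketch is hypothetical, the "low-high" string format for ranges is an assumption, and the token scores are made-up illustrative values.

```python
from dataclasses import InitVar, dataclass, field
from statistics import mean
from typing import Union


@dataclass
class AmountSketch:
    """Hypothetical stand-in for IngredientAmount; not the library class."""

    quantity: Union[float, str]
    unit: str
    text: str
    confidence: float
    starting_index: InitVar[int]
    RANGE: bool = False
    # Not an __init__ argument: derived from quantity in __post_init__.
    quantity_max: Union[float, str] = field(init=False)

    def __post_init__(self, starting_index: int) -> None:
        # InitVar values are passed to __post_init__ and are not dataclass fields.
        self._starting_index = starting_index
        if self.RANGE and isinstance(self.quantity, str):
            # Assumption: a range arrives as a "low-high" string.
            low, high = self.quantity.split("-")
            self.quantity, self.quantity_max = float(low), float(high)
        else:
            # Not a range: the upper limit equals the quantity.
            self.quantity_max = self.quantity


# Confidence is the average of the scores of the contributing tokens.
token_scores = [0.98, 0.95, 0.92]  # illustrative values only
ranged = AmountSketch(
    "1-2", "cups", "1-2 cups", confidence=mean(token_scores),
    starting_index=0, RANGE=True,
)
single = AmountSketch(3.0, "g", "3 g", confidence=0.9, starting_index=5)
print(ranged.quantity, ranged.quantity_max, single.quantity_max)
```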
- class ingredient_parser.postprocess.IngredientText(text: str, confidence: float)[source]#
Dataclass for holding a parsed ingredient string, comprising the parsed text and its confidence.
- class ingredient_parser.postprocess.ParsedIngredient(name: IngredientText | None, size: IngredientText | None, amount: list[IngredientAmount], preparation: IngredientText | None, comment: IngredientText | None, sentence: str)[source]#
Dataclass for holding the parsed values for an input sentence.
- name#
Ingredient name parsed from input sentence. If no ingredient name was found, this is None.
- Type:
IngredientText | None
- size#
Size modifier of the ingredient, such as small or large. If no size modifier was found, this is None.
- Type:
IngredientText | None
- amount#
List of IngredientAmount objects, each representing a matching quantity and unit pair parsed from the sentence.
- Type:
List[IngredientAmount]
- preparation#
Ingredient preparation instructions parsed from sentence. If no ingredient preparation instruction was found, this is None.
- Type:
IngredientText | None
- comment#
Ingredient comment parsed from input sentence. If no ingredient comment was found, this is None.
- Type:
IngredientText | None
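Because name, size, preparation and comment may each be None, code that consumes a parsed result usually needs to check for missing fields. The following is a minimal sketch using hypothetical stand-in dataclasses (TextSketch, ParsedSketch and present_fields are not part of the library, amounts are simplified to plain strings, and the confidence values are illustrative only).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextSketch:
    """Hypothetical stand-in for IngredientText."""

    text: str
    confidence: float


@dataclass
class ParsedSketch:
    """Hypothetical stand-in for ParsedIngredient."""

    name: Optional[TextSketch]
    size: Optional[TextSketch]
    amount: list[str]  # simplification: the real field holds IngredientAmount objects
    preparation: Optional[TextSketch]
    comment: Optional[TextSketch]
    sentence: str


def present_fields(parsed: ParsedSketch) -> dict[str, str]:
    """Collect the text of every optional field that was actually found."""
    found = {}
    for field_name in ("name", "size", "preparation", "comment"):
        value = getattr(parsed, field_name)
        if value is not None:  # None means nothing was parsed for this field
            found[field_name] = value.text
    return found


parsed = ParsedSketch(
    name=TextSketch("onions", 0.97),
    size=TextSketch("large", 0.95),
    amount=["2"],
    preparation=TextSketch("finely chopped", 0.93),
    comment=None,
    sentence="2 large onions, finely chopped",
)
print(present_fields(parsed))
```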
- class ingredient_parser.postprocess.PostProcessor(sentence: str, tokens: list[str], labels: list[str], scores: list[float], discard_isolated_stop_words: bool = True, string_units: bool = False, imperial_units: bool = False)[source]#
Recipe ingredient sentence PostProcessor class.
After the CRF model has tagged the sentence, this class post-processes the tokens, labels and scores to return a coherent structure of parsed information.
- discard_isolated_stop_words#
If True, isolated stop words are discarded from the name, preparation or comment fields. Default value is True.
- Type:
bool, optional
- string_units#
If True, return all IngredientAmount units as strings. If False, convert IngredientAmount units to pint.Unit objects where possible. Default is False.
- Type:
bool, optional
- imperial_units#
If True, use imperial units instead of US customary units for pint.Unit objects for the following units: fluid ounce, cup, pint, quart, gallon. Default is False, which results in US customary units being used. This has no effect if string_units=True.
- Type:
bool, optional
- consumed#
List of indices of tokens consumed as part of setting the APPROXIMATE and SINGULAR flags. These tokens should not end up in the parsed output.
- property parsed: ParsedIngredient#
Return the parsed ingredient data.
- Returns:
ParsedIngredient – Object containing structured data from sentence.
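To give a feel for the kind of work the post-processing step does, the sketch below groups tokens by label, averages their scores, and skips any token index listed as consumed. It is a simplified illustration and not the PostProcessor implementation: the group_by_label helper is hypothetical, and the label names and scores are assumptions chosen for the example.

```python
from statistics import mean


def group_by_label(tokens, labels, scores, consumed=()):
    """Join tokens that share a label and average their scores (illustrative only)."""
    grouped: dict[str, list[tuple[str, float]]] = {}
    for index, (token, label, score) in enumerate(zip(tokens, labels, scores)):
        if index in consumed:
            # Consumed tokens (e.g. used to set a flag) stay out of the output.
            continue
        grouped.setdefault(label, []).append((token, score))
    return {
        label: (
            " ".join(token for token, _ in parts),
            round(mean(score for _, score in parts), 3),
        )
        for label, parts in grouped.items()
    }


# Assumed labels and scores; "about" is treated as consumed, as it would be
# if it had been used to mark the amount as approximate.
result = group_by_label(
    tokens=["about", "2", "cups", "plain", "flour"],
    labels=["COMMENT", "QTY", "UNIT", "NAME", "NAME"],
    scores=[0.9, 0.99, 0.98, 0.8, 0.9],
    consumed=[0],
)
print(result)
```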