PostProcess#

class ingredient_parser.postprocess.CompositeIngredientAmount(amounts: list[IngredientAmount], join: str)[source]#

Dataclass for a composite ingredient amount. This is an amount comprising more than one IngredientAmount object, e.g. “1 lb 2 oz” or “1 cup plus 1 tablespoon”.

amounts#

List of IngredientAmount objects that make up the composite amount. The order in this list is the order they appear in the sentence.

Type:

list[IngredientAmount]

join#

String of text that joins the amounts, e.g. “plus”.

Type:

str

text#

Composite amount as a string, automatically generated from the amounts and join attributes.

Type:

str
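As a rough illustration of how a text value could be derived from amounts and join, here is a minimal sketch using stand-in dataclasses. AmountSketch and CompositeAmountSketch are hypothetical names, not part of ingredient_parser, and the spacing rule (single spaces around the join text, an empty join collapsing to one space) is an assumption rather than the library's confirmed behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class AmountSketch:
    """Hypothetical stand-in for IngredientAmount; only the text field is modelled."""

    text: str


@dataclass
class CompositeAmountSketch:
    """Hypothetical stand-in for CompositeIngredientAmount."""

    amounts: list[AmountSketch]
    join: str
    text: str = field(init=False)

    def __post_init__(self) -> None:
        # Assumption: the join text is surrounded by single spaces, and an
        # empty join means the amounts are separated by one space only.
        joiner = self.join.strip()
        separator = f" {joiner} " if joiner else " "
        # The amounts are kept in sentence order, so joining them in list
        # order reproduces the phrase as it was written.
        self.text = separator.join(amount.text for amount in self.amounts)


composite = CompositeAmountSketch(
    amounts=[AmountSketch("1 cup"), AmountSketch("1 tablespoon")],
    join="plus",
)
print(composite.text)  # 1 cup plus 1 tablespoon
```

Generating the text in `__post_init__` keeps it consistent with the two attributes it is built from, which is one plausible reason for deriving it automatically instead of accepting it as an argument.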

class ingredient_parser.postprocess.IngredientAmount(quantity: float | str, unit: str | Unit, text: str, confidence: float, starting_index: dataclasses.InitVar[int], APPROXIMATE: bool = False, SINGULAR: bool = False, RANGE: bool = False, MULTIPLIER: bool = False)[source]#

Dataclass for holding a parsed ingredient amount.

On instantiation, the unit is made plural if necessary.

quantity#

Parsed ingredient quantity, as a float where possible, otherwise a string. If the amount is a range, this is the lower limit of the range.

Type:

float | str

quantity_max#

If the amount is a range, this is the upper limit of the range. Otherwise, this is the same as the quantity field. This is set automatically depending on the type of quantity.

Type:

float | str
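The relationship between quantity and quantity_max can be pictured with a small helper. This is only a sketch under stated assumptions: quantity_bounds is a hypothetical function, and the rule that a range is written as two numbers separated by a hyphen is assumed for illustration, not taken from the library source.

```python
from typing import Union

Quantity = Union[float, str]


def quantity_bounds(raw: Quantity) -> tuple[Quantity, Quantity]:
    """Return a (quantity, quantity_max) pair for a raw parsed quantity.

    Hypothetical helper: assumes ranges look like "1-2".
    """
    if isinstance(raw, str) and "-" in raw:
        low, _, high = raw.partition("-")
        try:
            # A range: the lower limit is the quantity, the upper limit is the maximum.
            return float(low), float(high)
        except ValueError:
            # Not a numeric range after all; keep the original string for both.
            return raw, raw
    try:
        value = float(raw)
    except ValueError:
        # Quantities that cannot be converted stay as strings.
        return raw, raw
    # Not a range: quantity_max is the same as quantity.
    return value, value


print(quantity_bounds("1-2"))    # (1.0, 2.0)
print(quantity_bounds("2.5"))    # (2.5, 2.5)
print(quantity_bounds("dozen"))  # ('dozen', 'dozen')
```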

unit#

Unit of parsed ingredient quantity. If the unit is recognised in the pint unit registry, a pint.Unit object is used.

Type:

str | pint.Unit

text#

String describing the amount, e.g. “1 cup”.

Type:

str

confidence#

Confidence of parsed ingredient amount, between 0 and 1. This is the average confidence of all tokens that contribute to this object.

Type:

float

APPROXIMATE#

When True, indicates that the amount is approximate. Default is False.

Type:

bool, optional

SINGULAR#

When True, indicates that the amount refers to a single item of the ingredient. Default is False.

Type:

bool, optional

RANGE#

When True, indicates the amount is a range e.g. 1-2. Default is False.

Type:

bool, optional

MULTIPLIER#

When True, indicates the amount is a multiplier e.g. 1x, 2x. Default is False.

Type:

bool, optional

class ingredient_parser.postprocess.IngredientText(text: str, confidence: float)[source]#

Dataclass for holding a parsed ingredient string, comprising the following attributes.

text#

Parsed text from the ingredient sentence. This comprises all tokens with the same label.

Type:

str

confidence#

Confidence of parsed ingredient text, between 0 and 1. This is the average confidence of all tokens that contribute to this object.

Type:

float
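The two attributes above can be illustrated with a short sketch that gathers the tokens sharing one label and averages their scores. The helper text_and_confidence is hypothetical, and the label names used in the example ("QTY", "UNIT", "NAME") are assumptions for illustration only.

```python
from statistics import mean
from typing import Optional


def text_and_confidence(
    tokens: list[str],
    labels: list[str],
    scores: list[float],
    label: str,
) -> Optional[tuple[str, float]]:
    """Join all tokens tagged with `label` and average their scores.

    Returns None when no token carries the label, mirroring the way the
    ParsedIngredient fields fall back to None.
    """
    picked = [(tok, score) for tok, lab, score in zip(tokens, labels, scores) if lab == label]
    if not picked:
        return None
    text = " ".join(tok for tok, _ in picked)
    confidence = mean(score for _, score in picked)
    return text, confidence


tokens = ["2", "cups", "plain", "flour"]
labels = ["QTY", "UNIT", "NAME", "NAME"]  # assumed label names
scores = [0.99, 0.98, 0.9, 0.8]

name = text_and_confidence(tokens, labels, scores, "NAME")
print(name)
```

Here the text is "plain flour" and the confidence is the mean of 0.9 and 0.8, approximately 0.85.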

class ingredient_parser.postprocess.ParsedIngredient(name: IngredientText | None, size: IngredientText | None, amount: list[IngredientAmount], preparation: IngredientText | None, comment: IngredientText | None, sentence: str)[source]#

Dataclass for holding the parsed values for an input sentence.

name#

Ingredient name parsed from input sentence. If no ingredient name was found, this is None.

Type:

IngredientText | None

size#

Size modifier of the ingredient, such as small or large. If there is no size modifier, this is None.

Type:

IngredientText | None

amount#

List of IngredientAmount objects, each representing a matching quantity and unit pair parsed from the sentence.

Type:

list[IngredientAmount]

preparation#

Ingredient preparation instructions parsed from sentence. If no ingredient preparation instruction was found, this is None.

Type:

IngredientText | None

comment#

Ingredient comment parsed from input sentence. If no ingredient comment was found, this is None.

Type:

IngredientText | None

sentence#

Normalised input sentence

Type:

str

class ingredient_parser.postprocess.PostProcessor(sentence: str, tokens: list[str], labels: list[str], scores: list[float], discard_isolated_stop_words: bool = True, string_units: bool = False, imperial_units: bool = False)[source]#

Recipe ingredient sentence PostProcessor class.

Performs the postprocessing needed after the sentence has been tagged by the CRF model: it combines the sentence tokens, their labels and their scores into a coherent structure of parsed information.

labels#

List of labels for tokens.

Type:

list[str]

scores#

Confidence associated with the label for each token.

Type:

list[float]

sentence#

Original ingredient sentence.

Type:

str

tokens#

List of tokens for original ingredient sentence.

Type:

list[str]

discard_isolated_stop_words#

If True, isolated stop words are discarded from the name, preparation or comment fields. Default value is True.

Type:

bool

string_units#

If True, return all IngredientAmount units as strings. If False, convert IngredientAmount units to pint.Unit objects where possible. Default is False.

Type:

bool

imperial_units#

If True, use imperial units instead of US customary units for pint.Unit objects for the following units: fluid ounce, cup, pint, quart, gallon. Default is False, which results in US customary units being used. This has no effect if string_units=True.

Type:

bool
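To show why the choice between imperial and US customary units matters, the sketch below converts a few of the affected units to millilitres using their standard definitions. It deliberately avoids pint: ML_PER_UNIT and to_millilitres are hypothetical names for illustration, and the table covers only a subset of the units listed above.

```python
# Millilitres per unit in each system, from the standard unit definitions.
ML_PER_UNIT = {
    "fluid ounce": {"us": 29.5735295625, "imperial": 28.4130625},
    "pint": {"us": 473.176473, "imperial": 568.26125},
    "quart": {"us": 946.352946, "imperial": 1136.5225},
    "gallon": {"us": 3785.411784, "imperial": 4546.09},
}


def to_millilitres(quantity: float, unit: str, imperial_units: bool = False) -> float:
    """Convert a quantity to millilitres, choosing the system like the flag above."""
    system = "imperial" if imperial_units else "us"
    return quantity * ML_PER_UNIT[unit][system]


us_pint = to_millilitres(1, "pint")
imperial_pint = to_millilitres(1, "pint", imperial_units=True)
print(f"US pint: {us_pint:.1f} ml, imperial pint: {imperial_pint:.1f} ml")
```

An imperial pint is roughly 20% larger than a US pint, so the same parsed sentence can describe noticeably different volumes depending on this setting.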

consumed#

List of indices of tokens consumed as part of setting the APPROXIMATE and SINGULAR flags. These tokens should not end up in the parsed output.

Type:

list[int]

property parsed: ParsedIngredient#

Return the parsed ingredient data.

Returns:

ParsedIngredient – Object containing structured data from sentence.