Dataclasses

class ingredient_parser.dataclasses.UnitSystem(*values)

Enum defining unit systems.

METRIC = 'metric'
US_CUSTOMARY = 'us_customary'
IMPERIAL = 'imperial'
AUSTRALIAN = 'australian'
JAPANESE = 'japanese'
OTHER = 'other'
NONE = 'none'
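
As a minimal sketch of how these members behave, the stand-in enum below copies the values listed above and uses only the standard enum module. It is an illustration under that assumption, not the library's own definition.

```python
from enum import Enum


class UnitSystem(Enum):
    """Stand-in mirroring the member values documented above."""

    METRIC = "metric"
    US_CUSTOMARY = "us_customary"
    IMPERIAL = "imperial"
    AUSTRALIAN = "australian"
    JAPANESE = "japanese"
    OTHER = "other"
    NONE = "none"


# Look a member up from its string value, then compare by identity.
system = UnitSystem("metric")
is_metric = system is UnitSystem.METRIC

# The string value is available again through the .value attribute.
round_trip = UnitSystem.US_CUSTOMARY.value
```
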
class ingredient_parser.dataclasses.CompositeIngredientAmount(amounts: list[IngredientAmount], join: str, subtractive: bool)

Dataclass for a composite ingredient amount.

This is an amount comprising more than one IngredientAmount object e.g. “1 lb 2 oz” or “1 cup plus 1 tablespoon”.

Attributes:
amounts : list[IngredientAmount]

List of IngredientAmount objects that make up the composite amount. The order in this list is the order they appear in the sentence.

join : str

String of text that joins the amounts, e.g. “plus”.

subtractive : bool

If True, the amounts combine subtractively. If False, the amounts combine additively.

text : str

Composite amount as a string, automatically generated from the amounts and join attributes.

confidence : float

Confidence of parsed ingredient amount, between 0 and 1. This is the average confidence of all tokens that contribute to this object.

starting_index : int

Index of token in sentence that starts this amount.

unit_system : UnitSystem

Unit system (e.g. metric) that the unit of the amount belongs to.

Methods

combined()

Return the combined amount in a single unit for the composite amount.

convert_to(unit, density)

Convert units of the combined CompositeIngredientAmount object to given unit.

combined() → Quantity

Return the combined amount in a single unit for the composite amount.

The amounts that comprise the composite amount are combined according to whether the composite amount is subtractive or not. The combined amount is returned as a pint.Quantity object.

Returns:
pint.Quantity

Combined amount.

Raises:
TypeError

Raised if any of the amounts in the object do not comprise a float quantity and a pint.Unit unit. In these cases, the amounts cannot be combined.
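
The additive and subtractive cases can be sketched without pint. The example below is a simplified stand-in, not the library's implementation: the function combined_grams, the (quantity, unit) tuple layout and the GRAMS_PER_UNIT table are all hypothetical, and it assumes that in the subtractive case the later amounts are subtracted from the first.

```python
from fractions import Fraction

# Conversion factors to grams (exact by definition of the avoirdupois pound).
GRAMS_PER_UNIT = {
    "g": Fraction(1),
    "oz": Fraction("28.349523125"),
    "lb": Fraction("453.59237"),
}


def combined_grams(amounts, subtractive=False):
    """Combine (quantity, unit) pairs into a single quantity in grams.

    The first amount is the base. Later amounts are added to it, or
    subtracted from it when subtractive is True (an assumption of this sketch).
    """
    for quantity, unit in amounts:
        # Mirror the documented TypeError for amounts that cannot be combined.
        if isinstance(quantity, str) or unit not in GRAMS_PER_UNIT:
            raise TypeError(f"Cannot combine amount: {quantity!r} {unit!r}")

    totals = [Fraction(q) * GRAMS_PER_UNIT[u] for q, u in amounts]
    first, rest = totals[0], totals[1:]
    return first - sum(rest) if subtractive else first + sum(rest)


# "1 lb 2 oz" combined additively, expressed in grams.
one_lb_two_oz = combined_grams([(1, "lb"), (2, "oz")])

# The same two amounts combined subtractively.
one_lb_less_two_oz = combined_grams([(1, "lb"), (2, "oz")], subtractive=True)
```

In the library itself the result is a pint.Quantity, so the unit travels with the value rather than being fixed to grams as it is here.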

convert_to(unit: str, density: Quantity = <Quantity(1000.0, 'kilogram / meter ** 3')>) → Quantity

Convert units of the combined CompositeIngredientAmount object to given unit.

Conversion is only possible if none of the quantity, quantity_max and unit are strings.

Conversion between mass and volume is supported using the density parameter, but otherwise a DimensionalityError is raised if attempting to convert units of different dimensionality.

Warning

When a conversion between mass <-> volume is performed, the quantities will be converted to floats.

Parameters:
unit : str

Unit to convert to.

density : pint.Quantity, optional

Density used for conversion between volume and mass. Default is the density of water.

Returns:
pint.Quantity

Combined amount converted to given units.
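
The arithmetic behind a mass to volume conversion is V = m / ρ, with the default density of 1000 kg/m³ shown in the signature above. The sketch below works that through with plain floats. The function names are hypothetical and the second density is only an illustrative value for an ingredient denser than water.

```python
WATER_DENSITY_KG_PER_M3 = 1000.0  # default density from the signature above


def mass_to_volume_ml(mass_g, density_kg_per_m3=WATER_DENSITY_KG_PER_M3):
    """Convert a mass in grams to a volume in millilitres using V = m / rho."""
    mass_kg = mass_g / 1000.0
    volume_m3 = mass_kg / density_kg_per_m3
    return volume_m3 * 1_000_000.0  # 1 cubic metre is 1,000,000 ml


def volume_to_mass_g(volume_ml, density_kg_per_m3=WATER_DENSITY_KG_PER_M3):
    """Convert a volume in millilitres to a mass in grams using m = rho * V."""
    volume_m3 = volume_ml / 1_000_000.0
    return volume_m3 * density_kg_per_m3 * 1000.0


# With the default density, 250 g corresponds to roughly 250 ml.
water_ml = mass_to_volume_ml(250)

# A denser ingredient (illustrative density) occupies less volume for the same mass.
dense_ml = mass_to_volume_ml(250, density_kg_per_m3=1420.0)
```

Because the density is a float, the results are floats too, which is consistent with the warning above about quantities being converted to floats.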

class ingredient_parser.dataclasses.FoundationFood(text: str, confidence: float, fdc_id: int, category: str, data_type: str, name_index: int)

Dataclass for the attributes of an entry in the Food Data Central database.

Attributes:
text : str

Description of the FDC database entry.

confidence : float

Confidence of the match, between 0 and 1.

fdc_id : int

ID of the FDC database entry.

category : str

Category of the FDC database entry.

data_type : str

Food Data Central data set the entry belongs to.

url : str

URL for the FDC database entry.

name_index : int

Index of the associated name in the ParsedIngredient.name list.

class ingredient_parser.dataclasses.IngredientAmount(quantity: Fraction | str, quantity_max: Fraction | str, unit: str | Unit, text: str, confidence: float, starting_index: int, APPROXIMATE: bool = False, SINGULAR: bool = False, RANGE: bool = False, MULTIPLIER: bool = False, PREPARED_INGREDIENT: bool = False)

Dataclass for holding a parsed ingredient amount.

On instantiation, the unit is made plural if necessary.

Attributes:
quantity : Fraction | str

Parsed ingredient quantity, as a Fraction where possible, otherwise a string. If the amount is a range, this is the lower limit of the range.

quantity_max : Fraction | str

If the amount is a range, this is the upper limit of the range. Otherwise, this is the same as the quantity field. This is set automatically depending on the type of quantity.

unit : str | pint.Unit

Unit of parsed ingredient quantity. If the unit is recognised in the pint unit registry, a pint.Unit object is used.

text : str

String describing the amount, e.g. “1 cup”, “8 oz”.

confidence : float

Confidence of parsed ingredient amount, between 0 and 1. This is the average confidence of all tokens that contribute to this object.

starting_index : int

Index of token in sentence that starts this amount.

unit_system : UnitSystem

Unit system (e.g. metric) that the unit of the amount belongs to.

APPROXIMATE : bool, optional

When True, indicates that the amount is approximate. Default is False.

SINGULAR : bool, optional

When True, indicates that the amount refers to a singular item of the ingredient. Default is False.

RANGE : bool, optional

When True, indicates that the amount is a range, e.g. 1-2. Default is False.

MULTIPLIER : bool, optional

When True, indicates that the amount is a multiplier, e.g. 1x, 2x. Default is False.

PREPARED_INGREDIENT : bool, optional

When True, indicates that the amount applies to the prepared ingredient. When False, indicates that the amount applies to the ingredient before preparation. Default is False.

Methods

convert_to(unit, density)

Convert units of IngredientAmount object to given unit.

convert_to(unit: str, density: Quantity = <Quantity(1000.0, 'kilogram / meter ** 3')>)

Convert units of IngredientAmount object to given unit.

Conversion is only possible if none of the quantity, quantity_max and unit are strings.

Conversion between mass and volume is supported using the density parameter, but otherwise a DimensionalityError is raised if attempting to convert units of different dimensionality.

Warning

When a conversion between mass <-> volume is performed, the quantities will be converted to floats.

Parameters:
unit : str

Unit to convert to.

density : pint.Quantity, optional

Density used for conversion between volume and mass. Default is the density of water.

Returns:
Self

Copy of IngredientAmount object with units converted to given unit.

Raises:
TypeError

Raised if unit, quantity or quantity_max are strings.
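
Two behaviours described above, returning a converted copy and raising TypeError for string values, can be sketched with a simplified stand-in. The Amount class and the GRAMS_PER_UNIT table are hypothetical. The real method relies on pint units, whereas this sketch treats any unit missing from its table as unrecognised.

```python
from dataclasses import dataclass, replace
from fractions import Fraction
from typing import Union

# Hypothetical conversion table for this sketch: grams per unit.
GRAMS_PER_UNIT = {"g": Fraction(1), "kg": Fraction(1000)}


@dataclass
class Amount:
    """Stand-in holding only quantity, quantity_max and unit."""

    quantity: Union[Fraction, str]
    quantity_max: Union[Fraction, str]
    unit: str

    def convert_to(self, unit: str) -> "Amount":
        # String quantities cannot be converted, as documented above.
        if isinstance(self.quantity, str) or isinstance(self.quantity_max, str):
            raise TypeError("Cannot convert a string quantity")
        # In this sketch an unrecognised unit is one missing from the table.
        if self.unit not in GRAMS_PER_UNIT or unit not in GRAMS_PER_UNIT:
            raise TypeError("Cannot convert an unrecognised unit")
        factor = GRAMS_PER_UNIT[self.unit] / GRAMS_PER_UNIT[unit]
        # replace() builds a new object, so the original is left untouched.
        return replace(
            self,
            quantity=self.quantity * factor,
            quantity_max=self.quantity_max * factor,
            unit=unit,
        )


# A range of 1/2 to 1 kg: both limits are converted together.
half_to_one_kg = Amount(Fraction(1, 2), Fraction(1), "kg")
in_grams = half_to_one_kg.convert_to("g")
```
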

class ingredient_parser.dataclasses.IngredientText(text: str, confidence: float, starting_index: int)

Dataclass for holding a parsed ingredient string.

Attributes:
text : str

Parsed text from ingredient. This comprises all tokens with the same label.

confidence : float

Confidence of parsed ingredient text, between 0 and 1. This is the average confidence of all tokens that contribute to this object.

starting_index : int

Index of token in sentence that starts this text.

class ingredient_parser.dataclasses.LabelledToken(index: int, text: str, pos_tag: str, label: str, score: float, plural: bool)

Dataclass representing a labelled token from an ingredient sentence.

Attributes:
index : int

Index of the token in the sentence.

text : str

Token text.

pos_tag : str

Part of speech tag for token.

label : str

Label assigned to token.

score : float

Confidence of assigned label, between 0 and 1.

plural : bool

True if token is plural.
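
The text, confidence and starting_index fields described for IngredientText can be related to labelled tokens with a short sketch. It assumes a simplified stand-in token class, a hypothetical summarise helper and made-up labels and scores. The library's own post-processing is more involved than this.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LabelledToken:
    """Stand-in with a subset of the fields documented above."""

    index: int
    text: str
    label: str
    score: float


def summarise(tokens, label):
    """Return (text, confidence, starting_index) for one label, or None."""
    matching = [t for t in tokens if t.label == label]
    if not matching:
        return None
    text = " ".join(t.text for t in matching)
    # Confidence is the average score of the contributing tokens.
    confidence = mean(t.score for t in matching)
    # The starting index is the index of the first contributing token.
    return text, confidence, matching[0].index


# Made-up labels and scores for the sentence "2 cups plain flour".
tokens = [
    LabelledToken(0, "2", "QTY", 0.99),
    LabelledToken(1, "cups", "UNIT", 0.98),
    LabelledToken(2, "plain", "NAME", 0.90),
    LabelledToken(3, "flour", "NAME", 0.80),
]
name_summary = summarise(tokens, "NAME")
```
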

class ingredient_parser.dataclasses.ParsedIngredient(name: list[IngredientText], size: IngredientText | None, amount: list[IngredientAmount | CompositeIngredientAmount], preparation: IngredientText | None, comment: IngredientText | None, purpose: IngredientText | None, foundation_foods: list[FoundationFood], sentence: str)

Dataclass for holding the parsed values for an input sentence.

Attributes:
name : list[IngredientText]

List of IngredientText objects, each representing an ingredient name parsed from the input sentence. If no ingredient names are found, this is an empty list.

size : IngredientText | None

Size modifier of ingredients, such as small or large. If no size modifier, this is None.

amount : list[IngredientAmount | CompositeIngredientAmount]

List of IngredientAmount objects, each representing a matching quantity and unit pair parsed from the sentence, or CompositeIngredientAmount objects where several such pairs form one amount. If no ingredient amounts are found, this is an empty list.

preparation : IngredientText | None

Ingredient preparation instructions parsed from sentence. If no ingredient preparation instruction was found, this is None.

comment : IngredientText | None

Ingredient comment parsed from input sentence. If no ingredient comment was found, this is None.

purpose : IngredientText | None

The purpose of the ingredient parsed from the sentence. If no purpose was found, this is None.

foundation_foods : list[FoundationFood]

List of foundation foods from the parsed sentence.

sentence : str

Normalised input sentence.
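
The name_index attribute of FoundationFood ties each foundation food back to an entry in the name list. A minimal sketch of that lookup follows, using hypothetical stand-in classes and placeholder strings rather than real parser output or real FDC descriptions.

```python
from dataclasses import dataclass


@dataclass
class Name:
    """Stand-in for an entry in the name list."""

    text: str


@dataclass
class Food:
    """Stand-in for a foundation food entry."""

    text: str
    name_index: int


# Placeholder data: two parsed names, one of which has a matched food.
names = [Name("butter"), Name("olive oil")]
foods = [Food("placeholder description for olive oil", name_index=1)]

# Map each matched name to the description of its foundation food.
matches = {names[food.name_index].text: food.text for food in foods}

# Names with no foundation food are simply absent from the mapping.
unmatched = [name.text for name in names if name.text not in matches]
```
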

class ingredient_parser.dataclasses.ParserDebugInfo(sentence: str, PreProcessor: Any, PostProcessor: Any, tagger: NumpyCRFInference)

Dataclass for holding intermediate objects generated during parsing.

Attributes:
sentence : str

Input ingredient sentence.

PreProcessor : PreProcessor

PreProcessor object created using input sentence.

PostProcessor : PostProcessor

PostProcessor object created using tokens, labels and scores from input sentence.

tagger : NumpyCRFInference

CRF model tagger object.

class ingredient_parser.dataclasses.Token(index: int, text: str, feat_text: str, pos_tag: str, features: TokenFeatures)

Dataclass representing a token from an ingredient sentence.

Attributes:
index : int

Index of the token in the sentence.

text : str

Token text.

feat_text : str

Token text used for feature generation.

pos_tag : str

Part of speech tag for token.

features : TokenFeatures

Common features for token.

class ingredient_parser.dataclasses.TokenFeatures(stem: str, shape: str, is_capitalised: bool, is_unit: bool, is_punc: bool, is_ambiguous_unit: bool)

Dataclass for common token features.

Attributes:
stem : str

Stem of the token.

shape : str

Shape of the token, represented by X, x, d characters.

is_capitalised : bool

True if the token starts with a capital letter, else False.

is_unit : bool

True if the token is in the list of units, else False.

is_punc : bool

True if the token is a punctuation character, else False.

is_ambiguous_unit : bool

True if the token is in the list of ambiguous units, else False.
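
One plausible reading of the shape feature is a character-by-character mapping: uppercase letters become X, lowercase letters become x and digits become d. The sketch below implements that reading. The function name is hypothetical, and leaving other characters unchanged is an assumption rather than documented behaviour.

```python
def token_shape(text: str) -> str:
    """Map each character to X (uppercase), x (lowercase) or d (digit)."""
    shape = []
    for char in text:
        if char.isdigit():
            shape.append("d")
        elif char.isalpha():
            shape.append("X" if char.isupper() else "x")
        else:
            # Assumption: punctuation and other characters are kept as-is.
            shape.append(char)
    return "".join(shape)


capitalised_shape = token_shape("Cup")
mixed_shape = token_shape("250g")
fraction_shape = token_shape("1/2")
```
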