Parsers

class ingredient_parser.parsers.ParserDebugInfo(sentence: str, PreProcessor: PreProcessor, PostProcessor: PostProcessor)

Dataclass for holding intermediate objects generated during ingredient sentence parsing.

sentence

Input ingredient sentence.

Type: str

PreProcessor

PreProcessor object created from the input sentence.

Type: PreProcessor

PostProcessor

PostProcessor object created using the tokens, labels and scores from the input sentence.

Type: PostProcessor

Tagger

CRF model tagger object.

Type: pycrfsuite.Tagger
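The constructor signature lists only sentence, PreProcessor and PostProcessor, even though a Tagger attribute is documented. In a standard dataclass, a class attribute declared without a type annotation is not treated as a field, so it does not appear in __init__ and is shared by every instance. The sketch below assumes that is how Tagger is attached; DebugInfoSketch, the stub classes and SHARED_TAGGER are hypothetical placeholders, not the library's own types.

```python
from dataclasses import dataclass, fields


class PreProcessorStub:
    """Hypothetical placeholder for ingredient_parser's PreProcessor."""


class PostProcessorStub:
    """Hypothetical placeholder for ingredient_parser's PostProcessor."""


# Hypothetical module-level tagger; in the library this would be a pycrfsuite.Tagger.
SHARED_TAGGER = object()


@dataclass
class DebugInfoSketch:
    sentence: str
    PreProcessor: PreProcessorStub
    PostProcessor: PostProcessorStub
    # No annotation, so @dataclass treats this as a plain class attribute,
    # not a field: it is absent from __init__ and shared by all instances.
    Tagger = SHARED_TAGGER


info = DebugInfoSketch("2 cups flour", PreProcessorStub(), PostProcessorStub())
print([f.name for f in fields(info)])  # ['sentence', 'PreProcessor', 'PostProcessor']
print(info.Tagger is SHARED_TAGGER)  # True
```

In this sketch every instance reports the same Tagger object, which fits a single module-level model being reused across parses.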

ingredient_parser.parsers.inspect_parser(sentence: str, discard_isolated_stop_words: bool = True, string_units: bool = False, imperial_units: bool = False) → ParserDebugInfo

Return an object containing all intermediate objects used in parsing a sentence.

Parameters:

sentence (str) – Ingredient sentence to parse.
discard_isolated_stop_words (bool, optional) – If True, any isolated stop words in the name, preparation, or comment fields are discarded. Default is True.
string_units (bool, optional) – If True, return all IngredientAmount units as strings. If False, convert IngredientAmount units to pint.Unit objects where possible. Default is False.
imperial_units (bool, optional) – If True, use imperial rather than US customary definitions of the following units when creating pint.Unit objects: fluid ounce, cup, pint, quart, gallon. Default is False, which results in US customary units being used. This has no effect if string_units=True.
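The imperial_units flag matters because the imperial and US customary versions of these five units have different sizes. As an illustration, the sketch below tabulates their exact standard volumes in millilitres (well-known definitions, not values read from this library) and compares one of each; the to_millilitres helper and both tables are hypothetical and only stand in for the pint.Unit objects the parser actually returns.

```python
# Exact standard volumes in millilitres (not taken from the library).
US_CUSTOMARY_ML = {
    "fluid ounce": 29.5735295625,  # 1/128 US gallon
    "cup": 236.5882365,            # 8 US fluid ounces
    "pint": 473.176473,
    "quart": 946.352946,
    "gallon": 3785.411784,         # 231 cubic inches
}
IMPERIAL_ML = {
    "fluid ounce": 28.4130625,     # 1/160 imperial gallon
    "cup": 284.130625,             # half an imperial pint
    "pint": 568.26125,
    "quart": 1136.5225,
    "gallon": 4546.09,
}


def to_millilitres(quantity: float, unit: str, imperial_units: bool = False) -> float:
    """Hypothetical helper: convert a quantity of one affected unit to mL."""
    table = IMPERIAL_ML if imperial_units else US_CUSTOMARY_ML
    return quantity * table[unit]


for unit in US_CUSTOMARY_ML:
    us = to_millilitres(1, unit)
    uk = to_millilitres(1, unit, imperial_units=True)
    print(f"1 {unit}: US {us:.2f} mL, imperial {uk:.2f} mL")
```

The imperial fluid ounce is slightly smaller than the US one, whereas the imperial cup, pint, quart and gallon are all larger, so the same recipe quantity can shift in either direction depending on the flag.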

Returns:

ParserDebugInfo – ParserDebugInfo object containing the input sentence, the PreProcessor object, the PostProcessor object and the Tagger.

ingredient_parser.parsers.load_model_if_not_loaded()

Load the model into the TAGGER variable if it is not already loaded.

There isn’t a simple way to check whether the model is loaded, so we try to call TAGGER.info(), which raises a RuntimeError if the model has not been loaded yet.
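The check described above is an "ask forgiveness" pattern: call a cheap method and treat the RuntimeError as meaning "not loaded". The sketch below reproduces that pattern with a hypothetical TaggerStub standing in for pycrfsuite.Tagger, so it runs without the library or a model file. The open() and info() method names mirror pycrfsuite, but the stub's behaviour, the open_calls counter and MODEL_PATH are assumptions for illustration.

```python
class TaggerStub:
    """Hypothetical stand-in for pycrfsuite.Tagger."""

    def __init__(self) -> None:
        self.model_path = None
        self.open_calls = 0  # counts how often a model file is opened

    def open(self, model_path: str) -> None:
        self.model_path = model_path
        self.open_calls += 1

    def info(self) -> dict:
        # Mimics the documented behaviour: info() fails until a model is opened.
        if self.model_path is None:
            raise RuntimeError("model not loaded")
        return {"model": self.model_path}


TAGGER = TaggerStub()
MODEL_PATH = "model.crfsuite"  # hypothetical path


def load_model_if_not_loaded() -> None:
    """Open the model only if TAGGER does not already have one loaded."""
    try:
        TAGGER.info()  # succeeds only when a model is already open
    except RuntimeError:
        TAGGER.open(MODEL_PATH)


load_model_if_not_loaded()
load_model_if_not_loaded()  # second call finds the model and does nothing
print(TAGGER.open_calls)  # 1
```

With this pattern the model file is opened only on the first call; later calls see info() succeed and return without reloading.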

ingredient_parser.parsers.parse_ingredient(sentence: str, discard_isolated_stop_words: bool = True, string_units: bool = False, imperial_units: bool = False) → ParsedIngredient

Parse an ingredient sentence using the CRF model and return structured data.

Parameters:

sentence (str) – Ingredient sentence to parse.
discard_isolated_stop_words (bool, optional) – If True, any isolated stop words in the name, preparation, or comment fields are discarded. Default is True.
string_units (bool, optional) – If True, return all IngredientAmount units as strings. If False, convert IngredientAmount units to pint.Unit objects where possible. Default is False.
imperial_units (bool, optional) – If True, use imperial rather than US customary definitions of the following units when creating pint.Unit objects: fluid ounce, cup, pint, quart, gallon. Default is False, which results in US customary units being used. This has no effect if string_units=True.

Returns:

ParsedIngredient – ParsedIngredient object containing the structured data parsed from the input sentence.
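For parse_ingredient, the discard_isolated_stop_words option can be pictured as a filter over runs of consecutive tokens that share a label. The sketch below assumes that "isolated" means a run in the name, preparation or comment field whose entire text is a stop word. The STOP_WORDS set, the label names, the (text, label) token format and group_fields are all hypothetical; the library's actual stop-word list and grouping logic may differ.

```python
from itertools import groupby

# Hypothetical stop-word list; the library's own list may differ.
STOP_WORDS = {"a", "an", "and", "of", "or", "the"}
# Fields the docstring says are affected; these label names are assumptions.
FILTERED_LABELS = {"NAME", "PREP", "COMMENT"}


def group_fields(tokens, discard_isolated_stop_words=True):
    """Join runs of consecutive tokens that share a label.

    tokens is a list of (text, label) pairs. When discard_isolated_stop_words
    is True, a run in a filtered field whose whole text is a stop word is dropped.
    Returns a dict mapping each label to its list of joined runs.
    """
    grouped = {}
    for label, run in groupby(tokens, key=lambda pair: pair[1]):
        text = " ".join(word for word, _ in run)
        if (
            discard_isolated_stop_words
            and label in FILTERED_LABELS
            and text.lower() in STOP_WORDS
        ):
            continue  # a lone stop word carries no information on its own
        grouped.setdefault(label, []).append(text)
    return grouped


tagged = [
    ("1", "QTY"),
    ("cup", "UNIT"),
    ("of", "COMMENT"),
    ("chopped", "PREP"),
    ("onion", "NAME"),
]
print(group_fields(tagged))
print(group_fields(tagged, discard_isolated_stop_words=False))
```

Under this reading, a stop word inside a longer run (such as "and" in "salt and pepper") is kept, because only runs that consist of nothing but a stop word are discarded.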

ingredient_parser.parsers.parse_multiple_ingredients(sentences: list[str], discard_isolated_stop_words: bool = True, string_units: bool = False, imperial_units: bool = False) → list[ParsedIngredient]

Parse multiple ingredient sentences in one go.

This function accepts a list of sentences, with each element of the list representing one ingredient sentence. A list of ParsedIngredient objects is returned, one for each input sentence. This function is a simple for-loop that iterates through each element of the input list.
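Because parse_multiple_ingredients is described as a simple for-loop, it should behave like calling parse_ingredient once per sentence with the same keyword arguments. The sketch below shows that shape with a hypothetical fake_parse_ingredient stand-in that simply echoes its inputs, so the block runs without the library; parse_multiple_sketch is illustrative and not the library's code.

```python
from typing import Callable


def fake_parse_ingredient(
    sentence: str,
    discard_isolated_stop_words: bool = True,
    string_units: bool = False,
    imperial_units: bool = False,
) -> dict:
    """Hypothetical stand-in for parse_ingredient that echoes its inputs."""
    return {
        "sentence": sentence,
        "discard_isolated_stop_words": discard_isolated_stop_words,
        "string_units": string_units,
        "imperial_units": imperial_units,
    }


def parse_multiple_sketch(
    sentences: list[str],
    discard_isolated_stop_words: bool = True,
    string_units: bool = False,
    imperial_units: bool = False,
    parse: Callable[..., dict] = fake_parse_ingredient,
) -> list[dict]:
    """Parse each sentence in order, forwarding the same options to every call."""
    parsed = []
    for sentence in sentences:
        parsed.append(
            parse(
                sentence,
                discard_isolated_stop_words=discard_isolated_stop_words,
                string_units=string_units,
                imperial_units=imperial_units,
            )
        )
    return parsed


results = parse_multiple_sketch(["2 cups flour", "1 pinch salt"], imperial_units=True)
print([r["sentence"] for r in results])  # ['2 cups flour', '1 pinch salt']
```

Since each sentence is handled independently, the output list has one entry per input sentence, in the same order.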

Parameters:

sentences (list[str]) – List of sentences to parse.
discard_isolated_stop_words (bool, optional) – If True, any isolated stop words in the name, preparation, or comment fields are discarded. Default is True.
string_units (bool, optional) – If True, return all IngredientAmount units as strings. If False, convert IngredientAmount units to pint.Unit objects where possible. Default is False.
imperial_units (bool, optional) – If True, use imperial rather than US customary definitions of the following units when creating pint.Unit objects: fluid ounce, cup, pint, quart, gallon. Default is False, which results in US customary units being used. This has no effect if string_units=True.

Returns:

list[ParsedIngredient] – List of ParsedIngredient objects containing the structured data parsed from the input sentences.
