Parsers
- class ingredient_parser.parsers.ParserDebugInfo(sentence: str, PreProcessor: PreProcessor, PostProcessor: PostProcessor)
Dataclass for holding intermediate objects generated during ingredient sentence parsing.
- PreProcessor
PreProcessor object created using input sentence.
- Type:
PreProcessor
- PostProcessor
PostProcessor object created using tokens, labels and scores from input sentence.
- Type:
PostProcessor
- Tagger
CRF model tagger object.
- Type:
pycrfsuite.Tagger
- ingredient_parser.parsers.inspect_parser(sentence: str, discard_isolated_stop_words: bool = True, string_units: bool = False, imperial_units: bool = False) → ParserDebugInfo
Return object containing all intermediate objects used in the parsing of a sentence.
- Parameters:
sentence (str) – Ingredient sentence to parse
discard_isolated_stop_words (bool, optional) – If True, any isolated stop words in the name, preparation, or comment fields are discarded. Default is True.
string_units (bool) – If True, return all IngredientAmount units as strings. If False, convert IngredientAmount units to pint.Unit objects where possible. Default is False.
imperial_units (bool) – If True, use imperial units instead of US customary units for pint.Unit objects for the following units: fluid ounce, cup, pint, quart, gallon. Default is False, which results in US customary units being used. This has no effect if string_units=True.
- Returns:
ParserDebugInfo – ParserDebugInfo object containing the PreProcessor object, PostProcessor object and Tagger.
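As a rough illustration of how the returned object might be examined, the sketch below reads the three attributes documented for ParserDebugInfo and reports the type name of each. The helper name `summarise_debug_info` and its `inspector` parameter are hypothetical and not part of the library; the parameter exists only so the helper can be exercised with a stand-in when the package is not installed.

```python
def summarise_debug_info(sentence, inspector=None):
    """Return the type names of the intermediate objects for a sentence.

    `inspector` is a hypothetical hook for passing a stand-in callable; when
    omitted, the documented ingredient_parser.parsers.inspect_parser is used.
    """
    if inspector is None:
        # Imported lazily so the sketch can be read without the package installed.
        from ingredient_parser.parsers import inspect_parser as inspector

    info = inspector(sentence)
    # Attribute names follow the ParserDebugInfo description above.
    return {
        "PreProcessor": type(info.PreProcessor).__name__,
        "PostProcessor": type(info.PostProcessor).__name__,
        "Tagger": type(info.Tagger).__name__,
    }
```

Calling `summarise_debug_info("2 cups flour")` with the package installed would go through the real `inspect_parser`; what each intermediate object exposes beyond its type is not covered here.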
- ingredient_parser.parsers.load_model_if_not_loaded()
Load model into TAGGER variable if not loaded.
There isn’t a simple way to check whether the model is loaded, so we try to call TAGGER.info(), which raises a RuntimeError if the model is not loaded yet.
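The probe-then-load approach described above can be sketched as follows. This is a minimal illustration and not the library's actual code: `StandInTagger` is a hypothetical class that only mimics the described behaviour (its `info()` raises RuntimeError until a model is opened), and `MODEL_PATH` is a placeholder.

```python
class StandInTagger:
    """Hypothetical stand-in: info() raises RuntimeError until open() is called."""

    def __init__(self):
        self._model_path = None
        self.open_calls = 0  # counts loads, only to show that loading happens once

    def open(self, path):
        self._model_path = path
        self.open_calls += 1

    def info(self):
        if self._model_path is None:
            raise RuntimeError("no model loaded")
        return {"model": self._model_path}


TAGGER = StandInTagger()
MODEL_PATH = "model.crfsuite"  # placeholder path, not the library's real one


def load_model_if_not_loaded():
    """Open the model only if probing the tagger shows that none is loaded."""
    try:
        TAGGER.info()
    except RuntimeError:
        TAGGER.open(MODEL_PATH)
```

Because the probe succeeds once a model is open, repeated calls are cheap and the model is loaded at most once.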
- ingredient_parser.parsers.parse_ingredient(sentence: str, discard_isolated_stop_words: bool = True, string_units: bool = False, imperial_units: bool = False) → ParsedIngredient
Parse an ingredient sentence using a CRF model and return structured data.
- Parameters:
sentence (str) – Ingredient sentence to parse
discard_isolated_stop_words (bool, optional) – If True, any isolated stop words in the name, preparation, or comment fields are discarded. Default is True.
string_units (bool) – If True, return all IngredientAmount units as strings. If False, convert IngredientAmount units to pint.Unit objects where possible. Default is False.
imperial_units (bool) – If True, use imperial units instead of US customary units for pint.Unit objects for the following units: fluid ounce, cup, pint, quart, gallon. Default is False, which results in US customary units being used. This has no effect if string_units=True.
- Returns:
ParsedIngredient – ParsedIngredient object of structured data parsed from input string
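To make the effect of `imperial_units` concrete, the sketch below parses the same sentence twice, once with the default US customary units and once with imperial units. The helper name `parse_with_both_unit_systems` and its `parser` parameter are hypothetical; the parameter is only a hook for substituting a stand-in, and the keyword arguments are those listed in the signature above.

```python
def parse_with_both_unit_systems(sentence, parser=None):
    """Parse a sentence with US customary and with imperial units.

    `parser` is a hypothetical hook for passing a stand-in callable; when
    omitted, the documented ingredient_parser.parsers.parse_ingredient is used.
    """
    if parser is None:
        # Imported lazily so the sketch can be read without the package installed.
        from ingredient_parser.parsers import parse_ingredient as parser

    us_customary = parser(sentence)  # imperial_units defaults to False
    imperial = parser(sentence, imperial_units=True)
    return us_customary, imperial
```

Per the parameter description, the two results should only differ for fluid ounce, cup, pint, quart and gallon, and not at all if `string_units=True` were also passed.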
- ingredient_parser.parsers.parse_multiple_ingredients(sentences: list[str], discard_isolated_stop_words: bool = True, string_units: bool = False, imperial_units: bool = False) → list[ParsedIngredient]
Parse multiple ingredient sentences in one go.
This function accepts a list of sentences, with each element of the list representing one ingredient sentence. A list of ParsedIngredient objects is returned, one per input sentence. This function is a simple for-loop that iterates through each element of the input list.
- Parameters:
sentences (list[str]) – List of sentences to parse
discard_isolated_stop_words (bool, optional) – If True, any isolated stop words in the name, preparation, or comment fields are discarded. Default is True.
string_units (bool) – If True, return all IngredientAmount units as strings. If False, convert IngredientAmount units to pint.Unit objects where possible. Default is False.
imperial_units (bool) – If True, use imperial units instead of US customary units for pint.Unit objects for the following units: fluid ounce, cup, pint, quart, gallon. Default is False, which results in US customary units being used. This has no effect if string_units=True.
- Returns:
list[ParsedIngredient] – List of ParsedIngredient objects of structured data parsed from input sentences
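The "simple for-loop" behaviour described for this function can be sketched as below. This is an illustration of the pattern and not the library's source: `parse_many` and its `parse_one` argument are hypothetical names, with `parse_one` standing in for a single-sentence parser such as `parse_ingredient`.

```python
def parse_many(sentences, parse_one, **options):
    """Apply a single-sentence parser to each sentence, preserving input order.

    `parse_one` is any callable taking a sentence plus keyword options;
    the same options are forwarded unchanged for every sentence.
    """
    parsed = []
    for sentence in sentences:
        parsed.append(parse_one(sentence, **options))
    return parsed
```

Since each sentence is handled independently, the output list has the same length and order as the input, and an empty input yields an empty list.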