Getting Started#

The Ingredient Parser package is a Python package for parsing structured information from recipe ingredient sentences.

Given a recipe ingredient such as

200 g plain flour, sifted

we want to extract information about the quantity, units, name, preparation and comment. For the example above:

Quantity  Unit  Name         Preparation  Comment
--------  ----  -----------  -----------  -------
200       g     plain flour  sifted

This package uses a Conditional Random Fields model trained on 60,000 example ingredient sentences. The model has been trained on data from three sources:

  • A large dataset released by the New York Times in 2015 as part of their Ingredient Phrase Tagger repository.

  • A dump of recipes taken from Cookstr in 2017.

  • A dump of recipes taken from BBC Food in 2017.

More information on how the model is trained and the output interpreted can be found in the Model Guide.

Installation#

You can install Ingredient Parser from PyPI with pip:

$ python -m pip install ingredient_parser_nlp

This will download and install the package and its dependencies.
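To confirm that the install succeeded, a short check along the following lines can help. This is a minimal sketch using only the standard library; the helper name installed_version is hypothetical, and it assumes the installed distribution carries the same name that was passed to pip above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "ingredient_parser_nlp"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "ingredient_parser_nlp is not installed")
```

Run it with the same interpreter that was used for pip, otherwise the check may look in a different environment.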

Usage#

The primary functionality of this package is provided by the parse_ingredient function.

The parse_ingredient function takes an ingredient sentence and returns the structured data extracted from it.

>>> from ingredient_parser import parse_ingredient
>>> parse_ingredient("3 pounds pork shoulder, cut into 2-inch chunks")
ParsedIngredient(
    name=IngredientText(text='pork shoulder',
                        confidence=0.999773),
    size=None,
    amount=[IngredientAmount(quantity=3.0,
                             unit=<Unit('pound')>,
                             text='3 pounds',
                             confidence=0.999906,
                             APPROXIMATE=False,
                             SINGULAR=False)],
    preparation=IngredientText(text='cut into 2 inch chunks',
                               confidence=0.999193),
    comment=None,
    sentence='3 pounds pork shoulder, cut into 2-inch chunks'
)

The returned ParsedIngredient object contains the following fields:

Field

Description

name

The name of the ingredient parsed from the sentence, or None if no name was found.

size

A size modifier for the ingredient, such as small or large, or None.

This size modifier only applies to the ingredient, not the unit. For example, 1 large pinch of salt would have a unit of large pinch and a size of None.

amount

The amounts parsed from the sentence. Each amount has a quantity and a unit, plus optional flags indicating if the amount is approximate or is for a singular item.

By default the unit field is a pint.Unit object, if the unit can be matched to a unit in the pint unit registry.

preparation

The preparation notes for the ingredient. This is a string, or None if there are no preparation notes for the ingredient.

comment

The comment from the ingredient sentence. This is a string, or None if there is no comment.

sentence

The input sentence passed to the parse_ingredient function.

Each of the fields (except sentence) has a confidence value associated with it. This is a value between 0 and 1, where 0 represents no confidence and 1 represents full confidence. It is the confidence that the natural language model has that the given label is correct, averaged across all tokens that contribute to a particular field.
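The averaging described above can be sketched as follows. The label names (QTY, UNIT, NAME, PREP) and the per-token scores are made-up illustrative values, and field_confidence is a hypothetical helper rather than part of the package; the package's own computation may differ in detail.

```python
from collections import defaultdict
from statistics import mean


def field_confidence(labelled_tokens):
    """Average per-token confidences for each label.

    labelled_tokens: iterable of (token, label, confidence) tuples.
    Returns a dict mapping each label to the mean confidence of its tokens.
    """
    scores = defaultdict(list)
    for _token, label, confidence in labelled_tokens:
        scores[label].append(confidence)
    return {label: mean(values) for label, values in scores.items()}


# Illustrative values only, not real model output.
tokens = [
    ("200", "QTY", 0.75),
    ("g", "UNIT", 1.0),
    ("plain", "NAME", 0.5),
    ("flour", "NAME", 1.0),
    ("sifted", "PREP", 0.25),
]
averages = field_confidence(tokens)
print(averages["NAME"])  # 0.75
```

Because the value is a mean, a field made of several tokens can report a high confidence even when one of its tokens was labelled with low confidence.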

Optional parameters#

The parse_ingredient function has the following optional boolean parameters:

  • discard_isolated_stop_words

    If True (default), then any stop words that appear in isolation in the name, preparation, or comment fields are discarded. If False, then all words from the input sentence are retained in the parsed output. For example:

>>> from ingredient_parser import parse_ingredient
>>> parse_ingredient("2 tbsp of olive oil", discard_isolated_stop_words=True) # default
ParsedIngredient(name=IngredientText(text='olive oil', confidence=0.990923),
    size=None,
    amount=[IngredientAmount(quantity='2',
                             unit=<Unit('tablespoon')>,
                             text='2 tbsps',
                             confidence=0.999799,
                             APPROXIMATE=False,
                             SINGULAR=False)],
    preparation=None,
    comment=None,
    sentence='2 tbsp of olive oil'
)
>>> parse_ingredient("2 tbsp of olive oil", discard_isolated_stop_words=False)
ParsedIngredient(name=IngredientText(text='olive oil', confidence=0.990923),
    size=None,
    amount=[IngredientAmount(quantity='2',
                             unit=<Unit('tablespoon')>,
                             text='2 tbsps',
                             confidence=0.999799,
                             APPROXIMATE=False,
                             SINGULAR=False)],
    preparation=None,
    comment=IngredientText(text='of', confidence=0.8852),  # <-- Note the difference here
    sentence='2 tbsp of olive oil'
)
  • string_units

    If True, units in the IngredientAmount objects are returned as strings. The default is False, where units will be pint.Unit objects.

  • imperial_units

    If True, then any pint.Unit objects for fluid ounces, cups, pints, quarts or gallons will be the Imperial measurement. The default is False, where the US customary measurements are used.
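The two measurement systems differ enough to matter when parsed amounts are converted to metric. The sketch below does not call pint or the parser; it uses the standard millilitre definitions of each unit, and the table and function names are hypothetical. Cups are left out of the sketch.

```python
# Millilitres per unit, from the standard definitions of each system.
US_ML = {
    "fluid ounce": 29.5735295625,
    "pint": 473.176473,
    "quart": 946.352946,
    "gallon": 3785.411784,
}
IMPERIAL_ML = {
    "fluid ounce": 28.4130625,
    "pint": 568.26125,
    "quart": 1136.5225,
    "gallon": 4546.09,
}


def to_millilitres(quantity: float, unit: str, imperial: bool = False) -> float:
    """Convert a volume to millilitres using US customary or Imperial sizes."""
    table = IMPERIAL_ML if imperial else US_ML
    return quantity * table[unit]


us = to_millilitres(2, "pint")
imp = to_millilitres(2, "pint", imperial=True)
print(f"2 pints: {us:.0f} ml (US customary) vs {imp:.0f} ml (Imperial)")
# 2 pints: 946 ml (US customary) vs 1137 ml (Imperial)
```

An Imperial pint is roughly 20% larger than a US pint, so choosing the wrong system noticeably changes a recipe.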

Multiple ingredient sentences#

The parse_multiple_ingredients function is provided as a convenience function. It accepts a list of ingredient sentences as its input and returns a list of ParsedIngredient objects with the parsed information. It has the same optional arguments as parse_ingredient.

>>> from ingredient_parser import parse_multiple_ingredients
>>> sentences = [
...     "3 tablespoons fresh lime juice, plus lime wedges for serving",
...     "2 tablespoons extra-virgin olive oil",
...     "2 large garlic cloves, finely grated",
... ]
>>> parse_multiple_ingredients(sentences)
[
    ParsedIngredient(
        name=IngredientText(text='fresh lime juice', confidence=0.991891),
        size=None,
        amount=[IngredientAmount(quantity='3',
                                 unit=<Unit('tablespoon')>,
                                 text='3 tablespoons',
                                 confidence=0.999459,
                                 APPROXIMATE=False,
                                 SINGULAR=False)],
        preparation=None,
        comment=IngredientText(text='plus lime wedges for serving', confidence=0.995029),
        sentence='3 tablespoons fresh lime juice, plus lime wedges for serving'
    ),
    ParsedIngredient(
        name=IngredientText(text='extra-virgin olive oil', confidence=0.996531),
        size=None,
        amount=[IngredientAmount(quantity='2',
                                 unit=<Unit('tablespoon')>,
                                 text='2 tablespoons',
                                 confidence=0.999259,
                                 APPROXIMATE=False,
                                 SINGULAR=False)],
        preparation=None,
        comment=None,
        sentence='2 tablespoons extra-virgin olive oil'
    ),
    ParsedIngredient(
        name=IngredientText(text='garlic', confidence=0.992021),
        size=None,
        amount=[IngredientAmount(quantity='2',
                                 unit='large cloves',
                                 text='2 large cloves',
                                 confidence=0.983268,
                                 APPROXIMATE=False,
                                 SINGULAR=False)],
        preparation=IngredientText(text='finely grated', confidence=0.997482),
        comment=None,
        sentence='2 large garlic cloves, finely grated'
    )
]
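Once a list of results is available, it is often flattened into simple rows for display or storage. The sketch below relies only on the attribute names visible in the output above (name, amount, quantity, unit, preparation, comment, text). The helper names text_or_none and summarise are hypothetical, and the SimpleNamespace object stands in for a real ParsedIngredient so that the example runs without the model.

```python
from types import SimpleNamespace


def text_or_none(field):
    """Return the .text of a parsed field, or None when the field is absent."""
    return field.text if field is not None else None


def summarise(parsed):
    """Flatten parsed results into plain dictionaries."""
    rows = []
    for item in parsed:
        rows.append({
            "name": text_or_none(item.name),
            # str() is used so that string units and unit objects are handled alike.
            "amounts": [(a.quantity, str(a.unit)) for a in item.amount],
            "preparation": text_or_none(item.preparation),
            "comment": text_or_none(item.comment),
        })
    return rows


# Stand-in mirroring the garlic result shown above; not real parser output.
garlic = SimpleNamespace(
    name=SimpleNamespace(text="garlic", confidence=0.992021),
    amount=[SimpleNamespace(quantity="2", unit="large cloves")],
    preparation=SimpleNamespace(text="finely grated", confidence=0.997482),
    comment=None,
)
rows = summarise([garlic])
print(rows[0])
```

In real use, the list returned by parse_multiple_ingredients would be passed to summarise in place of the stand-in.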