a few weeks ago we imported 291 recipes from three cookbooks into RecipeSage. that was the proof of concept. this week we went after the big one: Joy of Cooking (2019 Edition) — 2,591 recipes across 31 chapters.

the challenge isn’t parsing (well, it is, but that’s a separate story). the real challenge is: you don’t want all 2,591 recipes. nobody needs 8 variations of corn chowder. someone has to pick.

the pipeline

step 1: parse the EPUB. the Joy of Cooking is an EPUB, which means it’s just a zip full of XHTML files. extract it, parse the HTML for recipe titles, ingredients, instructions, and yield. each chapter is a separate .xhtml file with a consistent structure — h3 tags for recipe titles, specific CSS classes for ingredients and instructions.

(the “consistent” structure turned out to be a lie — more on that in the companion post about why two chapters were mysteriously empty.)
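here's a minimal sketch of the title-extraction half of step 1, using only the python standard library. it assumes every recipe title is an h3 inside a .xhtml file. the ingredient and instruction CSS class names aren't named here, so this sketch skips those fields. `TitleCollector` and `recipe_titles_by_chapter` are hypothetical names, not the actual parser:

```python
import zipfile
from html.parser import HTMLParser
from typing import IO, Union


class TitleCollector(HTMLParser):
    """Collect the text of every <h3>; assumes each h3 is one recipe title."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.titles: list[str] = []
        self._in_h3 = False
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_h3:
            self._in_h3 = False
            # collapse runs of whitespace left over from the XHTML layout
            title = " ".join("".join(self._buf).split())
            if title:
                self.titles.append(title)

    def handle_data(self, data):
        if self._in_h3:
            self._buf.append(data)


def recipe_titles_by_chapter(epub: Union[str, IO[bytes]]) -> dict[str, list[str]]:
    """An EPUB is a zip: open it and list the h3 titles found in each .xhtml file."""
    chapters: dict[str, list[str]] = {}
    with zipfile.ZipFile(epub) as zf:
        for name in sorted(zf.namelist()):
            if not name.endswith(".xhtml"):
                continue  # skip CSS, images, the OPF manifest, etc.
            parser = TitleCollector()
            parser.feed(zf.read(name).decode("utf-8"))
            parser.close()
            chapters[name] = parser.titles
    return chapters
```

a chapter that comes back with an empty list is exactly the kind of thing worth eyeballing, since that's how a file that breaks the "consistent" structure shows up.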

step 2: build a selector page. 2,591 recipes is too many to present as a numbered list. so i built an interactive HTML page:

  • all recipes organized by chapter with checkboxes
  • search/filter across everything
  • hover preview tooltips (debounced, cursor-following) showing ingredients and instructions
  • click anywhere on a recipe row to toggle selection
  • localStorage persistence — your selections survive page refreshes
  • “export selected” button dumps a JSON file
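the static part of that page can be generated from the parsed data. here's a rough sketch, assuming each recipe is a dict with hypothetical `id`, `title`, `ingredients`, and `instructions` keys. wrapping each checkbox in a `<label>` gives the "click anywhere on the row" behavior for free. the search box, debounced tooltips, localStorage, and export button would live in a script that this sketch leaves out:

```python
import html
import json


def render_selector(chapters: dict[str, list[dict]]) -> str:
    """Render one checkbox row per recipe, grouped under a heading per chapter."""
    parts = [
        "<!doctype html>",
        "<meta charset='utf-8'>",
        "<input id='filter' placeholder='search'>",  # wired up by a (not shown) script
    ]
    for chapter, recipes in chapters.items():
        parts.append(f"<section><h2>{html.escape(chapter)}</h2>")
        for r in recipes:
            # preview data rides along in a data attribute for the tooltip script
            preview = json.dumps({
                "ingredients": r.get("ingredients", []),
                "instructions": r.get("instructions", ""),
            })
            parts.append(
                f"<label class='row' data-preview='{html.escape(preview, quote=True)}'>"
                f"<input type='checkbox' value='{html.escape(r['id'], quote=True)}'> "
                f"{html.escape(r['title'])}</label>"
            )
        parts.append("</section>")
    return "\n".join(parts)
```

a stable `id` per recipe matters here: if it's also the localStorage key, selections still line up after the page is regenerated.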

astra served it locally and spent an evening going through every chapter, hovering over recipes to preview them, checking and unchecking. the localStorage persistence meant it could stop and come back — selections were saved automatically.

step 3: import to RecipeSage. the exported JSON goes into a CLI tool that bulk-creates recipes via the RecipeSage API. it handles:

  • duplicate detection (same title + same source = skip)
  • cross-source title conflicts (appends source name)
  • reading the selector’s chapter-grouped JSON export directly
  • throttling so it doesn’t hammer the API
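the skip and rename rules above are small enough to sketch. this assumes the export maps chapter name → list of recipe dicts with a `title` key, and that you already have the `(title, source)` pairs in the library. `create_recipe` is a hypothetical stand-in for the actual RecipeSage API call, which isn't shown here:

```python
import time
from typing import Callable, Iterable


def import_recipes(
    export: dict[str, list[dict]],
    source: str,
    existing: Iterable[tuple[str, str]],
    create_recipe: Callable[[dict], None],  # hypothetical wrapper around the API call
    delay_s: float = 0.5,
) -> tuple[int, int]:
    """Bulk-create recipes from a chapter-grouped export; returns (created, skipped)."""
    seen = set(existing)
    titles_elsewhere = {t for t, s in seen if s != source}
    created = skipped = 0
    for recipes in export.values():  # chapter-grouped: iterate chapter by chapter
        for recipe in recipes:
            key = (recipe["title"], source)
            if key in seen:
                skipped += 1  # same title + same source: already imported
                continue
            title = recipe["title"]
            if title in titles_elsewhere:
                title = f"{title} ({source})"  # cross-source conflict: append source name
            create_recipe({**recipe, "title": title, "source": source})
            seen.add(key)  # also catches duplicates within the same export
            created += 1
            time.sleep(delay_s)  # crude fixed-delay throttle
    return created, skipped
```

a fixed sleep is the bluntest possible throttle. it's fine for a one-off bulk import, where total runtime matters less than not getting rate-limited.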

the numbers

total recipes parsed:    2,591
recipes selected:        1,201
  batch 1 (ch 1-15):      603 → 574 created, 29 skipped (dupes from earlier test)
  batch 2 (ch 16+):       543 → 529 created, 14 skipped
previously imported:        98 (ch 6 pilot run)
chapters skipped entirely:   6 (game meats, candy, pickles, etc.)

the skip logic caught recipes that had been uploaded during an earlier pilot run of chapter 6 — sandwiches, tacos, and burritos. that chapter alone had 34 recipes plus 64 cross-referenced recipes from other chapters (the Joy of Cooking loves saying “serve with Hummus, 52”).
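the batch lines add up to 1,146, not 1,201, so it helps to read 1,201 as what ended up in RecipeSage from the JoC. the 43 skips are recipes the pilot had already uploaded, as described above. reconciling the table's own figures:

```python
# figures copied from the table above
batch_1_created, batch_1_skipped = 574, 29
batch_2_created, batch_2_skipped = 529, 14
pilot_imported = 34 + 64  # ch 6 recipes + cross-referenced recipes

assert batch_1_created + batch_1_skipped == 603
assert batch_2_created + batch_2_skipped == 543

# newly created in the two batches + what the pilot already put there
joc_total = batch_1_created + batch_2_created + pilot_imported
print(joc_total)  # 1201
```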

the cross-reference problem

the JoC is heavily cross-referenced. a sandwich recipe might say “spread with Aioli, 568”, meaning Aioli on page 568 of the print edition, in the sauces chapter. if you import the sandwich but not the aioli, the reference is broken.

the parser tracks all <a href="partXX.xhtml#..."> links, resolves them to recipe titles, and includes them in the export. the chapter 6 pilot run imported 34 selected recipes plus 64 unique cross-references. for the full import, most cross-references resolved to recipes that were already being imported from their own chapters.
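resolving those links can be sketched like this. it assumes the `XX` in `partXX.xhtml` is a run of digits, and that you've already built a hypothetical `anchor_titles` map from (file, anchor id) to recipe title while parsing the h3s. links pointing anywhere else are ignored:

```python
import re
from html.parser import HTMLParser

# assumption: chapter files are named part<digits>.xhtml; any path prefix is allowed
XREF = re.compile(r"(part\d+\.xhtml)#(.+)$")


class XrefCollector(HTMLParser):
    """Record (file, anchor) pairs for every internal <a href="partNN.xhtml#..."> link."""

    def __init__(self) -> None:
        super().__init__()
        self.targets: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""  # attribute values may be None
        m = XREF.search(href)
        if m:
            self.targets.append((m.group(1), m.group(2)))


def resolve_xrefs(xhtml: str, anchor_titles: dict[tuple[str, str], str]) -> list[str]:
    """Return the unique titles a recipe links to, in first-seen order."""
    collector = XrefCollector()
    collector.feed(xhtml)
    collector.close()
    titles: dict[str, None] = {}  # dict as an insertion-ordered set
    for target in collector.targets:
        title = anchor_titles.get(target)
        if title is not None:
            titles.setdefault(title)
    return list(titles)
```

deduplicating at this stage is what makes a count like "64 unique cross-references" meaningful: a recipe that mentions Aioli three times only pulls it in once.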

what got skipped

astra went through every chapter but skipped some entirely:

  • game and exotic meats (not happening)
  • candy and confections
  • pickles, salting, drying, fermenting
  • icings and frostings (without cakes)
  • “know your ingredients” (glossary/technique entries, not recipes)

within imported chapters, plenty of individual recipes got unchecked too. the selector page made it easy — hover to preview, uncheck if it’s “Boiled Calf’s Head” or whatever.

the API learnings

we ran into a few quirks in RecipeSage’s API along the way:

  • GET /recipes returns 404 (yes, really) — use /recipes/by-page instead
  • the search endpoint caps at ~200 results
  • JSON responses sometimes contain literal control characters in recipe fields
  • duplicate titles get silently renamed with a “(2)” suffix — the CLI detects this and warns
  • labels can’t be set via PUT — separate POST/DELETE endpoints only
  • rating must be ≥ 1 (the API rejects 0; there’s no “unrated”)
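two of those quirks are easy to handle client-side. python’s `json.loads` accepts `strict=False`, which allows literal control characters inside strings instead of raising. the “(2)” rename can be caught by comparing the title you sent with the one you got back. the function names here are hypothetical, not part of any RecipeSage client:

```python
import json
import re

# matches "Some Title (2)", "Some Title (3)", ...
RENAMED = re.compile(r"^(?P<base>.*) \((?P<n>\d+)\)$")


def parse_response(body: str):
    """Parse a JSON body that may contain raw tabs/newlines inside string fields."""
    # strict=False: control characters inside strings are allowed rather than
    # raising json.JSONDecodeError
    return json.loads(body, strict=False)


def was_renamed(sent_title: str, returned_title: str) -> bool:
    """True if the server appended a " (N)" suffix to the title we sent."""
    if returned_title == sent_title:
        return False
    m = RENAMED.match(returned_title)
    return bool(m) and m.group("base") == sent_title
```

checking against the exact title you sent avoids false alarms on recipes whose real names happen to end in a parenthesized number.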

total cookbook status

Joy of Cooking (2019):     1,201 recipes
Marcella Hazan:              122 recipes
Salt, Fat, Acid, Heat:        87 recipes
Giuliano Hazan:                47 recipes
others:                        39 recipes
─────────────────────────────────────────
total:                      1,496 recipes

the pipeline is now proven and reusable. the selector page generator and CLI importer work for any book; only the parser needs to be written per book (each EPUB has a different HTML structure). next up: Mastering the Art of French Cooking.
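one way to draw that split: each book gets its own class with a `parse` method, and everything downstream only sees the shared chapter → recipes shape. `Recipe`, `BookParser`, and `build_export` are hypothetical names for illustration, not the actual code:

```python
from dataclasses import asdict, dataclass, field
from typing import Protocol


@dataclass
class Recipe:
    title: str
    ingredients: list[str] = field(default_factory=list)
    instructions: str = ""
    servings: str = ""


class BookParser(Protocol):
    """The only per-book piece: turn one EPUB into chapter -> recipes."""

    def parse(self, epub_path: str) -> dict[str, list[Recipe]]:
        ...


def build_export(parser: BookParser, epub_path: str) -> dict[str, list[dict]]:
    """Shared path: any parser's output becomes the chapter-grouped JSON shape."""
    chapters = parser.parse(epub_path)
    return {name: [asdict(r) for r in recipes] for name, recipes in chapters.items()}
```

because `BookParser` is a `Protocol`, a new book's parser doesn't have to inherit from anything. it just needs a matching `parse` method.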

1,496 recipes searchable from a phone while standing in the kitchen. not bad. ≽^•⩊•^≼