Mapping the French Culinary Network
For French restaurant intel, lefooding.com is the definitive data source. Their anonymous critics conduct systematic reviews of establishments across France (and now Belgium), documenting their findings in a witty, if peculiar, style. Beyond helping you choose a place for a delicious night out, these reviews can be used to map and understand France’s culinary network.
A network is composed of nodes and edges. Nodes represent entities, such as people or restaurants. Edges represent relationships between entities; the absence of an edge means no known relationship. For the French restaurant scene, nodes represent both people and restaurants, and a person node is connected to a restaurant node if that person is known to have worked at the restaurant.
Restaurants with many neighbors (nodes connected to them) are restaurants whose alumni go on to create and work in other prestigious restaurants. This is the case for Ducasse (ok, not a restaurant, but a placeholder for all the restaurants in the Ducasse brand), the Mandarin Oriental (technically the restaurant is called Sur Mesure) and Septime.
To visualize and analyse this network, I combine information contained in reviews of restaurants across France by LeFooding.com’s critics. Most reviews contain information about the staff and their CVs.
Take this (shortened) review of Grenat (emphasis mine):
À l’enseigne de Grenat, Antoine Joannier et Neil Mahatsry ont le rouge aux joues et aux murs. Après avoir officié à La Brasserie Communale, où ils se sont rencontrés, les deux compères font désormais feu de tout bois en plein centre de Marseille, où Antoine veille au grain autour des tables en bois blond, convoyant les plats que Neil embrase à tout-va depuis l’âtre derrière le comptoir. Des huîtres aux pièces viandardes, […]
and LeFooding.com’s English version:
At Grenat, Antoine Joannier and Neil Mahatsry are bathed in an ardent red glow, much like the pomegranate-toned walls of their space. After working together at La Brasserie Communale, where they first met, the duo is now firing on all cylinders in the heart of Marseille, where Antoine tends to guests seated around blonde wood tables, delivering dishes ignited by Neil behind the bar. From oysters to prime cuts of red meat, […]
Notice that the review mentions the restaurant staff, their prior experience at La Brasserie Communale, and their roles in the restaurant. All of this information should be included in the network. Using OpenAI’s gpt-4o-mini model with its structured generation functionality, I extracted information from the 1800 publicly available LeFooding.com reviews going back four years. Instead of on the order of 1800 “current” staff-restaurant relationships, I uncovered over 5000 links, enriching the analysis with the career histories mentioned in the reviews.
I combined the extracted information into a graph you can explore below (or click here). To get started, look for Grenat, Antoine, Neil, Septime or the other restaurants I’ve mentioned. 1
The network has over 5000 nodes and as many edges. In the visualization, green nodes represent staff and purple nodes represent restaurants. Larger nodes have a higher degree (more neighbors). Edges link restaurants and staff. You can explore it below. Look for your favorite restaurant, or click around to discover the connected components of the graph.
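To make this concrete, here is what the underlying graph boils down to, sketched with networkx using the Grenat example above (networkx is an illustration on my part; the project’s actual code is linked at the end of the post):

import networkx as nx

# People and restaurants share one graph; a "kind" attribute tells them apart.
G = nx.Graph()
G.add_node("Antoine Joannier", kind="person")
G.add_node("Neil Mahatsry", kind="person")
G.add_node("Grenat", kind="restaurant")
G.add_node("La Brasserie Communale", kind="restaurant")

# An edge means "this person worked (or works) at this restaurant".
G.add_edge("Antoine Joannier", "Grenat")
G.add_edge("Neil Mahatsry", "Grenat")
G.add_edge("Antoine Joannier", "La Brasserie Communale")
G.add_edge("Neil Mahatsry", "La Brasserie Communale")

# Node size in the visualization tracks degree, i.e. the number of neighbors.
print(G.degree("Grenat"))  # 2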
Making the network
To get here, I scraped data from LeFooding.com, extracted the desired information from the reviews, built the graph and finally made the visualization. The code used is available at theophilec/foudinge.
The crux of the approach is extracting the information in a structured way. Indeed, I need an easy way of building the graph and knowing which restaurants and people are related. Large Language Models (LLMs) are excellent at understanding text, and can answer questions like “Where did Antoine Joannier work before Grenat?” when given the Grenat review. However, if we ask for the information in a structured format like a JSON object, they struggle to reliably give useful answers.
For instance, the desired output for the Grenat review could be something like this:
Person(
    name='Antoine Joannier',
    role='Host',
    previous_restaurants=['La Brasserie Communale']
)
Person(
    name='Neil Mahatsry',
    role='Chef',
    previous_restaurants=['La Brasserie Communale']
)
This is the art of structured generation.
There are several approaches to this problem. First, one can prompt better. Second, and in combination with the first approach, one can retry inference until the response parses into the desired schema. While this can work as a one-off when hacking together a configuration file, it won’t work for almost 2000 reviews.
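To give an idea, a naive version of the retry approach might look like the sketch below (illustrative only: client stands for an OpenAI-style client, the function name is mine, and the schema is a Pydantic model like the Summary introduced further down):

from typing import Type, TypeVar
from pydantic import BaseModel, ValidationError

T = TypeVar("T", bound=BaseModel)

def extract_with_retries(client, prompt: str, schema: Type[T], max_attempts: int = 5) -> T:
    # Ask the model, then hope its answer parses into the schema; retry otherwise.
    for _ in range(max_attempts):
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        try:
            return schema.model_validate_json(response.choices[0].message.content)
        except ValidationError:
            continue  # the answer was not valid JSON for this schema
    raise RuntimeError("no schema-compliant answer after several attempts")

Every failed attempt costs another API call, and nothing guarantees the loop ever terminates, which is why this doesn’t scale to thousands of reviews.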
Another approach uses the logits of the model, which define the sampling distribution for the next token. They can be modified at inference time to force the generated tokens to build into a response respecting a defined structure. Behind the scenes, this relies on regular expressions and finite state machines (the same techniques used when parsing JSON). With such techniques, you prompt the model however you want (and give the schema to the inference API), the output provably respects the desired structure 100% of the time, and it can be used reliably downstream.
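The core idea fits in a few lines: at every decoding step, a finite state machine compiled from the schema says which tokens are allowed next, and the logits of all other tokens are pushed to minus infinity (a conceptual sketch of the idea, not the actual library internals):

import math

def mask_logits(logits: list[float], allowed_token_ids: set[int]) -> list[float]:
    # Tokens the schema's finite state machine forbids get probability zero
    # after the softmax, so sampling can only produce structure-respecting text.
    return [
        logit if token_id in allowed_token_ids else -math.inf
        for token_id, logit in enumerate(logits)
    ]

Libraries like outlines compile the schema into such a state machine ahead of time, which keeps the per-token overhead small.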
The outlines library, built by .txt, makes this possible for your favorite open model, such as Mistral-7B-v0.3. It also wraps OpenAI’s implementation of this in its API.
Once a prompt_template and a Summary schema are designed, it’s as easy as:
import outlines

model = ...  # an open model or OpenAI, loaded via outlines
generator = outlines.generate.json(model, Summary)  # constrain generation to the Summary schema
result = generator(prompt_template(text))
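The prompt and schema I actually used are in the repo; a minimal Summary schema consistent with the Person output shown earlier might look like this (field names beyond those shown above are guesses):

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    role: str
    previous_restaurants: list[str]

class Summary(BaseModel):
    # The reviewed restaurant and the people the review mentions.
    restaurant_name: str
    people: list[Person]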
I put Mistral, Meta and OpenAI head to head. Both Llama3.2-3B and Mistral-7B-v0.3 had a tendency to hallucinate fake people and related restaurants. In the end, I opted for OpenAI’s gpt-4o-mini, which was also much faster. But even gpt-4o-mini was susceptible to mistakes, which makes me optimistic that better prompting and an improved schema design will allow me to reproduce the results with Mistral or Llama (future work).
The total cost of inference over 2000 reviews with gpt-4o-mini’s structured generation endpoint was less than 1€!
I used gephi-lite to create the visualization. The key step was spatialization, which defines a spatial layout of the graph that makes sense given the information it contains. In our case, I used the force simulation setting, which pulls connected nodes close together and pushes nodes with larger degree towards the center. The embedded visualization runs using WebGL thanks to the Retina project.2
In order to handle duplicates and eliminate some obvious errors that became visible once the graph was visualized, I asked Claude 3.7 Sonnet to create a simple web app that lets me edit the inferred entities while keeping the structure intact (and fix the encoding issues caused by sqlite/myself). It only took 2 iterations to get it working as I wanted, and then a few more iterations to add full-text search and some other features. It uses Flask and pure JS in Jinja templates. It is available on GitHub at foudinge-scrub.
There are still a few duplicates to fix, which I’ll get around to (maybe).
All the code used for scraping (using sqlite, asyncio and aiohttp), the inference (using outlines), the prompt and the Pydantic schema are available on GitHub at theophilec/foudinge.
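For the curious, the asyncio + aiohttp combination boils down to the pattern below (a rough sketch; the real scraper, including the sqlite storage, lives in the repo):

import asyncio
import aiohttp

async def fetch(session: aiohttp.ClientSession, url: str) -> str:
    # Fetch one page, raising on HTTP errors.
    async with session.get(url) as response:
        response.raise_for_status()
        return await response.text()

async def fetch_all(urls: list[str]) -> list[str]:
    # Reuse one session and fetch every review page concurrently.
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, url) for url in urls))

# pages = asyncio.run(fetch_all(review_urls))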
Conclusion
This project illustrates how LLMs can be used to extract information from rich sources of textual data, in this instance restaurant reviews. I am very happy with the tools I chose. A good next step is to get all of this running on open models, locally or in a public cloud.
Thanks to Adrien Bocquet and Clotilde Bukato for proofreading earlier versions of this post.
1. Thanks to @jacomyal from @ouestware on GitHub for helping me get a better spatialization. I have updated the visualization in the post. You can see the original and inferior spatialization here.
2. I identified and reported an issue in gephi-lite where string escaping prevented proper Retina imports. It is now fixed in production.