After my engineering degree at Mines Paris - PSL, I defended my PhD in machine learning in 2024, supervised by Alessandro Rudi, Carlo Ciliberto and Benjamin Guedj, on the topic of structure in ML methods. You can read my thesis here.
Open to work: I am actively looking for a Machine Learning Engineer or Full Stack Data Scientist position. Some things I value: a team to learn with, and an impactful product to build and deploy. I am open to branching out from ML to signal processing, optimization, systems engineering,… please get in touch!
Right now, I am building tools in Python and Rust, around ML and the web, and trying to blog about it. Everything is on GitHub and on my blog.
See all posts on my blog.
Feb 21, 2025 | Open-sourced my project Slinky, a browser extension that checks links, built for newsletter authors. I am using it to learn Rust and WebAssembly. Still very much a work in progress. |
Feb 20, 2025 | Co-hosted the second Mines Alumni Intelligence Artificielle meetup with Jean-Philippe Vert and Rémy Dubois, who explained how AI is revolutionizing biology and why progress in the field is only just starting! You can watch the recording here. |
Feb 06, 2025 | My thesis is officially online: check it out. |
Oct 16, 2024 | Successfully defended my PhD at Inria Paris! |
Aug 16, 2024 | Finished writing my PhD thesis. My defense will be on October 16th at 14:00 at Inria Paris. |
-
Closed-form Filtering for Non-linear Systems
Théophile Cantelobre, Carlo Ciliberto, Benjamin Guedj, and 1 more author
arXiv:2402.09796, 2024
Sequential Bayesian Filtering aims to estimate the current state distribution of a Hidden Markov Model, given the past observations. The problem is well-known to be intractable for most application domains, except in notable cases such as the tabular setting or for linear dynamical systems with Gaussian noise. In this work, we propose a new class of filters based on Gaussian PSD Models, which offer several advantages in terms of density approximation and computational efficiency. We show that filtering can be efficiently performed in closed form when transitions and observations are Gaussian PSD Models. When the transition and observations are approximated by Gaussian PSD Models, we show that our proposed estimator enjoys strong theoretical guarantees, with estimation error that depends on the quality of the approximation and is adaptive to the regularity of the transition probabilities. In particular, we identify regimes in which our proposed filter attains a TV ε-error with memory and computational complexity of O(ε^-1) and O(ε^-3/2) respectively, including the offline learning step, in contrast to the O(ε^-2) complexity of sampling methods such as particle filtering.
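For context, the "tabular setting" mentioned in the abstract is one of the rare cases where the filtering recursion is exactly tractable: on a finite state space, the predict/update steps reduce to matrix-vector products. A minimal sketch (function and variable names are mine, not from the paper):

```python
import numpy as np

def forward_filter(T, O, pi0, observations):
    """Exact Bayesian filtering for a finite-state (tabular) HMM.

    T[i, j] : transition probability P(x_{t+1}=j | x_t=i)
    O[i, k] : observation likelihood P(y_t=k | x_t=i)
    pi0     : initial state distribution P(x_0)
    Returns the filtering distributions P(x_t | y_1..t) for each step.
    """
    belief = pi0.copy()
    history = []
    for y in observations:
        belief = belief @ T          # predict: propagate through the dynamics
        belief = belief * O[:, y]    # update: reweight by the observation likelihood
        belief /= belief.sum()       # normalize back to a probability vector
        history.append(belief.copy())
    return np.array(history)
```

For continuous state spaces this recursion has no closed form in general, which is exactly the gap the Gaussian PSD Model filters above address.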
-
Measuring dissimilarity with diffeomorphism invariance
Théophile Cantelobre, Carlo Ciliberto, Benjamin Guedj, and 1 more author
In Proceedings of the 39th International Conference on Machine Learning, 17–23 Jul 2022
Measures of similarity (or dissimilarity) are a key ingredient to many machine learning algorithms. We introduce DID, a pairwise dissimilarity measure applicable to a wide range of data spaces, which leverages the data’s internal structure to be invariant to diffeomorphisms. We prove that DID enjoys properties which make it relevant for theoretical study and practical use. By representing each datum as a function, DID is defined as the solution to an optimization problem in a Reproducing Kernel Hilbert Space and can be expressed in closed-form. In practice, it can be efficiently approximated via Nyström sampling. Empirical experiments support the merits of DID.
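The Nyström sampling mentioned in the abstract is a standard way to cut the cost of kernel computations by building a low-rank approximation of the Gram matrix from a subset of landmark points. A generic sketch of the technique (not the paper's actual DID implementation; the RBF kernel and names here are my own choices):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian RBF kernel: k(x, y) = exp(-gamma * ||x - y||^2)
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_approx(X, m, gamma=1.0, seed=0):
    """Rank-m Nyström approximation of the n x n Gram matrix K(X, X)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)  # m landmark points
    C = rbf_kernel(X, X[idx], gamma)                 # n x m cross-kernel block
    W = C[idx]                                       # m x m landmark block
    # K ≈ C W^+ C^T; pseudo-inverse guards against a near-singular W
    return C @ np.linalg.pinv(W) @ C.T
```

The approximation is exact when m = n and its error shrinks as more landmarks are used, trading O(n^2) kernel evaluations for O(nm).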
-
A PAC-Bayesian Perspective on Structured Prediction with Implicit Loss Embeddings
Théophile Cantelobre, Benjamin Guedj, María Pérez-Ortiz, and 1 more author
arXiv:2012.03780, 2020
Many practical machine learning tasks can be framed as structured prediction problems, where several output variables are predicted and considered interdependent. Recent theoretical advances in structured prediction have focused on obtaining fast-rate convergence guarantees, especially in the Implicit Loss Embedding (ILE) framework. PAC-Bayes has gained interest recently for its capacity to produce tight risk bounds for predictor distributions. This work proposes a novel PAC-Bayes perspective on the ILE structured prediction framework. We present two generalization bounds, on the risk and excess risk, which yield insights into the behavior of ILE predictors. Two learning algorithms are derived from these bounds. The algorithms are implemented and their behavior analyzed, with source code available at this link.
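As background on the kind of guarantee PAC-Bayes produces: a classic example is McAllester's bound, which states that with probability at least 1 − δ, the risk of a posterior Q over predictors satisfies R(Q) ≤ r̂(Q) + √((KL(Q‖P) + ln(2√n/δ)) / (2n)). A small sketch evaluating that complexity term for an isotropic Gaussian posterior (this is the classic bound for illustration, not the ILE-specific bounds of the paper):

```python
import numpy as np

def mcallester_gap(kl, n, delta=0.05):
    """Complexity term of McAllester's PAC-Bayes bound:
    R(Q) <= empirical_risk(Q) + sqrt((KL(Q||P) + ln(2*sqrt(n)/delta)) / (2n)),
    holding with probability at least 1 - delta over n i.i.d. samples.
    """
    return np.sqrt((kl + np.log(2 * np.sqrt(n) / delta)) / (2 * n))

def gaussian_kl_isotropic(mu_q, sigma=1.0):
    # KL(N(mu_q, s^2 I) || N(0, s^2 I)) = ||mu_q||^2 / (2 s^2):
    # the penalty grows as the posterior mean drifts from the prior's
    return np.dot(mu_q, mu_q) / (2 * sigma ** 2)
```

The gap shrinks at rate O(1/√n) as the sample size grows, and a posterior that stays close to the prior (small KL) pays a smaller penalty — the trade-off the bound-minimization algorithms in the paper exploit.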
-
A real-time unscented Kalman filter on manifolds for challenging AUV navigation
Théophile Cantelobre, Clément Chahbazian, Arnaud Croux, and 1 more author
In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct 2020
We consider the problem of localization and navigation of Autonomous Underwater Vehicles (AUV) in the context of high performance subsea asset inspection missions in deep water. We propose a solution based on the recently introduced Unscented Kalman Filter on Manifolds (UKF-M) for onboard navigation to estimate the robot’s location, attitude and velocity, using a precise round and rotating Earth navigation model. Our algorithm has the merit of seamlessly handling nonlinearity of attitude, and is far simpler to implement than the extended Kalman filter (EKF), which is widely used in the navigation industry. The unscented transform notably spares the user the computation of Jacobians and lends itself well to fast prototyping in the context of multi-sensor data fusion. Besides, we provide the community with feedback about implementation, and execution time is shown to be compatible with real-time operation. Extensive, realistic Monte-Carlo simulations show that the filter estimates uncertainty accurately, and illustrate its convergence ability. Real experiments from a 900 m deep dive near Marseille (France) illustrate the relevance of the method.
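The unscented transform at the heart of the UKF is why no Jacobians are needed: instead of linearizing the dynamics, it propagates a small set of deterministically chosen "sigma points" through the nonlinearity and recovers the mean and covariance from them. A minimal sketch of the vanilla Euclidean transform (the paper uses the manifold variant, UKF-M; names and the κ parameterization here follow the textbook version, not the paper's code):

```python
import numpy as np

def unscented_transform(f, mean, cov, kappa=1.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear map f.

    Builds 2n+1 sigma points, pushes each through f, and returns the
    weighted sample mean and covariance of the outputs. Exact for
    linear f; no Jacobian of f is ever computed.
    """
    n = len(mean)
    L = np.linalg.cholesky((n + kappa) * cov)  # scaled matrix square root
    # Sigma points: the mean, plus symmetric offsets along each column of L
    pts = [mean] + [mean + L[:, i] for i in range(n)] \
                 + [mean - L[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    Y = np.array([f(p) for p in pts])
    y_mean = w @ Y
    diff = Y - y_mean
    y_cov = (w[:, None] * diff).T @ diff
    return y_mean, y_cov
```

Because only evaluations of f are needed, swapping in a new sensor or motion model requires no re-derivation — the fast-prototyping advantage the abstract highlights over the EKF.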
The best way to contact me is through email.