RelBench is an open benchmark for predictive machine learning on relational databases. It pairs curated multi-table databases with well-defined predictive tasks and a single standardized evaluation, so methods can be compared head to head. That shared setup helps research on Relational Deep Learning and relational foundation models move quickly. The benchmark, datasets, and tooling are open source under the MIT license.
Relational deep learning
Most of the world's data lives in relational databases: collections of tables linked through primary and foreign keys. Turning that data into a prediction has traditionally meant flattening the tables into one feature matrix by hand, a slow and lossy process that throws away much of the relational structure. Relational Deep Learning instead treats the database directly as a graph, where rows are nodes and foreign keys are edges, and learns predictive models end to end with no manual feature engineering. Because the model reads the schema itself, one architecture can serve many tasks over the same database and, increasingly, transfer across databases as a relational foundation model.
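The rows-as-nodes, foreign-keys-as-edges view can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not RelBench's own graph construction: the toy `customers` and `orders` tables and the `build_graph` helper are hypothetical.

```python
from collections import defaultdict

# Toy two-table database (hypothetical data, for illustration only).
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 15.0},
]


def build_graph(tables, primary_keys, foreign_keys):
    """Turn rows into nodes and foreign-key references into edges.

    tables:       {table_name: list of row dicts}
    primary_keys: {table_name: primary-key column}
    foreign_keys: list of (child_table, fk_column, parent_table)
    """
    # One node per row, identified by (table name, primary-key value).
    nodes = {}
    for table_name, rows in tables.items():
        pk = primary_keys[table_name]
        for row in rows:
            nodes[(table_name, row[pk])] = row

    # One undirected edge per foreign-key reference that resolves to a row.
    edges = defaultdict(list)
    for child, fk_col, parent in foreign_keys:
        child_pk = primary_keys[child]
        for row in tables[child]:
            src = (child, row[child_pk])
            dst = (parent, row[fk_col])
            if dst in nodes:
                edges[src].append(dst)
                edges[dst].append(src)
    return nodes, dict(edges)


nodes, edges = build_graph(
    tables={"customers": customers, "orders": orders},
    primary_keys={"customers": "customer_id", "orders": "order_id"},
    foreign_keys=[("orders", "customer_id", "customers")],
)
```

Here customer 1 ends up connected to orders 10 and 11, so a graph model can aggregate order rows into a customer representation without any hand-built join.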
The paradigm was set out in the Relational Deep Learning position paper (ICML 2024), and RelBench is the benchmark built to measure it. It gives the field a common yardstick, the same way ImageNet did for computer vision, so that progress on relational prediction is concrete and reproducible.
Datasets and tasks
The core benchmark gathers seven real-world databases from very different domains, each shipped with a fixed temporal split and a set of predictive tasks. Every prediction is made at a reference time and may use only information available before it, so each task is a realistic forecast rather than a leaky lookup.
| Database | Example tasks |
|---|---|
| rel-amazon | churn, lifetime value, recommendation |
| rel-avito | click-through rate, ad visits, clicks |
| rel-event | attendance, repeat, ignore |
| rel-f1 | finishing position, top-3, DNF |
| rel-hm | item sales, churn, purchase |
| rel-stack | post votes, engagement, badges |
| rel-trial | study outcome, adverse events, site success |
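The rule that a prediction may use only information available before its reference time can be sketched as follows. This is a simplified illustration, not RelBench's implementation: the `events` rows, the helper names, the strict `<` cutoff, and the 30-day churn definition are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical activity log: one row per user event.
events = [
    {"user_id": 1, "timestamp": datetime(2024, 1, 5)},
    {"user_id": 1, "timestamp": datetime(2024, 2, 20)},
    {"user_id": 2, "timestamp": datetime(2024, 1, 15)},
]


def visible_rows(rows, reference_time, time_col="timestamp"):
    """Keep only rows recorded strictly before the reference time."""
    return [row for row in rows if row[time_col] < reference_time]


def churn_label(rows, user_id, reference_time, horizon, time_col="timestamp"):
    """Return 1 if the user has no event in [reference_time, reference_time + horizon)."""
    end = reference_time + horizon
    active = any(
        row["user_id"] == user_id and reference_time <= row[time_col] < end
        for row in rows
    )
    return 0 if active else 1


reference_time = datetime(2024, 2, 1)

# Model inputs come from the past only.
features = visible_rows(events, reference_time)

# Labels are computed from the window that follows the reference time.
label_user_1 = churn_label(events, 1, reference_time, timedelta(days=30))
label_user_2 = churn_label(events, 2, reference_time, timedelta(days=30))
```

Keeping the input filter and the label window on opposite sides of the reference time is what prevents the label from leaking into the features.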
The core tasks fall into three types, each scored with a single metric so results stay directly comparable across methods and databases.
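As a rough illustration of scoring each task type with one metric, the sketch below assumes the three types are entity classification, entity regression, and recommendation, scored with AUROC, MAE, and MAP@k respectively. That pairing is an assumption here and should be checked against the benchmark paper. The functions are plain-Python reference versions written for this example, not RelBench's evaluators, and MAP@k in particular has several normalization conventions.

```python
def auroc(y_true, scores):
    """Chance that a random positive is scored above a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def average_precision_at_k(relevant, ranked, k):
    """Average precision over the top-k ranked items, normalized by min(|relevant|, k)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / min(len(relevant), k)


def mean_average_precision(relevant_sets, ranked_lists, k):
    """Mean of average_precision_at_k over all entities."""
    scores = [
        average_precision_at_k(rel, ranked, k)
        for rel, ranked in zip(relevant_sets, ranked_lists)
    ]
    return sum(scores) / len(scores)


# Tiny hand-checkable examples (hypothetical values).
auc = auroc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
mae = mean_absolute_error([3.0, 5.0], [2.0, 7.0])
map_at_3 = mean_average_precision([{"a", "c"}], [["a", "b", "c"]], k=3)
```

Higher is better for AUROC and MAP, lower is better for MAE, so a single headline number per task is enough to rank methods.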
A growing repository
RelBench v2 expands the benchmark with four new databases (SALT, RateBeer, arXiv, and MIMIC-IV), 36 additional tasks, and a new Autocomplete task type that predicts existing columns inside a database. It also turns RelBench into a single entry point for a much larger collection of relational data.
- CTU Prague repository. More than 70 relational databases from the CTU Prague Relational Learning Repository, integrated through ReDeLEx and installable with relbench[ctu].
- 4DBInfer. Seven databases from the 4DBInfer benchmark for graph-centric predictive modeling, installable with relbench[dbinfer].
- Temporal Graph Benchmark. The Temporal Graph Benchmark, with time-stamped event streams expressed as relational schemas, so temporal graph models and Relational Deep Learning can be compared directly.
Leaderboard
RelBench tracks test-set results across its tasks, curated from published papers and kept current as the field shifts toward relational foundation models. Each task reports a single headline metric, and entries are grouped by whether a method runs in-context, is fine-tuned, or both, so zero-shot and task-specific results stay easy to read side by side.
Live results. Browse the RelBench leaderboards, ranked by task and curated from published papers, and contribute numbers through the RelBench repository.
Get started
RelBench installs from PyPI and loads any database and task straight from the Hub, with no per-dataset code.
pip install relbench
Loading a dataset and a task takes only a few lines, and the relational schema comes back ready to model.
from relbench.datasets import get_dataset
from relbench.tasks import get_task
dataset = get_dataset("rel-amazon", download=True)
task = get_task("rel-amazon", "user-churn", download=True)
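A possible next step is to look at what was loaded. The sketch below assumes the RelBench Python API exposes `dataset.get_db()`, a `table_dict` attribute on the returned database, a pandas DataFrame at `.df` on each table, and `task.get_table("train")`; check these against the installed version. The helper names `table_sizes` and `print_summary` are hypothetical, and `print_summary` needs network access to download the data.

```python
def table_sizes(db):
    """Map each table name to its row count.

    Assumes a RelBench-style database object: `db.table_dict` maps table
    names to tables, and each table exposes its rows as `.df`.
    """
    return {name: len(table.df) for name, table in db.table_dict.items()}


def print_summary(dataset_name="rel-amazon", task_name="user-churn"):
    """Download a dataset and task, then print a short summary."""
    # Imported here so that defining the helpers does not require relbench.
    from relbench.datasets import get_dataset
    from relbench.tasks import get_task

    dataset = get_dataset(dataset_name, download=True)
    task = get_task(dataset_name, task_name, download=True)

    # Row counts for every table in the relational database.
    print(table_sizes(dataset.get_db()))

    # First rows of the training table for the chosen task.
    print(task.get_table("train").df.head())
```

Calling `print_summary()` with the defaults would summarize the same dataset and task loaded above.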
Two short notebooks walk through the rest and open directly in Google Colab with no local setup.
Cite
If you use RelBench, please cite the position and benchmark papers. If you use RelBench v2, please also cite the v2 paper.
@inproceedings{fey2024position,
title = {Position: Relational Deep Learning - Graph Representation Learning on Relational Databases},
author = {Fey, Matthias and Hu, Weihua and Huang, Kexin and Lenssen, Jan Eric and Ranjan, Rishabh and Robinson, Joshua and Ying, Rex and You, Jiaxuan and Leskovec, Jure},
booktitle = {Forty-first International Conference on Machine Learning},
year = {2024}
}
@misc{robinson2024relbench,
title = {RelBench: A Benchmark for Deep Learning on Relational Databases},
author = {Joshua Robinson and Rishabh Ranjan and Weihua Hu and Kexin Huang and Jiaqi Han and Alejandro Dobles and Matthias Fey and Jan E. Lenssen and Yiwen Yuan and Zecheng Zhang and Xinwei He and Jure Leskovec},
year = {2024},
eprint = {2407.20060},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2407.20060}
}
@misc{gu2026relbench,
title = {{RelBench} v2: A Large-Scale Benchmark and Repository for Relational Data},
author = {Justin Gu and Rishabh Ranjan and Charilaos Kanatsoulis and Haiming Tang and Martin Jurkovic and Valter Hudovernik and Mark Znidar and Pranshu Chaturvedi and Parth Shroff and Fengyu Li and Jure Leskovec},
year = {2026},
eprint = {2602.12606},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2602.12606}
}