RelBench is an open benchmark for predictive machine learning on relational databases. It pairs curated multi-table databases with well-defined predictive tasks and a single standardized evaluation protocol. Methods can therefore be compared head to head, which helps research on Relational Deep Learning and relational foundation models move quickly. The benchmark, datasets, and tooling are open source under the MIT license.

7 curated benchmark databases across very different domains

Predictive tasks with fixed temporal splits

70+ community databases in the wider repository

3 task types spanning classification, regression, and recommendation

Relational deep learning

Much of the world's most valuable data lives in relational databases, where several tables are linked through primary and foreign keys. Turning that data into a prediction has traditionally meant flattening the tables into a single feature matrix by hand. This is a slow and lossy process that discards much of the relational structure. Relational Deep Learning instead treats the database directly as a graph, where rows are nodes and foreign-key links are edges, and learns predictive models end to end with no manual feature engineering. Because the model reads the schema itself, one architecture can serve many tasks over the same database. Increasingly, it can also transfer across databases as a relational foundation model.
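The "rows as nodes, foreign keys as edges" view can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the table names, columns, and the `build_graph` helper are hypothetical, every table is assumed to have a primary key, and none of it is RelBench's own data structure or API.

```python
def build_graph(tables, pkeys, fkeys):
    """Turn relational tables into a heterogeneous graph (hypothetical helper).

    tables: {table_name: [row_dict, ...]}
    pkeys:  {table_name: primary-key column}  (assumes every table has one)
    fkeys:  {table_name: {fk_column: referenced_table}}
    Returns (nodes, edges): nodes maps (table, key) -> row, and edges lists
    ((src_table, src_key), (dst_table, dst_key)) pairs, one per foreign-key link.
    """
    # Every row becomes a node, identified by its table and primary key.
    nodes = {}
    for name, rows in tables.items():
        for row in rows:
            nodes[(name, row[pkeys[name]])] = row

    # Every foreign-key value becomes an edge to the row it references.
    edges = []
    for name, rows in tables.items():
        for row in rows:
            for col, target in fkeys.get(name, {}).items():
                dst = (target, row[col])
                if dst in nodes:  # skip dangling references
                    edges.append(((name, row[pkeys[name]]), dst))
    return nodes, edges


# Hypothetical toy schema: customers review products.
tables = {
    "customer": [{"customer_id": 1}, {"customer_id": 2}],
    "product": [{"product_id": 10}],
    "review": [
        {"review_id": 100, "customer_id": 1, "product_id": 10},
        {"review_id": 101, "customer_id": 2, "product_id": 10},
    ],
}
pkeys = {"customer": "customer_id", "product": "product_id", "review": "review_id"}
fkeys = {"review": {"customer_id": "customer", "product_id": "product"}}

nodes, edges = build_graph(tables, pkeys, fkeys)
print(len(nodes), len(edges))  # 5 4
```

The sketch stores each edge once, from the referencing row to the referenced row. A graph neural network operating on such a graph would typically also add reverse edges so that messages can flow in both directions.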

The paradigm was set out in the Relational Deep Learning position paper (ICML 2024), and RelBench is the benchmark built to evaluate it. It gives the field a common yardstick, much as ImageNet did for computer vision, so that progress on relational prediction is concrete and reproducible.

Datasets and tasks

The core benchmark gathers seven real-world databases from very different domains, each shipped with a fixed temporal split and a set of predictive tasks. Every prediction is made at a reference time and may use only information available before it, so each task is a realistic forecast rather than a leaky lookup.
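The point-in-time rule can be illustrated with a toy churn example. This is a sketch under assumptions: the event log, the 30-day window, and the `make_example` and `split_for` helpers are hypothetical. The fixed cutoffs in `split_for` simplify how each RelBench database fixes its own validation and test times.

```python
from datetime import datetime, timedelta

# Hypothetical activity log of (user_id, timestamp) pairs.
EVENTS = [
    ("u1", datetime(2024, 1, 5)),
    ("u1", datetime(2024, 2, 10)),
    ("u2", datetime(2024, 1, 20)),
]


def make_example(events, user, ref_time, window):
    """Build one (features, label) pair for a user at a reference time.

    Features may only use events at or before ref_time; the churn label
    looks only at the future window (ref_time, ref_time + window].
    """
    past = [t for u, t in events if u == user and t <= ref_time]
    future = [t for u, t in events
              if u == user and ref_time < t <= ref_time + window]
    features = {"num_past_events": len(past)}
    label = 1 if not future else 0  # 1 = churned: no activity in the window
    return features, label


def split_for(ref_time, val_start, test_start):
    """Assign a reference time to train/val/test using fixed cutoff dates."""
    if ref_time < val_start:
        return "train"
    if ref_time < test_start:
        return "val"
    return "test"


ref = datetime(2024, 2, 1)
window = timedelta(days=30)
for user in ("u1", "u2"):
    print(user, make_example(EVENTS, user, ref, window))
# u1 ({'num_past_events': 1}, 0)
# u2 ({'num_past_events': 1}, 1)
```

Because features never read past `ref_time`, a model trained on such examples cannot peek at the answer. Splitting by time, rather than at random, keeps evaluation a genuine forecast of the future.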

Database	Domain	Example tasks
rel-amazon	E-commerce reviews	churn, lifetime value, recommendation
rel-avito	Online advertising	click-through rate, ad visits, clicks
rel-event	Event recommendation	attendance, repeat, ignore
rel-f1	Formula 1 racing	finishing position, top-3, DNF
rel-hm	Fashion retail	item sales, churn, purchase
rel-stack	Online Q&A	post votes, engagement, badges
rel-trial	Clinical trials	study outcome, adverse events, site success

The core tasks fall into three types, each scored with a single metric so results stay directly comparable across methods and databases.

Entity classification (AUROC). Predict a binary outcome for an entity at a future time, such as whether a user will churn in the next window.

Entity regression (MAE, mean absolute error). Predict a numeric quantity for an entity, such as the sales an item will see over the coming period.

Recommendation (MAP, mean average precision). Rank the items an entity will engage with, such as the products a user will buy or the posts they will answer.
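The three headline metrics can be written out in plain Python to make their definitions concrete. This is a minimal sketch, not RelBench's evaluation code. In particular, normalizing average precision by `min(k, number of relevant items)` is one common MAP@k convention and is an assumption here. RelBench's own implementation may differ in details such as tie handling.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted values."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def average_precision_at_k(relevant, ranked, k):
    """Average of precision@i over the hits in the top k (assumed normalization)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i  # precision at this rank
    return total / min(k, len(relevant))


def map_at_k(relevant_lists, ranked_lists, k):
    """Mean of average precision@k over all entities."""
    aps = [average_precision_at_k(set(rel), ranked, k)
           for rel, ranked in zip(relevant_lists, ranked_lists)]
    return sum(aps) / len(aps)


print(auroc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.5]))  # 0.75
print(mae([3, 5], [2, 7]))  # 1.5
```

Each metric collapses a task to a single number, which is what lets results be compared directly across methods and databases.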

A growing repository

RelBench v2 expands the benchmark with four new databases (SALT, RateBeer, arXiv, and MIMIC-IV) and 36 additional tasks. It also adds a new Autocomplete task type that predicts the values of existing columns in a database. Beyond these additions, v2 turns RelBench into a single entry point for a much larger collection of relational data.

CTU Prague repository. More than 70 relational databases from the CTU Prague Relational Learning Repository, integrated through ReDeLEx and installable with relbench[ctu].
4DBInfer. Seven databases from the 4DBInfer benchmark for graph-centric predictive modeling, installable with relbench[dbinfer].
Temporal Graph Benchmark. Datasets from the Temporal Graph Benchmark, with their time-stamped event streams expressed as relational schemas, so that temporal graph models and Relational Deep Learning methods can be compared directly.
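One way to express a time-stamped event stream as a relational schema is to use a node table plus an event table whose rows carry two foreign keys and a timestamp. The sketch below is a hypothetical illustration of that idea. The stream, column names, and `events_to_tables` helper are assumptions, and this is not necessarily how RelBench v2 performs the conversion.

```python
# Hypothetical temporal-graph event stream: (source, destination, timestamp).
EVENT_STREAM = [("a", "b", 1), ("b", "c", 2), ("a", "c", 3)]


def events_to_tables(stream):
    """Express an event stream as a node table plus a timestamped event table."""
    # One row per distinct node that appears as a source or destination.
    node_ids = sorted({n for src, dst, _ in stream for n in (src, dst)})
    nodes = [{"node_id": n} for n in node_ids]
    # One row per event; src_id and dst_id are foreign keys into the node table.
    events = [
        {"event_id": i, "src_id": src, "dst_id": dst, "timestamp": t}
        for i, (src, dst, t) in enumerate(stream)
    ]
    # Schema metadata: primary keys, foreign keys, and the time column.
    schema = {
        "nodes": {"pkey": "node_id"},
        "events": {
            "pkey": "event_id",
            "time_col": "timestamp",
            "fkeys": {"src_id": "nodes", "dst_id": "nodes"},
        },
    }
    return {"nodes": nodes, "events": events}, schema


tables, schema = events_to_tables(EVENT_STREAM)
print(len(tables["nodes"]), len(tables["events"]))  # 3 3
```

With the timestamp declared as the event table's time column, the same point-in-time rules that govern the core benchmark apply unchanged to the converted stream.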

Leaderboard

RelBench tracks test-set results across its tasks, curated from published papers and kept current as the field shifts toward relational foundation models. Each task reports a single headline metric, and entries are grouped by whether a method runs in-context, is fine-tuned, or both, so zero-shot and task-specific results stay easy to read side by side.

Live results. Browse the RelBench leaderboards, ranked by task and curated from published papers, and contribute numbers through the RelBench repository.

Get started

RelBench installs from PyPI and loads any database and task straight from the Hub, with no per-dataset code.

pip install relbench

Loading a dataset and a task takes only a few lines, and the relational schema comes back ready to model.

from relbench.datasets import get_dataset
from relbench.tasks import get_task

# Download the rel-amazon database and its user-churn task definition
dataset = get_dataset("rel-amazon", download=True)
task = get_task("rel-amazon", "user-churn", download=True)

Two short notebooks walk through the rest and open directly in Google Colab with no local setup.

Quickstart. Load a dataset and run a baseline.
Training a GNN. Build a graph and train a model.
All notebooks. Tutorials on GitHub.

Cite

If you use RelBench, please cite the position and benchmark papers. If you use RelBench v2, please also cite the v2 paper.

@inproceedings{rdl,
  title     = {Position: Relational Deep Learning - Graph Representation Learning on Relational Databases},
  author    = {Fey, Matthias and Hu, Weihua and Huang, Kexin and Lenssen, Jan Eric and Ranjan, Rishabh and Robinson, Joshua and Ying, Rex and You, Jiaxuan and Leskovec, Jure},
  booktitle = {Forty-first International Conference on Machine Learning},
  year      = {2024}
}

@misc{relbench,
  title         = {RelBench: A Benchmark for Deep Learning on Relational Databases},
  author        = {Joshua Robinson and Rishabh Ranjan and Weihua Hu and Kexin Huang and Jiaqi Han and Alejandro Dobles and Matthias Fey and Jan E. Lenssen and Yiwen Yuan and Zecheng Zhang and Xinwei He and Jure Leskovec},
  year          = {2024},
  eprint        = {2407.20060},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2407.20060}
}

@misc{gu2026relbenchv2,
  title         = {{RelBench} v2: A Large-Scale Benchmark and Repository for Relational Data},
  author        = {Justin Gu and Rishabh Ranjan and Charilaos Kanatsoulis and Haiming Tang and Martin Jurkovic and Valter Hudovernik and Mark Znidar and Pranshu Chaturvedi and Parth Shroff and Fengyu Li and Jure Leskovec},
  year          = {2026},
  eprint        = {2602.12606},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2602.12606}
}