Stanford Tabular and Relational Project

Benchmark · Relational Deep Learning · NeurIPS 2024 · arXiv:2407.20060

RelBench: A Benchmark for Deep Learning on Relational Databases

Joshua Robinson, Rishabh Ranjan, Weihua Hu, Kexin Huang, Jiaqi Han, Alejandro Dobles, Matthias Fey, Jan Eric Lenssen, Yiwen Yuan, Zecheng Zhang, Xinwei He, Jure Leskovec

Stanford University  ·  Kumo.AI

RelBench is an open benchmark for predictive machine learning on relational databases. It pairs curated multi-table databases with well-defined predictive tasks and a single standardized evaluation, so that methods can be compared head to head and research on Relational Deep Learning and relational foundation models can move quickly. The benchmark, datasets, and tooling are open source under an MIT license.

7 curated benchmark databases across very different domains
30 predictive tasks with fixed temporal splits
70+ community databases in the wider repository
3 task types spanning classification, regression, and recommendation

Relational deep learning

Much of the world's data lives in relational databases: collections of tables linked through primary and foreign keys. Turning that data into a prediction has traditionally meant flattening the tables into a single feature matrix by hand, a slow and lossy process that discards much of the relational structure. Relational Deep Learning instead treats the database directly as a graph, where rows are nodes and foreign-key links are edges, and learns predictive models end to end without manual feature engineering. Because the model reads the schema itself, one architecture can serve many tasks over the same database and, increasingly, can transfer across databases as a relational foundation model.
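To make the rows-as-nodes, foreign-keys-as-edges idea concrete, here is a minimal pure-Python sketch over a hypothetical three-table schema. It is not RelBench code: the names `TABLES`, `FOREIGN_KEYS`, and `database_to_graph` are illustrative, and real pipelines build typed (heterogeneous) graphs with a tensor library rather than Python lists. Each edge carries its foreign-key column as a relation type, which is what lets one graph hold several kinds of links.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, int]          # (table name, primary key)
Edge = Tuple[Node, str, Node]   # (child row, foreign-key column, parent row)

# Hypothetical toy database: each table maps a primary key to a row.
TABLES: Dict[str, Dict[int, dict]] = {
    "customer": {1: {"name": "ana"}, 2: {"name": "bo"}},
    "product": {10: {"title": "lamp"}, 11: {"title": "desk"}},
    "review": {
        100: {"customer_id": 1, "product_id": 10, "rating": 5},
        101: {"customer_id": 2, "product_id": 10, "rating": 3},
        102: {"customer_id": 2, "product_id": 11, "rating": 4},
    },
}
# Which column of a child table points at which parent table.
FOREIGN_KEYS: Dict[str, Dict[str, str]] = {
    "review": {"customer_id": "customer", "product_id": "product"},
}


def database_to_graph(
    tables: Dict[str, Dict[int, dict]],
    foreign_keys: Dict[str, Dict[str, str]],
) -> Tuple[List[Node], List[Edge]]:
    """Turn every row into a node and every non-null foreign key into an edge."""
    nodes = [(table, pk) for table, rows in tables.items() for pk in rows]
    edges = []
    for table, fkeys in foreign_keys.items():
        for pk, row in tables[table].items():
            for col, parent in fkeys.items():
                target = row.get(col)
                if target is not None:  # a NULL foreign key contributes no edge
                    edges.append(((table, pk), col, (parent, target)))
    return nodes, edges


nodes, edges = database_to_graph(TABLES, FOREIGN_KEYS)
print(len(nodes), len(edges))  # 7 6
```

A graph neural network can then pass messages along these edges (often in both directions), so a customer's representation is built from its reviews and, through them, from the reviewed products.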

The paradigm was set out in the Relational Deep Learning position paper (ICML 2024), and RelBench is the benchmark built to measure progress on it. It gives the field a common yardstick, much as ImageNet did for computer vision, so that progress on relational prediction is concrete and reproducible.

Datasets and tasks

The core benchmark gathers seven real-world databases from very different domains, each shipped with a fixed temporal split and a set of predictive tasks. Every prediction is made at a reference time and may use only information available before it, so each task is a realistic forecast rather than a leaky lookup.
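The no-leakage rule can be illustrated with a toy churn task. In this sketch, features may only look at events strictly before the reference time, and the label looks at a future window that starts at the reference time. The half-open window convention, the 30-day window, and the names `churn_examples` and `Event` are assumptions for illustration, not RelBench's exact task definitions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Event = Tuple[int, datetime]  # (user_id, event time), e.g. one review


def churn_examples(
    events: List[Event],
    users: List[int],
    ref_time: datetime,
    window: timedelta,
) -> Dict[int, Tuple[int, int]]:
    """Return {user: (past_event_count, churn_label)} at reference time ref_time.

    The feature sees only events with t < ref_time. The label is 1 (churned)
    if the user has no event in the window [ref_time, ref_time + window).
    """
    out = {}
    for user in users:
        times = [t for u, t in events if u == user]
        past = sum(1 for t in times if t < ref_time)
        active_later = any(ref_time <= t < ref_time + window for t in times)
        out[user] = (past, 0 if active_later else 1)
    return out


events = [
    (1, datetime(2024, 1, 5)),
    (1, datetime(2024, 2, 10)),  # inside the label window: never counted as a feature
    (2, datetime(2024, 1, 20)),
    (3, datetime(2024, 3, 1)),
]
examples = churn_examples(events, [1, 2, 3], datetime(2024, 2, 1), timedelta(days=30))
print(examples)  # {1: (1, 0), 2: (1, 1), 3: (0, 0)}
```

Sliding the reference time forward and repeating this construction yields training, validation, and test examples that each respect their own cutoff.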

Database     Domain                 Example tasks
rel-amazon   E-commerce reviews     churn, lifetime value, recommendation
rel-avito    Online advertising     click-through rate, ad visits, clicks
rel-event    Event recommendation   attendance, repeat, ignore
rel-f1       Formula 1 racing       finishing position, top-3, DNF
rel-hm       Fashion retail         item sales, churn, purchase
rel-stack    Online Q&A             post votes, engagement, badges
rel-trial    Clinical trials        study outcome, adverse events, site success

The core tasks fall into three types, each scored with a single metric so results stay directly comparable across methods and databases.

Entity classification (AUROC). Predict a binary outcome for an entity at a future time, such as whether a user will churn in the next window.
Entity regression (MAE). Predict a numeric quantity for an entity, such as the sales an item will see over the coming period.
Recommendation (MAP). Rank the items an entity will engage with, such as the products a user will buy or the posts they will answer.
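For intuition, the three headline metrics can be written in a few lines of plain Python. This is a minimal sketch using textbook definitions (pairwise AUROC with ties counted as one half, mean absolute error, and MAP@K normalized by min(K, number of relevant items)). RelBench ships its own metric implementations, which may differ in details such as the cutoff K per task, so treat these functions as illustrative rather than as the benchmark's code.

```python
from typing import List, Sequence, Set


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.

    O(P * N) for clarity; rank-based formulas are used for large inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def map_at_k(ranked: List[List[int]], relevant: List[Set[int]], k: int) -> float:
    """Mean over entities of average precision over the top-k ranked items."""
    total = 0.0
    for preds, rel in zip(ranked, relevant):
        hits, precision_sum = 0, 0.0
        for i, item in enumerate(preds[:k], start=1):
            if item in rel:
                hits += 1
                precision_sum += hits / i  # precision at each hit position
        denom = min(k, len(rel))
        total += precision_sum / denom if denom else 0.0
    return total / len(ranked)


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))            # 0.75
print(mae([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))                 # ≈ 0.833
print(map_at_k([[1, 2, 3], [4, 5, 6]], [{1, 3}, {6}], k=3))  # ≈ 0.583
```

Using exactly one metric per task type is what keeps leaderboard numbers comparable across databases.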

A growing repository

RelBench v2 expands the benchmark with four new databases (SALT, RateBeer, arXiv, and MIMIC-IV), 36 additional tasks, and a new Autocomplete task type that predicts existing columns inside a database. It also turns RelBench into a single entry point for a much larger collection of relational data.

Leaderboard

RelBench tracks test-set results across its tasks, curated from published papers and kept current as the field shifts toward relational foundation models. Each task reports a single headline metric, and entries are grouped by whether a method runs in-context, is fine-tuned, or both, so zero-shot and task-specific results stay easy to read side by side.

Live results. Browse the RelBench leaderboards, ranked by task and curated from published papers, and contribute numbers through the RelBench repository.

Get started

RelBench installs from PyPI and loads any database and task straight from the Hub, with no per-dataset code.

pip install relbench

Loading a dataset and a task takes only a few lines, and the relational schema comes back ready to model.

from relbench.datasets import get_dataset
from relbench.tasks import get_task

dataset = get_dataset("rel-amazon", download=True)
task = get_task("rel-amazon", "user-churn", download=True)
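Once a dataset is loaded, its schema can be inspected. The sketch below walks a RelBench-style database and lists each table's keys and foreign-key links. It assumes that `dataset.get_db()` returns an object whose `table_dict` maps table names to tables exposing `df`, `pkey_col`, `time_col`, and `fkey_col_to_pkey_table`; this matches our reading of the RelBench `Database` and `Table` classes, but check it against your installed version. The helper `describe_schema` and the toy database are hypothetical, included so the sketch runs offline without a download.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple


def describe_schema(db: Any) -> List[Tuple[str, str, str]]:
    """Print each table and return its foreign-key links as (table, fkey_col, parent).

    Assumes a RelBench-style database: db.table_dict maps names to tables
    with .df, .pkey_col, .time_col and .fkey_col_to_pkey_table attributes.
    """
    links = []
    for name, table in db.table_dict.items():
        print(f"{name}: {len(table.df)} rows, pkey={table.pkey_col}, time={table.time_col}")
        for fkey_col, parent in table.fkey_col_to_pkey_table.items():
            print(f"  {name}.{fkey_col} -> {parent}")
            links.append((name, fkey_col, parent))
    return links


# Hypothetical stand-in with the same attribute names, so the sketch runs offline.
# With RelBench installed, you would pass dataset.get_db() instead.
toy_db = SimpleNamespace(table_dict={
    "customer": SimpleNamespace(df=[{}] * 2, pkey_col="customer_id",
                                time_col=None, fkey_col_to_pkey_table={}),
    "review": SimpleNamespace(df=[{}] * 3, pkey_col=None, time_col="review_time",
                              fkey_col_to_pkey_table={"customer_id": "customer"}),
})
links = describe_schema(toy_db)
```

Because `len()` works on both a pandas DataFrame and a plain list, the same helper runs on the toy stand-in and on real tables.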

Two short notebooks walk through the rest and open directly in Google Colab with no local setup.

Cite

If you use RelBench, please cite the position and benchmark papers. If you use RelBench v2, please also cite the v2 paper.

@inproceedings{rdl,
  title     = {Position: Relational Deep Learning - Graph Representation Learning on Relational Databases},
  author    = {Fey, Matthias and Hu, Weihua and Huang, Kexin and Lenssen, Jan Eric and Ranjan, Rishabh and Robinson, Joshua and Ying, Rex and You, Jiaxuan and Leskovec, Jure},
  booktitle = {Forty-first International Conference on Machine Learning},
  year      = {2024}
}
@misc{relbench,
  title         = {RelBench: A Benchmark for Deep Learning on Relational Databases},
  author        = {Joshua Robinson and Rishabh Ranjan and Weihua Hu and Kexin Huang and Jiaqi Han and Alejandro Dobles and Matthias Fey and Jan E. Lenssen and Yiwen Yuan and Zecheng Zhang and Xinwei He and Jure Leskovec},
  year          = {2024},
  eprint        = {2407.20060},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2407.20060}
}
@misc{gu2026relbenchv2,
  title         = {{RelBench} v2: A Large-Scale Benchmark and Repository for Relational Data},
  author        = {Justin Gu and Rishabh Ranjan and Charilaos Kanatsoulis and Haiming Tang and Martin Jurkovic and Valter Hudovernik and Mark Znidar and Pranshu Chaturvedi and Parth Shroff and Fengyu Li and Jure Leskovec},
  year          = {2026},
  eprint        = {2602.12606},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2602.12606}
}