Overview and Installation of the Benchmarking Code

Our benchmarking code contains several features:

Automatic dataset download
Running models in parallel with automatic scheduling that tries to respect RAM constraints
Evaluation and plotting

Installation

Our code has been tested with Python 3.9 and 3.10. After cloning or forking the repo, the required libraries can be installed as follows:

# in the repo folder:
pip3 install -e ".[extra,hpo,bench]"

Note that the version requirements in our pyproject.toml are somewhat restrictive to avoid problems; they can potentially be relaxed.
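Since the code has only been tested with Python 3.9 and 3.10, it can help to check the interpreter version before installing. The following is a minimal sketch, not part of the repo; the helper name is_tested_python is hypothetical, and other Python versions may work even though they are not listed as tested.

```python
import sys

# Versions the text above lists as tested; other versions may still work.
TESTED_VERSIONS = {(3, 9), (3, 10)}


def is_tested_python(version_info=sys.version_info):
    """Return True if the (major, minor) version is one of the tested ones."""
    return (version_info[0], version_info[1]) in TESTED_VERSIONS


if __name__ == "__main__":
    current = f"{sys.version_info[0]}.{sys.version_info[1]}"
    if is_tested_python():
        print(f"Python {current}: matches a tested version.")
    else:
        print(f"Python {current}: not among the tested versions (3.9, 3.10).")
```

Run it with the same interpreter that pip3 installs into, so that the reported version matches the environment that will hold the packages.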

To more closely reproduce the installation we used for running the benchmarks, we refer to the configuration files in the original_requirements folder:

The pip-only requirements in requirements_2024_06_25.txt were used to compute many of the older NN results (not TabR).
The conda requirements in conda_env_2024_06_25.yml and conda_env_2024_10_28.yml were used to compute GBDT-HPO results and TabR results as well as a few newer NN results. They can be installed as a new conda environment using conda env create -f conda_env_2024_10_28.yml. Note that the older of the two conda environments was very slow for TabR on some datasets since it uses an older torchmetrics version with slow implementations.
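Because the slowdown mentioned above depends on the installed torchmetrics version, it may be useful to see which version an environment actually contains. The sketch below uses only the standard library; the helper name installed_version is hypothetical, and the text does not state which torchmetrics version fixes the slow implementations, so the script only reports the version and makes no judgment.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # torchmetrics is the package named in the text above.
    version = installed_version("torchmetrics")
    if version is None:
        print("torchmetrics: not installed in this environment")
    else:
        print(f"torchmetrics: {version}")
```

Running this inside each conda environment gives a quick way to compare the two setups.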

Using Sphinx Documentation

Go to the repo root directory and run:

sphinx-apidoc -o docs/source/ pytabkit
sphinx-build -M html docs/source/ docs/build/

then open docs/build/html/index.html.
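As a convenience, the generated index page can also be opened from Python. This is a small sketch using only the standard library; the helper name docs_index_path is hypothetical, and the script assumes it is started from the repo root after the sphinx-build command has finished.

```python
import webbrowser
from pathlib import Path


def docs_index_path(repo_root="."):
    """Return the absolute path of the generated HTML index page."""
    return Path(repo_root).resolve() / "docs" / "build" / "html" / "index.html"


if __name__ == "__main__":
    index = docs_index_path()
    if index.is_file():
        # Open the page in the default browser via a file:// URI.
        webbrowser.open(index.as_uri())
    else:
        print(f"{index} not found; run the sphinx-build command first.")
```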