Overview and Installation of the Benchmarking code
Our benchmarking code contains several features:
Automatic dataset download
Running models (parallelized) with automatic scheduling, trying to respect RAM constraints
Evaluation and plotting
Installation
Our code has been tested with Python 3.9 and 3.10. After cloning or forking the repo, the required libraries can be installed as follows:
# in the repo folder:
pip3 install -e ".[extra,hpo,bench]"
Note that the version requirements in our pyproject.toml
are somewhat restrictive to avoid problems; they can potentially be relaxed.
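Because the code has only been tested with Python 3.9 and 3.10, it can help to check the interpreter version before installing. The following is a minimal sketch; the helper name is_tested_python is hypothetical and not part of the repo, and other Python versions may well work even though they are reported as untested here.

```python
import sys

# Versions the text above reports as tested; others are untested, not known-broken.
TESTED_VERSIONS = {(3, 9), (3, 10)}


def is_tested_python(version_info=sys.version_info):
    """Return True if the major.minor version is one of the tested ones."""
    return (version_info[0], version_info[1]) in TESTED_VERSIONS


if not is_tested_python():
    major, minor = sys.version_info[0], sys.version_info[1]
    print(f"Warning: Python {major}.{minor} has not been tested with this code.")
```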
To more closely reproduce the installation we used for running the benchmarks,
see the configuration files in the original_requirements folder:
The pip-only requirements in requirements_2024_06_25.txt were used to compute many of the older NN results (not TabR).
The conda requirements in conda_env_2024_06_25.yml and conda_env_2024_10_28.yml were used to compute GBDT-HPO results and TabR results as well as a few newer NN results. They can be installed as a new conda environment using conda env create -f conda_env_2024_10_28.yml. Note that the older of the two conda environments was very slow for TabR on some datasets since it uses an older torchmetrics version with slow implementations.
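The choice between the two kinds of requirements files can be sketched as a small helper that builds the matching install command without running it. The function install_command is hypothetical and not part of the repo. The conda command is the one given above; the use of pip install -r for the .txt file is an assumption based on the standard pip mechanism, since the text does not state how that file was installed.

```python
import sys
from pathlib import Path


def install_command(requirements_file):
    """Build (but do not run) the command that installs a requirements file.

    Hypothetical helper: .yml files are treated as conda environment files,
    .txt files as pip requirements files.
    """
    path = Path(requirements_file)
    if path.suffix == ".yml":
        # Command given in the text above.
        return ["conda", "env", "create", "-f", str(path)]
    if path.suffix == ".txt":
        # Assumption: the standard pip mechanism for requirements files.
        return [sys.executable, "-m", "pip", "install", "-r", str(path)]
    raise ValueError(f"Unrecognized requirements file: {path}")


# Print the commands for inspection instead of executing them.
folder = Path("original_requirements")
for name in ["requirements_2024_06_25.txt", "conda_env_2024_10_28.yml"]:
    print(" ".join(install_command(folder / name)))
```

Printing the commands first makes it easy to review them before running anything that creates or modifies an environment.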
Using Sphinx Documentation
Go to the repo root dir and run
sphinx-apidoc -o docs/source/ pytabkit
sphinx-build -M html docs/source/ docs/build/
then open docs/build/html/index.html.
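The steps above can also be wrapped in a short script. This is only a sketch: the function names are hypothetical, the two Sphinx commands are copied from the text, and it assumes that sphinx-apidoc and sphinx-build are available on the PATH.

```python
import subprocess
import webbrowser
from pathlib import Path


def sphinx_commands():
    """Return the two Sphinx commands from the text, as argument lists."""
    return [
        ["sphinx-apidoc", "-o", "docs/source/", "pytabkit"],
        ["sphinx-build", "-M", "html", "docs/source/", "docs/build/"],
    ]


def index_uri(repo_root="."):
    """Return a file:// URI for the generated index page."""
    index = Path(repo_root) / "docs" / "build" / "html" / "index.html"
    return index.resolve().as_uri()


def build_and_open_docs(repo_root="."):
    """Run both commands in repo_root, stop on the first failure, then open the result."""
    for cmd in sphinx_commands():
        subprocess.run(cmd, cwd=repo_root, check=True)
    webbrowser.open(index_uri(repo_root))
```

Calling build_and_open_docs() from the repo root runs both commands and opens the index page in the default browser.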