Reproducing results of “Rethinking Early Stopping: Refine, Then Calibrate”

This page documents how to reproduce the results from our paper Rethinking Early Stopping: Refine, Then Calibrate. For general instructions on setting data paths and using Slurm, see the installation page. The following covers only the steps specific to this paper.

Installation

pip install "probmetrics[extra]"  # the [extra] option is needed for smECE
pip install "pytabkit[bench,dev]"
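To confirm that both installs succeeded, a minimal sketch can check that the packages are importable. It assumes the import names are `probmetrics` and `pytabkit`; the function name `check_packages` is our own illustration and not part of either library.

```python
# Sketch: check that the packages installed above can be found by Python.
# Assumption: the import names are "probmetrics" and "pytabkit".
import importlib.util


def check_packages(names=("probmetrics", "pytabkit")):
    """Return a mapping from top-level module name to whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'OK' if found else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check is fast and has no side effects.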

Original environment

The original conda environment for exact reproduction is stored in original_requirements/conda_env_2025_01_15.yml.
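One way to recreate this environment is the standard `conda env create -f <file>` command. The sketch below wraps it in Python, assuming it is run from the repository root with conda on the PATH. The helper names `conda_create_command` and `create_original_env` are hypothetical, chosen only for this illustration.

```python
import shutil
import subprocess
from pathlib import Path

# Path taken from this page; assumed relative to the repository root.
ENV_FILE = Path("original_requirements/conda_env_2025_01_15.yml")


def conda_create_command(env_file: Path = ENV_FILE) -> list[str]:
    """Build the standard `conda env create -f <file>` command."""
    return ["conda", "env", "create", "-f", str(env_file)]


def create_original_env(env_file: Path = ENV_FILE) -> None:
    """Create the environment; conda reads its name from the YAML file."""
    if shutil.which("conda") is None:
        raise RuntimeError("conda was not found on PATH")
    # check=True raises CalledProcessError if conda reports an error.
    subprocess.run(conda_create_command(env_file), check=True)
```

Calling `create_original_env()` is equivalent to running `conda env create -f original_requirements/conda_env_2025_01_15.yml` in a shell.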

Downloading datasets

Download the zipped TALENT benchmark datasets (dataset-latest.zip) from here and extract them into a folder. Then run

python3 scripts/download_data.py --import_talent_class_small --talent_folder=<unzipped data folder>

where <unzipped data folder> should point to the data folder inside the extracted archive.
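The extraction and import steps can be combined in a small script. This is a minimal sketch, assuming the archive is named dataset-latest.zip and that the folder to pass is literally named `data` (the shallowest such folder is chosen); check this against your download. The function names `extract_talent` and `import_command` are hypothetical.

```python
# Sketch: extract the TALENT archive and build the import command.
# Assumption: the folder to pass to --talent_folder is named "data".
import subprocess
import zipfile
from pathlib import Path


def extract_talent(archive: Path, target: Path) -> Path:
    """Extract the archive into `target` and return the inner data folder."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    # Collect every directory named "data" below the extraction target.
    candidates = [p for p in target.rglob("data") if p.is_dir()]
    if not candidates:
        raise FileNotFoundError(f"no 'data' folder found under {target}")
    # Prefer the shallowest match, i.e. the top-level data folder.
    return min(candidates, key=lambda p: len(p.parts))


def import_command(data_folder: Path) -> list[str]:
    """Build the download_data.py call used on this page."""
    return [
        "python3",
        "scripts/download_data.py",
        "--import_talent_class_small",
        f"--talent_folder={data_folder}",
    ]


def import_talent(archive: Path, target: Path) -> None:
    """Extract the archive, then run the import script from the repo root."""
    data_folder = extract_talent(archive, target)
    subprocess.run(import_command(data_folder), check=True)
```

For example, `import_talent(Path("dataset-latest.zip"), Path("talent_unzipped"))` would extract the archive and then call the import script with the detected folder.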

Running experiments

Run the experiments with

python3 scripts/run_probclass_experiments.py

and then generate the plots with

python3 scripts/create_probclass_plots.py
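The two steps can be chained so that plotting only starts once the experiment script has exited successfully. This is a minimal sketch, assuming it is run from the repository root; `build_commands` and `run_pipeline` are hypothetical helpers, not part of the repository.

```python
# Sketch: run the experiments, then create the plots.
# Assumption: executed from the repository root, where scripts/ lives.
import subprocess
import sys

STEPS = [
    ["scripts/run_probclass_experiments.py"],
    ["scripts/create_probclass_plots.py"],
]


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Prefix each step with the Python interpreter to use."""
    return [[python, *step] for step in STEPS]


def run_pipeline(dry_run: bool = False) -> list[list[str]]:
    """Run all steps in order; with dry_run=True only return the commands."""
    commands = build_commands()
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so the plotting step is skipped if the experiment step fails.
            subprocess.run(cmd, check=True)
    return commands
```

Using `sys.executable` instead of a literal `python3` ensures both scripts run in the same environment as the calling interpreter, for example the conda environment created above.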