Reproducing results of “Rethinking Early Stopping: Refine, Then Calibrate”
This page documents how to reproduce the results from our paper “Rethinking Early Stopping: Refine, Then Calibrate”. For general instructions on setting data paths and using Slurm, see the installation page. The sections below cover only the steps specific to this paper.
Installation
pip install probmetrics[extra] # to get smECE
pip install pytabkit[bench,dev]
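To confirm that both installs succeeded, a small check like the following may help. It is only a sketch: it looks up the distribution names used in the pip commands above (`probmetrics` and `pytabkit`) with the standard-library `importlib.metadata`, and the helper name `installed_versions` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in distributions:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names are taken from the pip commands above.
for name, ver in installed_versions(["probmetrics", "pytabkit"]).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

This does not verify that the optional extras (`extra`, `bench`, `dev`) were resolved; it only reports whether the base packages are present.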
Original environment
The original conda environment for exact reproduction
is stored in original_requirements/conda_env_2025_01_15.yml.
Downloading datasets
Download the zipped datasets (dataset-latest.zip) of the TALENT benchmark from
here.
Extract the archive into a folder, then run
python3 scripts/download_data.py --import_talent_class_small --talent_folder=<unzipped data folder>
where <unzipped data folder> is the data folder inside the extracted archive.
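The extraction and import steps above could be scripted roughly as follows. This is a minimal sketch under assumptions: the helper names are hypothetical, and the inner folder is assumed to be literally named `data`, which may not match the actual archive layout (adjust `inner_name` if it differs). The command built at the end uses only the flags shown above.

```python
import zipfile
from pathlib import Path


def extract_and_find_data_dir(zip_path, dest_dir, inner_name="data"):
    """Unzip the archive into dest_dir and return the shallowest folder
    named inner_name, or None if no such folder exists.

    inner_name="data" is an assumption about the archive layout.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    candidates = [p for p in dest.rglob(inner_name) if p.is_dir()]
    candidates.sort(key=lambda p: (len(p.parts), str(p)))
    return candidates[0] if candidates else None


def talent_import_command(data_dir):
    """Build the import command from the text above as an argument list."""
    return [
        "python3",
        "scripts/download_data.py",
        "--import_talent_class_small",
        f"--talent_folder={data_dir}",
    ]
```

For example, `talent_import_command(extract_and_find_data_dir("dataset-latest.zip", "talent"))` yields an argument list that can be passed to `subprocess.run` from the repository root, where `talent` is an arbitrary target folder chosen for illustration.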
Running experiments
Run the experiments with python3 scripts/run_probclass_experiments.py,
then generate the plots with python3 scripts/create_probclass_plots.py.
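Since the plots depend on the experiment results, it can be convenient to chain the two scripts so that plotting only starts if the experiments finish successfully. The sketch below assumes it is run from the repository root and that neither script needs extra arguments; `run_pipeline` and `PIPELINE` are illustrative names, not part of the repository.

```python
import subprocess

# Script paths are taken from the text above; no extra arguments are assumed.
PIPELINE = [
    ["python3", "scripts/run_probclass_experiments.py"],
    ["python3", "scripts/create_probclass_plots.py"],
]


def run_pipeline(commands=PIPELINE, runner=subprocess.run):
    """Run each command in order and return how many were run.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError on a non-zero exit code, so later steps are skipped.
    """
    for argv in commands:
        runner(argv, check=True)
    return len(commands)
```

Calling `run_pipeline()` runs both steps; the `runner` parameter exists only so the sequence can be dry-run or tested without launching the actual experiments.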