# Running the benchmark

## Configuration of data paths

The paths for storing data and results are configured
through the `tab_bench.data.paths.Paths` class. 
There are several options to configure which folders are used, 
which will be automatically recognized by `Paths.from_env_variables()`:

- **Through environmental variables**: 
The base folder can be configured by setting the environmental variable
`TAB_BENCH_DATA_BASE_FOLDER`. 
Optionally, some sub-folders can be set separately 
(e.g. for moving them to another partition). These are
`TAB_BENCH_DATA_TASKS_FOLDER`, `TAB_BENCH_DATA_RESULTS_FOLDER`,
`TAB_BENCH_DATA_RESULT_SUMMARIES_FOLDER`, `TAB_BENCH_DATA_UCI_DOWNLOAD_FOLDER`.
- **Through a python file**: If `TAB_BENCH_DATA_BASE_FOLDER` is not available, 
the code will try to get the base folder (as a string) from
`scripts.custom_paths.get_base_folder()`.
This can be implemented by copying `scripts/custom_paths.py.default` to `scripts/custom_paths.py`
  (ignored by git) and adjusting the path therein.
- If neither of the two options above is used, 
all data will be stored in `./tab_bench_data`.


## Download datasets

To download all datasets for the meta-train and meta-test benchmarks, run 
(with your desired OpenML cache directory, optionally)
```commandline
python3 scripts/download_data.py openml_cache_dir --import_meta_train --import_meta_test --import_grinsztajn_medium
```
To run methods on the benchmarks, there are two options:

## Run experiments with slurm

Our benchmarking code contains its own scheduling code that will start subprocesses 
for each algorithm-dataset-split combination.
Therefore, it is in principle possible to run all experiments 
through a single slurm job,
though experiments can be divided into smaller pieces by running them separately.

First, in `scripts/ray_slurm_template.sh`, 
replace the line `cd ~/git/pytabkit` according to your folder location.
Also, make sure that the data path is specified there 
if you want to set it via an environmental variable.
Run the following command (replacing some of the parameters with your own values) on the login node:
```commandline
python3 scripts/ray_slurm_launch.py --exp_name=my_exp_name --num_nodes=num_nodes --queue="queue_name" --time=24:00:00 --mail_user="my@address.edu" --log_folder=log_folder --command="python3 -u scripts/run_slurm.py"
```
This will submit a job to the configured queue that will run `scripts/run_slurm.py` and create logfiles.
Your experiments then have to be configured in `scripts/run_slurm.py`, see below.
Multi-node is supported: `ray` will start instances on each node
and our benchmarking code will schedule the individual experiments on the nodes.

## Run experiments without slurm

Run the file with the corresponding experiments directly. 
For example, many of our experiment configurations 
can be found in `scripts/run_experiments.py`. 
One possible way to run the experiments detached from the shell with log-files is
````commandline
systemd-run --scope --user python3 -u scripts/run_experiments.py > ./out.log 2> ./err.log &
````

## Time measurements

For time measurements, simply run `scripts/run_time_measurements.py` (with or without slurm).
Results can be printed using `scripts/print_runtimes.py` 
(but these are averaged total times, not averaged per 1K samples as in the paper).

## Evaluating the benchmark results

Aggregated algorithm results can be printed using 
````commandline
python3 scripts/run_evaluation.py meta-train-class
````
where `meta-train-class` can be replaced by the name of any other task collection 
(that is stored in the `task_collections` folder in the configured data directory),
or a single dataset such as `openml-class/Higgs`. 
This script also has many more command line options, see the python file.
For example, one can print only those methods with a certain tag 
using the `--tag` option,
print results on individual datasets, for different metrics, etc.
The parameters are the same as the ones of the following method:
```{eval-rst}  
.. autofunction:: scripts.run_evaluation.show_eval
```

## Creating plots and tables

Plots and tables can be created using
````commandline
python3 scripts/create_plots_and_tables.py
````
The plots without missing value datasets require running
```commandline
python3 scripts/check_missing_values.py
```
once beforehand.

## Single-task experiments

You can also run a configuration on a single data set,
without saving the results, by adjusting and running `scripts/run_single_task.py`.

## Other utilities

- Use `scripts/analyze_tasks.py` to print some dataset statistics.
- You can rename a method using `python3 scripts/rename_alg.py old_name new_name`.
- We used some code in `scripts/meta_hyperopt.py` to optimize the default parameters for GBDTs.
- The code in `scripts/estimate_resource_params.py` has been used to get more precise estimates 
for RAM usage etc. for running methods on the benchmark.
- `scripts/print_complete_results.py` can be used to check which methods have results available 
on all splits for all tasks in a given collection.