Code structure
Algorithm wrappers
To run methods in tab_bench, one needs to
provide them as a subclass of tab_bench.alg_wrappers.general.AlgWrapper.
Generally, we use models from the tab_models library that implement
the AlgInterface from there, and wrap them lightly as an AlgInterfaceWrapper
in tab_bench/alg_wrappers/interface_wrappers.py,
see the numerous classes there for examples.
As in tab_models, we pass parameters to these models via **kwargs.
The scikit-learn interfaces in tab_models provide in their constructors
a list of the most important hyperparameters.
Datasets
We represent our datasets using the DictDataset class from tab_models.
These datasets can be loaded as follows:
from pytabkit.bench.data.paths import Paths
from pytabkit.bench.data.tasks import TaskDescription
paths = Paths.from_env_variables()
task_desc = TaskDescription('openml-reg', 'fifa')
task_info = task_desc.load_info(paths) # a TaskInfo object
task = task_info.load_task(paths)
ds = task.ds # this is the DictDataset object
We can convert ds to a Pandas DataFrame using ds.to_df().
It is also possible to load a list of all TaskInfo objects
for an entire task collection:
from pytabkit.bench.data.paths import Paths
from pytabkit.bench.data.tasks import TaskCollection
paths = Paths.from_env_variables()
task_infos = TaskCollection.from_name('meta-train-class', paths).load_infos(paths)
Scheduling code
We implement general scheduling code in tab_bench/scheduling.
This code can take a list of jobs with certain functionalities
and run them in parallel in a single-node or multi-node setup,
respecting the provided resource requirements
(on RAM usage, number of threads, etc.). It can be used independently as follows:
from typing import List
from pytabkit.bench.scheduling.jobs import AbstractJob
from pytabkit.bench.scheduling.execution import RayJobManager
from pytabkit.bench.scheduling.schedulers import SimpleJobScheduler
jobs: List[AbstractJob] = [] # create a list of jobs here
scheduler = SimpleJobScheduler(RayJobManager())
scheduler.add_jobs(jobs)
scheduler.run()
For our tabular benchmarking code,
the AbstractJob objects will be created by the
tab_bench.run.task_execution.TabBenchJobManager.
Numerous examples for this can be found in run_final_experiments.py.