# Code structure ## Algorithm wrappers To run methods in `tab_bench`, one needs to provide them as a subclass of `tab_bench.alg_wrappers.general.AlgWrapper`. Generally, we use models from the `tab_models` library that implement the `AlgInterface` from there, and wrap them lightly as an `AlgInterfaceWrapper` in `tab_bench/alg_wrappers/interface_wrappers.py`, see the numerous classes there for examples. As in `tab_models`, we pass parameters to these models via `**kwargs`. The scikit-learn interfaces in `tab_models` provide in their constructors a list of the most important hyperparameters. ## Datasets We represent our datasets using the `DictDataset` class from `tab_models`. These datasets can be loaded as follows: ```python from pytabkit.bench.data.paths import Paths from pytabkit.bench.data.tasks import TaskDescription paths = Paths.from_env_variables() task_desc = TaskDescription('openml-reg', 'fifa') task_info = task_desc.load_info(paths) # a TaskInfo object task = task_info.load_task(paths) ds = task.ds # this is the DictDataset object ``` We can convert `ds` to a Pandas DataFrame using `ds.to_df()`. It is also possible to load a list of all TaskInfo objects for an entire task collection: ```python from pytabkit.bench.data.paths import Paths from pytabkit.bench.data.tasks import TaskCollection paths = Paths.from_env_variables() task_infos = TaskCollection.from_name('meta-train-class', paths).load_infos(paths) ``` ## Scheduling code We implement general scheduling code in `tab_bench/scheduling`. This code can take a list of jobs with certain functionalities and run them in parallel in a single-node or multi-node setup, respecting the provided resource requirements (on RAM usage, number of threads, etc.). It can be used independently as follows: ```python from typing import List from pytabkit.bench.scheduling.jobs import AbstractJob from pytabkit.bench.scheduling.execution import RayJobManager from pytabkit.bench.scheduling.schedulers import SimpleJobScheduler jobs: List[AbstractJob] = [] # create a list of jobs here scheduler = SimpleJobScheduler(RayJobManager()) scheduler.add_jobs(jobs) scheduler.run() ``` For our tabular benchmarking code, the `AbstractJob` objects will be created by the `tab_bench.run.task_execution.TabBenchJobManager`. Numerous examples for this can be found in `run_final_experiments.py`. ## Resource estimation ## Evaluation and plotting