pytabkit.models.alg_interfaces package
Submodules
pytabkit.models.alg_interfaces.alg_interfaces module
- class pytabkit.models.alg_interfaces.alg_interfaces.AlgInterface
Bases:
object
AlgInterface is an abstract base class for tabular ML methods with an interface that offers more possibilities than a standard scikit-learn interface.
In particular, it allows for parallelized fitting of multiple models, bagging, and refitting. The idea is as follows:
- The dataset can be split into a test set and the remaining data. (We call this a trainval-test split.)
The fit() method allows specifying multiple such splits, and some AlgInterface implementations (NNAlgInterface) can vectorize computations across these splits. However, vectorization may require that the test set sizes are identical in all splits.
- The remaining data can further be split into training and validation data. (We call this a train-val split.)
AlgInterface allows fitting with one or multiple train-val splits, which can also be vectorized in NNAlgInterface. Optionally, the function get_refit_interface() extracts an AlgInterface that can be used for fitting the model on the training+validation set with the best settings found on the validation set in the cross-validation stage (represented by self.fit_params). These “best settings” could be an early stopping epoch or number of trees, or the best hyperparameters found by hyperparameter optimization. We call this refitting.
Another feature of AlgInterface is that it provides methods to get (an estimate of) required resources and to evaluate metrics on training, validation, and test set.
- __init__(fit_params=None, **config)
- Parameters:
fit_params (List[Dict[str, Any]] | None) – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
- eval(ds, idxs_list, metrics, return_preds)
Evaluates the (already fitted) method using various metrics on training, validation, and test sets. The results will also contain the found fit_params and optionally the predictions on the dataset. This method should normally not be overridden in subclasses.
- Parameters:
ds (DictDataset) – Dataset.
idxs_list (List[SplitIdxs]) – List of indices for the training-validation-test splits, one per trainval-test split as in fit().
metrics (Metrics | None) – Metrics object that defines which metrics should be evaluated. If metrics is None, an empty list will be returned (which might avoid unnecessary computation when implementing fit() through fit_and_eval()).
return_preds (bool) – Whether the predictions on the dataset should be included in the returned results.
- Returns:
Returns a list with one NestedDict for every trainval-test split. Denote such a NestedDict object by results. Then results contains the following entries:
- results['metrics', 'train'/'val'/'test', str(n_models), str(start_idx), metric_name] = metric_value. Here, an ensemble of the predictions of models [start_idx:start_idx+n_models] is used.
- results['y_preds'] = a list (converted from a tensor) with predictions on the whole dataset, included only if return_preds==True.
- results['fit_params'] = self.fit_params
- Return type:
List[NestedDict]
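The key layout of the returned results can be illustrated with a small sketch. It uses a plain Python dict indexed by tuples as a stand-in for NestedDict (an assumption for illustration only; the real NestedDict class is not used), and the metric name 'rmse' and all values are made up.

```python
# Stand-in for one NestedDict: a plain dict with tuple keys (illustrative assumption).
results = {}
results['metrics', 'val', '1', '0', 'rmse'] = 0.42  # single model 0
results['metrics', 'val', '1', '1', 'rmse'] = 0.45  # single model 1
results['metrics', 'val', '2', '0', 'rmse'] = 0.40  # ensemble of models [0:2]
results['fit_params'] = None


def metric_value(results, part, n_models, start_idx, metric_name):
    """Look up a metric for the ensemble of models [start_idx:start_idx+n_models].

    n_models and start_idx are stored as strings in the key, as described above.
    """
    return results['metrics', part, str(n_models), str(start_idx), metric_name]


ensemble_rmse = metric_value(results, 'val', 2, 0, 'rmse')
print(ensemble_rmse)
```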
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overridden. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about different possible fit_params settings that can be used. Say a variable results is returned that is not None. Then, results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
List[List[List[Tuple[Dict, float]]]] | None
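As a rough sketch of how such a return value could be used, the following helper picks, for each trainval-test split, the params with the lowest loss averaged over the train-val splits. It assumes that every train-val split lists the same candidates in the same order; the helper name select_best_params and the 'stop_epoch' key are hypothetical and not part of pytabkit.

```python
from typing import Dict, List, Tuple

FitResults = List[List[List[Tuple[Dict, float]]]]


def select_best_params(results: FitResults) -> List[Dict]:
    """Return one params dict per trainval-test split (lowest average loss)."""
    best = []
    for tv_results in results:  # tv_results[tv_split_idx] = [(params, loss), ...]
        n_candidates = len(tv_results[0])
        # Average the loss of candidate i over all train-val splits.
        avg_losses = [
            sum(tv[i][1] for tv in tv_results) / len(tv_results)
            for i in range(n_candidates)
        ]
        best_idx = min(range(n_candidates), key=avg_losses.__getitem__)
        best.append(tv_results[0][best_idx][0])
    return best


# One trainval-test split, two train-val splits, two candidates each.
results = [[
    [({'stop_epoch': 10}, 0.50), ({'stop_epoch': 20}, 0.40)],
    [({'stop_epoch': 10}, 0.30), ({'stop_epoch': 20}, 0.45)],
]]
fit_params = select_best_params(results)
print(fit_params)
```

The resulting list has one dictionary per trainval-test split, which matches the shape expected for fit_params in the constructor.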
- fit_and_eval(ds, idxs_list, interface_resources, logger, tmp_folders, name, metrics, return_preds)
Run fit() with the given parameters and then return the result of eval() with the given metrics. This method can be overridden instead of fit() if it is more convenient. The idea is that for hyperparameter optimization, one has to evaluate each hyperparameter combination anyway after training it, so it is more efficient to implement fit_and_eval() and return the evaluation of the best method at the end. See the documentation of fit() and eval() for the meaning of the parameters and returned values.
- Parameters:
ds (DictDataset)
idxs_list (List[SplitIdxs])
interface_resources (InterfaceResources)
logger (Logger)
tmp_folders (List[Path | None])
name (str)
metrics (Metrics | None)
return_preds (bool)
- Return type:
List[NestedDict]
- get_available_predict_params()
- Return type:
Dict[str, Dict[str, Any]]
- get_current_predict_params_dict()
- get_current_predict_params_name()
- get_fit_params()
- Returns:
Return self.fit_params.
- Return type:
List[Dict] | None
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape is the number of classes (even in the binary case) and the outputs are logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape is the target dimension (often 1).
- Return type:
Tensor
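Since classification outputs are logits, probabilities have to be computed by the caller. The sketch below applies a numerically stable softmax along the last axis, using nested Python lists in place of the returned tensor (an assumption to keep the example dependency-free); the logit values are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# preds[model][sample] = logits, mimicking shape [n_models, n_samples, n_classes].
preds = [
    [[2.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [0.0, 3.0]],
]
# Apply softmax over the class axis for every model and sample.
probs = [[softmax(sample_logits) for sample_logits in model] for model in preds]
print(probs[0][1])
```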
- set_current_predict_params(name)
- Parameters:
name (str)
- Return type:
None
- to(device)
- Parameters:
device (str)
- Return type:
None
- class pytabkit.models.alg_interfaces.alg_interfaces.MultiSplitWrapperAlgInterface
Bases:
AlgInterface
- __init__(single_split_interfaces, **config)
- Parameters:
fit_params – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
single_split_interfaces (List[AlgInterface])
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overridden. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about different possible fit_params settings that can be used. Say a variable results is returned that is not None. Then, results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
List[List[List[Tuple[Dict, float]]]] | None
- fit_and_eval(ds, idxs_list, interface_resources, logger, tmp_folders, name, metrics, return_preds)
Run fit() with the given parameters and then return the result of eval() with the given metrics. This method can be overridden instead of fit() if it is more convenient. The idea is that for hyperparameter optimization, one has to evaluate each hyperparameter combination anyway after training it, so it is more efficient to implement fit_and_eval() and return the evaluation of the best method at the end. See the documentation of fit() and eval() for the meaning of the parameters and returned values.
- Parameters:
ds (DictDataset)
idxs_list (List[SplitIdxs])
interface_resources (InterfaceResources)
logger (Logger)
tmp_folders (List[Path | None])
name (str)
metrics (Metrics | None)
return_preds (bool)
- Return type:
List[NestedDict]
- get_available_predict_params()
- Return type:
Dict[str, Dict[str, Any]]
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape is the number of classes (even in the binary case) and the outputs are logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape is the target dimension (often 1).
- Return type:
Tensor
- set_current_predict_params(name)
- Parameters:
name (str)
- Return type:
None
- class pytabkit.models.alg_interfaces.alg_interfaces.OptAlgInterface
Bases:
SingleSplitAlgInterface
- __init__(hyper_optimizer, max_resource_config, **config)
- Parameters:
fit_params – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
hyper_optimizer (HyperOptimizer)
max_resource_config (Dict)
- create_alg_interface(n_sub_splits, **config)
- Parameters:
n_sub_splits (int)
- Return type:
- fit_and_eval(ds, idxs_list, interface_resources, logger, tmp_folders, name, metrics, return_preds)
Run fit() with the given parameters and then return the result of eval() with the given metrics. This method can be overridden instead of fit() if it is more convenient. The idea is that for hyperparameter optimization, one has to evaluate each hyperparameter combination anyway after training it, so it is more efficient to implement fit_and_eval() and return the evaluation of the best method at the end. See the documentation of fit() and eval() for the meaning of the parameters and returned values.
- Parameters:
ds (DictDataset)
idxs_list (List[SplitIdxs])
interface_resources (InterfaceResources)
logger (Logger)
tmp_folders (List[Path | None])
name (str)
metrics (Metrics | None)
return_preds (bool)
- Return type:
List[NestedDict]
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- objective(params, ds, idxs_list, interface_resources, logger, tmp_folder, name, metrics, return_preds)
- Parameters:
ds (DictDataset)
idxs_list (List[SplitIdxs])
interface_resources (InterfaceResources)
logger (Logger)
tmp_folder (Path | None)
name (str)
metrics (Metrics | None)
return_preds (bool)
- Return type:
Tuple[float, Tuple[List[NestedDict], AlgInterface]]
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape is the number of classes (even in the binary case) and the outputs are logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape is the target dimension (often 1).
- Return type:
Tensor
- class pytabkit.models.alg_interfaces.alg_interfaces.RandomParamsAlgInterface
Bases:
SingleSplitAlgInterface
- __init__(model_idx, fit_params=None, **config)
- Parameters:
model_idx (int) – Used for seeding along with the seed given in fit(), so that random search HPO can be done by combining multiple RandomParamsNNAlgInterface objects with different model_idx values.
fit_params (List[Dict[str, Any]] | None) – Fit parameters (stopping epoch for refitting).
config – Configuration parameters.
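The role of model_idx in random search can be sketched as follows: each candidate draws its hyperparameters from a random generator seeded by both the split seed and its own model_idx, so different candidates get different but reproducible configurations. How pytabkit actually combines the two values is not stated here; the seeding scheme, the function name sample_params, and the search space below are all hypothetical.

```python
import random


def sample_params(seed: int, model_idx: int) -> dict:
    """Draw one random configuration, reproducible for a given (seed, model_idx)."""
    # Hypothetical scheme: combine both values into one string seed.
    rng = random.Random(f"{seed}-{model_idx}")
    return {
        'lr': 10 ** rng.uniform(-4, -1),   # log-uniform learning rate (made up)
        'n_layers': rng.randint(1, 4),     # made-up architecture choice
    }


# Random search: one configuration per model_idx, all sharing the same seed.
candidates = [sample_params(seed=0, model_idx=i) for i in range(5)]
print(candidates[0])
```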
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overridden. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about different possible fit_params settings that can be used. Say a variable results is returned that is not None. Then, results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
None
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape is the number of classes (even in the binary case) and the outputs are logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape is the target dimension (often 1).
- Return type:
Tensor
- class pytabkit.models.alg_interfaces.alg_interfaces.SingleSplitAlgInterface
Bases:
AlgInterface
pytabkit.models.alg_interfaces.autogluon_model_interfaces module
- class pytabkit.models.alg_interfaces.autogluon_model_interfaces.AutoGluonModelAlgInterface
Bases:
SklearnSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
pytabkit.models.alg_interfaces.base module
- class pytabkit.models.alg_interfaces.base.InterfaceResources
Bases:
object
Simple class representing resources that a method is allowed to use (number of threads and GPUs).
- __init__(n_threads, gpu_devices, time_in_seconds=None)
- Parameters:
n_threads (int)
gpu_devices (List[str])
time_in_seconds (int | None)
- class pytabkit.models.alg_interfaces.base.RequiredResources
Bases:
object
Represents estimated/requested resources by a method.
- __init__(time_s, n_threads, cpu_ram_gb, n_gpus=0, gpu_usage=1.0, gpu_ram_gb=0.0, n_explicit_physical_cores=0)
- Parameters:
time_s (float)
n_threads (float)
cpu_ram_gb (float)
n_gpus (int)
gpu_usage (float)
gpu_ram_gb (float)
n_explicit_physical_cores (int)
- static combine_sequential(resources_list)
- Parameters:
resources_list (List[RequiredResources])
- get_resource_vector(fixed_resource_vector)
- Parameters:
fixed_resource_vector (ndarray)
- should_add_fixed_resources()
- Return type:
bool
- class pytabkit.models.alg_interfaces.base.SplitIdxs
Bases:
object
Represents multiple train-validation-test splits for AlgInterface.
- __init__(train_idxs, val_idxs, test_idxs, split_seed, sub_split_seeds, split_id)
- Parameters:
train_idxs (Tensor) – Tensor of shape (n_trainval_splits, n_train_idxs). Each of the train-val splits needs to have the same number of training samples. The elements of the tensor should index the training set elements in a larger dataset.
val_idxs (Tensor | None) – Tensor of shape (n_trainval_splits, n_val_idxs), or None if no validation set should be used.
test_idxs (Tensor | None) – Tensor of shape (n_test_idxs,). The same test set will be used for all train-val splits.
split_seed (int) – Random seed for algorithms on this split.
sub_split_seeds (List[int]) – Separate random seeds for algorithms on each train-val split (length should be n_trainval_splits).
split_id (int) – ID of this split (for logging/saving purposes).
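To make the expected shapes concrete, the sketch below builds index lists for one trainval-test split with k-fold train-val splits, using only the standard library. It is an illustration of the layout, not pytabkit code: the helper name make_split_indices is hypothetical, and the nested lists would presumably have to be converted to tensors before being passed to SplitIdxs.

```python
import random


def make_split_indices(n_samples, n_test, n_folds, split_seed):
    """Return (train_idxs, val_idxs, test_idxs) as lists of indices.

    train_idxs has shape (n_folds, n_train_idxs), val_idxs has shape
    (n_folds, n_val_idxs), and test_idxs has shape (n_test,).
    """
    rng = random.Random(split_seed)
    perm = list(range(n_samples))
    rng.shuffle(perm)
    test_idxs = perm[:n_test]
    trainval = perm[n_test:]
    fold_size = len(trainval) // n_folds
    # Drop the remainder so every fold has the same number of training samples.
    trainval = trainval[:fold_size * n_folds]
    train_idxs, val_idxs = [], []
    for k in range(n_folds):
        lo, hi = k * fold_size, (k + 1) * fold_size
        val_idxs.append(trainval[lo:hi])
        train_idxs.append(trainval[:lo] + trainval[hi:])
    return train_idxs, val_idxs, test_idxs


train_idxs, val_idxs, test_idxs = make_split_indices(
    n_samples=23, n_test=3, n_folds=4, split_seed=0)
print(len(train_idxs), len(train_idxs[0]), len(val_idxs[0]), len(test_idxs))
```

The same test indices are shared by all four train-val splits, matching the description of test_idxs above.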
- get_sub_split_idxs(i)
- Parameters:
i (int)
- Return type:
pytabkit.models.alg_interfaces.calibration module
- class pytabkit.models.alg_interfaces.calibration.PostHocCalibrationAlgInterface
Bases:
AlgInterface
- __init__(alg_interface, fit_params=None, **config)
- Parameters:
fit_params (List[Dict[str, Any]] | None) – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
alg_interface (AlgInterface)
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overridden. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about different possible fit_params settings that can be used. Say a variable results is returned that is not None. Then, results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
List[List[List[Tuple[Dict, float]]]] | None
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape is the number of classes (even in the binary case) and the outputs are logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape is the target dimension (often 1).
- Return type:
Tensor
- to(device)
- Parameters:
device (str)
- Return type:
None
pytabkit.models.alg_interfaces.catboost_interfaces module
- class pytabkit.models.alg_interfaces.catboost_interfaces.CatBoostCustomMetric
Bases:
object
- __init__(metric_name, is_classification, is_higher_better=False, select_pred_col=None)
- Parameters:
metric_name (str)
is_classification (bool)
is_higher_better (bool)
select_pred_col (int | None)
- evaluate(approxes, target, weight)
- get_final_error(error, weight)
- is_max_optimal()
- class pytabkit.models.alg_interfaces.catboost_interfaces.CatBoostHyperoptAlgInterface
Bases:
OptAlgInterface
- __init__(space=None, n_hyperopt_steps=50, **config)
- Parameters:
fit_params – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
n_hyperopt_steps (int)
- create_alg_interface(n_sub_splits, **config)
- Parameters:
n_sub_splits (int)
- Return type:
- class pytabkit.models.alg_interfaces.catboost_interfaces.CatBoostSklearnSubSplitInterface
Bases:
SklearnSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.catboost_interfaces.CatBoostSubSplitInterface
Bases:
TreeBasedSubSplitInterface
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.catboost_interfaces.RandomParamsCatBoostAlgInterface
Bases:
RandomParamsAlgInterface
pytabkit.models.alg_interfaces.ensemble_interfaces module
- class pytabkit.models.alg_interfaces.ensemble_interfaces.AlgorithmSelectionAlgInterface
Bases:
SingleSplitAlgInterface
Picks the best model out of a list of candidates.
- __init__(alg_interfaces, fit_params=None, **config)
- Parameters:
fit_params (List[Dict] | None) – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
alg_interfaces (List[AlgInterface])
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overridden. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about different possible fit_params settings that can be used. Say a variable results is returned that is not None. Then, results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
None
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape will be the number of classes (even in the binary case) and the outputs will be logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape will be the target dimension (often 1).
- Return type:
Tensor
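Since classification outputs are logits, probabilities are obtained by applying a softmax over the last axis. The sketch below uses nested Python lists as a stand-in for the returned tensor so that it runs without dependencies; the variable names and toy values are assumptions. With a real PyTorch tensor, torch.softmax(y_pred, dim=-1) performs the same operation.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one vector of class logits."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy stand-in for a predict() output of shape [n_models=1, n_samples=2, n_classes=2].
y_pred_logits = [[[0.0, 0.0], [2.0, 0.0]]]
y_pred_probs = [[softmax(sample) for sample in model] for model in y_pred_logits]
```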
- to(device)
- Parameters:
device (str)
- Return type:
None
- class pytabkit.models.alg_interfaces.ensemble_interfaces.CaruanaEnsembleAlgInterface
Bases:
SingleSplitAlgInterface
Following a simple variant of Caruana et al. (2004), “Ensemble selection from libraries of models”, without pre-selection of candidates.
- __init__(alg_interfaces, fit_params=None, **config)
- Parameters:
fit_params (List[Dict] | None) – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
alg_interfaces (List[AlgInterface])
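For orientation, greedy ensemble selection with replacement in the spirit of Caruana et al. (2004) can be sketched as follows. This is a generic illustration under stated assumptions, not the implementation used by CaruanaEnsembleAlgInterface: the function name, the n_rounds parameter, the use of plain lists, and the MSE loss are all chosen for the example.

```python
from typing import Callable, List, Sequence


def greedy_ensemble_selection(
    val_preds: Sequence[Sequence[float]],  # val_preds[m][i]: prediction of model m on sample i
    y_val: Sequence[float],
    loss_fn: Callable[[Sequence[float], Sequence[float]], float],
    n_rounds: int = 10,
) -> List[float]:
    """Greedy selection with replacement; returns one averaging weight per model."""
    n_models, n_samples = len(val_preds), len(y_val)
    counts = [0] * n_models
    running_sum = [0.0] * n_samples
    for step in range(1, n_rounds + 1):
        best_idx, best_loss = -1, float("inf")
        for m in range(n_models):
            # Validation loss of the averaged ensemble if model m were added next.
            candidate = [(s + p) / step for s, p in zip(running_sum, val_preds[m])]
            loss = loss_fn(candidate, y_val)
            if loss < best_loss:
                best_idx, best_loss = m, loss
        counts[best_idx] += 1
        running_sum = [s + p for s, p in zip(running_sum, val_preds[best_idx])]
    return [c / n_rounds for c in counts]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


y_val = [1.0, 2.0, 3.0]
val_preds = [
    [1.5, 2.5, 3.5],  # model 0: biased upwards
    [0.5, 1.5, 2.5],  # model 1: biased downwards
    [5.0, 5.0, 5.0],  # model 2: poor constant predictor
]
weights = greedy_ensemble_selection(val_preds, y_val, mse, n_rounds=4)
```

In this toy case the two oppositely biased models end up with equal weight and the poor model is never selected, because averaging the first two cancels their errors on the validation set.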
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overridden. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about the different fit_params settings that can be used. If the returned value, say results, is not None, then results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
None
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape will be the number of classes (even in the binary case) and the outputs will be logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape will be the target dimension (often 1).
- Return type:
Tensor
- to(device)
- Parameters:
device (str)
- Return type:
None
- class pytabkit.models.alg_interfaces.ensemble_interfaces.PrecomputedPredictionsAlgInterface
Bases:
SingleSplitAlgInterface
- __init__(y_preds_cv, y_preds_refit, fit_params_cv, fit_params_refit)
- Parameters:
fit_params – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
y_preds_cv (Tensor)
y_preds_refit (Tensor | None)
fit_params_cv (Dict)
fit_params_refit (Dict | None)
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overridden. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about the different fit_params settings that can be used. If the returned value, say results, is not None, then results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
None
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape will be the number of classes (even in the binary case) and the outputs will be logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape will be the target dimension (often 1).
- Return type:
Tensor
pytabkit.models.alg_interfaces.lightgbm_interfaces module
- class pytabkit.models.alg_interfaces.lightgbm_interfaces.LGBMCustomMetric
Bases:
object
- __init__(metric_name, is_classification, is_higher_better=False)
- Parameters:
metric_name (str)
is_classification (bool)
is_higher_better (bool)
- class pytabkit.models.alg_interfaces.lightgbm_interfaces.LGBMHyperoptAlgInterface
Bases:
OptAlgInterface
- __init__(space=None, n_hyperopt_steps=50, opt_method='hyperopt', **config)
- Parameters:
fit_params – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
n_hyperopt_steps (int)
opt_method (str)
- create_alg_interface(n_sub_splits, **config)
- Parameters:
n_sub_splits (int)
- Return type:
- class pytabkit.models.alg_interfaces.lightgbm_interfaces.LGBMSklearnSubSplitInterface
Bases:
SklearnSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.lightgbm_interfaces.LGBMSubSplitInterface
Bases:
TreeBasedSubSplitInterface
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.lightgbm_interfaces.RandomParamsLGBMAlgInterface
Bases:
RandomParamsAlgInterface
pytabkit.models.alg_interfaces.nn_interfaces module
- class pytabkit.models.alg_interfaces.nn_interfaces.NNAlgInterface
Bases:
AlgInterface
- __init__(fit_params=None, **config)
- Parameters:
fit_params (List[Dict[str, Any]] | None) – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overridden. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about the different fit_params settings that can be used. If the returned value, say results, is not None, then results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- get_available_predict_params()
- Return type:
Dict[str, Dict[str, Any]]
- get_first_layer_weights(with_scale)
- Parameters:
with_scale (bool)
- Return type:
Tensor
- get_importances()
- Return type:
Tensor
- get_model_ram_gb(ds, n_cv, n_refit, n_splits, split_seeds)
- Parameters:
ds (DictDataset)
n_cv (int)
n_refit (int)
n_splits (int)
split_seeds (List[int])
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape will be the number of classes (even in the binary case) and the outputs will be logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape will be the target dimension (often 1).
- Return type:
Tensor
- to(device)
- Parameters:
device (str)
- Return type:
None
- class pytabkit.models.alg_interfaces.nn_interfaces.NNHyperoptAlgInterface
Bases:
OptAlgInterface
- __init__(space=None, n_hyperopt_steps=50, opt_method='hyperopt', **config)
- Parameters:
fit_params – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
space (str | Dict[str, Any] | None)
n_hyperopt_steps (int)
opt_method (str)
- create_alg_interface(n_sub_splits, **config)
- Parameters:
n_sub_splits (int)
- Return type:
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.nn_interfaces.RandomParamsNNAlgInterface
Bases:
SingleSplitAlgInterface
- __init__(model_idx, fit_params=None, **config)
- Parameters:
fit_params (List[Dict[str, Any]] | None) – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
model_idx (int)
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overridden. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about the different fit_params settings that can be used. If the returned value, say results, is not None, then results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
None
- get_available_predict_params()
- Return type:
Dict[str, Dict[str, Any]]
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape will be the number of classes (even in the binary case) and the outputs will be logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape will be the target dimension (often 1).
- Return type:
Tensor
- to(device)
- Parameters:
device (str)
- Return type:
None
- class pytabkit.models.alg_interfaces.nn_interfaces.RealMLPParamSampler
Bases:
object
- __init__(is_classification, hpo_space_name='default', **config)
- Parameters:
is_classification (bool)
hpo_space_name (str)
- sample_params(seed)
- Parameters:
seed (int)
- Return type:
Dict[str, Any]
- pytabkit.models.alg_interfaces.nn_interfaces.get_lignting_accel_and_devices(device)
- Parameters:
device (str)
pytabkit.models.alg_interfaces.other_interfaces module
- class pytabkit.models.alg_interfaces.other_interfaces.ExtraTreesSubSplitInterface
Bases:
SklearnSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.other_interfaces.GBTSubSplitInterface
Bases:
SklearnSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.other_interfaces.GrandeSubSplitInterface
Bases:
SklearnSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.other_interfaces.GrandeWrapper
Bases:
object
Wrapper class for GRANDE that allows passing cat_features to fit() instead of the constructor.
- __init__(**config)
- fit(X, y, X_val, y_val, cat_features=None)
- Parameters:
cat_features (List[str] | None)
- predict(X)
- predict_proba(X)
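The general pattern behind such a wrapper is to store the configuration and defer construction of the wrapped model until fit() supplies cat_features. The sketch below illustrates that pattern only; ToyModel and DeferredConstructionWrapper are hypothetical names and do not reflect the actual GRANDE or GrandeWrapper code.

```python
from typing import Any, List, Optional


class ToyModel:
    """Hypothetical estimator that expects cat_features in its constructor."""

    def __init__(self, cat_features: Optional[List[str]] = None, **params: Any):
        self.cat_features = cat_features or []
        self.params = params

    def fit(self, X, y, X_val, y_val):
        self.n_seen_ = len(X)  # placeholder for real training
        return self


class DeferredConstructionWrapper:
    """Stores the config and builds the wrapped model once fit() supplies cat_features."""

    def __init__(self, **config: Any):
        self.config = config
        self.model = None

    def fit(self, X, y, X_val, y_val, cat_features: Optional[List[str]] = None):
        # The model is created here because cat_features is only known at fit time.
        self.model = ToyModel(cat_features=cat_features, **self.config)
        self.model.fit(X, y, X_val, y_val)
        return self


wrapper = DeferredConstructionWrapper(depth=5)
wrapper.fit([[0, "a"], [1, "b"]], [0, 1], [[2, "a"]], [1], cat_features=["color"])
```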
- class pytabkit.models.alg_interfaces.other_interfaces.KANSubSplitInterface
Bases:
SklearnSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.other_interfaces.KNNSubSplitInterface
Bases:
SklearnSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.other_interfaces.LinearModelSubSplitInterface
Bases:
SklearnSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.other_interfaces.RFSubSplitInterface
Bases:
SklearnSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.other_interfaces.RandomParamsExtraTreesAlgInterface
Bases:
RandomParamsAlgInterface
- class pytabkit.models.alg_interfaces.other_interfaces.RandomParamsKNNAlgInterface
Bases:
RandomParamsAlgInterface
- class pytabkit.models.alg_interfaces.other_interfaces.RandomParamsLinearModelAlgInterface
Bases:
RandomParamsAlgInterface
- class pytabkit.models.alg_interfaces.other_interfaces.RandomParamsRFAlgInterface
Bases:
RandomParamsAlgInterface
- class pytabkit.models.alg_interfaces.other_interfaces.SklearnMLPSubSplitInterface
Bases:
SklearnSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.other_interfaces.TabICLSubSplitInterface
Bases:
SklearnSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.other_interfaces.TabPFN2SubSplitInterface
Bases:
SklearnSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
pytabkit.models.alg_interfaces.resource_computation module
- class pytabkit.models.alg_interfaces.resource_computation.FeatureSpec
Bases:
object
Creates lists of product feature names using product, powerset, and similar operations.
- static concat(*feature_specs)
- static powerset_products(*feature_specs)
- static product(*feature_specs)
- class pytabkit.models.alg_interfaces.resource_computation.LogLinearModule
Bases:
Module
- __init__(n_features)
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
n_features (int)
- forward(x)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Parameters:
x (Tensor)
- Return type:
Tensor
- class pytabkit.models.alg_interfaces.resource_computation.LogLinearRegressor
Bases:
object
- __init__(pessimistic)
- Parameters:
pessimistic (bool)
- fit(X, y)
- Parameters:
X (ndarray)
y (ndarray)
- get_coefs()
- Return type:
ndarray
- class pytabkit.models.alg_interfaces.resource_computation.NormalizedDataRegressor
Bases:
object
- __init__(sub_regressor)
- fit(X, y)
- Parameters:
X (ndarray)
y (ndarray)
- get_coefs()
- Return type:
ndarray
- predict(X)
- Parameters:
X (ndarray)
- Return type:
ndarray
- class pytabkit.models.alg_interfaces.resource_computation.ResourcePredictor
Bases:
object
Predicts resource usage based on a linear model on raw and product features.
- __init__(config, time_params, cpu_ram_params, gpu_ram_params=None, n_gpus=0, gpu_usage=1.0)
- Parameters:
config (Dict[str, Any]) – Configuration parameters.
time_params (Dict[str, float]) – Coefficients for the linear model for time prediction.
cpu_ram_params (Dict[str, float]) – Coefficients for the linear model for CPU RAM prediction.
gpu_ram_params (Dict[str, float] | None) – Coefficients for the linear model for GPU RAM prediction.
n_gpus (int) – Number of GPUs that should be used.
gpu_usage (float) – Usage level of each GPU (between 0 and 1).
- get_required_resources(ds, **extra_params)
Provides an estimate of the required resources.
- Parameters:
ds (DictDataset) – Dataset (does not need to contain the tensors, just the n_samples and tensor_infos).
- Returns:
RequiredResources estimate.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.resource_computation.Sampler
Bases:
object
- sample()
- Return type:
int | float
- class pytabkit.models.alg_interfaces.resource_computation.TimeWrapper
Bases:
object
- __init__(f)
- Parameters:
f (Callable)
- class pytabkit.models.alg_interfaces.resource_computation.UniformSampler
Bases:
Sampler
- __init__(low, high, log=False, is_int=False)
- Parameters:
low (int | float)
high (int | float)
- sample()
- Return type:
int | float
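The constructor arguments suggest uniform sampling on [low, high], optionally in log-space and optionally cast to an integer. The function below is a sketch of that idea under those assumptions; the name sample_uniform, the rng argument, and the rounding rule for is_int are not taken from pytabkit.

```python
import math
import random
from typing import Optional, Union

Number = Union[int, float]


def sample_uniform(low: Number, high: Number, log: bool = False,
                   is_int: bool = False, rng: Optional[random.Random] = None) -> Number:
    """Draw one value from [low, high], uniformly or log-uniformly."""
    rng = rng or random.Random()
    if log:
        # Uniform in log-space, i.e. log-uniform in the original space (requires low > 0).
        value = math.exp(rng.uniform(math.log(low), math.log(high)))
    else:
        value = rng.uniform(low, high)
    # Assumption: integer samples are obtained by rounding to the nearest integer.
    return int(round(value)) if is_int else value


rng = random.Random(0)
lr_samples = [sample_uniform(1e-4, 1e-1, log=True, rng=rng) for _ in range(100)]
depth_sample = sample_uniform(1, 10, is_int=True, rng=rng)
```

Log-uniform sampling is a common choice for scale-like hyperparameters such as learning rates, where each order of magnitude should be equally likely.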
- pytabkit.models.alg_interfaces.resource_computation.create_ds(n_samples, n_cont, n_cat, cat_size, n_classes)
- Parameters:
n_samples (int)
n_cont (int)
n_cat (int)
cat_size (int)
n_classes (int)
- Return type:
- pytabkit.models.alg_interfaces.resource_computation.ds_to_xy(ds)
- Parameters:
ds (DictDataset)
- Return type:
Tuple[DataFrame, ndarray]
- pytabkit.models.alg_interfaces.resource_computation.eval_linear_product_model(raw_features, params)
Computes the “inner product” between the feature dictionaries (obtained from raw features and products according to the keys in params).
- Parameters:
raw_features (Dict[str, Any])
params (Dict[str, float])
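One way to read this “inner product” is as a sum over the keys of params, where each coefficient is multiplied by the product of the raw features named in the key. The sketch below follows that reading; the function name and toy values are hypothetical, and treating the empty key '' as a constant term is an assumption based on the format of the coefficient dictionaries in ResourceParams.

```python
import math
from typing import Any, Dict


def eval_linear_product_model_sketch(raw_features: Dict[str, Any],
                                     params: Dict[str, float]) -> float:
    """Sum of coefficient * (product of the raw features named in the key)."""
    total = 0.0
    for key, coef in params.items():
        names = key.split("*") if key else []  # assumption: '' denotes the constant term
        # math.prod of an empty sequence is 1, so the '' key contributes coef itself.
        total += coef * math.prod(raw_features[name] for name in names)
    return total


raw = {"n_samples": 1000, "n_features": 20}
coefs = {"": 0.5, "n_samples": 1e-3, "n_features*n_samples": 1e-4}
estimate = eval_linear_product_model_sketch(raw, coefs)
```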
- pytabkit.models.alg_interfaces.resource_computation.fit_resource_factors(data, pessimistic, coef_factor=1.0)
- Parameters:
data (List[Tuple[Dict[str, float], float]])
pessimistic (bool)
coef_factor (float)
- pytabkit.models.alg_interfaces.resource_computation.get_resource_features(config, ds, n_cv, n_refit, n_splits, **extra_params)
Extracts features that can be used in a linear model for predicting resource usage.
- Parameters:
config (Dict)
ds (DictDataset)
n_cv (int)
n_refit (int)
n_splits (int)
- Return type:
Dict[str, float]
- pytabkit.models.alg_interfaces.resource_computation.process_resource_features(raw_features, feature_spec)
Adds product features to raw features.
- Parameters:
raw_features (Dict[str, Any]) – Raw feature values.
feature_spec (List[str]) – List of strings. Each string should be of the form ‘feature_1*…*feature_n’, using the names of the features whose products should be added.
- Returns:
Returns a dictionary of the raw features along with the newly computed product features.
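The product-feature expansion described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the function name process_resource_features_sketch is hypothetical, and the sketch assumes every name in a spec string is a key of raw_features.

```python
from typing import Any, Dict, List


def process_resource_features_sketch(raw_features: Dict[str, Any],
                                     feature_spec: List[str]) -> Dict[str, Any]:
    """Hypothetical re-implementation for illustration only."""
    features = dict(raw_features)  # copy so the input is not modified
    for spec in feature_spec:
        product = 1.0
        # 'a*b*c' names the raw features whose product should be added.
        for name in spec.split('*'):
            product *= raw_features[name]
        features[spec] = product
    return features


# Made-up raw feature values.
raw = {'n_samples': 100.0, 'n_features': 5.0}
features = process_resource_features_sketch(raw, ['n_features*n_samples'])
```

The resulting dictionary keeps the raw entries and adds one entry per spec string, keyed by the spec string itself, which matches the key style of the coefficient dictionaries in resource_params.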
pytabkit.models.alg_interfaces.resource_params module
- class pytabkit.models.alg_interfaces.resource_params.ResourceParams
Bases:
object- cb_class_ram = {'': 0.9345478156433287, '2_power_maxdepth': 2.576133502607949e-09, '2_power_maxdepth*n_features': 7.810833280259485e-12, '2_power_maxdepth*n_features*n_samples': 1.5863977594541182e-13, '2_power_maxdepth*n_features*n_samples*n_tree_repeats': 2.3171956595374328e-17, '2_power_maxdepth*n_features*n_tree_repeats': 6.14544078331367e-15, '2_power_maxdepth*n_samples': 1.3036510550142841e-15, '2_power_maxdepth*n_samples*n_tree_repeats': 1.9523394732422347e-09, '2_power_maxdepth*n_tree_repeats': 2.356086562374563e-05, 'ds_onehot_size_gb': 0.012758554137232066, 'ds_prep_size_gb': 1.804116547565268e-05, 'ds_size_gb': 1.804116547565268e-05, 'max_depth': 0.004088255941858752, 'max_depth*n_features': 0.0006014917997388746, 'max_depth*n_features*n_samples': 4.241634070711833e-09, 'max_depth*n_features*n_samples*n_tree_repeats': 1.197601653926371e-16, 'max_depth*n_features*n_tree_repeats': 1.834250929757216e-13, 'max_depth*n_samples': 1.4477032736637855e-13, 'max_depth*n_samples*n_tree_repeats': 3.3706497906893135e-13, 'max_depth*n_tree_repeats': 1.1590969030724202e-09, 'n_features': 3.8863715875356e-09, 'n_features*n_samples': 3.767039504566679e-08, 'n_features*n_samples*n_tree_repeats': 7.361290583089635e-16, 'n_features*n_tree_repeats': 1.1947420344843242e-12, 'n_samples': 7.243808011863237e-07, 'n_samples*n_tree_repeats': 1.2285638949747794e-07, 'n_tree_repeats': 4.077606761367131e-09}
- cb_class_time = {'': 1.1074866100217955, 'ds_onehot_size_gb': 2.0150542417790342e-07, 'ds_prep_size_gb': 6.2276292117813865, 'ds_size_gb': 6.2276292117813865, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads': 2.651274595052903e-10, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features': 2.3903321610037346e-05, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features*n_samples': 2.3930248376103085e-16, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 8.531748659348444e-11, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features*n_tree_repeats': 4.589892590504275e-14, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_samples': 3.673856471950424e-15, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_samples*n_tree_repeats': 6.267867148099078e-16, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_tree_repeats': 3.5098969397077584e-11, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads': 1.7778533486675952e-08, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features': 1.285253358050953e-10, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features*n_samples': 2.627359007275516e-15, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 1.133320942151551e-15, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features*n_tree_repeats': 6.629510161784679e-13, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_samples': 4.732937240944653e-13, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_samples*n_tree_repeats': 5.508439525827261e-13, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_tree_repeats': 8.378247017832774e-10, 'n_cv_refit*n_splits*n_estimators*1/n_threads': 2.214973220043591, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features': 0.000849954711796066, 
'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_samples': 2.3531597535778573e-14, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 4.2994223618739465e-15, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_tree_repeats': 3.964226717465322e-12, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_samples': 3.035559075362487e-06, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_samples*n_tree_repeats': 7.13999461225352e-07, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_tree_repeats': 5.1876881836135774e-09}
- lgbm_class_ram = {'': 0.8604627263253337, 'ds_onehot_size_gb': 3.622669179301401e-06, 'ds_prep_size_gb': 2.0214168208781946, 'ds_size_gb': 2.0214168208781946, 'log_num_leaves': 1.573053922451339e-08, 'log_num_leaves*n_features': 2.930068871528871e-11, 'log_num_leaves*n_features*n_samples': 3.939554526330466e-15, 'log_num_leaves*n_features*n_samples*n_tree_repeats': 3.851475872271092e-15, 'log_num_leaves*n_features*n_tree_repeats': 2.7540140942935337e-13, 'log_num_leaves*n_samples': 1.617414150367892e-13, 'log_num_leaves*n_samples*n_tree_repeats': 6.161688826595097e-13, 'log_num_leaves*n_tree_repeats': 1.626145985707e-06, 'n_features': 3.1028960780988996e-10, 'n_features*n_samples': 2.5173717397818705e-08, 'n_features*n_samples*n_tree_repeats': 6.656160609292717e-11, 'n_features*n_tree_repeats': 1.4858440058980697e-12, 'n_samples': 3.856682701344501e-07, 'n_samples*n_tree_repeats': 1.544688671627044e-10, 'n_tree_repeats': 0.0015219464100389682, 'num_leaves': 7.114807543594747e-11, 'num_leaves*n_features': 6.127161836179573e-06, 'num_leaves*n_features*n_samples': 5.682583426130539e-17, 'num_leaves*n_features*n_samples*n_tree_repeats': 2.820814699620109e-14, 'num_leaves*n_features*n_tree_repeats': 4.723694325860319e-15, 'num_leaves*n_samples': 6.063719974576439e-16, 'num_leaves*n_samples*n_tree_repeats': 1.1825948996367154e-14, 'num_leaves*n_tree_repeats': 7.004349205794621e-07}
- lgbm_class_time = {'': 0.07952271409861912, 'ds_onehot_size_gb': 0.6707498854892533, 'ds_prep_size_gb': 24.914198992356777, 'ds_size_gb': 24.914198992356777, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads': 1.6421556695965297e-07, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads*n_features': 0.001802775666445253, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads*n_features*n_samples': 3.376112165195102e-07, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 8.92885930282138e-09, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads*n_features*n_tree_repeats': 6.072475113612503e-12, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads*n_samples': 2.330829367448416e-12, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads*n_samples*n_tree_repeats': 1.2170171882409568e-13, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads*n_tree_repeats': 0.015956943711852814, 'n_cv_refit*n_splits*n_estimators*1/n_threads': 0.15904542819723824, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features': 0.015836831101031235, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_samples': 2.320710370608533e-08, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 4.006248880421662e-14, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_tree_repeats': 2.885892548234532e-11, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_samples': 3.995934332919547e-09, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_samples*n_tree_repeats': 4.51061814549484e-13, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_tree_repeats': 3.75292585133515e-07, 'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads': 7.505014868911757e-10, 'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads*n_features': 2.152594512387446e-12, 'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads*n_features*n_samples': 9.221334002333759e-16, 
'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 4.8809384428115866e-11, 'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads*n_features*n_tree_repeats': 6.26406208478857e-14, 'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads*n_samples': 9.05403593468941e-15, 'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads*n_samples*n_tree_repeats': 2.3824258787970722e-15, 'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads*n_tree_repeats': 0.00041603300901854167}
- xgb_class_ram = {'': 0.899804501497566, '2_power_maxdepth': 3.26910486762921e-11, '2_power_maxdepth*n_features': 1.140492447521818e-08, '2_power_maxdepth*n_features*n_samples': 3.6325731146686714e-13, '2_power_maxdepth*n_features*n_samples*n_tree_repeats': 3.723108372490702e-19, '2_power_maxdepth*n_features*n_tree_repeats': 2.404137742885295e-15, '2_power_maxdepth*n_samples': 2.64316777243899e-16, '2_power_maxdepth*n_samples*n_tree_repeats': 1.4901204061072977e-17, '2_power_maxdepth*n_tree_repeats': 1.4676442049665057e-12, 'ds_onehot_size_gb': 7.280007472890875e-06, 'ds_prep_size_gb': 0.41986843027802623, 'ds_size_gb': 0.41986843027802623, 'max_depth': 3.280529943711475e-08, 'max_depth*n_features': 6.35648749681192e-05, 'max_depth*n_features*n_samples': 1.28838675675802e-08, 'max_depth*n_features*n_samples*n_tree_repeats': 1.69854661852343e-16, 'max_depth*n_features*n_tree_repeats': 1.935402530195678e-13, 'max_depth*n_samples': 6.291962320207664e-14, 'max_depth*n_samples*n_tree_repeats': 5.126839919323976e-15, 'max_depth*n_tree_repeats': 5.768929558524772e-10, 'n_features': 1.6375678219943912e-10, 'n_features*n_samples': 3.488627499883473e-11, 'n_features*n_samples*n_tree_repeats': 4.2124781789579334e-11, 'n_features*n_tree_repeats': 1.302388952570238e-12, 'n_samples': 8.808932580897527e-08, 'n_samples*n_tree_repeats': 8.625259564591089e-10, 'n_tree_repeats': 0.0012854309387287798}
- xgb_class_time = {'': 1.5850150119193643e-06, 'ds_onehot_size_gb': 7.555892653328937e-06, 'ds_prep_size_gb': 67.40780781613621, 'ds_size_gb': 67.40780781613621, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads': 6.35528424560118e-10, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features': 3.4755127308109863e-05, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features*n_samples': 2.652000680981318e-10, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 1.1214153087760665e-11, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features*n_tree_repeats': 1.1585222842499338e-13, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_samples': 7.369774923827121e-15, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_samples*n_tree_repeats': 6.186297360838691e-16, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_tree_repeats': 8.810550042257941e-11, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads': 9.578781115632407e-08, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features': 0.007922594727428374, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features*n_samples': 6.758297160216264e-08, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 1.4232541896951673e-10, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features*n_tree_repeats': 8.113108001263881e-12, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_samples': 1.7180121037111673e-12, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_samples*n_tree_repeats': 7.916471324379998e-14, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_tree_repeats': 1.2099510988434818e-08, 'n_cv_refit*n_splits*n_estimators*1/n_threads': 3.1700300238387285e-06, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features': 4.361726529019224e-09, 
'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_samples': 3.348195651528877e-12, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 3.4142887744033714e-13, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_tree_repeats': 4.433229074601185e-11, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_samples': 1.7981743709586172e-06, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_samples*n_tree_repeats': 3.1379386919643983e-12, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_tree_repeats': 0.416152219367654}
- class pytabkit.models.alg_interfaces.resource_params.ResourceParamsOld
Bases:
object- cb_class_ram = {'': 0.8683295939412378, '2_power_maxdepth': 0.0001056123359157812, '2_power_maxdepth*n_features': 1.0080022114889349e-10, '2_power_maxdepth*n_features*n_samples': 2.3070275489115195e-12, '2_power_maxdepth*n_features*n_samples*n_tree_repeats': 2.7850591221080067e-17, '2_power_maxdepth*n_features*n_tree_repeats': 6.15051597263584e-15, '2_power_maxdepth*n_samples': 1.3780270956209364e-15, '2_power_maxdepth*n_samples*n_tree_repeats': 2.064100170958034e-09, '2_power_maxdepth*n_tree_repeats': 2.694024798514516e-06, 'ds_onehot_size_gb': 0.054809311336043706, 'ds_prep_size_gb': 2.1956796547330758e-05, 'ds_size_gb': 2.1956796547330758e-05, 'max_depth': 0.00023942254928693192, 'max_depth*n_features': 0.0006188384463276942, 'max_depth*n_features*n_samples': 4.017104578325911e-09, 'max_depth*n_features*n_samples*n_tree_repeats': 1.2652983818045863e-16, 'max_depth*n_features*n_tree_repeats': 1.825891231551508e-13, 'max_depth*n_samples': 2.0135633249657367e-13, 'max_depth*n_samples*n_tree_repeats': 1.9065381412052897e-13, 'max_depth*n_tree_repeats': 7.662207891804141e-10, 'n_features': 1.728902260462638e-09, 'n_features*n_samples': 3.2106346545767416e-08, 'n_features*n_samples*n_tree_repeats': 8.080444898120663e-16, 'n_features*n_tree_repeats': 1.1883754249270118e-12, 'n_samples': 5.359259624964122e-07, 'n_samples*n_tree_repeats': 1.817237502556807e-07, 'n_tree_repeats': 3.16259450440823e-09}
- cb_class_time = {'': 0.060695272326207535, 'ds_onehot_size_gb': 0.040427221672569374, 'ds_prep_size_gb': 2.4268955178538847, 'ds_size_gb': 2.4268955178538847, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads': 1.99445077397377e-10, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features': 1.2644593910088394e-05, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features*n_samples': 1.1517663973680398e-15, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 2.4847067022145893e-11, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features*n_tree_repeats': 2.235731644015564e-14, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_samples': 3.0511461549128756e-15, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_samples*n_tree_repeats': 2.873281614024595e-16, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_tree_repeats': 1.2520160532307873e-11, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads': 1.374338752023958e-08, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features': 7.126063129715731e-11, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features*n_samples': 2.631878772648314e-15, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 1.4077434831895832e-15, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features*n_tree_repeats': 3.344879400790812e-13, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_samples': 1.242824030666801e-12, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_samples*n_tree_repeats': 9.32433742185293e-08, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_tree_repeats': 4.062768369148915e-10, 'n_cv_refit*n_splits*n_estimators*1/n_threads': 0.12139054465241507, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features': 0.002034550389178136, 
'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_samples': 1.590097554595333e-14, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 2.280000915439824e-15, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_tree_repeats': 1.972850747965341e-12, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_samples': 5.259225293072914e-06, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_samples*n_tree_repeats': 1.1159977413280863e-07, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_tree_repeats': 3.0362927572255956e-09}
- lgbm_class_ram = {'': 0.8545661661490145, 'ds_onehot_size_gb': 4.0697094447404033e-07, 'ds_prep_size_gb': 2.3080037837801175, 'ds_size_gb': 2.3080037837801175, 'log_num_leaves': 1.8470627691115034e-08, 'log_num_leaves*n_features': 4.90256931677757e-11, 'log_num_leaves*n_features*n_samples': 3.020317664222622e-15, 'log_num_leaves*n_features*n_samples*n_tree_repeats': 2.1876975907194365e-15, 'log_num_leaves*n_features*n_tree_repeats': 2.6408516124748747e-13, 'log_num_leaves*n_samples': 1.4244297306885883e-13, 'log_num_leaves*n_samples*n_tree_repeats': 7.582204707419711e-13, 'log_num_leaves*n_tree_repeats': 4.350203928522753e-07, 'n_features': 4.08148741723376e-07, 'n_features*n_samples': 2.3506833903706615e-08, 'n_features*n_samples*n_tree_repeats': 8.047116933926301e-12, 'n_features*n_tree_repeats': 1.4109066020140611e-12, 'n_samples': 2.994431799612211e-07, 'n_samples*n_tree_repeats': 1.1377985339470745e-09, 'n_tree_repeats': 0.0018080853926450316, 'num_leaves': 1.0490359582375276e-10, 'num_leaves*n_features': 6.105483514684091e-06, 'num_leaves*n_features*n_samples': 3.668665655364504e-17, 'num_leaves*n_features*n_samples*n_tree_repeats': 1.2053037667373442e-13, 'num_leaves*n_features*n_tree_repeats': 4.533114041820276e-15, 'num_leaves*n_samples': 5.943342181332617e-16, 'num_leaves*n_samples*n_tree_repeats': 1.9123390691308356e-14, 'num_leaves*n_tree_repeats': 1.0650528506541837e-07}
- lgbm_class_time = {'': 0.028063263911210914, 'ds_onehot_size_gb': 0.09163862856656434, 'ds_prep_size_gb': 2.970270224525262, 'ds_size_gb': 2.970270224525262, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads': 6.47442904885375e-08, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads*n_features': 0.0001926020481234091, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads*n_features*n_samples': 1.3986995179321424e-08, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 6.208468162170729e-10, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads*n_features*n_tree_repeats': 4.598542008079632e-13, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads*n_samples': 9.964309915135878e-13, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads*n_samples*n_tree_repeats': 2.608150056678177e-14, 'n_cv_refit*n_splits*log_num_leaves*n_estimators*1/n_threads*n_tree_repeats': 0.0011608214817588585, 'n_cv_refit*n_splits*n_estimators*1/n_threads': 0.05612652782242183, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features': 0.0018753906815885733, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_samples': 8.471355616223231e-12, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 3.3001370294885434e-15, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_tree_repeats': 2.1257067882553722e-12, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_samples': 3.057993467818764e-07, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_samples*n_tree_repeats': 6.264643485181751e-14, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_tree_repeats': 3.7651417047281056e-08, 'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads': 1.1569746986292633e-09, 'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads*n_features': 2.0127433109741758e-13, 'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads*n_features*n_samples': 2.39530599680757e-16, 
'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 1.8233627245552183e-12, 'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads*n_features*n_tree_repeats': 5.291223606102416e-15, 'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads*n_samples': 4.6777144377244544e-14, 'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads*n_samples*n_tree_repeats': 1.075739698121751e-15, 'n_cv_refit*n_splits*num_leaves*n_estimators*1/n_threads*n_tree_repeats': 7.442820019642213e-05}
- xgb_class_ram = {'': 0.89800664010472, '2_power_maxdepth': 3.500391185762912e-11, '2_power_maxdepth*n_features': 8.730859656468559e-07, '2_power_maxdepth*n_features*n_samples': 5.586329461516387e-11, '2_power_maxdepth*n_features*n_samples*n_tree_repeats': 3.406456640909277e-19, '2_power_maxdepth*n_features*n_tree_repeats': 2.253274531849529e-15, '2_power_maxdepth*n_samples': 2.6046111134557463e-16, '2_power_maxdepth*n_samples*n_tree_repeats': 1.4647083952656776e-17, '2_power_maxdepth*n_tree_repeats': 1.446703161897511e-12, 'ds_onehot_size_gb': 1.2775211008166364e-05, 'ds_prep_size_gb': 0.8958165176491728, 'ds_size_gb': 0.8958165176491728, 'max_depth': 4.602455291339385e-08, 'max_depth*n_features': 8.276969896399465e-05, 'max_depth*n_features*n_samples': 1.1188204977077247e-08, 'max_depth*n_features*n_samples*n_tree_repeats': 1.2101329730965103e-16, 'max_depth*n_features*n_tree_repeats': 1.73562626225241e-13, 'max_depth*n_samples': 6.003527146823594e-14, 'max_depth*n_samples*n_tree_repeats': 5.458849368989926e-15, 'max_depth*n_tree_repeats': 5.846802665464209e-10, 'n_features': 1.419262523195433e-10, 'n_features*n_samples': 2.1948939540241107e-11, 'n_features*n_samples*n_tree_repeats': 6.761378006837745e-13, 'n_features*n_tree_repeats': 1.189619404783309e-12, 'n_samples': 7.445989056176149e-08, 'n_samples*n_tree_repeats': 1.1095360093190593e-08, 'n_tree_repeats': 0.0005355693710144896}
- xgb_class_time = {'': 0.04616911535729873, 'ds_onehot_size_gb': 0.0698867127341342, 'ds_prep_size_gb': 3.47457744189382, 'ds_size_gb': 3.47457744189382, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads': 9.064818572421352e-11, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features': 2.802431219594177e-06, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features*n_samples': 5.094046852454207e-14, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 4.515896055082407e-12, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_features*n_tree_repeats': 9.943166031719296e-15, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_samples': 2.9578963011700153e-15, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_samples*n_tree_repeats': 1.991428507510768e-16, 'n_cv_refit*n_splits*2_power_maxdepth*n_estimators*1/n_threads*n_tree_repeats': 6.993000397349683e-07, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads': 1.68587043083397e-08, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features': 0.0007712724349247164, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features*n_samples': 1.7162683220472862e-09, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 1.226904474214378e-10, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_features*n_tree_repeats': 6.967156404769764e-13, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_samples': 3.601942853784541e-13, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_samples*n_tree_repeats': 1.5052320282512473e-14, 'n_cv_refit*n_splits*max_depth*n_estimators*1/n_threads*n_tree_repeats': 0.0026046534716614215, 'n_cv_refit*n_splits*n_estimators*1/n_threads': 0.09233823071459746, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features': 3.291166164590293e-10, 
'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_samples': 1.914319987041818e-13, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_samples*n_tree_repeats': 2.926688203905133e-15, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_features*n_tree_repeats': 3.670077849317217e-12, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_samples': 6.154537890478014e-07, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_samples*n_tree_repeats': 8.63288843709104e-14, 'n_cv_refit*n_splits*n_estimators*1/n_threads*n_tree_repeats': 3.035228262559771e-08}
pytabkit.models.alg_interfaces.rtdl_interfaces module
- class pytabkit.models.alg_interfaces.rtdl_interfaces.FTTransformerSubSplitInterface
Bases:
SkorchSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.rtdl_interfaces.RTDL_MLPSubSplitInterface
Bases:
SkorchSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- class pytabkit.models.alg_interfaces.rtdl_interfaces.RTDL_MLP_ParamSamplerNew
Bases:
object
- __init__(is_classification, train_size, num_emb_type='none')
- Parameters:
is_classification (bool)
train_size (int)
num_emb_type (str)
- sample_params(seed)
- Parameters:
seed (int)
- Return type:
Dict[str, Any]
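A sampler with a sample_params(seed) method typically derives all random choices from the seed, so the same seed reproduces the same configuration. The sketch below shows that pattern only. The function name sample_params_sketch and the search space (n_layers, lr, dropout and their ranges) are hypothetical and are not the space used by pytabkit.

```python
import random
from typing import Any, Dict


def sample_params_sketch(seed: int) -> Dict[str, Any]:
    """Hypothetical seeded sampler; the search space is made up."""
    rng = random.Random(seed)  # local RNG, leaves the global state untouched
    return {
        'n_layers': rng.randint(1, 8),    # hypothetical range
        'lr': 10 ** rng.uniform(-5, -2),  # hypothetical log-uniform range
        'dropout': rng.uniform(0.0, 0.5), # hypothetical range
    }


params_a = sample_params_sketch(seed=0)
params_b = sample_params_sketch(seed=0)  # same seed, same configuration
```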
- class pytabkit.models.alg_interfaces.rtdl_interfaces.RTDL_ResNet_ParamSampler
Bases:
object
- __init__(is_classification, train_size)
- Parameters:
is_classification (bool)
train_size (int)
- sample_params(seed)
- Parameters:
seed (int)
- Return type:
Dict[str, Any]
- class pytabkit.models.alg_interfaces.rtdl_interfaces.RTDL_ResNet_ParamSamplerNew
Bases:
object
- __init__(is_classification, train_size)
- Parameters:
is_classification (bool)
train_size (int)
- sample_params(seed)
- Parameters:
seed (int)
- Return type:
Dict[str, Any]
- class pytabkit.models.alg_interfaces.rtdl_interfaces.RandomParamsFTTransformerAlgInterface
Bases:
RandomParamsAlgInterface
- class pytabkit.models.alg_interfaces.rtdl_interfaces.RandomParamsRTDLMLPAlgInterface
Bases:
SingleSplitAlgInterface
- __init__(model_idx, fit_params=None, **config)
- Parameters:
fit_params (List[Dict[str, Any]] | None) – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
model_idx (int)
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overridden. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test split (i.e., only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about different possible fit_params settings that can be used. Say a variable results is returned that is not None. Then, results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
None
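The selection rule described for the returned results structure (pick the params with the best loss averaged over tv_split_idx) can be sketched as below. This is an illustration under stated assumptions rather than the library's selection code: the helper name select_best_params and the 'n_epochs' key are hypothetical, and the sketch assumes every train-val split lists the same candidate params in the same order.

```python
from typing import Any, Dict, List, Tuple

# results[tt_split_idx][tv_split_idx] is a list of (params, loss) tuples.
Results = List[List[List[Tuple[Dict[str, Any], float]]]]


def select_best_params(results: Results) -> List[Dict[str, Any]]:
    """Hypothetical helper: one best params dict per trainval-test split."""
    best = []
    for tv_results in results:
        n_candidates = len(tv_results[0])
        # Assumption: candidate i refers to the same params in every tv split.
        mean_losses = [
            sum(split[i][1] for split in tv_results) / len(tv_results)
            for i in range(n_candidates)
        ]
        best_idx = min(range(n_candidates), key=mean_losses.__getitem__)
        best.append(tv_results[0][best_idx][0])
    return best


results = [[
    [({'n_epochs': 10}, 0.40), ({'n_epochs': 20}, 0.30)],  # tv split 0
    [({'n_epochs': 10}, 0.20), ({'n_epochs': 20}, 0.35)],  # tv split 1
]]
fit_params = select_best_params(results)  # [{'n_epochs': 10}]
```

The returned list has one dictionary per trainval-test split, which is the layout that the fit_params constructor argument expects.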
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
RequiredResources
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape will be the number of classes (even in the binary case) and the outputs will be logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape will be the target dimension (often 1).
- Return type:
Tensor
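Since classification outputs are logits, a caller has to apply softmax over the last axis to obtain probabilities. The sketch below does this on nested Python lists with the same [models, samples, classes] layout so that it runs without extra dependencies. With the Tensor returned by predict(), one would apply a softmax over the last dimension instead. Averaging the probabilities over the first axis is one possible way to combine the models and is an assumption here, and the helper names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_probabilities(preds: List[List[List[float]]]) -> List[List[float]]:
    """preds[model][sample] holds the logits for one sample."""
    n_models = len(preds)
    n_samples = len(preds[0])
    out = []
    for j in range(n_samples):
        per_model = [softmax(preds[i][j]) for i in range(n_models)]
        # Assumption: combine the models by averaging their probabilities.
        out.append([sum(col) / n_models for col in zip(*per_model)])
    return out


# Made-up logits: 2 models, 2 samples, 2 classes.
preds = [[[2.0, 0.0], [0.0, 1.0]],
         [[1.0, 1.0], [0.0, 3.0]]]
mean_probs = mean_probabilities(preds)
```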
- class pytabkit.models.alg_interfaces.rtdl_interfaces.RandomParamsResnetAlgInterface
Bases:
SingleSplitAlgInterface
- __init__(model_idx, fit_params=None, **config)
- Parameters:
fit_params (List[Dict[str, Any]] | None) – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
model_idx (int)
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overridden. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test split (i.e., only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about different possible fit_params settings that can be used. Say a variable results is returned that is not None. Then, results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
None
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape will be the number of classes (even in the binary case) and the outputs will be logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape will be the target dimension (often 1).
- Return type:
Tensor
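The predict() description says that classification outputs are logits with one slice per fitted model along the first dimension. As a rough, dependency-free sketch of what "apply softmax" means for such an output, the snippet below uses nested lists in place of a Tensor. Averaging the probabilities over the model dimension is an assumption made for illustration, not something the interface prescribes, and the helper names are hypothetical. With an actual PyTorch tensor, torch.softmax(y_pred, dim=-1) performs the softmax step.

```python
import math
from typing import List

# Nested lists standing in for a tensor of shape [n_models, n_samples, n_classes].
Logits = List[List[List[float]]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of class logits."""
    shift = max(row)
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def mean_probabilities(logits: Logits) -> List[List[float]]:
    """Convert logits to probabilities and average them over the model dimension.

    Averaging over models is an illustrative choice (a simple ensemble).
    """
    n_models = len(logits)
    n_samples = len(logits[0])
    n_classes = len(logits[0][0])
    probs = [[0.0] * n_classes for _ in range(n_samples)]
    for model_logits in logits:
        for i, row in enumerate(model_logits):
            for c, p in enumerate(softmax(row)):
                probs[i][c] += p / n_models
    return probs


# Hypothetical output of predict(): 2 models, 1 sample, 2 classes (binary case).
example_logits = [
    [[0.0, 0.0]],            # model 0: probabilities (0.5, 0.5)
    [[math.log(3.0), 0.0]],  # model 1: probabilities (0.75, 0.25)
]
example_probs = mean_probabilities(example_logits)
```

Note that even in the binary case there are two logits per sample, so the softmax is taken over the last dimension rather than applying a sigmoid to a single output.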
- class pytabkit.models.alg_interfaces.rtdl_interfaces.ResnetSubSplitInterface
Bases:
SkorchSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
- class pytabkit.models.alg_interfaces.rtdl_interfaces.SkorchSubSplitInterface
Bases:
SklearnSubSplitInterface
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape will be the number of classes (even in the binary case) and the outputs will be logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape will be the target dimension (often 1).
- Return type:
Tensor
- pytabkit.models.alg_interfaces.rtdl_interfaces.allow_single_underscore(params_config)
- Parameters:
params_config (List[Tuple])
- Return type:
List[Tuple]
- pytabkit.models.alg_interfaces.rtdl_interfaces.choose_batch_size_rtdl(train_size)
- Return type:
int
- pytabkit.models.alg_interfaces.rtdl_interfaces.choose_batch_size_rtdl_new(train_size)
- Parameters:
train_size (int)
- Return type:
int
pytabkit.models.alg_interfaces.sub_split_interfaces module
- class pytabkit.models.alg_interfaces.sub_split_interfaces.SingleSplitWrapperAlgInterface
Bases:
SingleSplitAlgInterface
AlgInterface that takes multiple AlgInterfaces, each of which can only handle a single train-val-test split, and wraps them to handle a trainval-test split (possibly with multiple train-val splits).
- __init__(sub_split_interfaces, fit_params=None, **config)
- Parameters:
sub_split_interfaces (List[AlgInterface]) – Interfaces for each sub-split (train-val split).
fit_params (List[Dict[str, Any]] | None)
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overloaded. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about different possible fit_params settings that can be used. Say a variable results is returned that is not None. Then, results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
List[List[List[Tuple[Dict, float]]]] | None
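The return value described above can be used to pick fit_params after k-fold cross-validation. The following is a minimal sketch of that selection, assuming every candidate params dictionary appears in each train-val split and has hashable values. The function name select_best_params and the example data are hypothetical and not part of pytabkit.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Alias mirroring the documented return type of fit().
FitResults = List[List[List[Tuple[Dict[str, Any], float]]]]


def select_best_params(results: FitResults) -> List[Dict[str, Any]]:
    """For each trainval-test split, return the params with the lowest mean loss.

    The mean is taken over the train-val splits (tv_split_idx).
    """
    best_per_split = []
    for tv_results in results:  # one entry per trainval-test split
        losses = defaultdict(list)
        params_by_key = {}
        for candidates in tv_results:  # one entry per train-val split
            for params, loss in candidates:
                key = tuple(sorted(params.items()))  # assumes hashable values
                losses[key].append(loss)
                params_by_key[key] = params
        best_key = min(losses, key=lambda k: sum(losses[k]) / len(losses[k]))
        best_per_split.append(params_by_key[best_key])
    return best_per_split


# Hypothetical results: 1 trainval-test split, 2 train-val splits, 2 candidates each.
example_results = [[
    [({"n_estimators": 100}, 0.30), ({"n_estimators": 200}, 0.25)],
    [({"n_estimators": 100}, 0.20), ({"n_estimators": 200}, 0.27)],
]]
best_fit_params = select_best_params(example_results)
```

The result has one dictionary per trainval-test split, which is the layout that the fit_params constructor argument expects.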
- get_available_predict_params()
- Return type:
Dict[str, Dict[str, Any]]
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
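The fallback rule for fit_params in get_refit_interface() can be illustrated with a toy class. This is only a sketch of the documented behaviour (an explicit fit_params argument wins, otherwise self.fit_params is used). ToyInterface and its n_models attribute are hypothetical stand-ins and do not reflect how pytabkit implements refitting.

```python
from typing import Any, Dict, List, Optional


class ToyInterface:
    """Hypothetical stand-in for an AlgInterface; not a pytabkit class."""

    def __init__(self, fit_params: Optional[List[Dict[str, Any]]] = None,
                 n_models: int = 1):
        self.fit_params = fit_params  # one dict per trainval-test split
        self.n_models = n_models      # models to fit per trainval-test split

    def get_refit_interface(self, n_refit: int,
                            fit_params: Optional[List[Dict[str, Any]]] = None
                            ) -> "ToyInterface":
        # Use the explicitly passed fit_params if given, else the stored ones.
        chosen = fit_params if fit_params is not None else self.fit_params
        return ToyInterface(fit_params=chosen, n_models=n_refit)


# Parameters found in the cross-validation stage (made-up values).
cv_interface = ToyInterface(fit_params=[{"n_estimators": 120}])
refit_default = cv_interface.get_refit_interface(n_refit=3)
refit_override = cv_interface.get_refit_interface(
    n_refit=3, fit_params=[{"n_estimators": 80}])
```

Returning a new object instead of mutating the existing one keeps the cross-validation interface usable after the refit interface has been created.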
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape will be the number of classes (even in the binary case) and the outputs will be logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape will be the target dimension (often 1).
- Return type:
Tensor
- set_current_predict_params(name)
- Parameters:
name (str)
- Return type:
None
- class pytabkit.models.alg_interfaces.sub_split_interfaces.SklearnSubSplitInterface
Bases:
SingleSplitAlgInterface
Base class for AlgInterfaces based on scikit-learn methods.
- __init__(fit_params=None, **config)
- Parameters:
fit_params (List[Dict[str, Any]] | None) – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overloaded. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about different possible fit_params settings that can be used. Say a variable results is returned that is not None. Then, results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
List[List[List[Tuple[Dict, float]]]] | None
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape will be the number of classes (even in the binary case) and the outputs will be logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape will be the target dimension (often 1).
- Return type:
Tensor
- class pytabkit.models.alg_interfaces.sub_split_interfaces.TreeBasedSubSplitInterface
Bases:
SingleSplitAlgInterface
Base class for tree-based ML models (XGB, LGBM, CatBoost).
- __init__(fit_params=None, **config)
- Parameters:
fit_params (List[Dict[str, Any]] | None) – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overloaded. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about different possible fit_params settings that can be used. Say a variable results is returned that is not None. Then, results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
List[List[List[Tuple[Dict, float]]]] | None
- get_available_predict_params()
- Return type:
Dict[str, Dict[str, Any]]
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape will be the number of classes (even in the binary case) and the outputs will be logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape will be the target dimension (often 1).
- Return type:
Tensor
pytabkit.models.alg_interfaces.tabm_interface module
- class pytabkit.models.alg_interfaces.tabm_interface.RandomParamsTabMAlgInterface
Bases:
RandomParamsAlgInterface
- get_available_predict_params()
- Return type:
Dict[str, Dict[str, Any]]
- set_current_predict_params(name)
- Parameters:
name (str)
- Return type:
None
- class pytabkit.models.alg_interfaces.tabm_interface.TabMSubSplitInterface
Bases:
SingleSplitAlgInterface
- __init__(fit_params=None, **config)
- Parameters:
fit_params (List[Dict[str, Any]] | None) – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overloaded. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about different possible fit_params settings that can be used. Say a variable results is returned that is not None. Then, results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
List[List[List[Tuple[Dict, float]]]] | None
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape will be the number of classes (even in the binary case) and the outputs will be logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape will be the target dimension (often 1).
- Return type:
Tensor
- pytabkit.models.alg_interfaces.tabm_interface.get_tabm_auto_batch_size(n_train)
- Parameters:
n_train (int)
- Return type:
int
pytabkit.models.alg_interfaces.tabr_interface module
- class pytabkit.models.alg_interfaces.tabr_interface.ExceptionPrintingCallback
Bases:
Callback
- on_exception(trainer, pl_module, exception)
Called when any trainer execution is interrupted by an exception.
- class pytabkit.models.alg_interfaces.tabr_interface.RandomParamsTabRAlgInterface
Bases:
RandomParamsAlgInterface
- class pytabkit.models.alg_interfaces.tabr_interface.TabRSubSplitInterface
Bases:
AlgInterface
- __init__(**config)
- Parameters:
fit_params – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
- create_model(n_num_features, n_bin_features, cat_cardinalities, n_classes, freeze_contexts_after_n_epochs)
- Parameters:
freeze_contexts_after_n_epochs (int | None)
- Return type:
Any
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overloaded. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about different possible fit_params settings that can be used. Say a variable results is returned that is not None. Then, results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
List[List[List[Tuple[Dict, float]]]] | None
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
- infer_batch_size(n_samples_train)
- Parameters:
n_samples_train (int)
- Return type:
int
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape will be the number of classes (even in the binary case) and the outputs will be logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape will be the target dimension (often 1).
- Return type:
Tensor
pytabkit.models.alg_interfaces.xgboost_interfaces module
- class pytabkit.models.alg_interfaces.xgboost_interfaces.RandomParamsXGBAlgInterface
Bases:
RandomParamsAlgInterface
- get_available_predict_params()
- Return type:
Dict[str, Dict[str, Any]]
- set_current_predict_params(name)
- Parameters:
name (str)
- Return type:
None
- class pytabkit.models.alg_interfaces.xgboost_interfaces.XGBCustomMetric
Bases:
object
- __init__(metric_names, is_classification, is_higher_better=False)
- Parameters:
metric_names (str | List[str])
is_classification (bool)
is_higher_better (bool)
- class pytabkit.models.alg_interfaces.xgboost_interfaces.XGBHyperoptAlgInterface
Bases:
OptAlgInterface
- __init__(space=None, n_hyperopt_steps=50, **config)
- Parameters:
fit_params – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
n_hyperopt_steps (int)
- create_alg_interface(n_sub_splits, **config)
- Parameters:
n_sub_splits (int)
- Return type:
- class pytabkit.models.alg_interfaces.xgboost_interfaces.XGBSklearnSubSplitInterface
Bases:
SklearnSubSplitInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
- class pytabkit.models.alg_interfaces.xgboost_interfaces.XGBSubSplitInterface
Bases:
TreeBasedSubSplitInterface
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
pytabkit.models.alg_interfaces.xrfm_interfaces module
- class pytabkit.models.alg_interfaces.xrfm_interfaces.RandomParamsxRFMAlgInterface
Bases:
RandomParamsAlgInterface
- pytabkit.models.alg_interfaces.xrfm_interfaces.sample_xrfm_params(seed, hpo_space_name='default')
- Parameters:
seed (int)
hpo_space_name (str)
- class pytabkit.models.alg_interfaces.xrfm_interfaces.xRFMSubSplitInterface
Bases:
SingleSplitAlgInterface
- __init__(fit_params=None, **config)
- Parameters:
fit_params (List[Dict[str, Any]] | None) – This parameter can be used to store the best hyperparameters found during fit() in (cross-)validation mode. These can then be used for fit() in refitting mode. If fit_params is not None, it should be a list with one dictionary per trainval-test split. The dictionaries then contain the obtained hyperparameters for each of the trainval-test splits. Normally, there are no best parameters per train-val split as we might not have the same number of refitted models as train-val splits.
config – Other parameters.
- fit(ds, idxs_list, interface_resources, logger, tmp_folders, name)
Fit the models on the given data and splits. Should be overridden by subclasses unless fit_and_eval() is overloaded. In the latter case, this method will by default use fit_and_eval() and discard the evaluation.
- Parameters:
ds (DictDataset) – DictDataset representing the dataset. Should be on the CPU.
idxs_list (List[SplitIdxs]) – List containing one SplitIdxs object per trainval-test split. Indices should be on the CPU.
interface_resources (InterfaceResources) – Resources assigned to fit().
logger (Logger) – Logger that can be used for logging.
tmp_folders (List[Path | None]) – List of paths that can be used for storing intermediate data. The paths can be None, in which case methods will try not to save intermediate results. There should be one folder per trainval-test-split (i.e. only one per k-fold CV).
name (str) – Name of the algorithm (for logging).
- Returns:
May return information about different possible fit_params settings that can be used. Say a variable results is returned that is not None. Then, results[tt_split_idx][tv_split_idx] should be a list of tuples (params, loss). This is useful for k-fold cross-validation, where the params with the best average loss (averaged over tv_split_idx) can be selected for fit_params.
- Return type:
List[List[List[Tuple[Dict, float]]]] | None
- get_refit_interface(n_refit, fit_params=None)
Returns another AlgInterface that is configured for refitting on the training and validation data. Override in subclasses.
- Parameters:
n_refit (int) – Number of models that should be refitted (with different seeds) per trainval-test split.
fit_params (List[Dict] | None) – Fit parameters (see the constructor) that should be used for refitting. If fit_params is None, self.fit_params will be used instead.
- Returns:
Returns the AlgInterface object for refitting.
- Return type:
AlgInterface
- get_required_resources(ds, n_cv, n_refit, n_splits, split_seeds, n_train)
Estimate the required resources for fit().
- Parameters:
ds (DictDataset) – Dataset. Does not have to contain tensors.
n_cv (int) – Number of train-val splits per trainval-test split.
n_refit (int) – Number of refitted models per trainval-test split.
n_splits (int) – Number of trainval-test splits.
split_seeds (List[int]) – Seeds for every trainval-test split.
n_train (int)
- Returns:
Returns estimated required resources.
- Return type:
- predict(ds)
Method to predict labels on the given dataset. Override in subclasses.
- Parameters:
ds (DictDataset) – Dataset on which to predict labels
- Returns:
Returns a tensor of shape [n_trainval_splits * n_splits, ds.n_samples, output_shape]. In the classification case, output_shape will be the number of classes (even in the binary case) and the outputs will be logits (i.e., softmax should be applied to get probabilities). In the regression case, output_shape will be the target dimension (often 1).
- Return type:
Tensor