pytabkit.bench.scheduling package
Submodules
pytabkit.bench.scheduling.execution module
- class pytabkit.bench.scheduling.execution.RayJobManager
Bases:
NodeManager
- __init__(max_n_threads=None, available_cpu_ram_multiplier=1.0, available_gpu_ram_multiplier=1.0, **ray_kwargs)
- Parameters:
max_n_threads (int | None)
available_cpu_ram_multiplier (float)
available_gpu_ram_multiplier (float)
- get_resource_manager()
- Return type:
- start()
- Return type:
None
- terminate()
- Return type:
None
- pytabkit.bench.scheduling.execution.get_gpu_rams_gb(use_reserved=True)
- Returns:
gpu_rams_gb: total GPU memory per visible device (GB).
gpu_rams_fixed_gb: GPU memory used by this process per visible device (GB).
If use_reserved is True (default), the process memory is the number of bytes reserved by the torch caching allocator, which often matches “process used” memory better. Otherwise it is the number of allocated bytes, i.e. live tensor bytes only.
- Parameters:
use_reserved (bool)
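The difference between reserved and allocated memory can be sketched with standard PyTorch calls. This is an illustrative stand-in, not the pytabkit implementation: the function name sketch_gpu_rams_gb, the tuple return shape, and the conversion factor of 1e9 bytes per GB are all assumptions.

```python
from typing import List, Tuple


def sketch_gpu_rams_gb(use_reserved: bool = True) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: per-device total memory and this process's memory, in GB."""
    try:
        import torch
    except ImportError:
        return [], []  # PyTorch not installed: report no devices
    if not torch.cuda.is_available():
        return [], []  # no visible CUDA device

    total_gb: List[float] = []
    process_gb: List[float] = []
    for device_idx in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(device_idx)
        total_gb.append(props.total_memory / 1e9)  # assumed unit: 1 GB = 1e9 bytes
        if use_reserved:
            # bytes held by the caching allocator, including cached but free blocks
            used_bytes = torch.cuda.memory_reserved(device_idx)
        else:
            # bytes occupied by live tensors only
            used_bytes = torch.cuda.memory_allocated(device_idx)
        process_gb.append(used_bytes / 1e9)
    return total_gb, process_gb


sketch_total_gb, sketch_process_gb = sketch_gpu_rams_gb()
```

On a machine without PyTorch or without a visible GPU, both lists are empty.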
- pytabkit.bench.scheduling.execution.measure_node_resources(node_id)
Function that measures available resources.
- Parameters:
node_id (int) – Node ID that will be used to identify the node in the returned NodeResources.
- Returns:
Returns a tuple of NodeResources objects. The first one contains the total available resources, and the second one contains the resources that a single process (with PyTorch GPU usage) uses without doing anything.
- Return type:
Tuple[NodeResources, NodeResources]
- pytabkit.bench.scheduling.execution.node_runner(feedback_queue, job_queue, node_id)
- Parameters:
node_id (int)
pytabkit.bench.scheduling.jobs module
- class pytabkit.bench.scheduling.jobs.AbstractJob
Bases:
object
Abstract base class for jobs that can be scheduled using schedulers in schedulers.py.
- get_desc()
- Returns:
Return a description that can be logged, e.g., when the job is started and when it finishes.
- Return type:
str
- get_group()
- Returns:
Should return a “group name” string. All jobs with the same “group name” will have a common time factor that is adjusted on-the-fly during scheduling based on already completed jobs.
- Return type:
str
- get_required_resources()
- Returns:
Return the resources requested by this job.
- Return type:
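A minimal sketch of what a concrete job might look like, assuming only the three methods listed above. AbstractJobSketch is a stand-in defined here so the example runs without pytabkit, TrainJob and its constructor arguments are hypothetical, and the dict returned by get_required_resources is a placeholder for the actual resource type.

```python
from abc import ABC, abstractmethod
from typing import Dict


class AbstractJobSketch(ABC):
    """Stand-in that mirrors the three documented methods; not the pytabkit class."""

    @abstractmethod
    def get_group(self) -> str: ...

    @abstractmethod
    def get_desc(self) -> str: ...

    @abstractmethod
    def get_required_resources(self) -> Dict[str, float]: ...


class TrainJob(AbstractJobSketch):
    """Hypothetical job that trains one method on one dataset."""

    def __init__(self, method: str, dataset: str, n_threads: int, cpu_ram_gb: float):
        self.method = method
        self.dataset = dataset
        self.n_threads = n_threads
        self.cpu_ram_gb = cpu_ram_gb

    def get_group(self) -> str:
        # jobs of the same method share one time factor during scheduling
        return self.method

    def get_desc(self) -> str:
        return f"{self.method} on {self.dataset}"

    def get_required_resources(self) -> Dict[str, float]:
        # placeholder dict; the real return type is not a plain dict
        return {"n_threads": self.n_threads, "cpu_ram_gb": self.cpu_ram_gb}


example_job = TrainJob("XGB", "adult", n_threads=4, cpu_ram_gb=2.0)
```

Grouping by method name is one plausible choice here, since jobs of the same method are likely to be mispredicted by a similar factor.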
- class pytabkit.bench.scheduling.jobs.JobResult
Bases:
object
Helper class to store information about a job that has been run.
- __init__(job_id, time_s, oom_cpu=False, oom_gpu=False, finished_normally=True, exception_msg=None)
- Parameters:
job_id (int) – Job id.
time_s (float) – Time in seconds that the job ran for.
oom_cpu (bool) – Whether an out-of-memory error occurred on the CPU.
oom_gpu (bool) – Whether an out-of-memory error occurred on the GPU.
finished_normally (bool) – Whether the job ran normally, such that its time and RAM values are representative of how it would normally run. For example, if the job ran faster because the results were already partially precomputed, it should not count towards the time estimation. If an exception occurred, finished_normally should be False.
exception_msg (str | None) – Exception message (if there was any).
- set_max_cpu_ram_gb(value)
Set the maximum RAM usage of the job.
- Parameters:
value (float) – Maximum RAM usage in GiB.
- Return type:
None
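To illustrate why finished_normally matters for time estimation, the sketch below computes a per-group time factor from completed jobs and skips results that did not finish normally. JobResultSketch is a stand-in with the constructor fields listed above. The helper group_time_factor and the choice of a mean ratio of actual to predicted time are assumptions, not the pytabkit estimator.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class JobResultSketch:
    """Stand-in with the documented constructor fields; not the pytabkit class."""
    job_id: int
    time_s: float
    oom_cpu: bool = False
    oom_gpu: bool = False
    finished_normally: bool = True
    exception_msg: Optional[str] = None


def group_time_factor(results: List[JobResultSketch],
                      predicted_time_s: Dict[int, float],
                      default: float = 1.0) -> float:
    """Hypothetical estimator: mean of actual/predicted time over representative runs."""
    ratios = [
        r.time_s / predicted_time_s[r.job_id]
        for r in results
        if r.finished_normally  # skip failed or partially precomputed runs
    ]
    return sum(ratios) / len(ratios) if ratios else default


group_results = [
    JobResultSketch(job_id=0, time_s=20.0),
    JobResultSketch(job_id=1, time_s=40.0),
    # ran unusually fast because results were cached, so it is excluded
    JobResultSketch(job_id=2, time_s=1.0, finished_normally=False),
]
predicted = {0: 10.0, 1: 20.0, 2: 10.0}
factor = group_time_factor(group_results, predicted)
```

Here both representative jobs took twice as long as predicted, so the factor is 2.0, and the cached run does not pull it down.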
- class pytabkit.bench.scheduling.jobs.JobRunner
Bases:
object
Helper class that runs an AbstractJob, catches exceptions, measures time and RAM usage, and returns its result.
- __init__(job, job_id, assigned_resources)
- Parameters:
job (AbstractJob) – The job to be run.
job_id (int) – An ID that will be returned at the end so that the job can be identified.
assigned_resources (NodeResources) – Assigned resources to run the job.
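The run, catch, and measure pattern described for JobRunner can be sketched as follows. The function run_and_report and its dict result are hypothetical. RAM measurement is omitted, and treating MemoryError as the CPU out-of-memory signal is an assumption.

```python
import time
import traceback
from typing import Any, Callable, Dict


def run_and_report(job_fn: Callable[[], None], job_id: int) -> Dict[str, Any]:
    """Hypothetical runner: executes job_fn, times it, and records any exception."""
    start = time.perf_counter()
    oom_cpu = False
    exception_msg = None
    try:
        job_fn()
    except MemoryError:
        # assumed signal for a CPU out-of-memory condition
        oom_cpu = True
        exception_msg = traceback.format_exc()
    except Exception:
        exception_msg = traceback.format_exc()
    return {
        "job_id": job_id,
        "time_s": time.perf_counter() - start,
        "oom_cpu": oom_cpu,
        "finished_normally": exception_msg is None,
        "exception_msg": exception_msg,
    }


def failing_job() -> None:
    raise ValueError("boom")


def oom_job() -> None:
    raise MemoryError()


ok_report = run_and_report(lambda: None, job_id=0)
bad_report = run_and_report(failing_job, job_id=1)
oom_report = run_and_report(oom_job, job_id=2)
```

Returning the job ID with the result lets the caller match reports to jobs when several run concurrently.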
pytabkit.bench.scheduling.resource_manager module
- class pytabkit.bench.scheduling.resource_manager.JobInfo
Bases:
object
- __init__(job, job_id, start_time=None, assigned_resources=None, job_result=None)
- Parameters:
job (AbstractJob)
job_id (int)
start_time (float | None)
assigned_resources (NodeResources | None)
job_result (JobResult | None)
- is_failed()
- is_finished()
- is_remaining()
- is_running()
- is_succeed()
- set_started(assigned_resources)
- Parameters:
assigned_resources (NodeResources)
- class pytabkit.bench.scheduling.resource_manager.JobStatus
Bases:
Enum
An enumeration.
- FAILED = 3
- REMAINING = 0
- RUNNING = 1
- SUCCEEDED = 2
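One way the JobInfo predicates could relate to JobStatus is sketched below. The enum values match those listed above. The function derive_status and its rules (a job with a result is finished, a job with a start time but no result is running) are assumptions about the lifecycle, not the documented behavior.

```python
from enum import Enum
from typing import Optional


class JobStatusSketch(Enum):
    """Stand-in with the same member values as the documented enum."""
    REMAINING = 0
    RUNNING = 1
    SUCCEEDED = 2
    FAILED = 3


def derive_status(start_time: Optional[float],
                  finished_normally: Optional[bool]) -> JobStatusSketch:
    """Hypothetical mapping from JobInfo-like fields to a status.

    finished_normally is None while no job result exists yet.
    """
    if finished_normally is not None:
        return JobStatusSketch.SUCCEEDED if finished_normally else JobStatusSketch.FAILED
    if start_time is not None:
        return JobStatusSketch.RUNNING
    return JobStatusSketch.REMAINING


def is_finished(status: JobStatusSketch) -> bool:
    # assumed meaning: finished covers both success and failure
    return status in (JobStatusSketch.SUCCEEDED, JobStatusSketch.FAILED)
```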
- class pytabkit.bench.scheduling.resource_manager.ResourceManager
Bases:
object
Keeps track of running jobs and available resources.
- __init__(total_resources, fixed_resources)
- Parameters:
total_resources (SystemResources)
fixed_resources (SystemResources)
- get_fixed_resources()
- get_free_resources()
- get_total_resources()
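The bookkeeping idea, free resources equal total resources minus what running jobs hold, can be sketched with plain dicts. ResourceTrackerSketch and its methods are hypothetical, and the sketch ignores the fixed per-process resources that the real class also receives.

```python
from typing import Dict


class ResourceTrackerSketch:
    """Hypothetical tracker; resources are plain name -> amount dicts."""

    def __init__(self, total: Dict[str, float]):
        self.total = dict(total)
        self.running: Dict[int, Dict[str, float]] = {}

    def job_started(self, job_id: int, assigned: Dict[str, float]) -> None:
        self.running[job_id] = dict(assigned)

    def job_finished(self, job_id: int) -> None:
        # releasing a job returns its resources to the free pool
        self.running.pop(job_id, None)

    def get_free(self) -> Dict[str, float]:
        free = dict(self.total)
        for assigned in self.running.values():
            for name, amount in assigned.items():
                free[name] -= amount
        return free


tracker = ResourceTrackerSketch({"n_threads": 8, "cpu_ram_gb": 32.0})
tracker.job_started(0, {"n_threads": 2, "cpu_ram_gb": 4.0})
tracker.job_started(1, {"n_threads": 4, "cpu_ram_gb": 8.0})
free_while_running = tracker.get_free()
tracker.job_finished(0)
free_after_release = tracker.get_free()
```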
pytabkit.bench.scheduling.resources module
- class pytabkit.bench.scheduling.resources.NodeResources
Bases:
object
Represents available/used/free resources on a compute node.
- __init__(node_id, n_threads, cpu_ram_gb, gpu_usages, gpu_rams_gb, physical_core_usages)
- Parameters:
node_id (int)
n_threads (float)
cpu_ram_gb (float)
gpu_usages (ndarray)
gpu_rams_gb (ndarray)
physical_core_usages (ndarray)
- get_cpu_ram_gb()
- Return type:
float
- get_gpu_rams_gb()
- Return type:
ndarray
- get_gpu_usages()
- Return type:
ndarray
- get_interface_resources()
- Return type:
- get_n_physical_cores()
- Return type:
int
- get_n_threads()
- Return type:
int
- get_physical_core_usages()
- Return type:
ndarray
- get_resource_vector()
- Return type:
ndarray
- get_total_gpu_ram_gb()
- Return type:
float
- get_total_gpu_usage()
- Return type:
float
- get_used_gpu_ids()
- Return type:
ndarray
- get_used_physical_cores()
- Return type:
ndarray
- set_cpu_ram_gb(cpu_ram_gb)
- Parameters:
cpu_ram_gb (float)
- Return type:
None
- set_gpu_rams_gb(gpu_rams_gb)
- Parameters:
gpu_rams_gb (ndarray)
- Return type:
None
- set_n_threads(n_threads)
- Parameters:
n_threads (int)
- try_assign(required_resources, fixed_resources)
- Parameters:
required_resources (RequiredResources)
fixed_resources (SystemResources)
- Return type:
NodeResources | None
- static zeros_like(node_resources)
- Parameters:
node_resources (NodeResources)
- Return type:
- class pytabkit.bench.scheduling.resources.SystemResources
Bases:
object
System resources, consisting of NodeResources for each node.
- __init__(resources)
- Parameters:
resources (List[NodeResources])
- get_cpu_ram_gb()
- get_gpu_ram_gb()
- get_gpu_usage()
- get_n_threads()
- get_num_gpus()
- get_resource_vector()
pytabkit.bench.scheduling.schedulers module
- class pytabkit.bench.scheduling.schedulers.BaseJobScheduler
Bases:
object
Base scheduler class where the logic for selecting which jobs should be run next still has to be implemented. Contains functionality for printing intermediate states and the main loop in run().
- __init__(job_manager)
- Parameters:
job_manager (RayJobManager)
- add_jobs(jobs)
- Parameters:
jobs (List[AbstractJob])
- run()
- class pytabkit.bench.scheduling.schedulers.CustomJobScheduler
Bases:
BaseJobScheduler
More complicated scheduler with different heuristics for which jobs to submit first (based on which resources it thinks are scarce, estimated time, which methods have not been run yet, etc.). This scheduler can be slow for a large number of jobs (say 10,000 or more).
- class pytabkit.bench.scheduling.schedulers.SimpleJobScheduler
Bases:
BaseJobScheduler
Simple scheduler. Submits the jobs with the largest estimated time first. If a job doesn’t fit, jobs with a not much smaller estimated time can be submitted instead. In the beginning, the scheduler ensures that at least three jobs from each group are run (e.g. 3x XGB, 3x LGBM, 3x MLP).
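The selection rule described for SimpleJobScheduler can be sketched as a greedy pick. This is not the pytabkit implementation: pick_next_job, the tuple job representation, the single threads resource, and the slack threshold of 0.5 (how much shorter a fallback job may be) are all assumptions, and the initial three-jobs-per-group phase is left out.

```python
from typing import List, Optional, Tuple

# hypothetical job representation: (name, estimated_time_s, threads_needed)
PendingJob = Tuple[str, float, int]


def pick_next_job(pending: List[PendingJob],
                  free_threads: int,
                  slack: float = 0.5) -> Optional[str]:
    """Return the longest job that fits, unless it is much shorter than the longest job."""
    if not pending:
        return None
    ordered = sorted(pending, key=lambda job: job[1], reverse=True)
    longest_time_s = ordered[0][1]
    for name, estimated_time_s, threads_needed in ordered:
        if estimated_time_s < slack * longest_time_s:
            # remaining jobs are too short; wait for resources instead
            break
        if threads_needed <= free_threads:
            return name
    return None


pending_jobs = [("mlp", 100.0, 8), ("xgb", 80.0, 2), ("lgbm", 10.0, 1)]
choice_with_4_threads = pick_next_job(pending_jobs, free_threads=4)
choice_with_1_thread = pick_next_job(pending_jobs, free_threads=1)
```

With four free threads the longest job does not fit, so the slightly shorter "xgb" job is chosen. With one free thread only "lgbm" would fit, but it is far shorter than the longest job, so nothing is submitted and the resources stay available for a long job.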
- pytabkit.bench.scheduling.schedulers.format_date_s(time_s)
- Parameters:
time_s (float)
- Return type:
str
- pytabkit.bench.scheduling.schedulers.format_length_s(duration)
- Parameters:
duration (float)
- Return type:
str