distributor.distributor_service
Attributes
Classes
Initialization of DistributorService.
Module Contents
- distributor.distributor_service.logger
- class distributor.distributor_service.DistributorService(cluster_interface: jobmon.core.cluster_protocol.ClusterDistributor, requester: jobmon.core.requester.Requester | None = None, workflow_run_heartbeat_interval: int | None = None, task_instance_heartbeat_interval: int | None = None, heartbeat_report_by_buffer: float | None = None, distributor_poll_interval: int | None = None, raise_on_error: bool = False)
Initialization of DistributorService.
- raise_on_error = False
- cluster_interface
- process_status(status: str, timeout: int | float = -1) → None
Processes commands until all work is done or timeout is reached.
- Parameters:
status – which status to process work for.
timeout – time until processing stops. A value of -1 means process until no work remains.
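The timeout semantics described above can be sketched as a simple loop. This is a minimal illustration, not jobmon's implementation: `process_until_done` and the `work` queue of callables are hypothetical names, and the timeout is assumed to be measured in seconds.

```python
import time
from collections import deque
from typing import Callable, Deque, Union


def process_until_done(
    work: Deque[Callable[[], None]],
    timeout: Union[int, float] = -1,
) -> int:
    """Run queued commands until the queue is empty or the timeout elapses.

    Hypothetical helper, not part of jobmon. Returns the number of
    commands that were executed.
    """
    start = time.monotonic()
    processed = 0
    while work:
        # A timeout of -1 disables the deadline check entirely.
        if timeout != -1 and time.monotonic() - start >= timeout:
            break
        command = work.popleft()
        command()
        processed += 1
    return processed
```

With `timeout=-1` the loop only stops when the queue is empty; with `timeout=0` it stops before running any command.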
- instantiate_task_instances(task_instances: List[jobmon.distributor.distributor_task_instance.DistributorTaskInstance]) → None
- launch_task_instance_batch(task_instance_batch: jobmon.distributor.task_instance_batch.TaskInstanceBatch) → None
- launch_task_instance(task_instance: jobmon.distributor.distributor_task_instance.DistributorTaskInstance) → None
Submits a task instance on a given distributor.
Adds the new task instance to self.submitted_or_running_task_instances.
- triage_error(task_instance: jobmon.distributor.distributor_task_instance.DistributorTaskInstance) → None
Triage a running task instance that has missed a heartbeat.
Queries the cluster for exit info. If accounting has not finalized yet (plugin returns finalized=False), defers the transition and retries on the next distributor cycle, up to MAX_TRIAGE_DURATION wall-clock seconds.
Allowed transitions are (R, U, Z, F)
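The deferred-transition behaviour can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `TriageState` and `triage_once` are hypothetical names, the value of `MAX_TRIAGE_DURATION` here is invented, and the choice to apply the last reported status once the window has elapsed is an assumption the source does not confirm.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Assumed value for illustration only; the real constant may differ.
MAX_TRIAGE_DURATION = 600.0


@dataclass
class TriageState:
    """Hypothetical per-task-instance record of when triage began."""

    first_attempt: Optional[float] = None


def triage_once(
    state: TriageState,
    finalized: bool,
    reported_status: str,
    now: Optional[float] = None,
) -> Optional[str]:
    """Return the status to transition to, or None to defer to the next cycle."""
    now = time.time() if now is None else now
    if state.first_attempt is None:
        state.first_attempt = now
    if finalized:
        # Accounting is final, so the reported status can be trusted.
        return reported_status
    if now - state.first_attempt >= MAX_TRIAGE_DURATION:
        # Assumption: stop waiting and apply whatever was last reported.
        return reported_status
    # Not finalized and still inside the window: retry on the next cycle.
    return None
```

Passing `now` explicitly keeps the sketch testable without waiting on the real clock.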
- kill_self_batch(task_instance_batch: jobmon.distributor.task_instance_batch.TaskInstanceBatch) → None
Terminate all task instances (TIs) in this batch.
- Parameters:
task_instance_batch – The batch of task instances to terminate.
- no_heartbeat_error(task_instance: jobmon.distributor.distributor_task_instance.DistributorTaskInstance) → None
Move a task instance in NO_HEARTBEAT state to a recoverable error state.
This signal is sent from the swarm in the event a task instance in LAUNCHED state fails to log a heartbeat, either due to the distributor failing to log a heartbeat batch or due to the worker node failing to start up properly.
ERROR state allows for a retry, so that a new task instance can attempt to run.
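The NO_HEARTBEAT to ERROR move and the retry it enables might look roughly like this. Everything here is a hypothetical model for illustration: the `Status` enum, the `TaskInstance` dataclass, `mark_no_heartbeat_error`, `next_attempt`, and the `max_attempts` limit are assumptions and do not reflect jobmon's actual classes or retry policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    """Hypothetical subset of task instance states."""

    QUEUED = "QUEUED"
    NO_HEARTBEAT = "NO_HEARTBEAT"
    ERROR = "ERROR"


@dataclass
class TaskInstance:
    task_id: int
    status: Status


def mark_no_heartbeat_error(ti: TaskInstance) -> TaskInstance:
    """Move a NO_HEARTBEAT task instance to the recoverable ERROR state."""
    if ti.status is not Status.NO_HEARTBEAT:
        raise ValueError(f"expected NO_HEARTBEAT, got {ti.status.name}")
    ti.status = Status.ERROR
    return ti


def next_attempt(
    ti: TaskInstance, attempts_used: int, max_attempts: int
) -> Optional[TaskInstance]:
    """Create a fresh task instance for the same task if retries remain."""
    if ti.status is Status.ERROR and attempts_used < max_attempts:
        return TaskInstance(task_id=ti.task_id, status=Status.QUEUED)
    return None
```

The point of the sketch is that ERROR is not terminal: as long as attempts remain, a new task instance can be created for the same task.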