multiprocess.multiproc_distributor

The multiprocess distributor executes tasks in parallel when multiple threads are available.

Attributes

logger

Classes

PickableTask

Object passed between processes.

Consumer

Consumes the tasks to be run.

MultiprocessDistributor

Executes tasks locally in parallel.

MultiprocessWorkerNode

Task instance info for an instance run with the Multiprocessing distributor.

Module Contents

multiprocess.multiproc_distributor.logger
class multiprocess.multiproc_distributor.PickableTask(distributor_id: str, command: str, task_type: str = 'array')

Object passed between processes.

distributor_id
command
task_type
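Anything put on a multiprocessing queue is pickled on the way in and unpickled on the way out, which is why the object crossing the process boundary holds only plain data. A minimal sketch, assuming a frozen dataclass with the three documented fields (the real class may be written differently):

```python
import pickle
from dataclasses import dataclass


@dataclass(frozen=True)
class PickableTask:
    """Plain-data description of one task; only strings, so pickle handles it."""

    distributor_id: str
    command: str
    task_type: str = "array"


task = PickableTask(distributor_id="1", command="echo hello")

# multiprocessing queues pickle on put() and unpickle on get(); an explicit
# round trip shows the same thing without starting a process.
restored = pickle.loads(pickle.dumps(task))
print(restored)
```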
class multiprocess.multiproc_distributor.Consumer(task_queue: multiprocessing.JoinableQueue, response_queue: multiprocessing.Queue)

Bases: multiprocessing.Process

Consumes the tasks to be run.

task_queue: multiprocessing.JoinableQueue[PickableTask | None]
response_queue: multiprocessing.Queue[Tuple[str, int | None]]
run() → None

Wait for work, then execute it.
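The element types of task_queue (PickableTask | None) and response_queue (Tuple[str, int | None]) suggest a loop like the one below. It is a sketch under two assumptions not confirmed on this page: None is a shutdown sentinel, and a consumer reports (distributor_id, pid) when a command starts and (distributor_id, None) when it ends. The command runs as a shell child process, matching the "subconsumer" in the pattern described for MultiprocessDistributor. `consumer_loop` is a hypothetical name; a Consumer's run() could delegate to it.

```python
import queue
import subprocess
from dataclasses import dataclass
from typing import Any


@dataclass
class PickableTask:
    distributor_id: str
    command: str
    task_type: str = "array"


def consumer_loop(task_queue: Any, response_queue: Any) -> None:
    """Wait for work, then execute it; return once the None sentinel arrives."""
    while True:
        task = task_queue.get()  # blocks until a task or the sentinel is available
        try:
            if task is None:
                return  # shutdown requested by the distributor
            # The command runs in its own child process (the "subconsumer").
            proc = subprocess.Popen(task.command, shell=True)
            response_queue.put((task.distributor_id, proc.pid))  # started
            proc.wait()
            response_queue.put((task.distributor_id, None))  # finished
        finally:
            task_queue.task_done()  # lets JoinableQueue.join() account for this item


# queue.Queue shares get/put/task_done with multiprocessing.JoinableQueue,
# so the loop can be exercised in a single process.
tasks: Any = queue.Queue()
responses: Any = queue.Queue()
tasks.put(PickableTask("7", "exit 0"))
tasks.put(None)
consumer_loop(tasks, responses)
started = responses.get_nowait()
finished = responses.get_nowait()
```

In a real deployment the two queues would be a multiprocessing.JoinableQueue and a multiprocessing.Queue shared with the parent.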

class multiprocess.multiproc_distributor.MultiprocessDistributor(cluster_name: str, parallelism: int = 3, *args: tuple, **kwargs: dict)

Bases: jobmon.core.cluster_protocol.ClusterDistributor

Executes tasks locally in parallel.

It uses the multiprocessing Python library and queues to parallelize the execution of tasks. The subprocessing pattern looks like this:

LocalExec --> consumer1 ----> subconsumer1 --> consumer2 ----> subconsumer2 ... --> consumerN ----> subconsumerN

temp_dir: str | None = None
started = False
_cluster_name
property worker_node_entry_point: str

Path to jobmon worker_node_entry_point.

_worker_node_entry_point
_parallelism
_next_job_id = 1
_running_or_submitted: Dict[str, int | None]
task_queue: multiprocessing.JoinableQueue[PickableTask | None]
response_queue: multiprocessing.Queue[Tuple[str, int | None]]
consumers: List[Consumer] = []
property cluster_name: str

Return the name of the cluster type.

_get_subtask_id(distributor_id: int, array_step_id: int) → str

Get the subtask_id based on distributor_id and array_step_id.

start() → None

Fire up N task consuming processes using Multiprocessing.

Number of consumers is controlled by parallelism.
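Starting N consumers that share one task queue is a standard multiprocessing pattern. A minimal sketch, assuming each consumer is built from a target function plus the two queues; `start_consumers` and its `process_cls` parameter are hypothetical. `process_cls` defaults to multiprocessing.Process; the demo passes threading.Thread, whose constructor accepts the same target/args/name/daemon keywords, so it runs without spawning processes.

```python
import multiprocessing
import queue
import threading
from typing import Any, Callable, List


def start_consumers(
    parallelism: int,
    target: Callable[..., None],
    task_queue: Any,
    response_queue: Any,
    process_cls: Callable[..., Any] = multiprocessing.Process,
) -> List[Any]:
    """Start `parallelism` workers that all pull from the same task queue."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    consumers = []
    for i in range(parallelism):
        worker = process_cls(
            target=target,
            args=(task_queue, response_queue),
            name=f"consumer-{i}",
            daemon=True,  # workers should not outlive the distributor
        )
        worker.start()
        consumers.append(worker)
    return consumers


def double(task_queue: Any, response_queue: Any) -> None:
    # Toy consumer: doubles numbers until it receives the None sentinel.
    while True:
        item = task_queue.get()
        if item is None:
            task_queue.task_done()
            return
        response_queue.put(item * 2)
        task_queue.task_done()


tasks: Any = queue.Queue()
responses: Any = queue.Queue()
consumers = start_consumers(3, double, tasks, responses, process_cls=threading.Thread)
for n in range(5):
    tasks.put(n)
for _ in consumers:
    tasks.put(None)  # one sentinel per consumer
tasks.join()  # returns once every item, sentinels included, is marked done
for c in consumers:
    c.join(timeout=5)
results = sorted(responses.get_nowait() for _ in range(5))
```

With real processes, the queues would be multiprocessing.JoinableQueue and multiprocessing.Queue, and the target must be a module-level function so it can be pickled under the spawn start method.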

stop() → None

Terminate consumers and call sync one final time.
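Because task_queue admits None as an element, one plausible shutdown is to send one None sentinel per consumer, wait for each to exit, forcibly terminate any that do not, and then run the final sync. A sketch under those assumptions; `stop_consumers` and the `sync` callback are hypothetical stand-ins for stop() and its final state update.

```python
import queue
import threading
from typing import Any, Callable, List


def stop_consumers(
    consumers: List[Any],
    task_queue: Any,
    sync: Callable[[], None],
    timeout: float = 10.0,
) -> None:
    """Ask every consumer to exit, then sync state one final time."""
    for _ in consumers:
        task_queue.put(None)  # each consumer exits after taking one sentinel
    for consumer in consumers:
        consumer.join(timeout=timeout)
        # multiprocessing.Process has terminate(); threads do not.
        if consumer.is_alive() and hasattr(consumer, "terminate"):
            consumer.terminate()
    sync()  # pick up anything reported just before shutdown


def idle(task_queue: Any) -> None:
    # Toy consumer that acknowledges items until it sees the sentinel.
    while task_queue.get() is not None:
        task_queue.task_done()
    task_queue.task_done()


tasks: Any = queue.Queue()
workers = [threading.Thread(target=idle, args=(tasks,)) for _ in range(2)]
for w in workers:
    w.start()
synced: List[bool] = []
stop_consumers(workers, tasks, sync=lambda: synced.append(True))
```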

_update_internal_states() → None
terminate_task_instances(distributor_ids: List[str]) → None

Terminate task instances.

Only running task instances are terminated; jobs that are still in a waiting or transitioning state are not killed.

Parameters:

distributor_ids – A list of distributor IDs.
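The running versus waiting distinction can be expressed with the _running_or_submitted mapping (Dict[str, int | None]). The sketch below assumes, without confirmation from this page, that the value is the child's PID once the command has started and None while the task is still queued. `terminate_running` is a hypothetical helper, and `kill` is injectable so the demo sends no real signals.

```python
import os
import signal
from typing import Callable, Dict, List, Optional


def terminate_running(
    running_or_submitted: Dict[str, Optional[int]],
    distributor_ids: List[str],
    kill: Callable[[int, int], None] = os.kill,
) -> List[str]:
    """SIGTERM the ids that have a PID; leave queued (None) or unknown ids alone."""
    terminated = []
    for distributor_id in distributor_ids:
        pid = running_or_submitted.get(distributor_id)
        if pid is None:
            continue  # not started yet, or not tracked by this distributor
        kill(pid, signal.SIGTERM)
        terminated.append(distributor_id)
    return terminated


state = {"1": 4242, "2": None}  # "1" running as PID 4242, "2" still queued
sent = []
result = terminate_running(
    state, ["1", "2", "3"], kill=lambda pid, sig: sent.append((pid, sig))
)
```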

get_submitted_or_running(distributor_ids: List[str] | None = None) → Set[str]

Get tasks that are active.
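Deciding which tasks are active needs the latest reports from the consumers. A sketch assuming consumers post (distributor_id, pid) when a command starts and (distributor_id, None) when it ends, and that submitted but unstarted tasks are stored with a None value; `apply_responses` is a hypothetical stand-in for the undocumented _update_internal_states().

```python
import queue
from typing import Any, Dict, List, Optional, Set


def apply_responses(running_or_submitted: Dict[str, Optional[int]], response_queue: Any) -> None:
    """Drain consumer reports: a PID marks a task running, None marks it done."""
    while True:
        try:
            distributor_id, pid = response_queue.get_nowait()
        except queue.Empty:
            return
        if pid is None:
            running_or_submitted.pop(distributor_id, None)  # finished: no longer active
        else:
            running_or_submitted[distributor_id] = pid  # started: remember the PID


def get_submitted_or_running(
    running_or_submitted: Dict[str, Optional[int]],
    distributor_ids: Optional[List[str]] = None,
) -> Set[str]:
    """Active ids, optionally restricted to the ids the caller asks about."""
    active = set(running_or_submitted)
    if distributor_ids is None:
        return active
    return active & set(distributor_ids)


state: Dict[str, Optional[int]] = {"1": None, "2": None, "3": None}  # all queued
reports: Any = queue.Queue()
reports.put(("1", 101))   # "1" started as PID 101
reports.put(("2", 102))   # "2" started...
reports.put(("2", None))  # ...and finished
apply_responses(state, reports)
active = get_submitted_or_running(state)
subset = get_submitted_or_running(state, ["2", "3"])
```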

submit_to_batch_distributor(command: str, name: str, requested_resources: Dict[str, Any]) → str

Submit the command on the cluster technology and return a distributor_id.

The distributor_id can be used to identify the associated TaskInstance, terminate it, monitor for missingness, or collect usage statistics. If an exception is raised by this method the task instance will move to “W” state and the exception will be logged in the database under the task_instance_error_log table.

Parameters:
  • command – command to be run

  • name – name of task

  • requested_resources – resource requests sent to distributor API

Returns:

The distributor ID.
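On a single machine, submission can amount to allocating the next ID, recording the task as submitted, and enqueuing it for the consumers. A sketch assuming IDs come from a counter that starts at 1 (mirroring _next_job_id = 1) and that name and requested_resources go unused because there is no scheduler to pass them to; `SubmitSketch` is a hypothetical class, not the distributor itself.

```python
import itertools
import queue
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class PickableTask:
    distributor_id: str
    command: str
    task_type: str = "array"


class SubmitSketch:
    """Hypothetical stand-in holding only the state that submission touches."""

    def __init__(self, task_queue: Any) -> None:
        self.task_queue = task_queue
        self._ids = itertools.count(1)  # mirrors _next_job_id = 1
        self.running_or_submitted: Dict[str, Optional[int]] = {}

    def submit_to_batch_distributor(
        self, command: str, name: str, requested_resources: Dict[str, Any]
    ) -> str:
        # name and requested_resources are ignored: nothing local consumes them.
        distributor_id = str(next(self._ids))
        self.running_or_submitted[distributor_id] = None  # queued, no PID yet
        # task_type keeps its default; the value used for non-array
        # submissions is not documented on this page.
        self.task_queue.put(PickableTask(distributor_id, command))
        return distributor_id


sketch = SubmitSketch(queue.Queue())
first = sketch.submit_to_batch_distributor("echo a", "task_a", {})
second = sketch.submit_to_batch_distributor("echo b", "task_b", {})
```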

submit_array_to_batch_distributor(command: str, name: str, requested_resources: Dict[str, Any], array_length: int) → Dict[int, str]

Submit an array task to the multiprocess cluster.

Return: a mapping of array_step_id to distributor_id.
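An array can be fanned out as one queued task per step, each with its own subtask ID. A sketch under stated assumptions: steps are numbered from 0, `make_subtask_id` uses a made-up `<distributor_id>.<array_step_id>` format in place of _get_subtask_id, and `submit_array` is a hypothetical helper.

```python
import queue
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class PickableTask:
    distributor_id: str
    command: str
    task_type: str = "array"


def make_subtask_id(distributor_id: int, array_step_id: int) -> str:
    # Hypothetical format; the real _get_subtask_id may combine them differently.
    return f"{distributor_id}.{array_step_id}"


def submit_array(
    task_queue: Any,
    running_or_submitted: Dict[str, Optional[int]],
    distributor_id: int,
    command: str,
    array_length: int,
) -> Dict[int, str]:
    """Queue one task per array step and map each step to its subtask id."""
    step_to_id: Dict[int, str] = {}
    for array_step_id in range(array_length):  # assumes 0-based steps
        subtask_id = make_subtask_id(distributor_id, array_step_id)
        running_or_submitted[subtask_id] = None  # queued, no PID yet
        task_queue.put(PickableTask(subtask_id, command, task_type="array"))
        step_to_id[array_step_id] = subtask_id
    return step_to_id


tasks: Any = queue.Queue()
state: Dict[str, Optional[int]] = {}
steps = submit_array(tasks, state, distributor_id=5, command="echo step", array_length=3)
```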

get_queueing_errors(distributor_ids: List[str]) → Dict[str, str]

Get the task instances that have errored out.

get_remote_exit_info(distributor_id: str) → Tuple[str, str]

Get the exit info about the task instance once it is done running.

class multiprocess.multiproc_distributor.MultiprocessWorkerNode

Bases: jobmon.core.cluster_protocol.ClusterWorkerNode

Task instance info for an instance run with the Multiprocessing distributor.

_distributor_id: str | None = None
_array_step_id: int | None = None
_subtask_id: str | None = None
_logfile_template
property distributor_id: str | None

The id from the distributor.

get_exit_info(exit_code: int, error_msg: str) → Tuple[str, str]

Exit code and message.

get_usage_stats() → Dict

Usage information specific to the distributor.

initialize_logfile(log_type: str, log_dir: str, name: str) → str

Error and exit code info from the executor.

property array_step_id: int | None

Return the array_step_id.