Jobmon

Jobmon is a scientific workflow management system that simplifies running computational workflows on distributed computing systems. It provides:

  • Easy-to-use Python and R APIs for defining workflows

  • Centralized monitoring of jobs, including statuses and errors

  • Automatic retries to protect against cluster failures

  • Resource-aware retries that scale memory and runtime after failures

  • Workflow resumes that continue a run from where it stopped

  • Fine-grained job dependencies including support for job arrays

  • A web-based GUI for monitoring and debugging
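
To make the idea of resource-aware retries concrete, here is a minimal standalone sketch of the general technique. It is not Jobmon's API: `Resources`, `ResourceError`, `run_with_scaling_retries`, the 1.5x scale factor, and the demo task are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


class ResourceError(Exception):
    """Hypothetical error a task raises when it runs out of memory or time."""


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource request for one attempt of a task."""
    memory_gb: float
    runtime_s: int


def run_with_scaling_retries(
    task: Callable[[Resources], str],
    resources: Resources,
    max_attempts: int = 3,
    scale: float = 1.5,  # assumed growth factor, not a Jobmon default
) -> Tuple[str, List[Resources]]:
    """Run `task`, scaling memory and runtime after each resource failure.

    Returns the task result and the resources requested on each attempt.
    Re-raises ResourceError if the final attempt also fails.
    """
    history: List[Resources] = []
    for attempt in range(1, max_attempts + 1):
        history.append(resources)
        try:
            return task(resources), history
        except ResourceError:
            if attempt == max_attempts:
                raise
            # Request more memory and runtime for the next attempt.
            resources = Resources(
                memory_gb=resources.memory_gb * scale,
                runtime_s=int(resources.runtime_s * scale),
            )
    raise ValueError("max_attempts must be at least 1")


def needs_two_gb(resources: Resources) -> str:
    """Demo task that fails unless it is given at least 2 GB of memory."""
    if resources.memory_gb < 2:
        raise ResourceError("out of memory")
    return "done"


result, history = run_with_scaling_retries(needs_two_gb, Resources(1.0, 60))
print(result, [r.memory_gb for r in history])  # done [1.0, 1.5, 2.25]
```

In this sketch the task fails at 1.0 GB and 1.5 GB and succeeds at 2.25 GB. A real scheduler would read the failure reason from the cluster rather than from a Python exception.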
