Jobmon
Jobmon is a scientific workflow management system that simplifies running computational workflows on distributed computing systems. It provides:
Easy-to-use Python and R APIs for defining workflows
Centralized monitoring of jobs, including statuses and errors
Automatic retries to protect against cluster failures
Resource-aware retries that scale memory and runtime after failures
Workflow resumes that let you continue from where a previous run left off
Fine-grained job dependencies, including support for job arrays
A web-based GUI for monitoring and debugging
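To make the idea of resource-aware retries concrete, here is a minimal sketch in plain Python. It does not use the Jobmon API: `Resources`, `ResourceError`, `scale`, and `run_with_retries` are hypothetical names invented for illustration, and the scaling factor of 1.5 is an assumption, not a Jobmon default. The sketch reruns a failed task with a larger memory and runtime request until it succeeds or the attempts run out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource request; not a Jobmon class."""
    memory_gb: float
    runtime_s: int


class ResourceError(Exception):
    """Hypothetical error for a task that ran out of memory or time."""


def scale(resources, factor=1.5):
    """Return a new request with memory and runtime multiplied by factor."""
    return Resources(resources.memory_gb * factor,
                     int(resources.runtime_s * factor))


def run_with_retries(task, resources, max_attempts=3, factor=1.5):
    """Call task(resources), scaling the request after each resource failure.

    Returns (result, resources_used). Re-raises ResourceError once
    max_attempts attempts have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task(resources), resources
        except ResourceError:
            if attempt == max_attempts:
                raise
            # Ask for more memory and runtime before the next attempt.
            resources = scale(resources, factor)
```

In this sketch, only resource failures trigger scaling. Any other exception propagates immediately, which keeps genuine bugs from being retried with ever larger requests.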
Quick Links
Getting Started - New to Jobmon? Start here with installation and your first workflow.
User Guide - Learn about workflows, tasks, resources, and monitoring.
Configuration - Configure Jobmon for your environment.
Advanced Topics - Arrays, dynamic resources, troubleshooting, and more.
Table of Contents
Getting Started
Configuration
Advanced Topics
IHME Users
Reference
Developer Guide
Architecture