server.web.middleware.db_reset_retry
====================================

.. py:module:: server.web.middleware.db_reset_retry

.. autoapi-nested-parse::

   ASGI middleware that retries a request once on transient DB connection resets.

   Azure Private Link (and similar cloud NATs / load balancers) periodically
   sever idle TCP connections. Those resets surface from SQLAlchemy as
   ``OperationalError`` with MySQL errno 2006 / 2013 / 2026.
   ``pool_pre_ping=True`` already validates pooled connections at checkout,
   but it cannot help when a query is already in flight, or when the reset
   hits the commit round trip performed in the dependency finalizer of
   ``jobmon.server.web.db.deps.get_db``.

   This middleware wraps the entire request pipeline (the handler body and
   the dependency finalizers) in a single retry boundary. A transient
   connection-reset exception triggers one retry after a short backoff.
   Other ``OperationalError`` variants (deadlocks 1213, lock wait timeouts
   1205, etc.) propagate immediately.
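
   A minimal sketch of that retry boundary follows. It assumes a hypothetical
   ``call_with_reset_retry`` helper and a caller-supplied ``is_transient``
   predicate; neither is part of this module. Errors the predicate accepts
   are retried after ``backoff_seconds``; everything else is re-raised on the
   first failure.

   .. code-block:: python

      import asyncio
      from typing import Awaitable, Callable, TypeVar

      T = TypeVar("T")


      async def call_with_reset_retry(
          attempt: Callable[[], Awaitable[T]],
          is_transient: Callable[[BaseException], bool],
          max_attempts: int = 2,
          backoff_seconds: float = 0.2,
      ) -> T:
          """Hypothetical helper: await ``attempt()`` and retry transient failures."""
          if max_attempts < 1:
              raise ValueError("max_attempts must be >= 1")
          for attempt_no in range(1, max_attempts + 1):
              try:
                  return await attempt()
              except Exception as exc:
                  # Non-transient errors (deadlocks, lock wait timeouts, ...) and a
                  # failure on the last allowed attempt propagate unchanged.
                  if attempt_no == max_attempts or not is_transient(exc):
                      raise
                  await asyncio.sleep(backoff_seconds)
          raise AssertionError("unreachable: the loop always returns or raises")

   With ``max_attempts=2`` (the value of ``DEFAULT_MAX_ATTEMPTS``), a request
   gets at most one retry.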

   Implementation notes
   --------------------
   We intentionally implement this as raw ASGI middleware rather than
   ``starlette.middleware.base.BaseHTTPMiddleware``. ``BaseHTTPMiddleware``
   wires ``call_next`` through anyio memory streams that are closed after a
   single use, so it cannot re-invoke the downstream app. Raw ASGI lets us
   buffer the request body once, replay it on each attempt, and stage the
   outgoing ``send()`` messages so a failed attempt can be discarded cleanly
   before any bytes reach the client.
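
   The buffer, replay, and stage steps might look roughly like the sketch
   below. ``ReplayingRetryMiddleware`` and its ``is_transient`` argument are
   hypothetical illustrations, not this module's API, and plain ``typing``
   aliases stand in for ``starlette.types`` so the sketch runs without
   Starlette. It holds back the whole response until an attempt succeeds,
   which is only reasonable for small, non-streaming responses.

   .. code-block:: python

      import asyncio
      from typing import Any, Awaitable, Callable, List, MutableMapping

      # Plain aliases standing in for starlette.types (Scope, Message, Receive, Send, ASGIApp).
      Scope = MutableMapping[str, Any]
      Message = MutableMapping[str, Any]
      Receive = Callable[[], Awaitable[Message]]
      Send = Callable[[Message], Awaitable[None]]
      ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]


      class ReplayingRetryMiddleware:
          """Hypothetical sketch: buffer the body once, replay it, stage outgoing sends."""

          def __init__(
              self,
              app: ASGIApp,
              is_transient: Callable[[BaseException], bool],
              max_attempts: int = 2,
              backoff_seconds: float = 0.2,
          ) -> None:
              self.app = app
              self.is_transient = is_transient
              self.max_attempts = max_attempts
              self.backoff_seconds = backoff_seconds

          async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
              if scope["type"] != "http":
                  await self.app(scope, receive, send)  # lifespan / websocket pass through
                  return

              # Drain the request body exactly once; the real receive() cannot be rewound.
              chunks: List[bytes] = []
              while True:
                  message = await receive()
                  if message["type"] != "http.request":
                      return  # client disconnected before sending the full body
                  chunks.append(message.get("body", b""))
                  if not message.get("more_body", False):
                      break
              body = b"".join(chunks)

              for attempt_no in range(1, self.max_attempts + 1):
                  replayed = False
                  staged: List[Message] = []

                  async def replay_receive() -> Message:
                      nonlocal replayed
                      if not replayed:
                          replayed = True
                          return {"type": "http.request", "body": body, "more_body": False}
                      # Body already delivered: wait on the real channel (e.g. for http.disconnect).
                      return await receive()

                  async def staging_send(message: Message) -> None:
                      staged.append(message)  # held back until the attempt succeeds

                  try:
                      await self.app(scope, replay_receive, staging_send)
                  except Exception as exc:
                      if attempt_no == self.max_attempts or not self.is_transient(exc):
                          raise
                      await asyncio.sleep(self.backoff_seconds)
                      continue  # drop `staged`: nothing has reached the client yet
                  for message in staged:
                      await send(message)
                  return

   Because ``staging_send`` forwards nothing until the downstream app
   returns, a failed attempt leaves no partial headers or body on the socket.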

   Safety
   ------
   ``get_db`` defers ``session.commit()`` until after the handler returns
   and rolls back on any in-handler exception before the session is closed.
   That means a mid-handler reset guarantees no write was committed, so a
   replay is safe. The rarer commit-phase race (MySQL committed the
   transaction, but the connection was reset before the acknowledgement
   reached the application) has the same semantics whether or not we
   retry; jobmon's hash-based unique constraints on the important writes
   already absorb the duplicate case.
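
   The commit-after-handler ordering that makes a replay safe can be
   sketched as a generator dependency. ``get_db_sketch`` and its
   ``session_factory`` argument are hypothetical and do not reproduce the
   real ``get_db``; any object with ``commit``, ``rollback``, and ``close``
   methods (such as a SQLAlchemy ``Session``) fits the ``SessionLike``
   protocol.

   .. code-block:: python

      from typing import Callable, Iterator, Protocol


      class SessionLike(Protocol):
          def commit(self) -> None: ...
          def rollback(self) -> None: ...
          def close(self) -> None: ...


      def get_db_sketch(session_factory: Callable[[], SessionLike]) -> Iterator[SessionLike]:
          """Hypothetical yield-dependency: commit only after the handler finishes."""
          session = session_factory()
          try:
              yield session  # the route handler runs while the generator is suspended here
              session.commit()  # reached only if the handler did not raise
          except Exception:
              # The handler raised (e.g. a mid-handler reset) or commit failed:
              # roll back and re-raise. In the handler case nothing was committed.
              session.rollback()
              raise
          finally:
              session.close()

   Advancing the generator past ``yield`` commits; throwing an exception
   into it, as a framework typically does when the handler fails, rolls back
   without committing.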


Attributes
----------

.. autoapisummary::

   server.web.middleware.db_reset_retry.logger


Classes
-------

.. autoapisummary::

   server.web.middleware.db_reset_retry.DBResetRetryMiddleware


Functions
---------

.. autoapisummary::

   server.web.middleware.db_reset_retry.is_connection_reset


Module Contents
---------------

.. py:data:: logger

.. py:function:: is_connection_reset(exc: BaseException) -> bool

   Return True iff ``exc`` represents a transient connection-loss error.

   Only errors clearly caused by a severed connection are retryable.
   Deadlocks (1213), lock timeouts (1205), integrity errors, etc. must
   propagate.
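
   A duck-typed sketch of such a predicate follows. The errno set comes from
   the module docstring; ``mysql_errno`` and ``is_connection_reset_sketch``
   are hypothetical helpers that assume the driver exception (reachable
   through SQLAlchemy's ``.orig`` attribute) carries the MySQL errno in
   ``args[0]``, as PyMySQL and mysqlclient do. The real function may also
   check the exception type explicitly.

   .. code-block:: python

      from typing import Optional

      # MySQL client errors for a severed connection: 2006 "server has gone away",
      # 2013 "lost connection during query", 2026 "SSL connection error".
      RESET_ERRNOS = frozenset({2006, 2013, 2026})


      def mysql_errno(exc: BaseException) -> Optional[int]:
          """Return the MySQL errno carried by ``exc``, or None if there is none."""
          # SQLAlchemy's DBAPIError keeps the driver exception on ``.orig``.
          inner = getattr(exc, "orig", None) or exc
          args = getattr(inner, "args", ())
          if args and isinstance(args[0], int):
              return args[0]
          return None


      def is_connection_reset_sketch(exc: BaseException) -> bool:
          # Deadlock (1213), lock wait timeout (1205), duplicate key (1062), ... -> False
          return mysql_errno(exc) in RESET_ERRNOS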


.. py:class:: DBResetRetryMiddleware(app: starlette.types.ASGIApp, max_attempts: int = DEFAULT_MAX_ATTEMPTS, backoff_seconds: float = DEFAULT_BACKOFF_SECONDS, budget_seconds: float = DEFAULT_BUDGET_SECONDS)

   ASGI middleware that retries an HTTP request after a transient DB connection reset.

   Non-HTTP scopes (lifespan, websocket) are forwarded unchanged.

   The constructor stores the retry policy. ``max_attempts`` must be >= 1.
   ``budget_seconds`` caps the total time spent retrying, so that a slow
   query followed by a retry does not exceed the client's ``read_timeout``
   (default 20 s). Retrying stops when the next backoff would fall outside
   the budget, even if attempts remain.
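
   The budget rule can be expressed as a small predicate. ``may_retry`` is a
   hypothetical helper, and its timestamps are assumed to come from a
   monotonic clock such as ``time.monotonic()``.

   .. code-block:: python

      def may_retry(
          attempt_no: int,
          max_attempts: int,
          started_at: float,
          now: float,
          backoff_seconds: float,
          budget_seconds: float,
      ) -> bool:
          """Hypothetical: may a retry follow the failed attempt ``attempt_no`` (1-based)?"""
          if max_attempts < 1:
              raise ValueError("max_attempts must be >= 1")
          if attempt_no >= max_attempts:
              return False  # no attempts left
          # Stop early if sleeping for the backoff would already exceed the budget.
          elapsed = now - started_at
          return elapsed + backoff_seconds <= budget_seconds

   With the defaults (2 attempts, 0.2 s backoff, 15 s budget), a reset after
   14.9 s of work is not retried even though an attempt remains.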


   .. py:attribute:: DEFAULT_MAX_ATTEMPTS
      :value: 2


   .. py:attribute:: DEFAULT_BACKOFF_SECONDS
      :value: 0.2


   .. py:attribute:: DEFAULT_BUDGET_SECONDS
      :value: 15.0


   .. py:attribute:: SCOPE_STATE_KEY
      :value: 'db_reset_retry'


   .. py:attribute:: app


   .. py:method:: should_retry_connection_reset(scope: starlette.types.Scope) -> bool
      :staticmethod:


      Return True iff a retry attempt remains within the configured budget.

      Called from the generic exception handler to decide whether to
      re-raise a connection-reset error (so this middleware sees it
      and retries) or to let it flow through to a normal error
      response. It is safe to call when no retry middleware is
      registered; in that case it returns False.