Startup cancellation and unwind
- Cancellation-aware startup: ragweld unwinds partially-entered lifecycle components when startup is cancelled (for example, by a reload or Ctrl+C).
- Best-effort unwind: on any enter failure, ragweld calls __aexit__(exc_type, exc, tb) so your components can clean up.
- Operator-safe termination: SIGTERM/SIGINT and uvicorn reloads won’t leak resources created during early startup.
- API-first readiness: until startup completes, /api/ready remains “not ready.” On cancellation, readiness never flips to ready.
Who should read this
- Operators: understand what happens when you stop/reload the service.
- Engineers: implement lifecycle components that clean up correctly on cancellation.
What changed and why it matters
ragweld’s startup manager now treats cancellation as a first-class failure mode. Internally, _enter_lifecycle_cm (in server/main.py) was updated to catch BaseException, not just Exception. That matters because:
- asyncio.CancelledError and KeyboardInterrupt inherit from BaseException (not Exception).
- Without this, if startup is cancelled during __aenter__, your __aexit__ would never run.
- With the change, ragweld always calls __aexit__ so partially-initialized resources get cleaned up.
# server/main.py (conceptual excerpt)
async def _enter_lifecycle_cm(cm):
    try:
        await cm.__aenter__()
    except BaseException as e:  # (1)!
        try:
            await cm.__aexit__(type(e), e, e.__traceback__)  # (2)!
        finally:
            raise  # (3)!
1. Catch BaseException so CancelledError and KeyboardInterrupt are included.
2. Unwind via __aexit__ with the original exception details.
3. Re-raise; cancellation/failure still propagates and startup aborts.
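To make the difference concrete, here is a small self-contained sketch. It is not ragweld's source: enter_with_unwind is a local stand-in modelled on the conceptual excerpt, with the caught exception class made a parameter, and Probe is a hypothetical component. It assumes Python 3.8 or later, where asyncio.CancelledError derives from BaseException.

```python
import asyncio


async def enter_with_unwind(cm, catch):
    # Stand-in for the conceptual excerpt; `catch` is a hypothetical
    # parameter so both behaviours can be compared side by side.
    try:
        await cm.__aenter__()
    except catch as e:
        try:
            await cm.__aexit__(type(e), e, e.__traceback__)
        finally:
            raise


class Probe:
    """Hypothetical component whose __aenter__ is cancelled."""

    def __init__(self):
        self.cleaned_up = False

    async def __aenter__(self):
        raise asyncio.CancelledError()

    async def __aexit__(self, exc_type, exc, tb):
        self.cleaned_up = True
        return False


async def was_cleaned_up(catch):
    probe = Probe()
    try:
        await enter_with_unwind(probe, catch)
    except asyncio.CancelledError:
        pass  # swallowed only so the demo can report the result
    return probe.cleaned_up


def demo():
    # First value: catching Exception only. Second: catching BaseException.
    return (
        asyncio.run(was_cleaned_up(Exception)),
        asyncio.run(was_cleaned_up(BaseException)),
    )
```

Calling demo() should show that cleanup is skipped when only Exception is caught and runs when BaseException is caught.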
Readiness stays false on cancellation
ragweld does not report ready until all lifecycle components have entered successfully. If a cancellation or error fires during entry, /api/ready will remain “not ready” and the process typically exits (or is restarted by your supervisor).
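The ordering can be sketched as follows. This is an illustration under assumptions, not ragweld's implementation: ReadinessGate and start_all are hypothetical names, and the sketch only shows that the flag is set after every component has entered. How ragweld treats components that had already entered successfully is not shown here.

```python
import asyncio


class ReadinessGate:
    """Hypothetical flag that a readiness endpoint could consult."""

    def __init__(self):
        self.ready = False


async def start_all(components, gate):
    for cm in components:
        try:
            await cm.__aenter__()
        except BaseException as e:
            # Unwind the component that failed to enter, then propagate.
            await cm.__aexit__(type(e), e, e.__traceback__)
            raise
    # Reached only if every component entered without error or cancellation.
    gate.ready = True
```

Because the assignment is the last statement, any error or cancellation during entry leaves gate.ready at False.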
Operator runbook: safe cancels and reloads
- Stopping the service (SIGTERM/SIGINT) during startup:
  - Resources created before the cancel (files, sockets, DB pools) are cleaned up by __aexit__.
  - The process exits cleanly; no need to run manual cleanup scripts.
- Dev reloads (uvicorn --reload) mid-startup:
  - The in-flight startup is cancelled.
  - Components that began __aenter__ are unwound.
  - The new worker then starts fresh.
- Health endpoints:
  - Liveness is typically OK unless the process has crashed.
  - Readiness stays false until startup completes.
  - Check directly:

        curl -fsS http://127.0.0.1:8012/api/ready || echo "not ready"
If you see resource contention on a restart
Rarely, external systems (DBs, queues) can hold onto ephemeral state briefly after a cancellation. Prefer short retry loops with backoff in your __aenter__, and ensure your __aexit__ is idempotent so a second call is harmless.
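A short retry loop of that kind might look like the sketch below. retry_with_backoff and its parameters are hypothetical names, not part of ragweld, and the choice of ConnectionError as the retryable error is an assumption. Only the listed exception types are retried, so CancelledError still propagates immediately.

```python
import asyncio


async def retry_with_backoff(op, attempts=4, base_delay=0.1,
                             retry_on=(ConnectionError,)):
    """Await op() up to `attempts` times, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return await op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # asyncio.sleep is also a cancel point, so a cancel during
            # the wait aborts the loop instead of being retried.
            await asyncio.sleep(base_delay * 2 ** attempt)
```

Inside __aenter__ this could wrap a single acquisition step, for example `self._pool = await retry_with_backoff(self._open_pool)`.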
How to write a robust lifecycle component
Definition list
- __aenter__: Do only what you can cleanly undo. If you create files, sockets, or connections, keep references so __aexit__ can close them. Fail fast on missing secrets or misconfiguration.
- __aexit__(exc_type, exc, tb): Always attempt cleanup, whether exc_type is set or not. Return False so exceptions (including cancellations) propagate. Never raise from __aexit__ unless absolutely necessary.
- Idempotency: Assume __aexit__ may run after partial __aenter__. Guard against None/uninitialized members. Multiple invocations should be safe.
- Logging: Log structured, single-line events for create/close with enough context to correlate in traces.
Here’s a minimal pattern you can drop into your component:
import asyncio
from typing import Optional


class MyLifecycle:
    # _open_pool, _create_tmp_dir, _remove_tmp_dir and _close_pool are
    # placeholders for your own helpers; they are not defined here.
    def __init__(self) -> None:
        self._pool: Optional[object] = None
        self._tmp_path: Optional[str] = None

    async def __aenter__(self) -> "MyLifecycle":
        # Allocate resources
        self._pool = await self._open_pool()  # (1)!
        self._tmp_path = await self._create_tmp_dir()  # (2)!
        # Simulate a cancel point (e.g., long I/O)
        await asyncio.sleep(0)  # (3)!
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        # Unwind in reverse order; be idempotent and tolerate partial entry
        try:
            if self._tmp_path:
                await self._remove_tmp_dir(self._tmp_path)  # (4)!
                self._tmp_path = None
        finally:
            if self._pool:
                await self._close_pool(self._pool)  # (5)!
                self._pool = None
        return False  # (6)!
1. Create a connection pool (or similar external handle).
2. Create a temporary directory (or another local resource).
3. Any await point can be cancelled; be prepared to unwind.
4. Clean up local files/dirs first.
5. Then close external handles.
6. Return False so exceptions (including CancelledError) propagate.
If you’re not sure, do this
- Keep references to every resource created in __aenter__.
- Unwind in reverse order in __aexit__.
- Never swallow CancelledError: return False and let it bubble.
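One way to get reverse-order, idempotent unwinding without hand-written bookkeeping is the standard library's contextlib.AsyncExitStack. The sketch below is an alternative pattern, not something ragweld prescribes; StackedLifecycle, _open and _close are hypothetical stand-ins that only record what happened.

```python
import contextlib


class StackedLifecycle:
    def __init__(self):
        self._stack = contextlib.AsyncExitStack()
        self.events = []  # demo only: records open/close order

    async def _open(self, name):
        # Hypothetical acquisition step.
        self.events.append(f"open {name}")
        return name

    async def _close(self, name):
        # Hypothetical release step.
        self.events.append(f"close {name}")

    async def __aenter__(self):
        # Register each cleanup immediately after its resource exists,
        # so a cancel at any later await still leaves it registered.
        pool = await self._open("pool")
        self._stack.push_async_callback(self._close, pool)
        tmp = await self._open("tmp")
        self._stack.push_async_callback(self._close, tmp)
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # aclose() runs callbacks last-in, first-out and empties the
        # stack, so a second call finds nothing left to do.
        await self._stack.aclose()
        return False
```

The stack holds only the cleanups that were actually registered, so partial entry needs no None checks.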
How it surfaces in tests
ragweld includes unit tests to exercise cancellation during lifecycle entry (see tests/unit/test_main_lifecycle.py). You can use a similar pattern in your own tests:
import asyncio

import pytest

from server.main import _enter_lifecycle_cm


class _CancelledLifecycleCM:
    def __init__(self) -> None:
        self.entered = False
        self.exited = False
        self.exit_exc_type = None

    async def __aenter__(self) -> None:
        self.entered = True
        raise asyncio.CancelledError("startup cancelled")  # (1)!

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self.exited = True
        self.exit_exc_type = exc_type  # (2)!
        return False  # (3)!


@pytest.mark.asyncio
async def test_enter_unwinds_on_cancel() -> None:
    cm = _CancelledLifecycleCM()
    with pytest.raises(asyncio.CancelledError, match="startup cancelled"):  # (4)!
        await _enter_lifecycle_cm(cm)
    assert cm.entered is True
    assert cm.exited is True
    assert cm.exit_exc_type is asyncio.CancelledError
1. Simulate cancellation during __aenter__.
2. Verify __aexit__ receives the exception type.
3. Returning False ensures the cancellation propagates.
4. The calling code observes the original CancelledError.
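The test above raises CancelledError by hand. A complementary check cancels a real task while it is suspended inside __aenter__, which is closer to what a reload or Ctrl+C does. The sketch below is self-contained: enter_with_unwind is a local stand-in modelled on the conceptual excerpt (in a ragweld checkout you would presumably import _enter_lifecycle_cm instead), and SlowLifecycle is a hypothetical component.

```python
import asyncio


async def enter_with_unwind(cm):
    # Local stand-in modelled on the conceptual excerpt.
    try:
        await cm.__aenter__()
    except BaseException as e:
        try:
            await cm.__aexit__(type(e), e, e.__traceback__)
        finally:
            raise


class SlowLifecycle:
    """Hypothetical component that blocks inside __aenter__."""

    def __init__(self):
        self.started = asyncio.Event()
        self.exit_exc_type = None

    async def __aenter__(self):
        self.started.set()
        await asyncio.sleep(3600)  # the cancel lands at this await
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.exit_exc_type = exc_type
        return False


async def cancel_mid_entry():
    cm = SlowLifecycle()
    task = asyncio.create_task(enter_with_unwind(cm))
    await cm.started.wait()  # make sure entry is in progress
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected: the cancellation propagated out of the task
    return task.cancelled(), cm.exit_exc_type
```

Running asyncio.run(cancel_mid_entry()) should report that the task ended as cancelled and that __aexit__ received asyncio.CancelledError as its exception type.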
Dev workflow sanity check
- Start the backend, then immediately stop it (Ctrl+C) during “Starting…” logs.
- Expect no leaked temp dirs, sockets, or DB sessions.
- If you see leaks, verify your lifecycles’ __aexit__ paths.
Health, readiness, and what clients should expect
- Readiness remains “not ready” until all components have fully entered.
- On cancellation during startup, the process exits and your supervisor (systemd, Docker, k8s) may restart it; clients should continue to poll readiness.
- Useful endpoints:
  - GET http://127.0.0.1:8012/api/health (overall status)
  - GET http://127.0.0.1:8012/api/ready (readiness gate)
  - GET http://127.0.0.1:8012/api/metrics (Prometheus scrape, when enabled)
See: Health & readiness API.
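For clients that poll readiness, a minimal standard-library sketch is shown below. It assumes the readiness endpoint answers with a non-2xx status (or refuses the connection) while the service is not ready, which is what the curl -f check relies on; the exact status codes and response body are not specified here. is_ready and wait_until_ready are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request

READY_URL = "http://127.0.0.1:8012/api/ready"


def is_ready(url=READY_URL, timeout=2.0):
    """Return True only if the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, refused connections and timeouts.
        return False


def wait_until_ready(probe=is_ready, attempts=30, delay=1.0):
    """Poll `probe` until it returns True or attempts run out."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```

The probe is injectable so the polling loop can be exercised in tests without a running server.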
FAQ
Why catch BaseException instead of Exception?
asyncio.CancelledError and KeyboardInterrupt inherit from BaseException. If we caught only Exception, __aexit__ wouldn’t be called on cancellation or Ctrl+C, leaving resources partially allocated. Catching BaseException ensures unwind logic always runs, then we re-raise.
Does this swallow my errors?
No. After calling __aexit__, ragweld re-raises the original exception. Your process manager still observes the failure/cancellation, and startup aborts as expected.
What about shutdown?
This page focuses on cancellation during startup (inside __aenter__). Normal shutdown uses the same principle: run cleanup deterministically, make it idempotent, and allow exceptions to propagate after best-effort unwind.
Architecture mental model
flowchart LR
A["Startup"] --> B["Enter lifecycles"]
B --> C{"Enter ok?"}
C -- "Yes" --> D["Flip readiness gate"]
D --> E["Serve traffic"]
C -- "No (error/cancel)" --> F["Call __aexit__(exc)"]
F --> G["Re-raise and abort startup"]
Common failure modes