Hazard¶
Hazard name¶
Backend starts before FHIR server ready
General utility label¶
[2]
Likelihood scoring¶
TBC
Severity scoring¶
TBC
Description¶
The backend service's Docker Compose depends_on waits only for the FHIR container to start, not for it to report healthy. As a result, the backend starts and accepts requests before the FHIR server is ready to respond.
Causes¶
- Backend depends_on fhir without condition: service_healthy
- Backend starts before the FHIR server is fully initialized
- Initial patient data requests fail until the FHIR server is ready (30-60 seconds)
Effect¶
The backend logs errors on startup, and the patient list and demographics cannot be loaded for the first 30-60 seconds after deployment.
Hazard¶
A clinician assumes the system is broken during the startup window and either falls back to the paper backup system or delays accessing patient data.
Hazard type¶
- WrongPatientContext
Harm¶
Duplicate data entry in backup system causing fragmented medical history. Delayed access to critical patient information during startup window potentially affecting treatment decisions in emergencies.
Existing controls¶
None identified during initial analysis.
Assignment¶
Clinical Safety Officer
Labelling¶
TBC (awaiting scoring)
Project¶
Clinical Risk Management
Hazard controls¶
Design controls (manufacturer)¶
Implemented Controls ✓¶
- Atomic patient list response with readiness flag ✓: The backend /api/patients endpoint returns {patients: [...], fhir_ready: bool} in a single atomic response. This eliminates the race condition between a separate health check and data fetch. The frontend uses the fhir_ready flag to determine UI state.
  - Implementation: backend/app/main.py lines 287-320
  - Pattern: a single endpoint returns both data and readiness status, preventing flickering between "loading" and "no patients"
- Frontend health polling during startup ✓: The frontend polls /api/health every 5 seconds until FHIR is available. It displays a "Database is initialising" message (blue alert with clock icon) until the health check confirms FHIR is ready. This prevents "failed to load patients" errors during the startup window.
  - Implementation: frontend/src/pages/Home.tsx lines 242-295
  - Pattern: useEffect with setInterval polling; stops once isFhirReady=true and triggers the patient fetch only when ready
- Conservative FHIR readiness detection ✓: The frontend uses useRef to track whether patient data has ever loaded successfully. This prevents briefly showing "No patients" when navigating back to the home page after FHIR has been confirmed ready. "Database initialising" is shown only if FHIR has never returned patient data.
  - Implementation: frontend/src/pages/Home.tsx lines 216-227
  - Pattern: hasLoadedPatientsWithData.current ref; once set to true, "initialising" is never shown again for empty arrays
- Actual data access health check ✓: The backend queries /Patient?_count=1 instead of just /metadata to verify that FHIR can serve patient resources. HAPI FHIR can return 200 OK on the metadata endpoint before search indexes are built, causing false positives. The Patient query confirms the database is actually ready.
  - Implementation: backend/app/main.py lines 110-139
  - Safety note: documented as "safety-critical"; a false positive causes clinicians to think the database is empty when it is still loading
- Skeleton loading UI ✓: The PatientsList component shows an animated skeleton UI (gray gradient circles and rectangles with pulse animation) while isLoading=true. It is visually distinct from the "Database initialising" message and indicates that a data fetch is in progress.
  - Implementation: frontend/src/components/patients/PatientsList.tsx lines 130-147
  - Pattern: Mantine Skeleton components, 3 rows of circle + rectangle, pulse animation
- StateMessage component for consistent UI messaging ✓: A reusable StateMessage component was extracted for system state displays, with two message types: "database-initialising" (blue, IconClock) and "no-patients" (gray, IconUserOff). It provides a consistent visual language across the application.
  - Implementation: frontend/src/components/state-message/StateMessage.tsx
  - Storybook: StateMessage.stories.tsx, with two stories showing both states
- Automated loading sequence story ✓: The Storybook AnimatedLoadingSequence story demonstrates the complete startup flow: health check → database init → fetching → loaded. It runs as a 30-second loop with state transitions every 5 seconds and provides visual documentation of the expected behavior.
  - Implementation: frontend/src/components/patients/PatientsList.stories.tsx - AnimatedLoadingSequence story
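The two backend controls above (the data-access health check and the atomic fhir_ready flag) can be sketched with the Python standard library. This is a minimal illustration, not the code in backend/app/main.py: the helper names is_patient_bundle and build_patients_response, the 5-second timeout, the Accept header, and the choice to derive fhir_ready from the outcome of the same patient fetch are all assumptions.

```python
import json
import urllib.request
from typing import Any, Callable


def is_patient_bundle(status: int, body: bytes) -> bool:
    """Ready only if a Patient search returned HTTP 200 and a FHIR Bundle."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)  # JSONDecodeError/UnicodeDecodeError are ValueErrors
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("resourceType") == "Bundle"


def check_fhir_health(base_url: str, timeout: float = 5.0) -> bool:
    """Probe /Patient?_count=1 instead of /metadata, which can return 200 too early."""
    url = f"{base_url.rstrip('/')}/Patient?_count=1"
    try:
        request = urllib.request.Request(url, headers={"Accept": "application/fhir+json"})
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_patient_bundle(response.status, response.read())
    except (OSError, ValueError):
        # URLError, HTTPError (e.g. 503) and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def build_patients_response(
    fetch_patients: Callable[[], list[dict[str, Any]]],
) -> dict[str, Any]:
    """Return patients and readiness in one payload so the UI never sees them disagree."""
    try:
        patients = fetch_patients()
    except Exception:
        # Assumption: any fetch failure is reported as "not ready" with no patients,
        # so the frontend shows "Database is initialising" rather than "No patients".
        return {"patients": [], "fhir_ready": False}
    return {"patients": patients, "fhir_ready": True}
```

Deriving fhir_ready from the fetch itself, rather than from a separate earlier probe, is one way to keep the flag and the data from describing different moments in time.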
Planned Controls (Not Yet Implemented)¶
- Update the compose.dev.yml backend service with depends_on.fhir.condition: service_healthy, so that Docker waits for the FHIR health check to pass before starting the backend. Status: Planned.
- Implement a startup readiness probe: a /api/ready endpoint checks FHIR connectivity and returns 503 until FHIR is ready. Caddy does not route traffic to the backend until it is ready. Status: Planned.
- Add retry logic with exponential backoff for FHIR client initialization: retry after 1s, 2s, 4s, 8s, 16s (at most 30s between retries). Log progress messages. Only accept HTTP requests after the FHIR connection succeeds. Status: Planned.
- Implement a circuit breaker for FHIR unavailability: open the circuit after 5 consecutive failures and display a "Patient data temporarily unavailable" banner. Status: Planned.
- Add a startup status endpoint, /api/startup-status, that returns JSON with the status of every service dependency (FHIR, EHRbase, database). The frontend displays a progress indicator. Status: Planned.
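The planned backoff schedule (1s, 2s, 4s, 8s, 16s, capped at 30s) can be sketched as follows. This is a hypothetical illustration: wait_for_fhir, the default of 10 attempts, and the injectable sleep (used so the schedule can be tested without waiting) are assumptions, not existing backend code.

```python
import logging
import time
from typing import Callable, Iterator

logger = logging.getLogger("startup")


def backoff_delays(base: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield 1, 2, 4, 8, 16, 30, 30, ... seconds: doubling, never above `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


def wait_for_fhir(
    check: Callable[[], bool],
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block startup until `check()` succeeds; return False if attempts run out."""
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        if check():
            logger.info("FHIR reachable after %d attempt(s)", attempt)
            return True
        if attempt == max_attempts:
            break  # no point sleeping after the final attempt
        delay = next(delays)
        logger.warning(
            "FHIR not ready (attempt %d/%d); retrying in %.0fs", attempt, max_attempts, delay
        )
        sleep(delay)
    logger.error("FHIR still unavailable after %d attempts", max_attempts)
    return False
```

In the backend, `check` would be something like the existing check_fhir_health() probe, called before the HTTP server starts accepting requests.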
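The planned /api/ready and /api/startup-status endpoints both reduce to aggregating dependency checks into an HTTP status plus a JSON body. The sketch below assumes the individual checks (FHIR, EHRbase, database) are already available as booleans; the function name, the "up"/"starting" labels, and the response shape are hypothetical.

```python
from typing import Mapping


def startup_status(dependencies: Mapping[str, bool]) -> tuple[int, dict]:
    """Map dependency checks to (HTTP status, JSON body).

    200 only when every dependency is up; otherwise 503, so a reverse-proxy
    health check can keep the backend out of rotation while it starts.
    """
    # An empty mapping is treated as "not ready" rather than vacuously ready.
    ready = bool(dependencies) and all(dependencies.values())
    body = {
        "ready": ready,
        "dependencies": {
            name: ("up" if ok else "starting") for name, ok in dependencies.items()
        },
    }
    return (200 if ready else 503), body
```

The per-dependency map is what a frontend progress indicator would render; the status code alone is enough for a proxy readiness check.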
Testing controls (manufacturer)¶
Implemented Testing ✓¶
- Storybook visual testing ✓: The AnimatedLoadingSequence story demonstrates the complete startup flow with all states (health polling, database initialising, fetching, loaded). It provides a visual regression baseline and documents the expected UX during startup.
  - Story: PatientsList.stories.tsx - AnimatedLoadingSequence
  - Duration: 30-second loop with 5s per transition state
- Storybook state message testing ✓: The StateMessage component has dedicated stories showing both the "database-initialising" and "no-patients" states. These validate icon, color, and message text consistency.
  - Stories: StateMessage.stories.tsx
Planned Testing (Not Yet Implemented)¶
- Integration test: Deploy the stack with docker-compose up, monitor the backend logs, and verify there are no "FHIR connection failed" errors during startup. Status: Planned.
- Startup timing test: Deploy the stack and measure the time until the backend /api/ready endpoint returns 200 OK. Verify that timing is consistent (60-90 seconds). Status: Planned.
- FHIR unavailability test: Start the backend while FHIR is stopped and verify the waiting state; then start FHIR and verify recovery within 30 seconds. Status: Planned.
- Circuit breaker test: Simulate FHIR 503 errors and verify that the circuit opens after 5 failures, the error is displayed, and the circuit half-opens after 60 seconds. Status: Planned.
- Frontend UX test: Load the frontend during FHIR startup, verify that the "Database is initialising" message is shown, and verify that the patient list loads automatically when startup completes. Status: Manual testing performed; automated test planned.
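The planned circuit breaker (open after 5 consecutive failures, half-open after 60 seconds) and its test scenario can be sketched as a small state machine. This is an assumption-laden illustration, not existing code: the class and method names are hypothetical, half-open simply lets trial requests through, and the clock is injectable so the 60-second cooldown can be tested without waiting.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a cooldown."""

    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def allow_request(self) -> bool:
        # While open, fail fast so the UI can show "Patient data temporarily unavailable".
        return self.state != "open"

    def record_success(self) -> None:
        # Any success closes the circuit and resets the consecutive-failure count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        if self.state == "half-open":
            # The trial request failed: re-open and restart the cooldown.
            self.opened_at = self.clock()
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

The FHIR client would call allow_request() before each call and record the outcome afterwards; the test below walks through the planned circuit breaker test with a fake clock.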
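The planned startup timing test amounts to polling a readiness probe and recording how long it takes to succeed. A minimal sketch, assuming the probe is a callable (for example, an HTTP GET against the planned /api/ready that returns True on 200) and that the 300-second default timeout mirrors the 5-minute escalation threshold used for startup monitoring; the function name and parameters are hypothetical.

```python
import time
from typing import Callable, Optional


def measure_time_to_ready(
    probe: Callable[[], bool],
    interval: float = 1.0,
    timeout: float = 300.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `probe` until it reports ready; return elapsed seconds, or None on timeout."""
    start = clock()
    while True:
        elapsed = clock() - start
        if probe():
            return elapsed
        if elapsed >= timeout:
            return None
        sleep(interval)
```

A deployment test would then assert that the returned value falls inside the expected 60-90 second window, and treat None as a failed startup.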
Training controls (deployment)¶
- Train operations team on startup sequence: explain backend waits for FHIR, typical startup time 60-90 seconds, how to verify health checks passing, troubleshooting steps if startup hangs (check FHIR logs, database connectivity).
- Document expected startup behavior for clinicians: after system restart (e.g., maintenance window), expect 60-90 second delay before system available. Display "System starting up" message during this window. Contact IT support if delay exceeds 5 minutes.
Business process controls (deployment)¶
- Maintenance window communication: Before planned restarts (deployments, updates), notify clinicians 24 hours in advance. Provide estimated downtime (5 minutes) and startup delay (90 seconds). Schedule maintenance during low-usage periods (overnight, weekends).
- Startup monitoring: Operations team monitors startup metrics: time to ready, health check failures, dependency initialization time. Alert if startup time exceeds 5 minutes (indicates problem with FHIR, database, or network). Investigate alerts within 10 minutes.
- Graceful shutdown: Before restarting services, initiate graceful shutdown: display "System maintenance in progress" message to active users, complete in-flight requests before stopping containers, flush cached data to databases. Minimizes disruption and data loss.
- Rollback procedure: If a deployment fails to start (FHIR health check never passes, backend crashes during startup), automatically roll back to the previous version. Document the rollback procedure: docker-compose down, revert to previous image tags, docker-compose up. Target: rollback completed within 5 minutes of startup failure detection.
Residual hazard risk assessment¶
Significantly mitigated (January 2025):
- Frontend health polling eliminates "failed to load patients" errors during startup; clinicians see a clear "Database is initialising" message
- Atomic patient response with fhir_ready flag eliminates race conditions and UI flickering
- Conservative readiness tracking prevents a false "No patients" display on navigation
- StateMessage component provides consistent, distinguishable visual states (initialising vs empty vs loading)
- Actual data access testing (/Patient?_count=1) prevents false positives from metadata-only health checks
Residual risks:
- Backend still starts before FHIR is ready (logs warnings, but the frontend handles this gracefully)
- No Docker orchestration enforcement of startup order (depends_on: service_healthy)
- No exponential backoff retry logic in backend startup
- No circuit breaker for prolonged FHIR outages
- No /api/startup-status progress endpoint (frontend uses /health polling instead)
Impact: Clinicians see clear startup messaging, no "failed to load" errors, system usable immediately after FHIR ready. Startup UX significantly improved from initial hazard state.
Next steps for full mitigation:
- Add Docker Compose depends_on: service_healthy for the backend → FHIR dependency
- Implement /api/ready endpoint for Caddy health checks
- Add exponential backoff retry logic to backend FHIR client initialization
- Implement circuit breaker pattern for resilience during prolonged outages
Hazard status¶
Significantly mitigated (frontend and backend detection controls implemented, orchestration controls pending)
Code associated with hazard¶
- compose.dev.yml - Docker orchestration (pending update for health checks)
- backend/app/main.py:110-139 - check_fhir_health() tests actual patient data
- backend/app/main.py:256-286 - /health endpoint for runtime monitoring
- backend/app/main.py:287-320 - /patients endpoint with atomic fhir_ready flag
- frontend/src/pages/Home.tsx:242-295 - Health polling useEffect
- frontend/src/pages/Home.tsx:170-235 - Patient fetch with conservative readiness tracking
- frontend/src/components/state-message/StateMessage.tsx - UI state messaging component
- frontend/src/components/patients/PatientsList.tsx - Patient list with loading states
- frontend/src/components/patients/PatientsList.stories.tsx - AnimatedLoadingSequence story
- frontend/src/components/state-message/StateMessage.stories.tsx - StateMessage stories