Versioned Patient Repository (VPR)

A patient-owned record with professionally curated truth.

Welcome to the Versioned Patient Repository (VPR) documentation!

The VPR uses files rather than traditional databases to store patient data. On top of this, version control via Git is used to track changes to patient records over time, providing a robust and auditable history of all modifications.

For the detailed reasoning behind the design and build of the VPR, see the VPR overview.

Overview

Introduction

In today’s healthcare landscape, Electronic Patient Records (EPRs) are traditionally stored in centralised databases that serve the needs of organisations – such as hospitals, GP practices, and social care settings. While this approach is familiar and widely adopted, it can make it harder for patients to own their own data, directly access their data, understand it, and flag errors.

There has been progress in this space – for example, patients in the UK can access parts of their record through the NHS App (NHS 2025) or local patient portals. However, these systems are still organisation-owned, fragmented, poorly interoperable, and often limited in scope.

The Versioned Patient Repository (VPR) introduces a shift in this paradigm by placing patients at the centre of their health and care record.

The VPR is a file-based health record architecture where each patient’s data is stored as structured, human-readable documents. Instead of overwriting records, each change creates a new version, managed through Git-like version control. This produces an immutable audit trail while maintaining portability and interoperability.

At the heart of VPR is a combined keystone principle: the patient comes first, and the canonical record is kept as human-readable files. Every design choice should reinforce patient agency while preserving an auditable, legible, file-based record that patients and clinicians alike can inspect and carry with them.

Patients first

When treating a patient, we put the patient at the heart of every decision. Their needs, preferences, and rights guide our actions. The same patient-first principle should extend to health data. Current EPR implementations, however, are built around organisational needs rather than those of the individual. We need to step back and reimagine the health record from the patient’s perspective. In particular, we need to make the patient’s data portable and accessible wherever they go. This is where a file-based record shows its strength.

The VPR is a file-based data storage structure. To ensure data integrity and traceability, every entry in a VPR record is committed and signed off under version control. Git manages the versioning, and cryptographic signatures attest to who made each change.

Using VPR, the patient holds a complete, versioned, and portable record that reflects their health and care journey across settings. Instead of organisations needing to broker complex integrations, the VPR offers a single, patient-held data layer – a consistent source of truth available wherever care is delivered.

Benefits of the VPR design

Placing the patient first unlocks multiple benefits. Wherever a patient can go, their record should follow – and the VPR makes this possible by using standard data structures to support interoperability by default.

From a safety perspective, patients can more easily spot errors or inconsistencies, adding an extra layer of assurance. From a financial and operational standpoint, the lightweight, open-source model of the VPR reduces infrastructure burden and supports cost-effective deployment in both small and large settings.

Technical Details

The Versioned Patient Repository (VPR) is built using a modular, open-source architecture that combines the reliability of file-based storage with the assurance of cryptographic version control. Each patient’s record consists of structured files that are stored and tracked in a Git-based system, ensuring traceability and data integrity. Instead of overwriting data, each change creates a new version that can be reviewed, audited, or rolled back if required.
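The "new version instead of overwrite" behaviour described above can be illustrated with a minimal in-memory sketch. In the VPR this job is done by Git commits; the `History` and `Version` types below are hypothetical stand-ins used only to show the idea, including the point that a rollback is itself a new version rather than a deletion.

```rust
/// One recorded state of a file. Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Version {
    pub author: String,
    pub message: String,
    pub content: String,
}

/// Append-only history for a single file.
/// Versions are never mutated or removed once recorded.
#[derive(Default)]
pub struct History {
    versions: Vec<Version>,
}

impl History {
    /// Records a change and returns its zero-based version number.
    pub fn commit(&mut self, author: &str, message: &str, content: &str) -> usize {
        self.versions.push(Version {
            author: author.to_string(),
            message: message.to_string(),
            content: content.to_string(),
        });
        self.versions.len() - 1
    }

    /// The most recent version, if any.
    pub fn head(&self) -> Option<&Version> {
        self.versions.last()
    }

    /// Any earlier version remains readable for review or audit.
    pub fn at(&self, n: usize) -> Option<&Version> {
        self.versions.get(n)
    }

    /// Rolling back appends a new version whose content equals an older
    /// one, so the mistaken version stays in the audit trail.
    pub fn roll_back_to(&mut self, n: usize, author: &str) -> Option<usize> {
        let content = self.at(n)?.content.clone();
        Some(self.commit(author, &format!("roll back to version {n}"), &content))
    }

    pub fn version_count(&self) -> usize {
        self.versions.len()
    }
}
```

After two commits and one rollback to version 0, the history holds three versions: the head matches version 0, and the intermediate version is still retrievable with `at(1)`.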

The VPR is written in Rust, a systems programming language known for its safety, speed, and memory efficiency. The codebase is organised as a collection of independent Rust crates with clearly defined interfaces. This modular approach allows developers to adapt, extend, or replace components without altering the overall structure. For instance, separate crates can handle data storage, versioning logic, cryptographic signing, and API delivery.

The system supports multiple build configurations, enabling the same core codebase to serve both patient-facing and organisation-facing use cases. Compile-time flags determine which functionality is included in each build. A patient-side build contains only the features needed to view and manage one’s own record, keeping it lightweight and secure. An organisation-side build may include additional modules for managing multiple patients, enforcing access controls, and supporting integration with other clinical systems. This approach ensures that both variants remain aligned to the same specification while optimising performance for their respective roles.

Deployment is flexible. The VPR can be embedded within standalone desktop or mobile applications, distributed as an encrypted patient-held package, or hosted on secure institutional servers. Because data is file-based rather than database-bound, deployment does not rely on heavy infrastructure or proprietary database engines. Each record remains portable and can be reconstructed on any compatible instance of the system.

While files act as the canonical data source, efficient access for clinicians and applications requires high-performance querying. To support this, the VPR introduces database-based projections: pre-computed views of the file data that are optimised for specific operations such as patient summaries, correspondence lists, or message threads. Projections can be refreshed automatically whenever a new commit is made, or generated on demand for less frequently accessed data. This design provides the responsiveness of a traditional database while retaining the transparency and auditability of file storage.
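As a rough sketch of how a projection relates to the canonical files, the example below rebuilds a correspondence list from file contents and refreshes it only when the repository head has moved. The type names, the `letters/` path prefix, and the "first non-empty line is the title" convention are all assumptions made for illustration, and the projection is held in memory rather than in a database.

```rust
use std::collections::BTreeMap;

/// A canonical file as it might appear in a patient repository.
pub struct SourceFile<'a> {
    pub path: &'a str,
    pub content: &'a str,
}

/// Pre-computed view: letter titles keyed by path, plus the commit
/// the view was built from. Hypothetical shape.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct CorrespondenceProjection {
    pub built_from_commit: String,
    pub titles: BTreeMap<String, String>,
}

/// Rebuilds the projection from scratch. Files are the source of truth;
/// the projection holds nothing that cannot be regenerated from them.
pub fn rebuild(commit: &str, files: &[SourceFile<'_>]) -> CorrespondenceProjection {
    let titles = files
        .iter()
        // Assumed layout: letters live under a "letters/" directory.
        .filter(|f| f.path.starts_with("letters/"))
        .map(|f| {
            // Assumed convention: the first non-empty line is the title.
            let first = f
                .content
                .lines()
                .map(str::trim)
                .find(|l| !l.is_empty())
                .unwrap_or("");
            let title = first.trim_start_matches('#').trim().to_string();
            (f.path.to_string(), title)
        })
        .collect();
    CorrespondenceProjection {
        built_from_commit: commit.to_string(),
        titles,
    }
}

/// Keeps the existing projection when it already reflects `head`,
/// and rebuilds it otherwise.
pub fn refresh_if_stale(
    current: CorrespondenceProjection,
    head: &str,
    files: &[SourceFile<'_>],
) -> CorrespondenceProjection {
    if current.built_from_commit == head {
        current
    } else {
        rebuild(head, files)
    }
}
```

Because the projection records the commit it was built from, it can be discarded and regenerated at any time without risk to the record itself.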

Security is embedded at every layer. All files are cryptographically signed and checksummed to prevent tampering. Access is controlled through authenticated APIs, and sensitive data can be encrypted both at rest and in transit. The combination of version control, immutable history, and cryptographic verification ensures that every change is attributable and recoverable, which is essential for clinical safety and regulatory compliance.

In summary, the VPR merges the rigour of modern software engineering with the principles of safe clinical record-keeping. By treating files as the canonical source and databases as transient projections, it achieves both transparency and speed. The result is a system that is secure, flexible, and designed to evolve alongside the healthcare organisations and patients it serves.

Files as canonical, projections for performance, patient as the atomic unit.

Data Structure and Standards

The underlying data format follows openEHR models for clinical content and FHIR standards for demographics and coordination data. Files are stored as Markdown and JSON-compatible structures. These act like non-relational documents – self-contained, structured, and readable – making them easy to process in a wide range of applications.

Patient data is organised into three separate repositories:

Clinical repository: openEHR-based clinical content (observations, diagnoses, clinical letters)
Demographics repository: FHIR-based patient demographics (name, date of birth, identifiers)
Coordination repository (Care Coordination Repository): care coordination data (encounters, appointments, episodes, referrals) – format to be determined, may adopt FHIR conventions

This structure recreates the layered design of openEHR – a clear distinction between data content, clinical models, and terminology – while adding administrative coordination as a separate concern. None of these require a centralised relational database.

Versioning and Audit Trail

Every change to the VPR is committed using Git. Nothing is deleted or lost – a full cryptographic audit trail is preserved. This immutability is fundamental to patient safety, clinical governance, and legal compliance.

The Four Commit Actions

VPR uses a controlled vocabulary for all changes:

Create: Adding new content (new letter, observation, diagnosis, or record initialisation)
Update: Modifying existing content (corrections, amendments, demographic updates)
Superseded: When newer clinical information replaces previous content (revised diagnoses, updated care plans)
Redact: The only action that removes data from view – used when data is entered into the wrong patient’s repository (clinical, demographics, or coordination)
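One way to keep this vocabulary closed is to model it as an enum, so that no commit can carry an action outside the four listed. The sketch below is illustrative only: `CommitAction`, `commit_subject`, and the `<action>: <summary>` subject layout are hypothetical and are not taken from the VPR codebase.

```rust
use std::fmt;
use std::str::FromStr;

/// The four commit actions listed above, as a closed set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommitAction {
    Create,
    Update,
    Superseded,
    Redact,
}

impl CommitAction {
    /// Only redaction removes data from the active view.
    pub fn removes_from_view(self) -> bool {
        matches!(self, CommitAction::Redact)
    }
}

impl fmt::Display for CommitAction {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            CommitAction::Create => "create",
            CommitAction::Update => "update",
            CommitAction::Superseded => "superseded",
            CommitAction::Redact => "redact",
        };
        f.write_str(s)
    }
}

impl FromStr for CommitAction {
    type Err = String;

    /// Parses an action name case-insensitively and rejects anything else.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "create" => Ok(CommitAction::Create),
            "update" => Ok(CommitAction::Update),
            "superseded" => Ok(CommitAction::Superseded),
            "redact" => Ok(CommitAction::Redact),
            other => Err(format!("unknown commit action: {other}")),
        }
    }
}

/// Builds a commit subject such as "create: add clinic letter".
/// The "<action>: <summary>" layout is an assumption for illustration.
pub fn commit_subject(action: CommitAction, summary: &str) -> String {
    format!("{action}: {}", summary.trim())
}
```

Parsing an unknown word such as `delete` fails, which is the point of the design: a tool that reads commit history can rely on every action being one of the four.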

How Redaction Works

When data is mistakenly entered into the wrong patient’s repository:

The data is removed from the active view
It is encrypted and stored securely in the Redaction Retention Repository
A non-human-readable tombstone/pointer remains in the original Git history
The commit records the redaction action with full audit metadata

This process maintains complete traceability without exposing sensitive data, ensuring both patient privacy and audit compliance.
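The redaction steps can be sketched with an in-memory model. Everything here is hypothetical: `Entry`, `Tombstone`, and `RetentionStore` stand in for the active view, the tombstone, and the Redaction Retention Repository, and the caller is assumed to supply bytes that have already been encrypted along with an opaque pointer. Encryption and Git commits are deliberately out of scope.

```rust
use std::collections::BTreeMap;

/// Non-revealing marker left where redacted content used to be.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tombstone {
    /// Opaque reference into the retention store; carries no content.
    pub pointer: String,
    /// Audit metadata recorded with the redaction.
    pub redacted_by: String,
    pub reason: String,
}

/// What the active view holds at a given path.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Entry {
    Content(String),
    Redacted(Tombstone),
}

/// Stand-in for the Redaction Retention Repository: pointer -> ciphertext.
#[derive(Default)]
pub struct RetentionStore {
    sealed: BTreeMap<String, Vec<u8>>,
}

impl RetentionStore {
    pub fn contains(&self, pointer: &str) -> bool {
        self.sealed.contains_key(pointer)
    }
}

/// Moves content out of the active view and leaves a tombstone behind.
/// `ciphertext` is assumed to be encrypted by the caller.
pub fn redact(
    view: &mut BTreeMap<String, Entry>,
    store: &mut RetentionStore,
    path: &str,
    pointer: &str,
    ciphertext: Vec<u8>,
    redacted_by: &str,
    reason: &str,
) -> Result<(), String> {
    // Validate before any side effect.
    match view.get(path) {
        Some(Entry::Content(_)) => {}
        Some(Entry::Redacted(_)) => return Err(format!("{path} is already redacted")),
        None => return Err(format!("{path} not found")),
    }
    // Retain the encrypted bytes first, so content is never dropped
    // from view before it is safely stored.
    store.sealed.insert(pointer.to_string(), ciphertext);
    view.insert(
        path.to_string(),
        Entry::Redacted(Tombstone {
            pointer: pointer.to_string(),
            redacted_by: redacted_by.to_string(),
            reason: reason.to_string(),
        }),
    );
    Ok(())
}
```

The ordering matters in this sketch: the sealed copy is written before the view is changed, so a failure part-way through cannot lose data.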

What VPR Never Does

VPR never deletes data from version control history. Even redacted data is moved to secure storage rather than destroyed. This guarantees:

Patient Safety: All changes are traceable to specific authors at specific times
Legal Compliance: Complete audit trail meets regulatory requirements
Clinical Governance: Full accountability for all modifications
Research Value: Historical data remains available for authorised use

This immutability ensures auditability, safety, and trust – even in the face of human error.

Export and Portability

Patients can download their record as a bundle of files – on a USB stick, as an encrypted archive, or loaded into a standalone reader app. These files remain functional offline and can be interpreted by lightweight applications without needing a local database engine. This simplicity ensures the records remain portable, long-lived, and system-agnostic.

Natural Progression

The VPR is the natural progression of the patient record, starting with the work of Dr Lawrence Weed in the 1960s.

There are residents and staff-men who maintain that the content of their records is their own business. In reality, however, it is the patient’s business and the business of those who, in the future, will have to depend on that record for the patient’s care, or for medical research (Weed 1964).

Lawrence Weed’s Problem-Oriented Medical Record (POMR) reframed medical documentation around the patient’s problems, rather than the clinician’s specialty or the hospital’s structure. His approach established a patient-centred logic for clinical reasoning, in which each problem linked observations, assessments, and plans in a transparent and auditable way.

Building on this foundation, openEHR formalised Weed’s ideas into a computable data model. Its archetypes and templates capture the clinician’s reasoning processes and the structure of clinical encounters, allowing problem-oriented documentation to be represented in interoperable, machine-readable form.

The VPR extends the above principles further. The VPR provides a longitudinal, versioned record that preserves data integrity across institutions and regions, giving both patients and clinicians access to a single evolving source of truth. Where Weed’s POMR unified thought, and openEHR unified meaning, the VPR unifies time and ownership.

References

NHS (2025). ‘Personal health records’. Available at: https://www.nhs.uk/nhs-app/nhs-app-help-and-support/health-records-in-the-nhs-app/personal-health-records/ (Accessed: 5 Nov. 2025).

Weed, L.L. (1964). ‘Medical records, patient care, and medical education’, Ir. J. Med. Sc., 462, pp. 271-82.

Literature Review

The following review outlines the key developments, standards, and technologies that inform the design of the Versioned Patient Repository (VPR), particularly with regard to data storage, patient ownership, and open-source architecture.

Data Storage Models for patient records

Database-centric models

Most traditional EPR systems are built using centralised relational databases. These models are well-established in clinical informatics and can scale effectively within single organisations. However, they often pose significant challenges when records need to move between systems, and are typically tightly coupled to the organisation’s software stack.

File-based and version-controlled models

In contrast to centralised databases, file-based systems allow for portability, transparency, and simplified version control. Several notable efforts have explored this approach and are reviewed below.

Burstein (2020a & 2020b) describes a proof-of-concept system for medical record-keeping based entirely on plain-text files and Git, developed for rural health centres in Rwanda where internet connectivity is unreliable. Instead of using a traditional database, the system stores patient data in human-readable YAML files and uses Git to manage version control, replication, and audit trails. This architecture prioritises offline resilience, transparency, and long-term accessibility, avoiding vendor lock-in and enabling data portability across devices. While not suitable for all settings, the project demonstrates that file-based, version-controlled health records can meet real clinical needs, especially in environments where simplicity, traceability, and decentralisation are key.

Adams (2020a & 2020b) presents a lightweight system called Hugo Clinic Notes, designed for smaller clinics and written in Markdown. The tool organises patient notes by name, date, and appointment time, supports multiple note types (such as assessments and follow-ups with embedded media), and includes a printable view so records can be easily saved or shared. While patient data itself is not version controlled, Git is used to manage the form templates and archetypes, allowing clinical structures to evolve safely over time. Notes are edited manually as Markdown files outside the system, and Hugo is then used to regenerate the site as a set of static HTML pages. Emphasising portability, simplicity, and clinician or patient control over data location, the project demonstrates how static site generation and file-based structures can support clinical documentation when traditional EPR systems may be unnecessarily complex. Although primarily used and maintained by its creator, it remains a useful example of how low-dependency, open tooling can be adapted for healthcare use.

Wack et al. (2025) describe the gitOmmix approach for clinical omics data, which integrates version control systems (specifically Git and git-annex) with provenance knowledge graphs (based on PROV-O) to enhance clinical data warehouses (CDWs). The authors argue that traditional CDWs lack robust support for large data files and longitudinal provenance tracking. In response, gitOmmix uses Git to version and track large files (via git-annex) and aligns version history with a provenance graph so that each data analysis, decision, and patient sample can be traced back comprehensively. The system supports querying the relationships between raw files, analyses, and clinical outcomes by combining versioning metadata and provenance semantics. Although the work is tailored particularly to omics (genomics, pathology, radiology) rather than general EPRs, it provides a compelling file-based, version-aware model for health-data systems and thus offers a useful precedent for the VPR’s versioned and patient-centric architecture.

Blockchain

Reen et al. (2019) propose a decentralised e‑health record management system that combines blockchain technology with the InterPlanetary File System (IPFS) to give patients control of their health‑data flows. The architecture stores encrypted patient records on IPFS and uses smart contracts on a blockchain to manage access authorisations, thereby enabling patient‑centric sharing, auditability and privacy. While the system emphasises distributed storage and peer‑to‑peer data exchange rather than a central database, the authors note trade‑offs in terms of scalability and the maturity of supporting infrastructure. The work provides an instructive example of how versioning, audit trails and patient‑owned data constructs can be applied in health‑care settings and hence offers relevant insight for the design of the VPR.

Shi et al. (2020) conduct a systematic literature review of blockchain applications in electronic health‑record (EHR) systems, specifically assessing how such architectures tackle security and privacy challenges. They identify that while blockchain introduces transparency, immutability and decentralised control, its implementation in healthcare faces major hurdles in scalability, interoperability, and compliance with regulatory requirements. The study thereby underscores both the promise and the limitations of distributed‑ledger approaches for patient data management and highlights the viability of hybrid or alternate version‑controlled architectures — making it a relevant reference point when considering the design of the VPR.

Antwi et al. (2021) explore how Hyperledger Fabric, a private blockchain system, could be used to manage electronic health records securely. They set up a series of test cases that mimic real clinical use, including patient and clinician access permissions, data privacy controls and how different types of files such as X-rays are handled. The study found that Hyperledger Fabric worked well for keeping data confidential and traceable, but struggled with large-scale storage and the legal requirement to delete data completely. The authors suggest that while it is not a perfect solution, private blockchains like Fabric could form part of future systems that let patients control access to their records while maintaining a strong audit trail.

Kumari et al. (2024) describe HealthRec-Chain, a system designed to give patients greater control over their health data while keeping it secure and shareable. The approach combines two technologies: blockchain, to record who accesses information, and IPFS, a distributed file system used to store the medical files themselves. Each record is automatically encrypted before being stored, and patients can grant or remove access through simple permissions. The authors test the system’s performance and find that this hybrid model could offer a practical balance between security, transparency, and scalability—avoiding some of the heavy costs of traditional blockchain-only designs.

Patient focused systems

Fasten-OnPrem is an open-source, self-hosted application for personal or family electronic medical records that aims to bring disparate data from many clinics, labs, and insurers into one place under the individual’s control. The record system was built by Kulantuga and Szilagyi (2025) and sponsored by Fasten Health. Fasten-OnPrem supports key standards such as FHIR and OAuth2 so users can link their existing records rather than manually scanning everything. The system is designed for non-clinical settings (families rather than hospitals), but demonstrates how file-based, patient-owned aggregation of health records can work in practice, emphasising portability, transparency and user control rather than heavy institutional infrastructure.

Healthcare Data Standards

This section surveys structural and exchange standards commonly used to represent and move electronic patient records.

Structural models

openEHR

A specification for modelling clinical content using archetypes and templates. It separates clinical knowledge from data persistence and can be serialised as JSON or XML. It includes constructs for composition, context and audit history.

HL7 CDA

A document-centric model for clinical correspondence and reports. CDA defines a structured container with narrative text and coded elements, typically exchanged as XML.

Exchange and APIs

HL7 v2

A widely deployed messaging standard for admissions, transfers, results and orders. It is compact and event-driven, and remains prevalent in secondary care integrations.

HL7 FHIR

A resource-based standard designed for web APIs. It serialises naturally to JSON or XML, supports profiles to constrain use, and provides resources for provenance, consent and audit. FHIR is now the dominant choice for modern interfaces and patient-facing apps.

IHE Profiles

Integration profiles such as XDS and MHD specify how documents and resources are published, discovered and retrieved across organisations, building on CDA and FHIR.

Open Source in Healthcare

The literature describes open source as a credible approach in digital health when paired with clear governance and resourcing. Reported benefits include transparency of code and data models, which supports independent security assessment, clinical safety review and reproducibility. Reuse is a second theme: open components can be adapted to local workflows, shortening time to deliver standard capabilities such as FHIR APIs, document rendering and integration gateways. Several studies note that open interfaces create incentives for interoperability by lowering switching costs across vendors and sites. Cost is presented more cautiously. Licence fees may fall, particularly for infrastructure, yet staffing, integration and long-term support remain material and require realistic budgets.

Sustainability depends on governance. Successful programmes set explicit licensing strategies, contribution guidelines and release cadences, and treat clinical safety artefacts as first-class versioned assets alongside source code. Open projects do not remove the need for security engineering. Threat modelling, coordinated vulnerability disclosure, continuous testing and dependency management are still required, and are often easier to scrutinise when build pipelines are public.

Case studies illustrate these points in practice. OpenEyes shows that a specialty EPR can be developed in the open and operated across several NHS Trusts with formal safety processes. OpenSAFELY demonstrates that transparent code and specifications can coexist with strict controls on patient data, enabling reproducible analytics at scale. OpenPrescribing provides public methods and code for prescribing analyses, supporting peer scrutiny and iterative improvement. Internationally, OpenMRS and GNU Health show long-running community models for longitudinal records and public health, while the openEHR community maintains shared archetypes and templates that allow vendors to converge on common clinical content.

Risks are also highlighted. Fragmentation can occur if forks diverge without stewardship, hidden costs can surface during integration and migration, and security can be misunderstood if openness is taken as a substitute for active assurance. The reported mitigations are straightforward but non-trivial: maintainers who curate contributions, product ownership with clinical sponsorship, published roadmaps, funded support arrangements and independent security testing. Overall, the evidence supports open source as a practical route to transparency, reuse and safer interoperability, provided it is treated as a long-term programme with disciplined governance rather than a short-term cost-saving exercise.

References

Adams, J. (2020a). ‘Hugo Clinic Notes Theme’. Available at: https://jmablog.com/post/hugo-clinic-notes/ (Accessed: 5 Nov. 2025).

Adams, J. (2020b). ‘Hugo Clinic Notes’. GitHub repository. Available at: https://github.com/jmablog/hugo-clinic-notes (Accessed: 5 Nov. 2025).

Antwi, M., Adnane, A., Ahmad, F., Hussain, R., Habib ur Rehman, M. and Kerrache, C.A. (2021). ‘The case of HyperLedger Fabric as a blockchain solution for healthcare applications’, Blockchain: Research and Applications, 2 (1), pp. 1-15, doi: https://doi.org/10.1016/j.bcra.2021.100012.

Burstein, A. (2020a). ‘Improving Health Care with Plain-Text Medical Records and Git’. Available at: https://www.gizra.com/content/plain-text-medical-records/ (Accessed: 5 Nov. 2025).

Burstein, A. (2020b). ‘mdr-git’. GitHub repository. Available at: https://github.com/amitaibu/mdr-git (Accessed: 5 Nov. 2025).

Kulantuga, J. and Szilagyi, A. (2025). ‘fasten-onprem’, GitHub repository. Available at: https://github.com/fastenhealth/fasten-onprem (Accessed: 10 Nov. 2025).

Kumari, D., Parmar, A.S., Goyal, H.S., Mishra, K. and Panda, S. (2024). ‘HealthRec-Chain: Patient-centric blockchain enabled IPFS for privacy preserving scalable health data’, Computer Networks, 241, p. 110223, doi: https://doi.org/10.1016/j.comnet.2024.110223.

Reen, G. S., Mohandas, M. and Venkatesan, S. (2019). ‘Decentralized Patient Centric e-Health Record Management System using Blockchain and IPFS’, IEEE. Available at: https://arxiv.org/pdf/2009.14285 (Accessed: 6 Nov. 2025).

Shi, S., He, D., Li, L., Khan, N., Khan, M. K. and Choo, K-K. R. (2020). ‘Applications of blockchain in ensuring the security and privacy of electronic health record systems: A survey’, Computers & Security, 97, pp. 1-20. doi: https://doi.org/10.1016/j.cose.2020.101966.

Wack, M., Coulet, A., Burgun, A. and Bastien, R. (2025). ‘Enhancing clinical data warehousing with provenance data to support longitudinal analyses and large file management: The gitOmmix approach for genomic and image data’, Journal of Biomedical Informatics, 193, p. 104788, doi: https://doi.org/10.1016/j.jbi.2025.104788 (Accessed: 5 Nov. 2025).

Development Tools

This project includes comprehensive Rust formatting and linting tools to maintain code quality.

Quick Commands

# Format code
./scripts/fmt.sh
# or
cargo fmt --all

# Run linter
./scripts/lint.sh
# or
cargo clippy --all-targets --all-features -- -D warnings

# Run all quality checks
./scripts/check-all.sh

Pre-commit Hooks

The project uses pre-commit hooks to automatically check code quality before commits:

# Install pre-commit hooks
pre-commit install

# Run hooks manually on all files
pre-commit run --all-files

# Run specific hook
pre-commit run cargo-clippy

# Auto-fix formatting (manual stage)
pre-commit run cargo-fmt-fix --hook-stage manual

Configuration Files

rustfmt.toml - Code formatting configuration
clippy.toml - Linting rules and thresholds
.vscode/settings.json - VS Code settings for Rust development

Available Hooks

cspell - Spell checking for code and comments
cargo-fmt-check - Formatting validation (runs on commit)
cargo-clippy - Linting and code analysis (runs on commit)
cargo-check - Compilation check (runs on commit)
cargo-fmt-fix - Auto-format code (manual stage only)
cargo-test - Run tests (manual stage only)

VS Code Integration

The project includes VS Code settings that:

Enable format-on-save with rustfmt
Run clippy on save for real-time linting
Configure proper Rust file associations
Set up PATH for cargo/rustc tools

VPR – Versioned Patient Repository

Note: This document provides a high-level overview. For detailed technical specifications, see LLM Specification.

Purpose

Store patient records in a version-controlled manner, using Git.
Serve those records fast to clinicians, admins, or patients.
Keep everything accurate, secure, and auditable.

Technology Choices

Rust for everything (fast, safe, compiled to a single binary).
gRPC and REST APIs for system integration (fast, typed communication between systems).
Git as the underlying truth for documents (every version saved, nothing silently overwritten).
File-based storage with sharded directory structure for scalability.
Future: database projections (Postgres) and caching (Redis) for performance optimisation (planned).

Data Model

Records are stored as YAML and Markdown files inside Git repositories, versioned automatically.
Each patient has three separate Git repositories:
- Clinical repository: openEHR-based clinical content (observations, diagnoses, clinical letters)
- Demographics repository: FHIR-based patient demographics (name, date of birth, identifiers)
- Coordination repository (Care Coordination Repository): care coordination data (encounters, appointments, episodes, referrals) – format to be determined, may adopt FHIR conventions
Patient data is sharded: patient_data/{clinical,demographics,coordination}/<s1>/<s2>/<uuid>/, where s1 and s2 are taken from the first four hex characters of the UUID.
Every new change makes a new Git commit, never overwriting the old one.
Commits can be cryptographically signed (ECDSA P-256) for authorship verification.
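The sharded layout can be sketched as a small path-building function. This is an illustration under stated assumptions: `s1` and `s2` are taken to be two hex characters each (the first four characters of the UUID split in half), the UUID directory is assumed to use the lower-case hyphenated form, and the names `Repo` and `shard_path` are hypothetical.

```rust
use std::path::PathBuf;

/// The three per-patient repositories described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Repo {
    Clinical,
    Demographics,
    Coordination,
}

impl Repo {
    fn dir(self) -> &'static str {
        match self {
            Repo::Clinical => "clinical",
            Repo::Demographics => "demographics",
            Repo::Coordination => "coordination",
        }
    }
}

/// Returns patient_data/<repo>/<s1>/<s2>/<uuid> for a patient UUID.
/// Assumes s1 = hex chars 0..2 and s2 = hex chars 2..4 of the UUID.
pub fn shard_path(repo: Repo, uuid: &str) -> Result<PathBuf, String> {
    // Validate before building anything: 32 hex digits once hyphens are removed.
    let hex: String = uuid.chars().filter(|c| *c != '-').collect();
    if hex.len() != 32 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(format!("not a valid UUID: {uuid}"));
    }
    // Lower-case so the same patient always maps to the same directory.
    let hex = hex.to_ascii_lowercase();
    Ok(PathBuf::from("patient_data")
        .join(repo.dir())
        .join(&hex[0..2])
        .join(&hex[2..4])
        .join(uuid.to_ascii_lowercase()))
}
```

Spreading patients across two levels of short directory names keeps any single directory from growing without bound, which is the filesystem-performance motivation given for sharding.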

API

Dual transport: gRPC (tonic) and REST (axum/utoipa).
Create patient – initialise new patient with demographics and clinical template.
List patients – retrieve patient list from sharded directory structure.
Health endpoints – confirm service availability.
API authentication via API keys (gRPC and REST when enabled).
OpenAPI/Swagger documentation for REST endpoints.
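API key checking is independent of the transport, so it can be sketched without reference to tonic or axum. The function names below are hypothetical, and the hand-written constant-time comparison is for illustration only; a production service would more likely rely on a vetted constant-time comparison library. Note that this sketch still reveals whether the key lengths differ.

```rust
/// Compares two byte strings without stopping at the first mismatch,
/// so timing does not reveal how many leading bytes matched.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair; zero means all bytes matched.
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Returns true when `presented` matches one of the accepted keys.
/// A missing or empty key is always rejected.
pub fn is_authorised(presented: Option<&str>, accepted_keys: &[&str]) -> bool {
    let key = match presented {
        Some(k) if !k.is_empty() => k.as_bytes(),
        _ => return false,
    };
    // Check every accepted key rather than returning on the first match.
    accepted_keys
        .iter()
        .fold(false, |ok, k| ok | constant_time_eq(key, k.as_bytes()))
}
```

Keeping the check in a plain function like this would let the gRPC and REST layers share one implementation, in line with the shared authentication utilities the crate layout describes.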

Security

All communication uses encryption (TLS).
API key authentication for gRPC; REST authentication configurable.
Optional mTLS support planned.
Data on disk can be encrypted if required.
Commit signing with X.509 certificates for authorship verification.
PHI redaction in logs and metrics.
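One common way to keep PHI out of logs and metrics is to wrap sensitive values in a type whose formatting output is always a placeholder. The sketch below assumes a hypothetical `Phi` wrapper and `Patient` struct; it shows the technique, not how VPR implements its redaction.

```rust
use std::fmt;

/// Wrapper for values that must never appear in logs or metrics.
pub struct Phi<T>(T);

impl<T> Phi<T> {
    pub fn new(value: T) -> Self {
        Phi(value)
    }

    /// Explicit access to the underlying value, easy to find in code review.
    pub fn expose(&self) -> &T {
        &self.0
    }
}

// Both formatting traits print a placeholder instead of the value,
// so `{}` and `{:?}` are safe to use in log statements.
impl<T> fmt::Debug for Phi<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

impl<T> fmt::Display for Phi<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[REDACTED]")
    }
}

/// Example record: the identifier may be logged, the name may not.
#[derive(Debug)]
pub struct Patient {
    pub id: String,
    pub name: Phi<String>,
}
```

Formatting a `Patient` with `{:?}` prints the `id` but shows `[REDACTED]` for the name, so an accidental debug log cannot leak it; code that genuinely needs the value has to call `expose()`.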

Corrections & Deletions

Normal use is append-only (you don’t delete history).
If wrong patient data is added:
- Prefer redaction (mark as wrong but leave audit trail).
- If legally required, remove with a special process (cryptographic erase or repo rewrite).

Performance Approach

Sharded directory structure to maintain predictable filesystem performance.
Clinical template seeded from validated template directory at patient creation.
Future: database projections and caching layer for API reads (planned).
Git operations per-patient ensure isolation and manageable repository sizes.

Reliability

Every change tracked in Git with complete audit trail.
Provenance (who did what and when) captured in Git commit metadata.
Commit signatures provide cryptographic proof of authorship where configured.
Defensive programming: validate inputs before side effects, fail fast on invalid config.

Operations

Runs as dual-service binary (vpr-run) or standalone gRPC/REST services.
Configured by environment variables (patient data dir, clinical template dir, RM system version, namespace, API keys, bind addresses).
CLI tool (vpr-cli) for administrative tasks.
Docker development environment with live reload.
Quality checks: ./scripts/check-all.sh (fmt, clippy, check, test).
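Environment-driven configuration combined with the fail-fast rule from the Reliability section might look like the sketch below. The variable names (`VPR_PATIENT_DATA_DIR`, `VPR_GRPC_BIND`, `VPR_API_KEYS`) and the `Config` shape are assumptions for illustration; the real names used by vpr-run are not given here. Settings are read through a lookup function so the logic can be exercised without touching the process environment.

```rust
use std::net::SocketAddr;
use std::path::PathBuf;

/// A subset of the settings listed above. Hypothetical shape.
#[derive(Debug, PartialEq, Eq)]
pub struct Config {
    pub patient_data_dir: PathBuf,
    pub grpc_bind: SocketAddr,
    pub api_keys: Vec<String>,
}

/// Loads and validates configuration before any side effect happens.
/// In a real binary, pass `|k| std::env::var(k).ok()` as `lookup`.
pub fn load_config(lookup: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    // A setting that is absent or blank is treated as missing.
    let required = |key: &str| -> Result<String, String> {
        lookup(key)
            .filter(|v| !v.trim().is_empty())
            .ok_or_else(|| format!("missing required setting {key}"))
    };

    let patient_data_dir = PathBuf::from(required("VPR_PATIENT_DATA_DIR")?);

    let grpc_bind = required("VPR_GRPC_BIND")?
        .parse::<SocketAddr>()
        .map_err(|e| format!("invalid VPR_GRPC_BIND: {e}"))?;

    // Comma-separated list; surrounding whitespace and empty items are dropped.
    let api_keys: Vec<String> = required("VPR_API_KEYS")?
        .split(',')
        .map(|k| k.trim().to_string())
        .filter(|k| !k.is_empty())
        .collect();
    if api_keys.is_empty() {
        return Err("VPR_API_KEYS contains no keys".to_string());
    }

    Ok(Config {
        patient_data_dir,
        grpc_bind,
        api_keys,
    })
}
```

Returning an error for the first invalid setting means the service can refuse to start rather than run with a partial or misleading configuration.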

Cargo features

Cargo feature flags select which functionality is compiled into a build.

Features needed for a patient to view and edit their own records:

cargo build --features patient

Features needed for clinicians and admins to manage records in a multi-patient environment:

cargo build --features org
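Inside the code, the `patient` and `org` feature names from the commands above would typically be tested with `cfg!` or `#[cfg(...)]`. The sketch below is a minimal illustration; the function name and capability strings are hypothetical, and the features themselves would need to be declared in the crate's `Cargo.toml` under `[features]`.

```rust
/// Lists the capabilities compiled into this build.
/// Each entry is present only when its Cargo feature was enabled.
pub fn enabled_capabilities() -> Vec<&'static str> {
    let mut caps = Vec::new();
    // `cfg!` evaluates to a compile-time boolean; a disabled branch is
    // still type-checked but never runs.
    if cfg!(feature = "patient") {
        caps.push("view and edit own record");
    }
    if cfg!(feature = "org") {
        caps.push("manage multiple patients");
    }
    caps
}
```

Where code should be left out of the binary entirely, rather than merely skipped at run time, the attribute form `#[cfg(feature = "org")]` on a function or module is the usual choice.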

Architecture Boundaries

crates/core – Pure data operations: file/folder management, Git versioning, patient data CRUD. No API concerns.
crates/api-shared – Shared utilities: Protobuf types, HealthService, authentication.
crates/api-grpc – gRPC-specific implementation: VprService, interceptors.
crates/api-rest – REST-specific implementation: HTTP endpoints, OpenAPI.
crates/certificates – X.509 certificate generation for authentication and commit signing.
crates/cli – Command-line interface for administrative operations.

Wrong patient

Redact
Stub
- Preserve cryptographic proof of what was removed
- Hash-based Message Authentication Code (HMAC): a keyed cryptographic fingerprint of the original data
Quarantine vault
- Quarantine bytes

In short: tombstone locally, escrow the content in a restricted space, and leave a non-revealing hash pointer for audit.

Healthcare Standards

VPR is built on two foundational healthcare standards: OpenEHR and FHIR. These standards provide complementary capabilities for clinical data management and interoperability.

Overview

OpenEHR

OpenEHR provides a vendor-independent architecture for storing and managing clinical data with built-in versioning, semantic interoperability, and clinical knowledge separation.

VPR uses OpenEHR for:

Clinical record structure and composition model
EHR status tracking and identity linkage
Version-controlled clinical data management
Archetype-based semantic definitions

FHIR

Fast Healthcare Interoperability Resources (FHIR) is a modern standard for exchanging healthcare data via RESTful APIs, with emphasis on ease of implementation and web-friendly formats.

VPR uses FHIR for:

Coordination repository wire formats
Messaging thread semantics (Communication resource)
Future API projections and integrations
Interoperability with external systems

Complementary Roles

OpenEHR and FHIR serve different but complementary purposes in VPR:

Aspect	OpenEHR	FHIR
Primary focus	Long-term clinical data storage	Real-time data exchange
Architecture	Repository-based, versioned	API-first, resource-based
Granularity	Document-level (Compositions)	Element-level (Resources)
Versioning	Built-in, audit-focused	Optional, implementation-specific
Clinical modeling	Archetypes + Templates	Profiles + Implementation Guides
Best for	EHR systems, clinical archives	HIE, mobile apps, integrations

VPR’s Hybrid Approach

VPR combines the strengths of both standards:

OpenEHR for Clinical Records:

Compositions stored in clinical repository
Full version history via Git
Archetype-based semantic structure
Long-term clinical archive

FHIR for Coordination:

Communication semantics for messaging
RESTful API patterns for future integration
Resource-based wire formats
Interoperability with external systems

This hybrid approach provides:

Best-in-class storage: OpenEHR’s robust clinical data model
Best-in-class exchange: FHIR’s practical API standards
Future flexibility: Can project either standard externally
Standards alignment: Both use standard terminologies (SNOMED, LOINC)

Design Principles

Semantic Preservation

VPR maintains the meaning of both standards:

OpenEHR composition structure is preserved
FHIR resource semantics are followed
Mappings between standards are explicit
No information loss in either direction

Implementation Pragmatism

VPR adapts standards for version-controlled storage:

YAML instead of JSON/XML for human readability
Git instead of database for version control
File-based storage for simplicity and auditability
Cryptographic signing for integrity

Progressive Enhancement

VPR can add standard APIs incrementally:

Core storage model is standards-aligned
APIs can be added without changing storage
Multiple projections possible (OpenEHR API, FHIR API, GraphQL)
Storage remains authoritative source

Standards Governance

OpenEHR Foundation

Develops and maintains OpenEHR specifications
Curates archetype repositories (Clinical Knowledge Manager)
Provides conformance testing
International community of users

VPR Compliance:

Uses OpenEHR Reference Model structures
Declares RM version in all files
Follows composition and versioning semantics
Compatible with OpenEHR tooling (parsers, validators)

HL7 International

Develops and maintains FHIR specifications
Manages terminology and code systems
Provides implementation guides and profiles
Large ecosystem of vendors and implementers

VPR Compliance:

Uses FHIR resource semantics (conceptual alignment)
Wire formats map to FHIR resources
Can project to FHIR REST API
Compatible with FHIR tooling (validators, servers)

External Resources

OpenEHR

FHIR

OpenEHR Standard

Overview

OpenEHR is an open standard specification for electronic health records (EHR) that provides a vendor-independent, future-proof architecture for storing and managing clinical data. Developed by the OpenEHR Foundation, it separates clinical knowledge (archetypes) from technical implementation, enabling healthcare systems to evolve without requiring system rewrites.

Core Concepts

Reference Model (RM):

The Reference Model defines the stable information structures for representing EHR data. It includes:

Compositions: Documents or clinical encounters (e.g., discharge summaries, lab reports)
Entries: Individual clinical statements (observations, evaluations, instructions, actions)
Data structures: Elements, items, clusters for organizing clinical data
Version control: Built-in versioning for all clinical data

Archetypes:

Archetypes are reusable, computable definitions of clinical concepts (e.g., “blood pressure”, “medication order”). They:

Define the structure and constraints for specific clinical concepts
Are vendor-neutral and language-independent
Can be shared across systems and jurisdictions
Are maintained in centralized repositories (Clinical Knowledge Manager)

Templates:

Templates combine multiple archetypes into specific clinical documents (e.g., “Emergency Department Admission”, “Diabetes Review”). They:

Constrain archetypes further for specific use cases
Define which archetypes are mandatory or optional
Specify terminology bindings
Configure the data collection interface

Terminology Integration:

OpenEHR supports binding to standard terminologies:

SNOMED CT (clinical terms)
LOINC (laboratory and clinical observations)
ICD-10/ICD-11 (diagnoses)
Local terminologies as needed

Problems OpenEHR Solves

1. Semantic Interoperability

Problem: Different EHR systems represent the same clinical concept in incompatible ways, making data exchange difficult and error-prone.

Solution: Archetypes provide standardized, computable definitions of clinical concepts that work across systems. A “blood pressure” archetype means the same thing regardless of vendor.

2. Vendor Lock-in

Problem: Healthcare organizations become dependent on proprietary EHR systems, making migration expensive and risky.

Solution: OpenEHR’s vendor-neutral data model allows data to be stored in a portable format. Organizations can switch between conformant systems without converting the underlying data.

3. Clinical Knowledge Evolution

Problem: Medical knowledge evolves faster than software development cycles. Adding new clinical concepts requires expensive system updates.

Solution: Archetypes can be created, modified, and deployed independently of the underlying software. Clinicians and informaticians can define new concepts without programmer intervention.

4. Data Quality and Validation

Problem: EHR systems often allow inconsistent or invalid data entry, compromising clinical safety.

Solution: Archetypes define constraints and validation rules at the clinical knowledge level, ensuring data quality at the point of entry.

5. Longitudinal Health Records

Problem: Patient data is fragmented across multiple systems, time periods, and care settings.

Solution: OpenEHR’s version-controlled composition model maintains complete audit trails and supports lifelong health records across organizational boundaries.

6. Research and Analytics

Problem: Clinical data locked in proprietary formats is difficult to query for research and quality improvement.

Solution: OpenEHR’s structured, semantically-defined data supports sophisticated querying (via AQL - Archetype Query Language) and data extraction.

How OpenEHR is Normally Used in Digital Health

1. National EHR Programs

OpenEHR is used for national-scale EHR deployments:

Norway: National EHR platform (Helse Vest)
Slovenia: National EHR infrastructure
Brazil: Public health information systems
Russia: National digital health initiatives

These implementations provide unified clinical data repositories serving entire populations.

2. Hospital Information Systems

OpenEHR-based clinical data repositories (CDRs) serve as:

Central clinical data stores for hospital groups
Integration hubs connecting departmental systems
Long-term clinical archives replacing legacy systems

3. Clinical Decision Support

OpenEHR’s structured data enables:

Rules-based clinical decision support
Guideline execution engines
Drug interaction checking
Clinical pathways automation

4. Research Data Platforms

OpenEHR supports:

Cohort identification for clinical trials
Observational research databases
Quality improvement analytics
Population health monitoring

5. Citizen Health Records

OpenEHR powers patient portals and personal health records:

Patient-accessible health data
Patient-entered observations (blood pressure, glucose)
Shared decision-making tools
Care plan tracking

6. Specialized Clinical Systems

OpenEHR is used in domain-specific applications:

Intensive care monitoring systems
Oncology treatment records
Maternal and child health tracking
Chronic disease management

How OpenEHR is Used in VPR

1. Clinical Record Structure

VPR uses OpenEHR Reference Model structures for clinical compositions:

EHR Status:

Every patient has an ehr_status.yaml file following OpenEHR’s EHR_STATUS specification:

_type: EHR_STATUS
subject:
  _type: PARTY_SELF
is_queryable: true
is_modifiable: true
uid:
  _type: HIER_OBJECT_ID
  value: "a4f91c6d-3b2e-4c5f-9d7a-1e8b6c0a9f12"

This provides:

Patient identity linkage
Record queryability flags
Modification permissions
External references to demographics
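
To make the file layout concrete, the following sketch renders the fields shown in the ehr_status.yaml example from a small struct. The EhrStatus struct and to_yaml method are hypothetical names for illustration; real code would more likely use a YAML serializer, and this version assumes the uid contains no characters that need escaping.

```rust
/// Hypothetical in-memory view of the fields in the example file.
pub struct EhrStatus {
    pub uid: String,
    pub is_queryable: bool,
    pub is_modifiable: bool,
}

impl EhrStatus {
    /// Renders the same keys, in the same order, as the example file.
    /// Assumes `uid` needs no YAML escaping (for example, a plain UUID).
    pub fn to_yaml(&self) -> String {
        let lines = [
            "_type: EHR_STATUS".to_string(),
            "subject:".to_string(),
            "  _type: PARTY_SELF".to_string(),
            format!("is_queryable: {}", self.is_queryable),
            format!("is_modifiable: {}", self.is_modifiable),
            "uid:".to_string(),
            "  _type: HIER_OBJECT_ID".to_string(),
            format!("  value: \"{}\"", self.uid),
        ];
        lines.join("\n") + "\n"
    }
}
```

Keeping the key order fixed means that two renderings of the same status produce identical bytes, which keeps Git diffs limited to real changes.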

Compositions:

Clinical documents (letters, observations) use OpenEHR COMPOSITION structure:

_type: COMPOSITION declares the document type
name provides human-readable document title
archetype_node_id identifies the template/archetype used
uid provides version-controlled unique identifier
context captures care setting metadata
content contains the clinical data entries

Example composition.yaml for a clinical letter:

_type: COMPOSITION
name:
  _type: DV_TEXT
  value: Clinical Letter
archetype_node_id: openEHR-EHR-COMPOSITION.correspondence.v0
uid:
  _type: HIER_OBJECT_ID
  value: "20260111T143522.045Z-550e8400-e29b-41d4-a716-446655440000"
language:
  _type: CODE_PHRASE
  terminology_id:
    _type: TERMINOLOGY_ID
    value: ISO_639-1
  code_string: en

2. Version-Controlled Repository Model

VPR adopts OpenEHR’s versioning philosophy:

Immutability:

Clinical compositions are immutable once committed
Changes create new versions with full audit trail
Git provides the versioning infrastructure
Every composition has a unique timestamp-prefixed ID
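
A timestamp-prefixed identifier can be split back into its two parts with a few lines of code. The sketch below assumes the layout seen in the composition example (a 20-character UTC timestamp ending in Z, a hyphen, then a UUID); the function name is hypothetical.

```rust
/// Splits a timestamp-prefixed identifier into (timestamp, uuid).
/// Assumed layout: `<YYYYMMDDTHHMMSS.mmmZ>-<UUID>`, as in the example uid.
pub fn split_composition_uid(uid: &str) -> Option<(&str, &str)> {
    let idx = uid.find("Z-")?;
    // Keep the trailing 'Z' with the timestamp.
    let (timestamp, rest) = uid.split_at(idx + 1);
    // Skip the '-' separator that follows the timestamp.
    let uuid = &rest[1..];
    // A UUID is 36 characters: hex digits with hyphens at fixed positions.
    let uuid_ok = uuid.len() == 36
        && uuid.char_indices().all(|(i, c)| match i {
            8 | 13 | 18 | 23 => c == '-',
            _ => c.is_ascii_hexdigit(),
        });
    if timestamp.len() == 20 && uuid_ok {
        Some((timestamp, uuid))
    } else {
        None
    }
}
```

One consequence of a fixed-width UTC prefix is that identifiers sort chronologically when sorted as plain strings.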

Contribution Model:

Each Git commit represents an OpenEHR CONTRIBUTION:

Contains one or more VERSION objects
Records who made the change (commit author)
Records when the change occurred (commit timestamp)
Records why the change was made (commit message)

3. Semantic Interoperability

VPR uses OpenEHR conventions for:

Reference Model Version:

All files declare their RM version for compatibility:

_rm_version: "1.1.0"

This ensures:

Parsers know which specification to apply
Forward/backward compatibility can be managed
Systems can validate against the correct schema
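
A parser could use the declared version to decide whether it is able to read a file. The sketch below parses a major.minor.patch string and applies one possible compatibility rule; both function names and the rule itself are assumptions for illustration, not VPR's documented policy.

```rust
/// Parses a `major.minor.patch` string such as "1.1.0".
pub fn parse_rm_version(s: &str) -> Option<(u32, u32, u32)> {
    let mut parts = s.split('.').map(|p| p.parse::<u32>().ok());
    // Each `??` unwraps "is there a part" and then "did it parse".
    let version = (parts.next()??, parts.next()??, parts.next()??);
    // Reject trailing components such as "1.1.0.4".
    if parts.next().is_some() {
        return None;
    }
    Some(version)
}

/// Hypothetical policy: a reader accepts files with the same major
/// version and a minor version no newer than its own.
pub fn can_read(reader: (u32, u32, u32), file: (u32, u32, u32)) -> bool {
    file.0 == reader.0 && file.1 <= reader.1
}
```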

Type Annotations:

Every complex object declares its _type for unambiguous parsing:

_type: COMPOSITION
_type: DV_TEXT
_type: DV_CODED_TEXT
_type: PARTY_SELF

4. Clinical Data Query Support

VPR’s structured data enables OpenEHR-style querying:

Archetype paths:

Data elements are addressable via standardized paths:

/content[openEHR-EHR-OBSERVATION.blood_pressure.v2]/data/events[at0006]/data/items[at0004]/value

This allows:

Precise data extraction
Cross-system queries
Research cohort identification
Quality improvement analytics
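
To show how such a path breaks down, the sketch below splits it into segments, each with an attribute name and an optional bracketed predicate (an archetype ID or node code). The type and function names are hypothetical, and the parser deliberately ignores richer predicates that themselves contain a slash.

```rust
#[derive(Debug, PartialEq)]
pub struct PathSegment {
    pub attribute: String,
    pub predicate: Option<String>,
}

/// Parses one segment such as `events[at0006]` or `value`.
fn parse_segment(seg: &str) -> Option<PathSegment> {
    match seg.split_once('[') {
        Some((attr, rest)) => {
            let pred = rest.strip_suffix(']')?;
            if attr.is_empty() || pred.is_empty() {
                return None;
            }
            Some(PathSegment {
                attribute: attr.to_string(),
                predicate: Some(pred.to_string()),
            })
        }
        None if !seg.is_empty() => Some(PathSegment {
            attribute: seg.to_string(),
            predicate: None,
        }),
        None => None,
    }
}

/// Parses an absolute path into segments; returns None if any part is malformed.
/// Simplification: predicates containing '/' are not supported.
pub fn parse_path(path: &str) -> Option<Vec<PathSegment>> {
    path.strip_prefix('/')?.split('/').map(parse_segment).collect()
}
```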

5. Template-Based Data Collection

VPR uses OpenEHR templates for:

Clinical Document Templates:

Templates stored in crates/core/templates/clinical/ define:

Which archetypes are included
Mandatory vs. optional elements
Terminology bindings
Default values and constraints

Initialization from Templates:

When creating a new clinical record, VPR:

Validates the template directory exists
Copies template files to the patient’s repository
Initializes ehr_status.yaml with proper structure
Commits the initial state to Git
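
The first three steps above can be sketched with the standard library alone. This is an illustration under stated assumptions rather than VPR's implementation: the function names are hypothetical, ehr_status.yaml is assumed to live at the root of the patient directory, and the Git commit step is left out.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copies the contents of `src` into `dst`.
fn copy_dir(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            copy_dir(&entry.path(), &target)?;
        } else {
            fs::copy(entry.path(), &target)?;
        }
    }
    Ok(())
}

/// Seeds a new patient directory from a template directory.
pub fn init_from_template(
    template_dir: &Path,
    patient_dir: &Path,
    ehr_status_yaml: &str,
) -> io::Result<()> {
    // Step 1: validate inputs before any side effect.
    if !template_dir.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            "template directory does not exist",
        ));
    }
    if patient_dir.exists() {
        return Err(io::Error::new(
            io::ErrorKind::AlreadyExists,
            "patient directory already exists",
        ));
    }
    // Step 2: copy the template files into the patient's repository.
    copy_dir(template_dir, patient_dir)?;
    // Step 3: write ehr_status.yaml (location is an assumption).
    fs::write(patient_dir.join("ehr_status.yaml"), ehr_status_yaml)?;
    // Step 4 (not shown): commit the initial state to Git.
    Ok(())
}
```

Running both checks before the first write means a failed validation leaves no partially created patient directory behind.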

6. Deviations from Standard OpenEHR

VPR adapts OpenEHR for a version-controlled repository model:

Storage Format:

Uses YAML instead of JSON or XML for human readability
One composition per file for Git-friendly diffs
Markdown for narrative content (e.g., letter body)

Server Architecture:

No OpenEHR REST API server
No query engine (yet)
File-based storage instead of database
Git instead of versioning database

Rationale:

This provides:

Human-readable audit trails
Standard version control tooling
Cryptographic signing and verification
Distribution and replication via Git
No runtime database dependencies

7. Future OpenEHR Integration

VPR is designed to support future OpenEHR capabilities:

Archetype Query Language (AQL):

The structured data format will support AQL queries:

SELECT
  c/uid/value,
  c/context/start_time,
  o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value
FROM
  EHR e
  CONTAINS COMPOSITION c
  CONTAINS OBSERVATION o[openEHR-EHR-OBSERVATION.blood_pressure.v2]
WHERE
  o/data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/magnitude > 140

API Projections:

VPR compositions can be projected to:

OpenEHR REST API responses
RM-compliant JSON
Canonical XML format
FHIR resources (via mappings)

Template Server:

Future template management:

Operational Template (OPT) import
Template validation
Web-based template designer integration
Archetype repository synchronization

FHIR Standard

Overview

Fast Healthcare Interoperability Resources (FHIR) is a modern healthcare data exchange standard developed by HL7 International. Released in 2014, FHIR combines the best features of HL7 v2, v3, and CDA while leveraging web technologies (REST, JSON, OAuth) to provide a practical, implementer-friendly approach to health data interoperability.

Core Concepts

Resources:

FHIR defines ~150 modular “resources” representing healthcare concepts:

Clinical: Patient, Observation, Condition, Procedure, MedicationStatement
Administrative: Encounter, Practitioner, Organization, Location
Financial: Claim, Coverage, PaymentNotice
Workflow: Task, Appointment, ServiceRequest
Infrastructure: Bundle, OperationOutcome, CapabilityStatement

Each resource:

Has a defined structure (elements and data types)
Can be represented as JSON, XML, or RDF
Includes human-readable narrative
Supports extensibility via extensions
Has a defined lifecycle and versioning model

RESTful API:

FHIR uses HTTP for all interactions:

GET /Patient/123 - Read a patient
POST /Observation - Create an observation
PUT /Condition/456 - Update a condition
DELETE /MedicationStatement/789 - Remove (or mark inactive)
GET /Patient?name=Smith - Search for patients

Profiles and Implementation Guides:

FHIR can be constrained for specific use cases:

Profiles: Constrain resources for particular jurisdictions or domains
Implementation Guides: Collections of profiles, value sets, and documentation
Examples: US Core, UK Core, International Patient Summary (IPS)

Terminology Integration:

FHIR supports standard terminologies:

CodeableConcept data type for coded values
ValueSets for allowed codes
ConceptMaps for code translation
Built-in support for SNOMED CT, LOINC, RxNorm, ICD-10, etc.

Extensions:

FHIR allows extending resources without breaking compatibility:

Standard extensions (e.g., patient ethnicity, race)
Local extensions for organization-specific needs
Extensions can be profiled and constrained

Problems FHIR Solves

1. API-First Health Data Exchange

Problem: Legacy standards (HL7 v2, CDA) weren’t designed for modern web APIs, making integration complex and expensive.

Solution: FHIR uses RESTful HTTP APIs that web developers understand. OAuth 2.0 for security, JSON for data format, and standard HTTP verbs make integration straightforward.

2. Implementation Complexity

Problem: HL7 v3 and CDA were powerful but extremely complex, leading to inconsistent implementations and high development costs.

Solution: FHIR prioritizes the “80% use case” with simple, practical designs. Complex scenarios are supported but don’t burden simple implementations.

3. Granular Data Access

Problem: Document-based standards (CDA) require exchanging entire documents when only specific data elements are needed.

Solution: FHIR resources are granular (e.g., single Observation for one vital sign). Systems retrieve only what they need, reducing bandwidth and processing overhead.

4. Mobile and Consumer Health

Problem: Legacy standards weren’t designed for patient-facing applications or mobile devices.

Solution: FHIR’s lightweight JSON format, RESTful APIs, and OAuth security work naturally with mobile apps and patient portals. SMART on FHIR enables app ecosystems.

5. Real-Time Clinical Decision Support

Problem: Batch-oriented standards delay clinical decision support until data is processed and stored.

Solution: FHIR’s API model supports real-time CDS Hooks—contextual cards that appear during clinical workflow without disrupting the EHR.

6. Data Heterogeneity

Problem: Healthcare data comes in many forms (structured, narrative, images, documents), and legacy standards handle some better than others.

Solution: FHIR resources accommodate:

Structured coded data (Observation with LOINC codes)
Narrative text (DomainResource.text)
Binary data (DocumentReference, Media)
Mixed content (DiagnosticReport with narrative + structured results)

7. International Adoption

Problem: Different countries have different healthcare models, terminologies, and regulations, making global standards difficult.

Solution: FHIR’s profiling mechanism allows local adaptation while maintaining core compatibility. US Core, UK Core, Australian Base, and others all build on the same foundation.

How FHIR is Normally Used in Digital Health

1. Health Information Exchange (HIE)

FHIR enables data sharing across organizations:

Query-based exchange: Pull patient data from other systems when needed
Subscription-based exchange: Get notified when patient data changes
Bulk data export: Extract large datasets for research or migration
National networks: CommonWell, Carequality (US), Summary Care Record (UK)

2. Patient Access to Health Records

FHIR powers patient-facing applications:

Patient portals: View records, request appointments, message providers
Mobile health apps: Apple Health, Google Fit integration
SMART on FHIR apps: Patient selects apps that access their EHR data
Blue Button 2.0: US Medicare beneficiaries download their claims data

3. Provider Access to External Data

FHIR brings outside data into clinical workflow:

CDS Hooks: Real-time clinical decision support during ordering
SMART on FHIR: Clinician-facing apps launch from within EHR
Payer data exchange: Claims history informs clinical care
Social determinants: Community resource directories, housing, food access

4. Clinical Research and Registries

FHIR supports research data collection:

HL7 FHIR Bulk Data: Extract cohorts for research studies
REDCap on FHIR: Capture study data in FHIR format
Quality registries: Automated reporting to cancer, cardiac registries
Phenotyping: Identify eligible patients for trials

5. Population Health and Value-Based Care

FHIR enables population-level analytics:

Risk stratification: Identify high-risk patients for intervention
Gap closure: Find patients missing preventive care
Care coordination: Track care plan execution across providers
Quality measurement: Automated HEDIS, CQM reporting

6. Public Health Reporting

FHIR modernizes public health surveillance:

Electronic case reporting (eCR): Automated notifiable disease reporting
Immunization forecasting: Calculate due/overdue vaccines
Lab result reporting: ELR via FHIR Observation
COVID-19 reporting: Vaccine administration, case reports, lab results

7. Payer-Provider Data Exchange

FHIR improves administrative efficiency:

Prior authorization: Check coverage and submit auth requests via FHIR
Formulary checking: Real-time medication coverage lookup
Claims attachments: Send supporting documentation with claims
Coverage discovery: Find patient’s insurance coverage

8. Clinical Decision Support

FHIR enables evidence-based care:

CDS Hooks: Cards appear at the right time (e.g., “Consider diabetes screening”)
Order sets: FHIR RequestGroup for protocol-driven ordering
Care plans: FHIR CarePlan for chronic disease management
Drug interaction checking: CDS Hooks order-select and order-sign hooks for real-time prescription review

How FHIR is Used in VPR

1. Wire Format for Coordination Data

VPR uses FHIR-aligned wire formats for coordination repository metadata:

Conceptual Alignment, Not Implementation:

VPR does not implement:

FHIR REST APIs
FHIR JSON or XML formats
FHIR resource validation
FHIR server capabilities

Instead, VPR uses FHIR semantics in YAML wire formats:

COORDINATION_STATUS.yaml:

Tracks coordination repository lifecycle:

coordination_id: "7f4c2e9d-4b0a-4f3a-9a2c-0e9a6b5d1c88"
clinical_id: "a4f91c6d-3b2e-4c5f-9d7a-1e8b6c0a9f12"
status:
  lifecycle_state: active
  record_open: true
  record_queryable: true
  record_modifiable: true

This corresponds conceptually to resource status tracking in FHIR.

Thread ledger.yaml:

Messaging thread metadata uses FHIR Communication resource semantics:

communication_id: 20260111T143522.045Z-550e8400-e29b-41d4-a716-446655440000
status: open # Maps to Communication.status
participants:
  - participant_id: 4f8c2a1d-9e3b-4a7c-8f1e-6b0d2c5a9f12
    role: clinician # Maps to Communication.recipient
    display_name: Dr Jane Smith

Key mappings:

communication_id → Communication.identifier
status → Communication.status (open=in-progress, closed=completed, archived=stopped)
participants → Communication.recipient array
created_at → Communication.sent
visibility.sensitivity → Communication.meta.security
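
The status mapping listed above is small enough to express as a pair of conversions. In the sketch below the ThreadStatus variants follow the ledger values named in this document, while the method names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ThreadStatus {
    Open,
    Closed,
    Archived,
}

impl ThreadStatus {
    /// Parses the `status` value stored in a thread ledger.
    pub fn from_ledger(value: &str) -> Option<Self> {
        match value {
            "open" => Some(Self::Open),
            "closed" => Some(Self::Closed),
            "archived" => Some(Self::Archived),
            _ => None,
        }
    }

    /// Returns the Communication.status code given in the key mappings.
    pub fn to_fhir_communication_status(self) -> &'static str {
        match self {
            Self::Open => "in-progress",
            Self::Closed => "completed",
            Self::Archived => "stopped",
        }
    }
}
```

Because the match is exhaustive, adding a new ledger status later would fail to compile until its FHIR mapping is decided.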

2. FHIR Module in VPR Core

The fhir crate provides wire format handling:

Module: fhir::CoordinationStatus

// Parse COORDINATION_STATUS.yaml
let status_data = fhir::CoordinationStatus::parse(yaml_text)?;

// Render to YAML
let yaml = fhir::CoordinationStatus::render(&status_data)?;

Domain types:

CoordinationStatusData - Top-level structure
StatusInfo - Status details
LifecycleState - Active, Suspended, Closed

Module: fhir::Messaging

// Parse thread ledger.yaml
let ledger_data = fhir::Messaging::ledger_parse(yaml_text)?;

// Render to YAML
let yaml = fhir::Messaging::ledger_render(&ledger_data)?;

Domain types:

LedgerData - Thread metadata
ThreadStatus - Open, Closed, Archived
LedgerParticipant - Participant with role
ParticipantRole - Clinician, Patient, CareTeam, System

3. Semantic Preservation for Future Projections

VPR’s FHIR-aligned design enables future conversions:

FHIR Communication Projection:

VPR messaging threads can be projected to FHIR Communication resources:

{
  "resourceType": "Communication",
  "id": "20260111T143522.045Z-550e8400-e29b-41d4-a716-446655440000",
  "status": "in-progress",
  "sent": "2026-01-11T14:35:22.045Z",
  "recipient": [
    {
      "reference": "Practitioner/4f8c2a1d-9e3b-4a7c-8f1e-6b0d2c5a9f12",
      "display": "Dr Jane Smith"
    }
  ],
  "payload": [
    {
      "contentString": "Patient has reported increasing shortness of breath..."
    }
  ]
}

FHIR Task Projection:

Future coordination tasks could map to FHIR Task resources:

Task.status - requested, accepted, in-progress, completed
Task.intent - order, plan, proposal
Task.code - Type of task
Task.for - Patient reference
Task.owner - Responsible practitioner
Task.requester - Who requested the task

FHIR DocumentReference:

OpenEHR compositions could be exposed as FHIR DocumentReference:

{
  "resourceType": "DocumentReference",
  "status": "current",
  "type": {
    "coding": [
      {
        "system": "http://loinc.org",
        "code": "34133-9",
        "display": "Summary of episode note"
      }
    ]
  },
  "content": [
    {
      "attachment": {
        "contentType": "application/yaml",
        "url": "/clinical/a4/f9/a4f91c6d.../composition.yaml"
      }
    }
  ]
}

4. API Gateway Projection

VPR can expose FHIR APIs via an API gateway:

REST API (future):

GET /fhir/Communication?subject=Patient/123
GET /fhir/Patient/123
POST /fhir/Communication
PUT /fhir/Communication/456

The API gateway would:

Receive FHIR REST requests
Translate to VPR operations
Execute on Git-based repository
Project results to FHIR format
Return FHIR responses
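
The "translate to VPR operations" step could look something like the sketch below, which maps the four example requests onto an internal operation type. The VprOperation enum and translate function are hypothetical, and only the exact request shapes listed above are recognised.

```rust
/// Hypothetical internal operations a gateway might dispatch to.
#[derive(Debug, PartialEq)]
pub enum VprOperation {
    ListThreads { patient_id: String },
    ReadPatient { patient_id: String },
    CreateMessage,
    UpdateThread { thread_id: String },
}

/// Maps an HTTP method and request target onto a VPR operation.
/// Returns None for anything outside the four example routes.
pub fn translate(method: &str, target: &str) -> Option<VprOperation> {
    let (path, query) = match target.split_once('?') {
        Some((p, q)) => (p, Some(q)),
        None => (target, None),
    };
    let segments: Vec<&str> = path.strip_prefix("/fhir/")?.split('/').collect();
    match (method, segments.as_slice(), query) {
        ("GET", ["Communication"], Some(q)) => {
            let id = q.strip_prefix("subject=Patient/")?;
            if id.is_empty() {
                return None;
            }
            Some(VprOperation::ListThreads { patient_id: id.to_string() })
        }
        ("GET", ["Patient", id], None) if !id.is_empty() => {
            Some(VprOperation::ReadPatient { patient_id: id.to_string() })
        }
        ("POST", ["Communication"], None) => Some(VprOperation::CreateMessage),
        ("PUT", ["Communication", id], None) if !id.is_empty() => {
            Some(VprOperation::UpdateThread { thread_id: id.to_string() })
        }
        _ => None,
    }
}
```

Keeping this translation as a pure function, separate from both the HTTP layer and the Git-backed repository, fits the crate boundaries described elsewhere in this documentation.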

GraphQL API (future):

query {
  patient(id: "123") {
    name
    communications {
      sent
      sender
      payload
    }
  }
}

5. Terminology Binding

VPR uses FHIR’s approach to coded data:

Participant Roles:

role: clinician # Maps to FHIR ParticipantRole value set

Future binding to standard terminologies:

SNOMED CT for clinical concepts
LOINC for observations and documents
Local code systems for organization-specific concepts

Visibility/Sensitivity:

sensitivity: confidential # Maps to FHIR security labels

Alignment with:

http://terminology.hl7.org/CodeSystem/v3-Confidentiality
Values: N (normal), R (restricted), V (very restricted)

6. FHIR Bulk Data Export

VPR’s Git-based storage supports bulk data patterns:

Patient-level export:

GET /fhir/Patient/$export?_type=Communication,Observation,Condition

Would generate:

NDJSON files with FHIR resources
Parallel processing of patient repositories
Asynchronous delivery: the client polls a status endpoint until the output files are ready
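
NDJSON places one complete JSON document on each line. The sketch below assembles such output from resources that have already been serialized to single-line JSON; the function name is hypothetical and the JSON serialization itself is assumed to happen elsewhere.

```rust
/// Joins pre-serialized JSON resources into NDJSON (one resource per line).
/// Rejects entries that are empty or that contain a line break, since
/// either would corrupt the line-per-resource structure.
pub fn to_ndjson<I, S>(resources: I) -> Result<String, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut out = String::new();
    for (i, resource) in resources.into_iter().enumerate() {
        let line = resource.as_ref().trim();
        if line.is_empty() || line.contains('\n') || line.contains('\r') {
            return Err(format!("resource {i} is empty or spans multiple lines"));
        }
        out.push_str(line);
        out.push('\n');
    }
    Ok(out)
}
```

Because each line is independent, output for different patient repositories can be produced in parallel and concatenated afterwards.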

Group-level export:

GET /fhir/Group/high-risk-patients/$export

Cohort definition → Git repository query → FHIR resource generation

7. SMART on FHIR Integration

VPR can support SMART app launches:

Standalone Launch:

App redirects to VPR authorization endpoint
User authenticates and authorizes scopes
App receives access token
App queries VPR FHIR API

EHR Launch:

EHR launches SMART app with context (patient, encounter)
App includes the launch parameter in its authorization request, then exchanges the resulting authorization code for an access token
App queries VPR for contextual data

Scopes:

patient/Communication.read - Read patient’s messages
patient/Observation.read - Read patient’s observations
user/Practitioner.read - Read clinician’s profile
launch/patient - Patient context available
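
Scope strings of the shapes listed above can be parsed into a structured form before any access decision is made. The sketch below covers only the patient/, user/, and launch/ forms shown here; the Scope enum and parse_scope function are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum Scope {
    /// For example `patient/Communication.read`.
    Resource { context: String, resource: String, permission: String },
    /// For example `launch/patient`.
    Launch { context: String },
}

/// Parses a single scope string; returns None for unrecognised shapes.
pub fn parse_scope(scope: &str) -> Option<Scope> {
    let (prefix, rest) = scope.split_once('/')?;
    match prefix {
        "patient" | "user" => {
            let (resource, permission) = rest.split_once('.')?;
            let valid = !resource.is_empty()
                && matches!(permission, "read" | "write" | "*");
            if valid {
                Some(Scope::Resource {
                    context: prefix.to_string(),
                    resource: resource.to_string(),
                    permission: permission.to_string(),
                })
            } else {
                None
            }
        }
        "launch" if !rest.is_empty() => Some(Scope::Launch { context: rest.to_string() }),
        _ => None,
    }
}
```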

8. CDS Hooks Integration

VPR could provide clinical decision support:

Hook: patient-view

Triggered when clinician opens patient chart:

{
  "hookInstance": "abc123",
  "hook": "patient-view",
  "context": {
    "patientId": "123",
    "userId": "Practitioner/456"
  }
}

VPR could return cards suggesting:

Unread messages in coordination threads
Overdue care plan activities
Missing documentation

9. FHIR Subscriptions

VPR could support change notifications:

Subscription creation:

{
  "resourceType": "Subscription",
  "status": "requested",
  "criteria": "Communication?subject=Patient/123",
  "channel": {
    "type": "rest-hook",
    "endpoint": "https://example.org/webhook",
    "payload": "application/fhir+json"
  }
}

Git post-receive hooks could trigger subscription notifications.

10. Deviations from Standard FHIR

VPR adapts FHIR concepts for version-controlled storage:

Storage:

Git repositories, not FHIR server databases
YAML wire formats, not JSON/XML
File-based, not API-first

Versioning:

Git commits, not FHIR resource versions
Immutable files, not REST versioning
Complete history always available

Search:

File system traversal, not FHIR search parameters (yet)
Git log queries, not database queries
Future: AQL or FHIR search translation layer

Transactions:

Git atomic commits, not FHIR Bundle transactions
Repository-level consistency, not resource-level

Rationale:

This provides:

Human-readable audit trails
Cryptographic signing and verification
Distributed version control
No runtime database dependencies
Standard tooling (Git, text editors)

Future FHIR Integration

VPR’s FHIR-aligned design supports progressive enhancement:

Near-term (Phase 1)

REST API gateway: Expose FHIR resources via HTTP
Read-only operations: GET for Communication, Patient, Practitioner
Basic search: ?subject, ?date, ?status parameters
SMART on FHIR: OAuth 2.0 authorization for app access

Medium-term (Phase 2)

Write operations: POST, PUT for creating/updating resources
Bulk data export: System-level and patient-level export
FHIR Subscriptions: Webhook notifications for changes
Advanced search: Full FHIR search parameter support

Long-term (Phase 3)

CDS Hooks: Real-time clinical decision support integration
FHIR Questionnaire: Structured data collection forms
GraphQL API: Flexible querying alternative to REST
FHIR Mapping Language: Automated OpenEHR ↔ FHIR translation

Technical

See Design Decisions for more information on architecture and design choices.

Containers

Docker

Language

Rust

APIs

VPR provides two API interfaces for accessing patient records:

gRPC API

High-performance, type-safe API using Protocol Buffers and tonic.

Port: 50051
Protocol: HTTP/2 + Protocol Buffers
Authentication: API key via x-api-key header
See gRPC API Documentation

To start the grpcui viewer:

j g

REST API

HTTP/JSON API with OpenAPI documentation and Swagger UI.

Port: 3000
Protocol: HTTP/JSON
Interactive documentation: http://localhost:3000/swagger-ui/
See REST API Documentation

Linting

Rust Clippy
markdownlint

Spelling

cspell

Pre-commit

pre-commit

Crate Separation

The VPR project uses a modular crate structure to maintain clear separation of concerns and enforce architectural boundaries:

Core Crates

crates/core (vpr-core): Contains pure data operations only. Handles file/folder management, patient repositories (clinical, demographics, coordination), and Git-based versioning. No API concerns. Provides the foundational services: ClinicalService, DemographicsService, CoordinationService, PatientService.
crates/files (vpr-files): Content-addressed file storage for binary attachments. Implements SHA-256-based immutable file storage with two-level sharding. Used by clinical repository for letter attachments.
crates/uuid (vpr-uuid): UUID generation and sharding utilities. Provides ShardableUuid for creating two-level sharded directory structures.
crates/fhir: FHIR-aligned data types and enums. Provides MessageAuthor, AuthorRole, ThreadStatus, SensitivityLevel, LifecycleState for care coordination.
crates/openehr: OpenEHR data structures and validation. Used for clinical content modeling.
crates/certificates (vpr-certificates): X.509 certificate generation and validation for professional registrations. Supports ECDSA P-256 cryptographic signing.

API Crates

crates/api-shared (api-shared): Shared utilities and definitions for both APIs. Includes Protocol Buffer definitions (vpr.proto), message types, and common authentication utilities.
crates/api-grpc (api-grpc): gRPC-specific implementation. Uses VprService with authentication interceptors and tonic integration. All RPCs delegate to services from vpr-core.
crates/api-rest (api-rest): REST-specific implementation. Provides HTTP endpoints with OpenAPI/Swagger UI via axum and utoipa. All handlers delegate to services from vpr-core.

CLI and Deployment

crates/cli (vpr-cli): Command-line interface. Provides comprehensive CLI commands for all patient record operations. Directly uses services from vpr-core.
src/main.rs (vpr-run): Deployment binary that runs both gRPC and REST servers concurrently.

This separation ensures that data logic remains isolated from API specifics, making the codebase maintainable, testable, and allowing multiple deployment configurations from the same core.

Design decisions

This document captures the key architectural and governance decisions behind VPR, and the reasoning for each. The emphasis throughout is on auditability, clinical accountability, privacy, and long-term robustness.

A standing keystone for every decision is the pairing of patient-first intent with human-readable files as the canonical record. VPR should make patient agency primary while keeping the record legible, portable, and auditable as plain files.

File layouts can be seen at openEHR file structure.

Separation of demographics, clinical, and coordination data

VPR stores patient demographics, clinical data, and coordination data in separate repositories.

The demographics repository (equivalent to a Master Patient Index) contains personal identifiers such as name, date of birth, and national identifiers.
The clinical repository contains all medical content, including observations, diagnoses, clinical letters, and results.
The coordination repository (Care Coordination Repository) contains administrative and care coordination information such as encounters, episodes of care, appointments, and referrals.

The demographics repository is linked to the clinical repository via a reference stored in ehr_status.subject.external_ref. The coordination repository references both demographics (for patient identity) and clinical records (for clinical context).

This design follows established openEHR principles and provides several benefits:

Clinical data can be shared, versioned, and audited independently of personally identifiable information.
Coordination data (appointments, referrals, encounters) can be managed separately from clinical content, allowing administrative workflows to evolve independently.
Privacy risks are reduced by minimising the spread of identifiers.
Systems remain modular, allowing demographics, clinical, and coordination services to evolve separately.

In practice:

FHIR is used for demographics.
openEHR is used for structured clinical data.
The coordination data format is still to be determined (it may adopt FHIR concepts for encounters, appointments, and episodes).

Reference:
https://specifications.openehr.org/releases/1.0.1/html/architecture/overview/Output/design_of_ehr.html

Sharded directory structure

VPR uses sharded directory layouts to maintain predictable filesystem performance as the number of patient repositories grows.

Rather than placing all repositories in a single directory, repositories are distributed across subdirectories derived from a UUID prefix or hash. This avoids filesystem bottlenecks, improves lookup performance, and keeps Git operations efficient at scale.

Sharding ensures that the system remains performant and manageable even with very large numbers of patient records.
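
As a minimal sketch of the idea, the function below derives a two-level shard path from a UUID prefix. The function name `sharded_path`, the choice of two hex characters per level, and the 32-character lowercase hex form are assumptions for illustration; they are not taken from the `vpr-uuid` crate.

```rust
use std::path::{Path, PathBuf};

/// Builds `<root>/<aa>/<bb>/<uuid>` from a 32-character lowercase hex UUID.
/// Returns `None` for anything not in that form, so a malformed identifier
/// (for example one containing `..` or `/`) cannot influence the layout.
/// Hypothetical helper: the shard width (2 + 2 hex characters) is an assumption.
pub fn sharded_path(root: &Path, uuid: &str) -> Option<PathBuf> {
    let is_canonical = uuid.len() == 32
        && uuid.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if !is_canonical {
        return None;
    }
    // Byte slicing is safe here because the input was validated as ASCII hex.
    Some(root.join(&uuid[0..2]).join(&uuid[2..4]).join(uuid))
}
```

With two hex characters per level, each directory holds at most 256 children at the first two levels, which keeps directory listings small however many repositories exist.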

Testing strategy

VPR’s core functionality depends on real filesystem behaviour. As a result, tests are designed to interact with actual temporary directories, not mocked filesystems.

Using crates such as tempfile, tests create isolated, automatically cleaned-up directories that closely mirror production behaviour. This allows tests to validate:

directory creation and layout,
Git repository initialisation,
file permissions and naming,
serialisation and cleanup behaviour.

This approach keeps tests realistic while remaining safe, reproducible, and free from side effects on the developer’s machine.
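
The pattern can be sketched with the standard library alone. `ScratchDir` below is a hypothetical stand-in for `tempfile::TempDir` (used here only so the example has no external dependencies), and `creates_patient_layout` is an invented example check, not a VPR function.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical std-only stand-in for `tempfile::TempDir`:
/// a real directory that is removed when the guard is dropped.
pub struct ScratchDir {
    path: PathBuf,
}

static NEXT_ID: AtomicU64 = AtomicU64::new(0);

impl ScratchDir {
    pub fn new(prefix: &str) -> io::Result<Self> {
        // Process id plus a counter keeps names unique within this process.
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let name = format!("{prefix}-{}-{id}", std::process::id());
        let path = std::env::temp_dir().join(name);
        fs::create_dir(&path)?; // errors if the directory already exists
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Best-effort cleanup; a drop handler cannot return an error.
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Example check against the real filesystem: create a nested layout,
/// write a file, and confirm it exists on disk.
pub fn creates_patient_layout(root: &Path) -> io::Result<bool> {
    let repo = root.join("a7").join("01").join("patient");
    fs::create_dir_all(&repo)?;
    fs::write(repo.join("ehr_status.yaml"), "is_queryable: true\n")?;
    Ok(repo.join("ehr_status.yaml").is_file())
}
```

A test creates a `ScratchDir`, runs the code under test against `dir.path()`, and lets the guard fall out of scope to remove everything it created.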

Error handling: bespoke enums over anyhow

VPR uses bespoke error enums (for example PatientError in the core crate) rather than using anyhow::Result throughout.

This is a deliberate choice. In a clinical record system, failures are not just “an error message”: they often need to be handled consistently, audited, and mapped to user-facing outcomes.

Why bespoke enums

Stable failure contract: A named enum defines the set of failure modes VPR considers meaningful (for example invalid input, YAML parse failure, Git initialisation failure). This makes behaviour predictable as the code evolves.
Structured handling at boundaries: API layers (gRPC/REST) can map specific error variants to appropriate status codes and responses without relying on string matching.
Better testability: Tests can assert specific variants rather than brittle message strings, which improves confidence during refactors.
Separates domain intent from library detail: An enum can express domain-relevant failures while still carrying underlying errors where useful.

What we lose by not using anyhow everywhere

Less convenience: anyhow is excellent for rapid development and rich, contextual error chains with minimal boilerplate.
More plumbing: Explicit enums require writing variants and conversion/mapping code.

Where anyhow can still be appropriate

At application entrypoints (for example a CLI binary), anyhow can still be a good fit for turning errors into high-quality diagnostics and an exit code. VPR keeps this style out of the core library surface so that upstream layers can make deterministic decisions based on typed errors.
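
A minimal sketch of the approach, assuming illustrative variant names (the real `PatientError` may differ) and a hypothetical `Status` type standing in for the status codes used at the API boundary:

```rust
use std::fmt;

/// Illustrative error enum; the variant names are assumptions.
#[derive(Debug)]
pub enum PatientError {
    InvalidInput(String),
    NotFound(String),
    YamlParse(String),
    GitInit(std::io::Error),
}

impl fmt::Display for PatientError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PatientError::InvalidInput(msg) => write!(f, "invalid input: {msg}"),
            PatientError::NotFound(what) => write!(f, "not found: {what}"),
            PatientError::YamlParse(msg) => write!(f, "YAML parse failure: {msg}"),
            PatientError::GitInit(e) => write!(f, "git initialisation failed: {e}"),
        }
    }
}

impl std::error::Error for PatientError {
    // The enum expresses domain intent while still carrying the library error.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            PatientError::GitInit(e) => Some(e),
            _ => None,
        }
    }
}

/// Hypothetical boundary status, standing in for gRPC or HTTP codes.
#[derive(Debug, PartialEq, Eq)]
pub enum Status {
    InvalidArgument,
    NotFound,
    Internal,
}

/// Boundary mapping by variant, with no string matching involved.
pub fn to_status(err: &PatientError) -> Status {
    match err {
        PatientError::InvalidInput(_) => Status::InvalidArgument,
        PatientError::NotFound(_) => Status::NotFound,
        PatientError::YamlParse(_) | PatientError::GitInit(_) => Status::Internal,
    }
}
```

Because the `match` in `to_status` is exhaustive, adding a new variant is a compile error until the boundary mapping is updated, which is the "stable failure contract" in practice.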

Defensive programming as a clinical safety requirement

VPR treats defensive programming as a baseline requirement for clinical-safe systems.

Clinical record software must behave predictably under bad inputs, misconfiguration, partial filesystem failures, or unexpected environmental state. In this context, “defensive” means prioritising safe failure and auditability over convenience.

In practice, VPR follows these principles:

Validate before side effects: inputs and configuration are checked up-front wherever feasible, before creating directories, writing files, or initialising repositories.
Bounded work: operations that could otherwise become unbounded (for example retries, directory traversal, or template copying) are explicitly limited to prevent pathological behaviour.
No silent fallbacks: invalid configuration or malformed inputs return a typed error rather than being coerced into a “best guess”.
Explicit error contracts: failures are represented as named enum variants (for example PatientError) to support consistent handling at API boundaries and reliable testing.
Best-effort rollback with surfaced failures: when partial work has been done, VPR attempts to clean up, and treats cleanup failures as meaningful (not something to quietly ignore).

Concrete examples of defensive measures include:

Limiting UUID allocation retries when allocating a new patient directory.
Performing preflight checks (for example template resolution and safety checks) before creating patient directories.
Rejecting unsafe filesystem entries (such as symlinks) and applying size/depth limits when copying templates to avoid accidental “copy the world” behaviours.
Returning a distinct error when initialisation fails and cleanup also fails, so operators can detect and investigate residual on-disk state.

These practices reduce the likelihood of corrupted or ambiguous record state, improve operational visibility when something goes wrong, and keep clinical behaviour deterministic.
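
Two of these principles, validating before side effects and bounding retries, can be sketched as follows. `allocate_dir`, `AllocError`, and the `next_name` generator are hypothetical names for illustration; the real allocation code in VPR may be structured differently.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::{Path, PathBuf};

/// Illustrative typed failures for directory allocation.
#[derive(Debug)]
pub enum AllocError {
    RootMissing,
    Exhausted { attempts: u32 },
    Io(std::io::Error),
}

/// Tries at most `max_attempts` candidate names under `root`.
/// `next_name` stands in for a UUID generator and is expected to
/// return a single path component.
pub fn allocate_dir(
    root: &Path,
    max_attempts: u32,
    mut next_name: impl FnMut() -> String,
) -> Result<PathBuf, AllocError> {
    // Validate before side effects: nothing is created if the root is wrong.
    if !root.is_dir() {
        return Err(AllocError::RootMissing);
    }
    // Bounded work: a misbehaving generator cannot loop forever.
    for _ in 0..max_attempts {
        let candidate = root.join(next_name());
        // `create_dir` fails if the directory exists, so a collision is
        // detected atomically rather than by a separate existence check.
        match fs::create_dir(&candidate) {
            Ok(()) => return Ok(candidate),
            Err(e) if e.kind() == ErrorKind::AlreadyExists => continue,
            Err(e) => return Err(AllocError::Io(e)),
        }
    }
    // No silent fallback: exhaustion is reported as its own variant.
    Err(AllocError::Exhausted { attempts: max_attempts })
}
```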

Signed Git commits in VPR (summary)

VPR uses cryptographically signed Git commits to provide immutable, auditable authorship of clinical records.

For signed commits, VPR embeds a self-contained cryptographic payload directly in the commit object, not as files in the repository. This payload includes:

an ECDSA P-256 signature over the canonical commit content,
the author’s public signing key,
an optional X.509 certificate issued by a trusted authority (for example a professional regulator).

The private key is generated and held by the author and is never shared or stored in the repository.

Because all verification material is attached to the commit itself, signed VPR commits can be verified offline, years later, without reliance on external services. Each commit therefore acts as a sealed attestation linking the clinical change to a named, accountable professional identity.

Why X.509 certificates

VPR mandates the use of X.509 certificates for commit signing.

X.509 is the same widely adopted standard used for:

secure web traffic (Transport Layer Security),
encrypted email,
enterprise public key infrastructure,
regulated identity systems.

Each certificate binds a public key to a verified identity and supports expiry and revocation, making it suitable for regulated healthcare environments.

Other signing mechanisms were deliberately rejected:

SSH keys lack identity assurance, expiry, and revocation.
GPG relies on a decentralised web-of-trust model that does not align with formal clinical governance.

X.509 provides a hierarchical, auditable trust model that fits naturally with healthcare regulation and organisational identity management.

X.509 in the NHS (context)

In the NHS, X.509 certificates are primarily used for identity and authentication, not for signing individual clinical entries.

The trust anchor is the NHS Public Key Infrastructure, operated nationally.

Key uses include:

NHS smartcards, which authenticate clinicians as known individuals.
Role-based access control, where identity is established first and permissions applied separately.
Access to national services such as demographic services and summary care records.
System-to-system communication using mutual Transport Layer Security.
Formal electronic signatures for legal or regulatory workflows.

VPR builds on this familiar model but applies X.509 certificates to authorship of clinical record changes, rather than to login or transport security.

The patient’s voice

VPR supports both professional clinical entries and patient contributions within the same repository, using distinct artefact paths:

/clinical/ contains authoritative, professionally authored and signed records.
/patient/ contains patient-contributed material such as reported outcomes, symptom logs, or uploaded documents.

Patient input may inform clinical care, but it never overwrites clinical records without explicit professional review and a new signed commit.

This preserves patient voice while maintaining clinical accountability.

Single-branch repository policy

Each VPR repository uses a single authoritative branch: refs/heads/main.

While Git itself allows multiple branches, VPR enforces a single-branch policy at the system level. Branches may exist transiently during local operations, but only main is accepted as authoritative.

This ensures a single, linear clinical history and avoids ambiguity about competing versions of truth.

File format conventions in VPR

VPR chooses on-disk file formats according to the nature of the clinical information rather than technical fashion. The guiding principle is to optimise for human readability, auditability, and safe review, while remaining fully interoperable via APIs.

Rule of thumb

Choose the file format based on how the information is used and reviewed.

Narrative clinical content
(for example medical histories, clinic letters, discharge summaries, clinical reasoning)
→ Markdown with YAML front matter
Structured clinical measurements
(for example observations, blood tests, vital signs, scores)
→ YAML
Machine-dense or high-volume data
(for example large panels, waveforms, derived analytics outputs)
→ YAML by default; JSON only if interoperability tooling absolutely requires it
APIs and external integrations (REST/gRPC, internet-facing)
→ JSON by default for REST payloads (wire format and payload shape); gRPC uses Protocol Buffers on the wire. Use YAML/Markdown only for offline/shared on-disk artefacts we control end-to-end, not for internet APIs.

Rationale

Markdown preserves clinical narrative, nuance, and intent, and produces clear, reviewable Git diffs.
YAML is human-readable, diff-friendly, and well suited to structured clinical data that may need manual review or audit.
YAML is the preferred structured format when human review in Git matters; only fall back to JSON when an external consumer requires it.
JSON remains available for interoperability edge cases, but should be avoided when YAML/Markdown will suffice.
Internet-facing APIs use JSON for REST payloads and Protocol Buffers for gRPC. YAML/Markdown stay for on-disk/shared artefacts or tightly controlled internal flows, not for public API payloads.

This approach keeps clinical records legible to clinicians, robust under version control, and straightforward to serialise for external systems. The underlying data model remains the same regardless of file format; only the on-disk representation differs.
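
The rule of thumb can be written down as a small decision function. This is only a sketch: `ContentKind`, `FileFormat`, and `choose_format` are hypothetical names, and the `tooling_requires_json` flag is an assumed way of expressing the "only if interoperability tooling absolutely requires it" exception.

```rust
/// How a piece of information is used and reviewed (illustrative categories).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContentKind {
    Narrative,
    StructuredMeasurement,
    MachineDense,
    ApiPayload,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileFormat {
    MarkdownWithYamlFrontMatter,
    Yaml,
    Json,
}

/// Encodes the rule of thumb: readable formats on disk, JSON at the API edge.
pub fn choose_format(kind: ContentKind, tooling_requires_json: bool) -> FileFormat {
    match kind {
        ContentKind::Narrative => FileFormat::MarkdownWithYamlFrontMatter,
        ContentKind::StructuredMeasurement => FileFormat::Yaml,
        // YAML by default; JSON only when external tooling forces it.
        ContentKind::MachineDense if tooling_requires_json => FileFormat::Json,
        ContentKind::MachineDense => FileFormat::Yaml,
        ContentKind::ApiPayload => FileFormat::Json,
    }
}
```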

Data flow and query model in VPR

VPR is designed around a clear separation between clinical truth, performance, and user experience. This separation is deliberate and underpins the system’s safety, auditability, and scalability.

Canonical source of truth

VPR stores clinical truth in Git-backed files. These files (YAML and Markdown with YAML front matter) are the authoritative record of what was written, by whom, and when.

Git provides:

a complete, immutable history of change
authorship and provenance
the ability to reconstruct record state at any point in time

These files are optimised for correctness, audit, and human review, not for fast querying.

Interpretation into typed components

When files change, VPR:

Reads the updated files
Parses them into typed Rust components (the internal representation of clinical meaning)

These components are the semantic pivot of the system. They represent what the system understands clinically, independent of file format, Git, databases, or APIs.

Projection into databases and caches

Typed components are then projected into databases and caches to support:

indexing
fast search
filtering
aggregation
responsive user interfaces

Databases and caches store derived representations, not the canonical files themselves. They exist to answer questions efficiently, not to define truth.

Serving user-facing queries

All interactive user queries are served from:

databases
search indexes
caches

Git and on-disk files are not queried on the hot path. This keeps the user experience fast and predictable, even as the canonical record remains careful and auditable.

CQRS principles in VPR

This architecture follows the core principles of Command Query Responsibility Segregation (CQRS):

Commands (writes)
Change clinical state by creating or modifying Git-backed files.
This path is slow, deliberate, validated, and fully auditable.
Queries (reads)
Retrieve current, useful views of the data from database projections and caches.
This path is fast, flexible, and optimised for user needs.

The write model (files + Git) and the read model (databases + caches) are intentionally different and evolve independently.
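
The flow from canonical file to typed component to projection can be sketched in a few lines. Everything here is a simplified assumption: `LetterSummary`, `LetterIndex`, and `interpret` are hypothetical names, the parser handles only flat `key: value` lines (a real implementation would use a YAML library), and an in-memory `HashMap` stands in for a database or cache.

```rust
use std::collections::HashMap;

/// Typed component: the interpreted meaning of one canonical letter file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LetterSummary {
    pub letter_id: String,
    pub composer: String,
}

/// Read model: a derived index that can be rebuilt from the files at any time.
#[derive(Default)]
pub struct LetterIndex {
    by_composer: HashMap<String, Vec<String>>,
}

/// Interpretation step: turns simplified `key: value` text into a typed
/// component, or `None` if a required field is missing.
pub fn interpret(canonical_text: &str) -> Option<LetterSummary> {
    let mut letter_id = None;
    let mut composer = None;
    for line in canonical_text.lines() {
        if let Some((key, value)) = line.split_once(':') {
            match key.trim() {
                "uid" => letter_id = Some(value.trim().to_string()),
                "composer" => composer = Some(value.trim().to_string()),
                _ => {} // fields this projection does not need are ignored
            }
        }
    }
    Some(LetterSummary { letter_id: letter_id?, composer: composer? })
}

impl LetterIndex {
    /// Projection step: runs after a write has been committed.
    pub fn project(&mut self, letter: &LetterSummary) {
        self.by_composer
            .entry(letter.composer.clone())
            .or_default()
            .push(letter.letter_id.clone());
    }

    /// Query path: answers from the index without touching files or Git.
    pub fn letters_by(&self, composer: &str) -> &[String] {
        self.by_composer
            .get(composer)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

Because the index holds only derived data, it can be dropped and rebuilt by replaying `interpret` and `project` over the canonical files.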

A useful mental model

A simple way to think about the system is:

Git-backed files describe what happened; databases describe what is currently useful to know.

Both are essential. They answer different questions and are optimised for different purposes.

Summary

Git-backed files are the canonical clinical record
Rust components represent interpreted clinical meaning
Databases and caches provide fast, queryable projections
User-facing queries never depend on Git or raw files
CQRS-style separation keeps the system auditable, performant, and safe

This design allows VPR to combine strong clinical governance with a responsive modern user experience, without compromising either.

APIs

gRPC API

The VPR gRPC API provides high-performance, type-safe access to all patient record operations.

Overview

The gRPC API is built using:

tonic 0.12 - Rust gRPC framework
Protocol Buffers - For message serialisation
Authentication - API key-based authentication via x-api-key header

Service Definition

The API is defined in crates/api-shared/vpr.proto.

Service: `VPR`

All RPC methods are grouped under the vpr.v1.VPR service.

Authentication

All requests require an x-api-key header:

grpcurl -H 'x-api-key: YOUR_API_KEY' localhost:50051 vpr.v1.VPR/Health

The API key is configured via the API_KEY environment variable.
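
The check itself is small. The sketch below is framework-independent and is not the tonic interceptor used by VPR: `check_api_key`, `AuthError`, and the plain `HashMap` of request metadata are assumptions for illustration, and the constant-time comparison is a common hardening choice rather than something the source confirms.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum AuthError {
    MissingKey,
    InvalidKey,
}

/// Compares two byte strings without stopping at the first mismatch,
/// so timing does not reveal how many leading bytes were correct.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Validates the `x-api-key` entry of the request metadata against the
/// expected key (for example the value of the `API_KEY` variable).
pub fn check_api_key(
    metadata: &HashMap<String, String>,
    expected: &str,
) -> Result<(), AuthError> {
    let supplied = metadata.get("x-api-key").ok_or(AuthError::MissingKey)?;
    if constant_time_eq(supplied.as_bytes(), expected.as_bytes()) {
        Ok(())
    } else {
        Err(AuthError::InvalidKey)
    }
}
```

Both error variants would map to the `UNAUTHENTICATED` status at the gRPC boundary.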

Available RPCs

Health Check

Health - Returns service health status

Patient Management

CreatePatient - Creates a new patient record (legacy)
ListPatients - Lists all patients
InitialiseFullRecord - Creates complete patient record (demographics, clinical, coordination)

Demographics

InitialiseDemographics - Initialises new demographics repository
UpdateDemographics - Updates patient demographics (given names, last name, birth date)

Clinical

InitialiseClinical - Initialises new clinical repository
LinkToDemographics - Links clinical repository to demographics via EHR status
NewLetter - Creates new clinical letter with markdown content
ReadLetter - Retrieves letter content and metadata
NewLetterWithAttachments - Creates letter with binary file attachments
GetLetterAttachments - Retrieves letter attachments (metadata and binary content)

Coordination

InitialiseCoordination - Initialises new coordination repository
CreateThread - Creates messaging thread with participants
AddMessage - Adds message to existing thread
ReadCommunication - Reads thread with ledger and all messages
UpdateCommunicationLedger - Updates thread participants, status, visibility
UpdateCoordinationStatus - Updates coordination lifecycle state and flags

Example Usage with grpcurl

Create Full Patient Record

grpcurl -plaintext -import-path crates/api-shared -proto vpr.proto \
  -d '{
    "given_names": ["Emily"],
    "last_name": "Davis",
    "birth_date": "1985-03-20",
    "author_name": "Dr. Robert Brown",
    "author_email": "robert.brown@example.com",
    "author_role": "Clinician",
    "author_registrations": [{"authority": "GMC", "number": "5555555"}],
    "care_location": "City General Hospital"
  }' \
  -H 'x-api-key: YOUR_API_KEY' \
  localhost:50051 vpr.v1.VPR/InitialiseFullRecord

Create Letter

grpcurl -plaintext -import-path crates/api-shared -proto vpr.proto \
  -d '{
    "clinical_uuid": "a701c3a94bf34a939d831d6183a78734",
    "author_name": "Dr. Sarah Johnson",
    "author_email": "sarah.johnson@example.com",
    "author_role": "Clinician",
    "author_registrations": [{"authority": "GMC", "number": "7654321"}],
    "care_location": "GP Clinic",
    "content": "# Consultation\n\nPatient presented with hypertension."
  }' \
  -H 'x-api-key: YOUR_API_KEY' \
  localhost:50051 vpr.v1.VPR/NewLetter

Create Letter with Attachments

Binary attachments are sent as base64-encoded bytes:

# Encode file to base64
base64 -i /path/to/file.pdf

grpcurl -plaintext -import-path crates/api-shared -proto vpr.proto \
  -d '{
    "clinical_uuid": "a701c3a94bf34a939d831d6183a78734",
    "author_name": "Dr. Chen",
    "author_email": "chen@example.com",
    "author_role": "Clinician",
    "care_location": "Hospital Lab",
    "attachment_files": ["<base64_content>"],
    "attachment_names": ["lab_results.pdf"]
  }' \
  -H 'x-api-key: YOUR_API_KEY' \
  localhost:50051 vpr.v1.VPR/NewLetterWithAttachments

Create Communication Thread

grpcurl -plaintext -import-path crates/api-shared -proto vpr.proto \
  -d '{
    "coordination_uuid": "da7e89a2a51647db89430dc3a781abb0",
    "author_name": "Dr. Brown",
    "author_email": "brown@example.com",
    "author_role": "Clinician",
    "care_location": "City Hospital",
    "participants": [
      {"id": "a701c3a94bf34a939d831d6183a78734", "name": "Dr. Brown", "role": "clinician"},
      {"id": "d4c6547ee14a4255a568aa66d7335561", "name": "Emily Davis", "role": "patient"}
    ],
    "initial_message_body": "Consultation scheduled.",
    "initial_message_author": {
      "id": "a701c3a94bf34a939d831d6183a78734",
      "name": "Dr. Brown",
      "role": "clinician"
    }
  }' \
  -H 'x-api-key: YOUR_API_KEY' \
  localhost:50051 vpr.v1.VPR/CreateThread

Message Types

Key message types defined in the protocol:

`Author Registration`

message AuthorRegistration {
  string authority = 1;  // e.g., "GMC", "NMC"
  string number = 2;     // Registration number
}

`Message Author`

message MessageAuthor {
  string id = 1;    // UUID
  string name = 2;  // Display name
  string role = 3;  // clinician, patient, system, etc.
}

Lifecycle States

Coordination lifecycle states:

active - Operational and accepting updates
suspended - Temporarily inactive
closed - Permanently closed

Thread statuses:

open - Active communication
closed - Concluded communication
archived - Historical record

Sensitivity levels:

standard - Normal clinical communication
confidential - Elevated privacy
restricted - Highest privacy level
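
Since these values travel as strings in the protocol and are converted to Rust enums on the server, the conversion can be sketched with `FromStr`. The enum names match those listed for the `fhir` crate, but the variant names, the `UnknownValue` error type, and the strict lowercase matching are assumptions.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadStatus {
    Open,
    Closed,
    Archived,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SensitivityLevel {
    Standard,
    Confidential,
    Restricted,
}

/// Hypothetical error carrying the rejected input for diagnostics.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownValue(pub String);

impl FromStr for ThreadStatus {
    type Err = UnknownValue;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "open" => Ok(Self::Open),
            "closed" => Ok(Self::Closed),
            "archived" => Ok(Self::Archived),
            // No silent fallback: unknown strings are rejected, not guessed.
            other => Err(UnknownValue(other.to_string())),
        }
    }
}

impl FromStr for SensitivityLevel {
    type Err = UnknownValue;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "standard" => Ok(Self::Standard),
            "confidential" => Ok(Self::Confidential),
            "restricted" => Ok(Self::Restricted),
            other => Err(UnknownValue(other.to_string())),
        }
    }
}
```

A handler can then write `request.status.parse::<ThreadStatus>()` and map the error to `INVALID_ARGUMENT`.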

Server Configuration

The gRPC server runs on port 50051 by default. It is configured via environment variables:

VPR_ADDR - Server bind address (default: 0.0.0.0:50051)
API_KEY - Required API key for authentication
VPR_ENABLE_REFLECTION - Enable gRPC reflection (default: false)
RUST_LOG - Logging configuration

Implementation

The gRPC service is implemented in crates/api-grpc/src/service.rs.

Key characteristics:

Authentication interceptor - Validates API key on all requests
Author construction - Builds Author objects from proto fields
Error handling - Maps Rust errors to gRPC status codes
File handling - Writes attachments to temp directory, uses FilesService, cleans up
Type conversions - Converts string enums to Rust enums (AuthorRole, ThreadStatus, etc.)

Error Handling

gRPC status codes used:

OK - Success
UNAUTHENTICATED - Invalid or missing API key
INVALID_ARGUMENT - Invalid input parameters
NOT_FOUND - Resource not found
INTERNAL - Server error

Error messages include descriptive details for debugging.

REST API

The VPR REST API provides HTTP/JSON access to patient record operations with OpenAPI documentation.

Overview

The REST API is built using:

axum 0.7 - Rust web framework
utoipa 4.x - OpenAPI specification and Swagger UI generation
JSON - Request and response format

Base URL

http://localhost:3000

Interactive Documentation

Swagger UI is available at:

http://localhost:3000/swagger-ui/

This provides interactive API documentation where you can test endpoints directly.

Authentication

Currently, the REST API does not require authentication (unlike the gRPC API). This is subject to change in future versions.

Available Endpoints

Health Check

GET /health - Returns service health status

Patient Management

POST /patients/full - Creates complete patient record (demographics, clinical, coordination)

Demographics

POST /demographics - Initialises new demographics repository
PUT /demographics/:id - Updates patient demographics

Clinical

POST /clinical - Initialises new clinical repository
POST /clinical/:id/link - Links clinical repository to demographics
POST /clinical/:id/letters - Creates new letter
GET /clinical/:id/letters/:letter_id - Retrieves letter content

Coordination

POST /coordination - Initialises new coordination repository

Example Usage with curl

Create Full Patient Record

curl -X POST http://localhost:3000/patients/full \
  -H 'Content-Type: application/json' \
  -d '{
    "given_names": ["Emily"],
    "last_name": "Davis",
    "birth_date": "1985-03-20",
    "author": {
      "name": "Dr. Robert Brown",
      "email": "robert.brown@example.com",
      "role": "Clinician",
      "registrations": [{"authority": "GMC", "number": "5555555"}],
      "care_location": "City General Hospital"
    }
  }'

Response:

{
  "demographics_uuid": "d4c6547ee14a4255a568aa66d7335561",
  "clinical_uuid": "a701c3a94bf34a939d831d6183a78734",
  "coordination_uuid": "da7e89a2a51647db89430dc3a781abb0"
}

Initialise Demographics

curl -X POST http://localhost:3000/demographics \
  -H 'Content-Type: application/json' \
  -d '{
    "author": {
      "name": "Dr. Jane Smith",
      "email": "jane.smith@example.com",
      "role": "Clinician",
      "registrations": [{"authority": "GMC", "number": "1234567"}],
      "care_location": "St. Mary'\''s Hospital"
    }
  }'

Update Demographics

curl -X PUT http://localhost:3000/demographics/d4c6547ee14a4255a568aa66d7335561 \
  -H 'Content-Type: application/json' \
  -d '{
    "given_names": ["Emily", "Rose"],
    "last_name": "Davis",
    "birth_date": "1985-03-20"
  }'

Initialise Clinical Repository

curl -X POST http://localhost:3000/clinical \
  -H 'Content-Type: application/json' \
  -d '{
    "author": {
      "name": "Dr. Robert Brown",
      "email": "robert.brown@example.com",
      "role": "Clinician",
      "care_location": "City Hospital"
    }
  }'

Link Clinical to Demographics

curl -X POST http://localhost:3000/clinical/a701c3a94bf34a939d831d6183a78734/link \
  -H 'Content-Type: application/json' \
  -d '{
    "demographics_uuid": "d4c6547ee14a4255a568aa66d7335561",
    "author": {
      "name": "Dr. Brown",
      "email": "brown@example.com",
      "role": "Clinician",
      "care_location": "City Hospital"
    },
    "namespace": "example.org"
  }'

Create Letter

curl -X POST http://localhost:3000/clinical/a701c3a94bf34a939d831d6183a78734/letters \
  -H 'Content-Type: application/json' \
  -d '{
    "content": "# Consultation Note\n\nPatient presented with hypertension.",
    "author": {
      "name": "Dr. Sarah Johnson",
      "email": "sarah.johnson@example.com",
      "role": "Clinician",
      "registrations": [{"authority": "GMC", "number": "7654321"}],
      "care_location": "GP Clinic"
    }
  }'

Response:

{
  "timestamp_id": "20260125T125621.563Z-8d263432-d614-4d51-8611-22d365b6afa7"
}

Read Letter

curl http://localhost:3000/clinical/a701c3a94bf34a939d831d6183a78734/letters/20260125T125621.563Z-8d263432-d614-4d51-8611-22d365b6afa7

Response:

{
  "body_content": "# Consultation Note\n\nPatient presented with hypertension.",
  "rm_version": "1.0.4",
  "composer_name": "Dr. Sarah Johnson",
  "composer_role": "Clinician",
  "start_time": "2026-01-25T12:56:21.563Z",
  "clinical_lists": [...]
}

Initialise Coordination

curl -X POST http://localhost:3000/coordination \
  -H 'Content-Type: application/json' \
  -d '{
    "clinical_uuid": "a701c3a94bf34a939d831d6183a78734",
    "author": {
      "name": "Dr. Brown",
      "email": "brown@example.com",
      "role": "Clinician",
      "care_location": "City Hospital"
    }
  }'

Request/Response Formats

Author Object

All mutation endpoints accept an author object:

{
  "author": {
    "name": "Dr. John Smith",
    "email": "john.smith@example.com",
    "role": "Clinician",
    "registrations": [
      {
        "authority": "GMC",
        "number": "1234567"
      }
    ],
    "care_location": "City General Hospital",
    "signature": "optional-pem-encoded-signature"
  }
}

Error Responses

Errors return appropriate HTTP status codes with JSON error details:

{
  "error": "Error message",
  "details": "Additional context"
}

Common status codes:

200 OK - Success
201 Created - Resource created
400 Bad Request - Invalid input
404 Not Found - Resource not found
500 Internal Server Error - Server error

OpenAPI Specification

The OpenAPI specification is automatically generated from code annotations and available at:

http://localhost:3000/api-doc/openapi.json

Server Configuration

The REST server runs on port 3000 by default. It is configured via environment variables:

VPR_REST_ADDR - Server bind address (default: 0.0.0.0:3000)
RUST_LOG - Logging configuration

Implementation

The REST API is implemented in crates/api-rest/src/main.rs.

Key characteristics:

Path parameter extraction - Uses axum Path extractor for UUIDs
JSON payloads - Uses axum Json extractor for request bodies
Author construction - Helper function builds Author from JSON
Error handling - Maps errors to HTTP status codes
OpenAPI annotations - Each handler annotated with #[utoipa::path]

Comparison with gRPC API

Feature	REST API	gRPC API
Protocol	HTTP/JSON	HTTP/2 + Protocol Buffers
Performance	Good	Excellent
Authentication	None (currently)	API key required
Type Safety	Runtime validation	Compile-time
Documentation	OpenAPI/Swagger	Protocol Buffer IDL
Binary Data	Base64 encoding	Native bytes
Streaming	Not supported	Supported

Future Enhancements

Planned additions:

Authentication and authorisation
Additional endpoints for messaging operations
File upload support for letter attachments
Pagination for list operations
Filtering and search capabilities

OpenEHR from database to file structure

Based on openEHR specifications, VPR organises clinical data into a structured file system that mirrors the openEHR Reference Model (RM) where practical.

EHR Status file


Below is an example of an EHR Status file in YAML format. The optional other_details section is omitted from this example:

rm_version: rm_1_1_0

ehr_id:
  value: 1166765a-406a-4552-ac9b-8e141931a3dc

archetype_node_id: openEHR-EHR-STATUS.ehr_status.v1

name:
  value: EHR Status

subject:
  external_ref:
    id:
      value: 2db695ed-7cc0-4fc9-9b08-e0c738069b71
    namespace: vpr://mpi
    type: PERSON

is_queryable: true
is_modifiable: true

Note: The other_details field is optional and only included when additional metadata is needed for a specific use case.

Communications

VPR Letters – Design and Rationale

Purpose

The VPR letters system provides a clinical, auditable, interoperable record of formal written correspondence related to patient care.

It is designed to:

support cross-site and cross-system communication,
remain human-readable without specialist software,
withstand audit, legal, and regulatory review.

This document intentionally avoids imposing stylistic rules on how letters are written. Clinical correspondence varies widely by specialty, country, organisation, and individual clinician. VPR preserves this freedom while enabling safe reuse of selected clinical context.

Letters are version-controlled via git

Letters can be edited after creation, with all changes tracked through git version control. OpenEHR does not specify that letters must be closed to further edits.

This means:

Every edit creates a new git commit,
The full history of changes is preserved and auditable,
Previous versions can be retrieved at any time,
No data is ever lost or overwritten.

This provides both flexibility and a complete audit trail for clinical governance and patient safety.

File layout

Each letter is stored as a self-contained folder:

correspondence/
    letter/
        <letter-id>/
            composition.yaml
            body.md
            attachments/
                letter.pdf

This structure ensures that all artefacts related to a single letter are co-located, versioned, and auditable.
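
As a sketch, the layout above maps onto a small path helper. `LetterPaths` and `letter_paths` are hypothetical names, and the helper assumes `letter_id` has already been validated as a single safe path component.

```rust
use std::path::{Path, PathBuf};

/// The on-disk locations that make up one letter folder.
pub struct LetterPaths {
    pub dir: PathBuf,
    pub composition: PathBuf,
    pub body: PathBuf,
    pub attachments: PathBuf,
}

/// Resolves the folder and file paths for a letter inside a clinical repository.
pub fn letter_paths(repo_root: &Path, letter_id: &str) -> LetterPaths {
    let dir = repo_root
        .join("correspondence")
        .join("letter")
        .join(letter_id);
    LetterPaths {
        composition: dir.join("composition.yaml"),
        body: dir.join("body.md"),
        attachments: dir.join("attachments"),
        dir,
    }
}
```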

Letter identity

The <letter-id> is generated in the format:

YYYYMMDDTHHMMSS.sssZ-UUID

YYYYMMDDTHHMMSS.sssZ – timestamp with millisecond precision
UUID – random UUID v4 in RFC 4122 format (lowercase with hyphens)

Example:

20260111T143522.045Z-550e8400-e29b-41d4-a716-446655440000

This ensures letters are:

globally unique,
chronologically sortable within a patient record,
safe for distributed, batch-based, and concurrent systems.

Timestamps provide chronology, not global ordering guarantees.
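
The format is fixed-width, so it can be checked and split without a date library. The helper below is a sketch: `split_letter_id` is a hypothetical name, and it validates only the shape of the identifier (digit and separator positions), not whether the date and time are real calendar values.

```rust
/// Splits a letter id of the form `YYYYMMDDTHHMMSS.sssZ-UUID` into
/// `(timestamp, uuid)`, returning `None` if the shape is wrong.
pub fn split_letter_id(id: &str) -> Option<(&str, &str)> {
    const TS_LEN: usize = 20; // e.g. "20260111T143522.045Z"
    const UUID_LEN: usize = 36; // lowercase hyphenated form

    // The ASCII check makes the byte-indexed split below safe.
    if !id.is_ascii() || id.len() != TS_LEN + 1 + UUID_LEN {
        return None;
    }
    let (ts, rest) = id.split_at(TS_LEN);
    let uuid = rest.strip_prefix('-')?;

    let ts_ok = ts.bytes().enumerate().all(|(i, b)| match i {
        8 => b == b'T',
        15 => b == b'.',
        19 => b == b'Z',
        _ => b.is_ascii_digit(),
    });
    let uuid_ok = uuid.bytes().enumerate().all(|(i, b)| match i {
        8 | 13 | 18 | 23 => b == b'-',
        _ => matches!(b, b'0'..=b'9' | b'a'..=b'f'),
    });

    (ts_ok && uuid_ok).then_some((ts, uuid))
}
```

Because the timestamp is zero-padded and comes first, plain lexicographic sorting of letter ids (or of their folder names) yields chronological order.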

`composition.yaml` – OpenEHR composition

The composition.yaml file contains the OpenEHR-aligned COMPOSITION envelope for the letter.

It captures:

identity
authorship
time context
semantic intent
structured, reusable clinical snapshots (optional)

Example `composition.yaml`

rm_version: "rm_1_1_0"
uid: "20260111T143522.045Z-550e8400-e29b-41d4-a716-446655440000"

archetype_node_id: "openEHR-EHR-COMPOSITION.correspondence.v1"

name:
  value: "Clinical letter"

category:
  value: "event"

composer:
  name: "Dr Jane Smith"
  role: "Clinical Practitioner"

context:
  start_time: "2026-01-12T10:14:00Z"

content:
  - section:
      archetype_node_id: "openEHR-EHR-SECTION.correspondence.v1"
      name:
        value: "Correspondence"

      items:
        # Canonical narrative letter
        - evaluation:
            archetype_node_id: "openEHR-EHR-EVALUATION.clinical_correspondence.v1"
            name:
              value: "Clinical correspondence"

            data:
              narrative:
                type: "external_text"
                path: "./body.md"

        # Optional reusable clinical lists (snapshots)
        - evaluation:
            archetype_node_id: "openEHR-EHR-EVALUATION.snapshot.v1"
            name:
              value: "Diagnoses (snapshot)"

            data:
              kind:
                value: "diagnoses"

              items:
                - text: "Hypertension"
                  code:
                    terminology: "SNOMED-CT"
                    value: "38341003"

                - text: "Hyperlipidaemia"

                - text: "Chronic obstructive pulmonary disease"
                  code:
                    terminology: "SNOMED-CT"
                    value: "13645005"

        - evaluation:
            archetype_node_id: "openEHR-EHR-EVALUATION.snapshot.v1"
            name:
              value: "Medication summary (snapshot)"

            data:
              kind:
                value: "medications"

              items:
                - text: "Amlodipine 10 mg once daily"

                - text: "Atorvastatin 20 mg nocte"

Notes

openEHR-EHR-EVALUATION.snapshot.v1 is a custom archetype, not a core OpenEHR entity.
This is intentional and aligned with OpenEHR practice.
Snapshots are letter-scoped, time-bound clinical summaries, not canonical state. We call these ClinicalLists.

Snapshot EVALUATION – design intent

The snapshot.v1 archetype is intentionally minimal and generic.

Its purpose is to support selective reuse of clinically relevant context without enforcing letter style or duplicating persistent records.

Snapshot properties

Each snapshot EVALUATION:

represents one kind of reusable clinical context,
is explicitly scoped to this letter only,
may be copied forward by user choice,
makes no claim of completeness or authority.

Minimal conceptual model

A snapshot contains:

kind – a semantic label identifying what this snapshot represents
(for example: diagnoses, medications, social_history, functional_status)
items – zero or more entries
optional narrative text (when structure is insufficient)

The set of possible kind values is open-ended. VPR does not enforce an enum.

Unknown kinds are valid and must degrade gracefully.
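
One way to honour an open-ended kind in code is an enum with a catch-all variant. The sketch below is an assumption about how a reader might model this, not VPR's own type; SnapshotKind, from_label, and label are illustrative names.

```rust
/// Snapshot kinds this reader knows how to present, plus a catch-all.
/// Illustrative type; VPR itself does not enforce an enum.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SnapshotKind {
    Diagnoses,
    Medications,
    SocialHistory,
    FunctionalStatus,
    /// Any label this reader does not recognise; the raw value is kept.
    Other(String),
}

impl SnapshotKind {
    /// Never fails: unrecognised labels become `Other` instead of an error.
    pub fn from_label(label: &str) -> Self {
        match label {
            "diagnoses" => Self::Diagnoses,
            "medications" => Self::Medications,
            "social_history" => Self::SocialHistory,
            "functional_status" => Self::FunctionalStatus,
            other => Self::Other(other.to_string()),
        }
    }

    /// Returns the label exactly as it would be written back to YAML,
    /// so unknown kinds survive a read/write round trip unchanged.
    pub fn label(&self) -> &str {
        match self {
            Self::Diagnoses => "diagnoses",
            Self::Medications => "medications",
            Self::SocialHistory => "social_history",
            Self::FunctionalStatus => "functional_status",
            Self::Other(raw) => raw.as_str(),
        }
    }
}
```

A consumer can then render known kinds with a tailored view and fall back to a plain list for Other, which is one reading of "degrade gracefully".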

Coded and uncoded items

Snapshot items may be:

coded,
uncoded, or
mixed within the same snapshot.

Coding is optional and must never be required.

Example

items:
  - text: "Hypertension"
    code:
      terminology: "SNOMED-CT"
      value: "38341003"

  - text: "Lives alone, independent"

This supports real-world clinical practice where:

some concepts are well-coded,
others are contextual or narrative,
and forcing codes would lose meaning.
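
In code, optional coding maps naturally onto an optional field. The following is a minimal sketch, assuming the text, terminology, and value fields from the YAML example; the type and method names are illustrative only.

```rust
/// A terminology code attached to a snapshot item.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Code {
    pub terminology: String,
    pub value: String,
}

/// A snapshot item: free text is always present, the code never required.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SnapshotItem {
    pub text: String,
    pub code: Option<Code>,
}

impl SnapshotItem {
    /// Builds an item with narrative text only.
    pub fn uncoded(text: &str) -> Self {
        Self { text: text.to_string(), code: None }
    }

    /// Builds an item with text and an accompanying code.
    pub fn coded(text: &str, terminology: &str, value: &str) -> Self {
        Self {
            text: text.to_string(),
            code: Some(Code {
                terminology: terminology.to_string(),
                value: value.to_string(),
            }),
        }
    }

    /// One-line rendering; the code is appended only when present.
    pub fn display(&self) -> String {
        match &self.code {
            Some(c) => format!("{} [{} {}]", self.text, c.terminology, c.value),
            None => self.text.clone(),
        }
    }
}
```

Coded and uncoded items can sit side by side in the same Vec<SnapshotItem>, mirroring the mixed list shown in the example.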

Relationship to persistent clinical lists

Snapshots are not persistent lists.

They answer a different question:

Persistent list: “What do we currently believe is true?”
Snapshot: “What did the author consider relevant for this letter at that time?”

Snapshots:

may omit persistent items,
may include provisional information,
may differ between letters,
must never automatically update canonical state.

Reconciliation occurs only through explicit clinical action and new COMPOSITIONs.

`body.md` – Canonical clinical letter

Purpose

body.md contains the canonical narrative letter.

It records:

clinical prose only,
written for human readers,
editable after creation, with full Git version history.

It must not contain workflow, delivery, or coordination semantics.

Example `body.md`

Dear Dr Patel,

Thank you for seeing Mrs Jane Jones (DOB 12/04/1968) in the respiratory clinic today.

She reports an improvement in breathlessness since her last review. She confirms that she is currently taking amlodipine 10 mg once daily, rather than the previously documented dose of 5 mg.

We reviewed her medication list together. Atorvastatin was started during her recent admission. The intended dose is 20 mg nocte.

There are no new red flag symptoms. Examination today was unremarkable.

Plan:

- Continue amlodipine 10 mg once daily
- Continue atorvastatin 20 mg nocte
- Routine follow-up in six months

Kind regards,

Dr Jane Smith  
Consultant Respiratory Physician  
Example NHS Trust

Properties

Editable after issue, with full Git version history
Human-readable Markdown
Git-versioned with complete audit trail
Suitable for audit, legal review, and patient access

Large binary artefacts

Large binary artefacts (for example PDFs with embedded images or scans) are stored using Git Large File Storage (Git LFS).

This means:

a small pointer file is stored in the Git repository,
binary content is stored in an external object store,
pointers are versioned, immutable, and content-addressed.

From a clinical and audit perspective, these artefacts are first-class parts of the letter record.
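
For orientation, a Git LFS pointer file is a short text file with a version line, an oid line carrying a SHA-256 digest, and a size line. The sketch below parses that shape with the standard library only. LfsPointer and parse_lfs_pointer are hypothetical names, and the parser is deliberately lenient about extra keys.

```rust
/// The two facts a pointer file records about the stored binary.
#[derive(Debug, PartialEq, Eq)]
pub struct LfsPointer {
    pub oid_sha256: String,
    pub size: u64,
}

/// Parses the text of a Git LFS pointer file.
/// Returns `None` if the version line, oid, or size is missing or malformed.
pub fn parse_lfs_pointer(text: &str) -> Option<LfsPointer> {
    let mut version_ok = false;
    let mut oid = None;
    let mut size = None;

    for line in text.lines() {
        // Every line has the form "<key> <value>".
        let (key, value) = line.split_once(' ')?;
        match key {
            "version" => version_ok = value == "https://git-lfs.github.com/spec/v1",
            "oid" => {
                let hex = value.strip_prefix("sha256:")?;
                let is_hex = hex
                    .bytes()
                    .all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b));
                if hex.len() == 64 && is_hex {
                    oid = Some(hex.to_string());
                }
            }
            "size" => size = value.parse::<u64>().ok(),
            // Additional keys are ignored in this sketch.
            _ => {}
        }
    }

    if !version_ok {
        return None;
    }
    Some(LfsPointer { oid_sha256: oid?, size: size? })
}
```

Because the oid is a digest of the binary content, two pointer versions with the same oid refer to byte-identical artefacts, which is what makes the pointer content-addressed.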

Explicit non-features

The following are deliberately excluded from the letter model:

read or opened status
acknowledgements
urgency markers
task or workflow state

Letters represent clinical documentation, not behaviour or process.

VPR prioritises clarity, honesty, and auditability over convenience.

File layout

Each letter is stored as a self-contained folder:

correspondence/
    letter/
        <letter-id>/
            composition.yaml
            body.md
            attachments/
                letter.pdf

This structure ensures that all artefacts related to a single letter are co-located and auditable.
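
The layout above can be expressed as a small path helper. This is a sketch under the assumption that paths are resolved relative to some record root directory; LetterPaths and letter_paths are illustrative names, and a real implementation would validate the letter ID before using it as a path component.

```rust
use std::path::{Path, PathBuf};

/// Locations of the artefacts that make up one letter.
pub struct LetterPaths {
    pub dir: PathBuf,
    pub composition: PathBuf,
    pub body: PathBuf,
    pub attachments: PathBuf,
}

/// Builds the paths for a letter under `record_root`, following the
/// correspondence/letter/<letter-id>/ layout.
/// Assumes `letter_id` has already been validated by the caller.
pub fn letter_paths(record_root: &Path, letter_id: &str) -> LetterPaths {
    let dir = record_root
        .join("correspondence")
        .join("letter")
        .join(letter_id);
    LetterPaths {
        composition: dir.join("composition.yaml"),
        body: dir.join("body.md"),
        attachments: dir.join("attachments"),
        dir,
    }
}
```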

Letter identity

The <letter-id> is generated in the format YYYYMMDDTHHMMSS.sssZ-UUID:

YYYYMMDDTHHMMSS.sssZ – ISO 8601 timestamp with millisecond precision
UUID – Randomly generated UUID v4 in RFC 4122 format (lowercase with hyphens)
Example:
20260111T143522.045Z-550e8400-e29b-41d4-a716-446655440000

This ensures letters are:

globally unique,
chronologically sortable,
safe for distributed and concurrent systems.

`composition.yaml` – OpenEHR composition

The composition.yaml file contains the OpenEHR composition representing the letter’s metadata and structure, as below:

rm_version: "1.0.4" # updatable via api
uid: "20260111T143522.045Z-550e8400-e29b-41d4-a716-446655440000" # updatable via api
archetype_node_id: "openEHR-EHR-COMPOSITION.correspondence.v1"
name:
  value: "Clinical letter"
category:
  value: "event"
composer:
  name: "Dr Jane Smith" # updatable via api
  role: "Consultant Physician" # updatable via api
context:
  start_time: "2026-01-12T10:14:00Z" # updatable via api
content:
  - section:
      archetype_node_id: "openEHR-EHR-SECTION.correspondence.v1"
      name:
        value: "Correspondence"
      items:
        - evaluation:
            archetype_node_id: "openEHR-EHR-EVALUATION.clinical_correspondence.v1"
            name:
              value: "Clinical correspondence"

            data:
              narrative:
                type: "external_text"
                path: "./body.md"
        - evaluation:
            archetype_node_id: "openEHR-EHR-EVALUATION.problem_summary.v1"
            name:
              value: "Diagnoses at time of correspondence"

            data:
              diagnoses:
                - name: "Hypertension"
                - name: "Hyperlipidaemia"
                - name: "Chronic obstructive pulmonary disease"

NB: the comment `# updatable via api` marks fields that may be modified through the OpenEHR API.

`body.md` – Canonical clinical letter

Purpose

body.md is the canonical clinical representation of the letter. It records:

the full letter content only

An example of body.md might look like:

Dear Dr Patel,

Thank you for seeing Mrs Jane Jones (DOB 12/04/1968) in the respiratory clinic today.

She reports an improvement in breathlessness since her last review. She confirms that she is currently taking amlodipine 10 mg once daily, rather than the previously documented dose of 5 mg.

We reviewed her medication list together. Atorvastatin was started during her recent admission. The intended dose is 20 mg nocte.

There are no new red flag symptoms. Examination today was unremarkable.

Plan:

- Continue amlodipine 10 mg once daily
- Continue atorvastatin 20 mg nocte
- Routine follow-up in six months

Kind regards,

Dr Jane Smith  
Consultant Respiratory Physician  
Example NHS Trust

Properties

Editable after issue, with full Git version history
Human-readable Markdown with front matter metadata
Git-versioned with complete audit trail
Suitable for audit, legal review, and patient access

Required structure (conceptual)

A letter SHOULD clearly contain:

header information (author, organisation, date),
recipient(s),
subject or reason for correspondence,
clinical narrative,
actions or recommendations (if any),
signature block.

The exact formatting is intentionally flexible to accommodate different clinical contexts.

Letter identity (internal)

Every letter MUST include a globally unique letter_id (RFC 4122 UUID with hyphens), recorded within the document.

Letter IDs exist to:

unambiguously reference letters,
allow later letters to reference earlier correspondence,
support indexing and cross-system linkage.

Timestamps provide chronology, not identity.

Corrections and follow-up

Errors or clarifications may be handled either by:

Editing the existing letter (with Git tracking all changes), or
Issuing a new letter that references the prior one via references: <letter_id>.

Both approaches are valid. Git version control preserves an honest and legally defensible historical record of all changes.

Explicit non-features

body.md does NOT record:

read or opened status,
acknowledgement,
urgency markers,
task or workflow state.

Letters represent communication, not behaviour.

`comments.md`

See Comments section for details.

Large binary artefacts

Large binary artefacts (for example PDFs with embedded images or scans) are stored using Git Large File Storage (Git LFS).

In practice this means:

a small pointer file is stored in the Git repository,
the binary content is stored in a separate object store,
the pointer is versioned, immutable, and content-addressed.

From a clinical and audit perspective, these artefacts are first-class parts of the letter record.

Design decisions explicitly rejected

The following were deliberately excluded:

read receipts or confirmations,
urgency flags,
task or workflow semantics.

These features introduce legal ambiguity and false certainty.

VPR letters prioritise clarity, honesty, and auditability over convenience.

Demographics Repository

1. Purpose

The Demographics Repository is responsible for storing and managing patient identity and demographic information within VPR.

Its primary purpose is to provide a clear, authoritative, and interoperable representation of who the patient is, distinct from:

what care they have received,
what clinical observations have been recorded,
and how care is coordinated.

Demographic data is foundational. Errors in demographics propagate risk across all other systems. For this reason, the Demographics Repository is deliberately separated from clinical and care coordination data.

2. Scope

The Demographics Repository contains identity and demographic information only. This includes, but is not limited to:

Names and aliases
Date of birth
Sex and gender-related attributes
Addresses and contact details
Identifiers (NHS number, local identifiers)
Deceased status
Links to related persons where appropriate

It does not contain:

clinical observations,
diagnoses,
procedures,
correspondence content,
care plans or workflows.

3. Use of FHIR

VPR uses FHIR (Fast Healthcare Interoperability Resources) as the canonical model for demographic data.

FHIR is used because it:

is widely adopted across healthcare systems,
has a clear and extensible Patient model,
supports interoperability with existing NHS and international systems,
cleanly separates identity from clinical content.

FHIR resources are stored and handled in a way that preserves their structure and semantics.

4. Primary FHIR Resource

4.1 Patient Resource

The core resource used in the Demographics Repository is the FHIR Patient resource.

The Patient resource represents:

a single individual receiving or potentially receiving care,
with zero or more identifiers,
and zero or more contact and demographic attributes.

Only attributes relevant to identity and demographics are populated.

5. Separation from Clinical Repositories

The Demographics Repository is intentionally separate from the Clinical Repository.

Key reasons for this separation include:

Demographic data changes more frequently and independently.
Identity errors require different correction and governance processes.
Many systems need demographic access without clinical access.
Clinical data must not be invalidated by demographic corrections.

Clinical records reference patients by identifier rather than embedding demographic fields.

6. Corrections and Redactions

Demographic errors can have serious consequences.

When demographic information is determined to be incorrect or misattributed:

Corrections are made by updating or superseding the relevant FHIR resource.
Redacted demographic artefacts are moved to the Redaction Retention Repository (RRR).
A reference remains to indicate that a correction has occurred.

Demographic information is never silently deleted.

7. Versioning and Change History

Demographic changes are expected and supported.

The Demographics Repository maintains:

a full history of changes,
attribution of who made each change,
timestamps and reason codes where available.

This supports traceability, auditability, and patient safety.

8. Access and Authorisation

Access to demographic data is role-based and purpose-limited.

Different roles may have:

read-only access,
update access,
linkage access for cross-system identity resolution.

Demographic access does not imply access to clinical content.

9. Relationship to Other VPR Components

Clinical Repository: references patients by identifier only.
Care Coordination Repository: links to patient identity without duplicating demographics.
Redaction Retention Repository: stores superseded or misattributed demographic artefacts.
External systems: demographic data may be exchanged using FHIR interfaces.

10. Design Principles

Identity before care
Correction without erasure
Interoperability by default
Clear separation of concerns
Auditability without friction

11. Summary

The Demographics Repository provides a stable, interoperable, and auditable foundation for patient identity within VPR.

By using FHIR and maintaining strict separation from clinical and care coordination data, VPR ensures that identity errors can be corrected safely without compromising the integrity of the clinical record.

Care Coordination Repository

Overview

The Care Coordination Repository manages coordination state separate from clinical records.

It handles workflow coordination, cross-system state, and operational metadata that supports clinical care delivery without containing clinical content itself.

Repository Structure

The coordination repository follows the same sharded structure as clinical records:

patient_data/
  coordination/
    <s1>/
      <s2>/
        <uuid>/
          .git/
          COORDINATION_STATUS.yaml
          communications/
            <thread-id>/
              thread.md
              ledger.yaml
          encounters/
            ...
          appointments/
            ...

Root Status File

COORDINATION_STATUS.yaml

Each coordination repository includes a root status file that links it to the associated clinical record:

coordination_id: "7f4c2e9d-4b0a-4f3a-9a2c-0e9a6b5d1c88"
clinical_id: "a4f91c6d-3b2e-4c5f-9d7a-1e8b6c0a9f12"
status:
  lifecycle_state: active # active | suspended | closed
  record_open: true
  record_queryable: true
  record_modifiable: true

Purpose:

Links coordination record to clinical record via clinical_id
Tracks lifecycle state of the coordination repository
Controls operational permissions (queryable, modifiable)
Created during coordination repository initialization

Lifecycle states:

active: Coordination record is operational and accepting updates
suspended: Temporarily inactive (e.g., during data migration)
closed: Permanently closed (e.g., patient deceased, record archived)
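
A minimal sketch of how the three lifecycle states might be parsed from the YAML value follows. The enum mirrors the states listed above; the accepts_updates helper is an assumption drawn from the state descriptions and is not a documented VPR rule.

```rust
use std::str::FromStr;

/// Lifecycle state of a coordination repository.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleState {
    Active,
    Suspended,
    Closed,
}

impl FromStr for LifecycleState {
    type Err = String;

    /// Accepts exactly the three documented values; anything else is an error.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "active" => Ok(Self::Active),
            "suspended" => Ok(Self::Suspended),
            "closed" => Ok(Self::Closed),
            other => Err(format!("unknown lifecycle_state: {}", other)),
        }
    }
}

impl LifecycleState {
    /// Assumption for illustration: only `Active` records accept updates.
    pub fn accepts_updates(self) -> bool {
        matches!(self, Self::Active)
    }
}
```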

Properties:

Mutable and overwritable
Git-versioned for audit trail
Uses FHIR-aligned wire format for interoperability
Validated against strict schema with UUID checks

Key Components

Messaging Coordination

Manages clinical communication threads between clinicians, patients, and authorized participants.

See Messaging Design for detailed specifications.

Encounter Management

Tracks patient encounters and episodes of care:

Episode linkage and status
Care team coordination
Encounter documentation coordination

Appointment Coordination

Manages appointment scheduling and coordination:

Cross-system availability
Resource allocation
Cancellation and rescheduling coordination

Design Principles

Separation of Concerns

Coordination data is strictly separated from clinical content:

Clinical records (EHR): What happened, what was said, what was observed
Coordination state: Who needs to know, what needs to be done, system state

Soft State

Coordination data is reconstructible and non-critical:

Can be rebuilt from clinical records if lost
Stale data causes inconvenience, not clinical harm
Optimized for availability over consistency

Cross-System Coordination

Enables seamless care delivery across multiple systems:

Shared state for care teams
Consistent patient experience
Reduced administrative overhead

Integration with VPR Components

Relationship to Clinical Repository

Explicitly linked: Each coordination record has a clinical_id in COORDINATION_STATUS.yaml
Initialization dependency: Coordination records require an existing clinical record UUID
References not duplication: Does not duplicate clinical content
Separation of concerns: Clinical facts vs. coordination state
Enables coordination without coupling: Systems can coordinate without accessing clinical details

Relationship to Demographics

Links coordination activities to patient identity
Supports care team management
Enables patient portal integration

API Integration

REST and gRPC APIs provide coordination services
Separate from clinical record APIs
Optimized for coordination workflows

Lifecycle and Retention

Coordination data follows different retention policies than clinical records:

Short-term retention: Active coordination state (weeks/months)
Medium-term retention: Historical coordination for audit (years)
Long-term retention: Minimal essential coordination metadata

Retention policies balance operational needs with privacy and storage costs.

Future Extensions

The coordination repository provides a foundation for:

Advanced workflow management: Task assignment, delegation tracking
Multi-organisation coordination: Cross-provider care coordination
Patient engagement: Portal integration, preference management
Quality improvement: Workflow analytics, performance metrics

References

VPR Architecture Overview
Clinical Repository Design
Messaging Design
FHIR Integration
API Specifications

Care Coordination Repository (CCR) Messaging – Design and Rationale

Purpose

The CCR messaging system provides a clinical, auditable, interoperable record of asynchronous communication between clinicians, patients, and other authorised participants.

It is designed to:

support cross-site and cross-system care coordination,
remain human-readable without specialist software,
withstand audit, legal, and regulatory review,
avoid asserting certainty about human behaviour that the system cannot honestly know.

Messaging in the Care Coordination Repository is treated as clinical communication, not as a transient chat feature.

Conceptually, CCR messaging is FHIR-aligned, using the semantics of the FHIR Communication resource as a guiding model, without adopting FHIR storage formats or server behaviour.

Conceptual model (FHIR-aligned)

Each CCR message corresponds conceptually to a FHIR Communication:

it represents something that has already been communicated,
it has an author, recipients, a timestamp, content, and a status,
it is a clinical artefact with medico-legal weight.

CCR does not implement FHIR JSON, REST endpoints, or transport semantics. Instead, it preserves FHIR meaning while using a versioned, repository-based storage model aligned with the Versioned Patient Repository.

This keeps CCR messaging projectable to FHIR Communication in future integrations, without constraining internal design.

Core principles

1. Messaging is clinical

Messages exchanged between clinicians, patients, and other healthcare participants carry clinical and medico-legal weight equivalent to:

written advice,
clinic letters,
documented telephone or video consultations.

As such, CCR messages form part of the clinical coordination record.

2. Messages are immutable

Once recorded, messages:

MUST NOT be edited,
MUST NOT be deleted.

This mirrors paper records, professional guidance, and legal expectations.

Errors or clarifications are handled via corrections (addenda), never by modifying the original message.

3. Context matters more than individual messages

Individual messages often do not make sense in isolation.

For example:

“Yes, I will do that doctor”

only has meaning when read alongside preceding and subsequent messages.

For this reason, the conversation thread is the meaningful clinical unit, not the individual message.

This aligns with FHIR Communication, which is frequently contextualised by related communications, encounters, or care plans.

Repository placement

Messaging is a first-class concern of the Care Coordination Repository (CCR).

It sits alongside other coordination artefacts (for example, tasks or referrals added later), and is explicitly separated from:

clinical facts (clinical repository),
demographics and identity (demographics repository).

File layout

Each messaging thread is stored as:

coordination/
    <shard1>/
        <shard2>/
            <coordination-id>/
                COORDINATION_STATUS.yaml
                communications/
                    <communication-id>/
                        thread.md
                        ledger.yaml

The coordination repository is sharded by UUID for scalability, similar to clinical records.

Conceptually:

A communication consists of a thread file and a ledger file
A thread is a list of messages stored in thread.md
The ledger contains metadata such as participants, status, policies, and visibility settings

Where:

<communication-id> is a timestamp-prefixed UUID (e.g., 20260111T143522.045Z-550e8400-e29b-41d4-a716-446655440000)
thread.md contains the canonical clinical conversation (list of messages)
ledger.yaml contains thread metadata (participants, status, policies, visibility)

Communication identity

The <communication-id> is generated using a timestamp-prefixed identifier:

format: YYYYMMDDTHHMMSS.sssZ-UUID
timestamp: UTC, ISO 8601, millisecond precision
UUID: randomly generated

Example:

20260111T143522.045Z-550e8400-e29b-41d4-a716-446655440000

This ensures communication identifiers are:

globally unique,
chronologically sortable,
suitable for distributed systems.

The existing TimestampId struct is used to generate and validate these identifiers.

`thread.md` – Thread of messages

Purpose

thread.md is the canonical clinical record of the conversation thread.

It records:

what was communicated,
by whom,
when,
and in what coordination context.

Conceptually, each entry corresponds to a FHIR Communication instance.

Properties

Append-only
Immutable once written
Human-readable
Git-versioned
Suitable for audit and legal review

Message identity

Every message MUST include a globally unique message_id (UUID).

Message identifiers exist to:

unambiguously identify messages,
allow corrections to reference prior messages,
support projections, caches, and alert suppression.

Timestamps are used for ordering, not identity.

Message types

thread.md may contain:

clinician messages
patient messages
system messages
correction messages

System messages (for example, “participant added to thread”) are first-class entries, as they provide clinically and legally relevant coordination context.

Corrections (addenda)

Errors or clarifications are recorded as new messages, not edits.

A correction message:

is a new message,
has its own message_id,
references the original message via corrects: <message_id>.

The original message is never modified.

This preserves a truthful, auditable historical record.
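
The addendum rule can be sketched as an append-only collection whose only write operations add new entries. The types below (Message, Thread) and their methods are hypothetical illustrations of the rule, not the VPR data model; short IDs stand in for UUIDs in usage.

```rust
/// One entry in a thread. `corrects` is set only on correction messages.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub message_id: String,
    pub body: String,
    pub corrects: Option<String>,
}

/// An append-only thread: there is deliberately no edit or delete method.
#[derive(Debug, Default)]
pub struct Thread {
    messages: Vec<Message>,
}

impl Thread {
    /// Read-only view of all messages in the order they were recorded.
    pub fn messages(&self) -> &[Message] {
        &self.messages
    }

    /// Appends an ordinary message.
    pub fn append(&mut self, message_id: &str, body: &str) {
        self.messages.push(Message {
            message_id: message_id.to_string(),
            body: body.to_string(),
            corrects: None,
        });
    }

    /// Appends a correction that references an existing message.
    /// The original message is left untouched.
    pub fn append_correction(
        &mut self,
        message_id: &str,
        corrects: &str,
        body: &str,
    ) -> Result<(), String> {
        if !self.messages.iter().any(|m| m.message_id == corrects) {
            return Err(format!("no message with id {}", corrects));
        }
        self.messages.push(Message {
            message_id: message_id.to_string(),
            body: body.to_string(),
            corrects: Some(corrects.to_string()),
        });
        Ok(())
    }
}
```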

Explicit non-features

thread.md does NOT record:

read or seen status,
urgency flags,
acknowledgement or acceptance,
task completion or responsibility transfer.

These concepts imply human cognition or behaviour that the system cannot verify and therefore does not assert.

Example structure

# Messages

## Message

**ID:** `3f7a8d2c-1e9b-4a6d-9f2e-5c8b7a4d1f92`  
**Type:** clinician  
**Timestamp:** 2026-01-11T14:36:15.234Z  
**Author ID:** `4f8c2a1d-9e3b-4a7c-8f1e-6b0d2c5a9f12`  
**Author:** Dr Jane Smith

Patient has reported increasing shortness of breath.
Please review chest X-ray and advise on next steps.

---

## Message

**ID:** `8b2f6a5c-3d1e-4a9b-8c7f-6d5e4a3b2c1d`  
**Type:** clinician  
**Timestamp:** 2026-01-11T15:42:30.567Z  
**Author ID:** `a1d3c5e7-f9b2-4680-b2d4-f6e8c0a9d1e3`  
**Author:** Dr Tom Patel

Reviewed X-ray. No acute changes. Continue current management
and reassess in 48 hours. If symptoms worsen, arrange urgent review.
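
To make the layout of the example concrete, here is a small sketch that renders entries in the same Markdown shape, including the two trailing spaces that force line breaks in the header block. MessageEntry, render_message, and render_thread are illustrative names, and the exact whitespace between entries is an assumption based on the example above.

```rust
/// The fields shown in the example entry. Borrowed strings keep the
/// sketch free of allocation concerns.
pub struct MessageEntry<'a> {
    pub id: &'a str,
    pub kind: &'a str,
    pub timestamp: &'a str,
    pub author_id: &'a str,
    pub author: &'a str,
    pub body: &'a str,
}

/// Renders one entry: a heading, the header block, then the body.
pub fn render_message(m: &MessageEntry<'_>) -> String {
    format!(
        "## Message\n\n\
         **ID:** `{}`  \n\
         **Type:** {}  \n\
         **Timestamp:** {}  \n\
         **Author ID:** `{}`  \n\
         **Author:** {}\n\n\
         {}\n",
        m.id, m.kind, m.timestamp, m.author_id, m.author, m.body
    )
}

/// Renders a whole thread, separating entries with a horizontal rule.
pub fn render_thread(messages: &[MessageEntry<'_>]) -> String {
    let entries: Vec<String> = messages.iter().map(|m| render_message(m)).collect();
    format!("# Messages\n\n{}", entries.join("\n---\n\n"))
}
```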

`ledger.yaml` – Thread context and policy

Purpose

ledger.yaml stores contextual and policy metadata, not clinical narrative.

It answers:

“Who is involved in this conversation, and under what rules?”

Typical contents

participants and roles
visibility and sensitivity flags
thread status (open, closed, archived)
organisational access rules

communication_id: 20260111T143522.045Z-550e8400-e29b-41d4-a716-446655440000

status: open
created_at: 2026-01-11T14:35:22.045Z
last_updated_at: 2026-01-11T15:10:04.912Z

participants:
  - participant_id: 4f8c2a1d-9e3b-4a7c-8f1e-6b0d2c5a9f12
    role: clinician
    display_name: Dr Jane Smith

  - participant_id: a1d3c5e7-f9b2-4680-b2d4-f6e8c0a9d1e3
    role: clinician
    display_name: Dr Tom Patel

  - participant_id: 9b7c6d5e-4f3a-2b1c-0e8d-7f6a5b4c3d2e
    role: patient
    display_name: John Doe

visibility:
  sensitivity: standard
  restricted: false

policies:
  allow_patient_participation: true
  allow_external_organisations: true

Properties

Mutable
Overwritable
Git-audited
Changes are deliberate and relatively infrequent
last_updated_at is automatically updated when messages are added

Thread-level metadata:

Thread status: open, closed, or archived
Participant list with roles (organisation field removed for simplicity)
Visibility and sensitivity settings
Participation policies (external organisations allowed by default)

Audit trail: Inherent in Git commit history and thread.md content; no separate audit section is needed.

Explicit exclusions

ledger.yaml does NOT contain:

message content,
interaction or navigation state,
user interface hints.

Git Versioning

All changes to coordination records are Git-versioned for audit purposes:

coordination:create: Created messaging thread

Care-Location: Oxford University Hospitals

coordination:update: Added message to thread

Care-Location: Oxford University Hospitals

coordination:update: Updated thread participant list

Care-Location: Oxford University Hospitals

Commits include:

Structured commit messages with domain and action
Care location metadata
Optional cryptographic signatures
Full audit trail of all changes
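
The commit message shape shown above (a "domain:action: summary" subject line followed by a Care-Location trailer) can be sketched as follows. The function names are hypothetical, and the format is inferred from the three examples rather than taken from a specification.

```rust
/// Builds a structured commit message with a Care-Location trailer.
pub fn commit_message(domain: &str, action: &str, summary: &str, care_location: &str) -> String {
    format!(
        "{}:{}: {}\n\nCare-Location: {}",
        domain, action, summary, care_location
    )
}

/// Extracts the care location from a commit message, if the trailer is present.
pub fn care_location_of(message: &str) -> Option<&str> {
    message
        .lines()
        .find_map(|line| line.strip_prefix("Care-Location: "))
}
```

Keeping the care location on its own trailer line means audit tooling can recover it with a simple prefix match, without parsing the subject line.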

Alerting behaviour

CCR does not record:

Read receipts or “seen” status
Acknowledgements
Urgency flags
Task completion or responsibility transfer

These concepts imply human cognition or behaviour that the system cannot verify.

Consuming systems may implement alerting by:

Tracking their own render/presentation state externally (not in VPR)
Comparing message timestamps to their last-viewed records
Presenting unread indicators in their user interface

This approach:

Avoids false certainty about human understanding
Reduces legal and clinical ambiguity
Maintains truthful audit trails
Enables consistent patient experience across systems

Alerting is a user-experience concern, not a clinical record.
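
As an illustration of the timestamp comparison described above, a consuming system could derive an unread count from its own last-viewed marker without writing anything back to VPR. This sketch assumes timestamps in the fixed-width UTC form used in thread.md (for example 2026-01-11T14:36:15.234Z), which compare correctly as plain strings; unread_count is a hypothetical name.

```rust
/// Counts messages newer than the consumer's own last-viewed timestamp.
/// `last_viewed` lives in the consuming system, never in the repository.
/// Assumes all timestamps share the same fixed-width UTC format, so that
/// string ordering matches chronological ordering.
pub fn unread_count(message_timestamps: &[&str], last_viewed: Option<&str>) -> usize {
    match last_viewed {
        // Nothing viewed yet: every message is presented as unread.
        None => message_timestamps.len(),
        Some(seen) => message_timestamps
            .iter()
            .filter(|ts| **ts > seen)
            .count(),
    }
}
```

The count describes what the interface has presented, not what a person has read or understood, which is why it stays outside the clinical record.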

Thread Lifecycle

Messaging threads follow a defined lifecycle:

Creation

Threads are created via CoordinationService::communication_create():

Generates timestamp-prefixed communication ID
Creates communications/<communication-id>/ directory
Writes initial thread.md (optionally with first message)
Writes ledger.yaml with participant list and policies
Commits atomically to Git

Message Addition

Messages are added via CoordinationService::add_message():

Generates unique message UUID
Appends to thread.md (existing entries are never rewritten)
Commits with structured message and care location
Returns the message ID for reference

Metadata Updates

Thread metadata is updated via CoordinationService::update_communication_ledger():

Modifies ledger.yaml (participants, status, policies)
Git commit records the change
Audit log tracks all modifications

Status Transitions

Threads can transition between states:

Open → Closed: Thread completed, no new messages accepted
Closed → Archived: Thread moved to archive, hidden from default views
Open → Archived: Direct archival without closing
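
The three transitions listed above can be encoded as a small check. Treating every unlisted transition (for example reopening a closed thread) as disallowed is an assumption of this sketch; the source does not state it either way.

```rust
/// Status of a messaging thread.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadStatus {
    Open,
    Closed,
    Archived,
}

impl ThreadStatus {
    /// Returns true only for the three documented transitions.
    /// Assumption: anything not listed is rejected.
    pub fn can_transition_to(self, next: ThreadStatus) -> bool {
        use ThreadStatus::*;
        matches!(
            (self, next),
            (Open, Closed) | (Closed, Archived) | (Open, Archived)
        )
    }
}
```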

Deletion

Threads are never deleted:

Immutability is preserved
Audit trail remains complete
Archival is used instead of deletion
Git history retains full record

Implementation Details

Initialization

Coordination repositories are initialized with:

CoordinationService::new(cfg)
    .initialise(author, care_location, clinical_id)

This creates:

Sharded directory structure: coordination/<s1>/<s2>/<uuid>/
COORDINATION_STATUS.yaml with link to clinical record
Git repository with initial commit
Lifecycle state set to active

Thread Creation

Messaging threads are created with:

service.communication_create(
    &author,
    care_location,
    participants,
    initial_message
)

This:

Generates timestamp-prefixed communication ID via TimestampIdGenerator
Creates communications/<communication-id>/ directory
Writes thread.md with optional initial message
Writes ledger.yaml with participant list and policies
Commits both files atomically to Git

Adding Messages

Messages are appended with:

service.add_message(
    &author,
    care_location,
    thread_id,
    message_content
)

This:

Generates unique message UUID
Appends to thread.md (existing entries are never rewritten)
Commits with structured message and care location
Returns the message ID

Type Safety

The CoordinationService uses the type-state pattern:

CoordinationService<Uninitialised> - Can only call initialise()
CoordinationService<Initialised> - Can call thread and message operations

This prevents operations on non-existent repositories at compile time.
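
For readers unfamiliar with the pattern, the following is a stripped-down sketch of type-state in Rust. It uses a stand-in Service type rather than the real CoordinationService, and the fields and method signatures are simplified assumptions chosen only to show the mechanism.

```rust
/// Marker for a service that has no repository yet.
pub struct Uninitialised;

/// State carried once a repository exists.
pub struct Initialised {
    clinical_id: String,
}

/// Stand-in for a service whose available methods depend on `State`.
pub struct Service<State> {
    state: State,
}

impl Service<Uninitialised> {
    /// The only way to obtain a service is in the uninitialised state.
    pub fn new() -> Self {
        Service { state: Uninitialised }
    }

    /// Consumes the uninitialised service and returns an initialised one.
    pub fn initialise(self, clinical_id: &str) -> Service<Initialised> {
        Service {
            state: Initialised { clinical_id: clinical_id.to_string() },
        }
    }
}

impl Service<Initialised> {
    /// Only exists on the initialised type; calling it on
    /// `Service<Uninitialised>` is a compile error, not a runtime check.
    pub fn clinical_id(&self) -> &str {
        &self.state.clinical_id
    }
}
```

Because initialise takes self by value, the uninitialised handle cannot be reused afterwards, so there is no window in which thread or message operations could run against a repository that does not exist.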

Error Handling

Operations return PatientResult<T> with comprehensive error types:

Author validation errors
Git operation failures
File I/O errors
FHIR wire format validation errors
UUID parsing errors

Cleanup is attempted on initialization failure to prevent partial repositories.

Design decisions explicitly rejected

The following were deliberately excluded:

read receipts (opening does not equal reading or understanding)
urgency flags (asynchronous messaging is not suitable for urgent care)
acknowledgement tracking (implies responsibility transfer)
workflow or task semantics (these may be added later using FHIR-aligned Task concepts)

These exclusions reduce legal ambiguity, false certainty, and unintended clinical inference.

References

FHIR Integration

Overview

The coordination repository uses FHIR-aligned wire formats for interoperability without implementing FHIR JSON, REST endpoints, or transport semantics.

This approach:

Preserves FHIR semantic meaning
Uses repository-based storage model
Enables future FHIR projections
Maintains human-readable formats
Provides strict schema validation

Coordination Status

Overview

The fhir::CoordinationStatus module handles parsing and rendering of COORDINATION_STATUS.yaml files.

API

// Parse from YAML
let status_data = fhir::CoordinationStatus::parse(yaml_text)?;

// Render to YAML
let yaml_text = fhir::CoordinationStatus::render(&status_data)?;

Domain Types

CoordinationStatusData - Top-level status structure
- coordination_id: Uuid - Coordination repository identifier
- clinical_id: Uuid - Linked clinical record identifier
- status: StatusInfo - Status information
StatusInfo - Status details
- lifecycle_state: LifecycleState - Current lifecycle state
- record_open: bool - Whether accepting new entries
- record_queryable: bool - Whether queries are permitted
- record_modifiable: bool - Whether modifications are permitted
LifecycleState - Enumeration
- Active - Operational and accepting updates
- Suspended - Temporarily inactive
- Closed - Permanently closed

Validation

UUID validation for coordination_id and clinical_id
Enum validation for lifecycle_state
Boolean validation for permission flags
Strict schema with deny_unknown_fields

Wire Format

Internal wire types use string UUIDs, translated to proper Uuid types at boundaries:

// Wire format (internal)
struct CoordinationStatusWire {
    coordination_id: String,
    clinical_id: String,
    status: StatusWire,
}

// Domain format (public)
struct CoordinationStatusData {
    coordination_id: Uuid,
    clinical_id: Uuid,
    status: StatusInfo,
}
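A minimal sketch of the boundary translation, assuming a TryFrom conversion. To stay free of external crates, the Uuid type here is a local stand-in that accepts exactly 32 hexadecimal characters; it is not the uuid crate's type. The CoordinationIdsWire and CoordinationIds names are hypothetical and cover only the two identifier fields.

```rust
use std::convert::TryFrom;

// Stand-in for uuid::Uuid (assumption): accepts exactly 32 hex characters.
#[derive(Debug, PartialEq, Clone)]
pub struct Uuid(String);

impl Uuid {
    pub fn parse_str(s: &str) -> Result<Self, String> {
        if s.len() == 32 && s.chars().all(|c| c.is_ascii_hexdigit()) {
            Ok(Uuid(s.to_ascii_lowercase()))
        } else {
            Err(format!("invalid UUID: {s}"))
        }
    }
}

// Wire side (hypothetical name): identifiers are plain strings.
pub struct CoordinationIdsWire {
    pub coordination_id: String,
    pub clinical_id: String,
}

// Domain side (hypothetical name): identifiers are validated types.
#[derive(Debug, PartialEq)]
pub struct CoordinationIds {
    pub coordination_id: Uuid,
    pub clinical_id: Uuid,
}

impl TryFrom<CoordinationIdsWire> for CoordinationIds {
    type Error = String;

    // Validation happens once, at the boundary; code past this point
    // never sees an unvalidated identifier.
    fn try_from(wire: CoordinationIdsWire) -> Result<Self, Self::Error> {
        Ok(Self {
            coordination_id: Uuid::parse_str(&wire.coordination_id)?,
            clinical_id: Uuid::parse_str(&wire.clinical_id)?,
        })
    }
}
```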

Thread Ledgers

Overview

The fhir::Messaging module handles parsing and rendering of messaging thread ledger.yaml files.

This implementation uses FHIR Communication resource semantics without FHIR JSON transport.

API

// Parse from YAML
let ledger_data = fhir::Messaging::ledger_parse(yaml_text)?;

// Render to YAML
let yaml_text = fhir::Messaging::ledger_render(&ledger_data)?;

Domain Types

LedgerData - Top-level ledger structure
- thread_id: TimestampId - Thread identifier
- status: ThreadStatus - Thread status
- created_at: DateTime<Utc> - Creation timestamp
- last_updated_at: DateTime<Utc> - Last update timestamp
- participants: Vec<LedgerParticipant> - Participant list
- visibility: LedgerVisibility - Visibility settings
- policies: LedgerPolicies - Participation policies
- audit: LedgerAudit - Change audit trail
ThreadStatus - Enumeration
- Open - Active, accepting messages
- Closed - Closed to new messages
- Archived - Archived (hidden from default views)
LedgerParticipant - Participant information
- participant_id: Uuid - Participant identifier
- role: ParticipantRole - Participant role
- display_name: String - Human-readable name
- organisation: Option<String> - Organisation affiliation
ParticipantRole - Enumeration
- Clinician - Clinical staff member
- Patient - Patient participant
- CareTeam - Care team member or healthcare professional
- System - System-generated participant
LedgerVisibility - Visibility settings
- sensitivity: String - Sensitivity level (standard, confidential, restricted)
- restricted: bool - Whether access is restricted beyond normal rules
LedgerPolicies - Participation policies
- allow_patient_participation: bool - Patient participation permitted
- allow_external_organisations: bool - External organisations permitted
LedgerAudit - Audit trail
- created_by: String - Creator identifier
- change_log: Vec<AuditChangeLog> - Chronological change log
AuditChangeLog - Single audit entry
- changed_at: DateTime<Utc> - Change timestamp
- changed_by: String - Actor identifier
- description: String - Human-readable description

Validation

UUID validation for thread_id (as TimestampId)
UUID validation for all participant_id fields
DateTime parsing with timezone handling
Enum validation for status and role fields
Strict schema with deny_unknown_fields

Wire Format

Internal wire types separate concerns:

// Wire format (internal)
struct Ledger {
    thread_id: String,
    status: ThreadStatus,
    created_at: DateTime<Utc>,
    // ... string UUIDs, raw timestamps
}

// Domain format (public)
struct LedgerData {
    thread_id: TimestampId,
    status: ThreadStatus,
    created_at: DateTime<Utc>,
    // ... proper UUID types, validated timestamps
}

Translation happens at parse/render boundaries using internal helper functions.

Wire Format Principles

Separation of Concerns

Wire types are internal implementation details
Domain types are public API surface
Translation happens at boundaries only
Consumers work with domain types exclusively

Strict Validation

All wire formats use #[serde(deny_unknown_fields)]:

Unknown fields are rejected
Prevents silent schema drift
Ensures forward compatibility is explicit
Catches typos and configuration errors

Type Safety

String identifiers validated and converted to proper types
UUIDs parsed and validated at boundaries
Timestamps validated and converted to DateTime<Utc>
Enumerations validated against allowed values

Human-Readable Formats

YAML is used for all wire formats:

Git-friendly diffs
Human-readable without tooling
Suitable for manual review
Easy to debug and inspect

Error Handling

Parse errors use serde_path_to_error for detailed diagnostics:

Thread ledger schema mismatch at participants[0].role:
unknown variant `doctor`, expected one of
`clinician`, `patient`, `careteam`, `system`

This provides:

Precise error location in document
Clear error description
Expected values for enumerations
Actionable feedback for corrections
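To show the shape of such a diagnostic without pulling in serde, here is a hand-rolled sketch that parses a role string and prefixes the error with the document path. It only imitates the message format shown above; the real implementation relies on serde_path_to_error. The parse_role_at helper is hypothetical, and the lowercase wire spellings are taken from the example error message.

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum ParticipantRole {
    Clinician,
    Patient,
    CareTeam,
    System,
}

impl FromStr for ParticipantRole {
    type Err = String;

    // Accepts only the lowercase spellings listed in the error message above.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "clinician" => Ok(Self::Clinician),
            "patient" => Ok(Self::Patient),
            "careteam" => Ok(Self::CareTeam),
            "system" => Ok(Self::System),
            other => Err(format!(
                "unknown variant `{other}`, expected one of `clinician`, `patient`, `careteam`, `system`"
            )),
        }
    }
}

// Hypothetical helper: attaches the location in the document to the error.
pub fn parse_role_at(path: &str, value: &str) -> Result<ParticipantRole, String> {
    value.parse::<ParticipantRole>().map_err(|e| format!("{path}: {e}"))
}
```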

FHIR Alignment

Conceptual Model

VPR coordination uses FHIR resource semantics:

COORDINATION_STATUS.yaml ≈ FHIR operational status tracking
Thread ledger.yaml ≈ FHIR Communication metadata
messages.md ≈ FHIR Communication content

This is conceptual alignment, not implementation:

No FHIR JSON format
No FHIR REST endpoints
No FHIR server behaviour
No FHIR Bundle/Transaction semantics

Future Projections

FHIR-aligned wire formats enable future projections to:

FHIR Communication resources - For messaging threads
FHIR Task resources - For coordination tasks
FHIR DocumentReference - For compositions
FHIR RESTful APIs - For external integrations

Projection can happen:

At API boundaries (gRPC/REST to FHIR)
Via export tools (VPR to FHIR Bundle)
Through ETL pipelines (VPR to FHIR data warehouse)

Semantic Preservation

Key FHIR concepts preserved:

Communication.status → ThreadStatus (open, closed, archived)
Communication.recipient → participants with roles
Communication.sender → author metadata in messages
Communication.sent → created_at timestamp
Communication.payload → message content in messages.md

This ensures:

No semantic loss in translation
Clear mapping to FHIR when needed
Compatibility with FHIR-based systems
Standards-based interoperability

Implementation Details

Module Structure

crates/fhir/src/
    lib.rs                    # Public exports and error types
    coordination_status.rs    # COORDINATION_STATUS.yaml handling
    messaging.rs              # Thread ledger.yaml handling

Error Types

#[derive(Debug, thiserror::Error)]
pub enum FhirError {
    InvalidInput(String),
    InvalidYaml(serde_yaml::Error),
    Translation(String),
    InvalidUuid(String),
    // ...
}

Errors are converted to PatientError at boundaries via From trait.

Testing

Each module includes comprehensive tests:

Round-trip parsing (parse → render → parse)
Schema validation (reject unknown fields)
Type validation (reject wrong types)
UUID validation (reject malformed UUIDs)
Enum validation (reject unknown variants)
Edge cases (minimal valid documents, optional fields)

Dependencies

serde and serde_yaml - Serialization
serde_path_to_error - Detailed error paths
chrono - Timestamp handling
uuid - UUID types
vpr_uuid - TimestampId type

Usage Examples

Coordination Status

use std::fs;

use fhir::{CoordinationStatus, CoordinationStatusData, StatusInfo, LifecycleState};
use uuid::Uuid;

// Create new status (existing_clinical_uuid is assumed to be in scope)
let status_data = CoordinationStatusData {
    coordination_id: Uuid::new_v4(),
    clinical_id: existing_clinical_uuid,
    status: StatusInfo {
        lifecycle_state: LifecycleState::Active,
        record_open: true,
        record_queryable: true,
        record_modifiable: true,
    },
};

// Render to YAML
let yaml = CoordinationStatus::render(&status_data)?;

// Write to file
fs::write("COORDINATION_STATUS.yaml", yaml)?;

// Later, parse back
let yaml_text = fs::read_to_string("COORDINATION_STATUS.yaml")?;
let parsed = CoordinationStatus::parse(&yaml_text)?;
assert_eq!(status_data, parsed);

Thread Ledger

use std::fs;

use chrono::Utc;
use fhir::{Messaging, LedgerData, ThreadStatus, LedgerParticipant, ParticipantRole};

// Create ledger (thread_id and clinician_uuid are assumed to be in scope)
let ledger_data = LedgerData {
    thread_id,
    status: ThreadStatus::Open,
    created_at: Utc::now(),
    last_updated_at: Utc::now(),
    participants: vec![
        LedgerParticipant {
            participant_id: clinician_uuid,
            role: ParticipantRole::Clinician,
            display_name: "Dr Jane Smith".to_string(),
            organisation: Some("Example NHS Trust".to_string()),
        },
    ],
    // ... visibility, policies, audit
};

// Render to YAML
let yaml = Messaging::ledger_render(&ledger_data)?;

// Write to file
fs::write("ledger.yaml", yaml)?;

// Later, parse back
let yaml_text = fs::read_to_string("ledger.yaml")?;
let parsed = Messaging::ledger_parse(&yaml_text)?;

VPR File Storage (Binary and Non-Text Files)

Purpose

This document defines how non-text and binary files (for example PDFs, imaging, scans, waveforms, audio, and video) are stored, referenced, versioned, and governed within the Versioned Patient Repository (VPR).

The aim is to preserve clinical meaning, auditability, and long-term safety while remaining compatible with openEHR principles, offline use, and simple local operation (for example on a laptop), without introducing enterprise-only infrastructure.

Core Principles

Clinical meaning and binary bytes are deliberately separated
Binary files are not tracked in Git
Binary files are immutable once added (new content creates a new file)
References to files are explicit, auditable, and versioned
Clinical repositories remain valid even when binary files are absent
No global or cross-repository binary namespace exists

What Counts as a File

Files include, but are not limited to:

Portable Document Format (PDF) documents
Medical imaging (for example DICOM series)
Scanned paper documents
Audio or video recordings
Physiological waveforms or monitoring exports

These files are treated as clinical material, but are not part of the primary structured clinical data.

Repository-Scoped Storage Model

VPR does not use a global binary store.

Instead, each repository is self-contained and stores its own associated files alongside its versioned content.

This document describes the pattern using the Clinical Repository (CR) as the example. The same pattern applies independently to other repositories (CCR, DR, RRR).

Clinical Repository Layout

For a single Clinical Repository:

clinical/
└── <clinical_id>/
    ├── .gitignore
    ├── compositions/
    ├── indexes/
    ├── metadata/
    ├── … other CR-specific content …
    └── files/        # gitignored

Invariants

<clinical_id>/ is the repository root and Git root
The CR is independently portable and versioned
files/ is scoped only to this CR
files/ is explicitly excluded from Git tracking
The CR remains valid even if files/ is missing or incomplete

No patient identifier is implied or required by this structure.

The `files/` Directory

The files/ directory:

Contains binary files associated with this Clinical Repository
May include documents, imaging, video, audio, or other binary formats
Is not required to be present on all copies of the repository
Is never authoritative for clinical meaning

The name files/ is intentionally neutral and does not imply format, size, or readability.

File Identity and Integrity

Each file is identified by its content, not by its filename.

VPR implements content-addressed storage using SHA-256 hashes:

Files are stored using their SHA-256 hash as the filename
Two-level sharding is used to prevent excessive files per directory
Hashes are used to verify integrity
If file contents change, a new file is created

Storage structure:

files/
└── sha256/
    └── ab/          # First 2 characters of hash
        └── cd/      # Next 2 characters of hash
            └── abcdef123456...  # Full hash as filename
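The sharded layout above can be derived from a hash with a few lines of code. This is a sketch under assumptions: the function name sharded_path is hypothetical, the hash is expected as 64 lowercase hexadecimal characters, and computing the SHA-256 digest itself is left out.

```rust
use std::path::PathBuf;

// Hypothetical helper: maps a hex-encoded SHA-256 digest to its
// two-level sharded location under files/sha256/.
pub fn sharded_path(hash_hex: &str) -> Result<PathBuf, String> {
    let is_lower_hex = |c: char| c.is_ascii_digit() || ('a'..='f').contains(&c);
    if hash_hex.len() != 64 || !hash_hex.chars().all(is_lower_hex) {
        return Err("expected 64 lowercase hexadecimal characters".to_string());
    }
    // Slicing is safe here: the input was verified to be ASCII.
    Ok(PathBuf::from("files")
        .join("sha256")
        .join(&hash_hex[0..2]) // first shard: characters 1-2
        .join(&hash_hex[2..4]) // second shard: characters 3-4
        .join(hash_hex)) // full hash as the filename
}
```

Rejecting anything that is not plain hexadecimal also means the hash can never contain path separators or "..", so it cannot escape the files/ directory.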

File References in the Clinical Repository

Purpose of a File Reference

Clinical artefacts do not embed binary data.

Instead, they include file references which:

Assert that a file exists or existed
Describe the file’s clinical role
Bind the reference immutably in time

File references are small, human-readable, and versioned as part of the CR.

Typical Reference Metadata

A file reference records:

Relative path to the file within files/
Cryptographic hash (SHA-256)
Hash algorithm identifier
Media type (MIME type, best-effort detection)
Original filename
File size in bytes
Storage timestamp (ISO 8601 format)

Example (matching FileMetadata structure):

file_reference:
  hash_algorithm: sha256
  hash: abcdef1234567890abcdef1234567890abcdef1234567890abcdef1234567890
  relative_path: files/sha256/ab/cd/abcdef1234567890abcdef1234567890abcdef1234567890abcdef1234567890
  size_bytes: 1048576
  media_type: application/pdf
  original_filename: discharge-letter.pdf
  stored_at: "2026-01-24T10:30:00Z"

Note: The media_type is detected automatically using file content inspection and should not be considered authoritative for clinical purposes.

Placement Rules

File references are stored where the clinical meaning lives:

Letters, reports, results → referenced from CR artefacts
Workflow or administrative material → referenced from CCR artefacts
Withdrawn or redacted material → referenced from RRR artefacts

The origin of the file (patient, clinician, external organisation) does not determine placement.

Clinical meaning does.

External and Patient-Provided Files

Patient-provided or externally received files follow a simple, explicit workflow:

The file is placed into the CR’s files/ directory
A reference is created in an appropriate artefact
A clinician may later incorporate or reinterpret the material

This mirrors real-world clinical practice (for example “patient brought letter – reviewed”).

Versioning Behaviour

Files are immutable once added (enforced by the FilesService)
New or corrected content results in a new file with a different hash
References are append-only
Historical references remain valid indefinitely
Attempting to store a file with an existing hash returns an error

No reference is silently replaced or overwritten.
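One way to enforce "store once, never overwrite" at the filesystem level is to open the target with create_new, which fails if the file already exists. The sketch below is an assumption about how this could be done, not the FilesService implementation: store_immutable is a hypothetical function, and the caller is assumed to supply an already computed hex hash.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

// Hypothetical helper: writes `bytes` under `files_root` at the sharded
// location for `hash_hex`, refusing to overwrite an existing file.
pub fn store_immutable(files_root: &Path, hash_hex: &str, bytes: &[u8]) -> io::Result<PathBuf> {
    if hash_hex.len() != 64 || !hash_hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "expected a 64-character hex hash"));
    }
    let dir = files_root
        .join("sha256")
        .join(&hash_hex[0..2])
        .join(&hash_hex[2..4]);
    fs::create_dir_all(&dir)?;
    let path = dir.join(hash_hex);
    // create_new(true) returns ErrorKind::AlreadyExists if the file is present,
    // so existing content is never replaced.
    let mut file = OpenOptions::new().write(true).create_new(true).open(&path)?;
    file.write_all(bytes)?;
    Ok(path)
}
```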

Redaction and Removal

VPR does not support silent deletion.

When a file must be withdrawn or redacted:

The reference in CR is explicitly marked as withdrawn or redacted
A tombstone remains in versioned history
The file may be removed from files/ as a separate, explicit action

The Redaction Retention Repository (RRR) always retains evidence that the file once existed.

Why Git Large File Storage Is Not Used

Git Large File Storage (LFS) is not suitable because:

It relies on repository paths rather than actual content identity
It complicates offline and partial copies
It does not align with openEHR-style separation of meaning and identity

Git is used to version clinical meaning, not binary bytes.

Enterprise Deployment and Acceleration (Non-Canonical Layer)

In enterprise deployments, VPR retains the on-disk Clinical Repository (CR) as the canonical source of truth, while performance, scale, and availability are achieved through derived acceleration layers. These include projection databases, indexes, and caches built by continuously reading the canonical CR and materialising fast read models for queries, lists, and search. Large files remain conceptually part of the CR but may be mirrored to object storage for durability and efficient delivery; such storage acts as a distribution and persistence layer, not a new authority. All enterprise components are explicitly rebuildable from the canonical repository, tolerate missing binary bytes, and never accept writes that bypass the CR. This preserves VPR’s laptop-first, openEHR-aligned philosophy while enabling high-throughput, low-latency operation at organisational scale.

Implementation

VPR provides the FilesService (in the vpr_files crate) for managing binary file storage:

Core Operations

add(source_path) — Adds a file to content-addressed storage
- Computes SHA-256 hash
- Creates sharded storage path
- Enforces immutability (errors if hash exists)
- Detects media type automatically
- Returns FileMetadata with all reference information
read(hash) — Retrieves file contents by hash
- Returns file as byte vector (Vec<u8>)
- Suitable for network transmission
- Errors if file not found

Service Characteristics

Repository-scoped: Each service instance is bound to one repository
Defensive: Validates all paths and prevents directory traversal
Stateless: No persistent state beyond filesystem
Safe: All paths canonicalised to prevent symlink attacks

See crates/files/src/files.rs for complete implementation details.

Summary

Each repository stores its own files locally
Files live in a files/ directory alongside versioned content
Files are not tracked by Git
References are explicit, relative, and auditable
Clinical meaning always lives in versioned artefacts

This design keeps VPR simple, portable, openEHR-aligned, and clinically honest.

Redaction Retention Repository (RRR)

1. Purpose

The Redaction Retention Repository (RRR) exists to ensure that patient-related information which has been removed from routine views is retained permanently, safely, and transparently.

RRR supports correctness, accountability, and trust in the VPR system by ensuring that no information is silently lost, while also ensuring that routine clinical, demographic, and care coordination views remain accurate and appropriate for day-to-day use.

2. Scope

The RRR applies to all patient-related information managed by VPR, including but not limited to:

Clinical entries
Demographic data
Care coordination artefacts
Referrals and correspondence
Attachments and structured documents

RRR is not limited to clinical data and is not patient-owned.

3. Core Principles

3.1 Retention, Not Deletion

Information placed into the RRR is never deleted. Retention is the default and permanent state unless explicitly governed by external policy or law.

3.2 Removal from Routine View

Items in the RRR must not appear in routine clinical or operational workflows. Their removal prevents inappropriate use while preserving traceability.

3.3 Neutrality

Placement into the RRR does not imply error, blame, review, or wrongdoing. It reflects a change in suitability for routine display only.

3.4 Transparency and Auditability

All movements into the RRR are recorded, attributable, and inspectable by authorised roles.

4. What “Redaction” Means in VPR

In VPR, redaction means:

Removal of an artefact from routine views while preserving the artefact in full elsewhere.

Redaction does not mean:

deletion,
erasure,
masking of content in-place.

Redaction is a relocation and reclassification operation.

5. Reasons for Redaction

Common reasons an artefact may be placed into the RRR include:

Wrong patient association
Misfiled demographic information
Incorrect referral or care coordination entry
Entered in error
Consent withdrawal
Jurisdictional or policy constraints

Reasons are recorded explicitly and separately from the artefact itself.

6. Relationship to Patient Repositories

When an artefact is redacted:

The artefact is removed from the relevant patient repository’s routine view.
A tombstone or pointer remains in the original location.
The artefact is stored in the RRR with full context and metadata.

The patient repository remains clinically clean while retaining traceability.

7. Access and Authorisation

Access to the RRR is:

Role-based
Audited
Intended for legitimate purposes such as governance, investigation, correction, or legal response

RRR access is expected and normal for authorised roles.

8. What the RRR Is Not

The RRR is not:

A temporary holding area
A review queue
A punishment mechanism
A hidden or secret store
A patient-facing record

9. Lifecycle Overview

Artefact created in a patient repository
Determination made that artefact should not appear in routine view
Redaction action performed
Artefact placed into RRR
Tombstone retained in original context
Artefact remains retained indefinitely

10. Future Considerations

Retention classes and policies
Cross-referencing with corrected or re-associated artefacts
Reporting and metrics on redaction activity
External regulatory access models

11. Summary

The Redaction Retention Repository is a foundational component of VPR that ensures integrity, transparency, and long-term trust in patient records by separating routine use from permanent retention, without loss of information.

Concurrency and Correctness in VPR

Purpose

VPR is a file-based patient record system where each patient record is stored in its own Git repository (for example containing files such as ehr_status.yaml).

In a production deployment, multiple worker processes and multiple servers may handle requests concurrently. This document explains the simple, robust approach used by VPR to ensure:

Only one update to a patient record happens at a time
No updates are lost
Git repositories are never left in an inconsistent or partially-written state
The system remains safe across crashes and restarts

The design intentionally favours correctness and clarity over complexity.

Core Principle

For any given patient, only one writer is allowed at a time, and every write is checked before it is saved.

This is achieved using two layers:

Per-patient serialisation (to decide whose turn it is to write)
Optimistic concurrency checks at the Git layer (to prevent lost updates)

Layer 1: Per-Patient Serialisation

Problem

In a clustered environment, two workers may attempt to update the same patient record at the same time.

Solution

Before making any change, a worker must acquire a per-patient lock from a shared, trusted service (typically the main relational database).

The lock is keyed by patient identifier
Only one worker can hold the lock at a time
Different patients can be updated in parallel
If a worker crashes, the lock is automatically released

This guarantees that, at the system level, only one writer is active for a given patient at any moment.
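The locking semantics can be modelled in a few lines. The sketch below is an in-process stand-in, not the shared database lock the text describes: PatientLocks and PatientLockGuard are hypothetical names, and dropping the guard plays the role of the automatic release that a database session would provide when a worker disconnects or crashes.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

// In-process model of a per-patient lock table (assumption: the real
// lock lives in a shared service such as the relational database).
#[derive(Clone, Default)]
pub struct PatientLocks {
    held: Arc<Mutex<HashSet<String>>>,
}

// Holding this guard means "you may edit this patient now".
pub struct PatientLockGuard {
    held: Arc<Mutex<HashSet<String>>>,
    patient_id: String,
}

impl PatientLocks {
    // Returns a guard if the patient is not currently locked, None otherwise.
    pub fn try_acquire(&self, patient_id: &str) -> Option<PatientLockGuard> {
        let mut held = self.held.lock().unwrap();
        if held.insert(patient_id.to_string()) {
            Some(PatientLockGuard {
                held: Arc::clone(&self.held),
                patient_id: patient_id.to_string(),
            })
        } else {
            None
        }
    }
}

impl Drop for PatientLockGuard {
    // Releasing on drop mirrors the automatic release described above.
    fn drop(&mut self) {
        if let Ok(mut held) = self.held.lock() {
            held.remove(&self.patient_id);
        }
    }
}
```

A second try_acquire for the same patient returns None until the first guard is dropped, while a different patient can be locked in parallel.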

Mental Model

The database acts as a traffic light
Green means “you may edit this patient now”
Red means “wait or retry later”

Layer 2: Git-Based Optimistic Concurrency

Problem

Even with serialisation, extra protection is needed to ensure a write does not overwrite a newer version of the record.

Solution

Git already provides a suitable version check: every commit is identified by a hash of its content and history.

When a worker reads a patient repository, it records the current commit hash
When pushing an update, the worker asserts that the repository is still at that commit
If the repository has moved on, the push is rejected

This prevents:

Lost updates
Silent overwrites
Inconsistent repository state

Mental Model

“Only save my changes if nothing has changed since I last looked.”
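This check is a compare-and-swap on the branch head. The sketch below models it with a plain struct rather than a real Git repository; RepoHead, PushError, and push_if_unchanged are hypothetical names used only for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum PushError {
    // The repository moved on since the caller last read it.
    Stale { expected: String, actual: String },
}

// Toy model of a repository head: just the current commit hash.
pub struct RepoHead {
    commit: String,
}

impl RepoHead {
    pub fn new(commit: &str) -> Self {
        Self { commit: commit.to_string() }
    }

    pub fn current(&self) -> &str {
        &self.commit
    }

    // Moves the head to `new_commit` only if it still points at `expected`.
    pub fn push_if_unchanged(&mut self, expected: &str, new_commit: &str) -> Result<(), PushError> {
        if self.commit != expected {
            return Err(PushError::Stale {
                expected: expected.to_string(),
                actual: self.commit.clone(),
            });
        }
        self.commit = new_commit.to_string();
        Ok(())
    }
}
```

In Git itself, a comparable check is available through git update-ref <ref> <new> <old>, which refuses the update when the ref no longer holds the old value; whether VPR uses that mechanism is not stated here.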

End-to-End Write Flow

For a single patient update, the system follows this sequence:

Acquire the per-patient lock from the shared database
Read the patient Git repository and record the current commit hash
Apply changes locally in an isolated working copy
Create a Git commit containing the update
Push the commit, asserting the expected previous commit hash
Release the per-patient lock

If any step fails, the operation is retried or aborted safely without corrupting the patient record.

Failure and Crash Safety

The system is designed so that failures are safe by default.

If a worker crashes before pushing, the repository is unchanged
If a worker crashes after pushing, the change is already complete
Locks are not permanent and are released automatically
Git guarantees atomic updates of repository state

No manual intervention is required to recover from partial failures.

What This Design Guarantees

At most one writer per patient at any given time
No lost or overwritten updates
No partially-written files
Safe operation across multiple machines
Simple, auditable behaviour

What This Design Intentionally Avoids

At the current scale, VPR does not require:

Distributed consensus systems
Message queues for write coordination
Shared filesystem locks
Complex conflict resolution logic

These may be introduced later if throughput demands increase, but are not necessary for correctness.

Summary

VPR ensures correctness by combining:

A shared per-patient lock to serialise writes
Git commit checks to prevent overwriting newer data

This approach is intentionally boring, und

Git versioning and commit signatures

VPR stores each patient record as files on disk, and uses a Git repository per patient directory to version changes. This enables history, diffs, and (optionally) cryptographic signing of commits.

Immutability and Audit Trail Philosophy

Core Principle: Nothing is Ever Deleted

VPR maintains a completely immutable audit trail. Nothing is ever truly deleted from the version control history. This fundamental design choice ensures:

Patient Safety: Every change is traceable to a specific author at a specific time
Legal Compliance: Complete audit trail meets regulatory requirements
Clinical Governance: Full accountability for all modifications
Research and Quality: Historical data remains available for authorized retrospective analysis

Commit Actions and Their Meaning

VPR uses a controlled vocabulary for commit actions, each with specific semantics:

`Create`

Used when adding new content to a record, or when creating the record itself. Examples:

Creating a new clinical letter
Adding a new observation
Recording a new diagnosis
Initializing a new patient record

This is the most common action for new data entry.

`Update`

Used when modifying existing content. Examples:

Correcting a typo in a letter
Updating patient demographics (address change, name change)
Linking demographics to clinical records
Amending administrative details

The previous version remains in Git history and can be compared via diff.

`Superseded`

Used when newer clinical information makes previous content obsolete. Examples:

A revised diagnosis based on new test results
An updated care plan
Replacement of preliminary findings with final results

This is distinct from Update as it represents a clinical decision that previous information should be replaced rather than corrected. The superseded content remains in history but is marked as no longer current for clinical decision-making.

`Redact`

Used when data was entered into the wrong patient’s repository by mistake. This can occur in any of the three repositories: clinical, demographics, or coordination. This is the only action that removes data from active view. The process:

Data is removed from the patient’s active record
Data is encrypted and moved to the Redaction Retention Repository
A non-human-readable tombstone/pointer remains in the Git history
The commit message records the redaction action for audit purposes

Even redacted data is preserved in secure storage and remains accessible to authorized auditors, ensuring complete traceability while protecting patient privacy.
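The controlled vocabulary maps naturally onto an enum. This is a sketch only: the CommitAction type name and both methods are assumptions, while the four variant names and the rule that only Redact removes data from active view come from the descriptions above.

```rust
// Hypothetical representation of the commit action vocabulary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommitAction {
    Create,
    Update,
    Superseded,
    Redact,
}

impl CommitAction {
    // Label as it might appear in a structured commit message (assumed format).
    pub fn as_str(self) -> &'static str {
        match self {
            CommitAction::Create => "Create",
            CommitAction::Update => "Update",
            CommitAction::Superseded => "Superseded",
            CommitAction::Redact => "Redact",
        }
    }

    // Per the text, Redact is the only action that removes data from active view.
    pub fn removes_from_active_view(self) -> bool {
        matches!(self, CommitAction::Redact)
    }
}
```

Because the match in as_str is exhaustive, adding a new action forces every such mapping to be revisited at compile time.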

What This Means in Practice

Every change is preserved: Git commits form an unbroken chain from initialization to present
Diffs show what changed: You can compare any two points in time
Authors are accountable: Each commit carries author metadata and can optionally be cryptographically signed
No data loss: Even mistakes are preserved in history, allowing forensic analysis if needed
Audit compliance: Regulators can verify that no data has been improperly deleted

Where Git repos live

Clinical records are stored under the sharded directory structure:

patient_data/clinical/<s1>/<s2>/<32-hex-uuid>/

That patient directory is initialised as a Git repository (.git/ lives inside it).

Initial commit creation

When a new clinical record is created:

VPR copies the clinical-template/ directory into the patient directory.
VPR writes the initial ehr_status.yaml.
VPR stages all files (excluding .git/) and writes a tree.
VPR creates the initial commit.

The implementation lives in crates/core/src/clinical.rs in ClinicalService::initialise.

Branch behaviour (`main`)

Signed commits are created with git2::Repository::commit_signed. A key detail of libgit2 is:

commit_signed creates the commit object but does not update any refs (no branch ref, no HEAD update).

To ensure the repo behaves like a normal Git repo, VPR explicitly:

sets HEAD to refs/heads/main before creating the first commit, and
after the signed commit is created, creates/updates refs/heads/main to point at that commit and points HEAD to it.

Result: clinical repos “land on” the main branch.

How signing works

If Author.signature is provided during initialisation, VPR signs the initial commit using ECDSA P-256.

Payload: the unsigned commit buffer produced by Repository::commit_create_buffer.
- This is the exact byte payload that must be signed to match what commit_signed expects.
Algorithm: ECDSA over P-256 (p256 crate).
Signature encoding:
- VPR uses the raw 64-byte signature format (r || s) and base64-encodes it.
- This base64 string is passed to commit_signed and ends up stored in the commit header field gpgsig.

Notes:

Despite the gpgsig name, this is not a GPG signature; it is an ECDSA signature stored in that header field.
VPR currently focuses on “is this commit cryptographically valid for this key?”, not on GPG identity chains.

How verification works

VPR can verify that a commit was signed by the private key corresponding to a provided public key.

Verification steps (implemented in ClinicalService::verify_commit_signature):

Open the patient Git repo.
Resolve the latest commit from HEAD.
Read the gpgsig header field from the commit.
Normalise it (handle whitespace wrapping), base64-decode it, and parse as a P-256 ECDSA signature.
Recreate the unsigned commit buffer with commit_create_buffer using the commit’s tree/parents/author/committer/message.
Verify the signature over that recreated buffer using the provided public key.
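The normalise-and-decode step can be illustrated without any cryptography. The sketch below covers only that step: it strips whitespace, decodes standard base64, and splits the result into the 32-byte r and s halves. The function names are hypothetical, the hand-written decoder is a deliberately lenient stand-in for a proper base64 library, and the ECDSA verification itself is out of scope.

```rust
// Decodes standard base64, ignoring ASCII whitespace and stopping at '=' padding.
pub fn decode_base64(input: &str) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    let mut buffer: u32 = 0; // holds bits not yet emitted
    let mut bits: u32 = 0; // number of valid bits in `buffer`
    for c in input.chars().filter(|c| !c.is_ascii_whitespace()) {
        let value = match c {
            'A'..='Z' => c as u32 - 'A' as u32,
            'a'..='z' => c as u32 - 'a' as u32 + 26,
            '0'..='9' => c as u32 - '0' as u32 + 52,
            '+' => 62,
            '/' => 63,
            '=' => break,
            other => return Err(format!("invalid base64 character: {other:?}")),
        };
        buffer = (buffer << 6) | value;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
            buffer &= (1 << bits) - 1; // keep only the leftover bits
        }
    }
    Ok(out)
}

// Splits a raw 64-byte signature (r || s) into its two 32-byte halves.
pub fn parse_raw_signature(header: &str) -> Result<([u8; 32], [u8; 32]), String> {
    let bytes = decode_base64(header)?;
    if bytes.len() != 64 {
        return Err(format!("expected 64 signature bytes, got {}", bytes.len()));
    }
    let mut r = [0u8; 32];
    let mut s = [0u8; 32];
    r.copy_from_slice(&bytes[..32]);
    s.copy_from_slice(&bytes[32..]);
    Ok((r, s))
}
```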

Important behaviour:

Verification currently requires a valid HEAD (it does not attempt to recover commits from an unborn branch).
The verifier accepts either:
- a PEM-encoded public key, or
- a PEM-encoded X.509 certificate (.crt), in which case the EC public key is extracted and used.

CLI usage

The CLI exposes verification as:

vpr verify-clinical-commit-signature <clinical_uuid> <public_key_or_cert>

Examples:

vpr verify-clinical-commit-signature 572ae9ebde8c480ba20b359f82f6c2e7 dr_smith.crt
vpr verify-clinical-commit-signature 572ae9ebde8c480ba20b359f82f6c2e7 ./dr_smith_public_key.pem

What this does (and does not) prove

This verification proves:

the commit’s signature is mathematically valid for the provided public key, over the exact commit payload VPR signs.

It does not (by itself) prove:

that a certificate is trusted (no chain/CA validation),
that the author identity is “real” (it’s still a local signature check).

VPR - Versioned Patient Repository

Install pre-commit hooks

pre-commit install

Install Rust locally if you want to test on your local machine

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Start a new terminal to be able to use Rust

Install protobuf compiler

brew install protobuf

Build

cargo build

Nuke Docs

As the docs are served from a cache, you will likely need to nuke the docs cache if you remove files. To do so, manually run nuke docs cache (manual) from GitHub Actions.

Future: Database Projections

Note: The following database benchmarks and setup instructions are for planned future implementation of database projections (Postgres) and caching (Redis) for performance optimisation. The current system uses file-based storage only.

Time trial benchmarks

Preliminary benchmarks comparing Postgres vs Git for single entry operations:

Postgres: 22.45 ops/sec
Git: 8.11 ops/sec

Postgres is approximately 2.8 times faster for these operations (22.45 / 8.11 ≈ 2.77).

Postgres setup (for future implementation)

brew install hyperfine
brew install postgresql@16
brew services start postgresql@16
PGURL="postgres://user:pass@localhost:5432/postgres" N=10000 ./file_db_time_trial.sh
createuser -s postgres || true
psql -U postgres -c "ALTER USER postgres WITH PASSWORD 'postgres';" || true

Test VPR server

With server reflection enabled (set VPR_ENABLE_REFLECTION=true), you can use:

grpcurl -plaintext -d '{}' localhost:50051 vpr.v1.VPR/Health

To describe the service using reflection:

grpcurl -plaintext localhost:50051 describe vpr.v1.VPR

You can inspect a specific message type like this:

grpcurl -plaintext localhost:50051 describe .vpr.v1.CreatePatientReq

Or with the proto file (without reflection):

grpcurl -plaintext \
  -import-path crates/api/proto \
  -proto crates/api/proto/vpr/v1/vpr.proto \
  -d '{}' \
  localhost:50051 vpr.v1.VPR/Health

Note: Server reflection is disabled by default for security in production. Set VPR_ENABLE_REFLECTION=true to enable it.

CLI

The VPR command-line interface (CLI) provides comprehensive tools for managing patient records, including demographics, clinical data, and care coordination.

Usage

Inside the ‘vpr-dev’ Docker container or after building the vpr-cli crate:

vpr --help

Available Commands

Patient Management

list - Lists all patients in the system
initialise-full-record - Creates a complete patient record (demographics, clinical, and coordination repositories)

Demographics

initialise-demographics - Initialises a new demographics repository
update-demographics - Updates demographic information (given names, last name, birth date)

Clinical Records

initialise-clinical - Initialises a new clinical repository
write-ehr-status - Links clinical repository to demographics by writing EHR status file
new-letter - Creates a new clinical letter with markdown content
new-letter-with-attachments - Creates a new letter with file attachments
read-letter - Reads and displays a clinical letter
get-letter-attachments - Retrieves attachments for a letter

Care Coordination

initialise-coordination - Initialises a new coordination repository linked to clinical record
create-thread - Creates a new messaging thread
add-message - Adds a message to an existing thread
read-communication - Reads a communication thread with all messages
update-communication-ledger - Updates ledger (participants, status, visibility)
update-coordination-status - Updates lifecycle status and flags

Security

create-certificate - Creates a professional registration certificate with X.509 encoding
verify-clinical-commit-signature - Verifies cryptographic signature on latest clinical commit

Development

delete-all-data - DEV ONLY: Deletes all patient data (requires DEV_ENV=true)

Common Options

Author Registration

Many commands support professional registrations using the --registration flag, which can be repeated:

--registration "GMC" "1234567" --registration "NMC" "98765"

Digital Signatures

Commands that modify records support optional digital signatures using the --signature flag:

--signature <ecdsa_private_key_pem>

The private key can be provided as PEM text, base64-encoded PEM, or a file path.

Example Workflows

Creating a Complete Patient Record

# 1. Create full record
vpr initialise-full-record "Emily" "Davis" "1985-03-20" \
  "Dr. Robert Brown" "robert.brown@example.com" "Clinician" "City Hospital"

# Outputs: Demographics UUID, Clinical UUID, Coordination UUID

Adding a Letter

vpr new-letter <clinical_uuid> "Dr. Sarah Johnson" "sarah.johnson@example.com" \
  --role "Clinician" \
  --care-location "GP Clinic" \
  --content "# Clinical Note\n\nPatient assessment..."

Adding a Letter with Attachments

vpr new-letter-with-attachments <clinical_uuid> \
  "Dr. Michael Chen" "michael.chen@example.com" \
  --role "Clinician" \
  --care-location "Hospital Laboratory" \
  --attachment-file "/path/to/lab_results.pdf"

Creating a Communication Thread

vpr create-thread <coordination_uuid> "Dr. Brown" "brown@example.com" \
  --role "Clinician" \
  --care-location "City Hospital" \
  --participant "<clinical_uuid>" "clinician" "Dr. Brown" \
  --participant "<demographics_uuid>" "patient" "Emily Davis" \
  --initial-message "Initial consultation scheduled."

Adding Messages to a Thread

vpr add-message <coordination_uuid> <thread_id> \
  "Nurse Wilson" "wilson@example.com" \
  --role "Clinician" \
  --care-location "City Hospital" \
  --message-type "clinician" \
  --message-body "Patient vitals recorded." \
  --message-author-id "<clinician_uuid>" \
  --message-author-name "Nurse Wilson"

Getting Help

For detailed help on any command:

vpr <command> --help

Large Language Model (LLM) Documentation Index

AI Contributor Guidance
LLM Specification
LLM Roadmap

VPR — AI contributor notes

These notes are for automated coding agents and should be short, concrete, and codebase-specific.

Specifications live in spec.md; roadmap is tracked in roadmap.md. Keep this document consistent with those sources.

Overview

Purpose: VPR is a file-based patient record system with Git-like versioning, built as a Rust Cargo workspace. It provides dual gRPC and REST APIs for health checks and patient creation. The system stores patient data as JSON/YAML files in a sharded directory structure under patient_data/, with each patient having their own Git repositories (clinical, demographics, and coordination) for version control.
Key crates:
- crates/core (vpr-core) — PURE DATA OPERATIONS ONLY: File/folder management, patient data CRUD, Git versioning with X.509 commit signing. NO API concerns (authentication, HTTP/gRPC servers, service interfaces).
- crates/api-shared — Shared utilities and definitions for both APIs: Protobuf types, HealthService, authentication utilities.
- crates/api-grpc — gRPC-specific implementation: VprService, authentication interceptors, tonic integration.
- crates/api-rest — REST-specific implementation: HTTP endpoints, OpenAPI/Swagger, axum integration.
- crates/certificates (vpr-certificates) — Digital certificate generation utilities: X.509 certificate creation for user authentication and commit signing.
- crates/cli (vpr-cli) — Command-line interface: CLI tools for patient record management and certificate generation.
Main binary: vpr-run (defined in root Cargo.toml), runs both gRPC (port 50051) and REST (port 3000) servers concurrently using tokio::join.

Important files to reference

src/main.rs — Main binary that performs startup validation (checks for patient_data, clinical-template directories; creates clinical/demographics/coordination subdirs), creates runtime constants, and starts both gRPC (port 50051) and REST (port 3000) servers concurrently using tokio::join.
crates/core/src/lib.rs — PURE DATA OPERATIONS: Services for file/folder operations (sharded storage, directory traversal, Git repos per patient). NO API CODE.
crates/core/src/config.rs — CoreConfig and helpers used to resolve/validate configuration once at startup.
crates/core/src/clinical.rs — ClinicalService: Initialises patients with clinical template copy, creates Git repo, signs commits with X.509.
crates/core/src/demographics.rs — DemographicsService: Updates patient demographics JSON, lists patients via directory traversal.
crates/api-grpc/src/service.rs — gRPC service implementation (VprService) with authentication, using core services.
crates/api-shared/vpr.proto — Canonical protobuf definitions for VPR service (note: national_id field present but unused in current impl).
crates/api-shared/src/health.rs — Shared HealthService used by both gRPC and REST APIs.
Justfile — Developer commands: just start-dev (Docker dev), just docs (mdBook site), just pre-commit.
compose.dev.yml — Development Docker setup with cargo-watch live reload and healthcheck (grpcurl -plaintext localhost:50051 list && curl -f http://localhost:3000/health).
scripts/check-all.sh — Quality checks: cargo fmt --check, cargo clippy -D warnings, cargo check, cargo test.
docs/src/overview.md — Detailed project overview and architecture.

Build and test workflows (concrete)

Local quick compile: cargo build -p api-grpc (or cargo run for full binary).
Full workspace checks: ./scripts/check-all.sh (runs fmt, clippy, check, test).
Docker dev runtime: just start-dev or docker compose -f compose.dev.yml up --build.
Healthcheck: grpcurl -plaintext localhost:50051 list (gRPC) and curl http://localhost:3000/health (REST).
Documentation: just docs serves mdBook site with integrated rustdoc.

Conventions and patterns to follow

Protobufs: Canonical proto in crates/api-shared/vpr.proto; generated Rust in api_shared::pb via build script.
Service wiring: crates/api-grpc implements VprService using core services; binaries construct it via VprService::new(Arc<CoreConfig>).
Patient storage: Sharded under the configured patient data directory (default: patient_data):
- Clinical: clinical/<s1>/<s2>/<32hex-uuid>/ (ehr_status.yaml, copied clinical-template files, Git repo)
- Demographics: demographics/<s1>/<s2>/<32hex-uuid>/patient.json (FHIR-like JSON, Git repo)
- Coordination: coordination/<s1>/<s2>/<32hex-uuid>/ (Care Coordination Repository: encounters, appointments, episodes, referrals; Git repo) where s1/s2 are first 4 hex chars of UUID.
APIs: Dual gRPC/REST with identical functionality; REST uses axum, utoipa for OpenAPI.
Logging: tracing with RUST_LOG env var (e.g., vpr=debug).
Error handling: tonic Status for gRPC, axum StatusCode for REST; internal errors logged with tracing::error!.
File I/O: Direct std::fs operations with serde_json/serde_yaml for patient data; no database layer.
Git versioning: Each patient directory is a Git repo; commits signed with X.509 certificates from author.signature.
Clinical template: templates/clinical/ directory copied to new patient clinical dirs; validated at startup.
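
The sharded layout above can be sketched as a small path builder. This is a minimal sketch under stated assumptions: `CanonicalUuid` is a hypothetical stand-in for the project's `ShardableUuid`, the canonical form is assumed to be 32 lowercase hex characters, and `s1`/`s2` are assumed to be the first and second pairs of hex characters.

```rust
use std::path::{Path, PathBuf};

/// The three per-patient repositories named in the layout above.
#[derive(Debug, Clone, Copy)]
pub enum RepoKind {
    Clinical,
    Demographics,
    Coordination,
}

impl RepoKind {
    fn dir_name(self) -> &'static str {
        match self {
            RepoKind::Clinical => "clinical",
            RepoKind::Demographics => "demographics",
            RepoKind::Coordination => "coordination",
        }
    }
}

/// Hypothetical stand-in for a validated, canonical 32-hex UUID.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CanonicalUuid(String);

impl CanonicalUuid {
    /// Accepts exactly 32 lowercase hexadecimal characters (assumed canonical form).
    pub fn parse(s: &str) -> Result<Self, String> {
        let ok = s.len() == 32
            && s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b));
        if ok {
            Ok(Self(s.to_owned()))
        } else {
            Err(format!("not a canonical 32-hex UUID: {s:?}"))
        }
    }

    /// Builds `<root>/<kind>/<s1>/<s2>/<uuid>`.
    /// Assumption: s1 is hex chars 0..2 and s2 is hex chars 2..4.
    pub fn sharded_dir(&self, root: &Path, kind: RepoKind) -> PathBuf {
        root.join(kind.dir_name())
            .join(&self.0[0..2])
            .join(&self.0[2..4])
            .join(&self.0)
    }
}
```

Because the identifier is validated before any path is built, strings such as `../../etc/passwd` never reach the filesystem layer.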

Runtime configuration and environment variables

Resolve environment variables once at process startup (or CLI startup) and pass configuration down.
- Create a vpr_core::CoreConfig (see crates/core/src/config.rs) in the binary entrypoints:
  - src/main.rs (vpr-run)
  - crates/api-grpc/src/main.rs (standalone gRPC)
  - crates/api-rest/src/main.rs (standalone REST)
  - crates/cli/src/main.rs (CLI)
- Typical env inputs: PATIENT_DATA_DIR, VPR_CLINICAL_TEMPLATE_DIR, RM_SYSTEM_VERSION, VPR_NAMESPACE.
- Use the helpers in crates/core/src/config.rs to resolve/validate template and parse the RM version.
crates/core (vpr-core) must not read environment variables during operations.
- Do not call std::env::var in core service methods or helpers.
- Prefer constructors like ClinicalService::new(Arc<CoreConfig>) for uninitialised state, or ClinicalService::with_id(Arc<CoreConfig>, Uuid) for initialised state. Same for DemographicsService::new(Arc<CoreConfig>).
- This avoids rare-but-real process-wide env races and keeps behaviour consistent within a request.
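
The "resolve once, pass down" rule can be illustrated with a small sketch. `SketchConfig`, `resolve_config`, and `SketchService` are hypothetical names; the real `CoreConfig` has its own fields and helpers, and treating `VPR_NAMESPACE` as mandatory is an assumption made only for the example.

```rust
use std::path::PathBuf;
use std::sync::Arc;

/// Illustrative stand-in for `CoreConfig`; the real struct's fields may differ.
#[derive(Debug)]
pub struct SketchConfig {
    pub patient_data_dir: PathBuf,
    pub namespace: String,
}

/// Resolves configuration once from a lookup function.
/// A binary would pass `|k| std::env::var(k).ok()`; tests pass a closure.
pub fn resolve_config(
    lookup: impl Fn(&str) -> Option<String>,
) -> Result<Arc<SketchConfig>, String> {
    // Documented default for the patient data directory.
    let patient_data_dir = lookup("PATIENT_DATA_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("patient_data"));
    // Assumption for this sketch: the namespace is required and must be non-empty.
    let namespace = lookup("VPR_NAMESPACE")
        .filter(|ns| !ns.trim().is_empty())
        .ok_or_else(|| "VPR_NAMESPACE must be set and non-empty".to_string())?;
    Ok(Arc::new(SketchConfig { patient_data_dir, namespace }))
}

/// Core-style service: holds the shared config and never reads the environment.
pub struct SketchService {
    cfg: Arc<SketchConfig>,
}

impl SketchService {
    pub fn new(cfg: Arc<SketchConfig>) -> Self {
        Self { cfg }
    }

    pub fn clinical_root(&self) -> PathBuf {
        self.cfg.patient_data_dir.join("clinical")
    }
}
```

Passing the lookup as a closure keeps the environment read at the edge of the program and makes the resolution logic testable without mutating process-wide state.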

Defensive programming (clinical safety)

Treat defensive programming as a non-negotiable requirement.
Validate inputs and configuration early and fail fast (arguments, resolved startup configuration, parsed identifiers) before doing filesystem/Git side effects.
Prefer bounded work over unbounded behaviour (retry limits, traversal depth, file counts/sizes, timeouts where applicable).
Avoid silent fallbacks and “best effort” behaviour in core logic: return a typed error when something is invalid.
Avoid panic!/expect() on paths influenced by inputs or environment; reserve them for internal invariants only.
When partial work has occurred, attempt cleanup/rollback and do not ignore cleanup failures.
Strong static typing: Leverage Rust’s type system to encode invariants and prevent errors at compile time. Use wrapper types to represent validated data (e.g., ShardableUuid for canonical UUIDs, Author for validated commit authors). Avoid stringly-typed data, primitive obsession, and runtime checks where types can express constraints. Prefer newtype patterns and distinct types over raw strings, integers, or booleans when domain concepts have specific rules.
Formatting: All Rust code MUST follow cargo fmt standards. Before completing any changes, run cargo fmt on the workspace. Do not commit code that fails cargo fmt --check. The project uses rustfmt.toml for consistent formatting enforced by pre-commit hooks.
Spelling: Use British English (en-GB) for documentation and other prose (mdBook pages, README, Rustdoc/comments).
Documentation style:
- Use Rustdoc (doc comments) with standard section headings.
- For functions/methods (including private helpers), include clear # Arguments, # Returns, and # Errors sections when applicable.
  - Include # Arguments for all methods with parameters (public or private), documenting what each parameter represents.
  - Include # Returns for all methods that return non-unit values (public or private), describing what is returned.
  - If there are no arguments/meaningful return value/no error conditions to document, omit the empty section.
  - For # Errors, prefer a short, grouped bullet list describing the conditions under which an error is returned (not an exhaustive list of enum variants).
    - Use the form: Returns <ErrorType> if: then - ... bullets.
    - Group by category when helpful (validation/config, filesystem I/O, serialisation, Git, crypto).
- For each module, start the file with //! module-level Rustdoc that outlines what the module does and what it is intended to do.
- Documentation examples: In Rust, documentation examples are executable doctests and should be used deliberately, not everywhere by default. Examples are encouraged when they clarify lifecycle rules, state transitions, ordering constraints, or non-obvious correct usage, as they act as part of the correctness and safety contract of the code. Avoid adding examples to trivial helpers or internal plumbing where the signature is self-explanatory. Prefer a small number of minimal, focused examples that encode important invariants rather than repetitive or decorative usage snippets.
Imports and naming:
- Prefer adding clear use imports (for example, use crate::uuid::ShardableUuid;) rather than repeating long paths like crate::... throughout the file.
- Prefer calling imported items directly (e.g. copy_dir_recursive(...)) instead of qualifying call sites with crate::copy_dir_recursive(...).
  - Exception: keep fully-qualified paths only when needed to disambiguate names.
- For constants, prefer importing the specific items by name (for example use crate::constants::{EHR_STATUS_FILENAME, LATEST_RM};) so call sites don’t need constants::... prefixes.
- Avoid glob imports (use crate::foo::*;) unless there is a strong reason.
- Keep imports scoped to what the file uses; remove unused imports to satisfy clippy -D warnings.
- If two imports would conflict, use explicit renaming (use crate::thing::Type as ThingType;) rather than falling back to fully-qualified paths everywhere.
Architecture boundaries:
- core: ONLY file/folder/git operations (ClinicalService, DemographicsService, data persistence)
- api-shared: Shared API utilities (HealthService, auth, protobuf types)
- api-grpc: gRPC-specific concerns (service implementation, interceptors)
- api-rest: REST-specific concerns (HTTP endpoints, JSON handling)
- main.rs: Startup validation (patient_data, clinical-template dirs), runtime constants, service orchestration
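
To make the documentation-style and bounded-work guidance in this section concrete, here is a small sketch of a retry helper documented with the `# Arguments`, `# Returns`, and `# Errors` headings. `retry_bounded` and `RetryError` are hypothetical names, not part of the VPR codebase, and no back-off delay is modelled.

```rust
/// Error returned when a bounded retry gives up.
#[derive(Debug, PartialEq, Eq)]
pub enum RetryError<E> {
    /// `max_attempts` was zero, so no work was attempted.
    InvalidLimit,
    /// Every attempt failed; carries the last error.
    Exhausted { attempts: u32, last: E },
}

/// Runs `op` until it succeeds or `max_attempts` is reached.
///
/// # Arguments
///
/// * `max_attempts` - Upper bound on how many times `op` is called.
/// * `op` - Fallible operation; receives the 1-based attempt number.
///
/// # Returns
///
/// The first successful value produced by `op`.
///
/// # Errors
///
/// Returns `RetryError` if:
/// - `max_attempts` is zero (validation),
/// - every attempt fails (the last error is preserved).
pub fn retry_bounded<T, E>(
    max_attempts: u32,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, RetryError<E>> {
    if max_attempts == 0 {
        return Err(RetryError::InvalidLimit);
    }
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(last) if attempt == max_attempts => {
                return Err(RetryError::Exhausted { attempts: attempt, last });
            }
            Err(_) => attempt += 1,
        }
    }
}
```

The limit is validated before any work is done, and exhaustion is reported as a typed error rather than a silent fallback.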

Testing boundaries

Test where the rule lives:
- If a function implements validation rules (for example Author::validate_commit_author), write exhaustive unit tests for each failure mode and a success case.
- If a function merely calls validation (for example ClinicalService::initialise calling author.validate_commit_author()?), write only wiring tests:
  - validation errors are returned unchanged,
  - no side effects occur when validation fails.
Prefer true unit tests (no filesystem/Git/network) where possible; use TempDir-backed tests only for integration-level behaviour (directory layout, Git repo creation, template copying).
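
The split between rule tests and wiring tests might look like the sketch below. The validation rule (a non-blank name), the `AuthorError` type, and the `initialise` function are simplified, hypothetical stand-ins; a `Vec<String>` plays the role of filesystem/Git side effects so the tests stay pure.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum AuthorError {
    EmptyName,
}

/// The rule lives here, so this function gets the exhaustive unit tests.
pub fn validate_author_name(name: &str) -> Result<(), AuthorError> {
    if name.trim().is_empty() {
        return Err(AuthorError::EmptyName);
    }
    Ok(())
}

/// Caller that merely invokes validation before its side effect.
/// `log` stands in for filesystem/Git writes.
pub fn initialise(name: &str, log: &mut Vec<String>) -> Result<(), AuthorError> {
    validate_author_name(name)?;
    log.push(format!("initialised by {name}"));
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    // Rule tests: each failure mode plus a success case.
    #[test]
    fn validation_rejects_blank_names() {
        assert_eq!(validate_author_name(""), Err(AuthorError::EmptyName));
        assert_eq!(validate_author_name("   "), Err(AuthorError::EmptyName));
        assert_eq!(validate_author_name("Dr Example"), Ok(()));
    }

    // Wiring test: the error is returned unchanged and nothing was written.
    #[test]
    fn initialise_propagates_error_without_side_effects() {
        let mut log = Vec::new();
        assert_eq!(initialise("", &mut log), Err(AuthorError::EmptyName));
        assert!(log.is_empty());
    }
}
```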

Change policy and safety

Prefer minimal, well-scoped PRs updating single crates or modules.
Run ./scripts/check-all.sh before proposing changes; fix clippy warnings.
When changing protos: Update crates/api-shared/vpr.proto, regenerate with cargo build.
Patient data paths: Hardcoded sharding logic in core; avoid changing without testing directory traversal in list_patients.
Environment config: Env vars are read in binaries/CLI at startup to build CoreConfig; avoid adding env reads to crates/core.
Proto fields: Some fields (e.g., national_id) present but unused in current implementation.

Examples (copyable snippets)

Start dev servers: just start-dev b (builds and runs Docker containers).
Health check: grpcurl -plaintext localhost:50051 vpr.v1.VPR/Health or curl http://localhost:3000/health.
Create patient: grpcurl -plaintext -d '{"first_name":"John","last_name":"Doe"}' localhost:50051 vpr.v1.VPR/CreatePatient.
List patients: grpcurl -plaintext localhost:50051 vpr.v1.VPR/ListPatients or curl http://localhost:3000/patients.

Edge cases for automated edits

Do not change workspace members in root Cargo.toml without verifying all crates build.
Avoid altering patient directory sharding in core/src/lib.rs — list_patients relies on exact structure.
main.rs runs both servers; changes must maintain concurrency (tokio::join).
Docker mounts ./patient_data for persistence; test with actual file creation/deletion.
Clinical template validation: clinical-template/ must exist and contain files; clinical init copies it recursively.

If unsure, ask for clarification and provide a short plan: files to change, tests to add, and commands you will run to validate.


LLM Specification (Draft)

Purpose and Scope

Define how LLM tooling supports the VPR project while respecting safety, auditability, and architecture boundaries.
Focus on assistant-driven code/docs changes and developer workflows; avoid introducing runtime LLM features.
Keep this spec aligned with docs/src/llm/copilot-instructions.md (canonical guidance for AI contributors).

System Context

VPR is a Rust Cargo workspace delivering dual gRPC/REST services plus a CLI over a file-based, Git-versioned patient record store.
Core data operations live in crates/core; transports live in crates/api-grpc and crates/api-rest; shared proto/auth/health in crates/api-shared; certificate utilities in crates/certificates; CLI in crates/cli.
Patient data is stored on disk, sharded by UUID under patient_data/, with separate clinical, demographics, and coordination repos per patient and Git history for audit.
Future separation: we may split the core VPR code into its own library crate; it must remain independent of organisational layers (security, APIs, enterprise back-office). Core must not depend on organisational code, but organisational layers may depend on core.

Patient-Centred Posture

Put the patient at the centre of every decision: safety, clarity, and agency outweigh convenience.
Treat the combination of patient-first intent and human-readable files as the keystone: files remain the canonical, inspectable record that patients and clinicians can understand and carry.
Support two deployment shapes: (a) patient/self-hosted mode on a personal machine using CLI and a simple UX (to be built later in the epics), and (b) enterprise/organisation mode serving hundreds or thousands of patients.
Design interfaces and logging with the assumption that patients may access their own records; avoid leaking PHI in tooling output while keeping auditability for clinical and organisational users.
Keep on-disk formats human-readable (YAML and Markdown with front matter) so patients and clinicians can inspect history; use JSON only for internet-facing APIs (REST/gRPC) where required.

LLM Responsibilities (assistant mode)

Follow canonical contributor instructions: defensive programming, British English docs, architecture boundaries, startup config resolution in binaries only.
Generate scoped changes with clear rationale, minimal blast radius, and accompanying tests when behaviour changes.
Keep docs consistent across mdBook sources (docs/src/**) and README; prefer linking to canonical sources instead of duplicating.

Data and Storage Invariants

Sharded directories: patient_data/clinical/<s1>/<s2>/<uuid>/, patient_data/demographics/<s1>/<s2>/<uuid>/, and patient_data/coordination/<s1>/<s2>/<uuid>/ (s1/s2 are first 4 hex chars).
Clinical repo seeded from validated clinical template directory (no symlinks; depth/size limits enforced); demographics repo holds FHIR-like patient.json; coordination repo (Care Coordination Repository) holds encounters, episodes, appointments, and referrals.
Git repos per patient with signed commits (ECDSA P-256) where configured; single branch main.
Clinical ehr_status links to demographics via external reference; coordination entries reference both clinical and demographics records.

API Surfaces (high level)

gRPC service (tonic): health, patient creation, patient listing; API key interceptor expected on gRPC.
REST service (axum/utoipa): mirrors gRPC behaviour; Swagger/OpenAPI exposed; currently open by default unless otherwise configured.
Health endpoints on both transports; reflection optional for gRPC.

Configuration and Startup

Env resolved once at startup in binaries/CLI, then passed via CoreConfig: PATIENT_DATA_DIR, VPR_CLINICAL_TEMPLATE_DIR, RM_SYSTEM_VERSION, VPR_NAMESPACE, API key, bind addresses, reflection flag, dev guard for destructive CLI.
Startup flow (vpr-run): validate patient_data and template dirs, ensure shard subdirs exist (clinical, demographics, coordination), build config, launch REST and gRPC concurrently with tokio::join.
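
The directory part of the startup flow can be sketched with the standard library alone. This is an illustration, not the code in src/main.rs: `ensure_shard_subdirs` is a hypothetical name, and template validation and server launch are left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Subdirectories named in the startup flow above.
const SHARD_SUBDIRS: [&str; 3] = ["clinical", "demographics", "coordination"];

/// Fails fast if `root` is missing, then creates any absent shard subdirs.
pub fn ensure_shard_subdirs(root: &Path) -> io::Result<Vec<PathBuf>> {
    if !root.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("patient data directory not found: {}", root.display()),
        ));
    }
    let mut dirs = Vec::with_capacity(SHARD_SUBDIRS.len());
    for name in SHARD_SUBDIRS {
        let dir = root.join(name);
        // `create_dir_all` succeeds if the directory already exists.
        fs::create_dir_all(&dir)?;
        dirs.push(dir);
    }
    Ok(dirs)
}
```

The root directory is checked rather than created, so a misconfigured path is reported at startup instead of silently producing an empty store.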

Safety and Quality Bar

Fail fast on invalid config/inputs; avoid panics on input-driven paths; no silent fallbacks.
Respect architecture boundaries: crates/core must not read env; transports handle auth and request wiring.
Strong static typing: Prefer type safety over runtime checks. Use distinct types to encode invariants (e.g., ShardableUuid for canonical UUIDs, Author for validated commit authors). Avoid stringly-typed data and primitive obsession; let the type system catch errors at compile time.
Add tests where rules live; wiring tests ensure errors propagate and side effects do not occur on failure.
Use British English in prose and Rustdoc; prefer module-level //! docs and function docs with # Arguments, # Returns, # Errors when applicable.

Security Expectations (for LLM-driven changes)

Default to least privilege and minimise new attack surface; do not introduce new network listeners, env reads in crates/core, or unsafe defaults.
Keep authentication posture aligned with project decisions: API key for gRPC (and REST when enabled); defer mTLS or other mechanisms to explicit user approval.
Handle secrets safely in code and docs: avoid logging API keys, certificates, or patient identifiers; redact in logs and examples.
Preserve commit-signing and integrity paths: do not weaken signature requirements or verification flows without explicit agreement.
Avoid introducing PHI (Protected Health Information) into logs, test fixtures, or examples; prefer synthetic/non-identifying data.
When adding dependencies, prefer well-maintained crates with permissive licences; avoid unsafe code unless strictly necessary and justified.

Build, Test, and Tooling

Primary check pipeline: ./scripts/check-all.sh (fmt, clippy -D warnings, check, test).
Docker dev: docker compose -f compose.dev.yml up --build or just start-dev; health via grpcurl and REST /health.
Proto changes: edit crates/api-shared/vpr.proto, rebuild to regenerate bindings.

Open Questions / Next Decisions

Confirm scope: Is LLM limited to contributor assistance, or will user-facing LLM features (summaries/search) be added? If runtime features are desired, specify data access boundaries, PHI handling, and auditing requirements.
Define authentication posture for REST (API key, mTLS, or other) to align with gRPC.
Clarify expected commit-signing defaults (enforce vs optional) and how LLM-generated changes should treat signing in CI/local dev.

VPR Development Roadmap

Overview

Purpose:

This roadmap outlines the planned development work for the Versioned Patient Repository (VPR) – a Git-backed clinical record system designed to preserve verifiable clinical truth, authorship, and history over decades. VPR treats patient records as durable, inspectable artefacts with explicit provenance, rather than mutable database rows.

Guiding principles:

Patients first and human readable
Clinical truth is append-only and auditable

Phase Grouping

Phase 1 – Foundations of Truth: Epics 1–3
Phase 2 – Semantics and Meaning: Epics 4–6
Phase 3 – Operational Reality: Epics 7–9
Phase 4 – Access, Projections, and Record upload: Epics 10–15

Epic 1. Core Storage, Integrity, and Templates

Business Value:

Establishes the foundational storage and integrity model for VPR. Every patient record change is durable, inspectable, and tamper-evident.

File-based patient record store with sharded layout and per-patient Git repositories (clinical + demographics separation)
Clinical template seeding and validation at startup
Commit signing optional in development environments
Integrate cargo-audit into CI/CD
Integrate cargo-deny into CI/CD
Tighten traversal and allocation limits for patient discovery
Implement retry and back-off strategy for filesystem and Git operations
Validate all user-supplied identifiers and namespaces before side effects
Add monitoring for template validation failures
Conservative git gc strategy for per-patient repos
Enforce “no symlinks ever” policy across templates, imports, and repos

Epic 2. openEHR Alignment and Reference Model Semantics

Business Value:
Ensures long-term interoperability while preventing openEHR wire models from contaminating internal domain logic.

Define supported openEHR RM versions and validation strategy
Specify namespace formation and validation rules
Publish RM/namespace compatibility matrix per deployment
Validate ehr_status linkage to demographics (external_ref)
Map clinical templates to openEHR archetype expectations
Add RM/archetype validation or linting where practical
Define supported artefact types:
- ehr_status.yaml
- Clinical letters (Markdown with YAML front matter)
- Documents (PDF with sidecar metadata)
- Structured messaging threads (YAML/JSON)
Implement large-file-storage for binary artefacts (PDFs, images, scans) outside Git to preserve repository performance
Support patient-contributed artefacts and annotations
Explicitly document boundary between wire models and internal domain models

Epic 3. Demographics via FHIR

Business Value:
Separates patient identity from clinical truth while enabling interoperability.

Separate demographics repository (FHIR-like patient.json)
Implement demographics service parity with clinical service
Validate demographics against selected FHIR profile
Pagination and limits for demographics listing and queries
Document demographics data contract and evolution strategy

Epic 4. Clinical Record Lifecycle and Semantic States

Business Value:
Removes ambiguity about what a clinical record means over time.

Define lifecycle states (created, amended, corrected, superseded, closed)
Define metadata conventions for lifecycle state
Distinguish “wrong at the time” vs “correct then, obsolete now”
Define closure and reopening semantics
Document how consumers should interpret lifecycle state
Explicitly document what VPR does not infer automatically

Epic 5. Temporal Semantics and Clinical Time

Business Value:
Ensures timestamps are clinically and legally interpretable.

Define event time vs documentation time vs commit time
Support retrospective documentation
Define correction and amendment timing semantics
Handle clock skew and external system timestamps
Document required and optional temporal fields per artefact type
Ensure Git commit time is never misrepresented as clinical event time

Epic 6. Logging, Auditability, and Provenance

Business Value:
Supports investigation, compliance, and forensic reconstruction.

Define structured logging schema
Enforce PHI redaction rules in logs
Standardise error taxonomy
Correlate operations with request IDs and commit hashes
Log validation, security, and auth failures
Log operational signals (retries, maintenance tasks)
Document log retention, sinks, and access controls

Epic 7. Failure Modes and Recovery Semantics

Business Value:
Ensures predictable behaviour on bad days.

Enumerate supported failure modes (partial writes, corruption, tampering)
Classify failures (fatal, recoverable, operator intervention)
Define system behaviour per failure class
Define which failures must always be surfaced to operators
Document guarantees around non-silent failure

Epic 8. Operational Hardening and Catastrophic Recovery

Business Value:
Ensures patient data survives hardware failure, human error, and attack.

Define write-through backup strategy for patient repos
Physically and administratively separate backup storage
Offline cold backups at defined intervals
Restore drills into clean environments
Verify integrity and signatures on restore
Define and document RPO and RTO targets
Implement recovery marker commits with provenance
Guarantee no silent history rewriting during restore
Define encryption-at-rest and key management posture
Finalise commit-signing policy for production
Implement configurable signature verification on read paths

Epic 9. Governance, Authority, and Evolution Boundaries

Business Value:
Prevents architectural drift and unresolvable disputes.

Define authority for RM version acceptance and deprecation
Define schema evolution and incompatibility handling
Document which decisions live outside the codebase
Define escalation paths for semantic disputes
Explicitly separate technical enforcement from organisational policy

Epic 10. Care Coordination and PAS-like Functions

Business Value:
Supports operational workflows without polluting clinical truth.

Define coordination domain model (encounters, referrals, appointments)
Implement Care Coordination Repository with Git-backed storage
Link coordination artefacts to clinical and demographics records
Define authorisation rules for coordination actions
Define YAML schemas for coordination artefacts
Support UX state (read/unread, task completion)
Explicitly document non-authoritative status vs clinical record

Epic 11. API Transport, Auth, and Contracts

Business Value:
Provides secure, well-defined access to VPR.

REST and gRPC transports with shared protobufs
API key authentication for gRPC
Configuration options to enable/disable gRPC and/or REST APIs independently (allow both, either, or neither)
Disable reflection in production
REST authentication parity with gRPC
Optional mTLS design (future)
Structured error models for REST and gRPC
Pagination and validation for all listing APIs
Secrets storage and rotation strategy
API versioning and upgrade documentation

Epic 12. Read Models, Projections, and Performance

Business Value:
Improves performance without betraying truth.

Define projection formats and cache semantics
Explicitly mark projections as non-authoritative
Ensure projections are disposable and rebuildable
Link projections back to commit hashes
Benchmark read and write paths under load
Document acceptable projection lag

Epic 13. Patient Data Portability and Agency

Business Value:
Supports patient autonomy and regulatory compliance.

Define patient download formats (full history vs snapshot)
Implement authenticated download APIs
Log and audit all patient downloads
Define accepted upload formats and version compatibility
Implement robust upload validation and sanitisation
Reject symlinks, executables, and path traversal
Support upload dry-run and preview
Define merge and reconciliation strategies
Log and audit all upload attempts
Define trust boundaries for externally signed records
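
As one possible shape for the planned path-traversal check on uploads, the sketch below accepts only relative paths built from normal components. `is_safe_relative_path` is a hypothetical helper; it does not detect symlinks or executables, which would need separate filesystem checks.

```rust
use std::path::{Component, Path};

/// Returns `true` only for non-empty relative paths made of normal components,
/// so `..`, absolute paths, and drive prefixes are rejected.
pub fn is_safe_relative_path(path: &Path) -> bool {
    let mut seen_component = false;
    for component in path.components() {
        match component {
            Component::Normal(_) => seen_component = true,
            // `.` is harmless; skip it without counting it as content.
            Component::CurDir => {}
            // Reject `..`, root (`/`), and Windows prefixes (`C:`).
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return false,
        }
    }
    seen_component
}
```

Rejecting `..` outright, rather than trying to normalise it away, keeps the check simple and avoids surprises when a path is later joined onto a repository root.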

Epic 14. Education, Invariants, and Operational Literacy

Business Value:
Reduces institutional memory risk and misuse.

Operator runbooks (backup, restore, failure handling)
Developer invariants (what must never be violated)
“What VPR does not do” documentation
Shared mental model for contributors and operators

Epic 15. Core and Organisational Separation

Business Value:
Keeps the patient-record core reusable as a standalone library for patient/self-hosted deployments while allowing organisational layers (security, APIs, projections, back-office) to evolve independently without contaminating core invariants.

Define the boundary for a standalone core library crate (patient data model, filesystem/Git, validation) that excludes organisational concerns.
Document dependency direction: core must not depend on organisational code; organisational layers may depend on core.
Identify organisational-only modules (authentication/authorisation, API transport, projections/cache, observability/ops) to reside outside the core crate.
Evaluate packaging and repository split options (single repo with crates vs separate repositories) and their impact on versioning and CI.
Plan migration and testing strategy for the split (CI matrices, contract tests, release cadence, documentation updates).

Code

Thanks to

The work that I have done on VPR is in many ways due to the time and effort that Dr Marcus Baw has put into his own Git-based, file-based patient record system called gitEHR. I openly admit to copying his ideas and implementations in my own approach to building VPR. Many thanks to Marcus for his pioneering work in this area.

Mark
