Google Cloud Platform infrastructure¶

Overview¶

Quill Medical runs on three separate GCP projects, each in europe-west2 (London):

Environment	Project ID	Purpose	Status
Production	`quill-medical-production`	Clinical app for real patients	Hibernated
Staging	`quill-medical-staging`	Integration testing + landing page	Active
Teaching	`quill-medical-teaching`	Educational environment (no clinical data)	Active

Estimated cost: £72–107/month across staging and teaching (production hibernated — see Production hibernation below).

Architecture¶

                        ┌──────────────────────────┐
                        │   Cloud DNS              │
                        │   quill-medical.com      │
                        └────────────┬─────────────┘
                                     │
         ┌───────────────────────────┼───────────────────┐
         │                           │                   │
    quill-medical.com    staging.quill-    teaching.quill-
    (landing page)       medical.com       medical.com
         │                           │                   │
         │              ┌────────────▼────────┐   ┌──────▼───────┐
         └──────────────►   Global LB        │   │  Global LB   │
                        │   Cloud Armor      │   │  Cloud Armor │
                        │   WAF              │   │  WAF         │
                        └──┬───────┬───┬─────┘   └──┬────────┬──┘
                           │       │   │            │        │
                        /api/*   /*   landing     /api/*    /*
                           │       │   page         │        │
                        ┌──▼──┐ ┌──▼──┐ ┌──┐    ┌──▼──┐  ┌──▼──┐
                        │Back │ │Front│ │GCS│   │Back │  │Front│
                        │end  │ │end  │ │   │   │end  │  │end  │
                        └──┬──┘ └─────┘ └───┘   └──┬──┘  └─────┘
                           │                       │
                     ┌─────┼─────────┐          ┌──▼────────┐
                     │     │         │          │Cloud SQL  │
                  ┌──▼──┐┌─▼──┐┌────▼┐         │Auth only  │
                  │Auth ││FHIR││EHR- │         └───────────┘
                  │DB   ││DB  ││base │
                  └─────┘└────┘└─────┘
                  3× Cloud SQL instances
                           │
                        ┌──▼────────┐
                        │HAPI FHIR  │
                        │EHRbase VM │
                        └───────────┘

                         Staging                 Teaching

  Production: HIBERNATED (project exists, all resources destroyed)

Each environment has:

Global HTTPS Load Balancer — path-based routing, Cloud Armor WAF, Google-managed SSL
Cloud Run — backend (FastAPI) and frontend (React/Vite), auto-scaling
Cloud SQL — PostgreSQL for the auth database (all environments)
Secret Manager — JWT keys, database passwords, VAPID keys
VPC — private networking, no public database IPs
Monitoring — uptime checks on /api/health with email alerts

Production and staging also have:

Cloud SQL — additional FHIR and EHRbase databases
Compute Engine — e2-small VM running HAPI FHIR and EHRbase via Docker

Teaching additionally has:

Cloud Storage — image bucket for educational content (question bank YAML + images deployed by CI from quill-question-bank repo)

What has been set up¶

GCP projects (done)¶

Three projects created in the GCP console, all linked to the same billing account.

APIs enabled (done)¶

The following APIs were enabled on all three projects:

Cloud Run
Cloud SQL Admin
Compute Engine (production and staging only)
Secret Manager
Artifact Registry
Cloud DNS
Service Networking
Serverless VPC Access
IAM
Cloud Resource Manager
Cloud Monitoring

Terraform state bucket (done)¶

Remote state is stored in a versioned GCS bucket in the production project:

gs://quill-medical-terraform-state

Terraform uses workspace prefixes to separate state per environment.

Workload Identity Federation (done)¶

Each project has a WIF setup that lets GitHub Actions authenticate without long-lived JSON key files:

Component	Value
Service account	`github-actions@quill-medical-{env}.iam.gserviceaccount.com`
WIF pool	`github-pool`
WIF provider	`github-provider`
Attribute condition	See below

The teaching project's WIF provider allows authentication from two repositories (the main app and the question bank):

assertion.repository == 'bailey-medics/quillmedical' || assertion.repository == 'bailey-medics/quill-question-bank'

Production and staging WIF providers only allow bailey-medics/quillmedical.

Adding a new repository to WIF requires two steps

Updating the WIF provider attribute condition is not enough. You must also add a roles/iam.workloadIdentityUser IAM binding on the service account for the new repo's principal. Without this, GitHub Actions will authenticate but fail with iam.serviceAccounts.getAccessToken permission denied when trying to impersonate the service account.

```bash # Step 1: Update the WIF provider attribute condition gcloud iam workload-identity-pools providers update-oidc github-provider \ --project=quill-medical-{env} \ --location=global \ --workload-identity-pool=github-pool \ --attribute-condition="assertion.repository == 'bailey-medics/quillmedical' || assertion.repository == 'bailey-medics/{new-repo}'"

# Step 2: Grant the new repo's WIF identity permission to impersonate the SA gcloud iam service-accounts add-iam-policy-binding \ github-actions@quill-medical-{env}.iam.gserviceaccount.com \ --project=quill-medical-{env} \ --role=roles/iam.workloadIdentityUser \ --member="principalSet://iam.googleapis.com/projects/{project-number}/locations/global/workloadIdentityPools/github-pool/attribute.repository/bailey-medics/{new-repo}" ```

The service accounts have the following IAM roles:

roles/editor — manage most GCP resources
roles/secretmanager.admin — create and manage secrets
roles/run.admin — deploy Cloud Run services
roles/iam.serviceAccountUser — let Cloud Run services run as other service accounts

GitHub secrets (done)¶

quillmedical repository¶

Nine repository secrets set via gh secret set:

Secret	Value pattern
`GCP_{ENV}_WIF_PROVIDER`	`projects/{number}/locations/global/workloadIdentityPools/github-pool/providers/github-provider`
`GCP_{ENV}_SERVICE_ACCOUNT`	`github-actions@quill-medical-{env}.iam.gserviceaccount.com`
`GCP_{ENV}_PROJECT_ID`	`quill-medical-{env}`

Where {ENV} is PROD, STAGING, or TEACHING.

Additional secret:

Secret	Purpose
`SLACK_WEBHOOK_URL`	Slack incoming webhook for CI notifications

quill-question-bank repository¶

Four repository secrets for the question bank CI/CD pipeline:

Secret	Purpose
`GCP_TEACHING_WIF_PROVIDER`	WIF provider path for teaching project
`GCP_TEACHING_SERVICE_ACCOUNT`	Service account for GCS access
`GCP_TEACHING_GCS_BUCKET`	GCS bucket name (`quill-images-teaching`)
`SLACK_CICD_WEBHOOK_URL`	Slack incoming webhook for CI/CD notifications

These secrets authenticate the question bank deploy workflow (deploy.yml) to sync validated question bank content to the teaching GCS bucket on merge to main.

Terraform configuration (done)¶

The infrastructure is defined in infra/ using Terraform modules:

Module	Purpose
`secrets`	Secret Manager secret containers
`networking`	VPC, subnet, Cloud NAT, VPC connector, firewall rules
`cloud-sql`	PostgreSQL instances with private IP, backups, auto-generated passwords
`cloud-run`	Backend and frontend services with secret injection
`cloud-run-job`	Admin CLI jobs (create-user, update-permissions, etc.)
`load-balancer`	Global HTTPS LB, Cloud Armor WAF, serverless NEGs, SSL certs
`compute-fhir`	VM running HAPI FHIR + EHRbase (prod/staging only)
`monitoring`	Uptime checks and email alerting
`dns`	Cloud DNS zone management
`cloud-storage`	Image bucket (teaching only)

Environment-specific settings live in infra/environments/{env}/terraform.tfvars.

Artifact Registry (done)¶

Each project has a Docker repository in Artifact Registry:

europe-west2-docker.pkg.dev/quill-medical-{env}/quill/

Container images are pushed here by CI (not GHCR — Cloud Run only supports Artifact Registry, GCR, or Docker Hub). Image paths:

europe-west2-docker.pkg.dev/quill-medical-{env}/quill/backend:main — backend service (built from prod Dockerfile stage)
europe-west2-docker.pkg.dev/quill-medical-{env}/quill/frontend:main — frontend service (built from prod Dockerfile stage)
europe-west2-docker.pkg.dev/quill-medical-{env}/quill/admin:latest — admin CLI (built from admin Dockerfile stage, via just build-admin)

Docker build targets

The backend Dockerfile has three stages: dev, prod, and admin. The admin stage is last, so building without --target produces the admin CLI image, not the web server. CI deploy workflows must always specify target: prod.

Organisation policy override (done)¶

The GCP organisation (826360329716) enforces Domain Restricted Sharing by default, which blocks allUsers IAM bindings. This was overridden at the project level for all three projects to allow public Cloud Run access:

gcloud resource-manager org-policies set-policy policy.yaml --project=quill-medical-staging
gcloud resource-manager org-policies set-policy policy.yaml --project=quill-medical-teaching
gcloud resource-manager org-policies set-policy policy.yaml --project=quill-medical-production

This required the roles/orgpolicy.policyAdmin role at the organisation level.

Staging Terraform apply (done)¶

terraform apply completed for the staging environment — all resources created successfully:

VPC, subnet, Cloud NAT, VPC connector, firewall rules
3 Cloud SQL instances (auth, FHIR, EHRbase) with auto-generated passwords
Secret Manager secrets with initial values (jwt-secret, vapid-private, db passwords)
Cloud Run backend and frontend (placeholder images)
Compute Engine VM for HAPI FHIR + EHRbase
Artifact Registry Docker repository
IAM bindings for public Cloud Run access and Secret Manager
Monitoring uptime checks and email alerts

Cloud Run URLs (placeholder containers, will serve real app after first CI deploy):

Backend: https://quill-backend-staging-fptrrusgxa-nw.a.run.app
Frontend: https://quill-frontend-staging-fptrrusgxa-nw.a.run.app

Teaching Terraform apply (done)¶

terraform apply completed for the teaching environment — 32 resources created:

VPC, subnet, Cloud NAT, VPC connector, firewall rules
1 Cloud SQL instance (auth only — no FHIR/EHRbase) with auto-generated password
Secret Manager secrets with initial values
Cloud Run backend and frontend (placeholder images)
Artifact Registry Docker repository
Cloud Storage image bucket
IAM bindings for public Cloud Run access and Secret Manager
Monitoring uptime checks and email alerts

Cloud Run URLs:

Backend: https://quill-backend-teaching-izhomeiy6q-nw.a.run.app
Frontend: https://quill-frontend-teaching-izhomeiy6q-nw.a.run.app

Production Terraform apply (done) (hibernated)¶

Production was fully provisioned and deployed, then hibernated via terraform destroy to save costs while not needed. See Production hibernation for details and restore instructions.

Global HTTPS Load Balancer (done)¶

Each environment has a Global HTTPS Load Balancer that sits in front of the Cloud Run services. This provides:

Path-based routing: /api/* goes to the backend Cloud Run service, everything else goes to the frontend
Google-managed SSL certificates: automatically provisioned and renewed for each domain
Cloud Armor WAF: rate limiting at 500 requests per minute per IP address
HTTP to HTTPS redirect: all port 80 traffic is redirected to port 443
Static global IP: stable IP addresses for DNS A records

Environment	Domain	Load Balancer IP	Status
Staging	`staging.quill-medical.com`	`35.186.223.130`	Active
Staging	`quill-medical.com`	`35.186.223.130`	Active (landing page)
Teaching	`teaching.quill-medical.com`	`136.110.221.126`	Active
Production	`app.quill-medical.com`	—	Hibernated

The Caddyfile no longer reverse-proxies /api/* to the backend — the load balancer handles all routing. Caddy now just serves static frontend files and provides a /healthz endpoint for health checks.

Domain architecture (done)¶

Domain	Purpose	Update process	Status
`quill-medical.com`	Public landing/marketing site	Update anytime, no clinical sign-off	Active (staging LB)
`app.quill-medical.com`	Live clinical application	Release versions, DCB0129, UAT	Hibernated
`staging.quill-medical.com`	Staging/integration testing	Auto-deploy from main branch	Active
`teaching.quill-medical.com`	Teaching/training environment	Auto-deploy from main branch	Active

The public landing site (quill-medical.com and www.quill-medical.com) is served from a GCS bucket behind the staging load balancer. The site is built from the frontend/public_pages/ Vite workspace and deployed via the public-site.yml CI workflow on pushes to main. This allows marketing pages and feature announcements to be updated without going through clinical release gates.

DNS records (done)¶

Cloud DNS zone quill-medical-zone in the production project holds all DNS records:

Record	Type	TTL	Value	Notes
`quill-medical.com`	A	300	`35.186.223.130`	Landing page (staging LB)
`www.quill-medical.com`	CNAME	300	`quill-medical.com`	www redirect to apex
`staging.quill-medical.com`	A	300	`35.186.223.130`
`teaching.quill-medical.com`	A	300	`136.110.221.126`

GoDaddy nameservers were updated to delegate to Google Cloud DNS:

ns-cloud-c1.googledomains.com
ns-cloud-c2.googledomains.com
ns-cloud-c3.googledomains.com
ns-cloud-c4.googledomains.com

Terraform workspaces (done)¶

Each environment uses a separate Terraform workspace to isolate state:

Workspace	Environment	State path in GCS
`staging`	Staging	`gs://quill-medical-terraform-state/terraform/state/staging.tfstate`
`teaching`	Teaching	`gs://quill-medical-terraform-state/terraform/state/teaching.tfstate`
`production`	Production	`gs://quill-medical-terraform-state/terraform/state/production.tfstate`

Terraform and the gh CLI were installed via Homebrew on the admin account.

Branching and deployment model¶

feature/*  ──►  main  ──►  release/*  ──►  clinical-live
                  │                            │
           deploys to:                   deploys to:
           staging                       production
           teaching
           landing page
           docs

Staging deployment (push to main)¶

Workflow: .github/workflows/deploy-staging-teaching.yml

Detect what changed (backend, frontend, shared)
Build and push container images to Artifact Registry, tagged main-{sha}
Deploy to staging and teaching Cloud Run
Smoke test: GET /api/health (5 retries, 10s intervals)
Slack notification

Note: Alembic migrations run automatically via the backend container's entrypoint script on startup, not as a separate CI step.

Production deployment (push to clinical-live)¶

Workflow: .github/workflows/deploy-production.yml

Detect what changed
Build and push container images to Artifact Registry, tagged clinical-live-{sha} and latest
Deploy to production Cloud Run
Smoke test: GET /api/health
Slack notification

Note: Alembic migrations run automatically via the backend container's entrypoint script on startup, not as a separate CI step.

Production deploys are never cancelled mid-flight.

Infrastructure changes (changes to infra/)¶

Workflow: .github/workflows/terraform.yml

Pull requests — runs terraform plan and posts the diff as a PR comment
Merge to main — runs terraform apply for staging and teaching
Merge to clinical-live — runs terraform apply for production

Environment configuration¶

Production¶

project_id              = "quill-medical-production"
environment             = "prod"
enable_fhir             = true
enable_ha               = false
db_tier                 = "db-f1-micro"
cloud_run_max_instances = 10

Staging¶

project_id              = "quill-medical-staging"
environment             = "staging"
enable_fhir             = true
enable_ha               = false
db_tier                 = "db-f1-micro"
cloud_run_max_instances = 3

Teaching¶

project_id              = "quill-medical-teaching"
environment             = "teaching"
enable_fhir             = false
enable_ha               = false
db_tier                 = "db-f1-micro"
cloud_run_max_instances = 5

Teaching-specific Cloud Run backend environment variables (in addition to the standard set):

Variable	Value	Purpose
`CLINICAL_SERVICES_ENABLED`	`false`	Disables FHIR/EHRbase endpoints
`TEACHING_STORAGE_BACKEND`	`gcs`	Use GCS for teaching image storage
`TEACHING_GCS_BUCKET`	`quill-images-teaching`	GCS bucket containing question banks
`TEACHING_IMAGES_BASE_URL`	`https://storage.googleapis.com/quill-images-teaching`	Public URL prefix for question bank images

Environment variable naming¶

Terraform injects environment variables into Cloud Run services via the env_vars and secret_env_vars maps in the Cloud Run module. The variable names must exactly match the Pydantic Settings field names in backend/app/config.py.

Key mappings:

Terraform env var	Config field	Default (Docker Compose)
`AUTH_DB_HOST`	`AUTH_DB_HOST`	`postgres-auth`
`AUTH_DB_NAME`	`AUTH_DB_NAME`	`quill_auth`
`AUTH_DB_USER`	`AUTH_DB_USER`	`auth_user`
`FHIR_SERVER_URL`	`FHIR_SERVER_URL`	`http://fhir:8080/fhir`
`EHRBASE_URL`	`EHRBASE_URL`	`http://ehrbase:8080/ehrbase`

If names don't match, the backend silently falls back to the Docker Compose defaults (which are unresolvable hostnames in Cloud Run), causing FHIR/EHRbase health checks to fail.

Cloud Storage IAM¶

The teaching GCS bucket (quill-images-teaching) requires an explicit IAM binding for the Cloud Run backend service account. The default compute service account ({project-number}-compute@developer.gserviceaccount.com) does not automatically inherit storage.objects.list permission, even though it is a project editor — legacy bucket IAM grants access to projectEditor/projectViewer principal groups, but the compute SA is not automatically a member for API-level object listing.

The required binding:

gcloud storage buckets add-iam-policy-binding gs://quill-images-teaching \
  --member="serviceAccount:{project-number}-compute@developer.gserviceaccount.com" \
  --role="roles/storage.objectViewer" \
  --project=quill-medical-teaching

Symptom of missing binding

The backend logs Failed to list GCS banks and the Admin > Teaching page shows "No teaching modules found" after clicking Sync. The underlying error is google.api_core.exceptions.Forbidden: 403 ... does not have storage.objects.list access.

This binding should be added to the cloud-storage Terraform module to avoid manual steps on future environments.

Security¶

No public database IPs — Cloud SQL is accessible only via VPC
SSH via IAP only — no open SSH ports, all access through Identity-Aware Proxy
Secrets in Secret Manager — never in environment variables (initial values auto-generated by Terraform)
WIF authentication — no long-lived JSON key files, short-lived tokens only
Attribute condition on WIF — only bailey-medics/quillmedical can authenticate (production/staging); teaching also allows bailey-medics/quill-question-bank
Least-privilege service accounts — each environment has its own service account
Explicit GCS IAM bindings — Cloud Run service accounts need bucket-level roles/storage.objectViewer even when they are project editors (see Cloud Storage IAM)
Cloud Armor WAF — rate limiting (500 req/min per IP) on all load balancers
HTTPS enforced — HTTP to HTTPS redirect on all environments, Google-managed SSL certificates
Google-managed TLS — certificates auto-provisioned and auto-renewed, no manual cert management
Content Security Policy — browser-enforced allowlists per resource type (see CSP headers below)

Content Security Policy (CSP) headers¶

The production Caddyfile (caddy/prod/Caddyfile) sets a Content-Security-Policy response header on every page. This tells the browser which origins are allowed to load each type of resource, providing defence against XSS and data-injection attacks.

Current policy:

default-src 'self';
script-src  'self';
style-src   'self' 'unsafe-inline';
img-src     'self' data: https://storage.googleapis.com;
font-src    'self';
connect-src 'self';
frame-ancestors 'none'

Directive	Allowed origins	Notes
`default-src`	`'self'`	Fallback for any type not listed below
`script-src`	`'self'`	Only first-party JavaScript
`style-src`	`'self' 'unsafe-inline'`	Mantine injects inline styles at runtime
`img-src`	`'self' data: https://storage.googleapis.com`	GCS signed URLs for teaching images
`font-src`	`'self'`	Only first-party fonts
`connect-src`	`'self'`	XHR/fetch — API calls go via the same-origin LB
`frame-ancestors`	`'none'`	Prevents the app being embedded in an iframe

Adding external image or API sources

If a new feature loads images from an external origin (e.g. a different CDN or FHIR server), that origin must be added to the relevant CSP directive in caddy/prod/Caddyfile. Without this, the browser silently blocks the request and images appear broken with no errors in the application logs — only a CSP violation message in the browser console.

Remaining steps¶

Set real VAPID key (pending) (done)¶

VAPID keys were generated and stored in Secret Manager for all three environments (version 2):

Public key: BC0B26JO27tGc5qkbt2-QzY8M7_0u3gt5hmFj1RGWvZp9Vr9fDQ3-lpQ6YxqNlU0fFKlIUzCnb-baAE0rzIL-Ys
Private key: stored in Secret Manager (vapid-private, version 2) for all three projects

The public key is baked into the frontend Docker image at build time via the VITE_VAPID_PUBLIC build argument (set in CI workflows).

The jwt-secret auto-generated value is fine for use — it's a strong random 64-character string.

Database passwords are auto-generated by Terraform and stored in Secret Manager automatically.

DNS delegation (pending) (done)¶

GoDaddy nameservers updated to point to Cloud DNS:

ns-cloud-c1.googledomains.com
ns-cloud-c2.googledomains.com
ns-cloud-c3.googledomains.com
ns-cloud-c4.googledomains.com

First deployment¶

Once DNS is fully propagated and SSL certificates are provisioned, merge the feature/gcp-setup branch to main. The CI pipeline will:

Build container images
Push to Artifact Registry
Deploy to staging and teaching Cloud Run
Smoke test the health endpoint

Create Alembic migration Cloud Run job (solved)¶

Database migrations are handled by the backend's entrypoint.sh, which runs alembic upgrade head before starting the uvicorn server. This runs automatically on every Cloud Run revision deployment — no separate migration job is needed.

Admin Cloud Run Job (done)¶

Each active environment has a quill-admin-{env} Cloud Run Job for one-off admin tasks (creating superadmin users, updating permissions, assigning roles). See the admin tasks documentation for usage.

The job is defined in the cloud-run-job Terraform module and uses a separate Docker image built from the admin target in the backend Dockerfile. The admin image is a CLI tool — it does not run an HTTP server.

Production go-live¶

Cut a release/* branch from main
Test on staging
PR to clinical-live
CI deploys to production
Verify health checks pass

Future improvements¶

Slack webhook for deployment notifications
CPU/memory/error-rate monitoring (beyond uptime checks)
Production database tier upgrade from db-f1-micro
High availability for production Cloud SQL
Restrict Cloud Run ingress to INGRESS_TRAFFIC_INTERNAL_LOAD_BALANCER (once LB is confirmed working)
Add roles/storage.objectViewer binding to the cloud-storage Terraform module (currently applied manually)

Production hibernation¶

Production was hibernated to save costs while the environment is not actively needed. All Terraform-managed resources were destroyed; the GCP project and certain manually-created resources remain intact.

What was destroyed¶

All ~67 Terraform-managed resources including:

VPC, subnet, Cloud NAT, VPC connector, firewall rules
3 Cloud SQL instances (auth, FHIR, EHRbase) and all data
Secret Manager secret versions (containers remain)
Cloud Run backend and frontend services
Compute Engine VM (HAPI FHIR + EHRbase)
Global HTTPS Load Balancer, Cloud Armor policy, SSL certificate
Monitoring uptime checks and alert policies

Two orphaned Cloud Run services (quill-backend-production, quill-frontend-production) were also manually deleted.

What is preserved¶

The following resources survive terraform destroy and do not need recreating:

Resource	Location	Notes
GCP project	`quill-medical-production`	Project itself is not Terraform-managed
Workload Identity Federation	`github-pool` / `github-provider`	GitHub Actions can still authenticate
GitHub secrets	`GCP_PROD_*`	3 repository secrets remain valid
Cloud DNS zone	`quill-medical-zone`	Manually created, holds all DNS records
Organisation policy override	Domain Restricted Sharing	Allows `allUsers` IAM bindings
Artifact Registry	`europe-west2-docker.pkg.dev/quill-medical-production/quill/`	Container images still stored
Terraform state	`gs://quill-medical-terraform-state` (production workspace)	Empty state, workspace exists
Secret Manager containers	`jwt-secret`, `db-password-*`, `vapid-private`, etc.	Empty (no versions), will be repopulated on apply
Enabled APIs	Cloud Run, Cloud SQL Admin, etc.	Remain enabled on the project

Restore procedure¶

To bring production back online:

# 1. Authenticate Terraform
gcloud auth application-default login

# 2. Select the production workspace
cd infra
terraform workspace select production

# 3. Recreate all resources (~50 resources, takes ~10 minutes)
terraform apply -var-file=environments/prod/terraform.tfvars

# 4. Note the new load balancer IP from the output
# lb_ip = "x.x.x.x"

# 5. Create DNS A record for the app domain
gcloud dns record-sets create app.quill-medical.com. \
  --type=A --ttl=300 \
  --rrdatas="<NEW_LB_IP>" \
  --zone=quill-medical-zone \
  --project=quill-medical-production

# 6. Wait for SSL certificate to provision (requires DNS propagation)
gcloud compute ssl-certificates describe quill-cert-v3-prod \
  --project=quill-medical-production --global \
  --format="value(managed.status)"
# Repeat until status is ACTIVE (can take up to 30 minutes)

# 7. Trigger a deployment (push to clinical-live or manually deploy)
# The CI pipeline will build, push images, and deploy to Cloud Run

# 8. Run database migrations
# Either via CI or manually inside the backend container

# 9. Verify health
curl https://app.quill-medical.com/api/health

Important: New Cloud SQL instances will have fresh auto-generated passwords (stored in Secret Manager). All databases will be empty — a data restore from backups would be needed if any data existed previously.

Landing page during hibernation¶

The landing page at quill-medical.com and www.quill-medical.com is served from the staging load balancer (35.186.223.130). The staging SSL certificate covers staging.quill-medical.com, quill-medical.com, and www.quill-medical.com. The landing_domain variable in the staging tfvars controls this.

The site is built from the frontend/public_pages/ Vite workspace and deployed to the {project_id}-landing GCS bucket by the .github/workflows/public-site.yml CI workflow. Changes to frontend/public_pages/**, frontend/src/components/**, or frontend/src/theme.ts trigger a rebuild and upload.

When production is restored, you may optionally move the landing page back to the production LB by:

Setting landing_domain = "quill-medical.com" in environments/prod/terraform.tfvars
Removing landing_domain and quill-medical.com from monitored_hostnames in environments/staging/terraform.tfvars
Updating the DNS A record for quill-medical.com to point to the production LB IP

Google Cloud Platform infrastructure¶

Overview¶

Architecture¶

What has been set up¶

GCP projects (done)¶

APIs enabled (done)¶

Terraform state bucket (done)¶

Workload Identity Federation (done)¶

GitHub secrets (done)¶

quillmedical repository¶

quill-question-bank repository¶

Terraform configuration (done)¶

Artifact Registry (done)¶

Organisation policy override (done)¶

Staging Terraform apply (done)¶

Teaching Terraform apply (done)¶

Production Terraform apply ~~(done)~~ (hibernated)¶

Global HTTPS Load Balancer (done)¶

Domain architecture (done)¶

DNS records (done)¶

Terraform workspaces (done)¶

Branching and deployment model¶

Staging deployment (push to main)¶

Production deployment (push to clinical-live)¶

Infrastructure changes (changes to infra/)¶

Environment configuration¶

Production¶

Staging¶

Teaching¶

Environment variable naming¶

Cloud Storage IAM¶

Security¶

Content Security Policy (CSP) headers¶

Remaining steps¶

Set real VAPID key ~~(pending)~~ (done)¶

DNS delegation ~~(pending)~~ (done)¶

First deployment¶

~~Create Alembic migration Cloud Run job~~ (solved)¶

Admin Cloud Run Job (done)¶

Production go-live¶

Future improvements¶

Production hibernation¶

What was destroyed¶

What is preserved¶

Restore procedure¶

Landing page during hibernation¶

Production Terraform apply (done) (hibernated)¶

Set real VAPID key (pending) (done)¶

DNS delegation (pending) (done)¶

Create Alembic migration Cloud Run job (solved)¶