
Servers & Networking

Server Fleet

Three Hetzner Cloud VPS instances, all connected via WireGuard VPN.

| Server | Location | Role | Spec | Internal IP (WireGuard) |
|---|---|---|---|---|
| Hermes | hel1 (Helsinki) | Production + CI runner | 4 vCPU, 8 GB RAM, 80 GB NVMe | 10.1.0.1 |
| Atlas | nbg1 (Nuremberg) | Staging + CI runner | 2 vCPU, 4 GB RAM, 40 GB SSD | 10.1.0.2 |
| Iris | (observability) | Observability + Docs | 2 vCPU, 4 GB RAM, 40 GB SSD | 10.1.0.4 |

Hermes runs production workloads. It also hosts a GitHub Actions self-hosted runner, so CI jobs share a machine with production. This is a documented risk, accepted at current scale; post-launch, the runner should move to a dedicated instance.

Atlas's disk fills up quickly as Docker images accumulate. A daily cron job at 00:01 Athens time runs docker system prune to clean up. See the Disk Management runbook.
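A minimal sketch of how that cron entry could be installed idempotently. It assumes Atlas's system timezone is Europe/Athens (`timedatectl` shows the active zone; on a UTC host the same line fires at 00:01 UTC instead), and the log path is hypothetical. The `-f` flag is added because `docker system prune` otherwise prompts for confirmation, which cron cannot answer.

```shell
#!/usr/bin/env bash
# Sketch: install the nightly Docker prune on Atlas.
# Assumes the host timezone is Europe/Athens; the log path is hypothetical.

# 00:01 every day; -f skips the interactive [y/N] prompt.
PRUNE_CRON_LINE='1 0 * * * docker system prune -f >> /var/log/docker-prune.log 2>&1'

install_prune_cron() {
  local current
  # A missing crontab makes `crontab -l` fail; treat that as "no entries".
  current="$(crontab -l 2>/dev/null || true)"

  # Append only when the exact line is absent, so re-running is harmless.
  if grep -qxF -- "$PRUNE_CRON_LINE" <<<"$current"; then
    return 0
  fi

  {
    [ -n "$current" ] && printf '%s\n' "$current"
    printf '%s\n' "$PRUNE_CRON_LINE"
  } | crontab -
}
```

Run `install_prune_cron` once as the user whose crontab should own the job; running it again leaves the crontab unchanged.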


WireGuard VPN (Olympus Network)

All three servers are peers on a WireGuard mesh VPN named olympus, on the 10.1.0.0/24 subnet.

All inter-service traffic uses the WireGuard IPs — never the public internet. Examples:

  • API → Loki: http://10.1.0.4:3100
  • Backups: Hermes DB → pg_dump → copy to Iris over WireGuard
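In a mesh, every server lists the other two as `[Peer]` entries, and each peer owns a single /32 inside 10.1.0.0/24. The helper below is a hypothetical sketch that prints one such entry; the public keys, public endpoints, and the commonly used port 51820 are placeholders, not values from this setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one [Peer] section for the olympus mesh.
# Keys, endpoints and the port are placeholders, not real values.

WG_PORT=51820  # commonly used WireGuard port; match the peer's real ListenPort

# Usage: olympus_peer NAME WG_IP PUBLIC_KEY PUBLIC_ENDPOINT
olympus_peer() {
  local name="$1" wg_ip="$2" pubkey="$3" endpoint="$4"
  # AllowedIPs is the peer's own /32, so only its tunnel address routes to it.
  cat <<EOF
[Peer]
# ${name}
PublicKey = ${pubkey}
Endpoint = ${endpoint}:${WG_PORT}
AllowedIPs = ${wg_ip}/32
EOF
}
```

For example, Hermes's WireGuard config (file name depends on the interface, e.g. a hypothetical `wg0.conf`) would carry `Address = 10.1.0.1/24` in its `[Interface]` section plus one `olympus_peer` entry each for Atlas (10.1.0.2) and Iris (10.1.0.4).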

Domains & Routing

All public traffic: Cloudflare (CDN + TLS) → Hetzner firewall (Cloudflare IPs only on 80/443) → Traefik → Docker containers.

| Domain | Server | Notes |
|---|---|---|
| pcmr.gr | Hermes | Production web |
| api.pcmr.gr | Hermes | Production API (CORS-locked to pcmr.gr) |
| staging.pcmr.gr | Atlas | Staging web |
| api-staging.pcmr.gr | Atlas | Staging API |
| staff.pcmr.gr | Hermes | Staff portal, Cloudflare Access gated |
| staff-staging.pcmr.gr | Atlas | Staging staff portal, Cloudflare Access gated |
| docs.pcmr.gr | Iris | This documentation site, Cloudflare Access gated |
| status.pcmr.gr | Iris | Uptime Kuma (public) |
| coolify.ctsolutions.gr | Hermes | Coolify UI, Cloudflare Access gated |
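The "Cloudflare IPs only on 80/443" firewall step can be kept in sync with Cloudflare's published ranges (https://www.cloudflare.com/ips-v4 and https://www.cloudflare.com/ips-v6). A sketch, assuming the `hcloud` CLI and a hypothetical firewall named `web-fw`: the function only prints commands for review, does not remove stale rules, and the flag names should be checked against `hcloud firewall add-rule --help` for your CLI version.

```shell
#!/usr/bin/env bash
# Sketch: print hcloud commands that open 80/443 only to the CIDRs read on stdin.
# The firewall name is hypothetical; commands are printed, not executed.

cf_firewall_rules() {
  local fw="$1" cidrs port
  # Drop blank lines and join the ranges with commas for --source-ips.
  cidrs="$(grep -v '^[[:space:]]*$' | paste -sd, -)"
  [ -n "$cidrs" ] || { echo "no CIDRs on stdin" >&2; return 1; }

  for port in 80 443; do
    printf 'hcloud firewall add-rule %s --direction in --protocol tcp --port %s --source-ips %s\n' \
      "$fw" "$port" "$cidrs"
  done
}
```

Usage: `curl -fsS https://www.cloudflare.com/ips-v4 | cf_firewall_rules web-fw`, then inspect the output before running it.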

Cloudflare Access (Zero Trust)

Cloudflare Access policies gate:

  • coolify.ctsolutions.gr — email OTP auth
  • staff.pcmr.gr + staff-staging.pcmr.gr — email OTP auth
  • docs.pcmr.gr — email OTP auth

These policies are only a first layer of defense. The application still enforces its own authentication: Cloudflare Access is defense-in-depth, not the sole gate.


Traefik

Traefik runs on each server as the reverse proxy. It handles SSL termination and routing to Docker containers.

Gotchas:

  1. acme.json permissions — Traefik will not start if acme.json has permissions other than 600:

    chmod 600 /path/to/acme.json
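  2. Cloudflare Universal SSL: covers only the apex and one level of subdomain (pcmr.gr and *.pcmr.gr). A name such as staff.pcmr.gr is covered, but a deeper one such as staff.staging.pcmr.gr is not, so keep hostnames one level deep (e.g., staff-staging.pcmr.gr). Use Full (Strict) mode in Cloudflare SSL/TLS settings, which requires a valid certificate on the origin.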

  3. tls: {} behind Cloudflare Access — When Traefik is behind Cloudflare Access, use tls: {} (no certResolver) — Cloudflare handles TLS termination and presents its own cert to clients. certResolver would try to issue Let's Encrypt certs but ACME challenges would fail because Cloudflare intercepts the traffic.

  4. Non-root Coolify user: Coolify runs under a non-root user. Run manual Docker operations (those not managed by Coolify) as that user, without sudo:

    docker compose up -d

    Do not use sudo for Coolify-managed containers — it can cause ownership conflicts.
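A small pre-flight check for gotcha 1, as a sketch: it reports the current mode of acme.json and tightens it to 600 if needed. It relies on GNU `stat -c`, and the path is passed in as an argument because the real location is deployment-specific.

```shell
#!/usr/bin/env bash
# Sketch: make sure Traefik's ACME storage file has mode 600 before (re)starting Traefik.
# Uses GNU stat; pass the real acme.json path as the first argument.

ensure_acme_perms() {
  local file="$1" mode
  [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }

  mode="$(stat -c '%a' "$file")"
  if [ "$mode" != "600" ]; then
    echo "$file has mode $mode; setting 600" >&2
    chmod 600 "$file"
  fi
}
```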


Observability Stack (on Iris)

Except for Uptime Kuma, none of the services on Iris are exposed publicly. Loki listens on Iris's WireGuard IP; the rest bind to localhost:

| Service | Bind address:port | Purpose |
|---|---|---|
| Loki | 10.1.0.4:3100 | Log aggregation (receives logs from the API) |
| Grafana | 127.0.0.1:3200 | Log dashboards (access via SSH tunnel) |
| GlitchTip | 127.0.0.1:8080 | Error tracking (Sentry-compatible) |
| Umami | 127.0.0.1:3000 | Privacy-friendly web analytics |
| Uptime Kuma | public, :3001 | Uptime monitoring (public at status.pcmr.gr) |
| Homarr | 127.0.0.1:7575 | Internal dashboard |

Accessing Grafana: Grafana binds only to 127.0.0.1 on Iris, so an SSH tunnel is required:

ssh -L 3200:127.0.0.1:3200 iris -N
# Then open http://localhost:3200 in browser
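GlitchTip, Umami and Homarr also listen only on 127.0.0.1, so the same tunnel approach should reach them. A sketch that forwards all four in one SSH session, reusing the `iris` host alias from the command above; local ports 3000 and 8080 are common dev-server ports, so change the left-hand port if they clash.

```shell
#!/usr/bin/env bash
# Sketch: one SSH session that forwards every loopback-only service on Iris.
# Ports come from the table above; "iris" is the SSH alias used for Grafana.

# Grafana, GlitchTip, Umami, Homarr
IRIS_LOCAL_PORTS=(3200 8080 3000 7575)

iris_tunnels() {
  local args=(-N) port
  for port in "${IRIS_LOCAL_PORTS[@]}"; do
    # localhost:PORT on the workstation -> 127.0.0.1:PORT on Iris.
    args+=(-L "${port}:127.0.0.1:${port}")
  done
  ssh "${args[@]}" iris
}
```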

Backups (3-2-1 Strategy)

  1. Live DB on Hermes (PostgreSQL)
  2. Daily pg_dump — cron on Hermes, copies dump to Iris over WireGuard (30-day retention)
  3. Manual cold backup — hardware-encrypted Samsung T9 SSD via scripts/backup-local.ps1

Additionally, Hetzner takes daily automated VPS backups (Hetzner Cloud Backups, 7-day retention).
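A minimal sketch of the daily dump step (tier 2 above). The database name, the `backup` user on Iris, and both directories are hypothetical. It uses pg_dump's custom format (`-Fc`, compressed and restorable with `pg_restore`), copies over the WireGuard IP, and leaves the 30-day retention to a `find` run on Iris.

```shell
#!/usr/bin/env bash
# Sketch of the nightly backup. Database name, remote user and paths are hypothetical.

IRIS_WG_IP="10.1.0.4"               # Iris over WireGuard, never its public IP
REMOTE_DIR="/srv/backups/postgres"  # hypothetical destination on Iris

# On Hermes: dump one database to a dated file in pg_dump's custom format.
dump_db() {
  local db="$1" outdir="$2" out
  out="${outdir}/${db}-$(date +%F).dump"
  pg_dump -Fc -f "$out" "$db" && printf '%s\n' "$out"
}

# On Hermes: copy a dump to Iris through the WireGuard tunnel.
ship_to_iris() {
  scp -q "$1" "backup@${IRIS_WG_IP}:${REMOTE_DIR}/"
}

# On Iris: delete dumps whose age in whole days exceeds the retention window.
prune_old_dumps() {
  local dir="$1" days="${2:-30}"
  find "$dir" -maxdepth 1 -type f -name '*.dump' -mtime +"$days" -delete
}
```

A cron job on Hermes could then run `ship_to_iris "$(dump_db app /var/backups/pg)"`, with a daily `prune_old_dumps /srv/backups/postgres 30` on Iris.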


Hetzner Object Storage

S3-compatible object storage for file attachments and generated PDFs.

| Environment | Bucket | Region |
|---|---|---|
| Production | mneme | hel1 |
| Staging | mneme-staging | nbg1 |

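Data stays in EU data centers, which supports GDPR data-residency requirements. The implementation uses the standard AWS SDK with forcePathStyle: true, so migrating to another S3-compatible provider should only require configuration changes (endpoint, region, credentials), not code changes.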
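To illustrate what `forcePathStyle` changes: with it, the SDK puts the bucket name in the URL path instead of the hostname, which avoids relying on per-bucket DNS names and wildcard TLS certificates. The helper below is a hypothetical sketch that builds both URL shapes; the endpoint host format `<location>.your-objectstorage.com` is an assumption about Hetzner's endpoints and should be checked against the Hetzner console.

```shell
#!/usr/bin/env bash
# Sketch: the two S3 addressing styles for the same object.
# The endpoint host is an assumption; bucket and key are illustrative and not URL-encoded.

# Usage: s3_object_url ENDPOINT_HOST BUCKET KEY [path|virtual]
s3_object_url() {
  local host="$1" bucket="$2" key="$3" style="${4:-path}"
  case "$style" in
    path)    printf 'https://%s/%s/%s\n' "$host" "$bucket" "$key" ;;  # forcePathStyle: true
    virtual) printf 'https://%s.%s/%s\n' "$bucket" "$host" "$key" ;;  # SDK default
    *)       echo "unknown style: $style" >&2; return 1 ;;
  esac
}
```

For ad-hoc checks with the AWS CLI, the equivalent setting is `aws configure set default.s3.addressing_style path`, combined with `--endpoint-url`.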