Design Decisions

Key architectural decisions with the reasoning behind them. Understanding the "why" helps avoid relitigating these choices.


Why Better Auth Instead of NextAuth?

Decision: Auth runs in NestJS (not Next.js).

Why: NextAuth is tightly coupled to Next.js — it runs as Next.js API routes and stores sessions in Next.js-accessible storage. The NestJS API would need to forward every authenticated request to Next.js to validate the session, or duplicate the session validation logic.

Better Auth is framework-agnostic. Running it in NestJS means:

  • Single source of truth for auth — the API validates sessions directly
  • No round-trips from NestJS → Next.js for session validation
  • All auth operations (2FA, admin plugin, password reset) are accessible to NestJS services
  • The web app is a thin client — it proxies auth requests to NestJS

Trade-off: More complex setup. Auth configuration is more verbose in NestJS than in a Next.js project using NextAuth/Auth.js.
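As a rough illustration of the "thin client" point, the proxying could be expressed as a Next.js rewrite rule. This is only a sketch under assumptions: this page does not say whether the project uses rewrites or a route handler, and `apiOrigin` and the destination path on the NestJS side are hypothetical.

```typescript
// Hypothetical sketch: forward every /api/auth/* request from the web app to NestJS.
// `apiOrigin` is an assumed parameter, for example "http://api:3001".
type Rewrite = { source: string; destination: string };

function authRewrites(apiOrigin: string): Rewrite[] {
  // Strip trailing slashes so the destination never contains "//api".
  const origin = apiOrigin.replace(/\/+$/, "");
  return [
    {
      source: "/api/auth/:path*",
      // Assumption: NestJS mounts Better Auth under the same /api/auth prefix.
      destination: `${origin}/api/auth/:path*`,
    },
  ];
}
```

In a Next.js config, a list like this would be returned from `async rewrites()`, which keeps all session logic on the API side.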


Why is BETTER_AUTH_URL Set to the Frontend URL (in the API)?

Decision: The API's BETTER_AUTH_URL env var is set to the frontend public URL, not the API URL.

Why: Better Auth uses baseURL to construct links in transactional emails (email verification links, password reset links). These links must point to the frontend — clicking a verification link should open the web app, not the API.

The frontend proxies all /api/auth/* requests to NestJS, so Better Auth's verification can work through the frontend URL even though it runs in NestJS.
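To make the link construction concrete, here is a minimal sketch of how an emailed link follows from the base URL. The `/api/auth/verify-email` path, the `token` query parameter, and the example origin are assumptions for illustration, not taken from the project.

```typescript
// Hypothetical sketch: derive an email verification link from a base URL.
// The path and the "token" parameter name are assumed, not confirmed by this page.
function buildVerificationLink(baseURL: string, token: string): string {
  const url = new URL("/api/auth/verify-email", baseURL);
  url.searchParams.set("token", token);
  return url.toString();
}

// With the API's BETTER_AUTH_URL set to the frontend origin, the emailed link
// opens the web app, which then proxies the /api/auth/* request to NestJS.
const exampleLink = buildVerificationLink("https://shop.example.com", "abc123");
```

Had the base URL been the API origin instead, the same function would produce a link that opens the API directly, which is the outcome this decision avoids.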

Confusing naming: BETTER_AUTH_URL is used by both the API and web containers but means different things:

  • API: set to frontend URL (for email links)
  • Web: set to API URL (for getServerSession() to call)

See Environment Variables for the confusion this causes.


Why Does the 2FA Verify Route Set the Session Cookie Manually?

Decision: The Next.js 2FA verify route manually signs and sets the session cookie instead of proxying Better Auth's response.

Why: [email protected] has a bug: headers.set('set-cookie') overwrites previous Set-Cookie values instead of appending. When verifyTotp needs to set both the session token cookie and the 2FA state cookie, only the last one survives — the user ends up without a valid session after successful 2FA.

The workaround: the route handler reads the TOTP verification response, extracts the session token, signs it using BETTER_AUTH_SECRET, and manually sets the __Secure-better-auth.session_token cookie.
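The signing step might look roughly like the sketch below. It assumes the cookie value has the form `<token>.<base64 HMAC-SHA256 signature>`, URL-encoded, and that the cookie attributes shown are acceptable. Both are assumptions that must be checked against the Better Auth version in use. The helper names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed cookie format: "<token>.<base64 HMAC-SHA256 of token>", URL-encoded.
function signSessionToken(token: string, secret: string): string {
  const signature = createHmac("sha256", secret).update(token).digest("base64");
  return encodeURIComponent(`${token}.${signature}`);
}

// Counterpart shown only to demonstrate the round trip; returns the token or null.
function verifySignedToken(cookieValue: string, secret: string): string | null {
  const decoded = decodeURIComponent(cookieValue);
  const dot = decoded.lastIndexOf("."); // base64 never contains ".", so this splits safely
  if (dot < 0) return null;
  const token = decoded.slice(0, dot);
  const expected = Buffer.from(createHmac("sha256", secret).update(token).digest("base64"));
  const actual = Buffer.from(decoded.slice(dot + 1));
  return expected.length === actual.length && timingSafeEqual(expected, actual) ? token : null;
}

// Build the Set-Cookie header value; the attributes here are assumptions.
function sessionCookieHeader(token: string, secret: string): string {
  const value = signSessionToken(token, secret);
  return `__Secure-better-auth.session_token=${value}; Path=/; HttpOnly; Secure; SameSite=Lax`;
}
```

In the route handler, `secret` would be BETTER_AUTH_SECRET, and the resulting header would be appended to the response rather than set, so that other cookies survive.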

Why not fix it upstream? The bug is in a transitive dependency of Better Auth, so an upstream fix would require a Better Auth patch release. The workaround, by contrast, is isolated to a single file.

Risk: The workaround duplicates cookie signing logic from Better Auth. If Better Auth changes its cookie format or signing algorithm, this file must be updated.


Why Three-Axis Order Status?

Decision: Orders have three independent status fields: orderStatus, paymentStatus, fulfillmentStatus.

Why: These three dimensions evolve independently:

  • A quote can be confirmed (business-wise) before payment is initiated
  • A build can be in_progress before paymentStatus is paid (staff start building after confirmation, not after payment, at their discretion)
  • fulfillmentStatus can be shipped while paymentStatus is still processing a refund

Collapsing all states into a single enum would either require a combinatorial explosion of states (one per valid combination) or make some valid combinations impossible to express.

Trade-off: More complex UI (three separate badges). In return, transitions on each axis are validated independently, so a staff member can't accidentally skip a required step.
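A minimal sketch of per-axis validation follows. Only the field names and the values confirmed, in_progress, shipped, and paid come from this page. The remaining status values and the transition tables are hypothetical.

```typescript
// Status values are illustrative; most are assumptions, not the real enums.
type OrderStatus = "quote" | "confirmed" | "cancelled";
type PaymentStatus = "unpaid" | "paid" | "refunding";
type FulfillmentStatus = "pending" | "in_progress" | "shipped";

interface Order {
  orderStatus: OrderStatus;
  paymentStatus: PaymentStatus;
  fulfillmentStatus: FulfillmentStatus;
}

// One transition table per axis; each axis is checked without looking at the others.
const orderTransitions: Record<OrderStatus, OrderStatus[]> = {
  quote: ["confirmed", "cancelled"],
  confirmed: ["cancelled"],
  cancelled: [],
};
const fulfillmentTransitions: Record<FulfillmentStatus, FulfillmentStatus[]> = {
  pending: ["in_progress"],
  in_progress: ["shipped"],
  shipped: [],
};

function assertTransition<S extends string>(table: Record<S, S[]>, from: S, to: S): void {
  if (!table[from].includes(to)) {
    throw new Error(`Invalid transition: ${from} -> ${to}`);
  }
}

function changeOrderStatus(order: Order, to: OrderStatus): Order {
  assertTransition(orderTransitions, order.orderStatus, to);
  return { ...order, orderStatus: to };
}

function advanceFulfillment(order: Order, to: FulfillmentStatus): Order {
  assertTransition(fulfillmentTransitions, order.fulfillmentStatus, to);
  return { ...order, fulfillmentStatus: to };
}
```

With tables like these, a build can move to in_progress while paymentStatus is still unpaid, but jumping from pending straight to shipped is rejected.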


Why pnpm deploy --filter=api --prod --legacy?

Decision: Use pnpm deploy for production dependency installation in the Dockerfile rather than copying node_modules.

Why:

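  • pnpm deploy copies the package into a target directory and installs its production dependencies into a self-contained node_modules there, with no symlinks pointing outside that directory
  • Without it, the Docker image either contains all dev dependencies (too large) or has symlinks to the pnpm store that don't exist in the image
  • --legacy is needed on pnpm v10 and later unless the workspace sets inject-workspace-packages=true

--legacy flag: Starting with pnpm v10, pnpm deploy by default only works in workspaces that set inject-workspace-packages=true, and it tries to generate a dedicated lockfile for the deployed package from the shared root lockfile. Without that setting or --legacy, the command fails. --legacy forces the previous deploy implementation and removes the requirement.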


Why Hetzner Object Storage Instead of AWS S3?

Decision: File storage uses Hetzner Object Storage (S3-compatible API).

Why:

  1. Cost: Hetzner is ~10x cheaper than S3 for storage + egress at our scale
  2. Data sovereignty: All files stay in EU (Germany/Finland) data centers — GDPR compliant
  3. No code changes needed: The AWS SDK with endpoint + forcePathStyle: true works identically with Hetzner's S3-compatible API
  4. Proximity: Storage and compute on the same provider — no cross-provider egress fees

Trade-off: Hetzner Object Storage lacks some advanced S3 features (lifecycle policies, advanced replication). Not a concern at current scale.

Migration path: If we need to move to S3 in the future, changing STORAGE_ENDPOINT, STORAGE_REGION, STORAGE_ACCESS_KEY, STORAGE_SECRET_KEY is sufficient — zero code changes.
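The following sketch shows how the four variables could map onto S3 client options. The config shape is declared locally to keep the example self-contained. It mirrors the `endpoint`, `region`, `forcePathStyle`, and `credentials` options of the AWS SDK, and the function name is hypothetical.

```typescript
// Local stand-in for the S3 client options used here; not imported from the SDK.
interface StorageClientConfig {
  endpoint: string;
  region: string;
  forcePathStyle: boolean;
  credentials: { accessKeyId: string; secretAccessKey: string };
}

// Hypothetical helper: read the storage settings from environment variables.
function storageConfigFromEnv(env: Record<string, string | undefined>): StorageClientConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing environment variable ${name}`);
    return value;
  };
  return {
    endpoint: required("STORAGE_ENDPOINT"),
    region: required("STORAGE_REGION"),
    forcePathStyle: true, // path-style addressing, as noted in the list above
    credentials: {
      accessKeyId: required("STORAGE_ACCESS_KEY"),
      secretAccessKey: required("STORAGE_SECRET_KEY"),
    },
  };
}
```

Because the provider appears only in these values, pointing the same object at a different S3-compatible endpoint is a configuration change rather than a code change.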


Why Self-Hosted Umami Instead of Google Analytics?

Decision: Use self-hosted Umami (running on Iris) for web analytics.

Why:

  1. GDPR compliance: Umami in cookieless mode doesn't set cookies or process personal data, making it arguably exempt from consent requirements (GDPR Recital 26 places anonymous information outside the regulation's scope)
  2. No consent banner needed for analytics: Removes the need for a cookie consent dialog for the analytics tool (we still have one for other purposes — legal copy TBD)
  3. Privacy: No data sent to Google's servers
  4. Cost: Free (self-hosted, runs on existing Iris server)

Trade-off: Less powerful than GA4 (no user journey analysis, limited segmentation). For a boutique service with low traffic, basic page views and session counts are sufficient.


Why Is Migration 9 a Full Baseline?

Decision: Rather than maintaining a linear chain of 9 migrations, we treat migration 9 as a complete schema snapshot. On fresh databases, migrations 1–8 are marked as applied without being run.

Why: During development, the schema underwent multiple major refactors. Running all 9 migrations sequentially would create intermediate states that the current application doesn't support: earlier migrations create tables or columns that later ones (migrations 4–8) then drop or rename.

A baseline migration approach:

  • Fresh environments run only migration 9 (the complete schema)
  • Existing environments that have already run migrations 1–8 apply only migration 9's incremental changes
  • Avoids the "migration 3 creates a table that migration 7 drops" problem

Trade-off: New environments require a manual step to mark migrations 1–8 as applied. Documented in Database runbook.
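The planning logic behind the baseline approach can be sketched as below. This is a hypothetical illustration, not the project's migration tooling. It assumes an existing database has all of migrations 1–8 applied, and the type and function names are invented for the example.

```typescript
// Hypothetical sketch of baseline planning; the real tooling is not described here.
const BASELINE = 9;

interface MigrationPlan {
  markAsApplied: number[]; // recorded as applied without being executed
  run: number[];           // actually executed
}

function planMigrations(applied: ReadonlySet<number>): MigrationPlan {
  if (applied.size === 0) {
    // Fresh database: skip 1..8 and execute only the baseline snapshot.
    const skipped = Array.from({ length: BASELINE - 1 }, (_, i) => i + 1);
    return { markAsApplied: skipped, run: [BASELINE] };
  }
  // Existing database (assumed to have 1..8 applied): run the baseline if still pending.
  return { markAsApplied: [], run: applied.has(BASELINE) ? [] : [BASELINE] };
}
```

The `markAsApplied` list corresponds to the manual step mentioned in the trade-off: those entries are written to the migration bookkeeping without executing any SQL.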


Why Is proxy.ts a "Routing Hint" Not a Security Gate?

Decision: The Next.js middleware (proxy.ts) only checks cookie presence and makes routing decisions — it does not validate sessions or enforce permissions.

Why: The Next.js Edge Runtime (where middleware runs) does not have:

  • Access to the NestJS session validation logic
  • A reliable way to query the database
  • Cryptographic session token validation without importing the full Better Auth library

Even if middleware could validate sessions, it would only be an additional layer — the API must always validate the session independently because any authenticated API call could come from a non-browser client.

Security model: Security is the API's responsibility. The middleware exists to give users the right UX (redirect to login rather than showing a blank page). A session cookie that passes the cookie presence check but is expired or has been tampered with will be rejected by SessionGuard when the next API call is made.
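The presence check can be sketched as a pure function, independent of the Next.js request types. The protected path prefixes and the `/login?next=` redirect shape are hypothetical. Only the cookie name comes from this page.

```typescript
const SESSION_COOKIE = "__Secure-better-auth.session_token";
const PROTECTED_PREFIXES = ["/account", "/admin"]; // hypothetical route prefixes

type RoutingHint = { action: "next" } | { action: "redirect"; location: string };

// Presence check only: the cookie's value is never inspected or verified here.
function hasCookie(cookieHeader: string | null, name: string): boolean {
  if (!cookieHeader) return false;
  return cookieHeader.split(";").some((part) => part.trim().startsWith(`${name}=`));
}

function routingHint(pathname: string, cookieHeader: string | null): RoutingHint {
  const isProtected = PROTECTED_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(`${prefix}/`),
  );
  if (isProtected && !hasCookie(cookieHeader, SESSION_COOKIE)) {
    return { action: "redirect", location: `/login?next=${encodeURIComponent(pathname)}` };
  }
  return { action: "next" };
}
```

Note that a forged cookie value passes this check by design. Rejecting it is left to the API, as described above.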


Why Are Order Line Items Snapshot-Based?

Decision: When a component or service is added to an order, its price and display name are stored as a snapshot on the order item, not as a live reference to the catalog price.

Why:

  1. Price stability: Catalog prices change over time. A customer who accepted a quote for €2,500 should not see their order total change because a component's catalog price was updated
  2. Legal: Quoted prices are contractual — they must not change after the customer accepts
  3. Historical accuracy: Order history remains accurate even after catalog items are discontinued, renamed, or repriced

Trade-off: If the catalog price changes, existing order items are not automatically updated. Staff must manually update order items if a price negotiation happens after quoting.

The catalog reference (componentId, etc.) is stored alongside the snapshot for reference, but is nullable — the snapshot fields are what matter for billing.
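A minimal sketch of the snapshot idea follows. Apart from `componentId`, the field names, the use of integer cents, and the helper functions are assumptions made for the example.

```typescript
// Hypothetical shapes; only componentId is named on this page.
interface CatalogComponent {
  id: string;
  name: string;
  priceCents: number;
}

interface OrderItem {
  componentId: string | null; // nullable catalog reference, informational only
  displayName: string;        // snapshot taken when the item is added
  unitPriceCents: number;     // snapshot taken when the item is added
  quantity: number;
}

// Copy the catalog values into the order item instead of referencing them live.
function snapshotItem(component: CatalogComponent, quantity: number): OrderItem {
  return {
    componentId: component.id,
    displayName: component.name,
    unitPriceCents: component.priceCents,
    quantity,
  };
}

// Billing reads only the snapshot fields, never the catalog.
function orderTotalCents(items: OrderItem[]): number {
  return items.reduce((sum, item) => sum + item.unitPriceCents * item.quantity, 0);
}
```

Changing or deleting the catalog entry afterwards leaves the order total untouched, which is the behavior the quoted-price guarantee relies on.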