Operations¶
How to run a LUONVUITUOI-CERT deployment day-to-day: health probes, log surfaces, KV backend choice, session revocation, and the incident checklist.
Health probe¶
Returns {"ok": true} with HTTP 200. The endpoint:
- Does not read the database.
- Does not write to KV.
- Does not hit the rate limiter.
- Has no auth requirement.
Docker HEALTHCHECK, Kubernetes liveness, and load-balancer probes should all target this path. Older deploys that probed POST /api/captcha as a health signal should switch: every such probe minted a KV entry and counted against the rate-limit bucket.
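A probe script only needs to confirm the two facts stated above: HTTP 200 and a JSON body with "ok" set to true. The sketch below is a hypothetical stdlib-only client, not part of the project. It assumes the probe is a plain GET and takes the full URL from the caller, since the path is not restated here.

```python
import json
import urllib.error
import urllib.request


def is_healthy(status: int, body: bytes) -> bool:
    """True only for HTTP 200 with a JSON object whose "ok" field is true."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("ok") is True


def probe(url: str, timeout: float = 2.0) -> bool:
    """GET the health URL (assumed method); any network error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.status, resp.read())
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

A wrapper that exits 0 when `probe(url)` is true and 1 otherwise would fit Docker HEALTHCHECK or a Kubernetes exec probe.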
KV backends¶
KV_BACKEND decides where ephemeral state lives: CAPTCHA challenges, rate-limit counters, OTP codes, magic-link hashes, JWT denylist. Pick one:
| Backend | When to use | Multi-worker safe? |
|---|---|---|
| local | Local dev, single-container Docker | No. The file lock is scoped to a single process. Startup logs a warning when WEB_CONCURRENCY > 1. |
| upstash | Anywhere; recommended for production | Yes. Redis atomic GETDEL backs kv.consume(). |
| vercel-kv | Vercel deploys | Yes. Auto-injected via the Vercel KV integration. |
If you run gunicorn with multiple workers against local, watch the startup log for the warning KV_BACKEND=local with N workers is unsafe (N is the worker count).
If it appears, either drop to 1 worker (WEB_CONCURRENCY=1) or switch to upstash.
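The startup check can be pictured as a small guard over the two environment variables. This is a minimal sketch, not the project's actual code: the function name and logger name are hypothetical, and only the variable names and the message wording come from this page.

```python
import logging
import os

logger = logging.getLogger("kv_startup_sketch")  # hypothetical logger name


def warn_if_unsafe_kv(env=None) -> bool:
    """Log a warning and return True when local KV is combined with >1 worker."""
    env = os.environ if env is None else env
    backend = env.get("KV_BACKEND", "")
    try:
        workers = int(env.get("WEB_CONCURRENCY", "1"))
    except ValueError:
        workers = 1  # assumption: an unparsable value is treated as one worker
    if backend == "local" and workers > 1:
        logger.warning("KV_BACKEND=local with %d workers is unsafe", workers)
        return True
    return False
```

Passing a plain dict as `env` makes the guard easy to exercise without touching the real environment.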
Logs¶
Everything flows through the stdlib logging module at WARNING and above. In Docker, those go to stdout and are captured by docker logs. On Vercel, vercel logs --follow streams them.
Loud messages worth alerting on:
| Logger | Message | Meaning |
|---|---|---|
| luonvuitoi_cert_cli.server.app | RESEND_API_KEY not set | OTP/magic-link emails are being silently dropped. |
| luonvuitoi_cert.storage.kv.factory | KV_BACKEND=local with N workers is unsafe | Concurrent workers are racing on a single file KV. |
| luonvuitoi_cert.auth.activity_log | must be https:// | Someone set GSHEET_WEBHOOK_URL to a non-HTTPS target; forwarding is disabled. |
| luonvuitoi_cert.auth.activity_log | activity log webhook POST failed | The GSheet endpoint is down. The local SQLite record is still authoritative. |
Handled errors (rate-limit 429s, CAPTCHA rejections, 404 searches) are not logged; they're normal traffic.
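If your log pipeline has no built-in pattern alerts, the same matching can be done in-process with a stdlib logging handler. The sketch below is an illustration under assumptions: the `AlertCollector` class is hypothetical, and the substrings are copied from the Message column above (the worker count is left out of the KV pattern because it varies).

```python
import logging

# Substrings taken from the "Message" column; matching is by containment.
ALERT_SUBSTRINGS = (
    "RESEND_API_KEY not set",
    "KV_BACKEND=local with",
    "must be https://",
    "activity log webhook POST failed",
)


class AlertCollector(logging.Handler):
    """Collects WARNING+ records whose text matches a known loud message."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.alerts = []

    def emit(self, record):
        message = record.getMessage()
        if any(fragment in message for fragment in ALERT_SUBSTRINGS):
            self.alerts.append((record.name, message))
```

In a real deployment `emit` would page or post to a chat channel instead of appending to a list.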
Audit log¶
Admin actions land in the admin_activity SQLite table:
| Column | Notes |
|---|---|
| id | UUID4 per entry. |
| timestamp | ISO-8601 UTC (YYYY-MM-DDTHH:MM:SSZ). |
| user_id / user_email | JWT sub + email claims. |
| action | admin.login.success, admin.login.failure, student.update, shipment.upsert, etc. |
| target_id | Subject of the change (e.g. students:12345). |
| metadata | JSON blob; never includes PII. student.update records {column, changed, value_length_delta}, not raw old/new values. |
| ip | Client IP (honors TRUST_PROXY_HEADERS). |
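To make the PII-free metadata rule concrete, here is a minimal sketch of how a student.update entry could be described without storing either value. The function name is hypothetical, and the sign convention for value_length_delta (new length minus old length) is an assumption.

```python
import json


def student_update_metadata(column: str, old: str, new: str) -> str:
    """Describe a field change without recording the old or new value."""
    return json.dumps({
        "column": column,
        "changed": old != new,
        # Assumed convention: positive when the new value is longer.
        "value_length_delta": len(new) - len(old),
    })
```

The resulting blob tells an auditor which column moved and roughly how, while the raw values never reach the log.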
If GSHEET_WEBHOOK_URL is set (and uses https://), each entry is POSTed fire-and-forget to the sheet on a bounded ThreadPoolExecutor(4). The SQLite write is authoritative: webhook failures never break admin flows.
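The write-then-forward pattern can be sketched as follows. This is an illustration, not the project's code: `record`, `write_sqlite`, and `send` are hypothetical names, and the storage and HTTP calls are passed in as callables so the ordering and error handling are the only things shown.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("activity_log_sketch")  # hypothetical logger name
_executor = ThreadPoolExecutor(max_workers=4)  # at most four concurrent forwards


def _forward(send, entry):
    """Run the webhook call; swallow and log any failure."""
    try:
        send(entry)
    except Exception:
        logger.warning("activity log webhook POST failed", exc_info=True)


def record(entry, write_sqlite, send=None):
    """Persist the entry first, then forward it in the background if configured."""
    write_sqlite(entry)  # authoritative write; errors here propagate to the caller
    if send is None:
        return None
    return _executor.submit(_forward, send, entry)
```

Because `_forward` catches everything, a failing webhook shows up as a log warning and never as an exception in the admin request.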
Query recipes¶
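# Last 20 admin actions:
sqlite3 data/portal.db "SELECT timestamp, user_email, action, target_id FROM admin_activity ORDER BY timestamp DESC LIMIT 20;"
# Failed logins in the last hour. The cutoff is built with strftime so it matches the
# stored YYYY-MM-DDTHH:MM:SSZ text; datetime() emits a space instead of T and miscompares.
sqlite3 data/portal.db "SELECT timestamp, user_email, metadata FROM admin_activity WHERE action = 'admin.login.failure' AND timestamp > strftime('%Y-%m-%dT%H:%M:%SZ', 'now', '-1 hour');"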
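The same failed-login query can be run from Python when you want the results in a script. The sketch below assumes the admin_activity columns listed in the audit log table and formats the cutoff in the stored ISO-8601 layout, so the comparison works on text. The function name is hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def failed_logins_since(conn, minutes=60, now=None):
    """Return failed admin logins newer than `minutes` ago, newest first."""
    now = now or datetime.now(timezone.utc)
    # Same layout as the stored timestamps, so string comparison is chronological.
    cutoff = (now - timedelta(minutes=minutes)).strftime("%Y-%m-%dT%H:%M:%SZ")
    return conn.execute(
        "SELECT timestamp, user_email, metadata FROM admin_activity "
        "WHERE action = ? AND timestamp > ? ORDER BY timestamp DESC",
        ("admin.login.failure", cutoff),
    ).fetchall()
```

Typical use is `failed_logins_since(sqlite3.connect("data/portal.db"))`.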
Session revocation¶
See admin-auth.md for the full flow. Summary:
- POST /api/admin/logout with the user's current JWT: the jti is added to the KV denylist with a TTL matching the token's remaining life.
- Any endpoint that threads kv= into verify_admin_token will reject the token.
- The denylist self-expires; no cron needed.
Use this instead of rotating JWT_SECRET (which invalidates every session at once and requires all admins to log back in).
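The denylist mechanics can be sketched with an in-memory stand-in for the KV. Everything here is illustrative: `MemoryKV`, `revoke`, `is_revoked`, and the `denylist:` key prefix are hypothetical, and the real backend interface may differ. The point is that the TTL equals the token's remaining lifetime, so entries disappear on their own.

```python
import time


class MemoryKV:
    """Tiny in-memory KV with per-key expiry (stand-in for the real backend)."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._data = {}

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # lazy expiry, no background job needed
            return None
        return value


def revoke(kv, jti, exp, now=None):
    """Denylist a token id until the token's own expiry (`exp`, epoch seconds)."""
    now = time.time() if now is None else now
    remaining = int(exp - now)
    if remaining > 0:  # an already-expired token needs no entry
        kv.set(f"denylist:{jti}", "1", ttl=remaining)


def is_revoked(kv, jti):
    return kv.get(f"denylist:{jti}") is not None
```

A token verifier that is handed the KV would call `is_revoked` after checking the signature and expiry.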
Public-surface feature gates¶
Two super-admin-only toggles control whether the public surfaces are live. State is stored in KV so flips take effect immediately, no redeploy.
- Lookup toggle: gates POST /api/search (student mode). When off, the endpoint returns 503 with {"error": "public lookup is currently disabled by the operator"}. Admin mode (mode=admin, JWT required) is unaffected so operators keep working during a freeze.
- Download toggle: gates POST /api/download (student mode). When off, the endpoint returns 503 with a download-specific message. Admin mode is unaffected.
Invariant: download requires lookup. If lookup is off, download is forced off. This is enforced both on write (the toggle API clamps the request) and on read (the gate checks both flags).
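The two enforcement points can be sketched as a pair of small functions. This is a hypothetical model rather than the project's implementation: the `FeatureFlags` type and function names are assumptions, while the field names follow the REST payload and the both-on defaults follow the documented fresh-deploy behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureFlags:
    lookup_enabled: bool = True
    download_enabled: bool = True


def clamp(flags: FeatureFlags) -> FeatureFlags:
    """Write-side enforcement: lookup off forces download off."""
    if not flags.lookup_enabled:
        return FeatureFlags(lookup_enabled=False, download_enabled=False)
    return flags


def lookup_allowed(flags: FeatureFlags) -> bool:
    return flags.lookup_enabled


def download_allowed(flags: FeatureFlags) -> bool:
    """Read-side enforcement: check both flags even if the stored state is odd."""
    return flags.lookup_enabled and flags.download_enabled
```

Checking on read as well means a stale or hand-edited KV value cannot open downloads while lookup is frozen.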
How to flip¶
Sign in as super-admin; the Public surface card appears above the student lookup form. Tick the boxes, then click Save. The download checkbox auto-disables when lookup is off.
REST equivalent:
curl -X POST https://your.host/api/admin/features/update \
-H 'Content-Type: application/json' \
-d '{"token": "<super-admin JWT>", "lookup_enabled": false, "download_enabled": false}'
Only the super-admin role is accepted; admin / viewer get 403. Every flip is recorded in the audit log as admin.features.update with the new state in metadata.
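For scripted flips, the curl call translates directly to the Python standard library. The sketch below only builds the request; the helper name is hypothetical, and the endpoint path and JSON fields are the ones shown in the curl example.

```python
import json
import urllib.request


def build_features_request(base_url, token, lookup_enabled, download_enabled):
    """Build the POST request for the feature-toggle endpoint."""
    body = json.dumps({
        "token": token,
        "lookup_enabled": lookup_enabled,
        "download_enabled": download_enabled,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/admin/features/update",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(request)`; a non-super-admin token raises `urllib.error.HTTPError` with code 403.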
Defaults¶
Fresh deploys: both on (pre-gate behavior). KV is only written when a super-admin first saves, so a never-touched deploy reads the baked-in default.
Updating dependencies¶
Dependabot scans weekly (pip) and monthly (github-actions, docker) and opens PRs. Review and merge them promptly: reportlab, pypdf, and cryptography are supply-chain targets.
Backing up¶
# Stop the container for a consistent snapshot:
docker compose stop
tar czf "backup-$(date +%Y%m%d).tar.gz" data/
docker compose start
Or with the container running, use SQLite's online backup:
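One way to take such an online backup is Python's stdlib `sqlite3.Connection.backup`, which copies a live database through SQLite's backup API. This is a minimal sketch: the function name is hypothetical, and the paths in the usage note are assumptions based on the data/portal.db path used in the query recipes.

```python
import sqlite3


def online_backup(src_path: str, dest_path: str) -> None:
    """Copy a live SQLite database to dest_path using the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # consistent snapshot even while other connections write
    finally:
        dest.close()
        src.close()
```

For example, `online_backup("data/portal.db", "backup-portal.db")`. Note that `sqlite3.connect` creates an empty file if the source path is wrong, so check the path first.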
The DB is the only stateful artifact; the KV entries (rate-limit counters, CAPTCHA challenges) are ephemeral.
Incident checklist¶
One admin's session looks compromised:
- POST /api/admin/logout with their token (revokes the JTI).
- UPDATE admin_users SET is_active = 0 WHERE email = '…'; (belt-and-suspenders).
- Rotate the admin's password (or reset via OTP).
- Query admin_activity for every target_id they touched during the exposure window.
JWT_SECRET leaked:
- Generate a new secret.
- Redeploy with the new value (invalidates every session).
- Rotate the QR signing key too if the secret leaked via the same channel: private_key.pem is a separate artifact but often co-located.
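For the "generate a new secret" step, Python's `secrets` module produces values suitable for signing keys. This is a minimal sketch: the helper name and the 64-byte size are assumptions, since this page does not state a required length or format for JWT_SECRET.

```python
import secrets


def new_jwt_secret(num_bytes: int = 64) -> str:
    """Return a URL-safe random string derived from num_bytes of OS entropy."""
    return secrets.token_urlsafe(num_bytes)
```

Print it once, store it in your secret manager, and avoid leaving it in shell history or CI logs.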