Realtime Interview Operations Guide
This document covers everything operators need to ship the OpenAI-powered realtime voice interview: the OpenAI console setup, the env vars, the kill switches, the integrity defenses, the rollout plan, and a rollback playbook.
Scope: the realtime voice flow only. General API / landing page / mobile release processes live in `docs/DEPLOYMENT-SOP.md` and `docs/08-PRODUCTION-READINESS.md`.
1. OpenAI console checklist
Complete every item below before flipping `EXPO_PUBLIC_REALTIME_ENABLED` to `true` in a public build.
- Enable billing on the OpenAI account. Realtime is a paid-only tier — there is no free quota.
- Create a dedicated project named `civix-interview-realtime`. A separate project keeps billing, logs, rate limits, and scope-creep blast radius cleanly partitioned.
- Generate a project-scoped API key with only `Realtime` + `Chat Completions` permissions. If the organization allows it, restrict the key to the Vercel egress IPs for the production deployment.
- Set a hard monthly spending cap under Project → Limits. Recommendation: $50/mo for the beta; raise in $50 increments per week once usage is understood.
- Set project-level rate limits. Conservative starting point: 10 RPM and the platform default TPM. This prevents a single leak / bug from blowing through the monthly cap in minutes.
- Pick the default voice: `echo` is our shipped default (configure via `OPENAI_REALTIME_VOICE`). OpenAI's recommended best-quality voices are `cedar` and `marin`. Full roster: `alloy | ash | ballad | cedar | coral | echo | sage | shimmer | verse | marin`.
- Enable the Moderation API for the project. It's free and usually on by default; verify under Project → Settings.
- Enable request/response logging under Project → Data Controls for the first 30 days. Useful for abuse triage. Disable or shorten retention after the beta.
- Create a Usage Alert webhook under Organization → Usage at 50% and 80% of the spending cap. Point it at the internal Slack webhook (or a dedicated email).
- Create a separate dev API key inside a smaller `civix-realtime-dev` project (with a $10 cap) for staging / EAS preview builds. Use `OPENAI_API_KEY_DEV`.
- Apply for the Realtime Scale tier once usage approaches the default rate limits. Approval is region-dependent and can take several business days — start the application before you need it.
1b. Premium entitlement (set up after the console checklist)
Realtime voice is paid-only. Free users hit `REALTIME_PREMIUM_REQUIRED` (403).
- Free tier: 0 realtime sessions.
- Paid tier ($20/month via RevenueCat): up to `OPENAI_REALTIME_PAID_SESSIONS_PER_HOUR` sessions per hour (default 3). Still gated by the per-user monthly cost cap and the global daily cap.
- TestFlight reviewer bypass: unlimited (existing `x-testflight-bypass` header).
MVP tier detection (Phase 1): the mobile client sends an advisory `x-civix-tier: free | paid` header derived from the local `isPremiumUser()` store helper (backed by RevenueCat + test bypass + sadaqah unlock). The backend treats this header as advisory — a malicious user could flip it, but every cost-enforcement layer (per-user monthly USD cap, global daily USD cap, hourly session starts, daily minutes) still applies, so the worst-case financial exposure is bounded by our existing `realtime-limits.ts`.
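The advisory check can be sketched as follows. All names and signatures here are illustrative, not the actual route code:

```typescript
type Tier = "free" | "paid";

// Parse the client-supplied x-civix-tier header. Anything unexpected
// downgrades to "free": the header is client-controlled, so we only ever
// trust it to *deny* access, never to grant more than the caps allow.
function parseAdvisoryTier(headerValue: string | null): Tier {
  return headerValue === "paid" ? "paid" : "free";
}

// The hourly session-start gate still runs for everyone, so a spoofed
// header is bounded by the same hard limits (hypothetical signature).
function maySessionStart(tier: Tier, startsThisHour: number, paidPerHour: number): boolean {
  if (tier === "free") return false; // free tier: 0 realtime sessions
  return startsThisHour < paidPerHour;
}
```

The point of the shape: a flipped header can at most reach the paid session caps, never bypass them.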
Phase 2 hardening (post-launch): server-side RevenueCat receipt validation. The mobile client will POST a receipt JWT to a new `/api/entitlement` route; the server validates it via RevenueCat's server API and returns a signed `x-civix-entitlement: <jwt>` header that the mobile client attaches to every subsequent realtime request. The server then verifies the JWT signature instead of trusting the advisory header.
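To make the Phase 2 shape concrete, here is a minimal sketch of a signed entitlement token using a plain HMAC. This is illustrative only — the real route would mint and verify a proper RevenueCat-backed JWT with a standard library:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sign a user's entitlement so later requests can be verified, not trusted.
function signEntitlement(userId: string, tier: string, secret: string): string {
  const sig = createHmac("sha256", secret).update(`${userId}.${tier}`).digest("hex");
  return `${userId}.${tier}.${sig}`;
}

// Returns the entitlement if the signature checks out, null otherwise.
function verifyEntitlement(
  token: string,
  secret: string
): { userId: string; tier: string } | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [userId, tier, sig] = parts;
  const expected = createHmac("sha256", secret).update(`${userId}.${tier}`).digest("hex");
  const a = Buffer.from(sig, "utf8");
  const b = Buffer.from(expected, "utf8");
  // Constant-time compare; length check first because timingSafeEqual
  // throws on mismatched buffer lengths.
  return a.length === b.length && timingSafeEqual(a, b) ? { userId, tier } : null;
}
```

A tampered tier or wrong signing secret fails verification, which is exactly the property the advisory header lacks.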
2. Environment variables
Mobile (apps/mobile/.env.example)
| Var | Default | Description |
|---|---|---|
| `EXPO_PUBLIC_API_URL` | `https://api.civixapp.us` | Interview API base URL. Local dev should point at `http://localhost:3000` or an ngrok tunnel. |
| `EXPO_PUBLIC_REVENUECAT_IOS_API_KEY` | — | iOS RevenueCat key (required for the paywall). |
| `EXPO_PUBLIC_REVENUECAT_ANDROID_API_KEY` | — | Android RevenueCat key. |
| `EXPO_PUBLIC_REALTIME_ENABLED` | `true` | Kill switch for the realtime voice flow. |
| `EXPO_PUBLIC_HANDWRITING_ENABLED` | `true` | Kill switch for the handwriting canvas + OCR. |
| `EXPO_PUBLIC_NEW_INTERVIEW_UI` | `false` | Force the new interview UI for all users. |
| `EXPO_PUBLIC_NATIVE_POLISH_ENABLED` | `true` | Haptics / blur / native animations. |
| `EXPO_PUBLIC_MOCK_REALTIME` | `false` | Use the local mock realtime transport (demos / E2E). |
| `EXPO_PUBLIC_SENTRY_DSN` | — | Sentry DSN for release crash reporting. |
| `EXPO_PUBLIC_ENABLE_TEST_BYPASS` | `false` | TestFlight QA bypass (never set in App Store builds). |
| `EXPO_PUBLIC_TEST_BYPASS_CODE` | — | TestFlight access code (min 8 chars). |
| `EXPO_PUBLIC_TESTFLIGHT_INTERVIEW_BYPASS` | `false` | Allow interview rate-limit bypass for reviewers. |
| `EXPO_PUBLIC_TESTFLIGHT_BYPASS_TOKEN` | — | Shared-secret token matching `TESTFLIGHT_BYPASS_SECRET`. |
Backend (apps/web/.env.example)
| Var | Required | Description |
|---|---|---|
| `OPENAI_API_KEY` | yes | Production OpenAI key. Used for realtime + OCR. |
| `OPENAI_API_KEY_DEV` | no | Dev/staging key used on preview deployments. |
| `OPENAI_REALTIME_MODEL` | yes | Model alias. Default `gpt-realtime` (GA, recommended). Cheaper variant: `gpt-realtime-mini`. Legacy preview: `gpt-4o-realtime-preview`. |
| `OPENAI_REALTIME_VOICE` | yes | Default voice. Default `echo`. See section 2b for the full roster. |
| `OPENAI_REALTIME_TRANSCRIBE_MODEL` | no | Transcription model for the applicant's audio. Default `gpt-4o-mini-transcribe` (cheapest quality-first). Alternatives: `whisper-1` (budget), `gpt-4o-transcribe` (best quality). |
| `OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY` | yes | Per-user daily cap in minutes (default 15). |
| `OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP` | yes | Global daily USD cap (default 100). |
| `OPENAI_REALTIME_PAID_SESSIONS_PER_HOUR` | no | Sessions-per-hour cap for paid users. Default 3. |
| `OPENAI_REALTIME_PAID_SESSIONS_WINDOW` | no | Window for the above limit (Upstash syntax). Default `1 h`. |
| `OPENAI_REALTIME_SESSION_MAX_MINUTES` | no | Hard wall-clock cap per session. Default 25. |
| `OPENAI_MODERATION_ENABLED` | yes | Run inputs through the Moderation API. |
| `ADMIN_TOKEN` | yes | Bearer token guarding `/api/admin/*`. |
| `UPSTASH_REDIS_URL` | no | Upstash Redis URL for rate-limit + usage counters. |
| `UPSTASH_REDIS_TOKEN` | no | Upstash Redis token. |
| `TESTFLIGHT_BYPASS_ENABLED` | no | Enable the TestFlight reviewer bypass server-side. |
| `TESTFLIGHT_BYPASS_SECRET` | no | Must match `EXPO_PUBLIC_TESTFLIGHT_BYPASS_TOKEN`. |
2b. Cost & model selection cheat sheet
Approximate OpenAI pricing as of Q2 2026 (verify on the current pricing page before launch). Our per-session cost estimate in `realtime-token/route.ts` (`ESTIMATED_SESSION_COST_USD = 0.30`) assumes ~2 minutes of conversation on `gpt-realtime` with `gpt-4o-mini-transcribe`.
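The $0.30 figure can be sanity-checked with back-of-envelope arithmetic. The token-per-minute rates below are assumptions for illustration only — calibrate them against real session logs before relying on the estimate:

```typescript
// Prices from the tables below; token rates are GUESSES, not published numbers.
const AUDIO_IN_USD_PER_1M = 32;        // gpt-realtime audio input, per 1M tokens
const AUDIO_OUT_USD_PER_1M = 64;       // gpt-realtime audio output, per 1M tokens
const TRANSCRIBE_USD_PER_MIN = 0.003;  // gpt-4o-mini-transcribe

function estimateSessionCostUsd(
  minutes: number,
  inputTokensPerMin: number,   // assumed rate
  outputTokensPerMin: number   // assumed rate
): number {
  const input = (minutes * inputTokensPerMin / 1_000_000) * AUDIO_IN_USD_PER_1M;
  const output = (minutes * outputTokensPerMin / 1_000_000) * AUDIO_OUT_USD_PER_1M;
  const transcription = minutes * TRANSCRIBE_USD_PER_MIN;
  return input + output + transcription;
}
```

At roughly 800 input and 2000 output tokens per minute, a 2-minute session lands near the $0.30 budget.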
Realtime conversation models
| Model | Audio input / 1M | Audio output / 1M | Latency | Use for |
|---|---|---|---|---|
| `gpt-realtime` (GA) | ~$32 | ~$64 | low | Default. Best quality / latency balance. |
| `gpt-realtime-mini` | ~$10 | ~$20 | lowest | Cost-sensitive deployments (set via `OPENAI_REALTIME_MODEL`). |
| `gpt-4o-realtime-preview` | ~$40 | ~$80 | medium | Legacy / A-B tests. Do not run in prod. |
Transcription models (applicant audio)
| Model | Price / min | Use for |
|---|---|---|
| `gpt-4o-mini-transcribe` (default) | ~$0.003 | Cheapest quality-first — ship here. |
| `whisper-1` | ~$0.006 | Fallback when mini hits rate limits. |
| `gpt-4o-transcribe` | ~$0.006 | Highest quality for noisy audio. |
Voices (no price difference)
All voices bill at the same output rate as the conversation model. Roster: `alloy`, `ash`, `ballad`, `cedar` (playground default), `coral`, `echo` (our default), `sage`, `shimmer`, `verse`, `marin`. Voice is a stylistic choice only — pick based on your brand persona.
3. Feature flags
All kill-switches live in `apps/mobile/lib/feature-flags.ts`. Values are read from `process.env.EXPO_PUBLIC_*` at build time.
| Flag | Default | Flip when |
|---|---|---|
| `EXPO_PUBLIC_REALTIME_ENABLED` | `true` | OpenAI Realtime outage, cost spike, abuse wave. Set to `'false'` to hide the realtime button entirely; the interview falls back to text. |
| `EXPO_PUBLIC_HANDWRITING_ENABLED` | `true` | OCR provider outage, misrecognition regression. Set to `'false'` to force the typed-input fallback. |
| `EXPO_PUBLIC_NEW_INTERVIEW_UI` | `false` | Flip to `'true'` for cohorts currently flagged into the legacy UI. |
| `EXPO_PUBLIC_NATIVE_POLISH_ENABLED` | `true` | Isolating a haptics / blur / animation regression. |
| `EXPO_PUBLIC_MOCK_REALTIME` | `false` | Demos, Maestro E2E, airplane-mode dev. Never enable in production builds. |
Because these flags are compile-time inlined, a hard flip requires an OTA update (`eas update --branch production`) or a new EAS build. Server-side kill switches (e.g. `OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY=0`) take effect immediately and should be the first lever in an incident.
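The compile-time read described above can be sketched as follows. This is a hypothetical helper — the shipped `feature-flags.ts` may differ:

```typescript
// Expo inlines process.env.EXPO_PUBLIC_* at build time, so these reads
// resolve to string literals in the bundle (hence the OTA/rebuild caveat).
function boolFlag(raw: string | undefined, fallback: boolean): boolean {
  if (raw === undefined || raw === "") return fallback;
  return raw === "true" || raw === "1";
}

const flags = {
  realtimeEnabled: boolFlag(process.env.EXPO_PUBLIC_REALTIME_ENABLED, true),
  handwritingEnabled: boolFlag(process.env.EXPO_PUBLIC_HANDWRITING_ENABLED, true),
  newInterviewUi: boolFlag(process.env.EXPO_PUBLIC_NEW_INTERVIEW_UI, false),
  mockRealtime: boolFlag(process.env.EXPO_PUBLIC_MOCK_REALTIME, false),
};
```

Defaulting when the var is unset keeps a misconfigured build on the safe path listed in the table.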
4. Integrity defense map
| Attack vector | Detection layer | Code reference |
|---|---|---|
| Prompt injection (user tries to override system rules) | System-prompt hardening + output regex gate | apps/web/lib/interview/prompts.ts, apps/web/lib/interview/output-guard.ts |
| Answer fishing (user tries to extract civics answer keys) | Answer-leak detector in the realtime transcript stream | apps/web/lib/interview/answer-leak-detector.ts |
| Cost abuse (same user spins up many sessions) | Upstash per-user daily minute cap + global USD cap | apps/web/lib/realtime/usage.ts, env OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY, OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP |
| Off-topic / disallowed content | OpenAI Moderation API + keyword blocklist | apps/web/lib/interview/moderation.ts, env OPENAI_MODERATION_ENABLED |
| Session loops / stuck retries | Per-session turn cap + idle timeout | apps/web/lib/interview/session-limits.ts |
| Endpoint enumeration / brute force | Upstash IP-based rate limits | apps/web/lib/rate-limit.ts, env UPSTASH_REDIS_URL |
| Leaked API key reuse | Project-scoped OpenAI key + IP restriction | OpenAI console (Project → Keys) |
| Runaway cost from bug | Hard monthly cap + 50% / 80% usage-alert webhook | OpenAI console (Project → Limits, Org → Usage) |
| Officer leaks mid-interview feedback ("correct", "good") | detectFeedbackLeak + server injects correction, client-side lint mirror | apps/web/lib/interview/answer-leak-detector.ts |
| Runaway session (cost / UX) | 25-min wall-clock cap + soft warn at 22 min + server-enforced OPENAI_REALTIME_SESSION_MAX_MINUTES | apps/mobile/lib/realtime/useRealtimeOfficer.ts + apps/web/lib/interview/realtime-limits.ts |
| Model fabricates pass/fail | Tool sandbox strips outcome from end_interview; server computes verdict from save_evidence aggregation | apps/web/lib/interview/tool-sandbox.ts + apps/web/lib/interview/verdict-builder.ts |
If a path above does not yet exist in the repo (some modules are owned by the backend agent), treat it as the intended location and keep this table in sync with the PR that ships the module.
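As one example of the cost-abuse row above, a per-user daily minute cap can be sketched with an in-memory counter standing in for the Upstash key. The key format and function names are hypothetical — the real `usage.ts` would use a Redis increment on a key with a 24h TTL:

```typescript
// In-memory stand-in for the Redis counter (illustrative only).
const minutesUsed = new Map<string, number>();

function dayKey(userId: string, now = new Date()): string {
  // One key per user per UTC day, e.g. civix:realtime:minutes:u1:2026-04-01
  return `civix:realtime:minutes:${userId}:${now.toISOString().slice(0, 10)}`;
}

// Reserve `minutes` against the daily cap; false means the cap is hit.
function tryConsumeMinutes(userId: string, minutes: number, capPerDay: number): boolean {
  const key = dayKey(userId);
  const current = minutesUsed.get(key) ?? 0;
  if (current + minutes > capPerDay) return false;
  minutesUsed.set(key, current + minutes);
  return true;
}
```

Keying by UTC day means the counter resets naturally at midnight, matching the "per-user daily" wording of `OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY`.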
5. Rollout plan
Each phase is gated by the previous one. Every phase has a server-side kill switch (set `OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=0`) and a client kill switch (OTA-flip `EXPO_PUBLIC_REALTIME_ENABLED=false`).
Phase 1 — Internal dogfooding (week 1)
- Audience: ≤ 10 engineers + PM, signed in with allowlisted emails.
- Build: `eas build --profile preview`.
- Flags: `EXPO_PUBLIC_REALTIME_ENABLED=true`, `EXPO_PUBLIC_MOCK_REALTIME=false`, `EXPO_PUBLIC_NEW_INTERVIEW_UI=true`.
- Server caps: `OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY=30`, `OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=20`.
- Kill switch: `vercel env rm OPENAI_API_KEY production && vercel --prod`.
Phase 2 — TestFlight beta, realtime disabled (week 2)
- Audience: TestFlight / Google Play internal testers (~100 users).
- Build: `eas build --profile testflight`.
- Flags: `EXPO_PUBLIC_REALTIME_ENABLED=false` — UI surfaces everything except the realtime button. Validate the text-mode interview end to end.
- Kill switch (emergency backout): `eas update --branch production --message "disable realtime UI"`.
Phase 3 — TestFlight beta, realtime enabled with tight limits (weeks 3–4)
- Audience: Same TestFlight group.
- Flags: `EXPO_PUBLIC_REALTIME_ENABLED=true`.
- Server caps: `OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY=10`, `OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=30`.
- Monitoring: Sentry, Datadog cost widget, Slack usage-alert webhook.
- Kill switch: flip the server cap to `0` and ship an OTA in parallel.
Phase 4 — Public release
- Audience: App Store + Play Store.
- Flags: `EXPO_PUBLIC_REALTIME_ENABLED=true` — unchanged.
- Server caps: `OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY=15`, `OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=100`.
- Kill switches (in priority order):
  - Server: `OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=0` (fastest lever).
  - Server: `vercel env rm OPENAI_API_KEY production && vercel --prod`.
  - Client: `eas update --branch production` with the flag flipped.
Premium entitlement sub-phases (see section 1b):
- Phase 2a — soft-gated beta: free users see the `REALTIME_PREMIUM_REQUIRED` upsell sheet. Paid users get 3 sessions/hour. Monitor the per-user monthly USD cap.
- Phase 2b — real receipt validation: replace the advisory `x-civix-tier` header with server-side RevenueCat receipt validation + a signed JWT entitlement header.
- Phase 2c — rate tuning: based on observed abuse / cost, tune `OPENAI_REALTIME_PAID_SESSIONS_PER_HOUR` up or down without redeploying.
5b. Post-interview verdict contract
At the end of each realtime session, the mobile client POSTs to `/api/interview/realtime-verdict` with `{ sessionId, mode, phase }`. The server aggregates every `save_evidence` event saved under `civix:realtime:evidence:${sessionId}` (Redis, 24h TTL) and computes a USCIS-rules outcome.
Response payload:
| Field | Type | Notes |
|---|---|---|
| `outcome` | `'pass' \| 'fail' \| 'continued' \| 'timeout_incomplete'` | Server-authoritative. The model never invents this. |
| `perPhase` | `Record<Phase, { status, ... }>` | Per USCIS rules: civics ≥ 6/10, reading 1-of-3, writing 1-of-3; identity/N-400/oath are subjective. |
| `officerSummary` | `string` | One neutral paragraph; read aloud by the officer voice via `conversation.item.create` on the open WebSocket. |
| `coachTips` | `string[]` | Tips derived from phase failures; NEVER leaks specific answers. |
| `durationSec` | `number` | Elapsed wall-clock seconds. |
| `sessionId` | `string` | Echo of the request's session ID. |
The mobile client narrates `officerSummary` back to the user via the open WS (the officer speaks the verdict), then closes the call sheet and navigates to InterviewResults with the full payload rendered in a per-phase grid.
Idempotency: the verdict is persisted under `civix:realtime:verdict:${sessionId}` (24h TTL). Re-POSTs return the cached payload.
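The per-phase scoring rules can be sketched as follows. This is illustrative only — the real `verdict-builder.ts` aggregates richer evidence and emits the full payload, including `continued` and `timeout_incomplete` outcomes:

```typescript
type Phase = "civics" | "reading" | "writing";
type Evidence = { phase: Phase; correct: boolean };

// Score one phase per the USCIS thresholds stated in the table above.
function scorePhase(items: Evidence[], phase: Phase): "pass" | "fail" {
  const correct = items.filter((e) => e.phase === phase && e.correct).length;
  const needed = phase === "civics" ? 6 : 1; // civics 6/10; reading/writing 1-of-3
  return correct >= needed ? "pass" : "fail";
}

// Overall pass requires every scored phase to pass.
function overallOutcome(items: Evidence[]): "pass" | "fail" {
  const phases: Phase[] = ["civics", "reading", "writing"];
  return phases.every((p) => scorePhase(items, p) === "pass") ? "pass" : "fail";
}
```

Because the server derives the outcome from `save_evidence` events alone, the model has no path to fabricate a verdict.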
6. Known risks + rollback playbook
| Risk | Detection | Mitigation | Rollback action |
|---|---|---|---|
| Cost spike (runaway loop, viral session) | OpenAI usage alert at 50% / 80%; Datadog cost widget | Per-user minute cap, global USD cap, project monthly cap | Set OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=0 → redeploy Vercel; OTA flip EXPO_PUBLIC_REALTIME_ENABLED=false. |
| Prompt-injection jailbreak | Sentry regex trigger on forbidden output; manual transcript review | System prompt, output guard, moderation API | Server-side disable the affected prompt mode; ship a prompt hotfix and redeploy. |
| Moderation false-positive spike | Drop in completion rate, user reports | Keep OPENAI_MODERATION_ENABLED=true but log overrides | Lower moderation threshold temporarily; push fix to classifier wrapper. |
| Realtime model latency regression | p95 first-token > 3s in Datadog | Failover to text-mode in the same session | OTA flip EXPO_PUBLIC_REALTIME_ENABLED=false while investigating. |
| OpenAI outage | 5xx burst on /api/interview/realtime/* | Automatic client fallback to text mode | No action if fallback works; otherwise OTA flip realtime off. |
| Handwriting OCR outage / regression | OCR accuracy alert, user support tickets | Typed-input fallback always available | OTA flip EXPO_PUBLIC_HANDWRITING_ENABLED=false. |
| API key leak | GitHub secret scanning, OpenAI unusual-usage alert | Project-scoped keys with IP restriction | Rotate key in OpenAI console, update OPENAI_API_KEY in Vercel, redeploy. Invalidate the compromised project. |
| Upstash outage (rate limits open) | Upstash dashboard, Sentry 5xx spike on rate-limit path | Graceful degrade: deny on failure if flag set, allow otherwise | Temporarily enable deny-on-error mode in apps/web/lib/rate-limit.ts; redeploy. |
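The deny-on-error degrade in the last row can be sketched as follows. The wrapper shape and flag are illustrative — the real `apps/web/lib/rate-limit.ts` may differ:

```typescript
// `check` stands in for the Upstash limiter call, which throws on outage.
function checkWithDegrade(check: () => boolean, denyOnError: boolean): boolean {
  try {
    return check();
  } catch {
    // Backing store unreachable. Default: fail open so the product keeps
    // working. During an active abuse incident, flip to fail closed.
    return !denyOnError;
  }
}
```

Fail-open is the right default for a UX-critical path; the rollback action in the table is the one-line flip to fail-closed.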
7. On-call cheat sheet
```bash
# Check current server caps
vercel env ls production | rg OPENAI_

# Pause realtime (server, instant)
vercel env add OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP production <<< "0"
vercel --prod

# Pause realtime (client, ~1 min to propagate via OTA)
cd apps/mobile
EXPO_PUBLIC_REALTIME_ENABLED=false eas update --branch production --message "disable realtime UI"

# Rotate the OpenAI key
vercel env rm OPENAI_API_KEY production
vercel env add OPENAI_API_KEY production   # paste the new key when prompted
vercel --prod
```
Keep this document in sync with any change to the realtime flow, the flags, or the server caps. It is the single entry point new on-call engineers are pointed at during realtime incidents.