Realtime Interview Operations Guide

This document covers everything operators need to ship the OpenAI-powered realtime voice interview: the OpenAI console setup, the env vars, the kill switches, the integrity defenses, the rollout plan, and a rollback playbook.

Scope: the realtime voice flow only. General API / landing page / mobile release processes live in docs/DEPLOYMENT-SOP.md and docs/08-PRODUCTION-READINESS.md.


1. OpenAI console checklist

Complete every item below before flipping EXPO_PUBLIC_REALTIME_ENABLED to true in a public build.

  1. Enable billing on the OpenAI account. Realtime is a paid-only tier — there is no free quota.
  2. Create a dedicated project named civix-interview-realtime. A separate project keeps billing, logs, rate limits, and scope-creep blast radius cleanly partitioned.
  3. Generate a project-scoped API key with only Realtime + Chat Completions permissions. If the organization allows it, restrict the key to the Vercel egress IPs for the production deployment.
  4. Set a hard monthly spending cap under Project → Limits. Recommendation: $50/mo for the beta; raise in $50 increments per week once usage is understood.
  5. Set project-level rate limits. Conservative starting point: 10 RPM and the platform default TPM. This prevents a single leak / bug from blowing through the monthly cap in minutes.
  6. Pick the default voice: cedar is our shipped default — OpenAI's recommended best-quality voice alongside marin. Full roster: alloy | ash | ballad | cedar | coral | echo | sage | shimmer | verse | marin. Configure via OPENAI_REALTIME_VOICE.
  7. Enable the Moderation API for the project. It's free and usually on by default; verify under Project → Settings.
  8. Enable request/response logging under Project → Data Controls for the first 30 days. Useful for abuse triage. Disable or shorten retention after the beta.
  9. Create a Usage Alert webhook under Organization → Usage at 50% and 80% of the spending cap. Point it at the internal Slack webhook (or a dedicated email).
  10. Create a separate dev API key inside a smaller civix-realtime-dev project (with a $10 cap) for staging / EAS preview builds. Use OPENAI_API_KEY_DEV.
  11. Apply for the Realtime Scale tier once usage approaches the default rate limits. Approval is region-dependent and can take several business days — start the application before you need it.

1b. Premium entitlement (configure after the console checklist)

Realtime voice is paid-only. Free users hit REALTIME_PREMIUM_REQUIRED (403).

  • Free tier: 0 realtime sessions.
  • Paid tier ($20/month via RevenueCat): up to OPENAI_REALTIME_PAID_SESSIONS_PER_HOUR sessions per hour (default 3). Still gated by per-user monthly cost cap and global daily cap.
  • TestFlight reviewer bypass: unlimited (existing x-testflight-bypass header).

MVP tier detection (Phase 1): the mobile client sends an advisory x-civix-tier: free | paid header derived from the local isPremiumUser() store helper (backed by RevenueCat, the test bypass, and the sadaqah unlock). The backend treats this header as advisory: a malicious user can flip it, but every cost-enforcement layer (per-user monthly USD cap, global daily USD cap, hourly session starts, daily minutes) still applies, so the worst-case financial exposure is bounded by the existing realtime-limits.ts.
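
Because the header is client-controlled, the parse should fail closed: anything other than an exact paid value is treated as free. A minimal sketch (the helper name and shape are illustrative, not the repo's actual code):

```typescript
type Tier = 'free' | 'paid';

// Parse the advisory x-civix-tier header. Unknown or missing values
// fail closed to 'free' because the client can send anything.
function parseAdvisoryTier(headerValue: string | null | undefined): Tier {
  return headerValue?.trim().toLowerCase() === 'paid' ? 'paid' : 'free';
}
```

Even when this returns 'paid', the server-side cost caps still run; the tier only selects which rate-limit bucket applies.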

Phase 2 hardening (post-launch): server-side RevenueCat receipt validation. The mobile client will post a receipt JWT to a new /api/entitlement route; the server validates via RevenueCat's server API and returns a signed x-civix-entitlement: <jwt> header the mobile client attaches to every subsequent realtime request. Server verifies the JWT signature instead of trusting the advisory header.
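
A minimal sketch of what the signed entitlement could look like, using a raw HMAC in place of a JWT library. The secret name, payload shape, and helper names are assumptions for illustration; a real implementation would validate the receipt against RevenueCat's server API before signing, and would use a proper JWT library:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Illustrative server secret; in production this comes from env/secret storage.
const ENTITLEMENT_SECRET = 'replace-with-server-secret';

type Entitlement = { userId: string; tier: 'paid'; exp: number }; // exp = unix seconds

function signEntitlement(payload: Entitlement): string {
  const body = Buffer.from(JSON.stringify(payload)).toString('base64url');
  const sig = createHmac('sha256', ENTITLEMENT_SECRET).update(body).digest('base64url');
  return `${body}.${sig}`;
}

function verifyEntitlement(token: string): Entitlement | null {
  const [body, sig] = token.split('.');
  if (!body || !sig) return null;
  const expected = createHmac('sha256', ENTITLEMENT_SECRET).update(body).digest();
  const given = Buffer.from(sig, 'base64url');
  // Length guard: timingSafeEqual throws on mismatched buffer lengths.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const payload = JSON.parse(Buffer.from(body, 'base64url').toString()) as Entitlement;
  return payload.exp > Date.now() / 1000 ? payload : null; // reject expired tokens
}
```

The mobile client would cache the token and attach it as x-civix-entitlement; the realtime route verifies the signature instead of trusting x-civix-tier.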


2. Environment variables

Mobile (apps/mobile/.env.example)

| Var | Default | Description |
| --- | --- | --- |
| EXPO_PUBLIC_API_URL | https://api.civixapp.us | Interview API base URL. Local dev should point at http://localhost:3000 or an ngrok tunnel. |
| EXPO_PUBLIC_REVENUECAT_IOS_API_KEY | (none) | iOS RevenueCat key (required for paywall). |
| EXPO_PUBLIC_REVENUECAT_ANDROID_API_KEY | (none) | Android RevenueCat key. |
| EXPO_PUBLIC_REALTIME_ENABLED | true | Kill-switch for the realtime voice flow. |
| EXPO_PUBLIC_HANDWRITING_ENABLED | true | Kill-switch for handwriting canvas + OCR. |
| EXPO_PUBLIC_NEW_INTERVIEW_UI | false | Force the new interview UI for all users. |
| EXPO_PUBLIC_NATIVE_POLISH_ENABLED | true | Haptics / blur / native animations. |
| EXPO_PUBLIC_MOCK_REALTIME | false | Use the local mock realtime transport (demos / E2E). |
| EXPO_PUBLIC_SENTRY_DSN | (none) | Sentry DSN for release crash reporting. |
| EXPO_PUBLIC_ENABLE_TEST_BYPASS | false | TestFlight QA bypass (never set in App Store builds). |
| EXPO_PUBLIC_TEST_BYPASS_CODE | (none) | TestFlight access code (min 8 chars). |
| EXPO_PUBLIC_TESTFLIGHT_INTERVIEW_BYPASS | false | Allow interview rate-limit bypass for reviewers. |
| EXPO_PUBLIC_TESTFLIGHT_BYPASS_TOKEN | (none) | Shared-secret token matching TESTFLIGHT_BYPASS_SECRET. |

Backend (apps/web/.env.example)

| Var | Required | Description |
| --- | --- | --- |
| OPENAI_API_KEY | yes | Production OpenAI key. Used for realtime + OCR. |
| OPENAI_API_KEY_DEV | no | Dev/staging key used on preview deployments. |
| OPENAI_REALTIME_MODEL | yes | Model alias. Default gpt-realtime (GA, recommended). Cheaper variant: gpt-realtime-mini. Legacy preview: gpt-4o-realtime-preview. |
| OPENAI_REALTIME_VOICE | yes | Default voice. Default cedar (our shipped default; see step 6 for the full roster). |
| OPENAI_REALTIME_TRANSCRIBE_MODEL | no | Transcription model for the applicant's audio. Default gpt-4o-mini-transcribe (cheapest quality-first option). Alternatives: whisper-1 (budget), gpt-4o-transcribe (best quality). |
| OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY | yes | Per-user daily cap in minutes (default 15). |
| OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP | yes | Global daily USD cap (default 100). |
| OPENAI_REALTIME_PAID_SESSIONS_PER_HOUR | no | Sessions-per-hour cap for paid users. Default 3. |
| OPENAI_REALTIME_PAID_SESSIONS_WINDOW | no | Window for the above limit (Upstash syntax). Default 1 h. |
| OPENAI_REALTIME_SESSION_MAX_MINUTES | no | Hard wall-clock cap per session. Default 25. |
| OPENAI_MODERATION_ENABLED | yes | Run inputs through the Moderation API. |
| ADMIN_TOKEN | yes | Bearer token guarding /api/admin/*. |
| UPSTASH_REDIS_URL | no | Upstash Redis URL for rate-limit + usage counters. |
| UPSTASH_REDIS_TOKEN | no | Upstash Redis token. |
| TESTFLIGHT_BYPASS_ENABLED | no | Enable TestFlight reviewer bypass server-side. |
| TESTFLIGHT_BYPASS_SECRET | no | Must match EXPO_PUBLIC_TESTFLIGHT_BYPASS_TOKEN. |
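
The numeric caps above parse naturally with explicit defaults. A sketch (numEnv and realtimeCaps are illustrative names, not the repo's actual API); the key property is that the kill-switch value '0' must parse as 0 rather than fall back to the default:

```typescript
type Env = Record<string, string | undefined>;

function numEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  const parsed = raw === undefined || raw === '' ? NaN : Number(raw);
  // '0' is a valid value (the kill switch); only missing/garbage falls back.
  return Number.isFinite(parsed) ? parsed : fallback;
}

function realtimeCaps(env: Env = process.env) {
  return {
    maxMinutesPerUserDay: numEnv(env, 'OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY', 15),
    globalDailyUsdCap: numEnv(env, 'OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP', 100),
    paidSessionsPerHour: numEnv(env, 'OPENAI_REALTIME_PAID_SESSIONS_PER_HOUR', 3),
    sessionMaxMinutes: numEnv(env, 'OPENAI_REALTIME_SESSION_MAX_MINUTES', 25),
  };
}
```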

2b. Cost & model selection cheat sheet

Approximate OpenAI pricing as of Q2 2026 (verify on the current pricing page before launch). Our per-session cost estimate in realtime-token/route.ts (ESTIMATED_SESSION_COST_USD = 0.30) assumes ~2 minutes of conversation on gpt-realtime with gpt-4o-mini-transcribe.

Realtime conversation models

| Model | Audio input / 1M | Audio output / 1M | Latency | Use for |
| --- | --- | --- | --- | --- |
| gpt-realtime (GA) | ~$32 | ~$64 | low | Default. Best quality / latency balance. |
| gpt-realtime-mini | ~$10 | ~$20 | lowest | Cost-sensitive deployments (set via OPENAI_REALTIME_MODEL). |
| gpt-4o-realtime-preview | ~$40 | ~$80 | medium | Legacy / A-B tests. Do not run in prod. |

Transcription models (applicant audio)

| Model | Price / min | Use for |
| --- | --- | --- |
| gpt-4o-mini-transcribe (default) | ~$0.003 | Cheapest quality-first option; ship here. |
| whisper-1 | ~$0.006 | Fallback when mini hits rate limits. |
| gpt-4o-transcribe | ~$0.006 | Highest quality for noisy audio. |
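
To sanity-check ESTIMATED_SESSION_COST_USD, here is a back-of-envelope estimator built from the table rates above. The tokens-per-minute constants are rough assumptions for illustration (real audio token rates vary per session); measure live sessions before relying on them:

```typescript
// Rates from the tables above (gpt-realtime + gpt-4o-mini-transcribe).
const RATES = {
  audioInPerMTok: 32,      // USD per 1M audio input tokens
  audioOutPerMTok: 64,     // USD per 1M audio output tokens
  transcribePerMin: 0.003, // USD per transcribed minute
};

// ASSUMPTION: illustrative token rates, not measured values.
const ASSUMED_IN_TOK_PER_MIN = 600;
const ASSUMED_OUT_TOK_PER_MIN = 300;

function estimateSessionUsd(minutes: number): number {
  const input = (minutes * ASSUMED_IN_TOK_PER_MIN / 1e6) * RATES.audioInPerMTok;
  const output = (minutes * ASSUMED_OUT_TOK_PER_MIN / 1e6) * RATES.audioOutPerMTok;
  const transcription = minutes * RATES.transcribePerMin;
  return input + output + transcription;
}
```

With these assumptions a 2-minute session costs roughly $0.08, comfortably under the $0.30 budget in realtime-token/route.ts, which leaves headroom for retries and longer turns.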

Voices (no price difference)

All voices bill at the same output rate as the conversation model. Roster: alloy, ash, ballad, cedar (our shipped default; see step 6), coral, echo, marin, sage, shimmer, verse. Voice is a stylistic choice only; pick based on your brand persona.


3. Feature flags

All kill-switches live in apps/mobile/lib/feature-flags.ts. Values are read from process.env.EXPO_PUBLIC_* at build time.

| Flag | Default | Flip when |
| --- | --- | --- |
| EXPO_PUBLIC_REALTIME_ENABLED | true | OpenAI Realtime outage, cost spike, abuse wave. Set to 'false' to hide the realtime button entirely; the interview falls back to text. |
| EXPO_PUBLIC_HANDWRITING_ENABLED | true | OCR provider outage, misrecognition regression. Set to 'false' to force the typed-input fallback. |
| EXPO_PUBLIC_NEW_INTERVIEW_UI | false | Set to 'true' to force the new interview UI for cohorts still on the legacy UI. |
| EXPO_PUBLIC_NATIVE_POLISH_ENABLED | true | Isolating a haptics / blur / animation regression. |
| EXPO_PUBLIC_MOCK_REALTIME | false | Demos, Maestro E2E, airplane-mode dev. Never enable in production builds. |

Because these are compile-time inlined, a hard flip requires an OTA update (eas update --branch production) or a new EAS build. Server-side kill switches (e.g. OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY=0) take effect as soon as the env change is redeployed (minutes, versus an app-review cycle) and should be the first lever in an incident.
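
A minimal sketch of the flag-reading pattern (the actual feature-flags.ts may differ; readFeatureFlags and boolFlag are illustrative names). Expo inlines EXPO_PUBLIC_* strings at build/update time, so each flag compares against the literal string 'true':

```typescript
type Env = Record<string, string | undefined>;

// Unset -> documented default; set -> only the exact string 'true' enables.
const boolFlag = (value: string | undefined, fallback: boolean): boolean =>
  value === undefined ? fallback : value === 'true';

function readFeatureFlags(env: Env) {
  return {
    realtimeEnabled: boolFlag(env.EXPO_PUBLIC_REALTIME_ENABLED, true),
    handwritingEnabled: boolFlag(env.EXPO_PUBLIC_HANDWRITING_ENABLED, true),
    newInterviewUi: boolFlag(env.EXPO_PUBLIC_NEW_INTERVIEW_UI, false),
    nativePolishEnabled: boolFlag(env.EXPO_PUBLIC_NATIVE_POLISH_ENABLED, true),
    mockRealtime: boolFlag(env.EXPO_PUBLIC_MOCK_REALTIME, false),
  };
}
```

Strict comparison against 'true' means a typo like '1' or 'TRUE' disables the flag rather than silently enabling it.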


4. Integrity defense map

| Attack vector | Detection layer | Code reference |
| --- | --- | --- |
| Prompt injection (user tries to override system rules) | System-prompt hardening + output regex gate | apps/web/lib/interview/prompts.ts, apps/web/lib/interview/output-guard.ts |
| Answer fishing (user tries to extract civics answer keys) | Answer-leak detector in the realtime transcript stream | apps/web/lib/interview/answer-leak-detector.ts |
| Cost abuse (same user spins up many sessions) | Upstash per-user daily minute cap + global USD cap | apps/web/lib/realtime/usage.ts; env OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY, OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP |
| Off-topic / disallowed content | OpenAI Moderation API + keyword blocklist | apps/web/lib/interview/moderation.ts; env OPENAI_MODERATION_ENABLED |
| Session loops / stuck retries | Per-session turn cap + idle timeout | apps/web/lib/interview/session-limits.ts |
| Endpoint enumeration / brute force | Upstash IP-based rate limits | apps/web/lib/rate-limit.ts; env UPSTASH_REDIS_URL |
| Leaked API key reuse | Project-scoped OpenAI key + IP restriction | OpenAI console (Project → Keys) |
| Runaway cost from bug | Hard monthly cap + 50% / 80% usage-alert webhook | OpenAI console (Project → Limits, Org → Usage) |
| Officer leaks mid-interview feedback ("correct", "good") | detectFeedbackLeak + server injects correction; client-side lint mirror | apps/web/lib/interview/answer-leak-detector.ts |
| Runaway session (cost / UX) | 25-min wall-clock cap + soft warn at 22 min + server-enforced OPENAI_REALTIME_SESSION_MAX_MINUTES | apps/mobile/lib/realtime/useRealtimeOfficer.ts + apps/web/lib/interview/realtime-limits.ts |
| Model fabricates pass/fail | Tool sandbox strips outcome from end_interview; server computes verdict from save_evidence aggregation | apps/web/lib/interview/tool-sandbox.ts + apps/web/lib/interview/verdict-builder.ts |

If a path above does not yet exist in the repo (some modules are owned by the backend agent), treat it as the intended location and keep this table in sync with the PR that ships the module.
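
As a reference for the cost-abuse row, here is a sketch of the per-user daily minute cap, with an in-memory Map standing in for Upstash Redis (real code would use INCRBY plus EXPIRE). The key segment "minutes" and the helper name are illustrative assumptions, not the repo's actual keys:

```typescript
// Stand-in for Redis; real code would use Upstash INCRBY + EXPIRE so the
// counter expires at the end of the day.
const minuteCounters = new Map<string, number>();

function tryConsumeMinutes(
  userId: string,
  minutes: number,
  dailyCapMinutes: number,
  day: string, // e.g. '2026-01-01'; a fresh key per day replaces TTL here
): boolean {
  const key = `civix:realtime:minutes:${userId}:${day}`;
  const used = minuteCounters.get(key) ?? 0;
  if (used + minutes > dailyCapMinutes) return false; // deny before spending
  minuteCounters.set(key, used + minutes);
  return true;
}
```

Denying before incrementing means a rejected session never consumes quota, so a user bouncing off the cap cannot lock themselves out further.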


5. Rollout plan

Each phase is gated by the previous one. Every phase has a server-side kill switch (set OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=0) and a client kill switch (OTA-flip EXPO_PUBLIC_REALTIME_ENABLED=false).

Phase 1 — Internal dogfooding (week 1)

  • Audience: ≤ 10 engineers + PM, signed in with allowlisted emails.
  • Build: eas build --profile preview.
  • Flags: EXPO_PUBLIC_REALTIME_ENABLED=true, EXPO_PUBLIC_MOCK_REALTIME=false, EXPO_PUBLIC_NEW_INTERVIEW_UI=true.
  • Server caps: OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY=30, OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=20.
  • Kill switch:
    vercel env rm OPENAI_API_KEY production && vercel --prod
    

Phase 2 — TestFlight beta, realtime disabled (week 2)

  • Audience: TestFlight / Google Play internal testers (~100 users).
  • Build: eas build --profile testflight.
  • Flags: EXPO_PUBLIC_REALTIME_ENABLED=false — UI surfaces everything except the realtime button. Validate text-mode interview end to end.
  • Kill switch (emergency backout):
    eas update --branch production --message "disable realtime UI"
    

Phase 3 — TestFlight beta, realtime enabled with tight limits (weeks 3–4)

  • Audience: Same TestFlight group.
  • Flags: EXPO_PUBLIC_REALTIME_ENABLED=true.
  • Server caps: OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY=10, OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=30.
  • Monitoring: Sentry, Datadog cost widget, Slack usage-alert webhook.
  • Kill switch: flip server cap to 0 and ship an OTA in parallel.

Phase 4 — Public release

  • Audience: App Store + Play Store.

  • Flags: EXPO_PUBLIC_REALTIME_ENABLED=true — unchanged.

  • Server caps: OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY=15, OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=100.

  • Kill switches (in priority order):

    1. Server: OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=0 (fastest lever: env change plus vercel --prod, no client release).
    2. Server: vercel env rm OPENAI_API_KEY production && vercel --prod.
    3. Client: eas update --branch production with the flag flipped.
  • Monetization sub-phases (these track the entitlement plan in section 1b):

    1. Phase 2a (soft-gated beta): free users see the REALTIME_PREMIUM_REQUIRED upsell sheet; paid users get 3 sessions/hour. Monitor the per-user monthly USD cap.

    2. Phase 2b (real receipt validation): replace the advisory x-civix-tier header with server-side RevenueCat receipt validation and the signed JWT entitlement header.

    3. Phase 2c (rate tuning): based on observed abuse and cost, tune OPENAI_REALTIME_PAID_SESSIONS_PER_HOUR up or down without shipping a new client build.


5b. Post-interview verdict contract

At the end of each realtime session, the mobile client POSTs to /api/interview/realtime-verdict with { sessionId, mode, phase }. The server aggregates every save_evidence event saved under civix:realtime:evidence:${sessionId} (Redis, 24h TTL) and computes a USCIS-rules outcome.

Response payload:

| Field | Type | Notes |
| --- | --- | --- |
| outcome | 'pass' \| 'fail' \| 'continued' \| 'timeout_incomplete' | Server-authoritative. Model never invents this. |
| perPhase | Record<Phase, { status, ... }> | Per USCIS rules: civics >= 6/10, reading 1-of-3, writing 1-of-3; identity/n400/oath are subjective. |
| officerSummary | string | One neutral paragraph; read aloud by the officer voice via conversation.item.create on the open WebSocket. |
| coachTips | string[] | Phase-failure-derived tips; NEVER leaks specific answers. |
| durationSec | number | Elapsed wall-clock. |
| sessionId | string | Echo. |

The mobile client narrates officerSummary back to the user via the open WS (the officer speaks the verdict), then closes the call sheet and navigates to InterviewResults with the full payload rendered in a per-phase grid.

Idempotency: the verdict is persisted under civix:realtime:verdict:${sessionId} (24h TTL). Re-POSTs return the cached payload.
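
The aggregation step can be sketched as pure TypeScript. The shapes below are illustrative (the actual verdict-builder.ts may differ) and the 'continued' outcome path is omitted for brevity:

```typescript
type PhaseEvidence = { correct: number; attempted: number };
type Evidence = Partial<Record<'civics' | 'reading' | 'writing', PhaseEvidence>>;

// Compute the server-authoritative outcome from aggregated save_evidence
// events, per the USCIS-style rules in the table above.
function computeOutcome(
  evidence: Evidence,
  completed: boolean, // did the session reach end_interview before timing out?
): 'pass' | 'fail' | 'timeout_incomplete' {
  if (!completed) return 'timeout_incomplete';
  const civicsPass = (evidence.civics?.correct ?? 0) >= 6;   // >= 6 of 10
  const readingPass = (evidence.reading?.correct ?? 0) >= 1; // 1 of up to 3
  const writingPass = (evidence.writing?.correct ?? 0) >= 1; // 1 of up to 3
  return civicsPass && readingPass && writingPass ? 'pass' : 'fail';
}
```

Because the model's end_interview tool call is stripped of any outcome field by the tool sandbox, this function is the only place a verdict is produced.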


6. Known risks + rollback playbook

| Risk | Detection | Mitigation | Rollback action |
| --- | --- | --- | --- |
| Cost spike (runaway loop, viral session) | OpenAI usage alert at 50% / 80%; Datadog cost widget | Per-user minute cap, global USD cap, project monthly cap | Set OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=0 and redeploy Vercel; OTA-flip EXPO_PUBLIC_REALTIME_ENABLED=false. |
| Prompt-injection jailbreak | Sentry regex trigger on forbidden output; manual transcript review | System prompt, output guard, Moderation API | Server-side disable the affected prompt mode; ship a prompt hotfix and redeploy. |
| Moderation false-positive spike | Drop in completion rate, user reports | Keep OPENAI_MODERATION_ENABLED=true but log overrides | Temporarily relax the moderation threshold; push a fix to the classifier wrapper. |
| Realtime model latency regression | p95 first-token > 3 s in Datadog | Failover to text mode in the same session | OTA-flip EXPO_PUBLIC_REALTIME_ENABLED=false while investigating. |
| OpenAI outage | 5xx burst on /api/interview/realtime/* | Automatic client fallback to text mode | No action if fallback works; otherwise OTA-flip realtime off. |
| Handwriting OCR outage / regression | OCR accuracy alert, user support tickets | Typed-input fallback always available | OTA-flip EXPO_PUBLIC_HANDWRITING_ENABLED=false. |
| API key leak | GitHub secret scanning, OpenAI unusual-usage alert | Project-scoped keys with IP restriction | Rotate the key in the OpenAI console, update OPENAI_API_KEY in Vercel, redeploy. Invalidate the compromised project. |
| Upstash outage (rate limits open) | Upstash dashboard, Sentry 5xx spike on the rate-limit path | Graceful degrade: deny on failure if the flag is set, allow otherwise | Temporarily enable deny-on-error mode in apps/web/lib/rate-limit.ts; redeploy. |
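
The deny-vs-allow decision in the Upstash-outage row can be sketched as a small wrapper (checkWithFailMode is an illustrative name, not the actual rate-limit.ts API):

```typescript
// Wrap a rate-limit check so a backend failure degrades predictably:
// fail-closed (deny) during an incident, fail-open (allow) otherwise.
function checkWithFailMode(check: () => boolean, failClosed: boolean): boolean {
  try {
    return check();
  } catch {
    // Backend unreachable. Deny in fail-closed mode so abuse cannot ride
    // an outage; allow otherwise so normal traffic is not blocked by one.
    return !failClosed;
  }
}
```

Fail-open is the sensible default because an Upstash blip should not take down the interview flow; flipping to fail-closed is the deliberate incident action named in the rollback column.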

7. On-call cheat sheet

# Check current server caps
vercel env ls production | rg OPENAI_

# Pause realtime (server). If the variable already exists, remove it
# first: vercel env add fails on an existing name.
vercel env rm OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP production --yes
vercel env add OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP production <<< "0"
vercel --prod

# Pause realtime (client, ~1 min to propagate via OTA)
cd apps/mobile
EXPO_PUBLIC_REALTIME_ENABLED=false eas update --branch production --message "disable realtime UI"

# Rotate the OpenAI key
vercel env rm OPENAI_API_KEY production
vercel env add OPENAI_API_KEY production  # paste new key
vercel --prod

Keep this document in sync with any change to the realtime flow, the flags, or the server caps. It is the single entry point new on-call engineers are pointed at during realtime incidents.