Realtime Interview Operations Guide

This document covers everything operators need to ship the OpenAI-powered realtime voice interview: the OpenAI console setup, the env vars, the kill switches, the integrity defenses, the rollout plan, and a rollback playbook.

Scope: the realtime voice flow only. General API / landing page / mobile release processes live in docs/DEPLOYMENT-SOP.md and docs/08-PRODUCTION-READINESS.md.


1. OpenAI console checklist

Complete every item below before flipping EXPO_PUBLIC_REALTIME_ENABLED to true in a public build.

  1. Enable billing on the OpenAI account. Realtime is a paid-only tier — there is no free quota.
  2. Create a dedicated project named civix-interview-realtime. A separate project keeps billing, logs, rate limits, and scope-creep blast radius cleanly partitioned.
  3. Generate a project-scoped API key with only Realtime + Chat Completions permissions. If the organization allows it, restrict the key to the Vercel egress IPs for the production deployment.
  4. Set a hard monthly spending cap under Project → Limits. Recommendation: $50/mo for the beta; raise in $50 increments per week once usage is understood.
  5. Set project-level rate limits. Conservative starting point: 10 RPM and the platform default TPM. This prevents a single leak / bug from blowing through the monthly cap in minutes.
  6. Pick the default voice: cedar is our shipped default — OpenAI's recommended best-quality voice alongside marin. Full roster: alloy | ash | ballad | cedar | coral | echo | sage | shimmer | verse | marin. Configure via OPENAI_REALTIME_VOICE.
  7. Enable the Moderation API for the project. It's free and usually on by default; verify under Project → Settings.
  8. Enable request/response logging under Project → Data Controls for the first 30 days. Useful for abuse triage. Disable or shorten retention after the beta.
  9. Create a Usage Alert webhook under Organization → Usage at 50% and 80% of the spending cap. Point it at the internal Slack webhook (or a dedicated email).
  10. Create a separate dev API key inside a smaller civix-realtime-dev project (with a $10 cap) for staging / EAS preview builds. Use OPENAI_API_KEY_DEV.
  11. Apply for the Realtime Scale tier once usage approaches the default rate limits. Approval is region-dependent and can take several business days — start the application before you need it.

1b. Premium entitlement (configure after the console checklist)

Realtime voice is paid-only. Free users hit REALTIME_PREMIUM_REQUIRED (403).

  • Free tier: 0 realtime sessions.
  • Paid tier ($20/month via RevenueCat): up to OPENAI_REALTIME_PAID_SESSIONS_PER_HOUR sessions per hour (default 3). Still gated by per-user monthly cost cap and global daily cap.
  • TestFlight reviewer bypass: unlimited (existing x-testflight-bypass header).

MVP tier detection (Phase 1): the mobile client sends an advisory x-civix-tier: free | paid header derived from the local isPremiumUser() store helper (backed by RevenueCat, the test bypass, and the sadaqah unlock). The backend treats this header as advisory: a malicious user can flip it, but every cost-enforcement layer (per-user monthly USD cap, global daily USD cap, hourly session starts, daily minutes) still applies, so the worst-case financial exposure is bounded by the existing realtime-limits.ts.
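
Because the header is client-controlled, the parse should fail closed: anything other than an exact paid value is treated as free. A minimal sketch (the helper name and shape are illustrative, not the repo's actual code):

```typescript
type Tier = 'free' | 'paid';

// Parse the advisory x-civix-tier header. Unknown or missing values
// fail closed to 'free' because the client can send anything.
function parseAdvisoryTier(headerValue: string | null | undefined): Tier {
  return headerValue?.trim().toLowerCase() === 'paid' ? 'paid' : 'free';
}
```

Even when this returns 'paid', the server-side cost caps still run; the tier only selects which rate-limit bucket applies.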

Phase 2 hardening (post-launch): server-side RevenueCat receipt validation. The mobile client will post a receipt JWT to a new /api/entitlement route; the server validates via RevenueCat's server API and returns a signed x-civix-entitlement: <jwt> header the mobile client attaches to every subsequent realtime request. Server verifies the JWT signature instead of trusting the advisory header.
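
A minimal sketch of what the signed entitlement could look like, using a raw HMAC in place of a JWT library. The secret name, payload shape, and helper names are assumptions for illustration; a real implementation would validate the receipt against RevenueCat's server API before signing, and would use a proper JWT library:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Illustrative server secret; in production this comes from env/secret storage.
const ENTITLEMENT_SECRET = 'replace-with-server-secret';

type Entitlement = { userId: string; tier: 'paid'; exp: number }; // exp = unix seconds

function signEntitlement(payload: Entitlement): string {
  const body = Buffer.from(JSON.stringify(payload)).toString('base64url');
  const sig = createHmac('sha256', ENTITLEMENT_SECRET).update(body).digest('base64url');
  return `${body}.${sig}`;
}

function verifyEntitlement(token: string): Entitlement | null {
  const [body, sig] = token.split('.');
  if (!body || !sig) return null;
  const expected = createHmac('sha256', ENTITLEMENT_SECRET).update(body).digest();
  const given = Buffer.from(sig, 'base64url');
  // Length guard: timingSafeEqual throws on mismatched buffer lengths.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const payload = JSON.parse(Buffer.from(body, 'base64url').toString()) as Entitlement;
  return payload.exp > Date.now() / 1000 ? payload : null; // reject expired tokens
}
```

The mobile client would cache the token and attach it as x-civix-entitlement; the realtime route verifies the signature instead of trusting x-civix-tier.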


2. Environment variables

Mobile (apps/mobile/.env.example)

| Var | Default | Description |
| --- | --- | --- |
| EXPO_PUBLIC_API_URL | https://api.civixapp.us | Interview API base URL. Local dev should point at http://localhost:3000 or an ngrok tunnel. |
| EXPO_PUBLIC_REVENUECAT_IOS_API_KEY | (none) | iOS RevenueCat key (required for paywall). |
| EXPO_PUBLIC_REVENUECAT_ANDROID_API_KEY | (none) | Android RevenueCat key. |
| EXPO_PUBLIC_REALTIME_ENABLED | true | Kill-switch for the realtime voice flow. |
| EXPO_PUBLIC_HANDWRITING_ENABLED | true | Kill-switch for handwriting canvas + OCR. |
| EXPO_PUBLIC_NEW_INTERVIEW_UI | false | Force the new interview UI for all users. |
| EXPO_PUBLIC_NATIVE_POLISH_ENABLED | true | Haptics / blur / native animations. |
| EXPO_PUBLIC_MOCK_REALTIME | false | Use the local mock realtime transport (demos / E2E). |
| EXPO_PUBLIC_SENTRY_DSN | (none) | Sentry DSN for release crash reporting. |
| EXPO_PUBLIC_ENABLE_TEST_BYPASS | false | TestFlight QA bypass (never set in App Store builds). |
| EXPO_PUBLIC_TEST_BYPASS_CODE | (none) | TestFlight access code (min 8 chars). |
| EXPO_PUBLIC_TESTFLIGHT_INTERVIEW_BYPASS | false | Allow interview rate-limit bypass for reviewers. |
| EXPO_PUBLIC_TESTFLIGHT_BYPASS_TOKEN | (none) | Shared-secret token matching TESTFLIGHT_BYPASS_SECRET. |

Backend (apps/web/.env.example)

| Var | Required | Description |
| --- | --- | --- |
| OPENAI_API_KEY | yes | Production OpenAI key. Used for realtime + OCR. |
| OPENAI_API_KEY_DEV | no | Dev/staging key used on preview deployments. |
| OPENAI_REALTIME_MODEL | yes | Model alias. Default gpt-realtime (GA, recommended). Cheaper variant: gpt-realtime-mini. Legacy preview: gpt-4o-realtime-preview. |
| OPENAI_REALTIME_VOICE | yes | Default voice. Default cedar (our shipped default; see step 6 for the full roster). |
| OPENAI_REALTIME_TRANSCRIBE_MODEL | no | Transcription model for the applicant's audio. Default gpt-4o-mini-transcribe (cheapest quality-first option). Alternatives: whisper-1 (budget), gpt-4o-transcribe (best quality). |
| OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY | yes | Per-user daily cap in minutes (default 15). |
| OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP | yes | Global daily USD cap (default 100). |
| OPENAI_REALTIME_PAID_SESSIONS_PER_HOUR | no | Sessions-per-hour cap for paid users. Default 3. |
| OPENAI_REALTIME_PAID_SESSIONS_WINDOW | no | Window for the above limit (Upstash syntax). Default 1 h. |
| OPENAI_REALTIME_SESSION_MAX_MINUTES | no | Hard wall-clock cap per session. Default 25. |
| OPENAI_MODERATION_ENABLED | yes | Run inputs through the Moderation API. |
| ADMIN_TOKEN | yes | Bearer token guarding /api/admin/*. |
| UPSTASH_REDIS_URL | no | Upstash Redis URL for rate-limit + usage counters. |
| UPSTASH_REDIS_TOKEN | no | Upstash Redis token. |
| TESTFLIGHT_BYPASS_ENABLED | no | Enable TestFlight reviewer bypass server-side. |
| TESTFLIGHT_BYPASS_SECRET | no | Must match EXPO_PUBLIC_TESTFLIGHT_BYPASS_TOKEN. |
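
The numeric caps above parse naturally with explicit defaults. A sketch (numEnv and realtimeCaps are illustrative names, not the repo's actual API); the key property is that the kill-switch value '0' must parse as 0 rather than fall back to the default:

```typescript
type Env = Record<string, string | undefined>;

function numEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  const parsed = raw === undefined || raw === '' ? NaN : Number(raw);
  // '0' is a valid value (the kill switch); only missing/garbage falls back.
  return Number.isFinite(parsed) ? parsed : fallback;
}

function realtimeCaps(env: Env = process.env) {
  return {
    maxMinutesPerUserDay: numEnv(env, 'OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY', 15),
    globalDailyUsdCap: numEnv(env, 'OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP', 100),
    paidSessionsPerHour: numEnv(env, 'OPENAI_REALTIME_PAID_SESSIONS_PER_HOUR', 3),
    sessionMaxMinutes: numEnv(env, 'OPENAI_REALTIME_SESSION_MAX_MINUTES', 25),
  };
}
```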

2b. Cost & model selection cheat sheet

Approximate OpenAI pricing as of Q2 2026 (verify on the current pricing page before launch). Our per-session cost estimate in realtime-token/route.ts (ESTIMATED_SESSION_COST_USD = 0.30) assumes ~2 minutes of conversation on gpt-realtime with gpt-4o-mini-transcribe.

Realtime conversation models

| Model | Audio input / 1M | Audio output / 1M | Latency | Use for |
| --- | --- | --- | --- | --- |
| gpt-realtime (GA) | ~$32 | ~$64 | low | Default. Best quality / latency balance. |
| gpt-realtime-mini | ~$10 | ~$20 | lowest | Cost-sensitive deployments (set via OPENAI_REALTIME_MODEL). |
| gpt-4o-realtime-preview | ~$40 | ~$80 | medium | Legacy / A-B tests. Do not run in prod. |

Transcription models (applicant audio)

| Model | Price / min | Use for |
| --- | --- | --- |
| gpt-4o-mini-transcribe (default) | ~$0.003 | Cheapest quality-first option; ship here. |
| whisper-1 | ~$0.006 | Fallback when mini hits rate limits. |
| gpt-4o-transcribe | ~$0.006 | Highest quality for noisy audio. |
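
To sanity-check ESTIMATED_SESSION_COST_USD, here is a back-of-envelope estimator built from the table rates above. The tokens-per-minute constants are rough assumptions for illustration (real audio token rates vary per session); measure live sessions before relying on them:

```typescript
// Rates from the tables above (gpt-realtime + gpt-4o-mini-transcribe).
const RATES = {
  audioInPerMTok: 32,      // USD per 1M audio input tokens
  audioOutPerMTok: 64,     // USD per 1M audio output tokens
  transcribePerMin: 0.003, // USD per transcribed minute
};

// ASSUMPTION: illustrative token rates, not measured values.
const ASSUMED_IN_TOK_PER_MIN = 600;
const ASSUMED_OUT_TOK_PER_MIN = 300;

function estimateSessionUsd(minutes: number): number {
  const input = (minutes * ASSUMED_IN_TOK_PER_MIN / 1e6) * RATES.audioInPerMTok;
  const output = (minutes * ASSUMED_OUT_TOK_PER_MIN / 1e6) * RATES.audioOutPerMTok;
  const transcription = minutes * RATES.transcribePerMin;
  return input + output + transcription;
}
```

With these assumptions a 2-minute session costs roughly $0.08, comfortably under the $0.30 budget in realtime-token/route.ts, which leaves headroom for retries and longer turns.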

Voices (no price difference)

All voices bill at the same output rate as the conversation model. Roster: alloy, ash, ballad, cedar (our shipped default; see step 6), coral, echo, marin, sage, shimmer, verse. Voice is a stylistic choice only; pick based on your brand persona.


3. Feature flags

All kill-switches live in apps/mobile/lib/feature-flags.ts. Values are read from process.env.EXPO_PUBLIC_* at build time.

| Flag | Default | Flip when |
| --- | --- | --- |
| EXPO_PUBLIC_REALTIME_ENABLED | true | OpenAI Realtime outage, cost spike, abuse wave. Set to 'false' to hide the realtime button entirely; the interview falls back to text. |
| EXPO_PUBLIC_HANDWRITING_ENABLED | true | OCR provider outage, misrecognition regression. Set to 'false' to force the typed-input fallback. |
| EXPO_PUBLIC_NEW_INTERVIEW_UI | false | Set to 'true' to force the new interview UI for cohorts still on the legacy UI. |
| EXPO_PUBLIC_NATIVE_POLISH_ENABLED | true | Isolating a haptics / blur / animation regression. |
| EXPO_PUBLIC_MOCK_REALTIME | false | Demos, Maestro E2E, airplane-mode dev. Never enable in production builds. |

Because these are compile-time inlined, a hard flip requires an OTA update (eas update --branch production) or a new EAS build. Server-side kill switches (e.g. OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY=0) take effect as soon as the env change is redeployed (minutes, versus an app-review cycle) and should be the first lever in an incident.
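
A minimal sketch of the flag-reading pattern (the actual feature-flags.ts may differ; readFeatureFlags and boolFlag are illustrative names). Expo inlines EXPO_PUBLIC_* strings at build/update time, so each flag compares against the literal string 'true':

```typescript
type Env = Record<string, string | undefined>;

// Unset -> documented default; set -> only the exact string 'true' enables.
const boolFlag = (value: string | undefined, fallback: boolean): boolean =>
  value === undefined ? fallback : value === 'true';

function readFeatureFlags(env: Env) {
  return {
    realtimeEnabled: boolFlag(env.EXPO_PUBLIC_REALTIME_ENABLED, true),
    handwritingEnabled: boolFlag(env.EXPO_PUBLIC_HANDWRITING_ENABLED, true),
    newInterviewUi: boolFlag(env.EXPO_PUBLIC_NEW_INTERVIEW_UI, false),
    nativePolishEnabled: boolFlag(env.EXPO_PUBLIC_NATIVE_POLISH_ENABLED, true),
    mockRealtime: boolFlag(env.EXPO_PUBLIC_MOCK_REALTIME, false),
  };
}
```

Strict comparison against 'true' means a typo like '1' or 'TRUE' disables the flag rather than silently enabling it.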


4. Integrity defense map

| Attack vector | Detection layer | Code reference |
| --- | --- | --- |
| Prompt injection (user tries to override system rules) | System-prompt hardening + output regex gate | apps/web/lib/interview/prompts.ts, apps/web/lib/interview/output-guard.ts |
| Answer fishing (user tries to extract civics answer keys) | Answer-leak detector in the realtime transcript stream | apps/web/lib/interview/answer-leak-detector.ts |
| Cost abuse (same user spins up many sessions) | Upstash per-user daily minute cap + global USD cap | apps/web/lib/realtime/usage.ts; env OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY, OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP |
| Off-topic / disallowed content | OpenAI Moderation API + keyword blocklist | apps/web/lib/interview/moderation.ts; env OPENAI_MODERATION_ENABLED |
| Session loops / stuck retries | Per-session turn cap + idle timeout | apps/web/lib/interview/session-limits.ts |
| Endpoint enumeration / brute force | Upstash IP-based rate limits | apps/web/lib/rate-limit.ts; env UPSTASH_REDIS_URL |
| Leaked API key reuse | Project-scoped OpenAI key + IP restriction | OpenAI console (Project → Keys) |
| Runaway cost from bug | Hard monthly cap + 50% / 80% usage-alert webhook | OpenAI console (Project → Limits, Org → Usage) |
| Officer leaks mid-interview feedback ("correct", "good") | detectFeedbackLeak + server injects correction; client-side lint mirror | apps/web/lib/interview/answer-leak-detector.ts |
| Runaway session (cost / UX) | 25-min wall-clock cap + soft warn at 22 min + server-enforced OPENAI_REALTIME_SESSION_MAX_MINUTES | apps/mobile/lib/realtime/useRealtimeOfficer.ts + apps/web/lib/interview/realtime-limits.ts |
| Model fabricates pass/fail | Tool sandbox strips outcome from end_interview; server computes verdict from save_evidence aggregation | apps/web/lib/interview/tool-sandbox.ts + apps/web/lib/interview/verdict-builder.ts |

If a path above does not yet exist in the repo (some modules are owned by the backend agent), treat it as the intended location and keep this table in sync with the PR that ships the module.
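
As a reference for the cost-abuse row, here is a sketch of the per-user daily minute cap, with an in-memory Map standing in for Upstash Redis (real code would use INCRBY plus EXPIRE). The key segment "minutes" and the helper name are illustrative assumptions, not the repo's actual keys:

```typescript
// Stand-in for Redis; real code would use Upstash INCRBY + EXPIRE so the
// counter expires at the end of the day.
const minuteCounters = new Map<string, number>();

function tryConsumeMinutes(
  userId: string,
  minutes: number,
  dailyCapMinutes: number,
  day: string, // e.g. '2026-01-01'; a fresh key per day replaces TTL here
): boolean {
  const key = `civix:realtime:minutes:${userId}:${day}`;
  const used = minuteCounters.get(key) ?? 0;
  if (used + minutes > dailyCapMinutes) return false; // deny before spending
  minuteCounters.set(key, used + minutes);
  return true;
}
```

Denying before incrementing means a rejected session never consumes quota, so a user bouncing off the cap cannot lock themselves out further.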


5. Rollout plan

Each phase is gated by the previous one. Every phase has a server-side kill switch (set OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=0) and a client kill switch (OTA-flip EXPO_PUBLIC_REALTIME_ENABLED=false).

Phase 1 — Internal dogfooding (week 1)

  • Audience: ≤ 10 engineers + PM, signed in with allowlisted emails.
  • Build: eas build --profile preview.
  • Flags: EXPO_PUBLIC_REALTIME_ENABLED=true, EXPO_PUBLIC_MOCK_REALTIME=false, EXPO_PUBLIC_NEW_INTERVIEW_UI=true.
  • Server caps: OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY=30, OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=20.
  • Kill switch:
    vercel env rm OPENAI_API_KEY production && vercel --prod
    

Phase 2 — TestFlight beta, realtime disabled (week 2)

  • Audience: TestFlight / Google Play internal testers (~100 users).
  • Build: eas build --profile testflight.
  • Flags: EXPO_PUBLIC_REALTIME_ENABLED=false — UI surfaces everything except the realtime button. Validate text-mode interview end to end.
  • Kill switch (emergency backout):
    eas update --branch production --message "disable realtime UI"
    

Phase 3 — TestFlight beta, realtime enabled with tight limits (weeks 3–4)

  • Audience: Same TestFlight group.
  • Flags: EXPO_PUBLIC_REALTIME_ENABLED=true.
  • Server caps: OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY=10, OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=30.
  • Monitoring: Sentry, Datadog cost widget, Slack usage-alert webhook.
  • Kill switch: flip server cap to 0 and ship an OTA in parallel.

Phase 4 — Public release

  • Audience: App Store + Play Store.

  • Flags: EXPO_PUBLIC_REALTIME_ENABLED=true — unchanged.

  • Server caps: OPENAI_REALTIME_MAX_MINUTES_PER_USER_DAY=15, OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=100.

  • Kill switches (in priority order):

    1. Server: OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=0 (fastest lever: env change plus vercel --prod, no client release).
    2. Server: vercel env rm OPENAI_API_KEY production && vercel --prod.
    3. Client: eas update --branch production with the flag flipped.
  • Monetization sub-phases (these track the entitlement plan in section 1b):

    1. Phase 2a (soft-gated beta): free users see the REALTIME_PREMIUM_REQUIRED upsell sheet; paid users get 3 sessions/hour. Monitor the per-user monthly USD cap.

    2. Phase 2b (real receipt validation): replace the advisory x-civix-tier header with server-side RevenueCat receipt validation and the signed JWT entitlement header.

    3. Phase 2c (rate tuning): based on observed abuse and cost, tune OPENAI_REALTIME_PAID_SESSIONS_PER_HOUR up or down without shipping a new client build.


5b. Post-interview verdict contract

At the end of each realtime session, the mobile client POSTs to /api/interview/realtime-verdict with { sessionId, mode, phase }. The server aggregates every save_evidence event saved under civix:realtime:evidence:${sessionId} (Redis, 24h TTL) and computes a USCIS-rules outcome.

Response payload:

| Field | Type | Notes |
| --- | --- | --- |
| outcome | 'pass' \| 'fail' \| 'continued' \| 'timeout_incomplete' | Server-authoritative. Model never invents this. |
| perPhase | Record<Phase, { status, ... }> | Per USCIS rules: civics >= 6/10, reading 1-of-3, writing 1-of-3; identity/n400/oath are subjective. |
| officerSummary | string | One neutral paragraph; read aloud by the officer voice via conversation.item.create on the open WebSocket. |
| coachTips | string[] | Phase-failure-derived tips; NEVER leaks specific answers. |
| durationSec | number | Elapsed wall-clock. |
| sessionId | string | Echo. |

The mobile client narrates officerSummary back to the user via the open WS (the officer speaks the verdict), then closes the call sheet and navigates to InterviewResults with the full payload rendered in a per-phase grid.

Idempotency: the verdict is persisted under civix:realtime:verdict:${sessionId} (24h TTL). Re-POSTs return the cached payload.
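
The aggregation step can be sketched as pure TypeScript. The shapes below are illustrative (the actual verdict-builder.ts may differ) and the 'continued' outcome path is omitted for brevity:

```typescript
type PhaseEvidence = { correct: number; attempted: number };
type Evidence = Partial<Record<'civics' | 'reading' | 'writing', PhaseEvidence>>;

// Compute the server-authoritative outcome from aggregated save_evidence
// events, per the USCIS-style rules in the table above.
function computeOutcome(
  evidence: Evidence,
  completed: boolean, // did the session reach end_interview before timing out?
): 'pass' | 'fail' | 'timeout_incomplete' {
  if (!completed) return 'timeout_incomplete';
  const civicsPass = (evidence.civics?.correct ?? 0) >= 6;   // >= 6 of 10
  const readingPass = (evidence.reading?.correct ?? 0) >= 1; // 1 of up to 3
  const writingPass = (evidence.writing?.correct ?? 0) >= 1; // 1 of up to 3
  return civicsPass && readingPass && writingPass ? 'pass' : 'fail';
}
```

Because the model's end_interview tool call is stripped of any outcome field by the tool sandbox, this function is the only place a verdict is produced.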


6. Known risks + rollback playbook

| Risk | Detection | Mitigation | Rollback action |
| --- | --- | --- | --- |
| Cost spike (runaway loop, viral session) | OpenAI usage alert at 50% / 80%; Datadog cost widget | Per-user minute cap, global USD cap, project monthly cap | Set OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP=0 and redeploy Vercel; OTA-flip EXPO_PUBLIC_REALTIME_ENABLED=false. |
| Prompt-injection jailbreak | Sentry regex trigger on forbidden output; manual transcript review | System prompt, output guard, Moderation API | Server-side disable the affected prompt mode; ship a prompt hotfix and redeploy. |
| Moderation false-positive spike | Drop in completion rate, user reports | Keep OPENAI_MODERATION_ENABLED=true but log overrides | Temporarily relax the moderation threshold; push a fix to the classifier wrapper. |
| Realtime model latency regression | p95 first-token > 3 s in Datadog | Failover to text mode in the same session | OTA-flip EXPO_PUBLIC_REALTIME_ENABLED=false while investigating. |
| OpenAI outage | 5xx burst on /api/interview/realtime/* | Automatic client fallback to text mode | No action if fallback works; otherwise OTA-flip realtime off. |
| Handwriting OCR outage / regression | OCR accuracy alert, user support tickets | Typed-input fallback always available | OTA-flip EXPO_PUBLIC_HANDWRITING_ENABLED=false. |
| API key leak | GitHub secret scanning, OpenAI unusual-usage alert | Project-scoped keys with IP restriction | Rotate the key in the OpenAI console, update OPENAI_API_KEY in Vercel, redeploy. Invalidate the compromised project. |
| Upstash outage (rate limits open) | Upstash dashboard, Sentry 5xx spike on the rate-limit path | Graceful degrade: deny on failure if the flag is set, allow otherwise | Temporarily enable deny-on-error mode in apps/web/lib/rate-limit.ts; redeploy. |
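
The deny-vs-allow decision in the Upstash-outage row can be sketched as a small wrapper (checkWithFailMode is an illustrative name, not the actual rate-limit.ts API):

```typescript
// Wrap a rate-limit check so a backend failure degrades predictably:
// fail-closed (deny) during an incident, fail-open (allow) otherwise.
function checkWithFailMode(check: () => boolean, failClosed: boolean): boolean {
  try {
    return check();
  } catch {
    // Backend unreachable. Deny in fail-closed mode so abuse cannot ride
    // an outage; allow otherwise so normal traffic is not blocked by one.
    return !failClosed;
  }
}
```

Fail-open is the sensible default because an Upstash blip should not take down the interview flow; flipping to fail-closed is the deliberate incident action named in the rollback column.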

7. On-call cheat sheet

# Check current server caps
vercel env ls production | rg OPENAI_

# Pause realtime (server). If the variable already exists, remove it
# first: vercel env add fails on an existing name.
vercel env rm OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP production --yes
vercel env add OPENAI_REALTIME_GLOBAL_DAILY_USD_CAP production <<< "0"
vercel --prod

# Pause realtime (client, ~1 min to propagate via OTA)
cd apps/mobile
EXPO_PUBLIC_REALTIME_ENABLED=false eas update --branch production --message "disable realtime UI"

# Rotate the OpenAI key
vercel env rm OPENAI_API_KEY production
vercel env add OPENAI_API_KEY production  # paste new key
vercel --prod

Keep this document in sync with any change to the realtime flow, the flags, or the server caps. It is the single entry point new on-call engineers are pointed at during realtime incidents.