Voice Agent Prompt Engineer (VAPI) + Agent Evals
We are hiring an experienced Voice Agent Prompt Engineer to refine and productionize system prompts for a VAPI-based real-time voice agent, using a structured, evaluation-driven approach. This role is not just prompt writing: you will be expected to use agent evaluation tools (e.g., Langfuse) to test, score, and iteratively improve agent behavior until it meets strict business and conversational quality standards.

Scope of Work

Part 1: Prompt Engineering (Voice Agent)
You will refine three separate system prompts, each for a distinct voice call use case:
1. 24-hour pre-event reminder: Confirm attendance, reduce no-shows, handle reschedules gracefully, and remain elegant and non-salesy.
2. Post-event feedback collection (attended): Collect structured feedback, surface business insights, and drive survey completion naturally.
3. Post-event follow-up (did not attend): Understand reasons for no-shows, reduce friction for future attendance, and log objections and intent signals.

Business-Intelligent Behavior (Required)
Across all prompts and test cases, the agent must:
• Prioritize relationship and retention
• Handle objections strategically
• Capture structured insights (reason codes, sentiment, likelihood to attend again)
• Remain concise and voice-natural
• Never over-explain, argue, loop, or sound salesy

Agent Evals (Mandatory)
You must use agent evaluation tooling (Langfuse or equivalent) to drive refinement.

Evaluation Requirements
• Create agent evals for each use case
• Minimum 20 test cases per use case
• Each test case must define:
o user utterance(s)
o expected agent behavior
o explicit pass/fail criteria
(See the illustrative test-case sketch at the end of this posting.)

Evaluation Dimensions (Examples)
• Intent adherence
• Conciseness
• Tone consistency
• Objection handling quality
• Repetition avoidance
• Data capture accuracy
• Escalation / DNC compliance
• Interruption handling

Prompt iterations must be based on eval results, not intuition.
________________________________________
Deliverables (Part 1)
• 3 production-ready system prompts (Markdown, copy-paste ready)
• Each prompt includes:
o intent boundaries
o anti-repetition rules
o escalation + “do not record me” handling
o question discipline (max 1–2 questions per turn)
o voice/TTS phrasing and punctuation guidance
• Agent eval test suite
o 60+ total test cases
o clear scoring or pass/fail logic
• Optional: output schema guidance (attendance, sentiment, follow-up, etc.)
________________________________________
Part 2: VAPI Performance Tuning (Bonus / Add-On)
Review our existing VAPI setup and suggest improvements to:
• reduce perceived latency
• improve turn-taking
• handle interruptions more naturally
• avoid awkward silences
________________________________________
Tools & Stack
• VAPI (voice orchestration)
• Deepgram (STT)
• ElevenLabs (TTS)
• Langfuse (agent evals & tracing)
• LLM provider will be shared after onboarding
________________________________________
Required Experience
• Voice agents deployed in production
• Prompt engineering specifically for spoken conversations
• Hands-on experience with Langfuse or similar eval frameworks
• Strong understanding of real-time voice latency and interruption behavior
• Ability to design test suites and evaluation criteria, not just prompts
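________________________________________
Illustrative Test-Case Sketch (Non-Binding)
To make the test-case requirement concrete, here is a minimal sketch of the shape we have in mind: each case pairs simulated user utterances with expected agent behavior and explicit pass/fail criteria. Everything here is a placeholder for illustration (the field names, the example utterances, and the naive pass/fail checker); we expect you to adapt the structure to Langfuse datasets and scores rather than treat this as a prescribed format.

```python
# Illustrative only: field names and criteria below are placeholders, not part of our stack.
from dataclasses import dataclass

@dataclass
class EvalCase:
    """One eval test case for a single voice-call use case."""
    use_case: str                  # e.g. "pre_event_reminder"
    user_utterances: list[str]     # simulated caller turns
    expected_behavior: str         # short description of the desired agent behavior
    pass_criteria: dict[str, bool] # explicit, machine-checkable pass/fail rules

# Example case for the 24-hour pre-event reminder prompt.
reschedule_case = EvalCase(
    use_case="pre_event_reminder",
    user_utterances=[
        "Hi, yes... actually I don't think I can make it tomorrow.",
        "Could we do next week instead?",
    ],
    expected_behavior=(
        "Acknowledge the conflict once, offer to reschedule, confirm the new slot, "
        "and capture a reason code without sounding salesy or repeating itself."
    ),
    pass_criteria={
        "offers_reschedule": True,              # agent must propose a new time
        "captures_reason_code": True,           # structured insight is logged
        "asks_at_most_two_questions_per_turn": True,  # question discipline
        "avoids_repetition": True,              # anti-repetition rule respected
    },
)

def passes(case: EvalCase, observed: dict[str, bool]) -> bool:
    """Naive pass/fail check: every criterion must be met by the observed run."""
    return all(observed.get(name) == expected for name, expected in case.pass_criteria.items())
```

The optional output schema (attendance, sentiment, follow-up intent, etc.) could be modeled the same way, as a small structured record checked per call; we are open to your preferred format.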