Senior QA Engineer (AI Agent Quality & Evaluation)

San Francisco | 3 days ago | Full-time | External
1.7k - 2k / yr
As a Senior QA Engineer, you’ll own quality strategy for AI-powered systems where correctness is probabilistic, outputs are structured (JSON), and evaluation requires real measurement (accuracy, cost, latency, edge-case handling, regression detection). You’ll build automated evaluation harnesses and partner closely with Engineering and Product to prevent silent quality regressions as the system evolves. High autonomy, high leverage, and direct impact on the core product.

Ideal Profile

• Senior QA engineer who has moved beyond manual testing into automation, tooling, and quality systems.
• Comfortable testing systems where “expected output” is not always deterministic, and knows how to create evaluation strategies anyway.
• Strong Python + data mindset: can build repeatable harnesses, metrics pipelines, and regression suites.
• Product-minded and skeptical in the best way: notices failure modes, ambiguous cases, and risks early.
• Comfortable collaborating with engineers and shipping quality gates, not just filing bugs.
• Hands-on experience with AI developer / agent tooling (e.g., Claude Code, GitHub Copilot or similar) and building agents that amplify inputs and orchestrate multi-step workflows (prompt engineering, tool integration).