Cybersecurity Engineer, Safeguards

San Francisco | Full-time
Position Overview

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems that are safe and beneficial for users and society. In this dual-hatted technical and policy role on the Dangerous Asymmetric Harms team, you will focus primarily on technical evaluations, designing and running cyber tests to assess risks and safeguard against potentially catastrophic harms, while also setting policy boundaries for emerging technologies. This is an opportunity to work within a collaborative, high-impact environment committed to building beneficial AI systems.

Key Responsibilities

• Design and implement robust evaluation infrastructure to measure model capabilities and risks across Cyber, CBRNE, and Dangerous Asymmetric Advanced Technologies, with a primary focus on Cyber.
• Independently drive technical projects and scale evaluation systems that could set industry standards.
• Develop and run systems for deep automated analysis of cyber harm across Anthropic’s platforms.
• Create scalable evaluation infrastructure for sandboxing systems.
• Test and measure AI capability uplift to anticipate and evaluate safeguards for cyber and related risk domains.
• Run independent evaluations to test and refine cyber policies.
• Design heuristics for prohibited and dual-use cyber categories to support classifier training.
• Partner with research and engineering teams to implement effective cyber safety systems.
• Provide operational insights on threat patterns to support AI uplift testing.
• Own and define policies for emerging technologies beyond traditional cyber/CBRN frameworks.
• Address critical blind spots at domain intersections (e.g., cyber-physical attacks, bio-cyber threats).
• Support policies related to explosive devices and advanced delivery systems.
• Develop threat models for novel asymmetric technologies (e.g., drone swarms, space weapons).
• Coordinate with CBRN and Cyber Policy Managers to address overlapping threats.

Required Qualifications

• Familiarity with the basics of prompting large language models (LLMs) and using them as both generative models and classifiers.
• Ability to design intelligent language-model pipelines to automate tasks.
• Expertise in Python, with fluency in building complex systems and strong async skills for efficient scaling.
• A hacker’s fast-prototyping mindset, with experience in vulnerability detection and adversarial thinking.
• Self-sufficiency, with the ability to independently create and run evaluations.
• Strong systems thinking and debugging skills for complex setups.
• A minimum of a Bachelor’s degree in a related field, or equivalent experience.

Preferred Qualifications

• Hands-on offensive security experience, such as penetration testing, red teaming, or vulnerability research.
• Relevant security certifications (e.g., SANS, OSCP), with an emphasis on ICS/SCADA systems.
• Exposure to AI evaluation benchmarks and frameworks.
• A background in AI/ML security or adversarial testing.
• Prior experience at the intersection of security and policy.

Benefits & Perks

• Competitive compensation: annual salary of $175,000–$295,000 USD.
• Benefits: optional equity donation matching, generous vacation and parental leave, flexible working hours, and a collaborative office space.
• Hybrid work policy: expect in-person collaboration at an office at least 25% of the time.
• Visa sponsorship available (subject to role requirements and availability).