About the position
You will design and scale the core distributed systems orchestrating thousands of autonomous AI agents. This role involves building fault-tolerant infrastructure, optimizing API performance, and developing local GPU systems. Joining a small, high-trust team, you will bridge the gap between complex infrastructure and real-world business automation in a stealth environment.
Responsibilities
• Architect distributed systems to coordinate thousands of autonomous agents with real-time monitoring.
• Drive horizontal scaling and optimize API rate limits across various model and data providers.
• Develop local GPU infrastructure and observability tools to ensure system health and cost efficiency.
Requirements
• Strong experience in Python and designing scalable, fault-tolerant distributed systems.
• Deep understanding of PostgreSQL, Redis, and asynchronous job queue systems for data isolation.
• Ability to work in a high-discretion environment with a commitment to security and privacy.