Title: Director of Engineering - AI Cloud Infrastructure
Location: FULLY remote! Must be located in the US
Salary: $200k-$300k + Bonus and RSU Package
Requirements: 10+ years of Engineering experience, including at least 5 years in an AI, ML, HPC, and/or Cloud Computing environment. Must also have at least 5 years of leadership experience.
If You Are an Engineering Leader with AI Cloud Experience, Please Read On!
We're a fast-moving team building the next generation of AI infrastructure, designed from the ground up for scale, speed, and performance. Our platform powers some of the most advanced AI workloads in the world, combining high-density GPU clusters, cutting-edge networking, and smart orchestration tools. We operate Tier-3 data centers optimized for AI and HPC, and offer flexible hybrid cloud solutions that let teams move fast and build big.
If you're an Engineering leader who is seasoned in the AI space and passionate about solving hard problems, working with world-class hardware and software, and shaping the future of AI infrastructure, we'd love to meet you. We are positioned extremely well for long-term growth, and we reward our team.
What you'll do:
As Director of Engineering, you'll lead and grow a team of engineering managers and technical leads, fostering a culture of innovation and excellence. You'll oversee the design, deployment, and scaling of GPU-based AI infrastructure, while ensuring performance, reliability, and security. Your responsibilities include developing tools for provisioning and monitoring, implementing best practices like CI/CD and infrastructure-as-code, and managing change and incident response processes. You'll work closely with cross-functional teams to align infrastructure with business goals, and contribute to strategic planning, budgeting, and reporting to the CTO.
Must Have
• BS/MS in Computer Science or related field.
• 10+ years in engineering, 5+ in leadership roles.
• Proven experience with cloud-scale AI/ML infrastructure (e.g., Kubernetes, Slurm).
• Familiarity with infrastructure tools (OpenStack, MAAS, NetBox, KVM, Redfish).
• Strong knowledge of distributed systems, cloud-native tech, and automation.
• Skilled in DevOps, observability, and software delivery pipelines.
Bonus Points!
• Experience with NVIDIA clusters, RDMA, RoCE/InfiniBand.
• Knowledge of SDN (EVPN/VXLAN, BGP, Clos networks).
• Familiarity with LLM training/inference at scale.
• Background in AI platforms or cloud services.
Offering:
• Base of $200k-$300k + Bonus and generous RSU Package
• 5 weeks of PTO
• 401(k) with match
• Comprehensive medical and supplemental benefits package
• Fully remote!