Senior ML Infrastructure Engineer
San Francisco, CA - Onsite
Salary - Above market average + equity
Join us in building one of the world's leading generative video and multimodal AI platforms! We are seeking an experienced infrastructure engineer who is passionate about creating cloud-scale systems that power high-performance computing and robust CI/CD pipelines for complex ML workloads.
What You Will Achieve:
• Core ML Platform Architecture: Design and enhance infrastructure for large-scale generative video and multimodal model training, evaluation, and deployment.
• High-Throughput Compute Systems: Develop and optimize GPU/TPU clusters and distributed training systems tailored for video-heavy workloads.
• Production Reliability for Generative Models: Create the tooling and services needed to support frequent model updates while managing heavy compute demands and long-running jobs.
• End-to-End CI/CD for ML: Spearhead the development of automated pipelines for model training, validation, artifact management, and deployment.
• Multimodal Data Infrastructure: Construct systems for the ingestion, versioning, transformation, and serving of large-scale datasets, including video, audio, and text, with high reliability.
• Internal Developer Experience: Collaborate with research, product, and applied ML teams to create user-friendly internal tools for experiment tracking, model lineage, and resource management.
• Technical Leadership: Mentor fellow engineers, establish platform standards, and influence future architectural decisions.
What You Bring:
• Proven experience designing and managing large-scale infrastructure at a cloud provider, hyperscaler, or leading AI company.
• Track record of owning CI/CD systems, high-capacity compute platforms, or data infrastructure serving ML teams.
• Expert knowledge in distributed computing across GPUs/accelerators, Kubernetes, and cloud platforms (AWS/GCP/Azure).
• Solid engineering skills in languages such as Python or Go.
• Prior exposure to ML training pipelines, particularly those handling video-heavy, multimodal, or high-dimensional data.
• Ability to lead complex initiatives across multiple teams and shape technical strategy.
Preferred Experience:
• Experience with video processing, large-scale media pipelines, or streaming architectures.
• Familiarity with frameworks and libraries used for multimodal or video-generation models, such as PyTorch, JAX, or diffusers.
• Background with Ray, Triton, CUDA optimization, or specialized scheduling for ML workloads.
• History of working in high-growth AI startups or research-focused environments.
• Understanding of security and compliance issues related to user-generated content.
Why Join Us:
• Influence the foundation of a cutting-edge generative video system.
• Drive the future of multimodal AI by developing infrastructure that accelerates research and product innovation.
• Collaborate with seasoned founding engineers, researchers, and platform developers from top tech firms.
• Receive competitive compensation and meaningful equity, and work in a strong in-person engineering culture in San Francisco.