Senior ML Infrastructure Engineer

San Francisco · 3 days ago · Full-time · External
Negotiable
San Francisco, CA - Onsite
Salary - Above market average + equity

Join us in building one of the world's leading generative video and multimodal AI platforms! We are seeking an experienced infrastructure engineer who is passionate about creating cloud-scale systems that power high-performance computing and robust CI/CD pipelines for complex ML workloads.

What You Will Achieve:
• Core ML Platform Architecture: Design and enhance infrastructure for large-scale generative video and multimodal model training, evaluation, and deployment.
• High-Throughput Compute Systems: Develop and optimize GPU/TPU clusters and distributed training systems tailored for video-heavy workloads.
• Production Reliability for Generative Models: Create the tooling and services needed to support frequent model updates while managing significant compute demands and long-running jobs.
• End-to-End CI/CD for ML: Spearhead the development of automated pipelines for model training, validation, artifact management, and deployment.
• Multimodal Data Infrastructure: Build highly reliable systems for the ingestion, versioning, transformation, and serving of large-scale datasets, including video, audio, and text.
• Internal Developer Experience: Collaborate with research, product, and applied ML teams to create user-friendly internal tools for experiment tracking, model lineage, and resource management.
• Technical Leadership: Mentor fellow engineers, establish platform standards, and influence future architectural decisions.

What You Bring:
• Proven experience designing and managing large-scale infrastructure at a cloud provider, hyperscaler, or premier AI firm.
• History of ownership over CI/CD systems, high-capacity compute platforms, or data infrastructure serving ML teams.
• Expert knowledge of distributed computing across GPUs/accelerators, Kubernetes, and cloud platforms (AWS/GCP/Azure).
• Solid engineering skills in languages such as Python or Go.
• Prior exposure to ML training pipelines, particularly those involving video-heavy, multimodal, or high-dimensional data.
• Ability to lead complex initiatives across multiple teams and shape technical strategy.

Preferred Experience:
• Experience with video processing, large-scale media pipelines, or streaming architectures.
• Familiarity with modern multimodal or video-generation frameworks like PyTorch, JAX, or diffusers.
• Background with Ray, Triton, CUDA optimization, or specialized scheduling for ML workloads.
• History of working in high-growth AI startups or research-focused environments.
• Understanding of security and compliance issues related to user-generated content.

Why Join Us:
• Influence the foundation of a cutting-edge generative video system.
• Drive the future of multimodal AI by developing infrastructure that accelerates research and product innovation.
• Collaborate with seasoned founding engineers, researchers, and platform developers from top tech firms.
• Receive competitive compensation and meaningful equity, and work in a strong in-person engineering culture in San Francisco.