Research Engineer (Data Infra/ML)

San Francisco | Posted 1 month ago | Full-time | External
1.4m - 2.1m / yr
Bay Area (Hybrid)

Can you build and optimize distributed ML pipelines with Ray or Spark? Do you love speeding up cloud infra (Kubernetes, Docker, CI/CD)? Are you excited to build the data backbone for large-scale ML training?

We're a tier-1 VC-backed startup developing hyper-realistic 3D simulations using AI. Our customers include leading names in industries such as autonomous vehicles, drones, and robotics.

Role

You'll be hands-on improving CI/CD pipelines, speeding up Docker builds, and scaling scene processing on Ray. You'll also:
• Build high-performance data pipelines for multimodal datasets (3D, video, sensor).
• Optimize distributed training and processing across Spark, Databricks, and Kubernetes.
• Work with researchers to productionize PyTorch models and streamline ML workflows.
• Develop tools that make data discoverable, reusable, and reliable throughout the ML lifecycle.

You

• Strong Python skills and experience with distributed systems (Ray, Spark, Flyte, Dask).
• Hands-on experience with cloud, Kubernetes, and distributed training (Ray, PyTorch DDP, Horovod).
• Familiarity with dataset versioning and experiment tracking (DVC, MLflow).

Bonus Points

• Experience in simulation, robotics, or autonomy pipelines.
• Background in deep learning (PyTorch) and 3D/sensor data (LiDAR, meshes, radiance fields).
• Open-source contributions or frontend/UI experience.