Applied AI Researcher - Video Diffusion
Location: On-site, San Francisco, CA
Compensation: $160,000 - $300,000 + Equity (0.5%-2%)
Employment Type: Full-time
About the Role
An early-stage AI startup in San Francisco is building the next generation of video foundation models for human motion and expression. They're seeking an Applied AI Researcher to help lead the training of state-of-the-art video diffusion models from scratch, working directly with massive visual datasets and hundreds of GPUs.
This is a high-ownership role at a company that trains its own models rather than fine-tuning existing ones, backed by well-known investors and founders from top-tier AI, video, and infrastructure companies.
What You'll Do
• Train large-scale diffusion and transformer-based models for video generation
• Curate, clean, and label internet-scale video datasets
• Run targeted experiments and rapidly iterate on model improvements
• Distill models for faster inference with minimal performance loss
• Stay current with arXiv and GenAI research; help shape the model roadmap
• Build LoRA modules to expand model capability
Requirements
• 2+ years building ML models from scratch in Python and PyTorch
• Deep experience with vision transformers, diffusion models, or related architectures
• Comfortable working on Linux clusters with GPU workloads
• Experience with automated labeling pipelines (e.g., face detection, speaker recognition)
• Strong research mindset; a PhD or top-tier publications are a plus
• Familiarity with video compression, codecs, and perceptual metrics
You Might Be a Fit If...
• You're hands-on with training generative video models
• You thrive in early-stage environments and want full-stack research-to-deployment impact
• You obsess over both clean data and novel architecture design
• You're active in AI research circles and keep up with the latest GenAI trends