JOB TITLE
Senior Applied Machine Learning Engineer (audio / music generation)
ABOUT THE ROLE
We're building an AI-powered music system focused on commercial-ready audio generation. Our initial priority is getting the quality of the generated music right: structure, musicality, consistency, and production readiness.
We are looking for a Senior Applied ML Engineer to own the end-to-end audio generation pipeline for our MVP.
This role is hands-on and pragmatic: you'll fine-tune open-source music models, integrate inference pipelines, and work closely with audio and backend engineers to deliver usable results quickly.
This role starts as a contract engagement (details below), with a path to a full-time position for the right fit.
ROLE DETAILS
Terms:
Fixed-term (5 months) | Potential full-time conversion
Compensation:
$30,000 (full 5-month term)
Location:
Hybrid/On-site (Monrovia, CA)
WHAT YOU'LL WORK ON
Fine-tuning open-source music generation models.
Implementing conditioning controls (beats per minute, key, mood, section, density).
Training and deploying parameter-efficient fine-tunes (LoRA / adapters).
Building reference-conditioned generation.
Supporting long-form generation via chunking and continuation.
Integrating with backend inference pipelines and APIs.
Collaborating with audio DSP engineers to ensure outputs are production-ready.
REQUIRED QUALIFICATIONS
Strong experience with Python and PyTorch.
Hands-on experience with audio or speech generation models.
Familiarity with diffusion or autoregressive generative models.
Experience using or fine-tuning open-source ML models, including familiarity with Hugging Face (HF) interfaces.
Understanding of audio representations.
Experience deploying ML models to production or API environments.
NICE-TO-HAVE SKILLS
Familiarity with CLAP / audio embeddings or retrieval-assisted generation.
Experience working with LoRA / PEFT methods.
Basic understanding of audio production workflows (tempo, key, stems, loudness).
Experience optimizing inference cost and latency.
ROLE GOALS & OBJECTIVES
Reliably generate musically coherent, commercial-friendly cues (30 to 120 seconds).
The model responds correctly to conditioning inputs such as tempo, key, and mood.
Outputs are stable, repeatable, and usable downstream by post-production tools.
The system is modular and ready to be integrated with downstream models.
Seniority level
Mid-Senior level
Employment type
Contract
Job function
Engineering and Information Technology
Industries
IT System Custom Software Development