Member of the Technical Staff - LLMs

San Francisco · 1 month ago · Full-time · External
1.2m - 1.6m / yr
Member of Technical Staff - Infrastructure & LLMs

Location: San Francisco, CA (Hybrid)
Compensation: $170,000 - $220,000 base + 1-3% equity
Work Authorization: U.S. work authorization required (no visa sponsorship)
Start Date: ASAP
Type: Full-time

About the Role
We're seeking a deeply curious and technically strong engineer to join a lean, high-performance team building next-generation inference infrastructure for LLMs. This is an opportunity to own the design and development of performance-critical systems from day one, working directly on problems like:
• Scaling multi-GPU inference workloads
• Designing distributed job schedulers
• Experimenting with LLM distillation and optimization frameworks

You'll join a two-person engineering team at the earliest stage, where your impact will be foundational to both product and culture. No bureaucracy. No politics. Just ambitious, technically challenging work that matters.

Why This Role Is Unique
• Massive Technical Ownership: Drive core infra design with zero red tape.
• Frontier Engineering: Work on distributed systems, LLM runtimes, CUDA orchestration, and novel scaling solutions.
• Foundational Equity: Earn meaningful ownership and grow into a founding-level role.
• Mission-Driven: Focused on durable infra, not short-term hype cycles.
• No Credentials Needed: We value ability and drive over resumes and degrees.

Ideal Candidate Profile
• 2+ years of experience in backend or infrastructure engineering
• Deep interest or experience in distributed systems, GPU orchestration, or AI infra
• Strong technical curiosity demonstrated through side projects, OSS contributions, or community involvement
• Background at infra-focused orgs (e.g., Supabase, Dagster, Modal, Lightning AI, MotherDuck)
• Python fluency, with production experience in Docker, GPU workloads, and distributed compute systems

Tech Stack
• Core Language: Python
• Infrastructure: Custom distributed systems for multi-GPU inference
• Deployment: Docker, CUDA, Kubernetes (or equivalent)
• Focus: Batch inference, model distillation, low-latency pipelines

Soft Traits
• Fast learner with an ownership mindset
• Thinks from first principles; skeptical of default assumptions
• Collaborative, positive-sum team player
• Oriented toward building, not credentialism