Lead Data & Inference Engineer

About the Job: Lead Data & Inference Engineer

Our Client
A fast-moving, venture-backed advertising technology startup based in San Francisco is transforming how business-to-business marketers connect with their ideal customers. With twelve million dollars in funding, they are pioneering identity resolution technology that merges business and consumer signals, turning static audience lists into highly matchable, cross-channel segments without relying on cookies. By converting first-party and third-party data into precisely targeted audiences across platforms such as Meta, Google, and YouTube, they help marketing teams achieve higher match rates, minimize wasted advertising spend, and boost pipeline growth. With a keen understanding of how business buyers behave in traditionally consumer-focused channels, they are redefining demand generation and account-based marketing for business brands.

About Us
Catalyst Labs is a premier talent agency specializing in Applied AI, Machine Learning, and Data Science recruitment. We pride ourselves on being integrally involved in our clients' recruitment efforts, collaborating closely with Founders, CTOs, and Heads of AI to drive the future of applied intelligence, from model optimization to AI-driven workflows. We focus on aligning your technical skills with creative problem-solving abilities and long-term growth opportunities in the rapidly evolving realm of intelligent systems.

Location: San Francisco
Work Type: Full-time
Compensation: Above-market base + bonus + equity

Roles & Responsibilities
• Lead the design, development, and scaling of a comprehensive data platform, from ingestion to insights, ensuring fast, reliable, business-ready data.
• Build and maintain scalable batch and streaming pipelines, transforming diverse data sources and APIs into trusted, low-latency systems.
• Own and optimize data reliability, cost, and service level objectives, targeting 99.9% uptime and minute-level latency, while conducting root cause analyses and delivering sustainable fixes.
• Operate inference pipelines that enrich and score data, with quality assurance built on large language models and retrieval-augmented generation.
• Collaborate across teams to deliver data as a product, establishing clear data contracts, ownership models, lifecycle processes, and metrics-driven decision-making.
• Guide architectural decisions across the data lake and pipeline stack, documenting lineage, trade-offs, and build-versus-buy reasoning.
• Scale integration with APIs and internal services, ensuring data consistency, high quality, and support for both real-time and batch use cases.
• Mentor engineers, review code, and raise technical standards across the organization while promoting best practices in data-driven decision-making.

Qualifications
• Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or Mathematics.
• Excellent written and verbal communication skills, with a proactive and collaborative mindset.
• Comfortable working in hybrid or distributed environments, demonstrating strong ownership and accountability.
• Action-oriented, with the ability to identify bottlenecks, automate workflows, and iterate rapidly on measurable outcomes.
• Demonstrated ability to teach, mentor, and clearly document technical decisions and schemas.
Core Experience
• 6 to 12 years of experience building and scaling production-grade data systems, with expertise in data architecture, modeling, and pipeline design.
• Proficient in SQL (query optimization on large datasets) and Python.
• Hands-on experience with distributed data technologies (Spark, Flink, Kafka) and modern orchestration tools (Airflow, Dagster, Prefect).
• Familiarity with dbt, DuckDB, and the modern data stack; experience with Infrastructure as Code (IaC), Continuous Integration/Continuous Deployment (CI/CD), and observability.
• Exposure to Kubernetes and cloud infrastructure providers (AWS, GCP, or Azure).
• Bonus: strong Node.js skills for faster onboarding and seamless system integration.
• Previous experience in a high-growth startup (10 to 200 employees) or other early-stage environment, with a solid product mindset.