The Databricks R&D Operations Organization is seeking a highly motivated and technically skilled Technical Program Manager (TPM) to lead and oversee data annotation programs that power our cutting-edge AI research initiatives. This role sits at the intersection of program management, data operations, and AI/ML, and will play a pivotal part in ensuring that our data annotation efforts are scalable, high-quality, and aligned with the needs of our research and product teams.
You will collaborate closely with researchers, data scientists, ML engineers, and vendor operations to drive the end-to-end lifecycle of large-scale data labeling and curation efforts — from strategy and planning to execution, delivery, and quality evaluation.
Responsibilities
• Program Ownership: Drive large-scale data annotation programs end-to-end, from scoping requirements to delivery and post-mortem analysis.
• Cross-Functional Collaboration: Partner with AI Research leadership, AI researchers, data scientists, ML engineers, and product managers to define data needs, success metrics, and annotation guidelines.
• Vendor & Workforce Management: Manage external annotation vendors and internal labeling teams, including contract negotiation, SLAs, quality standards, and throughput planning.
• Quality & Process: Design and implement robust quality control pipelines, annotation tools, and feedback loops to ensure data quality at scale.
• Tooling & Automation: Collaborate with engineering to improve annotation infrastructure, workflows, and data pipelines for efficiency and scalability.
• Data Strategy & Governance: Contribute to data governance best practices, including privacy, security, ethics, and compliance in annotation workflows.
• Reporting & Metrics: Define and track key program metrics (cost, quality, speed, volume), and regularly communicate progress to stakeholders and leadership.
• Internal Adoption: Coordinate internal adoption of agentic AI products by building onboarding processes, workflows, and change management strategies.
• Data Quality Leadership: Establish and standardize processes for measuring, monitoring, and improving data quality across datasets and annotation teams.
• Customer Engagement: Collaborate with external customers and research partners on evaluation workshops, pilots, and feedback sessions to drive continuous improvement.
Competencies and Requirements
• Bachelor’s or Master’s degree in a technical field (e.g. Computer Science, Data Science, Machine Learning, Information Systems) or equivalent practical experience.
• 7+ years of experience in technical program management, project management, or operations in data-centric or AI/ML environments.
• Strong understanding of ML development workflows, data pipelines, and the annotation lifecycle.
• Experience managing large-scale data labeling or data collection efforts, including working with third-party vendors.
• Familiarity with big data platforms (e.g. Apache Spark, Databricks, Hadoop) and data warehousing concepts.
• Excellent organizational, problem-solving, and communication skills with the ability to influence cross-functional stakeholders.
• Proven track record of driving cross-functional teams to deliver complex technical projects on time and with high quality.
• Strong negotiation and analytical skills, with the ability to document standard operating procedures and processes.
• Advanced working knowledge of SQL, with the ability to build and maintain analytics that track, forecast, and visualize consumption through ad hoc SQL, reports, and dashboards.
• Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
• Self-motivated and able to work independently, as well as in a team environment.
• Preferred: good working knowledge of GPU technology and its applications in generative AI and machine learning.
• Preferred: hands-on familiarity with Delta Lake and MLflow, in addition to Apache Spark.