Principal Data Engineer

New York | 29 days ago | Full-time | External
Negotiable
Worth AI, a leader in the computer software industry, is looking for a talented and experienced Principal Data Engineer to join its innovative team. At Worth AI, we are on a mission to revolutionize decision-making with the power of artificial intelligence while fostering an environment of collaboration and adaptability, aiming to make a meaningful impact in the tech landscape. Our team values include extreme ownership, one team, and creating raving fans among both our employees and our customers.

Worth is looking for a Principal Data Engineer to own the company-wide data architecture and platform. You will design and scale reliable batch/streaming pipelines, institute data quality and governance, and enable analytics/ML with secure, cost-efficient systems. You will partner with engineering, product, analytics, and security to turn business needs into durable data products.

Responsibilities

What you will do:

Architecture & Strategy
• Define end-to-end data architecture (lake/lakehouse/warehouse, batch/streaming, CDC, metadata).
• Set standards for schemas, contracts, orchestration, storage layers, and semantic/metrics models.
• Publish roadmaps, ADRs/RFCs, and "north star" target states; guide build vs. buy decisions.

Platform & Pipelines
• Design and build scalable, observable ELT/ETL and event pipelines.
• Establish ingestion patterns (CDC, file, API, message bus) and schema-evolution policies.
• Provide self-service tooling for analysts/scientists (dbt, notebooks, catalogs, feature stores).
• Ensure workflow reliability (idempotency, retries, backfills, SLAs).

Data Quality & Governance
• Define dataset SLAs/SLOs, freshness, lineage, and data certification tiers.
• Enforce contracts and validation tests; deploy anomaly detection and incident runbooks.
• Partner with governance on cataloging, PII handling, retention, and access policies.

Reliability, Performance & Cost
• Lead capacity planning, partitioning/clustering, and query optimization.
• Introduce SRE-style practices for data (error budgets, postmortems).
• Drive FinOps for storage/compute; monitor and reduce cost per TB/query/job.

Security & Compliance
• Implement encryption, tokenization, and row/column-level security; manage secrets and audits.
• Align with SOC 2 and privacy regulations (e.g., GDPR/CCPA; HIPAA if applicable).

ML & Analytics Enablement
• Deliver versioned, documented datasets/features for BI and ML.
• Operationalize training/serving data flows, drift signals, and feature-store governance.
• Build and maintain the semantic layer and metrics consistency for experimentation/BI.

Leadership & Collaboration
• Provide technical leadership across squads; mentor senior/staff engineers.
• Run design reviews and drive consensus on complex trade-offs.
• Translate business goals into data products with product/analytics leaders.