We are looking for an experienced Databricks Data Engineer with strong DevOps expertise to join our data engineering team. The ideal candidate will design, build, and optimize large-scale data pipelines on the Databricks Lakehouse Platform while implementing robust CI/CD and deployment practices. The role calls for deep proficiency in PySpark, SQL, Azure cloud services, and modern DevOps tooling, and you will collaborate with cross-functional teams to deliver scalable, secure, and high-performance data solutions.
Technical Skills
• Strong hands-on experience with Databricks, including:
  • Delta Lake
  • Unity Catalog
  • Lakehouse architecture
  • Delta Live Tables (DLT)
  • Databricks Runtime
  • Table update triggers
• Proficiency in Apache Spark (especially PySpark) and advanced SQL.
• Expertise with Azure cloud services (ADLS Gen2, Azure Data Factory, Key Vault, Azure Functions, etc.).
• Experience with relational databases and data warehousing concepts.
• Strong understanding of DevOps tools:
  • Git/GitLab
  • CI/CD pipelines
  • Databricks Asset Bundles
• Familiarity with infrastructure-as-code (Terraform is a plus).
Key Responsibilities
1. Data Pipeline Development
• Design, build, and maintain scalable ETL/ELT pipelines using Databricks.
• Develop data processing workflows using PySpark/Spark and SQL for large‑volume datasets.
• Integrate data from ADLS, Azure Blob Storage, and relational/non-relational data sources.
• Implement Delta Lake best practices, including schema evolution, ACID transactions, OPTIMIZE, ZORDER, and performance tuning (see the sketch after this list).
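For illustration, a minimal PySpark sketch of the kind of Delta Lake work described above. The storage path, the table name `sales.orders`, and the columns `event_date`/`customer_id` are hypothetical placeholders, not part of this posting; `OPTIMIZE`/`ZORDER` require a Databricks runtime.

```python
from pyspark.sql import SparkSession

# On Databricks `spark` is predefined; created here for self-containment.
spark = SparkSession.builder.getOrCreate()

# Hypothetical ADLS landing path, for illustration only.
raw = spark.read.format("json").load(
    "abfss://landing@myaccount.dfs.core.windows.net/orders/"
)

# Append with schema evolution: new source columns are merged into the
# Delta table schema instead of failing the write.
(raw.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("sales.orders"))

# Compact small files and co-locate frequently filtered columns.
spark.sql("OPTIMIZE sales.orders ZORDER BY (event_date, customer_id)")
```

Here `mergeSchema` handles additive schema evolution, while OPTIMIZE with ZORDER addresses the small-file and data-skipping aspects of performance tuning.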
2. DevOps & CI/CD
• Implement CI/CD pipelines for Databricks using Git, GitLab, Azure DevOps, or similar tools.
• Build and manage automated deployments using Databricks Asset Bundles.
• Manage version control for notebooks, workflows, libraries, and configuration artifacts.
• Automate cluster configuration, job creation, and environment provisioning (see the sketch after this list).
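In practice, Databricks Asset Bundles declare jobs and environments in a `databricks.yml` and deploy them with `databricks bundle deploy`. The sketch below shows the programmatic equivalent using the Databricks SDK for Python; the job name, notebook path, and cluster ID are placeholders.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

# Auth is resolved from the environment (DATABRICKS_HOST/DATABRICKS_TOKEN)
# or ~/.databrickscfg; nothing is hard-coded here.
w = WorkspaceClient()

# Hypothetical job definition, for illustration only.
created = w.jobs.create(
    name="nightly-orders-etl",
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/data/etl/ingest"),
            existing_cluster_id="<cluster-id>",
        )
    ],
)
print(f"Created job {created.job_id}")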
3. Collaboration & Business Support
• Work with data analysts and BI teams to prepare datasets for reporting and dashboarding.
• Collaborate with product owners, business partners, and engineering teams to translate requirements into scalable data solutions.
• Document data flows, architecture, and deployment processes.
4. Performance & Optimization
• Tune Databricks clusters, jobs, and pipelines for cost efficiency and high performance.
• Monitor workflows, debug failures, and ensure pipeline stability and reliability.
• Implement job instrumentation and observability using logging/monitoring tools (see the sketch after this list).
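A minimal, tool-agnostic instrumentation sketch in plain Python; the `instrumented` helper and the stage name are hypothetical, and the structured log lines would surface in cluster driver logs or whatever log-delivery target is configured.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

@contextmanager
def instrumented(stage: str):
    """Hypothetical helper: logs the duration and outcome of a pipeline stage."""
    start = time.monotonic()
    try:
        yield
        log.info("stage=%s status=ok duration_s=%.1f", stage, time.monotonic() - start)
    except Exception:
        log.exception("stage=%s status=failed duration_s=%.1f", stage, time.monotonic() - start)
        raise

# Usage inside a job task:
with instrumented("compact_orders"):
    pass  # e.g. spark.sql("OPTIMIZE sales.orders")
```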
5. Governance & Security
• Implement and manage data governance using Unity Catalog (see the GRANT sketch after this list).
• Enforce access controls, data security, and compliance with enterprise policies.
• Ensure best practices around data quality, lineage, and auditability.
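Unity Catalog permissions are plain SQL GRANT statements on securables, so access controls can be scripted and version-controlled. A sketch, assuming a Databricks notebook or job where `spark` is predefined; the catalog, schema, table, and group names are illustrative placeholders.

```python
# Assumes a Unity Catalog-enabled Databricks workspace where `spark` is predefined.
# Catalog, schema, table, and group names are illustrative placeholders.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_engineers`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `data_engineers`")
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `bi_analysts`")
```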
Preferred Experience
• Knowledge of streaming technologies such as Structured Streaming or the legacy Spark Streaming (DStreams) API (see the sketch after this list).
• Experience building real-time or near real-time pipelines.
• Exposure to advanced Databricks runtime configurations and tuning.
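A minimal Structured Streaming sketch using Databricks Auto Loader (`cloudFiles`), assuming a Databricks notebook where `spark` is predefined; the storage paths, checkpoint location, and target table are illustrative placeholders.

```python
# Incrementally ingest new files from a hypothetical ADLS landing zone.
stream = (
    spark.readStream.format("cloudFiles")  # Databricks Auto Loader
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "abfss://meta@acct.dfs.core.windows.net/schemas/orders")
    .load("abfss://landing@acct.dfs.core.windows.net/orders/")
)

(stream.writeStream
    .option("checkpointLocation", "abfss://meta@acct.dfs.core.windows.net/checkpoints/orders")
    .trigger(availableNow=True)  # incremental batch; omit for continuous processing
    .toTable("sales.orders_bronze"))
```

The `availableNow` trigger gives near-real-time incremental batches; removing it runs the same pipeline continuously.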
Certifications (Optional)
• Databricks Certified Data Engineer Associate / Professional
• Azure Data Engineer Associate