Project description
We are seeking an experienced Databricks Platform Engineer with a strong background in data engineering. This individual should have a comprehensive understanding of both data platforms and software engineering, enabling them to integrate the platform effectively within the wider IT ecosystem.
Responsibilities
• Manage and optimize the Databricks data platform.
• Ensure high availability, security, and performance of data systems.
• Provide insights into data platform usage.
• Optimize computing and storage for large-scale data processing.
• Design and maintain system libraries (Python) used in ETL pipelines and platform governance.
• Optimize ETL processes: enhance and tune existing pipelines for better performance, scalability, and reliability.
Skills
Must have
• Minimum 10 years of experience in IT/Data.
• Minimum 3 years of experience as a Databricks Data Platform Engineer.
• Bachelor's degree in IT or a related field.
• Infrastructure & Cloud: Azure, AWS (expertise in storage, networking, compute).
• Programming: Proficiency in PySpark for distributed computing and in Python for ETL development.
• SQL: Expertise in writing and optimizing SQL queries, preferably with experience in databases such as PostgreSQL, MySQL, Oracle, or Snowflake.
• Data Warehousing: Experience working with data warehousing concepts and the Databricks platform.
• ETL Tools: Familiarity with ETL tools and processes.
• Data Modelling: Experience with dimensional modelling, normalization/denormalization, and schema design.
• Version Control: Proficiency with version control tools like Git to manage codebases and collaborate on development.
• Data Pipeline Monitoring: Familiarity with monitoring tools (e.g., Prometheus, Grafana, or custom monitoring scripts) to track pipeline performance.
• Data Quality Tools: Experience implementing data validation, cleaning, and quality frameworks, ideally with Monte Carlo.
Nice to have
• Containerization & Orchestration: Docker, Kubernetes.
• Infrastructure as Code (IaC): Terraform.
• Understanding of the investment data domain.