Responsibilities
• Develop, test, and maintain scalable ETL/ELT pipelines using Python and PySpark to process large datasets.
• Design and implement data models and schemas in Snowflake to optimize storage and query performance.
• Build and manage data workflows using Azure Data Factory (or similar) to orchestrate data movement and transformation.
• Develop and optimize SQL queries for data extraction, transformation, and loading.
• Maintain and enhance data lakes, ensuring adherence to data governance and security best practices.
• Collaborate with DataOps and cloud teams to deploy and manage solutions on OpenShift and other container platforms.
• Monitor and troubleshoot data pipeline issues, ensuring high availability and performance.
• Document data engineering processes, architectures, and pipelines for knowledge sharing and compliance.
• Stay up to date with data engineering tools and best practices, and advocate for their adoption where appropriate.
Requirements
• Proven experience as a Data Engineer or in a similar role.
• Strong proficiency in Python, including distributed data processing with PySpark.
• Expertise in SQL and experience with Snowflake or similar cloud data warehouses.
• Hands-on experience with cloud data orchestration tools such as Azure Data Factory or equivalent.
• Knowledge of data lake architecture and best practices.
• Familiarity with container orchestration platforms like OpenShift or Kubernetes.
• Strong understanding of data security, governance, and compliance.
• Experience with version control and CI/CD pipelines.
• Excellent problem-solving and communication skills.
• Bachelor's degree in Computer Science, Data Engineering, or a related field; advanced degrees are a plus.