Lead Data Engineer - Python, PySpark & SQL

Montreal | 7 days ago | Full-time | External
Negotiable
Job Title: Lead Data Engineer – Python, PySpark & SQL
Location: Canada
Job Type: Full-time contract

Responsibilities
• Build scalable data ingestion and transformation pipelines using Python, PySpark, and SQL.
• Process raw CSV/text files from AWS S3, including header validation, schema checks, and malformed file detection.
• Convert raw data into structured DataFrames and implement reusable data quality checks.
• Develop advanced transformations using SQL/PySpark (window functions, LAG(), grouping logic, date gap detection, etc.).
• Deploy and tune PySpark applications on AWS EMR, optimizing executor memory, cores, shuffle behavior, and cluster performance.
• Work with AWS services such as S3, EMR, Glue, Lambda, and IAM.
• Debug performance issues (OOM errors, shuffle spill, GC problems) and improve pipeline reliability.
• Lead design discussions and code reviews, and mentor junior engineers.

Required Skills
• 8+ years of experience in Data Engineering.
• Expert Python (file processing, scripting, validation automation).
• Strong PySpark (DataFrames, job tuning, distributed processing).
• Advanced SQL (analytical functions, performance tuning).
• Hands‑on with the AWS data stack: S3, EMR, Glue, Lambda.
• Strong understanding of Spark memory allocation, YARN container usage, and EMR resource tuning.
• Excellent debugging, communication, and problem‑solving skills.

Nice to Have
• Airflow or Databricks experience.
• Terraform or CloudFormation.
• Experience with data lake formats (Delta, Iceberg, Hudi).

Seniority level: Mid-Senior level
Employment type: Contract
Job function: Information Technology