Job Description:
• Develop, test, and maintain data processing applications and pipelines using PySpark and related technologies.
• Perform data extraction, transformation, and loading (ETL) from multiple sources into target systems.
• Ensure data quality, consistency, and performance across data workflows.
• Participate in code reviews, documentation, and continuous improvement of data processes.
• Troubleshoot and resolve issues in data processing and integration environments.
• Support the deployment, monitoring, and maintenance of data solutions in production.
Requirements:
• 1-3 years of experience in Python, including data structures, algorithms, and libraries for data manipulation (e.g., Pandas).
• Deep understanding of Apache Spark, its architecture, and components (RDDs, DataFrames, Datasets).
• Strong knowledge of SQL for data querying and manipulation.
• Experience in ETL (Extract, Transform, Load) processes using PySpark.
• Ability to analyze and interpret complex datasets and derive insights.
• Strong analytical skills to troubleshoot issues in data processing pipelines.
• Good command of both spoken and written English and Cantonese. Ability to speak Putonghua is an advantage.
Click 'Apply Now' to apply for this position or call Stella Tang at +852 3180 4977 for a confidential discussion. All information collected will be kept in strict confidence and will be used for recruitment purposes only.