Scope of Work:
Data Pipeline Development & Management
• Design, build, and maintain robust data pipelines using AWS Glue (see the sketch after this list)
• Implement ETL/ELT processes for data ingestion from multiple sources
• Optimize data workflows for performance and scalability
• Monitor and troubleshoot data pipeline failures and performance issues
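
For illustration only, a minimal sketch of such a Glue job follows, assuming a catalog table raw_db.orders and a curated bucket named example-curated-bucket (all placeholders, not project values):

    import sys
    from awsglue.transforms import Filter
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    # Standard Glue job boilerplate: resolve arguments and initialize contexts.
    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Ingest from the Glue Data Catalog (database/table names are placeholders).
    source = glue_context.create_dynamic_frame.from_catalog(
        database="raw_db", table_name="orders")

    # Drop records with a missing business key before loading downstream.
    cleaned = Filter.apply(frame=source, f=lambda r: r["order_id"] is not None)

    # Land compressed Parquet in the curated zone (bucket name is a placeholder).
    glue_context.write_dynamic_frame.from_options(
        frame=cleaned,
        connection_type="s3",
        connection_options={"path": "s3://example-curated-bucket/orders/"},
        format="parquet")

    job.commit()
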
Data Infrastructure & Engineering
• Manage and optimize AWS Redshift data warehouse operations
• Configure and maintain data storage solutions (AWS S3, data lakes)
• Implement data partitioning, indexing, and compression strategies (see the sketch after this list)
• Support Infrastructure as Code (IaC) deployment of data infrastructure
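
As a sketch of the partitioning, compression, and warehouse-maintenance items above (table, cluster, and secret identifiers are placeholders; writing to s3:// paths with pandas assumes s3fs is installed):

    import boto3
    import pandas as pd

    # Write date-partitioned, snappy-compressed Parquet into the lake.
    # The frame contents, partition column, and bucket are illustrative.
    df = pd.DataFrame(
        {"order_id": [1, 2], "amount": [9.5, 3.2], "dt": ["2024-01-01", "2024-01-02"]}
    )
    df.to_parquet(
        "s3://example-lake-bucket/orders/",
        engine="pyarrow",
        partition_cols=["dt"],
        compression="snappy",
    )

    # Routine Redshift maintenance through the Redshift Data API.
    SECRET_ARN = "arn:aws:secretsmanager:region:account:secret:example"  # placeholder
    rsd = boto3.client("redshift-data")
    for sql in ("VACUUM SORT ONLY public.orders;", "ANALYZE public.orders;"):
        rsd.execute_statement(
            ClusterIdentifier="example-cluster",  # placeholder
            Database="analytics",                 # placeholder
            SecretArn=SECRET_ARN,
            Sql=sql,
        )
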
CI/CD & DevOps for Data
• Develop and maintain CI/CD pipelines for data workflows using GitLab
• Implement automated testing for data pipelines and data quality (see the test sketch after this list)
• Support version control and deployment strategies for data assets
• Configure AWS Lambda functions for data processing automation (see the handler sketch after this list)
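
The automated data-quality testing above could run as a pytest suite in a GitLab CI test stage; a minimal sketch follows, reusing the placeholder path and column names from the earlier examples:

    import pandas as pd
    import pytest

    CURATED_PATH = "s3://example-curated-bucket/orders/"  # placeholder path

    @pytest.fixture
    def orders() -> pd.DataFrame:
        # In CI this would typically read a small, deterministic sample.
        return pd.read_parquet(CURATED_PATH)

    def test_no_null_keys(orders):
        assert orders["order_id"].notna().all(), "order_id must never be null"

    def test_amounts_positive(orders):
        assert (orders["amount"] > 0).all(), "amounts must be positive"
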
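And for Lambda-based automation, a minimal handler sketch that starts a Glue job when a new object lands in S3 (the S3 event wiring and job name are assumptions for illustration):

    import json
    import urllib.parse
    import boto3

    glue = boto3.client("glue")

    def handler(event, context):
        # Triggered by an S3 event notification; each record names one new object.
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
            glue.start_job_run(
                JobName="example-orders-etl",  # placeholder job name
                Arguments={"--source_path": f"s3://{bucket}/{key}"},
            )
        return {"statusCode": 200, "body": json.dumps("ok")}
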
Monitoring & Support
• Set up monitoring and alerting for data pipeline health (see the alarm sketch after this list)
• Provide technical support for data-related issues
• Collaborate with technical teams on data architecture requirements
• Optimize query performance and database operations
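
A sketch of one such alert, assuming Glue's published per-job CloudWatch metrics and a hypothetical SNS topic for notifications (names and thresholds are placeholders to be agreed during onboarding):

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Alarm when a Glue job run reports any failed tasks.
    # Dimensions follow Glue's documented metric schema; verify per job.
    cloudwatch.put_metric_alarm(
        AlarmName="glue-orders-etl-failures",  # placeholder
        Namespace="Glue",
        MetricName="glue.driver.aggregate.numFailedTasks",
        Dimensions=[
            {"Name": "JobName", "Value": "example-orders-etl"},
            {"Name": "JobRunId", "Value": "ALL"},
            {"Name": "Type", "Value": "count"},
        ],
        Statistic="Sum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=1,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=["arn:aws:sns:region:account:data-alerts"],  # placeholder
    )
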
Documentation & Reporting
• Document data pipeline architectures and technical specifications
• Maintain runbooks and operational procedures
• Conduct monthly progress meetings (1 hour) to report on system health
• Track engineering tasks through SHIP-HATS Jira
• Maintain technical documentation on SHIP-HATS Confluence
Required Skills & Experience:
• Strong background in data engineering and data pipeline development
• Proficiency in SQL, Python, and shell scripting
• Extensive experience with AWS data services (Redshift, S3, Glue, Lambda, CloudWatch)
• Data warehouse design and optimization experience
• Strong CI/CD pipeline knowledge (GitLab preferred)
• Infrastructure as Code (IaC) experience (Terraform, CloudFormation)
• Knowledge of data modeling and database design principles
• Strong troubleshooting and performance optimization skills