Requirements:
• 7+ years of hands-on DevOps/Infrastructure engineering experience with AWS
• Proven track record of redesigning and scaling production infrastructure for high-growth companies
• Deep expertise in AWS services including RDS, EC2, ELB/ALB, Route 53, VPC, IAM, and CloudFormation/Terraform
• Strong security background with experience in infrastructure hardening and compliance
• Experience migrating from traditional deployments to modern container orchestration (Kubernetes/ECS)
• Proficiency in infrastructure-as-code tools (Terraform preferred)
• Expert-level scripting skills in Python, Go, or Bash
• Experience with GitLab CI/CD pipelines and GitLab-based workflows
• Kubernetes expertise for container orchestration at scale
• Experience with monitoring/observability tools (Prometheus, Grafana, Datadog, ELK stack)
• Advanced GitLab experience including GitOps practices, GitLab Runner optimization, and GitLab security scanning
• Knowledge of serverless architectures and event-driven systems
• Experience with compliance frameworks (SOC 2, ISO 27001, etc.)
• Previous experience supporting AI/ML workloads and data pipelines
Responsibilities:
• Critically evaluate our existing AWS infrastructure setup, including RDS configurations, EC2 instances, Nginx load balancing, Docker Compose deployments, Target Groups, and Route 53 DNS management
• Identify security vulnerabilities and scalability bottlenecks in the current architecture
• Design and implement infrastructure improvements with a focus on high availability, disaster recovery, and auto-scaling capabilities
• Modernize deployment strategies, moving from Docker Compose to more scalable orchestration solutions where appropriate
• Conduct security audits of existing infrastructure and implement security hardening measures
• Redesign network architecture with proper VPC segmentation, security groups, and IAM policies
• Implement secrets management and encryption at rest/in transit across all services
• Establish compliance frameworks suitable for enterprise clients in manufacturing and aerospace sectors
• Architect auto-scaling solutions to handle variable workloads and traffic spikes
• Optimize database performance and implement proper backup/recovery strategies for RDS
• Design load balancing strategies that can handle global traffic distribution
• Implement caching layers and CDN strategies for improved performance
• Take full ownership of infrastructure decisions and their implementation
• Drive initiatives from conception to completion with minimal supervision
• Establish infrastructure standards and best practices across the engineering organization
• Lead incident response and post-mortem analysis for infrastructure-related issues