Cloud Services Operations Lead

Singapore | 4 months ago | Full-time | External
44.4k - 58.3k / mo
The Cloud Services Operations Lead will be a critical leader within our Cloud Shared Services team, responsible for the day-to-day operational excellence, stability, and continuous improvement of our multi-cloud (primarily AWS and Azure) environments. This role requires a strong blend of technical expertise in cloud operations, a deep understanding of IT service management (ITSM) best practices, and proven leadership skills to manage a team of cloud operations engineers. The successful candidate will ensure that our cloud services are delivered efficiently, securely, and in accordance with agreed-upon service level agreements (SLAs).

Key Responsibilities:

• Operational Leadership:
  • Lead, mentor, and develop a team of cloud operations engineers, fostering a culture of continuous learning, collaboration, and high performance.
  • Oversee daily operations of our multi-cloud environments (AWS, Azure, and others as applicable), ensuring high availability, performance, and reliability of all cloud services.
  • Implement and enforce operational best practices, standards, and procedures for cloud infrastructure and platform management.
  • Manage on-call rotations and ensure effective incident response and problem resolution.
• Service Management & Performance:
  • Define, monitor, and report on key performance indicators (KPIs) and service level agreements (SLAs) for all cloud services.
  • Proactively identify and address potential operational issues, performance bottlenecks, and capacity constraints.
  • Drive continuous improvement initiatives to optimize cloud operations, reduce manual effort, and enhance service delivery.
  • Collaborate with internal customers to understand their evolving needs and ensure our cloud services meet their requirements.
• Incident, Problem, and Change Management:
  • Establish and mature robust incident management processes, ensuring timely resolution and effective communication during outages.
  • Implement and manage problem management to identify root causes of incidents and prevent recurrence.
  • Oversee change management processes for cloud infrastructure and services, ensuring proper planning, testing, and execution to minimize risk.
  • Conduct post-incident reviews (PIRs) and implement corrective actions.
• Monitoring, Alerting, and Automation:
  • Ensure comprehensive monitoring and alerting systems are in place for all cloud resources and services.
  • Drive automation initiatives using Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation, ARM templates) and scripting (e.g., Python, PowerShell) to streamline operational tasks and improve efficiency.
  • Develop and maintain runbooks and operational documentation.
• Cost Optimization & Governance:
  • Monitor and optimize cloud spending, identifying cost-saving opportunities without compromising performance or reliability.
  • Ensure adherence to cloud governance policies, security standards, and compliance requirements (e.g., ISO 27001, SOC 2, industry-specific regulations).
  • Work closely with finance and procurement teams to manage cloud expenditures.
• Collaboration & Stakeholder Management:
  • Partner closely with architecture, engineering, security, and development teams to ensure seamless deployment and operation of cloud services.
  • Communicate effectively with internal stakeholders, providing regular updates on operational status, incidents, and improvement initiatives.
  • Act as a subject matter expert for cloud operations within the organization.

Qualifications:

• Education: Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field; or equivalent practical experience.
• Experience:
  • 5+ years of progressive experience in cloud operations, with at least 3 years in a dedicated cloud operations or SRE role focusing on AWS and Azure.
  • 1+ years of experience leading and managing a team of operations engineers.
  • Proven experience with large-scale, highly available, and fault-tolerant cloud environments.
  • Extensive experience with cloud monitoring tools (e.g., CloudWatch, Azure Monitor, Datadog, Prometheus, Grafana).
  • Strong practical experience with Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation, ARM templates).
  • Proficiency in scripting languages (e.g., Python, PowerShell, Bash).
  • Solid understanding of networking concepts (TCP/IP, DNS, VPNs, Load Balancing, Firewalls) in a cloud context.
  • Experience with containerization technologies (e.g., Docker, Kubernetes) is a strong plus.
  • Familiarity with CI/CD pipelines and DevOps principles.
• Certifications (Preferred):
  • AWS Certified Solutions Architect – Associate/Professional
  • Microsoft Certified Azure Administrator Associate / Azure Solutions Architect Expert
  • ITIL Foundation or higher certification

Please refer to U3’s Privacy Notice for Job Applicants/Seekers at https://u3infotech.com/privacy-notice-job-applicants/. When you apply, you voluntarily consent to the collection, use and disclosure of your personal data for recruitment/employment and related purposes.