Employment Type
• 12-month renewable contract
• Employer: NTT DATA
• Location: Singapore (Hybrid / Onsite as per project needs)
Job Description
We are looking for a Cloud Platform Operations Engineer to support and operate a mission-critical cloud platform used at national scale. This role focuses on production operations, reliability, incident ownership, and continuous improvement within a modern AWS cloud environment.
You will work closely with engineering, security, and stakeholder teams to ensure the platform remains highly available, secure, scalable, and reliable, while driving strong operational standards and best practices.
This role is ideal for engineers who enjoy hands-on cloud operations, take full ownership of incidents, and want exposure to large-scale, regulated production environments.
Key Responsibilities
• Lead day-to-day cloud platform operations with a focus on monitoring, performance optimisation, reliability, and operational excellence within AWS.
• Own L2 incident management, troubleshooting, and escalation for high-throughput production workflows, ensuring issues are resolved within defined SLAs.
• Manage, design, and continuously optimise AWS cloud infrastructure to ensure scalability, security, cost efficiency, and high availability.
• Establish, maintain, and enforce operational processes including runbooks, dashboards, health checks, incident communication, and operational reporting.
• Drive change, release, and maintenance management by performing impact analysis, risk assessment, and mitigation planning, and by executing upgrades safely.
• Review testing outcomes to ensure changes meet operational, performance, and security requirements before production release.
• Define, track, and continuously improve operational OKRs, SLAs, and reliability metrics.
• Contribute to backend enhancements, bug fixes, and operational tooling to improve platform stability and maintainability.
• Share operational best practices, incident learnings, and technical knowledge within the team to uplift engineering and reliability standards.
Requirements
Mandatory
• Degree in Computer Science, Information Technology, or equivalent practical experience.
• Minimum 2 years of hands-on experience managing production workloads in public cloud environments (AWS preferred).
• Strong problem-solving skills across cloud infrastructure, applications, and distributed systems.
• Proven experience owning and resolving production incidents with urgency and attention to detail.
• Experience defining and enforcing operational processes, SOPs, and runbooks.
• Understanding of high-availability cloud architectures, security best practices, and preventative operational controls.
• Knowledge of change management, impact assessment, and service reliability improvement practices.
Preferred
• Experience operating applications on AWS at scale.
• Experience supporting regulated, enterprise, or public-sector environments.
• Familiarity with reliability or SRE-style practices.
Key Technologies
• AWS (production architecture, monitoring, security, availability)
• Terraform (Infrastructure as Code)
• GitLab (CI/CD pipelines, version control)
• Monitoring, logging, and operational tooling in cloud environments
Why Join
• Work on a large-scale, high-impact cloud platform
• Strong exposure to production operations and reliability engineering
• Opportunity to build deep expertise in AWS cloud operations
• Contract role with renewal potential under NTT DATA
Interested candidates are invited to email their CV, detailing their relevant experience, to sandeep.sringeripai@global.ntt.
We look forward to your application!