As a core member of the team, you will provide cloud operational support (including code-level fixes), own incident management, and continuously improve system reliability and operational excellence across production and non-production environments.
• Working Hours: Mon-Fri
• Working Location: Central
• Salary Package: Up to $8800 (basic) + AWS
• Job Type: Contract
Key Responsibilities
• Monitor and analyse production and non-production environments using full-stack observability tools to ensure optimal performance, uptime, and user experience.
• Own incident management end-to-end: detect, triage, and resolve incidents, conduct root cause analysis (RCA), coordinate across teams and vendors, and produce post-incident reports.
• Drive continuous improvement initiatives through data-driven insights in collaboration with product, development, and security teams.
• Build and maintain operations documentation, runbooks, and SOPs to support audit compliance and knowledge sharing.
• Automate repetitive operational and infrastructure tasks using Infrastructure-as-Code and scripting tools to reduce downtime and human error.
• Implement and enhance application performance monitoring (APM), alerting, and logging across application and infrastructure layers.
• Manage day-to-day operational activities, produce performance and availability reports, and present insights to stakeholders and leadership.
• Lead and coordinate 24/7 operations support, working with internal teams and external vendors to meet SLAs.
Requirements
• Bachelor’s degree in Computer Science, Information Technology, or a related field.
• Minimum 3 years of experience in Operations Support, Site Reliability Engineering, DevOps, or similar roles.
• Hands-on experience providing L1–L3 support, including troubleshooting at application and infrastructure levels.
• Strong experience with incident, problem, and change management using ITSM tools (e.g. ServiceNow, Jira Service Management, PagerDuty).
• Experience implementing security controls and privileged access management for test and production environments.
• Proven experience in full-stack monitoring and observability, including cloud-native and open-source tools (e.g. CloudWatch, Stackdriver, Prometheus/Grafana, OpenTelemetry).
• Experience with automation and Infrastructure-as-Code (e.g. Terraform, Ansible, scripting).
• Familiarity with Agile/DevOps practices, CI/CD pipelines, test-driven development, and information security best practices.
• Experience managing cloud infrastructure and services (AWS, Azure, Google Cloud); cloud certifications are a plus.
• Strong problem-solving, analytical, and communication skills, with the ability to explain technical issues to non-technical stakeholders.
• A collaborative mindset, proactive attitude, and ability to thrive in a fast-paced, high-performance environment.
By submitting your resume, you consent to the collection, use, and disclosure of your personal information per ScienTec’s Privacy Policy (scientecconsulting.com/privacy-policy).
This authorizes us to:
• Contact you about potential opportunities.
• Delete personal data that is not required at this application stage.
All applications will be processed in strict confidence. Only shortlisted candidates will be contacted.
Elane Yap Theng Yu - R1989397
ScienTec Consulting Pte Ltd - 11C5781