This job posting has expired and is no longer accepting applications.

Roles & Responsibilities

Key Responsibilities
- Design, implement, and maintain CI/CD pipelines to enable fast, reliable releases
- Automate infrastructure provisioning using Infrastructure as Code (IaC) tools
- Manage and optimise cloud infrastructure (AWS/Azure/GCP)
- Deploy, scale, and manage containerised applications using Docker & Kubernetes
- Monitor system health, availability, and performance using observability tools
- Implement SRE best practices: SLIs, SLOs, SLAs, error budgets
- Perform incident response, root cause analysis (RCA), and post-mortems
- Improve system reliability, scalability, and cost efficiency
- Collaborate with development, QA, and security teams
- Implement DevSecOps practices and security automation

Required Skills & Qualifications
- Strong experience in DevOps/SRE roles
- Proficiency in Linux/Unix system administration
- Experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI, Azure DevOps)
- Hands-on experience with Docker & Kubernetes
- Strong scripting skills in Python/Bash/Shell
- Experience with IaC tools (Terraform, CloudFormation, ARM)
- Knowledge of monitoring & logging tools (Prometheus, Grafana, ELK, Datadog)
- Understanding of networking, security, and cloud architecture

Good to Have (Preferred Skills)
- Experience with SRE frameworks and reliability engineering
- Exposure to DevSecOps tools (Snyk, Trivy, SonarQube)
- Experience in high-availability, large-scale systems
- Cloud certifications (AWS/Azure/GCP)
- Knowledge of service mesh (Istio, Linkerd)

Skills: Scalability, Kubernetes, Azure Pipelines, Root Cause Analysis, Scripting, Reliability, Logging, Networking, Python, Docker, Ansible, Linux

DevOps / Site Reliability Engineer (SRE) – Automation

YEPEESOFT PTE. LTD.