Job Description
We’re seeking a hands-on DevOps Engineer to design and operate a hybrid observability stack spanning on-prem and AWS environments. You’ll own the telemetry pipeline (logs, metrics, and traces) along with the alerts and dashboards built on it, so that teams can detect, troubleshoot, and prevent issues at scale.
Key Responsibilities
• Design and maintain a unified observability platform across AWS and on-prem environments.
• Build and manage logging pipelines (ELK/OpenSearch, Logstash, Fluent Bit, Filebeat).
• Develop dashboards and alerts in Grafana, Kibana, and CloudWatch.
• Deploy and scale Prometheus, Alertmanager, and Splunk for metrics, alert routing, and log analytics.
• Automate infrastructure with Terraform, Helm, and CI/CD pipelines.
• Enforce RBAC and data retention policies, and optimize the cost of the observability stack.
• Drive SRE practices—SLIs, SLOs, error budgets, and post-incident reviews.
Required Skills
• 4–8 years of experience in DevOps, SRE, or Platform Engineering.
• Strong hands-on experience with Prometheus, Grafana, ELK/OpenSearch, Splunk, and AWS CloudWatch.
• Expertise in Terraform, Kubernetes, Python/Go scripting, and Git-based CI/CD.
• Solid understanding of Linux, networking, containers, and distributed systems.
Preferred
• Experience applying SRE practices and managing observability as code (dashboards, alerts, and pipelines defined in version control).
• AWS, Kubernetes, or Terraform certification is a plus.