Senior Site Reliability Engineer (SRE)

Location: Chicago, IL (Onsite)
Type: Contract

Role Overview:
We are seeking a Senior Site Reliability Engineer (SRE) with strong expertise in AWS infrastructure, automation, observability, and production support. The ideal candidate will bring a blend of DevOps and SRE practices, ensuring our systems remain resilient, scalable, and cost-efficient. This role requires hands-on technical depth, proactive problem-solving, and the ability to embed reliability engineering across development teams.

Key Responsibilities:
• Design, implement, and maintain secure, scalable, and highly available AWS infrastructure.
• Build and enhance CI/CD pipelines and Infrastructure as Code (IaC) solutions using Terraform and Harness.
• Implement and manage monitoring, logging, alerting, and distributed tracing with tools like Dynatrace and Datadog.
• Troubleshoot production incidents, conduct blameless postmortems, and strengthen incident response processes.
• Optimize systems for performance, cost efficiency, and reliability.
• Drive chaos engineering and resilience testing initiatives.
• Collaborate with developers to implement SLAs, SLOs, and error budgets.
• Mentor junior SREs and promote DevOps/SRE best practices across the organization.

Required Skills & Experience:
• 8+ years of experience in DevOps/SRE roles with a strong focus on AWS.
• Proven expertise in AWS services and infrastructure automation.
• Strong hands-on experience with Terraform, Harness, or similar IaC and CI/CD tools.
• Advanced knowledge of monitoring and observability platforms (Dynatrace, Datadog, Prometheus, Grafana, etc.).
• Deep understanding of incident response, disaster recovery, and reliability frameworks.
• Solid coding/scripting skills in Python, Bash, or similar languages.
• Experience with chaos engineering, resilience testing, and fault tolerance design.
• Strong collaboration, leadership, and mentoring capabilities.

Preferred Qualifications:
• Familiarity with Kubernetes, Docker, and container orchestration.
• Experience with FinOps practices (cloud cost optimization).
• Background in distributed systems, scalability, and fault-tolerant architectures.

Purple Drive Technologies