We are looking for a Lead Site Reliability Engineer with a strong retail background and hands-on experience with New Relic. Please see the job description below:
Job Description:
• As a Senior/Lead Site Reliability Engineer, you’ll take ownership of the reliability, performance, and scalability of high-traffic retail platforms.
• This role demands deep experience in cloud-native environments, a strong observability mindset (with New Relic as a must), and the ability to lead both incident response and system design discussions with client teams.
• You’ll serve as a technical leader and mentor, partnering with engineering, DevOps, and product teams to build resilient systems for real-time retail operations—including eCommerce platforms like Shopify (bonus).
Key Responsibilities:
• Lead reliability and observability strategy for large-scale retail systems.
• Architect and implement robust monitoring using New Relic—dashboards, SLOs, alerts, synthetic monitoring, etc.
• Guide incident response processes and run blameless postmortems.
• Own availability, performance, and scalability of customer-facing apps and services.
• Design infrastructure for high availability using Kubernetes, Docker, and infrastructure-as-code (IaC) tools (Terraform, CloudFormation).
• Collaborate with client engineering teams to optimize system behavior during retail surges (e.g., Black Friday).
• Mentor junior SREs and set operational best practices.
• Partner with dev and QA to integrate performance testing and failure injection into CI/CD workflows.
• Advocate for DevOps/SRE best practices (shift-left monitoring, chaos testing, performance budgets).
Required Qualifications:
• 8+ years of experience in Site Reliability Engineering, DevOps, or Platform Engineering.
• Expertise with New Relic—must be able to architect observability end-to-end.
• Proven experience supporting retail or eCommerce platforms at scale.
• Strong coding/scripting skills (Python, Bash, or Go).
• Production experience with AWS/GCP/Azure and Kubernetes.
• Deep understanding of infrastructure automation (Terraform, Ansible, or Pulumi).
• Strong communication skills, client-facing presence, and leadership ability.
Nice to Have:
• Experience with Shopify or headless commerce stacks.
• Experience leading distributed teams.
• Familiarity with traffic-heavy retail events and strategies (caching, autoscaling, edge optimization).
• Experience integrating monitoring into microservices, APIs, and frontend apps.