Title: SRE/Ops Engineer
Location: Englewood, NJ
Responsibilities
• Support and enhance observability (monitoring, logging, alerting) across production systems
• Help maintain SLIs/SLOs for key services
• Participate in evaluating services for production readiness
• Collaborate with development teams to identify reliability risks and improve system architecture
• Contribute to automation of operations, including CI/CD pipelines, incident response, and infrastructure provisioning
• Participate in incident response and on-call rotations for critical services
• Contribute to post-incident analysis and drive reliability improvements
• Partner with security, infrastructure, and product teams to support performance, compliance, and operational excellence
Must-Haves
• Willingness to work onsite and participate in a 24/7 on-call rotation as needed
• 5+ years of experience managing and supporting high-traffic digital platforms
• Strong experience with CI/CD pipelines and deployment automation
• Experience with cloud platforms such as AWS and/or GCP
• Solid scripting skills (e.g., Python, Bash, Groovy)
• Hands-on experience with observability and monitoring tools like Datadog, New Relic, AppDynamics, or similar
• Understanding of web, mobile, and OTT architectures
• Experience supporting large-scale websites, mobile and OTT applications, microservices, APIs, and distributed systems
• Experience with infrastructure-as-code tools such as Ansible, Terraform, or Chef
• Familiarity with performance testing tools like JMeter or k6
• Hands-on experience with debugging tools like Charles Proxy or Fiddler
Preferred Qualifications
• Experience working with CDNs (e.g., Akamai) and reverse proxies (e.g., NGINX, Varnish)
• Exposure to video streaming platforms
• Familiarity with application/infrastructure security controls and best practices
• Certifications in SRE, DevOps, or Performance Engineering are a plus