We are seeking a skilled and motivated DevOps / Site Reliability Engineer (SRE) with 2+ years of experience to help us build, scale, and maintain robust, secure, and highly available infrastructure. As a member of the DevOps/SRE team, you will work closely with development, QA, and operations teams to automate processes, monitor system health, and ensure the reliability of our services.
This is a hands‑on role that requires strong technical skills, a deep understanding of modern DevOps tools and practices, and a problem‑solving mindset.
Key Responsibilities:
Design, implement, and maintain CI/CD pipelines for reliable code deployment
Monitor application performance and system reliability using tools such as Prometheus, Grafana, or Datadog
Maintain and improve cloud infrastructure (e.g., AWS, GCP, Azure) following best practices
Manage infrastructure as code using tools such as Terraform, Ansible, or CloudFormation
Troubleshoot infrastructure and application issues, ensuring minimal downtime and fast resolution
Automate repetitive operational tasks and improve development workflows
Implement and enforce security, backup, and disaster recovery strategies
Participate in the on‑call rotation, respond to incidents, and lead root cause analysis and postmortem reviews
Work closely with development teams to ensure applications are designed for performance, availability, and scalability
Optimize resource usage and costs across cloud environments
Qualifications:
Required:
Bachelor's degree in Computer Science, Engineering, or a related field
2+ years of experience in a DevOps, SRE, or Systems Engineering role
Hands‑on experience with Linux/Unix system administration
Experience with CI/CD tools such as Jenkins, GitHub Actions, CircleCI, or GitLab CI
Working knowledge of cloud platforms (AWS, GCP, Azure)
Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes)
Experience with infrastructure-as-code tools such as Terraform, Ansible, or similar
Proficiency in at least one scripting or programming language (e.g., Bash, Python, Go)
Strong understanding of monitoring, logging, and alerting systems
Experience with version control using Git
Preferred:
Experience with Kubernetes administration in production environments
Familiarity with security best practices and compliance standards
Understanding of networking, load balancing, and DNS configuration
Exposure to incident management and SLA/SLO/SLI concepts
Experience working in Agile environments