A technology company in Toronto is seeking a Senior Site Reliability Engineer to manage and optimize its HPC cluster operations.
The role includes deploying infrastructure-as-code solutions and supporting research teams with cluster optimization.
The ideal candidate will have over 5 years of experience in SRE or HPC operations, proficiency in Linux and Kubernetes, and expertise in Ceph storage deployments.
Join us to work with cutting-edge GPU technology in a dynamic environment.