Senior SRE: AI/ML HPC Infrastructure

Toronto

13 days ago

Full-time

External

Negotiable

Boson AI

A leading technology company in Toronto is seeking a Senior Site Reliability Engineer to manage and optimize their high-performance computing (HPC) cluster. The ideal candidate will have over 5 years of experience in SRE or HPC operations, proficiency in Linux, and expertise in Kubernetes and automation. This role involves deploying infrastructure-as-code solutions and supporting research teams. A competitive salary ranging from $150,000 to $250,000 per year is offered along with opportunities for continuous learning.