CoreWeave is The Essential Cloud for AI™, delivering a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. The Senior Software Engineer will design and build software that manages complex infrastructure across globally distributed datacenters, focusing on high-performance computing systems that power large AI workloads.
Responsibilities
• Design and implement solutions to problems of scale for multi-site deployment and management of CoreWeave’s global server hardware fleet
• Build and maintain backend services and APIs (gRPC/REST) in Go or Python to interact with Kubernetes and other infrastructure systems
• Develop provisioning services, automation workflows, and fleet management tools that span from bare metal to container orchestration
• Write and maintain Kubernetes custom controllers and operators to automate infrastructure behavior
• Design and implement observability solutions for large-scale server monitoring to improve system stability and insight
• Adapt and extend open source tooling to enhance visibility into system metrics, performance, and health
• Create test plans, deployment automation, dashboards, alerts, and insights into our fleet operations
• Resolve integration challenges across the entire infrastructure stack, from data center hardware to orchestration platforms
• Participate in an on-call rotation
Skills
• 5+ years of experience in software or infrastructure engineering
• Proficiency in Go and/or Python software development
• Familiarity with CI/CD tools like Argo, Flux, and GitHub Actions
• Strong understanding of Linux internals
• Experience designing, implementing, and monitoring Kubernetes operators for custom resource definitions
• Experience with infrastructure automation and configuration management tools such as Ansible, Puppet, Chef, or Salt
• Experience with distributed cloud computing principles, including testing strategies, observability, error budgets, and fault-tolerant design
• Experience implementing metrics pipelines, custom alerts, and monitoring strategies
• Ability to break down complex problems into achievable tasks and collaborate with teammates to execute them
• Willingness and ability to thrive in a fast-paced startup environment
Benefits
• Medical, dental, and vision insurance - 100% paid for by CoreWeave
• Company-paid life insurance
• Voluntary supplemental life insurance
• Short-term and long-term disability insurance
• Flexible Spending Account
• Health Savings Account
• Tuition Reimbursement
• Ability to participate in the Employee Stock Purchase Program (ESPP)
• Mental Wellness Benefits through Spring Health
• Family-Forming support provided by Carrot
• Paid Parental Leave
• Flexible, full-service childcare support with Kinside
• 401(k) with a generous employer match
• Flexible PTO
• Catered lunch each day in our office and data center locations
• A casual work environment
• A work culture focused on innovative disruption
Company Overview
• CoreWeave is a cloud-based AI infrastructure company offering GPU cloud services to simplify AI and machine learning workloads. Founded in 2017, it is headquartered in Livingston, New Jersey, USA, and has a workforce of 1,001 to 5,000 employees. Its website is https://www.coreweave.com.