Staff Deep Learning System Engineer (CUDA Specialist)

Los Angeles | 22 months ago | Full-time | External
1.3m - 2.2m
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Myriad Consulting Inc, is seeking the following. Apply via Dice today!

Staff Deep Learning System Engineer (CUDA Specialist)
Location: Silicon Valley
Bilingual, Chinese speaking required.
Pay range: 180k-300k

We are seeking a highly skilled and motivated Deep Learning Systems Engineer with a strong background in CUDA programming to join our team. The successful candidate will be responsible for implementing and optimizing our large distributed system, a globally distributed scheduling service for efficient and reliable execution of deep learning workloads.

Job Responsibilities:
• Implement and optimize a large distributed system, focusing on CUDA programming and GPU optimization.
• Develop custom servers for CUDA kernel launches and manage JIT kernels.
• Implement and manage a hardware abstraction layer for device-specific APIs.
• Optimize GPU calls and manage memory allocation APIs.
• Handle device synchronization APIs and ensure correct and efficient time-slicing and replica splicing.
• Collaborate with the team to design and implement new features and improvements.
• Troubleshoot and resolve issues related to CUDA programming and GPU optimization.

Minimum Skill Requirements:
• Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.
• Proven experience in CUDA programming and GPU optimization.
• Strong knowledge of deep learning workloads and distributed systems.
• Experience with NVIDIA GPUs and related toolkits, such as cuObjDump and nvrtcCompileProgram.
• Familiarity with memory allocation and device synchronization APIs.
• Strong problem-solving skills and ability to troubleshoot complex software issues.
• Excellent communication and teamwork skills.

Preferred Skill Requirements:
• Experience with a large distributed system or similar distributed scheduling services.
• Knowledge of deep learning frameworks like PyTorch or TensorFlow.
• Experience with NCCL or similar collective communication libraries.