Senior ML/AI Engineer opportunity with a company focused on software IP solutions and engineering services.

San Francisco · 11 days ago · Full-time · External
Negotiable
Dice is the leading career destination for tech experts at every stage of their careers. Our client, VortexLink, is seeking the following. Apply via Dice today!

Senior ML/AI Engineer
Long-term contract
Office located in Mountain View, CA (100% remote considered)

Role Overview
As an ML/AI Engineer, you will be responsible for the end-to-end acceleration of Large Language Models (LLMs) and deep learning frameworks. You will not only implement models but will deeply optimize their underlying kernels and their integration within deployment pipelines. This is a high-visibility, customer-facing role requiring a blend of deep technical mastery and leadership.

Key Responsibilities
• Framework Customization & Extension: Enhance and extend open-source ML inference engines such as vLLM, SGLang, and PyTorch.
• Feature Engineering: Lead the implementation of new architectural features, such as adding support for novel attention mechanisms or custom paged KV-cache logic (see the KV-cache sketch at the end of this posting).
• Performance Optimization & Benchmarking: Conduct rigorous performance analysis of attention mechanisms (Torch native, Triton, FlashInfer) to optimize for latency, throughput, and memory efficiency.
• Kernel Development: Write and optimize custom GPU kernels using Triton or CUDA to fuse operations and improve memory locality (see the fusion sketch at the end of this posting).
• Pipeline Debugging: Identify and resolve bottlenecks within the ML pipeline, ensuring zero regressions and high-performance execution on proprietary AI hardware.

Required Skills & Qualifications
• Experience: 3+ years of professional experience in high-performance computing or ML.
• Programming Mastery: Expert-level proficiency in C++ and Python.
• Inference Expertise: Hands-on experience customizing inference frameworks and contributing to high-performance open-source projects.
• Mathematical Foundations: Deep understanding of attention mechanisms, prefix caching strategies, and KV-cache management.
• Hardware Knowledge: Strong understanding of GPU architectures (e.g., Turing/RTX) and memory constraints (VRAM).

Preferred Skills
• Specialized Kernels: Proven experience with FlashInfer, Triton, or specialized CUDA-based memory management.
• Strategic Thinking: Ability to provide a "Final Verdict" on technology choices based on customer-specific constraints such as latency requirements vs. VRAM limits.
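
For context on the Kernel Development responsibility above, the following is a minimal, illustrative Triton sketch (not part of the client's job description) of the kind of operation fusion the role involves: a bias add and a ReLU combined into a single pass over memory, so the activations are read and written only once. The function names, tensor layout, and block size are assumptions made for illustration.

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def fused_bias_relu_kernel(x_ptr, bias_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one contiguous block of elements.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)      # single read of the activations
        b = tl.load(bias_ptr + offsets, mask=mask)   # bias assumed pre-broadcast to x's shape
        y = tl.maximum(x + b, 0.0)                   # bias add + ReLU fused in registers
        tl.store(out_ptr + offsets, y, mask=mask)    # single write back to device memory

    def fused_bias_relu(x: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
        # Thin wrapper that launches the kernel over a 1-D grid of blocks.
        out = torch.empty_like(x)
        n = x.numel()
        grid = (triton.cdiv(n, 1024),)
        fused_bias_relu_kernel[grid](x, bias, out, n, BLOCK_SIZE=1024)
        return out

Fusing the two elementwise operations avoids materializing an intermediate tensor for x + b, which is the memory-locality benefit the responsibility refers to.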
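
Similarly, for the paged KV-cache logic mentioned under Feature Engineering, here is a hedged Python sketch of the block-table bookkeeping such a feature builds on. It only assigns fixed-size physical cache blocks to requests; production engines such as vLLM layer prefix sharing, copy-on-write, and eviction on top of this. All names and the error-handling policy are illustrative assumptions.

    from collections import defaultdict

    class PagedKVCacheAllocator:
        """Maps each request to a list of physical KV-cache blocks (illustrative only)."""

        def __init__(self, num_blocks: int, block_size: int):
            self.block_size = block_size
            self.free_blocks = list(range(num_blocks))        # pool of physical block ids
            self.block_tables: dict[str, list[int]] = {}      # request id -> owned block ids
            self.seq_lens: dict[str, int] = defaultdict(int)  # tokens written per request

        def append_token(self, request_id: str) -> tuple[int, int]:
            # Return the (physical block, slot) where the next token's K/V entry is written.
            table = self.block_tables.setdefault(request_id, [])
            pos = self.seq_lens[request_id]
            if pos % self.block_size == 0:                    # current block full, or first token
                if not self.free_blocks:
                    raise RuntimeError("KV cache exhausted; caller must preempt or swap")
                table.append(self.free_blocks.pop())
            self.seq_lens[request_id] = pos + 1
            return table[-1], pos % self.block_size

        def free(self, request_id: str) -> None:
            # Release all blocks of a finished request back to the free pool.
            self.free_blocks.extend(self.block_tables.pop(request_id, []))
            self.seq_lens.pop(request_id, None)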