We are partnering with a leading global technology company at the forefront of innovation to recruit for a pivotal role in their AI team. This is a unique opportunity to advance the practical application of Large Language Models (LLMs) and Multimodal AI within critical industrial verticals, working alongside top-tier research talent.
About the Role:
You will be instrumental in the end-to-end lifecycle of cutting-edge large models—from research and training to optimization and real-world deployment. Your work will bridge the gap between state-of-the-art AI research and impactful industrial solutions, directly shaping how next-generation AI integrates into complex business environments.
Key Responsibilities:
• Lead the training, fine-tuning (SFT), and system deployment of vertical domain-specific large models.
• Research and implement advanced model compression & optimization techniques (pruning, quantization, distillation) to enhance inference efficiency.
• Design and develop algorithms for RAG (Retrieval-Augmented Generation) and Agent modules to boost AI reasoning in dynamic scenarios.
• Drive the application of multimodal (vision-language) understanding technologies, such as Large Vision Models (LVMs), for industrial use cases.
• Architect and build large-scale, high-quality industry datasets to support model pre-training, fine-tuning, and evaluation.
• Develop and maintain robust validation, evaluation, and performance monitoring frameworks for AI systems.
• Contribute to the development of large model application platforms and microservices to improve modularity and usability.
What We're Looking For (Must-Have):
• Master’s or PhD in Computer Science, Mathematics, Electrical Engineering, Data Science, or a related field.
• Solid foundation in ML/DL, with in-depth understanding of Transformer architectures and hands-on experience in end-to-end LLM training/development.
• Proficient in Python and the PyTorch ecosystem (e.g., Hugging Face, DeepSpeed, PEFT).
• Practical experience with large model compression techniques (pruning, quantization) and inference optimization frameworks (e.g., vLLM, Triton).
• Strong capability in large-scale data processing and familiarity with big data tools (e.g., Spark).
• Experience with Chat BI solutions is preferred.
Bonus Points (Preferred):
• Hands-on experience in developing RAG systems and AI Agent modules.
• Knowledge of CUDA programming, distributed training, or hardware acceleration (GPU/TPU).
• Publications at top-tier AI conferences (NeurIPS, ICLR, CVPR, etc.).
• Experience with Docker, FastAPI, and enterprise-level model deployment pipelines.
What We Offer:
• The chance to work on groundbreaking AI projects with tangible real-world impact.
• Collaboration with leading academic research teams on cutting-edge explorations.
• A culture that values technical excellence, innovation, and professional growth.
• Competitive compensation, comprehensive benefits, and a supportive work environment.