Job Title: Data Scientist - Natural Language Processing (NLP)
Location: Los Angeles, California
Job Description:
We are seeking a highly skilled and motivated Data Scientist with expertise in Natural Language Processing (NLP) to join our dynamic team in Los Angeles, California. As a Data Scientist specializing in NLP, you will play a key role in developing and optimizing Large Language Models (LLMs) for retrieval-augmented generation (RAG) applications. Your responsibilities will encompass a wide range of challenges, from architecting effective models to addressing latency issues and ensuring responsible AI practices.
Responsibilities:
Architectural Design:
Develop and optimize LLM-based RAG systems, addressing challenges such as context window limits, accurate chunk retrieval, and relevance ranking.
Retrieval Strategies:
Implement context-aware chunking, mixed-size chunking, knowledge graphs, and keyword matching for efficient retrieval.
Latency Optimization:
Optimize pipeline latency through techniques such as caching vectorized tokens and semantics-aware vector caching.
Reasoning and Problem-Solving:
Tackle reasoning challenges using approaches such as ReAct, function calling, and Tree-of-Thoughts or Graph-of-Thoughts reasoning.
Custom Embeddings and Information Extraction:
Integrate custom embeddings within LangChain or LlamaIndex, use LLMs for information extraction, and apply query expansion to improve retrieval.
Platform Evaluation:
Evaluate platforms such as Microsoft Copilot, Azure OpenAI Service, and Amazon Bedrock, along with open-source frameworks like LangChain and other LLM playgrounds.
Training and Fine-Tuning:
Apply parameter-efficient techniques such as LoRA to train and fine-tune LLMs on task-specific instructions.
Quantization and Benchmarking:
Explore quantization methods for cost-effective GPU usage and participate in benchmarking efforts such as the Hugging Face Open LLM Leaderboard and Chatbot Arena.
Collaboration and Communication:
Work effectively with cross-functional teams, explain complex technical concepts clearly, and help foster a collaborative work environment.
Qualifications:
Advanced degree (Master's or Ph.D.) in Computer Science, Data Science, or a related field.
Proven experience in developing and optimizing Large Language Models for NLP applications.
Proficiency in programming languages such as Python and expertise in relevant libraries and frameworks.
Strong knowledge of NLP techniques, including retrieval-augmented generation, information extraction, and custom embeddings.
Experience with responsible AI practices and bias mitigation strategies.
Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment.
Preferred Skills:
Familiarity with open-source LLMs and frameworks.
Previous experience in benchmarking and validation of language models.
Strong understanding of foundation models, transformer architectures, and related concepts.
Effective communication skills for both technical and non-technical audiences.