Overview:
Reporting to the team lead of Adaptive Intelligence, this role focuses on designing, implementing, and optimizing vector databases, with an emphasis on ChromaDB, Pinecone, and FAISS for vector applications. The role involves contributing to the development and maintenance of our data infrastructure, ensuring efficient handling of the complex relationships and vectors associated with Large Language Models (LLMs). The successful candidate will have a proven track record of applying data engineering skills to real-world problems, a disruption-through-innovation mindset, and the ability to contribute to all phases of the software development life cycle. The candidate will also interact regularly with internal stakeholders and leaders to help formulate problems, assign priorities, and provide status updates on initiatives.
Accountabilities:
• Develop an understanding of the organization's various data points and how they are organized
• Design and implement vector databases to efficiently store and retrieve high-dimensional vectors
• Optimize database queries and indexing strategies for vector operations
• Architect and performance-tune vector pipelines for embedding generation and text similarity search
• Identify and resolve performance bottlenecks to ensure efficient data retrieval
• Collaborate with application developers to integrate vector databases into various pipelines (e.g., front-end user interfaces, batch processes)
• Provide support for query optimization and data modeling for application-specific requirements
• Implement and maintain data security measures for vector databases
• Ensure compliance with relevant data protection regulations and industry standards
• Work closely with cross-functional teams, including data scientists, software engineers, and product managers
• Communicate technical concepts and solutions effectively to both technical and non-technical stakeholders
• Troubleshoot, improve, and scale continuous integration, continuous delivery, and continuous deployment (CI/CD) pipelines
• Write design documents to build consensus for new system components and enhancements to existing components
• Participate in problem definition and design sessions with business partners to develop a thorough understanding of the business problems
• Research new vector databases and their potential for application in the reinsurance space
• Work collaboratively with other AI and IT teams within Munich Re on joint problems
• Review requests from data scientists and software engineers, and enforce consistency, performance, readability, and security across codebase
• Develop documentation to share knowledge with other engineers/scientists
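To illustrate the kind of vector retrieval work these responsibilities describe, here is a minimal sketch, in plain NumPy, of brute-force L2 nearest-neighbor search — the core operation that libraries such as FAISS accelerate with indexing structures. The corpus and query values are synthetic placeholders, not data from any real system:

```python
import numpy as np

def l2_search(index_vectors: np.ndarray, query: np.ndarray, k: int = 3):
    """Return indices and squared L2 distances of the k nearest stored vectors."""
    # Squared L2 distance from the query to every stored vector
    dists = np.sum((index_vectors - query) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]  # indices of the k closest vectors, ascending
    return nearest, dists[nearest]

# Toy corpus of one hundred 4-dimensional "embeddings"
rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 4)).astype("float32")
query = corpus[42] + 0.01  # a query lying very close to a known vector

ids, dists = l2_search(corpus, query, k=3)
```

A production system would replace the linear scan with an approximate index (e.g., FAISS's IVF or HNSW structures) to keep query latency low as the corpus grows.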
Qualifications:
• 3+ years of relevant industry experience deploying database engineering solutions
• Bachelor's or Master's degree in Software Engineering, Computer Science, Computer Engineering, or a relevant statistical discipline
• Knowledge of distributed database systems
• Familiarity with machine learning and AI concepts related to vector data, such as retrieval-augmented generation (RAG)
• Experience with cloud-based database solutions, especially Azure DevOps and Azure Cloud Services (e.g., Azure Blob Storage, Azure Key Vault, Azure Data Factory); similar experience with AWS or Google Cloud is welcome
• Proven experience in designing and implementing vector databases for vector applications, with a focus on ChromaDB, Pinecone, FAISS, or similar
• Strong proficiency in embeddings, vectorization, vector stores, database optimization, performance tuning, and relevant query languages
• Familiarity with embeddings, retrieval algorithms, agents, and data modeling for vector-based development
• Experience with LLMs and related frameworks such as LangChain, LLamaCPP, etc.
• Extensive programming experience in relevant languages such as Python, Java, or Scala
• Foundational understanding of any of the following: Natural Language Processing, Computer Vision, Machine Learning, or Deep Learning
• Experience with CI/CD pipelines, Automated Testing, Automated Deployments, Agile methodologies, Unit Testing and Integration Testing tools
• Excellent problem-solving skills and the ability to work in a collaborative team environment
• Excellent communication skills
• Ability to work collaboratively across a variety of technical and business teams
• Willingness to participate in requirements gathering sessions with business stakeholders
• Commitment to a positive and inclusive environment
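For candidates less familiar with RAG, the retrieval step it depends on can be sketched as a cosine-similarity ranking over precomputed embeddings. The embeddings below are hypothetical hand-picked values for illustration; in practice they would come from an embedding model and be served by a vector store such as ChromaDB, Pinecone, or FAISS:

```python
import numpy as np

def top_k_cosine(doc_embeddings: np.ndarray, query_embedding: np.ndarray, k: int = 2):
    """Rank documents by cosine similarity to the query embedding."""
    # Normalize rows so the dot product equals cosine similarity
    docs = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    q = query_embedding / np.linalg.norm(query_embedding)
    sims = docs @ q                 # cosine similarity of each document to the query
    order = np.argsort(-sims)[:k]   # highest similarity first
    return order, sims[order]

# Hypothetical 3-dimensional embeddings for three documents and a query
docs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.9, 0.1, 0.0]])
query = np.array([1.0, 0.05, 0.0])

order, sims = top_k_cosine(docs, query, k=2)
```

In a full RAG pipeline, the top-ranked documents would then be inserted into the LLM prompt as grounding context before generation.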