Requirements
• Are in your last year of engineering school or of a master's degree (M2) with a specialization in AI and machine learning,
• Enjoy working as a team player and learning from others,
• Possess good communication skills: you can explain your work and the technical choices you made, both orally and in writing,
• Demonstrate strong programming skills in Python and experience working with machine learning / AI packages (OpenAI, Hugging Face Transformers, scikit-learn, PyTorch or TensorFlow, Pandas, …),
• Are curious, humble, organized, and a relentless doer,
• Are comfortable working in an international environment,
• [Optional] Have already worked with some components of our backend stack (TypeScript/JavaScript, Node.js, GraphQL, Docker),
• [Optional] Are experienced in software engineering (Git, CI, linting, packaging, …)
What the job involves
• As an intern, you will be at the heart of the design of a patent similarity technology,
• A patent is a legal document that describes an invention,
• It is composed of several written sections and drawings,
• While writing a patent, it is useful to know whether any other patent already uses or describes the same invention, to make sure that the patent really describes something new,
• After the patent is filed, it is also important to monitor new products, inventions, and standards to see whether they use parts of the invention,
• The technology may also be used at the prosecution stage, where potentially infringing patents can be detected and analyzed,
• The objective of the internship is to design and evaluate techniques that make it possible to find inventions, or parts of inventions, that are similar to a query invention,
• The work will involve finding appropriate embeddings for text and image content, trying several patent chunking strategies, and implementing an appropriate method to retrieve similar patents,
• The evaluation will deal with the relevance of the retrieved patents, and also the latency of retrieval,
• Building an evaluation dataset will also be part of the work,
• The intern will also implement methods to exploit user feedback in the application where this technology will be integrated,
• You will do research and development work on your internship subject (60% of the time),
• You will do Python software development work to integrate your work into the product (20% of the time),
• You will communicate your technical work (20% of the time):
• Publication in a relevant workshop or conference on the work done,
• Blog posts on the Kili blog on selected ML topics
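
To give candidates a feel for the internship subject, the steps named above (chunking patents, embedding the chunks, retrieving similar patents, then measuring relevance and latency) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the product: the bag-of-words `embed` function is a toy stand-in for a learned embedding model, and the patent texts, the ids P1 to P3, and all function names are hypothetical.

```python
import math
import time
from collections import Counter


def chunk_text(text, max_words=8):
    """Split a document into fixed-size word windows (one simple chunking strategy)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy bag-of-words vector; a stand-in (assumption) for a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(patents, max_words=8):
    """Return a list of (patent_id, chunk_vector) pairs, one per chunk."""
    return [
        (pid, embed(chunk))
        for pid, text in patents.items()
        for chunk in chunk_text(text, max_words)
    ]


def retrieve(index, query, k=2):
    """Score each patent by its best-matching chunk and return the top-k ids."""
    q = embed(query)
    best = {}
    for pid, vec in index:
        best[pid] = max(best.get(pid, 0.0), cosine(q, vec))
    return sorted(best, key=lambda pid: best[pid], reverse=True)[:k]


def recall_at_k(retrieved, relevant):
    """Fraction of the relevant patents that appear in the retrieved list."""
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Hypothetical miniature corpus and evaluation example.
patents = {
    "P1": "A foldable bicycle frame with a hinge that locks automatically when the frame is unfolded",
    "P2": "A battery cooling system using liquid channels between the cells of an electric vehicle pack",
    "P3": "A bicycle helmet with an integrated rear light powered by a small battery",
}
query = "bicycle frame with a locking hinge"
relevant = {"P1"}

index = build_index(patents)

start = time.perf_counter()
results = retrieve(index, query, k=2)
latency_s = time.perf_counter() - start  # retrieval latency in seconds

print("retrieved:", results)
print("recall@2:", recall_at_k(results, relevant))
print("latency (s):", latency_s)
```

Scoring a patent by its best-matching chunk is one design choice among several (averaging chunk scores is another); comparing such choices on a labeled evaluation set, for both relevance and latency, is the kind of work the internship involves.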