This thesis project aims to assess the relevance of large language models (LLMs) enhanced with graph-based retrieval-augmented generation (graph-RAG) techniques for assisting academic health technology designers and their hospital operators, who are under-equipped in legal expertise, in their regulatory compliance procedures. Among other tasks, this includes detecting contradictory requirements across legal texts.
This project follows on from work initiated during a Master 2 internship on a corpus of nearly 700 legal texts at various stages of development (from bill to decree). That work included building tokenized databases and context-chunk databases to evaluate instruction-tuned models enhanced with RAG techniques. Priority is given to open-source models whose training datasets are known and whose performance/cost ratio is the most favorable.
The thesis will focus on graph-RAG techniques as a means of improving these results. This will involve identifying the content most relevant for conversion into graphs, and researching and implementing optimized prompt-engineering strategies that exploit the resulting knowledge bases (chunk bases and graph bases). A substantial part of the project will be devoted to identifying and testing the most relevant criteria for evaluating the models with respect to the tasks delegated to them. Depending on the use cases selected during the project, fine-tuning strategies may also be explored.
The project will be conducted in collaboration with the Instituto de Tecnología para la Innovación en Salud y Bienestar (ITISB) of the Chilean Universidad Andrés Bello (UNAB), and will give the PhD candidate the opportunity to conduct part of his/her research in Chile.