Orizzonte Sistemi Navali, a leading naval systems engineering company, offers a master's thesis project on the validation of AI-based chatbots that support the consultation of technical manuals and maintenance documentation in complex naval contexts. The objective is to define, test and classify methodologies for evaluating the quality and reliability of these chatbots.
The candidate will define, implement and experiment with validation methods to evaluate the reliability, correctness, consistency and robustness of the chatbot's responses against the original sources (ground truth), with particular attention to the technical-industrial domain. The student will carry out the following activities:
· Study and selection of metrics and validation methods for LLM-based chatbots
· Design of a structured test plan with realistic scenarios and questions representative of maintenance/operational activities
· Implementation of tools for automatic comparison between answers and original content (e.g. cosine similarity on embeddings, retrieval accuracy)
· Qualitative and quantitative assessment in terms of:
o Accuracy (adherence to technical content)
o Relevance (appropriateness to the question)
o Clarity (readability and conciseness)
o Confidence estimation (estimated/perceived reliability)
· Robustness analysis (ambiguous/incomplete/out-of-scope questions) and confidence calibration activities
· Analysis and presentation of the results obtained
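To give a concrete idea of the automatic comparison and confidence calibration activities listed above, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the tooling the thesis will necessarily use: the function names are hypothetical, embeddings are assumed to be already computed by some model and passed in as plain lists of floats, "retrieval accuracy" is interpreted here as top-k hit rate against a single ground-truth passage per question, and calibration is measured with the expected calibration error over equal-width bins.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieval_hit_rate(
    retrieved: Sequence[Sequence[str]], relevant: Sequence[str], k: int
) -> float:
    """Fraction of questions whose ground-truth passage ID is in the top-k retrieved IDs.

    Assumption: exactly one ground-truth passage ID per question.
    """
    if not relevant or len(retrieved) != len(relevant):
        raise ValueError("need one non-empty, aligned entry per question")
    hits = sum(1 for ids, gold in zip(retrieved, relevant) if gold in ids[:k])
    return hits / len(relevant)


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Weighted average, over equal-width confidence bins, of |accuracy - mean confidence|."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need one non-empty, aligned entry per answer")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # a confidence of 1.0 goes in the last bin
        bins[idx].append((c, ok))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(accuracy - mean_conf)
    return ece
```

In practice, each chatbot answer and the corresponding manual passage would be embedded with the same model before calling `cosine_similarity`, while `expected_calibration_error` would take the confidence reported for each answer together with a human or automatic correctness judgement.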