Fine-tuning Large Language Models on historical causes of death data
Permanent link
https://hdl.handle.net/10037/34160
Date
2024-05-15
Type
Master thesis
Author
Wilhelmsen, Kristoffer Berg
Abstract
This thesis assesses the impact of fine-tuning and retrieval-augmented generation (RAG) on the accuracy of large language models (LLMs) in assigning ICD-10 codes to historical causes of death. Using funeral records from Trondheim, Norway (1830-1920), we fine-tuned Llama 3 and Mistral on 2000 records. Twelve experiments were conducted on 2000 additional records to evaluate the accuracy of each knowledge-injection technique, as well as a combination of the two.
The results indicate that fine-tuning as a standalone knowledge-injection technique achieved the highest accuracy, generating 88% full matches and 2% partial matches for ICD-10 codes, up from 58% full matches and 25% partial matches in previous research. However, concerns remain about memorization of training data, given the lack of diversity in the available dataset. Moreover, combining RAG with fine-tuning decreased accuracy, and a RAG-only approach reduced it even further. These findings serve as a proof of concept for the automatic assignment of ICD-10 codes to historical causes of death, paving the way for future research.
Publisher
UiT The Arctic University of Norway
Copyright 2024 The Author(s)
The following license file is associated with this item: