dc.contributor.advisor: Bongo, Lars Ailo
dc.contributor.advisor: Ricaud, Benjamin
dc.contributor.advisor: Bakkeli, Nicoali
dc.contributor.author: Ali, Muhammad Nauman
dc.date.accessioned: 2024-06-13T05:35:06Z
dc.date.available: 2024-06-13T05:35:06Z
dc.date.issued: 2024-05-15 [en]
dc.description.abstract: Advances in Artificial Intelligence (AI) have transformed almost every field, and journalism is no exception, with prospective uses in investigative reporting and uncovering information. This project explores integrating Large Language Models (LLMs) with vector databases. It addresses two tasks: information retrieval, and the use of LLMs for summarization and for finding information of interest to journalists. We begin with an overview of related concepts and literature, and then propose a system based on that literature. The proposed system follows the Retrieval Augmented Generation (RAG) architecture, combining a vector database with an LLM. The vector database is used to retrieve relevant documents efficiently, and the LLM to condense the information and to identify any irregularities in the cases. iTromsø supplied a series of queries and prompts with which the system was tested, and evaluated both the retrieved documents and the answers to the prompts. Document retrieval showed varying degrees of accuracy: some queries returned the most relevant documents, while others failed completely to retrieve the intended document. The quality of the answers also varied, as expected, with ChatGPT 4 outperforming ChatGPT 3.5 Turbo and GPT4All in answering the prompts accurately. Duplicated documents, together with special characters and blank spaces in the text, degraded document retrieval: in most cases the most desired document was not retrieved. The responses of ChatGPT 3.5 Turbo and GPT4All, but not those of ChatGPT 4, were also affected by special characters and white space.
The proposed system shows advantages over traditional approaches in assisting journalists with the investigative process, in terms of both scalability and efficiency. However, the limitations in accurate document retrieval must be addressed by cleaning the text data. [en_US]
dc.identifier.uri: https://hdl.handle.net/10037/33792
dc.language.iso: eng [en_US]
dc.publisher: UiT Norges arktiske universitet [no]
dc.publisher: UiT The Arctic University of Norway [en]
dc.rights.holder: Copyright 2024 The Author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-sa/4.0 [en_US]
dc.rights: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) [en_US]
dc.subject.courseID: INF-3990
dc.subject: LLM [en_US]
dc.subject: Journalism [en_US]
dc.subject: RAG [en_US]
dc.subject: Vector Database [en_US]
dc.subject: ChatGPT [en_US]
dc.subject: Summarization [en_US]
dc.title: Enhancing Investigative Journalism: Leveraging Large Language Models and Vector Databases [en_US]
dc.type: Mastergradsoppgave [no]
dc.type: Master thesis [en]
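
Illustrative note on the abstract: the abstract attributes the retrieval failures to duplicated documents, special characters and stray whitespace, and recommends cleaning the text data. The sketch below shows what such a cleaning and deduplication step might look like before indexing. It is not the implementation used in the thesis. The function names (`clean_text`, `deduplicate`, `retrieve`) are hypothetical, and a simple bag-of-words cosine similarity stands in for the embedding model and vector database, which this record does not specify.

```python
import math
import re
from collections import Counter


def clean_text(text: str) -> str:
    """Rejoin words hyphenated across line breaks, drop special characters, collapse whitespace."""
    text = re.sub(r"(\w)-\s+(\w)", r"\1\2", text)   # "in- tended" -> "intended"
    text = re.sub(r"[^\w\s.,;:!?'-]", " ", text)    # replace other symbols with a space
    return re.sub(r"\s+", " ", text).strip()        # collapse runs of whitespace


def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first copy of each document, comparing cleaned, lower-cased text."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        cleaned = clean_text(doc)
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words term count (assumption)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


if __name__ == "__main__":
    raw = [
        "The muni- cipal   budget was ap- proved ### in March.",
        "The municipal budget was approved in March.",
        "Harbour expansion plans were delayed.",
    ]
    corpus = deduplicate(raw)
    print(len(corpus))
    print(retrieve("budget approved", corpus, k=1))
```

In this toy corpus the first two entries collapse into one document after cleaning, so a query can no longer be answered by a garbled near-duplicate in place of the intended text. A real pipeline would more likely need fuzzy or hash-based near-duplicate detection, since exact matching after cleaning only catches the simplest cases.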



Unless otherwise stated, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)