Enhancing Investigative Journalism: Leveraging Large Language Models and Vector Databases
Permanent link
https://hdl.handle.net/10037/33792
Date
2024-05-15
Type
Mastergradsoppgave (Master thesis)
Author
Ali, Muhammad Nauman
Abstract
Advances in Artificial Intelligence (AI) have transformed almost every field, and journalism is no exception, with prospective uses in investigative reporting and in uncovering information. This project explores the integration of Large Language Models (LLMs) with vector databases. It addresses two aims: information retrieval, and the use of LLMs for summarization and for finding information of interest to journalists.
We begin the study with an overview of related concepts and literature. We then propose, in the methodology, a system based on that literature. The proposed system follows the Retrieval Augmented Generation (RAG) architecture, combining a vector database with an LLM. The vector database is used to retrieve relevant documents efficiently, and the LLM is used to present the information in concise form and to identify any irregularities in the cases. iTromsø provided a series of queries and prompts, with which the system was tested. The results, both the retrieved documents and the answers to the prompts, were evaluated by iTromsø.
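The retrieve-then-prompt flow described above can be illustrated with a minimal sketch. It is not the system built in the thesis: the abstract does not name the embedding model or the vector database, so a bag-of-words vector and an in-memory list stand in for both, and the function names (`embed`, `retrieve`, `build_prompt`) and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (an assumed stand-in for a learned embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query (the vector database's role)."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_docs):
    """Assemble the text handed to the LLM: retrieved context first, then the question."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical case documents, invented for illustration only.
documents = [
    "The municipality approved a building permit for the harbour warehouse.",
    "The council postponed the school budget vote until autumn.",
    "A complaint was filed about the harbour warehouse building permit.",
]

top = retrieve("building permit harbour warehouse", documents, k=2)
prompt = build_prompt("Were there complaints about the permit?", top)
```

In a real deployment the prompt would be sent to an LLM; that call is omitted here because it depends on the chosen model and provider.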
Document retrieval showed varying degrees of accuracy: some queries returned the most relevant documents, while others failed completely to retrieve the intended document. The quality of the answers from the LLMs also varied, as expected, with ChatGPT 4 outperforming ChatGPT 3.5 turbo and GPT4All in answering the prompts accurately.
Duplicated documents, together with special characters and void spaces in the text, degraded document retrieval: in most cases the system was unable to retrieve the most desired document. The responses of ChatGPT 3.5 turbo and GPT4All, unlike those of ChatGPT 4, were also affected by special characters and white space.
The proposed system shows advantages in assisting journalists with the investigative process, in terms of both scalability and efficiency, when compared with traditional approaches. However, the limitations in accurate document retrieval must be addressed by cleaning the text data.
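As a rough illustration of the kind of text cleaning the abstract calls for, the sketch below normalizes white space, strips stray symbols, and drops duplicate documents before indexing. The abstract does not specify which characters caused problems or how duplicates were detected, so the character whitelist and the exact-match hashing are assumptions, and `clean_text` and `deduplicate` are hypothetical helper names.

```python
import hashlib
import re
import unicodedata


def clean_text(text):
    """Normalize Unicode, remove control characters and stray symbols, collapse white space."""
    text = unicodedata.normalize("NFKC", text)
    # Replace control and format characters (including tabs and newlines) with spaces.
    text = "".join(" " if unicodedata.category(ch).startswith("C") else ch for ch in text)
    # Assumed whitelist: letters and digits (Unicode-aware, so "ø" survives) plus basic punctuation.
    text = re.sub(r"[^\w\s.,;:!?()/'-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(documents):
    """Keep the first copy of each document, comparing cleaned, lower-cased text."""
    seen = set()
    unique = []
    for doc in documents:
        cleaned = clean_text(doc)
        key = hashlib.sha256(cleaned.lower().encode("utf-8")).hexdigest()
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


# Hypothetical raw extracts, invented for illustration only.
raw = ["Permit  granted.", "permit granted.\n", "Permit denied."]
cleaned_docs = deduplicate(raw)
```

Exact-match hashing only catches copies that become identical after cleaning; near-duplicates would need a similarity-based check.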
Publisher
UiT Norges arktiske universitet (UiT The Arctic University of Norway)
Copyright 2024 The Author(s)