Towards Interpretable, Trustworthy and Reliable AI
Permanent link
https://hdl.handle.net/10037/33143
Date
2024-03-15
Type
Doctoral thesis
Author
Gautam, Srishti
Abstract
The field of artificial intelligence has recently witnessed remarkable growth, leading to the development of complex deep learning models that perform exceptionally well across various domains. However, these developments bring forth critical issues. Deep learning models are vulnerable to inheriting, and potentially exacerbating, biases present in their training data. Moreover, the complexity of these models leads to a lack of transparency, which can allow biases to go undetected and can ultimately hinder their adoption due to a lack of trust. It is therefore crucial to foster the creation of artificial intelligence systems that are inherently transparent, trustworthy, and fair.
This thesis contributes to this line of research by exploring the interpretability of deep learning through self-explainable models. These models represent a shift towards more transparent systems, offering explanations that are integral to the model's architecture and thereby yield insights into its decision-making process. This inherent transparency enhances our understanding and provides a mechanism for addressing the inadvertent learning of biases.
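To make this concrete, the following is a minimal, hypothetical sketch (in PyTorch) of a prototype-based self-explainable classifier in the spirit of the models studied in this thesis: class predictions are computed from the input's similarities to learned prototypes, so the explanation is produced by the architecture itself rather than by a post-hoc method. The class name, dimensions, and similarity function are illustrative assumptions, not the thesis's actual implementation.

import torch
import torch.nn as nn

class PrototypeClassifier(nn.Module):
    """Toy prototype-based self-explainable classifier (illustrative only)."""

    def __init__(self, encoder: nn.Module, latent_dim: int,
                 num_prototypes: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        # Learned prototype vectors living in the encoder's latent space.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, latent_dim))
        # Linear layer mapping prototype similarities to class logits.
        self.classifier = nn.Linear(num_prototypes, num_classes)

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)                          # (batch, latent_dim)
        # Similarity = negative squared Euclidean distance to each prototype.
        similarities = -torch.cdist(z, self.prototypes) ** 2
        logits = self.classifier(similarities)
        # Returning the similarities makes the explanation part of the output.
        return logits, similarities

# Example usage with a tiny encoder on flattened 28x28 images:
encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 64), nn.ReLU())
model = PrototypeClassifier(encoder, latent_dim=64, num_prototypes=10, num_classes=10)
logits, sims = model(torch.randn(4, 1, 28, 28))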
To advance the development of self-explainable models, this thesis undertakes a comprehensive analysis of current methodologies. It introduces a novel algorithm designed to enhance the explanation quality of one of the state-of-the-art models. In addition, this work proposes a novel self-explainable model that surpasses existing methods by generating explanations through a learned decoder, facilitating end-to-end training, and addressing the prevalent trade-off between explainability and performance. Furthermore, to enhance the accessibility and sustainability of these models, this thesis introduces a universal methodology for transforming any pre-trained black-box model into a self-explainable one without the need for re-training.
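As an illustration of the last point, one lightweight way to obtain a prototypical predictor from a frozen, pre-trained backbone without any re-training is to cluster the backbone's embeddings per class and then predict by nearest prototype. The sketch below (NumPy/scikit-learn) is a simplified, assumption-laden stand-in for intuition, not the method actually proposed in Paper IV.

import numpy as np
from sklearn.cluster import KMeans

def fit_class_prototypes(embeddings, labels, per_class=5):
    """Fit K-means prototypes per class on frozen-backbone embeddings."""
    prototypes, proto_labels = [], []
    for c in np.unique(labels):
        # Cluster only the embeddings of class c; centroids become prototypes.
        km = KMeans(n_clusters=per_class, n_init=10, random_state=0)
        km.fit(embeddings[labels == c])
        prototypes.append(km.cluster_centers_)
        proto_labels.extend([c] * per_class)
    return np.vstack(prototypes), np.array(proto_labels)

def predict_with_explanation(z, prototypes, proto_labels):
    """Predict the class of embedding z via its nearest prototype.

    The index of the matched prototype doubles as the explanation:
    it points to a specific, inspectable region of the training data.
    """
    dists = np.linalg.norm(prototypes - z, axis=1)
    nearest = int(np.argmin(dists))
    return proto_labels[nearest], nearest, dists[nearest]

Because the backbone is never updated, this construction adds only the cost of a single clustering pass, which is the kind of property that makes such conversions accessible and sustainable.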
Through the proposed methodology, this research identifies and counteracts the learning of artifacts -- spurious correlations -- from the data, further emphasizing the need for transparent models. Additionally, this thesis expands its scope to encompass the dimension of fairness for large language models, demonstrating the tendency of these models to reinforce social biases.
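As one concrete example of how such social biases can be quantified, the snippet below computes a standard demographic parity gap, i.e. the difference in positive-prediction rates across groups. This is a generic fairness diagnostic shown purely for illustration; it is not claimed to be the exact metric used in Paper V.

import numpy as np

def demographic_parity_gap(predictions, groups):
    """Largest gap in positive-prediction rates between any two groups."""
    predictions = np.asarray(predictions, dtype=float)
    groups = np.asarray(groups)
    rates = [predictions[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

# Hypothetical example: a model predicting loan approval (1) / denial (0).
preds = [1, 0, 1, 1, 0, 0, 1, 0]
grp = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_gap(preds, grp))  # 0.75 - 0.25 = 0.5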
The results of this research highlight the efficacy of the proposed methodologies, thereby paving the way for artificial intelligence systems that are not only accurate but also transparent, fair, and reliable, facilitating widespread adoption of and trust in artificial intelligence technologies.
Has part(s)
Paper I: Gautam, S., Höhne, M.M.C., Hansen, S., Jenssen, R. & Kampffmeyer, M. (2023). This looks more like that: Enhancing self-explaining models by prototypical relevance propagation. Pattern Recognition, 136, 109172. Also available in Munin at https://hdl.handle.net/10037/27611.
Paper II: Gautam, S., Höhne, M.M.C., Hansen, S., Jenssen, R. & Kampffmeyer, M. (2022). Demonstrating the risk of imbalanced datasets in chest x-ray image-based diagnostics by prototypical relevance propagation. 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), Kolkata, India. Not available in Munin due to publisher’s restrictions. Published version available at https://doi.org/10.1109/ISBI52829.2022.9761651.
Paper III: Gautam, S., Boubekki, A., Hansen, S., Salahuddin, S., Jenssen, R., Höhne, M. & Kampffmeyer, M. (2022). ProtoVAE: A trustworthy self-explainable prototypical variational model. Advances in Neural Information Processing Systems, 35, 17940–17952. Also available at https://proceedings.neurips.cc/paper_files/paper/2022/hash/722f3f9298a961d2639eadd3f14a2816-Abstract-Conference.html.
Paper IV: Gautam, S., Boubekki, A., Höhne, M. & Kampffmeyer, M. Prototypical Self-Explainable Models Without Re-training. (Manuscript under review). Also available in arXiv at https://doi.org/10.48550/arXiv.2312.07822.
Paper V: Liu, Y., Gautam, S., Ma, J. & Lakkaraju, H. Investigating the Fairness of Large Language Models for Predictions on Tabular Data. (Manuscript under review). Also available in arXiv at https://doi.org/10.48550/arXiv.2310.14607.
Related research data
MNIST: Deng, L. (2012). The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6), 141–142, available at https://doi.org/10.1109/MSP.2012.2211477.
Publisher
UiT The Arctic University of Norway