dc.contributor.advisor | Kampffmeyer, Michael | |
dc.contributor.author | Gautam, Srishti | |
dc.date.accessioned | 2024-03-08T12:35:57Z | |
dc.date.available | 2024-03-08T12:35:57Z | |
dc.date.embargoEndDate | 2029-03-15 | |
dc.date.issued | 2024-03-15 | |
dc.description.abstract | <p>The field of artificial intelligence has recently witnessed remarkable growth, leading to the development of complex deep learning models that perform exceptionally well across various domains. However, these developments bring forth critical issues. Deep learning models are vulnerable to inheriting, and potentially exacerbating, biases present in their training data. Moreover, the complexity of these models leads to a lack of transparency, which can allow biases to go undetected. This can ultimately hinder the adoption of these models due to a lack of trust. It is therefore crucial to foster the creation of artificial intelligence systems that are inherently transparent, trustworthy, and fair.
<p>This thesis contributes to this line of research by exploring the interpretability of deep learning through self-explainable models. These models represent a shift towards more transparent systems, offering explanations that are integral to the model's architecture, yielding insights into their decision-making processes. Consequently, this inherent transparency enhances our understanding, thereby providing a mechanism to address the inadvertent learning of biases.
<p>To advance the development of self-explainable models, this thesis undertakes a comprehensive analysis of current methodologies. It introduces a novel algorithm designed to enhance the explanation quality of one of the state-of-the-art models. In addition, this work proposes a novel self-explainable model that surpasses existing methods by generating explanations through a learned decoder, facilitating end-to-end training, and addressing the prevalent trade-off between explainability and performance. Furthermore, to enhance the accessibility and sustainability of these models, this thesis also introduces a universal methodology to transform any pre-trained black-box model into a self-explainable one without the need for re-training.
<p>Through the proposed methodology, this research identifies and counteracts the learning of artifacts -- spurious correlations -- from the data, further emphasizing the need for transparent models. Additionally, this thesis expands its scope to encompass the dimension of fairness for large language models, demonstrating the tendency of these models to reinforce social biases.
<p>The results of this research highlight the efficacy of the proposed methodologies, thereby paving the way for artificial intelligence systems that are not only accurate but also transparent, fair, and reliable, to facilitate widespread adoption and trust in artificial intelligence technologies. | en_US |
dc.description.doctoraltype | ph.d. | en_US |
dc.description.popularabstract | The field of artificial intelligence has recently witnessed remarkable growth, leading to the development of complex deep learning models that perform exceptionally well across various domains. However, these developments bring forth critical issues. These models are vulnerable to inheriting, and potentially exacerbating, biases present in their training data. Moreover, the complexity of these models leads to a lack of transparency, which can allow biases to go undetected. This can ultimately hinder the adoption of these models due to a lack of trust. This thesis contributes to this line of research by exploring the inherent interpretability of deep learning models and the issue of bias detection. Across five papers, we present a series of methodological advances that yield novel insights. Our contributions constitute significant advancements in deep learning, thus paving the way for artificial intelligence systems that are not only accurate but also transparent, fair, and reliable. | en_US
dc.description.sponsorship | Research Council of Norway, grant numbers: 315029, 309439, and 303514 | en_US |
dc.identifier.isbn | 978-82-8236-568-0 (print) | |
dc.identifier.isbn | 978-82-8236-569-7 (PDF) | |
dc.identifier.uri | https://hdl.handle.net/10037/33143 | |
dc.language.iso | eng | en_US |
dc.publisher | UiT Norges arktiske universitet | en_US |
dc.publisher | UiT The Arctic University of Norway | en_US |
dc.relation.haspart | <p>Paper I: Gautam, S., Höhne, M.M.C., Hansen, S., Jenssen, R. & Kampffmeyer, M. (2023). This looks more like that: Enhancing self-explaining models by prototypical relevance propagation. <i>Pattern Recognition, 136</i>, 109172. Also available in Munin at <a href=https://hdl.handle.net/10037/27611> https://hdl.handle.net/10037/27611</a>.
<p>Paper II: Gautam, S., Höhne, M.M.C., Hansen, S., Jenssen, R. & Kampffmeyer, M. (2022). Demonstrating the risk of imbalanced datasets in chest x-ray image-based diagnostics by prototypical relevance propagation. <i>2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), Kolkata, India</i>. Not available in Munin due to publisher’s restrictions. Published version available at <a href=https://doi.org/10.1109/ISBI52829.2022.9761651>https://doi.org/10.1109/ISBI52829.2022.9761651</a>.
<p>Paper III: Gautam, S., Boubekki, A., Hansen, S., Salahuddin, S., Jenssen, R., Höhne, M. & Kampffmeyer, M. (2022). ProtoVAE: A trustworthy self-explainable prototypical variational model. <i>Advances in Neural Information Processing Systems, 35</i>, 17940–17952. Also available at <a href=https://proceedings.neurips.cc/paper_files/paper/2022/hash/722f3f9298a961d2639eadd3f14a2816-Abstract-Conference.html>https://proceedings.neurips.cc/paper_files/paper/2022/hash/722f3f9298a961d2639eadd3f14a2816-Abstract-Conference.html</a>.
<p>Paper IV: Gautam, S., Boubekki, A., Höhne, M. & Kampffmeyer, M. Prototypical Self-Explainable Models Without Re-training. (Manuscript under review). Also available in arXiv at <a href=https://doi.org/10.48550/arXiv.2312.07822>https://doi.org/10.48550/arXiv.2312.07822</a>.
<p>Paper V: Liu, Y., Gautam, S., Ma, J. & Lakkaraju, H. Investigating the Fairness of Large Language Models for Predictions on Tabular Data. (Manuscript under review). Also available in arXiv at <a href=https://doi.org/10.48550/arXiv.2310.14607>https://doi.org/10.48550/arXiv.2310.14607</a>. | en_US |
dc.relation.isbasedon | MNIST: Deng, L. (2012). The MNIST database of handwritten digit images for machine learning research. <i>IEEE Signal Processing Magazine, 29</i>(6), 141–142, available at <a href=https://doi.org/10.1109/MSP.2012.2211477>https://doi.org/10.1109/MSP.2012.2211477</a>. | en_US |
dc.relation.isbasedon | Fashion-MNIST: Xiao, H., Rasul, K. & Vollgraf, R. (2017). Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. Available on arXiv at <a href=https://doi.org/10.48550/arXiv.1708.07747>https://doi.org/10.48550/arXiv.1708.07747</a> and on Github at <a href=https://github.com/zalandoresearch/fashion-mnist>https://github.com/zalandoresearch/fashion-mnist</a>. | en_US |
dc.relation.isbasedon | SVHN: Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B. & Ng, A.Y. (2011). Reading Digits in Natural Images with Unsupervised Feature Learning. <i>NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011</i>. Available at <a href=http://ufldl.stanford.edu/housenumbers/>http://ufldl.stanford.edu/housenumbers/</a>. | en_US |
dc.relation.isbasedon | STL-10: Coates, A., Lee, H. & Ng, A.Y. (2011). An Analysis of Single Layer Networks in Unsupervised Feature Learning. <i>AISTATS, 2011</i>. Available at <a href=https://cs.stanford.edu/~acoates/stl10/>https://cs.stanford.edu/~acoates/stl10/</a>. | en_US |
dc.relation.isbasedon | CIFAR-10: Krizhevsky, A. (2009). Learning Multiple Layers of Features from Tiny Images. Available at <a href=https://www.cs.toronto.edu/~kriz/cifar.html>https://www.cs.toronto.edu/~kriz/cifar.html</a>. | en_US |
dc.relation.isbasedon | CelebA: Liu, Z., Luo, P., Wang, X. & Tang, X. (2015). Deep Learning Face Attributes in the Wild. <i>Proceedings of International Conference on Computer Vision (ICCV), December, 2015</i>. Available at <a href=https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html>https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html</a>. | en_US |
dc.relation.isbasedon | CUB-200: Wah, C., Branson, S., Welinder, P., Perona, P. & Belongie, S. (2011). Caltech-UCSD Birds-200-2011 (CUB-200-2011). California Institute of Technology, 2011. Available at <a href=https://www.vision.caltech.edu/datasets/cub_200_2011/> https://www.vision.caltech.edu/datasets/cub_200_2011/</a>. | en_US |
dc.relation.isbasedon | LISA Traffic Sign Dataset: Møgelmose, A., Trivedi, M.M. & Moeslund, T.B. (2012). Vision based Traffic Sign Detection and Analysis for Intelligent Driver Assistance Systems: Perspectives and Survey. <i>IEEE Transactions on Intelligent Transportation Systems</i>, 2012. Available at <a href=http://cvrr-nas.ucsd.edu/LISA/lisa-traffic-sign-dataset.html> http://cvrr-nas.ucsd.edu/LISA/lisa-traffic-sign-dataset.html</a>. | en_US |
dc.relation.isbasedon | UCI Adult Data: Becker, B. & Kohavi, R. (1996). Adult. <i>UCI Machine Learning Repository</i>. Available at <a href=https://doi.org/10.24432/C5XW20>https://doi.org/10.24432/C5XW20</a>. | en_US
dc.relation.isbasedon | UCI German Credit Data: Hofmann, H. (1994). Statlog (German Credit Data). <i>UCI Machine Learning Repository</i>. Available at <a href=https://doi.org/10.24432/C5NC77> https://doi.org/10.24432/C5NC77</a>. | en_US |
dc.relation.isbasedon | COMPAS Recidivism Risk Score Data and Analysis: <i>ProPublica</i>. Available at <a href=https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis>https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis</a>. | en_US |
dc.rights.accessRights | embargoedAccess | en_US |
dc.rights.holder | Copyright 2024 The Author(s) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0 | en_US |
dc.rights | Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | en_US |
dc.subject | Deep learning | en_US |
dc.subject | Explainable AI | en_US |
dc.subject | Self-Explainable Models | en_US |
dc.subject | Artifact Detection | en_US |
dc.subject | Fairness in LLMs | en_US |
dc.title | Towards Interpretable, Trustworthy and Reliable AI | en_US |
dc.type | Doctoral thesis | en_US |
dc.type | Doktorgradsavhandling | en_US |