Towards Interpretable, Trustworthy and Reliable AI
Permanent link
https://hdl.handle.net/10037/33143
Date
2024-03-15
Type
Doctoral thesis
Author
Gautam, Srishti
Abstract
The field of artificial intelligence has recently witnessed remarkable growth, leading to the development of complex deep learning models that perform exceptionally well across various domains. However, these developments bring forth critical issues. Deep learning models are vulnerable to inheriting, and potentially exacerbating, biases present in their training data. Moreover, the complexity of these models leads to a lack of transparency, which can allow biases to go undetected and can ultimately hinder the adoption of these models due to a lack of trust. It is therefore crucial to foster the creation of artificial intelligence systems that are inherently transparent, trustworthy, and fair.
This thesis contributes to this line of research by exploring the interpretability of deep learning through self-explainable models. These models represent a shift towards more transparent systems, offering explanations that are integral to the model architecture and that yield insight into the decision-making process. This inherent transparency enhances our understanding of the model and provides a mechanism for addressing the inadvertent learning of biases.
To advance the development of self-explainable models, this thesis undertakes a comprehensive analysis of current methodologies. It introduces a novel algorithm designed to enhance the explanation quality of one of the state-of-the-art models. In addition, this work proposes a novel self-explainable model that surpasses existing methods by generating explanations through a learned decoder, facilitating end-to-end training, and addressing the prevalent trade-off between explainability and performance. Furthermore, to enhance the accessibility and sustainability of these models, this thesis also introduces a universal methodology to transform any pre-trained black-box model into a self-explainable one without the need for re-training.
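To make the prototype-based approach concrete, below is a minimal sketch of a self-explainable classifier in the spirit of the models discussed: class predictions are computed from similarities to learned prototypes, and a learned decoder maps the prototypes back to input space as built-in explanations. This is an illustrative simplification under assumed names and shapes (PrototypicalNet, n_protos, the toy encoder/decoder), not the thesis implementation.

import torch
import torch.nn as nn

class PrototypicalNet(nn.Module):
    """Illustrative prototype-based self-explainable classifier (an assumption
    for exposition, not the thesis code)."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module,
                 n_protos: int, latent_dim: int, n_classes: int):
        super().__init__()
        self.encoder = encoder                              # x -> z
        self.decoder = decoder                              # z -> reconstructed input
        self.prototypes = nn.Parameter(torch.randn(n_protos, latent_dim))
        self.classifier = nn.Linear(n_protos, n_classes)

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)                                 # (B, latent_dim)
        sim = -torch.cdist(z, self.prototypes) ** 2         # (B, n_protos) similarities
        return self.classifier(sim), sim                    # logits + per-prototype evidence

    def explain_prototypes(self) -> torch.Tensor:
        # Decode each prototype into input space: these decoded images
        # serve as the model's inherent explanations.
        return self.decoder(self.prototypes)

# Toy wiring for flattened 28x28 inputs (e.g. MNIST, cited under research data):
enc = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 16))
dec = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 784))
model = PrototypicalNet(enc, dec, n_protos=10, latent_dim=16, n_classes=10)
logits, evidence = model(torch.randn(4, 784))
prototype_images = model.explain_prototypes()               # (10, 784) explanations

Because the prototypes live in the latent space and are rendered by the decoder rather than tied to fixed training patches, such a model can be trained end-to-end, which is one way the explainability-performance trade-off mentioned above can be eased.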
Through the proposed methodology, this research identifies and counteracts the learning of artifacts -- spurious correlations -- from the data, further emphasizing the need for transparent models. Additionally, this thesis expands its scope to encompass the dimension of fairness for large language models, demonstrating the tendency of these models to reinforce social biases.
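As a hedged illustration of how explanations can expose such artifacts, the short helper below (an assumption for exposition, not the thesis method) measures how much of an explanation's positive relevance falls inside a known artifact region, such as a text tag in the corner of a chest X-ray.

import numpy as np

def artifact_reliance(relevance_map: np.ndarray, artifact_mask: np.ndarray) -> float:
    """Fraction of positive relevance inside a known artifact region; values
    near 1.0 suggest the model's evidence rests on a spurious correlation."""
    positive = np.clip(relevance_map, 0.0, None)
    total = positive.sum()
    return float(positive[artifact_mask].sum() / total) if total > 0 else 0.0

# Example: flag a model whose relevance concentrates on an assumed corner tag.
relevance = np.random.rand(224, 224)        # stand-in for a real relevance map
tag_mask = np.zeros((224, 224), dtype=bool)
tag_mask[:20, :60] = True                   # hypothetical artifact location
print(artifact_reliance(relevance, tag_mask))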
The results of this research highlight the efficacy of the proposed methodologies, paving the way for artificial intelligence systems that are not only accurate but also transparent, fair, and reliable, thereby facilitating widespread adoption of and trust in artificial intelligence technologies.
Has part(s)
Paper I: Gautam, S., Höhne, M.M.C., Hansen, S., Jenssen, R. & Kampffmeyer, M. (2023). This looks more like that: Enhancing self-explaining models by prototypical relevance propagation. Pattern Recognition, 136, 109172. Also available in Munin at https://hdl.handle.net/10037/27611.
Paper II: Gautam, S., Höhne, M.M.C., Hansen, S., Jenssen, R. & Kampffmeyer, M. (2022). Demonstrating the risk of imbalanced datasets in chest x-ray image-based diagnostics by prototypical relevance propagation. 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), Kolkata, India. Not available in Munin due to publisher’s restrictions. Published version available at https://doi.org/10.1109/ISBI52829.2022.9761651.
Paper III: Gautam, S., Boubekki, A., Hansen, S., Salahuddin, S., Jenssen, R., Höhne, M.M.C. & Kampffmeyer, M. (2022). ProtoVAE: A trustworthy self-explainable prototypical variational model. Advances in Neural Information Processing Systems, 35, 17940–17952. Also available at https://proceedings.neurips.cc/paper_files/paper/2022/hash/722f3f9298a961d2639eadd3f14a2816-Abstract-Conference.html.
Paper IV: Gautam, S., Boubekki, A., Höhne, M.M.C. & Kampffmeyer, M. Prototypical Self-Explainable Models Without Re-training. (Manuscript under review). Also available on arXiv at https://doi.org/10.48550/arXiv.2312.07822.
Paper V: Liu, Y., Gautam, S., Ma, J. & Lakkaraju, H. Investigating the Fairness of Large Language Models for Predictions on Tabular Data. (Manuscript under review). Also available on arXiv at https://doi.org/10.48550/arXiv.2310.14607.
Related research data
MNIST: Deng, L. (2012). The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6), 141–142, available at https://doi.org/10.1109/MSP.2012.2211477.
Publisher
UiT The Arctic University of Norway