dc.contributor.advisor | Kampffmeyer, Michael | |
dc.contributor.author | Gautam, Srishti | |
dc.date.accessioned | 2024-03-08T12:35:57Z | |
dc.date.available | 2024-03-08T12:35:57Z | |
dc.date.embargoEndDate | 2029-03-15 | |
dc.date.issued | 2024-03-15 | |
dc.description.abstract | <p>The field of artificial intelligence has recently witnessed remarkable growth, leading to the development of complex deep learning models that perform exceptionally well across various domains. However, these developments bring forth critical issues. Deep learning models are vulnerable to inheriting, and potentially exacerbating, biases present in their training data. Moreover, the complexity of these models leads to a lack of transparency, which can allow biases to go undetected. This can ultimately hinder the adoption of these models due to a lack of trust. It is therefore crucial to foster the creation of artificial intelligence systems that are inherently transparent, trustworthy, and fair.
<p>This thesis contributes to this line of research by exploring the interpretability of deep learning through self-explainable models. These models represent a shift towards more transparent systems, offering explanations that are integral to the model's architecture, yielding insights into their decision-making processes. Consequently, this inherent transparency enhances our understanding, thereby providing a mechanism to address the inadvertent learning of biases.
<p>To advance the development of self-explainable models, this thesis undertakes a comprehensive analysis of current methodologies. It introduces a novel algorithm designed to enhance the explanation quality of one of the state-of-the-art models. In addition, this work proposes a novel self-explainable model that surpasses existing methods by generating explanations through a learned decoder, facilitating end-to-end training, and addressing the prevalent trade-off between explainability and performance. Furthermore, to enhance the accessibility and sustainability of these models, this thesis also introduces a universal methodology to transform any pre-trained black-box model into a self-explainable one without the need for re-training.
<p>Through the proposed methodology, this research identifies and counteracts the learning of artifacts -- spurious correlations -- from the data, further emphasizing the need for transparent models. Additionally, this thesis expands its scope to encompass the dimension of fairness for large language models, demonstrating the tendency of these models to reinforce social biases.
<p>The results of this research highlight the efficacy of the proposed methodologies, thereby paving the way for artificial intelligence systems that are not only accurate but also transparent, fair, and reliable, to facilitate widespread adoption and trust in artificial intelligence technologies. | en_US |
dc.description.doctoraltype | ph.d. | en_US |
dc.description.popularabstract | The field of artificial intelligence has recently witnessed remarkable growth, leading to the development of complex deep learning models that perform exceptionally well across various domains. However, these developments bring forth critical issues. These models are vulnerable to inheriting, and potentially exacerbating, biases present in their training data. Moreover, the complexity of these models leads to a lack of transparency, which can allow biases to go undetected. This can ultimately hinder the adoption of these models due to a lack of trust. This thesis contributes to this line of research by exploring the inherent interpretability of deep learning models and the issue of bias detection. Across five papers, we present a series of methodological advances that yield novel insights. Our contributions constitute significant advancements in deep learning, thus paving the way for artificial intelligence systems that are not only accurate but also transparent, fair, and reliable. | en_US
dc.description.sponsorship | Research Council of Norway, grant numbers: 315029, 309439, and 303514 | en_US |
dc.identifier.isbn | 978-82-8236-568-0 (print) | |
dc.identifier.isbn | 978-82-8236-569-7 (PDF) | |
dc.identifier.uri | https://hdl.handle.net/10037/33143 | |
dc.language.iso | eng | en_US |
dc.publisher | UiT Norges arktiske universitet | en_US |
dc.publisher | UiT The Arctic University of Norway | en_US |
dc.relation.haspart | <p>Paper I: Gautam, S., Höhne, M.M.C., Hansen, S., Jenssen, R. & Kampffmeyer, M. (2023). This looks more like that: Enhancing self-explaining models by prototypical relevance propagation. <i>Pattern Recognition, 136</i>, 109172. Also available in Munin at <a href=https://hdl.handle.net/10037/27611> https://hdl.handle.net/10037/27611</a>.
<p>Paper II: Gautam, S., Höhne, M.M.C., Hansen, S., Jenssen, R. & Kampffmeyer, M. (2022). Demonstrating the risk of imbalanced datasets in chest x-ray image-based diagnostics by prototypical relevance propagation. <i>2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI), Kolkata, India</i>. Not available in Munin due to publisher’s restrictions. Published version available at <a href=https://doi.org/10.1109/ISBI52829.2022.9761651>https://doi.org/10.1109/ISBI52829.2022.9761651</a>.
<p>Paper III: Gautam, S., Boubekki, A., Hansen, S., Salahuddin, S., Jenssen, R., Höhne, M. & Kampffmeyer, M. (2022). ProtoVAE: A trustworthy self-explainable prototypical variational model. <i>Advances in Neural Information Processing Systems, 35</i>, 17940–17952. Also available at <a href=https://proceedings.neurips.cc/paper_files/paper/2022/hash/722f3f9298a961d2639eadd3f14a2816-Abstract-Conference.html>https://proceedings.neurips.cc/paper_files/paper/2022/hash/722f3f9298a961d2639eadd3f14a2816-Abstract-Conference.html</a>.
<p>Paper IV: Gautam, S., Boubekki, A., Höhne, M. & Kampffmeyer, M. Prototypical Self-Explainable Models Without Re-training. (Manuscript under review). Also available in arXiv at <a href=https://doi.org/10.48550/arXiv.2312.07822>https://doi.org/10.48550/arXiv.2312.07822</a>.
<p>Paper V: Liu, Y., Gautam, S., Ma, J. & Lakkaraju, H. Investigating the Fairness of Large Language Models for Predictions on Tabular Data. (Manuscript under review). Also available in arXiv at <a href=https://doi.org/10.48550/arXiv.2310.14607>https://doi.org/10.48550/arXiv.2310.14607</a>. | en_US |
dc.relation.isbasedon | MNIST: Deng, L. (2012). The MNIST database of handwritten digit images for machine learning research. <i>IEEE Signal Processing Magazine, 29</i>(6), 141–142, available at <a href=https://doi.org/10.1109/MSP.2012.2211477>https://doi.org/10.1109/MSP.2012.2211477</a>. | en_US |
dc.relation.isbasedon | Fashion-MNIST: Xiao, H., Rasul, K. & Vollgraf, R. (2017). Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. Available on arXiv at <a href=https://doi.org/10.48550/arXiv.1708.07747>https://doi.org/10.48550/arXiv.1708.07747</a> and on Github at <a href=https://github.com/zalandoresearch/fashion-mnist>https://github.com/zalandoresearch/fashion-mnist</a>. | en_US |
dc.relation.isbasedon | SVHN: Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B. & Ng, A.Y. (2011). Reading Digits in Natural Images with Unsupervised Feature Learning. <i>NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011</i>. Available at <a href=http://ufldl.stanford.edu/housenumbers/>http://ufldl.stanford.edu/housenumbers/</a>. | en_US |
dc.relation.isbasedon | STL-10: Coates, A., Lee, H. & Ng, A.Y. (2011). An Analysis of Single Layer Networks in Unsupervised Feature Learning. <i>AISTATS, 2011</i>. Available at <a href=https://cs.stanford.edu/~acoates/stl10/>https://cs.stanford.edu/~acoates/stl10/</a>. | en_US |
dc.relation.isbasedon | CIFAR-10: Krizhevsky, A. (2009). Learning Multiple Layers of Features from Tiny Images. Available at <a href=https://www.cs.toronto.edu/~kriz/cifar.html>https://www.cs.toronto.edu/~kriz/cifar.html</a>. | en_US |
dc.relation.isbasedon | CelebA: Liu, Z., Luo, P., Wang, X. & Tang, X. (2015). Deep Learning Face Attributes in the Wild. <i>Proceedings of International Conference on Computer Vision (ICCV), December, 2015</i>. Available at <a href=https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html>https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html</a>. | en_US |
dc.relation.isbasedon | CUB-200: Wah, C., Branson, S., Welinder, P., Perona, P. & Belongie, S. (2011). Caltech-UCSD Birds-200-2011 (CUB-200-2011). California Institute of Technology, 2011. Available at <a href=https://www.vision.caltech.edu/datasets/cub_200_2011/> https://www.vision.caltech.edu/datasets/cub_200_2011/</a>. | en_US |
dc.relation.isbasedon | LISA Traffic Sign Dataset: Møgelmose, A., Trivedi, M.M. & Moeslund, T.B. (2012). Vision based Traffic Sign Detection and Analysis for Intelligent Driver Assistance Systems: Perspectives and Survey. <i>IEEE Transactions on Intelligent Transportation Systems</i>, 2012. Available at <a href=http://cvrr-nas.ucsd.edu/LISA/lisa-traffic-sign-dataset.html> http://cvrr-nas.ucsd.edu/LISA/lisa-traffic-sign-dataset.html</a>. | en_US |
dc.relation.isbasedon | UCI Adult Data: Becker, B. & Kohavi, R. (1996). Adult. <i>UCI Machine Learning Repository</i>. Available at <a href=https://doi.org/10.24432/C5XW20>https://doi.org/10.24432/C5XW20</a>. | en_US
dc.relation.isbasedon | UCI German Credit Data: Hofmann, H. (1994). Statlog (German Credit Data). <i>UCI Machine Learning Repository</i>. Available at <a href=https://doi.org/10.24432/C5NC77> https://doi.org/10.24432/C5NC77</a>. | en_US |
dc.relation.isbasedon | COMPAS Recidivism Risk Score Data and Analysis: <i>ProPublica</i>. Available at <a href=https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis>https://www.propublica.org/datastore/dataset/compas-recidivism-risk-score-data-and-analysis</a>. | en_US |
dc.rights.accessRights | embargoedAccess | en_US |
dc.rights.holder | Copyright 2024 The Author(s) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0 | en_US |
dc.rights | Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | en_US |
dc.subject | Deep learning | en_US |
dc.subject | Explainable AI | en_US |
dc.subject | Self-Explainable Models | en_US |
dc.subject | Artifact Detection | en_US |
dc.subject | Fairness in LLMs | en_US |
dc.title | Towards Interpretable, Trustworthy and Reliable AI | en_US |
dc.type | Doctoral thesis | en_US |
dc.type | Doktorgradsavhandling | en_US |