Detecting Unhealthy Comments in Norwegian using BERT

Warholm, Joakim

Permanent link

https://hdl.handle.net/10037/21853

Date

2021-05-28

Type

Master's thesis

Abstract

In this work we present a new labeled Norwegian dataset of 7078 comments for unhealthy comment detection. We use the dataset to fine-tune a BERT model and demonstrate that BERT can detect subtle forms of toxicity in Norwegian as well. We compare how the newly released Norwegian BERT models perform when fine-tuned on our dataset, and we also experiment with how English data can be used to fine-tune one of the models. We fine-tune BERT to recognize unhealthy comments in Norwegian, as well as a set of other characteristics a comment may have, such as being hostile, antagonising/insulting/trolling, dismissive, condescending, sarcastic, or an unfair generalisation. Our AUC scores exceed those from previous work on detecting unhealthy comments in English in all categories except dismissive.
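
As a reading aid for the per-category AUC comparison mentioned in the abstract, the sketch below shows one common way to compute ROC AUC separately for each label in a multi-label setting. It is an illustration only, not the thesis's evaluation code: the function names `roc_auc` and `per_category_auc`, the two example categories, and all labels and scores are assumptions made up for this example.

```python
from typing import Dict, List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive example
    receives a higher score than a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_category_auc(
    y_true: List[List[int]],
    y_score: List[List[float]],
    categories: List[str],
) -> Dict[str, float]:
    """Compute one AUC per category; column j of each row belongs to categories[j]."""
    result = {}
    for j, name in enumerate(categories):
        column_true = [row[j] for row in y_true]
        column_score = [row[j] for row in y_score]
        result[name] = roc_auc(column_true, column_score)
    return result


# Hypothetical data: four comments, two of the categories named in the abstract.
categories = ["unhealthy", "sarcastic"]
y_true = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_score = [[0.10, 0.20], [0.40, 0.90], [0.35, 0.30], [0.80, 0.70]]

aucs = per_category_auc(y_true, y_score, categories)
for name, value in aucs.items():
    print(f"{name}: {value:.2f}")
```

Computing the score per category, rather than as a single pooled number, is what makes a statement such as "all categories except dismissive" possible.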

Publisher

UiT Norges arktiske universitet
UiT The Arctic University of Norway

The following license file is associated with this item:

Original License

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)