Detecting Unhealthy Comments in Norwegian using BERT

Warholm, Joakim

Permanent link

https://hdl.handle.net/10037/21853

Open

thesis.pdf (2.161Mb, PDF)

Date

2021-05-28

Type

Master thesis

Author

Warholm, Joakim

Abstract

In this work we present a new labeled Norwegian dataset of 7078 comments for unhealthy comment detection. We use the dataset to fine-tune a BERT model and demonstrate that BERT can detect subtle forms of toxicity in Norwegian as well. We compare how the newly released Norwegian BERT models perform when fine-tuned on our dataset, and we experiment with how English data can be used to fine-tune one of the models. We fine-tune BERT to recognize unhealthy comments in Norwegian, along with a list of other characteristics a comment may have: being hostile, antagonising/insulting/trolling, dismissive, condescending, sarcastic, or an unfair generalisation. Our AUC scores exceed those reported in previous work on detecting unhealthy comments in English in every category except dismissive.
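The per-category AUC comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes a multi-label setup in which each comment has one binary label and one model score per category; the function names `roc_auc` and `per_category_auc` and the toy numbers are hypothetical and are not taken from the thesis. AUC is computed here with the rank-based (Mann-Whitney) formulation, which may differ from the tooling the author actually used.

```python
from typing import Dict, List, Sequence

# Category names as listed in the abstract.
CATEGORIES = [
    "unhealthy",
    "hostile",
    "antagonising/insulting/trolling",
    "dismissive",
    "condescending",
    "sarcastic",
    "unfair generalisation",
]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive example
    is scored above a random negative one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_category_auc(
    y_true: List[List[int]],
    y_score: List[List[float]],
    categories: Sequence[str],
) -> Dict[str, float]:
    """Compute one AUC per category from row-per-comment label and score matrices."""
    return {
        name: roc_auc([row[j] for row in y_true], [row[j] for row in y_score])
        for j, name in enumerate(categories)
    }


# Hypothetical toy data for two categories only (not results from the thesis).
toy_categories = ["unhealthy", "dismissive"]
toy_true = [[0, 0], [0, 1], [1, 0], [1, 1]]
toy_score = [[0.10, 0.20], [0.40, 0.90], [0.35, 0.30], [0.80, 0.70]]
toy_auc = per_category_auc(toy_true, toy_score, toy_categories)
```

Reporting one AUC per category, rather than a single pooled score, is what makes a statement such as "every category except dismissive" possible: each characteristic is evaluated as its own binary ranking problem.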

Publisher

UiT The Arctic University of Norway

Collections

Master theses IFT

Unless otherwise stated, this item is licensed under Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)