Abstract
In this work we present a new Norwegian labeled dataset of 7078 comments for unhealthy comment detection. The dataset is used to fine-tune a BERT model, and demonstrates that BERT has the ability to detect subtle forms of toxicity, also in Norwegian. We compare how the different newly released Norwegian BERT models perform when fine-tuned on our dataset, and we also experiment with how English data can be utilized to fine-tune one of the models. We fine-tune BERT to recognize unhealthy comments in Norwegian, as well as a list of other characteristics a comment may have such as being hostile, antagonising/insulting/trolling, dismissive, condescending, sarcastic, or being an unfair generalisation. Our AUC scores beat the AUC scores from previous work on detecting unhealthy comments in English on all categories, except dismissive.