Indigenous language technology in the age of machine learning

Moshagen, Sjur Nørstebø; Antonsen, Lene; Wiechetek, Linda; Trosterud, Trond

Date

2024-11-13

Type

Journal article
Peer reviewed

Abstract

Most modern language technology for proofing tools, machine translation and other applications is based on machine learning. However, very few Indigenous languages have enough text to build tools with this technology. When most language technology is based on large language models (LLMs), there is a risk that most online text in Indigenous languages will be produced by neural text generation. As a result, online texts could no longer be trusted as a source of authentic Indigenous language. An alternative is the linguistics-based work done at UiT – The Arctic University of Norway over the last 20 years. Sámi language tools have been made available to both industry and language communities under open licenses. These have been widely used by translators, teachers and various software companies. The article analyzes four parts of language technology development: language data, language tool development, making the tools available to users, and ethical use of available language technology tools. We make extensive use of the CARE principles and discuss the shortcomings of existing software and data licensing schemes. Finally, we introduce a 3D table to help classify language technology projects with respect to their suitability for Indigenous languages.

Publisher

Taylor & Francis

Citation

Moshagen, Antonsen, Wiechetek, Trosterud. Indigenous language technology in the age of machine learning. Acta Borealia. 2024;41(2):102-116

Collections

Articles, reports and other (language and culture)

Unless otherwise stated, the license for this record is Attribution 4.0 International (CC BY 4.0)