Indigenous language technology in the age of machine learning
Permanent link
https://hdl.handle.net/10037/36008
Date
2024-11-13
Type
Journal article
Peer reviewed
Abstract
Most modern language technology, including proofing tools and machine translation, is
based on machine learning. However, very few Indigenous languages have enough text to
build tools with this technology. If most language technology comes to rely on large
language models (LLMs), there is a risk that most online text in Indigenous languages
will be produced by neural text generation. Online texts could then no longer be trusted
as a source of authentic Indigenous language. An alternative is the linguistics-based
work done at UiT – The Arctic University of Norway over the last 20 years. Sámi
language tools have been made available to both industry and language communities
under open licenses, and they have been widely used by translators, teachers and various
software companies. The article analyzes four parts of language technology development:
language data, language tool development, making the tools available to users, and
ethical use of available language technology tools. We make extensive use of the CARE
principles and discuss the shortcomings of existing software and data licensing schemes.
Finally, we introduce a 3D table to help classify language technology projects with
respect to their suitability for Indigenous languages.
Publisher
Taylor & Francis
Citation
Moshagen, Antonsen, Wiechetek, Trosterud. Indigenous language technology in the age of machine learning. Acta Borealia. 2024;41(2):102-116
Copyright 2024 The Author(s)