dc.contributor.advisor | Anfinsen, Stian Normann | |
dc.contributor.author | Dretvik, Vilde Fonn | |
dc.date.accessioned | 2021-08-04T06:29:27Z | |
dc.date.available | 2021-08-04T06:29:27Z | |
dc.date.issued | 2021-06-21 | en |
dc.description.abstract | This work concerns the classification of time series with missing data using imputation and selected machine learning methods. Imputation was used to replace missing values in two data sets: one containing surgical site infection (SSI) data of 11 types of blood samples from patients over 20 days, and another called uwave, which contains 3D accelerometer data of several patterns made by a subset of people, from which two patterns were selected. The SSI data set is known to possess informative missingness. For the uwave data, missing data was simulated by removing data points in an informative (non-random) way. The Dynamic Time Warping (DTW) and Euclidean distances were computed for each imputed data set to make distance grid matrices, which were used to perform classification with the K Nearest Neighbour (KNN) classifier and the Support Vector Machine (SVM) classifier. Furthermore, the data set features were augmented with masks that indicate the presence of missing data and counters of consecutive spells of missing data, to help exploit informative missingness. The augmented data set was classified using the same classifiers and distance methods, in addition to a newer classifier called the Temporal Convolutional Network (TCN), which used the augmented data in combination with imputation of the original data. It was found that applying DTW was unnecessary for the KNN classifier, and that Euclidean distance was sufficient. Augmenting the data was found to improve the overall results for the SVM and KNN classifiers. The TCN was found to need more work, as it gave unstable test results with much lower values than the validation results would imply. | en_US |
dc.identifier.uri | https://hdl.handle.net/10037/21916 | |
dc.language.iso | eng | en_US |
dc.publisher | UiT Norges arktiske universitet | no |
dc.publisher | UiT The Arctic University of Norway | en |
dc.rights.holder | Copyright 2021 The Author(s) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0 | en_US |
dc.rights | Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | en_US |
dc.subject.courseID | FYS-3941 | |
dc.subject | VDP::Mathematics and natural science: 400::Mathematics: 410::Applied mathematics: 413 | en_US |
dc.subject | VDP::Matematikk og Naturvitenskap: 400::Matematikk: 410::Anvendt matematikk: 413 | en_US |
dc.subject | VDP::Mathematics and natural science: 400::Mathematics: 410::Statistics: 412 | en_US |
dc.subject | VDP::Matematikk og Naturvitenskap: 400::Matematikk: 410::Statistikk: 412 | en_US |
dc.subject | VDP::Mathematics and natural science: 400::Information and communication science: 420::Knowledge based systems: 425 | en_US |
dc.subject | VDP::Matematikk og Naturvitenskap: 400::Informasjons- og kommunikasjonsvitenskap: 420::Kunnskapsbaserte systemer: 425 | en_US |
dc.subject | VDP::Medical disciplines: 700::Health sciences: 800::Other health science disciplines: 829 | en_US |
dc.subject | VDP::Medisinske Fag: 700::Helsefag: 800::Andre helsefag: 829 | en_US |
dc.title | Imputation and classification of time series with missing data using machine learning | en_US |
dc.type | Mastergradsoppgave | nor |
dc.type | Master thesis | eng |