A heavy-tailed model for analyzing miRNA-seq raw read counts
Permanent link
https://hdl.handle.net/10037/35225
Date
2024-05-29
Type
Journal article
Peer reviewed
Abstract
This article addresses the limitations of existing statistical models in analyzing and interpreting highly skewed miRNA-seq raw read count data, which can range from zero to millions. A heavy-tailed model using discrete stable distributions is proposed as a novel approach to better capture the heterogeneity and extreme values commonly observed in miRNA-seq data. Additionally, the parameters of the discrete stable distribution are proposed as an alternative target for differential expression analysis. An R package for computing and estimating the discrete stable distribution is provided. The proposed model is applied to miRNA-seq raw counts from the Norwegian Women and Cancer Study (NOWAC) and The Cancer Genome Atlas (TCGA) databases. The goodness of fit is compared with that of the popular Poisson and negative binomial distributions, and the discrete stable distributions are found to give a better fit for both datasets. In conclusion, the use of discrete stable distributions is shown to potentially lead to more accurate modeling of the underlying biological processes.
Publisher
De Gruyter
Citation
Krutto A, Haugdahl Nost, Thoresen. A heavy-tailed model for analyzing miRNA-seq raw read counts. Statistical Applications in Genetics and Molecular Biology. 2024;23(1)
Copyright 2024 The Author(s)