Use of directed quasi-metric distances for quantifying the information of gene families

Thorvaldsen, Steinar; Hössjer, Ola

dc.contributor.author	Thorvaldsen, Steinar
dc.contributor.author	Hössjer, Ola
dc.date.accessioned	2024-09-24T07:23:59Z
dc.date.available	2024-09-24T07:23:59Z
dc.date.issued	2024-06-12
dc.description.abstract	A major hindrance to analyzing information in genetic or protein sequence data has been the lack of a mathematical framework for doing so. In this paper, we present a multinomial probability space X as a general foundation for multicategory discrete data, where categories refer to variants/alleles of biosequences. The external information that is infused in order to generate a sample of such data is quantified as a distance on X between the prior distribution of data and the empirical distribution of the sample. A number of distances on X are treated. All of them have an information theoretic interpretation, reflecting the information that the sampling mechanism provides about which variants have a selective advantage and therefore appear more frequently compared to prior expectations. This includes distances on X based on mutual information, conditional mutual information, active information, and functional information. The functional information distance is singled out as particularly useful. It is simple and has intuitive interpretations in terms of 1) a rejection sampling mechanism, in which functional entities are retained and non-functional categories are censored, and 2) evolutionary waiting times. The functional information is also a quasi-metric on X, with information being measured in an asymmetric, mountainous landscape. This quasi-metric property is also retained for a robustified version of the functional information distance that allows for mutations in the sampling mechanism. The functional information quasi-metric has been applied successfully to bioinformatics data sets, for proteins and sequence alignment of protein families.	en_US
dc.identifier.citation	Thorvaldsen, Hössjer. Use of directed quasi-metric distances for quantifying the information of gene families. Biosystems (Amsterdam. Print). 2024	en_US
dc.identifier.cristinID	FRIDAID 2279886
dc.identifier.doi	10.1016/j.biosystems.2024.105256
dc.identifier.issn	0303-2647
dc.identifier.issn	1872-8324
dc.identifier.uri	https://hdl.handle.net/10037/34834
dc.language.iso	eng	en_US
dc.publisher	Elsevier	en_US
dc.relation.journal	Biosystems (Amsterdam. Print)
dc.rights.accessRights	openAccess	en_US
dc.rights.holder	Copyright 2024 The Author(s)	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0	en_US
dc.rights	Attribution 4.0 International (CC BY 4.0)	en_US
dc.title	Use of directed quasi-metric distances for quantifying the information of gene families	en_US
dc.type.version	publishedVersion	en_US
dc.type	Journal article	en_US
dc.type	Tidsskriftartikkel	en_US
dc.type	Peer reviewed	en_US

Associated file(s)

Name: article.pdf
Size: 1.279 MB
Format: PDF

This item appears in the following collection(s)

Articles, reports and other (teacher education and pedagogy)

Unless otherwise stated, this item is licensed under Attribution 4.0 International (CC BY 4.0).