Use of directed quasi-metric distances for quantifying the information of gene families
Permanent link
https://hdl.handle.net/10037/34834
Date
2024-06-12
Type
Journal article
Peer reviewed
Abstract
A major obstacle to analyzing information in genetic or protein sequence data has been the lack of a mathematical
framework for doing so. In this paper, we present a multinomial probability space X as a general foundation for
multicategory discrete data, where categories refer to variants/alleles of biosequences. The external information
that is infused in order to generate a sample of such data is quantified as a distance on X between the prior
distribution of data and the empirical distribution of the sample. A number of distances on X are treated. All of
them have an information theoretic interpretation, reflecting the information that the sampling mechanism
provides about which variants have a selective advantage and therefore appear more frequently compared to
prior expectations. This includes distances on X based on mutual information, conditional mutual information,
active information, and functional information. The functional information distance is singled out as particularly
useful. It is simple and has intuitive interpretations in terms of 1) a rejection sampling mechanism, where
functional entities are retained, whereas non-functional categories are censored, and 2) evolutionary waiting
times. The functional information is also a quasi-metric on X, with information being measured in an asymmetric,
mountainous landscape. This quasi-metric property is also retained for a robustified version of the
functional information distance that allows for mutations in the sampling mechanism. The functional information
quasi-metric has been applied successfully to bioinformatics data sets, for proteins and sequence alignment
of protein families.
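As a rough illustration of the rejection sampling and waiting time interpretations mentioned in the abstract, the sketch below uses the common definition of functional information as the negative base-2 logarithm of the prior probability of the functional categories. This definition, the toy prior, and the names `functional_information` and `rejection_sample` are assumptions made for illustration; the abstract does not state the paper's exact formula for the distance, so this should not be read as the authors' method.

```python
import math
import random


def functional_information(prior, functional):
    """Return -log2 of the prior probability mass on the functional categories.

    Assumed definition (not taken from the abstract): information in bits
    gained by learning that a draw from `prior` is functional.
    """
    p_func = sum(prior[c] for c in functional)
    return -math.log2(p_func)


def rejection_sample(prior, functional, rng):
    """Draw categories from `prior` until a functional one appears.

    Non-functional draws are discarded (censored). Returns the retained
    category and the number of draws it took (the waiting time).
    """
    categories = list(prior)
    weights = [prior[c] for c in categories]
    draws = 0
    while True:
        draws += 1
        c = rng.choices(categories, weights=weights, k=1)[0]
        if c in functional:
            return c, draws


# Hypothetical toy prior over four variants; "c" and "d" are functional.
prior = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
functional = {"c", "d"}

fi_bits = functional_information(prior, functional)
# The waiting time is geometric with success probability 2**(-fi_bits),
# so its expected value is 2**fi_bits draws.
expected_wait = 2 ** fi_bits
```

In this toy case the functional categories carry prior mass 0.25, so the functional information is 2 bits and the expected waiting time is 4 draws; calling `rejection_sample(prior, functional, random.Random(0))` repeatedly and averaging the draw counts should approach that value.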
Publisher
Elsevier
Citation
Thorvaldsen, Hössjer. Use of directed quasi-metric distances for quantifying the information of gene families. Biosystems (Amsterdam. Print). 2024
Copyright 2024 The Author(s)