Toward a Collaborative Platform for Hybrid Designs Sharing a Common Cohort
Permanent link
https://hdl.handle.net/10037/32243
Date
2024-01-18
Type
Doctoral thesis
Author
Arnes, Jo Inge
Abstract
This doctoral thesis binds four included papers into a thematic whole and is simultaneously an independent work proposing a platform that facilitates epidemiological research.
Population-based prospective cohort studies typically recruit a relatively large group of participants representative of a studied population and follow them over years or decades. This group of participants is called a cohort. As part of the study, the participants may be asked to answer extensive questionnaires, undergo medical examinations, donate blood samples, and participate in several rounds of follow-ups. The collected data can also include information from other sources, such as health registers. In prospective cohort studies, the participants initially do not have the investigated diagnoses, but statistically, a certain percentage will be diagnosed with a disease yearly. The studies enable the researchers to investigate how those who got a disease differ from those who did not. Often, many new studies can be nested within a cohort study. Data for a subgroup of the cohort is then selected and analyzed. A new study combined with an existing cohort is said to have a hybrid design.
When a research group uses the same cohort as a basis for multiple new studies, these studies often have similarities regarding the workflow for designing the study and analysis. The thesis shows the potential for developing a platform encouraging the reuse of work from previous studies and systematizing the study design workflows to enhance time efficiency and reduce the risk of errors.
However, the study data are subject to strict laws and regulations pertaining to privacy and research ethics. Therefore, the data must be stored and accessed within a secured IT environment where researchers log in to conduct analyses, with minimal possibilities to install analytics software not already provided by default. Further, transferring the data from the secured IT environment to a local computer or a public cloud is prohibited. Nevertheless, researchers can usually upload and run script files, e.g., written in R and run in RStudio. A consequence is that researchers, who often have limited software engineering skills, may rely mainly on self-written code for their analyses. Such code may be developed unsystematically, with a high risk of errors and of reinventing solutions already developed in preceding studies within the group.
The thesis makes a case for a platform that provides collaboration software as a service (SaaS) addressing the challenges of the described research context, and it proposes the platform's architecture and design. Its main characteristic, and contribution, is the separation of concerns between the SaaS, which operates independently of the data, and a secured IT environment where data can be accessed and analyzed. The platform lets the researchers define the data analysis for the study using the cloud-based software. The definition is then automatically transformed into an executable version represented as source code in a scripting language already supported by the secure environment where the data resides.
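The separation of concerns described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the `AnalysisSpec` class, its fields, the `render_r_script` function, and the file and variable names are all hypothetical, and logistic regression is chosen only as an example analysis. The point is that the cloud side handles only a data-free description of the analysis and emits R source code, which a researcher could then upload and run inside the secure environment.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisSpec:
    """Hypothetical, data-free description of an analysis defined in the SaaS."""
    dataset_path: str        # path that only exists inside the secure environment
    outcome: str             # name of the outcome variable
    covariates: list[str]    # names of the covariates
    filters: list[str] = field(default_factory=list)  # R conditions selecting a subgroup


def render_r_script(spec: AnalysisSpec) -> str:
    """Turn the spec into R source code. No study data is read here."""
    lines = [
        "# Generated script: run inside the secure environment",
        f'data <- read.csv("{spec.dataset_path}")',
    ]
    # Each filter narrows the cohort to the subgroup used in the nested study.
    for condition in spec.filters:
        lines.append(f"data <- subset(data, {condition})")
    formula = f"{spec.outcome} ~ {' + '.join(spec.covariates)}"
    lines.append(f"model <- glm({formula}, data = data, family = binomial())")
    lines.append("print(summary(model))")
    return "\n".join(lines) + "\n"


# Example definition (all names are assumptions made for illustration).
spec = AnalysisSpec(
    dataset_path="cohort.csv",
    outcome="case",
    covariates=["age", "bmi"],
    filters=["age >= 40"],
)
script = render_r_script(spec)
print(script)
```

Because the generator only manipulates variable names and conditions, it never needs access to participant records; only the generated text crosses into the secure environment.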
The author has not found other systems that solve the same problem in a similar way. However, the work is informed by cloud computing, workflow management systems, data analysis pipelines, low-code, no-code, and model-driven development.
Has part(s)
Paper I: Arnes, J.I. & Bongo, L.A. (2020). The Beauty of Complex Designs. In: E. Lund (Ed.), Advancing Systems Epidemiology in Cancer (pp. 23–47). Scandinavian University Press (Universitetsforlaget). Also available in Munin at https://hdl.handle.net/10037/31365.
Paper II: Arnes, J.I., Hapfelmeier, A. & Horsch, A. (2022). Autostrata: Improved Automatic Stratification for Coarsened Exact Matching. Proceedings of the 18th Scandinavian Conference on Health Informatics. Linköping Electronic Conference Proceedings, 179-186. Also available in Munin at https://hdl.handle.net/10037/31386.
Paper III: Arnes, J.I., Hapfelmeier, A., Horsch, A. & Braaten, T. Greedy Knot Selection Algorithm for Restricted Cubic Spline Regression. (Submitted manuscript). Now published in Frontiers in Epidemiology, 3, 2023, 1283705, available at https://doi.org/10.3389/fepid.2023.1283705.
Paper IV: Arnes, J.I. & Horsch, A. Schema-Based Priming of Large Language Model for Data Object Validation Compliance. (Manuscript under review). Preprint also available at SSRN: http://dx.doi.org/10.2139/ssrn.4453361.
Publisher
UiT The Arctic University of Norway