dc.contributor.advisor | Horsch, Alexander | |
dc.contributor.author | Arnes, Jo Inge | |
dc.date.accessioned | 2023-12-22T12:36:00Z | |
dc.date.available | 2023-12-22T12:36:00Z | |
dc.date.issued | 2024-01-18 | |
dc.description.abstract | <p>This doctoral thesis binds together four included papers into a thematic whole and is simultaneously an independent work proposing a platform that facilitates epidemiological research.
<p><i>Population-based prospective cohort studies</i> typically recruit a relatively large group of participants representative of a studied population and follow them over years or decades. This group of participants is called a <i>cohort</i>. As part of the study, the participants may be asked to answer extensive questionnaires, undergo medical examinations, donate blood samples, and participate in several rounds of follow-ups. The collected data can also include information from other sources, such as health registers. In prospective cohort studies, the participants initially do not have the investigated diagnoses, but statistically, a certain percentage will be diagnosed with a disease yearly. The studies enable the researchers to investigate how those who got a disease differ from those who did not. Often, many new studies can be nested within a cohort study. Data for a subgroup of the cohort is then selected and analyzed. A new study combined with an existing cohort is said to have a <i>hybrid design</i>.
<p>When a research group uses the same cohort as a basis for multiple new studies, these studies often have similarities regarding the workflow for designing the study and analysis. The thesis shows the potential for developing a platform encouraging the reuse of work from previous studies and systematizing the study design workflows to enhance time efficiency and reduce the risk of errors.
<p>However, the study data are subject to strict acts and regulations pertaining to privacy and research ethics. Therefore, the data must be stored and accessed within a secured IT environment where researchers log in to conduct analyses, with minimal possibilities to install analytics software not already provided by default. Further, transferring the data from the secured IT environment to a local computer or a public cloud is prohibited. Nevertheless, researchers can usually upload and run script files, e.g., written in R and run in RStudio. A consequence is that researchers, who often have limited software engineering skills, may rely mainly on self-written code for their analyses. Such code may be developed unsystematically, with a high risk of errors and of reinventing solutions already developed in preceding studies within the group.
<p>The thesis makes a case for a platform providing a collaboration <i>software as a service</i> (SaaS) addressing the challenges of the described research context and proposes its architecture and design. Its main characteristic, and contribution, is the separation of concerns between the SaaS, which operates independently of the data, and a secured IT environment where data can be accessed and analyzed. The platform lets the researchers define the data analysis for the study using the cloud-based software, which is then automatically transformed into an executable version represented as source code in a scripting language already supported by the secure environment where the data resides.
<p>The author has not found existing systems that solve the same problem in a similar way. However, the work is informed by cloud computing, workflow management systems, data analysis pipelines, low-code, no-code, and model-driven development. | en_US |
dc.description.doctoraltype | ph.d. | en_US |
dc.description.popularabstract | The doctoral thesis proposes a collaborative platform that could improve epidemiological research by making the design of studies and data analyses more efficient. The proposed platform is a cloud-based tool that would help researchers conduct multiple studies based on a single large group of participants, known as a cohort, in a systematic and less error-prone way. The tool would be particularly beneficial for researchers who have limited programming experience, as it offers automated and collaborative features. An important feature is that the platform allows data analysis to be defined in a public cloud while keeping sensitive information within a secure IT environment, in accordance with acts, regulations, and ethical guidelines for research. The thesis aims to serve as a foundation for further development in this cross-disciplinary field. | en_US |
dc.identifier.isbn | 978-82-8236-560-4 (print) | |
dc.identifier.isbn | 978-82-8236-561-1 (pdf) | |
dc.identifier.uri | https://hdl.handle.net/10037/32243 | |
dc.language.iso | eng | en_US |
dc.publisher | UiT Norges arktiske universitet | en_US |
dc.publisher | UiT The Arctic University of Norway | en_US |
dc.relation.haspart | <p>Paper I: Arnes, J.I. & Bongo, L.A. (2020). The Beauty of Complex Designs. In: E. Lund (Ed.), <i>Advancing Systems Epidemiology in Cancer</i> (pp. 23–47). Scandinavian University Press (Universitetsforlaget). Also available in Munin at <a href=https://hdl.handle.net/10037/31365>https://hdl.handle.net/10037/31365</a>.
<p>Paper II: Arnes, J.I., Hapfelmeier, A. & Horsch, A. (2022). Autostrata: Improved Automatic Stratification for Coarsened Exact Matching. <i>Proceedings of the 18th Scandinavian Conference on Health Informatics. Linköping Electronic Conference Proceedings</i>, 179-186. Also available in Munin at <a href=https://hdl.handle.net/10037/31386>https://hdl.handle.net/10037/31386</a>.
<p>Paper III: Arnes, J.I., Hapfelmeier, A., Horsch, A. & Braaten, T. Greedy Knot Selection Algorithm for Restricted Cubic Spline Regression. (Submitted manuscript). Now published in <i>Frontiers in Epidemiology, 3</i>, 2023, 1283705, available at <a href=https://doi.org/10.3389/fepid.2023.1283705>https://doi.org/10.3389/fepid.2023.1283705</a>.
<p>Paper IV: Arnes, J.I. & Horsch, A. Schema-Based Priming of Large Language Model for Data Object Validation Compliance. (Manuscript under review). Preprint also available at SSRN: <a href=http://dx.doi.org/10.2139/ssrn.4453361>http://dx.doi.org/10.2139/ssrn.4453361</a>. | en_US |
dc.rights.accessRights | openAccess | en_US |
dc.rights.holder | Copyright 2024 The Author(s) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-sa/4.0 | en_US |
dc.rights | Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) | en_US |
dc.subject | VDP::Matematikk og Naturvitenskap: 400::Informasjons- og kommunikasjonsvitenskap: 420 | en_US |
dc.subject | VDP::Mathematics and natural science: 400::Information and communication science: 420 | en_US |
dc.subject | VDP::Matematikk og Naturvitenskap: 400::Matematikk: 410::Statistikk: 412 | en_US |
dc.subject | VDP::Mathematics and natural science: 400::Mathematics: 410::Statistics: 412 | en_US |
dc.subject | VDP::Medisinske Fag: 700::Helsefag: 800::Samfunnsmedisin, sosialmedisin: 801 | en_US |
dc.subject | VDP::Medical disciplines: 700::Health sciences: 800::Community medicine, Social medicine: 801 | en_US |
dc.subject | VDP::Medisinske Fag: 700::Helsefag: 800::Epidemiologi medisinsk og odontologisk statistikk: 803 | en_US |
dc.subject | VDP::Medical disciplines: 700::Health sciences: 800::Epidemiology medical and dental statistics: 803 | en_US |
dc.title | Toward a Collaborative Platform for Hybrid Designs Sharing a Common Cohort | en_US |
dc.type | Doctoral thesis | en_US |
dc.type | Doktorgradsavhandling | en_US |