Health Informatics · Data Lakes · Graph Algorithms · EHR · FHIR
Graduate Student · 2025 · Completed
Description
Modern developments in healthcare data collection and management aim to improve patient outcomes. Consider type 1 diabetes (T1D) patients, who benefit from enhanced continuous glucose monitoring (CGM) devices. These devices provide time-sensitive alerts and automate data collection for patient-practitioner retrospectives, greatly improving patients’ ability to manage their disease. However, CGM vendors often package their devices with proprietary software and rely on third-party infrastructure. This leads to interoperability challenges across clinical systems, fragmented workflows, and increased cognitive burden for clinicians, who grapple with a heterogeneous device landscape encumbered by data silos. This work begins to address these concerns by analysing T1D patient data pathways through process modelling. By translating a traditional swimlane (business process) diagram into a formal graph-based representation, we propose a technique for identifying data-flow bottlenecks in complex multi-actor networks. We then discuss an approach to centralised ingestion of CGM data based on a data lake-like architecture (inspired by the Scottish DataLoch initiative). We hope the investigation offers a fresh perspective on some of the many issues arising from the fragmented nature of the healthcare industry.
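To make the graph-based idea concrete, the sketch below encodes a small swimlane-style data pathway as a directed graph and scores each intermediate actor by the share of source-to-target paths that pass through it. This path-share measure is a stand-in chosen for illustration, not the technique proposed in the work, and the actor names and edges are hypothetical.

```python
from collections import Counter

def simple_paths(graph, source, target, path=None):
    """Yield every cycle-free path from source to target in a directed graph."""
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for nxt in graph.get(source, []):
        if nxt not in path:  # skip nodes already visited on this path
            yield from simple_paths(graph, nxt, target, path)

def path_share(graph, source, target):
    """Fraction of source-to-target paths that traverse each intermediate node."""
    paths = list(simple_paths(graph, source, target))
    if not paths:
        return {}
    counts = Counter(node for p in paths for node in p[1:-1])
    return {node: counts[node] / len(paths) for node in counts}

# Hypothetical T1D data pathway: each key sends data to the actors in its list.
pathway = {
    "cgm_device": ["patient_app"],
    "patient_app": ["vendor_cloud"],
    "vendor_cloud": ["clinic_portal", "ehr"],
    "clinic_portal": ["clinician"],
    "ehr": ["clinician"],
}

share = path_share(pathway, "cgm_device", "clinician")
bottlenecks = sorted(node for node, s in share.items() if s == 1.0)
print(bottlenecks)
```

An actor with a share of 1.0 lies on every route from device to clinician, so any delay or outage there stalls the whole flow. A centrality measure such as betweenness would be a natural alternative on larger graphs.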
Health Informatics · Literature Review · Deep Learning · Genomics · Polygenic Risk Scores
Graduate Student · 2025 · Completed
Description
This work reviews traditional genome-wide association studies (GWAS) and weighted polygenic risk scores (PRS) as methods for predicting the onset of Alzheimer’s disease (AD), then examines machine learning (ML) and deep learning (DL) approaches. The reviewed studies use random forests, support vector machines, and various neural network architectures. We identify challenges that recur across the surveyed studies, including dataset diversity, model explainability, and regulatory compliance. The work concludes by cautiously proposing a multi-phase framework for adopting selected ML and DL methods into existing NHS genomic testing pipelines over a seven-year timeline, emphasising quality control, SHAP-based interpretability, and robust validation before any scaled deployment.
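As a reminder of what a weighted PRS computes, the sketch below sums each variant's risk-allele count (0, 1, or 2) multiplied by its GWAS effect size. The variant names and weights are made up for illustration and are not taken from the reviewed studies; skipping variants that were not genotyped is one possible convention among several.

```python
def weighted_prs(dosages, weights):
    """Sum effect-size-weighted risk-allele counts over variants present in both maps."""
    return sum(weights[snp] * dosages[snp] for snp in weights if snp in dosages)

# Hypothetical per-variant effect sizes (e.g. log odds ratios from a GWAS).
weights = {"snp_a": 0.5, "snp_b": -0.25, "snp_c": 0.125}

# Hypothetical genotype for one individual: risk-allele count per variant.
individual = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = weighted_prs(individual, weights)
print(score)
```

In practice the raw score is usually standardised against a reference population before being interpreted as relative risk.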
Databases · Query Optimisation · SQLite · Python · Bash · LaTeX
Graduate Student · 2024 · Completed
Description
This work examines SQLite's query optimiser by running three pairs of semantically equivalent queries against progressively larger instances of the Mondial database. The pairs cover subquery flattening and co-routine usage, the performance impact of DISTINCT versus a self-JOIN with GROUP BY, and index utilisation for filtered selections. Hypotheses are stated before experimentation, and each pair is assessed using query execution plans and timing measurements.
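The general shape of such an experiment can be sketched with Python's built-in sqlite3 module: capture the EXPLAIN QUERY PLAN output and a wall-clock timing for the same filtered selection before and after adding an index. The table, data, and index name below are hypothetical stand-ins rather than the Mondial schema or the queries used in the work.

```python
import sqlite3
import time

def query_plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]

def timed(conn, sql, params=()):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return rows, time.perf_counter() - start

# Hypothetical table populated with synthetic rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, country TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [(f"city{i}", f"C{i % 50}", i * 10) for i in range(5000)],
)

sql = "SELECT name FROM city WHERE country = ? ORDER BY name"

# Without an index the filter requires a full table scan.
plan_before = query_plan(conn, sql, ("C7",))
rows_before, secs_before = timed(conn, sql, ("C7",))

# With an index on the filtered column the optimiser can search instead.
conn.execute("CREATE INDEX idx_city_country ON city(country)")
plan_after = query_plan(conn, sql, ("C7",))
rows_after, secs_after = timed(conn, sql, ("C7",))

print(plan_before, secs_before)
print(plan_after, secs_after)
```

The exact wording of the plan text varies between SQLite versions, and a single timing is noisy, so a real comparison would repeat each query several times and report a summary statistic.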