Projects

Brute Forcing Keypoints: BoVW vs CNN

Python · PyTorch · cuML · RAPIDS · OpenCV · CUDA · wandb

A comparative study benchmarking GPU-accelerated Bag-of-Visual-Words (BoVW) against CNNs on CIFAR-10/100. Scaled to 50M keypoints with cuML, traditional BoVW matches the performance of shallow networks.

View project →

PoSTACRED: Attention-GCN Relation Extraction

Python · PyTorch · HuggingFace Transformers · spaCy · GloVe

with Emma O'Brien, Rojs Aktumanis, Alexandros Michaelides

Explored whether PoS tags and dependency relations improve BERT-based relation extraction, and proposed an attention-augmented GCN.

View project →

Data Lakes for Pooling T1D Patients CGM Data to Enable EHR Integration and Prevent Data Silos: A Patient Pathway & Clinical Workflow Analysis

Health Informatics · Data Lakes · Graph Algorithms · EHR · FHIR

Graduate Student · 2025 · Completed

Description

Modern developments in healthcare data collection and management aim to improve patient outcomes. For instance, consider type 1 diabetes (T1D) patients, who benefit from enhanced continuous glucose monitoring (CGM) devices. These devices can provide time-sensitive alerts and facilitate automated data collection for patient-practitioner retrospectives, greatly improving patients’ ability to manage their disease. However, CGM vendors often package their devices with proprietary software and rely on third-party infrastructure. This leads to interoperability challenges across clinical systems, fragmented workflows, and increased cognitive burden for clinicians, who grapple with a heterogeneous device landscape encumbered by data silos. This work aims to begin addressing these concerns by first analysing T1D patient data pathways through process modelling. By translating a traditional swimlane (business process) diagram to a formal graph-based representation, we propose a technique for identifying data-flow bottlenecks in complex multi-actor networks. Following this, we discuss an approach to centralised ingestion of CGM data based on a data lake-like architecture (inspired by the Scottish DataLoch initiative). We hope our investigation provides a refreshing perspective on some of the many issues arising from the fragmented nature of the healthcare industry.
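As a rough illustration of the graph-based idea, the sketch below encodes a small data-flow network as a directed graph and flags the actors that every path from device to record must pass through. The actor names, the edges, and the "node on every path" criterion are all assumptions made for this example; they are not taken from the project's swimlane diagram and may differ from the technique the work proposes.

```python
from collections import Counter

# Hypothetical data-flow graph: each key sends data to the actors in its list.
FLOW = {
    "cgm_device": ["patient_app", "reader"],
    "patient_app": ["vendor_cloud"],
    "reader": ["vendor_cloud"],
    "vendor_cloud": ["clinician_portal"],
    "clinician_portal": ["ehr"],
    "ehr": [],
}


def simple_paths(graph, source, sink, path=None):
    """Yield every cycle-free path from source to sink (depth-first)."""
    path = (path or []) + [source]
    if source == sink:
        yield path
        return
    for nxt in graph.get(source, []):
        if nxt not in path:  # skip nodes already visited on this path
            yield from simple_paths(graph, nxt, sink, path)


def chokepoints(graph, source, sink):
    """Return intermediate nodes that lie on every source-to-sink path."""
    paths = list(simple_paths(graph, source, sink))
    # A node appears at most once per simple path, so a count equal to the
    # number of paths means the node is on all of them.
    counts = Counter(node for p in paths for node in p[1:-1])
    return sorted(n for n, c in counts.items() if paths and c == len(paths))


if __name__ == "__main__":
    print(chokepoints(FLOW, "cgm_device", "ehr"))
```

In this toy network, the vendor cloud and the clinician portal sit on every route to the EHR, which is the kind of single point of dependency a bottleneck analysis would want to surface.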

View project →

Deep Learning, Sequencing Technologies & Polygenic Scores: Alzheimer’s Disease Risk Prediction and Classification Review

Health Informatics · Literature Review · Deep Learning · Genomics · Polygenic Risk Scores

Graduate Student · 2025 · Completed

Description

This work reviews traditional genome-wide association studies (GWAS) and weighted polygenic risk scores (PRS) as methods for predicting the onset of Alzheimer’s Disease (AD), then examines machine learning (ML) and deep learning (DL) approaches. Reviewed studies include the use of random forests, support vector machines, and various neural network architectures. We identify persistent challenges encountered throughout the survey, including dataset diversity, model explainability, and regulatory compliance. The work concludes by cautiously proposing a multi-phase framework for clinical adoption of selected ML and DL methods into existing NHS genomic testing pipelines over a seven-year timeline, emphasising quality control, SHAP-based interpretability, and robust validation before any scaled deployment.
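For readers unfamiliar with weighted PRS, the standard form is a sum of per-variant effect sizes multiplied by the number of risk alleles an individual carries. The sketch below shows that calculation only; the function name and the numbers are illustrative and are not drawn from any study in the review.

```python
def weighted_prs(effect_sizes, dosages):
    """Weighted polygenic risk score: sum of effect size x risk-allele count.

    effect_sizes: per-variant weights (e.g. GWAS log odds ratios).
    dosages: risk-allele counts per variant for one individual (0, 1 or 2).
    """
    if len(effect_sizes) != len(dosages):
        raise ValueError("effect_sizes and dosages must have the same length")
    return sum(beta * dose for beta, dose in zip(effect_sizes, dosages))


if __name__ == "__main__":
    # Illustrative values only: three variants, one individual.
    betas = [0.5, -0.25, 1.0]
    genotype = [2, 1, 0]
    print(weighted_prs(betas, genotype))
```

The ML and DL approaches covered in the review can be read as replacing this fixed linear sum with a learned, possibly non-linear, function of the same genotype inputs.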

View project →

Query Optimisation of Semantically Equivalent Queries in SQLite3

Databases · Query Optimisation · SQLite · Python · Bash · LaTeX

Graduate Student · 2024 · Completed

Description

This work examines SQLite's query optimiser by executing three pairs of semantically equivalent queries across progressively larger instances of the Mondial database. We examine subquery flattening and co-routine usage, the performance impact of DISTINCT versus self-JOIN with GROUP BY, and index utilisation for filtered selections. Hypotheses are stated before experimentation, and each pair is assessed using query execution plans and timing measurements.
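A minimal sketch of how such a comparison can be set up with Python's built-in `sqlite3` module is shown below. The two-table schema is a toy stand-in rather than the real Mondial schema, and the DISTINCT / GROUP BY pair is an illustrative example, not one of the three pairs used in the study.

```python
import sqlite3
import time

# Toy stand-in for Mondial; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE city (name TEXT, country TEXT, population INTEGER);
    CREATE INDEX idx_city_country ON city(country);
""")
conn.executemany("INSERT INTO country VALUES (?, ?)",
                 [("A", "Aland"), ("B", "Bland")])
conn.executemany("INSERT INTO city VALUES (?, ?, ?)",
                 [("a1", "A", 10), ("a2", "A", 20), ("b1", "B", 30)])

# Two formulations that return the same rows.
QUERY_DISTINCT = "SELECT DISTINCT country FROM city ORDER BY country"
QUERY_GROUP_BY = "SELECT country FROM city GROUP BY country ORDER BY country"


def plan(connection, sql):
    """Return the human-readable steps from EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # fourth column holds the detail text


def timed(connection, sql, repeats=5):
    """Run a query several times; return its rows and the best wall time."""
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = connection.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best


if __name__ == "__main__":
    for label, sql in [("DISTINCT", QUERY_DISTINCT), ("GROUP BY", QUERY_GROUP_BY)]:
        rows, seconds = timed(conn, sql)
        print(label, rows, f"{seconds:.6f}s")
        for step in plan(conn, sql):
            print("   ", step)
```

The plan text depends on the SQLite version and on the available indexes, so it is best read alongside the timings rather than treated as fixed output.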

View project →