
Senior Data Scientist · PhD · Computational Biology
Data science, ML & drug discovery - and making these tools accessible to everyone.
I'm a Senior Data Scientist bridging computational biology and experimental research, with expertise in machine learning, bioinformatics, and tool development across cancer genomics, immunology, and drug discovery. I think a lot about how AI is changing science - and about making sure those changes reach every researcher, not just those with a computer science or machine learning background. My work tries to do that: building practical tools, leading interdisciplinary teams, and closing the gap between what these models can do and what most researchers can actually access.
About me
I'm a Senior Data Scientist at Nexus BioQuest, a contract research organisation in Portishead, Bristol, where I work across data science, machine learning, and analytical tool development in support of research programmes spanning pharmaceutical and biotech clients.
My PhD at the University of Bristol, funded by a competitive Cancer Research UK studentship, focused on predicting the functional impact of genetic variants in cancer genomes. I've since worked at Roche, exploring protein language models and antibody optimisation, and I've published peer-reviewed papers on variant effect prediction.
I've also led interdisciplinary teams to back-to-back hackathon victories at Cambridge and the Wellcome Collection, and co-organised Bristol's first AI in Health meeting, which secured two interdisciplinary research grants. The best science I've been part of has come from people with genuinely different mental models trying to solve the same problem. It generates questions that nobody in a single-discipline team would ever think to ask, and occasionally, those questions lead somewhere important.

A perspective on AI in science
The tools exist, and the clinical evidence is starting to follow. The real challenge now is getting them out of research papers and into the hands of the scientists who could use them to find new medicines, and opening up possibilities for researchers who don't even know what they're missing yet.
"Some of the most powerful tools in the history of biology are sitting in research papers and GitHub repos that most bench scientists have never heard of. Building the bridges to get them there, through upskilling and through teams where computational and experimental scientists actually learn from each other, is how I think we find the next generation of medicines."
AlphaFold transformed protein structure prediction in 2021, making accurate structural models accessible for hundreds of millions of proteins where determining a structure had previously required years of experimental work. AI-designed drugs are now reaching clinical trials. The industry is adapting, with pharma companies partnering with specialist AI firms and embedding AI infrastructure directly into their R&D pipelines. The pace of change is accelerating.
But using these tools well still requires an unusual combination of skills - enough ML to run and adapt the models, enough compute to work with them, and enough domain knowledge to ask the right questions. Most biologists have one of those things, maybe two. Closing that gap - whether through upskilling or building teams that complement each other's strengths - is something I think the field needs to take seriously. I came into data science from the biology side, so I know what it feels like to have a question you can't answer because the tools are out of reach. That's what drives most of what I build and write about.
The deeper analysis - the partnerships, what the clinical evidence actually shows, how infrastructure investments could lead to more breakthroughs, and the honest open questions about whether any of this produces better drugs - is in the blog.
Foundation models reshaping biology
Projects & Hackathons
A mix of published tools, hackathon projects, and pipelines built for real research problems.
Led the winning team at GetSeen Ventures' AI × Cancer Bio Hackathon. Used transformer encoders on SMILES strings and high-content image embeddings from the RxRx3-core dataset to predict molecular pathways. Ongoing collaboration likely to result in publication.
Led the winning team at the Roche & HDR Hackathon. Encoded protein sequences with pre-trained language models (ESM, AntiBERT) and explored CNNs to model sequence-function relationships using deep mutational scanning (DMS) data from ProteinGym. Secured a Roche AI internship as a direct result.
Participated in the Alan Turing Institute's Data Study Group in collaboration with Ignota Labs, working on machine learning approaches to predict drug toxicity. Contributed to the team summary and the published technical report.
Built a post-acquisition flow cytometry analysis pipeline with an intuitive Streamlit interface for processing and visualising high-dimensional cytometry data, enabling efficient downstream reporting across client projects.
Published a data mining toolkit integrating molecular annotations for single nucleotide variants (SNVs), creating a centralised resource that reduces redundancy and accelerates machine learning model development for variant effect prediction.
At Roche pRED, used TensorFlow models grounded in global epistasis and pre-trained protein language models to predict binding affinity from deep mutational scanning data.
Currently developing a pipeline to automate ELISA data processing and reporting, reducing manual effort across client projects. I'm also mentoring a team member through the project as part of their coding development, turning it into a hands-on introduction to Python and scientific data pipelines.
Co-organised Bristol's first interdisciplinary AI in Health Meeting in collaboration with the Elizabeth Blackwell Institute. Facilitated cross-disciplinary collaboration that resulted in two interdisciplinary grants for applied AI projects.

Skills & Tools
Picked up across academia, industry, and a few hackathons.
Publications
Three peer-reviewed papers and one technical report, spanning cancer genomics, variant effect prediction, and drug discovery.
Get in touch
If you're in Bristol, working on something in this space, or just want to grab a coffee and talk through a problem, feel free to get in touch. I'm always up for connecting with others in the field and sharing knowledge.
