Hierarchical clustering split for low-bias evaluation of drug-target interaction prediction

Peizhen Bai, Filip Miljković, Yan Ge, Nigel Greene, Bino John, Haiping Lu

December 2021

Abstract

Drug-target interaction (DTI) prediction is important in drug discovery and chemogenomics studies. Machine learning, particularly deep learning, has advanced this area significantly over the past few years. However, a significant gap remains between the performance reported in academic papers and that achieved in practical drug discovery settings; for example, the random-split-based evaluation strategy tends to be overly optimistic in estimating prediction performance in real-world settings. This performance gap is largely due to hidden data bias in experimental datasets and inappropriate data splits. In this paper, we construct a low-bias DTI dataset and study more challenging data split strategies to improve performance evaluation for real-world settings. Specifically, we study the data bias in a popular DTI dataset, BindingDB, and re-evaluate the prediction performance of three state-of-the-art deep learning models using five different data split strategies: random split, cold drug split, scaffold split, and two hierarchical-clustering-based splits. In addition, we comprehensively examine six performance metrics. Our experimental results confirm the overoptimism of the popular random split and show that hierarchical-clustering-based splits are far more challenging and can provide a potentially more useful assessment of model generalizability in real-world DTI prediction settings.
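
To illustrate the general idea of a clustering-based split, the sketch below groups drugs by structural similarity and then assigns whole clusters to either the training or the test set, so that no test drug has a close analogue in training. It is a minimal illustration and not the procedure used in the paper: the toy fingerprints (sets of integers), the Jaccard distance, the single-linkage criterion, the 0.6 distance threshold, and the function names `single_linkage_clusters` and `cluster_split` are all assumptions made for this example.

```python
import random
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |intersection| / |union| for two feature sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def single_linkage_clusters(features, threshold):
    """Cut a single-linkage hierarchy at a fixed distance threshold.

    Cutting a single-linkage dendrogram at height t yields the connected
    components of the graph that links items closer than t, so a
    union-find over all pairs is sufficient for this toy example.
    """
    ids = list(features)
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if jaccard_distance(features[a], features[b]) < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def cluster_split(pairs, drug_features, threshold=0.6, test_fraction=0.3, seed=0):
    """Assign whole drug clusters to the test set until the quota is met."""
    clusters = single_linkage_clusters(drug_features, threshold)
    rng = random.Random(seed)
    rng.shuffle(clusters)

    quota = test_fraction * len(drug_features)
    test_drugs = set()
    for cluster in clusters:
        if len(test_drugs) >= quota:
            break
        test_drugs.update(cluster)

    train = [p for p in pairs if p[0] not in test_drugs]
    test = [p for p in pairs if p[0] in test_drugs]
    return train, test


# Hypothetical toy data: drug id -> set of fingerprint bits.
drug_features = {
    "d1": {1, 2, 3, 4},
    "d2": {1, 2, 3, 5},
    "d3": {10, 11, 12},
    "d4": {10, 11, 13},
    "d5": {20, 21},
}
# Hypothetical (drug, target, label) interaction records.
pairs = [
    ("d1", "t1", 1), ("d2", "t1", 0), ("d3", "t2", 1),
    ("d4", "t1", 1), ("d5", "t2", 0), ("d1", "t2", 0),
]

train, test = cluster_split(pairs, drug_features)
train_drugs = {p[0] for p in train}
test_drugs = {p[0] for p in test}
```

With this toy data, `d1`/`d2` and `d3`/`d4` fall into the same clusters and are therefore never separated across the split, whereas a random split could place `d1` in training and its near-duplicate `d2` in the test set. For real molecules, a cheminformatics toolkit and a standard hierarchical clustering routine would replace the toy fingerprints and the union-find shortcut.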

Type

Conference paper

Publication

IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Peizhen Bai

PhD Student (now a Senior Machine Learning Scientist at AstraZeneca)

Filip Miljković

Associate Principal AI Scientist at AstraZeneca

Yan Ge

PhD Student (now a Lecturer in Financial Technology at University of Bristol)

Haiping Lu

Director of the UK Open Multimodal AI Network, Professor of Machine Learning, and Head of AI Research Engineering