Small dataset solves big problem: An outlier-insensitive binary classifier for inhibitory potency prediction

Authors:

Highlights:


Abstract

Nicotinamide phosphoribosyltransferase (NAMPT) inhibitors are important in cancer treatment, and selecting compounds from a library by inhibitory potency for further experiments is considered the main route to drug discovery. Computational methods are also widely used to accelerate this process. We therefore propose a machine learning model that needs to be trained only on an extremely small dataset to predict the inhibition constant (Ki) and half maximal inhibitory concentration (IC50) of a compound. The key idea is to rank compounds directly by inhibitory potency by solving a simpler binary classification problem, since drug screening requires only the relative ranks of the inhibitors. To this end, we develop an adaptive data augmentation method that effectively captures the relative information between compounds in the original dataset. However, outliers in small samples are difficult to detect and can severely distort the distribution learned by the classifier. We therefore propose an outlier-insensitive classifier with an effective feature selection module for the one-to-all classification task. Extensive experiments show that our model ranks compounds by inhibitory potency with high and reliable accuracy. These results demonstrate that the proposed model reliably prioritizes chemicals for experimental research and analysis through a ligand-based in silico approach.
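The abstract does not specify the augmentation or the classifier. A common way to reduce ranking to binary classification is the pairwise transform used by RankSVM-style methods: every ordered pair of compounds becomes one training example whose label states which compound is more potent. This turns n compounds into up to n(n−1) examples. Below is a minimal sketch, assuming hypothetical numeric descriptor vectors and a plain logistic-regression classifier trained on feature differences. It is not the authors' adaptive augmentation, outlier-insensitive classifier, or feature selection module, and the names `make_pairs`, `train_logistic` and `rank` are illustrative only.

```python
import math
from itertools import permutations


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_pairs(features, potency):
    """Pairwise transform (hypothetical helper): n compounds -> up to n*(n-1) pairs.
    Label 1 means compound i is more potent than j, i.e. has a lower Ki/IC50."""
    X, y = [], []
    for i, j in permutations(range(len(features)), 2):
        if potency[i] == potency[j]:
            continue  # ties carry no ordering information
        X.append([a - b for a, b in zip(features[i], features[j])])
        y.append(1 if potency[i] < potency[j] else 0)
    return X, y


def train_logistic(X, y, lr=0.1, epochs=500):
    # No bias term: with difference features, swapping i and j should flip the
    # prediction, and a bias would break that antisymmetry.
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wk * xk for wk, xk in zip(w, xi)))
            g = p - yi  # gradient of the log-loss w.r.t. the logit
            w = [wk - lr * g * xk for wk, xk in zip(w, xi)]
    return w


def rank(features, w):
    # P(i beats j) = sigmoid(w . (x_i - x_j)), so sorting by w . x gives a
    # consistent potency order (most potent first).
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda k: -scores[k])


# Toy data (made up): one descriptor per compound, Ki in nM.
compounds = [[1.0], [2.0], [3.0], [4.0]]
ki_nm = [400.0, 300.0, 200.0, 100.0]
X, y = make_pairs(compounds, ki_nm)
w = train_logistic(X, y)
order = rank(compounds, w)
print(order)  # -> [3, 2, 1, 0]
```

Because only the order of Ki or IC50 values enters the labels, the sketch needs no calibrated regression of the potency values themselves, which matches the abstract's point that screening needs only relative ranks.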

Keywords: Drug screening, Inhibitory potency prediction, Machine learning, Outlier-insensitive learning, Feature selection

Article history: Received 14 February 2022, Revised 7 June 2022, Accepted 8 June 2022, Available online 15 June 2022, Version of Record 24 June 2022.

Publisher page: https://doi.org/10.1016/j.knosys.2022.109242