On the design of a similarity function for sparse binary data with application on protein function annotation

Authors:

Highlights:

Abstract

Automatic protein function annotation is a challenging task that is fundamental in many medical applications. Indeed, the capability to predict whether a protein has a given function is a key step for disease understanding and drug design. For such reasons, many authors have proposed computational methods for protein function prediction. One key element present in many of these proposals is the similarity function, which is often used to compute the pairwise similarity between two proteins. It is commonly accepted that proteins with similar structures share the same function. Nevertheless, no previous work has focused on proposing a similarity function that is specifically designed for protein function annotation. In this work, we analyze the best similarity functions for the protein function annotation task and propose a new one. We performed experiments in a simple pairwise similarity scenario and also using our proposal as part of a more complex protein function annotation method. Based on the results, we can state that our proposal is a valid alternative as a building block for many protein function annotation methods.

Keywords: Similarity function, Protein function annotation, Protein structure, Sparse data

Article history: Received 30 June 2021, Revised 24 September 2021, Accepted 2 December 2021, Available online 14 December 2021, Version of Record 27 December 2021.

Paper URL: https://doi.org/10.1016/j.knosys.2021.107863