
Network-based Machine Learning Approach for Structural Domain Identification in Proteins

EasyChair Preprint no. 4969

4 pages. Date: February 3, 2021

Abstract

In the era of structural genomics, with a large number of protein structures becoming available, domain identification is an important problem in protein function analysis because it forms the first step in protein classification. The proposed network-based machine learning approach, NML-DIP, combines supervised (SVM) and unsupervised (k-means) machine learning techniques for domain identification in proteins. The algorithm first represents the protein structure as a protein contact network and uses topological properties, viz., length, density, and interaction strength (which assesses inter- and intra-domain interactions), as the feature vector for a first SVM that distinguishes single-domain from multi-domain proteins. A second SVM then identifies the number of domains in multi-domain proteins, so the method does not require prior knowledge of the number of domains. The domain boundaries are identified using the k-means algorithm and confirmed against CATH annotation. The performance of the proposed algorithm is evaluated on four benchmark datasets and compared with four state-of-the-art domain identification methods. Its performance is comparable to that of other domain identification tools, and it works well even when the domains are non-contiguous. Available at: https://bit.ly/NML-DIP.
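The pipeline described in the abstract can be illustrated with a minimal sketch; it is not the NML-DIP implementation. It assumes residues are given as C-alpha coordinates, that two residues are in contact when they lie within a 7 Å cutoff, and that k-means is run directly on the coordinates; none of these choices is stated in the abstract. The two SVM stages are omitted, so the number of domains `k` is supplied by hand. All function names are hypothetical, and the intra/inter contact count is only a stand-in for the paper's interaction strength, whose definition is not given here.

```python
import numpy as np


def contact_network(coords, cutoff=7.0):
    """Boolean adjacency matrix: residues i and j are linked when their
    distance is at most `cutoff` (assumed value, in angstroms)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = dist <= cutoff
    np.fill_diagonal(adj, False)  # no self-contacts
    return adj


def network_density(adj):
    """Fraction of possible residue pairs that are in contact."""
    n = adj.shape[0]
    if n < 2:
        return 0.0
    # adj counts every undirected edge twice, so this equals 2E / (n(n-1)).
    return float(adj.sum()) / (n * (n - 1))


def kmeans_labels(points, k, n_iter=100):
    """Plain Lloyd's k-means with a deterministic farthest-point start."""
    points = np.asarray(points, dtype=float)
    centers = [points[0]]
    for _ in range(1, k):
        # Next centre: the point farthest from all centres chosen so far.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
    return dist.argmin(axis=1)


def contact_counts(adj, labels):
    """Number of contacts within clusters and between clusters."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    intra = int((adj & same).sum()) // 2
    inter = int((adj & ~same).sum()) // 2
    return intra, inter


# Toy "protein": two well-separated synthetic blobs of 20 residues each.
rng = np.random.default_rng(0)
blob_a = rng.normal(scale=2.0, size=(20, 3))
blob_b = rng.normal(scale=2.0, size=(20, 3)) + np.array([50.0, 0.0, 0.0])
coords = np.vstack([blob_a, blob_b])

adj = contact_network(coords)
labels = kmeans_labels(coords, k=2)
print("density:", network_density(adj))
print("intra/inter contacts:", contact_counts(adj, labels))
```

On real structures, the coordinates would come from a PDB file, and the feature vector and the choice of `k` would be handled by the two trained SVMs rather than set by hand.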

Keyphrases: graph theory, K-means, Structural domain identification in proteins, SVM

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@Booklet{EasyChair:4969,
  author = {Anirudh Tiwari and Nita Parekh},
  title = {Network-based Machine Learning Approach for Structural Domain Identification in Proteins},
  howpublished = {EasyChair Preprint no. 4969},

  year = {EasyChair, 2021}}