Network-based Machine Learning Approach for Structural Domain Identification in Proteins

EasyChair Preprint 4969

4 pages • Date: February 3, 2021

Abstract

In the era of structural genomics, with a large number of protein structures becoming available, identifying domains is an important problem in protein function analysis because it is the first step in protein classification. The proposed network-based machine learning approach, NML-DIP, combines supervised (SVM) and unsupervised (k-means) machine learning techniques for domain identification in proteins. The algorithm first represents the protein structure as a protein contact network and uses topological properties, viz., length, density, and interaction strength (which assesses inter- and intra-domain interactions), as the feature vector for a first SVM that distinguishes single-domain from multi-domain proteins. A second SVM identifies the number of domains in multi-domain proteins, so the method does not require prior knowledge of the number of domains. Domain boundaries are identified using the k-means algorithm and confirmed against CATH annotation. The performance of the proposed algorithm is evaluated on four benchmark datasets and compared with four state-of-the-art domain identification methods. Its performance is comparable to that of other domain identification tools, and it works well even when the domains are non-contiguous. Available at: https://bit.ly/NML-DIP.
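
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the NML-DIP implementation. It assumes that residues are represented by C-alpha coordinates, that two residues are in contact when they lie within a 7 Å cutoff, and that k-means is run directly on the coordinates; none of these choices is stated in the abstract. The function names, the toy coordinates, and the inter/intra contact ratio used as a stand-in for "interaction strength" are hypothetical. The two SVM stages are omitted, and the number of domains k is simply given.

```python
import math

def contact_network(coords, cutoff=7.0):
    """Residue pairs (i, j), i < j, closer than `cutoff` angstroms (assumed cutoff)."""
    edges = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.add((i, j))
    return edges

def density(n_nodes, edges):
    """Fraction of all possible residue pairs that are in contact."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible if possible else 0.0

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic, evenly spaced initial centres."""
    centres = [points[(2 * c + 1) * len(points) // (2 * k)] for c in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every residue to its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                  for p in points]
        # Recompute each centre as the mean of its members.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    return labels

def inter_intra_ratio(edges, labels):
    """Hypothetical proxy for interaction strength: inter- over intra-domain contacts."""
    inter = sum(1 for i, j in edges if labels[i] != labels[j])
    intra = len(edges) - inter
    return inter / intra if intra else float("inf")

# Toy "protein": two compact 8-residue blobs whose origins are 30 angstroms apart.
blob = [(x, y, z) for x in (0.0, 4.0) for y in (0.0, 4.0) for z in (0.0, 4.0)]
coords = blob + [(x + 30.0, y, z) for x, y, z in blob]

edges = contact_network(coords)
labels = kmeans(coords, k=2)
features = (len(coords), density(len(coords), edges),
            inter_intra_ratio(edges, labels))
print(labels)
print(features)
```

Because the clustering here acts on spatial coordinates rather than sequence positions, residues that are distant in sequence can still share a cluster, which is one plausible reason a k-means step can handle non-contiguous domains. In a fuller sketch, a tuple like `features` would be the input to the two SVM classifiers.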

Keyphrases: K-means, SVM, Structural domain identification in proteins, graph theory

Links:

https://easychair.org/publications/preprint/bVNg

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:4969,
  author    = {Anirudh Tiwari and Nita Parekh},
  title     = {Network-based Machine Learning Approach for Structural Domain Identification in Proteins},
  howpublished = {EasyChair Preprint 4969},
  year      = {EasyChair, 2021}}
