ePrints@IISc

Filling-in Void and Sparse Regions in Protein Sequence Space by Protein-Like Artificial Sequences Enables Remarkable Enhancement in Remote Homology Detection Capability

Mudgal, Richa and Sowdhamini, Ramanathan and Chandra, Nagasuma and Srinivasan, Narayanaswamy and Sandhya, Sankaran (2014) Filling-in Void and Sparse Regions in Protein Sequence Space by Protein-Like Artificial Sequences Enables Remarkable Enhancement in Remote Homology Detection Capability. In: JOURNAL OF MOLECULAR BIOLOGY, 426 (4). pp. 962-979.

PDF: jou_mol_bio_426_4_962_2014.pdf (Published Version, 2MB). Restricted to registered users only.
Official URL: http://dx.doi.org/10.1016/j.jmb.2013.11.026


Protein functional annotation relies on identifying accurate relationships between proteins, with sequence divergence being a key complicating factor. This is especially evident when distant protein relationships can be demonstrated only through three-dimensional structures. To address this challenge, we describe a computational approach that purposefully bridges gaps between related protein families through the directed design of protein-like "linker" sequences. We represented SCOP domain families, integrated with their sequence homologues, as multiple profiles and performed HMM-HMM alignments between related domain families. Where convincing alignments were obtained, we applied a roulette wheel-based method to design 3,611,010 protein-like sequences corresponding to 374 SCOP folds. To assess their ability to link proteins in homology searches, we used 3024 queries to search two databases: one containing only natural sequences and another additionally containing the designed sequences. Searches of the augmented database showed up to 30% improvement in fold coverage for over 74% of the folds, with 52 folds achieving all theoretically possible connections. Although sequences could not be designed between some families, the availability of designed sequences between other families within the same fold established a sequence continuum that demonstrated 373 difficult relationships. Finally, as a practical and realistic extension, we demonstrate that such protein-like sequences can be "plugged into" routine, generic sequence database searches to enhance not only remote homology detection but also fold recognition. Our statistically well-supported findings show that complementary searches of both databases will increase the effectiveness of sequence-based searches in recognizing all homologues sharing a common fold. (C) 2013 Elsevier Ltd. All rights reserved.
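The abstract names a roulette wheel-based design step but does not describe it. The following is a minimal sketch of the general technique (fitness-proportionate, or roulette wheel, selection) applied to sequence design. It rests on three hypothetical assumptions. First, two related families have already been aligned column by column. Second, each column holds residue probabilities. Third, the two columns are simply averaged (`merge_columns`) before one residue is drawn per column. The function names and toy profiles are illustrative only. They are not the authors' pipeline, which also involves HMM-HMM alignment and filtering steps not shown here.

```python
import bisect
import itertools
import random


def roulette_pick(weights, rng):
    """Return index i with probability weights[i] / sum(weights); weights must be non-negative."""
    cumulative = list(itertools.accumulate(weights))  # right edge of each wheel slice
    if not cumulative or cumulative[-1] <= 0:
        raise ValueError("weights must be non-empty and sum to a positive value")
    total = cumulative[-1]
    spin = rng.random() * total  # landing point on the wheel, in [0, total)
    # First slice whose right edge lies beyond the landing point; zero-width slices are skipped.
    index = bisect.bisect_right(cumulative, spin)
    return min(index, len(weights) - 1)  # guard against floating-point edge cases


def merge_columns(col_a, col_b):
    """Hypothetical bridging rule: average two aligned columns' residue probabilities."""
    residues = sorted(set(col_a) | set(col_b))
    return {aa: 0.5 * col_a.get(aa, 0.0) + 0.5 * col_b.get(aa, 0.0) for aa in residues}


def design_sequence(profile, rng):
    """Draw one residue per column of a profile (a list of residue -> probability dicts)."""
    residues = []
    for column in profile:
        letters = list(column)
        weights = [column[aa] for aa in letters]
        residues.append(letters[roulette_pick(weights, rng)])
    return "".join(residues)


# Toy, made-up profiles for two "related families", already aligned column by column.
family_a = [{"A": 0.7, "G": 0.3}, {"L": 0.5, "I": 0.5}, {"K": 1.0}]
family_b = [{"A": 0.2, "S": 0.8}, {"L": 0.9, "V": 0.1}, {"R": 1.0}]

bridge = [merge_columns(a, b) for a, b in zip(family_a, family_b)]
rng = random.Random(42)  # seeded for reproducibility
for _ in range(3):
    print(design_sequence(bridge, rng))
```

Each designed sequence mixes residues favoured by either family at every aligned position, which is the intuition behind an intermediate "linker". Python's `random.choices(letters, weights=weights)` performs the same weighted draw in one call. The explicit cumulative-sum version is shown only to make the wheel visible.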

Item Type: Journal Article
Additional Information: Copyright for this article belongs to Academic Press Ltd-Elsevier Science Ltd, 24-28 Oval Rd, London NW1 7DX, England.
Keywords: remote homology detection; in silico protein design; protein evolution
Department/Centre: Division of Biological Sciences > Biochemistry
Division of Biological Sciences > Molecular Biophysics Unit
Division of Chemical Sciences > Materials Research Centre
Date Deposited: 26 Mar 2014 07:54
Last Modified: 26 Mar 2014 07:54
URI: http://eprints.iisc.ac.in/id/eprint/48722
