Siddalingappa, R and Kanagaraj, S (2023) A Novel ML Approach for Computing Missing Sift, Provean, and Mutassessor Scores in Tp53 Mutation Pathogenicity Prediction. In: International Journal of Advanced Computer Science and Applications, 14 (6). pp. 1038-1047.
Abstract
Cancer is often caused by missense mutations, in which a single-nucleotide substitution changes an amino acid and alters protein function. This study proposes a novel machine learning (ML) approach for computing missing values in the TP53 database for three computational scores: SIFT, Provean, and Mutassessor. The computed values are compared with those obtained by imputation. Using the computed values, an ML classification model trained on 80,406 samples achieves an accuracy of 85%, whereas the imputation method achieves 75%. The scores and statistics are used to classify samples into five classes: benign, likely pathogenic, possibly pathogenic, pathogenic, and variant of uncertain significance. Additionally, a comparative analysis on 58,444 samples evaluates six ML techniques, with the support vector machine run under two kernels. The accuracies obtained are: logistic regression (89%), k-nearest neighbor (99%), decision tree (95%), random forest (99.8%), support vector machine with polynomial kernel (91%), support vector machine with RBF kernel (84%), and deep neural networks (98.2%). These results demonstrate the effectiveness of the proposed ML approach for pathogenicity prediction.
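The contrast the abstract draws between model-computed values and plain imputation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the two feature columns and the target are hypothetical stand-ins for score columns, and a small k-nearest-neighbor regressor is swapped in because the abstract does not state which model the authors used to compute the missing scores. It is not the paper's method or data.

```python
import random
import statistics


def knn_predict(train, query, k=5):
    """Predict a missing score as the mean target of the k nearest training rows."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    return statistics.fmean(target for _, target in nearest)


# Synthetic rows (assumption): two observed features and one score that
# depends on them, plus a little noise.
rng = random.Random(0)
rows = []
for _ in range(300):
    x1, x2 = rng.random(), rng.random()
    y = 2.0 * x1 - x2 + rng.gauss(0.0, 0.05)
    rows.append(((x1, x2), y))

# Treat the last 60 scores as "missing"; their true values are kept only
# to measure how well each strategy recovers them.
observed, hidden = rows[:240], rows[240:]

# Strategy 1: mean imputation, the same fill value for every missing score.
mean_fill = statistics.fmean(y for _, y in observed)
mean_mae = statistics.fmean(abs(y - mean_fill) for _, y in hidden)

# Strategy 2: predict each missing score from the row's other features.
knn_mae = statistics.fmean(abs(y - knn_predict(observed, x)) for x, y in hidden)

print(f"mean imputation MAE: {mean_mae:.3f}")
print(f"KNN prediction MAE:  {knn_mae:.3f}")
```

On data where the missing column is related to the observed ones, a predictive fill tends to recover values more closely than a constant fill, which is the general motivation for comparing the two. The accuracies reported in the abstract come from the authors' own pipeline and cannot be reproduced from this sketch.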
Item Type: | Journal Article |
---|---|
Publication: | International Journal of Advanced Computer Science and Applications |
Publisher: | Science and Information Organization |
Additional Information: | The copyright for this article belongs to the author. |
Keywords: | Decision tree (DT); deep neural networks (DNN); imputation; k-nearest neighbor (KNN); logistic regression (LR); missense mutations; Mutassessor; pathogenicity; Provean; random forest (RF); SIFT; support vector machine (SVM). |
Department/Centre: | Division of Interdisciplinary Sciences > Computational and Data Sciences |
Date Deposited: | 01 Aug 2023 05:08 |
Last Modified: | 01 Aug 2023 05:08 |
URI: | https://eprints.iisc.ac.in/id/eprint/82748 |