
Classification of SARS-CoV-2 viral genome sequences using Neurochaos Learning

Harikrishnan, NB and Pranay, SY and Nagaraj, N (2022) Classification of SARS-CoV-2 viral genome sequences using Neurochaos Learning. In: Medical and Biological Engineering and Computing, 60 (8). pp. 2245-2255.

med_bio_eng_com_60-8_2245-2255_2022.pdf - Published Version
Official URL: https://doi.org/10.1007/s11517-022-02591-3

Abstract

The rapid spread of the SARS-CoV-2 virus has placed heavy demands on researchers worldwide. There is a pressing need for novel learning algorithms that can learn a general pattern from only a few coronavirus genome sequences. Learning from very few training samples matters most at the beginning of a disease outbreak, when sequencing data is limited, because successful detection and isolation of patients can curb the spread of the virus. This poses a major challenge for machine learning and deep learning algorithms, which typically require large amounts of training data to learn the pattern and to distinguish the virus from other closely related viruses. In this paper, we propose a new paradigm, Neurochaos Learning (NL), for the classification of coronavirus genome sequences that addresses this specific problem. NL is inspired by empirical evidence of chaos and non-linearity at the level of neurons in biological neural networks. Using leave-one-out cross-validation, NL achieves an average sensitivity, specificity and accuracy of 0.998, 0.999 and 0.998, respectively, on the multiclass classification problem (SARS-CoV-2, Coronaviridae, Metapneumovirus, Rhinovirus and Influenza). With just one training sample per class, over 1000 independent random trials of training, we report an average macro F1-score > 0.99 for the classification of SARS-CoV-2 from SARS-CoV-1 genome sequences. We compare the performance of NL with K-nearest neighbours (KNN), logistic regression, random forest, SVM, and naïve Bayes classifiers. We foresee promising future applications in genome classification using NL with novel combinations of chaotic feature engineering and other machine learning algorithms.
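The one-sample-per-class evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not implement Neurochaos Learning: a plain nearest-neighbour rule stands in for the classifier, and the feature vectors, the function names (`macro_f1`, `one_shot_trials`) and the toy data are all assumptions made for illustration. The abstract does not say how genome sequences are turned into features, so that step is left out.

```python
import random
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (classes taken from y_true)."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def nearest_neighbour_predict(train, x):
    """Stand-in classifier (NOT Neurochaos Learning): label of the closest
    training vector under squared Euclidean distance."""
    closest = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))
    return closest[1]


def one_shot_trials(samples, n_trials=1000, seed=0):
    """samples: list of (feature_vector, label) pairs.
    Each trial draws one random training sample per class, tests on the
    remaining samples, and records macro F1. Returns the mean over trials."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, (_, label) in enumerate(samples):
        by_class[label].append(i)
    total = 0.0
    for _ in range(n_trials):
        train_idx = {rng.choice(idxs) for idxs in by_class.values()}
        train = [samples[i] for i in sorted(train_idx)]
        test = [samples[i] for i in range(len(samples)) if i not in train_idx]
        y_true = [label for _, label in test]
        y_pred = [nearest_neighbour_predict(train, x) for x, _ in test]
        total += macro_f1(y_true, y_pred)
    return total / n_trials


# Hypothetical toy data: two well-separated clusters standing in for two classes.
toy = [((0.0, 0.1), "A"), ((0.1, 0.0), "A"), ((0.05, 0.05), "A"),
       ((5.0, 5.1), "B"), ((5.1, 5.0), "B"), ((4.9, 5.0), "B")]
score = one_shot_trials(toy, n_trials=50)
```

Averaging over many random draws of the single training sample, as the abstract describes, keeps the reported score from depending on one lucky or unlucky choice of training sequence.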

Item Type: Journal Article
Publication: Medical and Biological Engineering and Computing
Publisher: Springer Science and Business Media Deutschland GmbH
Additional Information: The copyright for this article belongs to the Author(s).
Keywords: Decision trees; Deep learning; Diseases; Genes; Logistic regression; Nearest neighbor search; Sampling; Support vector machines; Approximation theorem; Coronaviruses; Genome classification; Genome sequences; Learn+; Neurochaos; SARS-CoV-2; Training sample; Universal approximation; Universal approximation theorem; SARS; Article; Coronaviridae; deep learning; gene sequence; Influenza virus; k nearest neighbor; leave one out cross validation; logistic regression analysis; machine learning; Metapneumovirus; multiclass classification; nerve cell network; nonhuman; nonlinear system; random forest; Rhinovirus; SARS coronavirus; sensitivity and specificity; Severe acute respiratory syndrome coronavirus 2; virus genome
Department/Centre: Division of Biological Sciences > Centre for Infectious Disease Research
Date Deposited: 19 Sep 2022 08:39
Last Modified: 19 Sep 2022 08:39
URI: https://eprints.iisc.ac.in/id/eprint/76593
