Molecular Biology and Genetics
EJB Electronic Journal of Biotechnology ISSN: 0717-3458 Vol.1 No.3, Issue of December 15, 1998.
© 1998 by Universidad Católica de Valparaíso -- Chile Received 26 August 1998 / Accepted 9 October 1998
RESEARCH ARTICLE

Using a neural network to backtranslate amino acid sequences

Gilbert White
Department of Biological Sciences
Clark Atlanta University
223 James Brawley Dr., S.W.
Atlanta, GA 30314, USA

William Seffens*
Department of Biological Sciences and Center for Theoretical Study of Physical Systems
Clark Atlanta University
223 James Brawley Dr., S.W.
Atlanta, GA 30314, USA
Tel: 404-880-6822 (USA) Fax: 404-880-6756 (USA)
E-mail: wseffens@cau.edu

http://www.cau.edu

*Corresponding author

Keywords: Amino acids, Backtranslation, Genetic code, Neural network, Nucleic acids


Financial Support:
This work was supported (or partially supported) by NIH grant GM08247, Research Centers in Minority Institutions award G12RR03062 from the Division of Research Resources, National Institutes of Health and NSF CREST Center for Theoretical Studies of Physical Systems (CTSPS) Cooperative Agreement #HRD-9632844.

Abstract

A neural network (NN) was trained on paired amino acid and nucleic acid sequences to test its ability to predict a nucleic acid sequence given only an amino acid sequence. A multi-layer backpropagation network with one hidden layer of 5 to 9 neurons was used. Several network configurations were tested, varying the number of input neurons used to represent amino acids while keeping the output-layer representation of nucleic acids constant. In the best-trained network, 93% of all bases, 85% of the degenerate bases, and 100% of the fixed bases were correctly predicted for randomly selected test sequences. The training set comprised 60 human sequences, each taken as a window of 10 to 25 codons at the coding sequence start site. Different NN configurations, with varying amino acid encodings and increasing window sizes, were evaluated to predict how the NN would behave with a significantly larger training set. This genetic data analysis effort will assist in understanding human gene structure. Potential benefits include computational tools that backtranslate amino acid sequences more reliably, which is useful for degenerate PCR cloning, and that may assist in identifying human gene coding sequences (CDS) from open reading frames in DNA databases.
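The distinction between fixed and degenerate bases can be illustrated with a short sketch. It is not the authors' code. It assumes the standard genetic code and assumes that a "fixed" base is a codon position at which every synonymous codon for an amino acid carries the same base, with all other positions counted as "degenerate"; the abstract does not state the exact definition used. The names `fixed_mask` and `score` are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (assumed here;
# the abstract does not say which code table was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def fixed_mask(aa):
    """Return a 3-tuple of booleans: True where all synonymous codons agree."""
    codons = [c for c, a in CODON_TABLE.items() if a == aa]
    if not codons:
        raise ValueError(f"unknown amino acid: {aa!r}")
    return tuple(len({c[i] for c in codons}) == 1 for i in range(3))


def score(protein, true_dna, pred_dna):
    """Fraction of correctly predicted bases: (overall, fixed, degenerate).

    A category with no bases in the input is reported as None.
    """
    if not protein or not (len(true_dna) == len(pred_dna) == 3 * len(protein)):
        raise ValueError("sequences must be non-empty and length-consistent")
    hits = {"fixed": 0, "degenerate": 0}
    totals = {"fixed": 0, "degenerate": 0}
    for k, aa in enumerate(protein):
        for i, is_fixed in enumerate(fixed_mask(aa)):
            kind = "fixed" if is_fixed else "degenerate"
            totals[kind] += 1
            hits[kind] += true_dna[3 * k + i] == pred_dna[3 * k + i]
    overall = sum(hits.values()) / sum(totals.values())
    per_kind = {k: (hits[k] / totals[k] if totals[k] else None) for k in totals}
    return overall, per_kind["fixed"], per_kind["degenerate"]
```

Under this assumed definition, methionine (ATG) is fixed at all three positions, whereas leucine is fixed only at the second position. A predictor can therefore score 100% on fixed bases by consulting the genetic code alone, and the degenerate-base accuracy is the more informative figure.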

Supported by UNESCO / MIRCEN network