BIP - Analysis of genomic sequences from peanut (Arachis hypogaea)

Jayashree B.
Bioinformatics and Computational Biology Unit
International Crops Research Institute for the Semi-Arid Tropics (ICRISAT)
Patancheru, Andhra Pradesh 502 324, India
Tel: 91 40 30713071
Fax: 91 40 30713075
E-mail: b.jayashree@cgiar.org

Morag Ferguson
International Institute for Tropical Agriculture (IITA)
c/o ILRI, P.O. Box 30709
Nairobi, Kenya
Tel: 254 20 422 3000
Fax: 254 20 422 3001
E-mail: m.ferguson@ilri.exch.cgiar.org

Dan Ilut
Department of Plant Biology
Cornell University
Ithaca, NY 14853-430, USA
E-mail: dcil@cornell.edu

Jeff Doyle
Department of Plant Biology
Cornell University
Ithaca, NY 14853-430, USA
Tel: 1 607 255 7972
Fax: 1 607 255 7979
E-mail: jjd5@cornell.edu

Jonathan H. Crouch*
M.S. Swaminathan Applied Genomics Lab.
International Crops Research Institute for the Semi-Arid Tropics (ICRISAT)
Patancheru, Andhra Pradesh 502 324, India
Tel: 52 55 5804 7574
Fax: 52 55 58047558
E-mail: j.crouch@cgiar.org

Financial support: Legume genomics, bioinformatics and molecular breeding research at ICRISAT has benefited from unrestricted grants from the governments of the UK and Japan, from the European Union, and from the Generation Challenge Program.

Keywords: Arachis hypogaea, codon usage, gene-based markers, peanut, SSR markers.

Peanut (groundnut) is an important legume crop worldwide. However, unlike most legume crops, groundnut is not taxonomically close to any major model genome. Most of the major legume crops and model legume systems are concentrated in a very limited region of the legume phylogenetic tree, viz. the sister clades "Hologalegina" (including Lotus and Medicago) and "phaseoloid/millettioid" (including soybean and common bean). In contrast, Arachis falls in the "dalbergioid" clade, which includes no other major crops. Comparing DNA sequences with those of taxa whose genomes have been partially or fully sequenced can, to a certain extent, provide estimates of relationships. Thus, having generated 1312 new genomic sequences from groundnut, we compared them with sequences from major crop and model genomes to better understand the relationships among the genomes of groundnut, Lotus, soybean, Medicago and Arabidopsis. This approach also provided functional annotations for some of the new SSR markers that we had developed, which can be used to develop candidate gene-based markers. We also report on codon usage among the dicot species studied.

Materials and Methods

Two peanut genomic DNA libraries were constructed, one from whole-genome DNA digested with the restriction enzyme PstI and the other from DNA digested with Sau3AI/BamHI. Both libraries were probed with [γ-³²P]dATP-labeled oligonucleotide probes. Sequencing was performed on an ABI 3700 DNA Analyzer (Applied Biosystems, Foster City, CA). Low-quality sequences were eliminated using Sequencher (Gene Codes, Ann Arbor, MI). The sequences are available in GenBank under accessions BZ999351-CC000573.

A total of 1312 sequences were selected for analysis, of which 448 contained SSR (simple sequence repeat) motifs. The sequences were assembled using the CAP3 sequence assembly program with default parameters. The resulting singletons and consensus (contig) sequences were searched with tBLASTX against the Arabidopsis thaliana, soybean, Medicago truncatula and Lotus japonicus TIGR Gene Indices. The expect (E) values were transformed so that the single best hit could be selected across all four databases. This best hit was then used to annotate the sequence if it showed at least 30% identity over at least 20% of the protein sequence (Quackenbush et al. 2000; Quackenbush et al. 2001). Tandem repeats were identified using Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.advanced.submit.html).
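The annotation rule described above (take the single best hit across all four Gene Indices, then accept it only if it meets the identity and coverage thresholds) can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes the tBLASTX hits have already been parsed into records and that their E-values are directly comparable, because the specific E-value transformation used in the study is not described and is not reproduced here. The `Hit` record, its field names and the identifiers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    database: str             # source Gene Index, e.g. "Lotus" (hypothetical label)
    subject_id: str           # hypothetical identifier of the matching gene-index entry
    evalue: float             # assumed already comparable across databases
    percent_identity: float   # % identity over the aligned region
    aligned_len: int          # aligned length on the subject protein, in residues
    protein_len: int          # full length of the subject protein, in residues


def best_annotation(hits, min_identity=30.0, min_coverage=0.20):
    """Pick the lowest-E-value hit across all databases, then keep it only
    if it has >= 30% identity over >= 20% of the protein (the thresholds
    stated in the text). Returns None when no hit qualifies."""
    if not hits:
        return None
    best = min(hits, key=lambda h: h.evalue)
    if best.protein_len <= 0:
        return None  # guard against malformed records
    coverage = best.aligned_len / best.protein_len
    if best.percent_identity >= min_identity and coverage >= min_coverage:
        return best
    return None


if __name__ == "__main__":
    example_hits = [
        Hit("Lotus", "TC1", 1e-30, 45.0, 80, 300),
        Hit("Arabidopsis", "TC2", 1e-12, 60.0, 100, 250),
    ]
    print(best_annotation(example_hits).subject_id)  # prints "TC1"
```

Note that the thresholds are applied only to the overall best hit, as the text describes; a sequence whose best hit fails them stays unannotated even if a weaker hit would have passed.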

Results and Discussion

Of the 1233 non-redundant sequences (contigs and singletons) obtained after assembly, 475 (about 38.5%) showed significant similarity at the amino acid level to sequences in the Arabidopsis, soybean, Lotus and Medicago Gene Indices and were assigned a putative identity. The largest number of these matches came from the Lotus Gene Index: of the 475 annotated sequences, 222 (46.7%) were assigned their putative annotations from Lotus, 127 (26.7%) from Arabidopsis, 89 (18.7%) from soybean and 37 (7.8%) from Medicago. Although the present study is limited to sequence comparison, which cannot by itself resolve relationships at the species level, it does indicate the similarities, differences and relative distances among the legumes and beyond.
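The proportions quoted above follow directly from the reported counts. The short Python check below recomputes them, including the soybean and Medicago shares; it uses only numbers stated in this section, and the variable names are illustrative.

```python
# Annotated Arachis sequences by the Gene Index that supplied the best hit,
# as reported in the text.
counts = {"Lotus": 222, "Arabidopsis": 127, "soybean": 89, "Medicago": 37}
total_annotated = sum(counts.values())   # 475
total_nonredundant = 1233                # contigs + singletons after assembly


def percentages(counts, denominator):
    """Express each count as a percentage of the denominator, to one decimal."""
    return {name: round(100 * n / denominator, 1) for name, n in counts.items()}


shares = percentages(counts, total_annotated)
overall = round(100 * total_annotated / total_nonredundant, 1)

print(shares)   # {'Lotus': 46.7, 'Arabidopsis': 26.7, 'soybean': 18.7, 'Medicago': 7.8}
print(overall)  # 38.5
```

The four database counts sum exactly to the 475 annotated sequences, which confirms that each sequence was credited to only one Gene Index.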

The codon usage table for groundnut was compared with the reported codon usage tables for Arabidopsis, soybean, Medicago and Lotus. The codon usage pattern of A. hypogaea most closely resembles that of Lotus: 15 of the 18 most frequently used codons in groundnut are also among the most frequently used codons in Lotus. This general trend in codon usage mirrors the relationships suggested by the sequence similarity analysis, although codon usage is expected to vary among individual genes. These preliminary comparisons therefore suggest that Lotus may be the most appropriate model species for comparative genomics and genome evolution studies in peanut. We anticipate that the sequences and annotations reported here will be useful to researchers in plant comparative genomics in general, and to legume researchers and molecular breeders in particular.
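One way to express the top-codon comparison is sketched below in Python. It assumes in-frame coding sequences as input and reads "15 of the 18 most frequently used codons" as the overlap between the 18 most frequent codons of each species; this reading, the tie-breaking rule and the function names are assumptions for illustration, not the study's documented procedure.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def codon_counts(coding_seqs):
    """Count codons across in-frame coding sequences (assumed to start at frame 0)."""
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        # Step through complete codons only; a trailing partial codon is ignored.
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if set(codon) <= VALID_BASES:  # skip codons containing N or other ambiguity codes
                counts[codon] += 1
    return counts


def top_codons(counts, n):
    """Return the n most frequent codons; ties are broken alphabetically (an assumption)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {codon for codon, _ in ranked[:n]}


def shared_top_codons(counts_a, counts_b, n=18):
    """Number of codons that appear in the top-n list of both species."""
    return len(top_codons(counts_a, n) & top_codons(counts_b, n))
```

Because the study compared published codon usage tables rather than raw sequences, per-species counts could equally be loaded from such tables into a `Counter`; a call such as `shared_top_codons(peanut_counts, lotus_counts, n=18)` (hypothetical variable names) would then yield an overlap figure of the kind quoted above.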

References

QUACKENBUSH, J.; CHO, J.; LEE, D.; LIANG, F.; HOLT, I.; KARAMYCHEVA, S.; PARVIZI, B.; PERTEA, G.; SULTANA, R. and WHITE, J. The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Research, 2001, vol. 29, no. 1, p. 159-164.

QUACKENBUSH, J.; LIANG, F.; HOLT, I.; PERTEA, G. and UPTON, J. The TIGR gene indices: reconstruction and representation of expressed gene sequences. Nucleic Acids Research, 2000, vol. 28, no. 1, p. 141-145.

Note: Electronic Journal of Biotechnology is not responsible if on-line references cited on manuscripts are not available any more after the date of publication.