Analysis of genomic sequences from peanut (Arachis hypogaea)
Jayashree B., Morag Ferguson, Dan Ilut, Jeff Doyle, Jonathan H. Crouch*
*Corresponding author
Financial support: Legume genomics, bioinformatics and molecular breeding research at ICRISAT has benefited from unrestricted grants from the governments of
Keywords: Arachis hypogaea, codon usage, gene-based markers, peanut, SSR markers.
Peanut (groundnut) is an important legume crop across the world. However, in contrast to most legume crops, groundnut lacks taxonomic proximity to any major model genome. Most of the major legume crops and model legume systems are concentrated in a very limited region of the legume phylogenetic tree, namely the sister clades "Hologalegina" (including Lotus and Medicago) and "phaseoloid/millettioid" (including soybean and common bean). Arachis, by contrast, falls in the "dalbergioid" clade, which includes no other major crops. Comparisons of DNA sequences from taxa whose genomes have been partially or fully sequenced can, to a certain extent, provide estimates of relationships. Thus, when we generated 1312 new genomic sequences from groundnut, we compared them with sequences from major crops and model genomes, with the intention of better understanding the relationship between the genomes of groundnut, Lotus, soybean, Medicago and Arabidopsis. This approach also provided functional annotations for some of the new SSR markers that we had developed, which can be used for the development of candidate gene-based markers. We also report on codon usage amongst the dicot species studied.
Two peanut genomic DNA libraries were constructed, one following digestion of the whole genome with the PstI restriction enzyme and the other after Sau3AI/BamHI digestion. Both libraries were probed using γP32-dATP labeled oligonucleotide probes. Sequencing was performed using an ABI3700 automated DNA Analyzer (Applied Biosystems). A total of 1312 sequences were selected for the analysis, of which 448 contained simple sequence repeat (SSR) motifs. The sequences were assembled using the CAP3 sequence assembly software with default parameters. The resulting singletons and consensus sequences were screened using tBLASTx against the Arabidopsis thaliana, soybean, Medicago truncatula and Lotus japonicus TIGR Gene Indices. The expect values were transformed so that the best e-value could be picked across hits from all four databases. The resulting best hit was then used to annotate the sequence if it showed a minimum of 30% identity over at least 20% of the protein sequence (Quackenbush et al. 2000; Quackenbush et al. 2001). Tandem repeats were identified using RepeatFinder (http://tandem.bu.edu/trf/trf.advanced.submit.html).
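The best-hit annotation rule described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the Hit record, its field names and the annotate function are hypothetical, and the -log10 transformation of the expect value is an assumption, since the text does not specify which transformation was applied.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Hit:
    """One tBLASTx hit (hypothetical record; field names are illustrative)."""
    database: str        # e.g. "Lotus", "Arabidopsis", "soybean", "Medicago"
    subject: str         # identifier of the matched sequence
    evalue: float        # BLAST expect value
    pct_identity: float  # percent identity of the alignment
    pct_coverage: float  # percent of the protein sequence covered


def transformed_score(evalue: float) -> float:
    """Assumed transformation: -log10(e-value); larger means a better hit.

    E-values reported as 0.0 are capped so that the logarithm is defined.
    """
    return -math.log10(max(evalue, 1e-300))


def annotate(hits: Iterable[Hit],
             min_identity: float = 30.0,
             min_coverage: float = 20.0) -> Optional[Hit]:
    """Pick the best hit across all databases, then apply the thresholds.

    Returns the best hit if it has at least min_identity percent identity
    over at least min_coverage percent of the protein, otherwise None.
    """
    hits = list(hits)
    if not hits:
        return None
    best = max(hits, key=lambda h: transformed_score(h.evalue))
    if best.pct_identity >= min_identity and best.pct_coverage >= min_coverage:
        return best
    return None
```

Note that the thresholds are applied only to the single best hit, following the order given in the text; a pipeline that filtered first and then chose the best remaining hit could annotate a somewhat different set of sequences.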
About 38.5% of the Arachis sequences (475 of 1233 non-redundant singletons and contigs) had significant similarity with sequences in public databases at the amino acid level and were assigned a putative identity. All contigs and singletons were searched against the Arabidopsis, soybean, Lotus and Medicago Gene Indices; among the sequences that matched any of the four databases, the largest number of matches was with sequences from the Lotus Gene Indices. Of the 475 annotated sequences, 222 (46.7%) were assigned their putative annotations from Lotus sequences, 127 (26.7%) from Arabidopsis, 89 (18.7%) from soybean and 37 (7.8%) from Medicago. While the present study is limited to sequence comparison, which cannot by itself explain relationships at the species level, it does provide an indication of the similarities, differences and relative distances amongst the legumes and beyond.
The codon usage table for groundnut was compared with the reported codon usage tables for Arabidopsis, soybean, Medicago and Lotus. The codon usage pattern for A. hypogaea most closely resembles that of Lotus, with 15 of the 18 most frequently used codons in groundnut also being the most commonly used in Lotus. This general trend in codon usage reflects the overall relationships proposed from the sequence analysis data, although the codon usage strategy is expected to vary at the level of individual genes. Thus, it appears from these preliminary comparisons that Lotus could be the most appropriate model species for peanut comparative genomics and genome evolution studies. We anticipate that the sequences and their annotations reported in this study will prove useful for those interested in plant comparative genomics in general, and for legume researchers and molecular breeders in particular.
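The kind of codon usage comparison reported above (the most frequently used codon for each of the 18 amino acids encoded by more than one codon, compared between two species) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the standard genetic code is assumed, and the study itself compared published codon usage tables rather than counting codons in this way.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Amino acids encoded by more than one codon (excludes Met, Trp and stops).
_family_sizes = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
DEGENERATE = sorted(aa for aa, n in _family_sizes.items() if n > 1)


def count_codons(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence (frame assumed to start at 0)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def preferred_codons(counts: Counter) -> dict:
    """Most frequent codon per degenerate amino acid observed in the counts."""
    preferred = {}
    for aa in DEGENERATE:
        family = sorted(c for c, a in CODON_TABLE.items() if a == aa)
        best = max(family, key=lambda c: counts.get(c, 0))  # ties: first alphabetically
        if counts.get(best, 0) > 0:
            preferred[aa] = best
    return preferred


def shared_preferences(counts_a: Counter, counts_b: Counter) -> tuple:
    """Return (number of amino acids with the same preferred codon, number compared)."""
    pref_a, pref_b = preferred_codons(counts_a), preferred_codons(counts_b)
    common = pref_a.keys() & pref_b.keys()
    return sum(pref_a[aa] == pref_b[aa] for aa in common), len(common)
```

Given codon counts pooled over the coding sequences of two species, shared_preferences returns a pair analogous to the "15 of the 18" figure quoted above. As the text notes, preferences can differ between individual genes, so pooled counts describe only the general trend.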
QUACKENBUSH, J.; CHO, J.; LEE, D.; LIANG, F.; HOLT, I.; KARAMYCHEVA, S.; PARVIZI, B.; PERTEA, G.; SULTANA, R. and WHITE, J. The TIGR Gene Indices: analysis of gene transcript sequences in highly sampled eukaryotic species. Nucleic Acids Research, 2001, vol. 29, no. 1, p. 159-164.
QUACKENBUSH, J.; LIANG, F.; HOLT,
Note: Electronic Journal of Biotechnology is not responsible if on-line references cited on manuscripts are not available any more after the date of publication.