Molecular Biology and Genetics
EJB Electronic Journal of Biotechnology ISSN: 0717-3458 Vol. 3 No. 3, Issue of December 15, 2000.
© 2000 by Universidad Católica de Valparaíso -- Chile
Received September 11, 2000 / Accepted November 17, 2000
RESEARCH ARTICLE

Predicting regulatory elements in repetitive sequences using transcription factor binding sites

Jorng-Tzong Horng*
Department of Computer Science and Information Engineering
National Central University
Taiwan
Tel: +886-3-4227151 Ext. 4519
Fax: +886-3-4222681

E-mail: horng@db.csie.ncu.edu.tw

Wen-Fu Cho
Applied Research Lab., Telecommunications Labs.
Chunghwa Telecom Co., Ltd.
Yang-Mei, Taoyuan, Taiwan
Tel: +886-3-4244197
Fax: +886-3-4244167

*Corresponding author

Financial Support: National Science Council of the Republic of China under Contract No. NSC 89-2213-E-008-061.
Keywords: binding sites, data mining, genomes, regulatory elements, transcription factors.


Abstract

Repeat sequences are the most abundant sequences in the extragenic regions of genomes, and biologists have already found a large number of regulatory elements in these regions. These elements may profoundly affect chromatin structure formation in the nucleus and also hold important clues for studies of genetic evolution and phylogeny. This study attempts to mine rules describing how combinations of individual binding sites are distributed in repeat sequences. The mined association rules would facilitate efforts to identify gene classes regulated by similar mechanisms and to predict regulatory elements accurately. Herein, the combinations of transcription factor binding sites in the repeat sequences are first obtained, and data mining techniques are then applied to mine association rules from these combinations. The discovered associations are further pruned to remove insignificant ones, yielding a reduced set of significant associations. Finally, the discovered association rules are used to partially classify the repeat sequences in our repeat database. Experiments were performed on several genomes, including C. elegans, human chromosome 22, and yeast.
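
As a rough illustration of the mining step described above, the following Python sketch treats each repeat sequence as a set of transcription factor binding sites and derives pairwise association rules from support and confidence. It is a minimal sketch under stated assumptions and not the procedure used in the article: the site names, the toy data, the function names, and the thresholds `min_support` and `min_confidence` are all hypothetical, and simple threshold filtering stands in for the article's pruning step, whose details are not given in the abstract.

```python
from itertools import combinations

# Hypothetical toy data: each repeat sequence is reduced to the set of
# transcription factor binding sites found in it (names are illustrative only).
repeats = {
    "rep1": {"GATA-1", "Sp1", "AP-1"},
    "rep2": {"GATA-1", "Sp1"},
    "rep3": {"Sp1", "AP-1"},
    "rep4": {"GATA-1", "Sp1", "CREB"},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every site in itemset."""
    hits = sum(1 for sites in transactions if itemset <= sites)
    return hits / len(transactions)


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for site pairs.

    Assumed pruning: a rule is kept only if the pair is frequent enough
    (min_support) and the implication is reliable enough (min_confidence).
    """
    transactions = list(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # infrequent combination, pruned
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


rules = mine_pair_rules(repeats.values())
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

On this toy data, only rules whose antecedent almost always co-occurs with the consequent survive the thresholds. A repeat sequence could then be assigned to a rule's class when it contains both the antecedent and the consequent site, which is one simple way to read "partial classification" here.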

Supported by UNESCO / MIRCEN network