Biotechnology Educations

EJB Electronic Journal of Biotechnology ISSN: 0717-3458 Vol. 5 No. 2, Issue of August 15, 2002.
© 2002 by Universidad Católica de Valparaíso -- Chile  
BIP RESEARCH ARTICLE

Database integration with the Web for biologists to share data and information

Yulu Xia*
National Science Foundation Center for Integrated Pest Management
Department of Entomology
Department of Computer Science
North Carolina State University
Raleigh, North Carolina, USA
Tel: 919 513 1432
Fax: 919 513 1114
http://cipm.ncsu.edu
yulu_xia@ncsu.edu

Roland E. Stinner
National Science Foundation Center for Integrated Pest Management
Department of Entomology
Department of Computer Science
North Carolina State University
Raleigh, North Carolina 27695-7553, USA
Tel: 919 513 1432
Fax: 919 513 1114

Ping-Chu Chu
Department of Mathematics and Computer Science
Fayetteville State University
Fayetteville, North Carolina 28301-4298, USA
Tel: 910 672 1070
Fax: 910 672 1070
E-mail: Ping.Chu-Chu@uncfsu.edu

* Corresponding author


Keywords: biological data, DBMS, integration, relational database, programming, standardisation.


Biological sciences are data-intensive. Enormous amounts of data already exist in fields such as genomics and entomology, and new data are being generated at an exponentially increasing rate because of newer laboratory technologies such as DNA microarrays. Providing universal access so that these data can be shared and used is becoming increasingly important and challenging. In this paper, the authors discuss issues related to integrating biological data with the World Wide Web (hereafter, the Web).

To integrate biological data with the Web, the first step is to store the data in a database. In the modern sense, a database is a collection of data managed by a database management system (DBMS), a software system for creating, manipulating, and managing data. Well-known DBMSs include Oracle, Sybase, DB2, and SQL Server. A database lets us store, retrieve, and modify data easily and efficiently, whether the data set is small or very large. Another major advantage of using a database for biological data is that it can be readily integrated with the Web. This is a real revolution in information sharing and exchange: it gives us enormous opportunity and flexibility to share and use biological data, and clients can access the data from anywhere at any time.
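The store, retrieve, and modify operations described above can be illustrated with a minimal sketch. It assumes SQLite, which ships with Python's standard sqlite3 module, as a stand-in for a server DBMS such as Oracle or SQL Server. The trap_count table, its columns, and the counts are hypothetical.

```python
import sqlite3

# SQLite stands in here for a server DBMS; the table, columns, and
# counts below are hypothetical illustration data.
def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        """CREATE TABLE trap_count (
               id       INTEGER PRIMARY KEY,
               species  TEXT NOT NULL,
               site     TEXT NOT NULL,
               n_caught INTEGER NOT NULL
           )"""
    )
    # Store: insert several rows in one call, using ? placeholders
    conn.executemany(
        "INSERT INTO trap_count (species, site, n_caught) VALUES (?, ?, ?)",
        [
            ("Helicoverpa zea", "Field A", 12),
            ("Helicoverpa zea", "Field B", 7),
            ("Spodoptera frugiperda", "Field A", 3),
        ],
    )
    # Modify: correct a miscounted record
    conn.execute(
        "UPDATE trap_count SET n_caught = ? WHERE species = ? AND site = ?",
        (9, "Helicoverpa zea", "Field B"),
    )
    conn.commit()
    return conn


def total_by_species(conn: sqlite3.Connection) -> dict:
    # Retrieve: let the DBMS aggregate rather than looping in application code
    rows = conn.execute(
        "SELECT species, SUM(n_caught) FROM trap_count "
        "GROUP BY species ORDER BY species"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    print(total_by_species(build_demo_db()))
```

Here total_by_species(build_demo_db()) returns {'Helicoverpa zea': 21, 'Spodoptera frugiperda': 3}. The SQL statements are standard enough that they should carry over to other relational DBMSs with at most minor dialect changes.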

Developing a database can be a major effort or a simple task, depending on the project. Generally speaking, database development does not require previous programming experience. However, before embarking on a database project, you need some knowledge of a DBMS, of Structured Query Language (SQL), a simple language for creating and manipulating relational databases, and of the major principles of database design. Once the database is in place, the next task is to integrate it with the Web.
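One widely taught design principle, normalization, can be sketched briefly. This example picks that single principle for illustration: each species is stored once, and observations refer to it by key, so a corrected name needs fixing in only one place. The schema, names, and data are hypothetical, and SQLite again stands in for a production DBMS.

```python
import sqlite3

# Hypothetical normalized schema: species facts live in one table,
# observations point at them through a foreign key.
SCHEMA = """
CREATE TABLE species (
    species_id      INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL UNIQUE,
    common_name     TEXT
);
CREATE TABLE observation (
    obs_id     INTEGER PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES species(species_id),
    obs_date   TEXT NOT NULL,  -- ISO 8601 date string
    n_caught   INTEGER NOT NULL CHECK (n_caught >= 0)
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO species (species_id, scientific_name, common_name) "
        "VALUES (?, ?, ?)",
        (1, "Helicoverpa zea", "corn earworm"),
    )
    conn.executemany(
        "INSERT INTO observation (species_id, obs_date, n_caught) VALUES (?, ?, ?)",
        [(1, "2002-06-01", 4), (1, "2002-06-08", 11)],
    )
    conn.commit()
    return conn


def observations_with_names(conn: sqlite3.Connection) -> list:
    # A JOIN reassembles the related facts at query time
    sql = """
        SELECT s.common_name, o.obs_date, o.n_caught
        FROM observation AS o
        JOIN species AS s ON s.species_id = o.species_id
        ORDER BY o.obs_date
    """
    return conn.execute(sql).fetchall()
```

Because of the foreign key, an observation that names a nonexistent species_id is rejected with sqlite3.IntegrityError instead of silently entering the database.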

Integrating a database with the Web requires one of many programming technologies, such as CGI, ASP, JSP, ColdFusion, or PHP. These technologies are usually based on one or more general-purpose programming languages such as Perl, C, C++, and Java, and mastering at least one of these languages is essential for the task. Some languages are relatively easy to learn and inexpensive to use, but they are generally less powerful. Others can be complicated and have a longer learning curve, but they are usually more powerful for large projects. Which language is suitable depends on your experience with it and on the nature of your project. The authors suggest starting with a small, simple project; this helps build the experience needed for larger and more complicated projects later.
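The following minimal sketch shows what the Web-integration layer does. It uses a technology not on the list above: Python's WSGI interface (PEP 3333) and the standard-library wsgiref development server, which play the role a CGI script would. The page reads a species name from the query string, looks it up in a database, and returns an HTML table. The table, data, and port 8000 are hypothetical.

```python
import html
import sqlite3
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server


def open_db() -> sqlite3.Connection:
    # Hypothetical demo data; a real site would connect to its own database
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trap_count (species TEXT, site TEXT, n_caught INTEGER)")
    conn.executemany(
        "INSERT INTO trap_count VALUES (?, ?, ?)",
        [("Helicoverpa zea", "Field A", 12), ("Helicoverpa zea", "Field B", 9)],
    )
    return conn


DB = open_db()


def application(environ, start_response):
    """WSGI entry point: /?species=<name> returns matching rows as HTML."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    species = params.get("species", [""])[0]
    # The ? placeholder keeps user input out of the SQL text (no SQL injection)
    rows = DB.execute(
        "SELECT site, n_caught FROM trap_count WHERE species = ? ORDER BY site",
        (species,),
    ).fetchall()
    # html.escape stops stored or submitted text from being read as markup
    cells = "".join(
        f"<tr><td>{html.escape(site)}</td><td>{n}</td></tr>" for site, n in rows
    )
    title = html.escape(species) or "No species given"
    body = (
        f"<html><body><h1>{title}</h1>"
        f"<table><tr><th>Site</th><th>Count</th></tr>{cells}</table>"
        "</body></html>"
    ).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Development server only: http://localhost:8000/?species=Helicoverpa+zea
    with make_server("", 8000, application) as server:
        server.serve_forever()
```

The same pattern, parsing the request, running a parameterized query, and rendering the result, applies whichever of the technologies named above is chosen.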

By integrating a biological database with the Web, we provide universal access to our data. However, true universal data sharing also requires standardization. How can others use our genomic sequence data if one gene has multiple names, or if there is no standard format for submitting and storing data? Data standardization is also increasingly important for automated data exchange, processing, and publication. By using the eXtensible Markup Language (XML) and related technologies, we can mark up data so that computers can interpret its structure and meaning and process it automatically.
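Both standardization problems raised above, multiple names for one gene and the lack of a common format, can be illustrated with a minimal sketch using Python's standard xml.etree.ElementTree module. The synonym table, gene names, and the sequenceRecord element layout are all hypothetical. They are not BSML or any other published standard, and a real project would use a curated nomenclature and an agreed schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical synonym table: every name in circulation maps to one
# canonical identifier (all names are made up for illustration).
CANONICAL = {"genex": "GENEX", "gx-1": "GENEX", "gene x": "GENEX"}


def canonical_name(name: str) -> str:
    # Case-insensitive lookup; unknown names pass through unchanged
    return CANONICAL.get(name.strip().lower(), name)


def record_to_xml(gene: str, organism: str, sequence: str) -> str:
    """Serialize one record to a hypothetical <sequenceRecord> format."""
    root = ET.Element("sequenceRecord")
    ET.SubElement(root, "gene").text = canonical_name(gene)
    ET.SubElement(root, "organism").text = organism
    seq = ET.SubElement(root, "sequence", {"alphabet": "DNA"})
    seq.text = sequence.upper()
    return ET.tostring(root, encoding="unicode")


def xml_to_record(text: str) -> dict:
    """Parse the same format back into a dict, with no human interpretation."""
    root = ET.fromstring(text)
    seq = root.find("sequence")
    return {
        "gene": root.findtext("gene"),
        "organism": root.findtext("organism"),
        "alphabet": seq.get("alphabet"),
        "sequence": seq.text,
        "length": len(seq.text),
    }
```

If two laboratories submit the same sequence under the names "gx-1" and "GeneX", record_to_xml produces identical XML for both. Any program that knows the format can then read the record back with xml_to_record.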

Much progress has been made in processing biological data with XML-based technologies. For example, information on the Bioinformatic Sequence Markup Language (BSML) can be found at http://www.oasis-open.org/cover.bsml.html.

In summary, this paper covers some basics of database development, Web programming, and XML-based data standardization technologies. Technology in these fields has been advancing very rapidly, so keeping up with newer technologies requires constantly updating one's knowledge.

Supported by UNESCO / MIRCEN network