Bioinformatics: an essential tool for Biotechnology

It does not seem necessary to justify the inclusion of an editorial about Bioinformatics in the Electronic Journal of Biotechnology. Indeed, it is rare to find a biotechnologist who is not familiar with the usual tools of Bioinformatics; however, the amazing amount of information and tools available, and the complexity of the biological systems being tackled, demand continuous updating.

Starting from the mere accumulation and comparison of sequences, Bioinformatics now encompasses evolution, protein modeling, genomic analysis, comparison of entire genomes and proteomes, biomarker data analysis, expression profiling, and pathway and disease modeling.

The amount of information available in public databases, and through free access to a multitude of web servers, is overwhelming.

In fact, in the January 2005 issue of the prestigious journal Nucleic Acids Research, Michael Y. Galperin listed 719 databases (http://nar.oxfordjournals.org/cgi/content/full/33/suppl_1/D5), and the June issue reported 166 web services (http://nar.oxfordjournals.org/cgi/content/full/33/suppl_2/W1).

Considering that these sources of information and tools represent only those evaluated by the journal, it can be inferred that the total number in existence is much higher. Therefore, the main problem a biotechnologist faces in using bioinformatics properly is selecting the databases and programs that best serve his/her research. This is because even similar databases contain different information (and errors), and different programs use diverse algorithms that can yield different and sometimes contradictory results.

Much information about the strengths and limitations of the major databases can be obtained from their tutorials (http://www.ncbi.nlm.nih.gov/, http://www.ebi.ac.uk/embl/); however, comparing similar databases requires an experienced operator.

Two approaches have been used to solve this problem: a major effort in Bioinformatics education, and the establishment of networks of specialists who generously provide advice.

In Europe, as early as 1988, EMBnet (European Molecular Biology Network) was created as a network of national nodes devoted to Bioinformatics research and to providing services to their local communities. Since its creation, EMBnet has evolved from an informal network of individuals in charge of maintaining biological databases into the only worldwide organization that brings bioinformatics professionals together to serve the expanding fields of genetics and molecular biology.

Although composed predominantly of academic nodes, EMBnet gains an important added dimension from its industrial members. Its success is attracting an increasing number of organizations from outside Europe. EMBnet has a tried-and-tested infrastructure for organizing training courses, providing technical help, and helping its members interact effectively and respond to the rapidly changing needs of biological research in a way no single institute can. In 2005 the organization created additional types of nodes to allow more than one member per country.

Apart from the European countries, Australia, Canada, China, India, Israel, Russia and South Africa, as well as Argentina, Brazil, Chile, Colombia, Cuba and México from Latin America, are also members of EMBnet.

For many years EMBnet collaborated with GCG in developing tools for nucleic acid analysis. When the GCG source code was no longer provided, a suite of more than 150 programs was assembled under the name EMBOSS (European Molecular Biology Open Software Suite). EMBOSS is freely available to non-profit organizations (http://emboss.sourceforge.net/) and can be used without any knowledge of programming languages through the wEMBOSS or Jemboss interfaces.

EMBnet, in collaboration with LION Bioscience, is also organizing the SRS Federation, a dispersed set of SRS servers acting in concert to provide the user community with a high-quality, high-availability source of biomolecular and associated data, with a large set of databases updated daily. The SRS Federation project is run on a purely voluntary basis, for a limited time, by EMBnet members (initially Sweden, Slovakia and Belgium, later joined by Brazil, Colombia, Poland and Argentina) with the active support of LION Bioscience (http://www.srsfed.org/).

In 2001, with the support of UNU-BIOLAC, a bioinformatics portal was established in Latin America (http://portal-bio.ula.ve/); it is now operated by RIBIO (Red Iberoamericana de Bioinformática), sponsored by CYTED. RIBIO includes 3 nodes from Argentina and Brazil, 4 from Chile, 7 from Spain, and one each from Colombia, Cuba, Ecuador, México, Perú, Uruguay and Venezuela. This network awards fellowships that allow young students to pursue research in laboratories of the region and to attend specialized courses. In 2005, the First Jornadas Iberoamericanas de Bioinformática were organized in Cartagena, Colombia, with the support of AECI; the second will take place in Buenos Aires, Argentina, in December 2006.

Two other very important meetings will take place in Brazil in 2006. The International Society for Computational Biology (ISCB) is organizing the 14th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2006), to be held August 6 to 10 in Fortaleza (http://ismb2006.cbi.cnptia.embrapa.br). Swiss-Prot, the manually curated section of the UniProt Knowledgebase, will celebrate its 20 years of service to the scientific community by organizing a 5-day conference, also in Fortaleza, from July 30 to August 4 (http://www.swissprot20.org/).

Bioinformatics is moving forward into more complex areas, as reflected in the topics to be tackled at the ACM Symposium on Applied Computing (SAC) (http://www.acm.org/conferences/sac/sac2006/):

- Use of natural language processing and/or artificial intelligence techniques to automatically extract multiple biological objects, such as gene names, protein names, drugs, organisms and diseases, from free text.
- Information and knowledge extraction, such as object-object interactions (e.g., protein interactions, functions).
- Software systems to support biological research that integrate multi-format and multi-type data from heterogeneous databases.
- Information visualization techniques for integrated biological systems.
- Clustering of very high-dimensional data, such as microarray and mass-spectrometry data.
- Clustering algorithms that support biological meaning.
- Network models and simulations of various pathways.
- Visualization techniques for network simulations.
- Pathway estimation from genomic data.
- Computational methods that model cellular mechanisms, the protein machine, and regulatory networks.
- Algorithms for processing and interpreting large-scale mass-spectrometry data.
- Comparative genomics and genome dynamics (i.e., evolution of whole genomes, e.g., by translocations, reversals and duplications).

The field of Bioinformatics is expanding very fast, and it is becoming more necessary for biotechnology every day. It therefore seems obvious that biotechnologists should not only keep an eye on developments in Bioinformatics, but also include Bioinformatics courses in all undergraduate and postgraduate curricula, giving students early contact with this information and these tools.

Oscar Grau
Instituto de Bioquímica y Biología Molecular
Comisión de Investigaciones Científicas Pcia. Buenos Aires
Facultad de Ciencias Exactas
Universidad Nacional de La Plata
Calle 115 e/49 y 50 – 1900 La Plata, Argentina
Supported by UNESCO / MIRCEN network