The Knowledgebase and Analytical Tools of Bioinformatics

(A Survey of World Wide Resources on the Web)

K. Sundaram
Department of Crystallography and Biophysics
University of Madras, Guindy Campus, Madras 600025

Introduction
The rapid increase in the flow of information on the Internet resulting from the bandwidth explosion can truly be likened to an avalanche. Not only has the rate of information flow increased phenomenally, but so have the spread of connectivity and the storage capacity and processing power at the various distributed centers. In fact, the volume of data available to bioinformatics specialists is so large that it defies presentation in the conventional print medium. An appropriate method is to make a Web index or "Home Page" document based on the HyperText Markup Language (HTML) with hot links or pointers to various resources. This will be the mode of the actual presentation of this article during the Workshop. What follows here is a selective coverage of the Index document.
As the title of this article suggests, bioinformatics comprises several different types of resources. Firstly, there are the databases. Then there are the tools (computer programs) to analyse and draw inferences from the contents of the databases. Of more recent origin are automated alert services and Java applets that tap the client-side computing resources in addition to those on the server side.
In the following we will give a selected listing of the resources of these various categories and provide brief descriptions of their contents and functionalities.
Databases
By far the largest databases are concerned with nucleotide sequences of DNA and amino acid sequences of proteins.
An annotated collection of all publicly available DNA sequences is maintained in a database called the GenBank by the National Center for Biotechnology Information (NCBI) of the National Institutes of Health (NIH), USA. As of December 1996 the total holding was approximately 730,500,000 bases in 1,115,000 sequences. The database available for public access is updated once every two months. There are two other major DNA databases in the world, one maintained by the European Molecular Biology Laboratory (EMBL) and the other by the DNA DataBank of Japan (DDBJ). The GenBank, EMBL and DDBJ have joined hands to form the International Nucleotide Sequence Database Collaboration, under which they exchange data on a daily basis.
The Division of Biomedical Information Sciences of the Johns Hopkins University, School of Medicine hosts the Human Genome Database (GDB). The ambitious long term objective of the Human Genome Project is to map all the genes of the entire human genome encompassing all variants and diversities.
Two principal databases of protein sequences are SWISS-PROT and the Protein Information Resource (PIR), the latter maintained at the Georgetown University Medical Center and supported by the Division of Research Resources of the NIH.
SWISS-PROT contains sequences translated from the EMBL Nucleotide Sequence Database. A small part of the information in SWISS-PROT was originally adapted from information contained in the PIR.
A characteristic feature of SWISS-PROT is the extensive annotation of the sequences it holds. Data are stored in a format similar to that of the EMBL Nucleotide Sequence Database, and all the data are easily retrievable by computer programs.
Some of the other protein sequence databases are:
The Mendelian Inheritance in Man data bank (MIM) prepared under the supervision of Victor McKusick at Johns Hopkins University.
The PROSITE dictionary of sites and patterns in proteins prepared by Amos Bairoch at the University of Geneva.
The restriction enzymes database (REBASE) prepared by Richard Roberts and Dana Macelis at New England BioLabs.
The G-protein--coupled receptor database (GCRDb) prepared by Lee Frank Kolakowski at the Massachusetts General Hospital Renal Unit.
The EcoGene section of the EcoSeq/EcoMap integrated Escherichia coli K12 database and the StyGene section of StySeq/StyMap integrated Salmonella typhimurium LT2 database, both prepared by Ken Rudd at the NCBI.
The gene-protein database of Escherichia coli K12 (2D-gel spots)(ECO2DBASE).
The SubtiList relational database for the Bacillus subtilis 168 genome prepared under the supervision of Ivan Moszer at the Pasteur Institute.
The LISTA database of yeast (Saccharomyces cerevisiae) genes coding for proteins prepared under the supervision of Patrick Linder at the University of Geneva.
The human keratinocyte 2D gel protein database from the universities of Aarhus and Ghent.
The human 2D gel protein database (SWISS-2DPAGE) of the Faculty of Medicine of the University of Geneva.
The Yeast Electrophoresis Protein Database (YEPD) prepared under the supervision of Jim Garrels from the Quest Protein Database Center of the Cold Spring Harbor Laboratory.
The Drosophila genome database (FlyBase) prepared under the supervision of Michael Ashburner at the Department of Genetics, University of Cambridge. (http://flybase.bio.indiana.edu:82/)
The Maize genome database (MaizeDB) developed by the USDA-ARS Maize Genome Project as part of the National Agricultural Library's Plant Genome Research Program.
The WormPep database prepared by Richard Durbin and Erik Sonnhammer from the MRC Laboratory of Molecular Biology and Sanger Center at Hinxton Hall, Cambridge.
The DictyDb database prepared by Douglas W. Smith and Bill Loomis from the University of California, San Diego (UCSD).
The Human Retroviruses and AIDS compilation of nucleic and amino acid sequences (HIV Sequence Database) edited by G. Myers, A.B. Rabson, S.F. Josephs, T.F. Smith, J.A. Berzofsky, F. Wong-Staal; published by the Theoretical Biology and Biophysics Group T-10 at Los Alamos National Laboratory; and funded by the AIDS program of the National Institute of Allergy and Infectious Diseases through an interagency agreement with the United States Department of Energy.
The database of Homology-derived Secondary Structure of Proteins (HSSP) prepared under the supervision of Chris Sander at the EMBL.
The transcription factor database (Transfac) developed by Edgar Wingender and Rainer Knueppel from the Gesellschaft fuer Biotechnologische Forschung mbH in Braunschweig.
The Protein Data Bank (PDB), maintained by the Brookhaven National Laboratory and supported by the United States National Science Foundation, the Division of Research Resources of the NIH and the United States Department of Energy, is quite distinct from the sequence databases and deserves special mention. It contains detailed three-dimensional structural information on proteins and some nucleic acids whose structures have been solved using experimental techniques such as X-ray crystallography and nuclear magnetic resonance.
A Nucleic Acid Database (NDB) emulating the PDB for nucleic acids is maintained at Rutgers University. The NDB server is located at URL: (http://ndbserver.rutgers.edu/interface/)
NRL_3D contains entries for which an X-ray crystal structure exists in the Brookhaven database. The codes for these entries start with NRL_ followed by the Brookhaven database code.
Database Associated Tools
The voluminous data contained in the libraries will be of no use unless convenient techniques are provided to browse them and retrieve data from them selectively. Hence many of the database keepers themselves provide various browsing and retrieval tools which are computer programs. In this section we will describe briefly some of the prominent tools.
Two of the most widely used sequence matching programs are FASTA and the Basic Local Alignment Search Tool (BLAST).
FASTA was developed by Lipman and Pearson in 1985. FASTA considers exact matches between short substrings of two sequences. If a significant number of such exact matches is found, FASTA uses the dynamic programming algorithm to compute optimal alignments.
This approach allows one to trade precision for speed: the larger the substring length chosen, the smaller the number of exact matches. This makes the program faster but less precise: it becomes less likely that the optimal alignment contains enough exact matches of the given length, and the procedure may find nothing. Nevertheless, experience shows that with sensibly chosen parameters, FASTA misses very few cases of significant homology. FASTA is available from: (ftp.virginia.edu in pub/fasta)
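The exact-match stage described above can be illustrated with a minimal sketch. It is not the FASTA source code: the function names, the `min_hits` cut-off and the idea of counting hits per diagonal are simplifying assumptions made here for illustration only.

```python
from collections import defaultdict


def ktuple_diagonal_hits(query: str, subject: str, k: int) -> dict[int, int]:
    """Count exact k-letter word matches on each diagonal (offset = i - j)."""
    # Index every k-letter word of the query by its start positions.
    positions = defaultdict(list)
    for i in range(len(query) - k + 1):
        positions[query[i:i + k]].append(i)

    # Look up each k-letter word of the subject and record its diagonal.
    hits: dict[int, int] = defaultdict(int)
    for j in range(len(subject) - k + 1):
        for i in positions.get(subject[j:j + k], ()):
            hits[i - j] += 1
    return dict(hits)


def worth_aligning(query: str, subject: str, k: int, min_hits: int) -> bool:
    """True if some diagonal collects at least min_hits exact word matches."""
    hits = ktuple_diagonal_hits(query, subject, k)
    return max(hits.values(), default=0) >= min_hits
```

Raising `k` shrinks the number of word matches that have to be examined, which is the speed-for-precision trade described in the text: with `k` equal to the full sequence length, two sequences that differ in a single letter produce no hits at all.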
BLAST, developed by Altschul et al. in 1990, is another heuristic based on a similar idea. BLAST focusses on no-gap alignments of (again) a certain, fixed length. Rather than requiring exact matches, BLAST uses a scoring function to measure similarity (rather than distance). In particular for proteins, one can argue that segment pairs with no gaps and a high similarity score indicate regions of functional similarity. For a given threshold score BLAST reports to the user all database entries which have a segment pair with the query sequence that scores higher than the prescribed score. If the scoring function used has a probabilistic interpretation, BLAST can also give an assessment of the statistical significance of the matches it reports.
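The notion of a high-scoring segment pair without gaps can be made concrete with a small sketch. The scoring function (+2 for identity, -1 otherwise) is a toy assumption, and the exhaustive scan over all diagonals shown here is a plain maximum-subarray search, not the word-seeding heuristic that makes BLAST fast; the function names are hypothetical.

```python
def score_pair(a: str, b: str) -> int:
    """Toy similarity score (an assumption): +2 for identity, -1 otherwise."""
    return 2 if a == b else -1


def best_segment_pair(query: str, subject: str) -> tuple[int, int, int, int]:
    """Return (score, query_start, subject_start, length) of the best ungapped pair."""
    best = (0, 0, 0, 0)
    # Each diagonal pairs query[i] with subject[i - d] for a fixed offset d.
    for d in range(-(len(subject) - 1), len(query)):
        i = max(d, 0)
        j = i - d
        run_score, run_start = 0, i
        while i < len(query) and j < len(subject):
            if run_score <= 0:
                # A non-positive prefix can never help: restart the segment here.
                run_score, run_start = 0, i
            run_score += score_pair(query[i], subject[j])
            if run_score > best[0]:
                best = (run_score, run_start, run_start - d, i - run_start + 1)
            i += 1
            j += 1
    return best


def search(query: str, database: dict[str, str], threshold: int) -> list[str]:
    """Names of database entries whose best segment pair scores above threshold."""
    return [name for name, seq in database.items()
            if best_segment_pair(query, seq)[0] > threshold]
```

For example, `best_segment_pair("GATTACA", "CCATTAGG")` finds the shared segment ATTA, starting at position 1 of the query and position 2 of the subject (counting from 0).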
A BLAST search can be carried out interactively through a link from the NCBI home page. NCBI also provides a BLAST server that can be accessed through e-mail. The BLAST home page provides three links: BLAST help, Basic BLAST search, and Advanced BLAST search. The Basic search provides a search with default parameters, including filtering for low-complexity regions. The Advanced search allows a user to specify a number of BLAST parameters. There are five different BLAST modules, which perform the following searches:
BLASTP compares an amino acid query sequence against a protein sequence database.
BLASTN compares a nucleotide query sequence against a nucleotide sequence database.
BLASTX compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.
TBLASTN compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).
TBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
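The six-frame conceptual translation used by BLASTX, TBLASTN and TBLASTX can be sketched as follows. This is an illustrative sketch, not NCBI code: it assumes the standard genetic code, unambiguous upper- or lower-case DNA, and hypothetical frame labels `+1` to `+3` and `-1` to `-3`.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate both strands in all three reading frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Calling `six_frame_translation("ATGGCCTAA")` yields `MA*` in frame `+1`, the only frame here that reads the start codon in phase.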
NCBI also provides an associated search tool called ENTREZ for searching the comment fields of the sequence databases. To search with ENTREZ, select the database you want to search (e.g., GenBank) from the ENTREZ home page. A Search Page is opened, where the search query (keyword, term, phrase or partial word) is entered. The number of hits is reported. The search may be refined by progressively decreasing (or increasing) the number of hits by introducing additional search terms that are ANDed (or ORed) to the original term. ENTREZ is a powerful tool which enables one to search not only the comment fields of sequence entries but also bibliographic entries in a section of MEDLINE. Online help is provided.
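The effect of ANDing and ORing search terms can be shown with a small in-memory sketch. It does not use the ENTREZ service itself; the accession numbers and comment texts below are made up for illustration, and the `hits` function is a hypothetical stand-in that does case-insensitive substring matching (so partial words also match).

```python
def hits(records: dict[str, str], term: str) -> set[str]:
    """Accession numbers of records whose comment text contains the term."""
    term = term.lower()
    return {acc for acc, text in records.items() if term in text.lower()}


# Hypothetical records: accession number -> comment field.
RECORDS = {
    "X00001": "Human protein kinase C mRNA",
    "X00002": "Mouse tyrosine kinase gene",
    "X00003": "Human alkaline phosphatase gene",
}

first = hits(RECORDS, "kinase")
narrowed = first & hits(RECORDS, "human")        # AND: fewer hits
widened = first | hits(RECORDS, "phosphatase")   # OR: more hits
```

Here `first` holds two accessions, `narrowed` one, and `widened` all three, mirroring the progressive refinement described above.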
Most journals which publish articles relating to protein or nucleic acid sequence determination require the author(s) to submit the sequence to one of the appropriate major databases and obtain an accession number as a precondition to the publication of the article. One of the typical uses of ENTREZ is to obtain the actual sequence of a protein or nucleic acid whose sequence determination is reported in a journal article. In this situation the accession number would be chosen as a search term after choosing the concerned database.
The Human Genome Project provides a number of biologist-friendly tools to search the entries in the database. These tools are accessible from the home page of the GDB: (http://gdbwww.gdb.org/).
Automated Alert Services
Nowadays it is even possible to be alerted automatically if a new relative of your favoured sequence appears in any of the protein sequence databases. For example, the EMBL has one such alert service: (http://swan.embl-heidelberg.de:8080/register_sequence.html). The user can customise the homology search parameters that are used to check the daily updates of the major protein sequence databases and will be informed immediately by email.
DBWatcher (Frédéric Plewniak, Bioinformatics, I.G.B.M.C., Strasbourg, France) is a program that handles periodic BLAST searches to find similarities to your own sequences. It keeps track of previous searches and only performs new ones when necessary. Only novel similarities are reported, thus saving the time of browsing through bulky result files. When executed daily (as a cron job) it ensures that you are informed as soon as new sequences similar to yours are incorporated into a given database. Results are sent by electronic mail to one or several addresses. DBWatcher can now be run remotely as a client. Sources are available for download from: (ftp://ftp-igbmc.u-strasbg.fr/pub/DBWatcher/dbwatcher.tar.Z)
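The idea of reporting only novel similarities can be sketched in a few lines. This is not DBWatcher's code: the state file format (a JSON list of hit identifiers) and all function names are assumptions made for illustration, and the search itself is left out.

```python
import json
from pathlib import Path


def novel_hits(seen: set[str], current: list[str]) -> tuple[list[str], set[str]]:
    """Return the hits not reported before, plus the updated set of seen hits."""
    novel = [hit for hit in current if hit not in seen]
    return novel, seen | set(current)


def load_seen(state_file: Path) -> set[str]:
    """Read previously reported hit identifiers (empty set on the first run)."""
    if state_file.exists():
        return set(json.loads(state_file.read_text()))
    return set()


def save_seen(state_file: Path, seen: set[str]) -> None:
    """Persist the reported hit identifiers for the next scheduled run."""
    state_file.write_text(json.dumps(sorted(seen)))
```

A daily job would call `load_seen`, run its search, pass the resulting identifiers through `novel_hits`, mail only the novel ones, and finish with `save_seen`.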
Utility Software and Services
1. Sequence Related Software
  A number of utility programs which assist sequence analysis based research are available as either freeware or shareware.
  The SEQIO package is a C/C++ package (or library) developed by James Knight at the University of California, Davis, which makes reading and writing sequences and biological databases easy. The following file formats are supported: Raw/Plain, GenBank, PIR (CODATA), EMBL, Swiss-Prot, FASTA, NBRF, IG/Stanford, ASN.1 text files, GCG, MSF, PHYLIP, ClustalW, and output from the FASTA and BLAST suites of programs.
  SSEARCH in Bill Pearson's FASTA package is a C program for Smith-Waterman searches. It searches and aligns a query sequence against an entire database using the Smith-Waterman algorithm.
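  The Smith-Waterman recurrence behind such a search can be sketched as below. This is a minimal illustration, not SSEARCH itself: it assumes a simple match/mismatch score and a linear gap penalty (the parameter values are arbitrary), and it returns only the best local alignment score, not the alignment.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b under a linear gap penalty."""
    prev = [0] * (len(b) + 1)   # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, so an alignment can start anywhere.
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best
```

  Searching a database then amounts to computing this score for the query against every entry and ranking the entries by score; keeping only two rows of the table keeps memory use proportional to the length of one sequence.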
  The sim.c algorithm (and others) is located at: (http://globin.cse.psu.edu/ftp/dist/sim/)
  SorFind, RepFind, and PromFind are three programs to analyse DNA sequences, developed by Dr. Gordon B. Hutchinson, Department of Medical Genetics, University of British Columbia, Canada. SorFind predicts coding exons in vertebrate genomic DNA. RepFind identifies common repetitive elements in DNA sequence. PromFind predicts promoter regions in vertebrate DNA sequence.
  AutoGene (AUG) is a nucleotide sequence analysis program developed by Andrew Ptitsyn and collaborators. AUG contains programs for FASTA-GenBank-EMBL-AUG sequence format conversion, ALU and L1HS search, polyadenylation site recognition, vertebrate promoter site recognition, vertebrate exon/intron structure recognition, and some others. AUG can be obtained through ftp from: (ftp://ftp.bionet.nsc.ru). AutoGene also includes an exon-finding subsystem at: (ftp://ftp.bionet.nsc.ru/pub/biology/autogene).
  Microbe Software has published a program called Plasmid Toolkit. It is an intuitive Windows program for producing publication-quality plasmid maps with or without sequence data. Microbesoft's address on the Web is: (http://ourworld.compuserve.com/homepages/microbesoft)
  There is a program called Gene Construction Kit that can be used to generate plasmid drawings. The program also does other useful things such as generating restriction maps of imported DNA sequences. The Gene Construction Kit is available at URL: (http://www.textco.com/).
  ShadyBox is a drawing program which enables you to box and shade regular and irregular shaped segments of aligned multiple sequences. It was designed with the intention of producing PostScript output suitable for use in publications. It is also possible to colour regions of sequences, individual residues or residues of specified frequency. ShadyBox can be obtained from: ANGIS- The Australian National Genomic Information Service. (http://www.angis.su.oz.au) or (ftp://ftp.angis.su.oz.au/pub/unix)
  ProMSED (Protein Multiple Sequences EDitor) for Windows is an easy-to-use application for automatic and manual alignment of multiple protein sequences, alignment editing, analysis and printing. The interface and the main functions are similar to those of Microsoft Word. ProMSED can align a complete set of sequences, a subset of it, or any selected block, thus providing a flexible tool for sequence analysis, visualization, editing and preparation of illustrations. ProMSED has been developed by Dr. Alexey Eroshkin of the Institute of Molecular Biology, State Research Center of Virology and Biotechnology, "Vector", Koltsovo, Novosibirsk Region 633159, Russia, and is available at (ftp://ftp.ebi.ac.uk/pub/software/dos/promsed/)
  GeneDoc is a full-featured multiple sequence alignment editor and shading utility with phylogenetic tree support. GeneDoc is also intended to help with the publication aspects of genetics research work by providing features such as shading, page and font layout. GeneDoc can either read .MSF multiple sequence alignment files or import FASTA format files to be saved as a .MSF file project.
  Phylogenetic software PIWE and NONA written by Pablo Goloboff are available from the Willi Hennig Society's software pages: (http://www.vims.edu/~mes/hennig/software.html)
  ANTHEPROT (ANalyze THE PROTeins) is available from the Institut de Biologie et Chimie des Proteines, UPR 412-CNRS, Lyon Cedex, France. Well-known sequence formats are supported. A helical wheel diagram that can be moved along the sequence, coupled with a real-time 3D view of the helix in alpha-carbon representation, is included. It can be obtained through ftp from: (ftp://ftp.ibcp.fr).
  SCOP, the Structural Classification of Proteins database, hierarchically organizes all proteins of known structure according to their structural and evolutionary relationships. The database can be accessed at URL: (http://scop.mrc-lmb.cam.ac.uk/scop/)
  ALSCRIPT takes an alignment and produces PostScript. Download instructions are at: (ftp://geoff.biop.ox.ac.uk/README).
  The AMAS server allows a multiple alignment to be analysed for interesting conservation patterns. PostScript output with boxing and shading of the alignment is provided. The AMAS server site on the Web is: (http://geoff.biop.ox.ac.uk/servers/amas_server.html).
  GELPICTURE reads a contig from the Fragment Assembly database and displays a diagram of the gel alignments and a printout of the aligned gel sequences and consensus. GELPICTURE has been modified to include the sequence direction in both sections of the output, and to mark with '=======' any consensus sequence that is correct (agrees with every fragment) and has been sequenced in both directions.
  GELFIGURE produces a graphical summary of a contig in a fragment assembly project. The output is in four sections: a redundancy plot, a diagram of the directions and orientations of sequence fragments, a restriction map and a plot of open reading frames.
  The plot is intended both as a quality guide during the course of a sequencing project, and as a final report for a completed assembly.
  SeqPup, developed by Don Gilbert (Biocomputing, Indiana University, Bloomington), is a versatile biological sequence editor and analysis program usable on the common computer systems of Macintosh, MS-Windows and X-Windows. SeqPup can be obtained through ftp: (ftp://iubio.bio.indiana.edu/molbio/seqpup/) or from the Web at: (http://iubio.bio.indiana.edu/1/IUBio-Software%2bData/molbio/seqpup/)
  The recently completed sequencing of the entire S. cerevisiae genome marks a major event in the history of biology. The analysis of these gene products will provide powerful tools for reading the genomes of other eukaryotes, particularly those of higher eukaryotes. The analysis of the yeast genome has provided a useful framework for the annotation of many of the complete genome projects currently nearing completion, as well as the upcoming human genome. A yeast web page has been set up by Jim Freeman at the Bio-Molecular Engineering Center at BU. The yeast sequence information on this web page was obtained from the GeneQuiz Consortium and the Mips Genome Commission, and an attempt has been made to integrate these two data structures as well as to supplement their annotation with that obtained from a set of functionally diagnostic patterns (Adams, R. M., et al. Protein Science 5, 1240-49, 1996). The yeast web page hosts the following search tools:
  User sequence as query (via blast): (http://bmerc-www.bu.edu/protein-seq/wwwblast.html)
  User keyword as query: (http://bmerc-www.bu.edu/protein-seq/yeast-keyword-search.html)
  Unix egrep regular expression as a sequence query: (http://bmerc-www.bu.edu/protein-seq/yeast-egrep-search.html).
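  A regular expression query against sequences, in the spirit of the egrep search above, can be sketched with Python's `re` module. The sequence names and sequences below are made up, and the function is a hypothetical local stand-in, not the BMERC service; simple character classes such as `[ST]` and `[^P]` are written the same way in egrep and in `re`.

```python
import re


def regex_search(sequences: dict[str, str], pattern: str) -> dict[str, list[int]]:
    """Map each matching sequence name to the 0-based start positions of its matches."""
    compiled = re.compile(pattern)
    found = {}
    for name, seq in sequences.items():
        # finditer reports non-overlapping matches, scanning left to right.
        starts = [m.start() for m in compiled.finditer(seq)]
        if starts:
            found[name] = starts
    return found


# Hypothetical protein sequences used only for illustration.
SEQUENCES = {"seqA": "MKNGSAANPT", "seqB": "MKLLV"}
MATCHES = regex_search(SEQUENCES, "N[^P][ST]")
```

  The pattern asks for N, then any residue except P, then S or T, so `seqA` matches at NGS but not at NPT.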
  ASSET (Aligned Segment Statistical Evaluation Tool) includes 3 programs: asset, purge and scan. The PURGE program removes closely related sequences from an input file prior to running asset. This is important in order to reduce input sequence redundancy. The command syntax for purge is: purge , where determines the maximum blosum62 relatedness score between any two sequences in the output file (the output file is created with the name .b). A score between 100 and 200 is recommended. The scan program scans a database for sequences that contain motifs detected by asset. ASSET will produce a "scan file" of the locally aligned segment blocks by using the -f option; specifies the percentage of sequences in the input file that are required to contain a motif before the corresponding motif block can be included in the scan file. The scan file is given the name .sn. The input file should be in FASTA format.
  A program for the prediction of transmembrane helices using neural networks (PHDhtm) is available at EMBL: (http://www.embl-heidelberg.de/predictprotein/predictprotein.html)
  Cutter is a web-based service that analyzes a given sequence for restriction enzyme sites and gives an easy-to-follow analysis. (http://www.ccsi.com/firstmarket/cutter/cutter%2b.html)
  The Genome Sequencing Facility at Brookhaven National Laboratory also hosts restriction enzyme analysis of DNA sequences at: (http://genome1.bio.bnl.gov/cgi-bin/bbq?para=rea&MODE=0)
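  What such a restriction analysis computes can be sketched as a search for recognition sequences. This sketch is not the code behind either service: it covers only three common enzymes, reports 0-based positions where the recognition sequence begins (not the exact cut position), and scans one strand only, which suffices here because these three sites are palindromic.

```python
# Recognition sequences of three widely used restriction enzymes.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def find_sites(dna: str, enzymes: dict[str, str] = SITES) -> dict[str, list[int]]:
    """Return, per enzyme, the start positions of its recognition sequence."""
    dna = dna.upper()
    result = {}
    for name, site in enzymes.items():
        positions = []
        start = dna.find(site)
        while start != -1:
            positions.append(start)
            # Continue one base further so overlapping sites are not missed.
            start = dna.find(site, start + 1)
        result[name] = positions
    return result
```

  For a linear molecule, the fragment lengths of a digest follow from the differences between successive cut positions.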
  DNA2Prot XFCN by Jared Roach (c) August 1996 translates DNA sequences.
  PROMOTER SCAN II is a program developed to recognize and predict POL II promoters in genomic DNA sequences. Presently it is limited to mammalian promoter sequences, and is set to find approximately 60-70% of promoter sequences never before seen by the program, with an expected false positive rate of less than 1 in 30,000 single-stranded bases (based upon cross-validation tests). The program is accessible on the web at URL: (http://biosci.umn.edu/software/proscan/promoterscan.htm)
  A comprehensive package of DNA/protein sequence analysis programs can be accessed from: (http://www.webgenetics.com).
  ConsInspector is a program to scan nucleic acid sequences for matches to a precompiled library of transcription factor binding sites. The program carries out an extensive examination of binding site candidates: the real sequence is compared with randomly shuffled versions and sequence regions surrounding the conserved binding site are included into the analysis. ConsInspector is available for UNIX and VAX/VMS for ftp at: (ftp://ariane.gsf.de/pub/unix/) and (ftp://ariane.gsf.de/pub/vax/), respectively, or through the Web site: (http://www.gsf.de/biodv/consinspector.html).
  XPound is an exon-predicting program.
  BLOCK (pedigree and linkage analysis in large complex pedigrees) written by Claus Skaanning Jensen, Aalborg University, Denmark, implements a method called blocking Gibbs sampling. The method is based on the Markov chain Monte Carlo method, Gibbs sampling, but combines this stochastic method with exact local computations to get a method that can successfully handle very large and complex (e.g., inbred) pedigrees (thousands of individuals). The method allows the user to test the presence of linkage between two genes. BLOCK is on the Web at: (http://www.cs.auc.dk/~claus/block.html)
  MatInd and MatInspector are tools for the definition and detection of consensus matches in DNA sequences. MatInspector uses a large library of predefined matrix descriptions of transcription factor binding sites to locate matches in nucleotide sequences of unlimited length. This library is based on the TRANSFAC database: (http://transfac.gbf-braunschweig.de/TRANSFAC). MatInd and MatInspector, together with the library, are available for UNIX and DOS at: (ftp://ariane.gsf.de/pub/). MatInspector can also be used interactively at (http://www.gsf.de./biodv/matinspector.html)
  GENET is an on-line searchable database. It presents the organization of known gene networks and includes maps of gene-gene interactions, sequences and structures of known regulatory elements, and links to GenBank and Medline references. GENET is on the Web at: (http://www.iephb.ru/~spirov/genet00.html)
  Many popular computational biology programs are available for the Linux (PC Unix freeware) platform, including: clustalw (alignment), readseq (conversion of sequence files), phylip (phylogenetic analysis), GDE (Genetic Data Environment), ACeDB (C. elegans genome database), xbbtools (visual sequence analysis), seaview (visual alignment editor), phylo_win (visual phylogenetic analysis), blast (database search), and fasta.
  ProAnalyst is an easy-to-use, state-of-the-art MS-DOS application designed to solve traditional and new tasks of protein science. Developed by Vladimir Ivanisenko and Alexey Eroshkin at the State Research Center of Virology and Biotechnology, Koltsovo, Russia, ProAnalyst is available from the EBI software library: (ftp://ftp.ebi.ac.uk/pub/software/dos/proanalyst/). ProAnalyst is basically an advanced statistical analysis program which, for instance, relates experimental data to protein primary and tertiary structure, finds relationships between the characteristics of protein sites (hydrophobicity, amphipathicity, etc.) and protein activities, and investigates differences between proteins divided by functional, evolutionary or other criteria (for example, relates genotype to phenotype). ProAnWin is a similar program for use in the Windows environment. In the EBI software library ProAnWin is at: (ftp://ftp.ebi.ac.uk/pub/software/dos/proanwin)
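  Scanning a sequence with a matrix description of a binding site can be sketched as follows. This is a generic weight-matrix scan, not MatInspector's actual scoring scheme: the toy matrix for a TATA-like site, its weights, and the cut-off expressed as a fraction of the maximum possible score are all assumptions made for illustration, and only the given strand is scanned.

```python
def scan_matrix(dna: str, matrix: list[dict[str, int]],
                min_fraction: float) -> list[tuple[int, str, float]]:
    """Return (start, window, score / max_score) for windows reaching the cut-off."""
    max_score = sum(max(column.values()) for column in matrix)
    width = len(matrix)
    hits = []
    for start in range(len(dna) - width + 1):
        window = dna[start:start + width]
        # Sum the weight of the observed base at every matrix position.
        score = sum(column.get(base, 0) for column, base in zip(matrix, window))
        if score >= min_fraction * max_score:
            hits.append((start, window, score / max_score))
    return hits


# Toy matrix (hypothetical weights) favouring the word TATA.
TATA_LIKE = [
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 0, "T": 1},
    {"A": 2, "C": 0, "G": 0, "T": 8},
    {"A": 9, "C": 0, "G": 0, "T": 1},
]
```

  Lowering `min_fraction` admits weaker matches such as TAAA in addition to the perfect TATA, which is how a library scan trades specificity for sensitivity.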
  PUZZLE is a maximum likelihood program for reconstructing phylogenetic trees from nucleotide and amino acid sequence data. It is available free of charge over the Internet and runs on all popular systems. It is distributed by the European Bioinformatics Institute: (ftp://ftp.ebi.ac.uk/pub/software/dos/puzzle) (DOS version) (ftp://ftp.ebi.ac.uk/pub/software/mac/puzzle) (MacOS version) (ftp://ftp.ebi.ac.uk/pub/software/unix/puzzle) (UNIX version) (ftp://ftp.ebi.ac.uk/pub/software/vms/puzzle) (VMS version)
  Emmanuel Skoufos of the Yale University School of Medicine has set up a new gene discovery page. The purpose of this page is to serve as a "desktop" area, primarily for the bench scientist with little biocomputing background. It organizes existing search engines in a coherent, stepwise fashion providing one of the many strategies that may lead to gene discovery. Questions that this page helps to answer are of the type: "Does a particular sequence of DNA code for proteins and what may their function be?" or "Is there a protein in organism A homologous to protein X of organism B?", etc. The principal site and a European mirror site are available at: (http://www.geocities.com/CapeCanaveral/1915/gdp.html) (http://konops.imbb.forth.gr/~topalis/mirror/gdp.html)
  SSPAL, a program for the prediction of protein secondary structure using local alignments, has been published by Victor V. Solovyev: (http://dot.imgen.bcm.tmc.edu:9331/pssprediction/pssp.html)
  FGENEHB - search for mammalian gene structure with exon assembly by dynamic programming, using similarity information with known proteins obtained by database scanning with fasta.
  FEXHB - search for mammalian coding exons using exon recognition functions and similarity information with known proteins obtained by database scanning with fasta.
  Some additional information about the Gene-Finder programs can be obtained from: (http://dot.imgen.bcm.tmc.edu:9331/gene-finder/gf.html)
  Other utilities available at this site are:
  FGENEH - search for mammalian gene structure with exon assembly by dynamic programming
  FEXH - search for 5'-, internal and 3'-exons
  HEXON - search for internal exons
  HSPL - search for splice sites
  RNASPL - prediction of exon-exon junctions in cDNA sequences
  CDSB - prediction of bacterial coding regions
  HBR - recognition of human and bacterial sequences to test a library for E. coli contamination by sequencing example clones
  TSSG - recognition of human promoter regions (Ghosh/Prestridge motif data)
  TSSW - recognition of human promoter regions (Wingender motif database)
  POLYAH - recognition of the 3'-end cleavage and polyadenylation region of human mRNA precursors
  FGENED - search for Drosophila gene structure with exon assembly by dynamic programming
  FEXD - search for Drosophila 5'-, internal and 3'-exons
  DSPL - search for Drosophila splice sites
  FGENEN - search for nematode gene structure with exon assembly by dynamic programming
  FEXN - search for nematode 5'-, internal and 3'-exons
  NSPL - search for nematode splice sites
  FGENEA - search for plant gene structure with exon assembly by dynamic programming
  FEXA - search for plant 5'-, internal and 3'-exons
  ASPL - search for plant splice sites
  SSP - prediction of alpha-helix and beta-strand in globular proteins by a segment-oriented approach
  NSSP - prediction of alpha-helix and beta-strand segments in globular proteins by a nearest-neighbor algorithm
  PSITE - search for PROSITE patterns with statistics
  DynaClip is a program designed to trim a little bit off of the 5' and 3' ends of DNA sequence reads. DynaClip can be found at: (http://weber.u.washington.edu/~roach/Programs/)
  Lasergene, a modular program package by the company DNASTAR, has a decent sequence alignment module that uses the Clustal method. It also has a PCR primer selection module. (mailto: sales@dnastar.com)
  CLONE is a program that can identify RE sites, ORFs, cut, ligate etc. It has a companion ENHANCE, which gives a decent number of ways to present your cloning strategy and another companion, PRIMER, which helps in primer design and is rather flexible.
  The WWW site at EMBL (www.ebi.ac.uk) has a number of programs for DNA and protein analysis, e.g., BBSEQ for sequence conversion, MACAW for sequence similarity for primer design, Clustal and Phylip for sequence alignment, etc.
  Mac emulator Executor is a program that enables Apple Mac programs to be run on Wintel (Windows/ Intel) machines. Executor runs under Linux, DOS, and Windows95.
  BioOnline Store hosts a text based search engine. (http://synapse.bio.com/cgi-bin/bio)
  ClustalW documentation and on-line help are available in HTML format at: (http://www-igbmc.u-strasbg.fr/BioInfo/ClustalW/)
  BioComputing Division of the Virtual School of Natural Sciences, a member school of the Globewide Network Academy, conducted a Course on BioComputing in 1996, using the electronic conferencing system BioMOO. An account of the course, and access to its hypertext book and related materials, are available at: (URL:http://www.techfak.uni-bielefeld.de/bcd/welcome.html).
  The edited transcript includes a link to the main FastA v 3.0 FTP distribution site and to previous lectures, and is available at the following locations (WWW/hypertext): (http://www.techfak.uni-bielefeld.de/bcd/Lectures/pearson3.html) (http://merlin.mbcr.bcm.tmc.edu:8001/bcd/Lectures/pearson3.html) (http://www.biotech.ist.unige.it/bcd/Lectures/pearson3.html)
  PROSITE is a program that enables long aminoacid sequence patterns to be searched in sequence databases. PROSITE can be found at: (http://www.genome.ad.jp/SIT/MOTIF.html)
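  A pattern search of this kind can be sketched by converting a PROSITE-style pattern into a regular expression. The sketch handles only a simple subset of the pattern syntax (elements separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` repeat counts, `<` and `>` anchors outside brackets, and an optional trailing period); the function names are hypothetical and this is not the code of the server above.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regular expression."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Separate an optional repeat count such as (2) or (2,4).
        residue, _, count = element.partition("(")
        count = "{" + count.rstrip(")") + "}" if count else ""
        if residue == "x":
            residue = "."                          # any residue
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"   # forbidden residues
        regex.append(prefix + residue + count + suffix)
    return "".join(regex)


def find_pattern(pattern: str, sequence: str) -> list[int]:
    """0-based start positions of non-overlapping matches in the sequence."""
    return [m.start() for m in re.finditer(prosite_to_regex(pattern), sequence)]
```

  For instance, the N-glycosylation pattern `N-{P}-[ST]-{P}.` becomes `N[^P][ST][^P]`, which can then be applied to every sequence in a database.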
  Visual sequence editor (VISED) is a multiple sequence editor for Microsoft Windows platforms. It features viewing and editing of uptp 200 sequences simultaneously, boxed output, pattern search function using PROSITE syntax, Translation and other simple nucleic acid functions. VISED is available at: (ftp.bio.indiana.edu/molbio/ibmpc)
  Plasmid Processor is a simple tool for plasmid presentation for scientific and educational purposes. It features both circular and linear DNA, user-defined restriction sites, genes and a multiple cloning site. In addition, you can manipulate the plasmid by inserting and deleting fragments. Created drawings can be copied to the clipboard or saved to disk for later use. Printing from within the program is also supported. Plasmid Processor was developed by T. Kivirauma, P. Oikari and J. Saarela of the Department of Computer Sciences and Applied Mathematics, University of Kuopio, Finland. It can be obtained from URL: (http://www.uku.fi/~kiviraum/plasmid/plasmid.html)
  Wentian Li, Laboratory of Statistical Genetics, Rockefeller University maintains a number of bibliographies for the benefit of computational biologists. A bibliography on Computational Gene Recognition is available at: (http://linkage.rockefeller.edu/wli/gene) Another bibliography on long-range correlations in DNA sequences is at: (http://linkage.rockefeller.edu/wli/dna_corr) Papers on short-range and middle-range correlations in DNA sequences will also be included. A comprehensive list of computer software for genetic linkage analysis and genetic map construction can be found at: (http://linkage.rockefeller.edu/soft/list.html) For each program, the following information is provided whenever possible: description, authors, web or ftp site, source code language, operating systems the program runs on, references.
  TACG is a character-based, command-line tool for the restriction enzyme analysis of DNA on Unix-like operating systems. Written by Harry Mangalam, UC Irvine, it can be obtained from: (ftp://mamba.bio.uci.edu/pub/tacg/)
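  The core of a restriction enzyme analysis can be sketched in a few lines: scan the sequence for each enzyme's recognition site and report the positions found. The example below is a toy illustration, not TACG's implementation; the function name find_sites and the three-enzyme table are chosen here for illustration. Because the three recognition sites used are palindromic, scanning one strand is sufficient.

```python
# Toy restriction-site scan; not the method implemented in TACG.
# Recognition sequences of three common enzymes (all palindromic).
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every recognition site."""
    seq = seq.upper()
    hits = {}
    for name, site in sites.items():
        positions = []
        pos = seq.find(site)
        while pos != -1:
            positions.append(pos)
            pos = seq.find(site, pos + 1)  # allow overlapping occurrences
        hits[name] = positions
    return hits


if __name__ == "__main__":
    for enzyme, positions in find_sites("TTGAATTCAAGGATCCGAATTC").items():
        print(enzyme, positions)
```

  A real analysis tool would additionally handle degenerate recognition sequences, cut positions within the site, and circular sequences.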
  NRSub (the Non-Redundant Bacillus subtilis data base) is available through anonymous FTP at: (ftp://biom3.univ-lyon1.fr/pub/nrsub/) or (ftp://ftp.nig.ac.jp/pub/db/nrsub/) It is also possible to access NRSub through two World Wide Web servers at: (http://acnuc.univ-lyon1.fr/nrsub/nrsub.html) or (http://ddbjs4h.genes.nig.ac.jp/)
  A Protein Secondary Structure Prediction (DSC) program has been developed by Ross D. King of the Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, London. Two prediction modes are available:
  1) Given a single sequence. A multiple sequence alignment will be formed and DSC used to predict secondary structure. This mode can be accessed at: (http://www.icnet.uk/bmm/dsc/dsc_form_align.html)
  2) Given a multiple sequence alignment. DSC will use this alignment to predict secondary structure. This mode can be accessed at: (http://www.icnet.uk/bmm/dsc/dsc_read_align.html)
  The C source code of DSC is available and can be obtained through ftp: (ftp://ftp.icnet.uk/icrf-public/bmm/king/dsc/dsc.tar.z)
  Ross Overbeek et al. at the Argonne National Laboratory have developed the WIT/PUMA2 system, which supports metabolic reconstructions and the integration of sequence, phylogenetic and metabolic information in a coherent interactive environment. It consists of two parts, WIT and PUMA. WIT is an interactive tool for an expert biologist, which allows one to develop a metabolic reconstruction for an organism (from a complete or partial genome). WIT is available at: (http://www.cme.msu.edu/WIT/) PUMA is a growing repository of the metabolic models for the organisms developed in WIT.
  Several kinds of bioinformatics related searches can be executed - to GenBank, the MEDLINE molecular biology subset, OMIM, Entrez, the BCM Search Launcher, PDB, BLAST, GDB, etc., from the Biological Data Transport Web resource, (http://www.data-transport.com)
  Drawtree is a program written by Joe Felsenstein that produces tree pictures. It is included in the PHYLIP software package. The 386 DOS executables for PHYLIP are available at: (http://evolution.genetics.washington.edu/phylip.html)
  An alternative to DRAWTREE is the program TreeView written by Rod Page, which runs under Windows and supports a range of tree file formats. The tree pictures can be cut and pasted into other applications, as well as saved as a Windows metafile (recognised by most drawing and word processing programs). For more information, visit the site: http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
  Don Gilbert of Indiana University has produced TreeDraw Deck. (ftp://iubio.bio.indiana.edu/)
  EditBase is a program developed by Purdue Research Foundation & USDA/ARS. It is useful for DNA cloning, etc.
  Bootscanning (c) Mika Salminen, Wayne Cobb, Henry M. Jackson Foundation is a method for analysis of viral recombination. It can be used to compare an unknown, suspected recombinant sequence to a set of predefined potential parental sequences. It should be independent of organism but, based on tests on HIV-1, it appears to work only for sufficiently variable genes. GDE and Phylip are required to run the package, and at the current time only SUN executables are available. However, the source code is also included. Bootscanning is available at: (http://hivgenome.hjf.org/)
2. Software for Studies Other Than Sequence
  Hyperchem models molecules and does very good minimization via molecular mechanics and quantum mechanics. Hyperchem is available at: (http://www.ppgsoft.com/)
  MolScript is a molecular graphics program written by Per Kraulis. A description and a pointer to an alternative program are available at: (http://www.bocklabs.wisc.edu/Molscript.html)
  NAMD is a high-performance molecular mechanics program for simulating large biomolecular systems on parallel and distributed computers developed by the Theoretical Biophysics group at the University of Illinois and the Beckman Institute. This software is made available to the molecular modeling community free of charge, and includes commented source code, documentation for users and programmers, and precompiled binaries for HP and SGI workstations. Detailed documentation and the software are at: (http://www.ks.uiuc.edu/Research/namd/) (ftp://ftp.ks.uiuc.edu/pub/mdscope/namd/)
  Kevin Shreder, University of California, San Diego has constructed the Antibody Resource Page. The webpage contains educational links about antibodies (some with incredible graphics), links to on-line journals that cover antibody-related topics, an essay on the study of antibody molecular recognition, links to on-line antibody sequencing and hybridoma databases, and a miscellaneous section. There is also a large section designed to help those looking for an antibody. This latter section contains more than 60 links to on-line companies that sell antibodies, many of which have searchable catalogues. This section also contains useful tips on how to find antibodies using the Internet or otherwise. The Antibody Resource Page is at: (http://www-chem.ucsd.edu/Faculty/goodman/antibody.html/abpage.html)
  Several shareware programs are available for common molecular applications, e.g., MOPAC for energy minimization, Babel to convert output to PDB format, and Molden for molecular model building.
  The Prolysis server contains a lot of information on proteases and a link to the MAGE software, which could be useful for any teaching program on enzymes and proteins. The Prolysis server is at URL: (http://prolysis.phys.univ-tours.fr/Prolysis)
  Kinemage, RasMol and Linus!Lite are three programs which are particularly useful for teaching purposes. RasMol is particularly useful when used in combination with "ChemScape Chime". Linus!Lite can produce "ray trace" images very efficiently and the resultant images can be manipulated (rotation). You can also generate movies with Linus!Lite with little effort. Unfortunately, Linus!Lite is not available for Wintel.
  PDB files can be downloaded (xxx.full) as text files and then directly viewed through RasMol. For viewing through Mage, the file must first be read in with RasMol as PDB files and written out as kinemage files by issuing the command, "write kinemage filename.kin" in the command window, where "filename.kin" is the name of the output file.
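  Because PDB files are plain text with fixed column positions, they are also easy to read with a short script before or instead of loading them into a viewer. The following is a minimal sketch that extracts atom names, residues and coordinates from ATOM and HETATM records; the function name read_atoms is hypothetical and the sample record is made up for the demonstration.

```python
def read_atoms(lines):
    """Extract basic fields from the ATOM/HETATM records of a PDB file.

    Field positions follow the fixed-column PDB layout
    (1-based columns: name 13-16, residue 18-20, chain 22,
    residue number 23-26, x 31-38, y 39-46, z 47-54).
    """
    atoms = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),
                "res_name": line[17:20].strip(),
                "chain": line[21],
                "res_seq": int(line[22:26]),
                "x": float(line[30:38]),
                "y": float(line[38:46]),
                "z": float(line[46:54]),
            })
    return atoms


if __name__ == "__main__":
    # Build one made-up ATOM record with the correct column widths.
    record = "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}".format(
        1, " N", "MET", "A", 1, 27.340, 24.430, 2.614)
    print(read_atoms(["HEADER    EXAMPLE", record]))
```

  In practice one would pass an open file object, e.g. read_atoms(open("xxx.full")), where the file name is whatever was used when downloading the entry.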
  The URL's of Rasmol, Chemscape and Linus!Lite are, respectively, (http://www.umass.edu/microbio/rasmol/getras.htm) (http://www.mdli.com/mdlhome.html) (http://www.blc.arizona.edu/linus/linus.html)
  A modified version of Rasmol is available at U.C. Berkeley; it enables fiddling and twiddling, i.e., fragments can be rotated about a bond as an axis. The first version was based on RasMac 2.5 and supports the PPC chip; beta test versions of Rasmol2.6-ucb are available for Mac, Windows, Linux, Ultrix, and HP-UX. This version of RasMol is available at: (http://hydrogen.cchem.berkeley.edu:8080/Rasmol/)
  Three-dimensional structural information for S. cerevisiae proteins is now available through the Saccharomyces Genome Database web site at the following URL: (http://genome-www.stanford.edu/Sacch3D)
  The following programs can be used to measure the nucleic acid parameters and/or construct a new nucleic acid from scratch based on a new set of parameters:
  1) BIOSYM/MSI InsightII and MacroModel for DNA
  2) MC-SYM for RNA (http://www.iro.umontreal.ca/people/major/mcsym.html)
  3) NAB for DNA (http://scripps.edu/case)
  4) newhel93 for DNA, available in PDB (http://www.gdb.org/hopkins.html)
  5) Rasmol (http://www.gdb.org/hopkins.html)
  Nemesis from Oxford Molecular allows modeling, manipulations, as well as energy calculations. The Web Address of Oxford Molecular is: (http://www.oxmol.co.uk/PRODUCTS/nemesis_top.html)
  Modeller by Andrej Sali at Rockefeller University, is a homology based modeling program for proteins. It is available at the URL: (http://guitar.rockefeller.edu)
  Swiss-Model is another such program, which gives good results if the sequence is highly homologous to a known structure. Swiss-Model is at: (http://expasy.hcuge.ch/swissmod/SWISS-MODEL.html)
  Raster3D runs on Unix derivatives and gives nice protein ribbon drawings, especially if used together with Molscript.
  NIH-Imager can be found at http://rsb.info.nih.gov/nih-image/ and Image Tool (from University of Texas in San Antonio) can be found at: (ftp://maxrad6.uthscsa.edu/pub/it).
  NIH hosts an impressive site with Molecular Modeling resources at: (http://cmm.info.nih.gov/modeling/quick_finder.html) Particularly noteworthy are "Molecules R Us", a search engine, graphic display of molecules in different forms and a PDB viewer helper application.
  The UCLA-DOE Protein Fold Recognition Automated Server takes an amino acid sequence and searches the known structures to find a compatible fold. In addition, it automatically provides the results from other sequence analysis programs. The server is at: (http://www.mbi.ucla.edu/people/frsvr/frsvr.html)
  NCBI's Entrez provides a means to visualize molecular structure data. The viewer, Cn3D, is part of Network Entrez client programs and can also be used as a helper application for the Web version of Entrez. A full description of the Cn3D program may be found at the NCBI's Structure Group home page: (http://www.ncbi.nlm.nih.gov/Structure)
  3DBbrowse is a Web based browser, that makes it easy to search and retrieve data from the Protein Data Bank (PDB). It allows the user to rapidly search through the contents of the entire PDB Archive for entries obeying certain constraints. A full text search (based on the Glimpse indexing and query system) can be made for any string appearing in the text of a PDB entry. 3DBbrowse is available at: (http://pdb.pdb.bnl.gov/3DBbrowser.html)
  PovChem is a program that takes PDB files as input and uses the ray-tracing program PovRay to produce graphics. PovChem is available at: (http://ludwig.scs.uiuc.edu/~paul/PovChem.html)
  NAOMI - a program for studying 3-D structures of proteins is available from the Web site: (http://www.ocms.ox.ac.uk/~smb/Software/N_details/naomi.html) or via anonymous ftp: (ftp://nmrz.ocms.ox.ac.uk/pub/smb/naomi)
  SETOR solid model macromolecules program can be found at: (http://scsg9.unige.ch/fln/setorlic.html)
  Gamess-UK is a general purpose ab initio quantum chemistry package distributed by Computing for Science Ltd. The program can be used to study a wide range of chemical phenomena (including biological problems, such as drug design and enzyme catalysis). The program is available free to UK academics. Non-UK academics pay a nominal fee to cover administration and installation costs. More information on GAMESS-UK is available on the World Wide Web (http://www.dl.ac.uk/CFS)
  ProFit is a least squares fitting program, written by Andrew Martin of University College, London. It performs the basic function of fitting one protein structure to another. One can specify subsets of atoms to be considered, and zones to be fitted by number, sequence, or by sequence alignment. The program will output an RMS deviation and optionally the fitted coordinates. Zones for calculating the RMS can be different from those used for fitting. ProFit is available from: (http://www.biochem.ucl.ac.uk/~martin/#programs)
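  The RMS deviation reported by such a program is simply the root of the mean squared distance between paired atoms. The sketch below computes it for two coordinate sets that are assumed to be already superposed; it does not perform the least squares fit itself and is not ProFit's code. The function name rms_deviation and the zone argument (a list of atom indices over which the RMS is evaluated) are hypothetical.

```python
import math


def rms_deviation(coords_a, coords_b, zone=None):
    """RMS deviation between paired atoms of two already-superposed structures.

    coords_a, coords_b: equal-length sequences of (x, y, z) tuples.
    zone: optional iterable of atom indices to include; all atoms if None.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must contain the same number of atoms")
    indices = list(range(len(coords_a)) if zone is None else zone)
    if not indices:
        raise ValueError("zone is empty")
    total = 0.0
    for i in indices:
        # Squared distance between atom i in the two structures.
        total += sum((p - q) ** 2 for p, q in zip(coords_a[i], coords_b[i]))
    return math.sqrt(total / len(indices))


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0), (2.0, 0.0, 0.0)]
    print(rms_deviation(a, b))               # all atoms
    print(rms_deviation(a, b, zone=[0, 2]))  # RMS over a chosen zone only
```

  Evaluating the RMS over a zone different from the one used for fitting, as the paragraph above describes, amounts to calling such a function with a different index list after the superposition has been done.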
  GRAMM (Global RAnge Molecular Matching) is a program for protein docking written by Ilya Vakser. To predict the structure of a complex, it requires only the atomic coordinates of the two molecules (no information about the binding sites is needed). The program performs an exhaustive 6-dimensional search through the relative translations and rotations of the molecules. The molecular pairs may be: two proteins, a protein and a smaller compound, two transmembrane (TM) helices, etc. GRAMM may be used for high-resolution molecules, for inaccurate structures (where only the gross structural features are known), in cases of large conformational changes, etc.
  The Global Range Molecular Matching (GRAMM) methodology is an empirical approach to smoothing the intermolecular energy function by changing the range of the atom-atom potentials. The technique allows one to locate the area of the global minimum of intermolecular energy for structures of different accuracy. The quality of the prediction depends on the accuracy of the structures. Thus, the docking of high-resolution structures with small conformational changes yields an accurate prediction, while the docking of ultralow-resolution structures will give only the gross features of the complex. The GRAMM site on the Web is (http://guitar.rockefeller.edu/).
  Presently, GRAMM is compiled on SGI R4000, SGI R4400, SGI R8000, and SGI R10000 Unix workstations. The author plans to expand this list in the near future, so check the GRAMM site for updates. Interestingly, GRAMM also works on a PC platform under Windows95 (the performance on a P5-120 with 16 MB RAM is only two times slower than on an SGI 250 MHz Indigo2 R4400).
  SnB (Shake-and-Bake) is a simulated annealing software hosted at Roswell Park Memorial Institute, Buffalo. SnB can be obtained from: (http://www.hwi.buffalo.edu/SnB/).
  A web server for prediction of protein secondary structure percentages from UV circular dichroism spectra has been established by Merelo and Andrade of the University of Granada. 41 CD values ranging from 200 nm to 240 nm are to be submitted (given in deg cm^2 dmol^-1 multiplied by 0.001) and the server gives back the estimated percentages of helix, beta and rest of secondary structure of your protein plus an estimate of the accuracy of the prediction. The prediction is done using a Kohonen neural network with a 2-dimensional output layer. The http address of the k2d server is: (http://kal-el.ugr.es/k2d/spectra.html) The program can be downloaded from: http://www.embl-heidelberg.de/~andrade/k2d.html
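  Preparing a spectrum for submission is largely a matter of selecting the right wavelengths and units. The sketch below assumes that the 41 values correspond to 1 nm steps from 200 nm to 240 nm inclusive and that they are listed in ascending wavelength order; both points are assumptions to be checked against the server's own instructions. The function name prepare_cd_values is hypothetical.

```python
def prepare_cd_values(spectrum):
    """Return the 41 scaled CD values for 200..240 nm.

    spectrum: dict mapping integer wavelength (nm) to ellipticity
              in deg cm^2 dmol^-1.
    The 1 nm spacing and ascending order are assumptions of this sketch.
    """
    wavelengths = range(200, 241)  # 200, 201, ..., 240 -> 41 points
    missing = [w for w in wavelengths if w not in spectrum]
    if missing:
        raise ValueError("no CD value for wavelengths: %s" % missing)
    # Dividing by 1000 is the "multiplied by 0.001" scaling.
    return [spectrum[w] / 1000.0 for w in wavelengths]


if __name__ == "__main__":
    # Made-up flat spectrum, only to show the call.
    raw = {w: -10000.0 for w in range(190, 251)}
    values = prepare_cd_values(raw)
    print(len(values), values[0])
```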
  Several programs are available to support the Voltage Clamp technique of membrane biophysics, e.g., PClamp, a DOS program, and the commercial package MicroCal Origin, which has a PClamp module. Lars Thomsen has written a program, PROFILE, that generates a voltage profile as used when doing whole cell recordings. It is a true 32-bit WIN95 program and thus provides good graphics support and transferability. It requires a minimum screen resolution of 800x600 pixels. Whenever a setting is changed, the picture is updated and copied automatically to the clipboard. PROFILE is available at: (http://home.interlynx.net/~lthomsen/index.htm).
  NASA's Ames Research Center hosts a good home page pertaining to 3D reconstruction from 2D images. (http://biocomp.arc.nasa.gov:80/3dreconstruction/)
  There is a package called SwaN-MR written by Dr. Balacco which does all NMR processing on a Macintosh. It can be downloaded from sfdzuma.usc.es, in the directory /pub/NMR.
  Hanqing Wu has created a homepage of "Online EPR Spectrum Simulation through CGIEMAIL" at: (http://www.uwm.edu/~hanqing/watoc/oleprsm.htm)
  LEE (Latent Energy Environments) is an artificial life simulator developed by Richard Belew and Filippo Menczer of the University of California, San Diego. LEE can be obtained through anonymous FTP: (ftp://cs.ucsd.edu/pub/LEE), or through a link from the URL: (http://www.cs.ucsd.edu/users/fil/lee/lee.html)
  Foster Findlay Associates, Newcastle Upon Tyne, have developed PC_Image for Windows 95 and Windows 3.1 and several other software packages for image processing and analysis. These can be obtained through URL: (http://www.demon.co.uk/ffaltd/)
  There are nice pages on search engines at: (http://www.unige.ch/crystal/w3vlc/int.index.html) (http://scsg9.unige.ch/fln/setorlic.html)
  MathPad is freeware and available at info-mac or at (http://pubpages.unh.edu/~whd/MathPad/).
  David Mathog, Manager, sequence analysis facility, biology division, Caltech, has prepared a very informative and popular comparative table of various molecular modeling/molecular display/related programs with respect to portability. The table, listed alphabetically, is reproduced below. The table is periodically updated by David Mathog and can be seen at: (http://seqaxp.bio.caltech.edu:8000/www/molec_model_progs.html)
```
             MIPS      ---DEC Alpha-----           Intel
What         SGI       DU            WNT           Windows   GraphicsType

Biosym       Yes       No*           No*           No*       GL
Grasp        Yes       No            No            No        GL
MidasPlus    Yes       No            No            No        GL
molmol       Yes       Yes           No            No        X11
O            Yes       No            No            No        GL
rasmol       Yes       Yes           ?             Yes       X11/Windows
Setor        Yes       No            No            No        GL
VMD          Yes       No            No            No        GL
XtalView     Yes       Yes           No            No        GL/X11

* Separately licensed "Axxess" product lets Biosym run as an X11 client.
```
3. X-ray Crystallography Related Programs
  "PHASES: A Program Package for the Processing and Analysis of Diffraction Data from Macromolecules" is available from Dr. William Furey, Biocrystallography Laboratory, VA Medical Center, Pittsburgh. Phase extension using partial structure information, MAD phasing, the use of molecular replacement models, and NC symmetry averaging (which can be done with multiple crystal forms) are well supported. Support for SGI's R8000 series processors and IRIX 6.2 is included, and one can now input SCALEPACK files directly.
  M. Capel of Brookhaven National Laboratory has posted a software suite for visualizing and integrating two-dimensional diffraction images. The suite supports fuji, mar and multi-wire PSD detector formats, and has a wealth of different operational modes including: Circular, Sectorial, Norms along a central radial, Norms along an arbitrary vector, line/column extraction, Angular dependence of sector, Optimization of detector parameters, etc. The software runs on IRIX and Linux, and ports to SUN, HP and VMS are in progress. It is documented and made available via anonymous ftp at (http://crim12b.nsls.bnl.gov/x12b_downloads.html)
  CrystalDesigner, developed by Crystal Structure Design AS, Oslo, Norway, is a tool for building, studying and visualising all kinds of crystal structures on the Macintosh platform. CrystalDesigner is an ideal tool for both teaching and scientific studies. The software is intended to be used by students and teachers at colleges and universities, as well as in industrial research. CrystalDesigner is available at: (http://www.crystaldesigner.no) or through ftp: (ftp://ftp.crystaldesigner.no/).
  A major and very popular software package for the refinement of molecular structures using X-ray crystallographic and solution NMR spectroscopic techniques is X-PLOR. It was developed by Axel T. Brunger of Yale University. Internet access to X-PLOR(online) is now available free of charge to non-profit (academic) users holding a license for X-PLOR version 3.1 from Yale University. Access instructions are available on the X-PLOR home page (http://xplor.csb.yale.edu). The following is a summary of enhanced features introduced in a recent release:
  X-ray crystallography:
  1. Major update of all X-ray crystallographic tutorial files using new syntax
  2. Torsion angle molecular dynamics for crystallographic refinement
  3. Probabilistic MAD phasing
  4. Sigmaa-weighting for electron density maps with optional cross-validation
  5. Difference, anomalous difference, and Fo-Fc electron density maps
  6. Cross-validated coordinate error estimates by the Luzzati and Sigmaa methods
  7. Script files for molecular replacement with multiple molecules
  8. Automated water picking procedure
  9. New bulk solvent refinement procedure
  10. Example of a resolution-dependent weighting scheme for refinement
  11. Direct rotation function
  12. Phased translation function
  13. Scalepack/denzo -> X-PLOR(online) conversion program
  14. X-PLOR(online) -> PDB deposition script (for crystal structures)
  Solution NMR spectroscopy:
  1. J-coupling restraints
  2. Proton chemical shift restraints
  3. Carbon chemical shift restraints
4. Java Applets
  One of the most exciting developments in Bioinformatics in recent times has been the application of Java to provide qualitatively enhanced information transfer over the Internet. In simple terms, Java enables a service provider, or server site, to transmit not only a certain set of information but also the related routines, or programs, that manipulate and process the data in response to commands entered interactively by the (client) user. The Java routines (applets) run on the client machine and thus relieve the server site and the network of a considerable burden. The latest versions of most popular Web browsers have built-in modules to interpret Java applets. Browsers having such a capability are referred to as Java-enabled browsers.
  Listed below are some of the early bioinformatics applications using Java.
  MDL's Chemscape Chime plugin for Netscape, for displaying the interactive rotating 3D Models has made a mark as an excellent teaching tool. The URL for MDL is: (http://www.mdli.com/mdlhome.html) Steve Williams and coworkers have developed some good examples to illustrate the use of Chime in Biology teaching. These can be seen at (http://iptunix.bcm.bham.ac.uk/sjwb/models.html), or, (http://www.birmingham.ac.uk/biochemistry) The MDL site itself provides links to some excellent examples, e.g., one for the photosynthetic reaction centre from the purple bacterium Rhodopseudomonas viridis.
  Frank R. Gorga of Duquesne University has put together a simple (sophomore level) web-based tutorial on isomers of organic molecules at URL: (http://nexus.chemistry.duq.edu/~gorga/stereo/intro.htm)
  Dirk Walther of EMBL has implemented a Java based PDB-structure viewer at (http://www.embl-heidelberg.de/~walther/JAVA/pdb.html)
  Luca Ida Giovanni of EMBL has written a Java-based restriction map program, as part of a collection of Java based solutions in molecular biology at: (http://www.embl-heidelberg.de/~toldo/JaMBW.html)
  Andrei Grigoriev has added a calculator which uses the model of Roach for random fingerprinting to the set of physical mapping calculators. These can be viewed with a Java-enabled browser at: (http://www.mpimg-berlin-dahlem.mpg.de/~andy/calc/mapcalc.html)
  QuickPDB is a Sequence/Structure Search and Display Java Applet developed by Ilya Shindyalov and Phil Bourne. QuickPDB is a lightweight applet with two major functions:
  1. Find a structure in the PDB based upon a text or sequence search.
  2. Render that structure with ability to mark up fragments in sequence and see them accordingly in structure view.
  QuickPDB accesses the most current (nightly updated) version of the PDB database located on San Diego Supercomputer Center (SDSC) servers and uses the new index-based database structure for fast search and retrieval developed at SDSC. The URL to access QuickPDB is: (http://xtal1.sdsc.edu/misha/QuickPDB.html).
  Tai Y. Fu of the University of British Columbia has released Java Lattice (A Java applet for viewing crystal packing of Protein Structure Database files). Features include double buffering, colored models, control buttons, zooming, etc. Java Lattice can be visited at: (http://laue.biochem.ubc.ca:8080/cgi-bin/ssis/kelowna/latte.html)
5. Conclusion
  However overwhelming the above compilation may look, it is still only a fraction of the growing list of resources. The present compilation is based on discussions that took place among members of several Bionet News Groups in mid-1996. In fact, the present compilation itself can be converted into an Internet resource, as will be illustrated during the presentation. With a few additions the present document will become an Index page with active links to the various URL's mentioned here. Each paragraph here would virtually explode into a voluminous repertoire of documents, programs and services with the power of the HTML and WWW technologies. Innovative tools, appropriate computer and network resources and cooperative