Genomics
BIO 427
Instructors:
Louise Temple and  Jon Monroe
JMU Department of Biology

Copyright Jon Monroe and Louise Temple, 2/18/07

Exercise 6.   Multiple Sequence Alignment using ClustalW


ClustalW is a general-purpose multiple sequence alignment program for DNA or proteins.  It produces biologically meaningful multiple sequence alignments of divergent sequences: it calculates the best match for the selected sequences and lines them up so that the identities, similarities and differences can be seen.  Evolutionary relationships can then be explored by viewing cladograms or phylograms.
A. Multiple alignments of 11 putative fimbrial subunit genes from Bordetella avium

The ClustalW multiple sequence alignment program can be accessed through a number of sites, but for this exercise please use the one at the European Bioinformatics Institute.  Under "Data Resources and Tools" select "Sequence Similarity & Analysis."  Take a moment to look at the long list of programs you can access from the left side of this site, then select "ClustalW."  For this submission, keep the default values in all of the boxes.  Copy the following data set into the box and click on “Run”.  The data set contains the 11 putative fimbrial subunit genes found in the Bordetella avium genome.

Fimbrial protein sequences from Bordetella avium

Several formats are accepted for submitting sequences, but each one is very specific.  Usually the best way to make a file with the sequences you want to compare is to download the sequences in one format, such as FASTA, compile them into one text file without blank lines between entries, and then copy the entire file into the page.  You will note that every sequence is preceded by a number and a description, and that before the number is a “>” sign.  The “>” tells the program that a new sequence entry begins here, and therefore that the previous sequence has ended.
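To make the role of the “>” sign concrete, here is a minimal Python sketch of how a program might split FASTA text into entries. The function name `parse_fasta` and the two made-up sequences are illustrative assumptions; they are not the course data set and this is not how ClustalW itself is implemented.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate stray blank lines
        if line.startswith(">"):
            # A ">" starts a new entry, so the previous one (if any) is complete.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-entry example (made-up sequences).
example = """>1 made-up protein A
MKLSTA
VLLA
>2 made-up protein B
MKISTAVL
"""

records = parse_fasta(example)
for header, seq in records:
    print(header, len(seq))
```

Each header line closes the entry before it, which is why the entries can simply be stacked one after another in a single file.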

Now have a look at your results.  If you do not have a button across from “Jalview” in the set of 10 items just under “Results,” you will need to enable Java applets or choose another browser.  The computer help desk staff should be able to help you.  The first set of results is called the “Scores Table.”

1.  Notice that each of the eleven sequences is aligned with each of the others and that a score is provided for every pair.  Re-sort the entries by Alignment Score.  Which 2 pairs have the highest scores, and which 3 pairs have the lowest scores?
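As a rough illustration of what a pairwise score can represent, the Python sketch below computes percent identity for every pair in a small set of already-aligned sequences and sorts the pairs from highest to lowest. It assumes the score is the number of identical residues divided by the number of positions compared, with gap positions excluded; the sequences and the names `seqA`, `seqB` and `seqC` are made up, and ClustalW derives its own scores from separate pairwise alignments rather than from a pre-aligned set like this one.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gap positions are not counted
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical, already-aligned toy sequences ("-" marks a gap).
aligned = {
    "seqA": "MKLSTAVL-A",
    "seqB": "MKISTAVLLA",
    "seqC": "MR-STGVLLA",
}

# Score every pair, then sort so the most similar pair comes first.
scores = [
    (percent_identity(aligned[p], aligned[q]), p, q)
    for p, q in combinations(aligned, 2)
]
scores.sort(reverse=True)

for score, p, q in scores:
    print(f"{p} vs {q}: {score:.1f}")
```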

Moving to the alignment itself (below the Scores Table), scan over the aligned sequences.

2.  From a casual look, which pair of sequences seems to be the most similar?  Is there one sequence that appears to be the most different?  What do you deduce that the double dots, single dots and asterisks below the sequences mean?  How many amino acids are perfectly conserved in this gene family?  Could all of these proteins have a conserved disulfide bond?
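Counting perfectly conserved positions by eye is error-prone in a long alignment, so the tally can also be sketched in Python. The function name `conserved_columns` and the three toy sequences are illustrative assumptions; to use it on the real data you would paste in the aligned sequences (gaps included) from your own results.

```python
def conserved_columns(aligned_seqs):
    """Return 1-based positions where every sequence has the same residue."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for i in range(length):
        column = {s[i] for s in aligned_seqs}
        # Exactly one distinct character, and that character is not a gap.
        if len(column) == 1 and "-" not in column:
            conserved.append(i + 1)
    return conserved


# Hypothetical toy alignment ("-" marks a gap); not the course data set.
toy = ["MKLSTAVL-A", "MKISTAVLLA", "MR-STGVLLA"]

positions = conserved_columns(toy)
print(len(positions), "perfectly conserved positions:", positions)
```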

Further down the page, you will see a cladogram.  You also have the choice of viewing this as a phylogram tree.  Look at the trees using both options.  Look up the definitions of phylogram and cladogram in this glossary.

3.  What is the difference between these two ways of viewing the tree?  Comment on the high and low results from the Scores Table and the relatedness of the sequences as shown by the phylogram.

Return to the top of the results screen and click on “Start Jalview.”  A new screen will appear with the alignment in full color and a graph below.

4.  Looking at the "quality graph" near the bottom of the page, where are the most conserved regions in this protein family?  Are these results consistent with your answer to question 2?

From the top menu, click on “Calculate,” then “Calculate Tree,” then “Neighbor joining tree.”  This may not work in all browsers, but it should work on a PC running Internet Explorer.

5.  How does this tree compare to the Phylogram?  Put in a line by clicking in the figure.  What results from this action?

Look at the sequences again (in either Jalview or the regular results page).

6.  Which part of the proteins is the most different?  Speculate on why this part of the proteins might be divergent.