Genomics BIO 427
Instructors: Louise Temple and Jon Monroe, JMU Department of Biology

Copyright Jon Monroe and Louise Temple, 2/18/07

Exercise 6. Multiple Sequence Alignment using ClustalW

ClustalW is a general-purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful alignments of divergent sequences: it calculates the best match for the selected sequences and lines them up so that identities, similarities, and differences can be seen. Evolutionary relationships can then be examined by viewing cladograms or phylograms.
A. Multiple alignments of 11 putative fimbrial subunit genes from Bordetella avium

Use the ClustalW multiple sequence alignment program. It can be accessed through a number of sites, but for this exercise please use the one at the European Bioinformatics Institute. Under "Data Resources and Tools" select "Sequence Similarity & Analysis." Take a moment to look at the long list of programs you can access from the left side of this site, then select "ClustalW." For this submission, retain all the default values in the various boxes. Copy the following data set into the box and click "Run." These sequences are 11 copies of putative fimbrial subunit genes in the Bordetella avium genome.

1. Are these proteins orthologs or paralogs of one another?

The format for submitting sequences is very specific, although several formats are accepted. Usually the best way to make a file with the sequences you want to compare is to download the sequences in a specific format, such as FASTA, compile them into one text file without blank lines between entries, and then copy the entire file into the page. You will note that every sequence is preceded by a number and a description, and that a ">" sign comes before the number. That sign tells the program that the previous sequence has ended.

Now have a look at your results. If you do not see a button across from "Jalview" in the set of 10 items just under "Results," you will need to activate Java applets or choose another browser; the computer help desk staff should be able to help you. The first set of results is called the "Scores Table."

2. Notice that each of the eleven sequences is aligned with each of the others and a score is provided for every pair. Re-sort the entries by Alignment Score. Which 2 pairs have the highest scores, and which 3 pairs have the lowest scores?
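The FASTA layout described above (a header line beginning with ">", followed by sequence lines, with the next ">" marking the end of the previous entry) can be illustrated with a short Python sketch that reads such text and writes it back as one file with no blank lines between entries. The function names `parse_fasta` and `write_fasta` and the two short sequences are hypothetical, invented for illustration; they are not part of ClustalW or the exercise data.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no sequence data
        if line.startswith(">"):
            # A new ">" line means the previous sequence has ended.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(records, width=60):
    """Join records into one FASTA text with no blank lines between entries."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Hypothetical toy input: one sequence split over two lines, then a blank line.
example = """>1 toy sequence A
MKKLLAAL
GLSA

>2 toy sequence B
MKKTLIAL
"""
records = parse_fasta(example)
print(records)
print(write_fasta(records))
```

The sketch only shows why the ">" character matters: everything between two header lines is treated as one sequence, so a missing ">" would silently merge two entries.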
Moving to the alignment itself (below the Scores Table), scan over the aligned sequences.

3. Just perusing casually, which pair of sequences seems to be most similar? Is there one sequence that appears to be the most different? What do you deduce the asterisks and the single and double dots below the sequences mean? How many amino acid positions are perfectly conserved in this gene family? Could all of these proteins have a conserved disulfide bond?
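Counting perfectly conserved positions, as question 3 asks, amounts to checking each alignment column. The sketch below assumes the aligned sequences have been copied out as equal-length strings with "-" for gaps; the function name `conserved_columns` and the toy alignment are hypothetical. A column counts only if every sequence has the same residue there and none has a gap. Listing which conserved columns are cysteines is one way to start on the disulfide question, since a disulfide bond shared by every protein would be expected to involve at least two cysteines conserved across all of the sequences.

```python
def conserved_columns(aligned):
    """Return 0-based indices of alignment columns in which every sequence
    has the same residue. Columns containing a gap ('-') never count."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    hits = []
    for i, column in enumerate(zip(*aligned)):
        residues = set(column)
        if len(residues) == 1 and "-" not in residues:
            hits.append(i)
    return hits


# Hypothetical toy alignment (not the B. avium data).
toy = ["MKC-LAC", "MRC-LSC", "MKCALAC"]
hits = conserved_columns(toy)
# Conserved columns whose shared residue is cysteine.
conserved_cys = [i for i in hits if toy[0][i] == "C"]
print(len(hits), "conserved columns at", [i + 1 for i in hits])  # 1-based
print("conserved cysteines at", [i + 1 for i in conserved_cys])
```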
Further down the page you will see a tree. This is a ClustalW guide tree: it indicates the degree of similarity between sequences based on pairwise comparisons of every sequence with every other. A guide tree is not a true phylogenetic tree, but we can construct one from the alignment you just made. Click on the "View Alignment File" button. Select and copy all the text on this page. Return to the ClustalW2 submission page and paste in your alignment. Under "Tree Type," select the "nj" option and click "submit." The resulting tree is a true neighbor-joining tree based on your alignment. We will be discussing neighbor-joining and other types of trees in class in the coming weeks. You can view the tree either as a cladogram or as a phylogram; look at it both ways. Look up the definitions of phylogram and cladogram in this glossary.

4. What is the difference between these two ways of viewing the tree? Comment on the highest- and lowest-scoring pairs from the Scores Table and on the relatedness of those sequences as shown by the phylogram.
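The neighbor-joining idea behind the "nj" option can be sketched in Python. This is a minimal textbook version, not ClustalW's own implementation, and the function name `neighbor_joining` is hypothetical. It assumes you already have a symmetric matrix of pairwise distances (for example, derived from percent differences between aligned sequences). At each step it joins the pair of nodes i, j that minimizes Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where n is the number of current nodes and r(x) is the sum of x's distances to all other nodes, and it finally returns an unrooted tree in Newick format with branch lengths. The five-taxon matrix is a small illustrative set of perfectly tree-like distances, not Bordetella avium data.

```python
def neighbor_joining(names, matrix):
    """Return an unrooted Newick tree built by neighbor joining from a
    symmetric distance matrix (matrix[i][j] = distance between names[i], names[j])."""
    if len(names) < 3:
        raise ValueError("need at least three sequences")
    # dist[(x, y)] holds the current distance between nodes x and y.
    dist = {(a, b): matrix[i][j]
            for i, a in enumerate(names) for j, b in enumerate(names)}
    nodes = list(names)

    while len(nodes) > 3:
        n = len(nodes)
        # r[x]: total distance from node x to every other current node.
        r = {x: sum(dist[(x, y)] for y in nodes if y != x) for x in nodes}
        # Choose the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j);
        # ties go to the first pair found.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * dist[(a, b)] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node u.
        la = dist[(a, b)] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[(a, b)] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        rest = [x for x in nodes if x not in (a, b)]
        # Distance from the new node u to each remaining node.
        for x in rest:
            dist[(u, x)] = dist[(x, u)] = (dist[(a, x)] + dist[(b, x)] - dist[(a, b)]) / 2
        nodes = rest + [u]

    # Three nodes left: join them at a single central node.
    x, y, z = nodes
    lx = (dist[(x, y)] + dist[(x, z)] - dist[(y, z)]) / 2
    ly = (dist[(x, y)] + dist[(y, z)] - dist[(x, z)]) / 2
    lz = (dist[(x, z)] + dist[(y, z)] - dist[(x, y)]) / 2
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


# Illustrative tree-like distances for five hypothetical sequences a-e.
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, matrix))
```

Because ties in Q are broken by whichever pair is found first, equally good joins can lead to different tree topologies; real programs may resolve ties differently.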
Return to the top of the results screen and click on "Start Jalview." A new window will appear showing the alignment in color, with a graph below it.

5. Looking at the "quality graph" near the bottom of the page, where are the most conserved regions in this protein family? Are these results consistent with your answer to question 2?
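A rough numerical stand-in for reading a conservation graph is sketched below. It is not Jalview's own quality calculation; it simply scores each column by the fraction of sequences that share the column's most common residue (gaps count against conservation), then averages those scores over a sliding window so that conserved regions, rather than single columns, stand out. The function names, the window width of 3, and the toy alignment are all hypothetical choices. The lowest-scoring windows point to the most divergent regions, which is also relevant to question 7.

```python
from collections import Counter


def column_scores(aligned):
    """For each alignment column, the fraction of sequences that carry the
    column's most common residue. Gaps ('-') never count as a match."""
    n = len(aligned)
    scores = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / n)
    return scores


def window_profile(scores, width):
    """Average score over each run of `width` consecutive columns."""
    return [sum(scores[i:i + width]) / width
            for i in range(len(scores) - width + 1)]


# Hypothetical toy alignment (not the B. avium data).
toy = ["MKCLLACGAT", "MKCLLSCNRT", "MKCLVACYKS"]
profile = window_profile(column_scores(toy), 3)
best = max(range(len(profile)), key=profile.__getitem__)
worst = min(range(len(profile)), key=profile.__getitem__)
print("most conserved window: columns", best + 1, "to", best + 3)
print("least conserved window: columns", worst + 1, "to", worst + 3)
```

A wider window smooths the profile further but blurs the boundaries of short conserved motifs; a width of 1 reduces to the raw per-column scores.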
From the top menu, click on "calculate," then "Calculate Tree," then "Neighbor joining tree." This may not work in all browsers, but it should work on a PC running Internet Explorer.

6. How does this tree compare to the phylogram? Put in a line by clicking in the tree figure. What results from this action?
Look at the sequences again (in either Jalview or the regular results page).

7. Which part of the proteins is the most different? Speculate on why this part of the proteins might be divergent.