Genomics BIO 427
Instructors: Louise Temple and Jon Monroe, JMU Department of Biology
Copyright Jon Monroe and
Louise Temple, 3/8/07
Exercise 8. Introduction to Sequencher

In this exercise, you will see how to edit raw DNA sequence data, create an overlapping “contig”, determine open reading frames, and check for homology to known genes/proteins. The data you will use represent three sequencing reads from a bacteriophage genome used in the Genomics class, spring 2006.

A. Loading data into "Sequencher" and learning your way around the program.

1. Launch the program Sequencher 4.6 Win Demo from the desktop. Acknowledge that this is a Demo version.
2. Click on File and select Import, then Folder of Sequences. Find the CURData folder and press OK.
3. Three files will be loaded into Sequencher: data1.ab1, data2.ab1 and data3.ab1. All three will be highlighted by default, so to open only the first file, click in the window to deselect the files, then double-click on data1.ab1. A new window with the sequence will appear.
4. Spend a few moments becoming familiar with the program views and menus. You can toggle between a "Bases" view (the default) and an "Overview" illustrating various aspects of the sequence. In the Bases view, click on Cut Map to see the restriction sites in this sequence. Notice the clustered sites near the 5' end, indicative of a multiple cloning site. This part of the sequence will be trimmed below.
5. Now click on Bases to return to the previous window, then click on Overview. The Diagram Key will be helpful here; notice that the sequence is illustrated in three reading frames. Since stop codons are illustrated by red hatch marks, which reading frame is likely to contain a gene sequence?
6. Now return to the Bases view and choose Show Chromatogram to view the raw data from the sequencing machine. Find the many “Ns” at the 3' end of the sequence. N is the single-letter code for any base, indicating that the sequence was ambiguous in this region. These will also be trimmed below. About how many good bases are present in this sequence?
7. Look for the cloning site at about base #100; these clones were made from partial digests with Tsp509I, a 4-base cutter with ends compatible with EcoRI. In some, but not all, cases, the EcoRI site (GAATTC) is retained. Before proceeding to the next section, close the window containing the sequence.

B. Editing sequences

Now you are ready to do some editing. We recommend that students first learn how this is done by looking for the cloning sites on both ends of some sequences and then trimming them manually. Later, when lots of sequences are being processed, they will really appreciate the automated process...
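
To make the idea of manual trimming concrete, here is a minimal Python sketch that locates the EcoRI site (GAATTC) and the Tsp509I site (AATT) in a read, then drops everything up to and including the first EcoRI site near the 5' end. The function names, the 150-base search window, and the example read are assumptions made up for illustration; this is not how Sequencher itself trims vector, and a real read may need the cut placed differently relative to the site.

```python
def find_sites(seq, site):
    """Return the 1-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)          # report 1-based positions
        start = seq.find(site, start + 1)    # allow overlapping matches
    return positions


def trim_leading_vector(seq, site="GAATTC", window=150):
    """Drop everything up to and including the first `site` that lies
    entirely within the first `window` bases (hypothetical cutoff).
    The read is returned unchanged if no site is found there."""
    idx = seq.upper().find(site.upper(), 0, window)
    if idx == -1:
        return seq
    return seq[idx + len(site):]


# Made-up read: 10 bases of "vector", an EcoRI site, then "insert".
read = "CCGGTACCGG" + "GAATTC" + "ATGGCTAAATTGCA"
print(find_sites(read, "GAATTC"))   # [11]
print(find_sites(read, "AATT"))     # [12, 24]
print(trim_leading_vector(read))    # ATGGCTAAATTGCA
```

Note that AATT also turns up inside the insert (position 24 here), which is one reason to look for the site near the expected junction rather than anywhere in the read.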
8. Under Select, choose Select All, then Sequence, then Trim Vector.... This will pull up a dialog box.
9. Select Choose Insertion Site Now, then click the first UseVecbase File button (to trim one end of each sequence). Choose the workshop CD, then select 3 Vectors from VecBase.
10. Choose BlueKSm, then click the Select button.
11. Click and highlight the EcoRI site and hit OK.
12. To trim the other end of each sequence, repeat this procedure by clicking the Use Vecbase file in the second window, select BlueKSm then the EcoRI site, but this time press Prime Other Strand then OK. This permits trimming from both directions in the vector. Close out of the window.
13. Pull down Sequence and select Trim Vector... Note: you should see scissors illustrating cutting from each end of two of the three sequences (only those sequences with detectable vector appear in the box). Select Trim Checked Items and proceed in spite of the caution box. Note the change in length of each sequence, which is also reflected in the original window. Close the window and repeat this process until you see a dialog box indicating that no vector contamination was found.
14. Once that is finished, close the window and, under the Sequence menu, click Trim Ends to remove the ambiguous bases that are present at the ends of some sequences (highlighted in red). Now click Trim Checked Items, then close the window.

C. Assembly of the sequences into one contig using overlaps

Look at the assembly parameters. Using this dialog box, you can change the stringency of matches and work with the data in different ways. We will use the default, which is “Dirty Data” with 85% minimum match and 20 bases minimum overlap. Now close that window.
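
To get a feel for what the two parameters mean, the following Python sketch tests whether the end of one read overlaps the start of another by at least 20 bases with at least 85% identical bases. It is a deliberately simplified, ungapped comparison written for illustration; the function name and example reads are made up, and Sequencher's actual assembly algorithm is not described in this exercise.

```python
def best_overlap(left, right, min_overlap=20, min_match=85.0):
    """Find the longest ungapped overlap between the end of `left` and the
    start of `right` whose percent identity is at least `min_match`.
    Returns (overlap_length, percent_identity), or None if nothing qualifies."""
    longest = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink toward min_overlap.
    for length in range(longest, min_overlap - 1, -1):
        tail, head = left[-length:], right[:length]
        matches = sum(1 for a, b in zip(tail, head) if a == b)
        identity = 100.0 * matches / length
        if identity >= min_match:
            return length, identity
    return None


# Made-up reads that share a 24-base region with one mismatch.
shared = "ACGTTGCAAGGCTTAACCGGATCA"
read1 = "TTTTTGGGGG" + shared
read2 = shared[:5] + "T" + shared[6:] + "CCCCCAAAAA"
print(best_overlap(read1, read2))
```

For these two reads the sketch reports a 24-base overlap with 23 of 24 bases matching (about 95.8%), which passes both thresholds. Raising `min_match` or `min_overlap` makes the test stricter, which is the sense in which the dialog box changes the stringency of matches.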
15. Click the Assemble Automatically button on the main window. This will begin assembling forward and backward strands to form contigs. The results are shown in a box which you can close by clicking Close. One contig should be formed and zero fragments should remain.
16. Open the contig by double-clicking on it. The resulting box should look like the single sequence you were viewing earlier, except that it has the single sequences above the contig. You can look at the bases, or an overview, and at the chromatogram from this box.
17. Choose the bases option and open the box wider. Note that the consensus sequence is at the bottom. Scroll to the right until you reach a region of overlapping sequences, click on the consensus sequence line there, then choose Show Chromatogram. This will allow you to view the raw data to resolve sequence ambiguities.
18. Scroll along the sequence until you find an ambiguous base. Click on the ambiguous base in the consensus sequence, and the same base will be highlighted in each chromatogram. You can choose to change the consensus line based on the chromatograms. Where 2 of the 3 bases are the same, the consensus will likely be correct, but where there are only two sequences and they differ, an ambiguity code will occur in the consensus line. Through this process students should easily learn the value of having the same region sequenced many times to increase the level of confidence in the sequence...
19. Go back to the Overview and inspect the clones. How much overlap is there in this contig? Are both strands sequenced multiple times in both directions? What would it take to get “8-fold coverage” as advertised in many sequencing projects?

D. Analysis of the sequence

Because we are using a demo version of Sequencher you cannot copy and paste your edited sequence, but we have provided one for you to use.

20. Open the link called Contig.txt and copy the sequence into the computer's clipboard. This is an edited version, something like you would have come up with in the previous section if you had continued working.
21. Next, you will see one way to predict the location of genes within genomic sequence. Go to the GenScan site: http://genes.mit.edu/GENSCAN.html This is a program for analyzing eukaryotic DNA, but for this exercise we will use it to look for a gene in this phage DNA.
22. Paste the consensus sequence into the box. Leave all other choices in the default setting, even though some don’t make sense! Perform the analysis. The results should show a single “exon” beginning at 737 and ending at 157. Copy the predicted protein sequence into the clipboard.
23. Now you can see if this putative protein matches anything in the sequence database. Go to the NCBI Blast site (you can access Blast from a number of sites): http://www.ncbi.nlm.nih.gov/BLAST/. Choose blastP (the first link under Protein), paste the sequence into the search box and press the BLAST! button to run the program. In the next window press the Format! button and wait for the results to appear.
24. Move the mouse over the colored bars to find their identification and various stats. A pink bar represents a sequence with a high level of identity over a long stretch of amino acids and is likely to be a homologous sequence.
25. If you click on the bar it will take you to the sequence alignment further down the page.
26. Note the sorts of proteins that are homologous to the sequenced gene. This helps us know that we cloned some DNA from a phage!
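
As a much simpler stand-in for what a gene finder does, the Python sketch below scans all six reading frames of a sequence for the longest stretch that runs from an ATG to a stop codon. It reports coordinates on the input sequence, with the begin position larger than the end position when the open reading frame lies on the reverse strand; that is presumably why the GenScan “exon” above begins at 737 and ends at 157. The function names and the example sequence are made up for illustration, and this is not the method GenScan uses.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N is left as N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq):
    """Yield 0-based, half-open (start, end) spans of ATG...stop ORFs
    found in the three reading frames of `seq`."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                 # first ATG opens the ORF
            elif start is not None and codon in STOPS:
                yield start, i + 3        # stop codon is included
                start = None


def longest_orf(seq):
    """Return (begin, end, strand) for the longest ORF in six frames, in
    1-based coordinates of `seq`; begin > end on the reverse strand.
    Returns None if no ORF is found."""
    seq = seq.upper()
    n = len(seq)
    best_len, best = 0, None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start, end in orfs_in_strand(s):
            if end - start > best_len:
                best_len = end - start
                if strand == "+":
                    best = (start + 1, end, "+")
                else:
                    # Map reverse-strand indices back onto the input.
                    best = (n - start, n - end + 1, "-")
    return best


# Made-up sequence carrying a 15-base ORF on the reverse strand.
orf = "ATGGCTGCTAAATGA"
seq = "CC" + reverse_complement(orf) + "GG"
print(longest_orf(seq))   # (17, 3, '-')
```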