Copyright Jon Monroe and
Louise Temple, 3/21/06
Exercise 7. Introduction to Artemis
To be completed individually or in pairs.
From the desktop, start the Artemis program. You will see a small
rectangle in the center of the screen. Choose “File” and open the
data file marked “Phage1.con.text”. To see this file, you will need to
choose “All Files” in the bottom section of the popup box. In a
few seconds, the Artemis screen will open with the DNA sequence below,
the AA predictions either above or below, and the two panels with stop
codons on top. Try the manipulations you can perform by
moving the blue bars back and forth, or up and down. These take
you horizontally through the sequence, or zoom the view to show more or
less detail. Now go to the menu and choose “File” and “Read an
Entry”. This time, choose the data file,
“Phage1.prot4.txt”. In a few seconds you will see that the genes
are color coded over the top two panels and information about the genes
is given in a lower panel that was empty before. This is the
genome of a Bordetella avium
phage that has already been sequenced and annotated, to a certain
extent.
If you scan through the sequence, you will see that there are some
named genes, but many of the genes have no names, only numbers.
Click on the gene close to the left end named “Repressor”. When
it has a black box around it, choose “edit selected features” under the
“edit” menu. A new screen will appear with some information. Copy
that information into a word document. Moving to the right, find
the gene called “terminase” (the larger one). Copy its “edit”
information into the same document.
1. Compare these two
descriptions or annotations. Can you figure out what each line
means? Take a guess for each line. Which of these putative
genes is named with the most confidence? Why?
2. For each of the putative
genes, what is the size of the predicted protein? Look not only
in the “Edit” box, but also in the line above the panels.
3. How many total bases are
present in this dataset? Approximately how many genes are
designated (use the list below to make this easier)? How many are
named?
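For question 2, the arithmetic linking a gene's coordinates to the length of its predicted protein can be sketched in a few lines of Python. This is only an illustration: the function name `protein_length` and the coordinates are made up, and the sketch assumes the feature's coordinates include the stop codon.

```python
def protein_length(start, end):
    """Number of amino acids encoded by a CDS spanning start..end.

    Coordinates are 1-based and inclusive. Assumes the span includes the
    stop codon, which does not encode an amino acid.
    """
    n_bases = end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return n_bases // 3 - 1


# Hypothetical coordinates, not taken from the phage data set.
print(protein_length(101, 400))  # 300 bases -> 100 codons -> 99 amino acids
```

If the number you get this way disagrees with the size Artemis reports, check whether the stop codon is being counted.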
Double click on the “repressor” gene. This makes the DNA and
amino acid sequences below align with what you are looking at.
4. What amino acid does the
start codon encode? Note: this gene is transcribed from right to left.
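Because this gene is read from right to left, its coding sequence is the reverse complement of what is shown on the top strand, so the start codon sits at the right-hand end of the feature. A minimal sketch of that relationship, using an invented nine-base sequence rather than the real repressor gene:

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Hypothetical top-strand sequence covering a tiny reverse-strand gene.
top_strand = "TTATGCCAT"

coding = reverse_complement(top_strand)
print(coding)      # the gene as it is read, 5' to 3'
print(coding[:3])  # its first codon
```

Reading the top strand left to right, the same start codon appears as the last three bases, reverse-complemented.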
Make the view of the upper panels more distant (so you can see more in
one screen).
5. Do you notice any general
patterns in gene locations in the phage genome? If so,
describe them.
Go back to gene #1, in green, and highlight it. Go to “write” in
the menu and choose “AA of selected feature”. You will have
a chance to name this document. Open the document.
6. What is the molecular
weight of the predicted protein?
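If you want to check a molecular weight by hand, it can be estimated by summing the average residue masses and adding one water. The sketch below assumes the saved file is in FASTA format and uses approximate average masses; Artemis or other tools may use slightly different values, and the example sequence is invented.

```python
# Approximate average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015  # one water is added back for the free ends of the chain


def read_fasta_sequence(text):
    """Join the sequence lines of a single-record FASTA string."""
    lines = [line.strip() for line in text.splitlines()]
    seq = "".join(l for l in lines if l and not l.startswith(">"))
    return seq.rstrip("*")  # drop a trailing '*' stop marker if present


def molecular_weight(protein):
    """Estimated molecular weight of a protein sequence, in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


# Hypothetical record, not the real gene #1 product.
fasta = ">example protein\nMAGSTLK\n"
print(round(molecular_weight(read_fasta_sequence(fasta)), 1))
```

A quick rule of thumb is about 110 daltons per amino acid, which is useful for spotting gross errors.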
Copy the amino acid sequence and run it through BlastP.
7. What are the top two
matches and the corresponding E values? What is a “conserved
hypothetical protein”? Is either of the matches to phage-related
proteins?
Perhaps this gene should be named a “putative phage
protein”. Look back at the Artemis display and find the
orange gene labeled exactly like that.
8. What is the E value of
this gene encoding a phage-related protein?
The match you found for gene #1 has been entered in the database since
the annotation of Ba1 was
prepared.
9. What does this tell you
about the frequency with which one should check the databases, if one
is trying to stay current on comparisons?
Go back to the Artemis display. Choose “graph” and “GC plot”.
Now you will see a graph with three colored lines that trace the codon
usage throughout the sequence. Look for the “helicase”
gene.
10. Describe the three
colored lines in the region of this predicted gene, relative to the
black horizontal line.
This is one method used to predict “real” genes. Toggle the graph
“Off”.
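One way a three-line graph of this kind can be built is to measure GC content separately at each of the three codon positions in a sliding window. The sketch below does that under this assumption; it is not Artemis's own algorithm, and the function name, window size, step and test sequence are arbitrary choices for illustration.

```python
def gc_by_codon_position(seq, window=120, step=30):
    """GC fraction at sequence positions 1, 2 and 3 (mod 3) per window.

    Returns a list of (window_start, [gc1, gc2, gc3]) with 0-based starts.
    """
    if window < 3:
        raise ValueError("window must cover at least one codon")
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        gc = [0, 0, 0]
        total = [0, 0, 0]
        for i in range(start, start + window):
            pos = i % 3  # position within the codon, relative to base 0
            total[pos] += 1
            if seq[i] in "GC":
                gc[pos] += 1
        results.append((start, [g / t for g, t in zip(gc, total)]))
    return results


# Artificial sequence: G is always in position 1, so the lines fully separate.
demo = "GAT" * 40
for start, fractions in gc_by_codon_position(demo, window=30, step=30):
    print(start, fractions)
```

Plotting the three fractions against the window start would give three lines comparable to the ones you described in question 10.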
Use “Go to” to find “Navigator”. Search for gene name
“42c”. Notice that the program takes you to that gene.
11. There is a problem in this
area—what is it? Write the amino acid sequence for both 42c and
43c and perform a BlastP search. Can you find any reason to
choose one of these over the other as the “real” gene?
Go to “entries” and “remove an entry” and choose the Ba1.tab.txt file
to remove; you should be back to the naked (unannotated)
sequence. Now go to “Create” and “Mark open reading
frames”. You will have a choice to make the minimum bases equal
to 100. Take the default. Double-click the first gene going
forward from the top strand.
12. What amino acid does the first
codon of the open reading frame encode? Is this a normal amino acid for
starting a protein?
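If an open reading frame is defined simply as a stretch between two stop codons, its first codon is whatever follows the previous stop. The sketch below marks ORFs that way on the forward strand only. It is an assumption about the general approach, not a description of how Artemis works internally, and `min_codons` and the test sequence are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def stop_to_stop_orfs(seq, min_codons=100):
    """Find forward-strand stretches that contain no stop codon.

    Returns (frame, start, end) tuples with 0-based start and exclusive end;
    the closing stop codon is not included in the span.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOPS:
                if (pos - orf_start) // 3 >= min_codons:
                    orfs.append((frame, orf_start, pos))
                orf_start = pos + 3
        # Stretch left open at the end of the sequence.
        last = frame + ((len(seq) - frame) // 3) * 3
        if (last - orf_start) // 3 >= min_codons:
            orfs.append((frame, orf_start, last))
    return orfs


# Tiny invented sequence; a real search would use a much larger minimum.
demo = "TAACTGGCAATGAAATAA"
for frame, start, end in stop_to_stop_orfs(demo, min_codons=4):
    print(frame, start, end, demo[start:start + 3])
```

A full search would repeat this on the reverse complement to cover the other three reading frames.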
Go to “select” and “select all” and all of the CDSs should be
highlighted. Go to “Edit” and choose “Trim to Met”. Next
you will get a message saying that some could not be trimmed—choose
OK. Now double click the first gene again.
13. Now what is the first
codon? Why do you need to go through this process? Should a
similar procedure be carried out at the other end of the open reading
frames?
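The idea behind trimming can be sketched as moving the start of each ORF forward, one codon at a time, until a methionine codon is reached. This is a simplified illustration with invented sequences; Artemis's own “Trim to Met” may apply additional rules, and the function name `trim_to_met` is hypothetical.

```python
def trim_to_met(orf):
    """Trim an in-frame ORF so that it begins at its first ATG codon.

    Returns the trimmed sequence, or None if the ORF has no in-frame ATG
    (the case reported as "could not be trimmed").
    """
    orf = orf.upper()
    for pos in range(0, len(orf) - 2, 3):
        if orf[pos:pos + 3] == "ATG":
            return orf[pos:]
    return None


# Hypothetical ORFs: one with an internal Met codon, one without.
print(trim_to_met("CTGGCAATGAAACCC"))
print(trim_to_met("CTGGCAAAACCC"))
```

Only in-frame ATG codons count: an ATG that straddles two codons does not start the protein in this reading frame.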
Under “Edit” choose “automatically create gene names” and put BAV1 as
the prefix. Scroll to the right end of the sequence.
14. How many ORFs were designated
by this procedure?
Before you leave, please delete any files you have created!
Print out your answers, including your name(s), and hand them in before you
leave.