Genomics
BIO 427
Instructors:
Louise Temple and Jon Monroe
JMU Department of Biology

Copyright Jon Monroe and Louise Temple, 3/21/06

Exercise 7.   Introduction to Artemis



To be completed individually or in pairs.

From the desktop, start the Artemis program.  You will see a small rectangle in the center of the screen.  Choose “File” and open the data file marked “Phage1.con.text”.  To see this file, you will need to choose “All Files” in the bottom section of the popup box.  In a few seconds, the Artemis screen will open with the DNA sequence below, the amino acid (AA) predictions above and below it, and the two panels showing stop codons on top.  Try out the manipulations you can perform by moving the blue bars back and forth or up and down.  These take you horizontally through the sequence or make the sequences more or less detailed.  Now go to the menu and choose “File” and “Read an Entry”.  This time, choose the data file “Phage1.prot4.txt”.  In a few seconds you will see that the genes are color coded in the top two panels and that information about the genes is given in a lower panel that was empty before.  This is the genome of a Bordetella avium phage that has already been sequenced and, to a certain extent, annotated.

If you scan through the sequence, you will see that some genes are named, but many have only numbers.  Click on the gene close to the left end named “Repressor”.  When it has a black box around it, choose “edit selected features” under the “edit” menu.  A new screen will appear with some information.  Copy that information into a word-processing document.  Moving to the right, find the gene called “terminase” (the larger one).  Copy its “edit” information into the same document.

1.  Compare these two descriptions or annotations.  Can you figure out what each line means?  Take a guess for each line.  Which of these putative genes is named with the most confidence?  Why?

2.  For each of the putative genes, what is the size of the predicted protein?  Look not only in the “Edit” box, but also in the line above the panels.
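The link between the length of a coding region and the size of its protein is simple arithmetic, sketched below. The function name and the coordinates are hypothetical, not taken from the phage dataset, and the sketch assumes the feature coordinates include the stop codon; check that assumption against what Artemis reports.

```python
def protein_length(start: int, end: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a feature spanning start..end.

    Coordinates are 1-based and inclusive, as in a feature table.
    """
    n_bases = end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("feature length is not a multiple of 3")
    codons = n_bases // 3
    # A stop codon occupies three bases but adds no amino acid.
    return codons - 1 if includes_stop else codons


# Hypothetical coordinates: 300 bases, 100 codons, 99 amino acids.
print(protein_length(101, 400))
```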

3.  How many total bases are present in this dataset?  Approximately how many genes are designated (use the list below to make this easier).  How many are named?

Double click on the “repressor” gene.  This makes the DNA and amino acid sequences below align with what you are looking at.

4. What amino acid does the start codon encode?  Note: this gene is transcribed from right to left.
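Reading a gene that runs from right to left means reading the reverse complement of the top strand. A minimal sketch of that operation follows; the nine-base fragment is made up for illustration and is not part of the phage sequence.

```python
# Map each base to its Watson-Crick partner, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the bottom strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical top-strand fragment covering a short right-to-left gene.
top = "TTAGGCCAT"
coding = reverse_complement(top)
print(coding)       # the gene as it is read during translation
print(coding[:3])   # its first codon
```

Note that the first codon of the gene sits at the right-hand end of the top-strand fragment.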

Make the view of the upper panels more distant (so you can see more in one screen). 

5.  Do you notice any general patterns in gene locations in the phage genome?  If so, describe them.

Go back to gene #1, in green, and highlight it.  Go to “write” in the menu and choose  “AA of selected feature”.  You will have a chance to name this document.  Open the document.

6.  What is the molecular weight of the predicted protein?
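A molecular weight can also be estimated directly from an amino acid sequence by summing residue masses and adding one water for the free ends. The sketch below uses approximate average residue masses in daltons; the function name and the test peptide are illustrative, and small differences from other tools are expected because mass tables vary slightly.

```python
# Approximate average masses (Da) of amino acid residues within a chain.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the terminal H and OH


def molecular_weight(protein: str) -> float:
    """Estimate the average molecular weight of a protein in daltons."""
    protein = protein.strip().upper().rstrip("*")  # drop a trailing stop symbol
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


# Hypothetical dipeptide, used only to show the call.
print(round(molecular_weight("GA"), 2))
```

Dividing the result by 1000 gives the weight in kilodaltons, the unit usually quoted for proteins.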

Copy the amino acid sequence and run it through BlastP.

7.  What are the top two matches and the corresponding E values?  What is a “conserved hypothetical protein”?  Is either of the matches to a phage-related protein?

Perhaps this gene should be named a “putative phage protein”.   Look back at the Artemis display and find the orange gene labeled exactly like that. 

8.  What is the E value of this gene encoding a phage-related protein?

The match you found for gene #1 has been entered in the database since the annotation of Ba1 was prepared.

9.  What does this tell you about the frequency with which one should check the databases, if one is trying to stay current on comparisons?

Go back to the Artemis display.  Choose “graph” and “GC plot”.  Now you will see a graph with three colored lines that trace the codon usage throughout the sequence.  Look for the “helicase” gene.

10.  Describe the three colored lines in the region of this predicted gene, relative to the black horizontal line. 

This is one method used to predict “real” genes.  Toggle the graph “Off”.
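One common way to produce three frame-specific lines like these is to measure, in a sliding window, the G+C fraction of the bases that fall in each of the three codon phases. The sketch below illustrates that idea only; whether Artemis computes its graph exactly this way is an assumption, and the function name, window size, and step are arbitrary choices.

```python
def gc_frame_values(seq: str, window: int = 120, step: int = 30):
    """Return (window_start, gc_phase0, gc_phase1, gc_phase2) tuples.

    Each gc_phaseN is the G+C fraction of the bases in the window whose
    0-based position in the whole sequence leaves remainder N modulo 3.
    """
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        fractions = []
        for phase in range(3):
            bases = [seq[i] for i in range(start, start + window) if i % 3 == phase]
            gc = sum(base in "GC" for base in bases)
            fractions.append(gc / len(bases))
        rows.append((start, *fractions))
    return rows


# Hypothetical sequence: every third base is G, the rest are A or T.
for row in gc_frame_values("ATG" * 40):
    print(row)
```

In a real coding region, one phase often stands apart from the other two, which is why such a plot can support a gene prediction.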

Use “Go to” to find “Navigator”.  Search for gene name “42c”.  Notice that the program takes you to that gene. 

11. There is a problem in this area—what is it?  Write the amino acid sequence for both 42c and 43c and perform a BlastP search.  Can you find any reason to choose one of these over the other as the “real” gene?

Go to “entries” and “remove an entry” and choose the Ba1.tab.txt file to remove; you should be back to the naked (unannotated) sequence.  Now go to “Create” and “Mark open reading frames”.  You will have a choice to make the minimum bases equal to 100.  Take the default. Double-click the first gene going forward from the top strand.

12. What amino acid is the first codon of the open reading frame?  Is this a normal amino acid for starting a protein? 
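The idea behind marking open reading frames can be sketched as a scan for stop-free stretches in each reading frame. This sketch assumes, as the later “Trim to Met” step suggests, that an open reading frame here means a stretch between stop codons; it covers only the forward strand, and it keeps stretches that run off the end of the sequence, which is a choice made for illustration. The function name and test sequence are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq: str, min_bases: int = 100):
    """Find stop-free stretches on the forward strand.

    Returns (start, end, frame) tuples with 0-based, end-exclusive
    coordinates; the closing stop codon is not included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOPS:
                if pos - orf_start >= min_bases:
                    orfs.append((orf_start, pos, frame))
                orf_start = pos + 3  # next stretch begins after the stop
        # Stretch still open at the end of the sequence (whole codons only).
        last = frame + ((len(seq) - frame) // 3) * 3
        if last - orf_start >= min_bases:
            orfs.append((orf_start, last, frame))
    return orfs


# Hypothetical sequence: stop, ten alanine codons, stop.
demo = "TAA" + "GCC" * 10 + "TAG"
print(open_reading_frames(demo, min_bases=30))
```

Running the same scan on the reverse complement of the sequence would cover the other three frames.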

Go to “select” and “select all”; each of the CDSs should be highlighted.  Go to “Edit” and choose “Trim to Met”.  Next you will get a message saying that some could not be trimmed; choose OK.  Now double click the first gene again.

13.  Now what is the first codon?  Why do you need to go through this process?  Should a similar procedure be carried out at the other end of the open reading frames?
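Trimming a feature to a methionine can be pictured as stepping through the frame one codon at a time until ATG is found. The sketch below is an illustration under that assumption, not a description of how Artemis implements the command; the function name and sequences are hypothetical.

```python
def trim_to_met(seq: str, start: int, end: int):
    """Move the start of a forward-strand feature to its first in-frame ATG.

    Coordinates are 0-based and end-exclusive. Returns the new (start, end),
    or None when the frame contains no ATG and so cannot be trimmed.
    """
    for pos in range(start, end - 2, 3):
        if seq[pos:pos + 3].upper() == "ATG":
            return pos, end
    return None


# Hypothetical open frame: two alanine codons precede the first ATG.
print(trim_to_met("GCCGCCATGGCCTAA", 0, 12))
# A frame with no ATG is left untrimmed, reported here as None.
print(trim_to_met("GCCGCC", 0, 6))
```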

Under “Edit” choose “automatically create gene names” and put BAV1 as the prefix.  Scroll to the right end of the sequence. 

14. How many ORFs were designated by this procedure?


Before you leave, please delete any files you have created!

Print out your answers, including your name(s), and hand them in before you leave.