Genomics BIO 427
Instructors: Louise Temple and Jon Monroe, JMU Department of Biology
Copyright Jon Monroe and Louise Temple, 1/24/06
Exercise 2. Repetitive DNA and Sequencing Genomes
Entire genomes are sequenced using one or both of two strategies: whole-genome shotgun sequencing and clone-by-clone sequencing. In the former, the entire genome is sheared into 0.5-20 kb fragments, which are then cloned into vectors for end-sequencing. In the latter, the genome is initially broken into ~100 kb fragments, which are first mapped onto the genome and then sequenced. In both approaches, overlapping regions of the individual sequences are assembled into contiguous sequences; gaps are then filled and ambiguities resolved by resequencing subclones or PCR products in a process called finishing. Any sequencing project will encounter problems that make assembly or finishing difficult in certain regions of the genome. For eukaryotic genomes, one significant problem is repetitive DNA. The problem is simple to state: if two or more sequences from randomly isolated clones are identical, one must determine whether they came from a single locus or from several different repetitive loci. In this exercise you will explore several types of repetitive DNA in genomes. Consult the NCBI Genome Glossary if any terms are unfamiliar.
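The ambiguity described above can be illustrated with a minimal Python sketch. The toy genome, the repeat unit, and the helper name `find_placements` are all invented for illustration; real assemblers compare millions of reads with one another rather than matching them exactly against a known sequence.

```python
def find_placements(genome: str, read: str) -> list[int]:
    """Return every 0-based position where the read matches the genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        # Search again one base further on so overlapping matches are also found.
        start = genome.find(read, start + 1)
    return positions


# Hypothetical toy genome: unique flanking sequence around two copies of a repeat.
REPEAT = "GGCCGGGCGCGGTGGCTCA"
genome = "ATGCTTAGC" + REPEAT + "TTACGATCCGTA" + REPEAT + "CGTTAGGCAT"

unique_read = "TTACGATCCGTA"  # taken from the unique spacer between the repeats
repeat_read = REPEAT[2:16]    # lies entirely inside the repeat

print("unique read placements:", find_placements(genome, unique_read))
print("repeat read placements:", find_placements(genome, repeat_read))
```

In this toy case the first read can be placed at only one position, while the second matches both copies of the repeat equally well, so the read alone cannot tell you which locus it came from.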
A. What types of repetitive DNA are found in the human genome?

Repetitive DNA makes up a very large part of big eukaryotic genomes. One simple way to see it is pictured on the right. While 25% of the human genome is composed of genes (exons + introns), nearly all of that DNA is introns that are spliced out of mRNAs. Of the remaining DNA, most is repetitive: 45% of the genome is composed of transposons and 8% consists of simple repeats or large duplications.

Repetitive DNA is also characterized by size and abundance. Four types of transposable elements make up almost half of the human genome. They range in size from about 300 bp to over 10 kb and are repeated hundreds of thousands of times throughout the genome (data from Lewin B., Essential Genes, 2006).
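As a rough worked example, the percentages above can be converted into approximate amounts of DNA. The genome size of 3.2 billion bp is an assumption (a commonly quoted figure for the haploid human genome, not a number given in this exercise), and the calculation assumes the three categories do not overlap.

```python
GENOME_SIZE_BP = 3_200_000_000  # assumed haploid genome size; not from the exercise

# Percentages quoted in the text above.
composition_percent = {
    "genes (exons + introns)": 25,
    "transposons": 45,
    "simple repeats and large duplications": 8,
}


def percent_to_bp(percent: float, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Convert a percentage of the genome into base pairs."""
    return genome_size_bp * percent / 100


# Whatever the three categories do not cover.
other_percent = 100 - sum(composition_percent.values())

# Share of the non-gene DNA that is repetitive, assuming no overlap with genes.
repetitive_percent = (
    composition_percent["transposons"]
    + composition_percent["simple repeats and large duplications"]
)
non_gene_percent = 100 - composition_percent["genes (exons + introns)"]
repetitive_share_of_non_gene = repetitive_percent / non_gene_percent

for label, percent in composition_percent.items():
    print(f"{label}: {percent}% = about {percent_to_bp(percent) / 1e6:,.0f} Mb")
print(f"other DNA: {other_percent}%")
print(f"repetitive share of non-gene DNA: {repetitive_share_of_non_gene:.0%}")
```

Under these assumptions, roughly 70% of the DNA outside genes is repetitive, which is what "most is repetitive" means in practice.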
Many of these elements are remnants of virus-like sequences that once hopped around our genome. Of these four groups of repetitive sequences, all but the SINEs contain functional sequences encoding genes, such as transposase, that are responsible for this hopping behavior. Most have suffered deletions that damaged the genes required for transposition, so they are no longer mobile. The most abundant SINE family is the Alu repeats, with over 1 million copies making up 10% of the genome. Alu repeats are named for AluI, the restriction enzyme that cuts them and is used in their identification.

B. Exploring the human genome for Alu repeats, LINEs and SINEs.

First search for some normal, functional genes to see how abundant they are. Go to the main Entrez search page, then click on the Map Viewer section of the top navigation bar. To search just the human genome, pull down the Select Group or Genome bar to Homo sapiens, then type "amylase" in the box to the right and press Go. You will see a map of all of the human chromosomes with several red marks indicating the locations of amylase genes.

1. How many amylase genes are there, and on which chromosome are they clustered? Notice that each gene is listed twice, once from the reference sequences and once from the Celera sequences.
On the same page, search for "hemoglobin".

2. On how many chromosomes are hemoglobin genes found?
Now search for "Alu" and watch for the red marks. Notice that just under the map it indicates that only the first 1500 hits out of 1,146,042 found are shown!

3. Is there a chromosome without Alu repeats?
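The hit count reported by Map Viewer can be checked against the statement that Alu repeats make up about 10% of the genome. The sketch below assumes an average Alu length of 300 bp and a genome size of 3.2 billion bp; both values are assumptions made for illustration, not numbers returned by the search.

```python
ALU_HITS = 1_146_042                     # hit count quoted above
ASSUMED_ALU_LENGTH_BP = 300              # assumption: typical Alu element length
ASSUMED_GENOME_SIZE_BP = 3_200_000_000   # assumption: haploid human genome size


def genome_fraction(copies: int, length_bp: int, genome_size_bp: int) -> float:
    """Fraction of the genome occupied by `copies` elements of `length_bp` each."""
    return copies * length_bp / genome_size_bp


alu_fraction = genome_fraction(ALU_HITS, ASSUMED_ALU_LENGTH_BP, ASSUMED_GENOME_SIZE_BP)

# Average distance between the starts of neighbouring Alu elements
# if they were spread evenly along the genome.
average_spacing_bp = ASSUMED_GENOME_SIZE_BP / ALU_HITS

print(f"Alu share of genome: {alu_fraction:.1%}")
print(f"average spacing: about {average_spacing_bp:,.0f} bp")
```

With these assumed values the share comes out a little over 10%, and an evenly spaced Alu element would occur roughly every 3 kb, which helps explain why so few chromosomal regions are free of them.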
Below the map is a long table with lines reading: Alu 7SLRNA REPEAT Repeats. Click on the first Repeat link to see a map of the region in which this Alu repeat is located.

4. On which arm (long or short) of which chromosome is it located?
(Use the chromosome numbers across the top and the Ideogram map on the left.) This default view is the highest resolution possible in this viewer. The vertical line represents only about 700 bp. Any other identified regions are listed to the right. Your Alu element is in the middle. To step back and see a larger view of this chromosome use the zoom box on the left. Start with the smallest box at the bottom. As you mouse over the bottom bar it will indicate:
5. What other types of repetitive DNA are found in this region?
Below the Alu repeat are two elements labelled (TA)n Simple_repeat. With your mouse, follow the line from the lower (TA)n repeat to the gray box on the chromosome, then click once to pop up a more detailed Map Viewer zoom box. Click on Show Sequence at the bottom.

6. Describe this sequence element.
Look at the (TG)n simple repeat element above the Alu repeat.

7. Describe this sequence element.
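Once you have a sequence on screen, a simple repeat of this kind can also be located programmatically. The following is a minimal sketch using Python's standard `re` module; the function name `find_dinucleotide_repeats`, the `min_copies` threshold, and the toy sequence are all invented for illustration and do not reproduce how the Map Viewer annotation was generated.

```python
import re


def find_dinucleotide_repeats(
    sequence: str, unit: str, min_copies: int = 5
) -> list[tuple[int, int, int]]:
    """Return (start, end, copies) for each run of `unit` with at least `min_copies` copies."""
    # Builds a pattern such as (?:TA){5,} : the unit repeated min_copies or more times.
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    runs = []
    for match in pattern.finditer(sequence.upper()):
        copies = (match.end() - match.start()) // len(unit)
        runs.append((match.start(), match.end(), copies))
    return runs


# Hypothetical toy sequence containing one (TA)n run and one (TG)n run.
toy_sequence = "GGCATC" + "TA" * 12 + "CCGTAGGA" + "TG" * 8 + "ACCTGA"

print("(TA)n runs:", find_dinucleotide_repeats(toy_sequence, "TA"))
print("(TG)n runs:", find_dinucleotide_repeats(toy_sequence, "TG"))
```

Each tuple gives the 0-based start, the end (exclusive), and the number of copies of the repeat unit. Isolated copies of the unit, such as the single TA in the spacer, fall below the threshold and are ignored.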
The (TA)n and (TG)n sequences are quite short and are likely to be found within a single sequence read of ~400 bp, but longer repeats will cause problems in the assembly phase of genome sequencing.

C. Repetitive Elements in Bacterial Genomes.

Repeated sequences, including transposons, can wreak havoc in a genome. We don’t really know the effect of the high number of repeats in the human genome, but we can look at a bacterial genome to see some dramatic effects of a number of so-called “Insertion Sequence” (IS) elements. Below are a table, a figure, and two paragraphs from a paper comparing the chromosomal sequences of three highly related species of Bordetella, which are pathogens of mammals. In the figure, the chromosomes are shown as horizontal lines, with red bands between them where there is a high degree of nucleotide-nucleotide similarity (over 85% identical). Between the bottom two, most of the genome is organized identically, although certain regions are rearranged and there is some difference in overall size. However, comparing the top two lines (B. pertussis, the human pathogen, and B. bronchiseptica, the dog and pig pathogen), you can see that B. pertussis is quite different. Read the two paragraphs about these observations and look at the table as well. (The entire article is posted on Blackboard and linked below.)

8. In the table, how many “pseudogenes” are present in each of the three genomes?
9. How many IS elements are present in each of the three genomes?

10. By what mechanism do the authors believe the differences between the top two genomes arose?

11. It turns out that B. pertussis, in contrast to the other two Bordetella species in this paper and others not shown here, exists in a very narrow ecological niche: the human ciliated tracheal epithelium. Based on what you see and read here, can you suggest a relationship between this observation and the presence of the numerous IS elements?
Go back to the Map Viewer page, find the genome you will work with all semester, and start exploring!