Genomics BIO 427
Instructors: Louise Temple and Steve Cresawn, JMU Department of Biology
Copyright Jon Monroe and Louise Temple, 1/24/06
Exercise 2. Repetitive DNA and Sequencing Genomes
Entire genomes are sequenced using one or both of two strategies: whole-genome shotgun sequencing and clone-by-clone sequencing. In the former, the entire genome is sheared into 0.5-20 kb fragments, which are then cloned into vectors for end-sequencing. In the latter, the genome is initially broken into ~100 kb fragments, which are first mapped onto the genome and then sequenced. In both approaches, overlapping regions of the individual sequences are assembled into contiguous sequences; gaps are then filled and ambiguities resolved by resequencing subclones or PCR products in a process called finishing. Any sequencing project will encounter problems that make assembly or finishing difficult in certain regions of the genome. For eukaryotic genomes, one significant problem is repetitive DNA. The problem is simple to state: if two or more sequences from randomly isolated clones are identical, one must determine whether they came from one locus or from several different repetitive loci. In this exercise you will explore several types of repetitive DNA in genomes. Make use of the NCBI Genome Glossary if terms are unfamiliar.
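The repeat problem described above can be illustrated with a small Python sketch. The genome, the repeat, and the reads below are made-up toy sequences (assumptions for illustration only), and real assemblers use overlap graphs rather than exact string search. The point is only that a read lying entirely inside a repeat matches more than one locus, while a read long enough to reach unique flanking sequence matches just one.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every 0-based position where `read` occurs exactly in `genome`."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)  # allow overlapping hits
    return positions


# Toy genome (hypothetical): the same 12-base repeat sits at two loci.
REPEAT = "GGCCGGGCGCGG"
GENOME = "ATGCATTACG" + REPEAT + "TTGACCTAGA" + REPEAT + "CATCGTAACG"

unique_read = "TTGACCTAGA"                 # lies in unique sequence
repeat_read = REPEAT[2:10]                 # lies entirely inside the repeat
spanning_read = "TACG" + REPEAT + "TTGA"   # reaches unique flanks on both sides

for name, read in [("unique", unique_read),
                   ("repeat", repeat_read),
                   ("spanning", spanning_read)]:
    hits = find_all(GENOME, read)
    status = "ambiguous" if len(hits) > 1 else "placed"
    print(f"{name:9s} {read:22s} hits at {hits} -> {status}")
```

Only the read confined to the repeat is ambiguous, which is why repeats longer than a single sequence read are the ones that complicate assembly.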
A. What types of repetitive DNA are found in the human genome? Repetitive DNA makes up a very large part of big eukaryotic genomes. One simple way to see it is pictured on the right. While 25% of the human genome is composed of genes (exons + introns), nearly all of that DNA is introns that are spliced out of mRNAs. Of the remaining DNA, most is repetitive: 45% of the genome is composed of transposons and 8% consists of simple repeats or large duplications.

Repetitive DNA is also characterized by size and abundance. There are four types of transposable elements, which together make up almost half of the human genome. They range in size from about 300 bp to over 10 kb and are repeated hundreds of thousands of times throughout the genome (data from Lewin B., Essential Genes, 2006).
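The percentages quoted above can be turned into rough amounts of DNA with a few lines of Python. This is a back-of-the-envelope sketch: it treats the three categories as non-overlapping, as the text does, and the genome size of about 3,100 Mb is an assumption that does not come from this exercise.

```python
# Percentages of the human genome quoted in the text above.
composition = {
    "genes (exons + introns)": 25,
    "transposons": 45,
    "simple repeats and large duplications": 8,
}

# Whatever is left once the three categories above are accounted for.
composition["other DNA"] = 100 - sum(composition.values())

# Assumption (not from the exercise): a haploid genome of roughly 3,100 Mb.
GENOME_SIZE_MB = 3100


def to_megabases(percent: int, genome_mb: int = GENOME_SIZE_MB) -> float:
    """Convert a percentage of the genome into megabases."""
    return percent * genome_mb / 100


repetitive_percent = (composition["transposons"]
                      + composition["simple repeats and large duplications"])

for label, percent in composition.items():
    print(f"{label:40s} {percent:3d}%  ~{to_megabases(percent):5.0f} Mb")
print(f"Repetitive DNA in total: {repetitive_percent}% of the genome")
```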
Many of these elements are remnants of virus-like sequences that once hopped around our genome. Of these four groups of repetitive sequences, all but the SINEs contain sequences encoding genes, such as transposase, that are responsible for this hopping behavior. Most have suffered deletions damaging the genes required for transposition, so they are no longer mobile. The most abundant SINE family is the Alu repeats, with over 1 million copies comprising 10% of the genome. Alu repeats are named for AluI, the restriction enzyme that cuts them and is used in their identification.

B. Exploring the human genome for Alu repeats, LINEs and SINEs. First search for some normal, functional genes to see how abundant they are. Go to the main Entrez search page, then click on the Map Viewer section of the top navigation bar. To search just the human genome, pull down the Select Group or Genome bar to Homo sapiens, then type "amylase" in the box to the right and press Go. You will then see a map of all of the human chromosomes with several red marks indicating the locations of amylase genes.
Notice that each gene is listed twice, once from the reference sequences and once from the Celera sequences. On the same page search for "hemoglobin". Now search for "7sLRNA" and watch for the red marks. Notice that just under the map it indicates that only the first 57 hits out of 1772 found are shown! This is one type of Alu repeat.

Below the map is a long table with lines reading: Alu 7SLRNA REPEAT Repeats. Click on the second Repeat link to see a map of the region in which the first Alu repeat is located. This default view shows the entire chromosome. To zoom in for a closer view of this chromosome, use the zoom box on the left. Start with the smallest box at the bottom. As you mouse over the bottom bar it will indicate: show 1/10,000th of chromosome. Now you will see a map of 19,200 bp with the Alu repeat at the very top.

Below the highlighted repeat are some other repeated elements. Look for one called "Simple_repeat". Follow the line from the lower TA(n) repeat to the gray box on the chromosome with your mouse, then click once to pop up a more detailed Map Viewer zoom box. Click on Show Sequence at the bottom. Look at one of the "Low_complexity" regions. These two sequences are quite short and are likely to be found within a single sequence read of ~400 bp, but longer repeats and regions of low complexity will cause problems in the assembly phase of genome sequencing.

C. Repetitive Elements in Bacterial Genomes. Repeated sequences, including transposons, can wreak havoc in a genome. We don’t really know the effect of the high number of repeats in the human genome. But we can look at a bacterial genome to see some dramatic effects of a number of so-called “Insertion Sequence” elements. Below are a table, a figure, and two paragraphs from a paper comparing the chromosomal sequences of three highly related species of Bordetella, which are pathogens of mammals.
In the figure, the chromosomes are shown as horizontal lines, with red bands between them where there is a high degree of nucleotide-nucleotide similarity (over 85% identical). Between the bottom two, you see that most of the genome is organized identically, although certain regions are rearranged and there is some difference in the overall size. However, comparing the top two lines (B. pertussis, the human pathogen, and B. bronchiseptica, the dog and pig pathogen), you can see that B. pertussis is quite different. Read the two paragraphs about these observations and look at the table as well. (The entire article is posted on Blackboard and linked below.)

8. In the table, how many “pseudogenes” are present in each of the three genomes?
9. How many IS elements are present in each of the three genomes?
10. By what mechanism do the authors believe the differences between the top two genomes arose?
11. It turns out that B. pertussis, in contrast to the other two Bordetella species in this paper and others not shown here, exists in a very narrow ecological niche: the human ciliated tracheal epithelium. Based on what you see and read here, can you suggest a relationship between this observation and the presence of the numerous IS elements?
Go back to the Map Viewer page and find some genomes you think are interesting!
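If you copy a sequence from Show Sequence in part B, simple repeats such as TA(n) can also be spotted programmatically. The Python sketch below is a minimal illustration, not how Map Viewer annotates repeats: the function name, the thresholds (units of 1 to 6 bases, at least 4 tandem copies), and the example sequence are all assumptions chosen for the demonstration.

```python
import re


def find_simple_repeats(seq: str, max_unit: int = 6, min_copies: int = 4):
    """Return (start, unit, copies) for each tandem repeat of a short unit.

    The lazy quantifier makes the regex prefer the shortest repeating unit,
    so "AAAA" is reported as A(4) rather than AA(2).
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    repeats = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        repeats.append((match.start(), unit, copies))
    return repeats


# Hypothetical example: a TA(6) repeat and a run of nine A's in unique flanks.
example = "GCATCG" + "TA" * 6 + "GGCTCAGG" + "A" * 9 + "CGT"

for start, unit, copies in find_simple_repeats(example):
    print(f"{unit}({copies}) at position {start}, {copies * len(unit)} bp long")
```

Comparing each repeat's length with the ~400 bp read length mentioned in part B gives a quick sense of which repeats a single read could span.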