Fanconi Anemia Genetics . . .

Genetics Primer


Overview Resources
Learning Objectives
Genes as the unit of inheritance
Genes Reside on Chromosomes
Human Chromosomes
   Autosomal vs sex chromosomes
   Inheritance
      Haploid genome
      Locus and alleles
      Heterozygous, homozygous, hemizygous
Genes are made of DNA
DNA
   Four nucleotide bases A,T,C,G
   5' and 3' ends
   Base pairing
   Triplet codons
   The Genetic Code
   Six reading frames
   Coding and non-coding DNA
   The "average" human gene
Gene Expression
   Transcription
   Splicing
   Translation
Genetic Mutations
   Silent
   Missense
   Nonsense
   Readthrough
   Deletions
   Frameshift
   Mutations in promoter regions
   Mutations at splice sites & cryptic splice sites
   Polymorphisms
   Chromosome Abnormalities
      Translocations
      Insertions
Conclusion

Overview Resources

On-line Sites

DNA From The Beginning - a website on the basics of DNA, genes, and heredity. Covers the basics, with some interesting animated content, plus a lot of historical photographs of the people and places involved in the discoveries.

Roche Genetics Education Program - this site offers a free CD ROM on genetics which also covers the basics.  Many of the graphics on this page come from this CD ROM (with permission). The CD ROM is very good and worth obtaining.

National Center for Biotechnology Information - a more advanced primer on the science behind genetics and molecular biology.

Genomics Lexicon - a website that has a glossary of genetic and molecular biology terms.  A great place to look up terms I don't explain sufficiently!

Talking Glossary of Genetic Terms - The National Human Genome Research Institute (NHGRI) created the Talking Glossary of Genetic Terms to help people without scientific backgrounds understand the terms and concepts used in genetic research. Includes the term's pronunciation, audio information, images and additional links to related terms. Excellent resource.

Books

Either of the first two books will tell you everything you need to know about DNA, molecular biology, gene mutations, and genomes. They are not easy reading--they are college textbooks and require concentration to get through.

Genomes, 2nd edition (2002), by T.A. Brown, published by John Wiley & Sons, ISBN 0-471-25046-5.  Excellent college-level textbook on DNA, genomes, molecular biology. I've read this one myself.

Human Molecular Genetics, 2nd edition (December 15, 1999) by Tom Strachan and Andrew P. Read, published by Wiley Liss, ISBN 0471330612. Another college-level textbook on DNA, genomes, molecular biology.

Understanding DNA and Gene Cloning: A Guide for the Curious, 3rd edition (March 1996), by Karl Drlica, published by John Wiley & Sons, ISBN 047113774X. A good overview for the non-scientist, one level more detailed than the popular press; covers molecular genetics, DNA manipulation, and the basics of human genetics. Much less detailed than the first two books.

Learning Objectives

At the end of reading this (very long) web page, you will understand the following:

Genes as the Unit of Inheritance

In 1865, Gregor Mendel published the results of his breeding experiments with pea plants, which were carried out in the monastery gardens at Brno, a small European city in what is now the Czech Republic. Mendel concluded that each pea plant possesses two alleles for each gene, but displays only one phenotype (an observable physical characteristic). An allele is one of a number of different forms of a gene. Each person inherits two alleles for each gene, one from each parent; these alleles may be the same or different from one another. The display of a single phenotype is easy to understand if the plant is pure-breeding, or homozygous, for a particular characteristic, as it then possesses two identical alleles for the characteristic.

If, however, you take two pure-breeding plants with different phenotypes (say, a white flower and a violet flower) and cross-breed them, all the progeny display the same phenotype (say, they are all violet).  The progeny must be heterozygous, meaning that they have two different alleles (one from each parent), so why do they display one phenotype? Mendel postulated that the phenotype of one allele (say, violet) overrides the effects of the other allele (white). He therefore described the phenotype expressed in the progeny (violet flowers) as being dominant over the second, recessive phenotype (white flowers). This simple dominant-recessive rule was correct for the pairs of alleles studied by Mendel, but today it is recognized that there are more complicated situations that he did not encounter.  There can be incomplete dominance (the heterozygous phenotype is intermediate between the two homozygous forms)  and codominance (when both phenotypes are detectable in the heterozygote).

Mendel also established two Laws of Genetics. The First Law states that alleles segregate randomly; that is, if the parent's alleles are A and a, then the progeny has the same chance of inheriting A as it has of inheriting a. The Second Law is that pairs of alleles segregate independently, which means that the inheritance of alleles of gene A is independent of the inheritance of alleles of gene B. Because of these laws, the outcomes of genetic crosses are predictable.
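To see why the outcomes are predictable, here is a minimal Python sketch of a one-gene cross (a Punnett square). The function name `cross` and the letters A and a are just illustrative choices; the only assumption is the First Law itself, that each parent passes on either of its two alleles with equal chance.

```python
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Count offspring genotypes for a one-gene cross, e.g. cross("Aa", "Aa")."""
    offspring = Counter()
    # Every pairing of one allele from each parent is equally likely.
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" are counted as the same genotype.
        genotype = "".join(sorted(allele1 + allele2))
        offspring[genotype] += 1
    return offspring

# Two pure-breeding parents: every child is heterozygous (Aa).
print(dict(cross("AA", "aa")))
# Two heterozygous parents: the classic 1 AA : 2 Aa : 1 aa ratio.
print(dict(cross("Aa", "Aa")))
```

If A (violet) is dominant over a (white), the second cross gives three violet-flowered plants for every white-flowered one.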

Genes Reside on Chromosomes

In 1903, WS Sutton realized that the inheritance pattern of genes paralleled the behavior of chromosomes during cell division. This led to the proposal that genes are located on chromosomes, and by the 1930s it was universally accepted that this was correct.

Since there are many more genes than chromosomes, and since chromosomes are inherited as intact units, it stood to reason that alleles of some pairs of genes will be inherited together because they are on the same chromosome. This is the principle of genetic linkage and it was quickly shown to be correct. This is a refinement to Mendel's Second Law, as alleles of two genes will not be inherited independently if they are both on the same chromosome.

Two genes are said to be linked together if they are on the same chromosome. You would expect that two genes on one chromosome would exhibit complete linkage and always be inherited together; however, the actual case is that genes on the same chromosome exhibit only partial linkage: sometimes they are inherited together, sometimes they are not. This is due to crossing-over events during the development of gametes (egg and sperm). In crossing-over events, the arms of chromosomes break off and exchange. This always happens between the two copies of the same chromosome, known as homologous chromosomes--i.e., crossing-over events occur between chromosome 1 and its copy 1', chromosome 2 and its copy 2', etc. If the crossing-over event occurs between two genes, they can be inherited independently in the offspring. This explains how partial linkage occurs. You would expect, and it is true, that genes that are closer together on a chromosome will be separated by crossovers less frequently than genes that are farther apart from one another. Study of this phenomenon can be used to construct maps of the relative position of genes on a chromosome.
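As a rough illustration of how crossover frequency hints at gene order, the sketch below computes the fraction of offspring in which two genes were separated. The gene names and the offspring counts are made up for the example; they are not real data.

```python
def recombination_frequency(parental, recombinant):
    """Fraction of offspring in which a crossover separated the two genes."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no offspring counted")
    return recombinant / total

# Hypothetical counts: (inherited together, separated by a crossover).
counts = {
    ("A", "B"): (90, 10),
    ("B", "C"): (80, 20),
    ("A", "C"): (72, 28),
}

for pair, (parental, recombinant) in counts.items():
    print(pair, recombination_frequency(parental, recombinant))
```

In this invented data set, A and C are separated most often, so they are probably the farthest apart, which suggests the order A, B, C along the chromosome.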

The study of genetic linkage (linkage analysis) can be used to understand the inheritance of genes as well.  If you are interested in using PGD to select an embryo that is an exact HLA tissue match to an affected child, the inheritance of your HLA markers will be determined using linkage analysis.

Human Chromosomes

There are 23 pairs of human chromosomes. You can see from the karyotype at the right the relative size of each chromosome. Normally you will see pictures of chromosomes in their most compact form (as at right), which only occurs when a cell is undergoing mitosis (cell division).  Here is a diagram of a single chromosome, showing its banding and major structural features.

The ends of each chromosome are capped with a structure called a telomere, which is a repeating base sequence that stabilizes the ends.

There are two "arms" to each chromosome, called p and q, which are defined by a central pinched section called the centromere. The centromere is where the mitotic spindle attaches to pull the two copies of each replicated chromosome (the sister chromatids) apart during mitosis.

The black and white banding pattern is due to a particular staining technique used to visualize the chromosomes. Depending on the staining technique, the dark bands can be rich in AT base pairs, and the white bands rich in GC base pairs.  Other stains can produce the exact opposite pattern.

As noted, this form of the chromosomes is highly compact and only occurs during cell division.  At other times in the cell cycle, the chromosomes are present in the cell in less compact forms.

Autosomal vs sex chromosomes

There are 22 autosomal chromosomes--these are chromosomes that are not involved in determining a human's sex. Each person has 2 copies of each autosomal chromosome; one set is inherited from each parent. So everyone has a chromosome 1 and 1', 2 and 2', etc.

The sex chromosomes are called X and Y and determine a human's sex. Females have two X chromosomes.  Males have an X and a Y chromosome. 

So human individuals have 46 chromosomes in all: 2 sets of 22 autosomal chromosomes (total of 44) and either XX (female) or XY (male) chromosomes (total of 2). Add it together, you have 46 in all.

Inheritance

As mentioned, each person inherits one set of chromosomes from each parent. This is done via sexual reproduction where cells with only 23 chromosomes from each parent combine to form a new set--and a new individual--of 46 chromosomes.

Haploid genome

Normally each cell has 46 chromosomes. These are called diploid cells--they have 2 copies of the 23 chromosomes.

The sex cells, egg and sperm, are haploid cells--they only have 23 chromosomes.  The egg will always have an X sex chromosome. The sperm will either have an X or a Y (but not both) sex chromosome. These cells are created via a special cell division process called meiosis.

One thing we can tell immediately is that the father determines the sex of the child. A sperm carrying an X chromosome will produce a daughter, whereas a sperm carrying a Y chromosome will produce a son.

We also see how each parent contributes only 23 chromosomes to the child.

Locus and alleles

The locus is the position of a gene on a chromosome (see diagram at right). The diagram shows two homologous chromosomes, with Locus A pointing to the location of a gene on each chromosome--one from the mother, one from the father.

Note there are two genes--one on the paternally inherited chromosome, one on the maternally inherited chromosome. The different forms of a gene are called alleles. Each person has two alleles for each gene--one allele from each parent. Note in the diagram that the genetic sequence of the two alleles is different. Note that the gene will likely function the same, even though the genetic sequences are slightly different (although see the section on mutations, below).

Heterozygous, homozygous, hemizygous

A certain terminology is used to describe how each individual has inherited the alleles of specific genes.

An individual is said to be heterozygous for a particular gene if the two inherited alleles are different from each other, even if by a single base pair. This is the situation shown in the diagram above.

An individual is said to be homozygous for a particular gene if the two inherited alleles are exactly the same.

An individual is said to be hemizygous for a particular gene if only one allele was inherited. This can happen if one allele is deleted from one of the chromosomes. This deletion is a form of genetic mutation (discussed below). 
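These three terms can be summed up in a few lines of Python. This is only a sketch: the function name `zygosity` and the short allele sequences are invented for illustration.

```python
def zygosity(alleles):
    """Classify a gene from the list of allele sequences an individual carries."""
    if len(alleles) == 1:
        return "hemizygous"      # only one allele present
    if len(alleles) == 2:
        if alleles[0] == alleles[1]:
            return "homozygous"  # two identical alleles
        return "heterozygous"    # two alleles that differ, even by one base
    raise ValueError("expected one or two alleles")

print(zygosity(["GATTACA", "GATTACA"]))  # homozygous
print(zygosity(["GATTACA", "GATTGCA"]))  # heterozygous
print(zygosity(["GATTACA"]))             # hemizygous
```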

Genes are made of DNA

So we know that genes are unit factors of inheritance--they control which traits appear in offspring. We also know that genes reside on chromosomes. But what are genes made of?

Genes are made of DNA! This is the lowest level of organization of genes. The next level below that concerns the molecular chemistry of the DNA itself.

Note that some viruses can have genomes that are composed of RNA.

DNA

DNA is short for deoxyribonucleic acid. DNA is a linear, unbranched polymer in which the subunits are four chemically distinct nucleotides that can be linked together in any order to form chains of hundreds, thousands, or even millions of units in length.

The structure of the molecule is a sugar and phosphate backbone (the blue helix bands in the picture at right), with the nucleotides linking across.

Four nucleotide bases A,T,C,G

There are four base molecules for nucleotides in human DNA: Adenine (A), Cytosine (C), Guanine (G), and Thymine (T).  When one of these bases is combined with a piece of the sugar-phosphate backbone, it is called a nucleotide.  Nucleotides can be stitched together to form chains of DNA.

RNA shares the bases of A, C, and G, but a different base, Uracil (U) replaces Thymine (T).

5' and 3' ends

The two ends of DNA are chemically distinct. At one end, there is an unreacted triphosphate group which is referred to as the 5' end. At the other end is an unreacted hydroxyl (OH) group which is referred to as the 3' end. This means that the DNA has a chemical direction, or polarity, expressed as either 5'→3' or 3'→5'. All natural synthesis is carried out in the 5'→3' direction. Normally you'll see DNA sequences printed with the 5' terminus on the left and the 3' terminus on the right--often the 5' and 3' ends are indicated explicitly.

Note that only one strand of the double-stranded DNA molecule will have its sequence printed. This is usually the "upper" strand, with 5' on the left and 3' on the right. Keep in mind that there is always a complementary strand, with the 3' end on the left and the 5' end on the right, that is not shown.

Base pairing

To form the DNA molecule, bases pair together across the double helix structure. Adenine (A) always pairs with Thymine (T), and Cytosine (C) always pairs with Guanine (G). These are the only pairs possible.
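Because the pairing rules are fixed, the strand that is not printed can always be worked out from the one that is. Here is a small Python sketch; the function names are my own, and the example sequence is arbitrary.

```python
# A pairs with T, C pairs with G.
PAIRING = str.maketrans("ACGT", "TGCA")

def complement(seq):
    """Base-by-base partner strand, written in the same left-to-right order."""
    return seq.upper().translate(PAIRING)

def reverse_complement(seq):
    """Partner strand rewritten so that it also reads 5' to 3'."""
    return complement(seq)[::-1]

print(complement("ATGC"))          # TACG (this strand reads 3' to 5')
print(reverse_complement("ATGC"))  # GCAT (same strand, read 5' to 3')
```

The reversal is needed because the two strands run in opposite directions.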

Triplet codons

The genetic code is encoded by the sequence of base pairs in the DNA. The genetic code must have codes for all 20 amino acids. A two letter code would provide only 4² = 16 codewords--not enough for the entire genetic code. It was therefore recognized in the 1950s that a triplet genetic code, in which each codeword, or codon, comprises 3 nucleotides, is required. This results in 4³ = 64 codons to encode the 20 amino acids. This means that some amino acids will be coded for by more than one codon, a feature called degeneracy.
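The counting argument is easy to check by listing every possible codeword. A short Python sketch (the helper name `codewords` is just for illustration):

```python
from itertools import product

BASES = "ACGT"

def codewords(length):
    """All possible words of the given length over the four bases."""
    return ["".join(letters) for letters in product(BASES, repeat=length)]

print(len(codewords(2)))  # 16, too few for 20 amino acids
print(len(codewords(3)))  # 64, more than enough
```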

The Genetic Code

First, note that the Genetic Code relates RNA codons to amino acids, because protein synthesis proceeds from messenger RNA (mRNA) molecules, not DNA. So, you won't see "T" (thymine) in the genetic code--T is only in DNA. In place of T, you will see U (uracil). So if you are reading a genetic sequence (DNA) and want to look up which amino acids are specified, you first have to convert the T's in the sequence to U's.

The genetic code specifies which codons specify which amino acids for protein synthesis. As noted previously, the code is degenerate, so some amino acids will be coded for by more than one codon. In fact, only tryptophan (Trp) and methionine (Met) have a single codon; all others are coded by two, three, four, or six codons.  The code also has four punctuation codons, which indicate points within the mRNA where translation (the assembly of amino acids into proteins by ribosomes) of the nucleotide sequence should start and finish.

The initiation codon is usually 5'-AUG-3', which also specifies methionine (Met) (so most newly synthesized polypeptides start with methionine).  Sometimes 5'-GUG-3' or 5'-UUG-3' is used.

There are three termination codons, 5'-UAG-3', 5'-UAA-3', and 5'-UGA-3'. These indicate the point where translation should stop.

The genetic code is not universal. The code presented above holds for the vast majority of genes in the vast majority of organisms, but deviations exist. For example, human mitochondria use a slightly different genetic code, as do some bacteria and fungi.
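The lookup described in this section can be sketched in Python. The table below is the standard genetic code written in one-letter amino acid symbols (with * for a termination codon); the function name `translate` and the example sequence are my own, and the sketch assumes reading starts at the first base and ignores the alternative codes just mentioned.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Convert T to U, then read codons until a termination codon is reached."""
    mrna = dna.upper().replace("T", "U")
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGAAGCTGGCTAA"))  # MEAG (Met-Glu-Ala-Gly)
```

Counting the entries in `CODON_TABLE` confirms the degeneracy described above: M (Met) and W (Trp) each appear once, and three codons map to a stop.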

Six reading frames

A gene's open reading frame (ORF) is the series of codons that specify the amino acid sequence of that gene's protein. The ORF begins with the initiation codon (usually ATG) and ends with a termination codon (TAA, TAG, TGA). Note this is referring to the DNA (note the presence of T, thymine).

One way to find genes, then, is to scan DNA sequences to look for initiation codons and termination codons. With bacteria, this is an effective technique. For higher organisms, some complicating factors make it less effective.
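Here is a minimal sketch of that kind of scan, looking only at the three reading frames of the printed strand. The function name `find_orfs`, the `min_codons` parameter, and the test sequence are illustrative assumptions; real gene-finding software is far more sophisticated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=1):
    """Return (start, end) positions of ATG...stop stretches in the forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first initiation codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end includes the stop codon
                start = None
    return orfs

sequence = "CCATGGAAGCTTAAGG"
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])  # 2 14 ATGGAAGCTTAA
```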

Note that to scan a DNA sequence for ORFs, you need to do it six times. This is because each DNA sequence has six reading frames: three in one direction, and three in the reverse direction of the complementary strand. This is illustrated in the diagram below:

A double-stranded DNA molecule has six reading frames

Note that in each case the sequence is read in the 5'→3' direction. Genes can be present on either strand of DNA.
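The six reading frames can be listed with a short Python sketch. The labels +1 to +3 (printed strand) and -1 to -3 (complementary strand) are a naming convention I am assuming here, and the sequence is arbitrary.

```python
PAIRING = str.maketrans("ACGT", "TGCA")

def six_frames(seq):
    """Split a sequence into codons for all six reading frames, each read 5' to 3'."""
    forward = seq.upper()
    reverse = forward.translate(PAIRING)[::-1]  # complementary strand, 5' to 3'
    frames = {}
    for label, strand in (("+", forward), ("-", reverse)):
        for offset in range(3):
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames[f"{label}{offset + 1}"] = codons
    return frames

for name, codons in six_frames("ATGGAAGCT").items():
    print(name, " ".join(codons))
```

For this sequence, frame +1 reads ATG GAA GCT and frame -1 reads AGC TTC CAT.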

Coding and non-coding DNA

It would be quite simple to find and analyze genes if they were all lined up, one after another, separated only by initiation (start) and termination (stop) codons. In some viruses and bacteria, this is often what is found: a very compact genome. The human genome, however, is much more complex.

Not all of human DNA codes for genes. Approximately 62% of the human genome consists of non-coding intergenic regions, the parts of the genome that lie between genes and have no known function. The bulk of the intergenic region is composed of various kinds of genome-wide repeat sequences, long or short, in which a particular motif is repeated many times.

Only 1.5% of the human genome codes for proteins. Another 36% of the genome is related to genes, such as pseudogenes (copies of genes that have lost their function), gene fragments, introns (see below), and UTRs (untranslated regions before and after genes).

The "average" human gene

A model of the "average" human gene is shown at right.  The key concept is that most genes are discontinuous, in that the information used to synthesize the protein is split between exons (which code for the protein) and introns (non-coding sequences). The model gene at right has 3 exons, separated by two non-coding introns. The initiation codon would lie near the start of exon 1, and the termination codon near the end of exon 3.

This is a key concept to remember. Most human genes are discontinuous, with an average of nine exons per gene (although some genes have many more than this, such as FANCA, which has 43). During gene expression, the introns will be removed via a process called splicing. Genetic disease is generally caused by mutations to the exons, since they code for the protein, but as explained below, mutations in introns can also cause genetic disease. Introns were not discovered until 1977 (which earned the discoverers a Nobel Prize in Medicine in 1993).

Gene Expression

The simple model of gene expression is that DNA is transcribed into RNA, which is then translated into protein. A "DNA makes RNA makes protein" model. This model will suffice for this primer. However, you should know that the process is much more complicated than presented here.

Transcription

In the transcription process, a gene's DNA is transcribed into pre-mRNA, or pre-messenger-RNA. Messenger RNA is used by ribosomes as the instructions for assembling amino acids into a protein molecule. At this step, we have only a precursor to the final mRNA, hence it is called pre-mRNA.

In the transcription process, special molecules move along the DNA molecule and assemble an RNA molecule according to the DNA sequence. Transcription makes an RNA copy of the entire gene, including the introns as well as the exons. The RNA molecule created is called pre-mRNA.
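In sequence terms, transcription can be sketched as pairing each base of the DNA template strand with its RNA partner. The function name and the six-base template are invented for illustration; the sketch ignores everything about how the cell actually controls the process.

```python
# RNA partners for each DNA base; note that A pairs with U, not T.
RNA_PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template_3_to_5):
    """Build the RNA 5' to 3' from a template strand given in 3' to 5' order."""
    return "".join(RNA_PAIRING[base] for base in template_3_to_5.upper())

# Template 3'-TACCTT-5' pairs with the printed (coding) strand 5'-ATGGAA-3'.
print(transcribe("TACCTT"))  # AUGGAA
```

The result matches the printed strand with every T replaced by U.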

Splicing

The process of splicing removes the introns from the pre-mRNA and joins the exons together to create the mRNA molecule that eventually directs protein synthesis.

Splicing occurs in the nucleus and involves special molecular complexes called spliceosomes.

Some genes can undergo alternative splicing. That is, a gene can be spliced together in different ways. For example, say you have a gene with 3 exons (numbered 1, 2, 3). One mRNA may be spliced together from exons 1 and 2; another mRNA may be spliced together from exons 1 and 3. The two mRNAs result in different proteins. Via the process of alternative splicing, one pre-mRNA can result in different proteins being synthesized.
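The three-exon example can be sketched in Python. The sequences are invented (exons in upper case, introns in lower case), and `splice` is a hypothetical helper that simply joins the chosen exons.

```python
# A made-up pre-mRNA with 3 exons separated by 2 introns.
PRE_MRNA = [
    ("exon", "AUGGCC"),
    ("intron", "guaaguuuuag"),
    ("exon", "GAAUUU"),
    ("intron", "guaagccccag"),
    ("exon", "GGCUAA"),
]

def splice(segments, keep_exons=None):
    """Drop the introns and join the chosen exons (numbered from 1)."""
    exons = [sequence for kind, sequence in segments if kind == "exon"]
    if keep_exons is None:
        keep_exons = range(1, len(exons) + 1)  # default: keep every exon
    return "".join(exons[number - 1] for number in keep_exons)

print(splice(PRE_MRNA))          # AUGGCCGAAUUUGGCUAA
print(splice(PRE_MRNA, [1, 2]))  # AUGGCCGAAUUU
print(splice(PRE_MRNA, [1, 3]))  # AUGGCCGGCUAA
```

The same pre-mRNA gives different mRNAs depending on which exons are joined.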

After splicing, the mRNA is complete and is transported outside of the nucleus.

Translation

Translation is the process of assembling proteins based on the mRNA genetic code. Amino acids are assembled step-by-step by a ribosome according to the mRNA sequence. During this process, other specialized RNA molecules, called transfer-RNA, or tRNA, bring amino acids matching the RNA sequence to the ribosome. The ribosome stitches the amino acids together one by one to form the protein molecule.

Genetic Mutations

A mutation is a small-scale change in the nucleotide sequence of a DNA molecule.  The accumulation of mutations over time causes changes to the genome. So the genome is not a fixed entity--it is dynamic, changing over time.  Some mutations are spontaneous errors in replication, whereas others arise from the reaction of a mutagen with DNA.

Mutations can have benign or catastrophic effects on the cell in which they occur. Mutation of a gene that produces an important protein may cause the cell to die.   However, if a gene is mutated in a way that causes little change to the protein, it may have no effect on the cell at all. Mutations that are not lethal have the potential to contribute to the evolution of the species, but for this to happen, they must be inherited during reproduction.

The sections below describe the different types of mutations that can occur to the genome.

Silent

A silent mutation is one that has no effect on the functioning of the genome. This includes most mutations that occur in the non-coding intergenic DNA and in the non-coding components of genes, which together make up about 98.5% of the human genome.

Mutations in the coding regions of genes can also be silent. A single nucleotide can change (a point mutation), but the new codon specifies the same amino acid as the unmutated codon.  This type of change is called synonymous change, since the old and new codon code for the same amino acid.  This is possible because of the degeneracy of the genetic code, with 64 codons specifying only 20 amino acids. In the figure at right, a point mutation to GAA converts that codon to GAG--but both code for the amino acid Glu, so the mutation is silent.

Missense

A point mutation--the change of a single nucleotide--can also change a codon so that a different amino acid is specified, a non-synonymous change. This is called a missense mutation, since the wrong amino acid is specified.

The protein coded by the gene therefore has a change to a single amino acid. This often has no significant effect on the protein, as most can tolerate a few amino acid changes without their biological function changing.

On the other hand, sometimes a missense mutation does have a significant effect. Many Fanconi Anemia mutations are missense mutations.

Nonsense

A nonsense mutation is one that converts a codon that specifies an amino acid into a termination codon.  This results in a shortened protein, since the translation of the mRNA stops at this new termination codon and not the correct one which is further downstream.

The effect of a nonsense mutation depends on how much of the protein is lost.   Usually the effect is catastrophic and the protein is non-functional.

Readthrough

A readthrough mutation is one that converts a termination codon to one that codes for an amino acid. This will result in the protein being extended until another stop codon is encountered. Most proteins can tolerate short extensions without an effect on function; however, longer extensions may interfere with protein folding and result in reduced activity.
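The four kinds of point mutation described so far differ only in what the old and new codons specify, so they can be told apart with a table lookup. Below is a sketch using the standard genetic code on DNA codons; the function name `classify_point_mutation` and the example codons (apart from GAA to GAG, mentioned above) are my own choices.

```python
from itertools import product

# Standard genetic code on DNA codons (TTT, TTC, TTA, TTG, TCT, ...); * marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}

def classify_point_mutation(original_codon, mutated_codon):
    """Name the effect of changing one codon into another."""
    before = CODON_TABLE[original_codon.upper()]
    after = CODON_TABLE[mutated_codon.upper()]
    if before == after:
        return "silent"       # same amino acid (or still a stop)
    if after == "*":
        return "nonsense"     # amino acid became a termination codon
    if before == "*":
        return "readthrough"  # termination codon became an amino acid
    return "missense"         # one amino acid replaced by another

print(classify_point_mutation("GAA", "GAG"))  # silent (both Glu)
print(classify_point_mutation("GAA", "GCA"))  # missense (Glu to Ala)
print(classify_point_mutation("GAA", "TAA"))  # nonsense
print(classify_point_mutation("TAA", "CAA"))  # readthrough (stop to Gln)
```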

Deletions

A deletion mutation is where genetic material is deleted from a gene, or entire genes are deleted.  The figure at right shows the most extreme case--the deletion of a large section of a chromosome.

Deletions can be of an entire gene, part of a gene, a single codon (3 nucleotides), or a single nucleotide (known as a frameshift mutation, see section below).

Many mutations to the FANCA gene are small and large deletions.

Frameshift

The deletion of a single nucleotide (A, T, C, or G) from a gene can have a particularly disruptive effect on the gene. This is because the deletion of one nucleotide changes all of the codons downstream from the change. See the figure at right--the deletion of the "G" base changes the downstream coding such that instead of coding for ...Glu-Ala-Gly, the mutated gene now codes for ...Lys-His... downstream from the mutation. This usually has a significant effect on protein function.
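A frameshift is easy to reproduce on a toy sequence. The sketch below uses the standard genetic code and a made-up sequence (not the one in the figure), with one-letter amino acid symbols; `translate` and `delete_base` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code on DNA codons; * marks a termination codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Read codons from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

def delete_base(dna, position):
    """Return the sequence with the base at the given index removed."""
    return dna[:position] + dna[position + 1:]

original = "ATGGAAGCTGGCTAA"
mutated = delete_base(original, 3)   # remove the G that begins the second codon

print(translate(original))  # MEAG (Met-Glu-Ala-Gly)
print(translate(mutated))   # MKLA (Met-Lys-Leu-Ala), and the stop codon is lost
```

Every codon after the deletion is read differently, and in this example the original termination codon is no longer in frame.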

Mutations in promoter regions

It's possible that a point, insertion, or deletion mutation occurs upstream of the gene itself. This can disrupt gene expression, leading to either overexpression or suppression.

Mutations at splice sites & cryptic splice sites

Mutations that occur at the boundaries between introns and exons can lead to aberrant splicing. This could lead to an intron not being removed from the pre-mRNA, but it is more likely that a cryptic splice site would be used as an alternative. A cryptic splice site is a genetic sequence that resembles an authentic splice site and might be selected during aberrant splicing. It is also possible for a mutation (within an intron or exon) to create a new cryptic splice site that is preferred over a genuine splice site that is not itself mutated.

Aberrant splicing might delete part of the resulting protein, add a new section of amino acids, or result in a frameshift.

Several forms of the genetic disease β-thalassemia are caused by mutations that lead to cryptic splice site selection.

Polymorphisms

Many of the mutation mechanisms listed above have led to genetic variability in the human population. The most common variants are single-base (point) changes, called single nucleotide polymorphisms (SNPs). A polymorphism is a change to the reference gene that does not cause disease or significantly alter gene function.

Polymorphisms can be very useful for tracking inheritance of genes.  It's unlikely that two parents will have the exact same genetic sequence for any gene under study--they are likely to differ by one or more polymorphisms. Knowledge of these polymorphisms can be used to track from where the child inherited the two copies (alleles) of the gene he carries.

Chromosome Abnormalities

Some genetic abnormalities can happen at the level of the chromosome and affect many genes.

Translocations

A translocation is when two chromosomes swap pieces of their arms.


Insertions

An insertion is when one portion of a chromosome is inserted into another. Note that in this case, genetic material is not swapped, it is just moved to another chromosome.



Conclusion

So there you go. My thanks to you if you've made it this far. Let's review the learning objectives:

Got it?  Feel free to write me with any questions. I would say my primary objective of writing this section was so that people could understand the difference between exons and introns.


Last Updated: 08 Feb 2004