T-DNA as an Insertional Mutagen in Arabidopsis
Patrick J. Krysan1,a, Jeffery C. Young1,2,a, and Michael R. Sussmana
a Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, Wisconsin 53706
Correspondence to: Michael R. Sussman, msussman{at}facstaff.wisc.edu (E-mail), 608-262-6748 (fax)
Forward genetics begins with a mutant phenotype and asks the question "What is the genotype?"; that is, what is the sequence of the mutant gene causing the altered phenotype? Reverse genetics begins with a mutant gene sequence and asks the question "What is the resulting change in phenotype?" These two approaches are fundamentally different. Whereas forward genetics has been in operation for more than a century, the recent avalanche of complete genome sequences has only now created the opportunity to pursue reverse genetics exhaustively.
Gene knockouts, or null mutations, are important because they provide a direct route to determining the function of a gene product in situ. Most other approaches to gene function are correlative and do not necessarily prove a causal relationship between gene sequence and function. For example, DNA chips provide an exciting means to discover conditions under which gene expression is regulated on a genomewide scale (
There are many ways to implement targeted mutagenesis so as to compromise specific genes. In mice, knockout mutations are now routinely obtained by promoting the homologous recombination of null gene constructs with the genomic wild-type sequence in embryonic stem cells. Provided that the given mutation is not embryonic lethal, "knockout mice" can then be developed in utero by injecting such stem cells into blastocysts (
Insertional mutagenesis is an alternative means of disrupting gene function and is based on the insertion of foreign DNA into the gene of interest. In Arabidopsis, this involves the use of either transposable elements (see
Polymerase chain reaction (PCR) methods have been developed that allow one to easily isolate individual plants that carry a particular T-DNA mutation of interest (
Several improvements in Agrobacterium-mediated transformation techniques have made T-DNA a viable method for approaching genomewide mutagenesis. The original root-explant method (
Saturation of the Arabidopsis genome with T-DNA insertions is an experimental goal that depends on specific quantitative requirements, and to date those requirements have not been fully met. Nevertheless, we have recently established a population of 60,480 T-DNA-transformed lines as a significant step toward the production of genomewide mutations. Access to these lines is now available through the Arabidopsis Knockout Facility at the University of Wisconsin (http://www.biotech.wisc.edu/arabidopsis/default.htm). This facility will serve the research community by allowing users to screen the entire population of lines for the presence of a T-DNA insert within their gene of interest. The organization of this population of 60,480 lines, as well as the operation of the service facility, is described below.
The consequences of inserting a T-DNA element into the Arabidopsis genome depend on the nature of the T-DNA as well as the precise site of insertion. Figure 1 diagrams several of the possible outcomes of T-DNA insertion and proposes a standard nomenclature for describing them. The "knockon" mutations are a special case in which the T-DNA construct carries a constitutive promoter, such as the cauliflower mosaic virus 35S promoter, capable of driving expression of genes adjacent to the site of insertion (
Given an infinite number of T-DNA-transformed Arabidopsis lines, one should be able to identify a T-DNA insertion within every gene in the genome (with the exception of those genes required for the viability of both the male and female gametes). It is not practical, however, to generate a population large enough to ensure that every single gene has been mutated. It is therefore important to estimate how many T-DNA-transformed lines are realistically necessary and sufficient. Three variables determine the probability that a T-DNA insert will be found within a given gene: the size of the gene, the size of the genome, and the number of T-DNA inserts distributed among the population. This relationship is described by the formula shown in the legend of Figure 2A. Of the three independent variables, the only one that is experimentally controllable is the total number of T-DNA inserts present in the population.
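The legend of Figure 2A is not reproduced in this text. As a hedged sketch, the relationship among the three variables can be written as p = 1 - (1 - x/G)^n, where x is the gene length, G is the genome size, and n is the number of T-DNA inserts, assuming inserts land uniformly at random. The function names and the default genome size of 120,000 kb are assumptions made for illustration, not values taken from the legend.

```python
import math

# Assumed genome size in kb; the value used by the authors is not given here.
ASSUMED_GENOME_KB = 120_000


def hit_probability(gene_kb, n_inserts, genome_kb=ASSUMED_GENOME_KB):
    """Probability that at least one of n_inserts random inserts lands in the gene."""
    return 1.0 - (1.0 - gene_kb / genome_kb) ** n_inserts


def inserts_needed(gene_kb, target_p, genome_kb=ASSUMED_GENOME_KB):
    """Smallest number of inserts giving at least target_p chance of a hit."""
    per_insert_miss = 1.0 - gene_kb / genome_kb
    return math.ceil(math.log(1.0 - target_p) / math.log(per_insert_miss))


if __name__ == "__main__":
    for gene_kb in (1, 5):
        print(gene_kb, "kb gene, 99%:", inserts_needed(gene_kb, 0.99))
```

With the assumed 120,000-kb genome, `inserts_needed(5, 0.99)` comes out near 110,500. A different genome-size estimate scales every result proportionally, so the numbers should be read as approximate.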
Figure 2A also shows that the number of T-DNA inserts needed to approach saturation is highly dependent on the length of the gene of interest. For example, a 5-kb gene requires 110,000 T-DNA inserts to achieve a 99% probability of being mutated, whereas a 1-kb gene requires 550,000 T-DNA inserts. It should also be noted that the slope of the curve in Figure 2A flattens out as the probability approaches 100%. Thus, the experimentalist must at some point face diminishing returns when investing time to create additional T-DNA lines.
Because the size of the gene of interest determines the probability of its mutation by T-DNA insertion, we were interested in determining average gene size in Arabidopsis. For this calculation, we defined a "gene" as a genomic DNA sequence, including introns and exons, from which a protein is specified. Sequences upstream and downstream of the sequence flanked by the start and stop codons were not included in our definition. Using this definition of the Arabidopsis gene, we next estimated the size of the productive target region, that is, the portion of the gene within which a T-DNA insertion leads to a null allele. Because T-DNA insertions directly upstream of the start codon would likely lead to null alleles, our omission of upstream regions from our definition of the Arabidopsis gene may result in an underestimate of the actual target size. At the same time, however, insertions at the very end of the coding region may not lead to null alleles; thus, the inclusion of this region within our definition of a gene could incur a slight overestimate of target size. In this way, we chose to offset a potential overestimate of target size with a potential underestimate by excluding upstream regions from our definition of a gene.
Using published sequence data (
Given the median gene length of 2.1 kb determined above, one would require ~280,000 T-DNA inserts to have a 99% chance of mutating a particular gene; a 95% chance would require 180,000 inserts. These numbers provide a framework for determining how many T-DNA-transformed lines need to be created to have a good chance of finding a mutation in the large majority of genes in the Arabidopsis genome. To effectively screen such large populations, efficient and robust protocols must be employed.
Pool Size
In early experiments, DNA was extracted from groups of 10 lines and then combined to make fifty-three pools, each pool representing DNA from 100 lines (
To further probe the upper size limit on DNA pools, we performed a PCR experiment using various amounts of template DNA derived from a pool of 225 independently transformed lines. These PCRs used the T-DNA left border primer and one gene-specific primer to detect a T-DNA insert known to be present in this pool of 225. As shown in Table 1, three different amounts of template DNA were tested, using sixteen replicates of each PCR to determine the reproducibility with which the T-DNA insert could be detected. In this analysis, it was concluded that ~208 copies of the T-DNA insert are required in a complex pool of DNA to ensure that the insertion is reliably detected. It was also found that PCR efficiency is attenuated when >125 ng of pooled DNA from these T-DNA lines is used per PCR. These limitations therefore set the maximum useful pool size under our conditions at ~2350 lines per pool.
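The text does not show how the ~2350 figure follows from the two limits. One plausible reconstruction, offered only as a sketch, is to count how many genome equivalents fit in 125 ng of template and divide by the ~208 copies needed. The haploid genome mass (0.128 pg) and the average of one insert copy per diploid genome in segregating seedlings are assumptions, as are the function names.

```python
import math

# Assumed values (not stated in the text): mass of one haploid Arabidopsis
# genome, and an average of one insert copy per diploid genome among the
# segregating seedlings of a transformed line.
HAPLOID_GENOME_PG = 0.128
INSERT_COPIES_PER_DIPLOID = 1.0


def insert_copies_in_template(template_ng, pool_size):
    """Expected copies of one line's insert in a PCR template drawn from a pool."""
    haploid_genomes = template_ng * 1000.0 / HAPLOID_GENOME_PG  # ng -> pg
    diploid_genomes = haploid_genomes / 2.0
    return diploid_genomes * INSERT_COPIES_PER_DIPLOID / pool_size


def max_pool_size(max_template_ng, min_copies):
    """Largest pool in which a single line still contributes min_copies copies."""
    copies_if_single_line = insert_copies_in_template(max_template_ng, 1)
    return math.floor(copies_if_single_line / min_copies)


if __name__ == "__main__":
    print(max_pool_size(125, 208))
```

Under these assumptions the result lands within a few lines of the ~2350 quoted above. The estimate is sensitive to the assumed genome mass and to how evenly each line contributes DNA to the pool.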
Pool Architecture
We have recently organized a population of 60,480 T-DNA-transformed lines by using the simple strategy outlined in Figure 3A, which had proven successful on smaller populations (
Seed from each of the 270 unique pools of 225 was then germinated in a liquid culture, and genomic DNA was extracted from the resulting seedling pools. This work resulted in the generation of 270 separate DNA samples, with each DNA sample representing 225 independently transformed plants. Finally, DNA superpools were formed in which each superpool contained the DNA extracted from nine separate pools of 225. In this manner, 30 superpools were created, with each DNA superpool representing 2025 (225 x 9) independently transformed lines. The entire population of 60,480 transformed plants is represented within these 30 DNA superpools.
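The nesting of pools can be sketched in a few lines. This is an illustration under assumptions: lines are numbered sequentially from 0 and assigned to pools in order, which the text does not state. The constant and function names are hypothetical. The pool-of-nine level follows the document's later description of 25 pools of nine per pool of 225.

```python
LINES_PER_POOL_OF_NINE = 9
POOLS_OF_NINE_PER_POOL = 25   # 25 x 9 = 225 lines per pool
POOLS_PER_SUPERPOOL = 9       # 9 x 225 = 2025 lines per superpool
N_SUPERPOOLS = 30

LINES_PER_POOL = LINES_PER_POOL_OF_NINE * POOLS_OF_NINE_PER_POOL
LINES_PER_SUPERPOOL = LINES_PER_POOL * POOLS_PER_SUPERPOOL
CAPACITY = LINES_PER_SUPERPOOL * N_SUPERPOOLS


def locate(line_index):
    """Map a 0-based line index to (superpool, pool of 225, pool of nine) indices.

    Sequential assignment is an assumption made for this sketch.
    """
    if not 0 <= line_index < CAPACITY:
        raise ValueError("line index outside the pooled population")
    pool_of_nine = line_index // LINES_PER_POOL_OF_NINE
    pool = line_index // LINES_PER_POOL
    superpool = line_index // LINES_PER_SUPERPOOL
    return superpool, pool, pool_of_nine


if __name__ == "__main__":
    print(LINES_PER_POOL, LINES_PER_SUPERPOOL, CAPACITY)
    print(locate(2025))
```

Thirty full superpools would hold 60,750 lines, slightly more than the 60,480 reported, so under this layout a few pools would be only partly filled. The text does not say which ones.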
This ordered population of 60,480 T-DNA insertion lines can then be exhaustively screened with 120 PCRs. However, we generally limit our initial screens to 60 reactions by using only the T-DNA left border primer. Preliminary results indicate that the T-DNA left border is detected two to three times more often than the right border in this population. A predominance of intact left border sequences has been previously documented (
T-DNA Border Primers
Gene-Specific Primer Design
The identification of knockout mutants is the first step toward describing the function of a gene. After the isolation of a mutant line, plants homozygous for the mutation must be identified, outcrossed, and analyzed to ensure that only one T-DNA insertion is present. With a confirmed mutant in hand, the next step is to determine the consequences of the mutation on growth and development relative to the wild type. However, it has become apparent that many knockout mutants have no readily identifiable phenotype. For example, of the 17 mutants described in
Functional redundancy among the members of a gene family is a likely reason for the frequently observed lack of an identifiable phenotype associated with knockout mutations (
First, double-mutant lines are created and, when viable, crossed to lines homozygous for a mutation of a third member of a given gene family. Because the T-DNA insert associated with each mutation serves as a marker, PCR genotyping of complex mutant backgrounds is possible. In this cumulative manner, any number of the members in a given gene family can theoretically be knocked out within a single line, provided the genes are not too closely linked on the chromosome. Once a variety of multiple-mutant lines is available, the lines can in turn be crossed with each other to obtain complex segregating populations. At any point, the genotype for seedlings with interesting phenotypes in these segregating populations can be determined using PCR.
Another possible reason for the lack of observable phenotypes is that individual gene family members may have evolved to function only under specific physiological conditions. Thus, unless the mutant plant is placed under a condition in which the target gene is required, no phenotype is observed (
The goal of reverse genetics is the identification of a phenotype that is caused by mutation of a particular gene. Once such a phenotype has been observed, several steps must be taken to prove that the phenotypic characteristic is indeed controlled by the gene of interest. The first step is to follow the segregation of the T-DNA over multiple generations and score the corresponding phenotypes. One can easily determine the precise genotype of large numbers of individual plants by using the T-DNA insertion as a PCR marker for the mutant locus. Such analysis, however, does not prove that the PCR-identified, T-DNA-induced mutation is responsible for the phenotype rather than a closely linked, unrelated mutation.
To prove definitively that the insertional mutation causes the phenotype, one must either isolate additional mutant alleles for the locus or complement the mutation by introducing a wild-type copy of the gene into mutant plants by using transgenic technology. If additional mutant alleles are available, they will provide the quickest route to confirming or refuting the role played by the insertionally mutated gene in controlling the observed phenotype. If the same phenotype is found to be linked to the same T-DNA insertion in several independently transformed Arabidopsis plants, one could make a strong argument that the mutation is indeed causing the phenotype. One of the benefits of generating a large collection of T-DNA-transformed lines is that the probability of finding more than one T-DNA insert in a given gene is quite high. For instance, given a 95% chance of finding a single insert in a particular gene, one would have a 90% chance of finding two independent inserts in that same gene and an 86% chance of finding three alleles. Having access to a large population of T-DNA-transformed lines could thus supplant the labor-intensive process of transgenic complementation with the more efficient process of multiple allele analysis.
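The 90% and 86% figures above equal 0.95 squared and 0.95 cubed. As a cross-check under a different assumption, the sketch below treats the number of inserts in a gene as Poisson distributed, which is the usual limit for many independent random insertions. The Poisson model and the function name are assumptions introduced here, not the authors' stated method.

```python
import math


def prob_at_least_k(p_at_least_one, k):
    """Chance of k or more inserts in a gene, given the chance of one or more.

    Assumes the insert count per gene follows a Poisson distribution.
    """
    mean_inserts = -math.log(1.0 - p_at_least_one)
    prob_fewer_than_k = sum(
        math.exp(-mean_inserts) * mean_inserts ** i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - prob_fewer_than_k


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(prob_at_least_k(0.95, k), 3), round(0.95 ** k, 3))
```

Under the Poisson assumption, a 95% chance of one insert corresponds to roughly an 80% chance of two or more and roughly a 58% chance of three or more. These values are lower than the quoted figures but still support the point that multiple alleles are likely in a large population.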
We have recently established a service facility at the University of Wisconsin that will provide the Arabidopsis research community with access to our population of 60,480 T-DNA-transformed lines. Detailed information about the operation of the facility can be found by visiting its Web site (http://www.biotech.wisc.edu/arabidopsis/default.htm). Using gene-specific primers provided by the user, our facility will perform PCRs that screen the entire population of 60,480 lines for the presence of a T-DNA insert within the gene of interest. The resulting PCRs will be mailed to the user, who will be responsible for analyzing the reactions by gel electrophoresis, DNA gel blotting, and DNA sequencing to determine if any knockouts of the user's gene are present in the population. If a positive result is obtained in the first round of PCR, the user can then request that a second round of PCR be performed by the facility, whereby the particular pool of 225 that carries the knockout of interest will be identified. Finally, the knockout facility will supply the user with seed from the 25 pools of nine that correspond to the pool of 225. The user will then perform DNA extractions and PCR to determine which pool of nine contains the user's mutant and will ultimately isolate the individual mutant plant.
The process of PCR screening for individual knockout mutations is an efficient and fruitful approach to reverse genetics. This strategy allows one to focus resources and energies on a small number of interesting genes. As the age of high-throughput genomics arrives, it is apparent that one could also pursue an alternative strategy. Rather than searching for mutations in particular genes, one could simply begin cataloging the locations of all of the T-DNA inserts present in the entire population (
The Arabidopsis Knockout Facility at the University of Wisconsin will soon begin a program of isolating and sequencing the DNA that flanks the T-DNA inserts present in its population of 60,480 lines. This strategy will allow us to characterize most, if not all, of the T-DNA inserts present in each pool of nine lines. A computer database will then be established in which all of the T-DNA flanking sequences will be stored, along with a notation to indicate the corresponding pool of nine lines. This database can then be searched for the presence of flanking sequences homologous to any gene of interest, and the corresponding pool of nine can be ordered directly from the stock center, eliminating the need for large-scale PCR screens.
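The planned lookup can be illustrated with a toy sketch. Everything here is hypothetical: the record layout, the pool identifiers, the sequences, and the use of exact substring matching on either strand as a stand-in for a real homology search. The actual database design is not described in the text.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass(frozen=True)
class FlankRecord:
    pool_of_nine: str   # hypothetical pool identifier
    flank: str          # sequence flanking a T-DNA insert


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_pools(gene_seq, records, min_len=20):
    """List pools whose flanking sequence occurs in the gene on either strand.

    Exact matching is a simplification; a real search would tolerate mismatches.
    """
    gene = gene_seq.upper()
    hits = set()
    for rec in records:
        flank = rec.flank.upper()
        if len(flank) < min_len:
            continue  # too short to be informative
        if flank in gene or reverse_complement(flank) in gene:
            hits.add(rec.pool_of_nine)
    return sorted(hits)


if __name__ == "__main__":
    gene = "ATGGCGTACGTTAGCCTAGGATCCGATTACGCTAGCTAGGCTTAA"
    records = [
        FlankRecord("pool-A", gene[5:30]),
        FlankRecord("pool-B", reverse_complement(gene[10:35])),
        FlankRecord("pool-C", "T" * 24),
    ]
    print(find_pools(gene, records))
```

Storing the pool identifier alongside each flanking sequence is what lets a hit translate directly into a seed order, which is the point of the catalog approach described above.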
With three-quarters of the Arabidopsis genome already sequenced and the expected completion of the entire genome within the next year, the era of reverse genetics should yield simple and direct routes for exploring gene function. In conjunction with other emerging genomic technologies, reverse genetic analysis will provide a solid foundation upon which to build a more complete understanding of the complex interactions among the thousands of different genes present in Arabidopsis.
1 These authors contributed equally to this work.
The authors thank Dr. Rick Amasino and his laboratory and Sandra Austin-Phillips for producing the T-DNA-tagged lines; they also thank Heather Burch and Sarah Graham for growing tissue and extracting DNA. Thanks also to Pete Jester, Laura Katers, and Sean Monson for technical assistance. This work was supported by Grant No. DBI 9872638 from the National Science Foundation.
Received August 16, 1999; accepted October 27, 1999.
Allen, M.J., Collick, A., and Jeffreys, A.J. (1994) Use of vectorette and subvectorette PCR to isolate transgene flanking DNA. PCR Methods Appl. 4:71-75.
Azpiroz-Leehan, R., and Feldmann, K.A. (1997) T-DNA insertion mutagenesis in Arabidopsis: Going back and forth. Trends Genet. 13:152-156.
Bechtold, N., and Pelletier, G. (1998) In planta Agrobacterium-mediated transformation of adult Arabidopsis thaliana plants by vacuum infiltration. Methods Mol. Biol. 82:259-266.
Bevan, M. et al. (1998) Analysis of 1.9 Mb of contiguous sequence from chromosome 4 of Arabidopsis thaliana. Nature 391:485-488.
Bouchez, D., and Höfte, H. (1998) Functional genomics in plants. Plant Physiol. 118:725-732.
Castle, L.A., Errampalli, D., Atherton, T.L., Franzmann, L.H., Yoon, E.S., and Meinke, D.W. (1993) Genetic and molecular characterization of embryonic mutants identified following seed transformation in Arabidopsis. Mol. Gen. Genet. 241:504-514.
Clough, S.J., and Bent, A.F. (1998) Floral dip: A simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J. 16:735-743.
DeRisi, J.L., Iyer, V.R., and Brown, P.O. (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680-686.
Feldmann, K.A., and Marks, M.D. (1987) Agrobacterium mediated transformation of germinating seeds of Arabidopsis thaliana: A non-tissue culture approach. Mol. Gen. Genet. 208:1-9.
Hua, J., and Meyerowitz, E.M. (1998) Ethylene responses are negatively regulated by a receptor gene family in Arabidopsis thaliana. Cell 94:261-271.
Hirsch, R.E., Lewis, B.D., Spalding, E.P., and Sussman, M.R. (1998) A role for AKT1 potassium channel in plant nutrition. Science 280:918-921.
Kempin, S.A., Liljegren, S.J., Block, L.M., Rounsley, S.D., Lam, E., and Yanofsky, M.F. (1997) Inactivation of the Arabidopsis AGL5 MADS-box gene by homologous recombination. Nature 389:802-803.
Koller, B.H., Hagemann, L.J., Doetschman, T., Hagaman, J.R., Huang, S., Williams, P.J., First, N.L., Maeda, N., and Smithies, O. (1989) Germ-line transmission of a planned alteration made in a hypoxanthine phosphoribosyltransferase gene by homologous recombination in embryonic stem cells. Proc. Natl. Acad. Sci. USA 86:8927-8931.
Krysan, P.J., Young, J.C., Tax, F., and Sussman, M.R. (1996) Identification of transferred DNA insertions within Arabidopsis genes involved in signal transduction and ion transport. Proc. Natl. Acad. Sci. USA 93:8145-8150.
Laufs, P., Autran, D., and Traas, J. (1999) A chromosomal paracentric inversion associated with T-DNA integration in Arabidopsis. Plant J. 18:131-139.
Liu, Y.G., Mitsukawa, N., Oosumi, T., and Whittier, R.F. (1995) Efficient isolation and mapping of Arabidopsis thaliana T-DNA insert junctions by thermal asymmetric interlaced PCR. Plant J. 8:457-463.
Martienssen, R.A. (1998) Functional genomics: Probing plant gene function and expression with transposons. Proc. Natl. Acad. Sci. USA 95:2021-2026.
McBride, K.E., and Summerfelt, K.R. (1990) Improved binary vectors for Agrobacterium mediated plant transformation. Plant Mol. Biol. 14:269-276.
McKinney, E.C., Ali, N., Traut, A., Feldmann, K.A., Belostotsky, D.A., McDowell, J.M., and Meagher, R.B. (1995) Sequence-based identification of T-DNA insertion mutations in Arabidopsis: Actin mutants act2-1 and act4-1. Plant J. 8:613-622.
Nacry, P., Camilleri, C., Courtial, B., Caboche, M., and Bouchez, D. (1998) Major chromosomal rearrangements induced by T-DNA transformation in Arabidopsis. Genetics 149:641-650.
Parinov, S., Sevugan, M., Ye, D., Yang, W.-C., Kumaran, M., and Sundaresan, V. (1999) Analysis of Flanking Sequences from Dissociation Insertion Lines: A database for reverse genetics in Arabidopsis. Plant Cell 11:2263-2270.
Singh-Gasson, S., Green, R.D., Yue, Y., Nelson, C., Blattner, F., Sussman, M.R., and Cerrina, F. (1999) Maskless fabrication of light-directed oligonucleotide microarrays using a digital micromirror array. Nat. Biotechnol. 17:974-978.
Valvekens, D., Van Montagu, M., and Van Lijsebettens, M. (1988) Agrobacterium tumefaciens mediated transformation of Arabidopsis thaliana root explants by using kanamycin selection. Proc. Natl. Acad. Sci. USA 85:5536-5540.
Wilson, K., Long, D., Swinburne, J., and Coupland, G. (1996) A Dissociation insertion causes a semidominant mutation that increases expression of TINY, an Arabidopsis gene related to APETALA2. Plant Cell 8:659-671.
Wisman, E., Hartmann, U., Sagasser, M., Baumann, E., Palme, K., Hahlbrock, K., Saedler, H., and Weisshaar, B. (1998) Knock-out mutants from an En-1 mutagenized Arabidopsis thaliana population generate phenylpropanoid biosynthesis phenotypes. Proc. Natl. Acad. Sci. USA 95:12432-12437.
Wodicka, L., Dong, H., Mittmann, M., Ho, M.H., and Lockhart, D.J. (1997) Genome-wide expression monitoring in Saccharomyces cerevisiae. Nat. Biotechnol. 15:1359-1367.