Publications

2011

Wu X, Liu M, Downie B, Liang C, Ji G, Li QQ, Hunt AG. Genome-wide landscape of polyadenylation in Arabidopsis provides evidence for extensive alternative polyadenylation. Proceedings of the National Academy of Sciences of the United States of America. 2011;108(30):12533–8. doi:10.1073/pnas.1019732108

Alternative polyadenylation (APA) has been shown to play an important role in gene expression regulation in animals and plants. However, the extent of sense and antisense APA at the genome level is not known. We developed a deep-sequencing protocol that queries the junctions of 3'UTR and poly(A) tails and confidently maps the poly(A) tags to the annotated genome. The results of this mapping show that 70% of Arabidopsis genes use more than one poly(A) site, excluding microheterogeneity. Analysis of the poly(A) tags reveals extensive APA in introns and coding sequences, which can significantly alter transcript sequences and the proteins they encode. Although the interplay of intron splicing and polyadenylation potentially defines poly(A) site use in introns, the polyadenylation signals leading to the use of poly(A) sites in protein-coding regions (CDS) are distinct from those in the rest of the genome. Interestingly, a large number of poly(A) sites correspond to putative antisense transcripts that overlap with the promoter of the associated sense transcript, a mode previously demonstrated to regulate sense gene expression. Our results suggest that APA plays a far greater role in gene expression in plants than previously expected.
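As an illustration of what "more than one poly(A) site, excluding microheterogeneity" can mean computationally, the sketch below merges nearby cleavage positions into clusters and then counts genes with more than one cluster. It is not the pipeline used in the paper: the 24-nt merging distance, the function names, and the gene identifiers are assumptions chosen only for the example.

```python
def cluster_sites(positions, max_gap=24):
    """Group poly(A) positions so that sites within max_gap nt of the
    previous site fall into the same cluster (max_gap is an assumed value)."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)   # treat as microheterogeneity of the same site
        else:
            clusters.append([pos])     # start a new, distinct poly(A) site
    return clusters


def fraction_with_apa(sites_by_gene, max_gap=24):
    """Fraction of genes that keep more than one poly(A) site after clustering."""
    if not sites_by_gene:
        return 0.0
    multi = sum(
        1 for positions in sites_by_gene.values()
        if len(cluster_sites(positions, max_gap)) > 1
    )
    return multi / len(sites_by_gene)


# Hypothetical input: gene identifier -> mapped poly(A) tag positions
example = {"geneA": [100, 105, 300], "geneB": [50, 52]}
print(fraction_with_apa(example))
```

Here geneA keeps two distinct sites after clustering while geneB collapses to one, so half of the example genes would be counted as using APA.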

Zhao H, Zheng J, Li QQ. A novel plant in vitro assay system for pre-mRNA cleavage during 3'-end formation. Plant physiology. 2011;157(3):1546–54. doi:10.1104/pp.111.179465

Messenger RNA (mRNA) maturation in eukaryotic cells requires the formation of the 3' end, which includes two tightly coupled steps: the committing cleavage reaction that requires both correct cis-element signals and cleavage complex formation, and the polyadenylation step that adds a polyadenosine [poly(A)] tract to the newly generated 3' end. An in vitro biochemical assay plays a critical role in studying this process. The lack of such an assay system in plants has hampered the study of plant mRNA 3'-end formation for the last two decades. To address this, we have now established and characterized a plant in vitro cleavage assay system, in which nuclear protein extracts from Arabidopsis (Arabidopsis thaliana) suspension cell cultures can accurately cleave different pre-mRNAs at expected in vivo authenticated poly(A) sites. The specific activity is dependent on appropriate cis-elements on the substrate RNA. When complemented by yeast (Saccharomyces cerevisiae) poly(A) polymerase, about 150-nucleotide poly(A) tracts were added specifically to the newly cleaved 3' ends in a cooperative manner. The reconstituted polyadenylation reaction indicates that authentic cleavage products were generated. Our results not only provide a novel plant pre-mRNA cleavage assay system, but also suggest a cross-kingdom functional complementation of yeast poly(A) polymerase in a plant system.

2010

Ji G, Wu X, Shen Y, Huang J, Li QQ. A classification-based prediction model of messenger RNA polyadenylation sites. Journal of theoretical biology. 2010;265(3):287–96. doi:10.1016/j.jtbi.2010.05.015

Messenger RNA polyadenylation is one of the essential processing steps during eukaryotic gene expression. The site of polyadenylation [poly(A) site] marks the end of a transcript, which is also the end of a gene. A computational program that is able to recognize poly(A) sites would not only prove useful for genome annotation in finding gene ends, but also for predicting alternative poly(A) sites. Features that define the poly(A) sites can now be extracted from the poly(A) site datasets to build such predictive models. Using methods including K-gram pattern, Z-curve, position-specific scoring matrix and first-order inhomogeneous Markov sub-model, numerous features were generated and placed in an original feature space. To select the most useful features, attribute selection algorithms, such as information gain and entropy, were employed. A training model was then built based on the Bayesian network to determine a subset of the optimal features. Test models corresponding to the training models were built to predict poly(A) sites in Arabidopsis and rice. Thus, a prediction model, termed Poly(A) site classifier, or PAC, was constructed. The uniqueness of the model lies in its structure in that each sub-model can be replaced or expanded, while feature generation, selection and classification are all independent processes. Its modular design makes it easily adaptable to different species or datasets. The algorithm's high specificity and sensitivity were demonstrated by testing several datasets and, at the best combinations, they both reached 95%. The software package may be used for genome annotation and optimizing transgene structure.
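To make two of the feature types named above concrete, the following sketch computes K-gram counts and the cumulative Z-curve coordinates of a sequence. It uses the standard textbook definition of the Z-curve and is not taken from the PAC software; the function names are hypothetical, and the abstract does not say how PAC parameterizes these features.

```python
from collections import Counter


def k_gram_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def z_curve(seq):
    """Cumulative Z-curve coordinates (x, y, z) at each position.

    x: purine (A+G) minus pyrimidine (C+T)
    y: amino (A+C) minus keto (G+T)
    z: weak hydrogen bond (A+T) minus strong hydrogen bond (G+C)
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq:
        if base in counts:               # ambiguous bases leave the counts unchanged
            counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return points


print(k_gram_counts("AAUAAA", 3))
print(z_curve("AAT"))
```

Feature vectors of this kind would then be passed to a separate selection and classification step, which is consistent with the modular design the abstract describes.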

2009

Xing D, Ni S, Kennedy MA, Li QQ. Identification of a plant-specific Zn2+-sensitive ribonuclease activity. Planta. 2009;230(4):819–25. doi:10.1007/s00425-009-0986-3

Ribonucleases (RNases) play a variety of cellular and biological roles in all three domains of life. In an attempt to perform RNA immuno-precipitation assays of Arabidopsis proteins, we found an EDTA-dependent RNase activity from Arabidopsis suspension tissue cultures. Further investigations proved that the EDTA-dependent RNase activity was plant specific. Characterization of the RNase activity indicated that it was insensitive to low pH and high concentrations of NaCl. In the process of isolating the activity with cation exchange chromatography, we found that the EDTA dependency of the activity was lost. This led us to speculate that some metal ions, which inhibited the RNase activity, may be removed during cation exchange chromatography so that the nuclease activity was released. The EDTA dependency of the activity could be due to the ability of EDTA to chelate those metal ions, mimicking the effect of the cation exchange chromatography. Indeed, Zn2+ strongly inhibited the activity, and the inhibition could be released by EDTA based on both in-solution and in-gel assays. In-gel assays identified two RNase activity bands. Mass spectrometry assays of those activity bands revealed more than 20 proteins. However, none of them has an apparent known nuclease domain, suggesting that one or more of those proteins might possess a currently uncharacterized nuclease domain. Our results may shed light on RNA metabolism in plants by introducing a novel plant-specific RNase activity.

Xing D, Li QQ. Alternative polyadenylation: a mechanism maximizing transcriptome diversity in higher eukaryotes. Plant signaling & behavior. 2009;4(5):440–2. doi:10.1104/pp.108.129817

Based on comparative genome analyses, the increases in protein-coding gene number could not account for the increases of morphological and behavioral complexity of higher eukaryotes. Transcriptional regulation, alternative splicing and the involvement of non-coding RNA in gene expression regulation have been credited for the drastic increase of transcriptome complexity. However, an emerging theme is another mechanism, alternative polyadenylation (APA), which contributes to the formation of alternative mRNA 3'-ends. First, recent studies indicated that APA is a widespread phenomenon across the transcriptomes of higher eukaryotes and is regulated by developmental and environmental cues. Secondly, our characterization of the Arabidopsis polyadenylation factors suggested that plant polyadenylation has also evolved to regulate the expression of specific genes by means of APA and therefore the specific biological functions. Finally, phylogenetic analyses of eukaryotic polyadenylation factors from several organisms revealed that the number of polyadenylation factors tends to increase in higher eukaryotes, which provides the potential for their functional differentiation in regulating gene expression through APA. Based on the above evidence, we hypothesize that APA, serving as an additional mechanism, contributes to the complexity of higher eukaryotes.

Zhao H, Xing D, Li QQ. Unique features of plant cleavage and polyadenylation specificity factor revealed by proteomic studies. Plant physiology. 2009;151(3):1546–56. doi:10.1104/pp.109.142729

Cleavage and polyadenylation of precursor mRNA is an essential process for mRNA maturation. Among the 15 to 20 protein factors required for this process, a subgroup of proteins is needed for both cleavage and polyadenylation in plants and animals. This subgroup of proteins is known as the cleavage and polyadenylation specificity factor (CPSF). To explore the in vivo structural features of plant CPSF, we used tandem affinity purification methods to isolate the interacting protein complexes for each component of the CPSF subunits using Arabidopsis (Arabidopsis thaliana ecotype Landsberg erecta) suspension culture cells. The proteins in these complexes were identified by mass spectrometry and western immunoblots. By compiling the in vivo interaction data from tandem affinity purification tagging as well as other available yeast two-hybrid data, we propose an in vivo plant CPSF model in which the Arabidopsis CPSF possesses AtCPSF30, AtCPSF73-I, AtCPSF73-II, AtCPSF100, AtCPSF160, AtFY, and AtFIPS5. Among them, AtCPSF100 serves as a core with which all other factors, except AtFIPS5, are associated. These results show that plant CPSF possesses distinct features, such as AtCPSF73-II and AtFY, while sharing other ortholog components with its yeast and mammalian counterparts. Interestingly, these two unique plant CPSF components have been associated with embryo development and flowering time controls, both of which involve plant-specific biological processes.

2008

BACKGROUND: In plant functional genomic studies, gene cloning into binary vectors for plant transformation is a routine procedure. Traditionally, gene cloning has relied on restriction enzyme digestion and ligation. In recent years, however, Gateway® cloning technology (Invitrogen Co.) has provided a fast and reliable alternative cloning methodology that uses a phage recombination strategy. While many Gateway-compatible vectors are available, we frequently encounter problems in which antibiotic resistance genes for bacterial selection are the same between recombinant vectors. Under these conditions, it is difficult, and sometimes impossible, to use antibiotic resistance in selecting the desired transformants. We have, therefore, developed a practical procedure to solve this problem.

RESULTS: An integrated protocol for cloning genes of interest from PCR to Agrobacterium transformants via the Gateway® System was developed. The protocol takes advantage of unique characteristics of the replication origins of plasmids used and eliminates the necessity for restriction enzyme digestion in plasmid selections.

CONCLUSION: The protocol presented here is a streamlined procedure for fast and reliable cloning of genes of interest from PCR to Agrobacterium via the Gateway® System. This protocol overcomes a key problem in which two recombinant vectors carry the same antibiotic selection marker. In addition, the protocol could be adapted for high-throughput applications.

Liang C, Liu Y, Liu L, Davis AC, Shen Y, Li QQ. Expressed sequence tags with cDNA termini: previously overlooked resources for gene annotation and transcriptome exploration in Chlamydomonas reinhardtii. Genetics. 2008;179(1):83–93. doi:10.1534/genetics.107.085605

Many Chlamydomonas reinhardtii expressed sequence tags (ESTs) in GenBank dbEST and community EST assemblies were either over- or undertrimmed in terms of their cDNA termini, which are defined as the diagnostic sequence elements that delineate 3'/5' ends of mRNA transcripts. Overtrimming represents a loss of directional, positional, and structural information of transcript ends, whereas undertrimming leaves spurious sequences in ESTs that exert deleterious impacts on downstream EST-based applications. We examined 309,278 raw EST sequencing trace files of C. reinhardtii and found that only 57% had cDNA termini that matched the expected structures specified in their cDNA library constructions while satisfying our minimum length requirement for their final clean sequences. Using GMAP, 156,963 individual ESTs were mapped to the genome successfully, with their in silico-verified cDNA termini anchored to the genome. Our data analysis suggested strong macro- and microheterogeneity of 3'/5' end positions of individual transcripts derived from the same genes in C. reinhardtii. This work annotating differential ends of individual transcripts in the draft genome presents the research community with a new stream of data that will facilitate accurate determination of gene structures, genome annotation, and exploration of the transcriptome and mRNA metabolism in C. reinhardtii.

Shen Y, Liu Y, Liu L, Liang C, Li QQ. Unique features of nuclear mRNA poly(A) signals and alternative polyadenylation in Chlamydomonas reinhardtii. Genetics. 2008;179(1):167–76. doi:10.1534/genetics.108.088971

To understand nuclear mRNA polyadenylation mechanisms in the model alga Chlamydomonas reinhardtii, we generated a data set of 16,952 in silico-verified poly(A) sites from EST sequencing traces based on Chlamydomonas Genome Assembly v.3.1. Analysis of this data set revealed a unique and complex polyadenylation signal profile that sets Chlamydomonas apart from other organisms. In contrast to the high-AU content in the 3'-UTRs of other organisms, Chlamydomonas shows a high-guanylate content that transitions to high-cytidylate around the poly(A) site. The average length of the 3'-UTR is 595 nucleotides (nt), significantly longer than that of Arabidopsis and rice. The dominant poly(A) signal, UGUAA, was found in 52% of the near-upstream elements, and its occurrence may be positively correlated with higher gene expression levels. The UGUAA signal also exists in Arabidopsis and in some mammalian genes but mainly in the far-upstream elements, suggesting a shift in function. The C-rich region after poly(A) sites with unique signal elements is a characteristic downstream element that is lacking in higher plants. We also found a high level of alternative polyadenylation in the Chlamydomonas genome, with a range of up to 33% of the 4057 genes analyzed having at least two unique poly(A) sites and approximately 1% of these genes having poly(A) sites residing in predicted coding sequences, introns, and 5'-UTRs. These potentially contribute to transcriptome diversity and gene expression regulation.
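A statistic such as "UGUAA was found in 52% of the near-upstream elements" can be pictured with a small scan like the one below. This is only a sketch, not the authors' analysis: the abstract does not define the near-upstream window, so the 5 to 25 nt range used here is an assumption, as are the function name and the convention that each input sequence ends at the base immediately upstream of the cleavage site.

```python
def signal_fraction(upstream_seqs, signal="UGUAA", start=5, end=25):
    """Fraction of sequences whose window from `end` to `start` nt upstream
    of the poly(A) site contains `signal`.

    Each sequence is assumed to end at the base just upstream of the
    cleavage site; the window bounds are assumed values for illustration.
    """
    if not upstream_seqs:
        return 0.0
    hits = 0
    for seq in upstream_seqs:
        rna = seq.upper().replace("T", "U")   # accept cDNA-style input
        lo = max(0, len(rna) - end)           # window start, clipped to the sequence
        hi = max(0, len(rna) - start)         # window end, clipped to the sequence
        if signal in rna[lo:hi]:
            hits += 1
    return hits / len(upstream_seqs)


# Hypothetical sequences: the first carries the signal inside the window,
# the second carries it too close to the cleavage site to be counted.
seqs = ["CCCCUGUAACCCCCCCCCC", "GGGGGGGGGGGGGGGUGUAA"]
print(signal_fraction(seqs))
```

Changing `start` and `end` shifts the scan to other regions, for example a far-upstream window, which is how the same signal could be compared across element classes.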

Xing D, Zhao H, Xu R, Li QQ. Arabidopsis PCFS4, a homologue of yeast polyadenylation factor Pcf11p, regulates FCA alternative processing and promotes flowering time. The Plant journal : for cell and molecular biology. 2008;54(5):899–910. doi:10.1111/j.1365-313X.2008.03455.x

The timely transition from vegetative to reproductive growth is vital for reproductive success in plants. It has been suggested that messenger RNA 3'-end processing plays a role in this transition. Specifically, two autonomous factors in the Arabidopsis thaliana flowering time control pathway, FY and FCA, are required for the alternative polyadenylation of FCA pre-mRNA. In this paper we provide evidence that Pcf11p-similar protein 4 (PCFS4), an Arabidopsis homologue of yeast polyadenylation factor Protein 1 of Cleavage Factor 1 (Pcf11p), regulates FCA alternative polyadenylation and promotes flowering as a novel factor in the autonomous pathway. First, the mutants of PCFS4 show delayed flowering under both long-day and short-day conditions and still respond to vernalization treatment. Next, gene expression analyses indicate that the delayed flowering in pcfs4 mutants is mediated by Flowering Locus C (FLC). Moreover, the expression profile of the known FCA transcripts, which result from alternative polyadenylation, was altered in the pcfs4 mutants, suggesting the role of PCFS4 in FCA alternative polyadenylation and control of flowering time. In agreement with these observations, using yeast two-hybrid assays and TAP-tagged protein pull-down analyses, we also revealed that PCFS4 forms a complex in vivo with FY and other polyadenylation factors. The PCFS4 promoter activity assay indicated that the transcription of PCFS4 is temporally and spatially regulated, suggesting its non-essential nature in plant growth and development.