Publications

2014

Wu X, Gaffney B, Hunt AG, Li QQ. Genome-wide determination of poly(A) sites in Medicago truncatula: evolutionary conservation of alternative poly(A) site choice. BMC genomics. 2014;15(1):615. doi:10.1186/1471-2164-15-615

BACKGROUND: Alternative polyadenylation (APA) plays an important role in the post-transcriptional regulation of gene expression. Little is known about how APA sites may evolve in homologous genes in different plant species. To this end, comparative studies of APA sites in different organisms are needed. In this study, a collection of poly(A) sites in Medicago truncatula, a model system for legume plants, has been generated and compared with APA sites in Arabidopsis thaliana.

RESULTS: The poly(A) tags from a deep-sequencing protocol were mapped to the annotated M. truncatula genome, and the identified poly(A) sites used to update the annotations of 14,203 genes. The results show that 64% of M. truncatula genes possess more than one poly(A) site, comparable to the percentages reported for Arabidopsis and rice. In addition, the poly(A) signals associated with M. truncatula genes were similar to those seen in Arabidopsis and other plants. The 3'-UTR lengths are correlated in pairs of orthologous genes between M. truncatula and Arabidopsis. Very little conservation of intronic poly(A) sites was found between Arabidopsis and M. truncatula, which suggests that such sites are likely to be species-specific in plants. In contrast, there is a greater conservation of CDS-localized poly(A) sites in these two species. A sizeable number of M. truncatula antisense poly(A) sites were found. A high percentage of the associated target genes possess Arabidopsis orthologs that are also associated with antisense sites. This is suggestive of important roles for antisense regulation of these target genes.

CONCLUSIONS: Our results reveal some distinct patterns of sense and antisense poly(A) sites in Arabidopsis and M. truncatula. In so doing, this study lends insight into general evolutionary trends of alternative polyadenylation in plants.

Ray WC, Wolock SL, Callahan NW, Dong M, Li Q, Liang C, Magliery TJ, Bartlett CW. Addressing the unmet need for visualizing conditional random fields in biological data. BMC bioinformatics. 2014;15:202. doi:10.1186/1471-2105-15-202

BACKGROUND: The biological world is replete with phenomena that appear to be ideally modeled and analyzed by one archetypal statistical framework - the Graphical Probabilistic Model (GPM). The structure of GPMs is a uniquely good match for biological problems that range from aligning sequences to modeling the genome-to-phenome relationship. The fundamental questions that GPMs address involve making decisions based on a complex web of interacting factors. Unfortunately, while GPMs ideally fit many questions in biology, they are not an easy solution to apply. Building a GPM is not a simple task for an end user. Moreover, applying GPMs is also impeded by the insidious fact that the "complex web of interacting factors" inherent to a problem might be easy to define and also intractable to compute upon.

DISCUSSION: We propose that the visualization sciences can contribute to many domains of the bio-sciences by developing tools to address archetypal representation and user interaction issues in GPMs, and in particular a variety of GPM called a Conditional Random Field (CRF). CRFs bring additional power, and additional complexity, because the CRF dependency network can be conditioned on the query data.

CONCLUSIONS: In this manuscript we examine the shared features of several biological problems that are amenable to modeling with CRFs, highlight the challenges that existing visualization and visual analytics paradigms induce for these data, and document an experimental solution called StickWRLD which, while leaving room for improvement, has been successfully applied in several biological research projects. Software and tutorials are available at http://www.stickwrld.org/.

2013

Xing D, Wang Y, Xu R, Ye X, Yang D, Li QQ. The regulatory role of Pcf11-similar-4 (PCFS4) in Arabidopsis development by genome-wide physical interactions with target loci. BMC genomics. 2013;14:598. doi:10.1186/1471-2164-14-598

BACKGROUND: The yeast and human Pcf11 proteins function in both constitutive and regulated transcription and pre-mRNA processing. The constitutive roles of PCF11 are largely mediated by its direct interaction with the RNA Polymerase II C-terminal domain and a polyadenylation factor, Clp1. However, little is known about the mechanism of the regulatory roles of Pcf11. Though similar to Pcf11 in multiple aspects, Arabidopsis Pcf11-similar-4 protein (PCFS4) plays only a regulatory role in Arabidopsis gene expression. To understand how PCFS4 regulates the expression of its direct target genes at the genome level, a ChIP-Seq approach was employed in this study to identify PCFS4 enrichment sites (ES) and the ES-linked genes within the Arabidopsis genome.

RESULTS: A total of 892 PCFS4 ES sites linked to 839 genes were identified. Distribution analysis of the ES sites along the gene bodies suggested that PCFS4 is preferentially located on the coding sequences of the genes, consistent with its regulatory role in transcription and pre-mRNA processing. Gene ontology (GO) analysis revealed that the ES-linked genes were specifically enriched in a few GO terms, including those categories of known PCFS4 functions in Arabidopsis development. More interestingly, GO analysis suggested novel roles of PCFS4. An example is its role in circadian rhythm, which was experimentally verified herein. Sequence analysis of the ES sites identified some over-represented sequence motifs shared by subsets of ES sites. The motifs may explain the specificity of PCFS4 for its target genes and PCFS4's functions in multiple aspects of Arabidopsis development and behavior.

CONCLUSIONS: Arabidopsis PCFS4 has been shown to specifically target, and physically interact with, subsets of genes. Its targeting specificity is likely mediated by cis-elements shared by the genes of each subset. The potential regulation of each subset of genes at both the transcription and mRNA processing levels may explain the functions of PCFS4 in multiple aspects of Arabidopsis development and behavior.

2012

Thomas PE, Wu X, Liu M, Gaffney B, Ji G, Li QQ, Hunt AG. Genome-wide control of polyadenylation site choice by CPSF30 in Arabidopsis. The Plant cell. 2012;24(11):4376–88. doi:10.1105/tpc.112.096107

The Arabidopsis thaliana ortholog of the 30-kD subunit of the mammalian Cleavage and Polyadenylation Specificity Factor (CPSF30) has been implicated in the responses of plants to oxidative stress, suggesting a role for alternative polyadenylation. To better understand this, poly(A) site choice was studied in a mutant (oxt6) deficient in CPSF30 expression using a genome-scale approach. The results indicate that poly(A) site choice in a large majority of Arabidopsis genes is altered in the oxt6 mutant. A number of poly(A) sites were identified that are seen only in the wild type or oxt6 mutant. Interestingly, putative polyadenylation signals associated with sites that are seen only in the oxt6 mutant are decidedly different from the canonical plant polyadenylation signal, lacking the characteristic A-rich near-upstream element (where AAUAAA can be found); this suggests that CPSF30 functions in the handling of the near-upstream element. The sets of genes that possess sites seen only in the wild type or mutant were enriched for those involved in stress and defense responses, a result consistent with the properties of the oxt6 mutant. Taken together, these studies provide new insights into the mechanisms and consequences of CPSF30-mediated alternative polyadenylation.

Hunt AG, Xing D, Li QQ. Plant polyadenylation factors: conservation and variety in the polyadenylation complex in plants. BMC genomics. 2012;13:641. doi:10.1186/1471-2164-13-641

BACKGROUND: Polyadenylation, an essential step in eukaryotic gene expression, requires both cis-elements and a plethora of trans-acting polyadenylation factors. The polyadenylation factors are largely conserved across mammals and fungi. The conservation also seems to extend to plants, based on analyses of Arabidopsis polyadenylation factors. To extend this observation, we systematically identified the orthologs of yeast and human polyadenylation factors from 10 plant species chosen based on both the availability of their genome sequences and their positions in the evolutionary tree, which render them representatives of different plant lineages.

RESULTS: The evolutionary trajectories revealed several interesting features of plant polyadenylation factors. First, the number of genes encoding plant polyadenylation factors was clearly increased from "lower" to "higher" plants. Second, the gene expansion in higher plants was biased to some polyadenylation factors, particularly those involved in RNA binding. Finally, while there are clear commonalities, the differences in the polyadenylation apparatus were obvious across different species, suggesting an ongoing process of evolutionary change. These features lead to a model in which the plant polyadenylation complex consists of a conserved core, which is rather rigid in terms of evolutionary conservation, and a panoply of peripheral subunits, which are less conserved and associated with the core in various combinations, forming a collection of somewhat distinct complex assemblies.

CONCLUSIONS: The multiple forms of the plant polyadenylation complex, together with the diversified poly(A) signals, may explain the intensive alternative polyadenylation (APA) and its regulatory role in biological functions of higher plants.

2011

Xing D, Li QQ. Alternative polyadenylation and gene expression regulation in plants. Wiley interdisciplinary reviews. RNA. 2011;2(3):445–58. doi:10.1002/wrna.59

Functioning as an essential step of pre-mRNA processing, polyadenylation has been realized in recent years to play an important regulatory role during eukaryotic gene expression. Such regulation occurs mostly through the use of alternative polyadenylation (APA) sites and generates different transcripts with altered coding capacity for proteins and/or RNA. However, the molecular mechanisms that underlie APAs are poorly understood. Besides APA cases demonstrated in animal embryo development, cancers, and other diseases, there are a number of APA examples reported in plants. The best-known ones are related to flowering time control pathways and stress responses. Genome-wide studies have revealed that plants use APA extensively to generate diversity in their transcriptomes. Although each transcript produced by RNA polymerase II has a poly(A) tail, over 50% of plant genes studied possess multiple APA sites in their transcripts. The signals defining poly(A) sites in plants were mostly studied through classical genetic means. Our understanding of these poly(A) signals is enhanced by the tallies of whole plant transcriptomes. The profiles of these signals have been used to build computer models that can predict poly(A) sites in newly sequenced genomes, potential APA sites in genes of interest, and/or to identify, and then mutate, unwanted poly(A) sites in target transgenes to facilitate crop improvements. In this review, we provide readers an update on recent research advances that shed light on the understanding of polyadenylation, APA, and its role in gene expression regulation in plants.

Zheng J, Xing D, Wu X, Shen Y, Kroll DM, Ji G, Li QQ. Ratio-based analysis of differential mRNA processing and expression of a polyadenylation factor mutant pcfs4 using Arabidopsis tiling microarray. PloS one. 2011;6(2):e14719. doi:10.1371/journal.pone.0014719

BACKGROUND: Alternative polyadenylation as a mechanism in gene expression regulation has been widely recognized in recent years. Arabidopsis polyadenylation factor PCFS4 was shown to function in leaf development and in flowering time control. The function of PCFS4 in controlling flowering time was correlated with the alternative polyadenylation of FCA, a flowering time regulator. However, genetic evidence suggested additional targets of PCFS4 that may mediate its function in both flowering time and leaf development.

METHODOLOGY/PRINCIPAL FINDINGS: To identify further targets, we investigated the whole transcriptome of a PCFS4 mutant using the Affymetrix Arabidopsis genomic tiling 1.0R array and developed a data analysis pipeline, termed RADPRE (Ratio-based Analysis of Differential mRNA Processing and Expression). In RADPRE, ratios of normalized probe intensities between wild type Columbia and a pcfs4 mutant were first generated. By doing so, one of the major problems of tiling array data (variations caused by differential probe affinity) was significantly alleviated. With the probe ratios as inputs, a hierarchy of statistical tests was carried out to identify differentially processed genes (DPG) and differentially expressed genes (DEG). The false discovery rate (FDR) of this analysis was estimated by using the balanced random combinations of Col/pcfs4 and pcfs4/Col ratios as inputs. Gene Ontology (GO) analysis of the DPGs and DEGs revealed potential new roles of PCFS4 in stress responses besides flowering time regulation.

CONCLUSION/SIGNIFICANCE: We identified 68 DPGs and 114 DEGs with FDR at 1% and 2%, respectively. Most of the 68 DPGs were subjected to alternative polyadenylation, splicing or transcription initiation. Quantitative PCR analysis of a set of DPGs confirmed that most of these genes were truly differentially processed in pcfs4 mutant plants. The enriched GO term "regulation of flower development" among PCFS4 targets further indicated the efficacy of the RADPRE pipeline. This simple but effective program is available upon request.
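
The ratio step described in this abstract can be illustrated with a minimal sketch. It is not the RADPRE program: the function names, the toy intensities, and the simple 5'/3' comparison are all hypothetical, and the statistical tests and FDR estimation of the published pipeline are not reproduced. It only shows why a per-probe ratio cancels a probe-specific affinity factor, assuming intensity is roughly affinity times transcript abundance.

```python
import math
from statistics import mean


def probe_log_ratios(wild_type, mutant):
    """Per-probe log2(wild type / mutant) for paired normalized intensities."""
    if len(wild_type) != len(mutant):
        raise ValueError("intensity lists must have the same length")
    return [math.log2(w / m) for w, m in zip(wild_type, mutant)]


def half_shift(log_ratios):
    """Mean log ratio of the 3' half minus the 5' half of a gene's probes.

    A hypothetical summary statistic for illustration only; a value far
    from zero hints that the two halves of the transcript respond differently.
    """
    mid = len(log_ratios) // 2
    return mean(log_ratios[mid:]) - mean(log_ratios[:mid])


# Toy gene with four probes (5' to 3') whose affinities differ 8-fold.
affinity = [0.5, 1.0, 2.0, 4.0]
wt_abundance = [8.0, 8.0, 8.0, 8.0]
mut_abundance = [8.0, 8.0, 2.0, 2.0]  # 3' half reduced in the mutant

wt = [a * x for a, x in zip(affinity, wt_abundance)]
mut = [a * x for a, x in zip(affinity, mut_abundance)]

ratios = probe_log_ratios(wt, mut)
print(ratios)              # affinity cancels: [0.0, 0.0, 2.0, 2.0]
print(half_shift(ratios))  # 2.0
```

Although the raw intensities differ from probe to probe, the log ratios depend only on the abundance change, which is the property the abstract attributes to working with ratios.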

Shen Y, Venu RC, Nobuta K, Wu X, Notibala V, Demirci C, Meyers BC, Wang G-L, Ji G, Li QQ. Transcriptome dynamics through alternative polyadenylation in developmental and environmental responses in plants revealed by deep sequencing. Genome research. 2011;21(9):1478–86. doi:10.1101/gr.114744.110

Polyadenylation sites mark the ends of mRNA transcripts. Alternative polyadenylation (APA) may alter sequence elements and/or the coding capacity of transcripts, a mechanism that has been demonstrated to regulate gene expression and transcriptome diversity. To study the role of APA in transcriptome dynamics, we analyzed a large-scale data set of RNA "tags" that signify poly(A) sites and expression levels of mRNA. These tags were derived from a wide range of tissues and developmental stages that were mutated or exposed to environmental treatments, and generated using digital gene expression (DGE)-based protocols of the massively parallel signature sequencing (MPSS-DGE) and the Illumina sequencing-by-synthesis (SBS-DGE) sequencing platforms. The data offer a global view of APA and how it contributes to transcriptome dynamics. Upon analysis of these data, we found that ∼60% of Arabidopsis genes have multiple poly(A) sites. Likewise, ∼47% and 82% of rice genes use APA, supported by MPSS-DGE and SBS-DGE tags, respectively. In both species, ∼49%-66% of APA events were mapped upstream of annotated stop codons. Interestingly, 10% of the transcriptomes are made up of APA transcripts that are differentially distributed among developmental stages and in tissues responding to environmental stresses, providing an additional level of transcriptome dynamics. Examples of pollen-specific APA switching and salicylic acid treatment-specific APA clearly demonstrated such dynamics. The significance of these APAs is more evident in the 3034 genes that have conserved APA events between rice and Arabidopsis.

Wu X, Liu M, Downie B, Liang C, Ji G, Li QQ, Hunt AG. Genome-wide landscape of polyadenylation in Arabidopsis provides evidence for extensive alternative polyadenylation. Proceedings of the National Academy of Sciences of the United States of America. 2011;108(30):12533–8. doi:10.1073/pnas.1019732108

Alternative polyadenylation (APA) has been shown to play an important role in gene expression regulation in animals and plants. However, the extent of sense and antisense APA at the genome level is not known. We developed a deep-sequencing protocol that queries the junctions of 3'UTR and poly(A) tails and confidently maps the poly(A) tags to the annotated genome. The results of this mapping show that 70% of Arabidopsis genes use more than one poly(A) site, excluding microheterogeneity. Analysis of the poly(A) tags reveals extensive APA in introns and coding sequences, the results of which can significantly alter transcript sequences and their encoded proteins. Although the interplay of intron splicing and polyadenylation potentially defines poly(A) site use in introns, the polyadenylation signals leading to the use of CDS (protein-coding region) poly(A) sites are distinct from the rest of the genome. Interestingly, a large number of poly(A) sites correspond to putative antisense transcripts that overlap with the promoter of the associated sense transcript, a mode previously demonstrated to regulate sense gene expression. Our results suggest that APA plays a far greater role in gene expression in plants than previously expected.

Zhao H, Zheng J, Li QQ. A novel plant in vitro assay system for pre-mRNA cleavage during 3’-end formation. Plant physiology. 2011;157(3):1546–54. doi:10.1104/pp.111.179465

Messenger RNA (mRNA) maturation in eukaryotic cells requires the formation of the 3' end, which includes two tightly coupled steps: the committing cleavage reaction that requires both correct cis-element signals and cleavage complex formation, and the polyadenylation step that adds a polyadenosine [poly(A)] tract to the newly generated 3' end. An in vitro biochemical assay plays a critical role in studying this process. The lack of such an assay system in plants hampered the study of plant mRNA 3'-end formation for the last two decades. To address this, we have now established and characterized a plant in vitro cleavage assay system, in which nuclear protein extracts from Arabidopsis (Arabidopsis thaliana) suspension cell cultures can accurately cleave different pre-mRNAs at expected in vivo authenticated poly(A) sites. The specific activity is dependent on appropriate cis-elements on the substrate RNA. When complemented by yeast (Saccharomyces cerevisiae) poly(A) polymerase, about 150-nucleotide poly(A) tracts were added specifically to the newly cleaved 3' ends in a cooperative manner. The reconstituted polyadenylation reaction is indicative that authentic cleavage products were generated. Our results not only provide a novel plant pre-mRNA cleavage assay system, but also suggest a cross-kingdom functional complementation of yeast poly(A) polymerase in a plant system.