Publications

2015

Liu M, Wu X, Li QQ. DNA/RNA hybrid primer mediated poly(A) tag library construction for Illumina sequencing. Methods in molecular biology (Clifton, N.J.). 2015;1255:175–84. doi:10.1007/978-1-4939-2175-1_15

Alternative polyadenylation is widespread in eukaryotes and has demonstrated roles in gene expression regulation. Owing to deep DNA sequencing technologies, global analyses of alternative polyadenylation and its functions have become possible. We present a method to generate poly(A) tag libraries for high-throughput sequencing (PAT-seq). This protocol targets the junction of the 3'-UTR and the poly(A) tail of a transcript, so that the junction can be positively identified as a poly(A) site. After zinc-mediated limited digestion of total RNA, RNA fragments with poly(A) tails are isolated and their 5' ends are repaired. A DNA/RNA hybrid adaptor is then ligated to the 5' end as an anchor. The library is generated by reverse transcription with an oligo(dT)-adapter followed by PCR amplification. Such a custom poly(A) tag library can be generated from any source of poly(A)-containing RNA and is suitable for both single- and paired-end sequencing on any Illumina sequencing platform. This new method has been applied to investigate mRNA polyadenylation in Arabidopsis.
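The abstract identifies a poly(A) site as the junction between templated 3'-UTR sequence and the untemplated poly(A) tail. A minimal sketch of that idea follows, assuming each read is already in sense orientation with the tail at its 3' end (the real read layout depends on which end of the PAT-seq library is sequenced). It trims the trailing A run and reports the last templated base. The function name `trim_poly_a` and the `min_tail` threshold of 8 are hypothetical choices, not part of the published protocol.

```python
from typing import Optional, Tuple


def trim_poly_a(read: str, min_tail: int = 8) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: strip a 3' poly(A) run from a sense-strand read.

    Returns (insert, site_index), where site_index is the 0-based offset of
    the last non-A base before the tail. Returns None if the read has fewer
    than `min_tail` trailing A's or consists only of A's.
    """
    read = read.upper()
    end = len(read)
    # Walk left from the 3' end while the bases are A.
    while end > 0 and read[end - 1] == "A":
        end -= 1
    tail_len = len(read) - end
    if tail_len < min_tail or end == 0:
        return None  # no convincing tail, or nothing templated left
    return read[:end], end - 1


print(trim_poly_a("GATTACCTGCAAAAAAAAAAAA"))  # ('GATTACCTGC', 9)
```

Genomic A's immediately upstream of the tail cannot be distinguished from tail A's, so a real pipeline would map the trimmed insert back to the genome and treat the reported position as approximate.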

Guan J, Fu J, Wu M, Chen L, Ji G, Li QQ, Wu X. VAAPA: a web platform for visualization and analysis of alternative polyadenylation. Computers in biology and medicine. 2015;57:20–5. doi:10.1016/j.compbiomed.2014.11.010

Polyadenylation [poly(A)] is an essential process during the maturation of most mRNAs in eukaryotes. Alternative polyadenylation (APA) has been increasingly recognized as an important layer of gene expression regulation in various species. Here, we developed a web platform for visualization and analysis of alternative polyadenylation (VAAPA). This platform can visualize the distribution of poly(A) sites and poly(A) clusters in a gene or a section of a chromosome. It can also highlight genes with switched APA sites among different conditions. VAAPA is an easy-to-use web-based tool that provides poly(A) site query, data uploading and downloading, and APA site visualization. It was designed with a multi-tier architecture and developed on Smart GWT (Google Web Toolkit) using Java as the development language. VAAPA will be a valuable addition to the community for the comprehensive study of APA, not only by making high-quality poly(A) site data more accessible, but also by providing users with numerous functions for poly(A) site analysis and visualization.

Zhao H, Li QQ. In vitro analysis of cleavage and polyadenylation in Arabidopsis. Methods in molecular biology (Clifton, N.J.). 2015;1255:79–89. doi:10.1007/978-1-4939-2175-1_8

In eukaryotes, pre-messenger RNA (pre-mRNA) cleavage and polyadenylation is one of the necessary processing steps that produce a mature and functional mRNA. Regulation of pre-mRNA cleavage and polyadenylation affects other processes such as mRNA translocation, stability, and translation. The process of pre-mRNA cleavage and polyadenylation, and its relationship with RNA splicing and translation, have been extensively studied because of their importance in vivo. Successful in vitro systems have provided an enormous amount of information for the study of cleavage and polyadenylation in mammalian and yeast systems. Here, we describe an in vitro pre-mRNA cleavage system that faithfully cleaves pre-mRNA substrates using Arabidopsis cell/tissue cultures.

Ji G, Li L, Li QQ, Wu X, Fu J, Chen G, Wu X. PASPA: a web server for mRNA poly(A) site predictions in plants and algae. Bioinformatics (Oxford, England). 2015;31(10):1671–3. doi:10.1093/bioinformatics/btv004

Polyadenylation is an essential process during eukaryotic gene expression. Prediction of poly(A) sites helps to define the 3' ends of genes, which is important for gene annotation and for elucidating gene regulation mechanisms. However, due to limited knowledge of poly(A) signals, predicting poly(A) sites in plants and algae remains challenging. PASPA is a web server for Poly(A) Site prediction in Plants and Algae that integrates many in-house tools as add-ons to facilitate poly(A) site prediction, visualization and mining. The server can predict poly(A) sites for ten species, including seven whose poly(A) signals had not previously been characterized, with sensitivity and specificity ranging from 0.80 to 0.95.
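In their standard form, the sensitivity and specificity quoted above are Sn = TP / (TP + FN) and Sp = TN / (TN + FP). The sketch below computes both from paired true labels and predictions. It assumes these textbook definitions, since the abstract does not spell out PASPA's evaluation protocol, and the helper name `sensitivity_specificity` is hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(labels: Sequence[bool],
                            predictions: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) using the standard definitions.

    labels[i] is True if candidate i is a real poly(A) site;
    predictions[i] is True if the predictor called it a site.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    # Sn = TP / (TP + FN); Sp = TN / (TN + FP)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 10 true sites, 10 negative candidates.
labels = [True] * 10 + [False] * 10
preds = [True] * 9 + [False] + [False] * 8 + [True] * 2
print(sensitivity_specificity(labels, preds))  # (0.9, 0.8)
```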

Ji G, Guan J, Zeng Y, Li QQ, Wu X. Genome-wide identification and predictive modeling of polyadenylation sites in eukaryotes. Briefings in bioinformatics. 2015;16(2):304–13. doi:10.1093/bib/bbu011

Polyadenylation [poly(A)] is a vital step in post-transcriptional processing of pre-mRNA. Alternative polyadenylation is a widespread mechanism of regulating gene expression in eukaryotes. Defining poly(A) sites contributes to the annotation of transcript ends and the study of gene regulatory mechanisms. Here, we survey methods for collecting poly(A) sites using high-throughput sequencing technologies and summarize the general process of genome-wide poly(A) site identification. We also compare the performance of various poly(A) site prediction models and discuss the relationship between poly(A) site identification from sequencing projects and predictive modeling. Moreover, we attempt to address some potential problems in current research and propose future directions for polyadenylation research.
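One step in genome-wide poly(A) site identification is grouping nearby sites into poly(A) site clusters, because reads from a single processing event often map to several neighboring nucleotides. A minimal sketch follows, assuming sites are given as (chromosome, strand, position) tuples with one tuple per supporting read. It merges same-strand sites whose gap to the previous site is at most `max_gap` nucleotides. The 24-nt default and the name `cluster_sites` are assumptions for illustration, not parameters stated in the review.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Site = Tuple[str, str, int]               # (chromosome, strand, position)
Cluster = Tuple[str, str, int, int, int]  # (chrom, strand, start, end, n_sites)


def cluster_sites(sites: List[Site], max_gap: int = 24) -> List[Cluster]:
    """Hypothetical single-linkage clustering of poly(A) sites per strand."""
    by_key: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for chrom, strand, pos in sites:
        by_key[(chrom, strand)].append(pos)

    clusters: List[Cluster] = []
    for (chrom, strand), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev <= max_gap:
                count += 1  # close enough: extend the current cluster
            else:
                clusters.append((chrom, strand, start, prev, count))
                start, count = pos, 1  # open a new cluster
            prev = pos
        clusters.append((chrom, strand, start, prev, count))
    return clusters


reads = [("Chr1", "+", 100), ("Chr1", "+", 110), ("Chr1", "+", 200),
         ("Chr1", "-", 105), ("Chr1", "+", 100)]
print(cluster_sites(reads))
# [('Chr1', '+', 100, 110, 3), ('Chr1', '+', 200, 200, 1), ('Chr1', '-', 105, 105, 1)]
```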

Wu X, Zeng Y, Guan J, Ji G, Huang R, Li QQ. Genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in Arabidopsis thaliana. BMC genomics. 2015;16(1):511. doi:10.1186/s12864-015-1691-1

BACKGROUND: Messenger RNA polyadenylation is an essential step in the maturation of most eukaryotic mRNAs. Accurate determination of poly(A) sites helps define the 3' ends of genes, which is important for genome annotation and gene function research. Genomic studies have revealed the presence of poly(A) sites in intergenic regions, which may be attributed to 3'-UTR extensions and novel transcript units. However, there has been no systematic evaluation of intergenic poly(A) sites in plants.

RESULTS: Approximately 16,000 intergenic poly(A) site clusters (IPACs) in Arabidopsis thaliana were discovered and evaluated at the whole-genome level. Based on the distances from IPACs to nearby sense and antisense genes, the IPACs were classified into three categories. About 70% of them came from previously unannotated 3'-UTR extensions of known genes, which would extend 6985 transcripts of the TAIR10 genome annotation beyond their 3' ends, with a mean extension of 134 nucleotides. 1317 IPACs originated from novel intergenic transcripts, 37 of which were likely to be associated with protein-coding transcripts. 2957 IPACs corresponded to antisense transcripts of genes on the reverse strand, which might affect 2265 protein-coding genes and 39 non-protein-coding genes, including long non-coding RNA genes. The remaining IPACs could originate from transcriptional read-through or gene mis-annotations.
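The classification above rests on each IPAC's distance to nearby sense and antisense genes. The sketch below illustrates that kind of rule under loudly stated assumptions. The thresholds (`max_extension`, `max_antisense`), the priority order (3'-UTR extension before antisense), and all names are hypothetical, and the study's actual criteria are not given in the abstract. Novel transcripts and the unresolved remainder are lumped into one label here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gene:
    name: str
    strand: str  # "+" or "-"
    start: int   # leftmost coordinate, inclusive
    end: int     # rightmost coordinate, inclusive


def past_3prime_end(pos: int, strand: str, gene: Gene) -> Optional[int]:
    """Distance past a same-strand gene's annotated 3' end, or None if not downstream."""
    if gene.strand != strand:
        return None
    d = pos - gene.end if strand == "+" else gene.start - pos
    return d if d > 0 else None


def span_distance(pos: int, gene: Gene) -> int:
    """0 if pos lies inside the gene, else distance to its nearest boundary."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def classify_ipac(pos: int, strand: str, genes: List[Gene],
                  max_extension: int = 1000, max_antisense: int = 0) -> str:
    """Hypothetical rule: thresholds and priority order are illustrative only."""
    downstream = [d for g in genes
                  if (d := past_3prime_end(pos, strand, g)) is not None]
    if downstream and min(downstream) <= max_extension:
        return "3'-UTR extension"
    antisense = [span_distance(pos, g) for g in genes if g.strand != strand]
    if antisense and min(antisense) <= max_antisense:
        return "antisense"
    return "novel intergenic or unresolved"


genes = [Gene("G1", "+", 100, 500), Gene("G2", "-", 800, 1200),
         Gene("G3", "+", 5000, 6000)]
print(classify_ipac(600, "+", genes))  # 3'-UTR extension
```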

CONCLUSIONS: The identified IPACs corresponding to novel transcripts, 3'-UTR extensions, and antisense transcription should be incorporated into the current Arabidopsis genome annotation. Comprehensive characterization of IPACs in this study provides insights into alternative polyadenylation and antisense transcription in plants.

2014

Ma L, Pati PK, Liu M, Li QQ, Hunt AG. High throughput characterizations of poly(A) site choice in plants. Methods (San Diego, Calif.). 2014;67(1):74–83. doi:10.1016/j.ymeth.2013.06.037

The polyadenylation of mRNA in eukaryotes is an important biological process. In recent years, significant progress has been made in the field of mRNA polyadenylation owing to the advent of next-generation DNA sequencing technologies. High-throughput sequencing has enabled the direct experimental determination of large numbers of polyadenylation sites, and their analysis has revealed a vast potential for the regulation of gene expression in eukaryotes. These collections have been generated using specialized sequencing methods that target the junction of the 3'-UTR and the poly(A) tail. Here we present three variations of such a protocol that have been used for the analysis of alternative polyadenylation in plants. While all these methods use oligo-dT as an anchor at the 3' end, they differ in how they generate an anchor at the 5' end to produce PCR products suitable for effective Illumina sequencing; the use of different methods to append 5' adapters expands the possible utility of these approaches. These methods are versatile and reproducible, and may be used for gene expression analysis as well as global determination of poly(A) site choice.

Zhao Z, Wu X, Kumar PKR, Dong M, Ji G, Li QQ, Liang C. Bioinformatics analysis of alternative polyadenylation in green alga Chlamydomonas reinhardtii using transcriptome sequences from three different sequencing platforms. G3 (Bethesda, Md.). 2014;4(5):871–83. doi:10.1534/g3.114.010249

Messenger RNA 3'-end formation is an essential posttranscriptional processing step for most eukaryotic genes. Unlike plants and animals, where AAUAAA and its variants are routinely found as the main poly(A) signal, Chlamydomonas reinhardtii uses UGUAA as its major poly(A) signal. Advances in sequencing technology provide an enormous amount of data for exploring variations in poly(A) signals, alternative polyadenylation (APA), and the relationship of APA with splicing in this algal species. Through genome-wide analysis of poly(A) sites in C. reinhardtii, we identified a large number of poly(A) sites: 21,041 from Sanger expressed sequence tags, 88,184 from 454, and 195,266 from Illumina sequence reads. Compared with previous collections, deep sequencing revealed more new poly(A) sites in coding sequences, introns, and intergenic regions. Interestingly, G-rich signals are particularly abundant in intronic and intergenic regions. The different prevalence of poly(A) signals between coding sequences and 3'-untranslated regions implies potentially different polyadenylation mechanisms. Our data suggest that APA occurs in about 68% of C. reinhardtii genes. Using Gene Ontology analysis, we found that most APA genes are involved in RNA regulation and metabolic processes, protein synthesis, and hydrolase and ligase activities. Moreover, intronic poly(A) sites are more abundant in constitutively spliced introns than in retained introns, suggesting an interplay between polyadenylation and splicing. Our results suggest that APA, as in higher eukaryotes, may play significant roles in increasing transcriptome diversity and regulating gene expression in this algal species. Our datasets also provide useful information for accurate annotation of transcript ends in C. reinhardtii.
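Comparing signal usage such as UGUAA versus AAUAAA usually starts with scanning a window upstream of each poly(A) site. A minimal sketch follows, assuming sense-strand DNA input (so UGUAA is written TGTAA and AAUAAA is AATAAA) and a hypothetical 10 to 40 nt upstream window. The function name and window are illustrative assumptions, not the study's settings.

```python
from typing import Dict, List, Sequence, Tuple


def find_signals(seq: str, site: int,
                 motifs: Sequence[str] = ("AATAAA", "TGTAA"),
                 window: Tuple[int, int] = (10, 40)) -> Dict[str, List[int]]:
    """Map each motif to distances (nt) of its occurrences upstream of site.

    seq is sense-strand DNA (T in place of U) and site is the 0-based index
    of the cleavage site. A hit starting at index i is kept when
    near <= site - i <= far. Overlapping occurrences are all reported.
    """
    near, far = window
    seq = seq.upper()
    lo = max(0, site - far)
    hits: Dict[str, List[int]] = {}
    for motif in motifs:
        found: List[int] = []
        start = seq.find(motif, lo)
        while start != -1 and start <= site - near:
            found.append(site - start)
            start = seq.find(motif, start + 1)  # allow overlapping matches
        hits[motif] = found
    return hits


# Hypothetical 3'-UTR: TGTAA starts at index 10, cleavage site at index 35.
utr = "G" * 10 + "TGTAA" + "C" * 20 + "A" * 5
print(find_signals(utr, 35))  # {'AATAAA': [], 'TGTAA': [25]}
```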

Liu M, Xu R, Merrill C, Hong L, Von Lanken C, Hunt AG, Li QQ. Integration of developmental and environmental signals via a polyadenylation factor in Arabidopsis. PloS one. 2014;9(12):e115779. doi:10.1371/journal.pone.0115779

The ability to integrate environmental and developmental signals with physiological responses is critical for plant survival. How this integration is achieved, particularly through posttranscriptional control of gene expression, is poorly understood. It was previously found that the 30 kDa subunit of Arabidopsis cleavage and polyadenylation specificity factor (AtCPSF30) is a calmodulin-regulated RNA-binding protein. Here we demonstrate that mutant plants (oxt6) deficient in AtCPSF30 possess a novel range of phenotypes: reduced fertility, reduced lateral root formation, and altered sensitivities to oxidative stress and a number of plant hormones (auxin, cytokinin, gibberellic acid, and ACC). While wild-type AtCPSF30 (C30G) was able to restore normal growth and responses, a mutant AtCPSF30 protein incapable of interacting with calmodulin (C30GM) could only restore wild-type fertility and responses to oxidative stress and ACC. Thus, the interaction with calmodulin is important for some, but not all, AtCPSF30 functions in the plant. Global poly(A) site analysis showed that the C30G and C30GM proteins can restore wild-type poly(A) site choice to the oxt6 mutant. Genes associated with hormone metabolism and auxin responses are also affected by the oxt6 mutation. Moreover, genome-wide expression analysis identified 19 genes linked with calmodulin-dependent CPSF30 functions. These data, in conjunction with previous results from the analysis of the oxt6 mutant, indicate that the polyadenylation factor AtCPSF30 is a regulatory hub where different signaling cues are transduced, presumably via differential mRNA 3'-end formation or alternative polyadenylation, into specified phenotypic outcomes. Our results suggest a novel function of a polyadenylation factor in environmental and developmental signal integration.