Transcriptome responses to heat- and cold-stress in ladybirds (Cryptolaemus montrouzieri Mulasnt) analyzed by deep-sequencing

Background Changed temperature not only threaten agricultural production, but they also affect individual biological behavior, population and community of many insects, and consequently reduce the stability of our ecosystem. Insect’s ability to respond to temperature stress evolved through a complex adaptive process, thus resulting in varied temperature tolerance among different insects. Both high and low extreme temperatures are detrimental to insect development since they constitute an important abiotic stress capable of inducing abnormal biological responses. Many studies on heat or cold tolerance of ladybirds have focused on measurements of physiological and biochemical indexes such as supercooling point, higher/lower lethal temperatures, survival rate, dry body weight, water content, and developmental duration. And studies of the molecular mechanisms of ladybird responses to heat or cold stress have focused on single genes, such as those encoding heat shock proteins, but has not been analyzed by transcriptome profiling. Results In this study, we report the use of Digital Gene Expression (DGE) tag profiling to gain insight into transcriptional events associated with heat- and cold-stress in C. montrouzieri. About 6 million tags (49 bp in length) were sequenced in a heat stress group, a cold stress group and a negative control group. We obtained 687 and 573 genes that showed significantly altered expression levels following heat and cold shock treatments, respectively. Analysis of the global gene expression pattern suggested that 42 enzyme-encoding genes mapped to many Gene Ontology terms are associated with insect’s response to heat- and cold-stress. Conclusions These results provide a global assessment of genes and molecular mechanisms involved in heat and cold tolerance.


Background
With the impact of human activities on the climate, extreme temperature events are becoming more and more frequent. Fluctuations in environmental temperatures are encountered over the life span of most organisms. Many species have a metabolism that is adapted to the temperature range of the environment in which they evolved. When the external temperature out of this range, this triggers an evolutionarily conserved heat stress transcriptome response modulating genes that control multiple cellular activities including protein folding, protein degradation, transport, metabolism, DNA repair, and replocaion [1,2].
Insects are relatively sensitive to temperature changes [3,4]. Temperature has a direct impact on ontogenetic development, survival, and reproduction, while an indirect impact on generation time and population growth rate [5]. Both high and low extreme temperatures are detrimental to insect development since they constitute an important abiotic stress capable of inducing abnormal biological responses [6]. Insect's ability to respond to temperature stress evolved through a complex adaptive process, thus resulting in varied temperature tolerance among different insects [7]. The ever-increasing occurrences of extreme weather events can impact natural predators in the field. Owing to such a difference in tolerance, temperature changes could consequently alter the synchrony between pests and their natural predators. For instance, if predators are relatively more sensitive to temperature changes than the pests, the latter will be able to escape from the control by the former. As a result, damages to crop plants incurred by the pests will be more severe, and furthermore, the predators might be driven to extinction [8]. It has been demonstrated that high temperatures are conducive to changes in insect metabolism, respiration, nervous and endocrine systems [9]. Insects respond to heat stress with increased expression of many genes including those encoding heat shock proteins, heat shock transcription factors, the hsr-omega protein and phosphoglucose isomerase [10][11][12][13]. Also, insects develop cold resistance through mechanisms including freeze tolerance, freeze avoidance and accumulation of polyols [14].
The ladybird Cryptolaemus montrouzieri Mulsant (Coleoptera: Coccinellidae) is a native species of Australia. The importance of C. montrouzieri as a biological control agent is expected to gain wider recognition due to concerns about over-reliance on insecticide usage. Many studies on heat or cold tolerance of ladybirds have focused on measurements of physiological and biochemical indexes such as supercooling point, higher/ lower lethal temperatures, survival rate, dry body weight, water content, and developmental duration [15,16]. So far studies of the molecular mechanisms of ladybird responses to heat or cold stress have focused on single genes, such as those encoding heat shock proteins, but has not been analyzed by transcriptome profiling.
In the few years since its initial application, the highthroughput sequencing (also called deep sequencing or Next Generation Sequencing; NGS) has become a powerful tool that allows the concomitant sequencing of millions of signatures to the genome and identification of specific genes and the abundance of gene expression in a sample tissue [17]. Previous transcriptome profiling studies, based on microarray data, which has been the most commonly used technology over the last decade, have some limitations as well because genes are represented by unspecific probe sets and, at low expression levels, they cannot be reliably detected. Sequencing-based methods generate absolute rather than relative gene expression measurements and avoid these inherent limitations of microarray analysis [18][19][20][21].
Recently, NGS technology has been adapted for transcriptome analysis because of the inexpensive production of large volumes of sequence data [22][23][24][25]. The next generation sequencing system developed by Solexa/Illumina [26], which is also referred to as Digital Gene Expression (DGE) tag profiling, allows identification of millions of short RNAs directly from mRNA and of differentially expressed genes without either the need to use bacterial clones or for prior annotations [27][28][29]. DGE tag profiling have dramatically changed the way that resistancerelevant genes in insects are identified because these technologies facilitate investigations on the functional complexity of transcriptomes [30,31].
Using a Digital Gene Expression (DGE) tag profiling approach, we employed the Illumina HiSeq ™ 2000 platform to perform a deep transcriptome analysis of C. montrouzieri to gain insight into the molecular mechanisms of ladybird responses to heat and cold stresses. This approach was suitable for investigating deep transcriptome variations in ladybirds and identified several loci with high transcription signals for not previously identified genes. Our results yielded sets of up-regulated and down-regulated genes in response to heat and cold stress and some genes, both related to heat and cold tolerance in ladybirds, are discussed. Furthermore, our analysis even obtained some novel heat/cold stressed relevant transcripts and, therefore, the benefit of better understanding of the molecular mechanism of heat and cold tolerance may result.

Analysis of DGE libraries
To investigate the transcriptome responses to heat and cold stress in C. montrouzieri, the Solexa/Illumina's DGE system was used to perform high throughput tagsequencing analysis on ladybird adult libraries. We sequenced three DGE libraries, namely NC, HS and CS with insects collected 2 h following treatments at 25, 38 and 4 °C, respectively. Major characteristics of these three libraries were summarized in Table 1. Approximately a total of 6 million expressed sequence tags per library were obtained (submitted to the NCBI Short Read Archive [SRA] database, Accession No. SRR346079). Prior to mapping, these tag sequences from the reference transcripts database [32], adaptor tags, low quality tags and tags with one copy were filtered, producing approximately 6 million in total of clean sequence tags per library. The distribution of the total and distinct clean tag copy numbers showed highly similar tendencies in the three libraries. The HS library had the highest number of both clean and distinct clean tags. The other two libraries had similar counts. Moreover, the HS library showed the highest ratio of the number of distinct clean tags over total clean tags (1.01 %), and the CS library showed the lowest percentage of distinct high copy number tags (4.88 %). These data suggested that more genes were detected in the HS library than that in the other two libraries and more transcripts expressed at lower levels in the CS library. To estimate whether or not the sequencing depth was sufficient for the transcriptome coverage, the sequencing saturation was analyzed in the three libraries. The genes that were mapped by all clean tags and unambiguous clean tags increased with the total number of tags. When the number of sequencing tags reached 2 million, library capacity reached saturation (Fig. 1).

Analysis of tag mapping
A basic reference tag database of C. montrouzieri containing 38,381 transcript sequences was prepared for tag mapping. Among these sequences, genes with a CATG site accounted for 23,774 (61.94 %). Also, all CATG+ 17-base tags were used as gene's reference tags. Finally, a total of 54,293 tag sequences, among which 53,872 (99.22 %) possessed an unambiguous match to reference tags were obtained. Based on aforementioned criteria, 45.08-47.42 % of the distinct clean tags were mapped to the virtual transcripts database, 44.67-47.05 % of the distinct clean tags were mapped unambiguously to the transcripts, and 52.58-54.92 % of the distinct clean tags did not map to the transcripts tag database ( Table 1). The occurrence of unknown tags was probably due to factors such as the lack of ladybird genome sequences, incomplete NlaIII digestion of cDNA during library preparation, many tags matching with noncoding RNAs and alternative use of polyadenylation signals and/or splicing sites [33,34]. Solexa/Illumina sequencing can distinguish transcripts originating from both DNA strands. Based on the strand specificity of the sequencing tags obtained, bidirectional transcription was found in 10,156 to 11,420 of all detectable transcripts, including 8983 to 10,304 sense-strand transcripts and 5157 to 5519 antisense-strand transcripts ( Table 2). In other words, the ratio of sense to antisense strands of the transcripts was approximately 1.8:1 for all the libraries. This suggests that, in spite of the high number of sense and antisense mapping events detected, the transcriptional regulation in the heat and cold response acts most strongly on the sense strand in ladybirds.

Identification of differentially expressed (DE) genes across treatments
Global analysis of transcriptome variations in C. montrouzieri adults exposed to each temperature treatment revealed the up-or down-regulation of genes transcribed between differential temperature points (HS/ NC, CS/NC) across treatments. To compare the DE genes between libraries, the level of gene expression was determined by converting the number of unambiguous tags in each library to Tags Per Million (TPM). Out of all tag-mapped genes, 6859, 7925 and 7232 genes could be annotated by the NCBI non-redundant (Nr) database (E-value ≤ 10 −5 ) in NC, HS and CS libraries, respectively. In order to find the significant changes in gene expression under heat and cold stresses, the DGE analysis with tags from these three groups was carried out based on Bayesian algorithm [35]. False discovery rates (FDR) ≤0.001 and the absolute value of the log 2 Ratio ≥1 were used as a threshold to judge the statistical significance of gene expression. From the three data sets, 687 genes, mapped by a total of 2585 significantly changed tag entities, were detected in the HS/NC libraries, while the total number of genes (573) mapped by 2084 tag entities in the CS/NC libraries was a little lower.
As compared to the NC data set, most of the genes (403) were up-regulated in the HS group, and most of the genes (389) were down-regulated in the CS group (Fig. 2 Table 3. The most DE genes in the HS/NC group were   . 3).
Functional annotation of DE genes was performed to map all the genes to terms in the KEGG database. We compared DE genes with the whole transcriptome background in order to search for genes involved in metabolic or signal transduction pathways that were significantly enriched. Among all the DE genes between HS/NC and CS/NC data sets, 379 and 309 genes were mapped in 173 and 165 terms in the KEGG pathway database, respectively. Notably, specific enrichment of genes was observed for the KEGG pathway with a significant P value of <0.05 indicating involvement in both heat and cold response metabolic or signal transduction pathways (Table 4). These genes were also involved in drug metabolism (other enzymes, metabolic pathways, metabolism of xenobiotics by cytochrome P450 and drug metabolism), cytochrome P450, antigen processing and presentation, starch and sucrose metabolism, steroid hormone biosyntheses, retinol metabolism, amino sugar and nucleotide sugar metabolism, oxidative phosphorylation, tyrosine metabolism and fatty acid metabolism. However, some other pathways were only influenced under heat stress, such as protein processing in the endoplasmic reticulum, ribosomes, NOD-like receptor signaling pathways, neuroactive ligand-receptor interaction, fatty acid elongation in mitochondria, insect hormone biosynthesis, cardiac muscle contraction, the PPAR signaling pathway and complement and coagulation cascades. Some other pathways were only affected under cold stress, such as ascorbate and aldarate metabolism, other glycan degradation, phagosome, alpha-linolenic acidmetabolism, pentose and glucuronte interconversions, linoleic acid metabolism and glycerolipid metabolism.

Validation of DGE data by QPCR
To validate the differential DGE results identified by Solexa/Illumina sequencing, we selected 12 genes for quantitative RT-PCR confirmation. The gene set included 4 down-regulated genes (cytochrome p450 [P450], acyl  (Fig. 4).

Discussion
C. Montrouzieri is mainly distributed in the subtropical and tropical regions, it generally suitable for the survival temperature in 20-30 °C, although the different climate make it a certain regional group differences. Heat and cold tolerance of C. montrouzieri directly limits its  (32-36 °C) killed all larvae and pupae, and stopped the adults from laying eggs, while at 40 °C, the adults stopped feeding and 40 % of them died [13]. Low-temperature treatment at 10 °C stopped the adults from feeding, treatment at 2 °C suspended their movement, and 40 % of the adults died when temperature raised slowly from −2 °C to above 15 °C. Our previous observation (unpublished data) also revealed that high-temperature treatment at 38 °C or low-temperature treatment at 4 °C for 12 h would not cause death of C. montrouzieri, but extended treatment to 48 h would kill almost half of the Tested sample population. We chose these two temperatures because such temperatures can rapidly stimulate stress responses while treatment at such temperatures for 2 h would not cause lethal cell damage or organ failure. Development of tolerance to extreme temperatures by C. montrouzieri is likely through their avoidance behavior and expression regulation of several tolerance-related proteins and metabolisms. Comprehensive investigation of gene expression regulation under heat and cold stress will be of great importance in order to understand the biochemical and physiological adaptation process in C. montrouzieri, since there are only 62 Cryptolaemus sequence entries in GenBank. Previous transcriptome profiling studies using microarray data analysis, which has been the most commonly used technology over the last decade, provided limited information because many genes are under-represented by unspecific probe sets and relatively low expression levels for reliable gene detection. Sequencing-based methods generate absolute rather than relative gene expression measurements and avoid such inherent limitations as found in microarray data analysis [18,21]. In this study, large-scale DGE tag data were obtained by high throughput sequencing to provide a clear insight into the molecular mechanism of the response to temperature stress in C. montrouzieri.By targeting a defined cDNA library, the DGE method can generate broader transcriptome coverage and a higher number of cDNA tags per gene, leading to more precise gene transcription information. Provided that a reference genome or transcriptome database is available and that the aim is to quantify transcript levels between different biological samples, this method is perfectly suited for deep transcriptome analysis [36]. In contrast to many previous studies on transcriptome analysis using mixed samples, we chose adult samples based on the premise that adult period is the longest part of the life cycle of C. montrouzieri and the adults also have the strongest tolerance to extreme temperatures. It is in adult form that the C. montrouzieri passes through winter and lays eggs when temperature rises to appropriate levels. A mixed sample of ladybirds at all stages may provide more sequence data, but such data may render the differentially expressed genes less obvious. In addition, for non-model insects such as C. montrouzieri, analysis of differentially expressed genes in adults can improve the accuracy to detect temperature tolerancerelated genes compared to analysis of mixed samples. Although sequencing the mixed mRNAs was lack of correlation analysis, the results of the group change tendency is still representative.
We sequenced 3 distinct cDNA libraries obtaind from a heat-stressed group, a cold stressed group and a negative control group, using the DGE method. we obtained 10,016 to 11,259 (among three libraries) tagmapped genes out of 38,381 transcripts (26.10-29.33 %) by mapping the tags to a transcriptome reference database. Many transcripts did not map to tags, most likely as a result of differences in gene expression at different development ladybird stages, distinct genes might be matched by the same tags and were possibly removed from the data sets, and/or a CATG site does not exist in all genes. In our transcriptome database, 14,607 genes (38.06 %) without CATG sites could not generate 49 bp tags to sequence on the Solexa/Illumina platform. In addition, more than 52 % (NC: 54.92 %, HS: 52.58 %, CS: 54.01 %) of the distinct clean tags could not be mapped to the transcriptome reference database. The occurrence of unknown tags was most likely due to the lack of ladybird genome sequences, the incomplete NlaIII digestion during library preparation, many tags matching noncoding RNAs and the usage of alternative polyadenylation and/ or splicing sites [33,34].
Recently, some other species transcriptome response to temperature stress were reported, and they almost interest in species response to heat stress, such as chicken, marine snail, duck, fungus, etc. [37][38][39][40]. However numbers of the regulated genes were very different in different animal, many same biological processes and pathways were found by differential expression genes, including transcription factor, chromatin modification, signaling pathways, DNA repair and replication, translation, ect. In this study, stress by heat and cold treatments resulted in significant alterations of transcriptome profile in the ladybird, including change in expression of 1033 transcripts. Among them, 227 genes were affected by both heat and cold stresses, including 42 significantly differentially transcribed enzyme-related gene (Fig. 5). Fifteen genes were strongly up-regulated by heat and cold stress, including those encoding carboxylesterase, juvenile hormone esterase, mps one binder kinase, cathepsin, ATP synthase, beta-ureidopropionase, phosphatidylinositol-3,4,5-trisphosphate 3-phosphatase, RNA helicase, beta-1,4-mannosyltransferase, ribose-phosphate pyrophosphokinase, endonuclease-reverse transcriptase and casein kinase. Based on the finding that genes encoding multiple serine proteases, cytochrome, cathepsin, chitinase and some other enzymes were down-regulated (Fig. 5), we speculated that these enzymes are most likely relevant to both heat and cold tolerance.
However, these significantly differentially transcribed enzyme-related genes represented only a small proportion of the total transcripts that were found in transcriptome database. In addition to these annotated genes, 146 and 141 genes out of HS/NC group and CS/NC group, respectively, could not be matched with any existing genes. Novel genes that may be involved in the insecticide response might be identified from this group in the future.

Experimental insect and sample preparation
Stock cultures of C. montrouzieri were reared on mealybugs (Planococcus citri Risso) at Sun Yet-Sen University, Guangzhou, China, and were maintained in laboratory for more than 10 years before the onset of this project. C. montrouzieri adults (3 days after eclosion of female adults) were obtained by rearing the colony under insectary conditions at 25 ± 1 °C and with a photoperiod of 14 h light and 10 h darkness. To understand the transcriptome responses to extreme temperatures, two groups named heat stress group (HS) and cold stress group (CS) were treated at 38 and 4 °C for 2 h, respectively. In addition, a negative control group (NC) without temperature stress was treated at 25 °C for 2 h. Finally, these samples were harvested and immediately frozen in liquid nitrogen for subsequent analysis.

RNA extraction, DGE library construction and sequencing
Total RNA was extracted from each sample group using the UNlQ-10 Column Trizol Total RNA Isolation Kit (Sangon, Shanghai) according to the manufacturer's instructions. The integrity of RNA samples was determined using a 2100 Bioanalyzer (Agilent Technologies) and standardized with a minimum RNA integrated number value of 7.5. For each group, total RNA from 3 replicates was pooled together in equal quantities. Approximately 6 μg of RNA representing each group were used for Solexa/Illumina sequencing.
Sequencing tag library construction was performed in parallel using an Illumina Gene Expression Sample Prep Kit. Briefly, mRNA was isolated from total RNA using magnetic oligo (dT) beads. First and second strand cDNA were synthesized, and bead-bound cDNA was subsequently digested with the restriction enzyme NlaIII, which recognizes and cleaves at the CATG site. Then the Fig. 5 Enzymes differentially transcribed in ladybirds stressed by heat and cold. Clustering analysis based on transcription levels was performed on 42 enzyme-encoding genes showing differential transcription in C. montrouzieri exposed to heat and cold. Color scale from red to green indicates Log 2 transcription ratios from +2 (fourfold over transcription) to −2 (fourfold under transcription). For each gene, gene ID and annotation are indicated 3′-cDNA fragments attached to oligo (dT) beads were ligated to Illumina Adapter 1, which contained a recognition site for the restriction enzyme MmeI for cleavage 17 bp downstream of the CATG site to produce tags with adapter 1. After removing the 3′ fragments via magnetic bead precipitation, the Illumina adaptor 2 was ligated to the 3′ ends of tags, rsulting in tags with different adaptors attached to both ends to form a tag library. After 15 cycles of linear PCR amplification, 95 bp fragments were purified by 6 % TBE PAGE gel electrophoresis. Subsequently, the single-chain molecules were fixed onto the Illumina Sequencing Chip (flow cell). Each molecule was allowed to grow into a single-molecule cluster sequencing template through in situ amplification. In this process, color-labeled nucleotides were added, and products were sequenced via the method of sequencing by synthesis (SBS). Up to millions of sequence reads with a length of 49 bp were generated. The Illumina Sequencing of three libraries for each of HS, CS and NC groups was performed by Beijing Genomics Institute (BGI), Shenzhen, China. The raw data (tag sequences) were submitted to the NCBI SRA database with the accession number SRR346079.

DGE tag profiling
For the raw data, we filtered all raw sequence reads by the Illumina pipeline. All low quality tags, such as tags with unknown sequences 'N' , empty tags (sequences with only adaptor sequences), low complexity, and tags with only one copy (probably resulting from sequencing errors) were removed, then the remainder of tags are clean tags. For annotation, a virtual library containing all the possible CATG + 17 base length sequences were created using a reference transcripts database [32] (assembled by our laboratory, and the raw data was deposited in the NCBI SRA database with the Accession No. SRR343064) by SOAP programs (developed by Beijing Genomics Institute, Beijing, China). If a clean tag were mapped to a gene in virtual library, the number of this tag was considered to be the expression of the gene. For monitoring the mapping events on both strands, both the sense and complementary antisense sequences were included in the data collection. All clean tags were mapped to the virtual library and only one nucleotide mismatch was allowed. Clean tags that could be mapped to reference sequences in virtual library were filtered from multiple genes. The remainders of the clean tags were designed as unambiguous clean tags. For gene expression analysis, the number of unambiguous clean tags for each gene was calculated and then normalized to the number of transcripts per million clean tags (TPM) [20,41], and the differentially expressed tags were used for mapping and annotation.

Gene expression pattern analysis
A statistical analysis of the frequency of each tag in these three cDNA libraries was performed to compare geneexpression results. Statistical comparison was performed with custom written scripts using the method described by Audic and Claverie [35]: Denote the number of unambiguous clean tag from gene A as x, as every gene's expression occupies only a small part of the library, the p(x) is in the Poisson distribution.
where λ is the real transcripts of the gene.
The total clean tag number of the sample 1 is N1, and total clean tag number of sample 2 is N2; gene A holds x tags in sample 1 and y tags in sample 2. The probability of gene A expressed equally between two samples can be calculated with: The P value corresponds to the DGE test. The false discovery rate (FDR) is a method to determine the threshold of the P value in multiple tests and analyses through manipulating the FDR value. We used "FDR ≤ 0.001 and the absolute value of log 2 Ratio ≥1" as the threshold to judge the significance of gene expression differences. The Nr protein database with an E-value cut-off of 10 −5 was used for a blast search and annotation by BLASTx using Blast2GO software. In gene expression profiling analysis, GO enrichment analysis available at the Gene Ontology Consortium website http://www.geneontology.org was performed using a hypergeometric test to map all DGEs to terms in the GO database with a corrected P-value ≤0.05. For pathway enrichment analysis, we mapped all differentially expressed genes to terms in the KEGG database and identified significantly enriched KEGG terms relevant to extreme temperatures with P-value ≤0.05 and Q-value ≤0.05. We also performed cluster analysis of DGE patterns using software programs Cluster [42] and Java Treeview [43].

Quantitative RT-PCR analysis
To verify DGE analysis results, we designed 12 pairs of primers using Primer Premier 5.0 to perform qPCR analysis targeting 8 up-regulated and 4 down-regulated genes. Beta-tubulin used as the house keeping gene due to its stable expression in NC/HS/CS group in this study. The RNA samples used for the qPCR assay were the same as that used in the DGE experiments with independent RNA extractions from three replicates. The first cDNA strand was synthesized from 1.0 μg of total RNA by a PrimeScript RT reagent Kit (TaKaRa). The qPCR was performed using a cycling program of 95 °C for 30 s, and 40 cycles of 95 °C for 10 s, 55 °C for 10 s and 72 °C for 20 s on an iQ ™ 5 Multicolor real-time PCR detection system (Bio-RAD). With the application of SYBR Premix Ex Taq polymerase (TaKaRa, Ohtsu, Japan) and SYBR-Green detection method was followed, according to manufacturer's instructions. The forward and reverse primers used for qPCR are shown in Table 5. Each reaction was run in triplicate. After amplification the average threshold cycle (C t ) was calculated for each sample. The beta-tubulin gene was used as a reference to normalize expression levels, and the relative expression levels of genes were calculated by the 2 − C t method.