- Short report
- Open Access
De novo transcriptome assembly, functional annotation and differential gene expression analysis of juvenile and adult E. fetida, a model oligochaete used in ecotoxicological studies
Biological Researchvolume 50, Article number: 7 (2017)
Earthworms are sensitive to toxic chemicals present in the soil and so are useful indicator organisms for soil health. Eisenia fetida are commonly used in ecotoxicological studies; therefore the assembly of a baseline transcriptome is important for subsequent analyses exploring the impact of toxin exposure on genome wide gene expression.
This paper reports on the de novo transcriptome assembly of E. fetida using Trinity, a freely available software tool. Trinotate was used to carry out functional annotation of the Trinity generated transcriptome file and the transdecoder generated peptide sequence file along with BLASTX, BLASTP and HMMER searches and were loaded into a Sqlite3 database. To identify differentially expressed transcripts; each of the original sequence files were aligned to the de novo assembled transcriptome using Bowtie and then RSEM was used to estimate expression values based on the alignment. EdgeR was used to calculate differential expression between the two conditions, with an FDR corrected P value cut off of 0.001, this returned six significantly differentially expressed genes. Initial BLASTX hits of these putative genes included hits with annelid ferritin and lysozyme proteins, as well as fungal NADH cytochrome b5 reductase and senescence associated proteins. At a cut off of P = 0.01 there were a further 26 differentially expressed genes.
These data have been made publicly available, and to our knowledge represent the most comprehensive available transcriptome for E. fetida assembled from RNA sequencing data. This provides important groundwork for subsequent ecotoxicogenomic studies exploring the impact of the environment on global gene expression in E. fetida and other earthworm species.
Earthworms are exposed to environmental contaminants through their intestine, skin, and alimentary and dermal uptake routes and are widely used as a model organism for assessing impact of environmental chemicals and toxins [1,2,3]. Earthworms are sensitive to soil pollution that arises from intensive use of biocides in agriculture, industrial activity and atmospheric deposition  and so are useful bio-indicator organisms (Sivakumar). Eisenia fetida is a commonly used test species in ecotoxicological studies . Molecular genetic data are now available on the response of worms to toxin exposure, particularly focusing on the antioxidant and stress protein response to pollutants [6,7,8]. In order to establish the impact of exogenous toxins on gene expression at the genome level it is first important to assemble a baseline transcriptome for the model organism. Gong et al. produced transcriptome wide oligonucleotide probes for E. fetida in their  report, comprising 63541 ESTs, but a complete annotated transcriptome for E. fetida has not been available thus far for this organism. Many environmental toxins have putative roles in impeding reproduction , therefore identifying differentially expressed genes between adult and juvenile E. fetida can be useful to identify possible genes associated with sexual maturity and reproduction that may be more susceptible to such toxins.
Worms were established in a composting wormery from an initial population of 150 E. fetida. A box of worms was bought from Bunnings (Lyall Bay. Wellington, NZ) and grown in a composting wormery supplemented with fruit and vegetable waste for 6 months. Adult worms were selected on the basis of well-developed clitella, present only at sexual maturity; juveniles were selected on the basis of the absence of well-developed clitella. Adult worms were dissected to enhance samples for the reproductive organs including gonadal structures. For each extraction four gonadal segment samples were pooled to maximize RNA content. RNA was extracted from pooled juvenile and fully clitellated adult samples respectively and then sequenced using Illumina HiSeq 2500, sequencing was carried out by New Zealand Genomics Ltd (http://www.nzgenomics.co.nz/). Prior to RNA extraction, worms were removed, washed with RNAse, DNAse free water and then used for total RNA extraction. The total RNA was isolated using the QIAGEN Mini kit and by following the manufacturers protocol. Extra vigilance was taken to minimise degradation of RNA; RNAse zap was used to clean all dissection tools and equipment used in the RNA extraction process. Trimmomatic was used to quality trim reads using the default settings and Trinity v2.1.1, and Edger 3.3 were used for de novo transcriptome assembly,including, identification of candidate coding regions, and differential gene expression analysis as outlined by Haas et al. . Transdecoder v2.01, Trinotate v2.0.2, BLASTX v2.3.1, BLASTP v2.3.0+ and HMMER 3.0 searches were used for functional annotation of the transcriptome produced and to populate an Sqlite database. All bioinformatic analyses were carried out via VPN connection to a remote high speed computer (CPU 16) on the BioIT, NZGL network via a standard PC laptop.
De novo transcriptome assembly
Trinity is a freely available software tool used for de novo transcriptome assembly that uses three software modules (Inchworm, Chrysalis and Butterfly) to assemble a de novo transcriptome when no model data is available , it does this by first assembling the RNA sequence data into transcripts or ‘genes’, clustering these contigs, constructing de Bruijn graphs and then processing the graphs to report full length transcripts for all alternatively spliced isoforms. The resulting transcriptome file was generated in fasta (.fq) format and was 32 MB in size comprising 74,381 ‘genes’, 90,862 ‘transcripts’ and with a GC percentage of 41.74. The N50 was 339 and 329 for transcripts and genes, respectively, which is the maximum length where at least 50% total assembled sequence resides in contigs of at least that length; further details on the transcriptome are found in Table 1. At a stringency of a minimum 5 TPM there were 21282 estimated ‘genes’.
Transdecoder 2.0.1 (http://transdecoder.sf.net) was used to identify candidate coding regions within the generated transcriptome and looked for an open reading frame of at least 100 amino acids long. This generated four output files (.pep peptide sequence for final candidate ORFs;.cds nucleotide sequence for coding regions;.gff3 position in target transcript of final ORFs and.bed format describing ORF that can be used in IGV). A BLASTP v2.3.0+ search against known proteins and HMMER 3.0 search to identify common protein domains were also carried out. Trinotate was used to carry out functional annotation of the transcriptome using the Trinity generated transcriptome file and the transdecoder generated peptide sequence file for final candidate ORFs, along with BLASTX v2.3.1, BLASTP v2.3.0+ and HMMER 3.0 searches and were loaded into an Sqlite database (see Additional file 1: Table S1). Figure 1 summarises the annotation data based on the classification by PFAM Gene Ontology (GO), the figure shows frequency of genes grouped on the basis of the GO classification that have a frequency in the annotated transcriptome of at least 15, the most frequent classification of genes were in the protein and metal binding category and cellular component membrane. Interestingly there were also over 30 hits with methyltransferases indicative that a potential DNA methylation mechanism could exist in the earthworm .
Differential gene expression analysis
To identify differentially expressed transcripts; RSEM was first used to estimate expression values. Each of the six original.fasta raw sequence files (three adult and three juvenile pooled paired samples) were aligned to the de novo assembled Trinity.fasta transcript using Bowtie and RSEM used to estimate expression values based on the alignment. EdgeR was then used to calculate differential expression between the two conditions; with a P value cut off of 0.001, this returned six significantly differentially expressed genes, (most significant FDR) (Table 2). Initial BLASTX hits (Table 2) included hits with annelid ferritin and lysozyme proteins, as well as fungal NADH cytochrome b5 reductase and senescence associated proteins. At a cut off of P = 0.01 there were a further 26 differentially expressed genes.
Differential gene expression analysis was carried out on the de novo generated transcriptome to compare adult and juvenile samples. At a threshold of significance of P = 0.001 there were six significantly differentially expressed genes identified between adult and juvenile, potentially highlighting genes associated with sexual maturity for follow up in subsequent analyses with this and other earthworm species. Such targets may be important for investigating the impact of toxins on reproductive capability. The annotation of the transcriptome was dominated by protein and metal ion binding proteins, as expected, but there was also a suggestion of potential epigenetic mechanisms for gene control existing in E. fetida with methyltransferase function also occurring with relatively high frequency in the annotated transcript. This supports of the findings of Santoyo et al.  in that earthworms may be useful in experiments investigating the impact of terrestrial toxins on DNA methylation. From the perspective of genes involved specifically in reproduction, the annotation identified multiple sperm, spermatogenesis, testis, ova and other and reproductive associated proteins that could be potentially good targets for further studies exploring the impact of toxin on reproductive capability. Subsequent studies investigating a potential epigenome in E. fetida could also provide important data in terms of further understanding the relationship between environmental toxins and gene expression. Future work will involve using both transcriptome and epigenome analysis to explore the impact of the environment on gene expression and health in E. fetida and other earthworm species.
This report illustrates the scope of what can be done with limited resources using Trinity and Trinotate freeware to assemble and annotate a RNA sequenced derived transcriptome de novo. The data generated have been made publicly available. To our knowledge this is the most comprehensive transcriptome for E. fetida and will provide an important baseline for further ecotoxicogenomic studies focusing on the impact of environmental toxins on global gene expression in E. fetida and other earthworm species.
expressed sequence tags
open reading frame
Lévêque T, Capowiez Y, Schreck E, Mombo S, Mazzia C, Foucault Y, Dumat C. Effects of historic metal (loid) pollution on earthworm communities. Sci Total Environ. 2015;1(511):738–46.
Sivakumar S. Effects of metals on earthworm life cycles: a review. Environ Monit Assess. 2015;187(8):530.
Calisi A, Lionetto MG, Schettino T. Biomarker response in the earthworm Lumbricus terrestris exposed to chemical pollutants. Sci Total Environ. 2011;409(20):4456–64.
Schnug L, Ergon T, Jakob L, Scott-Fordsmand JJ, Joner EJ, Leinaas HP. Responses of earthworms to repeated exposure to three biocides applied singly and as a mixture in an agricultural field. Sci Total Environ. 2015;1(505):223–35.
Sanchez-Hernandez JC. Earthworm biomarkers in ecological risk assessment. Rev Environ Contam Toxicol. 2006;188:85–126.
Santoyo MM, Flores CR, Torres AL, Wrobel K, Wrobel K. Global DNA methylation in earthworms: a candidate biomarker of epigenetic risks related to the presence of metals/metalloids in terrestrial environments. Environ Pollut. 2011;159(10):2387–92.
Mo X, Qiao Y, Sun Z, Sun X, Li Y. Molecular toxicity of earthworms induced by cadmium contaminated soil and biomarkers screening. J Environ Sci. 2012;24(8):1504–10.
Fujino Y, Nagahama T, Oumi T, Ukena K, Morishita F, Furukawa Y, Matsushima O, Ando M, Takahama H, Satake H, Minakata H, Nomoto K. Possible functions of oxytocin/vasopressin-superfamily peptides in annelids with special reference to reproduction and osmoregulation. J Exp Zool. 1999;284(4):401–6.
Gong P, Pirooznia M, Guan X, Perkins EJ. Design, validation and annotation of transcriptome-wide oligonucleotide probes for the oligochaete annelid Eisenia fetida. PLoS ONE. 2010;5(12):e14266.
Rajender S, Avery K, Agarwal A. Epigenetics, spermatogenesis and male infertility. Mutat Res. 2011;727(3):62–71.
Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M, Macmanes MD, Ott M, Orvis J, Pochet N, Strozzi F, Weeks N, Westerman R, William T, Dewey CN, Henschel R, Leduc RD, Friedman N, Regev A. De novo transcript sequence reconstruction from RNA-seq using the trinity platform for reference generation and analysis. Nat Protoc. 2013;8(8):1494–512.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis Xian, Fan Lin, Raychowdhury Raktima, Zeng Qiandong, Chen Zehua, Mauceli Evan, Hacohen Nir, Gnirke Andreas, Rhind Nicholas, di Palma Federica, Birren Bruce W, Nusbaum Chad, Lindblad-Toh Kerstin, Friedman Nir, Regev A. Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nat Biotechnol. 2011;29(7):644–52.
Zhong X. Comparative epigenomics: a powerful tool to understand the evolution of DNA methylation. New Phytol. 2016;210(1):76–80.
MT conceived the study, carried out the RNA extraction, bioinformatic analyses and drafted the manuscript. JC and YL participated in the design of the study and helped to draft the manuscript. All authors read and approved the final manuscript.
The sequencing for this work was carried out by NZGL, made possible, in part by a HiSeq sequencing subsidy, part of the Illumina Pilot the Possibilities Programme by Illumina Australia and NZ. Massey University Research Fund supported the author for the manuscript preparation. Funding bodies had no role, in design, in the collection, analysis, and interpretation of data; in the writing of the manuscript; and in the decision to submit the manuscript for publication. The author would also like to acknowledge the fantastic support from NZGL, specifically assistance provided by Shane Sturrock.
The authors declare that they have no competing interests.
Availability of data set
Full sequence read data were submitted to the NCBI Sequence Read Archive under, Bioproject number PRJNA304461 (http://www.ncbi.nlm.nih.gov/bioproject/PRJNA304461). The assembled transcriptome was submitted to the Transcriptome Shotgun Assembly Sequence database (http://www.ncbi.nlm.nih.gov/genbank/tsa) the biosample id is SAMN04334593. The transcriptome annotation data is available as Additional file 1: Table S1.