Identification of GhM2H Gene Family Members in Gossypium hirsutum
The protein sequences and nucleic acid sequences of four genes (AK067086, AK065790, AK119413, AK101447) that have been reported to have 2-melatonin hydroxylase activity in rice were used as query sequences for comparison using the software local Blast. We identified the candidate genes of the M2H gene in the Gossypium hirsutum genome, deleted the incomplete genes in the conserved domains 2OG-FeII_Oxy, DIOX_N (PF03171.20, PF14226.6), and renamed these genes according to their positions on the chromosomes GhM2H1-GhM2H265, and then we analyzed and predicted the physical properties of these genes, including ID, isoelectric point, molecular weight, protein length and subcellular location (Additional file 1).
265 M2H genes were identified in the Gossypium hirsutum genome. The protein sequences encoded by these genes range from 212 (GhM2H199) to 418 (GhM2H38) amino acids, with isoelectric points ranging from 4.67 (GhM2H200) to 9.42 (GhM2H81). The MW ranges from 25.14 (GhM2H208) kDa to 47.33 (GhM2H37) kDa. The subcellular location predicts that 217 genes are in the cytoplasm, 1 gene is outside the cell, 44 genes are in the outer membrane, and 29 genes are in the intermembrane.
Phylogenetic analysis of GhM2H
To investigate the evolutionary relationship of plant M2Hs, we compared the amino acid sequences of cotton M2Hs with those of Arabidopsis, Oryza sativa and Theobroma cacao. And a total of 1185 protein sequences (265 from Gossypium hirsutum, 272 from Gossypium barbadense, 169 from Gossypium arboretum, 174 from Gossypium raimondii, 97 from Arabidopsis, 87 from Oryza sativa, 121 from Theobroma cacao) were used to construct a phylogenetic tree (Fig. 1A). The M2H proteins of these four species are distributed in almost every clade. The M2H gene phylogenetic tree of these plants is mainly divided into seven branches, which are randomly distributed. Among them, clade IV has the least members (79), clade I has the most members (253), and clades II, III, V, VI, VII contains 171, 208, 105, 222, and 147 genes respectively. Interestingly, in Arabidopsis, Oryza sativa and Theobroma cacao, the M2H proteins of the four cotton species have corresponding homologous genes in each clade, indicating that the M2H proteins of these plants are closely related to each other. Phylogenetic analysis shows that the amount of M2H in Gossypium hirsutum is more than twice that of Theobroma cacao, Oryza sativa and Arabidopsis thaliana, and it has undergone significant gene family amplification during evolution [31]. In phylogenetic trees, we found that gene pairs of GhM2H and GbM2H were always clustered together, which could be used as evidence of gene duplication. Meanwhile, the M2H protein of tetraploid cotton (Gossypium hirsutum and Gossypium barbadense) and diploid cotton (Gossypium arboreum and Gossypium raimondii) were congealed, confirming that Gossypium hirsutum and Gossypium barbadense were the result of a cross between Gossypium arboreum and Gossypium raimondii.
In order to further study the evolutionary relationship of M2H in Gossypium hirsutum, we constructed an evolutionary tree of the intraspecific M2H gene of Gossypium hirsutum (Fig. 1B). The M2H gene in Gossypium hirsutum was found in all seven clades. M2H genes have been found in the seven clades in Gossypium hirsutum, clade I, VI highest proportion (there are 53 GhM2H), clade II contains 43 GhM2H, clade III contains 44 GhM2H, clade VII contains 35 GhM2H, clade IV contains 20 GhM2H and clade V only 12 GhM2H. These results indicate that clade I is an ancient M2H gene group with the largest number of M2H members in almost all plants. In the phylogenetic tree, two genes are gathered together to form a gene pair, forming a total of 107 gene pairs. In each gene pair, one gene is from the A sub-genomes and the other is from the D sub-genomes.
Chromosomal location of GhM2H members
To study the chromosomal distribution of members of the GhM2H gene family, we mapped the physical location of these genes on cotton chromosomes (Fig. 2). 262 genes are mapped to 26 chromosomes, which are unevenly distributed, and 3 genes (GhM2H223, GhM2H264, GhM2H265) are mapped to scaffold. The number of M2H genes on each chromosome is between 2 and 17. There are only 2 genes on chromosome A03, 3 genes on chromosomes A04 and D06; 18 genes on chromosome A13, 17 genes on chromosome D13, and 15 genes on chromosome A01 and A08. There is one gene (GhM2H223) on Scaffold2336, and two genes (GhM2H264, GhM2H265) on scaffold531.
Analysis of Ka/Ks
In order to explore the impact of Darwin's selection on the differentiation of repetitive genes, an aligned sequence covering more than 80% of the longest gene and sequence similarity of more than 70% was used as the criterion for inferring gene repetition events. According to this gene duplication standard, we used the software TBtools to calculate the Ka/Ks ratio of these gene pairs (Additional file 2). In Gossypium hirsutum, we calculated the Ka/Ks value of 252 pairs of genes. It is generally believed that Ka/Ks = 1 is a neutral selection, Ka/Ks < 1 is a purifying selection, and Ka/Ks > 1 is a positive selection. The Ka/Ks values of these gene pairs are all less than 1. Among these genes, 233 pairs of genes have Ka/Ks values between 0–0.5, and 19 pairs of genes have Ka/Ks values between 0.5–99. We hypothesized that the M2H gene family of cotton underwent strong purification selection after fragment repetition and genome-wide duplication, with limited functional differences.
Analysis gene duplication and collinearity
Replication events play an important role in gene amplification. Replication events include genome-wide replication, fragment replication and tandem replication. Through the homology analysis of the M2H genes of four cotton species (Gossypium hirsutum, Gossypium barbadense, Gossypium arboreum, Gossypium raimondii), the positional relationship of the homologous M2H genes of the four cotton species was visualized (Fig. 3). The adjacent genes on the same chromosome belong to tandem duplication, and the remaining genes from the same genome belong to fragment duplication [32]. The remaining genes from different genomes and sub-genomes belong to genome-wide replication. Gene duplication events are one of the main contributors to evolutionary dynamics, and they have a major impact on genome rearrangement and expansion. We identified homologous gene pairs in GhAt, GhDt, GbAt and GbDt sub-genomes of two tetraploid cotton and A and D genomes of two diploid cotton. Collinearity shows that the At and Dt sub-genomes of the two tetraploid cottons have several gene loci that are highly conserved.
The genes connected by the lines of the same color represent the same gene. In Fig. 3, we can see that many chromosomes in the GhAt/GhDt, GbAt/GbDt sub-genomes and the A and D genomes are connected by the same color line, namely GhAt/GhDt and GbAt/GbDt sub-genomes with M2H homologous genes in the A and D genomes, indicating that these genomes/sub-genomes are related in evolution, and most of the M2H genes are preserved in the evolution of polyploidy. We made a comparison between the genomes and sub-genomes of Ga-ga, Ga-Gb, Ga-Gr, Ga-Gh, Gb-Gb, Gb-Gr, Gb-Gh, Gr-Gr, Gr-Gh, and Gh-Gh, and identified a total of 3397 orthologous/paralogous gene pairs, 150 pairs of repeated genes have tandem duplication, 699 pairs of repeated genes have fragment duplication, and the remaining 2548 pairs of duplicate genes have genome-wide duplication.
Analysis gene structure and motif
To further understand the possible structural evolution of GhM2H, we constructed an association analysis of GhM2H members with phylogenetic tree, gene structure, and motif. We used the protein sequences of GhM2H members to construct a phylogenetic tree, obtained the gene structure of GhM2H members in the whole-genome annotation file of Gossypium hirsutum, submitted the protein sequences of GhM2H members to online tool MEME for conservative motif prediction, and used TBtools for association analysis (Fig. 4).
From the results of gene structure and phylogenetic tree, we can see that similar genes are clustered together in the same group of phylogenetic trees. The number of exons in each gene ranges from 2 to 12. In most cases, the two genes in a gene pair have similar exon–intron knot structure and length. There are also some genetic structures and pairs of genes that differ in length, such as: GhM2H119/GhM2H251, GhM2H108/GhM2H243, GhM2H220/GhM2H87, etc. There are 74 genes without introns, 29 genes have only one intron, and 4 genes have 12 exons. The number of gene exons in different clades is different, but most GHM2H members of the same clade have the same exon–intron structure. We found that the genetic structure of GhM2H gene has a strong relationship with phylogeny on the basis of evolution.
The online tool MEME was used to analyze the protein sequence of GhM2H members to obtain a total of 10 motifs, and proteins with the same motif composition were preferentially clustered together. As shown in Fig. 4, most of the GhM2H members of the same clade, especially the closely related members, often have similar motif composition. All protein motifs are distributed in the GhM2H protein, and the motif of each protein ranges from 6 to 10. In addition, the two genes in most gene pairs have the same motif composition, which means that they are functionally similar at the protein level. We found that most GhM2H proteins contain Motif10, 2, 6, 4, 7, 1, 5, 8. Almost all the GhM2H proteins of clades I, II, and III consist of 10 motifs, while some of the GhM2H proteins in clades IV, V, VII and VI have no motif 9. Interestingly, GhM2H129 only has motif 6, 4, 7, 1, 5, 8, and the motif composition of GhM2H259, which belongs to the same gene pair, has obvious differences, and there are also obvious differences in the structure of the two genes. We speculate that GhM2H129 may have lost some of its functions during evolution.
Analysis of GhM2H promoter
Promoters can interact with transcription factors to control the start time and degree of gene expression. The cis-acting element is located in the promoter region of the gene and can be used as a reference for stress response and tissue specificity in different environments [33]. Therefore, analyzing the GhM2H promoter region is helpful to explore the potential functions of genes. We used the 2000 bp DNA sequence in the upstream region of GhM2H as the promoter, and used the online tool PlantCARE to predict the cis-acting elements (Fig. 5). A large number of cis-acting elements in the promoter region were detected, and selected cis-acting elements related to plant hormones and abiotic stress for further analysis.
For plant hormones, ABA response components, SA response components, gibberellin response components, MeJA response components and auxin response components were selected; and components that respond to abiotic stress include defense and stress responsiveness, wound-responsive components, drought-inducibility components and low-temperature responsive components. More than half of GhM2H members contain ABA-responsive elements and MeJA-responsive elements, and only 18 genes are involved in trauma response. We believed that these genes participate in abiotic stress response together with hormone response. Unexpectedly, five GhM2H (GhM2H 38, GhM2H 67, GhM2H 68, GhM2H 198, GhM2H204) did not find the components we need. It is speculated that they may have lost their corresponding functions during the evolution. Through promoter analysis, we can summarize the response mechanism of genes to different plant hormones and abiotic stress.
Expression profiles of GhM2H genes under different abiotic stress
In order to understand the response mechanism of GhM2H to abiotic stress, we downloaded RNA-seq data (PRJNA248163) from the NCBI database to analyze the expression patterns of these genes under various stress (cold, heat, salt and PEG). A total of 231 genes of FPKM were found in the RNA-seq data, and plotted heat maps based on the expression levels of these genes under cold, heat, salt and PEG stress (Fig. 6). The results showed that nearly half of GhM2H had no obvious differential expression under a variety of abiotic stress, and some genes were strongly induced under multiple stress and had obvious differential expression, such as GhM2H71, GhM2H169, GhM2H238, GhM2H262, GhM2H181, etc. We found that gene expression from the same clade is not similar. Interestingly, some genes are only induced by specific stress. For example, GhM2H149 has high expression under cold stress, but is not strongly induced by other stress; GhM2H155 has almost no expression under cold stress, but occurs under heat stress. The number of GhM2H significantly differentially expressed genes under different stress was calculated. There were 101 GhM2H genes differentially expressed under cold treatment, 120 GhM2H genes differentially expressed under high temperature treatment, 94 and 113 GhM2H genes under PEG treatment and salt stress, respectively. Under different stress conditions, gene expression levels changed, indicating that GhM2H members were involved in the regulation of abiotic stress.
qRT-PCR analysis of GhM2H genes under different abiotic stress
As we know, plant melatonin is widely involved in plant responses to abiotic stresses. In rice, AK067086, AK065790, AK119413 and AK101447 encode proteins with melatonin 2-hydroxylase activity. To investigate the role of GhM2Hs in melatonin involvement in abiotic stress, we performed expression analysis of GhM2H198, GhM2H232, GhM2H252, GhM2H244, GhM2H112, GhM2H121, GhM2H182, which are homologous to rice M2H genes, and GhM2H262, GhM2H27, GhM2H196, GhM2H2h181, GhM2H82, GhM2H71, GhM2H1, GhM2H19, which are from the same family in which GhM2H is present (Fig. 7). Under the three abiotic stresses, most genes were differentially expressed under stress, but not in response to all stresses. Such as GhM2H19, GhM2H71, GhM2H82, GhM2H12, GhM2H182, and GhM2H196 were significantly differentially expressed under all three stresses; GhM2H1, GhM2H121, GhM2H181 in response to high-temperature stress; GhM2H1, GhM2H82 were not responsive to Na2CO3 stress; GhM2H27, GhM2H232 were insensitive to high temperature stress.
Changes in melatonin content and expression of GhM2Hs in cotton under salt stress
To explore whether the changes in melatonin content were associated with the changes in the expression of GhM2Hs under salt stress, we detected the changes in the melatonin content and the expression of GhM2Hs in cotton under salt stress. Salt stress caused severe effects on the growth and development of cotton seedlings (Fig. 8A). When treated with salt stress for 12 h, the cotyledons of cotton seedlings lost luster and wilted slightly, and the leaves became thinner. With increasing salt stress time, cotton seedlings true leaves appeared wilted. At five days of salt stress treatment, seedling cotyledons were completely detached, true leaves wilted, and seedlings died. The level of melatonin in cotton showed a changing trend of: Rise-Fall-rise (Fig. 8B). Both GhM2Hs were induced to undergo differential expression under salt treatment, and most genes showed an up-down trend (Fig. 8C), and GhM2H82, GhM2H196, and GhM2H262 were up-regulated with 12 h salt treatment. In contrast to the changing trend of melatonin level, the homologs of rice M2H genes, GhM2H121, GhM2H182, GhM2H198, GhM2H232, GhM2H252, and GhM2H244, were all up-regulated at 6 h and down regulated at 12 h of salt stress.
Effect of exogenous melatonin on GhM2Hs expression
Exogenous application of melatonin has been reported to improve cotton salt tolerance [34]. To explore the effects of exogenous melatonin on GhM2Hs expression, we examined the relative expression amounts of some genes in cotton GhM2H family members in response to melatonin under salt stress (Fig. 9). Some M2H genes were differentially expressed (GhM2H121, GhM2H244, GhM2H262, GhM2H196 were up-regulated, GhM2H198, GhM2H182, GhM2H232 were down-regulated) were induced by melatonin, and GhM2H19, GhM2H71, GhM2H112, GhM2H181, and GhM2H27 were not induced by melatonin. Under salt stress, GhM2H196, GhM2H82, GhM2H19, GhM2H71, GhM2H112, GhM2H18, GhM2H198 down regulated expressions were induced by melatonin; GhM2H1, GhM2H252, and GhM2H232 were induced to upregulate expression by melatonin, and GhM2H27 did not respond to exogenous melatonin.