Identification of NDA proteins
To identify the NDA gene family in cotton, a local blast on the proteome of the G. hirsutum, G. barbadense, G. arboreum, and G. raimondii was performed, and genome with the hidden Markov model (HMM) profile of PF07992 was used as a query condition. Redundant sequences were detected and deleted by manual, 51, 52, 26, 24 genes were identified in G. hirsutum (GhNDAs), G. barbadense (GbNDAs), G. arboreum (GaNDAs), and G. raimondii (GrNDAs), respectively (Fig. 1). Statistical analysis of the number of genes in four Gossypium species showed that the number of NDA genes in two tetraploid cotton species (G. hirsutum and G. barbadense) were almost double of diploid cotton species (G. arboreum and G. raimondii). To understand the genes of NDA family more conveniently, genes were renamed according to the position of genes on chromosomes. GhNDA1-GhNDA51 was assigned for G. hirsutum, GbNDA1-GbNDA52 for G. barbadense, GaNDA1-GaNDA26 for G. arboreum and GrNDA1-GrNDA24 for G. raimondii, other species were also renamed according to this rule (Additional file 2: Table S2). Subsequently, the physical properties of the GhNDAs genes were analyzed and predicted, including protein length, protein molecular weight (MWs), isoelectric point(pI), protein hydrophilicity and hydrophobicity analysis and subcellular location (Additional file 3: Table S3). For G. hirsutum, all of the 51 genes encoded proteins ranging from 357 (GhNDA35) to 2209 (GhNDA48) amino acids, with an average of 603.922 amino acids. The MWs varied from 38.742 (GhNDA35) kDa to 242.628 (GhNDA22) kDa with an average of 66.107 kDa and pI varied from 5.553 (GhNDA36) to 10.024 (GhNDA35) with a mean of 7.909. The values of the grand average of hydropathy were all negative, which proved that the proteins of NDA family were hydrophilic. The prediction of subcellular localization showed that there were 20 genes located in the mitochondria, 17 in cytoplasmic and 14 in chloroplast.
Phylogenetic analysis of NDA genes
To understand the evolutionary relationship of NDA genes among four Gossypium species and other plant species, multiple sequence alignment of 247 protein sequences (including 51 in G. hirsutum, 52 in G. barbadense, 26 in G. arboreum, 24 in G. raimondii, 23 in Arabidopsis thaliana (A. thaliana), 12 in Solanum tuberosum (S. tubersum), 19 in Glycine max (G. max), 11 in Oryza sativa (O. sativa), 8 in Populus trichocarpa (P. trichocarpa), 5 in Theobroma cacao (T. cacao), 9 in Vitisvinifera Genoscope (V. vinifera) and 7 in Zea mays (Z. mays)) was performed and a phylogenetic tree was constructed using MEGA 7 software based on the neighbor-joining (NJ) method. The phylogenetic tree was subsequently decorated using EvolView (https://www.evolgenius.info/evolview) (Fig. 2B). According to previous method of dividing the phylogenetic tree [22], we divided NDA genes into 8 clades based on the sequence similarity, tree topology and structural characteristics in each sequence. The clade a had the most extensive genes, of which 14 were G. hirsutum, 14 were G. barbadense, 7 were G. arboreum, and 6 were G. raimondii, 7 were A. thaliana, 3 were S. tubersum, 2 were O. sativa, 2 were G. max, 1 was Z. mays, 1 was V. vinifera, 1 was T. cacao and 1 was P. trichocarpa, respectively. All species had gene pairs derived from the same node, demonstrating that the NDA genes in all species had experienced gene duplication events that causing the expansion of the NDA gene family. The number of genes in G. barbadense and G. hirsutum was much higher than that of other species, indicating that the NDA gene family in two allotetraploid species showed a large-scale expansion during the evolution process. These results indicated that gene duplication was the main reason for the expansion of the NDA gene family in cotton.
In addition, to study the relationship between the common ancestors of diploid cotton (G. arboreum and G. raimondii) and allotetraploid cotton (G. hirsutum and G. barbadense), NJ tree of four Gossypium species was constructed (Fig. 2A). With reference to the classification of subgroups, we found that the NDA genes of the four Gossypium species were distributed in each subgroup, and each branch contained proteins from diploid and allotetraploid Gossypium species and the NDA genes in tetraploid cotton were almost twice as many as diploid cotton in each subgroup.
Chromosomal location of four Gossypium species
To study the chromosomal distribution of NDA genes in four Gossypium species, the physical location of these genes on chromosomes was drew. 149 out of 153 NDA genes were distributed to their specific chromosomes, while the remaining only 4 NDA genes, GhNDA20, GhNDA23, GhNDA49 and GaNDA26, were not located on any one chromosome because they were located on scaffold (Fig. 3). This result showed that the genetic evolution process of NDA genes was mature and stable. There was no GhNDA gene on Chr8 of G. hirsutum At sub-genome (GhAt), Chr5 of G. hirsutum Dt sub-genome (GhAt), Chr1, 5, 8 of G. arboreum, Chr4 of G. raimondii, Chr5 of G. barbadense (Fig. 3A, B, E, F), which may be related to the chromosome deletion of G. hirsutum or the translocation of large fragments during the evolution. The distribution of NDA genes on 13 chromosomes of different cotton species was uneven, and the number of each chromosome did not show a significant positive correlation with its length.
Among 52 identified NDA genes of G. hirsutum, 3 genes were located on scaffold (Fig. 3A, B). Chr9 in GhAt had most NDA genes, with a total of 4 genes, and Chr9 in GhDt had most NDA genes, with a total of 5 genes (Fig. 4). In G. barbadense, Chr9 in GbAt and GbDt had most NDA genes, with a total of 5 genes, which was similar to the situation of G. hirsutum (Fig. 4). Among 26 identified NDA genes of G. arboreum, 25 genes were distributed on 13 chromosomes while one gene was found at scaffold (Fig. 3E). In G. arboreum, Chr9 had most NDA genes with a total of 5 genes while 5 genes in Chr6 in G. raimondii (Fig. 3E, F).
Gene structure and motif composition analysis
To further examine the structural characteristics of the NDA genes in G. hirsutum, the conserved motifs of NDA proteins were predicted by using the online website MEME (http://meme-suite.org/). 10 conserved motifs were found (named motif 1 to motif 10) and the results were represented in schematic diagrams (Fig. 5B). The number of motifs was different in each protein, varying from 3 to 9. NDA members within the same clade were found to have a similar motif composition, which was conspicuous in class a, b, c, h, f. All the proteins belonged to the h and f subgroups contained motif 8, but the other clades did not had motif 8. Each clade had motif 1, and only clade a had the motif 4, while the other clade did not have, which may be the reason that clade a had obtained the special motif 4 through evolution or other clade had lacked specific conserved motifs during evolution. The different compositions of the motifs may represent the diversity of function. The clade a, b, c, h, f, h showed the similar motif arrangements, indicating that the protein was conserved within a specific subgroup. However, the functions of these conserved motifs remained to be further verified.
The diversity of gene structure was mainly influenced by the evolution of multigene families [23]. To further explore the diversity of NDA gene’s structure, the characteristics of exon-intron structures were analyzed. As can be seen from Fig. 5C, NDA genes of the same subfamilies had similar intron-exon arrangement, and different subfamilies displayed variation in exon-intron structures. The structures of NDA genes can be divided into two types, with intron less and multiple-exon. In G. hirsutum, the number of exons of all NDA genes varied from 1 to 22. Clade b had the least exon number (4), while Clade d had the most exon number (22). It was noting that one gene GhNDA24 belonged to clade h had only one exon, no intron.
Gene duplication and collinearity analysis
In general, plants had a higher rate of gene replication than other eukaryotes. Whole genome duplication, segmental duplication, and tandem duplication were considered to be the main causes of expansion in plant gene family [24]. Segmental duplications in the genome region may result in the expansion of the gene family [25]. To investigate the evolutionary process of NDA genes, the gene duplication pattern of four Gossypium species was performed using MCScanX. In this study, a total of 361 gene pairs were identified as whole genome duplication, 2 gene pairs were the tandem duplication and 102 were the segmental duplication (Fig. 6). Among the 2 tandem repeat gene pairs, one was GaNDA6/7, the other was GrNDA17/18 (Fig. 3E, F). The collinearity gene pairs between G. hirsutum and G. barbadense were the most among the 10 groups, with 80 gene pairs, while G. raimondii and G. raimondii had the least, with only 3 gene pairs, which was in line with the comparison of the number of diploid and tetraploid genes. From these results, we presumed that whole genome duplication and segmental duplication played an important role in the evolution of NDA gene family.
Calculation of non-synonymous (Ka) to synonymous (Ks) substitution rates during evolution
To study the selection pressure of duplicated gene pairs in the evolutionary process, Ka, Ks, and Ka/Ks of 416 homologous gene pairs from 10 combinations of four Gossypium species were determined, including Ga-Ga, Ga-Gb, Ga-Gr, Gb-Gb, Gb-Gr, Gh-Ga, Gh-Gb, Gh-Gh, Gh-Gr and Gr-Gr (Additional file 4: Table S4). Among them, 2 (0.48%) duplicated gene pairs with Ka/Ks ratio > 1, which occurred in Gh-Ga and Gh-Gb, respectively, indicating that these genes may experience relatively rapid evolution and undergo the positive selection pressure. 414 (99.52%) duplicated gene pairs with Ka/Ks ratio < 1, exhibiting pure selection (Fig. 7).
Gene ontology (GO) annotation analysis of GhNDAs
To further study the functions of GhNDAs, three major categories, including biological process, cellular component, and molecular function, were analyzed via CottonFGD (Fig. 8). According to GO analysis, we found that the molecular functions were mainly involved in oxidoreductase activity (51), flavin adenine dinucleotide binding (46) and NADP binding (6). Biological processes were mainly in the following three aspects, oxidation-reduction process (51), cell redox homeostasis (14) and glutathione metabolic process (5). The cellular components were mainly in cytoplasm (16).
Analysis of promoters and expression profiles of GhNDA genes under alkaline stress
Cis-acting elements played vital roles in the response to abiotic stress [26]. Some plant hormones such as ethylene (ET), abscisic acid (ABA), methyl jasmonate (MeJA) and gibberellin (GA) were necessary for plants to adapt to abiotic stress, and some transcription factors could combine with hormone-related cis-acting elements to regulate the expression of genes [27, 28]. To identify putative cis-acting elements in the NDA genes, the region of 2000 bp upstream of the start codon of the NDA genes was selected, and the promoter region related to stress and plant hormones was extracted. In this study, most GhNDA genes in G. hirsutum contained cis-acting elements related to plant hormones (ABA, MeJA, GA,) and various stresses (low temperature, light and wound stress) (Fig. 9B), the cis-acting elements of NDA genes in the same subfamily were different despite had similar motifs, and therefore their functions may differ. Most of NDA genes contained cis-acting elements related to plant hormones and various stresses. GhNDA5 had 11 cis-acting elements (3 were light responsiveness, 1 were abscisic acid responsiveness, 3 were MeJA responsiveness, 3 were wound responsiveness), whereas GhNDA33 and GhNDA11 had only one cis-acting element (light responsiveness). Almost all GhNDA genes were involved in MeJA cis-responsive elements except clade g.
It was found that gene expression levels were highly correlated with gene function [29]. We detected the expression pattern of GhNDA genes under different stresses (AS, SAS) by analyzing RNA-seq data. By analyzing the heat map, we found that only four genes, GhNDA8, GhNDA14, GhNDA32 and GhNDA40, only responded to SAS stress, GhNDA1, GhNDA22, GhNDA26 and GhNDA48 only responded to AS stress, GhNDA11, GhNDA19, GhNDA25, GhNDA35 and GhNDA51 responded to both AS and SAS stress in roots (Fig. 9C). In addition, we found that most of the gene expression levels in leaves were higher than those in roots (Fig. 9C). The expression patterns of NDA genes in roots and leaves were different, which required us to use qRT-PCR to further verify the differences in gene expression.
Expression analysis of GhNDA genes under different alkaline stress
After being exposed to AS and SAS stress, qRT-PCR assays were conducted to explore the expression patterns of GhNDA genes in roots and leaves (Figs. 10, 11). 18 genes were picked up randomly from the NDA genes actively responding to SAS and AS stress for further analysis through qRT-PCR. We found that the GhNDA genes all responded to alkaline stress, although they showed different up-regulation relationships. In addition, GhNDA7, GhNDA8, GhNDA48 showed extremely significant up-regulation under AS stress in roots, GhNDA1, GhNDA22, GhNDA25, GhNDA35, GhNDA40, GhNDA50, GhNDA51 showed extremely significant up-regulation under AS stress in leaves. Almost all genes were responsive to SAS stress in roots except GhNDA1 and GhNDA50. At the same time, gene expression in leaves also showed a similar trend to that in roots, except GhNDA1, GhNDA25, GhNDA32, GhNDA40 and GhNDA50. Therefore, we believed that most GhNDA genes responded to both AS and SAS stress.
Cotton plants with GhNDA gene silenced by VIGS were sensitive to NaHCO3 alkaline stress
To verify whether the GhNDA genes responded to alkaline stress, a high-expressed gene GhD06G1340.1 (GhNDA32) from transcriptome data was selected for further study. First, we tested the tissue specificity of the GhNDA32 gene, and the results showed that the relative expression in roots was the highest, followed by leaves and stems (Fig. 12A). Then we analyzed the relative expression of GhNDA32 gene at different time periods treated with alkaline stress (125 mM NaHCO3) and it showed a trend of first increasing and then decreasing, while reached the peak at 6 h. After that, a VIGS silencing vector pYL156: GhNDA32 was constructed, when the seedlings carrying pYL156: PDS appeared albino phenotype, qRT-PCR was used to detect the expressions of GhNDA32 gene in the pYL156 plants and pYL156: GhNDA32 plants, the results showed that the expression level of GhNDA32 in the pYL156: GhNDA32 was significantly lower than that of the control pYL156 (Fig. 12D), indicating that gene silencing was successful. After treatment with 125 mM NaHCO3, the leaves of the seedlings all appear wilting, and we found that the pYL156: GhNDA32 cotton seedlings wilted more severely than the pYL156 (Fig. 12C). Therefore, GhNDA32 was also involved in the response to alkaline stress.
Interaction network of GhNDA32 protein
According to the homologous gene in Arabidopsis, we used STRING database (https://string-db.org/) to construct an interaction network to analyze the function of GhNDA32 protein (Fig. 13). From the Fig. 9, we could conclude that NDA1 protein of Arabidopsis, which was homologous to GhNDA32 protein of G. hirsutum, could interact with AOX1A, AOX1C, AOX2, PUMP1, and CIB22 proteins. By analyzing KEGG pathway of GhNDA32 protein from the transcriptome data, we found that GhNDA32 protein was mainly involved in oxidative phosphorylation (ko00190), so we speculated that GhNDA32 protein interacted with AOX1A, AOX1C, AOX2, PUMP1 and CIB22 proteins, which could remove excess reducing energy and balance the redox poise of the cell, thereby improving the resistance of cotton seedlings to oxidative stress.