Integrated analyses of copy number variations and gene differential expression in lung squamous-cell carcinoma

Background Although numerous efforts have been made, the pathogenesis underlying lung squamous-cell carcinoma (SCC) remains unclear. This study aimed to identify the CNV-driven genes by an integrated analysis of both the gene differential expression and copy number variation (CNV). Results A higher burden of the CNVs was found in 10–50 kb length. The 16 CNV-driven genes mainly located in chr 1 and chr 3 were enriched in immune response [e.g. complement factor H (CFH) and Fc fragment of IgG, low affinity IIIa, receptor (FCGR3A)], starch and sucrose metabolism [e.g. amylase alpha 2A (AMY2A)]. Furthermore, 38 TFs were screened for the 9 CNV-driven genes and then the regulatory network was constructed, in which the GATA-binding factor 1, 2, and 3 (GATA1, GATA2, GATA3) jointly regulated the expression of TP63. Conclusions The above CNV-driven genes might be potential contributors to the development of lung SCC.


Background
Lung cancer is a leading cause of cancer mortalities worldwide [1] and the main type of lung cancer is nonsmall cell lung cancer (NSCLC), accounting for about 80 % of lung cancers [2]. NSCLC can be further divided into three histologic subtypes: squamous-cell carcinoma (SCC), adenocarcinoma (AC), and large-cell lung carcinoma (LCC) [3]. Of them, lung SCC represents about one third of the NSCLC burden [4]. Although numerous efforts have been made to elucidate the underlying pathogenesis and therapy of this disease over the past few decades, lung SCC is still incurable due to lack of effective therapeutic methods and late diagnosis [5,6]. Therefore, further research in molecular pathology is still needed.
Copy number variants (CNVs) are DNA segments of ≥1 kb in length, which is present in the genome in a variable frequency [7]. Once, it was recognized that their frequency was low and associated with specific chromosome disorders directly. During the 1990s, copy number duplications and deletions were found to cause a quantity of single gene disorders [8]. Changes in DNA copy number, whether confined to specific genes or affecting whole chromosomes, have been identified as major causes of some developmental abnormalities and diseases [9]. Recently, studies related to CNVs and their roles in tumorigenesis have increased markedly [10]. For example, alteration in copy number of chromosomal regions such as 3q26.2-q29, 3p26.3-p11.1, 17p13.3-p11.2 and 9p13.3-p13.2 has been deemed as predictors of lung cancer [11]. Using molecular pathway analysis, copy number alterations in 11 genes are associated with the focal adhesion pathway in SCLC [12]. All these evidence has proven a vital role of CNV in lung cancer.
Expression profile analyses also have been performed to identify a mass of differentially expressed genes (DEGs) between normal and lung tumor samples. Wang et al. have identified 17 genes preferentially expressed in lung SCC, including four novel genes [13]. Takefumie et al. identified 40 DEGs that could separate cases with lymph-node metastasis from those without metastasis in NSCLC [14]. Furthermore, four mRNA expression subtypes including classical, basal, secretory and primitive are identified in lung LCC by gene expression profile analysis [15]. However, only few DEGs are commonly detected in different studies, which may be due to the differences in statistical methods or specimen characteristics. One possible way to increase homogeneity in these findings is an integrated analysis using both DEGs and CNVs [16,17].
The integrated analysis of CNV and gene differential expression has previously been performed for lung AC [17], but not lung SCC. Thus, the goal of this study was to identify the CNV-driven genes in lung SCC by combining the transcriptional profile data contributed by Anders and Huber (GSE17710) [15] and CNVs data deposited in the Cancer Genome Atlas (TCGA). The identified CNV-driven genes may have a key role in elucidating the underlying mechanisms of lung SCC, thus providing a basis for developing new therapies against this disease.

DEGs screening
A total of 428 DEGs were screened out with the cut-off value of |log 2 Ratio| > 1. Among them, 211 ones were upregulated in lung SCC, while 217 ones were down-regulated. To explore their functions, they were subjected to GO-(p value <0.05) and KEGG (p value <0.05) pathway enrichment analyses. Noticeably, genes including matrix metallopeptidase 7 (MMP7), transferrin (TF) and serpin peptidase inhibitor clade A (SERPINA1) were enriched in several GO functional terms and pathways ( Table 1).

Identification of CNV-driven DEGs
It is found that regions of 1-10 kb long had the most copy number deletions (Table 2), while regions of >50 kb long had the most copy number duplications (Table 3). Overall, the ratio of copy number deletions in tumors to those in controls was larger in regions of 10-50 kb long, mostly larger than 6, and the ratio of copy number duplications in tumors to those in controls was lowest in regions of >50 kb long (Fig. 1).
A total of 313 CNV genes related to lung SCC were detected, which were only found in more than 80 % cases, while not in the controls. Then these CNVs were checked for overlap with the DEGs. A total of 163 overlapping CNV genes that were also detected by microarrays were obtained, of which 24 genes displayed significantly different expression. Furthermore, 16 CNVdriven genes were identified with the same expressional tendencies, namely copy number increasing with the expression level (Table 4). Among them, seven (FCGR3A, FCGR2B, AMY2A, AMY2B, CFH, LCE1D, and CFHR3) were located on chr 1, three on chr 3 (TP63, MUC4, and MUC20), and one on chr 4, 5, 6, 7, 16 and 19, respectively (Fig. 2).

GO and pathway analysis of CNV-driven genes
To functionally understand the CNV-driven genes, GO (p value <0.05) and KEGG pathway (p value <0.05) enrichment analyses were also performed. According to GO annotation, the CNV-driven genes were mainly functionally related to the biological process term immune response (such as CFH, FCGR3A), and cellular component term amylase activity (AMY2A and AMY2B), as well as molecular function terms amylase activity and IgG binding. Meanwhile, the CNV-driven genes were significantly enriched in pathways of systemic lupus erythematosus, starch and sucrose metabolism pathway (amylase alpha 2A, AMY2A) ( Table 5).

Discussion
In the present study, using both transcriptional profile data and CNV data, we identified genes with differential expression that may be caused by CNV. DEGs such as MMP7, TF and SERPINA1 might be associated with lung SCC. The up-regulated MMP7 was enriched in several significant GO functional terms such as negative regulation of cellular protein metabolic process, male gonad development and sterol homeostasis in our study. MMP7 belongs to metalloproteinase (MMP) family and plays a role in the breakdown of extracellular matrix [18]. It is shown that the polymorphism in MMP7 promoter increases susceptibility to esophageal SCC [19]. Moreover, down-regulated TF enriched in significant GO functional terms including extracellular space and extracellular region part. It functioned as an iron transporter [20]. The receptor of TF contributes to NSCLCs and it may be an indicator of poorer prognosis in certain groups of patients [21]. Up-regulated SERPINA1 was enriched in several significant GO function terms such as extracellular region and extracellular region part as well. It is a serine protease inhibitor and contributes to chronic obstructive pulmonary disease [22]. There is evidence that SERPINA1 is a biomarker for progression of cutaneous SCC [23]. All these evidences suggest that MMP7, TF and SERPINA1 may play pivotal roles in lung SCC.
CNV analysis presents a higher burden of the CNVs in length of 10-50 kb in lung SCC. A significant increase in CNV burden was observed in most of the individual chromosomes. It is reported that the increased burden of structural variation is a genetic risk factor for cancer [24]. Hence, no wonder the structural variation in length is correlated with lung SCC. Our study provides a hint for the etiology of lung SCC, which is the fragile or disordered genomes, specifically due to the structural variations of copy number in length of 10-50 kb.
Among the 16 overlapping genes, 7 were located on chr 1, and 3 on chr 3, indicating CNV related to lung SCC may mainly occur on chromsomes 1 and 3. According to GO functional analysis and KEGG pathway annotation, these 16 genes may be mainly involved in lung SCC by influencing starch and sucrose metabolism (e.g. AMY2A, AMY2B), and the immune response (e.g. FCGR3A, FCGR2B, CFH, HLA-DQA1). AMY2A and AMY2B encoding amylases that hydrolyze 1,4-α-glucoside bonds  [28]. Meanwhile, a recent study also discovered a role of CD16 signaling receptor in antibody-dependent cancer cell killing [29]. Thus, it may be speculated that alterations in the copy number of these two genes might influence the immune processes, further contributing to lung SCC. However, CNV of the two genes have never been reported in lung SCC. More significantly, CFH at chr 1 is a member of the regulator of complement activation gene cluster. It was found that CFH sensitizes NSCLC cells to complement attack and inhibit the growth of tumor cells [30]. Therefore, the presence of CNV located in chr 1 and chr 3 might be potential contributory factors to the development of lung SCC. TP63 (transformation-related protein 63) encoded by TP63 gene, along with p53 and p73, constitutes the p53 gene family of transcription factors [31]. Noticeably, TAp63 may functionally similar to p53 as it is reported to transactivate multiple p53 downstream targets. However, p53 has only one promoter with three conserved domains, whereas either p63 or p73 has two promoters, thus each having two isotypes: one containing TA domain (TAp63, TAp73), and the other containing no TA domain (ΔNp63 and ΔNp73). Wu et al. investigated the cell functions by introducing TAp63α and ΔNp63α into Saos2 cells using adenovirus expression vectors and subsequently detecting the gene profiles using DNA microarrays, and they have found that p63 can regulate a wide range of various cellular functions, such as cell cycle control, stress, and signal transduction, which are critical events in cancer and development [32]. It thus may be inferred that CNV of TP63 might have a role in lung SCC by altering the expression of its downstream genes, although no copy number variation has been reported in p63 gene so far.

Conclusions
In this study we conducted an integrated analysis of transcriptional profile and CNV of lung SCC, and finally screened 16 CNV-driven genes. The variation in these gene copy number is speculated to have a role in lung SCC occurrence. For example, FCGR3A, FCGR2B and HLA-DQA1 may function via the systemic lupus erythematosus pathway, and AMY2B and AMY2A may participate in lung SCC via starch and sucrose metabolism pathway. Our work provides new insights into the mechanisms underlying lung SCC, and also suggest some new targets for therapy of lung SCC. However, their roles in lung SCC require further experimental validation. The genes AMY2A, CFH, and TP63 However, more in-depth studies are needed in order to verify our findings.

Methods
As the paper did not involve any human or animal study, ethical approval was not required.

Source of gene expression data and CNV data
Transcriptional profile of GSE17710 [33] was downloaded from Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/), which was annotated using the platform of GPL9053 (Agilent-UNCcustom-4X44k). This dataset was collected from both the tumor tissues and adjacent normal tissues of 56 lung SCC patients. CNV data were downloaded from the TCGA database (https://tcga-data.nci.nih.gov/tcga/dataAccess-Matrix.htm?mode=ApplyFilter&showMatrix=true&d iseaseType=LUSC&tumorNormal=TN&tumorNorma l=T&tumorNormal=NT&platformType=1&platform Type=4&platformType=40) in Dec., 2013. Only data of level 3 were accessible and downloaded, which included CNV information (segment mean value) in both the tumor issues and matched adjacent normal tissues (control) from 513 patients with lung SCC using Affymetrix Genome-Wide Human SNP 6.0 array containing 1.8 million SNP and CNV probes. A segment mean value is log2 transformed ratio of the detected copy number in either the tumor or normal tissues to the copy number 2 that is detected in a normal human using hg19 as reference genome. A value larger than one indicates a copy number increase, and a value smaller than one means a copy number depletion.

Preprocessing of transcriptional profile data
The probe-level data of transcriptional profile were first converted into expression measures. DEGs were identified between the tumor tissues and adjacent normal tissues with the cut-off criteria of |log 2 FC (Fold change)| > 1.

Preprocessing of CNV microarray data
First, the distribution of CNVs (copy number deletion or increase) on the 22 chromosomes was investigated in both the cases and controls, and the number of CNV regions of 1-10 kb, 10-50 kb and >50 kb in length on each chromosome was calculated, respectively. Permutation test was performed to calculate the P value of CNVs in either the cases or controls in each chromosome based on 1000 replicates. Next, Circos software was used to display the distribution of on each chromosome in both tumors and controls.

Identification of lung SCC-related CNVs
First, genes located within CNV regions were identified according to the human hg19 reference genome, and its copy vari in an identified gene was also calculated. Next, a gene that is related to lung SCC with CNV was retained only when its CNV was not detected in controls but detected in more than 80 % ceases. The segment mean value and the copy number were denoted as 0 and 1, respectively when missing in a sample. Fig. 1 The ratios of the number of copy number deletions/duplications in cases to that in controls at different lengths

Identification of CNV-driven genes
The association between the copy number difference data and differential expression data was analyzed to screen CNV-driven genes. A CNV-driven gene was defined only when the differential expression trend was consistent with the copy number change, namely an up-regulated gene should also have an increased copy number; vice versa.

Functional annotation and pathway enrichment analysis of CNVs
The functional enrichment analysis of the DEGs and CNV-driven genes was carried out using Database for Annotation, Visualization and Integrated Discovery (DAVID) software based on the gene ontology (GO) and kyoto encyclopedia of genes and genomes (KEGG) pathway databases [34]. P < 0.05 was set as the cut-off.