- Research article
- Open Access
Integrated analyses of copy number variations and gene differential expression in lung squamous-cell carcinoma
Biological Researchvolume 48, Article number: 47 (2015)
Although numerous efforts have been made, the pathogenesis underlying lung squamous-cell carcinoma (SCC) remains unclear. This study aimed to identify the CNV-driven genes by an integrated analysis of both the gene differential expression and copy number variation (CNV).
A higher burden of the CNVs was found in 10–50 kb length. The 16 CNV-driven genes mainly located in chr 1 and chr 3 were enriched in immune response [e.g. complement factor H (CFH) and Fc fragment of IgG, low affinity IIIa, receptor (FCGR3A)], starch and sucrose metabolism [e.g. amylase alpha 2A (AMY2A)]. Furthermore, 38 TFs were screened for the 9 CNV-driven genes and then the regulatory network was constructed, in which the GATA-binding factor 1, 2, and 3 (GATA1, GATA2, GATA3) jointly regulated the expression of TP63.
The above CNV-driven genes might be potential contributors to the development of lung SCC.
Lung cancer is a leading cause of cancer mortalities worldwide  and the main type of lung cancer is non-small cell lung cancer (NSCLC), accounting for about 80 % of lung cancers . NSCLC can be further divided into three histologic subtypes: squamous-cell carcinoma (SCC), adenocarcinoma (AC), and large-cell lung carcinoma (LCC) . Of them, lung SCC represents about one third of the NSCLC burden . Although numerous efforts have been made to elucidate the underlying pathogenesis and therapy of this disease over the past few decades, lung SCC is still incurable due to lack of effective therapeutic methods and late diagnosis [5, 6]. Therefore, further research in molecular pathology is still needed.
Copy number variants (CNVs) are DNA segments of ≥1 kb in length, which is present in the genome in a variable frequency . Once, it was recognized that their frequency was low and associated with specific chromosome disorders directly. During the 1990s, copy number duplications and deletions were found to cause a quantity of single gene disorders . Changes in DNA copy number, whether confined to specific genes or affecting whole chromosomes, have been identified as major causes of some developmental abnormalities and diseases . Recently, studies related to CNVs and their roles in tumorigenesis have increased markedly . For example, alteration in copy number of chromosomal regions such as 3q26.2-q29, 3p26.3-p11.1, 17p13.3-p11.2 and 9p13.3-p13.2 has been deemed as predictors of lung cancer . Using molecular pathway analysis, copy number alterations in 11 genes are associated with the focal adhesion pathway in SCLC . All these evidence has proven a vital role of CNV in lung cancer.
Expression profile analyses also have been performed to identify a mass of differentially expressed genes (DEGs) between normal and lung tumor samples. Wang et al. have identified 17 genes preferentially expressed in lung SCC, including four novel genes . Takefumie et al. identified 40 DEGs that could separate cases with lymph-node metastasis from those without metastasis in NSCLC . Furthermore, four mRNA expression subtypes including classical, basal, secretory and primitive are identified in lung LCC by gene expression profile analysis . However, only few DEGs are commonly detected in different studies, which may be due to the differences in statistical methods or specimen characteristics. One possible way to increase homogeneity in these findings is an integrated analysis using both DEGs and CNVs [16, 17].
The integrated analysis of CNV and gene differential expression has previously been performed for lung AC , but not lung SCC. Thus, the goal of this study was to identify the CNV-driven genes in lung SCC by combining the transcriptional profile data contributed by Anders and Huber (GSE17710)  and CNVs data deposited in the Cancer Genome Atlas (TCGA). The identified CNV-driven genes may have a key role in elucidating the underlying mechanisms of lung SCC, thus providing a basis for developing new therapies against this disease.
A total of 428 DEGs were screened out with the cut-off value of |log2Ratio| > 1. Among them, 211 ones were up-regulated in lung SCC, while 217 ones were down-regulated. To explore their functions, they were subjected to GO— (p value <0.05) and KEGG (p value <0.05) pathway enrichment analyses. Noticeably, genes including matrix metallopeptidase 7 (MMP7), transferrin (TF) and serpin peptidase inhibitor clade A (SERPINA1) were enriched in several GO functional terms and pathways (Table 1).
Identification of CNV-driven DEGs
It is found that regions of 1–10 kb long had the most copy number deletions (Table 2), while regions of >50 kb long had the most copy number duplications (Table 3). Overall, the ratio of copy number deletions in tumors to those in controls was larger in regions of 10–50 kb long, mostly larger than 6, and the ratio of copy number duplications in tumors to those in controls was lowest in regions of >50 kb long (Fig. 1).
A total of 313 CNV genes related to lung SCC were detected, which were only found in more than 80 % cases, while not in the controls. Then these CNVs were checked for overlap with the DEGs. A total of 163 overlapping CNV genes that were also detected by microarrays were obtained, of which 24 genes displayed significantly different expression. Furthermore, 16 CNV-driven genes were identified with the same expressional tendencies, namely copy number increasing with the expression level (Table 4). Among them, seven (FCGR3A, FCGR2B, AMY2A, AMY2B, CFH, LCE1D, and CFHR3) were located on chr 1, three on chr 3 (TP63, MUC4, and MUC20), and one on chr 4, 5, 6, 7, 16 and 19, respectively (Fig. 2).
GO and pathway analysis of CNV-driven genes
To functionally understand the CNV-driven genes, GO (p value <0.05) and KEGG pathway (p value <0.05) enrichment analyses were also performed. According to GO annotation, the CNV-driven genes were mainly functionally related to the biological process term immune response (such as CFH, FCGR3A), and cellular component term amylase activity (AMY2A and AMY2B), as well as molecular function terms amylase activity and IgG binding. Meanwhile, the CNV-driven genes were significantly enriched in pathways of systemic lupus erythematosus, starch and sucrose metabolism pathway (amylase alpha 2A, AMY2A) (Table 5).
In the present study, using both transcriptional profile data and CNV data, we identified genes with differential expression that may be caused by CNV. DEGs such as MMP7, TF and SERPINA1 might be associated with lung SCC. The up-regulated MMP7 was enriched in several significant GO functional terms such as negative regulation of cellular protein metabolic process, male gonad development and sterol homeostasis in our study. MMP7 belongs to metalloproteinase (MMP) family and plays a role in the breakdown of extracellular matrix . It is shown that the polymorphism in MMP7 promoter increases susceptibility to esophageal SCC . Moreover, down-regulated TF enriched in significant GO functional terms including extracellular space and extracellular region part. It functioned as an iron transporter . The receptor of TF contributes to NSCLCs and it may be an indicator of poorer prognosis in certain groups of patients . Up-regulated SERPINA1 was enriched in several significant GO function terms such as extracellular region and extracellular region part as well. It is a serine protease inhibitor and contributes to chronic obstructive pulmonary disease . There is evidence that SERPINA1 is a biomarker for progression of cutaneous SCC . All these evidences suggest that MMP7, TF and SERPINA1 may play pivotal roles in lung SCC.
CNV analysis presents a higher burden of the CNVs in length of 10–50 kb in lung SCC. A significant increase in CNV burden was observed in most of the individual chromosomes. It is reported that the increased burden of structural variation is a genetic risk factor for cancer . Hence, no wonder the structural variation in length is correlated with lung SCC. Our study provides a hint for the etiology of lung SCC, which is the fragile or disordered genomes, specifically due to the structural variations of copy number in length of 10–50 kb.
Among the 16 overlapping genes, 7 were located on chr 1, and 3 on chr 3, indicating CNV related to lung SCC may mainly occur on chromsomes 1 and 3. According to GO functional analysis and KEGG pathway annotation, these 16 genes may be mainly involved in lung SCC by influencing starch and sucrose metabolism (e.g. AMY2A, AMY2B), and the immune response (e.g. FCGR3A, FCGR2B, CFH, HLA-DQA1). AMY2A and AMY2B encoding amylases that hydrolyze 1,4-α-glucoside bonds in oligosaccharides and polysaccharides, thus is necessary for the digestion of dietary starch and glycogen . Kang et al. have reported that AMY2A is a possible tumor-suppressor gene of 1p21.1 in gastric carcinoma . Furthermore, Xi et al. have detected CNV of AMY2A using the Bayesian information criterion . Thus, it may be inferred alteration in AMY2B and AMY2A copy numbers may have a role in lung SCC occurrence via the starch and sucrose metabolism pathway. FCGR3A encoding a cell surface molecule CD16a, a member CD16 gene family, which is much similar to FCGR3B on chromosome 1. Zhou et al. have reported that it is CNV of FCGR3A other than FCGR3B and FCGR2B that is involved in anti-GBM disease . Meanwhile, a recent study also discovered a role of CD16 signaling receptor in antibody-dependent cancer cell killing . Thus, it may be speculated that alterations in the copy number of these two genes might influence the immune processes, further contributing to lung SCC. However, CNV of the two genes have never been reported in lung SCC. More significantly, CFH at chr 1 is a member of the regulator of complement activation gene cluster. It was found that CFH sensitizes NSCLC cells to complement attack and inhibit the growth of tumor cells . Therefore, the presence of CNV located in chr 1 and chr 3 might be potential contributory factors to the development of lung SCC.
TP63 (transformation-related protein 63) encoded by TP63 gene, along with p53 and p73, constitutes the p53 gene family of transcription factors . Noticeably, TAp63 may functionally similar to p53 as it is reported to transactivate multiple p53 downstream targets. However, p53 has only one promoter with three conserved domains, whereas either p63 or p73 has two promoters, thus each having two isotypes: one containing TA domain (TAp63, TAp73), and the other containing no TA domain (ΔNp63 and ΔNp73). Wu et al. investigated the cell functions by introducing TAp63α and ΔNp63α into Saos2 cells using adenovirus expression vectors and subsequently detecting the gene profiles using DNA microarrays, and they have found that p63 can regulate a wide range of various cellular functions, such as cell cycle control, stress, and signal transduction, which are critical events in cancer and development . It thus may be inferred that CNV of TP63 might have a role in lung SCC by altering the expression of its downstream genes, although no copy number variation has been reported in p63 gene so far.
In this study we conducted an integrated analysis of transcriptional profile and CNV of lung SCC, and finally screened 16 CNV-driven genes. The variation in these gene copy number is speculated to have a role in lung SCC occurrence. For example, FCGR3A, FCGR2B and HLA-DQA1 may function via the systemic lupus erythematosus pathway, and AMY2B and AMY2A may participate in lung SCC via starch and sucrose metabolism pathway. Our work provides new insights into the mechanisms underlying lung SCC, and also suggest some new targets for therapy of lung SCC. However, their roles in lung SCC require further experimental validation. The genes AMY2A, CFH, and TP63 However, more in-depth studies are needed in order to verify our findings.
As the paper did not involve any human or animal study, ethical approval was not required.
Source of gene expression data and CNV data
Transcriptional profile of GSE17710  was downloaded from Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/), which was annotated using the platform of GPL9053 (Agilent-UNC-custom-4X44k). This dataset was collected from both the tumor tissues and adjacent normal tissues of 56 lung SCC patients. CNV data were downloaded from the TCGA database (https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm?mode=ApplyFilter&showMatrix=true&diseaseType=LUSC&tumorNormal=TN&tumorNormal=T&tumorNormal=NT&platformType=1&platformType=4&platformType=40) in Dec., 2013. Only data of level 3 were accessible and downloaded, which included CNV information (segment mean value) in both the tumor issues and matched adjacent normal tissues (control) from 513 patients with lung SCC using Affymetrix Genome-Wide Human SNP 6.0 array containing 1.8 million SNP and CNV probes. A segment mean value is log2 transformed ratio of the detected copy number in either the tumor or normal tissues to the copy number 2 that is detected in a normal human using hg19 as reference genome. A value larger than one indicates a copy number increase, and a value smaller than one means a copy number depletion.
Preprocessing of transcriptional profile data
The probe-level data of transcriptional profile were first converted into expression measures. DEGs were identified between the tumor tissues and adjacent normal tissues with the cut-off criteria of |log2FC (Fold change)| > 1.
Preprocessing of CNV microarray data
First, the distribution of CNVs (copy number deletion or increase) on the 22 chromosomes was investigated in both the cases and controls, and the number of CNV regions of 1–10 kb, 10–50 kb and >50 kb in length on each chromosome was calculated, respectively. Permutation test was performed to calculate the P value of CNVs in either the cases or controls in each chromosome based on 1000 replicates. Next, Circos software was used to display the distribution of on each chromosome in both tumors and controls.
Identification of lung SCC-related CNVs
First, genes located within CNV regions were identified according to the human hg19 reference genome, and its copy vari in an identified gene was also calculated. Next, a gene that is related to lung SCC with CNV was retained only when its CNV was not detected in controls but detected in more than 80 % ceases. The segment mean value and the copy number were denoted as 0 and 1, respectively when missing in a sample.
Identification of CNV-driven genes
The association between the copy number difference data and differential expression data was analyzed to screen CNV-driven genes. A CNV-driven gene was defined only when the differential expression trend was consistent with the copy number change, namely an up-regulated gene should also have an increased copy number; vice versa.
Functional annotation and pathway enrichment analysis of CNVs
The functional enrichment analysis of the DEGs and CNV-driven genes was carried out using Database for Annotation, Visualization and Integrated Discovery (DAVID) software based on the gene ontology (GO) and kyoto encyclopedia of genes and genomes (KEGG) pathway databases . P < 0.05 was set as the cut-off.
Bray F, Jemal A, Grey N, Ferlay J, Forman D. Global cancer transitions according to the Human Development Index (2008–2030): a population-based study. Lancet Oncol. 2012;13(8):790–801.
Peters S, Adjei A, Gridelli C, Reck M, Kerr K, Felip E. Metastatic non-small-cell lung cancer (NSCLC): ESMO clinical practice guidelines for diagnosis, treatment and follow-up. Ann Oncol. 2012;23(suppl 7):vii56–64.
Reinmuth N, Brandt B, Semik M, Kunze W-P, Achatzy R, Scheld HH, et al. Prognostic impact of Cyfra21-1 and other serum markers in completely resected non-small cell lung cancer. Lung Cancer. 2002;36(3):265–70.
Cumberbatch M, Tang X, Beran G, Eckersley S, Wang X, Ellston RP, et al. Identification of a subset of human non-small cell lung cancer patients with high PI3Kβ and low PTEN expression, more prevalent in squamous cell carcinoma. Clin Cancer Res. 2014;20(3):595–603.
Perez-Moreno P, Brambilla E, Thomas R, Soria J-C. Squamous cell carcinoma of the lung: molecular subtypes and therapeutic opportunities. Clin Cancer Res. 2012;18(9):2443–51.
Drilon A, Rekhtman N, Ladanyi M, Paik P. Squamous-cell carcinomas of the lung: emerging biology, controversies, and the promise of targeted therapy. Lancet Oncol. 2012;13(10):e418–26.
Kirov G, Rees E, Walters JT, Escott-Price V, Georgieva L, Richards AL, et al. The penetrance of copy number variations for schizophrenia and developmental delay. Biol Psychiatry. 2014;75(5):378–85.
Riccardi VM, Lupski JR. Duplications, deletions, and single-nucleotide variations: the complexity of genetic arithmetic. Genet Med. 2012;15(3):172–3.
Cooper GM, Coe BP, Girirajan S, Rosenfeld JA, Vu TH, Baker C, et al. A copy number variation morbidity map of developmental delay. Nat Genet. 2011;43(9):838–46.
Greenman CD, Bignell G, Butler A, Edkins S, Hinton J, Beare D, et al. PICNIC: an algorithm to predict absolute allelic copy number variation with microarray cancer data. Biostatistics. 2010;11(1):164–75.
van Boerdonk RA, Sutedja TG, Snijders PJ, Reinen E, Wilting SM, van de Wiel MA, et al. DNA copy number alterations in endobronchial squamous metaplastic lesions predict lung cancer. Am J Respir Crit Care Med. 2011;184(8):948–56.
Ocak S, Yamashita H, Udyavar A, Miller A, Gonzalez A, Zou Y, et al. DNA copy number aberrations in small-cell lung cancer reveal activation of the focal adhesion pathway. Oncogene. 2010;29(48):6331–42.
Wang T, Hopkins D, Schmidt C, Silva S, Houghton R, Takita H, et al. Identification of genes differentially over-expressed in lung squamous cell carcinoma using combination of cDNA subtraction and microarray analysis. Oncogene. 2000;19(12):1519–28.
Kikuchi T, Daigo Y, Katagiri T, Tsunoda T, Okada K, Kakiuchi S, et al. Expression profiles of non-small cell lung cancers on cDNA microarrays: identification of genes for prediction of lymph-node metastasis and sensitivity to anti-cancer drugs. Oncogene. 2003;22(14):2192–205.
Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106.
Fan B, Dachrut S, Coral H, Yuen ST, Chu KM, Law S, et al. Integration of DNA copy number alterations and transcriptional expression analysis in human gastric cancer. PLoS One. 2012;7(4):e29824.
Lu T-P, Lai L-C, Tsai M-H, Chen P-C, Hsu C-P, Lee J-M, et al. Integrated analyses of copy number variations and gene expression in lung adenocarcinoma. PLoS One. 2011;6(9):e24829.
Quan T, Little E, Quan H, Voorhees JJ, Fisher GJ. Elevated matrix metalloproteinases and collagen fragmentation in photodamaged human skin: impact of altered extracellular matrix microenvironment on dermal fibroblast function. J Invest Dermatol. 2013;133(5):1362.
Zhang J, Jin X, Fang S, Wang R, Li Y, Wang N, et al. The functional polymorphism in the matrix metalloproteinase-7 promoter increases susceptibility to esophageal squamous cell carcinoma, gastric cardiac adenocarcinoma and non-small cell lung carcinoma. Carcinogenesis. 2005;26(10):1748–53.
Canonne-Hergaux F, Gros P. Expression of the iron transporter DMT1 in kidney from normal and anemic mk mice. Kidney Int. 2002;62(1):147–56.
Whitney JF, Clark JM, Griffin TW, Gautam S, Leslie KO. Transferrin receptor expression in nonsmall cell lung cancer. Histopathologic and clinical correlates. Cancer. 1995;76(1):20–5.
Chappell S, Daly L, Morgan K, Guetta Baranes T, Roca J, Rabinovich R, et al. Cryptic haplotypes of SERPINA1 confer susceptibility to chronic obstructive pulmonary disease. Hum Mutat. 2006;27(1):103–9.
Farshchian M, Kivisaari A, Ala-aho R, Riihilä P, Kallajoki M, Grénman R, et al. Serpin peptidase inhibitor clade A member 1 (SerpinA1) is a novel biomarker for progression of cutaneous squamous cell carcinoma. Am J Pathol. 2011;179(3):1110–9.
Yang R, Chen B, Pfütze K, Buch S, Steinke V, Holinski-Feder E, et al. Genome-wide analysis associates familial colorectal cancer with increases in copy number variations and a rare structural variation at 12p12. 3. Carcinogenesis. 2014;35(2):315–23.
Sawai T, Hehre EJ. A novel amylase (Candida transglucosyl-amylase) that catalyzes glucosyl transfer from starch and dextrins. J Biol Chem. 1962;237(7):2047–52.
Kang JU, Koo SH, Kwon KC, Park JW. AMY2A: a possible tumor-suppressor gene of 1p21. 1 loss in gastric carcinoma. Int J Oncol. 2010;36(6):1429.
Xi R, Hadjipanayis AG, Luquette LJ, Kim T-M, Lee E, Zhang J, et al. Copy number variation detection in whole-genome sequencing data using the Bayesian information criterion. Proc Natl Acad Sci. 2011;108(46):E1128–36.
X-j Zhou, Lv J-c Bu, D-f YuL, Y-r Yang, Zhao J, et al. Copy number variation of FCGR3A rather than FCGR3B and FCGR2B is associated with susceptibility to anti-GBM disease. Int Immunol. 2010;22(1):45–51.
Kudo K, Imai C, Lorenzini P, Kamiya T, Kono K, Davidoff AM, et al. T lymphocytes expressing a CD16 signaling receptor exert antibody-dependent cancer cell killing. Cancer Res. 2014;74(1):93–103.
Ajona D, Hsu Y-F, Corrales L, Montuenga LM, Pio R. Down-regulation of human complement factor H sensitizes non-small cell lung cancer cells to complement attack and reduces in vivo tumor growth. J Immunol. 2007;178(9):5991–8.
Murray-Zmijewski F, Lane D, Bourdon J. p53/p63/p73 isoforms: an orchestra of isoforms to harmonise cell differentiation and response to stress. Cell Death Differ. 2006;13(6):962–72.
Wu G, Nomoto S, Hoque MO, Dracheva T, Osada M, Lee C-CR, et al. ΔNp63α and TAp63α regulate transcription of genes with distinct biological functions in cancer and development. Cancer Res. 2003;63(10):2351–7.
Wilkerson MD, Yin X, Hoadley KA, Liu Y, Hayward MC, Cabanski CR, et al. Lung squamous cell carcinoma mRNA expression subtypes are reproducible, clinically important, and correspond to normal cell types. Clin Cancer Res. 2010;16(19):4864–75.
Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2008;4(1):44–57.
ZY and BZ participated in the design of this study, and they both performed the statistical analysis. YY and SMJ carried out the study and collected important background information. TW drafted the manuscript. All authors read and approved the final manuscript.
This study was supported by National Natural Science Foundation of China (81170049, 81470252, 81160011) and Natural Science Foundation of Ningxia (NZ11151).
Compliance with ethical guidelines
Competing interests The authors declare that they have no competing interests.