Protein expression profiles and clinicopathologic characteristics associate with gastric cancer survival

Background Prognosis remains one of the most crucial determinants of gastric cancer (GC) treatment, but current methods do not predict prognosis accurately. Additional biomarkers are urgently needed to identify patients at risk of poor prognosis. Methods Tissue microarrays were used to measure the expression of nine GC-associated proteins in GC and normal gastric tissue samples. Hierarchical cluster analysis of the microarray data and feature selection for factors associated with survival were performed. Based on these data, prognostic scoring models were established to predict clinical outcomes. Finally, Ingenuity Pathway Analysis (IPA) was used to identify a biological GC network. Results Eight proteins were upregulated in GC tissues compared with normal gastric tissues. Hierarchical cluster analysis and feature selection showed that overall survival was worse in patients immunopositive for cyclin-dependent kinase (CDK)2, Akt1, X-linked inhibitor of apoptosis protein (XIAP), Notch4, and phosphorylated (p)-protein kinase C (PKC) α/β2 than in patients immunonegative for these proteins. Risk score models based on these five proteins and clinicopathological characteristics were established to determine the prognosis of GC patients. These proteins were found to be involved in cancer-related signaling pathways, and their upstream regulators were identified. Conclusion This study identified proteins that can be used as clinical biomarkers and established a risk score model based on these proteins and clinicopathological characteristics to assess GC prognosis.

There is an urgent need for additional biomarkers to identify patients at risk of poor outcomes or recurrence.
GC is a heterogeneous disease whose initiation and progression are strongly influenced by genetic and environmental factors [14]. Many candidate gene products, such as MUC1, CEA, p53, p16, and E-cadherin, have been proposed as predictors of survival in patients with GC [15][16][17][18][19]. Although the genetic factors that predict survival have been investigated extensively, few genetic alterations have been used in GC diagnosis. Tissue microarrays and immunohistochemistry have recently made it possible to conduct large-scale molecular studies, involving numerous markers and cases, on formalin-fixed tissue samples. For example, a large-scale cluster analysis showed that multiple markers correlated significantly with patient survival [20,21]. Therefore, tissue microarrays and immunohistochemistry may provide a practical method for routine testing and validation studies.
In a previous study, we screened for GC-associated signaling proteins using protein pathway arrays (PPAs) and found that 22 proteins or phosphorylated (p) protein forms were differentially expressed between cancer and normal tissues. Of those 22 proteins, the following 9 were upregulated in GC tissues [1]: proliferating cell nuclear antigen (PCNA), Notch4, cyclin-dependent kinase (CDK)4, CDK6, X-linked inhibitor of apoptosis protein (XIAP), p-protein kinase C (PKC)α/βII, Akt, β-catenin, and p-PKCα. In this study, we aimed to verify the overexpression of these proteins in GC using tissue microarrays and immunohistochemistry. We also analyzed survival characteristics to establish prognostic scoring models that improve clinical outcome prediction.

Patients and tissue samples
This study included 121 surgically resected primary GC samples from patients who underwent D2 gastrectomy between January 2006 and December 2012, and 30 normal gastric samples from patients who underwent gastrectomy for non-cancer diseases. Two pathologists confirmed the histological diagnoses and tumor-node-metastasis (TNM) staging of the collected tissues. Clinicopathological characteristics of the GC patients, including gender, age, tumor size, tumor location, histologic differentiation, vascular or lymphatic invasion, and tumor stage, were obtained by reviewing medical charts and pathology records (Table 1). None of the patients received preoperative chemotherapy or radiotherapy. Patients were followed up from the date of surgery for 6-119 months (mean, 55 months), and clinical outcomes were recorded. Survival time was calculated from the date of surgery to the date of death or the last day of follow-up. Most patients died of GC. Cases lost to follow-up were excluded from the survival analysis. The Institution Ethical Review Board of The First Hospital of Jilin University approved this study, and all patients provided informed consent.
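The survival-time definition above can be expressed as a small helper that turns dates into the (time, event) pairs used by Kaplan-Meier and Cox analyses. This is a minimal sketch, not the authors' code: the function name `survival_record` is hypothetical, and the days-to-months conversion (365.25/12 days per month) is an assumption, because the study does not state how months were computed. Deaths are coded as events (1) and patients alive at last follow-up as right-censored (0); cases lost to follow-up would be excluded before this step.

```python
from datetime import date
from typing import Optional, Tuple

# Assumed convention: average Gregorian month length in days.
DAYS_PER_MONTH = 365.25 / 12


def survival_record(surgery: date, death: Optional[date],
                    last_follow_up: date) -> Tuple[float, int]:
    """Return (survival time in months, event flag) for one patient.

    event = 1 if the patient died (observed event);
    event = 0 if alive at the last follow-up (right-censored).
    """
    if death is not None:
        end, event = death, 1
    else:
        end, event = last_follow_up, 0
    if end < surgery:
        raise ValueError("end date precedes the date of surgery")
    months = (end - surgery).days / DAYS_PER_MONTH
    return round(months, 1), event
```

For example, `survival_record(date(2010, 1, 1), None, date(2011, 1, 1))` describes a patient censored after about one year.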

Tissue microarrays
Tissue microarrays were prepared as previously described [22]. Briefly, H&E-stained whole sections of each donor tissue block were used to select tumor areas for the tissue microarray cores. Three tissue cylinders (0.6 mm in diameter) were punched from each sample and re-embedded in a recipient paraffin block at predetermined positions. Multiple 4-µm-thick sections were cut from each tissue array block and mounted on microscope slides.

Immunohistochemistry
Sections on tissue microarray slides were dewaxed in xylene and rehydrated through a graded alcohol series. Antigen retrieval was performed by autoclaving the sections in citrate buffer (pH 6.0) for 2 min, followed by cooling in dH2O. The sections were then immersed in 3% hydrogen peroxide in phosphate-buffered saline (PBS) for 15 min to block endogenous peroxidase activity, and nonspecific binding was blocked with 10% normal goat serum at room temperature for 10 min. The sections were incubated with primary antibodies (Table 2) at 4 °C overnight in a moist chamber. After washing with PBS, the sections were incubated with secondary antibodies for 1 h at room temperature. Immunoreactivity was visualized with 3,3′-diaminobenzidine, and the sections were counterstained with Harris hematoxylin, dehydrated, and mounted. Immunoreactivity was evaluated microscopically by two pathologists. Protein staining was graded according to a previous study [23]: 0, negative, − (no cells stained); 1, weakly positive, + (< 10% of cells stained); 2, moderately positive, ++ (10-50% of cells stained); or 3, strongly positive, +++ (> 50% of cells stained).
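The grading scheme above maps the fraction of stained cells to an ordinal score, and later analyses dichotomize it into negative versus positive. The sketch below encodes those rules. It assumes the boundary values 10% and 50% fall in the moderate category, which is how "10-50%" reads; in the study itself the grading was a visual judgment by two pathologists, and the function names here are hypothetical.

```python
def ihc_grade(percent_stained: float) -> int:
    """Map the percentage of stained cells to the 0-3 staining grade.

    Boundaries (assumed from the text): 0% -> 0, <10% -> 1,
    10-50% (inclusive) -> 2, >50% -> 3.
    """
    if not 0 <= percent_stained <= 100:
        raise ValueError("percent_stained must lie in [0, 100]")
    if percent_stained == 0:
        return 0  # negative (-)
    if percent_stained < 10:
        return 1  # weakly positive (+)
    if percent_stained <= 50:
        return 2  # moderately positive (++)
    return 3      # strongly positive (+++)


def is_immunopositive(grade: int) -> bool:
    """Binary status used for the negative-vs-positive comparisons."""
    return grade >= 1
```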

Cluster analysis
Hierarchical cluster analysis was conducted in the Cluster program (complete linkage clustering) [24]. Clustering analysis results were displayed in TreeView software [25]. Expression data were graded as follows: − 3, negative staining; 1, weak positive staining; 2, moderately positive staining; and 3, strongly positive staining.
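Complete-linkage clustering repeatedly merges the two clusters whose farthest-apart members are closest. The study used the Cluster program for this step; the pure-Python sketch below only illustrates the linkage rule on graded expression profiles (one row per tumor). The Euclidean distance metric and the cut into `k` clusters are assumptions for illustration, because the text does not state which metric was used.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence


def complete_linkage(samples: Sequence[Sequence[float]], k: int) -> List[int]:
    """Agglomerative clustering with complete linkage (Euclidean distance assumed).

    Merges clusters until k remain and returns a cluster label per sample.
    """
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    clusters = [[i] for i in range(len(samples))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Complete linkage: distance between the farthest pair of members.
        return max(dist(samples[i], samples[j]) for i in a for j in b)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest complete-linkage distance.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[x].extend(clusters[y])
        del clusters[y]  # safe: combinations() guarantees x < y

    labels = [0] * len(samples)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

A quadratic-memory, cubic-time loop like this is fine for illustration; dedicated tools also keep the full merge tree that TreeView displays as a dendrogram.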

Signaling network analysis
To visualize the interactions and upstream regulators of differentially expressed proteins, pathway and network analyses were carried out in Ingenuity Pathway Analysis version 9.0 (IPA), a protein-gene and protein-protein interaction analysis program.

Statistical analysis
The chi-square test and two-sided Fisher's exact test were used to determine associations between protein expression status and clinicopathological variables. Kaplan-Meier survival curves were constructed, and differences between curves were assessed with the log-rank test. Cox proportional hazards regression analysis was used to identify independent prognostic factors. Principal component analysis (PCA) was performed to establish a survival prediction model for GC patients; the Kaiser-Meyer-Olkin measure and Bartlett's test of sphericity were used to confirm that the data were suitable for factor extraction. To determine the optimal prognosis prediction model, receiver operating characteristic (ROC) curve analysis was applied to the principal components (PCs), and the areas under the curves (AUCs) were calculated. All analyses were performed in SPSS 17.0 (SPSS Inc, Chicago, IL). A p value < 0.05 was considered statistically significant.
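The AUC used to compare prediction models has a direct interpretation: it equals the probability that a randomly chosen patient with the outcome has a higher model score than a randomly chosen patient without it (the Mann-Whitney formulation, with ties counted as one half). The sketch below computes it that way. It is illustrative only; the analyses were run in SPSS, and treating death during follow-up as the binary outcome is an assumption, since the text does not name the reference outcome for the ROC analysis.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], outcome: Sequence[int]) -> float:
    """AUC via the Mann-Whitney statistic.

    outcome[i] is 1 for a patient with the event (assumed here: death)
    and 0 otherwise; ties between an event and a non-event score count 0.5.
    """
    pos = [s for s, y in zip(scores, outcome) if y == 1]
    neg = [s for s, y in zip(scores, outcome) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to a score that ranks patients no better than chance, and 1.0 to perfect separation.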

Protein expression profiling in GC and normal tissues
The immunohistochemistry results for the nine evaluated proteins are shown in Fig. 1 and summarized in Table 3. Notably, p-PKCα was expressed in 96.7% (117/121) of the GC cases, with weakly positive, moderately positive, and strongly positive expression in 46, 59, and 12 cases, respectively. Meanwhile, p-PKCα was expressed in 66.7% (20/30) of the normal tissues, with weakly positive, moderately positive, and strongly positive expression in 10, 19, 1 and 0 cases, respectively. p-PKCα/β2 was expressed in 82.6% (100/121) of the GC cases, with weakly positive, moderately positive, and strongly positive expression in 59, 37, and 4 cases, respectively, whereas weakly positive p-PKCα/β2 expression was observed in 32.5% (13/30) of the normal tissues. Akt1 was expressed in 81.8% (99/121) of the GC cases, and weakly positive, moderately
In GC cells, p-PKCα/β2, Akt1, CDK6, Notch4, and PCNA were expressed mainly in the nucleus and the cytoplasm. CDK6 was also expressed in some muscle tissues near GC cells. p-PKCα, CDK2, and XIAP were expressed mainly in the nucleus, with low-level expression in the cytoplasm.

Correlations between protein expression profiles and clinicopathologic parameters of GC
Correlations between protein expression status (negative vs. positive) and clinicopathologic characteristics are summarized in Table 4. Positive p-PKCα/β2 and CDK2 expression in primary GC was associated with older age (p < 0.05). Positive CDK2 expression also correlated with vascular/lymphatic invasion (p = 0.014), advanced N stage (p = 0.042), and advanced TNM stage (p = 0.020).
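Each association in Table 4 compares a 2x2 (or larger) table of protein status against a clinicopathologic category. For 2x2 tables with small counts, Fisher's exact test holds the row and column totals fixed and sums the hypergeometric probabilities of all tables no more likely than the observed one. The pure-Python sketch below implements that two-sided rule; it is an illustration rather than the SPSS procedure used in the study, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Example layout (hypothetical): rows = protein positive/negative,
    columns = invasion present/absent.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum every table no more likely than the observed one; the small
    # relative tolerance absorbs floating-point rounding.
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1))
            if q <= p_observed * (1 + 1e-7))
    return min(1.0, p)
```

For the classic table [[3, 1], [1, 3]] this gives 34/70 ≈ 0.486, the textbook two-sided value.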

Hierarchical cluster analysis of GC and survival-associated feature selection
Hierarchical cluster analysis was performed with eight protein expression profiles from 121 GC cases. Tumors were separated into two clusters based on protein expression patterns (Fig. 2a). Feature selection was performed with the Kaplan-Meier method (univariate analysis) to identify protein expression profiles associated with survival. Overall survival was worse in cases immunopositive for CDK2, Akt1, XIAP, Notch4, and p-PKCα/β2 than in cases immunonegative for these proteins (log-rank test: p = 0.014, 0.026, 0.042, 0.011, and < 0.001, respectively; Fig. 3).
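Feature selection here rests on two steps: the Kaplan-Meier product-limit estimate of survival in each group, and the log-rank test comparing the groups. The sketch below implements both in pure Python for two groups (for example, immunopositive versus immunonegative). It is an illustration of the standard formulas, not the SPSS output behind Fig. 3; the p-value uses the chi-square distribution with 1 degree of freedom, whose upper tail equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, S(t)) at each distinct death time.

    events[i] is 1 for death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censorings at t leave the risk set
    return curve


def log_rank(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e = variance = 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)                 # at risk, both groups
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)    # at risk, group A
        d = sum(1 for s, e, _ in pooled if s == t and e)            # deaths at t
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    stat = o_minus_e ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))
```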
Four factors (PCs 1-4) with eigenvalues > 1.0 were extracted by PCA; together they accounted for 65.16% of the total variance (cumulative sum of squared loadings). PC loadings for each variable are shown in Table 5. PC 1 was heavily loaded with Notch4, p-PKCα/β2, XIAP, CDK2, and Akt1 and was termed the protein factor. PC 2 was heavily loaded with TNM stage and vascular/lymphatic invasion and was termed the pathological factor. PC 3 was heavily loaded with age and tumor size and was termed the clinical factor. PC 4 was heavily loaded with histologic differentiation. PC scores (calculated in SPSS) indicated that the four factors were independent of one another.
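The retention rule above is the Kaiser criterion: when PCA is run on the correlation matrix, each eigenvalue is the variance carried by one component, the eigenvalues sum to the number of variables, and the cumulative percentage is the sum of the retained eigenvalues divided by that total. The pure-Python sketch below shows this chain (correlation matrix, eigenvalues by cyclic Jacobi rotations, Kaiser cut). It is a hedged illustration, not the SPSS procedure: it assumes extraction from the correlation matrix, numeric coding of every variable (for example 0/1 immunostatus), and no constant columns.

```python
from math import atan2, cos, sin, sqrt
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def correlation_matrix(data: Sequence[Sequence[float]]) -> Matrix:
    """Pearson correlations between the columns of data (rows = patients)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    sds = [sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1)) for j in range(p)]
    if any(sd == 0 for sd in sds):
        raise ValueError("constant column: correlation undefined")
    z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]


def _matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def jacobi_eigenvalues(m: Matrix, sweeps: int = 50, tol: float = 1e-12) -> List[float]:
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a, p = [row[:] for row in m], len(m)
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break  # off-diagonal mass is negligible: diagonal holds eigenvalues
        for i in range(p):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Rotation angle that zeroes element (i, j) of R^T A R.
                theta = 0.5 * atan2(2 * a[i][j], a[j][j] - a[i][i])
                c, s = cos(theta), sin(theta)
                rot = [[float(r == k) for k in range(p)] for r in range(p)]
                rot[i][i], rot[j][j], rot[i][j], rot[j][i] = c, c, s, -s
                rot_t = [list(col) for col in zip(*rot)]
                a = _matmul(_matmul(rot_t, a), rot)
    return sorted((a[k][k] for k in range(p)), reverse=True)


def kaiser_retained(eigenvalues: Sequence[float]) -> Tuple[int, float]:
    """Kaiser criterion: keep components with eigenvalue > 1.0 and return
    how many were kept plus the proportion of total variance they explain."""
    kept = [ev for ev in eigenvalues if ev > 1.0]
    return len(kept), sum(kept) / sum(eigenvalues)
```

Under this reading, four retained PCs explaining 65.16% of the variance means their eigenvalues sum to 0.6516 times the number of input variables.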
Because PC 1 was a five-protein factor, we established a risk score model based on these five proteins to predict GC prognosis. For each patient, the risk score was calculated by multiplying the expression status of each protein (immunonegative = 0, immunopositive = 1) by the corresponding regression coefficient from univariate Cox proportional hazards regression analysis and summing the products. The regression coefficients for Notch4, p-PKCα/β2, XIAP, CDK2, and Akt1 were 3.13, 1.52, 0.74, 0.58, and 0.77, respectively. Therefore, in this study, risk score = 3.13 × (Notch4) + 1.52 × (p-PKCα/β2) + 0.74 × (XIAP) + 0.58 × (CDK2) + 0.77 × (Akt1). The distribution of risk scores for the 121 patients (range, 0-6.73) is presented in Fig. 6a. Based on the risk score curve, patients were divided into a low-risk group (score < 6.15; 54 cases) and a high-risk group (score ≥ 6.15; 67 cases). Kaplan-Meier analysis revealed worse overall survival in the high-risk group than in the low-risk group (log-rank test: p = 0.005; Fig. 6b), indicating that the risk score model based on Notch4, p-PKCα/β2, XIAP, CDK2, and Akt1 expression predicts GC prognosis.
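The risk score is a weighted sum, so it can be written down directly from the coefficients and cutoff reported above. In this sketch the dictionary keys and function names are illustrative, and immunostatus is coded as a boolean (True = immunopositive). A quick check of the arithmetic: the published coefficients sum to 6.74 (the reported maximum of 6.73 presumably reflects unrounded coefficients), so a score of at least 6.15 is reached only when all five proteins are positive or when CDK2 (coefficient 0.58) is the only negative one; dropping any other protein lowers the score to 6.00 or less.

```python
from typing import Mapping

# Univariate Cox regression coefficients as reported in the text.
COEFFICIENTS = {"Notch4": 3.13, "p-PKCα/β2": 1.52, "XIAP": 0.74,
                "CDK2": 0.58, "Akt1": 0.77}
HIGH_RISK_CUTOFF = 6.15  # scores >= 6.15 are classed as high risk


def risk_score(status: Mapping[str, bool]) -> float:
    """Sum of coefficient x immunostatus (True/positive = 1, False/negative = 0)."""
    return sum(coef * int(status[name]) for name, coef in COEFFICIENTS.items())


def risk_group(status: Mapping[str, bool]) -> str:
    return "high" if risk_score(status) >= HIGH_RISK_CUTOFF else "low"
```

Usage: a patient positive for every protein except CDK2 scores 6.16 and falls in the high-risk group, whereas one negative only for XIAP scores 6.00 and is classed as low risk.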
Li et al. Biol Res (2019) 52:42
Fig. 3 Expression of a CDK2, b Akt1, c XIAP, d Notch4, e p-PKCα/β2, f p-PKCα, g PCNA and h CDK6 and their association with overall survival. The five proteins with p < 0.05 were chosen by feature selection using the Kaplan-Meier method. p values were determined by log-rank testing.
TNM stage and vascular/lymphatic invasion were categorical covariates. To improve the prognostic efficiency of PC 2 (TNM stage and vascular/lymphatic invasion), we dichotomized PC 2 scores at the median. Kaplan-Meier analysis indicated that the low-score group (≤ median) had better overall survival than the high-score group (> median) (log-rank test: p = 0.004; Fig. 6d). PC 3 scores (age and tumor size) were also dichotomized at the median, and Kaplan-Meier analysis revealed that the low-score group had better overall survival than the high-score group (log-rank test: p = 0.002; Fig. 6e). Thus, PC 2 and PC 3 scores can predict GC prognosis. Histologic differentiation, the variable loading on PC 4, naturally separated GC patients into two groups: patients with poorly differentiated tumors had worse overall survival than those with moderately differentiated tumors.
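The median split described above assigns each patient to the low-score group when the PC score is at or below the median and to the high-score group otherwise; the two groups are then compared with Kaplan-Meier curves and the log-rank test. A minimal sketch of the grouping step (the function name is hypothetical; ties at the median go to the low group, following "≤ median"):

```python
from statistics import median
from typing import List, Sequence


def median_split(scores: Sequence[float]) -> List[str]:
    """Label each PC score 'low' (<= median) or 'high' (> median)."""
    cut = median(scores)
    return ["low" if s <= cut else "high" for s in scores]
```

With an even number of patients, `statistics.median` returns the mean of the two middle scores, so the groups are as balanced as ties allow.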

Discussion
In this study, to facilitate clinical application, we used tissue microarrays of formalin-fixed specimens to evaluate the expression of nine proteins, identified in our previous PPA study [1], in 121 primary GC tissues and 30 normal gastric tissues. All nine proteins were expressed in primary GC tissues, and eight of them (all except CDK2) were expressed in normal gastric tissues. Additionally, eight of the proteins were upregulated in GC tissues, consistent with our previous findings [1].
Hierarchical clustering of immunolabeling data from tissue microarrays of 121 formalin-fixed GC samples with nine GC-associated antibodies yielded patient clusters based on protein expression patterns. The patients in Cluster A (20 cases) had better survival than patients in Cluster B11 (35 cases), Cluster B12 (15 cases), and Cluster B2 (51 cases) (p = 0.006).
Feature selection based on Kaplan-Meier survival analysis yielded Cluster 1 (53 cases) and Cluster 2 (68 cases), which had distinct clinicopathologic features and patient outcomes; Cluster 2 was associated with poorer prognosis than Cluster 1. Among the risk score models subsequently developed to predict clinical outcomes, the PC combination model based on five proteins and five clinicopathological variables was clinically relevant and may be useful for guiding treatment. For example, aggressive chemotherapy may be recommended for patients with high risk scores in this model to improve survival.
Fig. 6 Kaplan-Meier survival analysis of patients with GC and risk score models. a A risk score model based on expression of five proteins. The patients were ranked according to their risk scores; the line divides the patients into low-risk and high-risk score groups. b Kaplan-Meier survival analysis of patients in the low-risk and high-risk score groups based on the five-protein risk score model. c Kaplan-Meier survival analysis of patients with different numbers of immunopositive proteins. d Kaplan-Meier survival analysis of patients in the low-risk and high-risk score groups based on TNM stage and vascular/lymphatic invasion. e Kaplan-Meier survival analysis of patients in the low-risk and high-risk score groups based on tumor size and age. p values were determined by log-rank testing.
We correlated the five proteins obtained from feature selection with clinicopathologic characteristics. Given the generally long interval between tumorigenesis and diagnosis, protein markers could enable earlier diagnosis. Currently, GC is diagnosed based on symptoms, patients' knowledge of the disease, and the patient's overall medical condition. Age is also a major factor in GC diagnosis [26]. In our study, p-PKCα/β2 and CDK2 expression levels were higher in patients > 60 years old than in younger patients. In addition to age, depth of invasion can be considered a prognostic factor and an indicator of GC progression. Previous studies have shown that vascular/lymphatic invasion status correlates with GC progression [27,28]. This study showed that CDK2 upregulation correlates with vascular/lymphatic invasion. Pathway analysis showed that eight of the proteins studied are involved in cellular signaling (p-PKCα and p-PKCα/β2), cell survival and apoptosis (Akt and XIAP), the cell cycle (CDK6 and CDK2), cell differentiation (Notch4), and cell proliferation (PCNA). A deregulated cell cycle is a fundamental aspect of cancer.
Normal cells proliferate only in response to mitogenic or developmental signals, whereas cancer cells proliferate unchecked [29]. In addition, upregulation of PCNA, Akt, and CDK2 has been associated with GC [30][31][32]. Pathway analysis revealed that the proteins evaluated in this study are involved in several canonical signaling pathways, including HER-2 signaling, MAPK signaling, VEGF signaling, and p53 signaling. It has been suggested that HER-2 expression is a prognostic indicator of GC [33]. MAPK signaling mediates many biological events, such as cell proliferation, differentiation, apoptosis, migration, and invasion, in various human cancers, including GC [34,35]. VEGF is a potent angiogenic factor implicated in tumor-induced angiogenesis, which has been shown to be related to GC development and prognosis [36]. p53 is a transcription factor that regulates a complex signal transduction network referred to as the p53 pathway. The p53 tumor suppressor protein plays a critical role in protection from tumor progression by inducing apoptosis or cell cycle arrest [37]. Thus, we speculate that the proteins evaluated in this study are involved in GC progression and prognosis via these pathways.
Association between the PC combination risk score model and survival. a The PC combination risk score model was based on five proteins and five clinicopathological variables from four PCs. The patients were ranked according to their risk scores; the line divides patients into low-risk and high-risk score groups. b Kaplan-Meier survival analysis of patients in the low-risk and high-risk score groups. p values were determined by the log-rank test. c ROC curves for the various prognosis prediction models. Five prediction models, PC combination, PC1, PC2, PC3, and PC4, were included in the analysis. The PC combination and PC1 models were better predictors than the other models (p < 0.05).

Conclusion
In this study, tissues from 121 GC cases were immunolabeled with nine tumor-associated antibodies on tissue microarrays. Hierarchical cluster analysis based on the eight upregulated proteins revealed two clusters with different clinicopathologic features and prognoses. Kaplan-Meier-based feature selection identified five proteins that correlated strongly with overall survival, suggesting that a risk score model including these proteins could predict the prognosis of patients with GC. Pathway analysis showed that these proteins are involved in cancer-related signaling pathways. Future studies will focus on elucidating the roles of these proteins in GC.