Heritable genomic diversity in breast cancer driver genes and associations with risk in a Chilean population

Background: Driver mutations are the genetic components responsible for tumor initiation and progression. These variants, which may be inherited, influence cancer risk and therefore underlie many familial cancers. The present study examines the potential association between SNPs in driver genes SF3B1 (rs4685), TBX3 (rs12366395, rs8853, and rs1061651) and MAP3K1 (rs72758040) and BC in BRCA1/2-negative Chilean families.
Methods: The SNPs were genotyped in 486 BC cases and 1258 controls by TaqMan Assay.
Results: Our data do not support an association between rs4685:C > T, rs8853:T > C, or rs1061651:T > C and BC risk. However, the rs12366395-G allele (A/G + G/G) was associated with risk in families with a strong history of BC (OR = 1.2 [95% CI 1.0–1.6] p = 0.02 and OR = 1.5 [95% CI 1.0–2.2] p = 0.02, respectively). Moreover, rs72758040-C was associated with increased risk in cases with a moderate-to-strong family history of BC (OR = 1.3 [95% CI 1.0–1.7] p = 0.02 and OR = 1.3 [95% CI 1.0–1.8] p = 0.03 respectively). Finally, risk was significantly higher in homozygous C/C cases from families with a moderate-to-strong BC history (OR = 1.8 [95% CI 1.0–3.1] p = 0.03 and OR = 1.9 [95% CI 1.1–3.4] p = 0.01, respectively). We also evaluated the combined impact of rs12366395-G and rs72758040-C. Familial BC risk increased in a dose-dependent manner with risk allele count, reflecting an additive effect (p-trend = 0.0002).
Conclusions: Our study suggests that germline variants in driver genes TBX3 (rs12366395) and MAP3K1 (rs72758040) may influence BC risk in BRCA1/2-negative Chilean families. Moreover, the presence of rs12366395-G and rs72758040-C could increase BC risk in a Chilean population.

susceptibility genes account for about half of hereditary BRCA1/2-negative BC cases, leaving much of the genetic risk unexplained.
Cancer is fundamentally a genomic disease. While numerous somatic mutations accumulate during tumorigenesis, the majority of these variants are neutral "passenger" mutations. Those variations that do contribute to tumorigenesis are known as "driver" mutations [2]. A tumor typically contains 2-8 driver mutations that initiate carcinogenesis [3][4][5]. The driver mutations and mutational processes operative in BC have not yet been comprehensively defined [6].
Several studies have used next-generation sequencing (NGS) to identify potential driver mutations [6][7][8][9]. Various relevant genes have been identified in sporadic breast tumors: ARID1B, CASP8, MAP3K1, MAP3K13, NCOR1, SMARCD1, CDKN1B, AKT2, and TBX3. However, there has been scant research into the possibility that these driver genes contain inherited variants that influence the development of cancer [10]. In one of these few studies, Göhler et al. [10] investigated whether known driver genes contain heritable variants that influence risk and/or survival in Swedish BC patients. That group evaluated selected single-nucleotide polymorphisms (SNPs) located in 15 genes that have consistently been classified as BC driver genes by NGS. Five genes were associated with BC risk: TBX3 (rs2242442) was associated with decreased risk; TTN (rs10497520) and MAP3K1 (rs702688 and rs72758040) were associated with increased risk; MLL2 (rs11168827) was associated with increased overall risk, positive hormone receptor status, and low-grade tumors; and SF3B1 (rs4688) had a protective effect and was associated with negative lymph node findings, metastasis, and hormone receptor status [11].
Considering that variations in these novel driver genes had not been assessed in any Latin American population, our group performed an association study on germline variations in BC driver genes in a Chilean population. We evaluated associations of SNPs in the driver genes TTN (rs10497520), TBX3 (rs2242442), MLL2 (rs11168827), and MAP3K1 (rs702688 and rs702689) with BC risk in BRCA1/2-negative Chilean families. The results did not support an association between rs702688:A > G (MAP3K1) or rs702689:G > A (MAP3K1) and risk. The rs10497520 (TTN) T allele was associated with decreased risk in patients with a family history of BC or early-onset BC (OR = 0.6, p < 0.0001 and OR = 0.7, p = 0.05, respectively), and rs2242442-G (TBX3) also demonstrated a protective effect (OR = 0.6, p = 0.02). In contrast, rs11168827-C (MLL2) was linked to increased risk in families with a strong history of BC (OR = 1.4, p = 0.05) [12].
The SF3B1 gene encodes subunit 1 of the splicing factor 3b protein complex. Several studies have identified SF3B1 mutations in solid tumors, including 9.7% of uveal melanomas, 4% of pancreatic cancers, and 1.8% of BC [13]. The T-box 3 gene (TBX3) is a member of the T-box gene family. Functional analysis has shown that T-box family members are transcription factors with a highly conserved DNA binding domain known as the T-box and a nuclear localization signal. These proteins can activate and/or repress target genes by binding to T-elements [14]. TBX3 is a critical developmental regulator of several structures but has no known function in adult tissue. Nevertheless, TBX3 is frequently overexpressed in several cancers, such as colon cancer, hepatocarcinoma, melanoma, chondrosarcoma, and BC. The identification of TBX3 mutations in breast tumor samples suggests that TBX3 is a driver gene in BC [15]. The protein MAP3K1 acts within the MAP kinase signaling pathway, which triggers expression of genes important for angiogenesis, proliferation, and cell migration [6]. Moreover, there is evidence to suggest that MAP3K1 is a potential driver gene in BC [6]. In addition, posttranslational modifications of TBX3 include phosphorylation at 29 sites, a process in which some MAP kinase proteins are involved. It is therefore important to determine whether inherited genetic variants in the SF3B1, TBX3, and MAP3K1 genes affect BC risk.
The present study evaluates the association of specific SNPs and SNP-SNP interactions in the driver genes SF3B1, TBX3, and MAP3K1 with familial and early-onset sporadic BC, studying cases and controls from Chilean families who are negative for BRCA1/2 point mutations. A case-control design was used to explore the relationship between BC susceptibility and the following SNPs: rs4685 (SF3B1), rs12366395 (TBX3), rs72758040 (MAP3K1), rs8853 (TBX3), and rs1061651 (TBX3). Moreover, we assessed the SNP-SNP interaction for rs12366395 and rs72758040 to evaluate their combined effect on BC risk. The SNPs in this study were selected based on their genetic location and their possible consequences within the gene. In addition, it is important to replicate previous association studies of these SNPs in other populations in order to confirm their effect on BC risk.

Results
Association between rs4685, rs12366395, rs72758040, rs8853, and rs1061651 SNPs and familial or early-onset sporadic breast cancer in non-carriers of BRCA1/2 mutations
The whole case sample was subdivided into two groups: cases with two or more family members with BC and/or OC (n = 308) (subgroup A) and non-familial early-onset BC (diagnosis at ≤ 50 years of age) (n = 178) (subgroup B). Table 1 shows the genotype and allele frequencies of the rs4685:C > T (SF3B1), rs12366395:A > G (TBX3), rs72758040:G > C (MAP3K1), rs8853:C > T (TBX3), and rs1061651:T > C (TBX3) polymorphisms in the whole data set, subgroups A and B, and controls. The observed genotype frequencies were in Hardy-Weinberg equilibrium for four of the five polymorphisms in controls (rs4685:C > T, p = 0.06; rs12366395:A > G, p = 0.834; rs8853:T > C, p = 0.164; rs1061651:T > C, p = 0.122), while the p-value was < 0.0001 for rs72758040:G > C. The single-locus analysis showed no significant differences between cases and controls in terms of genotype or allele distribution for rs4685:C > T, rs8853:T > C, or rs1061651:T > C, for the whole case group or either subgroup (p > 0.05) (Table 1).
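The Hardy-Weinberg check reported above can be illustrated with a minimal sketch of a chi-square goodness-of-fit test on the genotype counts of a biallelic SNP. The function name `hwe_chisq` and the genotype counts in the usage lines are hypothetical and are not taken from Table 1. The sketch applies no continuity correction, which is an assumption, so its p-values may differ from those of the package used in the study.

```python
import math

def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Takes the counts of the three genotypes of a biallelic SNP and returns
    (chi-square statistic, p-value) with 1 degree of freedom.
    Assumption: no continuity correction is applied.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p                      # frequency of the second allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical control genotype counts (not the study data)
chi2, p_value = hwe_chisq(820, 390, 48)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A small p-value, as reported for rs72758040 in controls, indicates that the observed genotype counts depart from the proportions expected under Hardy-Weinberg equilibrium.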
However, the genotype and allele distribution for rs12366395:A > G (located in the TBX3 gene) was significantly different for controls vs. the whole sample of BRCA1/2-negative cases and vs. subgroup A (p < 0.05) (Table 1). The minor allele frequency (MAF) (allele G) was higher in the whole sample (14.2%) and in subgroup A (15.3%) than in controls (11.4%) (OR = 1.2 [95% CI 1.0-1.6] p = 0.02; OR = 1.4 [95% CI 1.0-1.8] p = 0.009, respectively). Furthermore, we observed a significantly increased BC risk for heterozygous individuals (A/G) and allele G carriers (A/G + G/G) in the whole sample (OR = 1.3 [95% CI 1.0-1.7] p = 0.01; OR = 1.3 [95% CI 1.0-1.7] p = 0.01, respectively). BC risk was also significantly higher in cases with genotype A/G and in allele G carriers from subgroup A (OR = 1.4 [95% CI 1.0-1.9] p = 0.01 and OR = 1.4 [95% CI 1.1-1.9] p = 0.008, respectively). We also analyzed the relationship between rs12366395:A > G and BC risk according to the number of BC and/or OC cases per family (Table 2). No association between rs12366395:A > G and BC risk was found in cases from families with two BC/OC cases. However, BC risk was significantly higher in cases with three or more family members affected by BC and/or OC. In these families, the G allele frequency was 16.2% in BC cases vs. 11.4% in controls (OR = 1.5 [95% CI 1.0-2.1] p = 0.02), and both heterozygous individuals and allele G carriers had a significantly increased BC risk (OR = 1.5 [95% CI 1.0-2.2] p = 0.04 and OR = 1.5 [95% CI 1.0-2.2] p = 0.02, respectively) (Table 2). These results suggest that the G allele and G-allele carrier genotypes are associated with risk in the context of a strong family history of BC. No association was found between rs12366395 and non-familial early-onset BC (≤ 50 years) (Table 1).
Similarly, the rs72758040:G > C (MAP3K1 gene) genotype and allele distribution differed significantly between controls and the whole group of cases and between controls and subgroup A (p < 0.05) (Table 1). The MAF (allele C) was significantly higher in the whole sample (22.3%) and in cases with two or more family members with BC and/or OC (24.4%) vs. controls (19.0%) (OR = 1.2 [95% CI 1.0-1.4] p = 0.02; OR = 1.3 [95% CI 1.0-1.6] p = 0.003, respectively). This result indicates that the C allele is associated with increased BC risk. We also observed increased BC risk for homozygous C/C individuals in the whole sample and subgroup A cases (OR = 1.6 [95% CI 1.1-2.3] p = 0.01; OR = 1.9 [95% CI 1.3-3.0] p = 0.001, respectively). We then assessed the effect of rs72758040-C according to the number of BC and/or OC cases per family (Table 2). The MAF (allele C) was significantly higher in families with two BC/OC cases (24.1%) and with three or more cases (24.5%) than in controls (19.0%) (Table 2). Furthermore, BC risk was significantly higher in homozygous C/C individuals, both in the families with two BC/OC cases and in those with three or more cases (OR = 1.8 [95% CI 1.0-3.1] p = 0.03; OR = 1.9 [95% CI 1.1-3.4] p = 0.01, respectively). No association was observed between rs72758040 and cases from subgroup B. These results suggest that the C allele and C/C genotype are associated with elevated BC risk in cases with a family history of BC (Table 1).
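The odds ratios and 95% confidence intervals reported in this section come from 2 × 2 comparisons, such as carriers vs. non-carriers in cases and controls. The sketch below shows one common way to compute such an estimate. It uses the Woolf (logit) interval, which is an assumption because the text does not state which interval method was applied. The function name `odds_ratio_ci` and the counts in the usage lines are hypothetical and are not taken from Tables 1 or 2.

```python
import math

def odds_ratio_ci(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed, z=1.96):
    """Odds ratio with a Woolf (logit) confidence interval for a 2 x 2 table.

    'Exposed' means carrying the allele or genotype of interest.
    z = 1.96 gives an approximate 95% interval.
    Assumption: all four cells are non-zero.
    """
    odds_ratio = (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)
    # Standard error of log(OR)
    se = math.sqrt(
        1 / case_exposed + 1 / case_unexposed + 1 / ctrl_exposed + 1 / ctrl_unexposed
    )
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical carrier counts (not the study data)
or_, lower, upper = odds_ratio_ci(130, 356, 270, 988)
print(f"OR = {or_:.2f} [95% CI {lower:.2f}-{upper:.2f}]")
```

An interval whose lower bound is close to 1.0, as in several of the results above, signals a modest effect whose significance depends on the test used, which in this study was Fisher's exact test.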

Combined effect of TBX3 rs12366395-G and MAP3K1 rs72758040-C alleles on breast cancer risk
As noted, TBX3 and MAP3K1 are driver or potential driver genes. Because the results indicated that rs12366395-G and rs72758040-C are associated with BC risk, we evaluated the combined effect of the two SNPs. Cases were divided into five groups for this analysis, according to risk allele count: zero (A/A + G/G), one (A/A + G/C, A/G + G/G), two (A/A + C/C, G/G + G/G, A/G + G/C), three (A/G + C/C, G/G + G/C), or four (G/G + C/C). Table 3 shows that the combined genotype distribution differed significantly in controls vs. the whole BC sample and in controls vs. subgroup A (global p = 0.009 and 0.0002, respectively), and BC risk increased in a dose-dependent manner with the number of risk alleles in the whole case group and subgroup A (p-trend = 0.01 and 0.001, respectively). No additive effect was observed in the early-onset BC group (diagnosis at ≤ 50 years of age). We also analyzed this additive effect according to the number of BC and/or OC cases per family (Table 4). BC risk was elevated in the families with two BC and/or OC cases as well as in the families with the strongest history of BC (p-trend = 0.05 and 0.003, respectively). These results indicate an additive effect of TBX3 rs12366395 and MAP3K1 rs72758040 on the risk conferred.
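The grouping by risk allele count can be made concrete with a short sketch. Each combined genotype is written as the rs12366395 (TBX3) genotype followed by the rs72758040 (MAP3K1) genotype, with G as the TBX3 risk allele and C as the MAP3K1 risk allele. The function name `risk_allele_count` and the "X/Y" string encoding are illustrative assumptions, not part of the study's workflow.

```python
def risk_allele_count(tbx3_genotype, map3k1_genotype):
    """Count risk alleles across the two SNPs.

    tbx3_genotype:   rs12366395 genotype, e.g. "A/G" (risk allele G)
    map3k1_genotype: rs72758040 genotype, e.g. "G/C" (risk allele C)
    Note that G is the risk allele for TBX3 but the reference allele
    for MAP3K1, so each SNP is scored against its own risk allele.
    """
    tbx3_risk = tbx3_genotype.split("/").count("G")
    map3k1_risk = map3k1_genotype.split("/").count("C")
    return tbx3_risk + map3k1_risk

# The five groups described in the text
groups = {
    0: [("A/A", "G/G")],
    1: [("A/A", "G/C"), ("A/G", "G/G")],
    2: [("A/A", "C/C"), ("G/G", "G/G"), ("A/G", "G/C")],
    3: [("A/G", "C/C"), ("G/G", "G/C")],
    4: [("G/G", "C/C")],
}
for expected_count, combos in groups.items():
    for tbx3, map3k1 in combos:
        assert risk_allele_count(tbx3, map3k1) == expected_count
```

The loop confirms that the nine possible combined genotypes fall into the five groups exactly as listed.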

Discussion
Cancer is essentially a disease of the genome, and a large number of somatic mutations accumulate during the process of tumorigenesis. Some of those mutations contribute to tumor initiation/progression and are known as driver mutations [2]. The driver mutations and mutational processes underlying BC have not been comprehensively explored [6].
The T-box transcription factor 3 gene (TBX3) is a member of a gene family that shares a common DNA-binding domain, the T-box. T-box genes encode a transcription factor that regulates stem cell pluripotency-associated and reprogramming factors and is involved in normal breast development [18,19]. Furthermore, TBX3 overexpression has been observed in primary breast tumors and BC cell lines with elevated expression in estrogen receptor-positive tumor cells [20]. Recently, somatic variations in TBX3 have been classified as BC driver mutations [6-9, 21, 22]. Göhler et al. [10] studied the rs12366395 germline variation in a Swedish cohort, reporting that
Table 3 Combined effect of rs12366395 (TBX3) and rs72758040 (MAP3K1) on breast cancer risk
, identified a G-to-A somatic mutation that mapped within the δ-catenin 5′ leader region, nine nucleotides upstream of the AUG codon. The presence of the A allele in reporter mRNAs resulted in a three- to seven-fold increase in protein expression relative to mRNAs harboring the G allele, with no effect on mRNA levels. Therefore, given the location of rs12366395 in the TBX3 5′UTR, this SNP could produce an increase in TBX3 protein levels in cells, which could explain the effect on BC risk.
The SNP rs72758040 is located in the promoter region of the MAP3K1 gene, 439 nt upstream of the transcription start site (ENST00000399503.4) [10]. There is evidence to suggest that MAP3K1 is a potential driver gene in BC and acts within the MAP kinase signaling pathway, which triggers the expression of genes crucial for angiogenesis, proliferation, and cell migration [6]. Thus, it is important to determine whether the SNP rs72758040 contributes to hereditary BC risk in the Chilean population. In this study, we found that rs72758040 was significantly associated with familial BC risk in a Caucasian-Amerindian South American population. These results are in agreement with those published by Göhler et al. in a Swedish sporadic BC cohort. Both studies observed an increased BC risk for homozygous C/C individuals. Nevertheless, there are no other publications in the literature on MAP3K1 rs72758040 and BC. Therefore, the results should be replicated in other populations to clarify the role of this SNP in risk. Moreover, as MAP3K1 seems to be a potential driver gene, functional studies would be helpful to confirm that this SNP is a BC driver variation.
One important issue to consider is that the genotype distribution of rs72758040 in the MAP3K1 gene is in Hardy-Weinberg disequilibrium, which could distort the results. The possibility that different selective factors may directly or indirectly alter the association between rs72758040 and BC risk cannot be ruled out.
Table 4 Combined effect of rs12366395 (TBX3) and rs72758040 (MAP3K1) on breast cancer risk according to number of BC cases per family
(a) 0 risk alleles: A/A + G/G; 1 risk allele: A/A + G/C, A/G + G/G; 2 risk alleles: A/A + C/C, G/G + G/G, A/G + G/C; 3 risk alleles: A/G + C/C, G/G + G/C; 4 risk alleles:
As our results showed that SNPs rs12366395 (TBX3) and rs72758040 (MAP3K1) were associated with BC risk, we evaluated their combined effect and constructed a genetic score based on risk allele count. A dose-response association was observed for familial BC (Table 3). As noted above, TBX3 is a transcription factor frequently overexpressed in various types of human cancers, including BC [10], while the MAP3K1 gene induces the MAP kinase pathway. There is no information in the literature regarding the interaction between these two genes. To assess for an interaction between TBX3 and MAP3K1 proteins that could explain a synergistic effect on risk, we used the default parameters of the STRING software v11.0 (https://string-db.org/) to analyze the TBX3-MAP3K1 interaction. We found that TBX3 was directly related to MAPK1 (Fig. 1), a protein that interacts directly with MAP3K1 in MAP kinase signaling. Further studies are necessary to evaluate the functional impact of rs12366395-G (TBX3) and rs72758040-C (MAP3K1) on BC tumorigenesis.
Although our study provides evidence for an association of rs12366395 and rs72758040 with BC risk, certain limitations must be considered. Firstly, the genotype distribution of rs72758040 did not conform to Hardy-Weinberg expectations, which may distort the results. Secondly, the sample size of the whole group in the present study is sufficient to yield 80% power; nevertheless, the sample size limits the subgroup analyses. Therefore, these results should be replicated using subgroups with larger sample sizes.

Conclusion
Our study suggests that germline variants in driver genes TBX3 (rs12366395) and MAP3K1 (rs72758040) may influence BC risk in BRCA1/2-negative Chilean families. Moreover, the presence of rs12366395-G and rs72758040-C could increase BC risk in a Chilean population. Given that this is the first association study of these SNPs in a South American population, analyses in other populations would be helpful to clarify their role in BC tumorigenesis. Furthermore, functional studies should be performed to determine the biological impact of these mutations.

Families
We reviewed records from the Servicio de Salud del Area Metropolitana de Santiago, Corporación Nacional del Cáncer (CONAC) and private providers in Santiago to identify BC patients from high-risk BRCA1/2-negative Chilean families. A total of 486 women with BC were enrolled (one case per family). We tested index cases for BRCA1 and BRCA2 mutations as previously described [16], then developed pedigrees based on the index case with the greatest probability of carrying a deleterious mutation. None of the families met strict criteria for BC-related syndromes such as Li-Fraumeni, ataxia-telangiectasia, or Cowden disease.
We performed extensive ancestry interviews with several family members of each case, including persons from different generations. All families self-reported exclusive Chilean ancestry for multiple generations. Table 5 shows the specific characteristics of the families selected according to the inclusion criteria. A total of 18.1% (88/486) of the study families had cases of bilateral BC; 9.7% (47/486) had cases of both BC and ovarian cancer (OC); and 1.1% (5/486) had cases of male BC. The mean age at diagnosis was 44.3 years, with 78.4% of cases diagnosed before 50 years of age.
This study was approved by the Institutional Review Board of the University of Chile School of Medicine (Grant Number 1200049, March 2020). Written informed consent was obtained from all participants. All methods were performed in accordance with the relevant guidelines and regulations.

Control population
CONAC files were also reviewed to recruit healthy women (control group n = 1258). Controls were unrelated to the study families and reported no personal or family history of cancer. All controls confirmed that they were of Chilean ancestry, and over 90% were residents of Santiago. Case and control groups were matched for age and socioeconomic status. Participants provided written informed consent, and DNA samples were obtained in accordance with all ethical and legal requirements.

Genotyping analysis
Genomic DNA was extracted from peripheral blood lymphocytes of the 486 cases from the selected high-risk families and 1258 controls, following Chomczynski and Sacchi [17].

Statistical analysis
The Hardy-Weinberg equilibrium assumption was assessed in the control sample using a goodness-of-fit chi-square test (HWChisq function, "HardyWeinberg" package v1.4.1). Fisher's exact test was used to test the association between genotypes and/or alleles for cases and controls. Odds ratios (OR) with 95% confidence intervals (CI) were calculated to estimate the strength of the associations in cases and controls. For all analyses, the level of significance was set at p ≤ 0.05. GraphPad Prism software (v6.0 for Windows 10, CA, USA, www.graphpad.com) was used for the Fisher's exact test and odds ratio analyses. A chi-square test for trend was performed to identify any additive effects of the SNPs ('p-trend' was determined using Stata/MP v13.0 for Windows 10, StataCorp, College Station, TX, USA; 'p-trend' package).
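The chi-square test for trend mentioned above can be sketched as a Cochran-Armitage-type statistic with one degree of freedom, computed over groups ordered by risk allele count. This is an illustrative implementation under stated assumptions, not the Stata routine used in the study, and it is not guaranteed to reproduce that routine's output. The function name `chi2_trend`, the default scores 0, 1, 2, ..., and the counts in the usage lines are hypothetical.

```python
import math

def chi2_trend(cases, controls, scores=None):
    """Chi-square test for linear trend in proportions (1 df).

    cases, controls: counts per ordered group (e.g. 0-4 risk alleles).
    scores: numeric score per group; defaults to 0, 1, 2, ...
    Returns (chi-square statistic, p-value).
    """
    if scores is None:
        scores = list(range(len(cases)))
    totals = [r + c for r, c in zip(cases, controls)]
    n = sum(totals)
    p_bar = sum(cases) / n  # overall proportion of cases
    # Score-weighted excess of cases over the number expected with no trend
    t = sum(x * (r - m * p_bar) for x, r, m in zip(scores, cases, totals))
    # Spread of the scores, weighted by group size
    sxx = (
        sum(m * x * x for x, m in zip(scores, totals))
        - sum(m * x for x, m in zip(scores, totals)) ** 2 / n
    )
    chi2 = t * t / (p_bar * (1.0 - p_bar) * sxx)
    # Survival function of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts for 0, 1, 2, 3 and 4 risk alleles (not the study data)
chi2, p_trend = chi2_trend([150, 120, 30, 7, 1], [760, 400, 85, 12, 1])
print(f"chi2 = {chi2:.2f}, p-trend = {p_trend:.4f}")
```

Using equally spaced scores treats each additional risk allele as contributing the same increment, which matches the additive interpretation given in the Results.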

Methodology authority statement
All methods used in this study can be found in a previously published article by our group [12].
