Germline DNA Repair Gene Mutations in Young-onset Prostate Cancer Cases in the UK: Evidence for a More Extensive Genetic Panel

Background Rare germline mutations in DNA repair genes are associated with prostate cancer (PCa) predisposition and prognosis. Objective To quantify the frequency of germline DNA repair gene mutations in UK PCa cases and controls, in order to more comprehensively evaluate the contribution of individual genes to overall PCa risk and likelihood of aggressive disease. Design, setting, and participants We sequenced 167 DNA repair and eight PCa candidate genes in a UK-based cohort of 1281 young-onset PCa cases (diagnosed at ≤60 yr) and 1160 selected controls. Outcome measurements and statistical analysis Gene-level SKAT-O and gene-set adaptive combination of p values (ADA) analyses were performed separately for cases versus controls, and aggressive (Gleason score ≥8, n = 201) versus nonaggressive (Gleason score ≤7, n = 1048) cases. Results and limitations We identified 233 unique protein truncating variants (PTVs) with minor allele frequency <0.5% in controls in 97 genes. The total proportion of PTV carriers was higher in cases than in controls (15% vs 12%, odds ratio [OR] = 1.29, 95% confidence interval [CI] 1.01–1.64, p = 0.036). Gene-level analyses selected NBN (pSKAT-O = 2.4 × 10−4) for overall risk and XPC (pSKAT-O = 1.6 × 10−4) for aggressive disease, both at candidate-level significance (p < 3.1 × 10−4 and p < 3.4 × 10−4, respectively). Gene-set analysis identified a subset of 20 genes associated with increased PCa risk (OR = 3.2, 95% CI 2.1–4.8, pADA = 4.1 × 10−3) and four genes that increased risk of aggressive disease (OR = 11.2, 95% CI 4.6–27.7, pADA = 5.6 × 10−3), three of which overlap the predisposition gene set. Conclusions The union of the gene-level and gene-set-level analyses identified 23 unique DNA repair genes associated with PCa predisposition or risk of aggressive disease. These findings will help facilitate the development of a PCa-specific sequencing panel with both predictive and prognostic potential. 
Patient summary This large sequencing study assessed the rate of inherited DNA repair gene mutations between prostate cancer patients and disease-free men. A panel of 23 genes was identified, which may improve risk prediction or treatment pathways in future clinical practice.


Introduction
Besides nonmelanoma skin cancer, prostate cancer (PCa) is the most common solid tumour in men living in the developed world and is responsible for over 300 000 deaths per year worldwide [1], although the majority of PCa cases are diagnosed with low- or intermediate-risk disease. Family history (FH) is a strong risk factor for PCa, and twin studies demonstrate a large contribution by heritable genetic factors [2]. Increasing evidence indicates that both common and rare germline variation contribute to PCa predisposition [3,4]. Rare loss of function (LoF) germline mutations in BRCA2 have convincingly been implicated as contributing to both FH of PCa and increased likelihood of aggressive disease with poor prognosis, whilst lower mutational frequencies or less consistent evidence have also been presented for a small subset of additional DNA repair genes including ATM, BRCA1, BRIP1, CHEK2, GEN1, MSH2, NBN, PALB2, and RAD51D [5][6][7].
In this study, we screened 167 genes from DNA damage response and repair pathways within a large UK-based case-control cohort with long follow-up, to further investigate the role of germline DNA repair gene mutations in PCa predisposition, clinical outcome, and survival. To maximise the power of this study, we utilised young-onset cases (diagnosed at ≤60 yr) and control samples screened for either no PCa FH or low prostate-specific antigen (PSA; <0.5 ng/ml). These results should help inform the composition of future gene panels for clinical screening and risk profiling.

Patients and methods

Study population
PCa cases of self-reported European ancestry were randomly selected from the young-onset (diagnosed at ≤60 yr) subcohort of the UK Genetic Prostate Cancer Study (UKGPCS) [8]. Control men with no FH of PCa were recruited from GP practices participating in UKGPCS, or those with PSA <0.5 ng/ml were recruited from the Prostate Testing for Cancer and Treatment (ProtecT) trial [9]. Cases
Variants were annotated by wANNOVAR [17] using RefSeq Gene definitions [18], and variant consequence was checked using Variant opposing direction of effect, which finds the optimal combination between burden and kernel tests for the tested data [21]. permutations and using the mid p value setting [23].
We subsequently performed an additional gene discovery analysis (test 2) in which ADA was used to identify a candidate gene set rather than individual variants, by collapsing tier 1 mutations with MAF <0.5% in controls on a per-gene basis rather than a variant-level basis (except for CHEK2, where 1100delC was separated from all other CHEK2 PTVs due to its relatively higher frequency), under the assumption that rare

Sequencing and sample summary
After QC, variant data were available for 1281 PCa cases and 1160 control samples. Of 175 genes targeted, three (GTF2H2,

Gene-level association
Gene-level analyses were restricted to genes containing two or more tier 1 and 2 variants. In the case/control analysis (159 genes tested), NBN reached significance (p = 2.4 × 10−4; p = 0.18 for aggressiveness), as did XPC for the aggressive phenotype (146 genes tested; p = 1.6 × 10−4, p = 0.90 for overall PCa). To further investigate these SKAT-O association signals, we used ADA to interrogate the combination of variants contributing to the association (HOXB13 and POLL were also included due to the well-characterised role of HOXB13 in PCa predisposition). For both NBN and HOXB13, ADA identified a single recurrent heterozygous nonsynonymous variant enriched among PCa cases to be responsible for the gene-level signal, whilst for POLL, four of the 15 tested variants were identified to be responsible for potentially modulating risk (three protective and one pathogenic). For XPC, ADA selected six singleton heterozygous variants from the nine variants tested as contributing to the aggressive phenotype, all of which were observed in different individuals (  Fig. 2B). Three of these genes overlap with the case/control gene set (BRCA2, CHEK2, and MSH2), whilst ERCC2 is unique to the aggressive set. In contrast to other CHEK2 PTVs, the CHEK2 1100delC variant was not enriched among aggressive cases.
The combined set of 21 genes identified in these analyses demonstrated a continuum of aggressive phenotype risk (Supplementary Fig. 7), with the upper tail defining predisposition genes with a lower risk of aggressive disease (Fig. 2C). As would be expected, given the phenotype criteria, Agg4 carriers showed significant enrichment for several clinical indicators of aggressive disease (higher PSA, Gleason score, tumour stage, and nodal spread). Predis18 carriers showed no association with any clinical variable (Table 4). A modest increase in PCa FH rate was observed among Predis18 carriers compared with noncarriers, whilst PCa FH rates were lower among Agg4 carriers; however, both these trends were nonsignificant. Suggestive but nonsignificant increases in rates of breast and pancreatic cancer FH were also observed for carriers of the Agg4 gene set (Supplementary Table 5).
Kaplan-Meier survival analysis showed a significant global difference across gene-set carriers (Agg4, Predis18, and noncarriers) for both all-cause and PCa-specific mortality (log-rank test, p all-cause = 9.8 × 10−8, p PCa-specific = 4.1 × 10−6). This is attributable to Agg4 carriers demonstrating significantly worse survival than noncarriers, as survival between Predis18 carriers and noncarriers was very similar. For all-cause survival (Fig. 3A), 5-yr survival rates were 60% for Agg4 (95% CI 34-79%), 93% for Predis18 (95% CI 85-97%), and 89% for noncarriers (95% CI 87-91%). The hazard ratio for Agg4 carriers compared with noncarriers was 2.69 (95% CI 1.32-5.50; Fig. 3C). A similar pattern was observed when considering only PCa-specific survival (Fig. 3B), though hazard ratios were not statistically significant, possibly due to the smaller number of events (212 compared with 282 for all-cause mortality). Five-year PCa-specific survival rates were 60% for Agg4 (95% CI 34-79%), 94% for Predis18 (95% CI 86-98%), and 91% for noncarriers (95% CI 89-92%). The hazard ratio for Agg4 carriers compared with noncarriers was 1.83 (95% CI 0.77-4.39; Fig. 3D).
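To illustrate how 5-yr survival rates of the kind quoted above are read off a Kaplan-Meier curve, the following is a minimal sketch of the product-limit estimator. The follow-up times and event indicators are invented toy values, not study data, and the function names `kaplan_meier` and `survival_at` are hypothetical helpers rather than the software used in this study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each event time.

    times  -- follow-up time per individual (e.g. years)
    events -- 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all individuals sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored individuals both leave the risk set.
        n_at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function)."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Toy data only (hypothetical): years of follow-up and death indicators.
toy_times = [1, 2, 3, 4, 6, 6, 7, 8, 9, 10]
toy_events = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
print(f"Toy 5-yr survival: {survival_at(toy_curve, 5):.3f}")
```

Censored individuals contribute to the risk set up to their last follow-up, which is why the estimate differs from the simple proportion still alive at 5 yr.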

Discussion
Direct sequencing approaches are required to investigate the effect of rarer germline variants in complex disease predisposition; however, to date, these studies in PCa have generally been smaller in size, considered only a handful of candidate genes, or lacked control cohorts. In this study, we investigated the role of DNA repair and damage response genes in predisposition to PCa and aggressive disease in a case/control cohort. We focused on protein truncating (tier 1) and predicted conserved (tier 2) variants using both gene-level SKAT-O and gene-set-level ADA analyses. Gene-level analysis of tier 1 and 2 variants identified significant associations in NBN for PCa predisposition and XPC for disease aggressiveness. The NBN signal was refined by ADA to rs61753720, a G>T single nucleotide variant (SNV) resulting in a D95N substitution. A previous study by the ICPCG consortium found this variant at a low frequency in both unselected (1/613) and familial (1/121) Finnish PCa cohorts, and absent (0/440) in controls [26]. For the association between the XPC gene and a higher Gleason score, ADA selected multiple singleton SNVs across the gene. Both POLL and HOXB13 were also marginally associated with PCa predisposition in the case/control analysis. Since the role of HOXB13 rs138213197 in PCa risk has been well established, sample size may have been a limiting factor in achieving Bonferroni-corrected significance, suggesting that POLL may also warrant additional follow-up in larger cohorts or meta-analyses of individual studies.
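The candidate-level significance thresholds reported for the gene-level tests (p < 3.1 × 10−4 for overall risk and p < 3.4 × 10−4 for aggressive disease) are consistent with a Bonferroni correction over the 159 and 146 genes tested. The short sketch below reproduces that arithmetic; the nominal α of 0.05 is an assumption, and `bonferroni_threshold` is a hypothetical helper name.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction.

    alpha = 0.05 is an assumed nominal level, not taken from the text.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Gene counts and p values as reported for the gene-level SKAT-O tests.
analyses = {
    "NBN, case/control": {"n_genes": 159, "p": 2.4e-4},
    "XPC, aggressive": {"n_genes": 146, "p": 1.6e-4},
}

for label, a in analyses.items():
    a["threshold"] = bonferroni_threshold(a["n_genes"])
    a["significant"] = a["p"] < a["threshold"]
    print(f"{label}: p = {a['p']:.1e}, "
          f"threshold = {a['threshold']:.2e}, "
          f"significant = {a['significant']}")
```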
Gene-set-level analysis identified 20 genes in which PTVs were associated with PCa predisposition. These included the established BRCA1/2 genes, a handful of additional genes that have been indicated previously as prospective PCa candidates (ATM, CHEK2, GEN1, MSH2, and RNASEL), and several novel genes for which limited substantive evidence for a role in PCa predisposition has been presented to date (BLM, CDC25C, ERCC3, LIG4, MSH5, NEIL2, NHEJ1, PARP2, POLD1, POLE, POLM, RECQL4, and TDP1). We furthermore identified four genes associated with a more aggressive PCa phenotype, three of which overlapped the 20-gene PCa predisposition set. These include BRCA2, for which association with a more aggressive phenotype has reliably been demonstrated [6,7,27,28], whilst we also present evidence that carriers of PTVs in MSH2, CHEK2 (excluding 1100delC), and ERCC2 have a substantially higher likelihood of developing aggressive disease. Our criteria to stratify cases for the aggressive phenotype analysis (Gleason score ≤7 vs ≥8) were chosen to maximise the homogeneity and risk of the aggressive group. Within the Gleason 7 category, however, Gleason 4 + 3 patients have poorer prognosis than Gleason 3 + 4 patients, with these two subgroups categorised separately according to the prognostic grade grouping method [29]. We therefore compared the results of our aggressive analysis with those of Gleason 4 + 3 cases reclassified as aggressive, equivalent to grade group ≤2 versus ≥3 (n = 924 vs 324) instead of  set, we nevertheless observed substantial enrichment over noncarriers for nodal invasion (38% vs 9.5%), metastatic disease (18% vs 11%), and reduced survival (PCa-specific 5-yr survival rate 60% vs 91%), suggesting that these genes could potentially demonstrate clinical utility for the identification of individuals at a higher risk of advanced disease prior to progression.
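The two aggressiveness definitions compared here differ only in how Gleason 4 + 3 cases are classified. The sketch below makes the distinction explicit using the standard five-tier prognostic grade groups; the function names and the `definition` labels are hypothetical and serve only as an illustration, not as the code used for the analysis.

```python
def grade_group(primary, secondary):
    """Map primary and secondary Gleason patterns to a grade group (1-5)."""
    if not (1 <= primary <= 5 and 1 <= secondary <= 5):
        raise ValueError("Gleason patterns must be between 1 and 5")
    score = primary + secondary
    if score <= 6:
        return 1
    if score == 7:
        # 3 + 4 is grade group 2; 4 + 3 is grade group 3.
        return 2 if primary <= 3 else 3
    if score == 8:
        return 4
    return 5  # Gleason score 9 or 10


def is_aggressive(primary, secondary, definition="gleason_ge8"):
    """Classify a case under either aggressiveness definition.

    'gleason_ge8'    -- main analysis: Gleason score >= 8
    'grade_group_ge3' -- sensitivity analysis: 4 + 3 also counted as aggressive
    """
    if definition == "gleason_ge8":
        return primary + secondary >= 8
    if definition == "grade_group_ge3":
        return grade_group(primary, secondary) >= 3
    raise ValueError(f"unknown definition: {definition}")


# A Gleason 4 + 3 case changes category between the two definitions,
# whereas a Gleason 3 + 4 case is nonaggressive under both.
for patterns in [(3, 4), (4, 3), (4, 4)]:
    print(patterns,
          is_aggressive(*patterns, definition="gleason_ge8"),
          is_aggressive(*patterns, definition="grade_group_ge3"))
```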
The absence of BRCA1 and ATM from our aggressive gene set is, however, notable, as PTVs in these genes have been implicated in increased risks of metastatic and lethal PCa previously [6,7,30]. This discrepancy may in part reflect our use of Gleason score to define aggressive disease due to the modest proportion of patients with metastatic disease in our unselected cohort (7.2% of overall cohort, 11% excluding unknown status) in comparison with the more stringent metastatic or lethality indicators employed elsewhere in cohorts enriched for these outcomes, or alternatively that these genes confer lower influence upon aggressiveness in younger patients. It is also noteworthy that whilst CHEK2 was associated with PCa predisposition for both 1100delC and other PTVs, only the non-1100delC CHEK2 variants were found to contribute towards aggressive disease in our study. This observation, however, contrasts with a recent report in which only the 1100delC variant and not overall CHEK2 mutations were enriched in lethal PCa patients [31], and therefore requires further validation in independent cohorts. These combined reports could, however, potentially indicate that the downstream functional consequence of the 1100delC founder mutation may partly differ from those of other CHEK2 PTVs in prostate tissue.
Whilst the novel genes that we have identified represent exciting candidate moderate-penetrance PCa-risk genes, these findings nonetheless require additional validation in independent cohorts. In particular, we note that the optimal p value truncation thresholds used by ADA are tuned towards greater sensitivity than specificity to maximise power for rare variant discovery in sequencing study sample sizes, and no suitable replication set was available for confirmation of our findings. Furthermore, even though this is the largest DNA repair gene germline sequencing study for PCa to date, our power to detect rare associations with moderate effect sizes remained modest.
Whilst our strategy of using screened controls (no PCa FH or PSA <0.5 ng/ml) potentially increased our power to detect associations, this also has the potential to introduce bias in our case/control analyses. We therefore cannot completely exclude the possibility that the use of PSA or FH in our control selection criteria led to an observed depletion of LoF variants among controls; although this would imply a uniform direction and comparatively high penetrance of effects across a wide range of DNA repair genes and pathways should these associations have been driven exclusively by extraneous variables such as low PSA levels independently of PCa.

Conclusions
In this study, we confirmed previous PCa predisposition gene reports and also present evidence for additional novel genes. Our combined gene and gene-set-level analyses provide evidence for a prospective screening panel of 23 genes that may facilitate identification of individuals at a higher PCa risk prior to disease onset, who would warrant enhanced screening. In addition, PCa patients who are carriers of mutations in these genes could potentially benefit from personalised treatment pathways [27,32]. We believe that these genes warrant evaluation by the wider scientific and clinical communities in larger prospective studies or meta-analyses. There is also a need to formally test the ability of these genes to predict survival in an independent cohort within aggressiveness strata.
Author contributions: Zsofia Kote-Jarai had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.