Computational Prediction of Tumor-Specific Antigens as Potential Vaccine Candidates against Germ-line Mutations in Endometrial Cancer

Endometrial cancer is the fourth most common cancer in women. It arises from the endometrium and is accompanied by abnormal cell growth. Signs and symptoms include pelvic pain and abnormal vaginal bleeding. It falls into two categories: Type 1 tumors are estrogen-dependent and carry mutations in PTEN and PIK3CA, while Type 2 tumors are more sensitive and carry mutations in TP53. Overactivation of the PI3K signaling pathway results in anti-apoptosis. This study aims to identify tumor-specific antigens (TSAs) for germline mutations in endometrial cancer that can be used as potential vaccine candidates. Germline mutation data were obtained from the Cancer Gene Census of the COSMIC database, and mutated genes with a crucial role in endometrial cancer were considered. Peptide libraries were generated using the Peptide Design Library. Human leukocyte antigen (HLA) alleles were identified for the peptide library through NetMHC, and the binding affinities of the alleles with the peptides were determined. Linear regression was performed to generate graphs. PTEN, TP53, PIK3CA, KRAS, and CTNNB1 proved to have a critical role. About 575 overlapping peptides were generated, each with a length of 18-20 amino acids. Approximately 58 peptides with strong HLA interactions were identified. Regression analysis shows that the number of mutations is directly associated with the binding affinity of the peptides. We therefore suggest that the identified TSAs can be used as personalized peptide vaccines that directly target the mutated genes in endometrial cancer. These findings can be taken forward for laboratory validation.


Introduction
Endometrial cancer (EC) is the sixth most commonly diagnosed cancer among women worldwide; it arises from the endometrium and is accompanied by abnormal cell growth [1,2]. The uterus is the hollow, pear-shaped pelvic organ in women where fetal development occurs. Endometrial cancer, sometimes also called uterine cancer, begins in the layer of cells that forms the lining (endometrium) of the uterus and results from the abnormal growth of cells with the ability to invade or spread to other parts of the body. It occurs most commonly after menopause [3]. Most ECs are low-grade (G1 or G2) tumors that are diagnosed at early stages [1,4].
It is the fourth most common gynecological malignancy in Europe and the United States. More than 280000 women are diagnosed every year and 74000 women die from it annually; it is the second most common gynecological cancer in Taiwan [5]. According to the 2005 cancer statistics, an estimated 40880 new cases were diagnosed, with 7310 deaths [1]. In 2008, the estimated numbers of new cases and deaths were 49560 and 8190, respectively. The most recent data put the numbers of new cases and deaths at 60050 and 10470, respectively, according to the report of the cancer journal [6].
In 2013, approximately 49560 new cases of endometrial cancer and 8190 related deaths were predicted [7].
Endometrial cancer is broadly divided into two groups: type I, called endometrioid carcinoma, and type II, called uterine carcinoma. About 80-90% of ECs are endometrioid carcinomas and 2-10% are serous carcinomas [5,8]. Type I is associated with loss of estrogen/progesterone receptors, absence of unopposed estrogen, reduced E-cadherin expression, and mutation in P53; type II carcinomas exhibit extrauterine spread at the time of surgery [7,9]. Although type II is in the minority, it is much more aggressive than type I and has a poor outcome [5].
Endometrial cancer progresses through several stages. In stage 0, the abnormal cells are found only on the surface of the inner lining of the uterus. In stage I, the tumor has grown through the inner lining of the uterus to the endometrium and may have invaded the myometrium. In stage II, the tumor has invaded the cervix. In stage III, the tumor has grown through the uterus to reach nearby tissues, such as the vagina. In stage IV, the tumor has invaded the bladder or intestine [10,11].
Signs and symptoms include abnormal vaginal bleeding, pain or difficulty when emptying the bladder, a watery, blood-streaked discharge that gradually contains more blood, and pain in the pelvic area and during intercourse [3].
Cytological and histological methods have been considered the best approaches for screening and early diagnosis of uterine cancers. Squamous cell carcinoma-associated antigen (SCC) is relatively tumor-specific and is widely used for monitoring patients with squamous cell carcinoma. There is no particular tumor marker for uterine corpus carcinoma, although combinations of several tumor markers, including cancer antigen 125 (CA125), may be of greater diagnostic value in such cases [12]. There is no specific tumor marker for endometrial cancer, and the investigation of HLA class I [13] in endometrial cancer is very limited. This study reports the identification of tumor-specific antigens for germline mutations in endometrial cancer (uterine corpus carcinoma) that may be of great prognostic value and are considered potential targets for anticancer treatment.

Materials and Methods
To identify tumor-specific antigens for germline mutations in endometrial cancer, a step-by-step methodology was applied, as shown in Fig. 1.

Screening and Selection of Genes
Genes containing mutations implicated in endometrial cancer were cataloged from the Cancer Gene Census of the COSMIC database, which curates cancer data on somatic mutations. There were 23986 genes with mutations in endometrial cancer; the five most frequently mutated genes were selected: PTEN, PIK3CA, KRAS, CTNNB1, and TP53 [14].

Peptide Library Generation
The peptide library was generated using the Peptide Design Library, which produces an overlapping library by breaking the original protein into many equal-length overlapping fragments. The protein sequences of the selected genes were downloaded from UniProt. The peptide length and amino acid overlap were set to 18 and 16, respectively, so consecutive peptides are offset by two residues. Peptides with a hydrophobic composition greater than or equal to 50% were selected [15].
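The fragmentation and hydrophobicity screen described above can be sketched in a few lines of Python. This is a minimal illustration and not the Peptide Design Library tool itself: the function names are hypothetical, and the set of residues counted as hydrophobic is an assumption, since the source does not state which residues the tool uses.

```python
# Assumption: this residue set is illustrative only; the source does not say
# which residues the Peptide Design Library counts as hydrophobic.
HYDROPHOBIC = frozenset("AVILMFWC")


def overlapping_peptides(sequence, length=18, overlap=16):
    """Split a sequence into equal-length peptides sharing `overlap` residues."""
    if length <= 0 or not 0 <= overlap < length:
        raise ValueError("need length > 0 and 0 <= overlap < length")
    step = length - overlap  # 18 - 16 = 2 residues between start positions
    last_start = len(sequence) - length
    # A trailing fragment shorter than `length` is dropped.
    return [sequence[i:i + length] for i in range(0, last_start + 1, step)]


def hydrophobic_fraction(peptide):
    """Fraction of residues that belong to the (assumed) hydrophobic set."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    return sum(residue in HYDROPHOBIC for residue in peptide) / len(peptide)


def select_hydrophobic(peptides, threshold=0.5):
    """Keep peptides whose hydrophobic fraction is >= threshold."""
    return [p for p in peptides if hydrophobic_fraction(p) >= threshold]


if __name__ == "__main__":
    toy = "MAVLLKKAILGGFWDEKR" * 2  # made-up sequence, not a real protein
    library = overlapping_peptides(toy)
    print(len(library), len(select_hydrophobic(library)))
```

With a length of 18 and an overlap of 16, each new peptide starts two residues after the previous one, which is why a single protein yields a large number of candidate peptides.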

Identification of Human Leukocyte Antigen
Human leukocyte antigen (HLA) is the gene complex that encodes the major histocompatibility complex (MHC) proteins. In humans, these proteins are responsible for regulating the immune system. Down-regulation of HLA class I plays a vital role in several types of cancer [10]. HLA alleles were identified using MHCPred, which covers a wide range of human MHC allele peptide specificity models [16]. For prediction, the protein sequence was entered, the MHC allele and other parameters were selected, and HLAs with an IC50 below 200 were identified for the MHC alleles HLA-A*0201 and HLA-A*0101. The binding affinities of the alleles with the peptides were determined [13].
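The IC50 screen can be illustrated with a short Python sketch. It does not call MHCPred; it only shows how a table of predictions might be filtered at the cutoff of 200 and converted to log IC50 for the later statistics. The `Prediction` record, the function names, and the assumption that IC50 is reported in nM are all illustrative and not taken from the source.

```python
import math
from typing import NamedTuple


class Prediction(NamedTuple):
    """One predicted peptide-allele pair (hypothetical record layout)."""
    peptide: str
    allele: str
    ic50: float  # predicted IC50; units assumed to be nM


def strong_binders(predictions, cutoff=200.0):
    """Return predictions with IC50 strictly below `cutoff`, strongest first."""
    kept = [p for p in predictions if p.ic50 < cutoff]
    return sorted(kept, key=lambda p: p.ic50)


def log_ic50(prediction):
    """Base-10 logarithm of the predicted IC50."""
    return math.log10(prediction.ic50)
```

A lower IC50 means tighter predicted binding, so sorting in ascending order puts the strongest candidate binders at the top of the list.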

Statistical Analysis
Descriptive analysis was performed to evaluate the dependency of the variables, i.e. amino acids, confidence level, and log IC50, on one another. Pearson correlation was used to evaluate the relationship between amino acids and their log IC50 based on the R-value; an R-value close to 1 or -1 indicates a strong relationship [17].
Linear regression modeling was performed to determine the association between the amino acid sequences and their log IC50. Amino acids were taken as the dependent variable and log IC50 as the independent variable. Linear modeling was used to determine the lead peptide as a vaccine target on the basis of the maximum confidence level of prediction [18].
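For readers who want to reproduce this kind of analysis, the following Python sketch computes a Pearson correlation coefficient and an ordinary least-squares line for two numeric samples. It is a generic illustration under stated assumptions: the source does not describe how the amino acid variable was encoded numerically or which statistics package was used, so the inputs here are simply two lists of numbers and the function names are hypothetical.

```python
import math


def _centered_sums(x, y):
    """Return (sxx, syy, sxy, mean_x, mean_y) for two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxx, syy, sxy, mean_x, mean_y


def pearson_r(x, y):
    """Pearson correlation coefficient; values near 1 or -1 are strong."""
    sxx, syy, sxy, _, _ = _centered_sums(x, y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    sxx, _, sxy, mean_x, mean_y = _centered_sums(x, y)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Here `x` would hold the independent variable (log IC50 in the study) and `y` the dependent variable.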

Protein Peptide Docking
After the lead peptide was identified through linear modeling, protein (HLA)-peptide docking was performed using GalaxyPepDock, which is freely accessible [19]. For this purpose, the HLA structure was downloaded in PDB format from the Protein Data Bank, an archive of macromolecular structures [20]. The endometrial protein 4MZR, which had the highest number of mutations, was obtained; the protein was prepared in Discovery Studio by removing non-standard residues, hetero-atoms, and water molecules, and docking was then performed to obtain the structural protein-peptide complexes [21].
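The preparation step was carried out in Discovery Studio; the Python sketch below is only a stand-in that shows what such a clean-up does to a PDB-format file. It assumes the standard fixed-column PDB layout (record name in columns 1-6, residue name in columns 18-20) and keeps only `ATOM` records for the 20 standard amino acids plus `TER` and `END` lines. The function name is hypothetical.

```python
# The 20 standard amino acids by three-letter code.
STANDARD_RESIDUES = frozenset({
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
})


def clean_pdb_lines(lines):
    """Drop hetero-atoms, waters, and non-standard residues from PDB lines."""
    kept = []
    for line in lines:
        record = line[:6].strip()  # columns 1-6: record name
        if record == "ATOM":
            residue = line[17:20].strip()  # columns 18-20: residue name
            if residue in STANDARD_RESIDUES:
                kept.append(line)
        elif record in ("TER", "END"):
            kept.append(line)
        # HETATM records (ligands, ions, waters) and all other record
        # types are intentionally skipped.
    return kept
```

Water molecules are stored as `HETATM` records with residue name `HOH`, so dropping every `HETATM` line removes waters and other hetero-atoms in one pass.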

Results
In this work, the list of genes involved in endometrial cancer was retrieved. There were 23986 genes with mutations in endometrial cancer. The five most frequently mutated genes, identified as carrying germline mutations, were PTEN, PIK3CA, KRAS, CTNNB1, and TP53. The mutated genes are listed in Table 1. Overlapping peptides are essential for deciding which part of the protein contains the essential amino acids, and they are characterized by two parameters: peptide length and amino acid overlap. The peptide length was taken as 18-20 amino acids and the amino acid overlap as 16. In total, 1676 peptides were generated for the above-mentioned mutated genes, and 575 peptides were retained based on a hydrophobicity composition greater than or equal to 50%.

Prediction of Peptide Binding Affinity with MHCPred 2.0
After the overlapping peptides were submitted to MHCPred, binding affinities were checked for the MHC alleles HLA-A*0101 and HLA-A*0201. In total, 898 HLAs were identified for these alleles: 170 for HLA-A*0101 and 728 for HLA-A*0201. The ability to predict binding affinities aids in determining the most promising vaccine candidates.
For a strong association between the variables, the confidence of prediction should be close to its maximum value of 1. For HLA-A*0201 the confidence level of prediction was 0.695, approaching the standard value, and for HLA-A*0101 it was 0.963, close to the maximum level of prediction. An ANOVA test was performed to analyze the differences among the variances. In the regression analysis, the regression degree of freedom was 1 for both HLA-A*0201 and HLA-A*0101, whereas the mean square was 525682.514 for HLA-A*0201 and 166.551 for HLA-A*0101.
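The regression degree of freedom of 1 and the reported mean squares come from the ANOVA decomposition of a simple linear regression. The Python sketch below shows that decomposition for a generic pair of numeric samples; it is an illustration with a hypothetical function name and does not reproduce the study's data or the statistics package the authors used.

```python
import math


def regression_anova(x, y):
    """ANOVA table entries for the least-squares line y = a + b * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * a for a in x]

    ss_regression = sum((f - mean_y) ** 2 for f in fitted)
    ss_residual = sum((b - f) ** 2 for b, f in zip(y, fitted))
    df_regression = 1      # one predictor in a simple linear regression
    df_residual = n - 2    # n points minus two fitted parameters
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    # A perfect fit has zero residual variance, so F is reported as infinity.
    f_statistic = ms_regression / ms_residual if ms_residual > 0 else math.inf
    return {
        "df_regression": df_regression,
        "df_residual": df_residual,
        "ms_regression": ms_regression,
        "ms_residual": ms_residual,
        "f_statistic": f_statistic,
    }
```

Because there is a single predictor, the regression mean square equals the regression sum of squares, so its magnitude depends on the scale of the dependent variable and the number of data points.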

Mutation Distribution for HLA-A*0201 and HLA-A*0101
On the basis of linear modeling, one of the 728 HLAs (LLLSVLLSV) had the maximum confidence level of prediction for HLA-A*0201. The graph in Fig. 3 shows that two HLAs (GSDDINVVT and TTDCLQILA) had the maximum confidence level for HLA-A*0101.

Identified Tumor-Specific Antigen
The identified TSAs are listed in Table 4. Three HLAs were identified as tumor-specific antigens based on maximum confidence. These results indicate that untreated endometrial cancer cells acquire a wide range of germline mutations in PIK3CA, KRAS, PTEN, TP53, and CTNNB1. The identified TSAs may serve as prognostic markers for endometrial cancer and as drug targets.

Docking Analysis
Computational peptide-protein docking was used to analyze the structural complexes of 4MZR with the peptides and to understand the structural basis of binding. The docking results are shown in Fig. 4.

Interacting Residues
Interacting residues in each docked complex are shown in Fig. 5. These residues illustrate extensive interaction of the peptides with the receptor protein.

Discussion
In this study, the mutated peptides in 23985 genes were examined, and the binding affinity of the mutated amino acids to the MHC alleles HLA-A*0101 and HLA-A*0201 was checked computationally. Mutation data were taken from the Cancer Gene Census of the COSMIC database [22]. The 23985 genes had germline mutations in endometrial cancer, and the top five genes were considered on account of having the highest scores.
Mutated peptides are of great value because they act as potential immunological targets for cancer therapy [23]. Therefore, these mutated peptides were selected for further analysis [25]. After the peptides were submitted to MHCPred 2.0, the MHC alleles HLA-A*0201 and HLA-A*0101 were selected to check binding affinity. Screening was based on the IC50 value: only HLAs with an inhibitory concentration (IC50) below 200 were retained.
Statistical analysis was performed to compare the 898 HLAs and to identify tumor-specific antigens (TSAs) among them. To determine the strength of association between the variables, correlation analysis (r) was performed between the dependent parameter (the confidence level of the residues) and the independent parameter (the amino acids); the r value was positive and showed a strong association. Based on linear modeling, only 3 of the 898 HLAs had the maximum confidence level, and they can therefore be regarded as the best TSAs against endometrial cancer. The TSAs were also validated through computational peptide-protein docking, which was used to analyze the structural complexes of the receptor protein (target) with the peptide ligands. The results show that the peptide ligands interact strongly with the receptor and that the identified TSAs bind best in the active pocket of the 4MZR protein.
The estimated accuracies of the predicted protein-peptide complexes for GSDDINVVT, LLLSVLLSV, and TTDCLQILA are 0.305, 0.357, and 0.294, respectively. Figure 5 shows the potential binding sites of the TSAs with the target protein residues. Most of the amino acid residues bind to the identified TSAs, which indicates a strong association [24].

Conclusions
In this study, an in silico approach was formulated for the identification of tumor-specific antigens. The work provides a framework for analyzing mutated peptides and their binding to HLAs, which could be of great significance for developing personalized peptide vaccines that directly target the mutated genes in cancer therapy. This could have a superior effect on the mutated target cell population [25]. The identified TSAs (GSDDINVVT, TTDCLQILA, and LLLSVLLSV) are of great prognostic value. Because this was an in silico study, the results require experimental validation before these personalized vaccines can be implemented.