Insight into the LGP2 Helicase Domain – An in Silico Approach

LGP2, a member of retinoic acid inducible gene–I like receptors (RLR), encoded by the gene DHX58 in human induces antiviral response against many RNA viruses. The LGP2 shares a considerable similarity to the amino acid sequence of Hef Helicase Domain, the helicase domain of RIG-I-like protein helicase-associated endonuclease (Hef), which is involved in RNA binding. Earlier studies suggest that RLR contains a conserved C-terminal domain (CTD), which is responsible for the binding specificity to the viral RNAs and C-terminal domain of LGP2 also takes part in RNA binding. The present study is aimed to explore the interactions of LGP2 and RNA, thereby finding the crucial residues for the interaction, with the help of in-silico tools. The predicted crucial residues were validated by docking and molecular dynamics simulation studies as well as by fitting the model on a LGP2 density map. Our results in this study suggest that the residues in the helicase domain of LGP2 are crucial for RNA binding and it is positioned around the groove region of LGP2. Though earlier experimental studies identified the RNA binding residues, but our in silico binding studies with the full length LGP2 predicted some additional residues that can be valuable


Introduction
The first line of defense, innate immune response is dependent on pattern recognition receptors; one class of pattern recognition receptors is toll like receptors, found on the cell surface or endosomes in effector cells, that recognize pathogen associated molecular pattern typically located on the surface of pathogen cells [1], whereas RIG-I like receptors (RLRs) are in the cytoplasm. The RLR family, within superfamily 2 (SF2) RNA helicases, include RIG-I (retinoic acid-inducible gene-I), MDA5 (melanoma-differentiation-associated gene 5), and LGP2 (Laboratory of Genetics and Physiology 2) and are closely related to DEAD-box helicases that can sense RNAs to initiate antiviral responses [2,3]. In human, LGP2 is encoded by DHX58 gene [4]. Though LGP2 is quite similar to other RLR family members RIG-I and MDA-5, it lacks the signature N-terminal caspase activation and recruitment domain (CARD) (Fig. 1) that is required for signaling [5]. Different studies defined LGP2 as a positive as well as negative regulator of RIG-I and MDA5-mediated viral recognition [6,7]. Further, it was demonstrated that LGP2 protein produced in insect cells can bind both single-and double-stranded RNA (ss-and ds-RNA), with higher affinity for dsRNA and significant change in the conformation of the LGP2 monomer to a stable dimer takes place when it was complexed to a dsRNA [8]. The RLR contains a conserved C-terminal domain (CTD), which is responsible for the binding specificity to the viral RNAs. The solution structure of LGP2 (PDB ID: 2RQA) C-terminal domain (CTD) (546 -678) has shown that it binds to dsRNA and 5ʹppp-ssRNA with higher affinity than CTD of RIG-I [9]. A similar view was also reported in the 2.0-Å resolution crystal structure of human LGP2 CTD and 8-bp dsRNA complex (PDB ID: 3EQT), which was also supported by gel filtration chromatography and analytical ultracentrifugation studies [10].
The helicase domain of RIG-I-like protein, helicase-associated endonuclease (Hef), consists of three structural sub-domains that share common conserved sequences like other SF2 helicase members, e.g. RIG-I and DEAD-box. The Hef helicase domain (Hhd, residues 1-494) contains seven helicase motifs that were spread in sub-domains 1 and 3 [11,12]. The crystal structure of Hhd resembles a partially opened clamp with a concave grove associated with its lower portion [11]. Murali et al [8] , using electron microscopy and single particle analysis reported the low resolution structure of LGP2. By superimposing the crystal structure of Hef helicase domain into the electron density map of LGP2, it was observed that LGP2 shares a considerable structural similarity with the three domains of Hhd [7]. Further LGP2 also shares a considerable sequence similarity with Hhd (Figs. 1 and 3.i). It was also reported that the RNA binding motifs that were observed in Hhd [11] were also conserved in LGP2 (Fig. 3.i) and that they aligned in the groove part of the LGP2 structure when the crystal structure of Hhd is superimposed on the LGP2 density map [8]. Though the structure of LGP2 CTD complexed with RNA was extensively studied by X-Ray diffraction and NMR techniques [9,10], the interaction of RNA with the full length LGP2 (C-and N-termini) will be more biologically relevant. In this study, we made an attempt to understand the RNA interactions with full length LGP2 with the help of several in silico tools.

Protein Structure Prediction
In the absence of a full length, experimentally determined high resolution structure of LGP2, homology modeling approach was used to predict the full length structure using MODELLER 9.10 [13]. Amino acid sequence of LGP2 was retrieved from UniProtKB [14] database (Q96C10) for three-dimensional structure prediction. PDB database was searched for homology by using BLAST tool. From homology search, Helicase associated endonuclease (HEF) helicase domain was retrieved as best suitable template for homology modeling with 42% sequence identity.

Cryo-EM Flexible Fitting
The LGP2 density map was already solved by electron microscopy and was used in this study [8]. Since this density map has a low resolution to determine the secondary structural elements, we have used modeling based on flexible fitting approach to predict its structure. The structure of LGP2 that resulted in homology approach was fitted to the density map, with the Flex-EM module [15] of MODELLER.

Active Site Prediction
In the present work, CASTp (Computed Atlas of Surface Topography of Proteins) [16] is used for identification of active site regions. The in silico model of LGP2 was submitted to CASTp server for active site identification. It provides identification and measurements of surface accessible pockets as well as interior accessible cavities.

Docking Studies
As we have already found helicase domains of LGP2 as active site region through sequence alignment, docking was performed for the confirmation of those active site residues by using RNA strand. Li et al [10] reported a crystal structure of C-terminal domain of LGP2 complexed with RNA. This RNA was used as a reference ligand for the identification of active site residues. Docking was performed with the help of FFT-based protein docking server HEX [17].

Molecular Dynamics Simulations (MDS)
The stability of the three-dimensional structure of LGP2 along with the interaction strength was checked by using Gromacs 4.5.3 simulation package [18] with the help of Amber force field [19]. A maximum of 50,000 steps of energy minimization was carried out for the system with a tolerance of 1000 kJ mol −1 nm −1 using conjugant gradient algorithm followed by steepest descent minimization with the same number of minimization steps. PME method was used to apply a twin-range cutoff (1.0 nm for Vander Waals and electrostatic interactions) for long interactions [20]. The minimized and solvated systems upon satisfying the geometry and solvent orientations were further used in the simulation steps. The LINCS [21] algorithm was used to constrain all bond angles, while the SETTLE [22] algorithm used to constrain the geometry of water molecule. Equilibration of the entire system was carried out for 50 ps each for position restraints of both NVT (constant number of particles, volume and temperature) and NPT (constant number of particles, pressure and temperature). Equilibration of the entire ensemble confirmed the uniformity of temperature, pressure, density and total energy of the system. The pre-equilibrated system was subsequently used in the production MDS of 15000 ps (15 ns) with a time-step of 2 fs. For every two ps structural coordinates were saved and analyzed using the analytical tools available in the Gromacs package. Conformation with the lowest potential energy was selected from the 15 ns MDS trajectory, which was further refined by energy minimization. The refined models were validated using the structural analysis and verification server (SAVES) (http://nihserver.mbi.ucla.edu/SAVES/), which uses various tools, including PROCHECK [23] , ERRAT [24], and VERIFY_3D [25].

Sequence Alignment
For structure prediction, Hef helicase domain (PDB ID: 1WP9) of Pyrococcus furiosus was used as template because it shares a considerable structural similarity with LGP2 ( Fig.  1). To gain an insight into the residual arrangement of protein, we performed sequence alignment by ClustalW [26] (shown in Fig. 3.i). From Fig. 3.i, it can be seen that both the sequences are sharing several conserved regions, including the helicase motifs reported for Hhd [11] (marked in black boxes in Fig. 3.i). Insight into the LGP2 Helicase Domain -An in Silico Approach

Three Dimensional Structure of LGP2
For the better understanding of secondary structural components for active site formation in three-dimensional space, three dimensional structure predictions are necessary. Murali et al [8] reported the low-resolution LGP2 structure in its monomeric and dimeric forms. However, in order to explore the active site, it is required to have a better resolution. The three-dimensional model of LGP2 was modeled using homology modeling with the help of CryoEM module of modeller 9.10. The model having the lowest DOPE score (-68165) and molpdf value (4987.37) was chosen for further analysis. Analysis of the secondary structure elements of model was done with the help of PDBsum [27]. The PDBsum results suggested that LGP2 three-dimensional structure contains 6 β sheets, 4 βαβ units, 4β hairpins, 2 β bulges, 20 β strands, 30 α helices, 26 helix-helix interactions, 50 β turns, and 5 γ turns (Fig. 2). Information of the helix-helix interactions are given in the There are total 50 β turns present in the three-dimensional structure of LGP2 and out of them 6 β turns (present between helix 9-10, helix 19-20, helix 10-11, helix 15-16, helix 17-18, and helix11-12) are playing important role in the proper folding of this protein along with them one γ turn that is present between helix 15-16 is also important. β hairpin turn present between residues 570-575, 610-617 and 620-630 are also important for the formation of C-terminal domain, which is the RNA binding site of this protein. Total 3 β sheets are present in this protein and these sheets contain 20 β strands. These secondary structure elements of the protein are necessary for the better understanding of spatial arrangement of LGP2 structure as interactions between these elements defines the folds, which eventually forecasts the biological relevance.
The 3D model was then subjected to validation. The Ramachandran plot analysis shows that the model is of good quality as 92.5% of the residues fall into the most favoured regions. The model was also submitted to ProSA Server [28], which shows Z-Score: -9.72 for overall model quality.

Fitting of Electron Density Map
The flexible fitting was done by optimizing the model simultaneously with its position and orientation in the density map of LGP2 [8] using Flex-EM, which integrates rigid and flexible fitting. First, rigid fitting of the model structure into the density was performed using Mod-EM guided by a cross-correlation coefficient (CCF) between the atomic structure and the density map. Next, the best fit (with CCF ~ 0.8296) was refined by using Flex-EM. The rigid residues were found with help of RIBFIND server [29]. All residues, except the rigid bodies found from RIBFIND server, were allowed to move flexibly by using a molecular dynamics optimization. The final model fitted into the electron density map using the fit-in-map utility of UCSF Chimera is shown in Fig. 3.ii. The flexible fitting of the model structure into the LGP2 density map and the position of the helicase domain in the density confirms the earlier observations [8] with a marginal difference. The earlier study have shown that all Hhd helicase motifs fall into the groove part of the LGP2 density map, while in our study we found that the conserved helicase domains of LGP2 are spread along the curvature and groove of the density map ( Fig. 3.ii). Some of residues of the conserved helicase motifs of LGP2 moved away from the groove. However in docking studies we have seen that these residues are making interaction with the RNA (see docking results in next section).

Binding Site Prediction
After confirming the position of helicase motifs in LGP2 density map Fig. 3.iii.a, we used CASTp to identify the active site residues. In CASTp analysis, we have chosen the pocket having the largest area and the volume. The residues, which were falling in the active site pocket, are given in table 2 and we can see the position of those residues in Fig. 3.iii.b in three-dimensional space. The common residues (132-135, 374-377, 439-444) from both the conserved helicase motifs and CASTp calculations were selected as active site residues for further work and they are shown in Fig. 3.iii.c. Insight into the LGP2 Helicase Domain -An in Silico Approach  Docking studies were performed for finding the hotspot residues of LGP2 to confirm whether the binding residues correlate with our active site residues identified by CASTp and sequence alignment. Here protein-RNA docking was performed with help of HEX protein docking server without providing the active site residues. HEX generated 15 clusters of docked protein and RNA. Out of those 15-clusters, best cluster with the maximum dock score was selected for further analysis (Table 3). This complex was analyzed with the help of DIMPLOT module of LIGPLOT [30] software for visualization of interactions (with a cutoff value of 3.5 Å chosen for hydrogen bonds). The atoms, which are forming Hydrogen bonds and hydrophobic interactions, are given in table 3 along with their residue numbers. Interestingly, those residues, which are common in both CASTp calculations and sequence alignment are also making bond with RNA (as identified by DIMPLOT) (Fig. 4.i).

Docking
The residues forming H-bonds in our in silico studies are also common in the other experimental studies such as NMR titration experiments (T 575, H 576, K 634, R 636, K 651 and W 652) as well as X-ray crystallography (H 576, K 651 and W 652). Apart from these, we have seen some additional residues (377 and 635) making different interactions with LGP2 (table 4). These results suggest that the RNA binding cavity is not confined to the C-terminal.   LGP2 got local minima at points I and II till 9ns but at point III (after 10 ns) stability was achieved and remained same for rest of the simulation period. ii) RMSF profile of LGP2. The C-terminal part of LGP2 shows more flexibility compared to rest of the structure. This is the region where RNA binds. The flexibility of this region may help in the complex association and dissociation.

Molecular Dynamics Simulation
A 15-nano second molecular dynamics simulation study was carried out to check the stability of RNA-protein interactions.
The stability of the LGP2-RNA complex was checked through RMSD profile (shown in Fig. 5.i). This graph was generated between the conformation of protein LGP2 before and after 15 ns simulation. The RMSD profile of any structure is generated based on the scalar distance between same atoms of two different conformations. Here the RMSD profile was generated for only the backbone residues. As we can see in the Fig. 5.i, the LGP2 was showing fluctuation up to 3 ns, after that protein got minimum deviation for some extent. Again, at 7 ns, it returned to almost same minimum deviation as seen at 3ns. After a simulation for some more time it got stability at 9 ns and it was stabilized at that conformation for rest of simulation period. Protein was deviating up to 3 Å for getting stability in structure from modeled structure. The fluctuation range is acceptable for modeled structure. After getting the RMSD profile and reaching stable conformation, identification of most fluctuating region of LGP2 was done by generating its RMSF profile (shown in Fig. 5.ii). The most fluctuating part of LGP2 consists in C-terminal region (between residue numbers 500-555); this is the region, which is adjacent to RNA binding residues. The RMSF profile of LGP2 is generated only for α carbon of backbone.
After identification of these fluctuating moieties, the interaction strength of protein and RNA was analyzed by generating graph of hydrogen bonds (Fig.4.ii). As we can see, the numbers of H-bonds were ranging between 15-25 (occasionally going to a minimum of 12) and it was remaining at ~20 for most of the simulation period. This shows that interaction between protein and RNA is strong enough for its functioning in immune system.
An H-bond existing graph was generated with the cutoff range of 3.5Å for H-bonds. Here the H-bonds were calculated from each MDS trajectory in 2fs interval. Multiple occurrences of H-bonds between same base-residue pair were counted as one. The total numbers of H-bonds made between LGP2 and RNA were around 186 throughout simulation. Out of 186 H-bonds, around 13 hydrogen bonds were showing consistency (Fig. 4.iii). Out of these 13 consistent H-bonds, 4 bonds were matching with LigPlot analysis. The details of interactive residues along with distances are given in table 3.
The results were further analyzed by molecular dynamics simulations. The full length model with RNA, LGP2 C terminal-RNA model and NMR structure of LGP2 C-terminal (PDB ID: 2RQA) were simulated for 15 ns using Gromacs to compare our modeled structure with the experimental structure. The NMR structure 2RQA was chosen for the comparison as it has more similarity with our modeled structure. From Fig. 6, we can see that both the full-length LGP2-RNA as well as LGP2 C-terminal-RNA model are showing a similar pattern of fluctuation with the experimentally determined structure of C-terminal domain (2RQA). This similar pattern clearly indicates that our in-silico observations are supported by NMR observations.

Conclusions
In summary, our study provides a more detailed insight into the LGP2 RNA binding scenario. Though the crystal structure of CTD bound with RNA and the NMR structure without RNA were solved, there is no full-length structure available for LGP2, and information regarding the active site of LGP2 is also limited. In this study we have modeled the full-length structure of LGP2 through homology modelling approach. We have also predicted the active site residues for LGP2 with in-silico tool, which was supported by the sequence analysis of Hef helicase domain.
The flexible fitting of the model structure into the LGP2 density map and the position of the helicase domain in the density map is correlating with earlier studies [8]. The successful non biased docking and molecular dynamics simulation studies of RNA-LGP2 complex also supports the position of active site residues. We have also seen that the hydrogen bonding was stable throughout the simulation confirming the tighter bonding between LGP2 and RNA. This study suggests that, in the case of full length LGP2 structure, the critical RNA binding residues spreads out of its niche compared to the earlier reports focusing only on the CTD. Thus our structural information provides the basis for a molecular understanding of the way LGP2 binds, interactions required for binding and conformational change. Further in vitro and in vivo studies on LGP2 will be helpful for developing new methods against viral attacks.