A Statistical Model for Prediction of Lower Limb Injury of Active Sportsperson

For an active sportsperson, running is the most common physical activity, but it carries a high risk of musculoskeletal injury. Half of all running injuries are identified as overuse injuries, and the lower limbs are the most affected area. Previous studies have revealed several factors responsible for the development of running-related lower limb injuries in sportspersons. However, few studies have aimed at predicting them. Therefore, the present study aimed to develop a predictive model for lower limb injury of active sportspersons. The BTS G-WALK system, synchronised with two GoPro Hero 6 cameras, was used to conduct the study on seventy-five (N=75) healthy male subjects without any history of lower limb injury. The BTS G-WALK system provided spatio-temporal parameters, while Kinovea software was used to extract kinematic data from raw videos of the subjects running on a treadmill. A prospective cohort study design was used to investigate how differences in running gait kinematics affect the occurrence of lower limb injury in active sportspersons. Further, a prediction model was developed using binary logistic regression, for which IBM® SPSS® version 25 was used. All statistical analyses were tested at the 0.05 (p = 0.05) level of significance. The model indicated that Range of Pelvic Obliquity (RPO) and Maximum Toe Out (MTO) were positively, and Symmetry Index (SI) was negatively, associated with an increased likelihood of exhibiting lower limb injury. The model explained 85.7% of the variance and correctly classified 93.3% of the cases of lower limb injury. The risk factors for lower limb injuries of a sportsperson can be identified, and prediction of lower limb injury of a sportsperson is theoretically possible. To generalize the model for practical use, the researchers suggest further research with a larger sample size.


Introduction
Prediction is a feature of statistical interpretation with several approaches. While statistical analysis generally provides knowledge about a population based on a sample population, this is not always the case with predictive statistical analysis. Prediction is the process of forecasting and it involves time-series data. A predictive model of injury occurrence may help to prevent future injury of sportspersons with a higher risk of developing injury [1]. It can inform the subject about the risk level or probability of occurrence of injury and can help in taking appropriate preventive actions.
Running is one of the favourite activities of a sportsperson. Running-related injuries are typically attributed to several factors, although poor running mechanics is considered to be a key cause. A great deal of research has already been conducted to address the prevalence of running injuries, including medial tibial stress syndrome (MTSS), patellofemoral pain (PFP), iliotibial band syndrome (ITBS) and Achilles tendinopathy (AT). The majority of these studies assume that the same kinematic factors are related to various running injuries [2]. For example, excessive hip abduction has been linked to the development of both MTSS and PFP [2]. Similarly, it was also reported that greater hip adduction or hip internal rotation may produce increased rearfoot eversion [2]. Most of this research has emphasised and analysed the pattern of injuries in order to develop training protocols aimed at mitigating them, rather than examining the factors that contribute to the risk of injury, while other studies have been conducted at a more sophisticated level to use the information to prevent future injuries [3]. However, few studies have aimed at developing a predictive model for running-related lower limb injuries of an active sportsperson. Therefore, in the present study, the researchers intend to develop a statistical predictive model to predict lower limb injury of an active sportsperson.

Participants
By the term "Active Sportsperson", the researchers mean persons who were regularly involved in running-related sports activities. Since gait kinematics and injury occurrence differ significantly between male and female sportspersons, the study was confined to male subjects only [4]. A total of 75 (N=75) subjects (Age=18.31±1 years, Weight=63.60±8.14 kg and Height=173±7.44 centimetres) were selected for the study. None of the subjects had any history of lower limb injury, which was confirmed using an injury history questionnaire.

Procedures
The procedure was approved by the departmental research committee of the institute where it was conducted. The subjects gave their consent to participate before the collection of the data. Data were collected with the minimum risk of injury to the subjects. The confidentiality of the subjects and of the collected data was maintained by the researchers with utmost care.
The data were collected by synchronizing the BTS G-WALK® with a video-based two-dimensional gait analysis system. The two-dimensional gait analysis system was composed of two 'GoPro Hero 6' cameras and a two-dimensional motion analysis software named 'Kinovea'. The 'BTS G-WALK®' system is a highly reliable and valid tool for measuring spatio-temporal gait parameters and pelvic kinematics [5]. Likewise, the high reliability and validity of the Kinovea software for gait analysis has already been established, and it is widely popular among the sports science community [6,7].
The data collection was administered inside a laboratory equipped with a treadmill and with enough space to follow the procedure for camera setup. The subjects were recorded while running barefoot on the treadmill for at least five minutes at a constant speed of 8 km/h. The 'Run Protocol' of the 'BTS G-WALK®' system was employed to collect data for spatio-temporal parameters, pelvic kinematics and symmetry index, while the Kinovea two-dimensional motion analysis software was used to manually extract lower limb kinematic data from the recorded treadmill running video files of the subjects.

Selection of Variables
After a careful investigation of the related literature, the following variables were selected (Table 1). These variables were treated as independent variables to investigate the dependent variable "Injury Occurrence". The variable "Injury Occurrence" is a dichotomous variable with only two outcomes: 'Yes' and 'No'.

1. Cadence (C)
Cadence is the average number of steps per minute over the whole trial. The unit of measurement is 'steps/min'.

2. Relative Stride Length (RSL)
Relative stride length is the ratio between the actual stride length and the height of the subject. The unit of measurement is 'metre'.

3. Relative Running Base (RRB)
Relative running base is the ratio of the running base to the height of the subject. It is measured in 'metre'.

4. Stance Phase Duration (STPD)
Stance phase duration is the average duration of the right and left foot support phase. It is measured in the unit of '% of the running cycle'.

5. Float Phase Duration (FPD)
Float phase duration is the average duration of the phase in which neither foot is on the ground. The unit of measurement is '% of the running cycle'.

6. Propulsion Speed (PS)
Propulsion speed is the average pushing speed when the limb is in contact with the ground. It is measured in the unit 'm/s'.

7. Range of Pelvic Tilt (RT)
Pelvic tilt is the rotation of the pelvis in the sagittal plane. Range of pelvic tilt is the difference between the highest and lowest angle of rotation of the pelvis in the sagittal plane. The unit of measurement is 'degree'.

8. Range of Pelvic Obliquity (RO)
Pelvic obliquity is the rotation of the pelvis in the frontal plane. Range of pelvic obliquity is the difference between the highest and lowest angle of rotation of the pelvis in the frontal plane. The unit of measurement is 'degree'.

9. Range of Pelvic Rotation (RR)
Pelvic rotation is the rotation of the pelvis in the transverse plane. Range of pelvic rotation is the difference between the highest and lowest angle of rotation of the pelvis in the transverse plane. The unit of measurement is 'degree'.

10. Symmetry Index (SI)
Symmetry index comes from the comparison between the right and left running cycles. It is the index of similarity between the right and left running cycles.

11. Maximum Knee Adduction (MKAD)
Maximum knee adduction is the maximum medial deviation of the knee towards the midline of the trunk during running movements. It is measured in the unit 'degree'.

12. Maximum Knee Flexion (MKF)
Maximum knee flexion is the highest angle created by flexion of the knee during the stance phase of running gait. It is measured in the unit 'degree'.

13. Maximum Ankle Inversion (MAI)
Maximum ankle inversion is the highest angle formed at the ankle joint towards the medial side during the stance phase of running gait. It is measured in the unit 'degree'.

14. Maximum Ankle Dorsiflexion (MAD)
Maximum ankle dorsiflexion is the dorsiflexion movement of the ankle at the flat foot position of the stance phase. The unit of measurement is 'degree'.

15. Maximum Ankle Plantarflexion (MAP)
Maximum ankle plantarflexion is the plantarflexion movement of the ankle at the toe off position of the stance phase. The unit of measurement is 'degree'.

16. Maximum Toe Out (MTO)
Maximum toe out is the angle created by the toe with the movement line of the tibia when the foot moves laterally outward during the stance phase of running gait. It is measured in the unit 'degree'.
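Two of the definitions above are simple derived quantities: a "relative" measure is a ratio to body height, and a "range" is the difference between the highest and lowest angle in a trial. A minimal Python sketch follows; the function names and the sample values are hypothetical and are not taken from the study's data or from the BTS G-WALK software.

```python
from typing import Sequence


def relative_to_height(value_m: float, height_m: float) -> float:
    """Ratio of a spatial gait measure (e.g. stride length) to body height.

    Both inputs are assumed to be in metres.
    """
    if height_m <= 0:
        raise ValueError("height must be positive")
    return value_m / height_m


def range_of_motion(angles_deg: Sequence[float]) -> float:
    """Difference between the highest and lowest angle of a trial, in degrees."""
    if len(angles_deg) == 0:
        raise ValueError("at least one angle sample is required")
    return max(angles_deg) - min(angles_deg)


# Hypothetical values for illustration only.
example_ratio = relative_to_height(0.95, 1.73)
example_range = range_of_motion([2.0, -3.5, 4.5])
print(round(example_ratio, 3), example_range)
```

Normalising by height in this way is what allows stride length and running base to be compared across subjects of different stature.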

Experimental Protocol
The study followed a prospective cohort study design. The gait kinematics data of the active sportspersons were recorded at baseline, followed by a one-year follow-up period. After completion of the follow-up period, the researchers inquired about the subjects' injury occurrences using an injury assessment questionnaire. Afterwards, the researchers analysed how differences in gait kinematics affect the occurrence of lower limb injury among active sportspersons.

Statistical Analysis
Since the outcome of the study was dichotomous (Yes/No) and the predictors were continuous variables, 'Binary logistic regression' was used as the statistical technique [8]. The assumptions were verified using the 'Correlation matrix', the 'Box-Tidwell test' and 'Tolerance statistics'. Further, the 'Forward (LR)' method of binary logistic regression was employed for exploratory analysis of the variables. At the various steps of the binary logistic regression, the 'Omnibus test' was used for the models' coefficients, the 'Hosmer and Lemeshow test' was used for the goodness of fit of the models, and 'Nagelkerke R Square' was used to assess the efficiency of the models. 'IBM® SPSS®, Version 25' [9] was used for the analysis of data, and all statistical analyses were tested at the 0.05 (p = 0.05) level of significance.
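To make two of the reported quantities concrete, the sketch below shows how Nagelkerke R Square and the percentage of correctly classified cases can be computed from a fitted model's predicted probabilities. It is a pure-Python illustration under stated assumptions, not the SPSS procedure used in the study: the function names, the 0.5 classification cutoff and the sample outcomes and probabilities are all hypothetical.

```python
import math
from typing import Sequence


def log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Bernoulli log-likelihood of outcomes y (0/1) given probabilities p."""
    return sum(math.log(pi if yi == 1 else 1.0 - pi) for yi, pi in zip(y, p))


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke R Square: Cox-Snell R Square rescaled to a maximum of 1."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def classification_accuracy(y: Sequence[int], p: Sequence[float],
                            cutoff: float = 0.5) -> float:
    """Share of cases whose predicted class (p >= cutoff) matches the outcome."""
    correct = sum((pi >= cutoff) == (yi == 1) for yi, pi in zip(y, p))
    return correct / len(y)


# Hypothetical outcomes and predicted probabilities, for illustration only.
y = [1, 0, 1, 0]
p_model = [0.9, 0.2, 0.6, 0.4]
base_rate = sum(y) / len(y)            # intercept-only (null) model
ll0 = log_likelihood(y, [base_rate] * len(y))
ll1 = log_likelihood(y, p_model)
print(nagelkerke_r2(ll0, ll1, len(y)), classification_accuracy(y, p_model))
```

The null model here predicts the observed injury rate for every subject, which is the usual baseline against which the fitted model's log-likelihood is compared.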

Results
The injury assessment questionnaire showed that 24% (n=18) of the subjects suffered a lower limb injury during the one-year follow-up period. Descriptive statistics of the selected variables were also computed. Table 3 exhibits the mean difference of the selected variables between the injured and non-injured subjects. The table reveals that, among all the variables, only range of pelvic tilt, range of pelvic obliquity, range of pelvic rotation, symmetry index, maximum knee adduction and maximum toe out exhibited a significant difference (p<0.05) between injured and non-injured subjects.
The binary logistic regression analysis produced five logistic models, all of which are statistically significant (χ²(1) = 33.076, p<0.05; χ²(2) = 50.802, p<0.05; χ²(3) = 61.166, p<0.05; χ²(4) = 66.246, p<0.05; χ²(5) = 63.658, p<0.05) (Table 4). The first model explains 53.4% (Nagelkerke R²) of the variance (Table 5) in lower limb injury and correctly classified 84.0% of cases (Table 6). The model indicates that Range of Pelvic Tilt is positively associated with an increased likelihood of exhibiting lower limb injury. Further, it is revealed that the Range of Pelvic Tilt is a statistically significant (p<0.05) factor in the model (Table 7). The second model explains 73.7% (Nagelkerke R²) of the variance (Table 5) in lower limb injury and correctly classified 90.7% of cases (Table 6). The model indicates that the Range of Pelvic Tilt is positively, and Symmetry Index is negatively, associated with an increased likelihood of exhibiting lower limb injury. Further, it is revealed that the Range of Pelvic Tilt and Symmetry Index are statistically significant (p<0.05) factors in the model (Table 7). The third model explains 83.5% (Nagelkerke R²) of the variance in lower limb injury.
[Step-wise model tables (Steps 1 to 4). Table notes: A negative Chi-squares value indicates that the Chi-squares value has decreased from the previous step. Estimation terminated at iteration numbers 6 (a), 7 (b), 8 (c) and 9 (d) because parameter estimates changed by less than .001.]
Here, "p" is the probability of lower limb injury occurrence.
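For reference, the general form of a binary logistic model in which p appears can be written as follows for the three predictors that the abstract and discussion attribute to the final model (range of pelvic obliquity, symmetry index and maximum toe out). The coefficients β₀ to β₃ are symbolic placeholders, not the fitted values of this study; the reported directions of association correspond to β₁ > 0, β₂ < 0 and β₃ > 0.

```latex
\ln\!\left(\frac{p}{1-p}\right)
  = \beta_0 + \beta_1\,\mathrm{RPO} + \beta_2\,\mathrm{SI} + \beta_3\,\mathrm{MTO},
\qquad
p = \frac{1}{1 + e^{-\left(\beta_0 + \beta_1\,\mathrm{RPO} + \beta_2\,\mathrm{SI} + \beta_3\,\mathrm{MTO}\right)}}
```

In this form, a one-unit increase in a predictor multiplies the odds p/(1-p) of injury by the exponential of its coefficient, holding the other predictors constant.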

Discussion
The study aimed to find the gait kinematic factors that are responsible for lower limb injury of sportspersons and to develop a statistical model with those factors to explain the variance and correctly classify the cases of lower limb injury among sportspersons.
The descriptive statistics of the selected variables revealed that the cohort had an average cadence of 169.38±9.26 steps per minute. A previous study suggested that the step frequency, or cadence, of elite ultramarathon runners ranged between 155.40 and 203.10 steps per minute, with an average of 182 steps per minute [10]. The same study also revealed that cadence increases with running speed but has no relationship with gender, weight, age or running experience [10]. The University of Michigan [11] reported that cadence is highly individual and that runners should not attempt to manipulate it to achieve the ideal measure of 180 steps per minute. To minimize the effect of varied height on the subjects' stride length and running base, the researchers used relative values for both variables, that is, the proportion between the actual value and the height of the subject. The average relative stride length and relative running base were found to be 54.89±3.533 and 2.97±2.338 centimeters respectively. Landers et al. [12] reported a significant relationship between height and stride length. Therefore, it was necessary to consider relative values for both variables in the current study. The average stance phase duration, swing phase duration and float phase duration were 22.25±3.44, 77.74±3.44 and 27.38±3.30 percent of the gait cycle respectively. Different authors describe the running phases in different ways. Some describe two phases, i.e., the stance and swing phases, while others add a float phase. In recent times, the float phase has been considered part of the swing phase. The duration of the running phases depends on the running speed. The stance phase ends with toe off, which occurs before 50 percent of the gait cycle is completed [13]. Novacheck [13] also reported that elite sprinters' stance phase ends as early as 22% of the gait cycle, which implies a swing phase of 78 percent of the gait cycle.
The propulsion speed of the subjects of the current study was 0.72±0.18 meters per second. Hamner et al. [14] describe propulsion speed as the speed of forward movement of the body's center of mass. Most research on propulsion has addressed the force required to push the body forward rather than the speed of propulsion. Propulsion force increases with increasing running speed. Research conducted on runners of different sprinting abilities also reported that maximum running speed can be achieved by applying a higher propulsion force. It is also key to reducing fatigue and improving gait efficiency. The present study found that the range of pelvic tilt was 8.30±2.55 degrees, the range of pelvic obliquity was 8.13±1.69 degrees and the range of pelvic rotation was 9.49±2.47 degrees. Pelvic tilt directs the propulsion force in a way that allows forward movement of the body. Excessive pelvic rotation may increase stride length excessively. A three-dimensional study of the lumbar spine and pelvic joint during running reported an average pelvic obliquity of 2.3±1.2 degrees, tilt of 15.1±3.7 degrees and rotation of -3.9±2.5 degrees, but that study did not report the range of movement [15]. The basic method of calculating a symmetry index is to find the proportion between the values of a parameter for both limbs [16]. Viteckova et al. [17] listed a number of methods to calculate and quantify the symmetry of gait. In some methods, an index value of 100 indicates full asymmetry [18], but in the current study a symmetry index of 100 indicates no asymmetry between the limbs. The current study exhibited an average symmetry index value of 98.07±1.39 among the subjects. Langford et al. [19] also suggested that a symmetry index of 85 or above is considered within the normal range. Numerous studies have already suggested that gait asymmetry is a major factor in lower-body injury. The maximum knee adduction angle was 176.69±13.58 degrees. Barrios et al. [20] mentioned that knee adduction movement is the result of varus knee alignment and that its amount can be varied by altering the gait pattern. The maximum ankle inversion angle was found to be 176.68±6.21 degrees. Previous research reported an ankle joint range of motion between 5-degree eversion and 5-degree inversion [21]. In the present study, absolute angles were calculated for the various kinematic parameters; when converted according to the method of Chan et al. [21], the results of the present study exhibited similar trends. In the present study, maximum ankle dorsiflexion was calculated by measuring the ankle angle during the heel strike phase of running gait, and maximum ankle plantarflexion was calculated by measuring the ankle angle during the toe off phase of the running gait cycle. The values were 83.2±5.72 and 115.77±6.51 degrees respectively. Leblanc & Ferkranus [22] stated that ankle angle differs significantly between the shod and barefoot conditions of running. In addition, foot kinematics differ according to the type of shoe. Therefore, the current study was confined to barefoot running conditions only. Many studies have already reported that ankle kinematics is a significant risk factor for lower limb injury [23,24]. The current study found that the maximum toe out angle of the subjects was 25.32±14.64 degrees. Generally, some amount of toe out angle can be seen among runners, and it varies according to the speed of running.
In the comparison of means between injured and non-injured subjects, six variables displayed a significant mean difference: range of pelvic tilt, range of pelvic obliquity, range of pelvic rotation, symmetry index, maximum knee adduction and maximum toe out. Similar results have been reported by many previous studies. Bell [25] reported in his thesis that significant variability in pelvic kinematics was observed between injured and non-injured elite level football players. The results presented by Bayne et al. [26] and Alizadeh & Mattes [27] also showed similar differences between injured and non-injured subjects. Another study, conducted by Lessi et al. [28], reported that change in pelvic kinematics depends on the mode of locomotion, fatigue and gender differences. Wellenkotter et al. [29] suggested that the limb symmetry index can estimate injury risk, which also supports the finding of the present study. Much research evidence has suggested a difference in knee kinematics between healthy and injured subjects and has highlighted the importance of the medical application of knee joint kinematics to classify injury among different populations [30,31]; however, the association between altered knee kinematics and lower limb injury seems to be conflicting according to Cronström et al. [32]. The result of the comparative descriptive statistics for toe out angle is supported by the results of Rosenbaum [33] and Simpson & Jiang [34]. They suggested that toe out can greatly intensify the pressure on the medial aspects of the midfoot and forefoot compared to toe in and can ultimately cause injury. Though the current study exhibited a greater toe out angle in the injured group of subjects, Chang et al. [35] suggested that a higher toe out angle is inversely associated with the progression of knee osteoarthritis.
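The ratio-based idea of a symmetry index mentioned in this paragraph (the proportion between the values of a parameter for both limbs, with 100 meaning no asymmetry) can be sketched as below. This is only an illustration of that convention: the function name, the min/max formulation and the sample values are assumptions, and it is not the algorithm used by the BTS G-WALK system, which the text does not specify.

```python
def symmetry_index(left: float, right: float) -> float:
    """Ratio-based symmetry index on a 0-100 scale.

    Returns 100.0 when the left and right values are identical and
    decreases as the two sides diverge. Assumes positive-valued
    parameters (e.g. stance duration in seconds).
    """
    if left <= 0 or right <= 0:
        raise ValueError("both values must be positive")
    return 100.0 * min(left, right) / max(left, right)


# Hypothetical left/right stance durations in seconds, for illustration only.
print(symmetry_index(0.25, 0.25))
print(round(symmetry_index(0.24, 0.25), 2))
```

Dividing the smaller value by the larger keeps the index independent of which limb is listed first, which matches the convention that 100 means no asymmetry.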
Bramah et al. [2] conducted a study to identify the association of pathological gait with soft-tissue running injuries. The study identified several kinematic factors including pelvic tilt and pelvic obliquity as contributors to running-related injuries. Bramah et al. [2] reported that high pelvic obliquity and pelvic tilt along with higher knee extension and dorsiflexion angles are highly associated with common running injuries. The present study also witnesses a similar kind of result. The finally selected model (fifth model) indicates that pelvic obliquity is positively associated with an increased likelihood of exhibiting lower limb injury.
Excessive pelvic obliquity has been shown to be associated with patellofemoral pain, medial tibial stress syndrome, iliotibial band syndrome and Achilles tendinitis [2]. Excessive pelvic obliquity may be due to low neuromuscular functional ability and lower strength of the hip joint. Excessive pelvic obliquity increases the tension in the iliotibial band at the lateral femoral condyle, which in turn increases strain and compression between the iliotibial band and the lateral femoral condyle and may lead to iliotibial band syndrome [2]. Further, the increased tension in the iliotibial band may cause lateral displacement of the patella and ultimately lead to patellofemoral pain because of the excessive stress at the patellofemoral joint [36]. Moreover, excessive pelvic obliquity can medially shift the direction of the ground reaction force from the centre of the knee joint, which may result in imbalanced force distribution in the lower limbs. An imbalanced force distribution in the lower body can also change the pressure distribution on the foot, and imbalanced pressure distribution on the foot can lead to medial tibial stress syndrome and Achilles tendinitis [37].
The final injury prediction model of the current study also revealed that the gait symmetry index is negatively associated with the likelihood of exhibiting lower limb injury. That means a person with a higher symmetry index is less likely to suffer from sports-related lower limb injuries. A recent study conducted by Gogoi et al. [16] revealed that, along with three pelvic kinematic variables, the symmetry index is a significant predictor of lower limb sports injuries. Evidence suggests that, beyond sports injury, the symmetry index can be used as a tool to measure the severity of many neurophysiological disorders such as stroke, Parkinson's disease or cerebral palsy [40]. Further, the study conducted by Hoerzer et al. [38] revealed that footwear can improve the gait symmetry index. Cabral [39] stated that gait asymmetry is usually present in pathological gait due to various factors such as injury and anthropometric reasons; however, Liu et al. [40] suggested that gait asymmetry is unpredictable even in patients with a leg length discrepancy.
According to the final model for lower limb injury prediction, maximum toe out angle positively contributes to the likelihood of lower limb injury occurrence. The basic cause of toe out is the external tibial rotation and excessive external tibial rotation can cause excessive toe out angle during gait. Excessive tibial rotation can increase the load on the hip and patellofemoral joints causing malalignment of the patella as well as of the hip which may cause injury to the lower limb in the long run. Tighter ligaments and tendons can cause these external rotations in lower limbs. Cameron & Saha [41] had mentioned that genetics and external forces can also contribute to excessive external rotation of the tibia. Sometimes, excessive external rotation of the hip can also cause toe out resulting in pain in the hip. Another cause of toe out is calcaneal eversion. Fischer et al. [42] reported a possibility of talus and tibial rotation because of calcaneal movement contributing to developing overuse knee injury. Morrison & Kaminski [43] found a significant correlation between ankle injury with increased foot width and calcaneal eversion.
For a sportsperson, one of the strategies for achieving a winning place is to remain injury-free. Even after recovery, injuries may have a detrimental effect on the chances of success, be it in an individual sport or a team sport. Proper prevention techniques can minimize the incidence of injury. To develop proper prevention techniques, one must understand why injuries happen and predict who is at risk of injury. The risk factors are multidimensional, and their interaction is the root cause of injury. An efficient screening program can effectively identify these injury factors and can help to identify the players at high risk of injury. Once a player is identified as being at high risk of injury, appropriate preventive measures can be implemented.

Conclusions
In conclusion, the risk factors for lower limb injuries of an active sportsperson can be identified, and prediction of lower limb injury of an active sportsperson is theoretically possible. The biggest drawback of such a predictive model is the methodological difference in assessing the factors. Moreover, the model's reliability is highly contingent on the number of cases used in the study. If sufficient samples are not employed, the results cannot be deemed representative of a large group of athletes. Because of administrative feasibility, the current study did not meet the criterion of ten events per variable (EPV) for logistic regression, which may lead to poorer predictive performance when the result is validated. Therefore, it is recommended that future studies use a larger sample size to develop a generalized injury prediction model.
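The events-per-variable shortfall can be illustrated with the counts reported in the study: 18 of the 75 subjects were injured, 16 candidate variables were selected, and the final model retained three predictors. The sketch below is a simple arithmetic illustration; the function name is hypothetical, and counting the less frequent outcome as the "events" and the candidate variables as the denominator are conventions assumed here, not something the text states.

```python
def events_per_variable(n_events: int, n_non_events: int, n_predictors: int) -> float:
    """Events per variable, using the less frequent outcome as the event count."""
    if n_predictors <= 0:
        raise ValueError("at least one predictor is required")
    return min(n_events, n_non_events) / n_predictors


injured = 18                 # reported: 24% of 75 subjects
non_injured = 75 - injured   # 57 subjects

# Assumption: all 16 selected variables count as candidate predictors.
print(events_per_variable(injured, non_injured, 16))   # 1.125
# Counting only the three predictors retained in the final model.
print(events_per_variable(injured, non_injured, 3))    # 6.0
```

Under either way of counting, the value falls below the criterion of ten, which is consistent with the recommendation for a larger sample.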