Inter-Rater Reliability and Intra-Rater Reliability Testing of My Jump 2 Mobile Application in Measuring Countermovement Jump

My Jump 2 is a mobile application that has been shown to be valid and reliable for measuring vertical jump height. The objectives of this study were to determine the inter-rater and intra-rater reliability of the My Jump 2 mobile application in measuring countermovement jump (CMJ) height. A total of 25 male recreational athletes each performed five CMJ attempts, which were recorded using an iPhone 7 Plus at 240 frames per second. The videos were rated by three raters and rated again seven days later using the My Jump 2 mobile application. An excellent degree of reliability was found between rater measurements: the average-measures ICC was 1.00, with a 95% confidence interval from 1.00 to 1.00 (F(124, 248) = 18867201.171, p < .001). An excellent degree of reliability was also found within rater measurements: the average-measures ICC was 1.00, with a 95% confidence interval from 1.00 to 1.00 (F(124, 124) = 44750598.291, p < .001). This study provides evidence supporting the use of the My Jump 2 mobile application to measure countermovement jump height in a research setting. Future research is needed to study the validity and reliability of other parameters that the My Jump 2 mobile application can measure, such as the force-velocity profile, jumping launch force, and power.


Introduction
The ability of modern mobile phones to record video at up to 240 frames per second is considered very useful for analyzing participants' movement [1,2]. One of the emerging mobile applications recognized today as valid and reliable is the My Jump 2 mobile application [3]. This mobile application was developed ad hoc by a researcher to provide a new valid, reliable, and simple way to measure jumping height. When first released, the application could only measure countermovement jump height.
The developer of My Jump 2 states that the application was rebuilt from scratch and uses a more complex algorithm. After several updates, it can now calculate and project more complex outputs such as the force-velocity profile. However, the application still has one weakness: it relies on the human eye to select the take-off and landing points during testing, and human factors during testing can lead to systematic bias [4].
In recent years, the mobile phone has come to be considered a very versatile tool that is affordable and accessible to anyone. This advancement is a blessing for researchers because research tools have become cheap and portable [5]. The My Jump 2 application is widely used by coaches and athletes to assess their jumping height. However, some researchers are skeptical about the ability of this app [6]. Therefore, the purpose of this study was to investigate the inter-rater and intra-rater reliability of the My Jump 2 application in measuring countermovement jump height.

Participants
A total sample of 25 healthy male recreational athletes was recruited for this study. The sample size was determined by adapting the research by Balsalobre-Fernández, Tejero-González, del Campo-Vecino and Bavaresco [1], which used α = 0.05 and β = 0.2. The researcher used the same values in the sample-size equation for reliability studies formulated by Walter, Eliasziw and Donner [7]. According to this equation, the total sample is 25 (n = 25), with each participant performing five attempts (k = 5). Each attempt counted as a single measurement (nk = 125). Three raters were involved in measuring the intra-rater and inter-rater reliability of this mobile application.
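The sample-size step can be illustrated with the approximation commonly attributed to Walter, Eliasziw and Donner [7], n = 1 + 2(z_α + z_β)²·k / [(ln C₀)²·(k − 1)], where C₀ = (1 + k·θ₀)/(1 + k·θ₁) and θ = ρ/(1 − ρ). This is a minimal sketch, not the calculation performed in this study: the function name, the one-sided treatment of α, and the ICC targets ρ₀ (minimally acceptable) and ρ₁ (expected) are assumptions, because the paper does not report which ICC values were entered into the equation.

```python
import math
from statistics import NormalDist


def walter_sample_size(rho0: float, rho1: float, k: int,
                       alpha: float = 0.05, beta: float = 0.2) -> int:
    """Approximate number of subjects needed for an ICC reliability study.

    rho0: minimally acceptable ICC under the null hypothesis (assumed input)
    rho1: ICC expected under the alternative hypothesis (assumed input)
    k:    repeated measurements per subject
    alpha is treated as one-sided here (an assumption of this sketch).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    theta0 = rho0 / (1 - rho0)
    theta1 = rho1 / (1 - rho1)
    c0 = (1 + k * theta0) / (1 + k * theta1)
    n = 1 + (2 * (z_alpha + z_beta) ** 2 * k) / (math.log(c0) ** 2 * (k - 1))
    return math.ceil(n)  # round up to whole participants


# Hypothetical ICC targets; the study does not state the values it used.
print(walter_sample_size(rho0=0.7, rho1=0.9, k=5))
```

With these hypothetical targets the sketch returns 11 subjects. The required n rises sharply as ρ₁ approaches ρ₀, so the result depends heavily on these unreported inputs.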

Instrumentations
The instrument used in this study was the My Jump 2 mobile application installed on an iPhone 7 Plus (iOS 13.0, Apple Inc., USA). This app has previously been shown to be valid and reliable in a study by Carlos-Vivas, Martin-Martinez, Hernandez-Mocholi and Perez-Gomez [2]. Several measures were taken to provide the best conditions for the test:
i. Testing was carried out outdoors to obtain optimal lighting conditions.
ii. Jumps were performed on an anti-slip red mat to provide contrast during video analysis.
iii. The phone was mounted on a bracket and operated with a wireless remote to obtain stable video.

Procedures
Before the countermovement jump test, the participants first warmed up for 5 minutes on a stationary bike without any resistance. The participants then received instructions on the countermovement jump procedure [2]:
i. Stand with feet shoulder-width apart.
ii. Keep the hands placed on the hips throughout the test.
iii. Choose the depth of the countermovement freely.
iv. Jump as high as possible.
v. Land as close as possible to the take-off point.
The participants performed each countermovement jump at their own timing. If any mistake or foul occurred during the jump, a reattempt took place. The researcher recorded each jump on the iPhone 7 Plus at 240 frames per second and 1280 x 720-pixel resolution in portrait orientation. In total, 125 video clips were recorded for this study, as each of the 25 participants performed five attempts.
Three raters were recruited to analyze the videos. Each rater used the same smartphone model (iPhone 7 Plus). Each rater analyzed the jumps on a separate occasion to avoid bias. In addition, the order of the videos was randomized to minimize the risk of bias toward any particular participant. A week later, all raters were asked to rate the videos again.
All raters underwent the same training to equalize their knowledge of this mobile application. The first session was theory training, in which the raters watched a tutorial video officially produced by the application developer; this video is available on the application's official website. After the theory training, the researcher gave the raters hands-on training with the mobile application. The instructions given to the raters were as follows:
i. Please choose the final frame in which the participant's feet touch the ground during take-off.
ii. Please choose the first frame in which the participant's feet touch the ground during landing.
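The paper does not describe how My Jump 2 converts the two selected frames into a height. A common approach for flight-time methods, assumed here, is to treat the interval between the take-off and landing frames as the flight time t and compute h = g·t²/8. The minimal sketch below (hypothetical function name; g = 9.81 m/s² assumed) shows that conversion at the 240 frames per second used in this study.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def jump_height_cm(takeoff_frame: int, landing_frame: int, fps: float = 240.0) -> float:
    """Jump height in cm from two selected video frames via h = g * t^2 / 8.

    takeoff_frame: index of the last frame with the feet on the ground at take-off
    landing_frame: index of the first frame with the feet on the ground at landing
    Assumes the frame interval between them approximates the flight time.
    """
    if landing_frame <= takeoff_frame:
        raise ValueError("landing frame must come after the take-off frame")
    flight_time = (landing_frame - takeoff_frame) / fps  # seconds in the air
    return 100 * G * flight_time ** 2 / 8  # metres converted to centimetres


# Hypothetical frame indices: 120 frames apart at 240 fps is 0.5 s of flight.
print(round(jump_height_cm(1000, 1120), 2))
```

With 120 frames between the selected take-off and landing frames, the flight time is 0.5 s and the sketch gives about 30.66 cm.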

Data Analysis
All details were recorded once the assessment was complete. All data were analyzed using the Statistical Package for the Social Sciences version 21.0 (SPSS 21.0). Descriptive and inferential statistics were used to obtain quantitative metrics for the findings. The descriptive analysis included frequency distribution, mean, standard deviation, and rank.
Following Koo and Li [8], who describe the model appropriate when the same specific raters assess all subjects, this study used a two-way mixed-effects model with absolute agreement for the inferential statistics. For intra-rater reliability, this study used the mean of the three raters' measurements, whereas inter-rater reliability was based on each rater's individual measurements. Both analyses assessed absolute agreement.
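The ICC values in this study were obtained with SPSS. As a hedged illustration of what an absolute-agreement ICC computes, the sketch below derives single-measure and average-measure ICCs from two-way ANOVA mean squares in plain Python. The function name and the toy scores are illustrative assumptions, not study data. For the design described here, n = 125 jumps and k = 3 raters give F-test degrees of freedom (n − 1, (n − 1)(k − 1)) = (124, 248), and k = 2 sessions give (124, 124), matching the degrees of freedom reported in the Results.

```python
from statistics import mean


def icc_absolute_agreement(data):
    """Two-way ANOVA intraclass correlation for absolute agreement.

    data: one row per measured jump, one column per rater (or per session).
    Returns single- and average-measure ICCs plus the F test for ICC = 0.
    """
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between jumps
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))

    return {
        "icc_single": (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
        "icc_average": (msr - mse) / (msr + (msc - mse) / n),
        "F": msr / mse,
        "df": (n - 1, (n - 1) * (k - 1)),
    }


# Illustrative scores (6 jumps x 4 raters), not data from this study.
scores = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
          [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(icc_absolute_agreement(scores))
```

For these toy scores the sketch gives a single-measure ICC of about 0.29 and an average-measure ICC of about 0.62. An average-measure ICC of 1.00 arises only when both rater and residual variance are negligible compared with the variance between jumps.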

Results
An excellent degree of reliability was found between rater measurements (Table 1). The average-measures ICC was 1.00, with a 95% confidence interval from 1.00 to 1.00 (F(124, 248) = 18867201.171, p < .001). An excellent degree of reliability was also found within rater measurements: the average-measures ICC was 1.00, with a 95% confidence interval from 1.00 to 1.00 (F(124, 124) = 44750598.291, p < .001).
A Bland-Altman plot was used to visualize the difference between session measurements (Table 2), comparing measurement A (mean of the first measurement) with measurement B (mean of the second measurement seven days later). Regarding the spread of scores around the zero line, 95% of the difference scores fell within two standard deviations above and below the mean difference (Table 3). This finding suggests that the data were unbiased and homoscedastic [9] and that there was good agreement between the two sessions.
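The Bland-Altman quantities described above can be sketched as follows: each pair of session scores yields a difference (A − B) and an average, and the limits of agreement are the mean difference ± z standard deviations of the differences. The function name and test values are hypothetical; z = 1.96 is assumed as the conventional value, and passing z=2 matches the "two standard deviations" wording used in the text.

```python
from statistics import mean, stdev


def bland_altman(session_a, session_b, z=1.96):
    """Bias, limits of agreement, and share of differences inside the limits.

    session_a, session_b: paired scores (e.g. measurement A and B per jump).
    The returned 'means' are the x-axis values of a Bland-Altman plot.
    """
    diffs = [a - b for a, b in zip(session_a, session_b)]
    means = [(a + b) / 2 for a, b in zip(session_a, session_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    lower, upper = bias - z * sd, bias + z * sd
    within = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return {"bias": bias, "lower": lower, "upper": upper,
            "within": within, "means": means}


# Hypothetical paired heights in cm, not data from this study.
print(bland_altman([30.1, 28.4, 35.2, 31.0], [30.0, 28.5, 35.1, 31.0]))
```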

Inter-rater reliability of My Jump 2 mobile application in measuring countermovement jump
One of the purposes of this study was to analyze the reliability of the My Jump 2 mobile application between raters. Several studies have examined the application's validity and reliability. For example, the study in [6] demonstrated that the consistency of the application among highly experienced raters was excellent (ICC = 0.997 for both raters), with mean differences of 1.1 ± 0.5 cm and 1.3 ± 0.5 cm for raters 1 and 2, respectively. Similarly, high correlation (ICC = 0.99) was reported across both sessions in [10], with mean differences of 0.0 ± 0.7 cm and 0.1 ± 1.0 cm for sessions 1 and 2, respectively. Stanton, Wintour and Kean [11] demonstrated an intra-rater reliability of ICC = 0.99 and a mean difference of 0.43 cm. These values are in line with this study's ICC values, ranging from 0.962 for inexperienced raters to 0.984 for qualified raters. Absolute mean differences for both types of raters (0.4 to 0.96 cm) were also similar in this study.
The level of agreement between raters is also important for practical purposes, because the application requires the user to identify the take-off and landing frames, regardless of prior experience. The scatter of data points in the Bland-Altman plots of previous experiments with experienced raters is very close to that of the current sample [6]. This finding indicates that use of the software by untrained raters does not introduce meaningful ambiguity. Internal consistency was strong regardless of the raters' prior experience (α = 0.99 and CV = 1.1 to 1.4%). Similar reliability levels for experienced raters have been observed in other studies: α = 0.997 and CV = 3.4% and 3.6% [6], and α = 0.99 and CV = 4.82 to 5.58% for the countermovement jump [10].
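Cronbach's α and the coefficient of variation quoted above can both be computed from a jumps × raters table. The minimal plain-Python sketch below treats raters as the "items" for α and averages the per-jump CV across raters; both choices are assumptions, since the cited studies may define CV differently, and the function names are hypothetical.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(rows):
    """Cronbach's alpha with raters treated as items.

    rows: one row per jump, one column per rater.
    """
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])  # variance of row totals
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def mean_cv_percent(rows):
    """Average within-jump coefficient of variation (%) across raters."""
    return mean(100 * stdev(row) / mean(row) for row in rows)


# Hypothetical heights in cm (3 jumps x 2 raters), not data from this study.
table = [[30.1, 30.0], [28.4, 28.5], [35.2, 35.1]]
print(cronbach_alpha(table), mean_cv_percent(table))
```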
In the current study, an excellent degree of reliability was found between rater measurements. The average-measures ICC was 1.00, with a 95% confidence interval from 1.00 to 1.00 (F(124, 248) = 18867201.171, p < .001). This excellent intraclass correlation coefficient might originate from the extra steps taken by the researcher to ensure high video quality: the procedure was carried out outdoors to obtain optimal lighting, the participants performed on an anti-slip red mat to provide contrast during video analysis, and the phone was mounted on a bracket and operated with a wireless remote to obtain stable video. Another possible reason for the excellent inter-rater reliability is the training and tutorial given to the raters. By contrast, no training was given to the raters in the studies by Pueo, Jimenez-Olmedo, Penichet-Tomás and Bernal-Soriano [12] and Stanton, Wintour and Kean [11], and in those studies a different video was recorded for each rater. By using the same video for all raters, the current study could eliminate other factors that might interfere with video analysis, such as differences in viewing angle and video quality. Meanwhile, the study by Gallardo-Fuentes, Gallardo-Fuentes, Ramírez-Campillo, Balsalobre-Fernández, Martínez, Caniuqueo, Cañas, Banzer, Loturco and Nakamura [10] replicated real-world measurement conditions, while another study close to the current research is [12]. These studies found excellent reliability among observers, irrespective of their experience with the mobile application. The main difference is that these papers focused on experienced versus inexperienced raters, whereas the current study focuses more on the technical side of the application and on human error. In addition, the frame rate (frames per second) used in [12] could not be found.
The frame rate influences video quality: at a high frame rate (240 frames per second), each frame can be exposed for at most 1/240 s, so relatively little light reaches the small camera sensor of a smartphone.

Intra-rater reliability of My Jump 2 mobile application in measuring countermovement jump
An excellent degree of reliability was found within rater measurements. The average-measures ICC was 1.00, with a 95% confidence interval from 1.00 to 1.00 (F(124, 124) = 44750598.291, p < .001). This study showed that countermovement jump height can be measured reliably by the same rater, producing consistent results. As with inter-rater reliability, the excellent degree of reliability might be due to the extra effort taken by the researcher to ensure video quality.
Intra-rater reliability shows a very high level of similarity between the Day 1 and Day 7 measurements. This finding suggests that random error is minimal when the same rater scores the measurements, so systematic error is the more important concern. A systematic error when using the My Jump 2 mobile application may arise from the difficulty of correctly selecting the take-off and landing frames at the video capture frequency used (240 Hz). However, as suggested in the previous section, this systematic error becomes smaller as the video sampling frequency increases.
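To gauge how much a single mis-selected frame matters at different capture frequencies, the sketch below assumes the flight-time equation h = g·t²/8 (g = 9.81 m/s²) and a hypothetical flight time of 0.5 s. The function names and inputs are illustrative assumptions, not taken from the study or from the My Jump 2 implementation.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def height_cm(flight_time_s: float) -> float:
    """Jump height in cm from flight time via h = g * t^2 / 8."""
    return 100 * G * flight_time_s ** 2 / 8


def one_frame_error_cm(flight_time_s: float, fps: float) -> float:
    """Change in computed height if one extra frame is counted as flight."""
    return height_cm(flight_time_s + 1 / fps) - height_cm(flight_time_s)


# Frame duration (ms) and one-frame height error (cm) for a 0.5 s flight.
for fps in (30, 60, 120, 240):
    print(fps, round(1000 / fps, 2), round(one_frame_error_cm(0.5, fps), 2))
```

At 240 frames per second, one extra frame (about 4.17 ms) changes the computed height by roughly 0.51 cm for a 0.5 s flight, compared with roughly 4.2 cm at 30 frames per second.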
A study similar to the current research is [11], which found that intra-rater reliability was excellent. The methodologies of the two studies are very similar. The only difference is that the earlier study used well-trained athletes of both sexes, whereas the current study used male recreational athletes only. However, as mentioned earlier, the results in this study were in almost perfect agreement owing to the extra precautions taken by the researcher to ensure high-quality high-speed video.
Excellent reliability in both inter-rater and intra-rater measurements supports the use of this mobile application in research. Although this research was conducted under controlled conditions (excellent lighting, contrasting colour, and use of a tripod), similar results can be expected in field testing [11,12].

Conclusions
This study showed that in healthy recreational athletes, countermovement jump height can be measured reliably by the same rater as well as by different raters with varying clinical experience. This study provides evidence supporting the use of the My Jump 2 mobile application to measure countermovement jump height in a research setting. Further research is needed to confirm these results in other types of jumps, such as the drop jump and the free-hand countermovement jump. One limitation of this study was the smartphone used. The current study used an iPhone 7 Plus, which is not the manufacturer's latest phone. An older phone heats up easily after extended recording of high-frame-rate video. This slowed the raters' measurement process slightly, and video playback became somewhat sluggish and laggy. Overall, this may have made selecting the take-off and landing frames somewhat harder. Future research is needed to study the validity and reliability of other parameters measured by the My Jump 2 mobile application: besides jumping height, it can also measure the force-velocity profile, jumping launch force, and power, aspects that have received less attention from researchers.
In addition, this study should be repeated in a field-testing environment. As mentioned earlier, the high intra-rater and inter-rater reliability in this study may be due to the controlled conditions during data collection: ample light and the use of a tripod ensured that the recorded videos were of very high quality and not blurry. The My Jump 2 mobile application is also designed for coaches to monitor their athletes' performance.