The Validity and Reliability of Three Field Tests for Assessing College Freshmen Students' Cardiovascular Endurance

In an effort to reduce obesity rates among young adults resulting from their increasingly sedentary and stressful lifestyle and poor diet, PE classes worldwide commonly use three endurance exercises that have also been widely studied: jumping rope, the step test, and the beep test (20-meter multistage fitness test). This study shed light on the suitability of these three field tests as assessment tools for measuring the cardiovascular endurance of the college freshmen students of the Physical Fitness (HKD-01) course, since no study has yet investigated this in the Philippine university context. Specifically, this study aimed to identify (1) the validity of the three tests as assessment tools for gauging cardiovascular fitness; and (2) their reliability as shown in the correlation between the directly measured and predicted VO2max. The statistical analyses revealed that the fitness tests were valid and reliable. Each fitness test has its own peculiarity and course of action that make it distinct from the others. The step test is a submaximal test, the jumping rope test can be categorized as mid-maximal, and the beep test is a maximal test of cardiovascular endurance. It is then recommended that these fitness tests be employed in PE classes in the university, following correct protocols, to develop the cardiorespiratory endurance of the students.


Introduction
In recent years, university students' participation in physical activity has decreased, and more and more of them develop sedentary behavior as they transition to adulthood, enjoy autonomy, and make their own lifestyle choices [1]. For instance, one study in the US reported that only a small and declining minority of male and female college students were physically fit from 1996 to 2008; the students' fitness levels gradually decreased over those 13 years. These data are crucial, especially since the study also implied that physical activity and healthy nutrition can have a positive effect on the overall performance of students [2].
Lang, Tremblay, Léger, Olds, and Tomkinson's review [3] revealed that the Philippines is in the 42nd percentile out of 50 countries in the 20m shuttle run performances of children and youth, indicating their low cardiorespiratory endurance. This was deemed a public health issue in the country and in other developing countries. In another study, more than half of the Filipino students (67.2%) did not meet the recommended physical activity per week due to lack of time. The students do not sustain interest in physical fitness beyond their Physical Education classes because of heavier coursework as they progress in college and their poor dietary intake. Even Physical Education majors are not spared from this issue, as they are also prone to living a sedentary lifestyle and possibly developing chronic non-communicable diseases [4].
In fact, one study in Brazil found that 10% of the females and 7% of the males presented values of VO2max below the recommended levels for age, similar to other studies. Female students are expected to have lower cardiac output and oxygen transport capacity, while males are more physically active than females, findings that other studies also support [4]. Therefore, strengthening or revitalizing Physical Education (PE) classes and programs, particularly physical fitness courses, in the universities must be taken seriously.
To meet this demand, one of the objectives of the Physical Fitness Class (HKD-01) course of San Beda University is to assess the students' current physical fitness level through field tests, to help the students achieve and ideally sustain a healthy lifestyle for lifelong learning. Starting the first semester, the teacher-researcher enhanced the course by adding jumping rope, the step test, and the beep test as field tests to measure the students' cardiovascular endurance. Previously, these tests were commonly used in varsity students' routine training, but no research has yet examined whether the same tests can be used for non-athletes, given the changes in lifestyle and food intake brought about by technological advancements and the fitness opportunities now open to the new generation of students.
One of the important indicators of physical fitness highlighted in the course is cardiovascular (or cardiorespiratory, cardiopulmonary) endurance. It "represents the combined ability of (1) the pulmonary system to exchange oxygen between the outside air and the blood circulating through the capillaries in the lung, (2) the cardiovascular system to transport oxygen to the working muscles, and (3) the muscular system to use oxygen." [5] Moreover, it has mental health benefits along with other long-term effects such as improved ability to extract oxygen from air during exercise, increased sweat rate, reduced risk of cancer, and increased density and breaking strength of bones, ligaments, and tendons, among many others. Thus, a person with an ideal cardiovascular endurance level needs less effort to carry out daily routines, especially difficult tasks that require more physical exertion [6].
The best quantitative measure of cardiorespiratory endurance is maximal oxygen consumption, or VO2max, which is the amount of oxygen the body uses when a person reaches maximum ability to supply oxygen during exercise. However, this type of measurement is done in a laboratory and can be costly and time-consuming for a regular person [6]. Thus, some PE teachers obtain only an approximate measurement, unlike the direct measurement that can be taken using a treadmill with an ECG machine in clinical practice.
Three common endurance exercises that have been practiced and also studied worldwide are jumping rope, step test, and beep test.
In the jumping rope or skip test, balance and coordination are so crucial to sustaining the precisely timed and rhythmic movements of the exercise that jumping rope is considered a skill (Trecroci, Cavaggioni, Caccia & Alberti, 2015). It also develops the coordination of neuromuscular skills, muscle strength, and cardiovascular endurance [7]. Several studies, mostly from Asia and most of them with children as subjects, tested whether including jumping rope routines or programs in PE classes would benefit students [8,9]. It was found that the training had significant effects on their cardiovascular endurance, flexibility, and muscular strength and endurance [10]. One recent study by Dimarucot and Soriano [5] employed the same fitness test and used the paired t-test to identify significant differences in the VO2max of male and female university students after the jumping rope test. The study attested to the effectiveness of the test in increasing their VO2max. Understandably, rope jump exercises can be enjoyable for children and are commonly practiced among athletes, but little is known about their validity and reliability as a test of the cardiovascular endurance of Filipino university students.
The step test is one of the most widely used assessments that provide reasonably close estimates of VO2max. It is considered a relatively quick and easy method for measuring cardiopulmonary fitness compared to other exercises or tests [11]. Knight, Stuckey, and Petrella [12] tested the validity of the prediction equation for VO2max used in the Step Test and Exercise Prescription (STEP) tool among a sample of 18- to 64-year-olds using the Bland-Altman analysis. There was statistically acceptable agreement between the VO2max predicted using the STEP equation and the VO2max directly measured through laboratory-based maximal treadmill testing. Cooney et al. [13] examined whether the Siconolfi step test is a valid measure of cardiorespiratory fitness in patients with rheumatoid arthritis. The Pearson correlation coefficient, intraclass correlation coefficient, and Bland-Altman plots were used to determine the validity and reliability of the submaximal test. The Siconolfi step test was found to provide a valid, reliable, and reproducible estimation of cardiorespiratory fitness (VO2max) in routine clinical practice, particularly in patients with rheumatoid arthritis. While different varieties of the step test have been studied clinically for people with special conditions and non-clinically for college students [14,11], no study has applied the STEP tool of Knight, Stuckey, and Petrella [12] in the Philippine context, and no comparison of the performance of young non-athletes using the same test and statistical analysis has been done so far.
Lastly, the beep test (also called multistage fitness test [MSFT] or shuttle run test [SRT]) has been widely used as a predictive test for VO2max. It was originally designed to predict fitness in healthy adults attending fitness classes and in athletes participating in sports. It is used in at least 50 countries due to the low cost of equipment, simplicity in administering and scoring the test, flexibility in testing, convenience for the participants, and practical use for simultaneously testing large groups of children and youth [3]. Paradisis et al. [15] investigated the validity and suitability of predicting the VO2max and velocity at VO2max (vVO2max) of PE students on the basis of their performance in the 20m multistage fitness test (MSFT) using Pearson's product-moment correlation coefficient. Results indicated a high correlation between shuttles in the 20m MSFT and VO2max as well as vVO2max. It was then concluded that the 20m MSFT can accurately predict VO2max and vVO2max and that this field test can provide useful information regarding the aerobic fitness of young adults. On the other hand, Mayorga-Vega, Aguilar-Soto, and Viciana [16] used the Hunter-Schmidt psychometric meta-analysis approach and found that the test has a moderate-to-high mean correlation coefficient of criterion-related validity for estimating VO2max and that its validity is higher for adults than for children. It was emphasized, however, that test scores are just an estimation of cardiorespiratory fitness.
Though the beep test was found to be an effective and accurate international population health indicator in children and youth [17,18,19,20,21,22,23], it is still worth examining whether the results are particularly true as well in the Philippine context, especially when this test is compared with the first two field tests above using a different statistical analysis tool.
Thus, the present study shed light on the suitability of the three field tests as assessment tools for measuring the cardiovascular endurance of the college freshmen students of the Physical Fitness Class (HKD-01) course at San Beda University. Specifically, this study aimed to identify (1) the validity of the three tests as assessment tools for gauging cardiovascular fitness and (2) their reliability as revealed by the correlation between the directly measured and predicted VO2max.

Research Design
A cross-sectional study was conducted, examining subjects who differed in cardiovascular endurance level at one specific point in time. The data were collected at the same time from participants who were similar in age but differed in anthropometric factors such as body mass index (BMI), waist circumference, waist-to-hip ratio, body fat percentage, and heart rate.

Research Participants
All the anthropometric measurements in the three field tests were taken from the HKD-01 (Physical Fitness Class) students of San Beda University who voluntarily agreed to participate in the study. The study included apparently healthy male and female students aged 18 to 20 years who were also non-smokers and non-alcoholics. The latest version of the Physical Activity Readiness Questionnaire (2019 PAR-Q+) was used as a screening test to identify those qualified to undergo the tests. The study did not cover students who had a history of any acute or chronic illness, were on medication, or were undergoing regular physical training (as an athlete or non-athlete). Those who answered YES to one or more of the questions in the 2019 PAR-Q+ were automatically excluded from the study.

Measurement and Instrumentation
The anthropometric variables such as height and weight, heart rate, waist circumference, and waist-to-hip ratio were measured using a tape measure, a digital weighing scale, and a pulse oximeter. The waist-to-hip ratio was computed by dividing the waist circumference by the hip measurement. The body mass index (BMI) was computed as kg/m2, where kg is a person's weight in kilograms and m2 is their height in meters squared [24]. For the body fat percentage, the prediction equation of [25], which was also adopted by [9], was used: Percentage body fat = 0.610 × (sum of triceps and calf skinfolds) + 5.
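The three derived measures above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the skinfold unit (millimeters) is an assumption the text does not state, and the body fat coefficients are copied as printed in the text rather than checked against [25].

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def waist_to_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """Waist circumference divided by hip circumference (same unit for both)."""
    return waist_cm / hip_cm


def percent_body_fat(triceps_mm: float, calf_mm: float) -> float:
    """Body fat percentage from the sum of triceps and calf skinfolds.

    Coefficients as printed in the text: 0.610 x (sum of skinfolds) + 5.
    ASSUMPTION: skinfolds are in millimeters.
    """
    return 0.610 * (triceps_mm + calf_mm) + 5


if __name__ == "__main__":
    # Hypothetical student: 60 kg, 1.60 m, waist 80 cm, hip 100 cm
    print(round(bmi(60.0, 1.60), 2))
    print(round(waist_to_hip_ratio(80.0, 100.0), 2))
    print(round(percent_body_fat(12.0, 13.0), 2))
```

The values passed in the example are made up for illustration and are not taken from the study sample.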
The formula that was used to assess the cardiovascular endurance levels of the students [24] is as follows: VO2max = 132.6 − (0.17 × weight in kg) − (0.39 × age) + (6.31 × gender [0 for females; 1 for males]) − (3.2 × time to walk 1 mile) − (0.156 × post-exercise heart rate [bpm]).
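As a worked illustration of the prediction equation, the sketch below evaluates it for one hypothetical student. The function and parameter names are invented for this example, and the walk time is assumed to be expressed in minutes, which the text does not state explicitly.

```python
def predicted_vo2max(weight_kg: float, age_years: float, is_male: bool,
                     walk_time_min: float, post_hr_bpm: float) -> float:
    """Predicted VO2max using the coefficients quoted in the text [24].

    ASSUMPTION: walk_time_min is the time to walk 1 mile, in minutes.
    """
    gender = 1 if is_male else 0  # 0 for females, 1 for males
    return (132.6
            - 0.17 * weight_kg
            - 0.39 * age_years
            + 6.31 * gender
            - 3.2 * walk_time_min
            - 0.156 * post_hr_bpm)


if __name__ == "__main__":
    # Hypothetical 18-year-old male, 60 kg, 15-minute mile, 150 bpm afterwards:
    # 132.6 - 10.2 - 7.02 + 6.31 - 48.0 - 23.4 = 50.29
    print(round(predicted_vo2max(60.0, 18, True, 15.0, 150.0), 2))
```

Each term can be checked by hand, which makes it easy to see that a heavier body weight, a slower walk, or a higher post-exercise heart rate all lower the predicted value.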

Data Collection Procedures
The following protocols were observed for each of the three field tests.

Jumping Rope Test
Table 1 shows the eight-week rope jumping test routine, which was held at the school gym. It was loosely patterned after Partavi's rope-jump training program [8]. Each session starts with the usual warm-up exercise, dynamic stretching for 10 minutes, except in the first week when it lasts only 5 minutes. Since the jumping rope test used is progressive and multistage, 100 repetitions were added per week, starting at 100 in Week 1 and ending at 800 in Week 8. All tests end with 5 minutes of static stretching for the cool-down. The anthropometric measurements as well as the VO2max were calculated in the pre-test and the post-test.

Step Test
The Step Test and Exercise Prescription (STEP) tool developed by Knight, Stuckey, and Petrella [12] was the basis for the protocol in this study. The test begins with a demonstration of the stepping pattern to the students: stepping one foot at a time up both steps and back down again. One step is counted after stepping up to the top step and returning to the starting position. When the student is ready to begin, the teacher starts the timer as the first foot leaves the ground. The teacher has to offer encouragement and watch for signs of fatigue or balance problems during the test. As the student performs the last step, the teacher takes the heart rate by palpating the radial pulse for a six-second count. Next, the teacher calculates the student's target training heart rate, using 65% to 85% of the age-predicted maximal heart rate as a target.
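The two heart-rate calculations in this protocol can be sketched as follows. The text does not say which formula was used for the age-predicted maximal heart rate, so the common estimate of 220 minus age is used here purely as an assumption; the function names are also hypothetical.

```python
def bpm_from_count(beats: int, seconds: float = 6.0) -> float:
    """Convert a pulse count taken over a short window to beats per minute."""
    return beats * 60.0 / seconds


def target_heart_rate_zone(age_years: float,
                           low: float = 0.65,
                           high: float = 0.85) -> tuple[float, float]:
    """Return the (lower, upper) target training heart rate in bpm.

    ASSUMPTION: age-predicted maximal heart rate = 220 - age. The source only
    says "age-predicted maximal heart rate" and does not name a formula.
    """
    max_hr = 220 - age_years
    return low * max_hr, high * max_hr


if __name__ == "__main__":
    print(bpm_from_count(14))              # 14 beats in six seconds
    lower, upper = target_heart_rate_zone(18)
    print(round(lower, 1), round(upper, 1))
```

A six-second count is multiplied by 10 to obtain beats per minute, so a single miscounted beat shifts the result by 10 bpm; this is worth keeping in mind when interpreting values taken by palpation.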

Beep Test (20 Meter Multistage Fitness Test)
The protocol is a slight modification of the test protocol implemented by Voss and Sandercock [26]. The students ran 20-meter shuttles in time to an audible metronome at an initial speed of 8.5 km/h, increasing by 0.5 km/h each minute. All testing was done inside the school gym and not in an open area or field, unlike in other studies. The testing was conducted in groups of up to 30 with a ratio of five participants to one teacher-researcher. Participants had not practiced this test prior to the pre-test, but they performed it for 8 weeks as one of their lessons on cardiovascular endurance. The teacher-researcher reminded the subjects to run for as long and as well as possible, as this test would predict their present cardiovascular endurance level. The teacher also acted as a spotter during the test and recorded each participant's final shuttle count at either the point of volitional exhaustion or when they had failed three times. The number of completed shuttles was then converted to running speed in km/h at the final completed stage. The participants took their training heart rate at eight-second intervals within one minute and then rested for 1 minute. Lastly, they took their recovery heart rate for another minute, which was later used for the VO2max computation. After each field test, both pre- and post-test, the VO2max values were computed and subjected to statistical analysis.
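The speed progression described above can be sketched as a small helper. The function names are hypothetical, and the time allowed per shuttle is derived only from the stated speed and the 20-meter distance, not taken from the published beep schedule.

```python
def stage_speed_kmh(stage: int, initial: float = 8.5,
                    increment: float = 0.5) -> float:
    """Running speed for a one-minute stage (stage 1 is the first minute)."""
    if stage < 1:
        raise ValueError("stage must be 1 or greater")
    return initial + increment * (stage - 1)


def seconds_per_shuttle(stage: int, shuttle_m: float = 20.0) -> float:
    """Time available to cover one shuttle at the speed of the given stage."""
    speed_m_per_s = stage_speed_kmh(stage) / 3.6  # km/h to m/s
    return shuttle_m / speed_m_per_s


if __name__ == "__main__":
    for stage in (1, 4, 8):
        print(stage, stage_speed_kmh(stage),
              round(seconds_per_shuttle(stage), 2))
```

Because the interval between beeps shortens at every stage, the last completed stage maps directly to a final running speed, which is the quantity the protocol records.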

Statistical Tools
A paired-samples t-test was conducted to determine whether there were significant differences between the VO2max predicted from the different fitness tests and the VO2max measured directly using a portable machine. This test was used to assess the validity of the different fitness tests in measuring VO2max.
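A minimal sketch of the paired-samples t statistic is given below, using only the Python standard library. The function name and the sample values are hypothetical and are not the study data; the p-value is omitted because the standard library has no t distribution, so the statistic would be compared against a t table or a statistics package.

```python
import math
import statistics


def paired_t(predicted: list[float], measured: list[float]):
    """Return (t statistic, degrees of freedom, mean difference) for paired data."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [p - m for p, m in zip(predicted, measured)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)       # sample SD of the differences
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1, mean_diff


if __name__ == "__main__":
    # Illustrative values only (hypothetical, not from the study)
    predicted = [30.1, 28.4, 35.2, 31.0, 29.5]
    measured = [31.5, 29.0, 36.9, 32.2, 30.1]
    t_stat, df, mean_diff = paired_t(predicted, measured)
    print(round(t_stat, 3), df, round(mean_diff, 3))
```

A negative mean difference means that the predicted values are, on average, lower than the directly measured ones.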
To determine the reliability of the different fitness tests, several tests were used. First, the Bland-Altman plot (limits of agreement or difference plot) and analysis is a method comparison technique [27] used for assessing agreement and average bias (or difference) between two measurement techniques; it assumes that the data are paired. As suggested by Twomey [27], the relative difference plot [28] was used in this study for three reasons: 1) the presence of a 'funnel' effect in the absolute difference plot, that is, an increase in the degree of differences as mean values increase; 2) by inspection, the slopes in the data are significantly different from zero; and 3) the histogram and normality tests indicated the non-normality of the differences. Using this plot and analysis, two methods are considered to be in agreement and may be used interchangeably if the limits of agreement (or LoA, the mean difference ± 1.96 SD of the differences) do not exceed the maximum allowed difference Δ [29]. Further, the methods do not disagree if the maximum allowed difference Δ is higher than the upper 95% CI limit of the higher LoA and -Δ is lower than the lower 95% CI limit of the lower LoA, as shown in Figure 1. The Bland-Altman plot also allows identification of any systematic difference between the measurements (i.e., fixed bias). The mean difference is the estimated bias, and the standard deviation of the differences measures the random fluctuations around this mean. Another statistical test used is the one-sample t-test for a mean: the presence of fixed bias is evident when the mean value of the difference differs significantly from 0 on the basis of this test [30].
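The relative (percentage) difference version of the Bland-Altman analysis can be sketched as follows. This is an illustration under stated assumptions rather than the exact computation used in the study: the function names are hypothetical, the limits of agreement use the conventional bias ± 1.96 SD, and the standard error of each limit uses the common Bland-Altman approximation SD × sqrt(1/n + 1.96²/(2(n − 1))). To obtain 95% CIs, each standard error would be multiplied by the t quantile for n − 1 degrees of freedom, which is left to the caller.

```python
import math
import statistics


def bland_altman_percent(method_a: list[float], method_b: list[float],
                         z: float = 1.96) -> dict:
    """Bias and limits of agreement for percentage differences.

    Each percentage difference is 100 * (a - b) / mean(a, b).
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    pct = [100.0 * (a - b) / ((a + b) / 2.0)
           for a, b in zip(method_a, method_b)]
    n = len(pct)
    bias = statistics.fmean(pct)
    sd = statistics.stdev(pct)
    return {
        "n": n,
        "bias": bias,
        "sd": sd,
        "loa_lower": bias - z * sd,
        "loa_upper": bias + z * sd,
        # Standard errors; multiply by a t quantile to get CI half-widths.
        "se_bias": sd / math.sqrt(n),
        "se_loa": sd * math.sqrt(1.0 / n + z ** 2 / (2.0 * (n - 1))),
    }


def within_allowed_difference(result: dict, delta: float) -> bool:
    """True if both limits of agreement fall inside [-delta, +delta]."""
    return result["loa_lower"] >= -delta and result["loa_upper"] <= delta


if __name__ == "__main__":
    # Illustrative values only (hypothetical, not from the study)
    predicted = [30.1, 28.4, 35.2, 31.0, 29.5]
    measured = [31.5, 29.0, 36.9, 32.2, 30.1]
    res = bland_altman_percent(predicted, measured)
    print({k: round(v, 3) for k, v in res.items()})
    print(within_allowed_difference(res, delta=10.0))
```

The maximum allowed difference `delta` must be chosen in advance on practical grounds; the analysis itself only describes how far apart the two methods are.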
This study also utilized tests of significance of the standardized coefficients resulting from simple linear regression analysis in order to detect the presence of proportional bias, in which the percentage difference between the measures is a function of the average of the measures; that is, one method gives values that are higher (or lower) than those from the other by an amount that is proportional to the level of the measured variable. Additionally, the intraclass correlation coefficient was used in performing reliability testing. It is an improvement over Pearson's (as used in Paradisis et al. [15]) and Spearman's coefficients, as it takes into account the difference in the measures for individual subjects along with the correlation between methods.
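The proportional-bias check can be sketched as a simple linear regression of the percentage difference on the mean of the two measures. In a regression with one predictor, the standardized coefficient equals the Pearson correlation between predictor and outcome, and its t statistic has n − 2 degrees of freedom. The function name and data below are hypothetical, and the study's exact regression setup may differ.

```python
import math
import statistics


def proportional_bias(method_a: list[float], method_b: list[float]):
    """Regress percentage difference on the pairwise mean.

    Returns (standardized coefficient, t statistic, degrees of freedom).
    """
    if len(method_a) != len(method_b) or len(method_a) < 3:
        raise ValueError("need two equally long samples with at least 3 pairs")
    means = [(a + b) / 2.0 for a, b in zip(method_a, method_b)]
    pct = [100.0 * (a - b) / m for a, b, m in zip(method_a, method_b, means)]
    n = len(means)
    mx, my = statistics.fmean(means), statistics.fmean(pct)
    sxx = sum((x - mx) ** 2 for x in means)
    syy = sum((y - my) ** 2 for y in pct)
    sxy = sum((x - mx) * (y - my) for x, y in zip(means, pct))
    if sxx == 0 or syy == 0:
        raise ValueError("means or differences are constant; slope undefined")
    beta = sxy / math.sqrt(sxx * syy)   # standardized slope (Pearson r)
    if abs(beta) >= 1.0:
        t_stat = math.copysign(math.inf, beta)
    else:
        t_stat = beta * math.sqrt((n - 2) / (1.0 - beta ** 2))
    return beta, t_stat, n - 2


if __name__ == "__main__":
    # Illustrative values only: the gap between methods grows with the level
    method_a = [20.0, 31.0, 43.0, 56.0]
    method_b = [20.0, 30.0, 40.0, 50.0]
    beta, t_stat, df = proportional_bias(method_a, method_b)
    print(round(beta, 3), round(t_stat, 3), df)
```

A coefficient that differs significantly from zero indicates that the disagreement between the two methods changes with the level of VO2max, which a single mean bias cannot capture.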

Characteristics of the Sample
A total of 115 freshman students of San Beda University participated in this study, 52.2% of whom were male. Table 2 presents the characteristics of the sample.
As presented in Table 2, the average age of the sample was 18.29 ± 0.542 years, the average height was 160.79 ± 14.083 cm, and the average weight was 77.784 ± 38.379 kg. The average BMI was 23.213 ± 8.523 kg/m2, which falls in the "at-risk" category based on the standards posted on the Philippine Association for the Study of Overweight and Obesity website. The average resting heart rate was 69.69 ± 16.938 bpm, which is within the normal range for adults based on the Cleveland Clinic website.

Validity of the Fitness Tests: Tests of Difference between Predicted and Directly Measured VO2max
To determine whether significant differences exist between the predicted (Test_1) and directly measured (Test_2) VO2max in the three different fitness exercises, the t-test for dependent groups was used.
The results shown in Table 3 indicate that in the beep test, there is a significant difference between the predicted VO2max (M = 30.25, SD = 4.710) and the directly measured VO2max (M = 31.80, SD = 5.686), t(114) = −5.728, p < .01. In fact, it can be concluded that the mean predicted VO2max is significantly lower than the mean directly measured VO2max (mean difference = -1.542).
On the other hand, results do not support significant differences between the mean predicted and mean directly measured VO2max when the subjects underwent the jumping rope test, t(114) = −.7697, p > .05, and the step test, t(114) = 2.4963, p > .05.

Reliability of the Three Fitness Tests: Bland-Altman Plot and Analysis
Figures 2 to 5 present the Bland-Altman plots that were generated to assess agreement between the predicted VO2max arising from the different fitness tests and the directly measured VO2max. Results shown in Table 4 indicate that in the beep test, there is a significant difference between the predicted VO2max and the directly measured VO2max. In fact, it can be concluded that the mean predicted VO2max is significantly lower than the mean directly measured VO2max, mean difference = -1.542.
On the other hand, results do not support significant differences between the mean predicted and mean directly measured VO2max when the subjects underwent the skipping test and the step test.
Figure 3 shows the Bland-Altman plot of the percentage difference against the mean of the predicted VO2max (skip test) and the directly measured VO2max. Here, there is a very slight underestimation, since the mean percentage difference between the predicted and the directly measured VO2max is very slightly below 0 (mean percentage difference = -3.017%, with a 95% confidence interval of -8.738% to 2.704%). Thus, the predicted VO2max obtained after the respondents performed the skip test underestimated the directly measured VO2max by an average of 3.017%. The variability of the percentage differences appears to be ±60%, which is somewhat wide. There seems to be some random variation of the percentage difference around the constant mean. Table 5 presents the bias (% difference) and limits of agreement together with their corresponding 95% confidence interval (CI) estimates, which indicate how precise the estimates are. The 95% CI of the mean % difference illustrates the magnitude of the systematic difference [31]. In this case, there is no significant systematic difference or fixed bias, since the line of equality (horizontal line through 0) lies within the interval (-8.738 to 2.704). Fixed bias would be evident if the mean value of the difference differed significantly from 0 on the basis of a one-sample t-test; the results shown in Table 6 indicate that it does not, t(114) = −1.045, p > .05. The 95% CI of the limits of agreement (LoA) allows for the estimation of the size of the possible sampling error (Giavarina, 2015). It can be observed that the CIs of both the mean % difference (±5.72%) and the LoA limits (±9.81%) are narrow because of the considerably large sample size used in this study.
Figure 4 shows the Bland-Altman plot of the percentage difference against the mean of the predicted and directly measured VO2max after the beep test. Here, a very slight underestimation can be observed, since the mean percentage difference between the predicted and the directly measured VO2max is slightly below 0 (mean percentage difference = -4.609%, with a 95% confidence interval of -6.236% to -2.982%). Thus, the predicted VO2max underestimated the directly measured VO2max by an average of 4.609%. The variability of the percentage differences appears to be ±17%, which is narrow. However, the variability around the mean does not appear to be constant. There seems to be an observable downward trend when the mean VO2max is greater than 40. Table 7 presents the bias (% difference) and LoA together with their corresponding 95% confidence interval (CI) estimates. In this case, there is a significant systematic difference, since the line of equality (horizontal line through 0) is not in the interval (-6.236 to -2.982). This implies that the predicted VO2max obtained after the respondents undergo the beep test consistently underestimates the directly measured VO2max. The presence of fixed bias is supported by the results of the one-sample t-test, as shown in Table 6, t(114) = −5.612, p < .05. It can be observed that the CIs of both the mean percentage difference (±1.63%) and the LoA limits (±2.79%) are narrower than those for the skip test.
Figure 5 shows the Bland-Altman plot of the percentage difference against the mean of the predicted VO2max (step test) and the directly measured VO2max. Here, there is a very slight overestimation, since the mean percentage difference between the predicted and the directly measured VO2max is slightly above 0 (mean percentage difference = 3.1867%, with a 95% confidence interval of -2.639% to 9.013%). Thus, the predicted VO2max overestimated the directly measured VO2max by an average of 3.1867%.
The variability of the percentage differences appears to be ±62%, which is somewhat wide. The variability around the mean does not appear to be constant. Noticeably, there seem to be more positive percentage differences when the mean VO2max is greater than 55. Table 8 presents the bias (% difference) and LoA together with their corresponding 95% confidence interval (CI) estimates. In this case, there is no significant systematic difference, since the line of equality (horizontal line through 0) lies within the interval (-2.639 to 9.013). It can be observed that the CIs of both the mean % difference (±5.83%) and the LoA limits (±9.99%) are also narrow.
Table 9 shows the standardized coefficients resulting from the simple linear regression analysis. Notably, proportional bias is present in the predicted VO2max of the beep test (β = −.255, p < .05) and the predicted VO2max of the step test (β = .369, p < .05).

Discussion
There are no significant differences between the mean predicted and the mean directly measured VO2max when the subjects underwent the skipping test and the step test; a significant difference was found only for the beep test. The predicted VO2max underestimated the directly measured VO2max in the skip and beep tests and slightly overestimated it in the step test, with wide variability except in the beep test. There is no significant systematic difference in the skip and step tests, but this is not the case in the beep test. There seems to be a random variation of the percentage difference around the constant mean in the results for the three tests.
Thus, despite the three tests' apparent differences, it can still be concluded that the tests are valid and reliable tools for university non-athlete students. This is similar to the results of Knight, Stuckey, and Petrella [12] and Cooney et al. [13] for the step test, which also used Bland-Altman analysis, and of Paradisis et al. [15] and Mayorga-Vega, Aguilar-Soto, and Viciana [16] for the beep test, though those studies used different statistical tools. This was, however, the first study so far to test the validity and reliability of the jumping rope test for university students, and it can be used for comparison in future studies.
The Bland-Altman method defines the intervals of agreements, but not the limits of acceptability. Thus, it can be noted that though the tests have been found suitable and practical assessment tools based on the analyses, they have different levels of complexity that contributed to their differences. These tests then need to be used considering the purpose of the use and the physical readiness of the students.
The step test is a procedure used to assess submaximal oxygen uptake and is best for beginners. Its procedure is easy to follow. It does not pose any restrictions on the participants because it is not as strenuous an assessment as the other two, and it does not need any inclusion and exclusion criteria. Through this test, it is easy to measure approximate cardiovascular endurance, since it requires little space, provided it is used with a metronome and executed following a strict protocol.
The jumping rope test, on the other hand, is known as a skill-based fitness test that is ideal for the intermediate level because it requires skill familiarization and acquisition. The participants must have sufficient time to practice the skill, which includes mastering the right timing and coordination. It cannot be classified as a submaximal assessment because it requires higher skills that have to be acquired through constant practice. This assessment has not been commonly used with university students because it is somewhat tedious: teachers need to acquire the skill themselves, teach the additional skills mentioned, and motivate students by making it a fun activity.
Lastly, the beep test is for the advanced level, as it requires a rigid screening assessment and strict compliance with the procedure. This explains its notable difference from the jumping rope test and the step test in the analyses. It gives an accurate assessment of cardiovascular endurance, but it requires inclusion and exclusion criteria because it draws on maximal aerobic running capacity. It is one of the most challenging assessments and will not suit beginners.

Conclusions and Recommendation
This study shed light on the potential of the three field tests as assessment tools for measuring the cardiovascular endurance of the college freshmen students of the Physical Fitness and Movement Enhancement (HKD-01) course. Specifically, it identified (1) the validity of the three tests as assessment tools for gauging cardiovascular fitness and (2) their reliability as shown in the correlation between the directly measured and predicted VO2max. The results of the statistical analyses revealed that the fitness tests were valid and reliable and are thus equally practical tools for assessing university non-athlete students' cardiovascular endurance. However, their level of difficulty should be considered. The step test suits the starter level, as it is the least complicated of the three to execute. The jumping rope test is intermediate level, as it takes a lot of practice to perform correctly but is not as difficult as the beep test, which is recommended for the advanced level and strictly considers the physical capacity and condition of the participants.
It is then recommended that PE professors gauge the cardiovascular endurance of each student using these tests if they are implementing new courses or activities that require physical performance. It is also suggested that professors test the validity and reliability of their assessment tools in an effort to ensure quality PE instruction. It is best to administer evidence-based tests with a keen awareness of the right cardiovascular assessment procedure, so that professors can keep track of the students' fitness progress while keeping the students' wellbeing in mind.
Future researchers may further examine the validity and reliability of other common assessment tools used in PE classes and consider other anthropometric variables as predictors of low or high physical performance of students in these three tests.