RELIABILITY AND VALIDITY OF A DEEP WATER RUNNING GRADED EXERCISE TEST
John A. Mercer* and Randall L. Jensen#

Data Collected at:
Department of Kinesiology, Health Promotion and Recreation
University of North Texas
Denton, TX 76203

* Doctoral Student
Department of Exercise and Movement Science
University of Oregon
Eugene, OR

# Assistant Professor
Department of HPER
Northern Michigan University
Marquette, MI

Introduction

Deep water running (DWR) is a mode of exercise used in rehabilitation and run-training programs. DWR consists of running while submerged in water up to the neck without being able to touch the bottom of the pool. A flotation device, such as an Aqua-Jogger Belt (Excel Sports, Eugene, Oregon), can be worn to minimize the effort exerted to keep the head above the water level and the body in a vertical position.

The mechanics of DWR appear to be qualitatively similar to those of treadmill running (TMR); however, lower peak values for heart rate (HR) and oxygen uptake (VO2), expressed in relative (ml kg-1 min-1, VO2rel) and absolute units (L min-1, VO2abs), have been reported during DWR compared to TMR (Butts, Tucker & Greening, 1991; Michaud, Rodriguez-Zayas, Andres, Flynn & Lambert, 1995; Svedenhag & Seger, 1992; Town & Bradley, 1991). The magnitude of the difference between DWR and TMR in peak VO2 and HR responses has varied between investigations, which might be due to the different protocols used to elicit peak responses during DWR. These protocols have included subjective increases in intensity to maximal effort (Svedenhag & Seger, 1992) and increases in cadence to increase intensity (Michaud et al., 1995). To obtain a more objective maximal effort, the authors of the present study designed a graded exercise test (GXT) that uses a system of pulleys and weights, similar to tethered swimming.

This investigation had two parts. Part 1 examined the intraclass test-retest reliability of the peak responses of VO2rel, VO2abs, and HR during a DWR graded exercise test utilizing a pulley system. Part 2 compared the DWR responses with those from a graded exercise test during TMR. The null hypothesis was that the metabolic responses during DWR and TMR are not different.


Methods

Written informed consent was obtained from 12 females and 14 males in accordance with the policy statements of the Institutional Review Board of the University of North Texas.


Protocols

All subjects, inexperienced in DWR (except one male), underwent three maximal graded exercise tests 24 to 48 hours apart. Two of the tests were conducted in deep water and one on a treadmill. The order of the tests was randomly assigned, and subjects were allowed a period for familiarization prior to each test. To prevent a possible diving bradycardic reflex, subjects were instructed to keep their faces out of the water during the DWR tests.

The DWR tests were performed in a tank measuring 1.8 m x 1.8 m x 1.8 m (water temperature: 26.9 ± 1.61 °C). Subjects wore an Aqua Jogger Belt with a tether attached to the back. The tether ran through a series of pulleys, and its other end was attached to a bucket suspended in front of the subject, 0.75 m above the deck (see Figure 1).

Figure 1 about here

The maximal protocol was continuous and consisted of 1 minute stages. To provide a graded response, a 0.57 kg weight was added to the bucket each stage. The 0.57 kg weight was selected after a pilot study was conducted in which it was determined that HR would increase approximately 10% each stage using this weight.
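The loading scheme described above can be sketched in a few lines of Python. This is only an illustration: the text does not state whether the first stage began with an empty bucket or with one weight, so the starting load is an assumed parameter, and the function name is hypothetical.

```python
# Sketch of the DWR loading scheme: one 0.57 kg weight is added to the
# bucket at every 1 minute stage. ASSUMPTION: the first stage starts with
# one weight in the bucket; the text does not state the starting load.
WEIGHT_INCREMENT_KG = 0.57  # from the protocol description
STAGE_DURATION_MIN = 1      # from the protocol description


def dwr_load_schedule(n_stages, first_stage_weights=1):
    """Return (stage, minutes_elapsed_at_stage_end, bucket_load_kg) tuples."""
    schedule = []
    for stage in range(1, n_stages + 1):
        n_weights = first_stage_weights + (stage - 1)
        load_kg = round(n_weights * WEIGHT_INCREMENT_KG, 2)
        schedule.append((stage, stage * STAGE_DURATION_MIN, load_kg))
    return schedule


if __name__ == "__main__":
    for stage, minute, load in dwr_load_schedule(7):
        print(f"stage {stage}: ends at {minute} min, bucket load {load:.2f} kg")
```

Passing first_stage_weights=0 models the alternative reading in which the first stage is run against an empty bucket.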

The protocol for the TMR test was continuous with 1 minute stages. Speed for the 1st stage was set at 80.4 m min-1 with a 3% grade. Elevation was increased to 7.5% grade for the 2nd and all subsequent stages. Speed for the next three stages was set at 93.8 m min-1, 107.2 m min-1, and 134 m min-1, respectively. For each additional stage, speed was increased by 13.4 m min-1. Air temperature was 22.7 ± 1.49 °C.
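The treadmill stages can be written out as a small lookup. The sketch below follows the speeds and grades exactly as stated in the text; the function and constant names are hypothetical.

```python
# Sketch of the TMR stage schedule as described in the text.
FIXED_SPEEDS_M_MIN = [80.4, 93.8, 107.2, 134.0]  # stages 1 to 4 as stated
SPEED_STEP_M_MIN = 13.4                          # added per stage from stage 5


def tmr_stage(stage):
    """Return (speed in m/min, grade in percent) for a 1-based stage number."""
    if stage < 1:
        raise ValueError("stage numbers start at 1")
    # 3% grade for the first stage, 7.5% for every later stage.
    grade = 3.0 if stage == 1 else 7.5
    if stage <= len(FIXED_SPEEDS_M_MIN):
        speed = FIXED_SPEEDS_M_MIN[stage - 1]
    else:
        extra_stages = stage - len(FIXED_SPEEDS_M_MIN)
        speed = FIXED_SPEEDS_M_MIN[-1] + extra_stages * SPEED_STEP_M_MIN
    return round(speed, 1), grade


if __name__ == "__main__":
    for stage in range(1, 9):
        speed, grade = tmr_stage(stage)
        print(f"stage {stage}: {speed} m/min at {grade}% grade")
```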

Expired gases for ventilatory measures were collected continuously and recorded every 15 seconds during all tests using a Medical Graphics CPMax metabolic cart (St. Paul, MN). Ventilatory calibration was performed using a calibrated 3 liter syringe. Prior to each test, O2 and CO2 analyzers were calibrated according to the manufacturer's instructions. HR was monitored through radiotelemetry (Polar, CIC Accurex, Hempstead, NY) and recorded 10 seconds prior to the end of each stage. The criterion for peak values was the highest value achieved for each variable, with an RER greater than 1.05.
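One way to read the peak criterion is sketched below. It assumes that the peak for each variable is simply the highest recorded value and that the RER condition is met when the highest RER of the test exceeds 1.05; other readings are possible. The function name and the sample values are hypothetical, not data from the study.

```python
# Sketch of peak-value extraction under an ASSUMED reading of the criterion:
# peak = highest recorded value; test accepted if peak RER exceeds 1.05.
def peak_responses(vo2_samples, rer_samples, hr_by_stage, rer_threshold=1.05):
    """Return the peak VO2, HR and RER plus a flag for the RER criterion."""
    if not (vo2_samples and rer_samples and hr_by_stage):
        raise ValueError("each input needs at least one value")
    peak_rer = max(rer_samples)
    return {
        "vo2_peak": max(vo2_samples),   # from the 15 second gas samples
        "hr_peak": max(hr_by_stage),    # from the once-per-stage HR readings
        "rer_peak": peak_rer,
        "criterion_met": peak_rer > rer_threshold,
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    vo2 = [28.4, 35.1, 41.7, 46.9, 49.8, 50.5, 50.1]
    rer = [0.88, 0.95, 1.02, 1.10, 1.21, 1.29, 1.31]
    hr = [128, 141, 153, 162, 169, 174]
    print(peak_responses(vo2, rer, hr))
```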

The statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS/PC+ Version 4.0.1) program. Reliability of the DWR test was estimated by one-way intraclass test-retest correlation. The Spearman-Brown Prophecy Formula was used to estimate the maximum expected reliability (R1,1) for a single test. Differences between conditions were tested through paired t-tests. Validity of the DWR protocol was tested by calculating Pearson's product-moment correlation coefficient. Independent t-tests were used to compare descriptive variables between males and females.
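For readers unfamiliar with the one-way intraclass approach, the following is a minimal sketch of how such a coefficient can be obtained from a one-way analysis of variance with subjects as the grouping factor. It assumes that R is the reliability of the mean of the k trials, R = (MSbetween - MSwithin) / MSbetween, which is a common textbook form and not necessarily the exact SPSS routine used here. The function name and the scores are hypothetical.

```python
# Sketch of a one-way intraclass correlation for test-retest data.
# ASSUMPTION: R = (MS_between - MS_within) / MS_between, i.e. the
# reliability of the mean of the k trials for each subject.
def oneway_icc(trials):
    """trials: one list of k scores per subject. Returns R."""
    n = len(trials)
    k = len(trials[0]) if trials else 0
    if n < 2 or k < 2 or any(len(t) != k for t in trials):
        raise ValueError("need at least 2 subjects with the same k >= 2 trials")
    grand_mean = sum(sum(t) for t in trials) / (n * k)
    subject_means = [sum(t) / k for t in trials]
    # Between-subjects and within-subjects sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum(
        (x - m) ** 2 for t, m in zip(trials, subject_means) for x in t
    )
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / ms_between


if __name__ == "__main__":
    # Hypothetical peak VO2rel scores (trial 1, trial 2) for four subjects.
    scores = [[50, 52], [40, 41], [60, 58], [45, 47]]
    print(f"R = {oneway_icc(scores):.3f}")
```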


RESULTS
Reliability

Reliability of the DWR test for the total group was established for VO2rel (R=.96, R1,1=.92), VO2abs (R=.97, R1,1=.94) and HR (R=.90, R1,1=.82). All descriptive variables for males (24 ±4.9 years, 1.79 ±0.08 m, 79 ±10.9 kg) were significantly different than those for females (21 ±1.2 years, 1.68 ±0.11 m, 59 ±7.3 kg). Thus, reliability was computed for each group.

For males, reliability was very good for VO2rel (R=.94, R1,1=.89), VO2abs (R=.93, R1,1=.87) and HR (R=.86, R1,1=.75). For females, reliability was also very good for VO2rel (R=.85, R1,1=.74), VO2abs (R=.88, R1,1=.79) and HR (R=.91, R1,1=.83).
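The relation between each R and its R1,1 can be checked with the Spearman-Brown formula in its step-down form, R1,1 = R / (k - (k - 1)R), with k = 2 trials. The sketch below assumes this form was the one applied. With that assumption, all nine reported pairs are reproduced to two decimals. The function name is hypothetical.

```python
# Sketch: Spearman-Brown step-down from the reliability of the mean of
# k trials (R) to the expected reliability of a single trial (R1,1).
def single_test_reliability(r_mean, k=2):
    return r_mean / (k - (k - 1) * r_mean)


# (group, variable): (R, R1,1) as reported in the text.
REPORTED = {
    ("total", "VO2rel"): (0.96, 0.92),
    ("total", "VO2abs"): (0.97, 0.94),
    ("total", "HR"): (0.90, 0.82),
    ("males", "VO2rel"): (0.94, 0.89),
    ("males", "VO2abs"): (0.93, 0.87),
    ("males", "HR"): (0.86, 0.75),
    ("females", "VO2rel"): (0.85, 0.74),
    ("females", "VO2abs"): (0.88, 0.79),
    ("females", "HR"): (0.91, 0.83),
}

if __name__ == "__main__":
    for (group, variable), (r_mean, r_single) in REPORTED.items():
        computed = single_test_reliability(r_mean)
        print(f"{group:8s}{variable:8s}R={r_mean:.2f}  "
              f"computed R1,1={computed:.3f}  reported={r_single:.2f}")
```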

There were no significant differences between DW1 and DW2 for either gender for the means of VO2rel (males: 50.5 vs 52.0 ml kg-1 min-1; females: 37.1 vs 36.8 ml kg-1 min-1), VO2abs (males: 4.1 vs 4.1 L min-1; females: 2.2 vs 2.2 L min-1) or HR (males: 174 vs 175 bpm; females: 181 vs 183 bpm) (see Table 1). Because the variables were reliable and there were no significant differences between tests, DW1 and DW2 were averaged and are herein represented as DWA.

Table 1 about here

Validity

There was a significant correlation between TMR and DWA for the total group for VO2rel (r=.88, p=.001) and VO2abs (r=.93, p=.001) as well as HR (r=.64, p=.001). When males were analyzed separately, VO2rel (r=.82, p=.001), VO2abs (r=.82, p=.001), and HR (r=.64, p=.013) were also significantly correlated between DWA and TMR. In contrast, for females, only VO2rel (r=.72, p=.009) was significantly correlated between DWA and TMR, but HR (r=.52, p=.086) and VO2abs (r=.36, p=.248) were not significantly correlated.
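The significance pattern above can be reproduced from the correlation coefficients and group sizes alone, using t = r sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below assumes two-tailed tests at the .05 level and the group sizes given in the Methods (14 males, 12 females, 26 in total). The critical values are standard t-table entries, and the function names are hypothetical.

```python
import math

# Two-tailed critical t values at alpha = .05 for df = 10, 12 and 24
# (standard t-table entries).
T_CRIT_05 = {10: 2.228, 12: 2.179, 24: 2.064}


def t_from_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def is_significant(r, n):
    """Two-tailed test at .05; only the df values tabulated above work."""
    return abs(t_from_r(r, n)) > T_CRIT_05[n - 2]


if __name__ == "__main__":
    # (group, variable, r as reported, n from the Methods section)
    reported = [
        ("total", "VO2rel", 0.88, 26), ("total", "VO2abs", 0.93, 26),
        ("total", "HR", 0.64, 26),
        ("males", "VO2rel", 0.82, 14), ("males", "VO2abs", 0.82, 14),
        ("males", "HR", 0.64, 14),
        ("females", "VO2rel", 0.72, 12), ("females", "HR", 0.52, 12),
        ("females", "VO2abs", 0.36, 12),
    ]
    for group, variable, r, n in reported:
        print(f"{group:8s}{variable:8s}r={r:.2f}  t={t_from_r(r, n):5.2f}  "
              f"significant={is_significant(r, n)}")
```

Because the female group is small, a correlation must exceed roughly .58 to reach significance, which is why r=.52 for HR falls short while r=.64 for males does not.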

For both genders, there was a significant difference between DWA and TMR for the means of VO2rel (males: 51.2 vs 63.0 ml kg-1 min-1; females: 37.0 vs 46.6 ml kg-1 min-1), VO2abs (males: 4.1 vs 5.0 L min-1; females: 2.2 vs 2.7 L min-1) and HR (males: 174 vs 187 bpm; females: 182 vs 192 bpm) (see Table 1). Finally, there was no significant difference between DW1 and DW2 or between DWA and TMR for either RPE or RER for the total group, males or females (p>.05).


DISCUSSION

The graded exercise protocol designed to elicit peak VO2 (either VO2rel or VO2abs) and HR during DWR using a system of pulleys and weights was reliable, regardless of gender. There have been no other studies known to the authors which have reported the test-retest reliability of a DWR protocol.

The DWR protocol used in this study elicited peak responses in 6.8 ±2.04 minutes as opposed to 8.1 ±2.58 minutes for the TMR test. These durations are acceptable to elicit peak responses (Shephard, 1984). The lack of difference between tests for RPE and RER indicates that maximal effort was elicited in each test. However, because the unfit subjects fatigued quickly in some cases during the DWR graded exercise test, the authors suggest that either a smaller weight increment or longer stages be utilized.

The significant correlation between DWR and TMR for the total group suggests the protocol was valid. That is, the subjects who had high peak responses during DWR also had high peak responses during TMR. Interestingly, the lack of correlation for VO2abs and HR between DWR and TMR for females might suggest that the DWR protocol was not valid for both genders. This could be due to the homogeneous fitness level of the female subjects (as evidenced by the smaller standard deviations of HR and VO2 for the female group compared to the male group), or it could involve percentage of body fat (%BF). Pate and Kriska (1984) summarized that when comparing females and males, a greater %BF for females contributes to a lower maximal aerobic power. During DWR, the influence of %BF on VO2 and HR is complicated by the buoyancy and insulating properties of fat. Although %BF was not estimated in this study, it seems likely that it was a factor affecting VO2abs and HR. The result that only females did not correlate significantly is probably due to the well documented observation that females tend to have a greater %BF than males (Pate & Kriska, 1984).

In the current study, subjects inexperienced in DWR achieved approximately 81% of TMR VO2 during DWR (about 11 ml kg-1 min-1 or 0.75 L min-1 lower) while HR was about 94% (about 11 bpm lower) of TMR. The magnitude of difference between DWR and TMR is in agreement with other studies (Butts, Tucker & Greening, 1991; Michaud, Rodriguez-Zayas, Andres, Flynn & Lambert, 1995; Svedenhag & Seger, 1992; Town & Bradley, 1991). It appears, interestingly, that despite the different protocols utilized to elicit maximal effort during DWR in other studies, the peak responses are consistently lower than those recorded during TMR.
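The approximate figures above can be compared with the per-sex means in Table 1. The sketch below only restates that arithmetic: it expresses each DWA mean as a percentage of the TMR mean and gives the absolute difference. How the authors pooled the sexes to reach the total-group figures is not stated, so no pooled value is computed here. The function names are hypothetical.

```python
# Mean peak values copied from Table 1 as (DWA, TMR) pairs.
TABLE_1_MEANS = {
    "males": {"VO2rel": (51.2, 63.0), "VO2abs": (4.1, 5.0), "HR": (174, 187)},
    "females": {"VO2rel": (37.0, 46.6), "VO2abs": (2.2, 2.7), "HR": (182, 192)},
}


def percent_of_tmr(dwa, tmr):
    """DWR mean expressed as a percentage of the TMR mean."""
    return 100.0 * dwa / tmr


def summarize(means=TABLE_1_MEANS):
    """Return (sex, variable, percent of TMR, TMR minus DWA) rows."""
    rows = []
    for sex, variables in means.items():
        for name, (dwa, tmr) in variables.items():
            rows.append(
                (sex, name, round(percent_of_tmr(dwa, tmr), 1), round(tmr - dwa, 2))
            )
    return rows


if __name__ == "__main__":
    for sex, name, pct, diff in summarize():
        print(f"{sex:8s}{name:8s}{pct:5.1f}% of TMR, {diff} lower")
```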

Although the cause of the lower responses during DWR is not known, there are several possible explanations. The results of the present study indicate that subject skill level does not seem to be related to the peak VO2 during DWR. There was no evidence in the current study of a learning effect over the course of the two DWR tests, as demonstrated by the lack of difference between them. Furthermore, previous studies that used subjects experienced in DWR have reported DWR VO2peak to be approximately 75-90% of TMR and HRpeak to be 10-20 bpm lower than TMR peak values (Butts, Tucker & Greening, 1991; Michaud et al., 1995; Svedenhag & Seger, 1992; Town & Bradley, 1991), which is in agreement with the results of the present study.

The lower HR during DWR might be related to a greater stroke volume due to an enhanced venous return during water immersion. There have been no studies known to the authors that have investigated the changes in cardiac output during DWR.

Shephard (1984) noted that the amount of active muscle mass can be a factor in determining VO2max. Although the active muscle mass during DWR has not been quantified, a subjective observation is that the antigravity muscles are not utilized during DWR to the degree that they are during TMR.

Interestingly, DWR training studies have determined that land based aerobic performance was unchanged (Eyestone, Fellingham, George, & Fisher, 1993) or improved (Morrow, Jensen & Peace, 1996; Wilber, Moffatt, Scott, Lee & Cucuzzo, 1996) following a DWR training program. Thus, despite lower peak responses during DWR, maintenance of fitness appears possible through the use of DWR.


Summary

In summary, the graded exercise protocol which utilized a system of pulleys and weights to elicit peak responses of HR and VO2 during DWR was reliable, regardless of gender. The peak responses of VO2rel, VO2abs and HR were significantly correlated between TMR and DWA for the total group, which suggests that the DWR graded exercise test resulted in a valid measure of peak responses during DWR. Additionally, VO2rel, VO2abs and HR were significantly correlated for males, while for females, only VO2rel was significantly correlated. Finally, lower mean peak values for VO2rel, VO2abs and HR were recorded during DWR compared to TMR for both males and females.



REFERENCES

Butts, N.K., Tucker, M., & Greening, C. (1991). Physiologic responses to maximal TM and DWR in men and women. The American Journal of Sports Medicine, 19(6), 612-614.

Christie, J.L., Sheldahl, L.M., Tristani, F.E., Wann, L.S., Sagar, K.B., Levandoski, S.G., Ptacin, M.J., Sobocinski, K.A., & Morris, R.D. (1990). Cardiovascular regulation during head out water immersion exercise. Journal of Applied Physiology, 69(2), 657-664.

Eyestone, E.D., Fellingham, G., George, J., & Fisher, A.G. (1993). Effect of water running and cycling on maximum oxygen consumption and 2 mile run performance. The American Journal of Sports Medicine, 21(1), 41-44.

Michaud, T.J., Rodriguez-Zayas, J., Andres, F.F., Flynn, M.G., & Lambert, C.P. (1995). Comparative exercise responses of deep-water and treadmill running. Journal of Strength and Conditioning, 9(2), 104-109.

Morrow, M., Jensen, R.L., & Peace, C.R. (1996). Physiological adaptations to deep water and land based running training programs. Medicine and Science in Sports and Exercise, 27(5s), s244 Abstract No. 1252.

Pate, R.R., & Kriska, A. (1984). Physiological basis of sex difference in cardiorespiratory endurance. Sports Medicine, 1, 87-98.

Shephard, R.J. (1984). Tests of maximum oxygen intake: A critical review. Sports Medicine, 1, 99-124.

Svedenhag, J., & Seger, J. (1992). Running on land and in water: Comparative exercise physiology. Medicine and Science in Sports and Exercise, 24(10), 1155-1160.

Town, G.P., & Bradley, S.S. (1991). Maximal metabolic responses of deep and shallow water running in trained runners. Medicine and Science in Sports and Exercise, 23(2), 238-241.

Wilber, R.L., Moffatt, R.J., Scott, B.E., Lee, D.T., & Cucuzzo, N.A. (1996). Influence of water run training on the maintenance of aerobic performance. Medicine and Science in Sports and Exercise, 28(8), 1056-1062.

AUTHORS' NOTES

Excel Sport Science (Eugene, Oregon) provided fifteen Aqua Jogger Belts for use during the investigation of deep water running.

Randall L. Jensen is an assistant professor in the department of HPER at Northern Michigan University, Marquette, MI. John A. Mercer is a doctoral student in the department of Exercise and Movement Science at the University of Oregon, Eugene, OR.


Table 1: Peak responses (mean ± SD) during deep water running and treadmill running graded exercise tests.

Variable                  DW1            DW2 #          DWA            TMR

Males
VO2rel (ml kg-1 min-1)    50.5 ± 10.88   52.0 ± 11.96   51.2 ± 11.32   63.0 ± 11.96 *
VO2abs (L min-1)          4.1 ± 0.99     4.1 ± 0.10     4.1 ± 0.95     5.0 ± 0.11 *
HR (bpm)                  174 ± 9.3      175 ± 10.1     174 ± 9.1      187 ± 8.9 *
RPE                       18 ± 1.0       17 ± 1.9       17 ± 1.3       18 ± 2.0
RER                       1.31 ± 0.08    1.31 ± 0.14    1.31 ± 0.10    1.31 ± 0.08

Females
VO2rel (ml kg-1 min-1)    37.1 ± 5.35    36.8 ± 5.68    37.0 ± 11.1    46.6 ± 5.88 *
VO2abs (L min-1)          2.2 ± 0.25     2.2 ± 0.34     2.2 ± 0.28     2.7 ± 0.37 *
HR (bpm)                  181 ± 8.1      183 ± 6.6      182 ± 7.1      192 ± 7.3 *
RPE                       17 ± 1.6       17 ± 1.5       17 ± 1.4       18 ± 1.4
RER                       1.35 ± 0.13    1.33 ± 0.09    1.34 ± 0.10    1.30 ± 0.10


DW1, DW2: 1st and 2nd deep water running graded exercise tests.
DWA: the average between DW1 and DW2.
TMR: the treadmill running graded exercise test.

* Comparison between DWA and TMR (p < .01)
# Comparison between DW1 and DW2 (p > .05)