Teaching Pronunciation with Computer Assisted Pronunciation Instruction in a Technological University

The purpose of this study is to evaluate the effectiveness of computer assisted pronunciation instruction in teaching English pronunciation to students in vocational colleges and universities in Taiwan. The participants were fifty-one first-year undergraduate students from a technological university located in central Taiwan. They received eight weeks of pronunciation instruction, in which they were presented with model utterances and the corresponding pitch contours of target sentences and then practiced pronouncing the target sentences with computer assisted pronunciation training software. Computerized speaking tests were conducted before, during, and after the training to measure the improvement in pronunciation quality, and a questionnaire was administered at the end of the instruction. The results of a repeated-measures analysis of variance on the test scores indicate that the pronunciation quality of the participants improved significantly. The results of this study provide empirical evidence for teaching pronunciation to vocational college students with computer technology.


Background/ Objectives and Goals
Taiwanese learners' English proficiency falls short of what international competition in the global village era demands, according to several recent reports on standardized English proficiency tests and surveys of adult English ability. For example, Educational Testing Service (ETS) reported that Taiwanese TOEFL examinees in 2012 scored an average of only 78 points, which was below the global average and ranked 20th among 30 Asian countries [1]. Notably, their speaking and writing scores both ranked 21st, lower than their reading (9th) and listening (13th) scores.
The reasons for Taiwanese learners' poor speaking skills are manifold. Cappelle and Curtis [2] pointed out that, among the four language skills, listening and speaking are the weakest for Asian learners of English, even though the two skills are fundamental to communication. In Taiwan, English as a Foreign Language (EFL) classes are mostly teacher-centered and rely on the Grammar-Translation method, which stresses vocabulary and grammatical structures and gives little attention to the spoken form of the language [3] [4]. As a result, class activities in Taiwanese EFL classes focus on word decoding, phonetic identification, and grammar drills [5] [6]. Consequently, speaking skills and communicative competence in different contexts are not among the objectives of English language teaching [7], and there is little English listening and speaking instruction in Taiwan's high schools (Yao, as cited in [8]). When students enroll in colleges and universities, they may have better vocabulary and grammar knowledge, but their listening and speaking skills are inadequate.
Several studies on the oral skills and communicative competence of undergraduate students in Taiwan have been conducted in recent years. A survey conducted by [9] of Taiwanese college students in first-year English classes indicated that 83.7% of the students considered speaking, among the four skills, the one most in need of improvement, while Chia et al. [10] found that university students in higher classes perceived speaking as more important than reading. In another survey, conducted by Wu [11], 69% of the college interviewees perceived their English proficiency as low, especially their speaking ability, and 62% named "poor English pronunciation" as a common problem encountered during English learning, which coincides with the assertion that pronunciation is the most frequent cause of intelligibility problems in ELF interactions [12].
In particular, students in technological colleges and universities encounter more severe problems than their counterparts in comprehensive universities. Their English proficiency is even lower because senior vocational high schools emphasize hands-on skills and give English courses fewer hours and less weight [13]. For example, in the 2011 TOEIC tests, test-takers from private technological universities averaged 434 points, far below the overall average of 557 points and the averages of 638 points for undergraduate students of national comprehensive universities, 567 points for private comprehensive universities, and 507 points for national technical universities [14].
In Taiwan, the English proficiency of students at private technological universities is unsatisfactory owing to a lack of appropriate teaching targets and insufficient practice during their course of study. Numerous studies have proposed teaching approaches to improve this situation, among which computer assisted pronunciation instruction is a promising option whose pedagogical effectiveness has been shown in previous work. Computer assisted pronunciation training (CAPT) offers a medium for increasing users' access to their own and others' pronunciation performance, for focusing their attention on phonology, and for acquiring new pronunciation patterns. Nunan [15] suggested that CAPT-based teaching has several advantages over conventional materials in providing: individual plans; anywhere/anytime instruction; patient tutoring; a private space to make mistakes; immediate, individualized instruction; detailed records of achievement; and self-paced learning. Computer assisted pronunciation instruction also addresses the lack of time available for contact with the language, which might be the most important reason for incomplete acquisition of foreign languages [16]. Furthermore, Derwing and Munro [17] pointed out that computer technology increases foreign language learners' exposure to oral demonstrations in the target language, extending the teacher's speech in class and allowing virtual interaction with native speakers.
However, very few studies have explored the effectiveness of teaching English pronunciation to students of technological universities through computer assisted pronunciation instruction. It is therefore important to investigate the pedagogical effectiveness of applying computer assisted pronunciation instruction to improve their pronunciation quality. In this study, the pronunciation quality of learners is measured by MyET (short for My English Tutor), an online software package designed for English pronunciation and oral skill training on the basis of the Audiolingual method and the communicative approach. Providing a variety of teaching and practice materials for learners at different levels of English proficiency, MyET has been widely adopted as a language teaching and learning platform in junior high schools, senior high schools, and universities in Taiwan. The interface and a scoring example can be seen in Figure 1. On the screen, the learner's scores on four aspects of pronunciation (segmental pronunciation, intonation, fluency, and stress) are shown on the right; the waveforms of the teacher's and the learner's utterances are displayed at the bottom of the screen.
The objective of this study is to evaluate the effectiveness of learning English pronunciation with computer assisted pronunciation instruction for the students of private technological universities in Taiwan. The effectiveness of computer assisted pronunciation instruction is evaluated by measuring and comparing the pronunciation quality of the participants who are students of a private technological university located in central Taiwan.

Experimental Design
To measure the improvement in pronunciation quality, a one-group repeated-measures design was used to measure the effects of eight weeks of training (five sessions of about 45 minutes each) in English pronunciation with computer assisted pronunciation instruction and practice [18]. The participants took a computerized speaking test provided by MyET before, during, and after the training.

Participants
A total of 51 first-year undergraduate students of a university of technology located in central Taiwan (34 female, 17 male) participated in the main study. All participants were enrolled in the first semester of the first-year General English course, a two-credit required course, and were assigned to class A because they were in the top 25% in terms of English scores on the Joint Entrance Examination of Technological and Vocational Universities. They came from departments in the College of Human Ecology, including the Department of Cultural Creativity and Design, the Department of Digital Living Innovation, the Department of Food and Beverage Management, and the Department of Geron-Technology and Service Management. None of them were English language majors and none had studied or lived abroad at that point. Their average English proficiency was estimated to be below elementary level or A2 level of the Common European Framework of Reference for Languages. As the class size was larger than fifty students, little class time was generally available for individual pronunciation practice.

Instruments
The independent variable in this study is the instruction and practice with a computer assisted pronunciation training (CAPT) tool, and the dependent variable is the improvement in pronunciation quality at the sentence level after the eight-week training course.
In this study, the pronunciation quality of participants was measured by a speaking test provided by MyET, an online software package designed for English pronunciation and oral skill training on the basis of the Audiolingual method and the communicative approach. The MyET content used in this study was a mock test of the General English Proficiency Test (GEPT) Elementary speaking test, consisting of three tasks: repeating, reading aloud, and answering questions.

Laboratory Settings
The testing, training, and scoring procedures of the study were conducted in the Multimedia Language Laboratory of the Department of Applied Foreign Language of the technological university. The laboratory had 60 multimedia desktop computers equipped with hardware (monitor, keyboard, mouse, and earphones with a microphone), software (Windows 7 and its accessory software), and a connection to the Internet, so that each participant was able to record his or her own utterances and to access the Internet. Via the broadcasting system of the laboratory, the teacher could display a variety of materials, such as video, audio, text, and files prepared by the teacher, on the monitor screens in front of the participants. The MyET client software was installed on all computers in the laboratory. A server with MyET administration authority provided the service over the Internet to all users in the university who had been granted an account and password.

Procedures
The study was conducted during class time allocated from a first-year General English course. The procedure lasted 11 weeks and comprised testing (pretest, mid-training test, and posttest), scoring, and training. The instruction, practice, and test activities were conducted at the Multimedia Language Laboratory, Department of Applied English, during the class time 4:00~4:50 p.m. The overall procedure is described as follows.
Testing procedure. First, at week 1, all participants took a computerized test provided by MyET immediately before the training. The total test time was approximately 35 minutes and the tasks of the test were repeating, reading aloud, and answering questions. In the first part, the participants saw five sentences (one sentence at a time) on the computer screen, heard the model utterance of each sentence through the earphones, and then produced the sentence into the microphone by imitating the model utterance. In the second part, the participants were shown sentences on the computer screen and asked to read them aloud without a model utterance. In the third part, after hearing questions and model utterances of the answers through the earphones, the participants answered the questions by repeating the model utterances. The participants' pronunciations were recorded and scored by MyET's Automatic Speech Analysis System (ASAS) on segmental pronunciation, intonation, fluency, and stress. The participants' scores were recorded in the MyET system. The participants took the same computerized speaking test at week 9 (during the training) and week 11 (after the training).
Scoring procedure. In the tests before, during, and after the training, the pronunciations of the participants were analyzed and scored by MyET's ASAS on the items of segmental pronunciation, intonation, fluency, and stress.
Training procedure. After the pretest, the participants received four computer assisted pronunciation instruction sessions in the language laboratory. Each session lasted approximately 45 minutes, and the sessions were spread over an 8-week period. Designed to provide mock tests of the General English Proficiency Test Elementary speaking test, the version of MyET used in the study allowed the teacher (the class administrator) to assign homework to the students and to give quizzes for the learners to complete during a pre-assigned period of time.
In each training session, the participants received 20 minutes of instruction from one of the researchers in the computer laboratory. In the instruction period, the participants were shown a series of PowerPoint slides in which the prompt sentences and the corresponding model recordings by native speakers, provided by MyET, were embedded. The spectrogram and pitch contour computed by Praat for each model recording were shown below the sentence. After watching each PowerPoint slide and listening to the model recording, the participants practiced the pronunciation of the sentence shown on the slide. An example of the PowerPoint slides is shown in Figure 2. In the next period of the session, the participants practiced the pronunciation of the sentences assigned for that session with MyET. As in the testing procedure, the participants were instructed by MyET to listen to the model utterance of each assigned sentence through the earphones and to produce the same sentence into the microphone. The score for each sentence was shown on the computer screen immediately after the participant finished pronouncing it.

Results
In this study, the pronunciation quality of the participants was measured by the scores assigned by MyET immediately before the start of the program, at week 9, and at the conclusion of the program (at week 11). To evaluate the effectiveness of the instruction, a one-way within-subjects ANOVA was performed with a significance level of .05. Table 1 displays the mean and standard deviation for each of the three tests. The mean score was lowest for the pretest (mean = 51.05), followed by week 9 (mean = 57.79) and the posttest (mean = 63.52). Note that the following analyses were performed on the data of the 35 participants, among the 51 students of the class, who took all three tests. Mauchly's Test of Sphericity, which tests the null hypothesis that the variances of the differences between occasions are equal, was not statistically significant (p = .191 > .05), so the null hypothesis is not rejected, i.e., there is no evidence that sphericity was violated. Table 2 shows the results of the one-way within-subjects ANOVA. The mean scores for the three MyET tests were significantly different (F(2, 68) = 33.632, p < .001). Since the overall test across the three occasions was significant, pairwise comparisons were carried out using dependent-samples t-tests to determine which occasions differed significantly from one another. Three pairwise t-tests were performed: before vs. week 9, before vs. after, and week 9 vs. after. For three tests and an alpha level of .05, the per-comparison level is .05/3 ≈ .0167 (truncated to .016 in what follows), which ensures that the probability of committing a Type I error (rejecting the null hypothesis when it is true) is no greater than .05 for the entire set of follow-up tests. All three pairwise comparisons were significant, with p < .016 for each test.
The results indicated that the pronunciation scores were significantly higher at the end of the program (mean = 63.53, standard deviation = 11.56) than at week 9 (mean = 57.79, standard deviation = 12.44), t(34) = -4.487, p < .016; higher at the end of the program than before the program began (mean = 51.05, standard deviation = 14.75), t(34) = -7.935, p < .016; and higher at week 9 than before the program began, t(34) = -3.991, p < .016.
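The analysis steps described above (a one-way within-subjects ANOVA followed by Bonferroni-adjusted dependent-samples t-tests) can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure or software actually used in the study: the function names rm_anova, paired_t, and bonferroni_alpha are hypothetical, the four rows of scores are made-up values rather than the participants' data, and p-values are not computed because doing so would require an F or t distribution from a statistics library.

```python
# Sketch of a one-way within-subjects ANOVA with Bonferroni follow-up tests.
# All names and the example scores are hypothetical, NOT the study's data.

def rm_anova(scores):
    """scores: one list per participant, one score per test occasion.
    Returns (F, df_effect, df_error)."""
    n = len(scores)        # number of participants
    k = len(scores[0])     # number of occasions
    grand = sum(sum(row) for row in scores) / (n * k)
    occasion_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subject_means = [sum(row) / k for row in scores]
    ss_occasion = n * sum((m - grand) ** 2 for m in occasion_means)
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    # Residual variation after removing occasion and participant effects.
    ss_error = ss_total - ss_occasion - ss_subject
    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    f_ratio = (ss_occasion / df_effect) / (ss_error / df_error)
    return f_ratio, df_effect, df_error

def paired_t(first, second):
    """Dependent-samples t statistic for first minus second.
    Returns (t, df); t is negative when the second occasion scores higher."""
    n = len(first)
    diffs = [a - b for a, b in zip(first, second)]
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / (var_d / n) ** 0.5, n - 1

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / n_comparisons

# Made-up scores for four participants: [pretest, week 9, posttest].
example = [[50, 55, 62], [40, 48, 51], [60, 63, 70], [45, 54, 57]]
f_ratio, df1, df2 = rm_anova(example)
pre = [row[0] for row in example]
post = [row[2] for row in example]
t_stat, t_df = paired_t(pre, post)
print(f"F({df1}, {df2}) = {f_ratio:.3f}")
print(f"t({t_df}) = {t_stat:.3f}")
print(f"per-comparison alpha = {bonferroni_alpha(0.05, 3):.4f}")
```

With 35 participants and three occasions, the same function yields the degrees of freedom reported above, (2, 68) for the ANOVA and 34 for each t-test. Subtracting the later occasion from the earlier one also reproduces the negative sign of the reported t values.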

Discussion and Conclusion
The results of the analysis of the learner scores assigned by MyET show that the participants' pronunciation of the prompted sentences improved significantly after the training. It might be argued that the score gains arose from a practice effect, that is, that the participants received higher scores on the later tests simply because they had become more familiar with the procedure of taking computerized speaking tests after the pretest. To examine the improvement in pronunciation quality further, the pitch contours of the participants' recordings before and after the training were compared with each other and with the sample speech of native speakers. The comparison showed that the "flatness" of the participants' pitch had been reduced and their pitch variation had increased after the training. Some participants' recordings indicated that they had learned to produce a rising-falling intonation pattern when reading wh-questions aloud and a rising intonation pattern when reading yes/no questions aloud. The improvement in producing appropriate intonation patterns might be attributed to the effect of receiving instruction focused on intonation and practicing with computer assisted pronunciation training software.
Another pedagogical feature of the instruction sessions is that the participants were presented with the model utterance and the corresponding pitch contour of every assigned sentence before they proceeded to practice pronouncing it. Displaying both auditory and visual information about intonation may help the participants perceive the pitch variation in an utterance and produce correct pronunciation, as proposed by Molholt [19]. In addition, almost immediately after the participants spoke into the microphone, the pitch contours of their production and of the model utterance were shown at the bottom of the screen, so the participants were able to receive visual feedback. The participants' favorable opinion of the visual display may imply that, in intonation teaching, audio-visual feedback is more effective than auditory feedback alone.
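The comparison of pitch contours reported here was made by observation. One way such "flatness" could be quantified, offered only as a sketch and not as the method used in this study, is to convert the fundamental frequency (F0) track of a recording to semitones and compute its standard deviation and range. The function names, the 100 Hz reference, the convention that unvoiced frames are stored as 0, and the two F0 tracks below are all assumptions made for illustration.

```python
import math

def to_semitones(f0_hz, ref_hz):
    """Express a frequency in semitones relative to a reference frequency."""
    return 12.0 * math.log2(f0_hz / ref_hz)

def pitch_sd_semitones(f0_track, ref_hz=100.0):
    """Sample standard deviation of pitch (in semitones) over voiced frames.
    Assumption: unvoiced frames are stored as 0 and are skipped."""
    voiced = [to_semitones(f, ref_hz) for f in f0_track if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    mean = sum(voiced) / len(voiced)
    return math.sqrt(sum((s - mean) ** 2 for s in voiced) / (len(voiced) - 1))

def pitch_range_semitones(f0_track):
    """Distance in semitones between the lowest and highest voiced frames."""
    voiced = [f for f in f0_track if f > 0]
    if not voiced:
        raise ValueError("no voiced frames")
    return to_semitones(max(voiced), min(voiced))

# Hypothetical F0 tracks in Hz (0 marks an unvoiced frame); not real recordings.
flat_contour = [180, 182, 181, 0, 179, 180, 181]
varied_contour = [170, 200, 230, 0, 210, 180, 150]

print(pitch_sd_semitones(flat_contour), pitch_sd_semitones(varied_contour))
print(pitch_range_semitones(flat_contour), pitch_range_semitones(varied_contour))
```

Semitones are used instead of raw Hz so that speakers with different pitch registers can be compared on the same scale. A larger standard deviation or range after training would be consistent with the reduced flatness described above.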