A Collaboration Facilitator Model for Learning Virtual Environments

In virtual environments, most pedagogical virtual tutors or facilitators supervise or guide the learning activity; that is, they are task-oriented. In contrast, the facilitator proposed here only monitors certain aspects of collaboration and offers advice about them. In a multiuser virtual environment, that is, a Collaborative Virtual Environment, oral communication is preferred over written communication because it enhances the user's feelings of presence, co-presence, and immersion; however, analyzing oral communication carries a high resource overhead. As an alternative, this facilitator's monitoring is based on two nonverbal cues of interaction: talking-turn patterns and object manipulation. An empirical study to validate this approach was conducted based on the participants' perception of the suitability of the facilitator's messages; the results showed that the students accepted a large proportion of the generated advice.


Introduction
As social creatures, humans are highly influenced by the interaction with their socio-cultural environment; this interaction contributes to the formation of the individual.
In a collaborative scenario, people exchange ideas and coordinate efforts to achieve their shared goals; according to Vygotsky [1], whenever conflicts appear, activity and communication lead to knowledge. In this regard, Collaborative Virtual Environments (CVEs) fit into the Computer Supported Collaborative Learning (CSCL) paradigm, whose central notion is that knowledge is built through interaction with others.
CVEs can be said to merge Virtual Reality (VR) and multiuser distributed systems. VR technology attempts to give users the feeling of presence, or "being there", in a computer-generated display with which they can interact [2], while the multiuser feature is expected to give them the feeling of co-presence, that is, of "being there together" and interacting with each other [3].
CVEs are presently unique in supporting the faithful communication of attention, the focus of action, and to some extent emotions, with respect to shared objects, across a distributed team [4]. This highly visual technology supports more aspects of social interaction than other approaches such as videoconferencing or shared desktop applications.
In a learning situation, a CVE also provides an adaptable context in which time, scale and physics can be controlled, and in which participants can acquire new capabilities, such as flying or observing the environment from different perspectives and with any virtual embodiment [5].
Monitoring collaboration within a CVE for learning helps a human or virtual tutor in a number of ways: to personalize or adapt the learning activity, to supervise the learners' progress, to scaffold learners, or to track the students' involvement, among others. However, this monitoring task requires understanding and assessing the interaction computationally [6].
For the computer to interpret interaction in learning, a number of approaches have been proposed (see e.g. [7]), aiming mainly to analyze and/or model Collaborative Learning (CL); however, most of these approaches are based on speech content within two-dimensional applications. Our focus is different: we monitor the collaborative interaction that takes place when the learning task involves the manipulation of objects (otherwise the use of a CVE may not be justified), in an effort to model some aspects of collaboration.
Regarding communication in a CVE, the oral form seems to be the better practice [8] for enhancing the users' feelings of presence, co-presence and immersion, mainly because it is the way people usually communicate in real life during face-to-face collaborative spatial tasks. However, speech analysis has high computational costs, due to difficulties such as interpreting paralinguistic features like tone of voice or voice inflections, or decoding the different meanings a person might give to his/her words, as when sarcasm is used. Still, people communicate through multiple channels, such as body movements, gestures, facial expressions or certain actions; that is, through nonverbal communication.
People's nonverbal messages enrich interaction while supporting mutual comprehension. Nonverbal cues are fundamental to collaborative work because they consciously or unconsciously convey communicative intentions, and sometimes feelings or attitudes [9]. During interaction, nonverbal behavior may make up most of what people do [10]; it also includes paralinguistic cues such as the loudness, tempo, pitch or intonation of speech. Moreover, the use of certain objects, such as one's chosen outfit, or of the physical environment to communicate something without saying it has traditionally been considered nonverbal communication [11].
According to Knapp & Hall [11], the study of nonverbal communication in interaction has focused on three primary units:
1) The environmental structure and conditions. This category involves elements that impinge on human relationships but are not directly part of them, such as furniture or lighting conditions, as well as the use and perception of social and personal space, an area known as Proxemics.
2) The physical characteristics of the communicators, including artifacts such as clothes, hairstyle or jewelry.
3) The various behaviors manifested by the communicators: body movements and positions, also known as Kinesics (gestures, postures, touching behavior, facial expressions and eye behavior), and vocal behavior.
The nonverbal interaction that takes place in a VE is evidently restricted by the medium; hence these units of study require some adjustments when it comes to a computer-displayed scenario. For example, in a virtual communication environment, objects are mostly placed deliberately to enhance the sense of place and are rarely placed by the user. Probably the most significant difference between a VE and a real-world environment in this regard is that typically only the objects that have a purpose in the task or tasks to be carried out can be manipulated; these objects must therefore be considered salient during interaction analysis.
The communicators' physical characteristics are determined by the VE application. Some applications allow users to select their graphical representation, that is, their avatar, which conveys their physical characteristics in the environment, and others allow users to select some aspects of it, such as skin color or clothes. In a CVE for learning, however, the users' avatars are usually standardized, giving all students more or less the same appearance.
As for body movements and positions, in a VE they are adjusted to both the application and the task at hand. To date, avatars have limited body movements and positions, even when these are tracked directly from the user's physical movements; e.g. the most common practice in immersive VEs is to track the head and one hand [12]. As a result, only a limited range of nonverbal interaction can be executed and/or automatically extracted from the VE and interpreted as part of the collaborative interaction during the learning session, particularly when vocal content is not interpreted, as discussed elsewhere [13].
From the range of nonverbal cues that can be automatically extracted from a computer-based collaborative application, two that are present even in text-based or 2D environments were chosen to establish collaborative aspects of the learning session: patterns of verbal exchange and the manipulation of objects salient to the task. Silence-talk patterns have been used in interaction analysis as a means to understand group processes [14,15], and when they are combined with the manipulation of objects directly tied to the accomplishment of the task, indicators of collaboration can be derived (e.g. [16]).

Automatic Analysis of the Collaborative Session
This section presents the proposed computational model for analyzing collaborative interaction sessions in a CVE, based on patterns of verbal exchange and object manipulation rather than on verbal content analysis.
In order to accomplish a task collaboratively, participants have to create common ground, that is, mutual knowledge, mutual beliefs, and mutual assumptions [17]. This common ground has to be updated moment by moment as each individual attempts to be understood, at least to the extent that the task can be accomplished. During conversation, students make plans, make decisions, evaluate what they have done or what they must change, settle agreements and reach consensus. These periods, which are important for CL, are here called discussion periods.
Discussion periods can be inferred automatically, without using verbal content, as follows. A talking-turn, as defined by Jaffe and Feldstein [18], begins when a person starts to speak alone and lasts as long as this person is not interrupted. For practical purposes, in a computer environment a talking-turn can be treated as a vocalization. In automatic speech recognition, the end of an utterance is usually detected when a silent pause in the range of 500 to 2000 milliseconds occurs [19], and the answer to a question usually follows within a shorter interval, around 500 ms [20]; therefore, a two-second silence can serve to automatically determine the end of a talking-turn.
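The two-second rule can be illustrated with a minimal sketch. It assumes that a voice-activity detector already delivers, for each speaker, a sorted list of (start, end) vocalization segments in seconds; the names `Turn` and `segment_turns` are hypothetical, and the sketch simplifies Jaffe and Feldstein's definition by ignoring overlapping speech from other members.

```python
from dataclasses import dataclass

# Silence (seconds) taken to close a talking-turn, within the 500-2000 ms
# end-of-utterance range cited in the text.
END_OF_TURN_PAUSE = 2.0


@dataclass
class Turn:
    speaker: str
    start: float  # seconds from the start of the session
    end: float


def segment_turns(speaker, segments, pause=END_OF_TURN_PAUSE):
    """Merge one speaker's voice-activity segments into talking-turns.

    `segments` is a list of (start, end) pairs in seconds, sorted by start.
    A gap shorter than `pause` is treated as a pause inside the same turn;
    a longer gap closes the turn and the next segment opens a new one.
    """
    turns = []
    for start, end in segments:
        if turns and start - turns[-1].end < pause:
            turns[-1].end = max(turns[-1].end, end)  # short pause: same turn
        else:
            turns.append(Turn(speaker, start, end))  # long silence: new turn
    return turns


print(segment_turns("A", [(0.0, 1.5), (2.0, 3.0), (6.0, 7.0)]))
# [Turn(speaker='A', start=0.0, end=3.0), Turn(speaker='A', start=6.0, end=7.0)]
```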
Two situations must also be distinguished from genuine discussion periods: a simple question-answer exchange, and the statements that people working in a group produce alongside their actions, which are directed to no one in particular [21]. The procedure selected to establish a discussion period was therefore to detect when a number of talking-turn exchanges take place involving most of the group members, with a pause longer than two seconds marking the end of the exchange [13].
Once discussion periods are established, they can be combined with object manipulation, which, as mentioned, is directly tied to the accomplishment of the task, to automatically infer the probable stage of the Plan-Implement-Review cycle of the task, in addition to an Initial stage. The Initial stage is needed because people tend to socialize before initiating collaboration in the strict sense [22]; a collaborative session usually begins with this introductory social phase, especially if the members of the group do not know each other.
The stages of task accomplishment can be determined automatically, with the assumed state changing as discussion periods and/or object manipulation are observed, as shown in Figure 1 and explained next. After the Initial stage, the students might start talking about how to accomplish the task, or they might go directly to carrying it out; if a discussion period occurs, the state changes to the Planning stage; otherwise, if the students start manipulating objects, the state changes to the Implementing stage.
Once the students are in the Implementing stage, if they have a discussion period the state changes to the Reviewing stage; when the discussion period ends and the session continues, the state changes back to the Implementing stage. Finally, the participants can end the session.
Another important aspect of CL is participation. In CL, students are expected to take part in all activities with a more or less balanced participation [6,23,24], a situation that enhances learning opportunities and helps to corroborate each student's interest in and understanding of the group's shared goals. Participation rates are easy to compute from each member's participation time.
Monitoring a CL session in this way offers insights into how the group approaches the task, from which an interaction model can be constructed. This model for automatically monitoring a CL session in a CVE was implemented in a virtual facilitator.
By comparing the current state of the collaborative interaction with the desired one, advice for the students can be formulated [25]. In CL, discussion periods are expected from time to time; otherwise the students might be following, for example, a trial-and-error approach or a division of labor. Through this particular model, the facilitator can generate two types of advice: 1) to encourage discussion periods according to the task stage; and 2) to balance participation among the students according to their participation rates. Text messages were prepared for the facilitator to send during a CVE session. Different messages encouraging discussion periods were worded according to the task stage (see Table 1 in the Application section), to avoid giving a direct instruction such as "you need to discuss", which might not be well received by the students and therefore not be helpful; separate messages were prepared for students with over- or under-participation.
In order to verify that the messages suited what was occurring in the collaborative session, a mechanism allowing users to agree or disagree with each message at the moment it is posted was added to the facilitator's implementation in a CVE. It can be argued that if the users perceive the facilitator's advice as appropriate to what is taking place in the environment, the monitoring task can be considered satisfactory. The following study was conducted to evaluate the users' perception of the advice provided by the facilitator model.
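A minimal sketch of this procedure, assuming talking-turns from all members are already available as (speaker, start, end) records; `discussion_periods` and its parameters are hypothetical names. The default `min_speakers=3` mirrors the rule later used in the trials (all three participants take a turn), while the text's more general criterion is "most of the group members".

```python
from dataclasses import dataclass

PAUSE = 2.0  # silence (seconds) that closes an exchange


@dataclass
class Turn:
    speaker: str
    start: float
    end: float


def _close(group, end, min_speakers, periods):
    """Keep the exchange only if enough distinct members took part in it."""
    if len({t.speaker for t in group}) >= min_speakers:
        periods.append((group[0].start, end))


def discussion_periods(turns, min_speakers=3, pause=PAUSE):
    """Return (start, end) pairs of exchanges that count as discussion periods.

    Turns from all members are grouped into exchanges; an exchange ends when
    nobody speaks for longer than `pause` seconds. A lone remark or a
    two-person question-answer exchange is discarded when `min_speakers`
    equals the group size.
    """
    periods, group, group_end = [], [], 0.0
    for turn in sorted(turns, key=lambda t: t.start):
        if group and turn.start - group_end > pause:
            _close(group, group_end, min_speakers, periods)
            group = []
        if not group:
            group_end = turn.end
        group.append(turn)
        group_end = max(group_end, turn.end)
    if group:
        _close(group, group_end, min_speakers, periods)
    return periods
```

For example, turns by A, B and C separated by half-second pauses form one discussion period, whereas a later exchange between only A and B does not.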
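The stage transitions described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the facilitator's actual code: `Stage` and `next_stage` are hypothetical names, the Planning-to-Implementing transition on object manipulation is assumed from the cycle rather than spelled out in the text, and when discussion and manipulation are observed together in the Initial stage, discussion is given priority.

```python
from enum import Enum


class Stage(Enum):
    INITIAL = "Initial"
    PLANNING = "Planning"
    IMPLEMENTING = "Implementing"
    REVIEWING = "Reviewing"


def next_stage(stage, discussing, manipulating):
    """Return the assumed task stage after one observation.

    `discussing` is True while a discussion period is in progress;
    `manipulating` is True when a salient object is being moved.
    """
    if stage is Stage.INITIAL:
        if discussing:
            return Stage.PLANNING
        if manipulating:
            return Stage.IMPLEMENTING
    elif stage is Stage.PLANNING:
        if manipulating:  # assumed: planning ends once objects are moved
            return Stage.IMPLEMENTING
    elif stage is Stage.IMPLEMENTING:
        if discussing:
            return Stage.REVIEWING
    elif stage is Stage.REVIEWING:
        if not discussing:  # the discussion period has ended
            return Stage.IMPLEMENTING
    return stage


stage = Stage.INITIAL
for discussing, manipulating in [(True, False), (False, True), (True, False), (False, False)]:
    stage = next_stage(stage, discussing, manipulating)
print(stage.value)  # Implementing
```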
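As a minimal sketch of how participation rates might be derived from participation time, the functions below compute each member's share of the group total and label it against an equal share of 1/n. The `tolerance` value and the labeling rule are hypothetical; the text does not state the threshold used to decide over- or under-participation.

```python
def participation_shares(times):
    """Fraction of the group's total participation time contributed by each member."""
    total = sum(times.values())
    if total == 0:
        return {member: 0.0 for member in times}
    return {member: t / total for member, t in times.items()}


def classify(shares, tolerance=0.15):
    """Label each member relative to an equal share of 1/n.

    `tolerance` is a hypothetical threshold, not a value from the study.
    """
    fair = 1 / len(shares)
    labels = {}
    for member, share in shares.items():
        if share > fair + tolerance:
            labels[member] = "over"
        elif share < fair - tolerance:
            labels[member] = "under"
        else:
            labels[member] = "balanced"
    return labels


shares = participation_shares({"A": 60.0, "B": 30.0, "C": 10.0})  # seconds of speech
print(classify(shares))  # {'A': 'over', 'B': 'balanced', 'C': 'under'}
```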

Methods
Subjects. Ninety undergraduate students (68 males and 22 females) from the Informatics School at the University of Guadalajara were asked to participate. Thirty triads were formed according to the available trial schedule, the students' availability and their personal preferences.

Application
The facilitator was implemented in a CVE that allows three users to work on a networked collaborative task. The three users' avatars are placed around a table, their workspace. Each user sees the scenario from a different point of view, corresponding to his/her place at the table, and does not see his/her own avatar (see Figure 2). In the workspace, figures can be selected with a click and moved with the arrow keys.

Group messages for discussion periods
Discussion periods for these trials were defined as occurring when each of the three participants had at least one talking-turn before a silence pause occurred.
Text messages encouraging discussion periods, displayed at the bottom of the screen, were sent by the facilitator as listed in Table 1. In the Initial stage, Message_0 was sent when the students had not started a discussion period after an elapsed time A, and Message_1 was sent if the students started with the Implementation stage instead of the more appropriate Planning stage.
During the Implementation stage, the facilitator's messages were sent as follows: Message_2 was sent if no Reviewing stage occurred during an elapsed time B; Message_3 was sent if the three students started to work on the Implementation at the same time, which might mean that they were dividing the task; and Message_4 was sent if the students decided to finish the task without a final Reviewing stage. These are group messages because they are addressed to no one in particular.
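The group-message rules can be summarized as a single decision function. This is a hypothetical sketch: the values of `TIME_A` and `TIME_B` are placeholders because the text does not report the elapsed times A and B, the `Observation` fields are invented for illustration, and the order in which rules are checked is an assumption. Message_1 is evaluated once the group is already implementing, since that is when a skipped Planning stage becomes observable.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values: the text names these thresholds A and B but does not
# report them.
TIME_A = 120.0  # seconds in the Initial stage without a discussion period
TIME_B = 300.0  # seconds of Implementation without a Reviewing stage


@dataclass
class Observation:
    stage: str                       # "Initial", "Planning", "Implementing" or "Reviewing"
    seconds_in_stage: float          # time since the current stage began
    skipped_planning: bool = False   # went from Initial straight to Implementing
    simultaneous_workers: int = 0    # members moving objects at the same moment
    finishing: bool = False          # the group asked to end the session
    reviewed_after_last_move: bool = False  # a Reviewing stage followed the last moves


def group_message(obs: Observation) -> Optional[str]:
    """Return the group message to send, or None (rule order is an assumption)."""
    if obs.stage == "Initial" and obs.seconds_in_stage > TIME_A:
        return "Message_0"  # no discussion period has started yet
    if obs.stage != "Implementing":
        return None
    if obs.finishing and not obs.reviewed_after_last_move:
        return "Message_4"  # finishing without a last review
    if obs.skipped_planning:
        return "Message_1"  # implementing without planning first
    if obs.simultaneous_workers >= 3:
        return "Message_3"  # everyone working at once: possible division of labor
    if obs.seconds_in_stage > TIME_B:
        return "Message_2"  # no review during elapsed time B
    return None
```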

Individual messages for participation
During the Initial, Planning and Reviewing stages, the participants' speech rates were calculated; in the Implementation stage, both speech and object manipulation rates were calculated (see [26]). When a participant showed over-participation, the facilitator sent him/her a message bearing his/her name: "<<participantName>>, you should try to involve your peers more in the task". When a participant showed under-participation, the message "<<participantName>>, you should try to increase your participation" was sent, also with the user's name.
If over- and under-participation were detected in different group members at the same time, the over-participation message was preferred because it has been found that, at least in speech, over-participators readjust their rates better than under-participators [27]. Over- and under-participation messages are individual messages because each is addressed to one particular member of the group.
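The preference for the over-participation message can be sketched as follows, assuming each member has already been labeled "over", "under" or "balanced". The function name, the label strings and the choice of the first matching member when several qualify are hypothetical; the message templates paraphrase the messages quoted in the text.

```python
from typing import Dict, Optional, Tuple

# Templates paraphrasing the individual messages quoted in the text.
OVER = "{name}, you should try to involve your peers more in the task"
UNDER = "{name}, you should try to increase your participation"


def individual_message(labels: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Pick at most one (addressee, text) pair from per-member labels.

    Over-participation is checked first, so it wins when both kinds are
    detected at the same time.
    """
    for kind, template in (("over", OVER), ("under", UNDER)):
        for name, label in labels.items():
            if label == kind:
                return name, template.format(name=name)
    return None


print(individual_message({"Ana": "under", "Luis": "over", "Eva": "balanced"}))
# ('Luis', 'Luis, you should try to involve your peers more in the task')
```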

Messages feedback
Whenever a message was sent, each of the three participants could agree with it by pressing the "O" key (for OK) or disagree with it by pressing the "N" key (for No).
The facilitator was designed to avoid being intrusive and to try not to break the flow of collaboration. For this reason, when two of the three participants disagreed with a message, it was deactivated, meaning that it did not appear again. The participants could also continue with what they were doing without answering the messages; a message disappeared from the screen once it was answered. In addition, consecutive messages were at least three minutes apart. This facilitator was modeled after a number of previously conducted studies [28,29].
Materials and Task. Each participant was placed in a different room. The participants communicated with each other orally through a microphone using the TeamSpeak™ v1.05.0.6 software. A videotape recorder was placed in one of the participants' rooms, pointing at the computer monitor; in this room the student used a loudspeaker instead of headphones, so that the speech of all three participants could be recorded.
The task consisted of rearranging the furniture in an apartment sketch to make room for a billiard table, following certain rules about the required spacing between pieces of furniture and the number of times the furniture could be moved.
At the end of the collaborative session, participants were asked to answer a post-questionnaire with five questions.
Procedure. The students were verbally instructed on how to use the application and on the rules for rearranging the furniture. A description of the application's functionality and the task rules were also given to the participants in writing, along with a top-view sketch of how the furniture was arranged at the start of the session. Participants were allowed a short time to test the application and the audio before they started the session. The time to accomplish the task was limited to 20 minutes; if the students finished the task earlier, they were asked to press the "F" key.
Data. Every student action within the environment was registered by the application in a text log file. Each log entry contains the user identification; the type of action, namely moving furniture, pointing at furniture, pointing at the table, changing the point of view of the environment, or speaking to the others; and the time the input was made, in minutes, seconds and milliseconds.
For each session, the application also created a new file with the timestamps of the start and end of each stage (i.e. the Initial, Planning, Implementing and Reviewing stages). The participation rates and times of each group member for discussion periods and object manipulation were also included.
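The text does not give the exact log format, so the following is only a sketch of how such records might be represented and parsed. It assumes a hypothetical tab-separated line of the form `user<TAB>action<TAB>MM:SS.mmm`, with session-relative times and invented action labels covering the action types listed above.

```python
from dataclasses import dataclass

# Hypothetical action labels, one per action type listed in the text.
ACTIONS = {"move_furniture", "point_furniture", "point_table", "change_view", "speak"}


@dataclass(frozen=True)
class LogRecord:
    user: str
    action: str
    time_ms: int  # milliseconds from the start of the session (assumed)


def parse_line(line: str) -> LogRecord:
    """Parse one line of an assumed tab-separated log: user, action, MM:SS.mmm."""
    user, action, stamp = line.rstrip("\n").split("\t")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    minutes, rest = stamp.split(":")
    seconds, millis = rest.split(".")
    return LogRecord(user, action, (int(minutes) * 60 + int(seconds)) * 1000 + int(millis))


print(parse_line("u2\tspeak\t03:07.250\n"))
# LogRecord(user='u2', action='speak', time_ms=187250)
```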

Participants' evaluation for the messages
The facilitator sent 166 text messages over the 30 sessions. Thirty of them were group messages encouraging discussion periods, of which only Message_0, Message_1 and Message_4 were actually sent; the remaining 136 concerned the participants' individual participation rates. Table 2 shows the number of messages sent by type and the participants' evaluation of them. As expected from previous trials (e.g. [26]), not all participants evaluated all messages; of 501 possible answers, only 277 were received.
Among the evaluated messages, the lowest agreement rate was 73 percent, for Message_4 (see the Rate column in Table 2, where the expected values (E) were calculated with a two-tailed 95% confidence interval). For Message_4, the message sent at the end of the session to encourage a final review, the sample is so small that the dispersion is too large for the data to be representative. All other messages were well accepted, with agreement rates above 80 percent.
For the individual messages, a distinction was made between evaluations given by the person to whom the message was addressed and evaluations given by the other participants. Table 3 presents the agreement and disagreement rates based on this distinction. When only the addressed person is considered, the response rate rises from slightly more than half of the messages to 77 percent. Message deactivation, which occurs when two of the participants disagree with a message, happened only in the last session, number 30. The deactivated messages were the over-participation messages addressed to members 1 and 2, and the under-participation message addressed to member 2.
Although each session was monitored continuously, messages were at least three minutes apart. If a change of attitude is assumed whenever the same message did not have to be resent consecutively, then, as shown in Table 4, the user might be said to have followed the advice 65.3 percent of the time. If the change of attitude is allowed to take longer than three minutes, that is, if messages that had to be resent twice are also counted, the cumulative percentage rises to 78.9 percent.
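The text does not state how the 95% interval was computed. As an assumption, the sketch below uses the common normal-approximation (Wald) interval for a proportion, which shows why a message evaluated only a few times yields a very wide interval; the counts in the example are hypothetical, not the study's data.

```python
import math


def agreement_interval(agree: int, answered: int, z: float = 1.96):
    """Normal-approximation (Wald) two-tailed 95% interval for an agreement rate.

    Returns (rate, low, high). The half-width shrinks as 1/sqrt(answered),
    so small samples give wide, unrepresentative intervals.
    """
    rate = agree / answered
    half = z * math.sqrt(rate * (1 - rate) / answered)
    return rate, max(0.0, rate - half), min(1.0, rate + half)


print(agreement_interval(80, 100))  # rate 0.8, interval about (0.72, 0.88)
print(agreement_interval(8, 11))    # similar rate, far wider interval
```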
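The pacing and deactivation rules that shape these counts (consecutive messages at least three minutes apart, and a message deactivated once two of the three participants disagree with it) can be sketched as a small gate object. `MessageGate` is a hypothetical name; message identifiers such as `("over", "member 1")` are assumed so that an individual message is deactivated per addressee, as happened in session 30; and counting disagreements cumulatively across repeated sends of the same message is an assumption.

```python
from collections import defaultdict

MIN_GAP = 180.0     # seconds: consecutive messages at least three minutes apart
DISAGREE_LIMIT = 2  # two of the three participants disagreeing deactivates a message


class MessageGate:
    """Tracks message pacing and deactivation for one session."""

    def __init__(self):
        self.last_sent = None
        self.disagreements = defaultdict(set)  # message id -> participants who pressed "N"
        self.deactivated = set()

    def can_send(self, message_id, now):
        if message_id in self.deactivated:
            return False
        return self.last_sent is None or now - self.last_sent >= MIN_GAP

    def sent(self, now):
        self.last_sent = now

    def feedback(self, message_id, participant, key):
        """Record an "O" (agree) or "N" (disagree) key press."""
        if key == "N":
            self.disagreements[message_id].add(participant)
            if len(self.disagreements[message_id]) >= DISAGREE_LIMIT:
                self.deactivated.add(message_id)
```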

Post-questionnaire
The post-questionnaire had 4 questions evaluating different aspects of the VE, as shown in Table 5. Answers were given on a five-level Likert scale: 1) not at all; 2) little; 3) regular; 4) good enough; and 5) completely.
A final open-ended question, "If you wish, you can add any comment", was added to collect the participants' diverse opinions. The frequencies of the ratings given to each question are presented in Table 6, together with the mean and other statistics for the 4 questions. Question 1, the one related to the messages sent by the facilitator, had a mean of 3.34, slightly above "regular".
It is worth mentioning that no statistically significant correlation was found between the number of messages received, either group or individual, or the total number of agree, disagree or ignored evaluations, and the overall rating each participant gave to the facilitator's messages in the final questionnaire.
The question on the feeling of presence had a mean of 3.42 and the question regarding collaboration a mean of 3.51. The best-rated question was number 3, regarding the feeling of co-presence, with a mean of 3.82.
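Summary statistics such as those reported in Table 6 can be derived from a question's answer frequencies. The sketch below uses Python's standard `statistics` module; the frequencies in the example are hypothetical and are not the study's data.

```python
import statistics

LEVELS = (1, 2, 3, 4, 5)  # not at all, little, regular, good enough, completely


def likert_summary(frequencies):
    """Mean, median and mode of a question from its answer frequencies.

    `frequencies[i]` is the number of participants who chose LEVELS[i].
    """
    answers = [level for level, count in zip(LEVELS, frequencies) for _ in range(count)]
    return {
        "n": len(answers),
        "mean": statistics.mean(answers),
        "median": statistics.median(answers),
        "mode": statistics.mode(answers),
    }


print(likert_summary([0, 2, 5, 2, 1]))  # hypothetical counts for one question
```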
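The text does not specify which correlation coefficient or test was used for the check between received messages and questionnaire ratings. As an assumption, the sketch below computes Pearson's r and the usual t statistic for testing zero correlation (n - 2 degrees of freedom); turning t into a p-value needs a t distribution, which a library routine such as `scipy.stats.pearsonr` provides directly.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```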

Discussion
The acceptance rates of the different types of evaluated messages, as shown in Table 5, indicate that, in the participants' perception, the messages were most of the time in accordance with what was taking place within the CVE; Message_4 (see description in Table 2), however, requires more trials to be properly evaluated.
From the frequencies with which the same type of message was resent, it could be presumed that participants followed the advice a substantial number of times, although this needs further analysis (see Table 7). Nevertheless, a confounding observation was that participants, individually or as a group, sometimes agreed with a particular message each time it was sent yet clearly continued with the same collaborative behavior, as when a message was resent 4, 5 or 6 times in a row.
The answers to three of the four post-questionnaire questions, especially the first one, which evaluates the messages, showed a very strong central tendency, which could reflect uncertainty on the participants' part; however, no correlation was found between the post-questionnaire ratings and the type or rate of answers the participants gave to each message. A better-designed post-questionnaire might give more insight into the participants' overall perception of the messages sent by the facilitator.
As for the last, open-ended question of the questionnaire, only one participant commented on the messages, saying that there was something odd about them without explaining further.

Conclusion
A facilitator model for collaboration was implemented in a CVE in which three networked participants solved an object rearrangement task. The facilitator monitored two aspects of collaboration based on two nonverbal interaction cues, namely speech patterns and object manipulation: 1) the occurrence of discussion periods in the different stages of task accomplishment; and 2) the evenness of the group members' participation in discussion and object manipulation. The facilitator encourages group discussion periods under certain circumstances related to the task stage by sending different messages, and it tries to balance participation in both discussion and object manipulation by sending messages directed to over-participators or under-participators.
Measuring a facilitator's skills in real life involves subjective evaluation, most of the time conducted through post-questionnaires. Moreover, even an experienced human facilitator's influence on a group can be weakened by a number of circumstances; the group's immediate response to the facilitator's advice could thus be an adequate form of evaluation.
In order to evaluate the suitability of the messages sent by the facilitator, the participants were asked to agree or disagree with each message; however, they were not required to do so, so some messages were ignored. As expected, when a message was sent to a particular person, that person was more likely to evaluate it than the participants to whom it was not addressed.
The evaluation of the messages shows a high acceptance rate among the participants (above 70 percent for every type of message evaluated). However, some messages (i.e. Message_2 and Message_3, see Table 1) were never sent during the trials, others (i.e. Message_4) require more trials to be properly evaluated, and a large number of messages were not evaluated by the participants. A more suitable post-questionnaire, specifically one designed to better capture the participants' evaluation of the facilitator model, could give further insight into their perception of the messages.