Evaluation of Auditory and Visual Feedback on Task Performance in a Virtual Assembly Environment

This paper presents the creation of an assembly simulation environment with multisensory feedback (auditory and visual), and the evaluation of the effects of auditory and visual feedback on task performance in the context of assembly simulation in a virtual environment (VE). This VE experimental system platform brings together complex technologies such as constraint-based assembly simulation, optical motion tracking technology, and real time 3D sound generation technology around a virtual reality workbench and a common software platform. A peg-in-a-hole and a Sener electronic box assembly task have been used as the task cases to conduct the human factor experiment, using sixteen participants. Both objective performance data (i.e., task completion time, TCT; and human performance error rate, HPER) and subjective opinions (i.e., questionnaires) on the utilization of auditory and visual feedback in a virtual assembly environment (VAE) have been gathered from the experiment. Results showed that the introduction of auditory and/or visual feedback into the VAE did improve assembly task performance. They also indicated that integrated feedback (auditory plus visual) offered better assembly task performance than either type of feedback used in isolation. Most participants preferred integrated feedback to either individual feedback (auditory or visual) or no feedback. The participants' comments demonstrated that nonrealistic or inappropriate feedback had a negative effect on task performance and easily made them frustrated.


Introduction
In the manufacturing industry, VE technology has the potential to interactively evaluate assembly-related engineering decisions, and to factor the human elements and considerations into finished products very early in the development cycle, without needing a physical realization of the products (Dai, 1998; Banerjee & Zetu, 2001; Ong & Nee, 2004). This could potentially lead to lower cost, higher quality products, and shorter time to market, thus improving the competitiveness of innovative products. Assembly is an interactive process involving an operator (user) and the handled objects, and hence simulation environments must be able to react to the user's actions in real time. Furthermore, the action of the user and the reaction of the environment must be presented in an intuitively comprehensible way. Therefore, it is of great importance to investigate the factors related to information presentation modes and integration mechanisms, which affect the user's performance in performing an assembly task in VEs. Multimodal information presentation, integrated into the VE, has the potential to stimulate different senses, increasing the user's impression of immersion and the amount of information that is accepted and processed by the user's perceptual system. Consequently, the increase of useful feedback information may enhance the user's efficiency and performance while interacting with VEs. However, despite recent efforts in assembly simulation (Lin & Gottschalk, 1998; Maxfield, Fernando, & Dew, 1998; Jayaram et al., 1999; Steffan & Kuhlen, 2001; Marcelino, Murray, & Fernando, 2003) and 3D sound performance modeling in VEs (Wenzel, 1992; Begault, 1994; Hahn, Fouad, Gritz, & Lee, 1998; O'Brien, Cook, & Essl, 2001; Doel, Kry, & Pai, 2001), very limited research has been conducted to investigate and evaluate the effects of multimodal feedback mechanisms, especially 3D auditory and visual feedback, on virtual assembly task performance within VEs (Kitamura, Yee, & Kishino, 1998).
This paper presents the overall system architecture implemented for creating a multimodal VAE, the approaches adopted to evaluate the factors that affect the user's performance in carrying out assembly tasks, and the relevant results acquired from the experiments. In particular, it addresses whether the introduction of auditory and/or visual feedback into the VAE improves assembly task performance and the user's satisfaction, and which type of feedback is best: neutral, visual, auditory, or integrated feedback (auditory plus visual). The remainder of the paper is organized as follows. Section 2 describes the experimental platform for assembly task performance. Section 3 presents the task performance evaluation experiment. The experimental results are analyzed and discussed in Section 4. Finally, the conclusions are given in Section 5.

Experimental Platform for Assembly Task Performance
This section first presents the hardware configuration and software architecture of the experimental system platform for multimodal virtual assembly task performance. It then discusses the schemes for auditory feedback generation and the unification of auditory and visual feedback presentation.

Hardware Configuration of the Platform
The hardware configuration of the experimental system platform for virtual assembly task performance comprises three major parts: the visualization subsystem, the auralization subsystem, and the real time optical motion tracking system (see Figure 1). The core of the visualization subsystem is Trimension's V-Desk 6, a fully integrated immersive L-shaped responsive workbench driven by a Silicon Graphics Incorporated (SGI) desk-side Onyx2 supercomputer with four 250 MHz IP27 processors and an InfiniteReality-2E graphics board. The V-Desk 6 is integrated with StereoGraphics' CrystalEyes3 liquid crystal shutter glasses and an infrared emitter that is connected to the Onyx2 workstation. These are used to generate stereoscopic images of the virtual world: one from the user's left-eye perspective, and the other from the right. When the user wears a pair of CrystalEyes liquid crystal shutter glasses to view the virtual world, these images are presented to the corresponding eyes, providing the user with depth cues that make the immersive experience realistic.
The auralization subsystem is based on a sound server (i.e., the Huron PCI audio workstation; Huron, 2000), which is a specialized digital signal processing (DSP) system. It employs a set of TCP/IP based procedures, the spatial network audio protocol (SNAP; Huron, 2000), to allow the VE host (i.e., the visualization subsystem) to transmit the attributes of the assembly scene, positional information of the user, and the sound-triggering events to the sound server through a local area network. The VE host sends packets specifying the auditory-related attributes of the scene and events, such as collisions and motion between the manipulated objects, the position of the event, the position of the user, and other environmental attributes that are derived from the geometry of the assembly environment. From these packets, the auralization subsystem generates a set of auralization filters and sends them to the DSP boards. Based on an event-driven scheme for the presentation of objects' interactions, the DSP board samples and processes sound materials (i.e., data streams) with the specified filters. The processed sound materials are then sent back in analog form, through coaxial cables, to a set of headphones or an array of loudspeakers within the VE area. The auditory feedback in this experiment was presented to the user using a pair of Sennheiser HD600 headphones.
The optical motion tracking system (i.e., Vicon's 612 workstation; Vicon, 2001) provides dynamic, real time measurement of the position (X, Y, and Z) and the orientation (azimuth, elevation, and roll) of tracked targets such as the user's head and hands, and manipulation tools, using passive-reflective markers and high speed, high resolution cameras. It is connected to the VE host using the TCP/IP protocol over a local area gigabit Ethernet. A wand is used to support interactive object selection and virtual assembly operations. A virtual 3D pointer with ray-casting and a virtual hand are utilized as the interaction metaphors for the assembly operation.

Software Architecture of the Platform
The software environment is a multi-threaded system that runs on SGI IRIX platforms. It consists of the User-Interface/Configuration Manager, the World-Manager, the Input-Manager, the Viewer-Manager, the Sound-Manager, the Assembly-Simulator, the CAD Translator, and the CAD Database (see Figure 2). The User-Interface/Configuration Manager tracks all master processes to allow runtime configuration of different modules. Figure 3 shows the look and feel of the user interface.
The World-Manager is responsible for the administration of the overall system. It coordinates the visualization, user's inputs, databases, assembly simulation, and visual and auditory feedback generation. The World-Manager fetches the user's inputs for manipulation, produces constrained motion using the Assembly-Simulator, and passes the corresponding data (e.g., the position and orientation information of the objects and the user) to the Viewer-Manager and the Sound-Manager for auditory and visual feedback generation. The new data is used to update the scene graph and control the sound server via the Sound-Manager. The World-Manager also has the responsibility for synchronizing various threads such as rendering and collision detection.
The Input-Manager manages user-object interactions, establishing the data flow between the user's inputs and the objects that are held by the World-Manager. It supports devices such as pinch gloves, wands, and Vicon's optical motion tracking system. These inputs describe the user's actions/commands in the VE. Each device has a thread to process its own data. These threads run in parallel with the rendering threads to achieve low latency. Once the assembly objects are loaded into the scene graph via the CAD-Translator, the Input-Manager allows the user to select and manipulate the objects in the environment. The Sound-Manager gets the location data of the user, the positions of the collision and motion (i.e., sound sources), and the parameters relating to sound signal modulation from the World-Manager and the Assembly-Simulator, and then uses the application programming interface (API) of the Huron audio workstation (Huron, 2000) to manage the audio workstation via the local network using the TCP/IP protocol.
The Assembly-Simulator carries out collision detection between the manipulated object and its surrounding objects, supporting interactive constraint-based assembly operations. During object manipulation, the Assembly-Simulator samples the position of the moving object to identify new constraints between the manipulated object and the surrounding objects. Once new constraints are recognized, new allowable motions are derived by the Assembly-Simulator to simulate realistic motion of the assembly objects. Parameters such as the accurate positions of the assembly objects are sent back to the World-Manager, which defines their precise positions in the scene. When a constraint is recognized, the matching surfaces are highlighted to provide visual feedback, and/or 3D auditory feedback is generated through the Sound-Manager and the sound server.
A description of the virtual assembly scene management and rendering can be found in Zhang, Murray, and Fernando (2003), Zhang and Fernando (2003), and Zhang, Sotudeh, and Fernando (2005).

Auditory Feedback Rendering
Since the user's interaction with the VE and the assembly simulation requires sufficient real time behavior, limited computation time is available for 3D sound simulation. Because of this limitation, detailed auditory rendering has not been implemented in this work. In order to generate real time 3D sound within the limited computational power, some tradeoffs had to be made. In this research, binaural impulse responses are used to simulate the auditory-related attributes of the assembly scene, and headphones are used to play back the auditory feedback.
For the impulse response generation of the virtual assembly scene, a simplified image source method has been utilized to calculate the room impulse response. A box is used to approximate the volume of the geometry of the virtual assembly scene. The direct sound and the first-order reflections from the six surfaces of the box are calculated at runtime. Each sound arrival (direct or reflected) is characterized by its time of arrival, based on the distance traveled along the echo path; its direction of arrival; and its level of attenuation, due both to the distance of sound propagation and to the material properties of any reflective surface. From the second-order reflections to the reverberant tail, the impulse responses are precomputed depending on environmental parameters such as the geometry of the scene, the materials of the scene boundary, and the locations/orientations of the sound sources and users.
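As an illustration of this scheme, the following sketch computes the direct sound and the six first-order reflections for a box-shaped scene using the image source method. It is a simplified reconstruction rather than the system's actual DSP code: the speed of sound constant, the single frequency-independent reflection coefficient, and the 1/r amplitude model are our own assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value at room temperature

def first_order_arrivals(src, listener, box, reflection_coeff=0.8):
    """Direct sound plus the six first-order image-source reflections for
    an axis-aligned box (box = (Lx, Ly, Lz), one corner at the origin).
    Returns a list of (delay_seconds, amplitude) pairs. Amplitude uses
    1/r spherical spreading; reflections are further scaled by a single
    frequency-independent wall reflection coefficient."""
    arrivals = []

    def add(image_src, gain):
        d = math.dist(image_src, listener)  # echo path length
        arrivals.append((d / SPEED_OF_SOUND, gain / d))

    add(src, 1.0)  # direct path
    for axis in range(3):
        # image source mirrored across the wall at coordinate 0
        lo = list(src)
        lo[axis] = -src[axis]
        # image source mirrored across the opposite wall at box[axis]
        hi = list(src)
        hi[axis] = 2 * box[axis] - src[axis]
        add(tuple(lo), reflection_coeff)
        add(tuple(hi), reflection_coeff)
    return arrivals
```

Each (delay, amplitude) pair corresponds to one tap of the early part of the impulse response; the precomputed higher-order reflections and reverberant tail described above would be appended after these taps.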
The B-format has been selected for sound field representation and headphone playback, since it is a convenient method for creating and manipulating the sound field in auralization systems. It is used as an intermediate format for generating sound material, which involves four .wav files labeled W, X, Y, and Z; the B-format signals are then decoded for headphone playback. The B-format is essentially a four-channel audio format that can be recorded using a set of four coincident microphones arranged to provide one omnidirectional channel (W channel) and three figure-8 channels (X, Y, and Z channels) (McGrath & Reilly, 1999; McGrath, 1999). This set of X, Y, Z, and W signals represents a first-order approximation to the sound field at a point within the assembly scene. In the first step of the headphone playback process, a DSP function is built to filter the four B-format components (i.e., the four channel signals), producing two outputs in such a way that a static binaural presentation can be made of the B-format sound field. The next step is to add a mixer that can rotate the X, Y, and Z components of the sound field prior to the binaural filters so that, in conjunction with the optical head-tracking device (i.e., Vicon's optical motion tracking system), the sound field remains static when the user moves his/her position or turns his/her head. The head tracking is achieved by rotating the X, Y, and Z signals using a 3 × 3 matrix. The head related transfer function (HRTF) data are loaded from the disk of the sound server to the DSP memory at runtime.
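The rotation step can be sketched as follows. This is an illustrative reconstruction limited to head yaw (the full system rotates X, Y, and Z with a 3 × 3 matrix covering all three rotations), and the simple cardioid-style stereo decode is our own stand-in for the HRTF-based binaural filters described above.

```python
import math

def rotate_bformat_yaw(w, x, y, z, yaw_rad):
    """Counter-rotate the first-order B-format velocity components by the
    listener's head yaw so the rendered sound field stays fixed in the
    world as the head turns. W (omnidirectional) is unaffected, and Z is
    unchanged for a pure yaw rotation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return w, c * x + s * y, -s * x + c * y, z

def decode_stereo(w, x, y):
    """Crude two-channel decode from B-format using left- and
    right-facing cardioid virtual microphones (positive Y points left).
    A stand-in for the binaural HRTF filtering of the real system."""
    left = 0.5 * (w + y)
    right = 0.5 * (w - y)
    return left, right
```

In the real pipeline this rotation would run per sample block, driven by the head orientation reported by the optical tracking system, before the binaural filters are applied.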

Unification of Visual and Auditory Presentation
The visual aspect of the VAE focuses on the geometric definition, motion description, and physical properties of the assembly objects, and on visual feedback generation displayed as modifications of color, hue, and saturation. The auditory part focuses on the 3D auditory feedback generation process, from sound activation, sound synthesis, and sound propagation in the virtual assembly scene to auralization in the user's ears. The virtual world software extracts the spatial coordinates of the user's and sound sources' positions. These coordinates are transmitted via TCP/IP packets to the sound server, which runs a separate virtual world model with the required auditory-related properties. The sound server then spatializes the sound materials according to the received geometry information, introducing the corresponding scene attributes related to the auditory cues.
The system components and the overall information flow are shown in Figure 4. The upper half of the figure shows the auditory stream, while the lower half shows the visual stream. Visual models are created using CAD tools (e.g., Pro/Engineer, AutoCAD, etc.), then transformed and imported into the system with OpenGL Optimizer software (OpenGL, 2001). Auditory-related models are generated using CATT-Acoustic, a software program for acoustics prediction/auralization (CATT, 2001), and then loaded into the Huron PCI audio workstation with the relevant API (Huron, 2000).

Task Performance Evaluation Experiment
This section presents an experimental evaluation of assembly task performance including experiment design, experimental hypotheses, objective evaluation, and subjective evaluation.

Independent/Dependent Variables.
This research evaluated the effects of auditory and visual feedback on assembly task performance, with the hypothesis that performance would differ significantly between feedback conditions. Performance was measured by objective and subjective means: the objective measures are the time taken to complete the assembly task and the number of performance failures, and the subjective measures are questionnaires for subjective ratings and preferences. There are two independent variables in the experiment, auditory feedback and visual feedback, each of which can be present or absent. The variations of the independent variables form the different feedback conditions of the multimodal VAE system, as described in Table 1: the neutral condition, the visual condition, the auditory condition, and the integrated feedback condition. The dependent variables are the TCT and HPER under each experimental condition, and the subjective ratings and preferences.

Task Cases.
This research used two assembly task cases: a peg-in-a-hole assembly task and a Sener electronic box assembly task. In the peg-in-a-hole assembly task, the shapes of the mating parts are very simple and geometrically well defined. Visual feedback can help the participants make a rough alignment between the axis of the peg and the axis of the hole, and auditory feedback can help them achieve precise cylindrical alignment. This makes the contributions of the different feedback cues to the task performance clearly distinguishable. Meanwhile, the peg-in-a-hole task requires the participants to perform only one pick-release operation, thus ruling out the time differences introduced when different participants take different times to release the previous part and pick up the next one in relatively complex tasks. It follows that the peg-in-a-hole assembly task is an appropriate case for accurately measuring the TCT and performing the objective evaluation. The deficiency of the peg-in-a-hole assembly task is that it does not provide enough operations for participants to make a subjective evaluation. Complex assemblies such as the Sener electronic box case can provide participants with more operations and richer information. However, the multiple pick-release operations in the Sener electronic box case introduce unpredictable time differences between participants, which may lead to uncertainties and unforeseen outcomes in the experiment. Consequently, the experiment used the complex Sener electronic box case for the subjective evaluation, and the relatively simple and well-defined peg-in-a-hole case for the objective evaluation of the VAE.

Experimental Measures.
This experiment is a 2 × 2 (two-factor) within-participants design, with auditory feedback (present versus absent) and visual feedback (present versus absent) as the within factors. For the four conditions (auditory × visual), the presentation order was counterbalanced across participants and conditions, and was determined by employing a 4 × 4 Latin square, providing sixteen different orders of feedback presentation. Each participant was randomly assigned to one of the orders. Under each condition, each participant went through four trials; however, considering the learning effects observed in the pilot study, and to alleviate the workload generated by the experiment, only the assembly TCT and HPER data from the third and fourth trials were recorded and quantitatively analyzed to calculate the average TCT and HPER under each condition.
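A balanced 4 × 4 Latin square of the kind commonly used for such counterbalancing can be generated as follows. This is an illustrative sketch, not the authors' actual procedure: the paper does not specify how the sixteen presentation orders were constructed (they could, for example, be drawn from four such squares), so the construction below is an assumption.

```python
def balanced_latin_square(conditions):
    """Williams-style balanced Latin square for an even number of
    conditions: each condition appears exactly once in every ordinal
    position, and each condition immediately precedes every other
    condition equally often across the rows."""
    n = len(conditions)
    square = []
    for row in range(n):
        seq, j, k = [], 0, 0
        for i in range(n):
            if i % 2 == 0:
                idx = (row + j) % n  # walk forward from the row index
                j += 1
            else:
                k += 1
                idx = (row + n - k) % n  # walk backward, interleaved
            seq.append(conditions[idx])
        square.append(seq)
    return square

CONDITIONS = ["neutral", "visual", "auditory", "integrated"]
ORDERS = balanced_latin_square(CONDITIONS)
```

With sixteen participants and the four orders of one square, each order would be assigned to four participants; sixteen distinct orders would require additional squares.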

Participants.
Sixteen participants from the students and staff of the Center for Virtual Environments at the University of Salford were invited to attend this experiment. All of them had normal or corrected-to-normal visual acuity, normal color vision, and normal hearing. None had any VE experience, but their computer experience varied from basic email and office processing to programming skills. They were healthy, without any major cognitive defects or physical limitations. No participant dropped out in the middle of the experiment, and all of them went through the experiment smoothly.

Experimental Procedures.
Participants were randomly assigned to one of the sixteen orders of feedback presentation prior to their arrival. Upon arrival, each participant was asked to read and sign a consent form. They were then required to complete a questionnaire to record standard demographic information, including any previous computer and VE experience. The participants' color vision and hearing were then tested by simple means. Next, a briefing on the specifics of the two cases was given, and diagrams of the assembly parts and the processes of the two cases were shown. The participants were asked to complete the peg-in-a-hole assembly task as quickly as possible, and were then brought to the responsive workbench-based VAE to start the experiment. Each participant was required to complete the peg-in-a-hole task four times and then the Sener electronic box task four times, once for each feedback condition.
For the peg-in-a-hole assembly task, the number of performance failures of each participant under each feedback condition was counted, and the assembly TCT and HPER from the third and fourth trials under each feedback condition were recorded. For the Sener electronic box assembly case, when the participants completed the task under each feedback condition, they were required to complete the questionnaires described in Section 3.4.

Experimental Hypotheses
The following hypotheses were assumed in the experiment:
• The use of visual feedback can lead to better task performance than the neutral condition. Task performance is measured by TCT, HPER, and subjective satisfaction. TCT is expected to decrease because visual feedback provides essential collision, interaction, and constraint cues for the assembly task. HPER is expected to decrease by introducing visual feedback into the VAE. The subjective preference for, and satisfaction with, the interface associated with visual feedback is expected to be higher than without any feedback; this should be indicated by the visual feedback condition having statistically significantly higher scores on the questionnaire rating scales than the neutral condition.
• The use of 3D auditory feedback can lead to better task performance than the neutral condition. Better task performance is expected to be shown by shorter TCT and lower HPER, and better subjective satisfaction is expected for the auditory feedback condition than for the neutral condition. Auditory feedback provides more information for producing a realistic and productive application than no sensory cues, and the user could be better immersed with this information. Subjective preference for, and satisfaction with, the interface associated with auditory feedback is expected to be higher than without any feedback; this should be demonstrated by the auditory feedback condition having statistically significantly higher questionnaire scores than the neutral condition.
• The use of integrated feedback can lead to better task performance than either type of feedback used in isolation. This should be shown by shorter TCT, lower HPER, and statistically significant differences between the rating scale results for integrated feedback and those for auditory or visual cues alone.

Objective Evaluation
For the objective evaluation, a peg-in-a-hole assembly task (for the scenario see Figure 5; for the implementation see Figure 6), which is relatively simple but geometrically well defined and allows accurate TCT measurement, was used to explore and evaluate the effectiveness of the neutral, visual, auditory, and integrated feedback mechanisms on assembly task performance. The peg-in-a-hole assembly task has several phases: (a) placement of the peg towards the upper surface of the plate (see Figure 5a); (b) collision between the bottom surface of the peg and the upper surface of the plate (see Figure 5b); (c) constraint recognition (see Figure 5b); (d) constrained motion on the plate (see Figure 5c); (e) alignment constraint between the peg cylinder and the hole cylinder (see Figure 5d); (f) constrained motion between the two cylinders (see Figure 5e); (g) collision between the bottom surface of the peg ear and the upper surface of the plate (see Figure 5f); and (h) constraint recognition (see Figure 5f). Different realistic 3D localized sounds and/or color intensity modifications of the colliding polygons are presented as action cues for each of these phases.
The objective evaluation is based on the TCT and HPER. The TCT, which represents the timespan between the start and the end of the peg-in-a-hole task, was recorded by the experimental platform. The software timer, driven by the system clock, was set to start when a participant grabbed the peg to begin the assembly process, and to stop when the participant completed the assembly and released the peg. The number of failures under the different feedback conditions was counted by the experimental platform. A trial was considered a failure when the participant made errors and thus did not complete the task successfully, or when he/she took longer than a fixed time period. The HPER was calculated as the number of failures divided by the total number of trials.
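The two objective measures reduce to simple computations, sketched below. The example counts in the usage note are hypothetical, introduced only for illustration.

```python
def mean_tct(times_seconds):
    """Average task completion time over the recorded trials (here,
    the third and fourth trials under a given feedback condition)."""
    return sum(times_seconds) / len(times_seconds)

def hper(num_failures, num_trials):
    """Human performance error rate: the fraction of trials counted as
    failures, where a failure is a trial completed incorrectly or not
    finished within the fixed time limit."""
    return num_failures / num_trials
```

For example, 14 failed trials out of 32 recorded trials (hypothetical counts) would give an HPER of about 0.44.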

Subjective Evaluation
For the subjective evaluation of neutral, visual, auditory, and integrated feedback mechanisms on the assembly task performance, the Sener electronic box assembly case from an aerospace company called Sener in Spain was used (see Figure 7).
The assembly task scenarios for the Sener electronic box and its brackets were implemented as shown in Figure 8. The assembly task involves several phases:
1. Inspect the environment and identify the parts to be assembled: this allows a participant to become familiar with the assembly parts and the final assembly status (Figure 8a).
2. Mount the supporting brackets and bolt them to the frame. This requires the participant to undertake some exploration and reasoning to perform the assembly operations (Figure 8b). It involves: (i) picking up a bracket and identifying its position; (ii) placing the bracket in the correct position; (iii) identifying and picking up the bolts; and (iv) bolting the bracket to the frame.
3. Slide the electronic box into the brackets (Figure 8c). This is expected to measure performance when assembling large objects. It involves: (i) picking up the box and determining its correct orientation; and (ii) sliding the box into the brackets.
4. Plug the pipes into the electronic box (Figure 8d). This involves: (i) picking up the pipes and identifying their correct locations; and (ii) attaching the pipes to the box.
The subjective evaluation used questionnaires to perform the subjective measurements, including 10-point rating scales of the overall satisfaction, the realism, the perceived task difficulty and performance, ease of learning, perceived system speed, and overall reaction to the received feedback. Participants answered each question with a value from 1 to 10 inclusive, with 1 being the most negative answer and 10 the most positive. Additionally, after the participants had completed the tasks under all conditions, they were required to rank the four feedback conditions in order of preference: what they liked received a high score, and what they disliked received a low score. They also completed a set of 7-point rating scales and open-ended questions comparing the different feedback cues. The 7-point rating scales asked the participants to compare how well the different feedback cues helped them complete the task, how they foresaw these cues being helpful in a real design application, and which kind of feedback cues they preferred. The participants answered each question with a value from 1 to 7 inclusive, from negative to positive. Finally, the participants were asked to provide general opinions and comments about their experience. The answers were recorded and analyzed.

Experimental Results
This section presents the experimental results, statistical analysis, and discussion, including the TCT and HPER data from the peg-in-a-hole assembly task and the questionnaire data from the Sener electronic box assembly task.

TCTs
The TCTs of the peg-in-a-hole case are summarized in Table 2 and illustrated in Figure 9. A two-way repeated measures analysis of variance (ANOVA) and post-hoc pair-wise t-test comparisons were conducted on the TCTs to determine the effects of the four feedback conditions on task performance. Further pair-wise t-test comparisons of TCTs were conducted between the four feedback conditions. The analysis outcomes (one-tailed tests) are as follows.
• Between the visual feedback condition and the integrated feedback condition: t(15) = 6.23 (p < .005).
• Between the auditory feedback condition and the integrated feedback condition: t(15) = 6.51 (p < .005).
• Between the visual feedback condition and the auditory feedback condition: t(15) = 0.22 (p > .05).
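The pair-wise comparisons above use the paired (repeated-measures) t statistic, since every participant contributes a TCT under each condition. A minimal sketch of the statistic follows; the p value would then be read from the t distribution with n − 1 degrees of freedom, and any data passed in below are hypothetical.

```python
import math

def paired_t(a, b):
    """Paired t statistic for two matched samples, e.g. each
    participant's mean TCT under two feedback conditions.
    Returns (t, df); |t| is compared against the t distribution with
    df = n - 1 to obtain the p value."""
    assert len(a) == len(b) and len(a) > 1
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1
```

With the sixteen participants of this study, each comparison would yield df = 15, matching the t(15) values reported above.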

HPERs
The HPERs decreased from 0.44 under the neutral condition to 0.19 under the visual feedback condition and the auditory feedback condition, and then to 0.03 under the integrated feedback condition (see Figure 10). A two-way repeated measures ANOVA on the HPERs produced statistically significant results for auditory feedback, F(1,31) = 6.37 (p < .05), and for visual feedback, F(1,31) = 16.85 (p < .01).
Further pair-wise t-test comparisons of HPERs were conducted between the four feedback conditions. The analysis outcomes (one-tailed tests) are as follows.
• Between the visual feedback condition and the integrated feedback condition: t(31) = 2.40 (p < .05).
• Between the auditory feedback condition and the integrated feedback condition: t(31) = 2.40 (p < .05).
However, there is no statistically significant difference between the visual feedback condition and the auditory feedback condition, since the HPER was 0.19 under both conditions.

Participants' Preferences and Satisfaction
For the subjective evaluation using the Sener electronic box assembly task, Table 3 shows the number of participants who ranked each condition first, second, third, and fourth, where a first choice scores 4, a second choice 3, a third choice 2, and a fourth choice 1. These scores were used to calculate the mean preference for each feedback condition (see Figure 11). A two-way chi-square test was further used to analyze the data in Table 3 between the four feedback conditions. The analysis outcomes (one-tailed tests) are as follows.
• Between the visual feedback condition and the integrated feedback condition: observed χ² = 8.55 (df = 3, p < .05).
• Between the auditory feedback condition and the integrated feedback condition: observed χ² = 8.76 (df = 3, p < .05).
• Between the visual feedback condition and the auditory feedback condition: observed χ² = 4.12 (df = 3, p > .05).
Therefore, from Table 3 it can be concluded that the neutral condition is the least preferred condition, and the integrated condition is the most preferred.
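The pair-wise chi-square comparisons above are consistent with a standard Pearson test on a 2 × 4 table (two conditions × four rank positions), which has df = 3 as reported. The sketch below is our reconstruction of that computation, not the authors' code, and the example counts used in testing are hypothetical.

```python
def chi_square_2xk(row_a, row_b):
    """Pearson chi-square statistic for a 2 x k contingency table,
    e.g. two feedback conditions x four rank positions.
    Returns (statistic, df) with df = k - 1."""
    k = len(row_a)
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b
    stat = 0.0
    for col in range(k):
        col_total = row_a[col] + row_b[col]
        for obs, row_total in ((row_a[col], total_a), (row_b[col], total_b)):
            expected = row_total * col_total / grand  # homogeneity expectation
            if expected:
                stat += (obs - expected) ** 2 / expected
    return stat, k - 1
```

The statistic would then be compared against the chi-square distribution with three degrees of freedom to obtain the reported p values.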
Similarly, the mean helpfulness of the different feedback conditions for task performance was calculated from the results of the 7-point questionnaires and is illustrated in Figure 12. A Friedman ANOVA over the four conditions on the 7-point questionnaires showed a statistically significant effect, observed χ² = 22.63 (df = 3, p < .05).
Further Wilcoxon signed-rank tests were conducted on the 7-point questionnaire data between the four feedback conditions. The analysis outcomes (one-tailed tests) are as follows.
• Between the neutral condition and the visual feedback condition: T = 6 (N = 16, p < .001).
• Between the neutral condition and the auditory feedback condition: T = 5 (N = 16, p < .001).
• Between the neutral condition and the integrated feedback condition: T = 4 (N = 16, p < .001).
• Between the visual feedback condition and the integrated feedback condition: T = 24 (N = 15, p < .05).
• Between the auditory feedback condition and the integrated feedback condition: T = 24.5 (N = 14, p < .05).
• Between the visual feedback condition and the auditory feedback condition: T = 6 (N = 10, p < .05).
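The Friedman omnibus test and a Wilcoxon signed-rank follow-up of the kind above can be sketched as follows. The 7-point ratings are fabricated for illustration; note that SciPy's `wilcoxon` (with its default zero handling) discards zero-difference pairs, which is why the reported N varies between comparisons even though all 16 participants rated every condition.

```python
# Sketch of a Friedman test (omnibus, four related samples) followed by
# a pairwise Wilcoxon signed-rank test. All ratings are hypothetical.
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(1)
n = 16  # participants

# Hypothetical 7-point helpfulness ratings per condition.
neutral = rng.integers(1, 4, size=n)
visual = rng.integers(3, 6, size=n)
auditory = rng.integers(3, 6, size=n)
integrated = rng.integers(5, 8, size=n)

# Omnibus test for any difference among the four related samples.
chi2_f, p_f = friedmanchisquare(neutral, visual, auditory, integrated)
print(f"Friedman chi2 = {chi2_f:.2f}, p = {p_f:.4f}")

# Pairwise follow-up on one comparison; zero differences are dropped,
# so the effective N can be smaller than 16.
T, p_w = wilcoxon(visual, integrated)
print(f"Wilcoxon T = {T:.1f}, p = {p_w:.4f}")
```

Friedman is the nonparametric analogue of a repeated-measures ANOVA and is appropriate for ordinal questionnaire ratings, which is presumably why it was chosen over a parametric ANOVA here.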
These data show that significantly more participants preferred integrated feedback than any other feedback type, and that significantly fewer preferred the neutral condition than any other feedback type. Integrated feedback was rated the most helpful, and the neutral condition the least helpful, for task completion. The participants' general opinions and comments about their task completion experience indicated that nonrealistic or inappropriate feedback had a negative effect on task performance and easily made them frustrated. Consistent with these results, frustration was observed informally far more frequently when the participants completed the tasks under the neutral condition.

Discussion
The introduction of visual and/or auditory feedback into the VAE provides more cues for collision detection, geometric constraint management (recognition and deletion), and error indication and recovery in the peg-in-a-hole task. In the Sener electronic box task, cues were additionally provided to help the participants identify and select the next assembly object from a collection of assembly components, identify the assembly position, determine the assembly orientation of the object, reason about and explore the various assembly options, and avoid deviations or errors through timely warnings. Therefore, on the one hand, the introduction of intuitive feedback reduces participants' reaction times, response latencies, and the mental workload of reasoning about task complexity, problem solving, and decision making, thus decreasing the TCTs. On the other hand, intuitive feedback prevents operation errors by warning the participants promptly and in an intuitive manner, so the participants make fewer or no mistakes, thus decreasing the HPERs.
Furthermore, integrated feedback provides the participants with adequate information through two channels that complement each other when integrated seamlessly. For instance, auditory cues can arrive from any direction but are transient, whereas visual cues tend to be more continuously available but can only come from the direction in which the person is gazing; moreover, the auditory short-term store is longer than the visual short-term store in human memory. As a result, the introduction of integrated feedback into the VAE presents assembly task-related information to the participants in multiple modes. This mechanism supports the expansion of short-term (working) memory and problem solving in spatially oriented geometric tasks, and reduces the participants' cognitive load. However, inappropriate integrated feedback distracts participants' attention from the task action (or action sequence) and thus has a negative impact on task performance. For similar reasons, participants' preferences and satisfaction were improved by the introduction of auditory and/or visual feedback.

Conclusions
A VAE system platform, integrated with visual and auditory feedback, has been developed in order to explore and evaluate the effects of neutral, visual, auditory, and integrated feedback mechanisms on task performance in the context of assembly simulation. A peg-in-a-hole task and a Sener electronic box assembly task were used in the evaluation experiment. The results verified the original hypothesis that task performance differs across the four feedback conditions for both task cases. Under the integrated feedback condition, assembly task performance was the best of the four conditions; under the neutral condition, the TCT was the longest and assembly task performance was the worst. Regarding subjective preference, the number of participants preferring integrated feedback was statistically significantly larger than the number preferring any other feedback type, and the number preferring the neutral condition was markedly smaller than the number preferring any other feedback type.
For future research, we will determine: i) whether gender, age, and task complexity affect assembly task performance when visual and/or auditory feedback is introduced into the VAE; ii) how visual and/or auditory feedback affects performance in specific design tasks; iii) how to substitute 3D auditory feedback for tactile and force feedback in assembly and manipulation tasks in VEs; and iv) how 3D auditory feedback should be presented to maximize its utility. Regarding potential applications, the outcomes of this research can benefit the future assembly environment and the manufacturing industry, especially in the areas of virtual prototyping, assembly process simulation and verification, worker training for operation and maintenance, task scheduling, and risk and reliability analysis.

Figure 1. Infrastructure of the system platform.

Figure 3. User interface of the virtual assembly environment.

Figure 5. Virtual assembly scenario of the peg-in-a-hole task.

Figure 6. Virtual assembly process and feedback of the peg-in-a-hole task.
Figure 7. Sener electronic box assembly task.

Figure 8. Virtual assembly scenario of the Sener electronic box task.

Figure 11. Preferences for the different feedback conditions.
Figure 12. Helpfulness of the different feedback conditions to task performance.

Table 1. Four experimental conditions.