Erwan David
Local in Global
-
2021-2022
This project relates to two experiments run with the same protocol and stimuli: one in virtual reality, the other online. Its codename is "LocalInGlobal" because I implemented a visual search protocol in which target objects may be located inside other objects. That means that participants may need to open cupboards, drawers, an oven, etc., in their search for a target.
Try it.

A visual search protocol is an experiment where a participant is given a target to look for (either as a picture or a word). They then have a set amount of time to look for that target in a scene. In our case, the target is an everyday object you could find indoors.
We produced 28 realistic indoor scenes using assets bought from the Evermotion website. With this protocol, we are particularly interested in seeing whether search behaviours change when the expectation of object localisation changes (from "only outside" to "may be located inside or outside"). Additionally, we are very curious to see how search behaviours and gaze patterns observed in VR compare to camera movements controlled by the subject in the online study.
Results from this project were presented as a talk at the Vision Sciences Society meeting of 2022 (VSS 2022).
This project took me a surprisingly large amount of time to create and implement, mostly because of the time needed to process the 3D assets in Blender (indoor houses/apartments mostly) and to make individual scenes in which we controlled what to look for.

Below I quickly describe the stimuli, the experimental design and procedure, and the differences between the VR and online versions of the experiment.

Stimuli

Using Blender I extracted individual indoor rooms from the original 3D assets. I selected possible "container" objects and modified them to be openable; most of the time that meant modelling an "inside", as a lot of objects only had their outside modelled. I made 4 bedrooms, 8 kitchens, 8 living rooms and 8 offices (all had four walls, a floor, a ceiling and a door). Scenes were modelled to fit in the physical testing space: 3.8 by 3.5 m. There are fewer bedrooms because there was not enough material to create more. This is still nearly twice as many rooms as in a previous protocol [1, 2] where we had used 16; I wanted to create more in order to have more trials without showing the same scenes too many times (to reduce participants getting used to the content of the scenes). We did not reuse those 16 rooms because I wanted the same artist's style across rooms.
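To give an idea of that asset-processing step, here is a minimal Blender Python (bpy) sketch of how one room could be isolated from a larger asset and exported on its own; the name prefix and output path are hypothetical examples, not the actual pipeline.

  # Minimal sketch (to be run inside Blender): isolate the objects of one
  # room via a naming convention and export them as a standalone FBX file.
  import bpy

  ROOM_PREFIX = "Kitchen_03"              # hypothetical naming convention
  EXPORT_PATH = "//rooms/Kitchen_03.fbx"  # path relative to the .blend file

  # Deselect everything, then select only the objects belonging to that room.
  bpy.ops.object.select_all(action='DESELECT')
  for obj in bpy.data.objects:
      if obj.name.startswith(ROOM_PREFIX):
          obj.select_set(True)

  # Export only the selected objects so the room can be used as a single scene.
  bpy.ops.export_scene.fbx(filepath=bpy.path.abspath(EXPORT_PATH),
                           use_selection=True)
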
The choice of which objects to use as targets was made thanks to a pilot study in which I asked naïve colleagues to place objects in each of the 4 scene types in VR. I had pre-selected objects that could be associated with any of the scenes; an object would appear randomly at the very centre of a scene for the participant to grab and put down wherever they wanted. They could of course open "containers" and place the object inside. I obtained data about which objects probably belonged to which scene types, and which could be found inside or outside in a scene.
Since we are not manipulating object-scene congruency or inside-outside congruency at this point, I selected targets that were congruent with the scene and as likely to be found inside as outside. (In ongoing work we are interested in manipulating these congruency factors.)
In the end, for each scene I chose 3 different objects and decided on two locations for each: one inside and one outside. In that manner, our analysis can compare the same target inside and outside, and we reduce the bias of some targets being inherently harder to find inside or outside.

In this study we measured search behaviours (search phase durations, accuracy) and gaze behaviours (characteristics of fixations and saccades; e.g., average fixation durations and saccade amplitudes). I am currently working on processing camera movements from the online experiment in a way that yields gaze-like data.
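To illustrate the kind of gaze processing involved (a generic velocity-threshold classification, not necessarily the exact pipeline used here), fixations and saccades can be separated from gaze samples along these lines; the sampling rate and velocity threshold below are placeholder values.

  import numpy as np

  def classify_saccades(x_deg, y_deg, fs=120.0, vel_threshold=100.0):
      """Toy velocity-threshold (I-VT) classifier.
      x_deg, y_deg: gaze angles in degrees; fs: sampling rate in Hz;
      vel_threshold: saccade velocity threshold in deg/s (placeholder)."""
      velocity = np.hypot(np.diff(x_deg), np.diff(y_deg)) * fs  # deg/s
      return velocity > vel_threshold  # True where gaze moves fast (saccade)

  def fixation_durations(is_saccade, fs=120.0):
      """Durations (in seconds) of consecutive runs of non-saccade samples."""
      durations, run = [], 0
      for sacc in is_saccade:
          if not sacc:
              run += 1
          elif run > 0:
              durations.append(run / fs)
              run = 0
      if run > 0:
          durations.append(run / fs)
      return durations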

Experimental design

Participants saw all 28 scenes during the experiment. One scene was randomly set aside per subject for training: getting accustomed to wearing the device (in the VR version) and learning to move around. The protocol was divided into three blocks:
  1. Cannot interact with containers
  2. Can interact with containers
  3. Can interact with containers
Participants were only told about the possibility of opening "container" objects at the start of the second block, before training for it during one long trial (200 s) and two normal ones (40 s). In the second and third blocks a target had the same probability of being placed inside as outside. The reason for the three blocks is to obtain the same number of trials in these three conditions:
  1. Object was outside and it had to be outside
  2. Object was outside and it could have been inside
  3. Object was inside and it could have been outside

In each block the same 9 scenes were repeated 3 times (27 trials per block), so that habituation to scenes was kept within a block; this removes a bias that would muddy the effect of the inside/outside variable on performance. That is, if a scene previously encountered in the first block were observed again in the second block, better performance on this repetition might be because the participant already had an idea of the scene layout.
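For illustration, the resulting trial structure could be generated along these lines; scene names, the 50/50 inside placement and the randomisation details are simplified placeholders rather than the exact implementation.

  import random

  def build_design(scenes, seed=None):
      """Toy sketch of the design: 1 training scene is set aside, the
      remaining 27 scenes are split into 3 blocks of 9, and each scene is
      repeated 3 times within its block (27 trials per block).
      Block 1: containers locked, targets outside only.
      Blocks 2 and 3: containers can be opened, targets inside or outside."""
      rng = random.Random(seed)
      scenes = list(scenes)            # expects 28 scene identifiers
      rng.shuffle(scenes)
      training, remaining = scenes[0], scenes[1:]

      blocks = []
      for b in range(3):
          trials = []
          for scene in remaining[b * 9:(b + 1) * 9]:
              for _ in range(3):       # 3 repetitions of each scene per block
                  inside = (b > 0) and rng.random() < 0.5  # simplified balancing
                  trials.append({"scene": scene,
                                 "can_open_containers": b > 0,
                                 "target_inside": inside})
          rng.shuffle(trials)
          blocks.append(trials)
      return training, blocks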

A break was suggested to participants at the end of each block.

Procedure

All trials started in an empty room with a "screen" placed at eye level on a randomly chosen one of the four walls (to randomise the starting orientation of the subject). In that room participants followed instructions shown on the screen:
  1. "Stand on the blue square on the floor"
  2. "Stay on the blue square, please"
  3. "Look here for a second"
With 1. and 2. we had participants start all trials from the centre of the rooms. 3. was a fixation check: subjects had to fixate the screen for one second before the target object's name appeared on the screen. This ensured that they would not miss the upcoming target. Failure at this stage meant that we recalibrated the eye-tracking device in the VR experiment. After that check, the target word was displayed for one second. Just after that the empty room disappeared and was replaced with the actual trial scene. This transition lasted 2 s, during which we showed a uniformly grey background, serving as a baseline for pupil diameter data in the VR version.
The trial lasted until the participant reported having found the target: by holding the trigger of the hand-held controller for 1 s in VR, or by holding the mouse's left button for 1 s in the online version. A trial timed out after 40 s.
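As a sketch of how the end of a trial could be handled frame by frame (placeholder logic, not the actual implementation), with the 1 s hold-to-confirm and the 40 s timeout:

  def update_trial_end(held_time, button_down, elapsed, dt,
                       hold_needed=1.0, timeout=40.0):
      """Toy per-frame check for ending a trial. Returns (held_time, outcome)
      where outcome is None while the trial is running, 'response' once the
      trigger/mouse button has been held for 1 s, or 'timeout' after 40 s."""
      held_time = held_time + dt if button_down else 0.0
      if held_time >= hold_needed:
          return held_time, "response"
      if elapsed >= timeout:
          return held_time, "timeout"
      return held_time, None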

Virtual reality and online testing

Necessarily, a few differences exist between the VR and online versions, due to the constraints of running on a typical desktop computer with a mouse and keyboard rather than in VR.

Virtual reality

The major difference in VR is the use of the eye tracker inside the HTC Vive headset. It required calibration, which we did at the start of the experiment, then before every training trial, and finally every 6 trials or after breaks.
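That calibration schedule can be summarised with a small (hypothetical) helper:

  def needs_calibration(trial_index, is_training_trial, after_break):
      """Toy version of the schedule described above: calibrate at the start,
      before training trials, after breaks, and every 6 trials otherwise."""
      return (trial_index == 0 or is_training_trial or after_break
              or trial_index % 6 == 0)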

Online

In the online version, participants were sent a URL to run the experiment at their leisure. The scene was observed from a first-person perspective, with camera translations controlled with the arrow keys on the keyboard, while rotating the camera was achieved with the mouse. This replicated the controls of a typical first-person shooter game.
One important aspect of the VR variant is that the rendered view depends on the current position and rotation of the headset worn by the user; as such, participants could move their head along the vertical axis (up and down). In the online version this was replicated by binding the Space key to a crouching action, smoothly transitioning the camera from the default height (1.8 m) down to 0.6 m.
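As an illustration of that crouching behaviour, a framework-agnostic sketch of the camera height update might look as follows; the transition speed is a placeholder value.

  def update_camera_height(current, space_held, dt,
                           standing=1.8, crouched=0.6, speed=3.0):
      """Toy smooth crouch: move the camera height toward the target
      (0.6 m while Space is held, 1.8 m otherwise) at `speed` metres/second."""
      target = crouched if space_held else standing
      step = speed * dt
      if abs(target - current) <= step:
          return target
      return current + step if target > current else current - step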