Visorama is an original and complete enlarged reality and multimedia system, with dedicated hardware and software, aimed at the following fields: digital art, entertainment, historical tourism and education. Visorama has been under development since 1997, coordinated by Andre Parente and Luiz Velho. It is a joint venture of the N-Imagem Group from the Federal University of Rio de Janeiro and the Visgraf Laboratory from IMPA, sponsored by the following research support bodies: CNPq, FINEP, FUJB, and FAPERJ. On the hardware level, Visorama simulates a pair of binoculars or a telescope, allowing the user to employ it as if he or she were looking through the eyepiece of a traditional optical device to observe the scenery or the surrounding space. On the software level, it uses new visualisation techniques and a multiresolution image-based rendering algorithm. Visorama may be defined as a cybernetic observatory which allows the observer to travel through space and time, led by his or her interest, in a kind of virtual visit to the space observed.

The control subsystem is basically a desktop computer with the necessary interfaces for communicating with the other hardware components in the input and output subsystems. This computer stores all information about the virtual environment. It runs software programs that use this information and user data obtained from the input subsystem to generate feedback data for the output subsystem. This real-time process imposes a minimum speed constraint on the choice of processor used since it must be able to take user input and generate appropriate output without introducing any lagging effects. It is also important that current virtual panorama systems run on this platform, since we intend to use them as part of our system.
The control subsystem generates two types of data for the output subsystem: image and sound. The first type is sent to the binocular display and the second to the stereo sound equipment.
The binocular display is an immersive display device that resembles common binoculars but, instead of having a set of lenses, it has, for each eye, an eyepiece and a miniature CRT screen. The images displayed by these screens appear to the user as if they were projected by the lenses of common binoculars. Each CRT screen is connected to a video output port on the computer. If two output ports are available, each screen can be connected to either one or the other, and a different image can be displayed for each eye. Although this allows stereo panoramas to be displayed, the first version of the Visorama system does not use this functionality.
The stereo sound equipment is basically a pair of headphones connected to a stereo sound output port on the computer. Speakers can be used instead, but they have the disadvantage that the sound of the real environment could be confused with the system’s output sound, resulting in a loss of auditory immersibility. With a stereo system, different sounds can be output to each channel in order to simulate three-dimensional sound in panoramas where sound sources are associated with a specific viewing direction.
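As an illustration of how a sound source tied to a viewing direction could be distributed over the two channels, the following minimal sketch uses constant-power panning. This is a standard audio technique chosen here for illustration, not a method described for Visorama; the function name and the decision to clamp sources behind the viewer to one side are assumptions.

```python
import math


def stereo_gains(view_pan_deg: float, source_pan_deg: float) -> tuple:
    """Return (left, right) gains for a source heard from the current viewing direction.

    Constant-power panning: left**2 + right**2 is always 1, so perceived loudness
    stays roughly constant as the user turns the rotating head.
    """
    # Signed angle from the viewing direction to the source, wrapped to [-180, 180).
    offset = (source_pan_deg - view_pan_deg + 180.0) % 360.0 - 180.0
    # Assumption: sources more than 90 degrees off-axis are treated as fully lateral.
    position = max(-90.0, min(90.0, offset)) / 90.0   # -1 (left) .. +1 (right)
    angle = (position + 1.0) * math.pi / 4.0          # 0 (full left) .. pi/2 (full right)
    return math.cos(angle), math.sin(angle)
```

A source straight ahead yields equal gains on both channels; as the user pans away from it, the gain shifts towards the ear on the side where the source now lies.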
All output generated by the control subsystem is a function of the input data it receives from the input subsystem and from authoring information. This input data takes two forms: viewing direction data, which is generated by a rotating head and a set of sensors, and user control data generated by a set of additional controls.
The rotating head provides a direct manipulation of the viewing direction on the panorama. Potentiometers are attached to the two rotating axes of the head to capture the binocular display’s movement and send it to the control subsystem.
The input subsystem has a set of additional controls: two buttons and a potentiometer. The potentiometer is used, in most cases, to control the zooming angle. One of the buttons is used to generate discrete actions for the system, such as selecting an object on the panorama. These two controls are easily accessible to the user, since they will be heavily used. Note that these two controls and the potentiometers on the rotating head allow the execution of positioning, selecting and quantifying tasks. The only task that cannot be done is entering text, which is not required for this specific system. The remaining button can be used to take the system into a control mode for specifying settings such as volume control.
A simple circuit polls all input devices periodically for their values. It sends this data to the control subsystem, which must generate the correct feedback to the user as specified by the creator of the virtual environment. This is achieved by the system’s software components.
Figure 4: Modules and their relationships.
Input and output
The input module reads the data that arrives at the communication port. This data is then translated into a format that can be understood by the remaining modules. The rotation sensor’s data is translated into two angles, pan and tilt, the potentiometer into a percentage value, and the push buttons into a binary value. By having this translation done by the input software, the remaining software components do not have to be modified when some input hardware is changed. These values are then passed to the other software components.
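The translation step can be sketched as follows. The text does not specify the raw data format, so the 10-bit readings, the angular ranges, the bit layout of the button byte and all names below are assumptions made only for illustration.

```python
from dataclasses import dataclass

# Assumed hardware parameters (not given in the text).
ADC_MAX = 1023                    # 10-bit analogue-to-digital readings
PAN_RANGE_DEG = (-180.0, 180.0)   # range covered by the pan potentiometer
TILT_RANGE_DEG = (-45.0, 45.0)    # range covered by the tilt potentiometer


@dataclass(frozen=True)
class InputSample:
    pan_deg: float    # horizontal viewing angle
    tilt_deg: float   # vertical viewing angle
    zoom_pct: float   # additional potentiometer, 0 to 100
    select: bool      # first push button
    mode: bool        # second push button


def _scale(raw: int, lo: float, hi: float) -> float:
    """Map a raw reading in [0, ADC_MAX] linearly onto [lo, hi], clamping outliers."""
    raw = max(0, min(ADC_MAX, raw))
    return lo + (hi - lo) * raw / ADC_MAX


def translate(raw_pan: int, raw_tilt: int, raw_zoom: int, raw_buttons: int) -> InputSample:
    """Convert one raw reading into the device-independent values used by other modules."""
    return InputSample(
        pan_deg=_scale(raw_pan, *PAN_RANGE_DEG),
        tilt_deg=_scale(raw_tilt, *TILT_RANGE_DEG),
        zoom_pct=_scale(raw_zoom, 0.0, 100.0),
        select=bool(raw_buttons & 0b01),
        mode=bool(raw_buttons & 0b10),
    )
```

If a sensor is replaced, only the constants and `translate` would need to change; code that consumes `InputSample` is unaffected, which is the decoupling the paragraph above describes.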
The data is passed to the control module by writing each arriving sample to a specific memory location. If the control module reads more slowly than the input module writes, unread data is overwritten. If a button is pressed, however, new data is written only after the old data has been read, so the exact position at which the button was pressed is not lost.
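A minimal sketch of this hand-off is given below, assuming the input and control modules run as threads sharing one slot. The class name, the tuple layout and the choice to drop (rather than block on) samples that arrive while a button press is still unread are assumptions.

```python
import threading
from typing import Optional, Tuple

# Hypothetical sample layout: (pan, tilt, zoom, button_pressed).
Sample = Tuple[float, float, float, bool]


class Mailbox:
    """Single-slot buffer between the input module (writer) and the control module (reader)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._sample: Optional[Sample] = None
        self._unread = False

    def write(self, sample: Sample) -> bool:
        """Store a sample; return False if it was dropped to protect an unread button press."""
        with self._lock:
            holding_press = self._unread and self._sample is not None and self._sample[3]
            if holding_press:
                return False          # keep the position where the button was pressed
            self._sample = sample     # otherwise the newest sample simply overwrites
            self._unread = True
            return True

    def read(self) -> Optional[Sample]:
        """Return the pending sample and mark it as read, or None if nothing new arrived."""
        with self._lock:
            if not self._unread:
                return None
            self._unread = False
            return self._sample
```

Ordinary position samples are overwritten freely, so a slow reader always sees the most recent viewing direction, while a sample carrying a button press stays in the slot until it has been consumed.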
For the output module, only the position and zooming data are transmitted, since push buttons have no effect on it. These values are sent directly to the output software so it can immediately generate the rotating and zooming feedback on the virtual panorama. In this way, any delays that could be introduced by the control module do not affect the response time of the panorama regarding movement and zooming actions. Keeping this output coordinated with the binocular display’s movement is fundamental to the immersibility provided by the system, since any lagging effects introduced in this process could confuse the user. Because no button data is passed to this module, data is always overwritten if the reading rate is slower than the writing rate.
The output module has two components, the image generation component and the sound generation component. The image generation component displays the virtual panorama, static images or three-dimensional objects, which are all combined into a single output. Any existing virtual panorama system can be used if it can display images and three-dimensional objects on top of panoramas and has an API that can be used to control the display of the virtual panorama. This component receives commands from the control module determining which panorama, images or three-dimensional objects are to be displayed, and a few other commands. It then loads the appropriate files from disk and displays them. The viewing direction and zooming angle are obtained directly from the input module, and are updated each time a new set of data arrives, providing the correct feedback.
The sound generation component uses system resources to play sound on the hardware sound output subsystem. An environment like Apple QuickTime is used as a basis for the sound output component. This component takes commands from the control module that determine which sound files should be played and the current position in the sound files, as well as common audio commands such as play, pause, stop and volume control.
The Visorama system at work
The commands generated by the control module are based on data taken from the input module and from a file that stores authoring information about the virtual environment. This information relates input sequences to their corresponding feedback, as specified previously by the author of the environment.
The internal structure of the Visorama software is conceptually equivalent to a state diagram. Using this representation, the system is always in a known discrete state, and a number of events are specified that cause the system to transition to a different state. A transition is defined by its source and destination states, a set of events that cause the transition to occur, and a set of actions that should be executed when the transition takes place. A transition is executed if any one of its events is true. In the control module, a state is defined implicitly by the set of all transitions that leave it.
An event is defined as a Boolean expression whose elements must be functions of the module’s parameter space. This space is the set of all combinations of pan, tilt and zooming angles, button states and system timers. The first three parameters can be composed into a single parameter, the viewing position, so they are treated as a point in a three-dimensional space, the viewing space, which defines a certain viewing configuration. Given these parameters, basic events can be represented by regions in viewing space, button states and timers. We say that a region in viewing space is true if the current viewing position is inside it, a button is true if it is pressed and a timer is true if it has finished. Events can then be specified by a general Boolean expression involving an arbitrary number of regions, buttons and timers.
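One way to represent such events is as predicates over the parameter space, combined with Boolean operators. In the sketch below, all names are hypothetical, and regions are assumed to be axis-aligned boxes in (pan, tilt, zoom) that do not wrap around the pan axis; the text does not restrict regions to this shape.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class Params:
    """One point of the parameter space (field names are illustrative)."""
    pan: float
    tilt: float
    zoom: float
    pressed: FrozenSet[str] = frozenset()    # names of buttons currently pressed
    finished: FrozenSet[str] = frozenset()   # names of timers that have finished


Event = Callable[[Params], bool]
Interval = Tuple[float, float]


def region(pan: Interval, tilt: Interval, zoom: Interval) -> Event:
    """True when the viewing position lies inside the box; each argument is (min, max)."""
    def test(p: Params) -> bool:
        return (pan[0] <= p.pan <= pan[1]
                and tilt[0] <= p.tilt <= tilt[1]
                and zoom[0] <= p.zoom <= zoom[1])
    return test


def button(name: str) -> Event:
    return lambda p: name in p.pressed


def timer(name: str) -> Event:
    return lambda p: name in p.finished


def all_of(*events: Event) -> Event:
    return lambda p: all(e(p) for e in events)


def any_of(*events: Event) -> Event:
    return lambda p: any(e(p) for e in events)


def negate(event: Event) -> Event:
    return lambda p: not event(p)


# Example: inside region R1 and (select button pressed or timer t0 finished).
r1 = region(pan=(0.0, 30.0), tilt=(-10.0, 10.0), zoom=(20.0, 60.0))
example_event = all_of(r1, any_of(button("select"), timer("t0")))
```

Because every basic event and every combination has the same type, expressions of arbitrary depth can be built and evaluated against the current parameters with a single call.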
Actions can be specified to be executed while a transition is taking place. The actions that can be executed with the available virtual panorama systems are changing the current panorama; altering the current viewing parameters; playing, pausing, stopping or jumping to a point in an audio file; showing or hiding an image or three-dimensional object; and starting a timer. Other interesting actions should become available in the future, for example, interpolating smoothly between two panoramas.
A state diagram that represents a virtual environment is used to drive the control module. Its execution is basically a single loop where it reads input parameters and checks if any event that causes a transition occurred. When this happens, it executes the actions specified for the transition and replaces the current state by the transition’s destination state. This implementation provides a simple and efficient way of generating the appropriate output given the system’s input.
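The loop described above can be sketched as follows. The class and field names are hypothetical, input parameters are passed as a plain dictionary for brevity, and actions are reduced to callables that append to a log so that the example stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Transition:
    source: str
    destination: str
    events: List[Callable[[dict], bool]]          # the transition fires if any event is true
    actions: List[Callable[[], None]] = field(default_factory=list)


class Controller:
    """Runs a state diagram; states exist only as the transitions that leave them."""

    def __init__(self, transitions: Iterable[Transition], initial: str) -> None:
        self.state = initial
        self._by_source: Dict[str, List[Transition]] = {}
        for t in transitions:
            self._by_source.setdefault(t.source, []).append(t)

    def step(self, params: dict) -> bool:
        """One loop iteration: fire the first enabled transition, if any."""
        for t in self._by_source.get(self.state, []):
            if any(event(params) for event in t.events):
                for action in t.actions:
                    action()
                self.state = t.destination
                return True
        return False


# Small demonstration: play a sound on entering a pan interval, stop it on leaving.
log: List[str] = []
in_r1 = lambda p: 0 <= p["pan"] <= 30
controller = Controller(
    [
        Transition("idle", "in_r1", [in_r1], [lambda: log.append("play S1")]),
        Transition("in_r1", "idle", [lambda p: not in_r1(p)], [lambda: log.append("stop S1")]),
    ],
    initial="idle",
)
for params in ({"pan": 90}, {"pan": 10}, {"pan": 12}, {"pan": 45}):
    controller.step(params)
```

Indexing transitions by source state means each iteration examines only the transitions that can fire from the current state, which keeps the per-step cost independent of the total size of the diagram.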
Despite its simplicity, the software implemented as a simple state diagram would be memory inefficient due to the huge number of states that would have to be created in a typical virtual environment. Some simple modifications can be made to the state diagram to reduce the explicit number of states, thus reducing the memory necessary to store the diagram. One such modification would be to allow the specification of actions to be executed when states are entered or left. This is equivalent to specifying an action for all transitions in or out of a state. One example where that might be useful is to start a timer every time a new state is entered: in that way events can be easily specified relative to the amount of time the system is in a state. Another possible extension would be to allow additional state variables to be used, and have tests on them as part of the Boolean expressions that define events. As a result, the same original event could cause a transition to different states depending on the value of these variables. An additional value could be created, for example, to represent the number of times an event happened.
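The equivalence between entry/exit actions and per-transition actions can be made concrete with a small rewriting step. The sketch below uses hypothetical names and represents events and actions as plain strings; the ordering (exit actions, then the transition's own actions, then entry actions) is an assumption, since the text does not fix one.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple

Actions = Tuple[str, ...]


@dataclass(frozen=True)
class Transition:
    source: str
    destination: str
    event: str              # event name only; evaluation is omitted in this sketch
    actions: Actions = ()


def expand(transitions: List[Transition],
           on_enter: Dict[str, Actions],
           on_exit: Dict[str, Actions]) -> List[Transition]:
    """Rewrite per-state entry/exit actions as ordinary per-transition actions."""
    expanded = []
    for t in transitions:
        actions = on_exit.get(t.source, ()) + t.actions + on_enter.get(t.destination, ())
        expanded.append(replace(t, actions=actions))
    return expanded
```

For example, declaring `on_enter={"r1": ("start timer",)}` once has the same effect as adding a "start timer" action to every transition whose destination is `r1`, so the author states the action a single time instead of repeating it on each incoming transition.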
The main problem with the approach based on state diagrams is the tedious process of creating the diagram to specify complex interactions. The approach is powerful in the sense that it can represent any interaction possible with virtual panoramas, and allows an efficient implementation. But as interactions become complicated, it is not intuitive for the author which states and transitions should be created. To illustrate a typical state diagram used in this system, Figure 5 shows a state diagram for the following interaction specification: from an initial state, if the user views a region R1 for more than t0 seconds, play an audio S1 of that region with duration t1. If the user zooms into region R2 and S1 has finished playing, then play another audio S2. If S1 has not finished, wait for it to finish and then play S2. If at any time the user leaves regions R1 or R2, stop audio S1. If at any time the user leaves region R2, stop audio S2.
Figure 5: State diagram for a sample interaction.
To relieve authors from having to specify complex state diagrams, and still be able to specify the complex interaction tasks possible with this system, we intend to develop an authoring environment that provides higher-level primitives for the specification of interaction tasks.
Authoring in the Visorama system
The Visorama system enables unique forms of interaction between the user and the virtual environment not available in current multimedia and virtual reality systems. It provides a new language for the communication from the user to the system and from the system to the user.
The immersibility provided by the system enables users to naturally navigate through the environment looking for the information that most interests them. As they do that, they seamlessly trigger numerous events that change the state of the environment. This does not happen in current multimedia systems, where all events are explicit and the points in the navigation where users have to make decisions are very obvious to them. Through this seamless navigation, it is possible for the system to estimate which parts of the virtual environment most interest the user (by analysing how long they look at a region, or how much they zoom into a certain area). This information can be used, for example, to guide the user through the environment, providing more information about areas that seem to be of more interest, and providing hints about the next places to visit. Two different users examining the same area could be given different information depending on the path they have followed to get to this point.
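A minimal sketch of such an estimate is shown below: viewing time is accumulated per region and weighted up by the zoom level. The scoring formula, the weight parameter and all names are assumptions introduced for illustration; the text only states that viewing duration and zoom can be analysed.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class InterestTracker:
    """Accumulates a per-region interest score from dwell time and zoom level."""

    def __init__(self, zoom_weight: float = 1.0) -> None:
        self.zoom_weight = zoom_weight
        self.scores: Dict[str, float] = defaultdict(float)

    def observe(self, region: Optional[str], zoom_pct: float, dt: float) -> None:
        """Credit dt seconds of viewing to a region; zoom_pct (0 to 100) increases the credit."""
        if region is None:      # the viewing position is outside every authored region
            return
        self.scores[region] += dt * (1.0 + self.zoom_weight * zoom_pct / 100.0)

    def ranking(self) -> List[str]:
        """Regions ordered from most to least interesting so far."""
        return sorted(self.scores, key=self.scores.get, reverse=True)
```

The control loop could call `observe` once per iteration with the elapsed time since the previous one, and an authored environment could consult `ranking` when deciding which additional information or hints to present.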
These are just a few examples of the new interaction possibilities with the system. All these forms of interaction define a new language that can be used by authors to create virtual environments in Visorama. To help them explore these possibilities, we define a set of basic language elements and operations for composing them. These elements and operations define a language that is at the same time complete, in that it enables the creation of most forms of interaction possible with the system, and effective, in that it can be easily learned. The elements and operations should have a known representation in terms of state diagrams so that a sentence in this language can be converted into a state diagram description of the system, which can, in turn, be used by the control module.
In addition to defining an authoring language, we implement an authoring environment for users to create virtual environments using this language. We define a scripting language that represents the interaction language, and define a set of semantic and syntactic rules for the specification of a virtual environment. A script written in this language can therefore be converted into a state diagram representation.