The most visible part of the system is the set of small puppets, or robots, which provide audio input and output as well as visual feedback. Through these puppets, speech user interface technology is used to interpret audio (i.e. speech) input and to respond in natural dialogue with the users. Users are also identified mainly through speech and speaker recognition.
Other sensors are also used in order to gain information about the environment. For example, pressing the doorbell or opening the door is useful information when making assumptions about the situation.
Many different methods are used to turn seemingly insignificant data into relevant meanings. Detailed user models are used to predict the users' needs, plans and actions.
The large amounts of dynamic and sensitive data place heavy demands on data storage. Special attention is therefore needed when moving, converting and storing information.
There are many situations in which a user can come into contact with Doorman. Some examples follow:
When the user approaches the door, the motion detector is triggered. This causes Doorman to start listening to the user through the microphone placed in the corridor. The user can now press the doorbell or say a greeting. After the system has identified the person as a visitor or a staff member, the door is opened. Speech recognition relies on language models and vocabularies. Speaker recognition is done by extracting features from the audio signal.
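The entry sequence above can be read as a small state machine. The following is only a minimal sketch of that reading, not the Doorman implementation: the state names, the event names, and the `timeout` and `door_closed` events are all assumptions made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # nobody detected near the door
    LISTENING = auto()   # corridor microphone is active
    DOOR_OPEN = auto()   # person identified, door opened


# Hypothetical event names; the events of the real system are not documented here.
TRANSITIONS = {
    (State.IDLE, "motion_detected"): State.LISTENING,
    (State.LISTENING, "person_identified"): State.DOOR_OPEN,
    (State.LISTENING, "timeout"): State.IDLE,        # assumed: user walked away
    (State.DOOR_OPEN, "door_closed"): State.IDLE,    # assumed: reset after entry
}


def step(state, event):
    """Return the next state; events with no transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.IDLE):
    """Feed a sequence of events through the state machine and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

In this sketch a doorbell press or a greeting does not change the state by itself; it is input captured while listening, and the door only opens once identification succeeds.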
Once the visitor is inside and arrives in front of the robot placed in the lobby, the robot guides the visitor to the destination that was identified outside the door. For a staff member, the robot can announce any unread or unheard messages and announcements.
More robots will be placed in the TAUCHI premises. This way the guidance given at the front door can be divided into several parts. With directional EMFi speakers embedded in the corridors, users can be informed even while they are moving. These speakers will be placed in public places, such as near the printers and in the kitchen. EMFi sensors will be placed in the corridor floors. These sensors help the system when it has to determine the location of a user.
Doorman uses simple anthropomorphic puppets (robots) to guide users and to communicate with them in the TAUCHI premises. Each robot has three servomotors, one for each hand and one for the head. The servomotors are operated by a micro-controller. The robots are also equipped with a speaker to produce speech, and a microphone to capture sounds and speech from the surrounding environment. A serial bus is used to deliver instructions from the computer to the micro-controller. Multiple robots are linked together with serial cables.
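To illustrate how an instruction for one servomotor might travel over a shared serial link, here is a hedged sketch of a tiny command frame. The frame layout (robot address, servo number, angle, checksum), the servo numbering and the 0 to 180 degree range are all hypothetical; the actual protocol between the computer and the micro-controller is not described on this page.

```python
# Hypothetical servo numbering for the three motors of one robot.
SERVOS = {"left_hand": 0, "right_hand": 1, "head": 2}


def encode_command(robot_id, servo, angle):
    """Pack one servo instruction into a 4-byte frame (assumed format)."""
    if not 0 <= robot_id <= 255:
        raise ValueError("robot_id must fit in one byte")
    if not 0 <= angle <= 180:
        raise ValueError("angle must be between 0 and 180 degrees")
    payload = bytes([robot_id, SERVOS[servo], angle])
    checksum = sum(payload) % 256  # simple additive checksum
    return payload + bytes([checksum])


def decode_command(frame):
    """Validate a frame and return (robot_id, servo_name, angle)."""
    if len(frame) != 4 or sum(frame[:3]) % 256 != frame[3]:
        raise ValueError("corrupt frame")
    names = {number: name for name, number in SERVOS.items()}
    return frame[0], names[frame[1]], frame[2]
```

The address byte is what would let several robots chained on the same cable ignore frames meant for another robot; whether the real system addresses robots this way is an assumption.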
Several speech technologies are used in Doorman for interaction: synthesised speech, speech recognition and speaker recognition. Speech recognition works by recognising words and sentences in the user's captured speech during the dialogue with Doorman. At the moment LingSoft's speech recogniser is used. Synthesised speech is generated with Timehouse's Finnish-language Mikropuhe speech synthesiser via Microsoft SAPI.
Speaker recognition in Doorman is based on word recognition and sound comparison. This means that users are identified by the name they state. The identity is then verified by applying feature extraction to the incoming signal and comparing the extracted features with those stored in the database. Speaker recognition is intended to be incorporated into the system in cooperation with the speaker recognition group of the Department of Computer Science at the University of Joensuu.
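The two-step idea described above, a claimed name followed by a feature comparison, can be sketched as follows. This is a simplified illustration only: cosine similarity, the threshold value and the dictionary standing in for the database are assumptions, and the comparison method actually used in Doorman is not specified here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_speaker(claimed_name, features, enrolled, threshold=0.9):
    """Accept the claimed identity only if the features match the stored ones.

    `enrolled` is a hypothetical stand-in for the database: it maps a name
    to the feature vector stored for that speaker.
    """
    reference = enrolled.get(claimed_name)
    if reference is None:
        return False  # unknown name: nothing to compare against
    return cosine_similarity(features, reference) >= threshold
```

The name recognised from speech selects which stored features to compare against, so the system only has to answer "is this the person they claim to be?" rather than search the whole database.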
An EMFi speaker is built from plastic film that vibrates when an electric signal is applied to it. This is how an EMFi speaker produces sound and passes vibrations to objects attached to it. EMFi speakers are highly directional. This comes in handy when the location of the sound source is itself used, for example in route guidance.
EMFi sensors are elastic electret films that convert mechanical force into proportional electrical energy. Matrices of these sensors in different sizes will be placed in the TAUCHI premises. With these sensors, many kinds of information can be gathered about the people stepping on the matrices, such as the direction of motion, the weight of the person and the pattern of footsteps.
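As a rough illustration of how a direction of motion could be derived from a sensor matrix, the sketch below compares the first and last footstep in a short sequence. The data layout, with each footstep given as a `(time, row, column)` tuple, and the four direction labels are assumptions made for this example, not the format produced by the actual sensors.

```python
def direction_of_motion(footsteps):
    """Return a coarse heading from (time, row, col) footsteps on the matrix.

    Returns None when there are fewer than two footsteps or no net movement.
    """
    if len(footsteps) < 2:
        return None
    ordered = sorted(footsteps)  # tuples sort by time first
    (_, r0, c0), (_, r1, c1) = ordered[0], ordered[-1]
    d_row, d_col = r1 - r0, c1 - c0
    if d_row == 0 and d_col == 0:
        return None
    # Pick the axis with the larger displacement; labels are relative to the matrix.
    if abs(d_row) >= abs(d_col):
        return "down" if d_row > 0 else "up"
    return "right" if d_col > 0 else "left"
```

For example, footsteps at columns 0, 1 and 3 of the same row yield `"right"`, meaning movement towards higher column indices of the matrix.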
Data handling, storage and delivery in Doorman are based on Jaspis, a distributed architecture implemented in Java. Jaspis enables the construction of distributed, multilingual and user-adaptive speech applications. Jaspis has been built especially with speech user interface research in mind. In the Ovimies project we are developing extensions to it from the viewpoint of ubiquitous computing.