Gaze information
plays an important role in identifying a person's focus of attention.
The information can provide useful communication cues to a multimodal
interface. For example, it can be used to identify where a person
is looking, and what he/she is paying attention to.

In order to make intelligent
and interactive environments respond appropriately to their users'
needs, it is necessary to equip them with perceptive capabilities
to capture as much relevant information as possible about their users
and the context in which they act. Obtaining knowledge about
a person's focus of attention is a major step towards a better understanding
of what users do, how and with what
or whom they interact, and what they refer to.
We address the problem of tracking the focus of attention of participants
in a meeting, i.e. tracking
who is looking at whom during a meeting. Such information
can, for example, be used to control interaction with a smart meeting
room or to index and analyze multimedia meeting records.
A body of research literature suggests that humans are generally
interested in what they look at, and the close relationship between
gaze and attention during social interaction has been emphasized.
In addition, recent user studies reported strong evidence that people
naturally look at the objects or devices with which they interact.
A first step in determining someone's focus of attention, therefore,
is to find out in which direction the person looks. Two factors
determine where a person looks: head orientation and eye orientation.
Head orientation can be estimated
with non-intrusive methods, while eye orientation cannot.
Our approach to tracking at whom participants look, i.e.
their focus of attention, is the following:
1. Detect all participants in the scene,
2. estimate each participant's head orientation and
3. map each estimated head orientation to its likely targets using
a probabilistic framework.
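
The third step can be illustrated with a minimal sketch. The text above only states that a probabilistic framework is used; the Gaussian model of head pan angles per target, the uniform prior, the function names, and the example angles below are all assumptions made for illustration, not the system's actual model.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def focus_posterior(head_pan_deg, targets, priors=None):
    """Return P(target | observed head pan) for every candidate target.

    targets maps a target name to (mean_pan_deg, std_deg), i.e. the
    assumed distribution of head pan angles while looking at that target.
    """
    if priors is None:
        # Assumption: without other knowledge, all targets are equally likely.
        priors = {name: 1.0 / len(targets) for name in targets}
    # Bayes' rule: posterior is proportional to likelihood times prior.
    scores = {
        name: gaussian_pdf(head_pan_deg, mean, std) * priors[name]
        for name, (mean, std) in targets.items()
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# Hypothetical seating layout seen from one participant (angles in degrees).
targets = {"Alice": (-40.0, 15.0), "Bob": (0.0, 15.0), "Carol": (45.0, 15.0)}
posterior = focus_posterior(-30.0, targets)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

In this sketch the most likely target is simply the one with the highest posterior; the per-target means and spreads would have to be estimated from data in practice.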
To improve the robustness of focus of attention tracking, we would
like to combine various sources of information.
We have found that focus of attention is correlated with who is speaking
in a meeting and that it is possible to estimate a person's focus
of attention based on the information of who is talking at or before
a given moment.
To estimate where a person is looking, based on who is speaking,
probability distributions of where participants are looking during
certain "speaking constellations" are used.
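
One simple way to obtain such distributions is to count, for each speaking constellation, how often a participant looked at each target. The sketch below assumes labelled frames of the form (set of current speakers, focus target); the function name and the example data are hypothetical, and the original system may estimate these distributions differently.

```python
from collections import Counter, defaultdict


def estimate_focus_given_speakers(observations):
    """Estimate P(focus target | speaking constellation) by relative frequency.

    observations is an iterable of (constellation, target) pairs, where a
    constellation is a frozenset with the names of the current speakers.
    """
    counts = defaultdict(Counter)
    for constellation, target in observations:
        counts[constellation][target] += 1
    table = {}
    for constellation, target_counts in counts.items():
        total = sum(target_counts.values())
        table[constellation] = {
            target: n / total for target, n in target_counts.items()
        }
    return table


# Hypothetical labelled frames for one observer ("Alice").
frames = [
    (frozenset({"Bob"}), "Bob"),
    (frozenset({"Bob"}), "Bob"),
    (frozenset({"Bob"}), "Carol"),
    (frozenset({"Carol"}), "Carol"),
    (frozenset(), "Bob"),
]
table = estimate_focus_given_speakers(frames)
print(table[frozenset({"Bob"})])
```

At run time, the distribution for the currently observed constellation is looked up and serves as the sound-based estimate of where the observer is looking.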
The accuracy of sound-based prediction of focus of attention can
be improved significantly by taking a history of speaker
constellations into account. We have trained neural networks to
predict focus of attention based on who was speaking during a short
period of time.
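
To make the idea of a speaker history concrete, the following sketch encodes the last few frames of speaker activity as a flat 0/1 vector and passes it through a small one-hidden-layer network with a softmax output. The window length, the layer size, the absence of bias terms, and the random (untrained) weights are assumptions for illustration only; the architecture and training of the original networks are not described here.

```python
import math
import random


def encode_history(history, participants, window):
    """Flatten the last `window` frames into a 0/1 speaker-activity vector.

    history is a list of sets of speaker names, oldest frame first.
    Missing frames at the start are padded with silence.
    """
    recent = history[-window:]
    padding = [set()] * (window - len(recent))
    features = []
    for speakers in padding + recent:
        features.extend(1.0 if p in speakers else 0.0 for p in participants)
    return features


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features, w_hidden, w_out):
    """One tanh hidden layer followed by a softmax over focus targets."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features))) for row in w_hidden
    ]
    return softmax([sum(w * h for w, h in zip(row, hidden)) for row in w_out])


participants = ["Alice", "Bob", "Carol"]
history = [{"Bob"}, {"Bob"}, {"Bob", "Carol"}, {"Carol"}]
features = encode_history(history, participants, window=5)

# Untrained placeholder weights; a real system would learn these from data.
rng = random.Random(0)
n_hidden = 4
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in features] for _ in range(n_hidden)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in participants]
probs = forward(features, w_hidden, w_out)
print(dict(zip(participants, probs)))
```

The output is one probability per candidate target. With untrained weights the values carry no meaning; the sketch only shows the shape of the input and output.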
Finally, the head pose based and the sound-based estimations are
combined to obtain a multimodal estimation of the participants'
focus of attention. This led to significant improvements over
using either modality alone.
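
A common way to combine two such estimates is a weighted sum of the per-target probabilities. The sketch below assumes this linear fusion rule and a hand-picked weight of 0.6 for the head pose estimate; both are illustrative assumptions, and the example probabilities are made up.

```python
def fuse(p_head, p_sound, head_weight=0.6):
    """Combine two per-target probability estimates by a weighted sum.

    p_head and p_sound map target names to probabilities. head_weight is
    the weight given to the head pose based estimate; the sound-based
    estimate receives 1 - head_weight.
    """
    targets = set(p_head) | set(p_sound)
    mixed = {
        t: head_weight * p_head.get(t, 0.0)
        + (1.0 - head_weight) * p_sound.get(t, 0.0)
        for t in targets
    }
    total = sum(mixed.values())  # renormalize in case inputs do not sum to 1
    return {t: v / total for t, v in mixed.items()}


# Hypothetical estimates for one observer at one moment.
p_head = {"Alice": 0.5, "Bob": 0.4, "Carol": 0.1}
p_sound = {"Alice": 0.1, "Bob": 0.7, "Carol": 0.2}
fused = fuse(p_head, p_sound, head_weight=0.6)
print(max(fused, key=fused.get), fused)
```

In this example the head pose estimate alone slightly favors Alice, while the fused estimate favors Bob because the sound-based estimate strongly supports him. The weight would normally be tuned on held-out data.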
Our system for focus of attention detection in meetings has been
successfully installed in both our labs, at the Universität Karlsruhe,
Germany, and at Carnegie Mellon University in Pittsburgh, USA.
Full paper:
Tracking Focus of Attention in Meetings
Rainer Stiefelhagen
IEEE International Conference on Multimodal Interfaces, Pittsburgh,
PA, USA, October 14-16, 2002.