Abstract
This paper addresses the automatic recognition of
the gender and identity of speakers in spontaneous dyadic conversations
using information about the multimodal communicative
behavior of the participants. Identifying gender-specific or
individual-specific behaviors in face-to-face communication is relevant for
constructing advanced and robust interactive systems. This information
also contributes to understanding how humans communicate
face-to-face. In the present work, classifiers have been trained
on features extracted from an annotated multimodal corpus of
twelve first encounters in order to distinguish the gender and the
identity of the participants. The training features comprise speech
duration and shape annotations of co-speech communicative head
movements, facial expressions, body postures and hand gestures
of six female and six male participants. Information about the
emotions shown by the participants’ facial expressions was also
added to the training set. Unlike other studies that address the
recognition of individuals for security systems using databases
built for that purpose, this study uses multimodal training features
that relate exclusively to communication, and the data are
spontaneously occurring conversations, since the object of study is
multimodal communication. Several classifiers were trained on the data
and the best results were obtained by a multilayer perceptron
for gender recognition with a weighted F-score of 0.65 (accuracy
64%) and by multinomial logistic regression for the classification
of 12 participants with an F-score of 0.31 (accuracy 30%). The
most useful features for gender recognition were information
about the emotions shown by the participants, the type of head
movements and handedness, while the features that were most
useful for the identification of individuals were emotions, head
movements, handedness and body direction. The results on both
tasks are significantly better than chance accuracy and than the
results obtained by a majority classifier. This is promising since
this is a first pilot study on a corpus of limited size. The features
addressed in this study could in the future be combined with other
biometric patterns such as those used in multimedia security
systems.
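
The abstract compares classifier results against two baselines, chance accuracy and a majority classifier, and reports a weighted F-score. As a minimal sketch of how these quantities are commonly computed (not the authors' evaluation code), the Python below uses hypothetical toy labels. The function names `weighted_f1`, `majority_baseline` and `chance_accuracy` are illustrative assumptions, and the chance figure assumes uniformly random guessing over balanced classes.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n / total) * f1
    return score


def majority_baseline(y_true):
    """Predict the most frequent label for every instance."""
    most_common = Counter(y_true).most_common(1)[0][0]
    return [most_common] * len(y_true)


def chance_accuracy(n_classes):
    """Expected accuracy of uniform random guessing (assumes balanced classes)."""
    return 1.0 / n_classes


# Hypothetical toy labels, not data from the corpus.
y_true = ["F", "F", "F", "M", "M", "M"]
y_pred = ["F", "F", "M", "M", "M", "F"]

model_f1 = weighted_f1(y_true, y_pred)
baseline_f1 = weighted_f1(y_true, majority_baseline(y_true))
print(f"model weighted F1:    {model_f1:.2f}")
print(f"majority weighted F1: {baseline_f1:.2f}")
print(f"chance accuracy, 2 classes:  {chance_accuracy(2):.3f}")
print(f"chance accuracy, 12 classes: {chance_accuracy(12):.3f}")
```

Under the balanced-class assumption, uniform guessing gives an accuracy of 1/2 for the gender task and 1/12 for the twelve-participant task, which is the scale against which the reported accuracies of 64% and 30% can be read.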
Original language | English |
---|---|
Title of host publication | 9th IEEE International Conference on Cognitive Infocommunications (CogInfoCom 2018) |
Number of pages | 5 |
Place of Publication | Budapest |
Publisher | IEEE |
Publication date | 2 Jul 2018 |
Pages | 87-92 |
ISBN (Print) | 978-1-5386-7094-1 |
ISBN (Electronic) | 978-1-5386-7093-4 |
Publication status | Published - 2 Jul 2018 |