Facial expression recognition
Using input from a video camera or static image, the facial expression recognition system employs a novel combination of processing stages to improve the quality of the input provided to the recognition engine and to extract the most significant features needed for recognition. Notably, the system obtains recognition rates exceeding those of all competitors on large public databases, and does so in real time on commodity hardware, outputting continuous recognition results at rates of 25 Hz or better. The system was conceived as an inexpensive modality for classifying affect. Low-end cameras (such as webcams) free the user from any constraining wires and electrodes, and considerably lower the implementation and deployment cost. Both means of input rely heavily on robust signal processing and pattern recognition techniques to output fast and accurate emotional state classification.
The obvious applications relate to human-computer or human-robot interaction, where the computer or robot can adapt dynamically, in real time, to the user’s emotional state in order to modulate the user experience. Many other fields stand to benefit from this system. In psychology, this technology may be used for “self-improvement” or therapeutic applications related to emotional and developmental disorders, including autism. For example, applications that mirror the user’s facial expressions on an animated avatar may support expressivity training by creating engaging and interactive environments. Marketing research can also use this technology to assess emotional reactions to people, publicity materials, products, etc.
The biosignals-based recognition system uses a wireless finger sensor to track physiological signals, from which it captures and identifies dynamic, real-time emotional profiles. While capturing emotion through biosignals has been possible for some time, the classification process has been rudimentary and real-time processing infeasible. We have now succeeded in integrating the necessary components to efficiently achieve capture, transmission and processing of biosignals.
The sensor records four physiological signals from human subjects: electrocardiogram, blood volume pulse, phalange temperature and galvanic skin response.
Once captured, the biosignals are transmitted in real time to one or more processing computers. In our current implementation, we transmit using the Open Sound Control (OSC) protocol to a multicast IP (Internet Protocol) address. Similarly, captured video is transmitted using a video-networking application. This allows the later stages of the processing pipeline to be distributed in parallel across machines.
Each biosignal is first filtered to remove its noisy spectral components, using digital filtering techniques adapted to the signal. Second, running statistical features relevant to the classification of emotional arousal and valence are extracted with minimal computational lag. These features are then used in a machine-learning (ML) platform adapted to biosignals-based classification.
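The two preprocessing steps can be sketched as follows. The text does not specify which filters or features are used, so this example substitutes a simple one-pole low-pass filter and a sliding-window mean and standard deviation purely for illustration; the class names, the smoothing factor and the window length are assumptions. Both classes update in constant time per sample, which is one way to keep computational lag low.

```python
import math
from collections import deque


class OnePoleLowPass:
    """Illustrative noise filter: exponential smoothing with factor alpha in (0, 1]."""

    def __init__(self, alpha: float):
        self.alpha = alpha
        self.state = None

    def step(self, x: float) -> float:
        if self.state is None:
            self.state = x  # initialise on the first sample
        else:
            self.state += self.alpha * (x - self.state)
        return self.state


class RunningStats:
    """Sliding-window mean and standard deviation, updated in O(1) per sample."""

    def __init__(self, window: int):
        self.buf = deque(maxlen=window)
        self.total = 0.0
        self.total_sq = 0.0

    def push(self, x: float):
        if len(self.buf) == self.buf.maxlen:
            old = self.buf[0]  # sample about to leave the window
            self.total -= old
            self.total_sq -= old * old
        self.buf.append(x)
        self.total += x
        self.total_sq += x * x
        n = len(self.buf)
        mean = self.total / n
        # Clamp at zero to guard against tiny negative values from rounding.
        variance = max(self.total_sq / n - mean * mean, 0.0)
        return mean, math.sqrt(variance)
```

In use, each incoming sample would pass through `OnePoleLowPass.step` and the filtered value would then be fed to `RunningStats.push`, yielding a feature pair per sample for the classifier.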
The ML stage outputs a normalized level of emotional arousal and valence for the biosignals, together with a normalized level of confidence for both discrete (e.g., happiness, sadness, anger, fear, disgust, surprise, neutral) and Cartesian (e.g., a continuous 2D arousal and valence space) emotional descriptions. This normalization allows straightforward mappings to audiovisual content generation and control.
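To make the shape of this output concrete, here is a hedged sketch of a record that a downstream consumer might receive. The text does not state the normalization range or scheme, so the example assumes arousal and valence are clamped to [0, 1] and that non-negative discrete scores are rescaled to sum to one; the names `EmotionEstimate`, `clamp01` and `normalize_scores` are hypothetical.

```python
from dataclasses import dataclass

EMOTIONS = ("happiness", "sadness", "anger", "fear",
            "disgust", "surprise", "neutral")


def clamp01(x: float) -> float:
    """Restrict a value to the assumed normalized range [0, 1]."""
    return min(1.0, max(0.0, x))


def normalize_scores(raw: dict) -> dict:
    """Rescale non-negative per-emotion scores so that they sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        # No evidence at all: fall back to a uniform distribution.
        return {label: 1.0 / len(raw) for label in raw}
    return {label: score / total for label, score in raw.items()}


@dataclass(frozen=True)
class EmotionEstimate:
    arousal: float     # assumed range [0, 1]
    valence: float     # assumed range [0, 1]
    confidence: dict   # discrete label -> normalized confidence

    def top_label(self) -> str:
        """Return the discrete emotion with the highest confidence."""
        return max(self.confidence, key=self.confidence.get)
```

With every field on a common scale, mapping code can treat any output as a simple control value without knowing which sensor or classifier produced it.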
The biofeedback-driven multimedia emotional-imaging generator takes emotion-based signals and processes them in real time to generate aural and visual images. An environment designed to project these images creates a rich, external manifestation of a person’s internal, otherwise invisible, emotional state.
Audiovisual content generation and control
Audiovisual content is mapped to the normalized levels of arousal and valence and/or discrete facial expression outputs. Examples of such links are music tonality controlled by the level of arousal, stage lighting modulated by the level of valence, the animation of avatars mirroring the subject’s facial expression, and the generation of dynamic particle-system environments.
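A mapping of this kind can be sketched as a pair of linear interpolations. The specific control parameters below (a transposition of up to 12 semitones driven by arousal, and a lighting colour blended from blue to amber by valence) are invented for the example and are not taken from the system described; inputs are assumed to lie in [0, 1].

```python
COOL_RGB = (0, 0, 255)    # hypothetical colour for low valence
WARM_RGB = (255, 160, 0)  # hypothetical colour for high valence


def lerp(lo: float, hi: float, t: float) -> float:
    """Linearly interpolate between lo and hi, clamping t to [0, 1]."""
    t = min(1.0, max(0.0, t))
    return lo + (hi - lo) * t


def map_to_controls(arousal: float, valence: float) -> dict:
    """Turn normalized arousal and valence into illustrative audiovisual controls."""
    transpose = round(lerp(0, 12, arousal))
    light = tuple(round(lerp(cool, warm, valence))
                  for cool, warm in zip(COOL_RGB, WARM_RGB))
    return {"transpose_semitones": transpose, "light_rgb": light}
```

Because the inputs are already normalized, swapping in a different mapping (for example, a nonlinear curve or a different target parameter) only requires changing this one function.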
Numerous applications are likely to follow from this invention, in areas ranging from psychological diagnosis and therapy to the creative and performing arts. Examples of the former include stress-management training and the treatment of phobias, incapacitating social inhibitions, and developmental disorders such as autism.