Facial expression recognition
Using input from a video camera or static image, the facial expression recognition system employs a novel combination of processing stages to improve the quality of the input provided to the recognition engine, and extracts the most significant features needed for recognition. Notably, the system obtains recognition rates exceeding those of all competitors on large public databases, and does so in real time on commodity hardware, outputting continuous recognition results at rates of 25 Hz or better. The system was conceived as an inexpensive modality for classifying affect. Low-end cameras (such as webcams) free the user from constraining wires and electrodes, and considerably lower the cost of implementation and deployment. Both means of input rely heavily on robust signal processing and pattern recognition techniques to output fast and accurate emotional state classification.
The obvious applications relate to human-computer or human-robot interaction, where the computer or robot can adapt dynamically, in real time, to the user’s emotional state to modulate the user experience. Many other fields stand to benefit from this system. In psychology, this technology may be used for “self-improvement” or therapeutic applications related to emotional and developmental disorders, including autism. For example, applications built to mirror the user’s facial expressions on an animated avatar may support expressivity training by creating engaging and interactive environments. Marketing research can also use this technology to assess emotional reactions to people, publicity materials, products, etc.
The biosignals-based recognition system uses a wireless finger sensor to track physiological signals, from which it captures and identifies dynamic, real-time emotional profiles. While capturing emotion through biosignals has been possible for some time, the classification process has been rudimentary and real-time processing infeasible. We have now succeeded in integrating the components needed to capture, transmit and process biosignals efficiently.
The biosignals are captured from human subjects by the wireless finger sensor, which tracks four physiological signals: electrocardiogram, blood volume pulse, phalange temperature and galvanic skin response.
Once captured, the biosignals are transmitted in real time to one or more processing computers. In our current implementation, we transmit using the Open Sound Control (OSC) protocol to a multicast IP (Internet Protocol) address. Similarly, captured video is transmitted using a video-networking application. This allows the later stages of the processing pipeline to be distributed in parallel.
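As a rough illustration of this transmission step, the sketch below encodes one OSC message by hand and sends it over UDP to a multicast group. It is only a sketch under stated assumptions, not the actual implementation: the address pattern "/bio/gsr", the multicast group 239.0.0.1, the port 9000 and the function names are all hypothetical, and only 32-bit float arguments are handled.

```python
import socket
import struct


def osc_string(text: str) -> bytes:
    """Encode an OSC string: ASCII, null-terminated, padded to a multiple of 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_osc_message(address: str, values: list) -> bytes:
    """Build an OSC message whose arguments are all 32-bit big-endian floats."""
    type_tags = "," + "f" * len(values)
    payload = b"".join(struct.pack(">f", value) for value in values)
    return osc_string(address) + osc_string(type_tags) + payload


def send_multicast(packet: bytes, group: str = "239.0.0.1", port: int = 9000, ttl: int = 1) -> None:
    """Send one UDP datagram to a multicast group (group and port are hypothetical)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        # A TTL of 1 keeps the datagram on the local network segment.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
        sock.sendto(packet, (group, port))


# Hypothetical address pattern carrying one galvanic skin response sample.
packet = encode_osc_message("/bio/gsr", [0.5])
# send_multicast(packet)  # uncomment on a network where multicast is available
```

Because the datagram goes to a multicast group rather than to a single host, any number of processing computers can subscribe to the same stream, which is what makes the parallel distribution of later stages possible.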
Each biosignal is first filtered to remove its noisy spectral components, using adapted digital filtering techniques. Second, running statistical features relevant to the classification of emotional arousal and valence are extracted with minimal computational lag. These features are then fed to a machine-learning (ML) platform adapted to biosignals-based classification.
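The filtering and feature-extraction steps could look something like the sketch below. The text does not say which filters or which statistics are used, so both choices here are assumptions made for illustration: a one-pole low-pass filter stands in for the "adapted digital filtering", and a sliding-window mean and standard deviation stand in for the running features. The class names are hypothetical.

```python
import math
from collections import deque


class OnePoleLowPass:
    """One-pole IIR low-pass filter (exponential smoothing), updated per sample."""

    def __init__(self, alpha: float):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.state = None

    def step(self, sample: float) -> float:
        # The first sample initialises the filter state directly.
        if self.state is None:
            self.state = sample
        else:
            self.state += self.alpha * (sample - self.state)
        return self.state


class RunningStats:
    """Mean and standard deviation over the last `size` samples, in O(1) per update."""

    def __init__(self, size: int):
        self.window = deque(maxlen=size)
        self.total = 0.0
        self.total_sq = 0.0

    def push(self, sample: float) -> None:
        if len(self.window) == self.window.maxlen:
            # Remove the contribution of the sample about to be evicted.
            oldest = self.window[0]
            self.total -= oldest
            self.total_sq -= oldest * oldest
        self.window.append(sample)
        self.total += sample
        self.total_sq += sample * sample

    @property
    def mean(self) -> float:
        return self.total / len(self.window)

    @property
    def std(self) -> float:
        # Population standard deviation; clamp tiny negative rounding errors.
        variance = self.total_sq / len(self.window) - self.mean ** 2
        return math.sqrt(max(variance, 0.0))


# Usage: filter each incoming sample, then update the running features.
lowpass = OnePoleLowPass(alpha=0.5)
stats = RunningStats(size=3)
for raw in (1.0, 2.0, 3.0, 4.0):
    stats.push(lowpass.step(raw))
```

Both objects update in constant time per sample, which is one way to keep the computational lag small; a real system would likely choose filter parameters and window lengths per signal.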
The ML stage outputs a normalized level of emotional arousal and valence for biosignals, and a normalized level of confidence for both discrete (e.g., happiness, sadness, anger, fear, disgust, surprise, neutral) and Cartesian (e.g., 2D continuous arousal and valence space) emotional descriptions. This normalization allows straightforward mappings to audiovisual content generation and control.
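To show why normalized outputs make such mappings straightforward, here is a minimal sketch. It assumes that every level lies in the range [0, 1], which the text does not state explicitly, and the names EmotionEstimate, scale and top_label, as well as the tempo range, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class EmotionEstimate:
    """One ML output frame; all values are assumed to be normalized to [0, 1]."""
    arousal: float
    valence: float
    confidences: Dict[str, float]  # discrete label -> confidence


def scale(level: float, low: float, high: float) -> float:
    """Linearly map a normalized level onto a control range [low, high]."""
    level = min(1.0, max(0.0, level))  # guard against out-of-range input
    return low + level * (high - low)


def top_label(estimate: EmotionEstimate) -> Tuple[str, float]:
    """Return the discrete emotion with the highest confidence."""
    label = max(estimate.confidences, key=estimate.confidences.get)
    return label, estimate.confidences[label]


estimate = EmotionEstimate(
    arousal=0.5,
    valence=0.25,
    confidences={"happiness": 0.7, "sadness": 0.1, "neutral": 0.2},
)
# Hypothetical mapping: arousal drives a musical tempo between 60 and 180 BPM.
tempo_bpm = scale(estimate.arousal, 60.0, 180.0)
label, confidence = top_label(estimate)
```

Since every output shares the same range, the same linear mapping can drive any audiovisual parameter by changing only the target bounds.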