Making powerful machines is no longer enough. They must also connect with humans socially and emotionally. This imperative to increase the efficiency of human-computer interactions has created a new field of research: social computing. This field is aimed at understanding, modeling and reproducing human emotions. But how can an emotion be extracted and then reproduced, based only on a vocal track, video or text? This is the complexity of the research Chloé Clavel is working on at Télécom ParisTech.
“All those moments will be lost in time, like tears in rain.” We are in Los Angeles in 2019, and Roy Batty utters these words, knowing he has only seconds left to live. Melancholy, sadness, regret… Many different feelings fill this famous scene from the 1982 cult film Blade Runner by Ridley Scott. There would not be anything very surprising about these words, if it were not for the fact that Roy Batty is a replicant: an anthropomorphic machine.
In reality, in 2018, there is little chance of us seeing a humanoid robot walking the streets next year, capable of developing such complex emotions. Yet, there is a trend towards equipping our machines to create emotional and social connections with humans. In 1995, this led to the creation of a new field of research called affective computing. Today, it has brought about sub-disciplines such as social computing.
“These fields of research involve two aspects,” explains Chloé Clavel, a researcher in the field at Télécom ParisTech. “The first is the automatic analysis of our social behaviors, interactions and emotions. The second is our work to model these behaviors, simulate them and integrate them into machines.” The objective: promote common ground and produce similarities to engage the user. Human-computer interaction would then become more natural and less frustrating for users who sometimes regret not having another human to interact with, who would better understand their position and desires.
Achieving this result first requires understanding how we communicate our emotions to others. Researchers in affective computing are working to accomplish this by analyzing different modes of human expression. They are interested in the way we share a feeling in writing on the internet, whether it be on blogs, in reviews on websites or on social networks. They are also studying the acoustic content of the emotions we communicate through speech such as pitch, speed and melody of voice, as well as the physical posture we adopt, our facial expressions and gestures.
The transition from signals to behaviors
All this data is communicated through signals such as a series of words, the frequency of a voice and the movement of points on a video. “The difficulty we face is transitioning from this low-level information to rich information related to social and emotional behavior,” explains Chloé Clavel. In other words, what variation in a tone of voice is characteristic of fear? Or what semantic choice is used in speech to reflect satisfaction? This transition is a complex one because it is subjective.
The Télécom ParisTech researcher uses the example of voice analysis to explain this subjectivity criterion. “Each individual has a different way of expressing their social attitudes through speech, therefore large volumes of data must be used to develop models which integrate this diversity.” For example, dominant people generally express themselves with a deeper voice. To verify and model this tendency, multiple recordings are required, and several third parties must validate the various audio excerpts. “The concept of a dominant attitude varies from one person to another. Several annotations are therefore required for the recordings to avoid bias in the interpretation,” Chloé Clavel explains.
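One simple way to see why several annotations per recording matter is to measure how often annotators agree. The sketch below is only an illustration, not the researchers' actual protocol: the excerpt names, the label set and the `pairwise_agreement` helper are all hypothetical.

```python
from itertools import combinations

def pairwise_agreement(labels):
    """Fraction of annotator pairs that gave the same label to one excerpt."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        raise ValueError("need at least two annotations per excerpt")
    return sum(a == b for a, b in pairs) / len(pairs)

# Hypothetical annotations: three listeners judge each audio excerpt.
annotations = {
    "excerpt_01": ["dominant", "dominant", "dominant"],
    "excerpt_02": ["dominant", "neutral", "dominant"],
    "excerpt_03": ["neutral", "dominant", "submissive"],
}

# One agreement score per excerpt; low scores flag excerpts whose
# interpretation varies strongly from one listener to another.
scores = {name: pairwise_agreement(labels) for name, labels in annotations.items()}

for name, score in scores.items():
    print(f"{name}: agreement = {score:.2f}")
```

Excerpts with low agreement are the ones where a single annotator's judgment would be most likely to bias the model.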
The same is true in the analysis of comments on online platforms. The researchers use a corpus of texts annotated by external individuals. “We collect several annotations for a single piece of text data,” the researcher explains. Scientists provide the framework for these annotations using guides based on literature in sociology and psychology. “This helps us ensure the annotations focus on the emotional aspects and makes it easier to reach a consensus from several annotations.” Machine learning methods are then used, without introducing any linguistic expertise into the algorithms first. This provides classifications of emotional signals that are as unbiased as possible, which can be used to identify semantic structures that characterize discontent or satisfaction.
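The pipeline described above (several annotations per text, a consensus label, then a learning method with no hand-written linguistic rules) can be sketched roughly as follows. This is a toy illustration under stated assumptions: the article does not say which algorithm is used, so a bag-of-words Naive Bayes classifier is swapped in here, and the texts, labels and function names are invented.

```python
import math
from collections import Counter, defaultdict

def majority_label(labels):
    """Return the most frequent label, or None when the top two are tied."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

def train_naive_bayes(examples):
    """Count words per label; tokens are plain whitespace-separated words."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Pick the label with the highest log-probability (Laplace smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical corpus: each text carries several external annotations.
annotated = [
    ("the service was great and fast", ["satisfaction", "satisfaction", "satisfaction"]),
    ("great answer thank you", ["satisfaction", "satisfaction", "discontent"]),
    ("the answer was useless and slow", ["discontent", "discontent", "discontent"]),
    ("useless service never again", ["discontent", "satisfaction", "discontent"]),
    ("it was fine i guess", ["satisfaction", "discontent"]),  # tie: dropped
]

# Keep only texts on which the annotators reach a consensus.
training = [(text, majority_label(labels)) for text, labels in annotated]
training = [(text, label) for text, label in training if label is not None]

model = train_naive_bayes(training)
print(predict(model, "great and fast answer"))
print(predict(model, "useless and slow"))
```

The classifier only sees word counts, so any association it finds between words and discontent or satisfaction comes from the annotated data rather than from rules written by a linguist.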
Emotions for mediation
Beyond the binary categorization of an opinion as positive or negative, one of the researchers’ greatest tasks is to determine the purpose and detailed nature of this opinion. Chloé Clavel led a project on users’ interactions with a chatbot. The goal was to determine the source of a user’s negative criticism: the chatbot itself, unable to answer the user correctly; the interaction, for example an unsuitable interface format; or the user, who might simply be in a bad mood. For this project, which benefited from virtual assistance from EDF, the semantic details in messages written to the chatbot had to be examined. “For example, the word ‘power’ does not have the same connotation when someone refers to contract power with EDF as it does when used to refer to the graphics power of a video game,” explains Chloé Clavel. “To gain an in-depth understanding of opinions, we must disambiguate each word based on the context.”
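The “power” example can be illustrated with a very small context-based disambiguation sketch. It is an assumption-laden toy, not the method used in the project: the sense names, the cue-word lists and the `disambiguate` function are all hypothetical, and real systems typically learn such context from data.

```python
import re

# Hypothetical cue words for two senses of "power".
SENSE_CUES = {
    "contract_power": {"contract", "kva", "meter", "subscription", "bill", "electricity"},
    "graphics_power": {"graphics", "game", "gpu", "console", "fps", "video"},
}

def disambiguate(sentence, sense_cues=SENSE_CUES):
    """Return the sense whose cue words overlap most with the sentence.

    Returns None when no cue word appears or when the best senses are tied.
    """
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    overlaps = sorted(
        ((len(tokens & cues), sense) for sense, cues in sense_cues.items()),
        reverse=True,
    )
    best_count, best_sense = overlaps[0]
    if best_count == 0:
        return None
    if len(overlaps) > 1 and overlaps[1][0] == best_count:
        return None
    return best_sense

print(disambiguate("I want to change the power on my electricity contract"))
print(disambiguate("This video game needs more graphics power"))
print(disambiguate("I need more power"))
```

The last call returns `None` because the sentence gives no context at all, which is exactly the situation where a word cannot be disambiguated on its own.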
The chatbot example not only illustrates the difficulty involved in understanding the nature and context of an opinion, it also shows the value of this type of research for the end user. If the machine is able to understand why the human it is interacting with is frustrated, it will have a better chance of adapting to provide its services in the best conditions. If the cause is the user being in a bad mood, the chatbot can respond with a humorous or soothing tone. If the problem is caused by the interaction, the chatbot can determine when it is best to refer the user to a human operator.
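The adaptation described above amounts to mapping a detected cause of frustration to a response strategy. A minimal sketch, assuming the cause has already been identified upstream, might look like this. The `Cause` enum, the strategy names and the fallback for a chatbot failure are hypothetical; the article only describes the responses for a bad mood and for an interaction problem.

```python
from enum import Enum

class Cause(Enum):
    """Possible sources of a user's negative criticism."""
    CHATBOT = "chatbot"          # the chatbot could not answer correctly
    INTERACTION = "interaction"  # e.g. an unsuitable interface format
    USER_MOOD = "user_mood"      # the user is simply in a bad mood

def choose_response(cause: Cause) -> str:
    """Map a detected cause of frustration to a response strategy."""
    if cause is Cause.USER_MOOD:
        return "soothing_or_humorous_tone"
    if cause is Cause.INTERACTION:
        return "consider_human_operator"
    # Assumption: the article does not say how a chatbot failure is handled.
    return "rephrase_and_ask_for_details"

for cause in Cause:
    print(cause.value, "->", choose_response(cause))
```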
A machine’s ability to recognize emotions and react in a social manner therefore allows it to play a conciliatory role. This aspect of affective computing is used in the H2020 Animatas project, in which Télécom ParisTech has been involved since 2018 and which will run for four years. “The goal is to introduce robots in schools to assist teachers and manage the social interactions with students,” Chloé Clavel explains. The idea is to provide robots with social skills to help promote the child’s learning. The robot could therefore offer each student personalized assistance during class to support the teacher’s lessons. Far from the imaginary humanoid robot hidden among humans, an educational mediator could improve learning for children.