When talking with friends and colleagues about the limits of AI, concepts like “creativity,” “feelings,” and “emotions” are among the most recurrent topics. In the last few years, we have seen enormous progress in the field of generative models for language, vision, and even audio/music.
We have witnessed new artificial intelligences that can write novels and poetry, code new computer games and programs, suggest code completions, or create documentation; others can generate images that never existed from natural language descriptions and compose new songs. These are just a few examples that clearly show that creativity is not exclusive to humans. But what about feelings and emotions?
Charles Darwin, one of the forerunners of affective neuroscience, intuited that emotions exist because they play a positive role in our survival as a species. And he was right. For example, many studies have confirmed that we memorize and retain information much better when it is linked to emotions. But are emotions necessary for computers? From a functional and evolutionary point of view, probably not, but understanding human emotions can open up many possibilities in human-machine interaction, in understanding human behavior, and in its impact on any decision-making process.
In an organization, having an empathic bot with emotion detection capabilities can push the user experience of conversational AI chatbots to the next level, help with customer satisfaction analysis in call centers, or even create new input features (the detected emotions) to feed into predictive models for customer churn prediction or recommendation systems, among others. In healthcare applications, emotion recognition could help identify early signs of autism or schizophrenia.
According to the World Health Organization, more than 700,000 people die by suicide every year, and for every suicide there are many more people who attempt it. Suicide is the fourth leading cause of death among 15-19 year-olds. Have you ever thought about what would happen if we could raise an early alert on signs of depression, anxiety, and other traits that could help prevent suicide? Identifying even a small percentage could mean many lives saved. I hope this is reason enough for you to agree on the importance of automatic emotion recognition.
That said, in this article we will tackle the problem of Emotion Detection (ED) in both text and speech (audio), and we will see how easy it is to create an AI solution powered by a No-Code AI platform like Cogniflow to help with this problem.
The first thing we need to do is define our categories, that is, the set of emotions we want to identify. Historically, one of the most widely used approaches has been the six basic universal emotions proposed by Dr. Paul Ekman in 1972, as shown in Figure 1. But depending on your specific problem, you can use a subset of these or try a more fine-grained definition that fits your requirements.
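As a quick illustration, here is a minimal Python sketch of how such a label set could be written down before collecting any data. Ekman's six basic emotions are the classic choice; the fine-grained list is only a hypothetical example (loosely inspired by datasets like GoEmotions, mentioned below) and not part of any fixed requirement.

```python
# Two possible label sets for an emotion detection project.
# EKMAN_BASIC follows the six basic universal emotions; FINE_GRAINED is a
# hypothetical, more detailed variant - adapt either to your own needs.

EKMAN_BASIC = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

FINE_GRAINED = [
    "admiration", "amusement", "anger", "annoyance", "disappointment",
    "fear", "gratitude", "joy", "love", "sadness", "surprise", "neutral",
]
```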
The next thing we need to create an AI-based model (and the most important one) is access to good data (not necessarily big data, but good data). So, if we want to create smart components able to detect emotions in text and speech, we can either collect a custom dataset for our domain or use existing pre-built datasets. Fortunately, the data science community and researchers have created several datasets that we can use as a first option to train baseline models, which we can later fine-tune with our own custom data. Most of the currently available datasets for emotion detection use the six basic universal emotion categories or some slight variation of them.
For emotion detection in speech (audio) there are some well-known datasets, such as the Crowd-sourced Emotional Multimodal Actors Dataset (CREMA-D), the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Surrey Audio-Visual Expressed Emotion database (SAVEE), and the Toronto Emotional Speech Set (TESS). On this GitHub page you can find a great compilation of these and many other datasets for speech emotion recognition in different languages. For written text, there is a huge literature on the subject. In this report you can find a great summary and a starting point if you want to dive deeper into the topic. Table 1 provides a summary of the most popular publicly available datasets. In addition, Google released GoEmotions: A Dataset of Fine-Grained Emotions in 2020, which provides a large-scale dataset for fine-grained emotions.
Once we have the data, it is time to let Cogniflow learn from it. To train the speech emotion recognition component from the audio files, the process is exactly the same as the one described in the first episode of this series. The only difference is that we need to select the “Audio-based” type when creating the experiment and put audio files (instead of images) into the folders (see Figure 2) that we will compress into a zip file and upload to Cogniflow. Remember that the folder names are implicitly the categories that Cogniflow will use to supervise the training process.
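As a reference, here is a minimal sketch of how that zip file could be built, assuming your recordings are WAV files grouped into one folder per emotion (the folder and file names below are just an example layout):

```python
# Build the training zip from per-emotion folders, e.g.
#   speech_emotion_dataset/angry/*.wav, speech_emotion_dataset/happy/*.wav, ...
# The folder names are preserved inside the archive and become the categories.
import zipfile
from pathlib import Path

DATASET_DIR = Path("speech_emotion_dataset")
OUTPUT_ZIP = Path("speech_emotion_dataset.zip")

audio_files = list(DATASET_DIR.rglob("*.wav"))

with zipfile.ZipFile(OUTPUT_ZIP, "w", zipfile.ZIP_DEFLATED) as archive:
    for audio_file in audio_files:
        # Store each file under its emotion folder, e.g. "happy/sample_001.wav"
        archive.write(audio_file, arcname=audio_file.relative_to(DATASET_DIR))

print(f"Packed {len(audio_files)} audio files into {OUTPUT_ZIP}")
```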
To train the emotion detection model for text, the process is similar, the only difference being that the content of the zip archive is a CSV file with two columns: one with the example texts and another with the category we want to assign (in this case, one of the possible emotions), as shown in Figure 3.
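For illustration, here is a minimal sketch of how such a CSV could be generated; the column names and example rows are placeholders to be replaced with your own data:

```python
# Write the two-column CSV expected for the text experiment:
# one column with the example text and one with its emotion label.
import csv

examples = [
    ("I can't believe this happened, I'm furious!", "anger"),
    ("This is the best news I've heard all week!", "joy"),
    ("I really miss how things used to be.", "sadness"),
]

with open("text_emotion_dataset.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "emotion"])  # header: text column + label column
    writer.writerows(examples)
```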
After uploading the datasets, nothing else is required. Congrats! You have completed all the steps needed to create your smart emotion detection services for both text and audio. Now just wait until Cogniflow finishes the learning phase and finds the best solution for your problem. Under the hood, the platform will train different state-of-the-art natural language and acoustic audio models and will automatically deploy the best ones to its cloud infrastructure, so you can later use them as a service with new texts and audio files to check the underlying emotion behind written text and speech.
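Once deployed, the models can be called like any other web service. The snippet below is only a hedged sketch of what such a call could look like from Python; the endpoint URL, header name, and payload fields are placeholders, so check your experiment page and Cogniflow's API documentation for the exact format.

```python
# Hypothetical example of querying the deployed text emotion model over HTTP.
# Replace ENDPOINT and API_KEY with the values from your own Cogniflow experiment.
import requests

API_KEY = "YOUR_API_KEY"
ENDPOINT = "https://your-cogniflow-endpoint/predict"  # placeholder URL

response = requests.post(
    ENDPOINT,
    headers={"x-api-key": API_KEY},  # placeholder header name
    json={"text": "I never expected the package to arrive so quickly, amazing!"},
    timeout=30,
)
response.raise_for_status()
print(response.json())  # e.g. the predicted emotion and a confidence score
```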
And since training different deep learning models can take a while, Cogniflow does not expect you to wait in front of the computer until the experiment is ready. Just go and relax: once the experiment finishes, you will receive an email notification with a summary of the results and a link to start using your new smart functionality immediately, as shown in Video 1 and Video 2 for written text and speech, respectively.
Video 1 - The final fine-grained emotion detection model trained in Cogniflow
Video 2 - The final speech emotion detection model trained in Cogniflow
But wait, what if we had audio recordings and wanted to analyze the emotions in them not only from an acoustic perspective but also with the text detection model? And even worse, what if all the recordings were in Spanish? Well, no problem, keep calm and relax again! You can use Cogniflow’s add-on for Excel or Google Sheets, where you can fill in a list of web-hosted audio URLs and then run a transcription → translation → detection pipeline, as shown in Video 3 (surprise emotion detected? 🙂).
Video 3 - Transcription, translation, and detection pipeline using Cogniflow’s integration with Google Sheets
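The same pipeline could also be chained programmatically instead of through the spreadsheet add-on. The sketch below only illustrates the idea: every endpoint and payload field in it is a placeholder, and the real speech-to-text, translation, and emotion detection services should be called as described in Cogniflow's documentation.

```python
# Illustrative transcription -> translation -> detection chain.
# All URLs and payload/response fields below are placeholders.
import requests

API_KEY = "YOUR_API_KEY"
HEADERS = {"x-api-key": API_KEY}  # placeholder header name

def call_service(endpoint: str, payload: dict) -> dict:
    """POST a JSON payload to a (placeholder) service endpoint and return the JSON reply."""
    r = requests.post(endpoint, headers=HEADERS, json=payload, timeout=60)
    r.raise_for_status()
    return r.json()

audio_url = "https://example.com/call_recording_es.mp3"  # a web-hosted recording in Spanish

transcript = call_service("https://your-endpoint/transcribe", {"audio_url": audio_url})["text"]
english = call_service("https://your-endpoint/translate", {"text": transcript})["text"]
emotion = call_service("https://your-endpoint/detect-emotion", {"text": english})

print(emotion)  # e.g. {"emotion": "surprise", "confidence": 0.87}
```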
Once the emotion detection models are ready, it is easy to integrate them with any front-end application and use the detection services for your own purposes. The possibilities are the ones we identified at the beginning and many more, from a simply more user-friendly experience to more complex uses like spotting alert signs in comments or even improving churn prediction based on how customers’ feelings evolve while they use the product or service. Video 4 shows an example of an empathic bot using the created emotion services with Botpress as the front-end application.
Video 4 - Example of an empathic bot using the created emotion services in Cogniflow
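To give a feel for what sits behind such a bot, here is a minimal sketch of how a chatbot back end could turn the detected emotion into an empathic reply. The detect_emotion callable is assumed to wrap a request to the deployed text model (as in the earlier API sketch), and the reply templates are purely illustrative.

```python
# Map the detected emotion to an empathic reply template.
EMPATHIC_REPLIES = {
    "anger": "I'm really sorry about this experience. Let me escalate it right away.",
    "sadness": "That sounds hard. I'm here to help you sort this out.",
    "joy": "That's great to hear! Is there anything else I can do for you?",
}
DEFAULT_REPLY = "Thanks for sharing that. How can I help you further?"

def empathic_reply(user_message: str, detect_emotion) -> str:
    """Pick a reply based on the emotion detected in the user's message.

    detect_emotion is any callable that takes a text and returns an emotion label,
    for example a thin wrapper around the deployed text emotion model.
    """
    emotion = detect_emotion(user_message)  # e.g. "anger", "joy", "sadness", ...
    return EMPATHIC_REPLIES.get(emotion, DEFAULT_REPLY)
```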
Emotions play a key role in human activities and interactions. As human beings we are social and emotional, and although machines do not need emotions from a functional and evolutionary point of view, building smart services able to understand human emotions can bring many possibilities in human-machine interactions, in understanding human behavior, and in its impact on any decision-making process.
Throughout this blog post, we looked at the impact of emotion detection and learned how to create emotion detection models for text and audio using a No-Code AI platform like Cogniflow, where anyone, even non-technical people, can train custom smart services.
The only requirement? Data, data, and data! (the new gold of the information era). Although detection in written text can be sufficient in many use cases, sometimes emotions can be better understood if we pay attention not only to what is being said but also to how it is being said.
For this reason, in this article we presented different datasets that can be used when working with emotions in either written text or speech. Visual emotion recognition has not been covered here, but it is a classical use case in the field of computer vision too. We already explained how to work with visual content in Cogniflow in the first episode of this series. If you want to dig deeper into the current state-of-the-art models and find datasets and papers about facial expression recognition, I recommend this page.
At this point you are probably thinking that following this article and replicating this work would be easy but time-consuming, aren’t you? Well, keep calm and relax one more time, because the models we trained here are already available and ready to use in Cogniflow’s catalog of public experiments, alongside many other general smart services like transcription, translation, sentiment analysis, information extraction, spoken language identification, etc., as shown in Figure 5.
And if you made it this far, you are probably curious about what else can be achieved with AI. If that’s the case, why don’t you try Cogniflow yourself and unleash your creativity? You can sign up now for free. What are you waiting for? ;)