A multi-modal personality prediction system

Authors:

Highlights:

Abstract

An individual's personality reveals their behavior, mental health, emotions, life choices, social nature, and thought patterns. Cyber forensics, personalized services, and recommender systems are examples of applications of automatic personality prediction. In this work, a deep-learning-based personality prediction system has been developed. Facial and ambient features are extracted from the visual modality using Multi-task Cascaded Convolutional Networks (MTCNN) and ResNet, respectively; audio features are extracted using the VGGish Convolutional Neural Network (VGGish CNN), and text features are extracted using an n-gram Convolutional Neural Network (CNN). The extracted features are then passed to a fully connected layer followed by a sigmoid for the final prediction. Finally, the text, visual, and audio modalities are combined in different ways: (i) concatenation of features in a multi-modal setting, and (ii) application of different attention mechanisms for fusing features. The dataset released in ChaLearn-17 is used to evaluate the performance of the system. From the obtained results, it can be concluded that concatenating the features extracted from different modalities attains results comparable to the averaging method (late fusion). It is also shown that a handful of images is enough to attain comparable performance.
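The feature-level fusion described above can be illustrated with a minimal sketch: per-modality feature vectors are concatenated and passed through a fully connected layer with a sigmoid output. The feature dimensions, trait count, and class name below are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn


class ConcatFusionPersonalityNet(nn.Module):
    """Sketch of concatenation-based multi-modal fusion (dimensions assumed)."""

    def __init__(self, visual_dim=2048, audio_dim=128, text_dim=300, num_traits=5):
        super().__init__()
        # Fully connected layer mapping the fused features to trait scores
        self.fc = nn.Linear(visual_dim + audio_dim + text_dim, num_traits)

    def forward(self, visual_feat, audio_feat, text_feat):
        # Feature-level fusion: concatenate the modality feature vectors
        fused = torch.cat([visual_feat, audio_feat, text_feat], dim=-1)
        # Sigmoid keeps each predicted trait score in [0, 1]
        return torch.sigmoid(self.fc(fused))


# Example usage with a batch of 4 pre-extracted feature vectors
model = ConcatFusionPersonalityNet()
v = torch.randn(4, 2048)   # e.g. ResNet ambient features (assumed size)
a = torch.randn(4, 128)    # e.g. VGGish audio embeddings (assumed size)
t = torch.randn(4, 300)    # e.g. n-gram CNN text features (assumed size)
scores = model(v, a, t)    # shape: (4, 5), one score per personality trait
```

The averaging (late fusion) baseline mentioned in the abstract would instead produce a separate prediction per modality and average the trait scores.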

Keywords: Personality prediction, Multi-modal, CNN, ResNet

Article history: Received 25 July 2021, Revised 30 October 2021, Accepted 7 November 2021, Available online 19 November 2021, Version of Record 8 December 2021.

DOI: https://doi.org/10.1016/j.knosys.2021.107715