
"Heterogeneous multi-modal emotion recognition with cross-modal transformer and graph attention network" by Nhut Minh Nguyen, Thanh Trung Nguyen and Duc Ngoc Minh Dang


HemoGAT: Heterogeneous multi-modal emotion recognition with cross-modal transformer and graph attention network

Official code repository for the manuscript "HemoGAT: Heterogeneous Multi-Modal Emotion Recognition with Cross-Modal Transformer and Graph Attention Network", submitted to Advances in Electrical and Electronic Engineering.

Please press the ⭐ button and/or cite our papers if you find this work helpful.


Abstract

Multi-modal speech emotion recognition (SER) is promising, but fusing diverse information streams remains challenging. Sophisticated architectures are required to synergistically combine the modeling of structural relationships across modalities with fine-grained, feature-level interactions. To address this, we introduce HemoGAT, a novel heterogeneous multi-modal SER architecture integrating a cross-modal transformer (CMT) and a graph attention network. HemoGAT employs a dual-stream architecture with two core modules: a heterogeneous multi-modal graph attention network (HM-GAT), which models complex structural and contextual dependencies using a graph of deep embeddings, and a CMT, which enables fine-grained feature fusion through bidirectional cross-attention. This design captures both high-level relationships and immediate inter-modal influences. HemoGAT achieves a 0.29% improvement in accuracy over the previous best on the IEMOCAP dataset and obtains highly competitive results on the MELD dataset, demonstrating its effectiveness compared to existing methods. Comprehensive ablation studies evaluate the impact of the Top-K algorithm for heterogeneous graph construction, compare uni-modal and multi-modal fusion strategies, assess the contributions of the HM-GAT and CMT modules, and analyze the effect of GAT layer depth.

Index Terms: Heterogeneous graph construction, Graph attention network, Cross-modal transformer, Feature fusion, Multi-modal speech emotion recognition.
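As a rough illustration of the bidirectional cross-attention fusion described in the abstract, the sketch below shows one CMT-style block in which each modality attends to the other and is fused with a residual connection. This is a minimal sketch, not the authors' implementation: the class name, dimensions, and the audio/text modality pairing are assumptions for illustration only.

```python
# Hedged sketch of bidirectional cross-modal attention (NOT the official HemoGAT code).
import torch
import torch.nn as nn


class BidirectionalCrossAttention(nn.Module):
    """One illustrative CMT-style block: each modality queries the other."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # audio-to-text and text-to-audio attention streams
        self.a2t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.t2a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, audio: torch.Tensor, text: torch.Tensor):
        # audio features query the text sequence, and vice versa
        a_enh, _ = self.a2t(audio, text, text)
        t_enh, _ = self.t2a(text, audio, audio)
        # residual fusion keeps each modality's original information
        return audio + a_enh, text + t_enh


# Example usage with made-up shapes: (batch, sequence length, feature dim)
a = torch.randn(2, 50, 256)  # e.g. 50 audio frames
t = torch.randn(2, 30, 256)  # e.g. 30 text tokens
fa, ft = BidirectionalCrossAttention()(a, t)
print(fa.shape, ft.shape)  # torch.Size([2, 50, 256]) torch.Size([2, 30, 256])
```

Note that each output keeps its own sequence length; a downstream pooling or graph-construction step (such as the HM-GAT stream) would consume these enhanced embeddings.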

Install

Clone this repository

git clone https://github.com/nhut-ngnn/HemoGAT.git

Create Conda Environment and Install Requirements

Navigate to the project directory and create a Conda environment:

cd HemoGAT
conda create --name hemogat python=3.8
conda activate hemogat

Install Dependencies

pip install -r requirements.txt

Citation

If you use this code or part of it, please cite the following papers:

Citation details will be updated soon.

Contact

For any questions, please contact the main author:

Nhut Minh Nguyen at FPT University, Vietnam

Email: [email protected]
ORCID: https://orcid.org/0009-0003-1281-5346
GitHub: https://github.com/nhut-ngnn/
