2023-07-24 Emotion Emo Classification Model

 

| model type | best test acc | train acc | eval acc 1 (improv) | eval acc 2 (pp) | eval acc improv (augm) | eval acc pp (augm) | API acc | dataset | parameter count |
|---|---|---|---|---|---|---|---|---|---|
| russian baseline (not fine-tuned on our data) | - | - | 0.598 | 0.374 | | | - | - | 316M |
| russian (19) | 0.730 | 0.986 | 0.675 | 0.822 | | | 0.68 | emo_audio_pp_relabelled_threshold_2_test_train_split | 316M |
| russian (21) | 0.9125 | 0.9999 | 0.535 | 0.955 | 0.529 | 0.908 | 0.824 | emo_audio_PP_threshold_2_and_other_from_relabelled_threshold_3 | 316M |
| russian (23) | 0.9625 | 0.992 | 0.544 | 0.960 | 0.561 | 0.941 | 0.87625 | emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 | 316M |
| russian (24) | 0.930 | 0.996 | 0.550 | 0.913 | | | | emo_audio_PP_threshold_2_and_other_from_relabelled_threshold_3_without_nerona | 316M |
| xlsr (pre-trained, not fine-tuned) (22) | 0.9625 | 0.986 | 0.516 | 0.958 | | | | emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 | 316M |
| xlsr (no pre-training, just the architecture) (25) | 0.720 | 0.944 | 0.406 | 0.731 | | | | emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 | 316M |
| wav2vec small (no pre-training, just the architecture) (26) | 0.7025 | 0.931 | 0.456 | 0.785 | | | | emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 | 97M |
| lstm big (37 layers) | 0.25 | 0.25 | 0.11 | 0.25 | | | | emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 | 318M |
| lstm small (11 layers) | 0.625 | 0.933 | 0.465 | 0.644 | | | | emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 | 97.8M |
| conv1d big | 0.61 | 0.898 | 0.388 | 0.611 | | | | emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 | 335M |
| demucs lstm (33) | 0.551 | 0.951 | 0.439 | 0.634 | | | | emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 | 307.7M |
| xlsr (pre-trained, not fine-tuned) (34) | 0.779 | 0.993 | 0.564 | 0.968 | 0.532 | 0.911 | 0.91 | russian_emo_dataset_un-relabelled + emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 | 316M |
| russian relabelled + pp thresh3 (35) | 0.922 | 0.987 | | | 0.593 | 0.953 | 0.91 | russian_emo_dataset_tresh2 + emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 | 316M |
| russian relabelled + pp thresh3 (50% 8k - 16k) audeering | 0.926 | 0.968 | | | 0.593 | 0.97 | 0.86 | russian_emo_dataset_tresh2 + emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 | 165M (audeering) |
| russian relabelled + pp thresh3 (100% 8k - 16k) audeering | 0.91 | 0.98 | | | 0.589 | 0.943 | | | |
| 38 (attention, no resampling) | 0.892 | 0.946 | | | 0.668 | 0.928 | 85.625% | russian_emo_dataset_tresh2 + emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 + upwork_emo_pp_2022-08-15_4sec_part_1_fixed_threshold_2 | 316M |
| 39 (attention, 50% resampled) | 0.908 | 0.977 | | | 0.636 | 0.97 | 86.875% | russian_emo_dataset_tresh2 + emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 + upwork_emo_pp_2022-08-15_4sec_part_1_fixed_threshold_2 | 316M |
| 40 (attention, 100% resampled) | 0.846 | 0.945 | | | 0.429 | 0.807 | 65.875% | russian_emo_dataset_tresh2 + emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 + upwork_emo_pp_2022-08-15_4sec_part_1_fixed_threshold_2 | |
| 41 (mean, no resampling) | 0.882 | 0.942 | | | 0.607 | 0.948 | 86.875% | russian_emo_dataset_tresh2 + emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 + upwork_emo_pp_2022-08-15_4sec_part_1_fixed_threshold_2 | |
| 42 (mean, 50% resampled) | 0.921 | 0.967 | | | 0.621 | 0.96 | 90.625% | russian_emo_dataset_tresh2 + emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 + upwork_emo_pp_2022-08-15_4sec_part_1_fixed_threshold_2 | |
| 43 (mean, 100%) | 0.923 | 0.963 | | | 0.629 | 0.965 | 84.375% | russian_emo_dataset_tresh2 + emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 + upwork_emo_pp_2022-08-15_4sec_part_1_fixed_threshold_2 | |
| 44 (mean, 100% lowpass) | 0.915 | 0.978 | | | 0.621 | 0.96 | | russian_emo_dataset_tresh2 + emo_audio_PP_threshold_3_and_other_from_relabelled_threshold_3 + upwork_emo_pp_2022-08-15_4sec_part_1_fixed_threshold_2 | |
| 45 (mean, 50%, thresh2) | 0.793 | 0.903 | | | 0.668 | 0.767 | | russian_emo_dataset_tresh2 + emo_audio_PP_threshold_2 + upwork_emo_pp_2022-08-15_4sec_part_1_fixed_threshold_2 | |
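
The "50% / 100% resampled" and "8k - 16k" labels on the later runs are not defined on this page. One plausible reading, stated here as an assumption, is a training-time augmentation in which a given fraction of the 16 kHz clips is passed through an 8 kHz round trip so the model also sees telephone-bandwidth audio. The sketch below illustrates that idea only; it is not the code used for these runs. The function names `simulate_8k_roundtrip` and `maybe_resample` are hypothetical, and the ideal FFT low-pass stands in for whatever resampler was actually used.

```python
import numpy as np


def simulate_8k_roundtrip(wave: np.ndarray, sr: int = 16000, low_sr: int = 8000) -> np.ndarray:
    """Approximate a sr -> low_sr -> sr round trip.

    Assumption: the round trip is modelled as an ideal low-pass filter that
    removes everything at or above the Nyquist frequency of low_sr.
    """
    spectrum = np.fft.rfft(wave)
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sr)
    spectrum[freqs >= low_sr / 2] = 0.0  # drop content an 8 kHz signal cannot carry
    return np.fft.irfft(spectrum, n=len(wave)).astype(wave.dtype)


def maybe_resample(wave: np.ndarray, p: float, rng: np.random.Generator, sr: int = 16000) -> np.ndarray:
    """Apply the round trip with probability p (0.5 for a '50% resampled' run)."""
    if rng.random() < p:
        return simulate_8k_roundtrip(wave, sr)
    return wave


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.standard_normal(16000 * 4)  # stand-in for a 4-second clip
    augmented = maybe_resample(clip, p=0.5, rng=rng)
    print(augmented.shape)
```

Under this reading, `p=0.0` would correspond to the "no resampling" runs and `p=1.0` to the "100% resampled" runs.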