2024-01-02 Emotion Guidelines for AI models

 

TODOs AI

  1. 🔴 Mārcis: Top priority - update the API so that each segment, emotions_summary, and conversation.emotions_summary take text_sentiment into account. For example, if no emotion is detected, use the value from text_sentiment; if both give the same emotion, reinforce it; if they differ, show both (happiness = positive; sadness and anger = negative; to decide between anger and sadness, use whichever emotion is stronger in the voice tone).

    Without detailed tests, add one more pre-trained English sentiment model; previous results: https://www.notion.so/evalds/2023-03-23-Betija-API-Task-text_sentiment-5b7d18e5bb5b468ca3af2e1d4770b9a8?pvs=4

    When done, deploy it and send a message to Ariels, Pauls, and Evalds

     

  2. Mārcis: Next priority - please add LV-EN and EN-LV translation with WebSocket support to http://api.gramatins.lv (I need it for Eldigen in the coming days) and rearrange the GPUs so that we can run grāmatiņš on GPUs. Also please ask Pauls to map the PORTs correctly so that the asya machine hosting http://api.gramatins.lv is reachable from the web

  3. Mārcis (with Reinis's support): Put the tone of voice validation samples we have used so far into the QA AI asya.ai tone of voice validation subfolders: https://drive.google.com/drive/folders/1uq0qbObdIUZ120iH2TX9vvPQSQrfLunm?usp=sharing

  4. Mārcis (with Reinis's support): Run the current emo model in inference on the validation samples in the QA AI asya.ai tone of voice validation subfolders and fill in the table. Test several threshold levels for classifying the final (non-other) class. Fill in a separate SHEET for each experiment so that the samples can be analyzed. Every time experiments are run, zip the current samples and add the ZIP to the folder so that, if needed, the experiments can be repeated with exactly the same samples. image-20240102140711698

    image-20240102140647143

    https://docs.google.com/spreadsheets/d/1CsPprLj-jBvzClT5uaGxRlOnmaOsN0aDw62ZqgoMynk/edit#gid=0

  5. Mārcis: Find a trustworthy English sentiment validation dataset (one not included in the existing models) - Amir can do this - and add samples (preferably a dataset that has Happiness, Anger, Sadness, Other). Ariels will add the PitchPatterns sheet validation samples. image-20240102141540199 https://docs.google.com/spreadsheets/d/1Vc6CwSExDPmHhxJjdJYhBI_S56LSfPSfmni1Jg6yv5I/edit#gid=408018272

  6. Mārcis: Find several English sentiment models and compare them on the validation samples. Fill in the results: https://docs.google.com/spreadsheets/d/19nXnGoLyLgbeIqS1SwhYSNyYGXdSBN7uPnhuwHp8eBw/edit#gid=0 Previous tests: https://www.notion.so/evalds/2023-03-23-Betija-API-Task-text_sentiment-5b7d18e5bb5b468ca3af2e1d4770b9a8?pvs=4

  7. Mārcis: Start collecting an LT-EN dataset from public data; if that is not enough, use GPT-4 mining + labelling
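
The fusion rule in TODO 1 can be sketched roughly as follows. This is a minimal sketch, not the production API: the function name, the score-dict input shape, and the 1.5 reinforcement factor are assumptions for illustration only.

```python
# Sketch of the segment-level fusion of tone-of-voice emotions with
# text_sentiment, following the rule in TODO 1.

VOICE_TO_SENTIMENT = {"happiness": "positive", "sadness": "negative", "anger": "negative"}
BOOST = 1.5  # assumed reinforcement factor when voice and text agree

def fuse(voice_emotions, text_sentiment):
    """Combine tone-of-voice emotion scores with text_sentiment for one segment.

    voice_emotions: dict emotion -> score (may be empty if no emotion detected).
    text_sentiment: "positive", "negative" or "neutral".
    Returns a list of (label, strength) pairs.
    """
    if not voice_emotions:
        # No voice emotion detected: fall back to the text sentiment.
        return [(text_sentiment, 1.0)]

    if text_sentiment == "negative":
        # Anger vs sadness: trust whichever negative emotion the voice scores higher.
        negatives = {e: s for e, s in voice_emotions.items()
                     if VOICE_TO_SENTIMENT.get(e) == "negative"}
        if negatives:
            top = max(negatives, key=negatives.get)
            return [(top, negatives[top] * BOOST)]

    top = max(voice_emotions, key=voice_emotions.get)
    if VOICE_TO_SENTIMENT.get(top) == text_sentiment:
        # Same polarity from both models: reinforce the voice emotion.
        return [(top, voice_emotions[top] * BOOST)]

    # Polarities disagree: report both signals.
    return [(top, voice_emotions[top]), (text_sentiment, 1.0)]
```

The same function would be applied per segment and then aggregated into emotions_summary and conversation.emotions_summary.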


 

TODOs QA

  1. Ariels: In the QA AI asya.ai tone of voice validation folder, wait for the AI team to add the samples, go through them, and if you disagree with any, note down the file names and send them to Mārcis and Evalds: https://drive.google.com/drive/folders/1uq0qbObdIUZ120iH2TX9vvPQSQrfLunm?usp=sharing

  2. Ariels: In the QA AI asya.ai tone of voice validation folder, add any new samples that you want included in the tests. Note that the samples must be cut into 4-second fragments.

  3. Ariels: You can follow the QA AI asya.ai tone of voice validation results here: https://docs.google.com/spreadsheets/d/1CsPprLj-jBvzClT5uaGxRlOnmaOsN0aDw62ZqgoMynk/edit#gid=0

  4. Ariels: Add your own samples to the QA AI asya.ai sentiment validation Pitch Patterns sheet - each must be a full sentence in English, not a single word

    image-20240102141540199 https://docs.google.com/spreadsheets/d/1Vc6CwSExDPmHhxJjdJYhBI_S56LSfPSfmni1Jg6yv5I/edit#gid=408018272

    Follow the results here: https://drive.google.com/drive/u/1/folders/1rFS3rYYvMR3xvt3qA6LepKmAW9w4J7c3
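
The 4-second fragmenting required for new validation samples (QA TODO 2) can be done with Python's standard-library wave module. This is a minimal sketch for WAV files only; the function name and the output naming scheme are assumptions:

```python
# Split a WAV file into fixed-length fragments for the validation folders.
import wave

SEGMENT_SECONDS = 4  # fragment length required for the validation samples

def split_wav(path, out_prefix, seconds=SEGMENT_SECONDS):
    """Split a WAV file into `seconds`-long fragments; returns the paths written.

    The last fragment may be shorter than `seconds`.
    """
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # header is patched with the real frame count on close
                dst.writeframes(frames)
            written.append(out_path)
            index += 1
    return written
```

For MP3 or other compressed formats the files would first need converting to WAV (e.g. with ffmpeg).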

 


 

Tone of Voice - Guidelines

Language-agnostic approach

  1. Every 1-2 months, extract audio samples from PitchPatterns using the highest activations or samples marked as wrong

  2. Send them for labelling (10% held out as validation samples); labelling is considered finished when the validation samples are labelled correctly

  3. Clean up the datasets using the highest activations (manually remove outliers)

  4. Retrain the model (hyper-parameter search, data augmentation)

  5. Test on the QA AI asya.ai tone of voice validation samples and register the results: https://drive.google.com/drive/folders/1uq0qbObdIUZ120iH2TX9vvPQSQrfLunm?usp=sharing

  6. If the results are good, deploy to the API
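
The per-threshold testing in step 5 (and the threshold experiments in AI TODO 4) can be sketched like this, assuming the model outputs per-class probabilities for each 4-second segment. Function names and the accuracy metric are assumptions; the real experiments go into a separate SHEET per run:

```python
# Sketch of a threshold sweep for the final-class decision (non-"other").

def classify(probs, threshold):
    """Return the top non-"other" class if it clears the threshold, else "other"."""
    label = max((c for c in probs if c != "other"), key=probs.get, default="other")
    if label != "other" and probs[label] >= threshold:
        return label
    return "other"

def sweep(samples, thresholds):
    """samples: list of (probs, true_label) pairs; returns accuracy per threshold."""
    results = {}
    for t in thresholds:
        correct = sum(classify(p, t) == y for p, y in samples)
        results[t] = correct / len(samples)
    return results
```

Each sweep run over the zipped validation samples then fills one row per threshold in the results sheet.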

 


Text Sentiment - Guidelines

An English-only model for text sentiment, text intent, etc.

We need the highest-quality EN-LV and EN-LT translation models.

  1. Start by using multiple English Sentiment models

  2. Every 1-2 months, extract audio samples from PitchPatterns using the highest activations in ENGLISH or samples marked as wrong

  3. Send them for labelling (10% held out as validation samples); labelling is considered finished when the validation samples are labelled correctly

  4. Clean up the datasets using the highest activations (manually remove outliers)

  5. Retrain the model (hyper-parameter search, data augmentation). Use it in combination with other pre-trained models to get the highest accuracy

  6. Test on the QA AI asya.ai sentiment validation samples and register the results: https://drive.google.com/drive/u/1/folders/1rFS3rYYvMR3xvt3qA6LepKmAW9w4J7c3

  7. If the results are good, deploy to the API
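
Combining several pre-trained sentiment models (step 5) could be as simple as majority voting; a minimal sketch, where the voting scheme is an assumption and the real combination method should come out of the model-comparison spreadsheet:

```python
# Sketch: combine per-model sentiment labels by majority vote.
from collections import Counter

def ensemble_sentiment(predictions):
    """predictions: one label per model, in a fixed model order.

    Returns the majority label; on a tie, the earliest-seen label wins
    (Counter preserves first-insertion order among equal counts).
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]
```

A weighted vote (e.g. by each model's validation accuracy) would be the natural next refinement.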


 

Tone of Voice - Current results and data

Here is some of our research related to the domain of emotion and communication tracking:

Some of the publications about emotions:

  1. http://share.yellowrobot.xyz/quick/2023-7-24-9C824AE9-106E-47DF-8ADB-7DD1E107C16B.pdf

  2. http://share.yellowrobot.xyz/quick/2023-7-24-1719D4FF-D5B1-4236-BA73-E629EF385E3F.pdf

 

Our model results:

  1. http://share.yellowrobot.xyz/quick/2023-7-24-A5A1BFA6-498A-46E7-B32D-5AAEF7B33AF2.html

    Emotion classification (Happiness, Anger, Sadness, Other): 62.1% image-20240102135023692

  2. http://share.yellowrobot.xyz/quick/2023-7-24-4B524F37-BAC2-48E4-8963-A01E2B5E7012.html

    Laughter classification: 73% image-20240102135121038

  3. http://share.yellowrobot.xyz/quick/2023-7-24-A2443B1A-94A4-4CA0-975C-1E682911EE1B.html Total dataset (before the new dataset arriving in January 2024), 4-second samples image-20240102135233595

  4. https://www.notion.so/evalds/Laughter-models-and-data-statistics-38eac0dc55944d439402d3e9254610f7?pvs=4 Laughter model, 27k samples image-20240102142349678

  5. http://share.yellowrobot.xyz/quick/2023-7-24-66F550A4-FB3F-41F7-95A2-CD26D8407928.html LV STT dataset (January 2024): 144h + 100h

 


 

Text Sentiment - Current results and data

Pre-trained models tested so far: https://www.notion.so/evalds/2023-03-23-Betija-API-Task-text_sentiment-5b7d18e5bb5b468ca3af2e1d4770b9a8?pvs=4

image-20240102143019596