2024-01-02 Emotion Guidelines for AI models

 

TODOs AI

  1. 🔴 Mārcis: Top priority - update the API so that each segment, emotions_summary, and conversation.emotions_summary take text_sentiment into account. For example, if no emotion is detected, use the value from text_sentiment; if both give the same emotion, reinforce it; if they differ, show both (happiness = positive; sadness and anger = negative; to decide between anger and sadness, use whichever emotion is stronger in the voice tone).

    Without detailed tests, add one more pre-trained English sentiment model; previous results: https://www.notion.so/evalds/2023-03-23-Betija-API-Task-text_sentiment-5b7d18e5bb5b468ca3af2e1d4770b9a8?pvs=4

    When done, deploy it and send a message to Ariels, Pauls, and Evalds

     

  2. Mārcis: Next priority - please add LV-EN and EN-LV translation with WebSocket support to http://api.gramatins.lv (I need it for Eldigen in the coming days) and rearrange the GPUs so that we can run grāmatiņš on GPUs. Also please ask Pauls to map the PORTs correctly so that the asya machine hosting http://api.gramatins.lv is reachable from the web

  3. Mārcis (with Reinis's support): Put the tone of voice validation samples we have used so far into the QA AI asya.ai tone of voice validation subfolders: https://drive.google.com/drive/folders/1uq0qbObdIUZ120iH2TX9vvPQSQrfLunm?usp=sharing

  4. Mārcis (with Reinis's support): Run the current emo model in inference on the validation samples in the QA AI asya.ai tone of voice validation subfolders and fill in the table. Test several threshold levels for classifying the final (non-other) class. Fill in a separate SHEET for each experiment so that the samples can be analyzed. Every time experiments are run, zip the current samples and add the ZIP to the folder so that, if needed, the experiments can be repeated with exactly the same samples. image-20240102140711698

    image-20240102140647143

    https://docs.google.com/spreadsheets/d/1CsPprLj-jBvzClT5uaGxRlOnmaOsN0aDw62ZqgoMynk/edit#gid=0

  5. Mārcis: Find a trustworthy English sentiment validation dataset (one not included in the existing models) - Amir can do this - and add samples (preferably a dataset that has Happiness, Anger, Sadness, Other). Ariels will add the PitchPatterns sheet validation samples. image-20240102141540199 https://docs.google.com/spreadsheets/d/1Vc6CwSExDPmHhxJjdJYhBI_S56LSfPSfmni1Jg6yv5I/edit#gid=408018272

  6. Mārcis: Find several English sentiment models and compare them on the validation samples. Fill in the results: https://docs.google.com/spreadsheets/d/19nXnGoLyLgbeIqS1SwhYSNyYGXdSBN7uPnhuwHp8eBw/edit#gid=0 Previous tests: https://www.notion.so/evalds/2023-03-23-Betija-API-Task-text_sentiment-5b7d18e5bb5b468ca3af2e1d4770b9a8?pvs=4

  7. Mārcis: Start collecting an LT-EN dataset from public data; if that is not enough, use GPT-4 mining + labelling
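
The fusion rule in TODO 1 can be sketched roughly as follows. This is a minimal sketch, not the production API: the function name, the score-dict input shape, and the 1.5 reinforcement factor are assumptions for illustration only.

```python
# Sketch of the segment-level fusion of tone-of-voice emotions with
# text_sentiment, following the rule in TODO 1.

VOICE_TO_SENTIMENT = {"happiness": "positive", "sadness": "negative", "anger": "negative"}
BOOST = 1.5  # assumed reinforcement factor when voice and text agree

def fuse(voice_emotions, text_sentiment):
    """Combine tone-of-voice emotion scores with text_sentiment for one segment.

    voice_emotions: dict emotion -> score (may be empty if no emotion detected).
    text_sentiment: "positive", "negative" or "neutral".
    Returns a list of (label, strength) pairs.
    """
    if not voice_emotions:
        # No voice emotion detected: fall back to the text sentiment.
        return [(text_sentiment, 1.0)]

    if text_sentiment == "negative":
        # Anger vs sadness: trust whichever negative emotion the voice scores higher.
        negatives = {e: s for e, s in voice_emotions.items()
                     if VOICE_TO_SENTIMENT.get(e) == "negative"}
        if negatives:
            top = max(negatives, key=negatives.get)
            return [(top, negatives[top] * BOOST)]

    top = max(voice_emotions, key=voice_emotions.get)
    if VOICE_TO_SENTIMENT.get(top) == text_sentiment:
        # Same polarity from both models: reinforce the voice emotion.
        return [(top, voice_emotions[top] * BOOST)]

    # Polarities disagree: report both signals.
    return [(top, voice_emotions[top]), (text_sentiment, 1.0)]
```

The same function would be applied per segment and then aggregated into emotions_summary and conversation.emotions_summary.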


 

TODOs QA

  1. Ariels: In the QA AI asya.ai tone of voice validation folder, wait for the AI team to add the samples, go through them, and if you disagree with any, note down the file names and send them to Mārcis and Evalds: https://drive.google.com/drive/folders/1uq0qbObdIUZ120iH2TX9vvPQSQrfLunm?usp=sharing

  2. Ariels: In the QA AI asya.ai tone of voice validation folder, add any new samples that you want included in the tests. Note that the samples must be cut into 4-second fragments.

  3. Ariels: You can follow the QA AI asya.ai tone of voice validation results here: https://docs.google.com/spreadsheets/d/1CsPprLj-jBvzClT5uaGxRlOnmaOsN0aDw62ZqgoMynk/edit#gid=0

  4. Ariels: Add your own samples to the QA AI asya.ai sentiment validation Pitch Patterns sheet - each must be a full sentence in English, not a single word

    image-20240102141540199 https://docs.google.com/spreadsheets/d/1Vc6CwSExDPmHhxJjdJYhBI_S56LSfPSfmni1Jg6yv5I/edit#gid=408018272

    Follow the results here: https://drive.google.com/drive/u/1/folders/1rFS3rYYvMR3xvt3qA6LepKmAW9w4J7c3
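
The 4-second fragmenting required for new validation samples (QA TODO 2) can be done with Python's standard-library wave module. This is a minimal sketch for WAV files only; the function name and the output naming scheme are assumptions:

```python
# Split a WAV file into fixed-length fragments for the validation folders.
import wave

SEGMENT_SECONDS = 4  # fragment length required for the validation samples

def split_wav(path, out_prefix, seconds=SEGMENT_SECONDS):
    """Split a WAV file into `seconds`-long fragments; returns the paths written.

    The last fragment may be shorter than `seconds`.
    """
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # header is patched with the real frame count on close
                dst.writeframes(frames)
            written.append(out_path)
            index += 1
    return written
```

For MP3 or other compressed formats the files would first need converting to WAV (e.g. with ffmpeg).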

 


 

Tone of Voice - Guidelines

Language-agnostic approach

  1. Every 1-2 months, extract audio samples from PitchPatterns using the highest activations or samples marked as wrong

  2. Send them for labelling (10% held out as validation samples); labelling is considered finished when the validation samples are labelled correctly

  3. Clean up the datasets using the highest activations (manually remove outliers)

  4. Retrain the model (hyper-parameter search, data augmentation)

  5. Test on the QA AI asya.ai tone of voice validation samples and register the results: https://drive.google.com/drive/folders/1uq0qbObdIUZ120iH2TX9vvPQSQrfLunm?usp=sharing

  6. If the results are good, deploy to the API
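
The per-threshold testing in step 5 (and the threshold experiments in AI TODO 4) can be sketched like this, assuming the model outputs per-class probabilities for each 4-second segment. Function names and the accuracy metric are assumptions; the real experiments go into a separate SHEET per run:

```python
# Sketch of a threshold sweep for the final-class decision (non-"other").

def classify(probs, threshold):
    """Return the top non-"other" class if it clears the threshold, else "other"."""
    label = max((c for c in probs if c != "other"), key=probs.get, default="other")
    if label != "other" and probs[label] >= threshold:
        return label
    return "other"

def sweep(samples, thresholds):
    """samples: list of (probs, true_label) pairs; returns accuracy per threshold."""
    results = {}
    for t in thresholds:
        correct = sum(classify(p, t) == y for p, y in samples)
        results[t] = correct / len(samples)
    return results
```

Each sweep run over the zipped validation samples then fills one row per threshold in the results sheet.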

 


Text Sentiment - Guidelines

An English-only model for text sentiment, text intent, etc.

We need the highest-quality EN-LV and EN-LT translation models.

  1. Start by using multiple English Sentiment models

  2. Every 1-2 months, extract audio samples from PitchPatterns using the highest activations in ENGLISH or samples marked as wrong

  3. Send them for labelling (10% held out as validation samples); labelling is considered finished when the validation samples are labelled correctly

  4. Clean up the datasets using the highest activations (manually remove outliers)

  5. Retrain the model (hyper-parameter search, data augmentation). Use it in combination with other pre-trained models to get the highest accuracy

  6. Test on the QA AI asya.ai sentiment validation samples and register the results: https://drive.google.com/drive/u/1/folders/1rFS3rYYvMR3xvt3qA6LepKmAW9w4J7c3

  7. If the results are good, deploy to the API
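
Combining several pre-trained sentiment models (step 5) could be as simple as majority voting; a minimal sketch, where the voting scheme is an assumption and the real combination method should come out of the model-comparison spreadsheet:

```python
# Sketch: combine per-model sentiment labels by majority vote.
from collections import Counter

def ensemble_sentiment(predictions):
    """predictions: one label per model, in a fixed model order.

    Returns the majority label; on a tie, the earliest-seen label wins
    (Counter preserves first-insertion order among equal counts).
    """
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]
```

A weighted vote (e.g. by each model's validation accuracy) would be the natural next refinement.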


 

Tone of Voice - Current results and data

Here is some of our research related to the domain of emotion and communication tracking:

Some of the publications about emotions:

  1. http://share.yellowrobot.xyz/quick/2023-7-24-9C824AE9-106E-47DF-8ADB-7DD1E107C16B.pdf

  2. http://share.yellowrobot.xyz/quick/2023-7-24-1719D4FF-D5B1-4236-BA73-E629EF385E3F.pdf

 

Our model results:

  1. http://share.yellowrobot.xyz/quick/2023-7-24-A5A1BFA6-498A-46E7-B32D-5AAEF7B33AF2.html

    Emotion classification (Happiness, Anger, Sadness, Other): 62.1% image-20240102135023692

  2. http://share.yellowrobot.xyz/quick/2023-7-24-4B524F37-BAC2-48E4-8963-A01E2B5E7012.html

    Laughter classification: 73% image-20240102135121038

  3. http://share.yellowrobot.xyz/quick/2023-7-24-A2443B1A-94A4-4CA0-975C-1E682911EE1B.html Total dataset (before the new dataset arriving in January 2024), 4-second samples image-20240102135233595

  4. https://www.notion.so/evalds/Laughter-models-and-data-statistics-38eac0dc55944d439402d3e9254610f7?pvs=4 Laughter model, 27k samples image-20240102142349678

  5. http://share.yellowrobot.xyz/quick/2023-7-24-66F550A4-FB3F-41F7-95A2-CD26D8407928.html LV STT dataset (January 2024): 144h + 100h

 


 

Text Sentiment - Current results and data

Pre-trained models tested so far: https://www.notion.so/evalds/2023-03-23-Betija-API-Task-text_sentiment-5b7d18e5bb5b468ca3af2e1d4770b9a8?pvs=4

image-20240102143019596