🔴 Mārcis: Top priority: implement in the API that every segment, emotions_summary, and conversation.emotions_summary takes text_sentiment into account. For example: if there is no emotion, take the value from text_sentiment; if it is the same emotion, reinforce it; if they differ, show both (happiness = positive; sadness, anger = negative; to decide between anger and sadness, use whichever of the two is stronger in the tone of voice).
Without detailed testing, add one more pre-trained English sentiment model. Previous results: https://www.notion.so/evalds/2023-03-23-Betija-API-Task-text_sentiment-5b7d18e5bb5b468ca3af2e1d4770b9a8?pvs=4
When done, deploy it and notify Ariels, Pauls, and Ēvalds.
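The fusion rule above could be sketched roughly as follows. This is only an illustration under assumptions: the input shape (a dict of voice emotion scores plus a text_sentiment string of "positive", "negative", or "neutral"), the `min_score` cut-off, and the function name `fuse_emotions` are all hypothetical, and "reinforce" is represented only as a flag because the notes do not say how the score should change.

```python
from typing import Dict, List, Optional

EMOTIONS = ("happiness", "anger", "sadness")
NEGATIVE_EMOTIONS = ("anger", "sadness")


def fuse_emotions(voice: Dict[str, float], text_sentiment: str,
                  min_score: float = 0.5) -> Dict[str, object]:
    """Combine tone-of-voice emotion scores with text_sentiment.

    Assumptions (not from the notes): `voice` maps emotion name to a score,
    and a voice emotion only counts if its score is at least `min_score`.
    """
    scores = {e: voice.get(e, 0.0) for e in EMOTIONS}

    # Strongest voice emotion, or None when nothing passes the cut-off.
    top = max(EMOTIONS, key=lambda e: scores[e])
    voice_emotion: Optional[str] = top if scores[top] >= min_score else None

    # Map text sentiment to an emotion: positive -> happiness,
    # negative -> anger or sadness, whichever is stronger in the voice
    # (a tie falls back to anger, the first item in NEGATIVE_EMOTIONS).
    if text_sentiment == "positive":
        text_emotion: Optional[str] = "happiness"
    elif text_sentiment == "negative":
        text_emotion = max(NEGATIVE_EMOTIONS, key=lambda e: scores[e])
    else:
        text_emotion = None

    emotions: List[str] = []
    reinforced = False
    if voice_emotion is None:
        # No voice emotion: take the value from text_sentiment.
        if text_emotion is not None:
            emotions = [text_emotion]
    elif text_emotion is None or text_emotion == voice_emotion:
        # Same emotion from both sources: keep it and mark it reinforced.
        emotions = [voice_emotion]
        reinforced = text_emotion == voice_emotion
    else:
        # Sources disagree: show both.
        emotions = [voice_emotion, text_emotion]
    return {"emotions": emotions, "reinforced": reinforced}
```

The same function could be applied per segment and then again on aggregated scores for emotions_summary and conversation.emotions_summary; how the aggregation is done is left open here.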
Mārcis: Next priority: please add LV-EN and EN-LV translation over websocket to http://api.gramatins.lv (I need this for Eldigen in the coming days), and please rearrange the GPUs so that we can run grāmatiņš on GPUs. Also, please ask Pauls to map the ports correctly so that asya, on which http://api.gramatins.lv is installed, can be opened from the web.
Mārcis (with Reinis's support): Put the tone of voice validation samples we have used so far into the subfolders of QA AI asya.ai tone of voice validation
https://drive.google.com/drive/folders/1uq0qbObdIUZ120iH2TX9vvPQSQrfLunm?usp=sharing
Mārcis (with Reinis's support): Run inference with the existing emotion model on the validation samples in the QA AI asya.ai tone of voice validation subfolders and fill in the table. Test several threshold levels for classifying the final class (non-other). Fill in a separate sheet for each experiment so that the samples can be analyzed. Each time experiments are run, zip the current samples and add the ZIP to the folder so that, if needed, we can repeat the experiments with exactly the same samples.
https://docs.google.com/spreadsheets/d/1CsPprLj-jBvzClT5uaGxRlOnmaOsN0aDw62ZqgoMynk/edit#gid=0
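A threshold sweep for the final (non-other) class might look like the sketch below. It assumes, hypothetically, that the model returns one score per emotion for each sample, that a sample falls back to "other" when its best score is below the threshold, and that plain accuracy is the metric of interest; none of these details are fixed by the notes.

```python
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[Dict[str, float], str]  # (emotion scores, true label)


def classify(scores: Dict[str, float], threshold: float) -> str:
    """Return the best-scoring emotion, or "other" if it is below threshold."""
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score >= threshold else "other"


def sweep_thresholds(samples: List[Sample],
                     thresholds: Iterable[float]) -> Dict[float, float]:
    """Accuracy on the validation samples for each candidate threshold."""
    results = {}
    for threshold in thresholds:
        correct = sum(classify(scores, threshold) == label
                      for scores, label in samples)
        results[threshold] = correct / len(samples)
    return results


# Made-up scores, only to show the call shape.
validation = [
    ({"happiness": 0.90, "anger": 0.05, "sadness": 0.05}, "happiness"),
    ({"happiness": 0.40, "anger": 0.35, "sadness": 0.25}, "other"),
    ({"happiness": 0.10, "anger": 0.60, "sadness": 0.30}, "anger"),
]
accuracy_by_threshold = sweep_thresholds(validation, [0.3, 0.5, 0.95])
```

Each key of `accuracy_by_threshold` would correspond to one experiment sheet.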
Mārcis: Find a reliable English sentiment validation dataset (one that is not included in the existing models); Amir can do this. Add samples (preferably from a dataset that has Happiness, Anger, Sadness, Other). Ariels will add the PitchPatterns sheet validation samples.
https://docs.google.com/spreadsheets/d/1Vc6CwSExDPmHhxJjdJYhBI_S56LSfPSfmni1Jg6yv5I/edit#gid=408018272
Mārcis: Find several English sentiment models and compare them on the validation samples. Fill in the results: https://docs.google.com/spreadsheets/d/19nXnGoLyLgbeIqS1SwhYSNyYGXdSBN7uPnhuwHp8eBw/edit#gid=0 Previous tests: https://www.notion.so/evalds/2023-03-23-Betija-API-Task-text_sentiment-5b7d18e5bb5b468ca3af2e1d4770b9a8?pvs=4
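One possible harness for the comparison is sketched below. Each model is treated as a plain function from text to a label; `keyword_model` and `always_positive` are hypothetical stand-ins, not real pre-trained models, and accuracy is an assumed metric.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]  # text -> predicted sentiment label


def compare_models(models: Dict[str, Model],
                   samples: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Return (model name, accuracy) pairs, best model first."""
    scores = []
    for name, predict in models.items():
        correct = sum(predict(text) == label for text, label in samples)
        scores.append((name, correct / len(samples)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical stand-in models, only here to make the sketch runnable.
def keyword_model(text: str) -> str:
    lowered = text.lower()
    if "love" in lowered or "great" in lowered:
        return "positive"
    if "hate" in lowered or "terrible" in lowered:
        return "negative"
    return "neutral"


def always_positive(text: str) -> str:
    return "positive"


validation_samples = [
    ("I love this product", "positive"),
    ("This is terrible", "negative"),
    ("The meeting is at noon", "neutral"),
]
ranking = compare_models(
    {"keyword_model": keyword_model, "always_positive": always_positive},
    validation_samples,
)
```

Wrapping a real pre-trained model would only require a function with the same text-to-label shape.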
Mārcis: Start collecting an LT-EN dataset from public data; if there is too little, use GPT4 mining + labelling.
Ariels: In the QA AI asya.ai tone of voice validation folder, wait until the AI team has added the samples, then go through them; if you disagree with any, note the file names and send them to Mārcis and Ēvalds.
https://drive.google.com/drive/folders/1uq0qbObdIUZ120iH2TX9vvPQSQrfLunm?usp=sharing
Ariels: In the QA AI asya.ai tone of voice validation folder, add any new samples you want us to include in the tests. Note that the samples must be cut into 4-second fragments.
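Cutting a sample into 4-second fragments can be done with the Python standard library alone, as in the sketch below. It assumes uncompressed WAV input; the output naming scheme and the decision to drop a final fragment shorter than 4 seconds are assumptions, not something the notes specify.

```python
import wave
from pathlib import Path
from typing import List


def split_wav(src: str, out_dir: str, seconds: float = 4.0) -> List[Path]:
    """Split a WAV file into consecutive fragments of `seconds` length.

    Assumption: a trailing fragment shorter than `seconds` is discarded.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = int(reader.getframerate() * seconds)
        bytes_per_chunk = frames_per_chunk * params.sampwidth * params.nchannels
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if len(frames) < bytes_per_chunk:
                break  # end of file or a short tail
            target = out / f"{Path(src).stem}_{index:04d}.wav"
            with wave.open(str(target), "wb") as writer:
                # Same channels, sample width, and rate as the source;
                # the frame count in the header is corrected on close.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(target)
            index += 1
    return written
```

For example, `split_wav("call.wav", "fragments")` would write `fragments/call_0000.wav`, `fragments/call_0001.wav`, and so on (the file names here are illustrative).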
Ariels: You can follow the QA AI asya.ai tone of voice validation results here: https://docs.google.com/spreadsheets/d/1CsPprLj-jBvzClT5uaGxRlOnmaOsN0aDw62ZqgoMynk/edit#gid=0
Ariels: Add your samples to the Pitch Patterns sheet in QA AI asya.ai sentiment validation. Each sample must be a whole sentence in English, not a single word.
Follow the results: https://drive.google.com/drive/u/1/folders/1rFS3rYYvMR3xvt3qA6LepKmAW9w4J7c3
Language agnostic approach
Once every 1-2 months, extract audio samples from PitchPatterns, picking those with the highest activations or those marked as wrong
Send them for labelling (with 10% validation samples mixed in); labelling is considered finished when the validation samples are labelled correctly
Clean up the datasets using the highest activations (remove outliers manually)
Retrain model (hyper-param search, data augmentation)
Test on the QA AI asya.ai tone of voice validation samples and register the results: https://drive.google.com/drive/folders/1uq0qbObdIUZ120iH2TX9vvPQSQrfLunm?usp=sharing
If the results are good, deploy to the API
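The sample selection and labelling check in the steps above could be sketched as follows. The data shapes, the function names, the reading of "10%" as relative to the number of new samples, and the requirement that all control samples match are assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Set, Tuple


def select_candidates(activations: Dict[str, float], marked_wrong: Set[str],
                      top_k: int) -> List[str]:
    """Sample ids with the highest activations, plus all marked as wrong."""
    top = sorted(activations, key=lambda s: activations[s], reverse=True)[:top_k]
    return sorted(set(top) | marked_wrong)


def build_labelling_batch(candidates: List[str], control: Dict[str, str],
                          control_ratio: float = 0.10,
                          seed: int = 0) -> Tuple[List[str], List[str]]:
    """Mix control samples with known labels into the batch for labellers."""
    rng = random.Random(seed)
    n_control = min(len(control), math.ceil(len(candidates) * control_ratio))
    chosen = rng.sample(sorted(control), n_control)
    batch = list(candidates) + chosen
    rng.shuffle(batch)  # labellers should not see which samples are controls
    return batch, chosen


def batch_accepted(returned: Dict[str, str], control: Dict[str, str],
                   chosen: List[str], min_agreement: float = 1.0) -> bool:
    """Labelling counts as finished when control samples are labelled correctly."""
    if not chosen:
        return False
    hits = sum(returned.get(sample) == control[sample] for sample in chosen)
    return hits / len(chosen) >= min_agreement
```

A batch that fails `batch_accepted` would go back for relabelling before the dataset cleanup and retraining steps.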
English-only model for text sentiment, text intent, etc.
This requires the highest-quality EN-LV and EN-LT translation models.
Start by using multiple English Sentiment models
Once every 1-2 months, extract audio samples in ENGLISH from PitchPatterns, picking those with the highest activations or those marked as wrong
Send them for labelling (with 10% validation samples mixed in); labelling is considered finished when the validation samples are labelled correctly
Clean up the datasets using the highest activations (remove outliers manually)
Retrain model (hyper-param search, data augmentation). Use it in combination with other pre-trained models to get the highest accuracy
Test on the QA AI asya.ai sentiment validation samples and register the results: https://drive.google.com/drive/u/1/folders/1rFS3rYYvMR3xvt3qA6LepKmAW9w4J7c3
If the results are good, deploy to the API
Here is some of our research related to emotion and communication tracking:
Some of the publications about emotions:
http://share.yellowrobot.xyz/quick/2023-7-24-9C824AE9-106E-47DF-8ADB-7DD1E107C16B.pdf
http://share.yellowrobot.xyz/quick/2023-7-24-1719D4FF-D5B1-4236-BA73-E629EF385E3F.pdf
Our model results:
http://share.yellowrobot.xyz/quick/2023-7-24-A5A1BFA6-498A-46E7-B32D-5AAEF7B33AF2.html
Emotions classification: Happiness, Anger, Sadness, Other: 62.1%
http://share.yellowrobot.xyz/quick/2023-7-24-4B524F37-BAC2-48E4-8963-A01E2B5E7012.html
Laughter classification: 73%
http://share.yellowrobot.xyz/quick/2023-7-24-A2443B1A-94A4-4CA0-975C-1E682911EE1B.html
Total dataset (before the new dataset arriving in January 2024), 4-second samples
https://www.notion.so/evalds/Laughter-models-and-data-statistics-38eac0dc55944d439402d3e9254610f7?pvs=4
Laughter model: 27k samples
http://share.yellowrobot.xyz/quick/2023-7-24-66F550A4-FB3F-41F7-95A2-CD26D8407928.html
LV STT dataset (January 2024): 144h + 100h
Pre-trained models tested so far: https://www.notion.so/evalds/2023-03-23-Betija-API-Task-text_sentiment-5b7d18e5bb5b468ca3af2e1d4770b9a8?pvs=4