Direction:
A list of studies that use prompt-based answer evaluation (chain-of-thought, self-reflection) vs. sampling + a reward model
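The "sampling + reward model" side of the comparison can be sketched as best-of-N selection. This is a minimal toy sketch: `generate` and `reward` are hypothetical stand-ins for an LLM sampler and a learned scalar reward model, not real APIs.

```python
import random

def generate(prompt, n, rng):
    # placeholder sampler: stands in for n temperature-sampled LLM completions
    return [f"{prompt} candidate {rng.randint(0, 9999)}" for _ in range(n)]

def reward(response):
    # placeholder reward model: stands in for a learned scalar scorer
    return sum(int(ch) for ch in response if ch.isdigit())

def best_of_n(prompt, n=8, seed=0):
    # sampling + reward model: draw n candidates, keep the highest-scoring one
    rng = random.Random(seed)
    candidates = generate(prompt, n, rng)
    return max(candidates, key=reward)
```

The prompt-based alternative replaces `reward` with the model critiquing its own answer (self-reflection), which is exactly the contrast the studies above should quantify.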
TODO:
Compile work on prompt-based refinement of responses, so we can compare it concretely with our method
Compile the datasets and results
Try to find another model from the data:
https://huggingface.co/datasets?other=human-feedback
https://huggingface.co/datasets/Anthropic/hh-rlhf
https://huggingface.co/datasets/Dahoas/full-hh-rlhf
https://huggingface.co/datasets/nvidia/HelpSteer2
TODO:
Find specific studies and datasets that compare performance on such tasks, so we can test our own models (we also need the datasets)
LLM to Prolog
https://arxiv.org/pdf/2405.17893
https://huggingface.co/datasets/Thomas-X-Yang/gsm8k-prolog?row=6
LLM to SQL https://github.com/defog-ai/sqlcoder/
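For the LLM-to-SQL direction, model-generated queries can be scored by simply executing them against a scratch database; the sketch below uses Python's stdlib `sqlite3` with an in-memory DB (the function name and setup-statement interface are my own, for illustration).

```python
import sqlite3

def execute_candidate_sql(sql, setup_statements):
    """Run model-generated SQL in a scratch in-memory DB.

    Returns the fetched rows, or None if the SQL fails to execute,
    which can serve as a cheap binary signal for ranking candidates.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for stmt in setup_statements:
            conn.execute(stmt)  # build the schema/fixture the query expects
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    finally:
        conn.close()
```

Comparing the returned rows against gold rows (execution accuracy) is the usual metric in text-to-SQL benchmarks.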
What could we do to improve the result? Named-entity recognition:
https://huggingface.co/dslim/bert-base-NER
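One way NER could feed into scoring: check whether entities mentioned in the prompt survive into the response. The sketch below is a crude stdlib stand-in (capitalized-token regex instead of `dslim/bert-base-NER`); function names are hypothetical.

```python
import re

def naive_entities(text):
    # crude stand-in for a real NER model: treat capitalized tokens as entities
    return set(re.findall(r"\b[A-Z][a-z]+\b", text))

def entity_overlap_score(prompt, response):
    # fraction of prompt "entities" that reappear in the response
    ents = naive_entities(prompt)
    if not ents:
        return 1.0  # nothing to preserve
    return len(ents & naive_entities(response)) / len(ents)
```

In a real pipeline the regex would be replaced by the BERT NER tagger, and the overlap score could be combined with the reward model's scalar.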
Llama Guard is an LLM-based safeguard model that can classify the safety risks in LLM prompts and responses. It demonstrates strong performance on existing benchmarks like the OpenAI Moderation Evaluation dataset and ToxicChat, matching or exceeding current content moderation tools
https://dl.acm.org/doi/abs/10.1145/3640544.3645216
https://github.com/gabrielmittag/NISQA
https://www.youtube.com/@YannicKilcher
Anthropic's HH-RLHF
https://github.com/RLHFlow/RLHF-Reward-Modeling
RRHF (Rank Responses to Align Language Models with Human Feedback)
https://arxiv.org/html/2312.07592v1
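RRHF's core idea is a pairwise ranking loss over the model's (length-normalized) log-probabilities of candidate responses, ordered by their reward scores. A minimal sketch of that loss, assuming `logprobs` and `rewards` are parallel lists per candidate:

```python
def rrhf_rank_loss(logprobs, rewards):
    # RRHF-style sketch: hinge penalty whenever a higher-reward response
    # gets a lower model log-probability than a lower-reward one
    loss = 0.0
    for i in range(len(rewards)):
        for j in range(len(rewards)):
            if rewards[i] > rewards[j]:
                loss += max(0.0, logprobs[j] - logprobs[i])
    return loss
```

In the paper this ranking term is combined with a supervised fine-tuning loss on the best-ranked response; only the ranking term is sketched here.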
RAIN (Rewindable Auto-regressive INference) is another inference method that allows pre-trained LLMs to self-evaluate their own generation and use the evaluation to guide generation rewinding for improved AI safety
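The control flow behind RAIN can be caricatured as "generate, self-evaluate, rewind on failure". This toy sketch only captures that loop at the whole-response level; the actual method rewinds at the token level with a search procedure, and both callback names here are hypothetical.

```python
def rewind_generate(generate, self_evaluate, max_rewinds=3):
    """Sketch of rewindable inference: regenerate until self-evaluation passes."""
    response = None
    for attempt in range(max_rewinds + 1):
        response = generate(attempt)      # hypothetical LLM call
        if self_evaluate(response):       # hypothetical self-judgment
            return response
    return response  # fall back to the last attempt if all candidates fail
```

Usage: `rewind_generate(sample_fn, safety_check_fn)` returns the first response the model judges acceptable.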
ToRA (Gou et al., 2023) or RFT (Yuan et al., 2023)