Chatbot Arena in a nutshell

Disclaimer: All of the following content is public information.

Academic Evals are Cracked

It all starts with the question: how do you evaluate the quality of something that's as strong as GPT-4 (or Gemini)? Before the era of large language models, researchers spent a great deal of time constructing evaluation benchmarks to measure progress in model capabilities. I'd argue that good benchmarks are what drive progress in the field of NLP, and claiming the lead on a benchmark usually comes with fame and fortune, driving researchers and companies to compete with each other to create better models....

August 7, 2024 · 11 min · Weilun Chen

From RLHF to Direct Preference Learning

It's well known that state-of-the-art LLMs are trained with massive amounts of human preference feedback. This feedback either comes from a large rater pool or from end users, implicitly and sometimes explicitly (remember when ChatGPT presented you with two responses to choose from?). However, I found that there are many subtleties to truly understanding what's going on under the hood. This is a blog post capturing my understanding of the RLHF algorithm, and how it evolved into reward-model-free methods such as DPO and IPO....
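A quick pointer on what "direct preference learning" refers to here: methods like DPO skip the explicit reward model and optimize the policy directly on preference pairs. As a sketch, this is the standard DPO objective from Rafailov et al. (2023), not an excerpt from the post itself:

```latex
% DPO objective: given preference pairs (x, y_w, y_l) where y_w is preferred
% over y_l, a frozen reference policy \pi_ref, and temperature \beta, minimize
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[
      \log\sigma\!\left(
        \beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
        -\beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}
      \right)
    \right]
```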

July 4, 2024 · 11 min · Weilun Chen