Team of Rivals

I cannot precisely recall what drew me to this tome. Perhaps it was my enduring passion for history, or maybe I was simply trying to understand why Lincoln is consistently ranked as the greatest U.S. president, surpassing even Washington and Roosevelt in historical estimations. Or perhaps I simply yearned to comprehend what kind of character could lead a nation through its most tumultuous period in a way that has been so universally acclaimed....

November 25, 2024 · 5 min · Weilun Chen

Chatbot Arena in a nutshell

Disclaimer: All of the following content is public information. Academic Evals are Cracked It all starts with the question: how do you evaluate the quality of something as strong as GPT-4 (or Gemini)? Before the era of large language models, researchers spent a great deal of time constructing evaluation benchmarks to track progress in model capability. I’d argue that a good benchmark is what drives progress in the field of NLP, and claiming the lead on a benchmark usually comes with fame and fortune, driving researchers and companies to compete with each other to create a better model....

August 7, 2024 · 11 min · Weilun Chen

From RLHF to Direct Preference Learning

It’s well known that state-of-the-art LLMs are trained with massive amounts of human feedback on response quality. This feedback comes either from a large rater pool or from end users, usually implicitly and sometimes explicitly (remember when ChatGPT presented you with 2 responses to choose from?). However, I found that there are many subtleties one must grasp to truly understand what’s going on under the hood. This is a blog post to capture my understanding of the RLHF algorithm, and how it evolves into methods without an explicit reward model, such as DPO and IPO....

July 4, 2024 · 11 min · Weilun Chen

Some lessons I learned the hard way

The last 6 months have been a challenging period in my life, prompting considerable reading and self-reflection. Now that things are finally getting back on track, I’d like to share some valuable lessons that helped me. If you find yourself sitting in front of the computer, bored enough to open this article, you might find these lessons helpful as well. Embrace Longtermism The concept of longtermism embodies various layers of interpretation....

May 31, 2024 · 5 min · Weilun Chen