Spring EconCS 2025 Seminars

Date and Time

February 7, 2025
01:30PM - 02:30PM EST

Location

SEC LL 2.221


Time & Location: Friday, February 7, 1:30pm - 2:30pm at SEC LL 2.221

Speaker 1: André Cruz (PhD student at MPI and currently visiting Rediet Abebe at Harvard)

Title: Evaluating language models as risk scorers

Abstract: Current LLM benchmarks predominantly focus on accuracy in realizable (factual) tasks. Such benchmarks necessarily fail to evaluate LLMs’ ability to quantify ground-truth outcome uncertainty. In this work, we leverage US Census data to evaluate LLMs’ ability to generate meaningful real-world distributions. We introduce folktexts, a Python package to standardize the evaluation of uncertainty, calibration, and fairness of LLMs on real-world tabular data tasks. We find that predictive risk scores produced by state-of-the-art LLMs have high predictive signal but are wildly miscalibrated. Our evaluation reveals a general inability of instruction-tuned LLMs to express data uncertainty in multiple-choice Q&A, exhibiting strong overconfidence bias across a variety of benchmark tasks. These differences in ability to quantify data uncertainty cannot be revealed in realizable settings, and highlight a blind spot in the current evaluation ecosystem that folktexts covers.
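
For readers unfamiliar with the distinction between predictive signal and calibration, here is a minimal sketch of expected calibration error (ECE) in plain Python. It is only an illustration and does not use the folktexts API; the function name, the binning scheme, and the toy scores and labels are all assumptions made for this example. Both score lists rank individuals identically (same predictive signal), but only one matches the observed outcome rates.

```python
def expected_calibration_error(scores, labels, n_bins=10):
    """Bin predictions by score, then average |mean score - observed positive rate|
    over bins, weighting each bin by its share of the samples."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((s, y))
    n = len(scores)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_score = sum(s for s, _ in b) / len(b)
        positive_rate = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(mean_score - positive_rate)
    return ece


# Hypothetical toy data: 3 of the first 4 outcomes are positive, 1 of the last 4.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
calibrated = [0.75, 0.75, 0.75, 0.75, 0.25, 0.25, 0.25, 0.25]
overconfident = [0.99, 0.99, 0.99, 0.99, 0.01, 0.01, 0.01, 0.01]

ece_calibrated = expected_calibration_error(calibrated, labels)
ece_overconfident = expected_calibration_error(overconfident, labels)
print(ece_calibrated, ece_overconfident)
```

The overconfident scores push every prediction toward 0 or 1 even though the outcomes are genuinely uncertain, so their ECE is far larger than that of the calibrated scores.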

Speaker 2: Ben Schiffer (PhD student in the Harvard Statistics Department)

Title: Clone-Robust AI Alignment

Abstract: A key challenge in training Large Language Models (LLMs) is properly aligning them with human preferences. Reinforcement Learning with Human Feedback (RLHF) uses pairwise comparisons from human annotators to train reward functions and has emerged as a popular alignment method. However, input datasets in RLHF can be unbalanced due to adversarial manipulation or inadvertent repetition. Therefore, we want RLHF algorithms to perform well even when the set of alternatives is not uniformly distributed. Drawing on insights from social choice theory, we introduce robustness to approximate clones, a desirable property of RLHF algorithms which requires that adversarially adding near-duplicate alternatives does not significantly change the learned reward function. We first demonstrate that the standard RLHF algorithm based on regularized maximum likelihood estimation (MLE) fails to satisfy this property. We then propose the weighted MLE, a new RLHF algorithm that modifies the standard regularized MLE by weighting alternatives based on their similarity to other alternatives. This new algorithm guarantees robustness to approximate clones while preserving desirable theoretical properties. Joint work with Ariel Procaccia and Shirley Zhang.
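
To give a rough intuition for weighting alternatives by similarity, the sketch below down-weights each alternative by the number of near-duplicates around it. This is not the weighting scheme from the talk or the paper; the function `similarity_weights`, the Euclidean distance on toy 2-D embeddings, and the `radius` threshold are all hypothetical choices made for illustration.

```python
import math


def similarity_weights(points, radius):
    """Hypothetical weighting: each alternative gets 1 / (number of alternatives
    within `radius` of it, counting itself), so a cluster of clones shares
    roughly the weight of a single alternative."""
    weights = []
    for p in points:
        neighbors = sum(1 for q in points if math.dist(p, q) <= radius)
        weights.append(1.0 / neighbors)
    return weights


# Toy embeddings (assumed): three near-duplicate alternatives and one distinct one.
alternatives = [(0.0, 0.0), (0.01, 0.0), (0.0, 0.01), (1.0, 1.0)]
weights = similarity_weights(alternatives, radius=0.05)
print(weights)
```

Under this toy scheme, adding more near-duplicates of the first alternative leaves the total weight of that cluster close to 1, which is the kind of behavior a clone-robust method aims for.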

As a reminder, you can find the up-to-date schedule for upcoming talks on this Google Sheet (requires Harvard login) or on the BEACH calendar (publicly available).