Metritocracy: Representative Metrics for Lite Benchmarks & Learning to Coordinate Bidders in Non-Truthful Auctions

Date and Time

September 26, 2025
01:30PM - 02:30PM EDT

Location

SEC LL2.221

Speaker 1: Shirley Zhang (Harvard University)

Title: Metritocracy: Representative Metrics for Lite Benchmarks

Abstract: A common problem in LLM evaluation is how to choose a subset of metrics from a full suite of possible metrics. Subset selection is usually done for efficiency or interpretability reasons, and the goal is often to select a “representative” subset of metrics. However, “representative” is rarely clearly defined. In this work, we use ideas from social choice theory to formalize two notions of representation for the selection of a subset of evaluation metrics. We first introduce positional representation, which guarantees every alternative is sufficiently represented at every position cutoff. We then introduce positional proportionality, which guarantees no alternative is proportionally over- or under-represented by more than a small error at any position. We prove upper and lower bounds on the smallest number of metrics needed to guarantee either of these properties in the worst case. We also study a generalized form of each property that allows for additional input on groups of metrics that must be represented. Finally, we tie theory to practice through real-world case studies on both LLM evaluation and hospital quality evaluation.

Speaker 2: Tao Lin (MSR, Harvard University)

Title: Learning to Coordinate Bidders in Non-Truthful Auctions

Abstract: In non-truthful auctions such as first-price and all-pay auctions, the independent strategic behaviors of bidders, with the corresponding equilibrium notion -- Bayes Nash equilibria -- are notoriously difficult to characterize and can cause undesirable outcomes. An alternative approach to designing better auction systems is to coordinate the bidders: let a mediator make incentive-compatible recommendations of correlated bidding strategies to the bidders, namely, implementing a Bayes correlated equilibrium (BCE). The implementation of BCE, however, requires knowledge of the distribution of bidders' private valuations, which is often unavailable. We initiate the study of the sample complexity of learning Bayes correlated equilibria in non-truthful auctions. We prove that the BCEs of a large class of non-truthful auctions, including first-price and all-pay auctions, can be learned with a polynomial number, O(n / \epsilon^2), of samples from the bidders' value distributions. This moderate number of samples supports the possibility of learning to coordinate bidders in auctions. Our technique is a reduction to the problem of estimating bidders' expected utility from samples, combined with an analysis of the pseudo-dimension of the class of all monotone bidding strategies of bidders. This is joint work with Hu Fu: https://arxiv.org/abs/2507.02801