BEGIN:VCALENDAR
VERSION:2.0
X-WR-CALNAME;VALUE=TEXT:(1) Distribution over rewards for pluralistic AI alignment & (2) Data Reliability Scoring
PRODID:-//Harvard events data//EN
BEGIN:VEVENT
UID:event_1902716_0
SUMMARY:(1) Distribution over rewards for pluralistic AI alignment & (2) Data Reliability Scoring
DESCRIPTION:<p><strong>Time &amp; Location:</strong>&nbsp;Friday, November 21, 1:30–2:30pm at SEC LL2.221</p><p><strong>Speaker 1: Itai Shapira</strong> (Harvard University)</p><p><strong>Title 1:</strong>&nbsp;Distribution over rewards for pluralistic AI alignment</p><p><strong>Abstract 1:</strong>&nbsp;Current alignment pipelines based on human feedback typically learn a single scalar reward that reflects a presumed universal notion of desirable behavior. Yet human preferences often diverge across users, contexts, and cultures, so disagreement in the feedback collapses into a majority signal, minority perspectives are discounted, and downstream policy optimization amplifies this preference collapse. This talk treats reward learning explicitly as preference aggregation and shows that broad classes of loss-based reward learning rules, including Bradley–Terry-style objectives, <a href="https://arxiv.org/abs/2405.14758">cannot satisfy basic social choice axioms</a> once rewards are constrained to a parametric family, revealing that the resulting single reward is often a misspecified objective.</p><p>Motivated by this impossibility, we propose reflecting diverse human preferences through a <a href="https://arxiv.org/abs/2506.06298">distribution over multiple reward functions</a>, each inducing a distinct aligned policy. The central criterion is pairwise calibration: for every pair of candidate responses, the fraction of reward functions preferring one response matches the fraction of annotators with that preference. Our results show that even a small, outlier-free ensemble can accurately represent diverse preference distributions while remaining practical to train and deploy.</p><p><strong>Speaker 2: Shi Feng</strong> (Harvard University)</p><p><strong>Title 2:</strong>&nbsp;Data Reliability Scoring</p><p><strong>Abstract 2:</strong>&nbsp;How can we assess the reliability of a dataset without access to ground truth? We introduce the problem of reliability scoring for datasets collected from potentially strategic sources. The true data are unobserved, but we see outcomes of an unknown statistical experiment that depends on them. To benchmark reliability, we define ground-truth-based orderings that capture how much reported data deviate from the truth. We then propose the Gram determinant score, which measures the volume spanned by vectors describing the empirical distribution of the observed data and experiment outcomes. We show that this score preserves several ground-truth-based reliability orderings and, uniquely up to scaling, yields the same reliability ranking of datasets regardless of the experiment, a property we term experiment agnosticism. Experiments on synthetic noise models, CIFAR-10 embeddings, and real employment data demonstrate that the Gram determinant score effectively captures data quality across diverse observation processes.</p>
LOCATION:SEC LL2.221
STATUS:CONFIRMED
DTSTART:20251121T183000Z
DTEND:20251121T193000Z
END:VEVENT
END:VCALENDAR