Reveal Planning Capability of Autoregressive Learning in Large Language Models and Computing Voting Rules with Elicited Incomplete Votes

Date: 

Friday, March 29, 2024, 1:30pm to 2:30pm

Location: 

SEC 1.413

1st Speaker:  Shi Feng (Harvard)

Title: Reveal Planning Capability of Autoregressive Learning in Large Language Models

Abstract: Search and planning are fundamental constructs of human intelligence, involved in almost every aspect of our daily lives, from completing tasks at work to organizing trips, to seeking mathematical proofs of theorems, and more. Studying the planning capabilities of large language models (LLMs) can help us understand the differences in the decision-making processes between humans and artificial intelligence. To do this, we first abstract planning as a path-finding problem in a network, analogous to real-world scenarios such as decision-making for multi-step tasks or proof planning for mathematical reasoning. We find that the transformer-based autoregressive model generally achieves high accuracy in the path-finding task. We discover that the model generates the next node on the path by learning and applying two matrices: the adjacency matrix and the reachability matrix, within its multi-layer perceptron (MLP). Our analysis shows that applying gradient descent to minimize the cross-entropy loss on the training data indeed leads to the construction of these two matrices. These findings shed light on how the internal mechanism of autoregressive learning achieves planning in networks, which may help us understand the general planning capability in other related domains.

2nd Speaker:  Daniel Halpern (Harvard)

Title: Computing Voting Rules with Elicited Incomplete Votes

Abstract: Motivated by the difficulty of specifying complete ordinal preferences over a large set of m candidates, we study voting rules that are computable by querying voters about t < m candidates. Generalizing prior works that focused on specific instances of this problem, our paper fully characterizes the set of positional scoring rules that can computed for any 1 <= t < m, which, notably, does not include plurality. We then extend this to show a similar impossibility result for single transferable vote (elimination voting). These negative results are information-theoretic and agnostic to the number of queries. Finally, for scoring rules that are computable with limited-sized queries, we give parameterized upper and lower bounds on the number of such queries a deterministic or randomized algorithm must make to determine the score-maximizing candidate. While there is no gap between our bounds for deterministic algorithms, identifying the exact query complexity for randomized algorithms is a challenging open problem, of which we solve one special case.