Tentative Program
-
Wednesday, October 8, 2025
Various activity options, including bowling, climbing, a walking campus tour, drinks at a local bar, or doing something on your own, e.g., visiting a state park or a winery.
-
Thursday, October 9, 2025
- 8:30am – 9:30am Poster Session 1, with coffee and muffins, Duffield Atrium
- 9:30am – 9:45am Walk to 571 Rhodes Hall
- 9:45am – 10:45am Scheduling and Queues
- 10:45am – 11:15am Coffee Break
- 11:15am – 12:45pm Theory Inspired by Application
- 12:45pm – 2:15pm Lunch on your own
- 2:15pm – 3:45pm Experiments/Statistics/Inference
- 3:45pm – 4:15pm Coffee Break
- 4:15pm – 5:45pm Other Cool Stuff
- 6:30pm – 9:00pm Banquet, Marriott Ballroom
-
Friday, October 10, 2025
- 8:30am – 9:30am Poster Session 2, Park Atrium, Nolan School of Hotel Administration
- 9:30am – 9:45am Walk to 571 Rhodes Hall
- 9:45am – 11:15am Continuous Optimization
- 11:15am – 11:30am Coffee Break
- 11:30am – 1:00pm AI Agents
Detailed Session Information
Talk abstracts are at the bottom of this page.
-
Scheduling and Queues
Thursday, October 9, 9:45am – 10:45am in 571 Rhodes Hall
Wentao Weng, MIT
Scheduling with Uncertain Holding Costs and its Application to Content Moderation
Wenxin Zhang, Columbia University
Distributed Load Balancing with Workload-Dependent Service Rates
-
Theory Inspired by Application
Thursday, October 9, 11:15am – 12:45pm in 571 Rhodes Hall
Yiheng Shen, Duke University
The Price of Competitive Information Disclosure
Weiyuan Li, Cornell University
Revenue Management with Calendar-Aware and Dependent Demands: Asymptotically Tight Fluid Approximations
Alexandria Schmid, MIT
A Double Decomposition Algorithm for Network Planning and Operations in Deviated Fixed-route Microtransit
-
Experiments/Statistics/Inference
Thursday, October 9, 2:15pm – 3:45pm in 571 Rhodes Hall
Chao Qin, Stanford University
Adaptive Experimentation: Bringing Real-World Challenges to Multi-Armed Bandits
Sadegh Shirani, Stanford University
Causal message-passing: An Evolution-based Approach to Experimentation Under Unknown Interference
Mengqi Lou, Georgia Tech
Computationally efficient reductions between some statistical models
-
Other Cool Stuff
Thursday, October 9, 4:15pm – 5:45pm in 571 Rhodes Hall
Roshni Sahoo, Stanford University
What would it cost to end extreme poverty?
Emily Zhang, MIT
Heterogeneous Treatment Effects in Panel Data: Insights into the Healthy Incentives Program
Andrei Graur, Stanford University
Recent advances in submodular function minimization via first-order methods
-
Continuous Optimization
Friday, October 10, 9:45am – 11:15am in 571 Rhodes Hall
Junhui Zhang, MIT
Multi-Timescale Gradient Sliding for Distributed Optimization
Tianyu Wang, Columbia University
Optimizer’s Information Criterion: A Diagnostics Framework for Data-Driven Optimization
Ruichen Jiang, University of Texas at Austin
Accelerating Nonconvex Optimization via Online Learning
-
AI Agents
Friday, October 10, 11:30am – 1:00pm in 571 Rhodes Hall
Yoav Kolumbus, Cornell University
Markets with Learning Agents
Kunhe Yang, University of California at Berkeley
Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?
Natalie Collina, University of Pennsylvania
Emergent Alignment via Competition
Poster Session One (Thursday, October 9, 8:30am – 9:30am in Duffield Atrium)
| Presenter | Affiliation | Poster Title |
| --- | --- | --- |
| Abdellah Aznag | Columbia | An Active Learning Framework for Multi-group Mean Estimation |
| Akshita Gupta | Purdue | Properties of Two-Stage Stochastic Multi-Objective Linear Programs |
| Alon Jacobson | Cornell | Model-free meta-Bayesian optimization via offline acquisition function learning |
| Ariel Goodwin | Cornell | Subgradient methods in geodesic spaces |
| Brian Liu | MIT | Foundations and Applications in Explainable AI |
| Bruce Giang | Cornell | Efficient Conditional Gradient Methods for Solving Stochastic Convex Bilevel Optimization Problems |
| Caleb Ju | Georgia Tech | Auto-explorative online reinforcement learning |
| Celeste Groux | Cornell | Modelling an NYC Electric Bike Battery Swapping Service |
| Chengyue He | Columbia | Minimum Cut Representability of Stable Matching Problems |
| Chiraag Kaushik | Georgia Tech | Precise asymptotics of reweighted least-squares algorithms for linear diagonal networks |
| Christopher En | Columbia | An Algorithm for the Assignment Game Beyond Additive Valuations |
| Cristian Palma | Cornell | Selfish Learners in Queueing Systems with Small Buffers |
| Eleanor Best | Cornell | Machine-Learning Guided Bioengineering for Custom Sensor Design |
| Filip Stojanovic | Cornell | Limit Theory for Stable Processes Driven by Dissipative Flows |
| Gary Qiurui Ma | Harvard | Pricing with tips in three-sided delivery platforms |
| George Yu | Cornell | A Tale of Two Traffics: Optimizing Tail Latency in the Light-Tailed M/G/k |
| Giannis Fikioris | Cornell | Learning in Budgeted Auctions with Spacing Objectives |
| H. Satyam Verma | CMU | Data to Dose: Efficient Synthetic Data Generation with Expert Guidance for Personalized Dosing |
| Jiaqi Wang | Cornell | Clustering of Large deviations in heavy tailed regime |
| Jiawei Ge | Princeton | MLE is All You Need for Well-Specified Covariate Shift |
| Jieqi Di | Georgia Tech | Optimal Pricing With Impatient Customers |
| Lilianna Gittoes | Cornell | When Short Horizons Go the Distance: Lookahead Policies for Economic Dispatch with Storage |
| Lin An | CMU | Near-Optimal Real-Time Personalization with Simple Transformers |
| Matthew Ford | Cornell | LLM-Guided MIPs for Multi-Objective Optimization Under Preference Uncertainty |
| Minda Zhao | Georgia Tech | Hidden Convexity in Pricing and Capacity Sizing for Queueing Systems |
| Momo Adegbindin | Cornell | Performance Variability in Mixed Integer Programming Solvers: Measurement and Prediction |
| Ning Duan | Cornell | Transit Line Planning and Optimization in Multi-Modal Transportation Networks |
| Qing Feng | Cornell | Fairness-Aware Static and Dynamic Assortment Optimization: Optimal Selection with Balanced Market Share |
| Shefali Ramakrishna | Cornell | Empirical Gittins: Scheduling in the M/G/1 from Job Size Samples |
| TaeHo Yoon | Johns Hopkins | Fixed-Point Algorithms: Exact Minimax Optimality (Poster) |
| Vrishabh Patil | CMU | Computing a Nonnegative Dyadic Solution to a System of Linear Equations |
| Yaroslav Mukhin | Cornell Economics/Statistics | Kernel von Mises Formula of the Influence Function |
| Yi Zhuang | Baruch | Dynamic Pricing of Split Stays |
| Yige Hong | CMU | A new 1/(1-rho)-scaling bound for multiserver queues via a leave-one-out technique |
| Youssef Chaabouni | MIT | The Price of Sparsity: Sufficient Conditions for Sparse Recovery using Sparse and Sparsified Measurements. |
| Yuan Shi | MIT | Incentivizing Smallholder Farmer Sustainability under Behavioral Regularities |
| Zhanhao Zhang | Cornell | Optimal Batched Scheduling of Stochastic Processing Networks Using Atomic Action Decomposition |
| Zhenghan Fang | Johns Hopkins | Beyond Scores: Proximal Diffusion Models |
| Zhongmou Chao | Cornell | Evolution-Aware Positive-Unlabeled Learning for Protein Design |
Poster Session Two (Friday, October 10, 8:30am – 9:30am, in the Park Atrium, Nolan School of Hotel Administration)
| Presenter | Affiliation | Poster Title |
| --- | --- | --- |
| Alex Terenin | Cornell | Bayesian Algorithms for Adversarial Online Learning: from Finite to Infinite Action Spaces |
| Ayeong Lee | Columbia | Importance Sampling for Latent Dirichlet Allocation |
| Brian Hu Zhang | MIT | Efficiently Solving Variational Inequalities under the Minty Condition, and Game-Theoretic Applications |
| Calvin Tolbert | Cornell | Trend-Informed Bayesian Optimization |
| Chengpiao Huang | Columbia | Uncertainty Quantification for LLM-Based Survey Simulations |
| Christopher Yeh | Caltech | Learning Decision-Focused Uncertainty Representations for Risk-Aware Optimization |
| Claire Chang | Cornell | Fairness in Paired Kidney Exchange Mechanisms |
| Dasol Yoon | Cornell | Bayesian Optimization of Composite Functions for High Dimensional Inverse Problems: Application to Electron Microscopy |
| David Wu | Cornell | The Treasury – SOFR Swap Spread Puzzle Explained |
| Dimitris Christou | UT Austin | Combinatorial Selection with Costly Information |
| Eliezer Fuentes | Cornell | Designing and Directing the Queues of the Sky |
| Federico Bobbio | Northwestern | Sharing with Frictions: Limited Transfers and Costly Inspections |
| Gabriel Morete de Azevedo | Cornell | Maximum entropy algorithms for selecting citizens’ assemblies |
| Hannane Yaghoubizade | Cornell | Fair Division Among Couples and Small Groups |
| Hui Lan | Stanford | A Meta-learner for Heterogeneous Effects in Difference-in-Differences |
| Imon Banerjee | Northwestern | Statistics on Controlled Markov chains |
| Jaewook J. Suh | Rice | PEPFlow: A Python Library for the Workflow of Systematically Analyzing Optimization Algorithms |
| Jessy Xinyi Han | MIT | Causal Survival Analysis (tentative) |
| Jiaqi Wang | Georgia Tech | D-optimal orienteering for post-earthquake reconnaissance planning |
| Laurel Newman | Cornell | Queueing Theory for Disease Modelling |
| Libin Zhu | Washington | Iteratively reweighted kernel machines efficiently learn sparse functions |
| Licong Lin | Berkeley | A statistical theory of contrastive pretraining and multimodal AI |
| Marouane Ibn Brahim | Cornell | Assortment Optimization with Visibility Constraints |
| Mohammadreza Ahmadnejadsaein | Cornell | Adaptive Two-sided Assortment Optimization: Revenue Maximization |
| Natthawut Boonsiriphatthanajaroen | Cornell | Designing a Frank-Wolfe Algorithm for Simulation Optimization over Unbounded Linearly Constrained Feasible Regions |
| Noah Weninger | Waterloo | A fast combinatorial algorithm for the bilevel knapsack problem with interdiction constraints |
| Owen Li | Cornell | A Simple and Fast Algorithm for Fair Cuts via Sherman’s Multicommodity Flow Routine |
| Poompol Buathong | Cornell | Fast Bayesian Optimization of Function Networks with Partial Evaluations |
| Qian Xie | Cornell | Cost-aware Bayesian Optimization with Adaptive Stopping via the Pandora’s Box Gittins Index |
| Qingyuan Chen | Cornell | Chatbot Journey Optimization |
| Siyu Kong | Cornell | Algorithm Design for Nonsmooth Nonconvex Optimization |
| Soonbong Lee | Yale | Who to Offer, and When: Redesigning Feeding America’s Real-Time Donation System |
| Taha Ameen | UIUC | Exact random graph matching with multiple graphs |
| Ulysse Hennebelle | Cornell | Two-Sided Assortment Optimization: Adaptivity Gaps and Algorithms |
| Xiangyin Chen | Cornell | Data-Driven Contextual Pricing with Semi-Parametric Models |
| Yuchen Lou | Northwestern | A Decomposition Framework for Nonlinear Nonconvex Two-Stage Optimization |
| Zeyu Jia | MIT | Do we need to verify step by step? rethinking process supervision from a theoretical perspective |
| Zixuan Zhang | Georgia Tech | Diffusion Model for Manifold Data: Score Decomposition, Curvature, and Statistical Complexity |
Travel
We are looking forward to seeing you at the Young Researchers Conference to be held October 8-10, 2025 at Cornell University in Ithaca, New York. Here is some information about airport options.
-
Ithaca Airport
The most convenient option is to fly into the Ithaca airport. It is only three miles from the Cornell campus and an easy taxi, Uber, or Lyft ride away. Many hotels provide shuttle service to and from the airport as well as around town.
-
Syracuse Airport
While Syracuse has many more flight options, it is a bit over an hour by Lyft or Uber from Syracuse to Ithaca. You can also rent a car at the airport.
-
Other Nearby Airports
Elmira Airport and Binghamton Airport are both less than an hour from Ithaca, but both are quite small.
-
New York City Airports
The last option would be flying into any of the major NYC airports and taking a bus to Ithaca. This is the longest option, as most buses from NYC to Ithaca take about 4.5 hours. Some bus options are:
– Cornell Campus to Campus: the most comfortable option; it leaves from the Cornell Club in midtown NYC and drops you off right on the Cornell campus in Ithaca.
– OurBus: has departures from either the George Washington Bridge or Port Authority to downtown Ithaca. From downtown it is a short cab ride to the Cornell campus.
Hotels
(Some of these are a few miles from the campus)
-
Statler Hotel
130 Statler Drive, Ithaca, NY 14853
(607) 254-2500 (the closest one to the conference venue – a 5-minute walk!)
-
The Hotel Ithaca
222 South Cayuga Street, Ithaca, New York 14850
(607) 272-1000
-
The Ithaca Marriott
120 S Aurora St, Ithaca, NY 14850
(607) 272-2222
-
Hilton Garden Inn
130 E. Seneca Street, Ithaca, New York, 14850
(607) 277-8900
-
Best Western
1020 Ellis Hollow Rd, Ithaca, NY 14850
(607) 272-6100
-
Courtyard by Marriott Ithaca Airport/University
29 Thornwood Dr, Ithaca, NY 14850
(607) 330-1000
-
Country Inn and Suites by Radisson
1100 Danby Rd #96b, Ithaca, NY 14850
(607) 252-4791
-
Hampton Inn Ithaca
337 Elmira Rd, Ithaca, NY 14850
(607) 277-5500
-
Other Options
Ithaca also has many premier bed & breakfasts, Airbnbs, and VRBOs.
If you are not giving a talk and are interested in sharing a room, there is a shared spreadsheet where you can find others in the same situation.
Speaker Information
All talks will take place in 571 Rhodes Hall. The room has built-in projectors and microphones and is Zoom capable. We hope to record your talk; please let us know if that is not possible. An in-room computer can be used to present from PDF and PowerPoint files. You also have the option of presenting from your own laptop over Zoom. Talks are 30 minutes long, including questions.
Poster Presenter Information
Poster sessions are planned for 8:30am on Thursday and Friday mornings. If you are accepted for a poster presentation, you will be allocated to one of these two sessions. We will provide posterboards and pushpins. There will be enough space for at least 36×48 inch posters, and probably for larger posters if you prefer. You can print your poster at the Mann Library; prices and hours of operation are linked here.
Slightly More Detail About the Activities
- Bowling at the Helen Newman Bowling Center
- Climbing at the Lindseth Climbing Center
- Cornell Tour starting at the Martin Y. Tang Welcome Center
- Drinks TBD
Talk Abstracts
Scheduling and Queues
Wentao Weng, MIT
Scheduling with Uncertain Holding Costs and its Application to Content Moderation
In content moderation for social media platforms, the cost of delaying the review of a piece of content is proportional to its view trajectory, which fluctuates and is a priori unknown. Motivated by such uncertain holding costs, we consider a queueing model where job states evolve based on a Markov chain with state-dependent instantaneous holding costs. We demonstrate that in the presence of such uncertain holding costs, the two canonical algorithmic principles, instantaneous-cost (the $c\mu$-rule) and expected-remaining-cost (the $c\mu/\theta$-rule), are suboptimal. By viewing each job as a Markovian ski-rental problem, we develop a new index-based algorithm, Opportunity-adjusted Remaining Cost (OaRC), that adjusts to the opportunity of serving jobs in the future when uncertainty partly resolves. We show that the regret of OaRC scales as $\tilde{O}(N \log L)$, where $L$ is the maximum length of a job’s holding cost trajectory and $N$ is the system size. This regret bound shows that OaRC achieves asymptotic optimality as the system size scales to infinity. Moreover, its regret is independent of the state-space size, which is a desirable property when job states contain contextual information. We corroborate our results with an extensive simulation study based on two holding cost patterns (online ads and user-generated content) that arise in content moderation for social media platforms. Our simulations based on synthetic and real datasets demonstrate that OaRC consistently outperforms existing practice, which is based on the two canonical algorithmic principles.
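For reference, the two canonical index rules named in the abstract, in their standard multiclass-queue form (textbook definitions; the paper’s exact model may differ), serve the class $i$ with the largest index:
$$ \text{$c\mu$-rule:}\quad c_i\,\mu_i, \qquad\qquad \text{$c\mu/\theta$-rule:}\quad \frac{c_i\,\mu_i}{\theta_i}, $$
where $c_i$ is the holding cost rate, $\mu_i$ the service rate, and $\theta_i$ the abandonment rate of class $i$. Neither index accounts for the value of waiting for a job’s holding-cost uncertainty to resolve, which is the gap the OaRC index is designed to address.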
Wenxin Zhang, Columbia University
Distributed Load Balancing with Workload-Dependent Service Rates
We study distributed load balancing in bipartite queueing systems where frontends route jobs to heterogeneous backends with workload-dependent service rates. The system’s connectivity – governed by compatibility constraints such as data residency or resource requirements – is represented by an arbitrary bipartite graph. Each frontend operates independently without communication with other frontends, and the goal is to minimize the expected average latency of all jobs. We propose a closed-loop policy called the Greatest Marginal Service Rate (GMSR) policy that achieves effective coordination without requiring knowledge of arrival rates.
In a discrete-time stochastic model, we show that the behavior of our routing policy converges (almost surely) to the behavior of a fluid model, in the limit as job sizes tend to zero and job arrival rates are scaled so that the expected total volume of jobs arriving per unit time remains fixed. Then, in the fluid regime, we demonstrate that the policy attains an $\epsilon$-suboptimal solution in $O(\delta + \log(1/\epsilon))$ time from $\delta$-suboptimal initial workloads, which implies global convergence to the centrally coordinated optimal routing. Finally, we analyze the fluid model when the system is overloaded. We show that GMSR lexicographically maximizes throughput, maximizes the number of stable backends, and minimizes their collective workload.
Theory Inspired by Application
Yiheng Shen, Duke University
The Price of Competitive Information Disclosure
In many decision-making scenarios, individuals strategically choose what information to disclose to optimize their own outcomes. It is unclear whether such strategic information disclosure can lead to good societal outcomes. To address this question, we consider a competitive Bayesian persuasion model in which multiple agents selectively disclose information about their qualities to a principal, who aims to choose the candidates with the highest qualities. Using the price-of-anarchy framework, we quantify the inefficiency of such strategic disclosure. We show that the price of anarchy is at most a constant when the agents have independent quality distributions, even if their utility functions are heterogeneous. This result provides the first theoretical guarantee on the limits of inefficiency in Bayesian persuasion with competitive information disclosure.
Weiyuan Li, Cornell University
Revenue Management with Calendar-Aware and Dependent Demands: Asymptotically Tight Fluid Approximations
Revenue management models often assume independent Poisson arrivals, but such assumptions underestimate the demand variability and neglect temporal demand correlations observed in practice. This talk develops fluid approximations for revenue management under calendar-aware and dependent demand models, which allow both high variability and structured correlations in customer arrivals. We show that the resulting fluid approximations are asymptotically tight and lead to simple, implementable policies with provable performance guarantees. We also discuss extensions to history-dependent fluid approximations, which further improve approximation quality and strengthen the guarantees. Finally, we present computational experiments demonstrating that the correct fluid approximation can significantly improve performance in practice.
Alexandria Schmid, MIT
A Double Decomposition Algorithm for Network Planning and Operations in Deviated Fixed-route Microtransit
Microtransit offers opportunities to enhance urban mobility by combining the reliability of public transit and the flexibility of ride-sharing. This paper optimizes the design and operations of a deviated fixed-route microtransit system that relies on reference lines but can deviate on demand in response to passenger requests. We formulate a Microtransit Network Design (MiND) model via two-stage stochastic integer optimization, with a first-stage network design and service scheduling structure and a second-stage vehicle routing structure. We derive a tight second-stage relaxation using a subpath-based representation of microtransit operations in a load-expanded network. We develop a double-decomposition algorithm combining Benders decomposition and subpath-based column generation. We prove that the algorithm maintains a valid optimality gap and converges to an optimal solution in a finite number of iterations. Results obtained with real-world data from Manhattan show that the methodology scales to large and otherwise-intractable instances, with up to 10-100 candidate lines and hundreds of stops. Comparisons with transit and ride-sharing suggest that microtransit can provide win-win outcomes toward efficient mobility (high demand coverage, low costs, high level of service), equitable mobility (broad geographic reach) and sustainable mobility (limited environmental footprint). We provide an open-source implementation to enable replication.
Experiments/Statistics/Inference
Chao Qin, Stanford University
Adaptive Experimentation: Bringing Real-World Challenges to Multi-Armed Bandits
Classical randomized controlled trials (RCTs) have long been the gold standard for causal inference and data-driven decision making across many fields. In recent years, adaptive experimentation has emerged as a promising alternative, offering the potential for greater efficiency and cost-effectiveness compared with conventional RCTs. Despite extensive study in communities such as multi-armed bandits (MABs), reinforcement learning, Bayesian optimization, and active learning, real-world deployment of adaptive experimentation remains challenging due to practical constraints. This talk highlights recent advances in adaptive experimentation and takes a step toward bridging the gap between theory and practice by incorporating real-world considerations into the MAB framework. In particular, it focuses on challenges such as tradeoffs between competing objectives and soft versus hard experimentation deadlines. Time permitting, the talk will also touch on nonstationary environments, delayed feedback (e.g., batch settings), and exploration tasks that extend beyond identifying a single treatment arm.
Sadegh Shirani, Stanford University
Causal message-passing: An Evolution-based Approach to Experimentation Under Unknown Interference
Randomized experiments are a cornerstone of data-driven decision making. Yet, their validity can be compromised by network interference, where a unit’s treatment affects the outcomes of others. We argue that identifying population-level causal effects does not require recovering the full network structure; instead, it suffices to characterize how network interactions contribute to the evolution of aggregate outcomes. Building on this principle, we introduce the causal message-passing framework, which investigates how outcomes evolve as the treatment propagates through the network, thereby extracting valuable information to compensate for missing network knowledge. We formalize this through state evolution equations that describe the dynamics of outcome distributions and establish a functional parallelism across treatment scenarios: while observed and counterfactual outcomes are different, both evolve under the same mathematical rules. This result yields a systematic method for counterfactual estimation and validation with theoretical consistency guarantees. To evaluate our framework, we introduce an interference gym, a suite of semisynthetic experiments spanning networks of AI agents, opinion formation in communities, and ride-sharing systems. Extensive evaluations demonstrate the robustness of our method across diverse interference structures. Finally, we delineate the theoretical limits of this approach, showing that strong temporal trends or unstable endogenous interference may undermine identification, particularly when interference patterns shift after treatment density crosses a critical threshold.
Mengqi Lou, Georgia Tech
Computationally efficient reductions between some statistical models
Can a sample from one parametric statistical model (the source) be transformed into a sample from a different model (the target) without knowing the parameters of either model? This question has been posed in various forms since the 1950s, leading to the development of a beautiful asymptotic theory concerning the equivalence of experiments in the latter half of the 20th century. Motivated by problems related to information-computation gaps and differentially private data analysis, we explore a similar, non-asymptotic question in high-dimensional contexts with algorithmic considerations.
We demonstrate how a single observation from certain source models with continuous-valued sample and parameter spaces can be approximately transformed into a single observation from a broad class of target models using a computationally efficient algorithm. I will present several such reductions and discuss their applications to fundamental statistical-computational tradeoffs in high-dimensional settings. Additionally, I will discuss a potential application in transforming one differentially private mechanism into another.
This is joint work with Guy Bresler and Ashwin Pananjady.
Other Cool Stuff
Roshni Sahoo, Stanford University
What would it cost to end extreme poverty?
We study poverty minimization via direct transfers, framing this as a statistical learning problem that retains the information constraints faced by real-world programs. Using nationally representative household consumption surveys from 18 countries that together account for 38% of the world’s poor, we estimate that reducing the poverty rate to 1% (from a baseline of 31% at the time of the last survey) costs $120B nominal per year. This is 4.6 times the corresponding reduction in the aggregate poverty gap, but only 42% of the cost of universal basic income. Extrapolated out of sample, these results imply that (approximately) ending extreme poverty would cost roughly 0.32% of global GDP.
Emily Zhang, MIT
Heterogeneous Treatment Effects in Panel Data: Insights into the Healthy Incentives Program
We study how adding new vendors to the Massachusetts (MA) Healthy Incentives Program (HIP), a food subsidy program, affects program utilization. This is an instance of a core problem in causal inference: estimating heterogeneous treatment effects (HTE) using panel data. Existing methods either do not utilize the underlying structure in the panel data or do not accommodate the general vendor addition patterns in the HIP data. We propose the Panel Clustering Estimator (PaCE), a novel method that first partitions observations into disjoint clusters with similar treatment effects using a regression tree, and then leverages the low-rank structure of the panel data to estimate the average treatment effect (ATE) for each cluster. Our theoretical results establish the convergence of the resulting estimates to the true treatment effects. Computational experiments with semi-synthetic data show that PaCE achieves superior accuracy for ATE and HTE estimation compared to existing approaches. This performance was achieved using a regression tree with no more than 40 leaves, making PaCE both more accurate and interpretable. Applying PaCE to HIP data, we discern the heterogeneous impacts of vendor additions on program utilization across different regions of MA. These findings provide valuable insights for future budget planning and for identifying which MA ZIP codes to target with vendor additions.
Andrei Graur, Stanford University
Recent advances in submodular function minimization via first-order methods
Submodular function minimization (SFM) is a foundational combinatorial optimization problem that has been studied for over seven decades. It is a popular tool in theoretical computer science, machine learning, and operations research. Given a submodular function $f:2^V \to \mathbb{Z}$ accessed through an evaluation oracle, the goal of SFM is to minimize $f$ using as few evaluation queries and additional operations as possible.
In this talk, I will present recent advances, from joint work with my collaborators, in several versions of the SFM problem. First, I will cover new results and techniques for decomposable SFM, a well-studied variant where the objective is a finite sum of submodular functions. I will then briefly survey new results on sparse SFM, a variant formulated in one of my works, and on parallel SFM. To obtain our improvements, we applied first-order methods for each of these results, along with combinatorial insights that enable their efficient implementation.
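As standard background for the first-order approach mentioned above (textbook material, not a result of these papers), SFM reduces to convex optimization through the Lovász extension $\hat{f}$ of $f$:
$$ \min_{S \subseteq V} f(S) \;=\; \min_{x \in [0,1]^V} \hat{f}(x), $$
where $\hat{f}$ is convex and a subgradient at any point can be computed from $O(|V|)$ evaluation-oracle queries (plus a sort). This is the bridge that makes subgradient-type and other first-order methods applicable to SFM.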
This talk is based on three papers, “Sparse submodular function minimization,” “Parallel submodular function minimization,” and “Variance reduction for faster decomposable submodular function minimization.” The first is joint work with Deeparnab Chakrabarty, Haotian Jiang, and Aaron Sidford, while the other two are joint work with Haotian Jiang and Aaron Sidford.
Continuous Optimization
Junhui Zhang, MIT
Multi-Timescale Gradient Sliding for Distributed Optimization
We propose two first-order methods for convex, non-smooth, distributed optimization problems, hereafter called Multi-Timescale Gradient Sliding (MT-GS) and its accelerated variant (AMT-GS). Our MT-GS and AMT-GS can take advantage of similarities between (local) objectives to reduce the communication rounds, are flexible so that different subsets (of agents) can communicate at different, user-picked rates, and are fully deterministic. These three desirable features are achieved through a block-decomposable primal-dual formulation, and a multi-timescale variant of the sliding method introduced in Lan et al. (2020), Lan (2016), where different dual blocks are updated at potentially different rates.
Tianyu Wang, Columbia University
Optimizer’s Information Criterion: A Diagnostics Framework for Data-Driven Optimization
Different data-driven optimization methods – varying in model classes and algorithmic design – can outperform one another across different data instances. This variability underscores a fundamental yet challenging question: how can we reliably evaluate and compare the out-of-sample objective performance of data-driven decisions when standard sample-based evaluations are often biased relative to the truth? We introduce the Optimizer’s Information Criterion (OIC), a general, decision-focused approach that extends Akaike’s Information Criterion from model adequacy to decision selection. OIC analytically approximates and removes the bias arising from the interaction between model fitting and downstream optimization, delivering reliable performance estimates without repeatedly re-solving optimization problems, as cross-validation requires, while applying more broadly than existing bias-approximation methods. The framework covers a wide range of formulations, including empirical and parametric models, their regularized counterparts, and contextual and risk-averse optimization formulations.
Ruichen Jiang, University of Texas at Austin
Accelerating Nonconvex Optimization via Online Learning
A fundamental problem in optimization is finding an $\epsilon$-first-order stationary point of a smooth nonconvex function using only gradient information. The best-known gradient complexity for this task, assuming both the gradient and Hessian of the objective function are Lipschitz continuous, is $O(\epsilon^{-1.75})$. In this talk, I will present a new method with a gradient complexity of $O(d^{0.25}\epsilon^{-1.625})$, where $d$ is the problem dimension, yielding an improvement when $d = O(\epsilon^{-0.5})$.
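A quick sanity check of the stated regime (direct substitution, not part of the talk): if $d = O(\epsilon^{-0.5})$, then
$$ d^{0.25}\,\epsilon^{-1.625} \;=\; O\!\left(\epsilon^{-0.125}\cdot\epsilon^{-1.625}\right) \;=\; O\!\left(\epsilon^{-1.75}\right), $$
so the new bound matches the best-known $O(\epsilon^{-1.75})$ rate at the boundary and strictly improves on it whenever $d$ grows more slowly than $\epsilon^{-0.5}$.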
Our key idea is to design a quasi-Newton method that operates through two levels of online learning. Specifically, we first reformulate the task of finding a stationary point for a nonconvex problem as minimizing the regret in an online convex optimization problem, where the loss is determined by the gradient of the objective function. Then, we introduce a novel optimistic quasi-Newton method to solve this online learning problem, with the Hessian approximation update itself framed as an online learning problem in the space of matrices. Beyond improving the known complexity bounds, our result provides the first guarantee that quasi-Newton methods can outperform gradient descent-type methods in nonconvex optimization.
AI Agents
Yoav Kolumbus, Cornell University
Markets with Learning Agents
We analyze the performance of heterogeneous learning agents in asset markets with stochastic payoffs. Our main focus is on comparing Bayesian learners and no-regret learners who compete in markets and identifying the conditions under which each approach is more effective. Surprisingly, we find that low regret is not sufficient for survival: an agent can have regret as low as $O(\log T)$ but still vanish when competing against a Bayesian with a finite prior and any positive prior probability on the correct model. On the other hand, we show that Bayesian learning is fragile, while no-regret learning requires less knowledge of the environment and is therefore more robust. Motivated by the strengths and weaknesses of both approaches, we propose a balanced strategy for utilizing Bayesian updates that improves robustness and adaptability to distribution shifts, providing a step toward a best-of-both-worlds learning approach. The method is general, efficient, and easy to implement. In this work, we formally establish the relationship between the notions of survival and market dominance studied in economics and the framework of regret minimization, thus bridging these theories. More broadly, our work contributes to understanding the dynamics of heterogeneous learning agents, their impact on markets, and the tradeoffs that arise for algorithm selection by market participants.
Kunhe Yang, University of California at Berkeley
Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?
After pre-training, large language models are aligned with human preferences based on pairwise comparisons. State-of-the-art alignment methods (such as PPO-based RLHF and DPO) are built on the assumption of aligning with a single preference model, despite being deployed in settings where users have diverse preferences. As a result, it is not even clear that these alignment methods produce models that satisfy users on average — a minimal requirement for pluralistic alignment. Drawing on social choice theory and modeling users’ comparisons through individual Bradley-Terry (BT) models, we introduce an alignment method’s distortion: the worst-case ratio between the optimal achievable average utility, and the average utility of the learned policy.
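In symbols (a paraphrase of the verbal definition above; the notation here is not taken from the paper): writing $\hat{\pi}$ for the policy returned by an alignment method and $U(u, \pi)$ for the utility that a user with utility function $u \sim \mathcal{D}$ derives from policy $\pi$,
$$ \mathrm{distortion} \;=\; \sup_{\text{instances}} \; \frac{\max_{\pi}\ \mathbb{E}_{u \sim \mathcal{D}}\big[U(u, \pi)\big]}{\mathbb{E}_{u \sim \mathcal{D}}\big[U(u, \hat{\pi})\big]}, $$
with the supremum taken over the permissible utility distributions, comparison-pair distributions, and KL constraints discussed below.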
The notion of distortion helps draw sharp distinctions between alignment methods: Nash Learning from Human Feedback achieves the minimax optimal distortion of $(1/2+o(1))\beta$ (for the BT temperature $\beta$), robustly across utility distributions, distributions of comparison pairs, and permissible KL divergences from the reference policy. RLHF and DPO, by contrast, suffer $\ge (1-o(1))\beta$ distortion already without a KL constraint, and $e^{\Omega(\beta)}$ or even unbounded distortion in the full setting, depending on how comparison pairs are sampled.
Natalie Collina, University of Pennsylvania
Emergent Alignment via Competition
Aligning AI systems with human values remains a fundamental challenge, but does our inability to create perfectly aligned models preclude obtaining the benefits of alignment? We study a strategic setting where a human user interacts with multiple differently misaligned AI agents, none of which are individually well-aligned. Our key insight is that when the user’s utility lies approximately within the convex hull of the agents’ utilities, a condition that becomes easier to satisfy as model diversity increases, strategic competition can yield outcomes comparable to interacting with a perfectly aligned model. We model this as a multi-leader Stackelberg game, extending Bayesian persuasion to multi-round conversations between differently informed parties, and prove three results: (1) when perfect alignment would allow the user to learn her Bayes-optimal action, she can also do so in all equilibria under the convex hull condition; (2) under weaker assumptions requiring only approximate utility learning, a non-strategic user employing quantal response achieves near-optimal utility in all equilibria; and (3) when the user selects the best single AI after an evaluation period, equilibrium guarantees remain near-optimal without further distributional assumptions. We complement the theory with two sets of experiments.
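The convex hull condition can be written compactly (notation not taken from the paper; the precise norm and tolerance are specified there): with $u_0$ the user’s utility function and $u_1, \dots, u_n$ those of the agents, it requires weights $\lambda_1, \dots, \lambda_n \ge 0$ with $\sum_i \lambda_i = 1$ such that
$$ \Big\| \, u_0 - \sum_{i=1}^{n} \lambda_i\, u_i \, \Big\| \;\le\; \varepsilon, $$
a requirement that becomes easier to meet as the number and diversity of the agents grows.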