- [C] Sequential Decision-Making | High Dimensions | Personalization | Optimization Under Uncertainty
Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems, with M. Uehara, A. Sekhari, J. D. Lee, and W. Sun.
arXiv June 2022.
- Abstract: We study Reinforcement Learning for partially observable dynamical systems using function approximation. We propose a new Partially Observable Bilinear Actor-Critic framework that is general enough to include models such as observable tabular Partially Observable Markov Decision Processes (POMDPs), observable Linear-Quadratic-Gaussian (LQG), Predictive State Representations (PSRs), as well as a newly introduced model, Hilbert Space Embeddings of POMDPs, and observable POMDPs with latent low-rank transition. Under this framework, we propose an actor-critic style algorithm that is capable of performing agnostic policy learning. Given a policy class that consists of memory-based policies (that look at a fixed-length window of recent observations), and a value function class that consists of functions taking both memory and future observations as inputs, our algorithm learns to compete against the best memory-based policy in the given policy class. For certain examples such as undercomplete observable tabular POMDPs, observable LQGs and observable POMDPs with latent low-rank transition, by implicitly leveraging their special properties, our algorithm is even capable of competing against the globally optimal policy without paying an exponential dependence on the horizon in its sample complexity.
- [C] Sequential Decision-Making | High Dimensions | Personalization | Optimization Under Uncertainty
Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings, with M. Uehara, A. Sekhari, J. D. Lee, and W. Sun.
arXiv June 2022.
- Abstract: We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs) where the state space and observation space are large or even continuous. Particularly, we consider Hilbert space embeddings of POMDP where the feature of latent states and the feature of observations admit a conditional Hilbert space embedding of the observation emission process, and the latent state transition is deterministic. Under the function approximation setup where the optimal latent state-action Q-function is linear in the state feature, and the optimal Q-function has a gap in actions, we provide a computationally and statistically efficient algorithm for finding the exact optimal policy. We show our algorithm's computational and statistical complexities scale polynomially with respect to the horizon and the intrinsic dimension of the feature on the observation space. Furthermore, we show both the deterministic latent transitions and gap assumptions are necessary to avoid statistical complexity exponential in horizon or dimension. Since our guarantee does not have an explicit dependence on the size of the state and observation spaces, our algorithm provably scales to large-scale POMDPs.
- [C] Causal Inference | Sequential Decision-Making | High Dimensions | Personalization
Learning Bellman Complete Representations for Offline Policy Evaluation, with J. Chang, K. Wang, and W. Sun.
- [C] Causal Inference | High Dimensions | Personalization
Robust and Agnostic Learning of Conditional Distributional Treatment Effects, with M. Oprescu.
arXiv May 2022.
- Abstract: The conditional average treatment effect (CATE) is the best point prediction of individual causal effects given individual baseline covariates and can help personalize treatments. However, as CATE only reflects the (conditional) average, it can wash out potential risks and tail events, which are crucially relevant to treatment choice. In aggregate analyses, this is usually addressed by measuring distributional treatment effect (DTE), such as differences in quantiles or tail expectations between treatment groups. Hypothetically, one can similarly fit covariate-conditional quantile regressions in each treatment group and take their difference, but this would not be robust to misspecification or provide agnostic best-in-class predictions. We provide a new robust and model-agnostic methodology for learning the conditional DTE (CDTE) for a wide class of problems that includes conditional quantile treatment effects, conditional super-quantile treatment effects, and conditional treatment effects on coherent risk measures given by f-divergences. Our method is based on constructing a special pseudo-outcome and regressing it on baseline covariates using any given regression learner. Our method is model-agnostic in the sense that it can provide the best projection of CDTE onto the regression model class. Our method is robust in the sense that even if we learn these nuisances nonparametrically at very slow rates, we can still learn CDTEs at rates that depend on the class complexity and even conduct inferences on linear projections of CDTEs. We investigate the performance of our proposal in simulation studies, and we demonstrate its use in a case study of 401(k) eligibility effects on wealth.
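For intuition, here is a minimal, hypothetical Python sketch of the naive plug-in baseline the abstract contrasts against: fit a conditional quantile regression in each arm and difference the predictions. This is not the paper's robust pseudo-outcome method; all names and the choice of scikit-learn's quantile gradient boosting are illustrative.

```python
# Naive plug-in CQTE baseline (NOT the paper's pseudo-outcome method).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def plugin_cqte(X, T, Y, X_new, tau=0.5):
    """Difference of arm-wise conditional tau-quantile regressions."""
    models = {}
    for arm in (0, 1):
        m = GradientBoostingRegressor(loss="quantile", alpha=tau)
        m.fit(X[T == arm], Y[T == arm])  # quantile model per treatment arm
        models[arm] = m
    return models[1].predict(X_new) - models[0].predict(X_new)
```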
- [C] Causal Inference | High Dimensions | Fairness | Personalization
What's the Harm? Sharp Bounds on the Fraction Negatively Affected by Treatment.
arXiv May 2022.
- Abstract: The fundamental problem of causal inference -- that we never observe counterfactuals -- prevents us from identifying how many might be negatively affected by a proposed intervention. If, in an A/B test, half of users click (or buy, or watch, or renew, etc.), whether exposed to the standard experience A or a new one B, hypothetically it could be because the change affects no one, because the change positively affects half the user population to go from no-click to click while negatively affecting the other half, or something in between. While unknowable, this impact is clearly of material importance to the decision to implement a change or not, whether due to fairness, long-term, systemic, or operational considerations. We therefore derive the tightest-possible (i.e., sharp) bounds on the fraction negatively affected (and other related estimands) given data with only factual observations, whether experimental or observational. Naturally, the more we can stratify individuals by observable covariates, the tighter the sharp bounds. Since these bounds involve unknown functions that must be learned from data, we develop a robust inference algorithm that is efficient almost regardless of how and how fast these functions are learned, remains consistent when some are mislearned, and still gives valid conservative bounds when most are mislearned. Our methodology altogether therefore strongly supports credible conclusions: it avoids spuriously point-identifying this unknowable impact, focusing on the best bounds instead, and it permits exceedingly robust inference on these. We demonstrate our method in simulation studies and in a case study of career counseling for the unemployed.
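For binary outcomes, the sharp bounds in question reduce to covariate-averaged Fréchet-Hoeffding inequalities on the joint event Y(0)=1, Y(1)=0. A hypothetical plug-in sketch follows; the paper develops robust, debiased inference rather than this naive version, and the model choice here is illustrative.

```python
# Plug-in (non-robust) sketch of sharp bounds on the fraction harmed,
# P(Y(0)=1, Y(1)=0), for binary outcomes.
import numpy as np
from sklearn.linear_model import LogisticRegression

def plugin_harm_bounds(X, T, Y):
    mu = {arm: LogisticRegression().fit(X[T == arm], Y[T == arm])
          for arm in (0, 1)}
    p0 = mu[0].predict_proba(X)[:, 1]  # estimated P(Y(0)=1 | X)
    p1 = mu[1].predict_proba(X)[:, 1]  # estimated P(Y(1)=1 | X)
    # Frechet-Hoeffding bounds, averaged over the covariate distribution
    lower = np.mean(np.maximum(0.0, p0 - p1))
    upper = np.mean(np.minimum(p0, 1.0 - p1))
    return lower, upper
```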
- [J] Sequential Decision-Making | Causal Inference | High Dimensions
Long-term Causal Inference Under Persistent Confounding via Data Combination, with G. Imbens, X. Mao, and Y. Wang.
arXiv February 2022.
- Abstract: We study the identification and estimation of long-term treatment effects when both experimental and observational data are available. Since the long-term outcome is observed only after a long delay, it is not measured in the experimental data, but only recorded in the observational data. However, both types of data include observations of some short-term outcomes. In this paper, we uniquely tackle the challenge of persistent unmeasured confounders, i.e., some unmeasured confounders that can simultaneously affect the treatment, short-term outcomes and the long-term outcome, noting that they invalidate identification strategies in previous literature. To address this challenge, we exploit the sequential structure of multiple short-term outcomes, and develop three novel identification strategies for the average long-term treatment effect. We further propose three corresponding estimators and prove their asymptotic consistency and asymptotic normality. We finally apply our methods to estimate the effect of a job training program on long-term employment using semi-synthetic data. We numerically show that our proposals outperform existing methods that fail to handle persistent confounders.
- [C/J] Causal Inference | High Dimensions | Fairness | Personalization
Treatment Effect Risk: Bounds and Inference.
Proceedings of the 5th ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2022 (Extended abstract).
arXiv January 2022.
GitHub.
- Abstract: Since the average treatment effect (ATE) measures the change in social welfare, even if positive, there is a risk of negative effect on, say, some 10% of the population. Assessing such risk is difficult, however, because any one individual treatment effect (ITE) is never observed so the 10% worst-affected cannot be identified, while distributional treatment effects only compare the first deciles within each treatment group, which does not correspond to any 10%-subpopulation. In this paper we consider how to nonetheless assess this important risk measure, formalized as the conditional value at risk (CVaR) of the ITE distribution. We leverage the availability of pre-treatment covariates and characterize the tightest-possible upper and lower bounds on ITE-CVaR given by the covariate-conditional average treatment effect (CATE) function. Some bounds can also be interpreted as summarizing a complex CATE function into a single metric and are of interest independently of being a bound. We then proceed to study how to estimate these bounds efficiently from data and construct confidence intervals. This is challenging even in randomized experiments as it requires understanding the distribution of the unknown CATE function, which can be very complex if we use rich covariates so as to best control for heterogeneity. We develop a debiasing method that overcomes this and prove it enjoys favorable statistical properties even when CATE and other nuisances are estimated by black-box machine learning or even inconsistently. Studying a hypothetical change to French job-search counseling services, our bounds and inference demonstrate a small social benefit entails a negative impact on a substantial subpopulation.
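One of the CATE-based bounding quantities the abstract mentions (a bound that also summarizes the CATE function into a single risk metric) is the CVaR of the CATE distribution itself. A naive plug-in sketch, assuming `cate_hat` comes from some cross-fitted CATE learner; the paper's actual estimator is debiased and comes with confidence intervals.

```python
import numpy as np

def cvar_of_cate(cate_hat, alpha=0.10):
    """Plug-in CVaR_alpha of the estimated CATE distribution: the average
    of the worst (lowest) alpha-fraction of CATE predictions."""
    tau = np.sort(np.asarray(cate_hat))
    k = max(1, int(np.ceil(alpha * len(tau))))
    return tau[:k].mean()
```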
- [J] Causal Inference | High Dimensions | Optimization Under Uncertainty
Doubly-Valid/Doubly-Sharp Sensitivity Analysis for Causal Inference with Unmeasured Confounding, with J. Dorn and K. Guo.
arXiv December 2021.
GitHub.
- Abstract: We study the problem of constructing bounds on the average treatment effect in the presence of unobserved confounding under the marginal sensitivity model of Tan (2006). Combining an existing characterization involving adversarial propensity scores with a new distributionally robust characterization of the problem, we propose novel estimators of these bounds that we call "doubly-valid/doubly-sharp" (DVDS) estimators. Double sharpness corresponds to the fact that DVDS estimators consistently estimate the tightest possible (i.e., sharp) bounds implied by the sensitivity model even when one of two nuisance parameters is misspecified and achieve semiparametric efficiency when all nuisance parameters are suitably consistent. Double validity is an entirely new property for partial identification: DVDS estimators still provide valid, though not sharp, bounds even when most nuisance parameters are misspecified. In fact, even in cases when DVDS point estimates fail to be asymptotically normal, standard Wald confidence intervals may remain valid. In the case of binary outcomes, the DVDS estimators are particularly convenient and possess a closed-form expression in terms of the outcome regression and propensity score. We demonstrate the DVDS estimators in a simulation study as well as a case study of right heart catheterization.
- [J] Sequential Decision-Making | Causal Inference | High Dimensions
Controlling for Unmeasured Confounding in Panel Data Using Minimal Bridge Functions: From Two-Way Fixed Effects to Factor Models, with G. Imbens and X. Mao.
arXiv August 2021.
- Abstract: We develop a new approach for identifying and estimating average causal effects in panel data under a linear factor model with unmeasured confounders. Compared to other methods tackling factor models such as synthetic controls and matrix completion, our method does not require the number of time periods to grow infinitely. Instead, we draw inspiration from the two-way fixed effect model as a special case of the linear factor model, where a simple difference-in-differences transformation identifies the effect. We show that analogous, albeit more complex, transformations exist in the more general linear factor model, providing a new means to identify the effect in that model. In fact many such transformations exist, called bridge functions, all identifying the same causal effect estimand. This poses a unique challenge for estimation and inference, which we solve by targeting the minimal bridge function using a regularized estimation approach. We prove that our resulting average causal effect estimator is root-N consistent and asymptotically normal, and we provide asymptotically valid confidence intervals. Finally, we provide extensions for the case of a linear factor model with time-varying unmeasured confounders.
- [C] Fairness | Personalization
Estimating Structural Disparities for Face Models, with S. Ardeshir and C. Segalin.
- [C] Causal Inference | High Dimensions | Personalization | Optimization Under Uncertainty
Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning, with X. Mao, K. Wang, and Z. Zhou.
- [C/J] Causal Inference | Fairness
An Empirical Evaluation of the Impact of New York's Bail Reform on Crime Using Synthetic Controls, with T. Bergin, A. Koo, S. Koppel, R. Peterson, R. Ropac, and A. Zhou.
arXiv November 2021.
- Abstract: We conduct an empirical evaluation of the impact of New York's bail reform on crime. New York State's Bail Elimination Act went into effect on January 1, 2020, eliminating money bail and pretrial detention for nearly all misdemeanor and nonviolent felony defendants. Our analysis of effects on aggregate crime rates after the reform informs the understanding of bail reform and general deterrence. We conduct a synthetic control analysis for a comparative case study of the impact of bail reform. We focus on synthetic control analysis of post-intervention changes in crime for assault, theft, burglary, robbery, and drug crimes, constructing a dataset from publicly reported crime data of 27 large municipalities. Our findings, including placebo checks and other robustness checks, show that for assault, theft, and drug crimes, there is no significant impact of bail reform on crime; for burglary and robbery, we similarly have null findings, but the synthetic control is also more variable, so these are deemed less conclusive.
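For reference, a minimal sketch of the synthetic-control weighting step such an analysis rests on: nonnegative, sum-to-one weights on donor municipalities chosen to match the treated unit's pre-intervention series. This is a generic illustration, not the study's exact specification.

```python
import numpy as np
from scipy.optimize import minimize

def synthetic_control_weights(y_treated_pre, Y_donors_pre):
    """y_treated_pre: (T0,) pre-period outcomes of the treated unit.
    Y_donors_pre: (T0, J) pre-period outcomes of J donor units."""
    J = Y_donors_pre.shape[1]
    objective = lambda w: np.sum((y_treated_pre - Y_donors_pre @ w) ** 2)
    result = minimize(objective, np.full(J, 1.0 / J),
                      bounds=[(0.0, 1.0)] * J,
                      constraints=({"type": "eq",
                                    "fun": lambda w: w.sum() - 1.0},),
                      method="SLSQP")
    return result.x  # weights defining the synthetic control unit
```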
- [C] Sequential Decision-Making | Personalization | Causal Inference
Post-Contextual-Bandit Inference, with A. Bibaut, A. Chambaz, M. Dimakopoulou, and M. van der Laan.
- [J/C] Sequential Decision-Making | Personalization | Causal Inference | Optimization Under Uncertainty
Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning, with A. Bibaut, A. Chambaz, M. Dimakopoulou, and M. van der Laan.
- [J] Sequential Decision-Making | Causal Inference | Personalization | Optimization Under Uncertainty
Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes, with A. Bennett.
arXiv October 2021.
GitHub.
- Abstract: In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors, inducing confounding and biasing estimates derived under the assumption of a perfect Markov decision process (MDP) model. Here we tackle this by considering off-policy evaluation in a partially observed MDP (POMDP). Specifically, we consider estimating the value of a given target policy in a POMDP given trajectories with only partial state observations generated by a different and unknown policy that may depend on the unobserved state. We tackle two questions: what conditions allow us to identify the target policy value from the observed data and, given identification, how to best estimate it. To answer these, we extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible by the existence of so-called bridge functions. We then show how to construct semiparametrically efficient estimators in these settings. We term the resulting framework proximal reinforcement learning (PRL). We demonstrate the benefits of PRL in an extensive simulation study.
- [J] Causal Inference | High Dimensions
Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach, with X. Mao and M. Uehara.
arXiv March 2021.
- Abstract: We study the estimation of causal parameters when not all confounders are observed and instead negative controls are available. Recent work has shown how these can enable identification and efficient estimation via two so-called bridge functions. In this paper, we tackle the primary challenge to causal inference using negative controls: the identification and estimation of these bridge functions. Previous work has relied on uniqueness and completeness assumptions on these functions that may be implausible in practice and also focused on their parametric estimation. Instead, we provide a new identification strategy that avoids both uniqueness and completeness. And we provide new estimators for these functions based on minimax learning formulations. These estimators accommodate general function classes such as reproducing kernel Hilbert spaces and neural networks. We study finite-sample convergence results both for estimating the bridge functions themselves and for the final estimation of the causal parameter. We do this under a variety of combinations of assumptions that include realizability and closedness conditions on the hypothesis and critic classes employed in the minimax estimator. Depending on how much we are willing to assume, we obtain different convergence rates. In some cases, we show the estimate for the causal parameter may converge even when our bridge function estimators do not converge to any valid bridge function. And, in other cases, we show we can obtain semiparametric efficiency.
- [J] Personalization | Optimization Under Uncertainty
Fast Rates for Contextual Linear Optimization, with Y. Hu and X. Mao.
- [J] Personalization | Optimization Under Uncertainty
Stochastic Optimization Forests, with X. Mao.
- [C] Sequential Decision-Making | Personalization | Causal Inference | Optimization Under Uncertainty
Stateful Offline Contextual Policy Evaluation and Learning, with A. Zhou.
- [C] Personalization | Causal Inference
Control Variates for Slate Off-Policy Evaluation, with F. Amat Gil, A. Chandrashekar, and N. Vlassis.
- [J/C] Sequential Decision-Making | Personalization | Causal Inference | Optimization Under Uncertainty
Fast Rates for the Regret of Offline Reinforcement Learning, with Y. Hu and M. Uehara.
- [J] Personalization | Causal Inference
The Effect of Patient Age on Discharge Destination and Complications After Lumbar Spinal Fusion, with B. Pennicooke, M. Santacatterina, J. Lee, and E. Elowitz.
- [J/C] Sequential Decision-Making | Personalization | Causal Inference | Optimization Under Uncertainty
Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency, with M. Imaizumi, N. Jiang, W. Sun, M. Uehara, and T. Xie.
arXiv February 2021.
- Abstract: We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and q-functions when these are estimated using recent minimax methods. Under various combinations of realizability and completeness assumptions, we show that the minimax approach enables us to achieve a fast rate of convergence for weights and quality functions, characterized by the critical inequality (Bartlett et al., 2005). Based on this result, we analyze convergence rates for OPE. In particular, we introduce novel alternative completeness conditions under which OPE is feasible and we present the first finite-sample result with first-order efficiency in non-tabular environments, i.e., having the minimal coefficient in the leading term.
- [J] Causal Inference
On the Optimality of Randomization in Experimental Design: How to Randomize for Minimax Variance and Design-Based Inference.
- [J] Personalization | Causal Inference
Rejoinder: New Objectives for Policy Learning.
- [C] Fairness | Personalization
Fairness, Welfare, and Equity in Personalized Pricing, with A. Zhou.
- [J] Causal Inference | High Dimensions
The Variational Method of Moments, with A. Bennett.
- [J] Sequential Decision-Making | Personalization | Causal Inference | Optimization Under Uncertainty
DTR Bandit: Learning to Make Response-Adaptive Decisions With Low Regret, with Y. Hu.
Major revision in JASA.
arXiv May 2020.
GitHub.
- Abstract: Dynamic treatment regimes (DTRs) are personalized, adaptive, multi-stage treatment plans that adapt treatment decisions both to an individual's initial features and to intermediate outcomes and features at each subsequent stage, which are affected by decisions in prior stages. Examples include personalized first- and second-line treatments of chronic conditions like diabetes, cancer, and depression, which adapt to patient response to first-line treatment, disease progression, and individual characteristics. While existing literature mostly focuses on estimating the optimal DTR from offline data such as from sequentially randomized trials, we study the problem of developing the optimal DTR in an online manner, where the interaction with each individual affects both our cumulative reward and our data collection for future learning. We term this the DTR bandit problem. We propose a novel algorithm that, by carefully balancing exploration and exploitation, is guaranteed to achieve rate-optimal regret when the transition and reward models are linear. We demonstrate our algorithm and its benefits both in synthetic experiments and in a case study of adaptive treatment of major depressive disorder using real-world data.
- [C] Sequential Decision-Making | Personalization | Causal Inference | High Dimensions
Optimal Off-Policy Evaluation from Multiple Logging Policies, with Y. Saito and M. Uehara.
- [C] Sequential Decision-Making | Personalization | Causal Inference | High Dimensions
Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies, with M. Uehara.
- [J] Sequential Decision-Making | Personalization | Causal Inference | High Dimensions
Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning, with M. Uehara.
arXiv June 2020.
GitHub.
- Abstract: We study the efficient off-policy evaluation of natural stochastic policies, which are defined in terms of deviations from the behavior policy. This is a departure from the literature on off-policy evaluation, where most work considers the evaluation of explicitly specified policies. Crucially, offline reinforcement learning with natural stochastic policies can help alleviate issues of weak overlap, lead to policies that build upon current practice, and improve policies' implementability in practice. Compared with the classic case of a pre-specified evaluation policy, when evaluating natural stochastic policies, the efficiency bound, which measures the best-achievable estimation error, is inflated since the evaluation policy itself is unknown. In this paper we derive the efficiency bounds of two major types of natural stochastic policies: tilting policies and modified treatment policies. We then propose efficient nonparametric estimators that attain the efficiency bounds under very lax conditions. These also enjoy a (partial) double robustness property.
- [C] Sequential Decision-Making | Personalization | Causal Inference
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders, with A. Bennett, L. Li, and A. Mousavi.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.
arXiv July 2020.
Oral at AISTATS (3%)
- Abstract: Off-policy evaluation (OPE) in reinforcement learning is an important problem in settings where experimentation is limited, such as education and healthcare. But, in these very same settings, observed actions are often confounded by unobserved variables making OPE even more difficult. We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders, where states and actions can act as proxies for the unobserved confounders. We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data. Our method involves two stages. In the first, we show how to use proxies to estimate stationary distribution ratios, extending recent work on breaking the curse of horizon to the confounded setting. In the second, we show optimal balancing can be combined with such learned ratios to obtain policy value while avoiding direct modeling of reward functions. We establish theoretical guarantees of consistency, and benchmark our method empirically.
- [J] Causal Inference | High Dimensions
On the Role of Surrogates in the Efficient Estimation of Treatment Effects With Limited Outcome Data, with X. Mao.
arXiv March 2020.
- Abstract: We study the problem of estimating treatment effects when the outcome of primary interest (e.g., long-term health status) is only seldom observed but abundant surrogate observations (e.g., short-term health outcomes) are available. To investigate the role of surrogates in this setting, we derive the semiparametric efficiency lower bounds of the average treatment effect (ATE) both with and without the presence of surrogates, as well as several intermediary settings. These bounds characterize the best-possible precision of ATE estimation in each case, and their difference quantifies the efficiency gains from optimally leveraging the surrogates in terms of key problem characteristics when only limited outcome data are available. We show these results apply in two important regimes: when the number of surrogate observations is comparable to primary-outcome observations and when the former dominates the latter. Importantly, we take a missing-data approach that circumvents strong surrogate conditions which are commonly assumed in previous literature but almost always fail in practice. To show how to leverage the efficiency gains of surrogate observations, we propose ATE estimators and inferential methods based on flexible machine learning methods to estimate nuisance parameters that appear in the influence functions. We show our estimators enjoy efficiency and robustness guarantees under weak conditions.
- [C] Sequential Decision-Making | Personalization | Causal Inference
Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning, with A. Zhou.
- [C] Sequential Decision-Making | Personalization | Causal Inference | High Dimensions
Statistically Efficient Off-Policy Policy Gradients, with M. Uehara.
- [C] Personalization | Causal Inference | Optimization Under Uncertainty
Efficient Policy Learning from Surrogate-Loss Classification Reductions, with A. Bennett.
- [J] Causal Inference | High Dimensions
Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond, with X. Mao and M. Uehara.
arXiv December 2019.
GitHub.
- Abstract: We consider the efficient estimation of a low-dimensional parameter in the presence of very high-dimensional nuisances that may depend on the parameter of interest. An important example is the quantile treatment effect (QTE) in causal inference, where the efficient estimation equation involves as a nuisance the conditional cumulative distribution evaluated at the quantile to be estimated. Debiased machine learning (DML) is a data-splitting approach to address the need to estimate nuisances using flexible machine learning methods that may not satisfy strong metric entropy conditions, but applying it to problems with estimand-dependent nuisances would require estimating too many nuisances to be practical. For QTE estimation, DML requires we learn the whole conditional cumulative distribution function, which may be challenging in practice and stands in contrast to needing to estimate just two regression functions as in the efficient estimation of average treatment effects. Instead, we propose localized debiased machine learning (LDML), a new three-way data-splitting approach that avoids this burdensome step and requires estimating the nuisances only at a single initial rough guess for the parameters. In particular, under a Fréchet-derivative orthogonality condition, we show the oracle estimation equation is asymptotically equivalent to one where the nuisance is evaluated at the true parameter value, and we provide a strategy to target this alternative formulation. In the case of QTE estimation, this involves only learning two binary regression models, for which many standard, time-tested machine learning methods exist. We prove that under certain lax rate conditions, our estimator has the same favorable asymptotic behavior as the infeasible oracle estimator that solves the estimating equation with the true nuisance functions.
- [J/C] Sequential Decision-Making | Personalization | Causal Inference | Optimization Under Uncertainty
Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes, with Y. Hu and X. Mao.
Operations Research, 2021.
Proceedings of the 33rd Conference on Learning Theory (COLT), 2020 (Extended abstract).
arXiv September 2019.
Finalist, Applied Probability Society Best Paper Competition
- Abstract: We study a nonparametric contextual bandit problem where the expected reward functions belong to a Hölder class with smoothness parameter β. We show how this interpolates between two extremes that were previously studied in isolation: non-differentiable bandits (β≤1), where rate-optimal regret is achieved by running separate non-contextual bandits in different context regions, and parametric-response bandits (β=∞), where rate-optimal regret can be achieved with minimal or no exploration due to infinite extrapolatability. We develop a novel algorithm that carefully adjusts to all smoothness settings and we prove its regret is rate-optimal by establishing matching upper and lower bounds, recovering the existing results at the two extremes. In this sense, our work bridges the gap between the existing literature on parametric and non-differentiable contextual bandit problems and between bandit algorithms that exclusively use global or local information, shedding light on the crucial interplay of complexity and regret in contextual bandits.
- [J] Sequential Decision-Making | Personalization | Causal Inference | High Dimensions
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning, with M. Uehara.
- [C/J] Sequential Decision-Making | Personalization | Causal Inference | High Dimensions
Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes, with M. Uehara.
Journal of Machine Learning Research (JMLR), 21(167):1--63, 2020.
Proceedings of the 37th International Conference on Machine Learning (ICML), 2020 (Preliminary version).
arXiv August 2019.
GitHub.
- Abstract: Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. We consider for the first time the semiparametric efficiency limits of OPE in Markov decision processes (MDPs), where actions, rewards, and states are memoryless. We show existing OPE estimators may fail to be efficient in this setting. We develop a new estimator based on cross-fold estimation of q-functions and marginalized density ratios, which we term double reinforcement learning (DRL). We show that DRL is efficient when both components are estimated at fourth-root rates and is also doubly robust when only one component is consistent. We investigate these properties empirically and demonstrate the performance benefits due to harnessing memorylessness efficiently.
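A rough sketch of the doubly robust construction in the discounted-reward case, with hypothetical signatures; the paper's estimator is cross-fitted and covers more general settings, so treat this as a schematic rather than the exact DRL estimator.

```python
import numpy as np

def dr_ope_value(transitions, q_hat, w_hat, pi, gamma, n_actions):
    """transitions: list of (s0, s, a, r, s_next) from behavior data.
    q_hat(s, a): cross-fit q-function estimate; w_hat(s, a): cross-fit
    marginalized density-ratio estimate d_pi(s, a) / d_behavior(s, a);
    pi(a, s): target-policy action probabilities."""
    def v(state):  # E_{a ~ pi}[q_hat(state, a)]
        return sum(pi(a, state) * q_hat(state, a) for a in range(n_actions))
    baseline = np.mean([v(s0) for s0, *_ in transitions])
    td_terms = [w_hat(s, a) * (r + gamma * v(sn) - q_hat(s, a))
                for _, s, a, r, sn in transitions]
    # Doubly robust: consistent if either q_hat or w_hat is consistent
    return baseline + np.mean(td_terms) / (1.0 - gamma)
```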
- [J/C] Fairness | Causal Inference | Personalization
Assessing Algorithmic Fairness with Unobserved Protected Class Using Data Combination, with X. Mao and A. Zhou.
Management Science, 2020.
Proceedings of the 3rd ACM Conference on Fairness, Accountability, and Transparency (FAccT), 110, 2020 (Extended abstract).
arXiv June 2019.
GitHub. MS Blog.
- Abstract: The increasing impact of algorithmic decisions on people's lives compels us to scrutinize their fairness and, in particular, the disparate impacts that ostensibly-color-blind algorithms can have on different groups. Examples include credit decisioning, hiring, advertising, criminal justice, personalized medicine, and targeted policymaking, where in some cases legislative or regulatory frameworks for fairness exist and define specific protected classes. In this paper we study a fundamental challenge to assessing disparate impacts in practice: protected class membership is often not observed in the data. This is particularly a problem in lending and healthcare. We consider the use of an auxiliary dataset, such as the US census, that includes class labels but not decisions or outcomes. We show that a variety of common disparity measures are generally unidentifiable aside from some unrealistic cases, providing a new perspective on the documented biases of popular proxy-based methods. We provide exact characterizations of the sharpest-possible partial identification set of disparities either under no assumptions or when we incorporate mild smoothness constraints. We further provide optimization-based algorithms for computing and visualizing these sets, which enables reliable and robust assessments -- an important tool when disparity assessment can have far-reaching policy implications. We demonstrate this in two case studies with real data: mortgage lending and personalized medicine dosing.
- [J] Optimization Under Uncertainty | High Dimensions
Data Pooling in Stochastic Optimization, with V. Gupta.
- [J] Personalization | Causal Inference
More Efficient Policy Learning via Optimal Retargeting.
Journal of the American Statistical Association (JASA), 2020, Special Issue on Precision Medicine and Individualized Policy Discovery.
arXiv June 2019.
GitHub.
Selected as JASA Discussion Paper
- Abstract: Policy learning can be used to extract individualized treatment regimes from observational data in healthcare, civics, e-commerce, and beyond. One big hurdle to policy learning is a commonplace lack of overlap in the data for different actions, which can lead to unwieldy policy evaluation and poorly performing learned policies. We study a solution to this problem based on retargeting, that is, changing the population on which policies are optimized. We first argue that at the population level, retargeting may induce little to no bias. We then characterize the optimal reference policy centering and retargeting weights in both binary-action and multi-action settings. We do this in terms of the asymptotic efficient estimation variance of the new learning objective. We further consider bias regularization. Extensive empirical results in a simulation study and a case study of targeted job counseling demonstrate that retargeting is a fairly easy way to significantly improve any policy learning procedure.
- [J] Personalization | Causal Inference
Minimax-Optimal Policy Learning Under Unobserved Confounding, with A. Zhou.
Management Science, 2020.
arXiv May 2018.
GitHub.
Winner, INFORMS 2018 Data Mining Best Paper Award
2nd place, INFORMS 2018 Junior Faculty Interest Group (JFIG) Paper Competition
- Abstract: We study the problem of learning personalized decision policies from observational data while accounting for possible unobserved confounding. Previous approaches, which assume unconfoundedness, that is, that no unobserved confounders affect both the treatment assignment as well as outcome, can lead to policies that introduce harm rather than benefit when some unobserved confounding is present, as is generally the case with observational data. Instead, because policy value and regret may not be point-identifiable, we study a method that minimizes the worst-case estimated regret of a candidate policy against a baseline policy over an uncertainty set for propensity weights that controls the extent of unobserved confounding. We prove generalization guarantees that ensure our policy is safe when applied in practice and in fact obtains the best possible uniform control on the range of all possible population regrets that agree with the possible extent of confounding. We develop efficient algorithmic solutions to compute this minimax-optimal policy. Finally, we assess and compare our methods on synthetic and semisynthetic data. In particular, we consider a case study on personalizing hormone replacement therapy based on observational data, in which we validate our results on a randomized experiment. We demonstrate that hidden confounding can hinder existing policy-learning approaches and lead to unwarranted harm, while our robust approach guarantees safety and focuses on well-evidenced improvement, a necessity for making personalized treatment policies learned from observational data reliable in practice.
- [J] Personalization | Causal Inference
Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects, with F. Johansson, U. Shalit, and D. Sontag.
- [C] Sequential Decision-Making | Personalization | Causal Inference
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning, with M. Uehara.
- [C] Fairness | Personalization
The Fairness of Risk Scores Beyond Classification: Bipartite Ranking and the xAUC Metric, with A. Zhou.
- [C] Causal Inference
Deep Generalized Method of Moments for Instrumental Variable Analysis, with A. Bennett and T. Schnabel.
- [C] Fairness | Causal Inference | Personalization
Assessing Disparate Impacts of Personalized Interventions: Identifiability and Bounds, with A. Zhou.
- [C] Personalization | Causal Inference
Policy Evaluation with Latent Confounders via Optimal Balance, with A. Bennett.
- [J] Personalization | Causal Inference
Comment: Entropy Learning for Dynamic Treatment Regimes.
- [J] Causal Inference
Optimal Weighting for Estimating Generalized Average Treatment Effects, with M. Santacatterina.
- [C] Causal Inference
DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training.
- [C] Personalization | Causal Inference
Classifying Treatment Responders Under Causal Effect Monotonicity.
- [C] Personalization | Causal Inference
Confounding-Robust Policy Improvement, with A. Zhou.
- [C] Causal Inference
Removing Hidden Confounding by Experimental Grounding, with A. M. Puli and U. Shalit.
Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS), 10911--10920, 2018.
arXiv October 2018.
GitHub.
Spotlight at NeurIPS (3.5%)
- Abstract: Observational data is being increasingly used as a means for making individual-level causal predictions and intervention recommendations. The foremost challenge of causal inference from observational data is hidden confounding, whose presence cannot be tested in data and can invalidate any causal conclusion. Experimental data does not suffer from confounding but is usually limited in both scope and scale. We introduce a novel method of using limited experimental data to correct the hidden confounding in causal effect models trained on larger observational data, even if the observational data does not fully overlap with the experimental data. Our method makes strictly weaker assumptions than existing approaches, and we prove conditions under which our method yields a consistent estimator. We demonstrate our method's efficacy using real-world data from a large educational experiment.
- [C] Personalization | Causal Inference
Balanced Policy Evaluation and Learning.
- [C] High Dimensions | Causal Inference
Causal Inference with Noisy and Missing Covariates via Matrix Factorization, with X. Mao and M. Udell.
- [C] Fairness | Personalization
Fairness Under Unawareness: Assessing Disparity When Protected Class Is Unobserved, with J. Chen, X. Mao, G. Svacha, and M. Udell.
Proceedings of the 2nd ACM Conference on Fairness, Accountability, and Transparency (FAccT), 339--348, 2019.
arXiv November 2018.
GitHub.
- Abstract: Assessing the fairness of a decision making system with respect to a protected class, such as gender or race, is complicated by not being able to observe class membership labels due to legal or operational reasons. Probabilistic models for predicting the protected class based on observed proxy variables, such as surname and geolocation for race, are sometimes used to impute these missing labels. Such methods are known to be used by government regulators and have been observed to exaggerate disparities. The reason why is unknown, as is whether overestimation is always the case. We decompose the bias of estimating outcome disparity via an existing threshold-based imputation method into multiple interpretable bias sources, which explains when over- or underestimation occurs. We also propose an alternative weighted estimator that uses soft classification rules rather than hard imputation, and show that its bias arises simply from the conditional covariance of the outcome with the true class membership. We illustrate our results with numerical simulations as well as an application to a dataset of mortgage applications, using geolocation as a proxy for race. We confirm that the bias of threshold-based imputation is generally upward; however, its magnitude varies strongly with the threshold chosen due to the complex interplay of multiple sources of bias uncovered by our theoretical analysis. Our new weighted estimator tends to have a negative bias that is much simpler to analyze and reason about.
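A toy sketch contrasting the two estimators for the disparity in a binary outcome's mean across an unobserved two-class membership predicted from proxies; all names are hypothetical and this omits the bias decomposition the paper develops.

```python
import numpy as np

def disparity_estimates(y, p_class):
    """y: outcomes; p_class: proxy-model probability of protected-class
    membership (e.g., from surname and geolocation)."""
    # Threshold-based hard imputation (the biased method analyzed above)
    hard = p_class > 0.5
    threshold_est = y[hard].mean() - y[~hard].mean()
    # Soft weighted estimator: probability-weighted class means
    weighted_est = (np.average(y, weights=p_class)
                    - np.average(y, weights=1.0 - p_class))
    return threshold_est, weighted_est
```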
- [C] Personalization | Causal Inference
Interval Estimation of Individual-Level Causal Effects Under Unobserved Confounding, with X. Mao and A. Zhou.
- [C] Fairness | Causal Inference | Personalization
Residual Unfairness in Fair Machine Learning from Prejudiced Data, with A. Zhou.
- [C] Personalization | Causal Inference
Policy Evaluation and Optimization with Continuous Treatments, with A. Zhou.
Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), 84:1243--1251, 2018.
arXiv February 2018.
GitHub.
Finalist, Best Paper of INFORMS 2017 Data Mining and Decision Analytics Workshop
- Abstract: We study the problem of policy evaluation and learning from batched contextual bandit data when treatments are continuous, going beyond previous work on discrete treatments. Previous work for discrete treatment/action spaces focuses on inverse probability weighting (IPW) and doubly robust (DR) methods that use a rejection sampling approach for evaluation and the equivalent weighted classification problem for learning. In the continuous setting, this reduction fails as we would almost surely reject all observations. To tackle the case of continuous treatments, we extend the IPW and DR approaches to the continuous setting using a kernel function that leverages treatment proximity to attenuate discrete rejection. Our policy estimator is consistent and we characterize the optimal bandwidth. The resulting continuous policy optimizer (CPO) approach using our estimator achieves convergent regret and approaches the best-in-class policy for learnable policy classes. We demonstrate that the estimator performs well and, in particular, outperforms a discretization-based benchmark. We further study the performance of our policy optimizer in a case study on personalized dosing based on a dataset of Warfarin patients, their covariates, and final therapeutic doses. Our learned policy outperforms benchmarks and nears the oracle-best linear policy.
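A condensed sketch of the kernelized IPW estimator described above, with a Gaussian kernel standing in for any kernel choice; `f_t_given_x` is an estimated generalized propensity density and all names are illustrative.

```python
import numpy as np

def kernel_ipw_policy_value(X, T, Y, policy, f_t_given_x, h):
    """Estimate a continuous-treatment policy's value by kernel-smoothed
    inverse probability weighting: proximity of the logged treatment T to
    the policy's recommendation replaces exact-match rejection sampling."""
    kernel = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    rec = np.array([policy(x) for x in X])  # recommended treatments
    weights = kernel((rec - T) / h) / (h * f_t_given_x(T, X))
    return np.mean(weights * Y)
```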
- [C] Sequential Decision-Making | Personalization | Causal Inference
Instrument-Armed Bandits.
- [J] Causal Inference
More Robust Estimation of Sample Average Treatment Effects using Kernel Optimal Matching in an Observational Study of Spine Surgical Interventions, with B. Pennicooke and M. Santacatterina.
- [J] Sequential Decision-Making | Causal Inference
Optimal Balancing of Time-Dependent Confounders for Marginal Structural Models, with M. Santacatterina.
- [C] Causal Inference
Learning Weighted Representations for Generalization Across Designs, with F. Johansson, U. Shalit, and D. Sontag.
arXiv February 2018.
- Abstract: Predictive models that generalize well under distributional shift are often desirable and sometimes crucial to building robust and reliable machine learning applications. We focus on distributional shift that arises in causal inference from observational data and in unsupervised domain adaptation. We pose both of these problems as prediction under a shift in design. Popular methods for overcoming distributional shift make unrealistic assumptions such as having a well-specified model or knowing the policy that gave rise to the observed data. Other methods are hindered by their need for a pre-specified metric for comparing observations, or by poor asymptotic properties. We devise a bound on the generalization error under design shift, incorporating both representation learning and sample re-weighting. Based on the bound, we propose an algorithmic framework that does not require any of the above assumptions and which is asymptotically consistent. We empirically study the new framework using two synthetic datasets, and demonstrate its effectiveness compared to previous methods.
- [C] Personalization | Causal Inference
Recursive Partitioning for Personalization using Observational Data.
Proceedings of the 34th International Conference on Machine Learning (ICML), 70:1789--1798, 2017.
arXiv August 2016.
Winner, Best Paper of INFORMS 2016 Data Mining and Decision Analytics Workshop
- Abstract: We study the problem of learning to choose from m discrete treatment options (e.g., news item or medical drug) the one with best causal effect for a particular instance (e.g., user or patient) where the training data consists of passive observations of covariates, treatment, and the outcome of the treatment. The standard approach to this problem is regress and compare: split the training data by treatment, fit a regression model in each split, and, for a new instance, predict all m outcomes and pick the best. By reformulating the problem as a single learning task rather than m separate ones, we propose a new approach based on recursively partitioning the data into regimes where different treatments are optimal. We extend this approach to an optimal partitioning approach that finds a globally optimal partition, achieving a compact, interpretable, and impactful personalization model. We develop new tools for validating and evaluating personalization models on observational data and use these to demonstrate the power of our novel approaches in a personalized medicine and a job training application.
- [J] Causal Inference
Generalized Optimal Matching Methods for Causal Inference.
- [J] Personalization | High Dimensions | Sequential Decision-Making
Dynamic Assortment Personalization in High Dimensions, with M. Udell.
- [J] Causal Inference
Optimal A Priori Balance in the Design of Controlled Experiments.
- [J] Causal Inference | Optimization Under Uncertainty
The Power and Limits of Predictive Approaches to Observational-Data-Driven Optimization, with D. Bertsimas.
Major revision in IJOO.
arXiv May 2016.
- Abstract: While data-driven decision-making is transforming modern operations, most large-scale data is of an observational nature, such as transactional records. These data pose unique challenges in a variety of operational problems posed as stochastic optimization problems, including pricing and inventory management, where one must evaluate the effect of a decision, such as price or order quantity, on an uncertain cost/reward variable, such as demand, based on historical data where decision and outcome may be confounded. Often, the data lacks the features necessary to enable sound assessment of causal effects and/or the strong assumptions necessary may be dubious. Nonetheless, common practice is to assign a decision an objective value equal to the best prediction of cost/reward given the observation of the decision in the data. While in general settings this identification is spurious, for optimization purposes it is only the objective value of the final decision that matters, rather than the validity of any model used to arrive at it. In this paper, we formalize this statement in the case of observational-data-driven optimization and study both the power and limits of predictive approaches to observational-data-driven optimization with a particular focus on pricing. We provide rigorous bounds on optimality gaps of such approaches even when optimal decisions cannot be identified from data. To study potential limits of predictive approaches in real datasets, we develop a new hypothesis test for causal-effect objective optimality. Applying it to interest-rate-setting data, we empirically demonstrate that predictive approaches can be powerful in practice but with some critical limitations.
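The "predictive approach" analyzed here is, in essence, the following procedure. This is a hypothetical sketch for pricing with illustrative names; the paper's contribution is characterizing when this procedure's optimality gap is small despite confounding, not this code.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def predictive_price(P_hist, X_hist, D_hist, x_new, price_grid):
    """Regress demand on observed price and features, then set the price
    maximizing predicted revenue -- treating the regression as if causal."""
    model = RandomForestRegressor()
    model.fit(np.column_stack([P_hist, X_hist]), D_hist)
    candidates = np.array([np.concatenate([[p], x_new]) for p in price_grid])
    revenue = np.asarray(price_grid) * model.predict(candidates)
    return price_grid[int(np.argmax(revenue))]
```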
- [J] Personalization | Optimization Under Uncertainty
From Predictive to Prescriptive Analytics, with D. Bertsimas.
Management Science, 66(3):1025--1044, 2020.
arXiv February 2014.
Finalist, POMS Applied Research Challenge 2016
- Abstract: In this paper, we combine ideas from machine learning (ML) and operations research and management science (OR/MS) in developing a framework, along with specific methods, for using data to prescribe decisions in OR/MS problems. In a departure from other work on data-driven optimization and reflecting our practical experience with the data available in applications of OR/MS, we consider data consisting, not only of observations of quantities with direct effect on costs/revenues, such as demand or returns, but predominantly of observations of associated auxiliary quantities. The main problem of interest is a conditional stochastic optimization problem, given imperfect observations, where the joint probability distributions that specify the problem are unknown. We demonstrate that our proposed solution methods are generally applicable to a wide range of decision problems. We prove that they are computationally tractable and asymptotically optimal under mild conditions even when data is not independent and identically distributed (iid) and even for censored observations. As an analogue to the coefficient of determination R², we develop a metric P termed the coefficient of prescriptiveness to measure the prescriptive content of data and the efficacy of a policy from an operations perspective. To demonstrate the power of our approach in a real-world setting we study an inventory management problem faced by the distribution arm of an international media conglomerate, which ships an average of 1 billion units per year. We leverage both internal data and public online data harvested from IMDb, Rotten Tomatoes, and Google to prescribe operational decisions that outperform baseline measures. Specifically, the data we collect, leveraged by our methods, accounts for an 88% improvement as measured by our coefficient of prescriptiveness.
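A minimal sketch of the weighted-prescription idea with kNN weights (one of several local-learning weight choices studied in this line of work), applied to a newsvendor-style cost; names and the cost function are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_prescription(X_hist, Y_hist, x_new, cost, z_grid, k=20):
    """Minimize a locally weighted empirical cost: weight historical
    uncertain outcomes Y by covariate proximity to x_new (uniform kNN
    weights here), then pick the best decision z on a grid."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_hist)
    idx = nn.kneighbors([x_new], return_distance=False)[0]
    costs = [np.mean([cost(z, Y_hist[i]) for i in idx]) for z in z_grid]
    return z_grid[int(np.argmin(costs))]

# Example newsvendor cost: overage cost 0.2 per unit, underage cost 1.0
# cost = lambda z, d: 0.2 * max(z - d, 0) + 1.0 * max(d - z, 0)
```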
- [J] Optimization Under Uncertainty
Robust Sample Average Approximation, with D. Bertsimas and V. Gupta.
Mathematical Programming, 171(1--2):217--282, 2018.
arXiv August 2014.
Winner, Best Student Paper Award, MIT Operations Research Center 2013
- Abstract: Sample average approximation (SAA) is a widely popular approach to data-driven decision-making under uncertainty. Under mild assumptions, SAA is both tractable and enjoys strong asymptotic performance guarantees. Similar guarantees, however, do not typically hold in finite samples. In this paper, we propose a modification of SAA, which we term Robust SAA, which retains SAA's tractability and asymptotic properties and, additionally, enjoys strong finite-sample performance guarantees. The key to our method is linking SAA, distributionally robust optimization, and hypothesis testing of goodness-of-fit. Beyond Robust SAA, this connection provides a unified perspective enabling us to characterize the finite sample and asymptotic guarantees of various other data-driven procedures that are based upon distributionally robust optimization. This analysis provides insight into the practical performance of these various methods in real applications. We present examples from inventory management and portfolio allocation, and demonstrate numerically that our approach outperforms other data-driven approaches in these applications.
- [J] Optimization Under Uncertainty
Data-Driven Robust Optimization, with D. Bertsimas and V. Gupta.
Mathematical Programming, 167(2):235--292, 2018.
arXiv January 2014.
Finalist, INFORMS Nicholson Paper Competition 2013
- Abstract: The last decade has seen an explosion in the availability of data for operations research applications as part of the Big Data revolution. Motivated by this data rich paradigm, we propose a novel schema for utilizing data to design uncertainty sets for robust optimization using statistical hypothesis tests. The approach is flexible and widely applicable, and robust optimization problems built from our new sets are computationally tractable, both theoretically and practically. Furthermore, optimal solutions to these problems enjoy a strong, finite-sample probabilistic guarantee. We also propose concrete guidelines for practitioners and illustrate our approach with applications in portfolio management and queueing. Computational evidence confirms that our data-driven sets significantly outperform conventional robust optimization techniques whenever data is available.
- [C] Causal Inference
A Framework for Optimal Matching for Causal Inference.
- [J] Personalization | Causal Inference
Personalized Diabetes Management Using Electronic Medical Records, with D. Bertsimas, A. Weinstein, and D. Zhuo.
- [C] Personalization | High Dimensions
Revealed Preference at Scale: Learning Personalized Preferences from Assortment Choice, with M. Udell.
- [J] Personalization | Optimization Under Uncertainty
Inventory Management in the Era of Big Data, with D. Bertsimas and A. Hussain.
- [B]
On the Predictive Power of Web Intelligence and Social Media.
- [J] Causal Inference
The Power of Optimization Over Randomization in Designing Experiments Involving Small Samples, with D. Bertsimas and M. Johnson.
Operations Research, 63(4):868--876, 2015.
Code. Editorial in World Bank's Development Impact.
- Abstract: Random assignment, typically seen as the standard in controlled trials, aims to make experimental groups statistically equivalent before treatment. However, with a small sample, which is a practical reality in many disciplines, randomized groups are often too dissimilar to be useful. We propose an approach based on discrete linear optimization to create groups whose discrepancy in their means and variances is several orders of magnitude smaller than with randomization. We provide theoretical and computational evidence that groups created by optimization have exponentially lower discrepancy than those created by randomization.
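A brute-force illustration of the idea for very small samples; the paper instead formulates the search as a discrete linear optimization problem, so this enumeration is only a stand-in for intuition.

```python
import itertools
import numpy as np

def best_balanced_split(y):
    """Search equal-size two-group splits of sample y for the one
    minimizing discrepancy in group means and variances; enumeration
    is feasible only for tiny n."""
    y = np.asarray(y, dtype=float)
    n, best, best_disc = len(y), None, np.inf
    for group in itertools.combinations(range(n), n // 2):
        mask = np.zeros(n, dtype=bool)
        mask[list(group)] = True
        a, b = y[mask], y[~mask]
        disc = abs(a.mean() - b.mean()) + abs(a.var() - b.var())
        if disc < best_disc:
            best, best_disc = mask.copy(), disc
    return best, best_disc
```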
- [C]
Predicting Crowd Behavior with Big Public Data.
Proceedings of the 23rd international conference on World Wide Web (WWW) companion, 23:625--630, 2014.
arXiv February 2014.
Slides. Media coverage.
Winner, INFORMS Social Media Analytics Best Paper Competition 2015
- Abstract: With public information becoming widely accessible and shared on today's web, greater insights are possible into crowd actions by citizens and non-state actors such as large protests and cyber activism. Turning public data into Big Data, company Recorded Future continually scans over 300,000 open content web sources in 7 languages from all over the world, ranging from mainstream news to government publications to blogs and social media. We study the predictive power of this massive public data in forecasting crowd actions such as large protests and cyber campaigns before they occur. Using natural language processing, event information is extracted from content such as type of event, what entities are involved and in what role, sentiment and tone, and the occurrence time range of the event discussed. The amount of information is staggering and trends can be seen clearly in sheer numbers. In the first half of this paper we show how we use this data to predict large protests in a selection of 19 countries and 37 cities in Asia, Africa, and Europe with high accuracy using standard learning machines. In the second half we delve into predicting the perpetrators and targets of political cyber attacks with a novel application of the naïve Bayes classifier to high-dimensional sequence mining in massive datasets.
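As a rough illustration of the classification step only (data shapes entirely hypothetical; the actual features are event types, entities, roles, sentiment, and timing extracted by NLP):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
X_events = rng.integers(0, 5, size=(500, 1000))  # hypothetical event counts
y_protest = rng.integers(0, 2, size=500)         # hypothetical labels
clf = MultinomialNB().fit(X_events, y_protest)
protest_prob = clf.predict_proba(X_events[:3])[:, 1]  # predicted probabilities
```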
- [J] Optimization Under Uncertainty
Scheduling, Revenue Management, and Fairness in an Academic-Hospital Division: An Optimization Approach, with D. Bertsimas and R. Baum.