- [C] Causal Inference | Sequential Decision-Making
Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes, with A. Bennett, M. Oprescu, W. Sun, and K. Wang.
- [C] Causal Inference | Optimization and Personalization
Contextual Linear Optimization with Bandit Feedback, with Y. Hu, X. Mao, and Y. Wu.
- [C] Causal Inference
Estimating Heterogeneous Treatment Effects by Combining Weak Instruments and Observational Data, with M. Oprescu.
- [J] Sequential Decision-Making | Optimization and Personalization
The Central Role of the Loss Function in Reinforcement Learning, with K. Wang and W. Sun.
- [J] Causal Inference | Sequential Decision-Making
Demystifying Inference after Adaptive Experiments, with A. Bibaut.
- [J] Causal Inference | Sequential Decision-Making
Long-term Causal Inference Under Persistent Confounding via Data Combination, with G. Imbens, X. Mao, and Y. Wang.
- [J] Causal Inference
On the Role of Surrogates in the Efficient Estimation of Treatment Effects With Limited Outcome Data, with X. Mao.
- [J] Causal Inference | Optimization and Personalization
Adjusting Regression Models for Conditional Uncertainty Calibration, with R. Gao, M. Yin, and J. McInerney.
- [C/J] Causal Inference
Inference on Strongly Identified Functionals of Weakly Identified Functions, with A. Bennett, X. Mao, W. Newey, V. Syrgkanis, and M. Uehara.
Major revision in JRSS:B.
Proceedings of the 36th Conference on Learning Theory (COLT), 2023 (Extended abstract).
arXiv August 2022.
- Abstract: In a variety of applications, including nonparametric instrumental variable (NPIV) analysis, proximal causal inference under unmeasured confounding, and missing-not-at-random data with shadow variables, we are interested in inference on a continuous linear functional (e.g., average causal effects) of a nuisance function (e.g., NPIV regression) defined by conditional moment restrictions. These nuisance functions are generally weakly identified, in that the conditional moment restrictions can be severely ill-posed as well as admit multiple solutions. This is sometimes resolved by imposing strong conditions that imply the function can be estimated at rates that make inference on the functional possible. In this paper, we study a novel condition for the functional to be strongly identified even when the nuisance function is not; that is, the functional is amenable to asymptotically-normal estimation at √n-rates. The condition implies the existence of debiasing nuisance functions, and we propose penalized minimax estimators for both the primary and debiasing nuisance functions. The proposed nuisance estimators can accommodate flexible function classes, and importantly they can converge to fixed limits determined by the penalization regardless of the identifiability of the nuisances. We use the penalized nuisance estimators to form a debiased estimator for the functional of interest and prove its asymptotic normality under generic high-level conditions, which provide for asymptotically valid confidence intervals. We also illustrate our method in a novel partially linear proximal causal inference problem and a partially linear instrumental variable regression problem.
- [C] Sequential Decision-Making | Optimization and Personalization
Switching the Loss Reduces the Cost in Offline Reinforcement Learning, with A. Ayoub, K. Wang, V. Liu, S. Robertson, J. McInerney, D. Liang, and C. Szepesvari.
- [C] Sequential Decision-Making | Optimization and Personalization
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning, with K. Wang, O. Oertell, A. Agarwal, and W. Sun.
- [C] Causal Inference | Sequential Decision-Making
Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams, with B. Cho and K. Gan.
- [C] Causal Inference | Sequential Decision-Making
Inferring the Long-Term Causal Effects of Long-Term Treatments from Short-Term Experiments, with A. Tran and A. Bibaut.
- [J] Causal Inference | Optimization and Personalization
Doubly-Valid/Doubly-Sharp Sensitivity Analysis for Causal Inference with Unmeasured Confounding, with J. Dorn and K. Guo.
- [C] Causal Inference
Learning the Covariance of Treatment Effects Across Many Weak Experiments, with A. Bibaut, W. Chou, and S. Ejdemyr.
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2024.
arXiv February 2024.
GitHub. Netflix Technology Blog.
- Abstract: When primary objectives are insensitive or delayed, experimenters may instead focus on proxy metrics derived from secondary outcomes. For example, technology companies often infer long-term impacts of product interventions from their effects on weighted indices of short-term user engagement signals. We consider meta-analysis of many historical experiments to learn the covariance of treatment effects on different outcomes, which can support the construction of such proxies. Even when experiments are plentiful and large, if treatment effects are weak, the sample covariance of estimated treatment effects across experiments can be highly biased and remains inconsistent even as more experiments are considered. We overcome this by using techniques inspired by weak instrumental variable analysis, which we show can reliably estimate parameters of interest, even without a structural model. We show the Limited Information Maximum Likelihood (LIML) estimator learns a parameter that is equivalent to fitting total least squares to a transformation of the scatterplot of estimated treatment effects, and that Jackknife Instrumental Variables Estimation (JIVE) learns another parameter that can be computed from the average of Jackknifed covariance matrices across experiments. We also present a total-covariance-based estimator for the latter estimand under homoskedasticity, which we show is equivalent to a k-class estimator. We show how these parameters relate to causal quantities and can be used to construct unbiased proxy metrics under a structural model with both direct and indirect effects subject to the INstrument Strength Independent of Direct Effect (INSIDE) assumption of Mendelian randomization. Lastly, we discuss the application of our methods at Netflix.
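- Code sketch: a minimal numerical illustration of the bias the abstract describes and of the split-sample idea behind the jackknife-style fix: with many weak experiments, the naive sample covariance of estimated effects is dominated by estimation noise, while averaging cross-fold outer products removes that bias. The simulation setup and all names below are illustrative assumptions, not the paper's LIML/JIVE estimators.
```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 2000, 2                                   # many experiments, two outcome metrics
Sigma_true = np.array([[0.002, 0.001],
                       [0.001, 0.003]])          # covariance of the *true* effects
tau = rng.multivariate_normal(np.zeros(d), Sigma_true, size=K)

# Two independent estimates per experiment (e.g., from two folds of its units);
# estimation noise is larger than the true effects, mimicking "many weak experiments".
noise_sd = 0.05
tau_hat1 = tau + rng.normal(0, noise_sd, size=(K, d))
tau_hat2 = tau + rng.normal(0, noise_sd, size=(K, d))
tau_hat = (tau_hat1 + tau_hat2) / 2

naive = np.cov(tau_hat.T)                        # biased upward by the sampling noise

# Split-sample fix: cross-fold outer products have independent noise that averages out.
c1 = tau_hat1 - tau_hat1.mean(0)
c2 = tau_hat2 - tau_hat2.mean(0)
split = (c1.T @ c2 + c2.T @ c1) / (2 * (K - 1))

print("true:\n", Sigma_true)
print("naive:\n", naive.round(4))
print("split-sample:\n", split.round(4))
```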
- [C] Applied ML | Optimization and Personalization
Neighborhood-Based Collaborative Filtering for Conversational Recommendation, with Z. Xie, J. Wu, H. Jeon, Z. He, H. Steck, R. Jha, D. Liang, and J. McAuley.
- [C] Applied ML | Optimization and Personalization
Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation, with Z. He, Z. Xie, H. Steck, D. Liang, R. Jha, and J. McAuley.
- [J] Causal Inference
Source Condition Double Robust Inference on Functionals of Inverse Problems, with A. Bennett, X. Mao, W. Newey, V. Syrgkanis, and M. Uehara.
arXiv July 2023.
- Abstract: We consider estimation of parameters defined as linear functionals of solutions to linear inverse problems. Any such parameter admits a doubly robust representation that depends on the solution to a dual linear inverse problem, where the dual solution can be thought of as a generalization of the inverse propensity function. We provide the first source condition double robust inference method that ensures asymptotic normality around the parameter of interest as long as either the primal or the dual inverse problem is sufficiently well-posed, without knowledge of which inverse problem is the more well-posed one. Our result is enabled by novel guarantees for iterated Tikhonov regularized adversarial estimators for linear inverse problems, over general hypothesis spaces, which are developments of independent interest.
- [C] Causal Inference | Sequential Decision-Making
Anytime-Valid Continuous-Time Confidence Processes for Inhomogeneous Poisson Processes, with M. Lindon.
arXiv October 2024.
- Abstract: Motivated by monitoring the arrival of incoming adverse events such as customer support calls or crash reports from users exposed to an experimental product change, we consider sequential hypothesis testing of continuous-time inhomogeneous Poisson point processes. Specifically, we provide an interval-valued confidence process C^α(t) over continuous time t for the cumulative arrival rate Λ(t) = ∫_0^t λ(s) ds with a continuous-time anytime-valid coverage guarantee P[Λ(t) ∈ C^α(t) for all t > 0] ≥ 1 − α. We extend our results to compare two independent arrival processes by constructing multivariate confidence processes and a closed-form e-process for testing the equality of rates with a time-uniform Type-I error guarantee at a nominal α. We characterize the asymptotic growth rate of the proposed e-process under the alternative and show that it has power 1 when the average rates of the two Poisson processes differ in the limit. We also observe a complementary relationship between our multivariate confidence process and the universal inference e-process for testing composite null hypotheses.
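- Code sketch: for intuition, the snippet below builds a simple anytime-valid e-process for the two-stream equal-rates null by betting on the stream label of each merged arrival; it is a standard beta-mixture construction in the same spirit as, but not necessarily identical to, the paper's closed-form e-process. Function names and the simulation are illustrative assumptions.
```python
import numpy as np
from scipy.stats import betabinom, binom

def equal_rate_e_process(from_first):
    """E-process for H0: two Poisson streams have equal rates, evaluated along the
    merged arrival sequence. Under H0 each merged arrival comes from stream 1 with
    probability 1/2, so we bet against Binomial(n, 1/2) using a Beta(1,1) mixture."""
    e, n1 = [], 0
    for n, x in enumerate(from_first, start=1):
        n1 += int(x)
        e.append(betabinom.pmf(n1, n, 1, 1) / binom.pmf(n1, n, 0.5))
    return np.array(e)

rng = np.random.default_rng(1)
labels = rng.random(400) < 0.6     # stream 1 is 1.5x faster, so it owns 60% of arrivals
e = equal_rate_e_process(labels)
hits = np.flatnonzero(e >= 20)     # 1/alpha = 20 for a level-0.05 test
print("rejects H0 at arrival:", hits[0] + 1 if hits.size else "never")
```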
- [C] Causal Inference
Nonparametric Jackknife Instrumental Variable Estimation and Confounding Robust Surrogate Indices, with A. Bibaut and A. Lal.
arXiv June 2024.
- Abstract: Jackknife instrumental variable estimation (JIVE) is a classic method to leverage many weak instrumental variables (IVs) to estimate linear structural models, overcoming the bias of standard methods like two-stage least squares. In this paper, we extend the jackknife approach to nonparametric IV (NPIV) models with many weak IVs. Since NPIV characterizes the structural regression as having residuals projected onto the IV being zero, existing approaches minimize an estimate of the average squared projected residuals, but their estimates are biased under many weak IVs. We introduce an IV splitting device inspired by JIVE to remove this bias, and by carefully studying this split-IV empirical process we establish learning rates that depend on generic complexity measures of the nonparametric hypothesis class. We then turn to leveraging this for semiparametric inference on average treatment effects (ATEs) on unobserved long-term outcomes predicted from short-term surrogates, using historical experiments as IVs to learn this nonparametric predictive relationship even in the presence of confounding between short- and long-term observations. Using split-IV estimates of a debiasing nuisance, we develop asymptotically normal estimates for predicted ATEs, enabling inference.
- [C] Optimization and Personalization
Is Cosine-Similarity of Embeddings Really About Similarity?, with H. Steck and C. Ekanadham.
- [C] Causal Inference | Optimization and Personalization
Off-Policy Evaluation for Large Action Spaces via Policy Convolution, with N. Sachdeva, L. Wang, D. Liang, and J. McAuley.
- [C] Causal Inference | Sequential Decision-Making
Clustered Switchback Experiments: Near-Optimal Rates Under Spatiotemporal Interference, with S. Jia and C. Lee Yu.
arXiv December 2023.
- Abstract: We consider experimentation in the presence of non-stationarity, inter-unit (spatial) interference, and carry-over effects (temporal interference), where we wish to estimate the global average treatment effect (GATE), the difference between average outcomes having exposed all units at all times to treatment or to control. We suppose spatial interference is described by a graph, where a unit's outcome depends on its neighborhood's treatment assignments, and that temporal interference is described by a hidden Markov decision process, where the transition kernel under either treatment (action) satisfies a rapid mixing condition. We propose a clustered switchback design, where units are grouped into clusters and time steps are grouped into blocks and each whole cluster-block combination is assigned a single random treatment. Under this design, we show that for graphs that admit good clustering, a truncated exposure-mapping Horvitz-Thompson estimator achieves Õ(1/√(NT)) mean-squared error (MSE), matching an Ω(1/√(NT)) lower bound up to logarithmic terms. Our results simultaneously generalize the N=1 setting of Hu, Wager 2022 (and improve on the MSE bound shown therein for difference-in-means estimators) as well as the T=1 settings of Ugander et al 2013 and Leung 2022. Simulation studies validate the favorable performance of our approach.
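- Code sketch: a small sketch of the design itself, assuming a clustering of units is already given: one fair coin per cluster-block pair, broadcast to every unit in that cluster for the whole block. The clustering step and the truncated exposure-mapping Horvitz-Thompson estimator from the abstract are not reproduced; names are illustrative.
```python
import numpy as np

def clustered_switchback(cluster_of_unit, T, block_length, p=0.5, seed=0):
    """Assign one random treatment to every (cluster, time-block) combination.
    Returns an (N, T) 0/1 matrix: constant within each cluster-block cell."""
    rng = np.random.default_rng(seed)
    cluster_of_unit = np.asarray(cluster_of_unit)
    n_clusters = cluster_of_unit.max() + 1
    n_blocks = int(np.ceil(T / block_length))
    z = rng.random((n_clusters, n_blocks)) < p            # one coin per cluster-block
    block_of_time = np.arange(T) // block_length
    return z[cluster_of_unit[:, None], block_of_time[None, :]].astype(int)

W = clustered_switchback(cluster_of_unit=[0, 0, 1, 1, 2], T=12, block_length=4)
print(W)   # rows: units, columns: time steps
```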
- [C] Sequential Decision-Making | Optimization and Personalization
Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to Standard RL, with K. Wang, D. Liang, and W. Sun.
arXiv March 2024.
- Abstract: We study Risk-Sensitive Reinforcement Learning (RSRL) with the Optimized Certainty Equivalent (OCE) risk, which generalizes Conditional Value-at-risk (CVaR), entropic risk and Markowitz's mean-variance. Using an augmented Markov Decision Process (MDP), we propose two general meta-algorithms via reductions to standard RL: one based on optimistic algorithms and another based on policy optimization. Our optimistic meta-algorithm generalizes almost all prior RSRL theory with entropic risk or CVaR. Under discrete rewards, our optimistic theory also certifies the first RSRL regret bounds for MDPs with bounded coverability, e.g., exogenous block MDPs. Under discrete rewards, our policy optimization meta-algorithm enjoys both global convergence and local improvement guarantees in a novel metric that lower bounds the true OCE risk. Finally, we instantiate our framework with PPO, construct an MDP, and show that it learns the optimal risk-sensitive policy while prior algorithms provably fail.
- [C] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Reward Maximization for Pure Exploration: Minimax Optimal Good Arm Identification for Nonparametric Multi-Armed Bandits, with B. Cho, D. Meier, and K. Gan.
arXiv October 2024.
- Abstract: In multi-armed bandits, the tasks of reward maximization and pure exploration are often at odds with each other. The former focuses on exploiting arms with the highest means, while the latter may require constant exploration across all arms. In this work, we focus on good arm identification (GAI), a practical bandit inference objective that aims to label arms with means above a threshold as quickly as possible. We show that GAI can be efficiently solved by combining a reward-maximizing sampling algorithm with a novel nonparametric anytime-valid sequential test for labeling arm means. We first establish that our sequential test maintains error control under highly nonparametric assumptions and asymptotically achieves the minimax optimal e-power, a notion of power for anytime-valid tests. Next, by pairing regret-minimizing sampling schemes with our sequential test, we provide an approach that achieves minimax optimal stopping times for labeling arms with means above a threshold, under an error probability constraint. Our empirical results validate our approach beyond the minimax setting, reducing the expected number of samples for all stopping times by at least 50% across both synthetic and real-world settings.
- [C] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Multi-Armed Bandits with Interference, with S. Jia and P. Frazier.
arXiv February 2024.
- Abstract: Experimentation with interference poses a significant challenge in contemporary online platforms. Prior research on experimentation with interference has concentrated on the final output of a policy. The cumulative performance, while equally crucial, is less well understood. To address this gap, we introduce the problem of Multi-armed Bandits with Interference (MABI), where the learner assigns an arm to each of N experimental units over a time horizon of T rounds. The reward of each unit in each round depends on the treatments of all units, where the influence of a unit decays in the spatial distance between units. Furthermore, we employ a general setup wherein the reward functions are chosen by an adversary and may vary arbitrarily across rounds and units. We first show that switchback policies achieve an optimal expected regret Õ(√T) against the best fixed-arm policy. Nonetheless, the regret (as a random variable) for any switchback policy suffers a high variance, as it does not account for N. We propose a cluster randomization policy whose regret (i) is optimal in expectation and (ii) admits a high probability bound that vanishes in N.
- [C] Causal Inference | Applied ML | Sequential Decision-Making | Optimization and Personalization
CSPI-MT: Calibrated Safe Policy Improvement with Multiple Testing for Threshold Policies, with B. Cho, A. Pop, K. Gan, S. Corbett-Davies, I. Nir, and A. Evnine.
arXiv August 2024.
- Abstract: When modifying existing policies in high-risk settings, it is often necessary to ensure with high certainty that the newly proposed policy improves upon a baseline, such as the status quo. In this work, we consider the problem of safe policy improvement, where one only adopts a new policy if it is deemed to be better than the specified baseline with at least pre-specified probability. We focus on threshold policies, a ubiquitous class of policies with applications in economics, healthcare, and digital advertising. Existing methods rely on potentially underpowered safety checks and limit the opportunities for finding safe improvements, so too often they must revert to the baseline to maintain safety. We overcome these issues by leveraging the most powerful safety test in the asymptotic regime and allowing for multiple candidates to be tested for improvement over the baseline. We show that in adversarial settings, our approach controls the rate of adopting a policy worse than the baseline to the pre-specified error level, even in moderate sample sizes. We present CSPI and CSPI-MT, two novel heuristics for selecting cutoff(s) to maximize the policy improvement from baseline. We demonstrate through both synthetic and external datasets that our approaches improve both the detection rates of safe policies and the realized improvement, particularly under stringent safety requirements and low signal-to-noise conditions.
- [C] Causal Inference | Optimization and Personalization
Hessian-Free Laplace in Bayesian Deep Learning, with J. McInerney.
arXiv March 2024.
- Abstract: The Laplace approximation (LA) of the Bayesian posterior is a Gaussian distribution centered at the maximum a posteriori estimate. Its appeal in Bayesian deep learning stems from the ability to quantify uncertainty post-hoc (i.e., after standard network parameter optimization), the ease of sampling from the approximate posterior, and the analytic form of model evidence. However, an important computational bottleneck of LA is the necessary step of calculating and inverting the Hessian matrix of the log posterior. The Hessian may be approximated in a variety of ways, with quality varying with a number of factors including the network, dataset, and inference task. In this paper, we propose an alternative framework that sidesteps Hessian calculation and inversion. The Hessian-free Laplace (HFL) approximation uses curvature of both the log posterior and network prediction to estimate its variance. Only two point estimates are needed: the standard maximum a posteriori parameter and the optimal parameter under a loss regularized by the network prediction. We show that, under standard assumptions of LA in Bayesian deep learning, HFL targets the same variance as LA, and can be efficiently amortized in a pre-trained network. Experiments demonstrate comparable performance to that of exact and approximate Hessians, with excellent coverage for in-between uncertainty.
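- Code sketch: the snippet below illustrates the mechanism on a toy ridge-regression "network," where the Laplace predictive variance ∇f(x)ᵀH⁻¹∇f(x) can be computed exactly and recovered from the two point estimates the abstract mentions (the MAP solution and a refit under a prediction-regularized loss). It is an illustration of the idea under simplifying assumptions, not the paper's HFL implementation for deep networks.
```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 3, 1.0
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=n)

def fit(extra_grad=0.0):
    # Minimizer of 0.5*||y - X @ theta||^2 + 0.5*lam*||theta||^2 + extra_grad @ theta
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y - extra_grad)

x_star = np.array([0.3, -1.0, 2.0])   # query point; prediction f(x, theta) = x @ theta
theta_map = fit()

# Standard Laplace approximation: predictive variance = grad_f^T H^{-1} grad_f (up to scale)
H = X.T @ X + lam * np.eye(d)
la_var = x_star @ np.linalg.solve(H, x_star)

# Hessian-free version, as sketched in the abstract: refit with the loss nudged by
# eps * f(x_star, theta) and read the variance off a finite difference of predictions.
eps = 1e-3
theta_eps = fit(extra_grad=eps * x_star)
hfl_var = (x_star @ theta_map - x_star @ theta_eps) / eps

print(la_var, hfl_var)                # agree (exactly so in this quadratic toy case)
```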
- [C] Sequential Decision-Making
Low-Rank MDPs with Continuous Action Spaces, with A. Bennett and M. Oprescu.
- [C] Sequential Decision-Making | Optimization and Personalization
Provable Offline Reinforcement Learning with Human Feedback, with W. Zhan, M. Uehara, J. D. Lee, and W. Sun.
- [J] Causal Inference
The Variational Method of Moments, with A. Bennett.
- [C] Causal Inference
Minimax Instrumental Variable Regression and L2 Convergence Guarantees without Identification or Closedness, with A. Bennett, X. Mao, W. Newey, V. Syrgkanis, and M. Uehara.
- [C] Applied ML | Optimization and Personalization
Large Language Models as Zero-Shot Conversational Recommenders, with Z. He, Z. Xie, R. Jha, H. Steck, D. Liang, Y. Feng, B. Majumder, and J. McAuley.
- [C] Sequential Decision-Making | Optimization and Personalization
The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning, with K. Wang, K. Zhou, R. Wu, and W. Sun.
- [C] Sequential Decision-Making | Optimization and Personalization
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs, with M. Uehara, H. Kiyohara, A. Bennett, V. Chernozhukov, N. Jiang, C. Shi, and W. Sun.
- [C] Sequential Decision-Making | Optimization and Personalization
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage, with M. Uehara, J. D. Lee, and W. Sun.
- [C] Causal Inference
Long-Term Causal Inference with Imperfect Surrogates using Many Weak Experiments, Proxies, and Cross-Fold Moments, with A. Bibaut, S. Ejdemyr, and M. Zhao.
- [C] Causal Inference
Evaluating the Surrogate Index as a Decision-Making Tool Using 200 A/B Tests at Netflix, with V. Zhang, M. Zhao, and A. Le.
- [C] Applied ML | Sequential Decision-Making
JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning, with K. Wang, J. Wang, Y. Li, I. Trummer, and W. Sun.
- [C] Sequential Decision-Making | Optimization and Personalization
Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR, with K. Wang and W. Sun.
- [C] Sequential Decision-Making | Optimization and Personalization
Smooth Non-Stationary Bandits, with S. Jia, Q. Xie, and P. Frazier.
Major revision in Operations Research.
Proceedings of the 40th International Conference on Machine Learning (ICML), 2023.
arXiv January 2023.
- Abstract: In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, where they guarantee T^(2/3) regret. However, in practice environments are often changing smoothly, so such algorithms may incur higher-than-necessary regret in these settings and do not leverage information on the rate of change. In this paper, we study a non-stationary two-arm bandit problem where we assume an arm's mean reward is a β-Hölder function over (normalized) time, meaning it is (β-1)-times Lipschitz-continuously differentiable. We show the first separation between the smooth and non-smooth regimes by presenting a policy with T^(3/5) regret for β=2. We complement this result by a T^((β+1)/(2β+1)) lower bound for any integer β≥1, which matches our upper bound for β=2.
- [C] Causal Inference | Optimization and Personalization
B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding, with M. Oprescu, J. Dorn, M. Ghoummaid, A. Jesson, and U. Shalit.
- [C] Sequential Decision-Making | Optimization and Personalization
Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings, with M. Uehara, A. Sekhari, J. D. Lee, and W. Sun.
- [J] Causal Inference | Sequential Decision-Making
Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations, with A. Bibaut and M. Lindon.
arXiv December 2022.
- Abstract: Sequential testing, always-valid p-values, and confidence sequences promise flexible statistical inference and on-the-fly decision making. However, unlike fixed-n inference based on asymptotic normality, existing sequential tests either make parametric assumptions and end up under-covering/over-rejecting when these fail or use non-parametric but conservative concentration inequalities and end up over-covering/under-rejecting. To circumvent these issues, we sidestep exact at-least-α coverage and focus on asymptotically exact coverage and asymptotic optimality. That is, we seek sequential tests whose probability of ever rejecting a true hypothesis asymptotically approaches α and whose expected time to reject a false hypothesis approaches a lower bound on all tests with asymptotic coverage at least α, both under an appropriate asymptotic regime. We permit observations to be both non-parametric and dependent and focus on testing whether the observations form a martingale difference sequence. We propose the universal sequential probability ratio test (uSPRT), a slight modification to the normal-mixture sequential probability ratio test, where we add a burn-in period and adjust thresholds accordingly. We show that even in this very general setting, the uSPRT is asymptotically optimal under mild generic conditions. We apply the results to stabilized estimating equations to test means, treatment effects, etc. Our results also provide corresponding guarantees for the implied confidence sequences. Numerical simulations verify our guarantees and the benefits of the uSPRT over alternatives.
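- Code sketch: as a concrete reference point, the snippet below implements a plain normal-mixture e-statistic for a zero-mean null with a burn-in before any rejection is allowed; the paper's uSPRT applies this style of statistic to stabilized estimating equations with adjusted thresholds, which are not reproduced here. Function names and tuning constants are illustrative assumptions.
```python
import numpy as np

def normal_mixture_e(x, rho=1.0):
    """Robbins-style normal-mixture statistic for H0: observations have mean zero.
    Mixes exp(lam*S_n - lam^2*V_n/2) over lam ~ N(0, rho^2), in closed form."""
    x = np.asarray(x, dtype=float)
    S, V = np.cumsum(x), np.cumsum(x**2)
    return np.exp(rho**2 * S**2 / (2 * (rho**2 * V + 1))) / np.sqrt(rho**2 * V + 1)

def sequential_test(x, alpha=0.05, burn_in=50):
    """Stylized uSPRT-like rule: monitor the mixture statistic only after a burn-in
    and stop the first time it exceeds 1/alpha. Returns the rejection time or None."""
    e = normal_mixture_e(x)
    crossed = np.where((np.arange(1, len(x) + 1) > burn_in) & (e >= 1 / alpha))[0]
    return int(crossed[0]) + 1 if crossed.size else None

rng = np.random.default_rng(0)
print(sequential_test(rng.normal(0.0, 1, 5000)))   # true null: typically None
print(sequential_test(rng.normal(0.2, 1, 5000)))   # false null: rejects early
```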
- [C] Sequential Decision-Making | Optimization and Personalization
Provable Safe Reinforcement Learning with Binary Feedback, with A. Bennett and D. Misra.
- [C] Causal Inference | Optimization and Personalization
Robust and Agnostic Learning of Conditional Distributional Treatment Effects, with M. Oprescu.
- [C] Causal Inference | Fairness | Optimization and Personalization
What's the Harm? Sharp Bounds on the Fraction Negatively Affected by Treatment.
- [C] Sequential Decision-Making | Optimization and Personalization
Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems, with M. Uehara, A. Sekhari, J. D. Lee, and W. Sun.
- [C] Causal Inference
The Implicit Delta Method, with J. McInerney.
- [J] Sequential Decision-Making | Optimization and Personalization
A Review of Off-Policy Evaluation in Reinforcement Learning, with M. Uehara and C. Shi.
- [C] Sequential Decision-Making | Optimization and Personalization
Learning Bellman Complete Representations for Offline Policy Evaluation, with J. Chang, K. Wang, and W. Sun.
- [C/J] Causal Inference | Fairness | Optimization and Personalization
Treatment Effect Risk: Bounds and Inference.
Management Science (Fast Track), 2023.
Proceedings of the 5th ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2022 (Extended abstract).
arXiv January 2022.
GitHub.
Accepted to Fast Track in Management Science
- Abstract: Since the average treatment effect (ATE) measures the change in social welfare, even if positive, there is a risk of negative effect on, say, some 10% of the population. Assessing such risk is difficult, however, because any one individual treatment effect (ITE) is never observed so the 10% worst-affected cannot be identified, while distributional treatment effects only compare the first deciles within each treatment group, which does not correspond to any 10%-subpopulation. In this paper we consider how to nonetheless assess this important risk measure, formalized as the conditional value at risk (CVaR) of the ITE distribution. We leverage the availability of pre-treatment covariates and characterize the tightest-possible upper and lower bounds on ITE-CVaR given by the covariate-conditional average treatment effect (CATE) function. Some bounds can also be interpreted as summarizing a complex CATE function into a single metric and are of interest independently of being a bound. We then proceed to study how to estimate these bounds efficiently from data and construct confidence intervals. This is challenging even in randomized experiments as it requires understanding the distribution of the unknown CATE function, which can be very complex if we use rich covariates so as to best control for heterogeneity. We develop a debiasing method that overcomes this and prove it enjoys favorable statistical properties even when CATE and other nuisances are estimated by black-box machine learning or even inconsistently. Studying a hypothetical change to French job-search counseling services, our bounds and inference demonstrate a small social benefit entails a negative impact on a substantial subpopulation.
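- Code sketch: a plug-in version of the covariate-conditional bound discussed in the abstract (the CVaR of the CATE function) is a one-liner, shown below for intuition; the paper's contribution is the debiased estimation and inference for such bounds, which this sketch omits. The simulated CATE values are illustrative.
```python
import numpy as np

def cvar_of_cate(cate, alpha=0.10):
    """Plug-in CVaR_alpha of the CATE distribution: the average effect over the
    alpha-fraction of the population with the smallest conditional effects, one of
    the covariate-conditional bounds on ITE-CVaR studied in the paper."""
    cate = np.sort(np.asarray(cate, dtype=float))
    k = max(1, int(np.ceil(alpha * len(cate))))
    return cate[:k].mean()

rng = np.random.default_rng(0)
cate_hat = 0.5 + rng.normal(0, 1, 10_000)     # hypothetical CATE estimates
print(cvar_of_cate(cate_hat, alpha=0.10))      # negative despite a positive average effect
```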
- [J] Causal Inference | Sequential Decision-Making
Controlling for Unmeasured Confounding in Panel Data Using Minimal Bridge Functions: From Two-Way Fixed Effects to Factor Models, with G. Imbens and X. Mao.
arXiv August 2021.
- Abstract: We develop a new approach for identifying and estimating average causal effects in panel data under a linear factor model with unmeasured confounders. Compared to other methods tackling factor models such as synthetic controls and matrix completion, our method does not require the number of time periods to grow infinitely. Instead, we draw inspiration from the two-way fixed effect model as a special case of the linear factor model, where a simple difference-in-differences transformation identifies the effect. We show that analogous, albeit more complex, transformations exist in the more general linear factor model, providing a new means to identify the effect in that model. In fact many such transformations exist, called bridge functions, all identifying the same causal effect estimand. This poses a unique challenge for estimation and inference, which we solve by targeting the minimal bridge function using a regularized estimation approach. We prove that our resulting average causal effect estimator is root-N consistent and asymptotically normal, and we provide asymptotically valid confidence intervals. Finally, we provide extensions for the case of a linear factor model with time-varying unmeasured confounders.
- [C] Fairness | Applied ML | Optimization and Personalization
Estimating Structural Disparities for Face Models, with S. Ardeshir and C. Segalin.
- [C] Causal Inference | Optimization and Personalization
Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning, with X. Mao, K. Wang, and Z. Zhou.
- [C/J] Causal Inference | Fairness | Applied ML
An Empirical Evaluation of the Impact of New York's Bail Reform on Crime Using Synthetic Controls, with T. Bergin, A. Koo, S. Koppel, R. Peterson, R. Ropac, and A. Zhou.
- [C] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Post-Contextual-Bandit Inference, with A. Bibaut, A. Chambaz, M. Dimakopoulou, and M. van der Laan.
- [C] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning, with A. Bibaut, A. Chambaz, M. Dimakopoulou, and M. van der Laan.
- [J] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes, with A. Bennett.
- [J] Causal Inference
Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach, with X. Mao and M. Uehara.
arXiv March 2021.
- Abstract: We study the estimation of causal parameters when not all confounders are observed and instead negative controls are available. Recent work has shown how these can enable identification and efficient estimation via two so-called bridge functions. In this paper, we tackle the primary challenge to causal inference using negative controls: the identification and estimation of these bridge functions. Previous work has relied on uniqueness and completeness assumptions on these functions that may be implausible in practice and also focused on their parametric estimation. Instead, we provide a new identification strategy that avoids both uniqueness and completeness. And, we provide new estimators for these functions based on minimax learning formulations. These estimators accommodate general function classes such as reproducing kernel Hilbert spaces and neural networks. We study finite-sample convergence results both for estimating the bridge functions themselves and for the final estimation of the causal parameter. We do this under a variety of combinations of assumptions that include realizability and closedness conditions on the hypothesis and critic classes employed in the minimax estimator. Depending on how much we are willing to assume, we obtain different convergence rates. In some cases, we show the estimate for the causal parameter may converge even when our bridge function estimators do not converge to any valid bridge function. And, in other cases, we show we can obtain semiparametric efficiency.
- [J] Optimization and Personalization
Fast Rates for Contextual Linear Optimization, with Y. Hu and X. Mao.
- [J] Optimization and Personalization
Stochastic Optimization Forests, with X. Mao.
- [C] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Stateful Offline Contextual Policy Evaluation and Learning, with A. Zhou.
- [C] Causal Inference | Optimization and Personalization
Control Variates for Slate Off-Policy Evaluation, with F. Amat Gil, A. Chandrashekar, and N. Vlassis.
- [C/J] Sequential Decision-Making | Optimization and Personalization
Fast Rates for the Regret of Offline Reinforcement Learning, with Y. Hu and M. Uehara.
Mathematics of Operations Research, 2024.
Proceedings of the 34th Conference on Learning Theory (COLT), 2021 (Extended abstract).
arXiv February 2021.
- Abstract: We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP). While existing analyses of common approaches, such as fitted Q-iteration (FQI), suggest an O(1/√n) convergence for regret, empirical behavior exhibits much faster convergence. In this paper, we present a finer regret analysis that exactly characterizes this phenomenon by providing fast rates for the regret convergence. First, we show that given any estimate for the optimal quality function Q*, the regret of the policy it defines converges at a rate given by the exponentiation of the Q*-estimate's pointwise convergence rate, thus speeding it up. The level of exponentiation depends on the level of noise in the decision-making problem, rather than the estimation problem. We establish such noise levels for linear and tabular MDPs as examples. Second, we provide new analyses of FQI and Bellman residual minimization to establish the correct pointwise convergence guarantees. As specific cases, our results imply O(1/n) regret rates in linear cases and exp(-Ω(n)) regret rates in tabular cases.
- [J] Causal Inference | Applied ML
The Effect of Patient Age on Discharge Destination and Complications After Lumbar Spinal Fusion, with B. Pennicooke, M. Santacatterina, J. Lee, and E. Elowitz.
- [C/J] Sequential Decision-Making | Optimization and Personalization
Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency, with M. Imaizumi, N. Jiang, W. Sun, M. Uehara, and T. Xie.
arXiv February 2021.
- Abstract: We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and q-functions when these are estimated using recent minimax methods. Under various combinations of realizability and completeness assumptions, we show that the minimax approach enables us to achieve a fast rate of convergence for weights and quality functions, characterized by the critical inequality (Bartlett et al., 2005). Based on this result, we analyze convergence rates for OPE. In particular, we introduce novel alternative completeness conditions under which OPE is feasible and we present the first finite-sample result with first-order efficiency in non-tabular environments, i.e., having the minimal coefficient in the leading term.
- [J] Causal Inference
On the Optimality of Randomization in Experimental Design: How to Randomize for Minimax Variance and Design-Based Inference.
- [J] Causal Inference | Optimization and Personalization
Rejoinder: New Objectives for Policy Learning.
- [C] Fairness | Optimization and Personalization
Fairness, Welfare, and Equity in Personalized Pricing, with A. Zhou.
- [J] Sequential Decision-Making | Optimization and Personalization
DTR Bandit: Learning to Make Response-Adaptive Decisions With Low Regret, with Y. Hu.
- [C] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Optimal Off-Policy Evaluation from Multiple Logging Policies, with Y. Saito and M. Uehara.
- [C] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies, with M. Uehara.
- [J] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning, with M. Uehara.
Biometrika, 2023.
arXiv June 2020.
GitHub.
- Abstract: We study the efficient off-policy evaluation of natural stochastic policies, which are defined in terms of deviations from the behavior policy. This is a departure from the literature on off-policy evaluation where most work considers the evaluation of explicitly specified policies. Crucially, offline reinforcement learning with natural stochastic policies can help alleviate issues of weak overlap, lead to policies that build upon current practice, and improve policies' implementability in practice. Compared with the classic case of a pre-specified evaluation policy, when evaluating natural stochastic policies, the efficiency bound, which measures the best-achievable estimation error, is inflated since the evaluation policy itself is unknown. In this paper we derive the efficiency bounds of two major types of natural stochastic policies: tilting policies and modified treatment policies. We then propose efficient nonparametric estimators that attain the efficiency bounds under very lax conditions. These also enjoy a (partial) double robustness property.
- [C] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders, with A. Bennett, L. Li, and A. Mousavi.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.
arXiv July 2020.
Oral at AISTATS (3%)
- Abstract: Off-policy evaluation (OPE) in reinforcement learning is an important problem in settings where experimentation is limited, such as education and healthcare. But, in these very same settings, observed actions are often confounded by unobserved variables making OPE even more difficult. We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders, where states and actions can act as proxies for the unobserved confounders. We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data. Our method involves two stages. In the first, we show how to use proxies to estimate stationary distribution ratios, extending recent work on breaking the curse of horizon to the confounded setting. In the second, we show optimal balancing can be combined with such learned ratios to obtain policy value while avoiding direct modeling of reward functions. We establish theoretical guarantees of consistency, and benchmark our method empirically.
- [C] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning, with A. Zhou.
- [C] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Statistically Efficient Off-Policy Policy Gradients, with M. Uehara.
- [C] Causal Inference | Optimization and Personalization
Efficient Policy Learning from Surrogate-Loss Classification Reductions, with A. Bennett.
- [J] Causal Inference
Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond, with X. Mao and M. Uehara.
- [C/J] Sequential Decision-Making | Optimization and Personalization
Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes, with Y. Hu and X. Mao.
Operations Research, 2021.
Proceedings of the 33rd Conference on Learning Theory (COLT), 2020 (Extended abstract).
arXiv September 2019.
Finalist, Applied Probability Society Best Paper Competition
- Abstract: We study a nonparametric contextual bandit problem where the expected reward functions belong to a Hölder class with smoothness parameter β. We show how this interpolates between two extremes that were previously studied in isolation: non-differentiable bandits (β≤1), where rate-optimal regret is achieved by running separate non-contextual bandits in different context regions, and parametric-response bandits (β=∞), where rate-optimal regret can be achieved with minimal or no exploration due to infinite extrapolatability. We develop a novel algorithm that carefully adjusts to all smoothness settings and we prove its regret is rate-optimal by establishing matching upper and lower bounds, recovering the existing results at the two extremes. In this sense, our work bridges the gap between the existing literature on parametric and non-differentiable contextual bandit problems and between bandit algorithms that exclusively use global or local information, shedding light on the crucial interplay of complexity and regret in contextual bandits.
- [J] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning, with M. Uehara.
- [C/J] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes, with M. Uehara.
Journal of Machine Learning Research (JMLR), 21(167):1--63, 2020.
Proceedings of the 37th International Conference on Machine Learning (ICML), 2020.
arXiv August 2019.
GitHub.
- Abstract: Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. We consider for the first time the semiparametric efficiency limits of OPE in Markov decision processes (MDPs), where actions, rewards, and states are memoryless. We show existing OPE estimators may fail to be efficient in this setting. We develop a new estimator based on cross-fold estimation of q-functions and marginalized density ratios, which we term double reinforcement learning (DRL). We show that DRL is efficient when both components are estimated at fourth-root rates and is also doubly robust when only one component is consistent. We investigate these properties empirically and demonstrate the performance benefits due to harnessing memorylessness efficiently.
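- Code sketch: a schematic of what a DRL-style estimate looks like in the discounted setting, for intuition only: a direct-method term from the q-function plus density-ratio-weighted temporal-difference corrections, with both nuisances cross-fitted as the abstract describes. The interface and the discounted formulation below are illustrative choices, not the paper's exact finite-horizon estimator.
```python
import numpy as np

def drl_estimate(trajs, q_hat, w_hat, pi, gamma=0.99):
    """Schematic doubly robust OPE estimate. trajs: list of trajectories, each a list
    of (s, a, r, s_next) tuples; q_hat(s, a): cross-fitted q-function; w_hat(s, a):
    cross-fitted marginalized (occupancy) density ratio; pi(s): dict of action
    probabilities under the evaluation policy. If q_hat is correct the correction
    terms are mean-zero; if w_hat is correct they fix q_hat's bias (double robustness)."""
    def v_pi(s):  # E_{a ~ pi}[q_hat(s, a)]
        return sum(p * q_hat(s, a) for a, p in pi(s).items())

    vals = []
    for traj in trajs:
        total = v_pi(traj[0][0])                  # direct-method term at the initial state
        for t, (s, a, r, s_next) in enumerate(traj):
            td_err = r + gamma * v_pi(s_next) - q_hat(s, a)
            total += (gamma ** t) * w_hat(s, a) * td_err   # weighted correction term
        vals.append(total)
    return float(np.mean(vals))
```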
- [C/J] Causal Inference | Fairness | Optimization and Personalization
Assessing Algorithmic Fairness with Unobserved Protected Class Using Data Combination, with X. Mao and A. Zhou.
Management Science, 2020.
Proceedings of the 3rd ACM Conference on Fairness, Accountability, and Transparency (FAccT), 110, 2020 (Extended abstract).
arXiv June 2019.
GitHub. MS Blog.
- Abstract: The increasing impact of algorithmic decisions on people's lives compels us to scrutinize their fairness and, in particular, the disparate impacts that ostensibly-color-blind algorithms can have on different groups. Examples include credit decisioning, hiring, advertising, criminal justice, personalized medicine, and targeted policymaking, where in some cases legislative or regulatory frameworks for fairness exist and define specific protected classes. In this paper we study a fundamental challenge to assessing disparate impacts in practice: protected class membership is often not observed in the data. This is particularly a problem in lending and healthcare. We consider the use of an auxiliary dataset, such as the US census, that includes class labels but not decisions or outcomes. We show that a variety of common disparity measures are generally unidentifiable aside from some unrealistic cases, providing a new perspective on the documented biases of popular proxy-based methods. We provide exact characterizations of the sharpest-possible partial identification set of disparities either under no assumptions or when we incorporate mild smoothness constraints. We further provide optimization-based algorithms for computing and visualizing these sets, which enables reliable and robust assessments -- an important tool when disparity assessment can have far-reaching policy implications. We demonstrate this in two case studies with real data: mortgage lending and personalized medicine dosing.
- [J] Optimization and Personalization
Data Pooling in Stochastic Optimization, with V. Gupta.
- [J] Causal Inference | Optimization and Personalization
More Efficient Policy Learning via Optimal Retargeting.
Journal of the American Statistical Association (JASA), 2020, Special Issue on Precision Medicine and Individualized Policy Discovery.
arXiv June 2019.
GitHub.
Selected as JASA Discussion Paper
- Abstract: Policy learning can be used to extract individualized treatment regimes from observational data in healthcare, civics, e-commerce, and beyond. One big hurdle to policy learning is a commonplace lack of overlap in the data for different actions, which can lead to unwieldy policy evaluation and poorly performing learned policies. We study a solution to this problem based on retargeting, that is, changing the population on which policies are optimized. We first argue that at the population level, retargeting may induce little to no bias. We then characterize the optimal reference policy centering and retargeting weights in both binary-action and multi-action settings. We do this in terms of the asymptotic efficient estimation variance of the new learning objective. We further consider bias regularization. Extensive empirical results in a simulation study and a case study of targeted job counseling demonstrate that retargeting is a fairly easy way to significantly improve any policy learning procedure.
- [J] Causal Inference | Optimization and Personalization
Minimax-Optimal Policy Learning Under Unobserved Confounding, with A. Zhou.
Management Science, 2020.
arXiv May 2018.
GitHub.
Winner, INFORMS 2018 Data Mining Best Paper Award
2nd place, INFORMS 2018 Junior Faculty Interest Group (JFIG) Paper Competition
- Abstract: We study the problem of learning personalized decision policies from observational data while accounting for possible unobserved confounding. Previous approaches, which assume unconfoundedness, that is, that no unobserved confounders affect both the treatment assignment as well as outcome, can lead to policies that introduce harm rather than benefit when some unobserved confounding is present as is generally the case with observational data. Instead, because policy value and regret may not be point-identifiable, we study a method that minimizes the worst-case estimated regret of a candidate policy against a baseline policy over an uncertainty set for propensity weights that controls the extent of unobserved confounding. We prove generalization guarantees that ensure our policy is safe when applied in practice and in fact obtains the best possible uniform control on the range of all possible population regrets that agree with the possible extent of confounding. We develop efficient algorithmic solutions to compute this minimax-optimal policy. Finally, we assess and compare our methods on synthetic and semisynthetic data. In particular, we consider a case study on personalizing hormone replacement therapy based on observational data, in which we validate our results on a randomized experiment. We demonstrate that hidden confounding can hinder existing policy-learning approaches and lead to unwarranted harm, whereas our robust approach guarantees safety and focuses on well-evidenced improvement, a necessity for making personalized treatment policies learned from observational data reliable in practice.
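- Code sketch: to make the uncertainty-set idea concrete, the snippet below computes a crude worst-case (unnormalized) weighted value of a candidate policy when inverse propensities may deviate from their nominal values by a bounded odds-ratio factor; the paper instead minimizes worst-case regret against a baseline with dedicated, sharper optimization routines. All names and the specific bound parameterization are illustrative assumptions.
```python
import numpy as np

def worst_case_policy_value(Y, A, pi_A, e_hat, gamma=1.5):
    """Crude lower bound on a candidate policy's value under a bounded-odds-ratio
    (marginal-sensitivity-style) uncertainty set for propensity weights.
    Y: outcomes; A: observed binary treatments; pi_A: the policy's treatment for each
    unit; e_hat: estimated propensity P(A=1 | X); gamma: confounding level."""
    p_obs = np.where(A == 1, e_hat, 1 - e_hat)
    w_nom = 1.0 / p_obs
    w_lo = 1 + (w_nom - 1) / gamma            # bounds on admissible inverse propensities
    w_hi = 1 + (w_nom - 1) * gamma
    terms = (A == pi_A).astype(float) * np.asarray(Y, dtype=float)
    w_adv = np.where(terms >= 0, w_lo, w_hi)  # adversary shrinks gains, inflates losses
    return float(np.mean(w_adv * terms))
```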
- [J] Causal Inference | Optimization and Personalization
Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects, with F. Johansson, U. Shalit, and D. Sontag.
- [C] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning, with M. Uehara.
- [C] Fairness | Optimization and Personalization
The Fairness of Risk Scores Beyond Classification: Bipartite Ranking and the xAUC Metric, with A. Zhou.
- [C] Causal Inference
Deep Generalized Method of Moments for Instrumental Variable Analysis, with A. Bennett and T. Schnabel.
- [C] Causal Inference | Fairness | Optimization and Personalization
Assessing Disparate Impacts of Personalized Interventions: Identifiability and Bounds, with A. Zhou.
- [C] Causal Inference | Optimization and Personalization
Policy Evaluation with Latent Confounders via Optimal Balance, with A. Bennett.
- [J] Causal Inference | Optimization and Personalization
Comment: Entropy Learning for Dynamic Treatment Regimes.
- [J] Causal Inference
Kernel Optimal Orthogonality Weighting: A Balancing Approach to Estimating Effects of Continuous Treatments, with M. Santacatterina.
arXiv October 2019.
- Abstract: Many scientific questions require estimating the effects of continuous treatments. Outcome modeling and weighted regression based on the generalized propensity score are the most commonly used methods to evaluate continuous effects. However, these techniques may be sensitive to model misspecification, extreme weights or both. In this paper, we propose Kernel Optimal Orthogonality Weighting (KOOW), a convex optimization-based method, for estimating the effects of continuous treatments. KOOW finds weights that minimize the worst-case penalized functional covariance between the continuous treatment and the confounders. By minimizing this quantity, KOOW successfully provides weights that orthogonalize confounders and the continuous treatment, thus providing optimal covariate balance, while controlling for extreme weights. We evaluate its comparative performance in a simulation study. Using data from the Women's Health Initiative observational study, we apply KOOW to evaluate the effect of red meat consumption on blood pressure.
- [J] Causal Inference
Optimal Weighting for Estimating Generalized Average Treatment Effects, with M. Santacatterina.
- [C] Causal Inference
DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training.
- [C] Causal Inference | Optimization and Personalization
Classifying Treatment Responders Under Causal Effect Monotonicity.
- [C] Causal Inference | Optimization and Personalization
Confounding-Robust Policy Improvement, with A. Zhou.
- [C] Causal Inference
Removing Hidden Confounding by Experimental Grounding, with A. M. Puli and U. Shalit.
Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS), 10911--10920, 2018.
arXiv October 2018.
GitHub.
Spotlight at NeurIPS (3.5%)
- Abstract: Observational data is being increasingly used as a means for making individual-level causal predictions and intervention recommendations. The foremost challenge of causal inference from observational data is hidden confounding, whose presence cannot be tested in data and can invalidate any causal conclusion. Experimental data does not suffer from confounding but is usually limited in both scope and scale. We introduce a novel method of using limited experimental data to correct the hidden confounding in causal effect models trained on larger observational data, even if the observational data does not fully overlap with the experimental data. Our method makes strictly weaker assumptions than existing approaches, and we prove conditions under which our method yields a consistent estimator. We demonstrate our method's efficacy using real-world data from a large educational experiment.
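- Code sketch: one way to picture the approach in the abstract is the snippet below: keep the flexible but possibly confounded CATE estimate fit on the observational data, and use the small experiment only to fit a simple additive correction. The IPW pseudo-outcome, linear correction, and interface are illustrative assumptions rather than the paper's exact procedure.
```python
import numpy as np

def ground_cate(omega_obs, X_exp, A_exp, Y_exp, e_exp=0.5):
    """Correct a confounded CATE estimate with a small randomized experiment.
    omega_obs: function x -> CATE estimate learned on observational data (possibly biased).
    X_exp, A_exp, Y_exp: experimental covariates, binary treatments, outcomes.
    e_exp: treatment probability in the experiment."""
    # Unbiased (noisy) effect pseudo-outcomes from the experiment via an IPW transform
    pseudo = Y_exp * (A_exp / e_exp - (1 - A_exp) / (1 - e_exp))
    resid = pseudo - np.apply_along_axis(omega_obs, 1, X_exp)
    # Fit a simple linear correction eta(x) = [1, x] @ theta to the residual bias
    Xd = np.column_stack([np.ones(len(X_exp)), X_exp])
    theta, *_ = np.linalg.lstsq(Xd, resid, rcond=None)
    return lambda x: omega_obs(x) + np.concatenate([[1.0], x]) @ theta
```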
- [C] Causal Inference | Optimization and Personalization
Balanced Policy Evaluation and Learning.
- [C] Causal Inference
Causal Inference with Noisy and Missing Covariates via Matrix Factorization, with X. Mao and M. Udell.
- [C] Fairness | Optimization and Personalization
Fairness Under Unawareness: Assessing Disparity When Protected Class Is Unobserved, with J. Chen, X. Mao, G. Svacha, and M. Udell.
Proceedings of the 2nd ACM Conference on Fairness, Accountability, and Transparency (FAccT), 339--348, 2019.
arXiv November 2018.
GitHub.
- Abstract: Assessing the fairness of a decision making system with respect to a protected class, such as gender or race, is complicated by not being able to observe class membership labels due to legal or operational reasons. Probabilistic models for predicting the protected class based on observed proxy variables, such as surname and geolocation for race, are sometimes used to impute these missing labels. Such methods are known to be used by government regulators and have been observed to exaggerate disparities. The reason why is unknown, as is whether overestimation is always the case. We decompose the bias of estimating outcome disparity via an existing threshold-based imputation method into multiple interpretable bias sources, which explains when over- or underestimation occurs. We also propose an alternative weighted estimator that uses soft classification rules rather than hard imputation, and show that its bias arises simply from the conditional covariance of the outcome with the true class membership. We illustrate our results with numerical simulations as well as an application to a dataset of mortgage applications, using geolocation as a proxy for race. We confirm that the bias of threshold-based imputation is generally upward; however, its magnitude varies strongly with the threshold chosen due to the complex interplay of multiple sources of bias uncovered by our theoretical analysis. Our new weighted estimator tends to have a negative bias that is much simpler to analyze and reason about.
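- Code sketch: the weighted estimator mentioned in the abstract replaces hard thresholded class labels with predicted class probabilities; a minimal version is sketched below (names are illustrative).
```python
import numpy as np

def weighted_disparity(y, p_class):
    """Soft-weighted estimate of the outcome disparity between a protected class and its
    complement, using predicted probabilities p_class = P(class = 1 | proxy) instead of
    hard imputed labels; per the abstract, its bias is driven by the conditional
    covariance of the outcome with true class membership."""
    y, p = np.asarray(y, dtype=float), np.asarray(p_class, dtype=float)
    return y @ p / p.sum() - y @ (1 - p) / (1 - p).sum()
```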
- [C] Causal Inference | Optimization and Personalization
Interval Estimation of Individual-Level Causal Effects Under Unobserved Confounding, with X. Mao and A. Zhou.
- [C] Causal Inference | Fairness | Optimization and Personalization
Residual Unfairness in Fair Machine Learning from Prejudiced Data, with A. Zhou.
- [C] Causal Inference | Optimization and Personalization
Policy Evaluation and Optimization with Continuous Treatments, with A. Zhou.
Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS), 84:1243--1251, 2018.
arXiv February 2018.
GitHub.
Finalist, Best Paper of INFORMS 2017 Data Mining and Decision Analytics Workshop
- Abstract: We study the problem of policy evaluation and learning from batched contextual bandit data when treatments are continuous, going beyond previous work on discrete treatments. Previous work for discrete treatment/action spaces focuses on inverse probability weighting (IPW) and doubly robust (DR) methods that use a rejection sampling approach for evaluation and the equivalent weighted classification problem for learning. In the continuous setting, this reduction fails as we would almost surely reject all observations. To tackle the case of continuous treatments, we extend the IPW and DR approaches to the continuous setting using a kernel function that leverages treatment proximity to attenuate discrete rejection. Our policy estimator is consistent and we characterize the optimal bandwidth. The resulting continuous policy optimizer (CPO) approach using our estimator achieves convergent regret and approaches the best-in-class policy for learnable policy classes. We demonstrate that the estimator performs well and, in particular, outperforms a discretization-based benchmark. We further study the performance of our policy optimizer in a case study on personalized dosing based on a dataset of Warfarin patients, their covariates, and final therapeutic doses. Our learned policy outperforms benchmarks and nears the oracle-best linear policy.
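- Code sketch: a minimal version of the kernel-smoothed IPW estimator described in the abstract, assuming an estimated conditional treatment density is available; bandwidth selection and the doubly robust variant are left to the paper, and the interface is an illustrative assumption.
```python
import numpy as np

def kernel_ipw_value(Y, T, policy_dose, density_t_given_x, h=0.5):
    """Kernel-smoothed IPW estimate of a deterministic continuous-treatment policy's value:
    a kernel in treatment distance replaces the indicator used for discrete actions.
    Y: outcomes; T: observed doses; policy_dose: the policy's dose for each unit's covariates;
    density_t_given_x: estimated conditional density of the observed dose, at each (T_i, X_i)."""
    Y, T, d = (np.asarray(v, dtype=float) for v in (Y, T, policy_dose))
    u = (d - T) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)      # Gaussian kernel
    return float(np.mean(K * Y / (h * np.asarray(density_t_given_x, dtype=float))))
```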
- [C] Causal Inference | Sequential Decision-Making | Optimization and Personalization
Instrument-Armed Bandits.
- [J] Causal Inference | Applied ML
More Robust Estimation of Sample Average Treatment Effects using Kernel Optimal Matching in an Observational Study of Spine Surgical Interventions, with B. Pennicooke and M. Santacatterina.
- [J] Causal Inference | Sequential Decision-Making
Optimal Balancing of Time-Dependent Confounders for Marginal Structural Models, with M. Santacatterina.
- [C] Causal Inference
Learning Weighted Representations for Generalization Across Designs, with F. Johansson, U. Shalit, and D. Sontag.
arXiv February 2018.
- Abstract: Predictive models that generalize well under distributional shift are often desirable and sometimes crucial to building robust and reliable machine learning applications. We focus on distributional shift that arises in causal inference from observational data and in unsupervised domain adaptation. We pose both of these problems as prediction under a shift in design. Popular methods for overcoming distributional shift make unrealistic assumptions such as having a well-specified model or knowing the policy that gave rise to the observed data. Other methods are hindered by their need for a pre-specified metric for comparing observations, or by poor asymptotic properties. We devise a bound on the generalization error under design shift, incorporating both representation learning and sample re-weighting. Based on the bound, we propose an algorithmic framework that does not require any of the above assumptions and which is asymptotically consistent. We empirically study the new framework using two synthetic datasets, and demonstrate its effectiveness compared to previous methods.
- CCausal Inference|Optimization and Personalization0
Recursive Partitioning for Personalization using Observational Data.
Proceedings of the 34th International Conference on Machine Learning (ICML), 70:1789--1798, 2017.
arXiv August 2016.
Winner, Best Paper of INFORMS 2016 Data Mining and Decision Analytics Workshop
- Abstract: We study the problem of learning to choose from m discrete treatment options (e.g., news item or medical drug) the one with the best causal effect for a particular instance (e.g., user or patient) where the training data consists of passive observations of covariates, treatment, and the outcome of the treatment. The standard approach to this problem is regress and compare: split the training data by treatment, fit a regression model in each split, and, for a new instance, predict all m outcomes and pick the best. By reformulating the problem as a single learning task rather than m separate ones, we propose a new approach based on recursively partitioning the data into regimes where different treatments are optimal. We extend this approach to an optimal partitioning approach that finds a globally optimal partition, achieving a compact, interpretable, and impactful personalization model. We develop new tools for validating and evaluating personalization models on observational data and use these to demonstrate the power of our novel approaches in a personalized medicine and a job training application.
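- Illustrative sketch: a greedy version of the recursive partitioning idea can be written in a few lines: at each node, compare the value of assigning the single best observed treatment with the best covariate split, and recurse on the children. The sketch below uses raw within-node treatment means, i.e., it treats the data as if randomized; the paper's approach handles observational confounding, and its optimal-partitioning formulation finds a globally optimal partition rather than a greedy one. All names and the quantile-based candidate splits are illustrative assumptions.

```python
import numpy as np

def personalization_tree(X, T, Y, depth=0, max_depth=2, min_leaf=20):
    """Greedy recursive partitioning into regimes where different treatments look best.
    A simplified sketch only: splits maximize the sum of (size x best observed mean
    outcome) in the children. NOTE: raw within-node means implicitly assume the data
    were randomized; the paper's estimators adjust for observational confounding."""
    treatments = np.unique(T)
    means = {t: Y[T == t].mean() for t in treatments if np.any(T == t)}
    best_t = max(means, key=means.get)
    node_value = len(Y) * means[best_t]

    if depth >= max_depth or len(Y) < 2 * min_leaf:
        return {"treat": best_t}

    best = None
    for j in range(X.shape[1]):
        for thr in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= thr
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            val, ok = 0.0, True
            for mask in (left, ~left):
                m = {t: Y[mask & (T == t)].mean()
                     for t in treatments if np.any(mask & (T == t))}
                if not m:
                    ok = False
                    break
                val += mask.sum() * max(m.values())
            if ok and (best is None or val > best[0]):
                best = (val, j, thr)

    if best is None or best[0] <= node_value:
        return {"treat": best_t}
    _, j, thr = best
    left = X[:, j] <= thr
    return {"split": (j, thr),
            "left": personalization_tree(X[left], T[left], Y[left], depth + 1, max_depth, min_leaf),
            "right": personalization_tree(X[~left], T[~left], Y[~left], depth + 1, max_depth, min_leaf)}

# Toy usage: treatment 1 is best when the first covariate is positive, else treatment 0
rng = np.random.default_rng(4)
X = rng.normal(size=(600, 2))
T = rng.integers(0, 2, size=600)
Y = np.where(X[:, 0] > 0, T, 1 - T) + 0.1 * rng.normal(size=600)
print(personalization_tree(X, T, Y))
```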
- JCausal Inference0
Generalized Optimal Matching Methods for Causal Inference.
- JSequential Decision-Making|Optimization and Personalization0
Dynamic Assortment Personalization in High Dimensions, with M. Udell.
- JCausal Inference0
Optimal A Priori Balance in the Design of Controlled Experiments.
- JCausal Inference|Optimization and Personalization0
The Power and Limits of Predictive Approaches to Observational-Data-Driven Optimization, with D. Bertsimas.
- JOptimization and Personalization0
From Predictive to Prescriptive Analytics, with D. Bertsimas.
Management Science, 66(3):1025--1044, 2020.
arXiv February 2014.
Finalist, POMS Applied Research Challenge 2016
- Abstract: In this paper, we combine ideas from machine learning (ML) and operations research and management science (OR/MS) in developing a framework, along with specific methods, for using data to prescribe decisions in OR/MS problems. In a departure from other work on data-driven optimization and reflecting our practical experience with the data available in applications of OR/MS, we consider data consisting, not only of observations of quantities with direct effect on costs/revenues, such as demand or returns, but predominantly of observations of associated auxiliary quantities. The main problem of interest is a conditional stochastic optimization problem, given imperfect observations, where the joint probability distributions that specify the problem are unknown. We demonstrate that our proposed solution methods are generally applicable to a wide range of decision problems. We prove that they are computationally tractable and asymptotically optimal under mild conditions even when data is not independent and identically distributed (iid) and even for censored observations. As an analogue to the coefficient of determination R², we develop a metric P termed the coefficient of prescriptiveness to measure the prescriptive content of data and the efficacy of a policy from an operations perspective. To demonstrate the power of our approach in a real-world setting we study an inventory management problem faced by the distribution arm of an international media conglomerate, which ships an average of 1 billion units per year. We leverage both internal data and public online data harvested from IMDb, Rotten Tomatoes, and Google to prescribe operational decisions that outperform baseline measures. Specifically, the data we collect, leveraged by our methods, accounts for an 88% improvement as measured by our coefficient of prescriptiveness.
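- Illustrative sketch: one of the weighting schemes in the predictive-to-prescriptive framework uses k-nearest-neighbor weights on the auxiliary covariates to form a local sample-average problem for each new context. The toy newsvendor example below shows the general pattern; all names, the grid search over order quantities, and the cost parameters are illustrative assumptions, and the paper also develops kernel- and tree-based weights and the coefficient of prescriptiveness.

```python
import numpy as np

def knn_prescription(x_new, X, Y, cost, z_grid, k=10):
    """Prescribe a decision for context x_new by minimizing a k-nearest-neighbor
    weighted sample average of costs (uniform weights over the k nearest observed
    contexts). A minimal sketch; kernel and tree weights are alternatives."""
    d = np.linalg.norm(X - x_new, axis=1)
    neighbors = np.argsort(d)[:k]                       # k nearest observed contexts
    avg_costs = [np.mean([cost(z, Y[i]) for i in neighbors]) for z in z_grid]
    return z_grid[int(np.argmin(avg_costs))]

# Toy newsvendor: order quantity z, demand y, unit holding cost 1, backorder cost 4
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))                                       # auxiliary covariates
Y = np.maximum(0, 50 + 20 * X[:, 0] + 5 * rng.normal(size=1000))     # demand
newsvendor = lambda z, y: 1.0 * max(z - y, 0) + 4.0 * max(y - z, 0)
z_star = knn_prescription(np.array([1.0, 0.0, 0.0]), X, Y, newsvendor,
                          z_grid=np.arange(0, 121), k=25)
print("prescribed order quantity:", z_star)
```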
- JOptimization and Personalization0
Robust Sample Average Approximation, with D. Bertsimas and V. Gupta.
Mathematical Programming, 171(1--2):217--282, 2018.
arXiv August 2014.
Winner, Best Student Paper Award, MIT Operations Research Center 2013
- Abstract: Sample average approximation (SAA) is a widely popular approach to data-driven decision-making under uncertainty. Under mild assumptions, SAA is both tractable and enjoys strong asymptotic performance guarantees. Similar guarantees, however, do not typically hold in finite samples. In this paper, we propose a modification of SAA, which we term Robust SAA, which retains SAA's tractability and asymptotic properties and, additionally, enjoys strong finite-sample performance guarantees. The key to our method is linking SAA, distributionally robust optimization, and hypothesis testing of goodness-of-fit. Beyond Robust SAA, this connection provides a unified perspective enabling us to characterize the finite sample and asymptotic guarantees of various other data-driven procedures that are based upon distributionally robust optimization. This analysis provides insight into the practical performance of these various methods in real applications. We present examples from inventory management and portfolio allocation, and demonstrate numerically that our approach outperforms other data-driven approaches in these applications.
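- Illustrative sketch: the distributionally robust reading of Robust SAA evaluates a decision against the worst distribution in a neighborhood of the empirical one. The cvxpy sketch below computes such a worst-case expected cost over a chi-square-type divergence ball; the radius here is an arbitrary illustrative choice, whereas the paper calibrates the neighborhood with a goodness-of-fit test to obtain finite-sample guarantees.

```python
import cvxpy as cp
import numpy as np

# Worst-case expected cost over distributions on the observed scenarios whose
# (modified) chi-square distance from the empirical distribution is at most rho.
# Everything below (the cost draws, the divergence, and rho) is illustrative;
# the paper ties the neighborhood to a goodness-of-fit hypothesis test.
rng = np.random.default_rng(2)
costs = rng.lognormal(mean=0.0, sigma=0.5, size=200)   # per-scenario costs c(z; y_i) for a fixed decision z
n = len(costs)
rho = 0.1

p = cp.Variable(n, nonneg=True)
constraints = [cp.sum(p) == 1,
               cp.sum_squares(n * p - 1) / n <= rho]   # chi-square-type divergence ball
problem = cp.Problem(cp.Maximize(costs @ p), constraints)
problem.solve()
print(f"empirical mean cost : {costs.mean():.3f}")
print(f"worst-case mean cost: {problem.value:.3f}")
```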
- JOptimization and Personalization0
Data-Driven Robust Optimization, with D. Bertsimas and V. Gupta.
Mathematical Programming, 167(2):235--292, 2018.
arXiv January 2014.
Finalist, INFORMS Nicholson Paper Competition 2013
- Abstract: The last decade has seen an explosion in the availability of data for operations research applications as part of the Big Data revolution. Motivated by this data-rich paradigm, we propose a novel schema for utilizing data to design uncertainty sets for robust optimization using statistical hypothesis tests. The approach is flexible and widely applicable, and robust optimization problems built from our new sets are computationally tractable, both theoretically and practically. Furthermore, optimal solutions to these problems enjoy a strong, finite-sample probabilistic guarantee. We also propose concrete guidelines for practitioners and illustrate our approach with applications in portfolio management and queueing. Computational evidence confirms that our data-driven sets significantly outperform conventional robust optimization techniques whenever data is available.
- CCausal Inference0
A Framework for Optimal Matching for Causal Inference.
- JCausal Inference|Applied ML|Optimization and Personalization0
Personalized Diabetes Management Using Electronic Medical Records, with D. Bertsimas, A. Weinstein, and D. Zhuo.
- COptimization and Personalization0
Revealed Preference at Scale: Learning Personalized Preferences from Assortment Choice, with M. Udell.
- JApplied ML|Optimization and Personalization0
Inventory Management in the Era of Big Data, with D. Bertsimas and A. Hussain.
- BApplied ML0
On the Predictive Power of Web Intelligence and Social Media.
- JCausal Inference0
The Power of Optimization Over Randomization in Designing Experiments Involving Small Samples, with D. Bertsimas and M. Johnson.
Operations Research, 63(4):868--876, 2015.
Code. Editorial in World Bank's Development Impact.
- Abstract: Random assignment, typically seen as the standard in controlled trials, aims to make experimental groups statistically equivalent before treatment. However, with a small sample, which is a practical reality in many disciplines, randomized groups are often too dissimilar to be useful. We propose an approach based on discrete linear optimization to create groups whose discrepancy in their means and variances is several orders of magnitude smaller than with randomization. We provide theoretical and computational evidence that groups created by optimization have exponentially lower discrepancy than those created by randomization.
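- Illustrative sketch: for a very small sample, the optimization idea can be conveyed by exhaustively enumerating all equal splits and keeping the one with the smallest discrepancy in means and variances; the paper instead formulates this as a discrete linear optimization problem that scales to realistic sizes. All names below are illustrative.

```python
import numpy as np
from itertools import combinations

def optimal_split(y):
    """Split a small sample into two equal groups minimizing the discrepancy in
    first and second moments, by exhaustive enumeration. Brute force is used here
    purely for illustration on tiny samples; the paper uses discrete optimization."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    assert n % 2 == 0, "expects an even sample size"
    idx = set(range(n))
    best, best_groups = np.inf, None
    for group1 in combinations(range(n), n // 2):
        g1 = np.array(group1)
        g2 = np.array(sorted(idx - set(group1)))
        disc = abs(y[g1].mean() - y[g2].mean()) + abs(y[g1].var() - y[g2].var())
        if disc < best:
            best, best_groups = disc, (g1, g2)
    return best_groups, best

rng = np.random.default_rng(3)
sample = rng.normal(size=12)                    # a small pre-treatment covariate
groups, discrepancy = optimal_split(sample)
naive_disc = abs(sample[:6].mean() - sample[6:].mean()) + abs(sample[:6].var() - sample[6:].var())
print(f"optimized discrepancy: {discrepancy:.2e}   arbitrary-split discrepancy: {naive_disc:.2e}")
```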
- CApplied ML0
Predicting Crowd Behavior with Big Public Data.
Proceedings of the 23rd International Conference on World Wide Web (WWW) Companion, 23:625--630, 2014.
arXiv February 2014.
Slides. Media coverage.
Winner, INFORMS Social Media Analytics Best Paper Competition 2015
- Abstract: With public information becoming widely accessible and shared on today's web, greater insights are possible into crowd actions by citizens and non-state actors such as large protests and cyber activism. Turning public data into Big Data, the company Recorded Future continually scans over 300,000 open content web sources in 7 languages from all over the world, ranging from mainstream news to government publications to blogs and social media. We study the predictive power of this massive public data in forecasting crowd actions such as large protests and cyber campaigns before they occur. Using natural language processing, event information is extracted from content such as type of event, what entities are involved and in what role, sentiment and tone, and the occurrence time range of the event discussed. The amount of information is staggering and trends can be seen clearly in sheer numbers. In the first half of this paper we show how we use this data to predict large protests in a selection of 19 countries and 37 cities in Asia, Africa, and Europe with high accuracy using standard learning machines. In the second half we delve into predicting the perpetrators and targets of political cyber attacks with a novel application of the naïve Bayes classifier to high-dimensional sequence mining in massive datasets.
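- Illustrative sketch: the classification step can be pictured as bag-of-event features fed to a naive Bayes classifier. The toy snippet below (the feature strings, labels, and vectorizer choice are all invented for illustration) only shows the general pattern; the paper works with far richer extracted event data and a sequence-mining formulation over massive datasets.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in: each string lists features of extracted events for one time window,
# and the label marks whether a large protest or cyber action followed.
events = [
    "protest_call labor_union capital_city negative_tone",
    "government_statement subsidy_cut fuel_price negative_tone",
    "sports_event celebration capital_city positive_tone",
    "cyber_campaign hacktivist_group ministry_site negative_tone",
]
labels = [1, 1, 0, 1]

vec = CountVectorizer()
X = vec.fit_transform(events)                       # bag-of-event-feature counts
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["labor_union strike_call fuel_price negative_tone"])))
```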
- JApplied ML|Optimization and Personalization0
Scheduling, Revenue Management, and Fairness in an Academic-Hospital Division: An Optimization Approach, with D. Bertsimas and R. Baum.