111 Eighth Ave. #302
New York, NY 10011
I am an Assistant Professor of Operations Research and Information Engineering at Cornell University and Cornell Tech in New York City. My research revolves around data-driven decision making, the interplay of optimization and statistics in decision making and in inference, and the analytical capacities and challenges of observational, large-scale, and web-driven data. I hold a PhD in Operations Research from MIT as well as a BA in Mathematics and a BS in Computer Science both from UC Berkeley. Before coming to Cornell, I was a Visiting Scholar at USC's Department of Data Sciences and Operations and a Postdoctoral Associate at MIT's Operations Research and Statistics group.
Data-driven optimization under uncertainty; Machine learning; Causal inference; Personalization; Optimization in statistics; Online decision making; Decision making and operations in healthcare; Operations management and revenue management applications.
Generalized Optimal Matching Methods for Causal Inference. Submitted (2016); under review.
Abstract: We develop an encompassing framework and theory for matching and related methods for causal inference that reveal the connections and motivations behind various existing methods and give rise to new and improved ones. The framework is given by generalizing a new functional analytical characterization of optimal matching as minimizing worst-case conditional mean squared error given the observed data based on specific restrictions and assumptions. By generalizing these, we obtain a new class of generalized optimal matching (GOM) methods, for which we provide a single theory for tractability and consistency that applies generally to GOM. Many commonly used existing methods are included in GOM and using their GOM interpretation we extend these to new methods that judiciously and automatically trade off balance for variance and outperform their standard counterparts. As a subclass of GOM, we develop kernel optimal matching, which, as supported by new theory, is notable for combining the interpretability of matching methods, the non-parametric model-free consistency of optimal matching, the efficiency of well-specified regression, the judicious sample size selection of monotonic imbalance bounding methods, the double robustness of augmented inverse propensity weight estimators, and the model-selection flexibility of Gaussian-process regression. We discuss connections to and non-linear generalizations of equal percent bias reduction and its ramifications.
Dynamic Assortment Personalization in High Dimensions (with M. Udell). Submitted (2016); under review.
We demonstrate the importance of structural priors for effective, efficient large-scale dynamic assortment personalization. Assortment personalization is the problem of choosing, for each individual or consumer segment (type), a best assortment of products, ads, or other offerings (items) so as to maximize revenue. This problem is central to revenue management in e-commerce, online advertising, and multi-location brick-and-mortar retail, where both items and types can number in the thousands-to-millions. Data efficiency is paramount in this large-scale setting. A good personalization strategy must dynamically balance the need to learn consumer preferences and to maximize revenue.
We formulate the dynamic assortment personalization problem as a discrete-contextual bandit with m contexts (customer types) and many arms (assortments of the n items). We assume that each type's preferences follow a simple parametric model with n parameters. In all, there are mn parameters, and existing literature suggests that order optimal regret scales as mn. However, this figure is orders of magnitude larger than the data available in large-scale applications, and imposes unacceptably high regret.
In this paper, we impose natural structure on the problem — a small latent dimension, or low rank. In the static setting, we show that this model can be efficiently learned from surprisingly few interactions, using a time- and memory-efficient optimization algorithm that converges globally whenever the model is learnable. In the dynamic setting, we show that structure-aware dynamic assortment personalization can have regret that is an order of magnitude smaller than structure-ignorant approaches. We validate our theoretical results empirically.
A Framework for Optimal Matching for Causal Inference. Accepted to the 20th International Conference on Artificial Intelligence and Statistics (AISTATS).
Abstract: We propose a novel framework for matching estimators for causal effect from observational data that is based on minimizing the dual norm of estimation error when expressed as an operator. We show that many popular matching estimators can be expressed as optimal in this framework, including nearest-neighbor matching, coarsened exact matching, and mean-matched sampling. This reveals their motivation and aptness as structural priors formulated by embedding the effect in a particular functional space. This also gives rise to a range of new, kernel-based matching estimators that arise when one embeds the effect in a reproducing kernel Hilbert space. Depending on the case, these estimators can be found using either quadratic optimization or integer optimization. We show that estimators based on universal kernels are universally consistent without model specification. In empirical results using both synthetic and real data, the new, kernel-based estimators outperform all standard causal estimators in estimation error.
Personalized Diabetes Management Using Electronic Medical Records (with D. Bertsimas, A Weinstein, and D. Zhuo). Diabetes Care 40(2):210-217, 2017.
Objective: Current clinical guidelines for managing type 2 diabetes do not differentiate based on patient-specific factors. We present a data-driven algorithm for personalized diabetes management that improves health outcomes relative to the standard of care.
Research Design and Methods: We modeled outcomes under 13 pharmacological therapies based on electronic medical records from 1999 to 2014 for 10,806 patients with type 2 diabetes from Boston Medical Center. For each patient visit, we analyzed the range of outcomes under alternative care using a k-nearest neighbor approach. The neighbors were chosen to maximize similarity on individual patient characteristics and medical history that were most predictive of health outcomes. The recommendation algorithm prescribes the regimen with best predicted outcome if the expected improvement from switching regimens exceeds a threshold. We evaluated the effect of recommendations on matched patient outcomes from unseen data.
Results: Among the 48,140 patient visits in the test set, the algorithm’s recommendation mirrored the observed standard of care in 68.2% of visits. For patient visits in which the algorithmic recommendation differed from the standard of care, the mean posttreatment glycated hemoglobin A1c (HbA1c) under the algorithm was lower than standard of care by 0.44 ± 0.03% (4.8 ± 0.3 mmol/mol) (P < 0.001), from 8.37% under the standard of care to 7.93% under our algorithm (68.0 to 63.2 mmol/mol).
Conclusion: A personalized approach to diabetes management yielded substantial improvements in HbA1c outcomes relative to the standard of care. Our prototyped dashboard visualizing the recommendation algorithm can be used by providers to inform diabetes care and improve outcomes.
Pricing from Observational Data (with D. Bertsimas). Submitted (2016); under review.
Abstract: Given observational data on price and demand, the price optimization problem is sometimes addressed in the literature by a predictive approach: (a) fit a model to the data that best predicts demand given price and (b) substitute the predictive model into the overall profit and optimize for price. We show that, because historical demand at all prices but the observed one is missing, the price optimization problem is not well specified by the data, and in particular, the predictive approach fails to find the optimal price. We bound the suboptimality of the predictive approach, even when the optimal price cannot be identified from the data, by leveraging the special structure of the problem. Drawing from the causal inference literature, we provide sufficient conditions for the optimal price to be identifiable from the data. Given these conditions, we provide parametric and non-parametric algorithms for the price optimization problem. In the non-parametric case we prove consistency and asymptotic normality and establish rates of convergence. We develop a hypothesis test for asymptotic profit optimality of any algorithm for pricing from observational data. We use this test to demonstrate empirically in an auto loan dataset that both parametric and non-parametric predictive approaches lose significant profit relative to the optimum and that our prescriptive parametric framework leads to profit that cannot be distinguished from the optimal one, recovering 36-70% of profits lost by the predictive approaches.
Learning to Personalize from Observational Data. Submitted (2016); under review.
Winner, Best Paper of INFORMS 2016 Data Mining and Decision Analytics Workshop.
Abstract: We study the problem of learning to choose from m discrete treatment options (e.g., medical drugs) the one with best causal effect for a particular instance (e.g., patient) characterized by an observation of covariates. The training data consists of observations of covariates, treatment, and the outcome of the treatment. We recast the problem of learning to personalize from these observational data as a single learning task, which we use to develop four specific machine learning methods to directly address the personalization problem, two with a unique interpretability property. We also show how to validate personalization models on observational data, proposing the new coefficient of personalization as a unitless measure of effectiveness. We demonstrate the power of the new methods in two specific personalized medicine and policymaking applications and show they provide a significant advantage over standard approaches.
Revealed Preference at Scale: Learning Personalized Preferences from Assortment Choice (with M. Udell). Proceedings of the 17th ACM Conference on Economics and Computation (EC) 17:821–837, 2016.
We consider the problem of learning the preferences of a heterogeneous population by observing choices from an assortment of products, ads, or other offerings. Our observation model takes a form common in assortment planning applications: each arriving customer is offered an assortment consisting of a subset of all possible offerings; we observe only the assortment and the customer's single choice.
In this paper we propose a mixture choice model with a natural underlying low-dimensional structure, and show how to estimate its parameters. In our model, the preferences of each customer or segment follow a separate parametric choice model, but the underlying structure of these parameters over all the models has low dimension. We show that a nuclear-norm regularized maximum likelihood estimator can learn the preferences of all customers using a number of observations much smaller than the number of item-customer combinations. This result shows the potential for structural assumptions to speed up learning and improve revenues in assortment planning and customization. We provide a specialized factored gradient descent algorithm and study the success of the approach empirically.
Causal Inference by Minimizing the Dual Norm of Bias: Kernel Matching & Weighting Estimators for Causal Effects. Preliminary version to appear in the Proceedings of UAI 2016 Causation from Foundation to Application.
Abstract: We consider the problem of estimating causal effects from observational data and propose a novel framework for matching- and weighting-based causal estimators. The framework is based on expressing the bias of a causal estimator as an operator on the unknown conditional expectation function of outcomes and formulating the dual norm of the bias as the norm of this operator with respect to a function space that represents the potential structure for outcomes. We give the term worst-case bias minimizing (WCBM) to estimators that minimize this quantity for some function space and show that a great variety of existing causal estimators belong to this family, including one-to-one matching (with or without replacement), coarsened exact matching, and mean-matched sampling. We propose a range of new, kernel-based matching and weighting estimators that arise when one minimizes the dual norm of the bias with respect to a reproducing kernel Hilbert space. Depending on the case, these estimators can be solved either in closed form, using quadratic optimization, or using integer optimization. We show that estimators based on universal kernels are consistent for the causal effect. In numerical experiments, the new, kernel-based estimators outperform all standard causal estimators in estimation error, providing a successful balance between generality and efficiency.
Inventory Management in the Era of Big Data (with D. Bertsimas and A. Hussain). Production and Operations Management, 25(12):2006-2009.
On the Predictive Power of Web Intelligence and Social Media. Chapter in Big Data Analytics in the Social and Ubiquitous Context, Springer International, 2016.
Abstract: With more information becoming widely accessible and new content created shared on today's web, more are turning to harvesting such data and analyzing it to extract insights. But the relevance of such data to see beyond the present is not clear. We present efforts to predict future events based on web intelligence – data harvested from the web – with specific emphasis on social media data and on timed event mentions, thereby quantifying the predictive power of such data. We focus on predicting crowd actions such as large protests and coordinated acts of cyber activism – predicting their occurrence, specific timeframe, and location. Using natural language processing, statements about events are extracted from content collected from hundred of thousands of open content we sources. Attributes extracted include event type, entities involved and their role, sentiment and tone, and – most crucially – the reported timeframe for the occurrence of the event discussed – whether it be in the past, present, or future. Tweets (Twitter posts) that mention an event to occur reportedly in the future prove to be important predictors. These signals are enhanced by cross referencing with the fragility of the situation as inferred from more traditional media, allowing us to sift out the social media trends that fizzle out before materializing as crowds on the ground.
Optimal A Priori Balance in the Design of Controlled Experiments. Submitted (2015); minor revision in the Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Abstract: We develop a unified theory of designs for controlled experiments that balance baseline covariates a priori (before treatment and before randomization) using the framework of minimax variance. We establish a “no free lunch” theorem that indicates that, without structural information on the dependence of potential outcomes on baseline covariates, complete randomization is optimal. Restricting the structure of dependence, either parametrically or non-parametrically, leads directly to imbalance metrics and optimal designs. Certain choices of this structure recover known imbalance metrics and designs previously developed ad hoc, including randomized block designs, pairwise-matched designs, and re-randomization. New choices of structure based on reproducing kernel Hilbert spaces lead to new methods, both parametric and non-parametric.
From Predictive to Prescriptive Analytics (with D. Bertsimas). Submitted (2015); second round of review.
Finalist, POMS Applied Research Challenge 2016.
Abstract: In this paper, we combine ideas from machine learning (ML) and operations research and management science (OR/MS) in developing a framework, along with specific methods, for using data to prescribe decisions in OR/MS problems. In a departure from other work on data-driven optimization and reflecting our practical experience with the data available in applications of OR/MS, we consider data consisting, not only of observations of quantities with direct effect on costs/revenues, such as demand or returns, but predominantly of observations of associated auxiliary quantities. The main problem of interest is a conditional stochastic optimization problem, given imperfect observations, where the joint probability distributions that specify the problem are unknown. We demonstrate that our proposed solution methods are generally applicable to a wide range of decision problems. We prove that they are computationally tractable and asymptotically optimal under mild conditions even when data is not independent and identically distributed (iid) and even for censored observations. As an analogue to the coefficient of determination R², we develop a metric P termed the coefficient of prescriptiveness to measure the prescriptive content of data and the efficacy of a policy from an operations perspective. To demonstrate the power of our approach in a real-world setting we study an inventory management problem faced by the distribution arm of an international media conglomerate, which ships an average of 1 billion units per year. We leverage both internal data and public online data harvested from IMDb, Rotten Tomatoes, and Google to prescribe operational decisions that outperform baseline measures. Specifically, the data we collect, leveraged by our methods, accounts for an 88% improvement as measured by our coefficient of prescriptiveness.
The Power of Optimization Over Randomization in Designing Experiments Involving Small Samples (with D. Bertsimas and M. Johnson). Operations Research, 63(4):868—876, 2015.
Abstract: Random assignment, typically seen as the standard in controlled trials, aims to make experimental groups statistically equivalent before treatment. However, with a small sample, which is a practical reality in many disciplines, randomized groups are often too dissimilar to be useful. We propose an approach based on discrete linear optimization to create groups whose discrepancy in their means and variances is several orders of magnitude smaller than with randomization. We provide theoretical and computational evidence that groups created by optimization have exponentially lower discrepancy than those created by randomization.
Robust Sample Average Approximation (with D. Bertsimas and V. Gupta). Submitted (2014); minor revision in Mathematical Programming.
Winner, Best Student Paper Award, MIT Operations Research Center 2013.
Abstract: Sample average approximation (SAA) is a widely popular approach to data-driven decision-making under uncertainty. Under mild assumptions, SAA is both tractable and enjoys strong asymptotic performance guarantees. Similar guarantees, however, do not typically hold in finite samples. In this paper, we propose a modification of SAA, which we term Robust SAA, which retains SAA's tractability and asymptotic properties and, additionally, enjoys strong finite-sample performance guarantees. The key to our method is linking SAA, distributionally robust optimization, and hypothesis testing of goodness-of-fit. Beyond Robust SAA, this connection provides a unified perspective enabling us to characterize the finite sample and asymptotic guarantees of various other data-driven procedures that are based upon distributionally robust optimization. This analysis provides insight into the practical performance of these various methods in real applications. We present examples from inventory management and portfolio allocation, and demonstrate numerically that our approach outperforms other data-driven approaches in these applications.
Data-Driven Robust Optimization (with D. Bertsimas and V. Gupta). Accepted for publication in Mathematical Programming.
Finalist, INFORMS Nicholson Paper Competition 2013.
Abstract: The last decade has seen an explosion in the availability of data for operations research applications as part of the Big Data revolution. Motivated by this data rich paradigm, we propose a novel schema for utilizing data to design uncertainty sets for robust optimization using statistical hypothesis tests. The approach is flexible and widely applicable, and robust optimization problems built from our new sets are computationally tractable, both theoretically and practically. Furthermore, optimal solutions to these problems enjoy a strong, finite-sample probabilistic guarantee. We also propose concrete guidelines for practitioners and illustrate our approach with applications in portfolio management and queueing. Computational evidence confirms that our data-driven sets significantly outperform conventional robust optimization techniques whenever data is available.
Predicting Crowd Behavior with Big Public Data. Proceedings of the 23rd international conference on World Wide Web (WWW) companion, 23:625—630, 2014.
Winner, INFORMS Social Media Analytics Best Paper Competition 2015.
Abstract: With public information becoming widely accessible and shared on today's web, greater insights are possible into crowd actions by citizens and non-state actors such as large protests and cyber activism. Turning public data into Big Data, company Recorded Future continually scans over 300,000 open content web sources in 7 languages from all over the world, ranging from mainstream news to government publications to blogs and social media. We study the predictive power of this massive public data in forecasting crowd actions such as large protests and cyber campaigns before they occur. Using natural language processing, event information is extracted from content such as type of event, what entities are involved and in what role, sentiment and tone, and the occurrence time range of the event discussed. The amount of information is staggering and trends can be seen clearly in sheer numbers. In the first half of this paper we show how we use this data to predict large protests in a selection of 19 countries and 37 cities in Asia, Africa, and Europe with high accuracy using standard learning machines. In the second half we delve into predicting the perpetrators and targets of political cyber attacks with a novel application of the naïve Bayes classifier to high-dimensional sequence mining in massive datasets.
Scheduling, Revenue Management, and Fairness in an Academic-Hospital Division: An Optimization Approach (with D. Bertsimas and R. Baum). Academic Radiology, 21(10):1322—1330, 2014.
Abstract: Physician staff of academic hospitals today practice in several geographic locations including their main hospital, referred to as the extended campus. With extended campuses expanding, the growing complexity of a single division's schedule means that a naïve approach to scheduling compromises revenue and can fail to consider physician over-exertion. Moreover, it may provide an unfair allocation of individual revenue, desirable or burdensome assignments, and the extent to which the preferences of each individual are met. This has adverse consequences on incentivization and employee satisfaction and is simply against business policy. We identify the daily scheduling of physicians in this context as an operational problem that incorporates scheduling, revenue management, and fairness. Noting previous success of operations management and optimization in each of these disciplines, we propose a simple, unified optimization formulation of this scheduling problem using mixed integer optimization (MIO). Through a study of implementing the approach at the Division of Angiography and Interventional Radiology at the Brigham and Women's Hospital, which is directed by one of the authors, we exemplify the flexibility of the model to adapt to specific applications, the tractability of solving the model in practical settings, and the significant impact of the approach, most notably in increasing revenue significantly while being only more fair and objective.
Spring 2017: Applied Machine Learning (CS 5785)
Description: Learn and apply key concepts of modeling, analysis and validation from Machine Learning, Data Mining and Signal Processing to analyze and extract meaning from data. Implement algorithms and perform experiments on images, text, audio and mobile sensor measurements. Gain working knowledge of supervised and unsupervised techniques including classification, regression, clustering, feature selection, association rule mining and dimensionality reduction.