Georgia Tech Algorithm and Randomness Center (ARC)

ARC Colloquium: Vitaly Feldman - IBM/Simons Institute

Title:

Preserving Statistical Validity in Adaptive Data Analysis

Abstract:

A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries resulting from misapplication of statistical data analysis. Existing approaches to ensuring validity of inferences drawn from data assume a fixed collection of hypotheses to be tested, or analyses to be applied, selected non-adaptively before the data are examined. In contrast, the practice of data analysis in scientific research is by its nature an adaptive process, in which new hypotheses are generated and new analyses are performed on the basis of data exploration and observed outcomes on the same data. We demonstrate a new approach for addressing the challenges of adaptivity based on insights from private data analysis. As an application, we show how to safely reuse a holdout set a great many times without undermining its validation power, even when hypotheses, models, and algorithms are chosen adaptively.

Joint work with Cynthia Dwork, Moritz Hardt, Toni Pitassi, Omer Reingold and Aaron Roth.
