Learning from Missing and Imperfect Data

Wednesday, April 9, 2025 - 4:00pm to 5:00pm
Location: 32-G575
Speaker: Anay Mehrotra (Yale)
Biography: https://anaymehrotra.com/

Positive-Unlabeled Learning (PU Learning) is a framework for learning when only positive and unlabeled data are available. This setting is common in bioinformatics, medicine, and fraud detection, where obtaining negative samples is challenging or costly.

In this talk, we present an extension of the PU Learning paradigm: Positive and Imperfect Unlabeled Learning (PIU Learning). PIU Learning accounts for low-quality unlabeled data, which can arise from biases, covariate shifts, and adversarial corruptions. Such issues are frequently encountered when leveraging public and crowdsourced datasets.

Beyond its practical relevance, this change in the formulation of PU learning has new theoretical implications. We show how it connects to fundamental problems, such as learning from smoothed distributions, detecting data truncation, and estimation under truncation, each of which is central to statistics and learning theory.

This talk is based on joint work with Jane H. Lee and Manolis Zampetakis.