Speaker: Daniela Witten, University of Washington
Abstract: We propose data thinning, a new approach for splitting an observation into two or more independent parts that sum to the original observation, and that follow the same distribution as the original observation, up to a (known) scaling of a parameter. This proposal is very general, and can be applied to any observation drawn from a "convolution closed" distribution, a class that includes the Gaussian, Poisson, negative binomial, Gamma, and binomial distributions, among others. It is similar in spirit to -- but distinct from, and more easily applicable than -- a recent proposal known as data fission. Data thinning has a number of applications to model selection, evaluation, and inference. For instance, cross-validation via data thinning provides an attractive alternative to the "usual" approach of cross-validation via sample splitting, especially in unsupervised settings in which the latter is not applicable. We will present an application of data thinning to single-cell RNA-sequencing data. This is joint work with Anna Neufeld and Ameer Dharamshi (University of Washington) and Lucy Gao (University of British Columbia).
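The Poisson case gives a simple illustration of the splitting idea in the abstract. This is the classical binomial thinning of a Poisson count. Suppose X ~ Poisson(λ) and ε in (0, 1) is chosen in advance. Draw X1 | X ~ Binomial(X, ε) and set X2 = X - X1. Then X1 ~ Poisson(ελ) and X2 ~ Poisson((1 - ε)λ), the two parts are independent, and X1 + X2 = X.

The Python below is a minimal, hypothetical sketch of that one case and uses only the standard library. The names `sample_poisson` and `poisson_thin` and the parameter `eps` are illustrative assumptions, not the speakers' software. The other distributions listed above each need their own construction.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    # Count how many extra uniforms are needed before the running product drops to exp(-lam).
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def poisson_thin(x, eps, rng):
    """Hypothetical helper: split a count x into (x1, x2) with x1 | x ~ Binomial(x, eps), x2 = x - x1.

    If x ~ Poisson(lam), then x1 ~ Poisson(eps * lam) and x2 ~ Poisson((1 - eps) * lam),
    and x1 and x2 are independent.
    """
    if not 0 < eps < 1:
        raise ValueError("eps must lie strictly between 0 and 1")
    # Binomial(x, eps) draw, written as x independent Bernoulli(eps) trials.
    x1 = sum(1 for _ in range(x) if rng.random() < eps)
    return x1, x - x1


if __name__ == "__main__":
    rng = random.Random(0)
    lam, eps, n = 4.0, 0.3, 20000
    pairs = [poisson_thin(sample_poisson(lam, rng), eps, rng) for _ in range(n)]
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / (n - 1)
    # Means should be near eps * lam and (1 - eps) * lam; the covariance should be near 0.
    print(f"mean x1 = {m1:.2f}, mean x2 = {m2:.2f}, cov = {cov:.3f}")
```

Running the script should print means close to 1.2 and 2.8 and a sample covariance close to zero. Those values are what independence of the two parts predicts, although with a finite simulation the printed numbers will not match exactly. In a cross-validation setting, one part (for example x1) would be used to fit a model and the other to evaluate it.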
Daniela will present in person in room E3003; the seminar will also be streamed via Zoom.
ZOOM
Meeting ID: 928 8613 3715
Passcode: 838290