Biostatistics Seminar Series: Data Thinning for Convolution-Closed Distributions

Department & Center Events

Monday, March 27, 2023, 12:15 p.m. - 1:15 p.m. ET
Online/Onsite
Past Event

Speaker: Daniela Witten, University of Washington

Abstract: We propose data thinning, a new approach for splitting an observation into two or more independent parts that sum to the original observation, and that follow the same distribution as the original observation, up to a (known) scaling of a parameter. This proposal is very general, and can be applied to any observation drawn from a "convolution-closed" distribution, a class that includes the Gaussian, Poisson, negative binomial, Gamma, and binomial distributions, among others. It is similar in spirit to -- but distinct from, and more easily applicable than -- a recent proposal known as data fission. Data thinning has a number of applications to model selection, evaluation, and inference. For instance, cross-validation via data thinning provides an attractive alternative to the "usual" approach of cross-validation via sample splitting, especially in unsupervised settings in which the latter is not applicable. We will present an application of data thinning to single-cell RNA-sequencing data. This is joint work with Anna Neufeld and Ameer Dharamshi (University of Washington) and Lucy Gao (University of British Columbia).
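
For readers new to the idea, the Poisson case can be sketched in a few lines. This is an illustration only, not the speakers' code: if X ~ Poisson(lam) and X1 given X is Binomial(X, eps), then X1 and X2 = X - X1 are independent with X1 ~ Poisson(eps * lam) and X2 ~ Poisson((1 - eps) * lam), a standard property of the Poisson distribution. The function names and the values of lam, eps, n, and seed are assumptions chosen for the demonstration.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def thin_poisson(x, eps, rng):
    """Split a count x into (x1, x2) with x1 + x2 == x.

    Each of the x units goes to the first part independently with
    probability eps, so x1 given x is Binomial(x, eps).
    """
    x1 = sum(1 for _ in range(x) if rng.random() < eps)
    return x1, x - x1


def simulate(lam=5.0, eps=0.3, n=20000, seed=0):
    """Thin n simulated Poisson(lam) counts; return the two means and their covariance."""
    rng = random.Random(seed)
    pairs = [thin_poisson(sample_poisson(lam, rng), eps, rng) for _ in range(n)]
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in pairs) / n
    return mean1, mean2, cov


if __name__ == "__main__":
    mean1, mean2, cov = simulate()
    # Expect mean1 near eps * lam, mean2 near (1 - eps) * lam, cov near 0.
    print(mean1, mean2, cov)
```

With these hypothetical settings the two sample means should land near 1.5 and 3.5 and the sample covariance near zero, which is the independence that lets one part be used for fitting and the other for evaluation.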

Daniela will be on-site for the seminar in E3003, but the seminar will also be streamed through Zoom.

ZOOM

Meeting ID: 928 8613 3715

Passcode: 838290

Contact Info

Kara Schoenberg