Statistics and Data Science Seminars

Upcoming Statistics and Data Science Seminars
DMS Statistics and Data Science Seminar
Oct 04, 2023 02:00 PM
ZOOM/354 Parker Hall


Speaker: Takumi Saegusa, University of Maryland

Title: Data Integration in Public Health Research
Abstract: Various data sets collected from numerous sources have great potential to enhance the quality of inference and accelerate scientific discovery. Inference for merged data is, however, quite challenging because such data may contain unidentified duplication from overlapping inhomogeneous sources, and each data set, often collected opportunistically, induces complex dependence. In public health research, for example, epidemiological studies have different inclusion and exclusion criteria, in contrast to hospital records without a well-defined target population, and when these are combined with a disease registry, patients appear in multiple data sets. In this talk, we present several examples in public health research that could benefit from data integration. We review existing approaches, such as random effects models and multiple frame surveys, and discuss their limitations in view of inferential goals, privacy protection, and large sample theory. We then propose our estimation and testing methods in the context of survival analysis and two-sample tests. We illustrate our theory with simulations and real data examples. If time permits, we will discuss extensions of our proposed method in several directions.


Past Statistics and Data Science Seminars
DMS Statistics and Data Science Seminar
Sep 20, 2023 02:00 PM
354 Parker Hall / ZOOM

Speaker: Davide Guzzetti, Department of Aerospace Engineering, Auburn University
Title: Orbit Shapes in the Three-Body Problem: Importance and Applications  
Abstract: Within an unperturbed central-body gravitational field, Keplerian orbital elements form a coordinate set that is also an effective and intuitive topological description, amenable to the visualization of orbit properties and the design of space flight solutions. Unfortunately, a compact and elegant topological description for all orbits in the Circular Restricted Three-Body Problem (CR3BP), akin to the widely used Keplerian orbital elements, or alternative two-body-problem coordinate sets, is not currently available. As a result, there exists a disconnect between coordinate sets and topological features that may render orbit uniqueness within CR3BP dynamics. Tools from topological data analysis offer the opportunity to bridge this disconnect by further equipping coordinate sets with additional elements—signatures and distance metrics—that precisely represent orbit topology. Our current work explores the possibility of developing a comprehensive and dependable representation of dynamical structures within gravitational multi-body environments at all levels of fidelity, one that is derived from the study of persistence of topology generators, such as loops and voids. Synergistically, our work introduces spatial computing interfaces as a new paradigm for trajectory design. In particular, we explore the challenges of mapping user-drawn curves in virtual reality to feasible spacecraft trajectories in the Earth-Moon system. Such new modalities in human-computer interactions could enhance the interface between human insight and algorithmic processes. More effective visual steering strategies are particularly beneficial for trajectory designers who have temporary, limited access to the solution space of a dynamical system, like in the case of CR3BP dynamics.

DMS Statistics and Data Science Seminar
Sep 13, 2023 02:00 PM
354 Parker Hall


Speaker: Dr. Yang Chen (University of Michigan)

Title: Video Imputation and Prediction Methods with Applications in Space Weather
Abstract: Total electron content (TEC) maps can be used to estimate the GPS signal delay caused by the ionospheric electron content between a receiver and a satellite. This delay can result in a GPS positioning error. Thus, it is important to monitor and forecast the TEC maps. However, the observed TEC maps have large patches of missing data over the oceans and small scattered gaps over land. Thus, precise imputation and prediction of the TEC maps are crucial in space weather forecasting.
In this talk, I first present several extensions of existing matrix completion algorithms to achieve TEC map reconstruction, accounting for spatial smoothness and temporal consistency while preserving important structures of the TEC maps. We call the proposed method video imputation with softImpute, temporal smoothing, and auxiliary data (VISTA). We show that our proposed method achieves better reconstructed TEC maps than existing methods in the literature. I will also briefly describe the use of our large-scale complete TEC database. Then, I present a new model for forecasting time series data distributed on a matrix-shaped spatial grid, using the historical spatiotemporal data and auxiliary vector-valued time series data. We model the matrix time series as an auto-regressive process, where a future matrix is jointly predicted by the historical values of the matrix time series and an auxiliary vector time series. Large sample asymptotics of the estimators are established, and the performance of the model is validated with extensive simulation studies and a real data application to forecasting the global TEC distributions.

DMS Statistics and Data Science Seminar
Sep 06, 2023 02:00 PM
354 Parker Hall


Speaker: Wenying Li, Auburn University, Assistant Professor of Agricultural Economics

Title: Dimension Reduction: Addressing Aggregation Bias in Large Consumer Demand Systems


Abstract: Building on an insight of Lewbel (1996) that aggregation bias is a special case of omitted variable bias, we propose two strategies for reducing bias in inconsistently aggregated consumer demand systems. The first uses a penalized lasso approach, and the second relies on a residual-based instrumental variable technique to control for the correlation between group prices and the residual in an aggregate demand. In an example, the preferred strategy reduces bias by up to 91% in own-price elasticities and 57% in cross-price elasticities. These strategies are useful in situations where an inconsistently aggregated demand has to be used for practical purposes.


DMS Statistics and Data Science Seminar
May 03, 2023 01:00 PM


Speaker: Yuan Ke (University of Georgia)

Title: Model-Free Feature Screening and FDR Control with Knockoff Features
Abstract: This article proposes a model-free and data-adaptive feature screening method for ultrahigh-dimensional data. The proposed method is based on the projection correlation, which measures the dependence between two random vectors. This projection correlation based method does not require specifying a regression model, and applies to data in the presence of heavy tails and multivariate responses. It enjoys both sure screening and rank consistency properties under weak assumptions. A two-step approach, with the help of knockoff features, is advocated to specify the threshold for feature screening such that the false discovery rate (FDR) is controlled under a prespecified level. The proposed two-step approach enjoys both sure screening and FDR control simultaneously if the prespecified FDR level is greater than or equal to 1/s, where s is the number of active features. The superior empirical performance of the proposed method is illustrated by simulation examples and real data applications.

DMS Statistics and Data Science Seminar
Apr 26, 2023 01:00 PM


Speaker: Valérie Chavez-Demoulin (University of Lausanne)


Title: Extreme value theory and climate extremes

Abstract: The past few decades have seen extreme climate events affecting all regions of the world with catastrophic impacts on human society. Quantifying the risk of such events is difficult but necessary. In this context, methodologies based on extreme value theory play an important role. Extreme value theory (EVT) is the field of statistics dedicated to the study of events with low occurrence frequencies and large amplitudes. In this talk, I will review some basics of EVT and provide some insights on how to use EVT to quantify the risk of such events.


Short Bio: Dr. Valérie Chavez-Demoulin is a Full Professor of Statistics at HEC Lausanne, University of Lausanne, specializing in statistical methods for quantitative risk management and statistical methodologies applied to operations management in general, and the statistical modeling of extreme events in particular. More recent methodological work concerns conditional dependence structures modeling, non-parametric Bayesian models, dynamic Extreme Value Theory models, and extremes for non-stationary time series. Chavez-Demoulin holds a Ph.D. in Statistics from EPFL. She is an elected member of ISI (The International Statistical Institute).

DMS Statistics and Data Science Seminar
Apr 19, 2023 01:00 PM
358 Parker Hall


Speaker: Xin Bing (University of Toronto)


Title: Optimal Discriminant Analysis in High-Dimensional Latent Factor Models

Abstract: In high-dimensional classification problems, a commonly used approach is to first project the high-dimensional features into a lower-dimensional space, and base the classification on the resulting lower-dimensional projections. In this paper, we formulate a latent-variable model with a hidden low-dimensional structure to justify this two-step procedure and to guide which projection to choose. We propose a computationally efficient classifier that takes certain principal components (PCs) of the observed features as projections, with the number of retained PCs selected in a data-driven way. A general theory is established for analyzing such two-step classifiers based on any projections. We derive explicit rates of convergence of the excess risk of the proposed PC-based classifier. The obtained rates are further shown to be optimal up to logarithmic factors in the minimax sense. Our theory allows the lower dimension to grow with the sample size and is also valid even when the feature dimension (greatly) exceeds the sample size. Extensive simulations corroborate our theoretical findings. The proposed method also performs favorably relative to other existing discriminant methods on three real data examples.


Short Bio: Dr. Bing holds a Ph.D. degree in statistics from Cornell University. His research interests generally lie in developing new methodologies with theoretical guarantees to tackle modern statistical problems such as high-dimensional statistics, low-rank matrix estimation, multivariate analysis, model-based clustering, latent factor models, topic models, minimax estimation, high-dimensional inference, and statistical and computational trade-offs. He is also interested in the applications of statistical methods to genetics, neuroscience, immunology, and other areas.

DMS Statistics and Data Science Seminar canceled
Apr 12, 2023 01:00 PM



DMS Statistics and Data Science Seminar
Apr 05, 2023 01:00 PM
358 Parker Hall


Speaker: Carsten Chong (Columbia University)


Title: Statistical inference for rough volatility: Central limit theorems

Abstract: In recent years, there has been substantial empirical evidence that stochastic volatility is rough. In other words, the local behavior of stochastic volatility is much more irregular than that of semimartingales and resembles that of a fractional Brownian motion with Hurst parameter H<0.5. In this paper, we derive a consistent and asymptotically mixed normal estimator of H based on high-frequency price observations. In contrast to previous works, we work in a semiparametric setting and do not assume any a priori relationship between volatility estimators and true volatility. Furthermore, our estimator attains a rate of convergence that is known to be optimal in a minimax sense in parametric rough volatility models.


Short Bio: Dr. Chong is currently an assistant professor at Columbia University and will join HKUST this summer. Before this, Dr. Chong completed a Ph.D. at the Technical University of Munich. His research interests are primarily focused on statistical inference problems for stochastic processes, with an emphasis on high-frequency techniques and applications to financial econometrics. He is also interested in stochastic partial differential equations, in particular stochastic PDEs driven by Lévy noises, which, in contrast to Gaussian noises, typically have discontinuous and/or heavy-tailed components.

DMS Statistics and Data Science Seminar
Mar 29, 2023 01:00 PM
358 Parker Hall


Speaker: Linxi Liu (University of Pittsburgh)

Title: Bayesian density trees and forests
Abstract: Density estimation is a fundamental problem in statistics. Once an explicit estimate of the density function is obtained, various kinds of statistical inference can follow, including classification, non-parametric testing, clustering, and data compression. In this talk, I will focus on tree-based methods for density estimation under the Bayesian framework, and introduce two types of priors: the Dirichlet prior and the optional Pólya tree (Wong and Ma, 2010). For a class of density functions satisfying a sparsity condition in the spectral domain, we show that Bayesian density trees can achieve fast convergence. The result implies that tree-based methods can adapt to both spatially inhomogeneous and local features of the underlying density function. I will also introduce a novel Bayesian model for density forests and show that for a class of Hölder continuous functions, forests can achieve faster convergence than trees. The convergence rate is adaptive in the sense that to achieve such a rate we do not need any prior knowledge of the smoothness level. For both Bayesian density trees and forests, I will provide several numerical results to illustrate their performance in the moderately high-dimensional case.
Short Bio: Dr. Liu, Assistant Professor at the University of Pittsburgh, holds a Ph.D. in statistics from Stanford University. Her research interests are mainly in Bayesian statistics, density estimation, and nonparametric methods.

DMS Statistics and Data Science Seminar
Mar 22, 2023 01:00 PM


Speaker: Yaqing Chen (Rutgers University)


Title: Geometric Exploration of Random Objects Through Optimal Transport

Abstract: We propose new tools for the geometric exploration of data objects taking values in a general separable metric space. For a random object, we first introduce the concept of depth profiles. Specifically, the depth profile of a point in a metric space is the distribution of distances between that point and the random object. Depth profiles can be harnessed to define transport ranks based on optimal transport, which capture the centrality and outlyingness of each element in the metric space with respect to the probability measure induced by the random object. We study the properties of transport ranks and show that they provide an effective device for detecting and visualizing patterns in samples of random objects. In particular, we establish theoretical guarantees for the estimation of the depth profiles and the transport ranks for a wide class of metric spaces, followed by practical illustrations.


Short Bio: Dr. Chen, Assistant Professor at Rutgers University, holds a Ph.D. degree in statistics from the University of California, Davis, under the supervision of Dr. Hans-Georg Mueller. Her research interests are mainly in functional data analysis and non-Euclidean data analysis.

