Statistics and Data Science
Mar 29, 2023 01:00 PM
358 Parker Hall
Speaker: Linxi Liu (University of Pittsburgh)
Title: TBA
Mar 22, 2023 01:00 PM
ZOOM
Speaker: Yaqing Chen (Rutgers University)
Title: Geometric Exploration of Random Objects Through Optimal Transport
Abstract: We propose new tools for the geometric exploration of data objects taking values in a general separable metric space. For a random object, we first introduce the concept of depth profiles. Specifically, the depth profile of a point in a metric space is the distribution of distances between that point and the random object. Depth profiles can be harnessed to define transport ranks based on optimal transport, which capture the centrality and outlyingness of each element in the metric space with respect to the probability measure induced by the random object. We study the properties of transport ranks and show that they provide an effective device for detecting and visualizing patterns in samples of random objects. In particular, we establish theoretical guarantees for the estimation of the depth profiles and the transport ranks for a wide class of metric spaces, followed by practical illustrations.
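The paper's transport ranks are defined via optimal transport between depth profiles; the exact construction is not reproduced here. As a rough illustration of the depth-profile idea only, the numpy sketch below computes empirical depth profiles (each point's distances to the rest of the sample) and an illustrative centrality score, the average 1-D Wasserstein distance to the other profiles, under which a planted outlier stands out:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 2))
X[0] = [8.0, 8.0]                      # one planted outlier

# Empirical depth profile of point i: the distribution of its
# distances to all other sample points.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
profiles = np.sort(D, axis=1)[:, 1:]   # drop the zero self-distance

# For sorted equal-size samples, the 1-D Wasserstein-1 distance is
# the mean absolute difference of order statistics.
def w1(p, q):
    return np.abs(p - q).mean()

# Crude centrality score (an illustrative stand-in for transport
# ranks): average W1 distance from a point's profile to the others.
n = len(X)
score = np.array([np.mean([w1(profiles[i], profiles[j]) for j in range(n)])
                  for i in range(n)])
```

The planted outlier's profile is far from everyone else's, so `score.argmax()` recovers it; the real transport ranks are more refined than this averaged distance.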
Bio Facts:
Dr. Chen, Assistant Professor at Rutgers University, holds a Ph.D. degree in statistics from the University of California, Davis, under the supervision of Dr. Hans-Georg Mueller. Her research interests are mainly in functional data analysis and non-Euclidean data analysis.
DMS Statistics and Data Science Seminar
Mar 15, 2023 01:00 PM

DMS Statistics and Data Science Seminar
Mar 01, 2023 01:00 PM
ZOOM

DMS Statistics and Data Science Seminar
Feb 22, 2023 01:00 PM
ZOOM
Speaker: Miles Lopes (UC Davis)
Title: Rates of Approximation for CLT and Bootstrap in High Dimensions
Abstract: In the setting of low-dimensional data, it is well known that the distribution of a sample mean can be consistently approximated using the CLT or bootstrap methods. Also, the classical Berry-Esseen theorem shows that such approximations can achieve a rate of order n^{-1/2}, where accuracy is measured with respect to the "Kolmogorov distance." However, until recently, it was an open problem to determine whether Berry-Esseen type bounds with near n^{-1/2} rates can be established in the context of high-dimensional data, a question that stimulated many advances in the literature during the last several years. In this talk, I will survey these developments and discuss some of my own work on this problem.
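As a hedged illustration of the setting (not the speaker's own results), the sketch below runs a Gaussian multiplier bootstrap for the sup-norm of a scaled sample mean, the statistic controlled by high-dimensional Berry-Esseen bounds; the sample size, dimension, and replication count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 50                       # sample size and dimension
X = rng.standard_normal((n, d))      # synthetic data

# Statistic: sup-norm of the scaled sample mean, the quantity
# approximated by high-dimensional CLT and bootstrap theory.
T = np.sqrt(n) * np.abs(X.mean(axis=0)).max()

# Gaussian multiplier bootstrap: reweight centered observations
# by i.i.d. N(0, 1) multipliers to mimic the distribution of T.
B = 500
Xc = X - X.mean(axis=0)
boot = np.empty(B)
for b in range(B):
    e = rng.standard_normal(n)
    boot[b] = np.abs(Xc.T @ e).max() / np.sqrt(n)

q95 = np.quantile(boot, 0.95)        # bootstrap critical value for T
```

The rate questions in the talk concern how fast the law of `boot` approaches the law of `T` as both n and d grow.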
DMS Statistics and Data Science Seminar
Feb 15, 2023 01:00 PM
358 Parker Hall
Speaker: Marco Avella Medina (Columbia University)
Title: Differentially private high-dimensional M-estimation via noisy optimization
Abstract: We consider a general optimization-based framework for computing differentially private high-dimensional M-estimators and a new method for constructing differentially private confidence regions. In particular, we show how to construct differentially private penalized M-estimators via a noisy projected gradient descent algorithm that obtains global linear convergence under local restricted strong convexity. Our analysis shows that our estimators converge with high probability to a nearly optimal neighborhood of the non-private M-estimators. We then tackle the problem of parametric inference by constructing a differentially private version of a debiased lasso procedure. This enables us to construct confidence regions and to conduct hypothesis testing, based on private approximate pivots. We illustrate the performance of our methods in several numerical examples.
This is joint work with Po-Ling Loh and Zheng Liu.
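A minimal sketch of the noisy projected gradient descent idea on a toy unpenalized least-squares problem; the talk's method treats penalized high-dimensional M-estimators with noise calibrated to a privacy budget, whereas the noise scale, ball radius, and step size below are illustrative choices only:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 5
X = rng.standard_normal((n, d))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + 0.1 * rng.standard_normal(n)

eta = 0.1        # step size
sigma = 0.05     # per-step Gaussian noise scale (set by the privacy budget)
R = 10.0         # radius of the feasible ball for the projection

beta = np.zeros(d)
for _ in range(300):
    grad = X.T @ (X @ beta - y) / n                        # LS gradient
    beta = beta - eta * (grad + sigma * rng.standard_normal(d))  # noisy step
    nrm = np.linalg.norm(beta)
    if nrm > R:                                            # project onto ball
        beta *= R / nrm
```

With high probability the noisy iterates settle in a small neighborhood of the non-private estimator, mirroring the convergence guarantee described in the abstract.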
DMS Statistics and Data Science Seminar
Jan 18, 2023 01:00 PM
228 Parker Hall
Speaker: Dr. Shuang Zhou (Assistant Professor in the School of Mathematical and Statistical Sciences at Arizona State University)
Title: Caveats and remedies of truncated multivariate normal priors in Bayesian constrained inference
Abstract: We show that lower-dimensional marginal densities of dependent zero-mean normal distributions truncated to the positive orthant exhibit a mass-shifting phenomenon. Despite the truncated multivariate normal density having a mode at the origin, the marginal density assigns increasingly small mass near the origin as the dimension increases. The phenomenon accentuates with stronger correlation between the random variables. This surprising behavior has serious implications for Bayesian constrained estimation and inference, where the prior, in addition to having full support, is required to assign a substantial probability near the origin to capture flat parts of the true function of interest. A precise quantification of the mass-shifting phenomenon for both the prior and the posterior, characterizing the role of the dimension as well as the dependence, is provided under a variety of correlation structures. We show that, without further modification, truncated normal priors are not suitable for modeling flat regions, and propose a novel alternative strategy based on shrinking the coordinates using a multiplicative scale parameter. The proposed shrinkage prior is shown to achieve optimal posterior contraction around true functions with potentially flat regions. Synthetic and real data studies demonstrate how the modification guards against the mass-shifting phenomenon while retaining computational efficiency.
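The mass-shifting phenomenon can be seen in a small simulation: for an equicorrelated normal truncated to the positive orthant, the marginal mass that the first coordinate places near the origin shrinks as the dimension grows. The dimensions, correlation, and threshold below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)

def trunc_marginal_mass(d, rho, eps=0.25, m=200_000):
    """Estimate P(X_1 < eps) for an equicorrelated N(0, Sigma) truncated
    to the positive orthant, via rejection sampling."""
    # Equicorrelated draws through a shared factor:
    # X = sqrt(rho) * Z + sqrt(1 - rho) * W, Corr(X_i, X_j) = rho.
    Z = rng.standard_normal((m, 1))
    W = rng.standard_normal((m, d))
    X = np.sqrt(rho) * Z + np.sqrt(1 - rho) * W
    keep = X[(X > 0).all(axis=1)]       # accept positive-orthant draws
    return (keep[:, 0] < eps).mean()

frac2 = trunc_marginal_mass(d=2, rho=0.5)
frac10 = trunc_marginal_mass(d=10, rho=0.5)
```

Conditioning on many correlated coordinates being positive pulls the shared factor upward, so `frac10` comes out noticeably smaller than `frac2`, matching the abstract's claim.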
Dr. Zhou's research interests center around statistical inference under non-standard constraints, Bayesian nonparametric and hierarchical modeling, Bayesian asymptotics, and high dimensional statistics, as well as statistical applications in nuclear physics and actuarial sciences.
DMS Statistics and Data Science Seminar
Nov 17, 2022 02:00 PM
ZOOM
Speaker: Andrea Angiuli (Research Scientist in the Prime Machine Learning team at Amazon)
Title: Bridging the gap of reinforcement learning for mean field games and mean field control problems
Abstract: Mean field games (MFG) and mean field control problems (MFC) are frameworks to study Nash equilibria or social optima in games with a continuum of agents. These problems can be used to approximate competitive or cooperative games with a large finite number of agents and have found a broad range of applications, in particular in economics. In recent years, the question of learning in MFG and MFC has garnered interest, both as a way to compute solutions and as a way to model how large populations of learners converge to an equilibrium. Of particular interest is the setting where the agents do not know the model, which leads to the development of reinforcement learning (RL) methods. We present a two timescale approach with RL for MFG and MFC, which relies on a unified Q-learning algorithm. To illustrate this method, we apply it to mean field problems arising in Finance.
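A toy sketch of the two-timescale idea only (a stateless congestion game, not the talk's unified algorithm): Q-values are updated at a fast rate while the estimated population action distribution is updated at a slower one, and the relative speeds determine whether the iterates approach a competitive (MFG) or cooperative (MFC) solution:

```python
import numpy as np

rng = np.random.default_rng(3)
Q = np.zeros(2)                    # Q-value of each action (stateless toy)
mu = np.array([0.5, 0.5])          # estimated population action distribution
rho_Q, rho_mu = 0.1, 0.002         # fast (Q) and slow (mean field) rates
eps = 0.3                          # exploration probability

for t in range(5000):
    greedy = int(np.argmax(Q))
    a = rng.integers(2) if rng.random() < eps else greedy
    r = -mu[a]                     # congestion cost: crowded actions pay more
    Q[a] += rho_Q * (r - Q[a])     # fast timescale: value update
    e = np.eye(2)[int(np.argmax(Q))]
    mu += rho_mu * (e - mu)        # slow timescale: population update
```

In this symmetric toy the population estimate oscillates around the even split, the Nash equilibrium of the congestion game; the talk's algorithm handles genuine state dynamics and both MFG and MFC limits.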
DMS Statistics and Data Science Seminar CANCELED
Nov 10, 2022 02:00 PM
228 Parker Hall
IN PERSON
Speaker: Ioannis Sgouralis (University of Tennessee)
Title: Bayesian nonparametric modeling of biophysical and biochemical data
Abstract: Modern experiments monitor physical systems with high resolution that may reach the molecular level. Excessive noise, stemming from the measuring hardware, the experimental procedures, or unaccounted-for processes, demands specialized methods for analyzing the acquired datasets. Nevertheless, physical limitations and the inherent uncertainties in the underlying systems, such as unknown parameters, states, or dynamics, pose unique conceptual and computational challenges that, under physically realistic data representations, lead to intractable problems of model selection. In this talk, I will present an overview of the difficulties that are commonly encountered, especially with single-molecule measurements. I will also highlight recent advances, including Bayesian nonparametric approaches, which provide feasible alternatives to model selection.
DMS Statistics and Data Science Seminar
Nov 03, 2022 02:00 PM
ZOOM
Speaker: Mia Hubert (KU Leuven, Belgium)
Title: Outlier detection in non-elliptical data by kernel MRCD
Abstract: The minimum regularized covariance determinant method (MRCD) is a robust estimator for multivariate location and scatter, which detects outliers by fitting a robust covariance matrix to the data. Its regularization ensures that the covariance matrix is well-conditioned in any dimension. The MRCD assumes that the non-outlying observations are roughly elliptically distributed, but many datasets are not of that form. Moreover, the computation time of MRCD increases substantially when the number of variables goes up, and nowadays datasets with many variables are common. The proposed Kernel Minimum Regularized Covariance Determinant (KMRCD) estimator addresses both issues. It is not restricted to elliptical data because it implicitly computes the MRCD estimates in a kernel induced feature space. A fast algorithm is constructed that starts from kernel-based initial estimates and exploits the kernel trick to speed up the subsequent computations. Based on the KMRCD estimates, a rule is proposed to flag outliers. The KMRCD algorithm performs well in simulations, and is illustrated on real-life data.
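KMRCD itself is not part of standard libraries. As a flavor of the underlying MCD machinery that the kernelized estimator builds on, here is a crude numpy implementation of concentration (C-)steps with a simple empirical flagging cutoff; the subset size, cutoff level, and data are illustrative, and regularization and the kernel trick are omitted:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 3
X = rng.standard_normal((n, d))
X[:10] += 6.0                      # plant 10 outliers

h = int(0.75 * n)                  # subset size retained by MCD

# Concentration steps: iteratively refit location and scatter on the
# h points with the smallest Mahalanobis distances.
idx = rng.choice(n, h, replace=False)
for _ in range(20):
    mu = X[idx].mean(axis=0)
    S = np.cov(X[idx].T)
    diff = X - mu
    D = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)
    idx = np.argsort(D)[:h]

cut = np.quantile(D, 0.95)         # simple empirical cutoff
flags = D > cut                    # flagged outliers
```

The planted outliers end up with robust distances far above the clean points, so the flagging rule isolates them; KMRCD performs the analogous computation in a kernel-induced feature space.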
DMS Statistics and Data Science Seminar
Oct 27, 2022 02:00 PM
ZOOM
Speaker: Marco Riani (Professor of Statistics, University of Parma, Italy)
Title: Robust and efficient regression analysis with applications
Abstract: Data rarely follow the simple models of mathematical statistics. Often, there will be distinct subsets of observations, so that more than one model may be appropriate. Further, parameters may gradually change over time. In addition, there are often dispersed or grouped outliers which, in the context of international trade data, may correspond to fraudulent behavior. All these issues are present in the datasets that are analyzed on a daily basis, for example, by the Joint Research Centre of the European Commission, and can only be tackled by using methods which are robust to deviations from model assumptions. In this talk, we suggest a system of interrogating robust analyses, which we call “monitoring,” and describe a series of robust and efficient methods to detect model deviations, groups of homogeneous observations, multiple outliers, and/or sudden level shifts in time series. Particular attention will be given to robust and efficient methods (known as the forward search) which enable a flexible level of trimming and reveal the effect that each unit (outlier or not) exerts on the model. Finally, we discuss the extension of the above methods to transformations and to the big data context. The Box-Cox power transformation family for non-negative responses in linear models has a long and interesting history in both statistical practice and theory. The Yeo-Johnson transformation extends the family to observations that can be positive or negative. In this talk, we describe an extended Yeo-Johnson transformation that allows positive and negative responses to have different power transformations.
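For reference, the standard Yeo-Johnson transform can be written directly from its piecewise definition (the talk's extension with different powers for positive and negative responses is the speaker's own and is not reproduced here):

```python
import numpy as np

def yeo_johnson(y, lam):
    """Yeo-Johnson power transform, defined for positive and negative y."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos = y >= 0
    if lam != 0:
        out[pos] = ((y[pos] + 1.0) ** lam - 1.0) / lam
    else:
        out[pos] = np.log1p(y[pos])            # lam = 0: log(1 + y)
    if lam != 2:
        out[~pos] = -((1.0 - y[~pos]) ** (2.0 - lam) - 1.0) / (2.0 - lam)
    else:
        out[~pos] = -np.log1p(-y[~pos])        # lam = 2: -log(1 - y)
    return out
```

With lam = 1 the transform reduces to the identity on both the positive and negative branches, and lam = 0 acts as a shifted log on the positive side.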
Last Updated: 09/08/2020