Motivated by learning dynamical structures from static snapshot data, this paper presents a distribution-on-scalar regression approach for estimating the density evolution of a stochastic process from its noisy temporal point clouds. We propose an entropy-regularized nonparametric maximum likelihood estimator (E-NPMLE), which leverages entropic optimal transport as a smoothing regularizer for the density flow. We show that the E-NPMLE has almost dimension-free statistical rates of convergence to the ground truth distributions, which exhibit a striking phase transition phenomenon in terms of the number of snapshots and per-snapshot sample size. To efficiently compute the E-NPMLE, we design a novel particle-based and grid-free coordinate KL-divergence gradient descent (CKLGD) algorithm and prove its polynomial iteration complexity. Moreover, we provide numerical evidence on synthetic data to support our theoretical findings. This work contributes to the theoretical understanding and practical computation of estimating density evolution from noisy observations in arbitrary dimensions.
@misc{YNCY2025_snapshot-learning,author={Yao, Rentian and Nitanda, Atsushi and Chen, Xiaohui and Yang, Yun},month=feb,title={{Learning Density Evolution from Snapshot Data}},year={2025}}
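As background for the entropic regularizer above, here is a minimal NumPy sketch of the standard Sinkhorn iteration for entropic optimal transport between two uniform point clouds. This is generic background with illustrative parameters, not the paper's particle-based CKLGD algorithm.

```python
import numpy as np

def sinkhorn(x, y, eps=0.1, n_iter=200):
    """Entropic OT between uniform point clouds x (n, d) and y (m, d).

    A minimal sketch of the standard Sinkhorn iteration; the paper's
    CKLGD algorithm is a different, particle-based method.
    """
    n, m = len(x), len(y)
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared-distance cost
    K = np.exp(-C / eps)                                # Gibbs kernel
    a, b = np.ones(n) / n, np.ones(m) / m               # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                             # alternating scalings
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                     # entropic transport plan
    return (P * C).sum(), P

cost, plan = sinkhorn(np.random.randn(50, 2), np.random.randn(60, 2) + 1.0)
print(cost)
```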
The Wasserstein barycenter plays a fundamental role in averaging measure-valued data under the framework of optimal transport. However, there are tremendous challenges in computing and estimating the Wasserstein barycenter for high-dimensional distributions. In this paper, we introduce the multimarginal Schrödinger barycenter (MSB) based on the entropy regularized multimarginal optimal transport problem that admits general-purpose fast algorithms for computation. By recognizing a proper dual geometry, we derive non-asymptotic rates of convergence for estimating several key MSB quantities from point clouds randomly sampled from the input marginal distributions. Specifically, we show that our obtained sample complexity is statistically optimal for estimating the cost functional, Schrödinger coupling and barycenter.
@misc{LiChen2025_MSB,author={Li, Pengtao and Chen, Xiaohui},month=feb,title={{Multimarginal Schr\"{o}dinger Barycenter}},year={2025}}
The proximal algorithm is a powerful tool to minimize nonlinear and nonsmooth functionals in a general metric space. Motivated by the recent progress in studying the training dynamics of the noisy gradient descent algorithm on two-layer neural networks in the mean-field regime, we provide in this paper a simple and self-contained analysis for the convergence of the general-purpose Wasserstein proximal algorithm without assuming geodesic convexity on the objective functional. Under a natural Wasserstein analog of the Euclidean Polyak-Łojasiewicz inequality, we show that the proximal algorithm achieves an unbiased and linear convergence rate. Our convergence rate improves upon existing rates of the proximal algorithm for solving Wasserstein gradient flows under strong geodesic convexity. We also extend our analysis to the inexact proximal algorithm for geodesically semiconvex objectives. In our numerical experiments, proximal training demonstrates a faster convergence rate than the noisy gradient descent algorithm on mean-field neural networks.
@misc{ZhuChen2025_convergence-nonconvex,author={Zhu, Shuailong and Chen, Xiaohui},month=jan,title={Convergence analysis of the Wasserstein proximal algorithm beyond geodesic convexity},year={2025}}
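A Euclidean caricature of the main message: under a Polyak-Łojasiewicz (PL) inequality, the proximal point iteration converges linearly even without convexity. The toy objective below is the standard nonconvex PL example $f(x) = x^2 + 3\sin^2 x$; the inner solver, step sizes and iteration counts are illustrative assumptions, not choices from the paper.

```python
import numpy as np

# Nonconvex but PL: f(x) = x^2 + 3*sin(x)^2 has global minimum 0 at x = 0.
f = lambda x: x**2 + 3 * np.sin(x)**2
fp = lambda x: 2 * x + 3 * np.sin(2 * x)   # f'(x)

def prox_step(x, tau=0.2, inner=100, lr=0.05):
    """One proximal step: approximately minimize f(y) + (y - x)^2 / (2*tau)
    by gradient descent on the (strongly convex, for this tau) subproblem."""
    y = x
    for _ in range(inner):
        y -= lr * (fp(y) + (y - x) / tau)
    return y

x = 3.0
for _ in range(20):
    x = prox_step(x)
print(f(x))   # decays geometrically toward 0, with no convexity used
```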
The optimal transport barycenter (a.k.a. Wasserstein barycenter) is a fundamental notion of averaging that extends from the Euclidean space to the Wasserstein space of probability distributions. Computation of the \emphunregularized barycenter for discretized probability distributions on point clouds is a challenging task when the domain dimension d > 1. Most practical algorithms for approximating the barycenter problem are based on entropic regularization. In this paper, we introduce a nearly linear time O(m \logm) and linear space complexity O(m) primal-dual algorithm, the \emphWasserstein-Descent \dot\mathbbH^1-Ascent (WDHA) algorithm, for computing the \emphexact barycenter when the input probability density functions are discretized on an m-point grid. The key success of the WDHA algorithm hinges on alternating between two different yet closely related Wasserstein and Sobolev optimization geometries for the primal barycenter and dual Kantorovich potential subproblems. Under reasonable assumptions, we establish the convergence rate and iteration complexity of WDHA to its stationary point when the step size is appropriately chosen. Superior computational efficacy, scalability, and accuracy over the existing Sinkhorn-type algorithms are demonstrated on high-resolution (e.g., 1024 \times 1024 images) 2D synthetic and real data.
@misc{KimYaoZhuChen2025_barycenter-nonconvex-concave,author={Kim, Kaheon and Yao, Rentian and Zhu, Changbo and Chen, Xiaohui},month=jan,title={Optimal transport barycenter via nonconvex concave minimax optimization},year={2025}}
Random quantum states play a critical role in quantum information processing. While random quantum circuits typically provide pseudo-random states, deep thermalization introduces quantum measurement to generate genuinely random states. However, the requirement of large ancillae in conventional deep thermalization poses a challenge to scaling up the system size. We introduce holographic deep thermalization to substantially reduce the required ancillae to a system-size independent constant. Our circuit design trades space for time by sequentially applying a scrambling-measure-reset process on a small number of ancillae. By tuning the ancilla size and the number of time steps, holographic deep thermalization allows a continuous trade-off between the total quantum circuit size and the ancilla size. In the case of finite-size systems, we further enhance the performance of holographic deep thermalization via generative quantum machine learning, which leads to constant-factor advantages in the convergence towards Haar-random states. The theoretical predictions are verified with IBM Quantum noisy simulations.
@misc{zhang2024holographicdeepthermalization,archiveprefix={arXiv},author={Zhang, Bingzhi and Xu, Peng and Chen, Xiaohui and Zhuang, Quntao},date-modified={2025-01-21 22:15:37 -0800},eprint={2411.03587},month=nov,primaryclass={quant-ph},title={Holographic deep thermalization: theory and experimentation},year={2024}}
Motivated by approximate Bayesian computation using mean-field variational approximation and the computation of equilibria in multi-species systems with cross-interaction, this paper investigates the composite geodesically convex optimization problem over multiple distributions. The objective functional under consideration is composed of a convex potential energy on a product of Wasserstein spaces and a sum of convex self-interaction and internal energies associated with each distribution. To efficiently solve this problem, we introduce the Wasserstein Proximal Coordinate Gradient (WPCG) algorithms with parallel, sequential and random update schemes. Under a quadratic growth (QG) condition that is weaker than the usual strong convexity requirement on the objective functional, we show that WPCG converges exponentially fast to the unique global optimum. In the absence of the QG condition, WPCG is still demonstrated to converge to the global optimal solution, albeit at a slower polynomial rate. Numerical results for both motivating examples are consistent with our theoretical findings.
@article{YaoChenYang2023_WPCG,author={Yao, Rentian and Chen, Xiaohui and Yang, Yun},date-added={2023-10-09 22:22:12 -0700},date-modified={2024-10-15 07:29:34 -0700},journal={Journal of Machine Learning Research},month=may,number={269},pages={1-66},title={Wasserstein proximal coordinate gradient algorithms},volume={25},year={2024},}
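A finite-dimensional analog of the three update schemes: block coordinate gradient descent on a coupled quadratic, with one block per "distribution". The objective and step size are hypothetical stand-ins; the actual WPCG algorithm updates probability distributions in Wasserstein spaces.

```python
import numpy as np

rng = np.random.default_rng(0)
# Coupled quadratic F(x1, x2) = |x1|^2 + |x2|^2 + x1' A x2: the two blocks
# play the role of the two distributions coupled by a potential energy.
A = 0.5 * rng.standard_normal((3, 3))
grad = {0: lambda x1, x2: 2 * x1 + A @ x2,
        1: lambda x1, x2: 2 * x2 + A.T @ x1}

def wpcg_like(scheme, steps=200, lr=0.2):
    x = [rng.standard_normal(3), rng.standard_normal(3)]
    for _ in range(steps):
        if scheme == "parallel":        # both blocks use the same iterate
            g = [grad[i](x[0], x[1]) for i in (0, 1)]
            x = [x[i] - lr * g[i] for i in (0, 1)]
        elif scheme == "sequential":    # Gauss-Seidel-style sweep
            for i in (0, 1):
                x[i] = x[i] - lr * grad[i](x[0], x[1])
        else:                           # "random": one random block per step
            i = int(rng.integers(2))
            x[i] = x[i] - lr * grad[i](x[0], x[1])
    return x

for scheme in ("parallel", "sequential", "random"):
    print(scheme, sum(np.linalg.norm(v) for v in wpcg_like(scheme)))
```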
Optimal transport (OT) offers a versatile framework to compare complex data distributions in a geometrically meaningful way. Traditional methods for computing the Wasserstein distance and geodesic between probability measures require mesh-dependent domain discretization and suffer from the curse of dimensionality. We present GeONet, a mesh-invariant deep neural operator network that learns the non-linear mapping from the input pair of initial and terminal distributions to the Wasserstein geodesic connecting the two endpoint distributions. In the offline training stage, GeONet learns the saddle point optimality conditions for the dynamic formulation of the OT problem in the primal and dual spaces that are characterized by a coupled PDE system. The subsequent inference stage is instantaneous and can be deployed for real-time predictions in the online learning setting. We demonstrate that GeONet achieves comparable testing accuracy to the standard OT solvers on a simulation example and the CIFAR-10 dataset with considerably reduced inference-stage computational cost by orders of magnitude.
@inproceedings{GracykChen2022,author={Gracyk, Andrew and Chen, Xiaohui},booktitle={Conference on Uncertainty in Artificial Intelligence (UAI)},copyright={Creative Commons Attribution 4.0 International},date-added={2022-09-30 08:37:46 -0500},date-modified={2022-09-30 08:51:01 -0500},doi={10.48550/ARXIV.2209.14440},keywords={Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Computer Vision and Pattern Recognition (cs.CV), Machine Learning (stat.ML), FOS: Computer and information sciences, FOS: Computer and information sciences},month=jul,title={GeONet: a neural operator for learning the Wasserstein geodesic},year={2024},bdsk-url-2={https://doi.org/10.48550/ARXIV.2209.14440}}
Deep generative models are a key enabling technology for computer vision, text generation, and large language models. Denoising diffusion probabilistic models (DDPMs) have recently gained much attention due to their ability to generate diverse and high-quality samples in many computer vision tasks, as well as to incorporate flexible model architectures and a relatively simple training scheme. Quantum generative models, empowered by entanglement and superposition, have brought new insight to learning classical and quantum data. Inspired by the classical counterpart, we propose the quantum denoising diffusion probabilistic model (QuDDPM) to enable efficiently trainable generative learning of quantum data. QuDDPM adopts sufficient layers of circuits to guarantee expressivity, while it introduces multiple intermediate training tasks as interpolation between the target distribution and noise to avoid barren plateaus and guarantee efficient training. We provide bounds on the learning error and demonstrate QuDDPM's capability in learning a correlated quantum noise model, quantum many-body phases, and the topological structure of quantum data. The results provide a paradigm for versatile and efficient quantum generative learning.
@article{zhang2023generative,author={Zhang, Bingzhi and Xu, Peng and Chen, Xiaohui and Zhuang, Quntao},date-added={2023-10-09 22:08:13 -0700},date-modified={2024-01-31 18:03:04 -0800},journal={Physical Review Letters},primaryclass={quant-ph},title={Generative quantum machine learning via denoising diffusion probabilistic models},year={2024},}
K-means clustering is a widely used machine learning method for identifying patterns in large datasets. Semidefinite programming (SDP) relaxations have recently been proposed for solving the K-means optimization problem that enjoy strong statistical optimality guarantees, but the prohibitive cost of implementing an SDP solver renders these guarantees inaccessible to practical datasets. By contrast, nonnegative matrix factorization (NMF) is a simple clustering algorithm that is widely used by machine learning practitioners, but without a solid statistical underpinning nor rigorous guarantees. In this paper, we describe an NMF-like algorithm that works by solving a nonnegative low-rank restriction of the SDP relaxed K-means formulation using a nonconvex Burer–Monteiro factorization approach. The resulting algorithm is just as simple and scalable as state-of-the-art NMF algorithms, while also enjoying the same strong statistical optimality guarantees as the SDP. In our experiments, we observe that our algorithm achieves substantially smaller mis-clustering errors compared to the existing state-of-the-art.
@inproceedings{ZhuangChenYangZhang2023_NMF-BM,author={Zhuang, Yubo and Chen, Xiaohui and Yang, Yun and Zhang, Richard Y.},booktitle={International Conference on Learning Representations (ICLR)},date-added={2023-06-04 11:47:33 -0700},date-modified={2024-01-16 21:35:09 -0800},month=may,title={Statistically Optimal K-means Clustering via Nonnegative Low-rank Semidefinite Programming},year={2024},}
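A heavily simplified sketch of the idea on toy data: factor the SDP variable as $Z = UU^\top$ with $U$ nonnegative and run projected gradient ascent on $\langle A, UU^\top \rangle$. The projection and step size here are crude illustrative choices; the paper's algorithm treats the SDP constraints, convergence and statistical guarantees with far more care.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: two Gaussian clusters in R^5; A is the centered affinity matrix
# appearing in the SDP relaxation max <A, Z>.
X = np.vstack([rng.normal(-2, 1, (30, 5)), rng.normal(2, 1, (30, 5))])
X -= X.mean(axis=0)
A = X @ X.T
K = 2

# Hypothetical simplification: nonnegative low-rank factor with rows kept
# approximately on the simplex, updated by projected gradient ascent.
U = np.abs(rng.standard_normal((len(X), K)))
for _ in range(300):
    U += 1e-4 * (2 * A @ U)                     # gradient of <A, U U^T>
    U = np.maximum(U, 0)                        # nonnegativity projection
    U /= U.sum(axis=1, keepdims=True) + 1e-12   # approximate row-sum-one
labels = U.argmax(axis=1)                       # round to cluster labels
print(np.bincount(labels))                      # ideally [30, 30]
```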
Motivated by statistical inference problems in high-dimensional time series data analysis, we first derive non-asymptotic error bounds for Gaussian approximations of sums of high-dimensional dependent random vectors on hyper-rectangles, simple convex sets and sparsely convex sets. We investigate the quantitative effect of temporal dependence on the rates of convergence to a Gaussian random vector over three different dependency frameworks (α-mixing, m-dependent, and physical dependence measure). In particular, we establish new error bounds under the α-mixing framework and derive faster rates than existing results under the physical dependence measure. To implement the proposed results in practical statistical inference problems, we also develop a data-driven parametric bootstrap procedure based on a kernel estimator for the long-run covariance matrices. We apply the unified Gaussian and bootstrap approximation results to test mean vectors with combined $\ell^2$- and $\ell^\infty$-type statistics, change point detection, and construction of confidence regions for covariance and precision matrices, all for time series data.
@article{ChChWu2024,author={Chang, Jinyuan and Chen, Xiaohui and Wu, Mingcong},date-added={2023-11-11 11:38:48 -0800},date-modified={2023-11-11 11:39:04 -0800},doi={10.3150/23-BEJ1614},fjournal={Bernoulli},issn={1350-7265},journal={Bernoulli},number={1},pages={712-742},sici={1350-7265(2024)30:1<712:CLTFHD>2.0.CO;2-L},title={Central limit theorems for high dimensional dependent data},volume={30},year={2024},bdsk-url-1={https://doi.org/10.3150/23-BEJ1614}}
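A sketch of the dependent-data bootstrap ingredient: a Bartlett-kernel estimator of the long-run covariance feeds a Gaussian (parametric) bootstrap for an $\ell^\infty$-type mean statistic. The bandwidth, kernel and AR(1) toy process are illustrative assumptions.

```python
import numpy as np

def lrcov_bartlett(X, bandwidth):
    """Bartlett-kernel long-run covariance estimate for an (n, p) series."""
    n = len(X)
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    for l in range(1, bandwidth + 1):
        G = Xc[l:].T @ Xc[:-l] / n                  # lag-l autocovariance
        S += (1 - l / (bandwidth + 1)) * (G + G.T)  # PSD-preserving weights
    return S

def bootstrap_max_quantile(X, bandwidth=5, B=2000, alpha=0.05, seed=0):
    """Gaussian bootstrap critical value for sqrt(n) * max_j |mean_j|."""
    rng = np.random.default_rng(seed)
    S = lrcov_bartlett(X, bandwidth)
    Z = rng.multivariate_normal(np.zeros(X.shape[1]), S, size=B)
    return np.quantile(np.abs(Z).max(axis=1), 1 - alpha)

rng = np.random.default_rng(1)
n, p = 300, 20
X = np.zeros((n, p))                    # AR(1) toy series under the null
for t in range(1, n):
    X[t] = 0.5 * X[t - 1] + rng.standard_normal(p)
stat = np.sqrt(n) * np.abs(X.mean(axis=0)).max()
print(stat, bootstrap_max_quantile(X))  # stat should fall below the quantile
```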
Principled nonparametric tests for regression curvature in $\mathbb{R}^d$ are often statistically and computationally challenging. This paper introduces the stratified incomplete local simplex (SILS) tests for joint concavity of nonparametric multiple regression. The SILS tests with suitable bootstrap calibration are shown to achieve simultaneous guarantees on dimension-free computational complexity, polynomial decay of the uniform error-in-size, and power consistency for general (global and local) alternatives. To establish these results, we develop a general theory for incomplete U-processes with stratified random sparse weights. Novel technical ingredients include maximal inequalities for the supremum of multiple incomplete U-processes.
@article{10.3150/22-BEJ1459,author={Song, Yanglei and Chen, Xiaohui and Kato, Kengo},date-added={2022-10-14 07:31:27 -0500},date-modified={2022-10-14 07:31:27 -0500},doi={10.3150/22-BEJ1459},journal={Bernoulli},keywords={curvature testing, incomplete U-processes, Nonparametric regression, stratification},number={1},pages={323 -- 349},publisher={Bernoulli Society for Mathematical Statistics and Probability},title={{Stratified incomplete local simplex tests for curvature of nonparametric multiple regression}},url={https://doi.org/10.3150/22-BEJ1459},volume={29},year={2023},bdsk-url-1={https://doi.org/10.3150/22-BEJ1459}}
Clustering is a widely deployed unsupervised learning tool. Model-based clustering is a flexible framework to tackle data heterogeneity when the clusters have different shapes. Likelihood-based inference for mixture distributions often involves non-convex and high-dimensional objective functions, imposing difficult computational and statistical challenges. The classic expectation-maximization (EM) algorithm is a computationally thrifty iterative method that maximizes a surrogate function minorizing the log-likelihood of observed data in each iteration, which however suffers from bad local maxima even in the special case of the standard Gaussian mixture model with common isotropic covariance matrices. On the other hand, recent studies reveal that the unique global solution of a semidefinite programming (SDP) relaxed K-means achieves the information-theoretically sharp threshold for perfectly recovering the cluster labels under the standard Gaussian mixture model. In this paper, we extend the SDP approach to a general setting by integrating cluster labels as model parameters and propose an iterative likelihood adjusted SDP (iLA-SDP) method that directly maximizes the exact observed likelihood in the presence of data heterogeneity. By lifting the cluster assignment to group-specific membership matrices, iLA-SDP avoids centroid estimation, a key feature that allows exact recovery under well-separateness of centroids without being trapped by their adversarial configurations. Thus, iLA-SDP is less sensitive than EM to initialization and more stable on high-dimensional data. Our numerical experiments demonstrate that iLA-SDP achieves lower mis-clustering errors than several widely used clustering methods, including K-means, SDP and EM algorithms.
@inproceedings{ZhuangChenYang2022_LA-SDP,author={Zhuang, Yubo and Chen, Xiaohui and Yang, Yun},booktitle={International Conference on Machine Learning (ICML)},date-added={2022-10-02 19:51:13 -0500},date-modified={2023-04-24 19:05:06 -0500},keywords={Machine Learning (stat.ML), Machine Learning (cs.LG), Optimization and Control (math.OC), FOS: Computer and information sciences, FOS: Computer and information sciences, FOS: Mathematics, FOS: Mathematics},title={Likelihood adjusted semidefinite programs for clustering heterogeneous data},year={2023},bdsk-url-2={https://doi.org/10.48550/ARXIV.2209.15097}}
Clustering is an important exploratory data analysis technique to group objects based on their similarity. The widely used K-means clustering method relies on some notion of distance to partition data into a small number of groups. In the Euclidean space, centroid-based and distance-based formulations of the K-means are equivalent. In modern machine learning applications, data often arise as probability distributions and a natural generalization to handle measure-valued data is to use the optimal transport metric. Due to the non-negative Alexandrov curvature of the Wasserstein space, barycenters suffer from regularity and non-robustness issues. The peculiar behaviors of Wasserstein barycenters may make the centroid-based formulation fail to represent the within-cluster data points, while the more direct distance-based K-means approach and its semidefinite program (SDP) relaxation are capable of recovering the true cluster labels. In the special case of clustering Gaussian distributions, we show that the SDP relaxed Wasserstein K-means can achieve exact recovery provided the clusters are well separated under the 2-Wasserstein metric. Our simulation and real data examples also demonstrate that distance-based K-means can achieve better classification performance over the standard centroid-based K-means for clustering probability distributions and images.
@inproceedings{ZhuangChenYang2022,author={Zhuang, Yubo and Chen, Xiaohui and Yang, Yun},booktitle={Proceedings of Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS)},date-added={2022-09-14 20:28:06 -0500},date-modified={2022-09-30 09:00:11 -0500},title={Wasserstein $K$-means for clustering probability distributions},year={2022},}
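For Gaussian inputs the 2-Wasserstein distance has a closed form (the Bures metric), which is all the distance-based formulation needs; a small sketch building the pairwise distance matrix for hypothetical Gaussians:

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussian(m1, S1, m2, S2):
    """2-Wasserstein (Bures) distance between N(m1, S1) and N(m2, S2)."""
    r1 = sqrtm(S1)
    cross = sqrtm(r1 @ S2 @ r1)
    d2 = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross)
    return float(np.sqrt(max(np.real(d2), 0.0)))

# Pairwise W2 distances are the input to the distance-based K-means and its
# SDP relaxation; here we only assemble the distance matrix for toy Gaussians.
means = [np.zeros(2), np.ones(2), 5 * np.ones(2)]
covs = [np.eye(2), 2 * np.eye(2), np.eye(2)]
D = np.array([[w2_gaussian(m, S, m2, S2) for m2, S2 in zip(means, covs)]
              for m, S in zip(means, covs)])
print(np.round(D, 3))
```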
This paper concerns the nonparametric estimation problem of the distribution-state dependent drift vector field in an interacting N-particle system. Observing single-trajectory data for each particle, we derive the mean-field rate of convergence for the maximum likelihood estimator (MLE), which depends on both Gaussian complexity and Rademacher complexity of the function class. In particular, when the function class contains $\alpha$-smooth Hölder functions, our rate of convergence is minimax optimal on the order of $N^{-\frac{\alpha}{d+2\alpha}}$. Combining with a Fourier analytical deconvolution estimator, we derive the consistency of MLE for the external force and interaction kernel in the McKean-Vlasov equation.
@inproceedings{pmlr-v178-yao22a,author={Yao, Rentian and Chen, Xiaohui and Yang, Yun},booktitle={Proceedings of Thirty Fifth Conference on Learning Theory (COLT)},date-added={2022-08-29 09:04:40 -0500},date-modified={2022-09-30 08:57:33 -0500},editor={Loh, Po-Ling and Raginsky, Maxim},month={02--05 Jul},pages={2242--2275},publisher={PMLR},series={Proceedings of Machine Learning Research},title={Mean-field nonparametric estimation of interacting particle systems},volume={178},year={2022},}
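A toy generator of the kind of single-trajectory data the MLE consumes: Euler-Maruyama simulation of a one-dimensional interacting particle system. The quadratic external and interaction potentials below are illustrative choices, not the paper's function classes.

```python
import numpy as np

def simulate_ips(N=200, T=5.0, dt=0.01, seed=0):
    """Euler-Maruyama for an interacting particle system in 1D:
    dX_i = -V'(X_i) dt - (1/N) sum_j W'(X_i - X_j) dt + dB_i,
    with external potential V(x) = x^2/2 and interaction W(x) = x^2/2.
    A toy data generator; the paper estimates V' and W' nonparametrically
    from such single-trajectory data via maximum likelihood.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(N)
    path = [x.copy()]
    for _ in range(int(T / dt)):
        drift = -x - (x - x.mean())        # -V'(x_i) minus mean-field term
        x = x + drift * dt + np.sqrt(dt) * rng.standard_normal(N)
        path.append(x.copy())
    return np.array(path)

print(simulate_ips().std(axis=1)[-1])      # stationary spread of the particles
```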
Semidefinite programming (SDP) is a powerful tool for tackling a wide range of computationally hard problems such as clustering. Despite the high accuracy, semidefinite programs are often too slow in practice with poor scalability on large (or even moderate) datasets. In this paper, we introduce a linear time complexity algorithm for approximating an SDP relaxed K-means clustering. The proposed sketch-and-lift (SL) approach solves an SDP on a subsampled dataset and then propagates the solution to all data points by a nearest-centroid rounding procedure. It is shown that the SL approach enjoys a similar exact recovery threshold as the K-means SDP on the full dataset, which is known to be information-theoretically tight under the Gaussian mixture model. The SL method can be made adaptive with enhanced theoretic properties when the cluster sizes are unbalanced. Our simulation experiments demonstrate that the statistical accuracy of the proposed method outperforms state-of-the-art fast clustering algorithms without sacrificing too much computational efficiency, and is comparable to the original K-means SDP with substantially reduced runtime.
@inproceedings{pmlr-v151-zhuang22a,author={Zhuang, Yubo and Chen, Xiaohui and Yang, Yun},booktitle={Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (AISTATS)},date-added={2022-08-29 09:39:43 -0500},date-modified={2022-09-30 08:59:39 -0500},editor={Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},month={28--30 Mar},pages={9214--9246},publisher={PMLR},series={Proceedings of Machine Learning Research},title={Sketch-and-lift: scalable subsampled semidefinite program for K-means clustering},volume={151},year={2022},}
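A skeleton of the sketch-and-lift pipeline. For brevity, Lloyd's K-means stands in for the K-means SDP solved on the subsample in the paper; the lift step (nearest-centroid rounding) is as described.

```python
import numpy as np
from sklearn.cluster import KMeans

def sketch_and_lift(X, K, m, seed=0):
    """Sketch-and-lift skeleton: cluster a size-m subsample, then lift the
    solution to all points by nearest-centroid rounding. Lloyd's K-means is
    a stand-in here for the subsampled K-means SDP solver."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    km = KMeans(n_clusters=K, n_init=10, random_state=seed).fit(X[idx])
    centroids = km.cluster_centers_
    # lift: assign every point to its nearest subsample centroid
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (500, 10)), rng.normal(3, 1, (500, 10))])
labels = sketch_and_lift(X, K=2, m=100)
print(np.bincount(labels))   # ideally [500, 500]
```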
Irregular functional data in which densely sampled curves are observed over different ranges pose a challenge for modeling and inference, and sensitivity to outlier curves is a concern in applications. Motivated by applications in quantitative ultrasound signal analysis, this paper investigates a class of robust M-estimators for partially observed functional data including functional location and quantile estimators. Consistency of the estimators is established under general conditions on the partial observation process. Under smoothness conditions on the class of M-estimators, asymptotic Gaussian process approximations are established and used for large sample inference. The large sample approximations justify a bootstrap approximation for robust inferences about the functional response process. The performance is demonstrated in simulations and in the analysis of irregular functional data from quantitative ultrasound analysis.
@article{ParkChenSimpons2022,author={Park, Yeonjoo and Chen, Xiaohui and Simpson, Douglas},date-added={2022-08-29 09:04:40 -0500},date-modified={2022-09-30 08:59:06 -0500},journal={Statistica Sinica},pages={2265-2293},title={Robust Inference for Partially Observed Functional Response Data},volume={32},year={2022}}
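A toy illustration of why robust functionals help with partially observed curves: with a few gross outlier curves, the pointwise median (the M-estimator with absolute-error loss) stays near the truth while the mean does not. For simplicity each curve is observed on a random subset of the grid rather than a subinterval.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 101)
# Each curve is observed on a random 60% subset of the grid (a crude stand-in
# for observation over different ranges); three curves are gross outliers.
curves = np.full((50, 101), np.nan)
for i in range(50):
    mask = rng.random(101) < 0.6
    curves[i, mask] = (np.sin(2 * np.pi * grid[mask])
                       + 0.1 * rng.standard_normal(mask.sum()))
curves[:3] += 20.0                          # contaminate three curves

truth = np.sin(2 * np.pi * grid)
mean_err = np.max(np.abs(np.nanmean(curves, axis=0) - truth))
median_err = np.max(np.abs(np.nanmedian(curves, axis=0) - truth))
print(mean_err, median_err)                 # the median is far less distorted
```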
We consider the problem of change point detection for high-dimensional distributions in a location family when the dimension can be much larger than the sample size. In change point analysis, the widely used cumulative sum (CUSUM) statistics are sensitive to outliers and heavy-tailed distributions. In this paper, we propose a robust, tuning-free (i.e., fully data-dependent), and easy-to-implement change point test that enjoys strong theoretical guarantees. To achieve robustness in a nonparametric setting, we formulate the change point detection in the multivariate U-statistics framework with anti-symmetric and nonlinear kernels. Specifically, the within-sample noise is canceled out by anti-symmetry of the kernel, while the signal distortion under certain nonlinear kernels can be controlled such that the between-sample change point signal is magnitude-preserving. A (half) jackknife multiplier bootstrap (JMB) tailored to the change point detection setting is proposed to calibrate the distribution of our $\ell^\infty$-norm aggregated test statistic. Subject to mild moment conditions on kernels, we derive the uniform rates of convergence for the JMB to approximate the sampling distribution of the test statistic, and analyze its size and power properties. Extensions to multiple change point testing and estimation are discussed with illustration from numerical studies.
@article{10.1214/21-EJS1915,author={Yu, Mengjia and Chen, Xiaohui},date-added={2022-08-29 09:54:34 -0500},date-modified={2022-09-30 08:52:14 -0500},doi={10.1214/21-EJS1915},journal={Electronic Journal of Statistics},keywords={bootstrap, Change point analysis, Gaussian approximation, High-dimensional data, U-statistics},number={1},pages={1096 -- 1152},publisher={Institute of Mathematical Statistics and Bernoulli Society},title={{A robust bootstrap change point test for high-dimensional location parameter}},url={https://doi.org/10.1214/21-EJS1915},volume={16},year={2022},bdsk-url-1={https://doi.org/10.1214/21-EJS1915}}
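A minimal sketch of the test statistic only, using the componentwise anti-symmetric sign kernel $h(x, y) = \mathrm{sign}(x - y)$; the jackknife multiplier bootstrap calibration developed in the paper is omitted.

```python
import numpy as np

def antisym_scan(X):
    """l_inf-aggregated two-sample U-statistic scan with the anti-symmetric
    sign kernel h(x, y) = sign(x - y), a robust alternative to CUSUM.
    Calibration (the paper uses a jackknife multiplier bootstrap) is omitted."""
    n, p = X.shape
    best = -np.inf
    for t in range(2, n - 2):
        # componentwise Mann-Whitney-type contrast between X[:t] and X[t:]
        U = np.sign(X[:t, None, :] - X[None, t:, :]).sum(axis=(0, 1))
        best = max(best, np.abs(U / (t * (n - t))).max())
    return best

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_t(2, (100, 10)),
               rng.standard_t(2, (100, 10)) + 1.5])
print(antisym_scan(X))   # large under a location shift, even with heavy tails
```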
We determine the information-theoretic cutoff value on separation of cluster centers for exact recovery of cluster labels in a K-component Gaussian mixture model with equal cluster sizes. Moreover, we show that a semidefinite programming (SDP) relaxation of the K-means clustering method achieves such sharp threshold for exact recovery without assuming the symmetry of cluster centers.
@article{9366690,author={Chen, Xiaohui and Yang, Yun},date-added={2022-08-29 10:07:29 -0500},date-modified={2022-09-30 08:53:58 -0500},doi={10.1109/TIT.2021.3063155},issn={1557-9654},journal={IEEE Transactions on Information Theory},month=jun,number={6},pages={4223-4238},title={Cutoff for Exact Recovery of Gaussian Mixture Models},volume={67},year={2021},bdsk-url-1={https://doi.org/10.1109/TIT.2021.3063155}}
We introduce the diffusion K-means clustering method on Riemannian submanifolds, which maximizes the within-cluster connectedness based on the diffusion distance. The diffusion K-means constructs a random walk on the similarity graph with vertices as data points randomly sampled on the manifolds and edges as similarities given by a kernel that captures the local geometry of manifolds. The diffusion K-means is a multi-scale clustering tool that is suitable for data with non-linear and non-Euclidean geometric features in mixed dimensions. Given the number of clusters, we propose a polynomial-time convex relaxation algorithm via the semidefinite programming (SDP) to solve the diffusion K-means. In addition, we propose a nuclear norm regularized SDP that is adaptive to the number of clusters. In both cases, we show that exact recovery of the SDPs for diffusion K-means can be achieved under suitable between-cluster separability and within-cluster connectedness of the submanifolds, which together quantify the hardness of the manifold clustering problem. We further propose the localized diffusion K-means by using the local adaptive bandwidth estimated from the nearest neighbors. We show that exact recovery of the localized diffusion K-means is fully adaptive to the local probability density and geometric structures of the underlying submanifolds.
@article{CHEN2021303,author={Chen, Xiaohui and Yang, Yun},date-added={2022-08-29 09:04:40 -0500},date-modified={2022-09-30 08:54:12 -0500},doi={https://doi.org/10.1016/j.acha.2020.03.002},issn={1063-5203},journal={Applied and Computational Harmonic Analysis},keywords={Manifold clustering, K-means, Riemannian submanifolds, Diffusion distance, Semidefinite programming, Random walk on random graphs, Laplace-Beltrami operator, Mixing times, Adaptivity},month=may,pages={303-347},title={Diffusion K-means clustering on manifolds: Provable exact recovery via semidefinite relaxations},url={https://www.sciencedirect.com/science/article/pii/S106352032030021X},volume={52},year={2021},bdsk-url-1={https://www.sciencedirect.com/science/article/pii/S106352032030021X},bdsk-url-2={https://doi.org/10.1016/j.acha.2020.03.002}}
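A spectral sketch of the construction: Gaussian-kernel random walk, diffusion-map embedding at time t, then K-means on the embedding. The paper instead solves SDP relaxations with exact-recovery guarantees; the bandwidth and diffusion time below are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def diffusion_kmeans(X, K, bandwidth=0.5, t=10):
    """Spectral sketch of diffusion K-means: random-walk kernel, diffusion-map
    embedding, then K-means. The paper's SDP relaxation is not reproduced."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * bandwidth**2))       # local-geometry kernel
    P = W / W.sum(axis=1, keepdims=True)       # random-walk transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    # diffusion map: drop the trivial top eigenvector, damp by lambda^t
    emb = vecs[:, order[1:K + 1]].real * (vals[order[1:K + 1]].real ** t)
    return KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(emb)

theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
rng = np.random.default_rng(0)
circles = np.vstack([np.c_[np.cos(theta), np.sin(theta)],
                     3 * np.c_[np.cos(theta), np.sin(theta)]])
X = circles + 0.05 * rng.standard_normal(circles.shape)
print(np.bincount(diffusion_kmeans(X, K=2)))   # ideally [100, 100]
```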
Cumulative sum (CUSUM) statistics are widely used in the change point inference and identification. For the problem of testing for existence of a change point in an independent sample generated from the mean-shift model, we introduce a Gaussian multiplier bootstrap to calibrate critical values of the CUSUM test statistics in high dimensions. The proposed bootstrap CUSUM test is fully data dependent and it has strong theoretical guarantees under arbitrary dependence structures and mild moment conditions. Specifically, we show that with a boundary removal parameter the bootstrap CUSUM test enjoys the uniform validity in size under the null and it achieves the minimax separation rate under the sparse alternatives when the dimension p can be larger than the sample size n. Once a change point is detected, we estimate the change point location by maximising the ℓ∞-norm of the generalised CUSUM statistics at two different weighting scales corresponding to covariance stationary and non-stationary CUSUM statistics. For both estimators, we derive their rates of convergence and show that dimension impacts the rates only through logarithmic factors, which implies that consistency of the CUSUM estimators is possible when p is much larger than n. In the presence of multiple change points, we propose a principled bootstrap-assisted binary segmentation (BABS) algorithm to dynamically adjust the change point detection rule and recursively estimate their locations. We derive its rate of convergence under suitable signal separation and strength conditions. The results derived in this paper are non-asymptotic and we provide extensive simulation studies to assess the finite sample performance. The empirical evidence shows an encouraging agreement with our theoretical results.
@article{https://doi.org/10.1111/rssb.12406,author={Yu, Mengjia and Chen, Xiaohui},date-added={2022-08-29 10:16:32 -0500},date-modified={2022-09-30 08:54:35 -0500},doi={https://doi.org/10.1111/rssb.12406},journal={Journal of the Royal Statistical Society: Series B (Statistical Methodology)},keywords={binary segmentation, bootstrap, CUSUM, change point analysis, Gaussian approximation, high-dimensional data},month=apr,number={2},pages={247-270},title={Finite sample change point inference and identification for high-dimensional mean vectors},url={https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12406},volume={83},year={2021},bdsk-url-1={https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12406},bdsk-url-2={https://doi.org/10.1111/rssb.12406}}
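A compact sketch of the test: an $\ell^\infty$-norm CUSUM scan calibrated by a Gaussian multiplier bootstrap. The boundary-removal parameter and the two weighting scales used for localization in the paper are omitted; centering and sample sizes below are illustrative.

```python
import numpy as np

def cusum_max(X):
    """l_inf-norm CUSUM scan statistic for a mean shift in (n, p) data."""
    n = len(X)
    S = np.cumsum(X, axis=0)
    best = 0.0
    for t in range(1, n):
        w = np.sqrt(t * (n - t) / n)
        diff = S[t - 1] / t - (S[-1] - S[t - 1]) / (n - t)
        best = max(best, w * np.abs(diff).max())
    return best

def multiplier_bootstrap(X, B=500, seed=0):
    """Gaussian multiplier bootstrap: reweight globally centered rows by
    i.i.d. N(0, 1) multipliers and recompute the scan statistic."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    return np.array([cusum_max(Xc * rng.standard_normal((len(X), 1)))
                     for _ in range(B)])

rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((100, 50)),
               rng.standard_normal((100, 50)) + 0.8])
crit = np.quantile(multiplier_bootstrap(X), 0.95)
print(cusum_max(X) > crit)   # True: the mean shift at n/2 is detected
```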
We derive a dimension-free Hanson–Wright inequality for quadratic forms of independent sub-Gaussian random variables in a separable Hilbert space. Our inequality is an infinite-dimensional generalization of the classical Hanson–Wright inequality for finite-dimensional Euclidean random vectors. We illustrate an application to the generalized K-means clustering problem for non-Euclidean data. Specifically, we establish the exponential rate of convergence for a semidefinite relaxation of the generalized K-means, which, together with a simple rounding algorithm, implies exact recovery of the true clustering structure.
@article{10.3150/20-BEJ1251,author={Chen, Xiaohui and Yang, Yun},date-added={2022-08-29 10:14:30 -0500},date-modified={2022-09-30 08:55:19 -0500},doi={10.3150/20-BEJ1251},journal={Bernoulli},keywords={$k$-means, Hanson--Wright inequality, Hilbert space, semidefinite relaxation},number={1},pages={586 -- 614},publisher={Bernoulli Society for Mathematical Statistics and Probability},title={{Hanson--Wright inequality in Hilbert spaces with application to $K$-means clustering for non-Euclidean data}},url={https://doi.org/10.3150/20-BEJ1251},volume={27},year={2021},bdsk-url-1={https://doi.org/10.3150/20-BEJ1251}}
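For reference, the classical finite-dimensional inequality that the paper extends to separable Hilbert spaces (stated with an unspecified absolute constant c):

```latex
% Classical Hanson--Wright inequality in R^n: for X = (X_1, ..., X_n) with
% independent mean-zero sub-Gaussian coordinates, max_i ||X_i||_{psi_2} <= K,
% and a fixed matrix A,
\[
  \mathbb{P}\bigl( \lvert X^\top A X - \mathbb{E}\,X^\top A X \rvert > t \bigr)
  \;\le\; 2\exp\!\left[ -c \min\!\left(
    \frac{t^2}{K^4 \lVert A \rVert_F^2},\,
    \frac{t}{K^2 \lVert A \rVert_{\mathrm{op}}} \right) \right],
  \qquad t \ge 0.
\]
```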
This paper concerns the parameter estimation problem for the quadratic potential energy in interacting particle systems from continuous-time and single-trajectory data. Even though such dynamical systems are high-dimensional, we show that the vanilla maximum likelihood estimator (without regularization) is able to estimate the interaction potential parameter with the optimal rate of convergence simultaneously in the mean-field limit and in long-time dynamics. This to some extent avoids the curse of dimensionality in estimating large dynamical systems under symmetry of the particle interaction.
@article{10.1214/21-ECP416,author={Chen, Xiaohui},date-added={2022-08-29 10:10:53 -0500},date-modified={2022-09-30 08:55:46 -0500},doi={10.1214/21-ECP416},journal={Electronic Communications in Probability},keywords={interacting particle systems, maximum likelihood estimation, mean-field regime, stochastic Vlasov equation, symmetry},month=jul,number={none},pages={1 -- 13},publisher={Institute of Mathematical Statistics and Bernoulli Society},title={{Maximum likelihood estimation of potential energy in interacting particle systems from single-trajectory data}},url={https://doi.org/10.1214/21-ECP416},volume={26},year={2021},bdsk-url-1={https://doi.org/10.1214/21-ECP416}}
This paper is concerned with finite sample approximations to the supremum of a non-degenerate U-process of a general order indexed by a function class. We are primarily interested in situations where the function class as well as the underlying distribution change with the sample size, and the U-process itself is not weakly convergent as a process. Such situations arise in a variety of modern statistical problems. We first consider Gaussian approximations, namely, approximate the U-process supremum by the supremum of a Gaussian process, and derive coupling and Kolmogorov distance bounds. Such Gaussian approximations are, however, not often directly applicable in statistical problems since the covariance function of the approximating Gaussian process is unknown. This motivates us to study bootstrap-type approximations to the U-process supremum. We propose a novel jackknife multiplier bootstrap (JMB) tailored to the U-process, and derive coupling and Kolmogorov distance bounds for the proposed JMB method. All these results are non-asymptotic, and established under fairly general conditions on function classes and underlying distributions. Key technical tools in the proofs are new local maximal inequalities for U-processes, which may be useful in other problems. We also discuss applications of the general approximation results to testing for qualitative features of nonparametric functions based on generalized local U-processes.
@article{ChenKato2020_PTRF,author={Chen, Xiaohui and Kato, Kengo},date-added={2022-08-29 09:04:40 -0500},date-modified={2022-09-30 08:55:34 -0500},day={31},doi={10.1007/s00440-019-00936-y},issn={1432-2064},journal={Probability Theory and Related Fields},month=jul,pages={1097--1163},title={Jackknife multiplier bootstrap: finite sample approximations to the U-process supremum with applications},url={https://doi.org/10.1007/s00440-019-00936-y},volume={176},year={2020},bdsk-url-1={https://doi.org/10.1007/s00440-019-00936-y}}
This paper introduces a dynamic panel SIR (DP-SIR) model to investigate the impact of non-pharmaceutical interventions (NPIs) on the COVID-19 transmission dynamics with panel data from 9 countries across the globe. By constructing scenarios with different combinations of NPIs, our empirical findings suggest that countries may avoid the lockdown policy by imposing school closures, mask wearing and centralized quarantine to reach similar outcomes in controlling the COVID-19 infection. Our results also suggest that, as of April 4th, 2020, certain countries such as the U.S. and Singapore may require additional NPI measures to control disease transmission more effectively, while other countries may cautiously consider gradually lifting some NPIs to mitigate the costs to the overall economy.
@article{ChenQiu2020_cepr,author={Chen, Xiaohui and Qiu, Ziyi},date-added={2022-08-29 09:04:40 -0500},date-modified={2022-09-30 08:59:12 -0500},journal={Covid Economics: Vetted and Real-Time Papers, Centre for Economic Policy Research (CEPR)},number={7},pages={46-67},title={Scenario analysis of non-pharmaceutical interventions on global COVID-19 transmissions.},year={2020}}
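For intuition, a bare-bones discrete-time SIR simulation in which an NPI acts by lowering the transmission rate; the parameter values are illustrative, not estimates from the paper's panel model.

```python
import numpy as np

def sir(beta, gamma, s0=0.999, i0=0.001, days=180):
    """Discrete-time SIR dynamics; NPIs act by lowering the contact rate beta.
    A toy illustration only: the paper fits a dynamic *panel* SIR model
    across countries with time-varying, policy-dependent parameters."""
    s, i, r = s0, i0, 0.0
    traj = []
    for _ in range(days):
        new_inf = beta * s * i
        new_rec = gamma * i
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        traj.append((s, i, r))
    return np.array(traj)

no_npi = sir(beta=0.35, gamma=0.1)      # basic reproduction number 3.5
with_npi = sir(beta=0.12, gamma=0.1)    # NPIs cut transmission to 1.2
print(no_npi[:, 1].max(), with_npi[:, 1].max())  # peak infection fractions
```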
This paper is concerned with the estimation of time-varying networks for high-dimensional nonstationary time series. Two types of dynamic behaviors are considered: structural breaks (i.e., abrupt change points) and smooth changes. To simultaneously handle these two types of time-varying features, a two-step approach is proposed: multiple change point locations are first identified by comparing localized averages of sample covariance matrices, and then graph supports are recovered by a kernelized time-varying constrained $\ell_1$-minimization for inverse matrix estimation (CLIME) estimator on each segment. We derive the rates of convergence for estimating the change points and precision matrices under mild moment and dependence conditions. In particular, we show that this two-step approach is consistent in estimating the change points and the piecewise smooth precision matrix function, under a certain high-dimensional scaling limit. The method is applied to the analysis of the network structure of the S&P 500 index between 2003 and 2008.
@article{XuChenWu2020_entropy,article-number={55},author={Xu, Mengyu and Chen, Xiaohui and Wu, Wei Biao},date-added={2022-08-29 09:04:40 -0500},date-modified={2022-09-30 08:54:27 -0500},doi={10.3390/e22010055},issn={1099-4300},journal={Entropy},number={1},title={Estimation of Dynamic Networks for High-Dimensional Nonstationary Time Series},url={https://www.mdpi.com/1099-4300/22/1/55},volume={22},year={2020},bdsk-url-1={https://www.mdpi.com/1099-4300/22/1/55},bdsk-url-2={https://doi.org/10.3390/e22010055}}
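A toy version of the two-step pipeline for a single break, with scikit-learn's GraphicalLasso standing in for the kernelized time-varying CLIME estimator (the substitution and all tuning values are ours):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def two_step_networks(X, lam=0.1, margin=20):
    """Two-step toy for one break: locate the split maximizing the difference
    of segment covariances, then fit a sparse precision matrix per segment."""
    n = len(X)
    scores = [np.abs(np.cov(X[:t], rowvar=False)
                     - np.cov(X[t:], rowvar=False)).max()
              for t in range(margin, n - margin)]
    t_hat = margin + int(np.argmax(scores))
    g1 = GraphicalLasso(alpha=lam).fit(X[:t_hat]).precision_
    g2 = GraphicalLasso(alpha=lam).fit(X[t_hat:]).precision_
    return t_hat, g1, g2

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((150, 5)),
               np.sqrt(3.0) * rng.standard_normal((150, 5))])
t_hat, g1, g2 = two_step_networks(X)
print(t_hat)   # close to the true break at 150
```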
A U-statistic, calculated from a random sample of size n, is an average of a symmetric function calculated for all m-tuples in the sample. Examples include the sample variance, the Cramér-von Mises and energy statistics of goodness-of-fit, and the Kaplan–Meier and Nelson–Aalen estimators in survival analysis. Asymptotic properties are described.
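A concrete instance: the unbiased sample variance is the order-2 U-statistic with kernel $h(x, y) = (x - y)^2/2$, averaged over all pairs.

```python
import numpy as np
from itertools import combinations

# The unbiased sample variance as an order-2 U-statistic with symmetric
# kernel h(x, y) = (x - y)^2 / 2.
rng = np.random.default_rng(0)
x = rng.standard_normal(50)
u = np.mean([(x[i] - x[j]) ** 2 / 2
             for i, j in combinations(range(len(x)), 2)])
print(np.isclose(u, x.var(ddof=1)))   # True
```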
We study the problem of distributional approximations to high-dimensional non-degenerate U-statistics with random kernels of diverging orders. Infinite-order U-statistics (IOUS) are a useful tool for constructing simultaneous prediction intervals that quantify the uncertainty of ensemble methods such as subbagging and random forests. A major obstacle in using the IOUS is their computational intractability when the sample size and/or order are large. In this article, we derive non-asymptotic Gaussian approximation error bounds for an incomplete version of the IOUS with a random kernel. We also study data-driven inferential methods for the incomplete IOUS via bootstraps and develop their statistical and computational guarantees.
@article{10.1214/19-EJS1643,author={Song, Yanglei and Chen, Xiaohui and Kato, Kengo},date-added={2022-08-29 15:38:15 -0500},date-modified={2022-09-30 08:53:21 -0500},doi={10.1214/19-EJS1643},journal={Electronic Journal of Statistics},keywords={bootstrap, Gaussian approximation, incomplete $U$ statistics, Infinite-order $U$-statistics, random forests, uncertainty quantification},number={2},pages={4794 -- 4848},publisher={Institute of Mathematical Statistics and Bernoulli Society},title={{Approximating high-dimensional infinite-order $U$-statistics: Statistical and computational guarantees}},url={https://doi.org/10.1214/19-EJS1643},volume={13},year={2019},bdsk-url-1={https://doi.org/10.1214/19-EJS1643}}
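The computational trick in one picture: average the kernel over a random subset of tuples rather than all of them. A sketch with the variance kernel, with the sampling scheme simplified relative to the paper:

```python
import numpy as np

def incomplete_u(x, kernel, order, n_terms, seed=0):
    """Incomplete U-statistic: average the kernel over n_terms randomly
    sampled order-tuples instead of all C(n, order) of them, trading a
    little variance for a large computational saving."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_terms):
        idx = rng.choice(len(x), size=order, replace=False)
        vals.append(kernel(x[idx]))
    return np.mean(vals)

rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
h = lambda s: (s[0] - s[1]) ** 2 / 2   # variance kernel, order 2
print(incomplete_u(x, h, order=2, n_terms=5000))  # ~ Var(x), far cheaper
```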
This paper studies inference for the mean vector of a high-dimensional U-statistic. In the era of big data, the dimension d of the U-statistic and the sample size n of the observations tend to be both large, and the computation of the U-statistic is prohibitively demanding. Data-dependent inferential procedures such as the empirical bootstrap for U-statistics are even more computationally expensive. To overcome such a computational bottleneck, incomplete U-statistics obtained by sampling fewer terms of the U-statistic are attractive alternatives. In this paper, we introduce randomized incomplete U-statistics with sparse weights whose computational cost can be made independent of the order of the U-statistic. We derive nonasymptotic Gaussian approximation error bounds for the randomized incomplete U-statistics in high dimensions, namely in cases where the dimension d is possibly much larger than the sample size n, for both nondegenerate and degenerate kernels. In addition, we propose generic bootstrap methods for the incomplete U-statistics that are computationally much less demanding than existing bootstrap methods, and establish finite sample validity of the proposed bootstrap methods. Our methods are illustrated with an application to nonparametric testing for the pairwise independence of a high-dimensional random vector under weaker assumptions than those appearing in the literature.
@article{10.1214/18-AOS1773,author={Chen, Xiaohui and Kato, Kengo},date-added={2022-08-29 15:34:05 -0500},date-modified={2022-09-30 08:57:52 -0500},doi={10.1214/18-AOS1773},journal={The Annals of Statistics},keywords={Bernoulli sampling, bootstrap, Divide and conquer, Gaussian approximation, incomplete $U$-statistics, randomized inference, sampling with replacement},number={6},pages={3127 -- 3156},publisher={Institute of Mathematical Statistics},title={{Randomized incomplete $U$-statistics in high dimensions}},url={https://doi.org/10.1214/18-AOS1773},volume={47},year={2019},bdsk-url-1={https://doi.org/10.1214/18-AOS1773}}
We propose a pointwise inference algorithm for high-dimensional linear models with time-varying coefficients. The method is based on a novel combination of the nonparametric kernel smoothing technique and a Lasso bias-corrected ridge regression estimator. Due to the non-stationarity feature of the model, dynamic bias-variance decomposition of the estimator is obtained. With a bias-correction procedure, the local null distribution of the estimator of the time-varying coefficient vector is characterized for i.i.d. Gaussian and heavy-tailed errors. The limiting null distribution is also established for Gaussian process errors, and we show that the asymptotic properties differ between short-range and long-range dependent errors. Here, p-values are adjusted by a Bonferroni-type correction procedure to control the familywise error rate (FWER) in the asymptotic sense at each time point. The finite-sample performance of the proposed inference algorithm is illustrated with synthetic data and an application to learning brain connectivity using the resting-state fMRI data for Parkinson's disease.
@article{10.2307/26384241,author={Chen, Xiaohui and He, Yifeng},date-added={2022-08-29 15:49:09 -0500},date-modified={2022-09-30 08:55:26 -0500},issn={10170405, 19968507},journal={Statistica Sinica},number={1},pages={255--276},publisher={Institute of Statistical Science, Academia Sinica},title={Inference of high-dimensional linear models with time-varying coefficients},url={http://www.jstor.org/stable/26384241},urldate={2022-08-29},volume={28},year={2018},bdsk-url-1={http://www.jstor.org/stable/26384241}}
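A sketch of the first-stage estimator: a kernel-reweighted lasso at a fixed time point. The ridge-based bias correction that enables pointwise inference is omitted, and the bandwidth and penalty below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def tv_lasso(X, y, t0, bandwidth=0.1, alpha=0.01):
    """Kernel-smoothed lasso estimate of beta(t0) in y_t = x_t' beta(t/n) + e_t.
    Rows are reweighted by an Epanechnikov kernel in rescaled time; this is
    only the first-stage estimator, without the paper's bias correction."""
    n = len(y)
    u = (np.arange(n) / n - t0) / bandwidth
    w = np.maximum(0.75 * (1 - u**2), 0)      # Epanechnikov weights
    sw = np.sqrt(w)[:, None]                  # fold weights into the rows
    return Lasso(alpha=alpha).fit(X * sw, y * sw.ravel()).coef_

rng = np.random.default_rng(0)
n, p = 400, 20
X = rng.standard_normal((n, p))
beta_t = lambda t: np.r_[np.sin(2 * np.pi * t), 1.0, np.zeros(p - 2)]
y = np.array([X[t] @ beta_t(t / n) for t in range(n)])
y += 0.1 * rng.standard_normal(n)
print(np.round(tv_lasso(X, y, t0=0.25)[:3], 2))  # roughly [1, 1, 0]
```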
This paper studies the Gaussian and bootstrap approximations for the probabilities of a nondegenerate U-statistic belonging to the hyperrectangles in $\mathbb{R}^d$ when the dimension d is large. A two-step Gaussian approximation procedure that does not impose structural assumptions on the data distribution is proposed. Subject to mild moment conditions on the kernel, we establish the explicit rate of convergence uniformly in the class of all hyperrectangles in $\mathbb{R}^d$ that decays polynomially in sample size for a high-dimensional scaling limit, where the dimension can be much larger than the sample size. We also provide computable approximation methods for the quantiles of the maxima of centered U-statistics. Specifically, we provide a unified perspective for the empirical bootstrap, the randomly reweighted bootstrap and the Gaussian multiplier bootstrap with the jackknife estimator of covariance matrix as randomly reweighted quadratic forms and we establish their validity. We show that all three methods are inferentially first-order equivalent for high-dimensional U-statistics in the sense that they achieve the same uniform rate of convergence over all d-dimensional hyperrectangles. In particular, they are asymptotically valid when the dimension d can be as large as $O(e^{n^c})$ for some constant $c \in (0, 1/7)$. The bootstrap methods are applied to statistical applications for high-dimensional non-Gaussian data including: (i) principled and data-dependent tuning parameter selection for regularized estimation of the covariance matrix and its related functionals; (ii) simultaneous inference for the covariance and rank correlation matrices. In particular, for the thresholded covariance matrix estimator with the bootstrap selected tuning parameter, we show that for a class of sub-Gaussian data, error bounds of the bootstrapped thresholded covariance matrix estimator can be much tighter than those of the minimax estimator with a universal threshold. In addition, we also show that the Gaussian-like convergence rates can be achieved for heavy-tailed data, which are less conservative than those obtained by the Bonferroni technique that ignores the dependency in the underlying data distribution.
@article{10.1214/17-AOS1563,author={Chen, Xiaohui},date-added={2022-08-29 10:35:00 -0500},date-modified={2022-09-30 08:54:48 -0500},doi={10.1214/17-AOS1563},journal={The Annals of Statistics},keywords={bootstrap, Gaussian approximation, high-dimensional inference, U-statistics},number={2},pages={642 -- 678},publisher={Institute of Mathematical Statistics},title={{Gaussian and bootstrap approximations for high-dimensional U-statistics and their applications}},url={https://doi.org/10.1214/17-AOS1563},volume={46},year={2018},bdsk-url-1={https://doi.org/10.1214/17-AOS1563}}
We consider the estimation of the transition matrix in the high-dimensional time-varying vector autoregression (TV-VAR) models. Our model builds on a general class of locally stationary VAR processes that evolve smoothly in time. We propose a hybridized kernel smoothing and $\ell^1$-regularized method to directly estimate the sequence of time-varying transition matrices. Under the sparsity assumption on the transition matrix, we establish the rate of convergence of the proposed estimator and show that the convergence rate depends on the smoothness of the locally stationary VAR processes only through the smoothness of the transition matrix function. In addition, for our estimator followed by thresholding, we prove that the false positive rate (type I error) and false negative rate (type II error) in the pattern recovery can asymptotically vanish in the presence of weak signals without assuming the minimum nonzero signal strength condition. Favorable finite sample performances over the $\ell^2$-penalized least-squares estimator and the unstructured maximum likelihood estimator are shown on simulated data. We also provide two real examples on estimating the dependence structures on financial stock prices and economic exchange rates datasets.
@article{10.1214/17-EJS1325,author={Ding, Xin and Qiu, Ziyi and Chen, Xiaohui},date-added={2022-08-29 15:47:01 -0500},date-modified={2022-09-30 08:59:47 -0500},doi={10.1214/17-EJS1325},journal={Electronic Journal of Statistics},keywords={high-dimension, kernel smoothing, Locally stationary processes, Sparsity, time-varying parameters, vector autoregression},number={2},pages={3871 -- 3902},publisher={Institute of Mathematical Statistics and Bernoulli Society},title={{Sparse transition matrix estimation for high-dimensional and locally stationary vector autoregressive models}},url={https://doi.org/10.1214/17-EJS1325},volume={11},year={2017},bdsk-url-1={https://doi.org/10.1214/17-EJS1325}}
This paper studies a Dantzig-selector type regularized estimator for linear functionals of high-dimensional linear processes. Explicit rates of convergence of the proposed estimator are obtained and they cover the broad regime from independent identically distributed samples to long-range dependent time series and from sub-Gaussian innovations to those with mild polynomial moments. It is shown that the convergence rates depend on the degree of temporal dependence and the moment conditions of the underlying linear processes. The Dantzig-selector estimator is applied to the sparse Markowitz portfolio allocation and the optimal linear prediction for time series, in which ratio consistency with respect to an oracle estimator is established. The effect of dependence and innovation moment conditions is further illustrated in the simulation study. Finally, the regularized estimator is applied to classify the cognitive states on a real functional magnetic resonance imaging dataset and to portfolio optimization on a financial dataset.
@article{7558216,author={Chen, Xiaohui and Xu, Mengyu and Wu, Wei Biao},date-added={2022-08-29 15:45:36 -0500},date-modified={2022-09-30 08:58:01 -0500},doi={10.1109/TSP.2016.2605079},issn={1941-0476},journal={IEEE Transactions on Signal Processing},month=dec,number={24},pages={6459-6470},title={Regularized Estimation of Linear Functionals of Precision Matrices for High-Dimensional Time Series},volume={64},year={2016},bdsk-url-1={https://doi.org/10.1109/TSP.2016.2605079}}
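For context, the generic Dantzig selector is a linear program. A sketch via SciPy's LP solver on toy regression data; the paper's estimator targets linear functionals of precision matrices, but the LP template below is the generic one (the penalty level assumes the toy noise scale is known).

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Generic Dantzig selector via linear programming:
    min ||b||_1  subject to  ||X^T (y - X b)||_inf <= lam,
    with the split b = b_plus - b_minus, b_plus, b_minus >= 0."""
    n, p = X.shape
    G, c = X.T @ X, X.T @ y
    A_ub = np.block([[G, -G], [-G, G]])
    b_ub = np.concatenate([lam + c, lam - c])
    res = linprog(np.ones(2 * p), A_ub=A_ub, b_ub=b_ub, method="highs")
    return res.x[:p] - res.x[p:]

rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta = np.r_[2.0, -1.5, np.zeros(p - 2)]
y = X @ beta + 0.1 * rng.standard_normal(n)
lam = 0.1 * np.sqrt(2 * n * np.log(p))   # noise scale 0.1 assumed known
print(np.round(dantzig_selector(X, y, lam), 2)[:4])  # close to [2, -1.5, 0, 0]
```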
@article{10.1214/15-EJS1007,author={Chen, Xiaohui},date-added={2022-08-29 18:30:04 -0500},date-modified={2022-08-29 18:30:04 -0500},doi={10.1214/15-EJS1007},journal={Electronic Journal of Statistics},keywords={Optimal linear prediction, shrinkage, Sparsity},number={1},pages={801 -- 810},publisher={Institute of Mathematical Statistics and Bernoulli Society},title={{Discussion of ``High-dimensional autocovariance matrices and optimal linear prediction''}},url={https://doi.org/10.1214/15-EJS1007},volume={9},year={2015},bdsk-url-1={https://doi.org/10.1214/15-EJS1007}}
While neuroimaging data can provide valuable phenotypic information to inform genetic studies, the opposite is also true: known genotypes can be used to inform brain connectivity patterns from fMRI data. Here, we propose a framework for genetically informed group brain connectivity modeling. Subjects are first stratified according to their genotypes, and then a group regularized regression model is employed for brain connectivity modeling utilizing the time courses from a priori specified regions of interest (ROIs). With such an approach, each ROI time course is in turn predicted from all other ROI time courses at zero lag using a group regression framework which also incorporates a penalty based on genotypic similarity. Simulations supported such an approach when, as previous studies have indicated, genetic influences impart connectivity differences across subjects. The proposed method was applied to resting state fMRI data from Schizophrenia and normal control subjects. Genotypes were based on D-amino acid oxidase activator (DAOA) single-nucleotide polymorphism (SNP) information. With the DAOA SNP information integrated, the proposed approach was able to more accurately model the diversity in connectivity patterns. Specifically, connectivity with the left putamen, right posterior cingulate, and left middle frontal gyri were found to be jointly modulated by DAOA genotypes and the presence of Schizophrenia. We conclude that the proposed framework represents a multimodal analysis approach for incorporating genotypic variability into brain connectivity analysis directly.
@article{6678714,author={Liu, Aiping and Chen, Xiaohui and Wang, Z. Jane and Xu, Qi and Appel-Cresswell, Silke and McKeown, Martin J.},date-added={2022-08-29 15:51:47 -0500},date-modified={2022-08-29 15:51:47 -0500},doi={10.1109/TBME.2013.2294151},issn={1558-2531},journal={IEEE Transactions on Biomedical Engineering},month=mar,number={3},pages={946-956},title={A Genetically Informed, Group fMRI Connectivity Modeling Approach: Application to Schizophrenia},volume={61},year={2014},bdsk-url-1={https://doi.org/10.1109/TBME.2013.2294151}}
Moment inequalities for quadratic forms of random vectors are of particular interest in covariance matrix testing and estimation problems. In this paper, we prove a Rosenthal-type inequality, which exhibits new features and certain improvements beyond the unstructured Rosenthal inequality of quadratic forms when the dimension of the vectors increases without bound. Applications to testing block diagonal structures and detecting sparsity in high-dimensional covariance matrices are presented.
@article{CHEN201483,author={Chen, Xiaohui},date-added={2022-08-29 15:50:26 -0500},date-modified={2022-09-30 08:51:56 -0500},doi={https://doi.org/10.1016/j.spl.2014.05.004},issn={0167-7152},journal={Statistics & Probability Letters},keywords={Quadratic forms, Rosenthal's inequality, High-dimensional covariance matrix},pages={83-88},title={A note on moment inequality for quadratic forms},url={https://www.sciencedirect.com/science/article/pii/S0167715214001801},volume={92},year={2014},bdsk-url-1={https://www.sciencedirect.com/science/article/pii/S0167715214001801},bdsk-url-2={https://doi.org/10.1016/j.spl.2014.05.004}}
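For reference, the classical scalar Rosenthal inequality whose quadratic-form analog the paper sharpens:

```latex
% Classical scalar Rosenthal inequality: for independent mean-zero random
% variables X_1, ..., X_n and p >= 2, with a constant C_p depending only on p,
\[
  \mathbb{E}\Bigl\lvert \sum_{i=1}^{n} X_i \Bigr\rvert^{p}
  \;\le\; C_p \left\{ \sum_{i=1}^{n} \mathbb{E}\lvert X_i \rvert^{p}
  + \Bigl( \sum_{i=1}^{n} \mathbb{E} X_i^{2} \Bigr)^{p/2} \right\}.
\]
```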
We consider estimation of covariance matrices and their inverses (a.k.a. precision matrices) for high-dimensional stationary and locally stationary time series. In the latter case the covariance matrices evolve smoothly in time, thus forming a covariance matrix function. Using the functional dependence measure of Wu [Proc. Natl. Acad. Sci. USA 102 (2005) 14150–14154 (electronic)], we obtain the rate of convergence for the thresholded estimate and illustrate how the dependence affects the rate of convergence. Asymptotic properties are also obtained for the precision matrix estimate which is based on the graphical Lasso principle. Our theory substantially generalizes earlier ones by allowing dependence, by allowing nonstationarity and by relaxing the associated moment conditions.
@article{10.1214/13-AOS1182,author={Chen, Xiaohui and Xu, Mengyu and Wu, Wei Biao},date-added={2022-08-29 15:57:31 -0500},date-modified={2022-09-30 08:53:51 -0500},doi={10.1214/13-AOS1182},journal={The Annals of Statistics},keywords={consistency, Covariance matrix, Dependence, functional dependence measure, high-dimensional inference, Lasso, Nagaev inequality, nonstationary time series, precision matrix, Sparsity, spatial--temporal processes, thresholding},number={6},pages={2994 -- 3021},publisher={Institute of Mathematical Statistics},title={{Covariance and precision matrix estimation for high-dimensional time series}},url={https://doi.org/10.1214/13-AOS1182},volume={41},year={2013},bdsk-url-1={https://doi.org/10.1214/13-AOS1182}}
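The thresholded estimator itself is a one-liner; a sketch of soft thresholding of the sample covariance (the paper's contribution is the theory under dependence and nonstationarity, which is not reproduced here, and the threshold level below is a textbook-style choice):

```python
import numpy as np

def soft_threshold_cov(X, lam):
    """Soft-thresholded sample covariance: shrink off-diagonal entries
    toward zero while keeping the diagonal intact."""
    S = np.cov(X, rowvar=False)
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0)
    np.fill_diagonal(T, np.diag(S))
    return T

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))          # truth: identity covariance
S_hat = soft_threshold_cov(X, lam=2 * np.sqrt(np.log(50) / 200))
print(np.abs(S_hat - np.eye(50)).max())     # much smaller than for np.cov
```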
In this paper, we propose a shrinkage-to-tapering oracle (STO) estimator for estimation of large covariance matrices when the number of samples is substantially fewer than the number of variables, by combining the strengths of both Steinian-type shrinkage and tapering estimators. Our contributions include: (i) deriving the Frobenius risk and a lower bound for the spectral risk of an MMSE shrinkage estimator; (ii) deriving a closed-form expression for the optimal coefficient of the proposed STO estimator. Simulations on autoregressive (a sparse case) and fractional Brownian motion (a non-sparse case) covariance structures are used to demonstrate the superiority of the proposed estimator.
@inproceedings{6288303,author={Chen, Xiaohui and Wang, Z. Jane and McKeown, Martin J.},booktitle={2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},date-added={2022-08-29 18:31:31 -0500},date-modified={2022-08-29 18:31:31 -0500},doi={10.1109/ICASSP.2012.6288303},issn={2379-190X},month=mar,pages={2013-2016},title={Large covariance matrix estimation: Bridging shrinkage and tapering approaches},year={2012},bdsk-url-1={https://doi.org/10.1109/ICASSP.2012.6288303}}
In this paper, we introduce a shrinkage-to-tapering approach for estimating large covariance matrices when the number of samples is substantially smaller than the number of variables (i.e., n,p→∞ and p/n→∞). The proposed estimator improves upon both shrinkage and tapering estimators by shrinking the sample covariance matrix to its tapered version. We first show that, under both normalized Frobenius and spectral risks, the minimum mean-squared error (MMSE) shrinkage-to-identity estimator is inconsistent and outperformed by a minimax tapering estimator for a class of high-dimensional and diagonally dominant covariance matrices. Motivated by this observation, we propose a shrinkage-to-tapering oracle (STO) estimator for efficient estimation of general, large covariance matrices. A closed-form formula for the optimal coefficient ρ of the proposed STO estimator is derived under the minimum Frobenius risk. Since the true covariance matrix is unknown, we further propose an STO approximating (STOA) algorithm with a data-driven bandwidth selection procedure to iteratively estimate the coefficient ρ and the covariance matrix. We study the finite-sample performance of different estimators, and our simulation results clearly show the improved performance of the proposed STO estimators. Finally, the proposed STOA method is applied to a real breast cancer gene expression data set.
@article{6252067,author={Chen, Xiaohui and Wang, Z. Jane and McKeown, Martin J.},doi={10.1109/TSP.2012.2210546},issn={1941-0476},journal={IEEE Transactions on Signal Processing},month=nov,number={11},pages={5640-5656},title={Shrinkage-to-Tapering Estimation of Large Covariance Matrices},volume={60},year={2012}}
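A rough sketch of the shrinkage-to-tapering idea under the stated setup: form a tapered version of the sample covariance and shrink the sample covariance toward it with a coefficient rho. Here the linear taper weights, the bandwidth k, and a fixed rho are illustrative placeholders, whereas the paper derives the optimal rho in closed form under Frobenius risk.

    import numpy as np

    def taper_weights(p, k):
        """Banded taper: weight 1 within bandwidth k, decaying linearly to 0 by 2k."""
        idx = np.arange(p)
        dist = np.abs(idx[:, None] - idx[None, :])
        return np.clip(2.0 - dist / k, 0.0, 1.0)

    def sto_estimate(X, k, rho):
        """Shrinkage-to-tapering: convex combination of the tapered and raw
        sample covariance matrices."""
        S = np.cov(X, rowvar=False)
        W = taper_weights(S.shape[0], k)
        return rho * (W * S) + (1.0 - rho) * S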
Estimation of the covariance matrix and its inverse, the precision matrix, in high-dimensional situations is of great interest in many applications. In this paper, we focus on the estimation of a class of sparse precision matrices that are assumed to be approximately inversely closed, in the case where the dimensionality p can be much larger than the sample size n, which is fundamentally different from the classical case p < n. Different in nature from state-of-the-art methods based on penalized likelihood maximization or constrained error minimization, we propose, based on the truncated Neumann series representation, a computationally efficient precision matrix estimator with a computational complexity of O(p^3). We prove that the proposed estimator is consistent in probability and in L2 under the spectral norm. Moreover, its convergence is shown to be rate-optimal in the sense of minimax risk. We further prove that the proposed estimator is model selection consistent by establishing a convergence result under the entry-wise ∞-norm. Simulations demonstrate the encouraging finite-sample performance and computational advantage of the proposed estimator. The proposed estimator is also applied to real breast cancer data and shown to outperform existing precision matrix estimators.
@article{6159094,author={Chen, Xiaohui and Kim, Young-Heon and Wang, Z. Jane},doi={10.1109/TSP.2012.2189109},issn={1941-0476},journal={IEEE Transactions on Signal Processing},month=jun,number={6},pages={2899-2912},title={Efficient Minimax Estimation of a Class of High-Dimensional Sparse Precision Matrices},volume={60},year={2012}}
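A bare-bones sketch of the truncated Neumann series construction the abstract describes, under the assumption (not guaranteed in general) that the scaling alpha makes ||I - alpha*S|| < 1 so the series converges; the truncation order K and alpha are hypothetical tuning choices, and the paper's estimator additionally exploits sparsity.

    import numpy as np

    def neumann_precision(S, alpha, K):
        """Approximate S^{-1} by the truncated Neumann series
        alpha * sum_{k=0}^{K} (I - alpha*S)^k, valid when ||I - alpha*S|| < 1."""
        p = S.shape[0]
        R = np.eye(p) - alpha * S
        term = np.eye(p)
        total = np.eye(p)
        for _ in range(K):
            term = term @ R      # (I - alpha*S)^k
            total += term
        return alpha * total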
Learning a small number of genetic variants associated with multiple complex genetic traits is of practical importance and remains challenging due to the high-dimensional nature of the data. In this paper, we propose a two-graph guided multi-task Lasso to address this issue, with an emphasis on estimating subnetwork-to-subnetwork associations in expression quantitative trait loci (eQTL) mapping. The proposed model can learn such subnetwork-to-subnetwork associations and can therefore be seen as a generalization of several state-of-the-art multi-task feature selection methods. Additionally, the model has the nice property of allowing flexible structured sparsity on both the feature and label domains. A simulation study shows the improved performance of our model, and a human eQTL data set is analyzed to further demonstrate its applications.
@inproceedings{pmlr-v22-chen12b,address={La Palma, Canary Islands},author={Chen, Xiaohui and Shi, Xinghua and Xu, Xing and Wang, Zhiyong and Mills, Ryan and Lee, Charles and Xu, Jinbo},booktitle={Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS)},editor={Lawrence, Neil D. and Girolami, Mark},month={21--23 Apr},pages={208--217},publisher={PMLR},series={Proceedings of Machine Learning Research},title={A Two-Graph Guided Multi-task Lasso Approach for eQTL Mapping},volume={22},year={2012}}
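One plausible form of such an objective, written as a generic graph-guided fusion penalty in the spirit of the abstract (not necessarily the paper's exact formulation): with coefficient matrix \(B \in \mathbb{R}^{p \times K}\) for p features and K traits, and weighted graphs on the label and feature domains,

\[
\min_{B} \; \tfrac{1}{2}\|Y - XB\|_F^2 + \lambda \|B\|_1
+ \gamma_1 \sum_{(k,l) \in E_{\text{label}}} w_{kl} \sum_{j=1}^p \big| B_{jk} - \operatorname{sign}(r_{kl})\, B_{jl} \big|
+ \gamma_2 \sum_{(i,j) \in E_{\text{feature}}} v_{ij} \sum_{k=1}^K \big| B_{ik} - \operatorname{sign}(s_{ij})\, B_{jk} \big|,
\]

where \(r_{kl}\) and \(s_{ij}\) are edge weights (e.g., correlations) on the two graphs; the two fusion terms encourage coefficients to agree across connected traits and connected features, respectively.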
Variable selection is a topic of great importance in high-dimensional statistical modeling and has a wide range of real-world applications. Many variable selection techniques have been proposed in the context of linear regression, and the Lasso model is probably one of the most popular penalized regression techniques. In this paper, we propose a new, fully hierarchical, Bayesian version of the Lasso model by employing flexible sparsity promoting priors. To obtain the Bayesian Lasso estimate, a reversible-jump MCMC algorithm is developed for joint posterior inference over both discrete and continuous parameter spaces. Simulations demonstrate that the proposed RJ-MCMC-based Bayesian Lasso yields smaller estimation errors and more accurate sparsity pattern detection when compared with state-of-the-art optimization-based Lasso-type methods, a standard Gibbs sampler-based Bayesian Lasso and the Binomial–Gaussian prior model. To demonstrate the applicability and estimation stability of the proposed Bayesian Lasso, we examine a benchmark diabetes data set and real functional Magnetic Resonance Imaging data. As an extension of the proposed RJ-MCMC framework, we also develop an MCMC-based algorithm for the Binomial–Gaussian prior model and illustrate its improved performance over the non-Bayesian estimate via simulations.
@article{CHEN20111920,author={Chen, Xiaohui and Wang, Z. Jane and McKeown, Martin J.},doi={10.1016/j.sigpro.2011.02.014},issn={0165-1684},journal={Signal Processing},keywords={Sparse signal recovery, Variable selection, Dimensionality reduction, Fully Bayesian modeling, Reversible-jump Markov chain Monte Carlo (RJ-MCMC), Lasso},number={8},pages={1920-1932},title={A Bayesian Lasso via reversible-jump MCMC},url={https://www.sciencedirect.com/science/article/pii/S0165168411000703},volume={91},year={2011}}
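As background for the hierarchy (a standard scale-mixture-of-Gaussians representation of the Laplace prior, not the paper's specific RJ-MCMC sampler): conditional on an exponential scale, each coefficient is Gaussian, which is what makes Gibbs-style and reversible-jump moves tractable. The hyperparameter lam below is illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_laplace_prior(p, lam):
        """Draw beta from the Laplace(lam) prior via its scale-mixture form:
        tau_j^2 ~ Exp(rate = lam^2 / 2), then beta_j | tau_j^2 ~ N(0, tau_j^2)."""
        tau2 = rng.exponential(scale=2.0 / lam**2, size=p)
        return rng.normal(loc=0.0, scale=np.sqrt(tau2), size=p)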
The Huberized LASSO model, a robust version of the popular LASSO, yields robust model selection in sparse linear regression. Though its superior performance has been empirically demonstrated for noise with large variance, no theoretical asymptotic analysis had previously been derived for the Huberized LASSO estimator. Here we prove that the Huberized LASSO estimator is consistent and asymptotically normally distributed under a proper shrinkage rate. Our derivation shows that, unlike the LASSO estimator, its asymptotic variance is stabilized in the presence of noise with large variance. We also propose the adaptive Huberized LASSO estimator by allowing unequal penalty weights for the regression coefficients, and prove its model selection consistency. Simulations confirm our theoretical results.
@inproceedings{5495338,author={Chen, Xiaohui and Wang, Z. Jane and McKeown, Martin J.},booktitle={2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},doi={10.1109/ICASSP.2010.5495338},issn={2379-190X},month=mar,pages={1898-1901},title={Asymptotic analysis of the Huberized LASSO estimator},year={2010}}
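To fix ideas, the Huberized LASSO replaces the squared loss with the Huber loss while keeping the L1 penalty. A minimal sketch of the criterion being analyzed (delta and lam are illustrative tuning parameters; the paper studies the estimator's asymptotics, not this particular implementation):

    import numpy as np

    def huber(r, delta):
        """Huber loss: quadratic for |r| <= delta, linear beyond."""
        a = np.abs(r)
        return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

    def huberized_lasso_objective(beta, X, y, lam, delta):
        """Huber loss of the residuals plus an L1 penalty on the coefficients."""
        return huber(y - X @ beta, delta).sum() + lam * np.abs(beta).sum()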
Inferring effective brain connectivity from neuroimaging data such as functional Magnetic Resonance Imaging (fMRI) has been attracting increasing interest due to its critical role in understanding brain functioning. Incorporating sparsity into connectivity modeling to make models more biologically realistic, and performing group analysis to deal with inter-subject variability, remain challenges in fMRI brain connectivity modeling. To address these two crucial challenges, the attractive computational and theoretical properties of the least absolute shrinkage and selection operator (LASSO) in sparse linear regression provide a suitable starting point. We propose a group robust LASSO (grpRLASSO) model that combines the advantages of the popular group-LASSO and our recently developed robust-LASSO; here, group analysis is formulated as a grouped variable selection procedure. Superior performance of the proposed grpRLASSO in terms of group selection and robustness is demonstrated by simulations with large noise variance. The grpRLASSO is also applied to a real fMRI data set for a brain connectivity study in Parkinson's disease, yielding biologically plausible networks.
@inproceedings{5652779,author={Chen, Xiaohui and Wang, Z. Jane and McKeown, Martin J.},booktitle={2010 IEEE International Conference on Image Processing (ICIP)},doi={10.1109/ICIP.2010.5652779},issn={2381-8549},month=sep,pages={589-592},title={FMRI group studies of brain connectivity via a group robust Lasso},year={2010}}
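A hedged reading of the criterion: combine a robust (e.g., Huber-type) loss with the group-Lasso penalty so that selection acts jointly on predefined groups \(g \in \mathcal{G}\); the exact loss and group weights in the paper may differ from this sketch:

\[
\min_{\beta} \; \sum_{i=1}^n \rho_H\!\big(y_i - x_i^\top \beta\big) \;+\; \lambda \sum_{g \in \mathcal{G}} \sqrt{d_g}\, \|\beta_g\|_2,
\]

where \(\rho_H\) is the Huber loss and \(d_g\) is the size of group \(g\). The unsquared \(\ell_2\) norm on each block zeroes out entire groups at once, which is what formulates group analysis as grouped variable selection.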
In the context of linear regression, the least absolute shrinkage and selection operator (LASSO) is probably the most popular supervised-learning technique proposed to recover sparse signals from high-dimensional measurements. The prior literature has mainly concerned itself with independent, identically distributed noise of moderate variance. In many real applications, however, the measurement errors may have heavy-tailed distributions or suffer from severe outliers, causing the LASSO to estimate the coefficients poorly due to its sensitivity to large error variance. To address this concern, a robust version of the LASSO is proposed, and the limiting distribution of its estimator is derived. Model selection consistency is established for the proposed robust LASSO under an adaptation procedure for the penalty weights. A parallel asymptotic analysis is derived for the Huberized LASSO, a previously proposed robust LASSO, and it is shown that the Huberized LASSO estimator preserves similar asymptotics even under a Cauchy error distribution. We show that the asymptotic variances of the two robust LASSO estimators are stabilized in the presence of large-variance noise, in contrast with the unbounded asymptotic variance of the ordinary LASSO estimator. The asymptotic analysis is extended from the nonstochastic design to the random design case. Simulations further confirm our theoretical results.
@article{5571842,author={Chen, Xiaohui and Wang, Z. Jane and McKeown, Martin J.},doi={10.1109/TIT.2010.2059770},issn={1557-9654},journal={IEEE Transactions on Information Theory},month=oct,number={10},pages={5131-5149},title={Asymptotic Analysis of Robust LASSOs in the Presence of Noise With Large Variance},volume={56},year={2010}}
Sparsity is crucial for high-dimensional statistical modeling. On one hand, dimensionality reduction can reduce the variability of estimation and thus provide reliable predictive power. On the other hand, the selected sub-model can discover and emphasize the underlying dependencies, which is useful for objective interpretation. Many variable selection methods have been proposed in the literature; as a prominent example, the least absolute shrinkage and selection operator (Lasso) has been extensively explored in the linear regression context. This paper discusses a class of scaled mixture of Gaussian models from both a penalized likelihood and a Bayesian regression point of view. We propose a Majorize-Minimize (MM) algorithm to find the Maximum A Posteriori (MAP) estimator, whereas the EM algorithm can get stuck at a local optimum for some members of this class. Simulation studies show that the proposed algorithm outperforms alternatives in a nonstochastic-design variable selection scenario. The proposed algorithm is applied to a real large-scale E. coli data set with known bona fide interactions to construct sparse gene regulatory networks. We show that our regression networks, with a properly chosen prior, can perform comparably to state-of-the-art regulatory network construction algorithms.
@inproceedings{5162334,author={Chen, Xiaohui and Gottardo, Raphael},booktitle={2009 3rd International Conference on Bioinformatics and Biomedical Engineering (ICBBE)},doi={10.1109/ICBBE.2009.5162334},issn={2151-7622},month=jun,pages={1-4},title={An MM-Based Optimization Algorithm for Sparse Linear Modeling on Microarray Data Analysis},year={2009}}
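As one concrete instance of the MM idea for penalized likelihoods (the standard quadratic majorization of |b| at the current iterate; the paper's algorithm targets a broader class of scaled-mixture priors, so treat this as a simplified sketch): each MM step reduces to a weighted ridge solve.

    import numpy as np

    def mm_l1_regression(X, y, lam, n_iter=100, eps=1e-8):
        """MM for L1-penalized least squares, using the majorization
        |b| <= b^2 / (2|b_t|) + |b_t| / 2 at the current iterate b_t,
        so each step solves (X'X + diag(lam/|b_t|)) b = X'y."""
        beta = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares start
        G = X.T @ X
        Xty = X.T @ y
        for _ in range(n_iter):
            D = lam / (np.abs(beta) + eps)            # majorizer weights
            beta = np.linalg.solve(G + np.diag(D), Xty)
        return beta

Each iteration monotonically decreases the original penalized objective, which is the defining property of an MM scheme; eps guards against division by zero as coefficients shrink toward exact zeros.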
BNArray is a systemized tool developed in R that facilitates the construction of gene regulatory networks from DNA microarray data using Bayesian networks. Significant sub-modules of regulatory networks with high confidence are reconstructed by our extended sub-network mining algorithm for directed graphs. BNArray can handle microarray datasets with missing data. To evaluate the statistical features of the generated Bayesian networks, re-sampling procedures are utilized to yield collections of candidate first-order network sets for mining dense coherent sub-networks.
@article{10.1093/bioinformatics/btl491,author={Chen, Xiaohui and Chen, Ming and Ning, Kaida},doi={10.1093/bioinformatics/btl491},issn={1367-4803},journal={Bioinformatics},month=sep,number={23},pages={2952-2954},title={{BNArray: an R package for constructing gene regulatory networks from microarray data by using Bayesian network}},url={https://doi.org/10.1093/bioinformatics/btl491},volume={22},year={2006}}