Beyond PCA: Optimal low-rank matrix denoising
PCA solves the representation problem: how to optimally approximate a matrix with one of specified lower rank. It does not solve the denoising problem: how to best recover a low-rank matrix buried in noise. The denoising problem is unobservable in the sense that solving it involves measuring error with respect to the low-rank matrix we are trying to estimate. Can we solve the denoising problem better than PCA using data? How does such a solution compare to nuclear-norm approaches? Does it generalize to settings with missing data?
It turns out that for a broad class of noise matrices, the solution to the unobservable denoising problem can be computed directly from data (we call the algorithm OptShrink). The solution takes the form of a shrinkage-and-thresholding operator on the singular values of the observed signal-plus-noise matrix. The analysis reveals that the optimal shrinkage operator should shrink larger singular values, which correspond to well-estimated latent singular vectors, less than smaller singular values. This yields a non-convex shrinkage function, implying that convex penalty functions will always be suboptimal. For more details see here and download the OptShrink code here.
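A minimal sketch of the idea, not the OptShrink estimator itself: for a square Gaussian spiked model with noise bulk edge 2, the Frobenius-optimal shrinker has the known closed form eta(y) = sqrt(y^2 - 4) for y > 2. Shrinking the retained singular values beats PCA's hard truncation, which keeps them unchanged. All sizes and the signal strength below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 3

# Rank-r latent signal plus i.i.d. Gaussian noise, scaled so the
# noise singular values fill the bulk [0, 2].
U = rng.standard_normal((n, r))
V = rng.standard_normal((n, r))
X = 5.0 * (U / np.linalg.norm(U, axis=0)) @ (V / np.linalg.norm(V, axis=0)).T
Y = X + rng.standard_normal((n, n)) / np.sqrt(n)

u, s, vt = np.linalg.svd(Y, full_matrices=False)

# PCA / truncated SVD: keep the top r singular values unchanged.
keep = np.zeros(n)
keep[:r] = s[:r]
X_pca = (u * keep) @ vt

# Shrinkage: shrink each retained singular value via eta(y) = sqrt(y^2 - 4).
eta = np.where(s > 2.0, np.sqrt(np.maximum(s**2 - 4.0, 0.0)), 0.0)
eta[r:] = 0.0
X_shr = (u * eta) @ vt

err_pca = np.linalg.norm(X_pca - X)
err_shr = np.linalg.norm(X_shr - X)
```

Because the observed singular values overshoot the latent ones, the shrunken reconstruction lands closer to X than the PCA reconstruction does.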
Bottleneck analysis in queueing theory
Consider a series of queues or servers that process customers in first-in-first-out order. What is the latency of a batch of n customers traversing a series of m queues? How do bottlenecks, i.e., slow servers, affect the latency?
It turns out that the latency distribution is exactly the distribution of the largest eigenvalue of a special rank-one-plus-noise type random matrix, of the type studied here. We uncover a phase transition separating a regime where the bottleneck is present but the latency is Tracy-Widom distributed from a regime where the bottleneck makes the latency normally distributed. For details click here and here.
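The batch latency itself is easy to simulate via the standard max-plus (last-passage) recursion for tandem FIFO queues; the sketch below, with arbitrary sizes and a hypothetical bottleneck at the middle server, shows that slowing one server inflates the batch latency.

```python
import numpy as np

def batch_latency(S):
    """Departure time of the last customer through tandem FIFO queues.
    S[i, j] = service time of customer i at queue j. Uses the recursion
    D[i, j] = max(D[i-1, j], D[i, j-1]) + S[i, j]."""
    n, m = S.shape
    D = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            prev = max(D[i - 1, j] if i else 0.0, D[i, j - 1] if j else 0.0)
            D[i, j] = prev + S[i, j]
    return D[-1, -1]

rng = np.random.default_rng(1)
n, m = 100, 10
S = rng.exponential(1.0, size=(n, m))   # unit-rate servers

slow = S.copy()
slow[:, m // 2] *= 2.0                  # bottleneck: one server twice as slow

base = batch_latency(S)
bott = batch_latency(slow)
```

Every customer passes through the slow column, so the bottlenecked latency is strictly larger; the phase transition in the text concerns how its fluctuations change, not just its mean.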
Perfect transmission through random media
Can one send light through media such as eggshells, paper, and milk and achieve 100 percent transmission? Random matrix theory predicts that we can; the algorithms we have developed find an optimal wavefront using just a few measurements!
The figure above shows a highly transmitting eigen-wavefront that achieves 99.99 percent transmission, whereas a normally incident wavefront achieves just 45 percent (the circles are cylindrical scatterers). We have developed algorithms for constructing these wavefronts from a handful of measurements and theory for predicting the associated fundamental limits. For more details click here.
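The core linear-algebra fact is simple to state: given a transmission matrix T, the input wavefront maximizing transmitted power is the right singular vector of T with the largest singular value. The sketch below uses a toy complex Gaussian matrix in place of a measured physical transmission matrix (whose singular values would lie in [0, 1]), purely to illustrate the computation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 64
# Toy stand-in for a measured transmission matrix.
T = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)

# The best unit-power wavefront is the top right singular vector of T.
_, s, Vh = np.linalg.svd(T)
best = Vh[0].conj()                     # top right singular vector
uniform = np.ones(n) / np.sqrt(n)       # "normally incident" plane wave

t_best = np.linalg.norm(T @ best) ** 2      # equals s[0]**2
t_uniform = np.linalg.norm(T @ uniform) ** 2
```

By definition of the largest singular value, no unit-power input can transmit more than the eigen-wavefront does; the practical challenge the text addresses is finding it from few measurements.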
Spectra of complex-networks
What is the eigenvalue spectrum of a network with an expected degree distribution? It turns out it is the free multiplicative convolution of the semi-circle with the (normalized) degree distribution!
The figure above illustrates the agreement between the theoretical prediction (blue line) and numerical simulations. For more details click here. The free convolution was numerically computed using techniques described here.
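A quick sanity check of the simplest case: when all expected degrees are equal, the degree distribution is a point mass and the predicted spectrum collapses to the semicircle law. The sketch below (sizes are illustrative) verifies that the centered, scaled adjacency matrix of G(n, p) has its bulk eigenvalues filling [-2, 2].

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 500, 0.1

# Symmetric adjacency matrix of G(n, p): constant expected degree np.
upper = rng.random((n, n)) < p
A = np.triu(upper, 1).astype(float)
A = A + A.T

# Center the off-diagonal entries and scale by sqrt(n p (1-p));
# the eigenvalues should then follow the semicircle law on [-2, 2].
B = (A - p * (1 - np.eye(n))) / np.sqrt(n * p * (1 - p))
eigs = np.linalg.eigvalsh(B)
```

For a genuinely non-constant expected degree sequence one would instead compare the histogram to the free multiplicative convolution described in the text.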
Phase transitions in network structure discovery
Consider a random network with a planted clique or community embedded in it. When can one discover the structure using spectral or eigen-analysis based methods? It turns out that one can, as long as the clique is above a critical size and the difference between the intra-community and inter-community edge densities exceeds a critical threshold.
The figure above contrasts a setting where the planted clique can be detected with one where the clique is present but the spectral method fails. The transition can be accurately predicted. More details on the clique problem and the community problem can be found here.
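A minimal sketch of the spectral method in the detectable regime, with illustrative sizes (n = 400, clique size k = 60, comfortably above the critical size for these parameters): plant a clique in G(n, 1/2), center the adjacency matrix, and read the clique off the top eigenvector.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 400, 60

# G(n, 1/2) with a clique planted on the first k vertices.
A = np.triu((rng.random((n, n)) < 0.5).astype(float), 1)
A[:k, :k] = np.triu(np.ones((k, k)), 1)   # plant the clique
A = A + A.T

# Spectral method: center the adjacency matrix, take the top
# eigenvector, and keep the k coordinates of largest magnitude.
M = A - 0.5 * (1 - np.eye(n))
w, V = np.linalg.eigh(M)
v = V[:, -1]                              # eigenvector of the largest eigenvalue
found = set(np.argsort(-np.abs(v))[:k])
overlap = len(found & set(range(k)))      # how many clique vertices recovered
```

Below the critical clique size the top eigenvalue stays inside the noise bulk and the eigenvector carries essentially no information about the clique.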
The breakdown point of principal component analysis
Consider a matrix X with real eigenvalues. What happens when one perturbs it with a low-rank random matrix? It turns out that a remarkable phase transition occurs. Specifically, if the perturbation is “strong enough,” then the largest eigenvalue of the sum separates from the spectrum of X; in this event, the corresponding perturbed eigenvector lies on a thin cone around the original unperturbed eigenvector.
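A numerical sketch of the transition in a standard special case (the base matrix taken to be a Wigner matrix with spectrum on [-2, 2], plus a rank-one spike of strength theta): above the transition at theta = 1, the top eigenvalue separates to roughly theta + 1/theta and the top eigenvector aligns with the spike direction, with squared overlap roughly 1 - 1/theta^2.

```python
import numpy as np

rng = np.random.default_rng(5)
n, theta = 1000, 3.0

# Base matrix: Wigner, scaled so its spectrum fills [-2, 2].
G = rng.standard_normal((n, n))
W = (G + G.T) / np.sqrt(2 * n)

# Rank-one random perturbation of strength theta.
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
M = W + theta * np.outer(u, u)

w, V = np.linalg.eigh(M)
top, v = w[-1], V[:, -1]
# Predictions for theta = 3: top -> 3 + 1/3, overlap^2 -> 1 - 1/9.
```

Rerunning with theta below 1 shows the other phase: the top eigenvalue sticks to the bulk edge near 2 and the overlap decays to zero.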
Fundamental signal processing theory
A fundamental question in statistical signal processing, arising in applications as diverse as radar, sonar, wireless communications, and econometrics, is: given a finite number of samples, can one reliably discriminate signal from noise?
My research answers this in an application-independent sense. We show that if the SNR is above a threshold, which is a simple function of the number of sensors (the dimensionality of the system) and the numbers of signal-plus-noise and noise-only samples, then reliable detection is possible, in an optimal manner, using a new random-matrix-theory-inspired algorithm we have developed. Below this SNR threshold, reliable detection is (asymptotically) simply not possible.
The figure above illustrates this “hot zone/cold zone” behavior in signal detection by plotting the logarithm of the probability of accurate detection. The superimposed black line is the predicted SNR threshold for this example when both the noise-only covariance matrix and the signal-plus-noise covariance matrix are estimated using finite samples. The superimposed red line is the predicted SNR threshold if the noise covariance were known perfectly. For additional details see the sequence of papers downloadable here and here. Preliminary results show that this predicted threshold is remarkably accurate in real-world settings. We have been using sensor array data graciously provided by the Laboratory for Autonomous Marine Sensing Systems.
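A toy version of the detection dichotomy, with illustrative parameters (p = 50 sensors, n = 200 snapshots, so c = p/n = 1/4): with noise alone, the largest sample-covariance eigenvalue sits at the Marchenko-Pastur bulk edge (1 + sqrt(c))^2; a single source with SNR above the threshold sqrt(c) pushes an eigenvalue visibly past the edge.

```python
import numpy as np

rng = np.random.default_rng(6)
p, n, snr = 50, 200, 2.0
c = p / n

# With no signal, the largest eigenvalue of the sample covariance
# concentrates at the Marchenko-Pastur bulk edge (1 + sqrt(c))^2.
edge = (1 + np.sqrt(c)) ** 2

Z = rng.standard_normal((p, n))
lam_noise = np.linalg.eigvalsh(Z @ Z.T / n)[-1]

# Signal-plus-noise: one source with unit array response h and power snr,
# detectable because snr = 2 exceeds the threshold sqrt(c) = 0.5.
h = rng.standard_normal(p)
h /= np.linalg.norm(h)
X = np.sqrt(snr) * np.outer(h, rng.standard_normal(n)) + rng.standard_normal((p, n))
lam_signal = np.linalg.eigvalsh(X @ X.T / n)[-1]
```

Setting snr below sqrt(c) instead makes the signal eigenvalue merge with the bulk, which is the asymptotic undetectability regime described above.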
Adaptive signal processing for communications
A major challenge while communicating in dynamic channels, such as the underwater acoustic channel, is the large amount of time-varying inter-symbol interference (ISI) due to multipath. In many realistic channels, the fluctuations between different taps of the sampled channel impulse response are correlated. Traditional least-squares algorithms used for adapting channel equalizers do not exploit this correlation structure.
My research addresses this by designing a channel-subspace post-filtering algorithm that treats the least-squares channel estimate as a noisy time series and exploits the channel correlation structure to reduce the channel estimation error. Consequently, the performance of the channel-estimate-based decision feedback equalizer (DFE) improves as well. We bring into sharp focus how our algorithm exploits the channel correlation structure and demonstrate its implementability.
We have tested this algorithm on undersea acoustic communication data collected during experimental trials and have realized a 5-7 dB improvement in performance of a channel estimate-based decision feedback equalizer that uses this post-filtered channel estimate to determine the equalizer coefficients. For additional details see the paper downloadable here.
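A generic stand-in for the idea, not the algorithm in the paper: if the channel taps evolve in a low-dimensional subspace, projecting the noisy least-squares estimate series onto its dominant subspace (a rank-r SVD truncation) discards much of the estimation noise. All sizes, the rank, and the noise level below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
T, L, r = 200, 20, 3   # snapshots, channel taps, assumed correlation rank

# Channel taps over time lie (by assumption) in an r-dimensional
# subspace; least-squares estimation adds white noise.
H_true = rng.standard_normal((T, r)) @ rng.standard_normal((r, L))
H_ls = H_true + 0.5 * rng.standard_normal((T, L))

# Post-filter: project the estimate series onto its dominant
# r-dimensional subspace via rank-r SVD truncation.
u, s, vt = np.linalg.svd(H_ls, full_matrices=False)
H_filt = (u[:, :r] * s[:r]) @ vt[:r]

err_ls = np.linalg.norm(H_ls - H_true)
err_filt = np.linalg.norm(H_filt - H_true)
```

The noise energy in the L - r discarded dimensions is removed while the strong channel subspace is retained, so the filtered estimate has lower error; the DFE then benefits from the cleaner channel estimate.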
High-dimensional statistical signal processing
Classical “optimal” multivariate statistical signal processing algorithms for detection and estimation are no longer optimal in modern settings where the dimensionality of the system is large (due to a large number of sensors) and the number of observations available to make inference is relatively small. Classical algorithms will perform poorly or provide faulty inference in such scenarios.
My research focuses on designing implementable detection and estimation algorithms that are robust to high dimensionality and finite-sample effects. We take our inspiration from random matrix theory.
One example of this is the design of an algorithm that can deblur the sample eigenspectrum from limited observations. For details see the paper downloadable here.
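The blurring that such an algorithm must undo is easy to exhibit (this sketch shows the problem, not the deblurring algorithm in the paper): even when every population eigenvalue equals 1, the sample eigenvalues smear across the Marchenko-Pastur support. Sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
p, n = 100, 300
c = p / n

# True covariance is the identity: every population eigenvalue is 1.
X = rng.standard_normal((p, n))
sample_eigs = np.linalg.eigvalsh(X @ X.T / n)

# The sample eigenvalues are "blurred" across the Marchenko-Pastur
# support [(1 - sqrt(c))^2, (1 + sqrt(c))^2] rather than sitting at 1.
lo, hi = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
```

Here c = 1/3, so the sample spectrum spans roughly [0.18, 2.49]; a classical plug-in estimator that treats these as the true eigenvalues is badly fooled, which is what the deblurring algorithm corrects.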
Computational random matrix theory
Sample covariance matrices are ubiquitous in statistical signal processing theory. Random matrix theory is the natural language for understanding the properties of sample covariance matrices.
My research provides, for the first time, a concrete computational framework so that the power of random matrix theory can be fully harnessed. Before this work, one could only work with simple examples; now, arbitrarily complicated examples are amenable to analysis.
The development of a random matrix calculator, depicted above, is the culmination of this line of research. RMTool is a MATLAB-based software package. The underlying theory was developed in the paper downloadable here. For an application to computing the Shannon capacity of a large class of MIMO wireless channels, click here.
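A tiny taste of transform-based computation, using the semicircle law as the example (RMTool itself works symbolically with much more general ensembles): the Stieltjes transform satisfies m(z) = 1/(-z - m(z)), and evaluating it just above the real axis recovers the spectral density via Im m(x + i*eps)/pi. The damping and iteration count below are pragmatic choices.

```python
import numpy as np

def semicircle_density(x, eps=0.05, iters=500):
    """Evaluate the semicircle density at x by solving the fixed-point
    equation m(z) = 1 / (-z - m(z)) at z = x + i*eps and taking
    Im m / pi. Damping keeps the iteration on the physical branch."""
    z = x + 1j * eps
    m = 1j                              # start in the upper half-plane
    for _ in range(iters):
        m = 0.5 * m + 0.5 / (-z - m)    # damped fixed-point iteration
    return m.imag / np.pi

# Closed form for comparison: sqrt(4 - x^2) / (2*pi) on [-2, 2].
d0 = semicircle_density(0.0)   # exact value: 1/pi ~ 0.3183
d1 = semicircle_density(1.0)   # exact value: sqrt(3)/(2*pi) ~ 0.2757
```

For more complicated ensembles no closed form exists, but the same fixed-point strategy applied to the appropriate transform equations is what makes "arbitrarily complicated examples" computable.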
Design paradigms from random matrix theory
A whole new set of distributions arising from random matrix theory is becoming increasingly important, showing up in many (apparently) unconnected areas such as number theory, bus spacings in Cuernavaca, growth processes, and quantum chaos, among others.
My research asks whether the (as yet unknown) entropy-like optimality criterion associated with these distributions can be exploited for engineering problems.
The figure above, taken from Brian Hayes’ article shows samples of several particle location (or level) distributions, some of them mathematically defined and others derived from measurements or observations. All of the samples have been scaled so that exactly 100 levels fit in the space allotted. Thus the mean distance between levels is the same in all cases, but the patterns are nonetheless quite diverse. For example, the earthquake series is highly clustered, which surely reflects some geophysical mechanism. The lower-frequency fluctuations of tree-ring data probably have both biological and climatological causes.
What is special about the random-matrix eigenvalue series is that the levels are rather regularly spaced, though still random. The ubiquity of this distribution in physical systems raises the intriguing prospect that mimicking it in some problems could be beneficial. We have some clues and some initial applications. More will surely follow.
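The "regular yet random" character is easy to quantify crudely (this sketch skips the careful unfolding a rigorous comparison would use): normalized bulk spacings of a GUE matrix have far smaller variance than the spacings of independent points, because the eigenvalues repel.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 400

# GUE matrix: eigenvalue repulsion makes consecutive spacings regular.
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (A + A.conj().T) / 2
eigs = np.linalg.eigvalsh(H)

# Spacings in the bulk (middle half of the spectrum), normalized to mean 1.
bulk = np.diff(eigs[n // 4: 3 * n // 4])
bulk = bulk / bulk.mean()

# Poisson comparison: i.i.d. exponential spacings, also normalized to mean 1.
pois = rng.exponential(1.0, size=bulk.size)
pois = pois / pois.mean()
```

The exponential (Poisson) spacings have variance near 1, while the GUE bulk spacings come out several times more concentrated, which is the visual regularity in the figure's random-eigenvalue column.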
For a fascinating account of the ubiquity of the random matrix eigenvalue level distributions (fifth from left in above figure) in physical systems see Percy Deift’s paper and Brian Hayes’ engaging exposition.