Given the ubiquity of large high-dimensional data sets, and the need not only to transmit, archive, and reduce them, but also to analyze and understand their scientific and/or intelligence content, we are developing tools for the computational dissection, analysis, and understanding of complex datasets whose size defies simplistic analysis. These tools should include a representation that allows exploitation of nonlinearity, supports fast multiresolution algorithms, incorporates a priori (physical) information and constraints, and adapts flexibly to the application. Application areas include bioinformatics, remote sensing (e.g. hyperspectral datasets), homeland defense (e.g. face recognition, epidemiology), large-scale physics simulations, and dynamics on complex networks (e.g. internet traffic analysis, urban population dynamics).
A fundamental challenge is the appropriate generalization of Principal Component Analysis (PCA), the classical but linear tool for analyzing large data sets. We are developing nonlinear generalizations that incorporate a priori physical information (e.g. invariance with respect to rotation) and that allow for fast multiresolution algorithms. This work is related to dimensionality reduction and manifold learning methods, such as the recent Isomap and LLE algorithms. As an example, in the area of face recognition, we have obtained promising results by exploiting the known invariances to translation, scaling, and rotation.
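For reference, the linear baseline that these nonlinear methods generalize can be stated in a few lines. The following is a minimal NumPy sketch of classical PCA via the singular value decomposition (the function name and the synthetic example are illustrative, not part of the project's codebase):

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)            # center the data
    # SVD of the centered data; rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T               # coordinates in the k-dim subspace

# Illustrative example: 3-D points lying exactly on a 2-D plane
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3))
Y = pca(X, 2)                          # 2-D embedding recovers the plane
```

PCA finds the best linear subspace in the least-squares sense; when the data lie on a curved manifold (the situation targeted by Isomap, LLE, and the nonlinear generalizations discussed above), such a linear projection is no longer adequate.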
We have a significant overlap in interests with the DDMA project.