Computational Geometric Learning
The CG Learning project is a Specific Targeted Research Project (STREP) funded by the FET (Future and Emerging Technologies) unit of the European Commission (EC) --- priority ICT (Information and Communication Technologies) --- within the 7th Framework Programme of the EC under contract No. 255827.
The Computational Geometric Learning project aims at extending the success story of geometric algorithms with guarantees to high dimensions. This is not a straightforward task: for many problems, no efficient algorithms are known that compute the exact solution in high dimensions. This behavior is commonly called the curse of dimensionality. We try to address the curse of dimensionality by focusing on inherent structure in the data, such as sparsity or low intrinsic dimension, and by resorting to fast approximation algorithms.
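As a quick, self-contained illustration (not project code), the following Python sketch shows one well-known symptom of the curse of dimensionality: in a random point cloud, the contrast between the nearest and farthest neighbor of a query point vanishes as the dimension grows, which is what makes exact high-dimensional proximity search so hard. The function name `contrast` and the parameters are our own choices for the example.

```python
import math
import random

def contrast(d, n=500, seed=0):
    """Ratio of the nearest to the farthest distance from a random
    query point to n uniform points in [0, 1]^d.  In low dimension
    the ratio is tiny; as d grows it approaches 1, i.e., all points
    become almost equally far away (distance concentration)."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(d)] for _ in range(n)]
    query = [rng.random() for _ in range(d)]
    dists = [math.dist(p, query) for p in points]
    return min(dists) / max(dists)

for d in (2, 10, 100, 1000):
    print(d, round(contrast(d), 3))  # ratio climbs toward 1 with d
```

Because the nearest neighbor is barely closer than the farthest one, naive pruning strategies stop helping, which is why the project emphasizes structure in the data rather than exact worst-case search.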
The goals of the project are best described along three axes: (1) building the foundations of a new field, computational geometric learning; (2) developing algorithms and data structures that exploit structure in order to scale well with increasing dimension, i.e., avoiding the curse of dimensionality; and (3) providing robust and efficient implementations of foundational algorithms and data structures that come with theoretical guarantees for high-dimensional geometric data processing.
- Foundations of computational geometric learning: Laying these foundations requires diverse knowledge and experience, including, among others, geometric primitives, shape processing, computational topology, optimization, statistics and machine learning, CGAL (Computational Geometry Algorithms Library) development, and applications. Only by combining all these disciplines --- rarely found together in a single group or institution --- can we develop and implement algorithms and data structures with guarantees that compete with heuristics that are time- and space-efficient but lack guarantees on their accuracy.
- Bypassing the curse of dimensionality: The complexities of many geometric algorithms and data structures grow exponentially with increasing dimension --- the curse of dimensionality. We address this curse by focusing on inherent structure in the data, which in some sense needs to be sparse or of low intrinsic dimension (with an appropriate notion of dimension), as well as by putting the emphasis on output-sensitive algorithms and average-case analysis.
- Guarantees: We plan to follow an integrated approach of developing the geometric foundations theoretically, but also practically in the form of high-quality software on which many applications can build. With the vision of a "high-dimensional CGAL" in mind, we will develop algorithms and data structures with theoretical guarantees that should be directly applicable in machine learning and geometric computing, and that can serve as a basis for robust applications in fields such as structural biology.
- Geometric inference and approximation. Geometric inference aims at inferring topological or geometric properties of a presumably unknown shape from a set of data points sampling it. A stronger goal is to compute, from the sample points, an approximation that shares many geometric and topological properties with the unknown shape. The latter problem, known as curve/surface reconstruction in low dimensions and as manifold learning in high dimensions, has attracted strong interest in many scientific fields where the data may often be represented by point sets in high-dimensional spaces. Although point sets may live in high-dimensional spaces, one often expects them to be located around unknown, possibly nonlinear, low-dimensional shapes. An important class of shapes is formed by smooth submanifolds of Euclidean spaces, but more general compact subspaces need to be considered as well. It is then desirable to infer topological features (dimension, Betti numbers) and geometric characteristics (singularities, volume, curvature) of these shapes from the data. In some cases the shapes of interest may be known through a set of conditions --- e.g., solution sets of geometric constraint problems, like boundaries of configuration spaces in robotics or iso-energy hypersurfaces in six-dimensional phase space in physical applications --- that allow one to sample them. It is then desirable to study and design efficient sampling strategies to reliably approximate or reconstruct them. Building on recent results in computational geometry and geometric approximation, combined with statistical approaches, the main goal of this work package is to provide new algorithmic tools, based upon strong theoretical models, to infer and compute relevant topological and geometric properties of the shapes from which the data are drawn.
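To make the idea of inferring intrinsic dimension concrete, here is a minimal sketch (our own illustration, not a project algorithm) in the spirit of correlation-dimension estimation: if points sample a d-dimensional shape, the number of point pairs within distance 2r grows roughly like 2^d times the number within distance r. The function name and radius are assumptions of the example; real estimators used in manifold learning are considerably more careful about noise and scale selection.

```python
import math
import random

def doubling_dimension_estimate(points, r):
    """Crude intrinsic-dimension estimate: log2 of the ratio of
    pair counts at radius 2r versus radius r (correlation-dimension
    idea).  A sketch only; no noise handling or scale selection."""
    def pairs_within(radius):
        n = len(points)
        return sum(1 for i in range(n) for j in range(i + 1, n)
                   if math.dist(points[i], points[j]) <= radius)
    return math.log2(pairs_within(2 * r) / pairs_within(r))

# Points on a unit circle embedded in R^10: the ambient dimension
# is 10, but the intrinsic dimension of the underlying shape is 1.
rng = random.Random(1)
circle = []
for _ in range(300):
    t = rng.uniform(0, 2 * math.pi)
    circle.append([math.cos(t), math.sin(t)] + [0.0] * 8)
print(doubling_dimension_estimate(circle, 0.2))  # close to 1
```

The estimate recovers the low intrinsic dimension despite the high ambient dimension, which is exactly the kind of hidden structure the work package aims to exploit.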
- Fundamental data structures and algorithms for high dimensions. Computational geometry has been very successful over the past 25 years in developing methods for problems in low dimensions, and CGAL has managed to provide a unified software framework for implementing such methods. Within this work package, fundamental high-dimensional data structures and algorithms will be developed, in most cases with the goal of implementing them in CGAL. High-dimensional geometric data processing is now in the situation we encountered fifteen years ago in the low-dimensional setting: a number of useful theoretical concepts have been developed over the past years, but robust and efficient implementations are mostly lacking. In order to have an impact in practice, such implementations must scale with the dimension and exploit (hidden) structure in the data. In analyzing high-dimensional data structures and algorithms, our focus is not on worst-case complexity, but on (provably) good performance under given structural properties of the input. These properties may be of a statistical nature (when dealing with noise, for example) or of a geometric nature (when the data is of low intrinsic dimension, say). Related to this, we also aim at output-sensitive algorithms, whose cost depends on the size of the output rather than only on the size of the input.
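As an illustration of output sensitivity (a textbook example, not project code), the classical gift-wrapping (Jarvis march) convex hull runs in O(nh) time, where h is the number of hull vertices: when the output is small, the algorithm is fast regardless of the input size. The 2D sketch below assumes points in general position (no three collinear hull candidates).

```python
def jarvis_march(points):
    """Gift-wrapping convex hull in 2D, returned counterclockwise
    from the lexicographically smallest point.  Cost is O(n*h) with
    h the number of hull vertices -- an output-sensitive bound."""
    def cross(o, a, b):
        # > 0 if b lies to the left of the directed line o -> a
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    start = min(points)          # leftmost-lowest point is always on the hull
    hull = [start]
    while True:
        candidate = points[0]
        for p in points[1:]:
            # replace the candidate if p is more clockwise as seen
            # from the current hull vertex
            if candidate == hull[-1] or cross(hull[-1], candidate, p) < 0:
                candidate = p
        if candidate == start:   # wrapped all the way around
            break
        hull.append(candidate)
    return hull

square = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5), (0.2, 0.7)]
print(jarvis_march(square))      # the four corners, interior points dropped
```

In high dimensions, where worst-case output sizes explode, bounds of this output-sensitive kind are often the only meaningful performance guarantees.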
- Modeling high-dimensional geometric structures in science and engineering. Data sets found in science and engineering are often large and complex in the sense that they encode subtle correlations between the variables describing the system of interest. One way to apprehend this complexity is to decompose the data into appropriate geometric structures, to process these structures individually, and to unravel their complex interactions. This work package follows this spirit for three application domains, namely structural computational biology, analysis of configuration spaces in robotics, and cosmology.