The Curse of Dimensionality is a problem we run into when we try to learn models from high-dimensional data. Two-dimensional data is easy: you have an input and an output plotted on a 2D coordinate plane, and you can clearly see how far the points fall from the regression line. But what happens when your data has more dimensions: 10, 100, 1,000?

It was recently described in one of my lectures along these lines: “You may think that these two data points are close because a certain subset of their features match up. But in reality, imagine yourself as a data point represented as a star, and the next nearest data point being as far away as the next closest star in the universe. Imagine the data set spread as far apart as the stars.”
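
To get a feel for this, here is a minimal sketch (assuming NumPy and SciPy are available; the point count of 500 and the list of dimensions are arbitrary choices for illustration) that samples random points uniformly in the unit hypercube and prints the average distance from each point to its nearest neighbor:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

def mean_nearest_neighbor_distance(n_points, n_dims):
    """Sample n_points uniformly in the unit hypercube [0, 1]^n_dims and
    return the average distance from each point to its nearest neighbor."""
    points = rng.random((n_points, n_dims))
    dists = cdist(points, points)    # pairwise Euclidean distances
    np.fill_diagonal(dists, np.inf)  # ignore each point's distance to itself
    return dists.min(axis=1).mean()

for n_dims in (2, 10, 100, 1000):
    print(f"{n_dims:>4} dims: {mean_nearest_neighbor_distance(500, n_dims):.3f}")
```

Even though the number of points never changes, the nearest neighbor drifts further and further away as the dimension grows; in high dimensions, “nearby” points stop being nearby.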

How is this possible? Or rather, why? To take a closer look, we’ll go over the idea of hypercubes and multi-dimensional spheres.

A hypercube can be thought of as a cube extended out into n-dimensional space.

Think of the progression from a point, to a line, to a square, to a cube. We go from a shape that extends in 0 dimensions, to 1 dimension, to 2 dimensions, to 3 dimensions... (Wikipedia has a really good video of this.)

We also assume the hypercube has an edge length of 1, which makes its volume 1 as well.

We can think of this space as representing the feature space of the data. We want to know how many data points we would need to accurately capture this feature space.

Let’s say we want a certain number of data points. Call this number N, and arrange the points so that each one covers the volume of a smaller cube within the hypercube.

The volume of each smaller cube would then be 1/N (since the volume of the hypercube is 1 and there are N smaller cubes), and the edge length of each smaller cube would be (1/N)^(1/n), where n is the number of dimensions of the feature space.

As n grows towards infinity, this edge length converges to 1. (Which means that in a very high-dimensional hypercube, each small cube’s edge is almost as long as the edge of the whole hypercube, even though its volume is only 1/N.)
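
As a quick sanity check on that formula, here is a small sketch in plain Python (the choice of N = 1000 points is arbitrary) that prints the edge length (1/N)^(1/n) as n grows:

```python
# Each of the N small cubes has volume 1/N, so in n dimensions
# its edge length is (1/N)^(1/n).
N = 1000  # arbitrary number of data points, for illustration

for n in (1, 2, 3, 10, 100, 1000):
    edge = (1 / N) ** (1 / n)
    print(f"n = {n:>4}: edge length = {edge:.3f}")
```

With 1,000 points, each “local” cube already spans about half of every feature’s range by n = 10, and more than 99% of it by n = 1,000.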

What’s the point of this? To illustrate how quickly the space we need to explore grows, and how many data points we would need to sample to accurately describe it. Turning the formula around: if we want each small cube’s edge length to be some fixed fraction ε of the hypercube’s edge, we need N = (1/ε)^n points, which grows exponentially with the number of dimensions.
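
A minimal sketch of that calculation (the choice of ε = 0.1 is just an illustration):

```python
# To make each small cube's edge length equal to epsilon,
# we need (1/N)^(1/n) = epsilon, i.e. N = (1/epsilon)^n points.
epsilon = 0.1  # each small cube covers 10% of every feature's range (arbitrary)

for n in (1, 2, 3, 10, 20):
    N_needed = (1 / epsilon) ** n
    print(f"n = {n:>2}: N = {N_needed:,.0f}")
```

At this resolution we need 10 points for one feature, but 10 billion for ten features; the required sample size grows exponentially with the dimension.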

(There are a few more points which I need to transcribe from my handwritten notes. Maybe in another post.)

@lttrs_admin please fix the formatting for these kinds of posts :)

References:

Rojas, Raul. The Curse of Dimensionality. https://www.inf.fu-berlin.de/inst/ag-ki/rojas_home/documents/tutorials/dimensionality.pdf

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media, 2009.

Other ideas from lecture:

Lipschitz Continuity https://users.wpi.edu/~walker/MA500/HANDOUTS/LipschitzContinuity.pdf

Banach Space https://mathworld.wolfram.com/BanachSpace.html