# comp.graphics.algorithms

## Subject: Re: nearest neighbour of a point

Hans-Bernhard Broeker wrote:
> In comp.graphics.algorithms running.polygon@gmail.com wrote:
>
> > I have 2 sets of points X & Y. The distribution of points in both the
> > sets is such that points at a certain distance are concentrated and on
> > either side of this distance, they start becoming sparce, something
> > like a normal distribution.
>
> This description is a bit hard to parse. You seem to be saying that
> within set X, distances to the nearest neighbor cluster around a
> certain value, rather separated from zero. And the same for set Y.
> Or is this clustering true for *all* distances between points in one
> set? I.e. are all points of set X expected to fall roughly on a
> spherical shell of radius R around any point in set X one picks as a
> reference?
>

Sorry, I should have made myself a bit more clear.
All the points lie within a hypercube of side 1 and all points are
having non-negative co-ordinates in each of the dimensions. So, if we
consider 3D, then they would all lie in a cube of side 1 in the 1st
octant(if that's what it's called).

Also, consider there are say 20 dimensions, then we know for sure that
2 points can be separated by a max. (dist)^2 of 20.
So, all I'm saying is that the number of points in distribution X & Y
is such that the number of points at a dist^2 <0 & >20 from the origin
is zero, followed by points having dist^2 <1 & > 19, and so on....
and the number of points having dist^ between 10 & 11 would probably be
the most.
Of course, the mean of the distribution need not be exactly in the
center(at 10 in case of 20 dimensions) but it is a normal distribution.

> The former case would indicate that your pointset is some kind of
> regular grid embedded in that high-D space, and it may be a grid of
> considerably fewer actual dimensions than the embedding space. You
> could find out by counting the number of almost-equally-far neighbors
> per point. E.g. a regular "grid" of points along a curve in 3D space
> would yield 2 close neighbors per point, a gridded surface would have
> 4, and a volumetric grid would have 6.
>
> E.g. if you find an average of 2*d close neighbors per vertex, your
> pointset might be a grid on a d-dimensional manifold, rather than a
> truly N-dimensional data set. If d is a good deal smaller than N,
> that could ease the search algorithms a lot.
>
> > The points are in higher dimentions > 3. They may be up to 50
> > dimentions.
>
> In that case, and assuming the argument above doesn't apply, odds are
> that there's no algorithm that could improve upon the naive "compute
> all distances and remember the shortes" approach. 2^50 is guaranteed
> to be much larger than the number of points you have, so no kind of
> dimensionality is 50.

Yes, that is right. 50 dimensions would imply that even if I have a few
ten thousand points then 2^50 would easily overshadow that number.

> The fact to remember is that in
> high-dimensional spaces, everything is pretty much next to everything
> else, so the only way to find the closest thing is to check
> everything.
>
> You can do the maths yourself: compare the volume of a 50-dimensional
> hyper-spherical shell whose radius and thickness are the center and
> width of your distance distribution, to that of a 50-dimensional
> hypersphere of said radius. You'll find that the shell's volume is
> not really much different from that of the full sphere. I.e. the
> restriction to a thin spherical shell isn't going to help you much in
> searching or organizing points.

It's a bit(very) hard(for me) to visualize anything in more than 3D.
However, instead of considering the thickness of the hyperspherical
shell to be the width of the distribution, we let the thickness be say
the standard deviation of the distribution, would that mean than the
volumes of the shell & the sphere are largely different?

If I try to imagine a similar concept in 3D I land at the following
reasoning:
Let y be a point in set Y, and let dy be the dist^2 of y from {0,0,0}.
Say that I have a list of all points in set X such that they are sorted
in ascending order on their dist^2 from {0,0,0}.
Now, I can say that the point in set X which is closest to point y is
one which lies in the band such that the band contains all points at a
distance of [dy-DELTA] to [dy+DELTA] such that DELTA is chosen in such
a way that it encompases at least the point which is closest to y.

However, I'm not sure if the same concept can be extended to higher
dimensions.

>
> --
> Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
> Even if all the snow were burnt, ashes would remain.