# comp.graphics.algorithms

## Subject: Re: nearest neighbour of a point

Michael Hennebry wrote:
> running.polygon@gmail.com wrote:
>> I have 2 sets of points X & Y. The distribution of points in both the
>> sets is such that points at a certain distance are concentrated and on
>> either side of this distance, they start becoming sparce, something
>> like a normal distribution. The points are in higher dimentions > 3.
>> They may be up to 50 dimentions. Now, there are many points in each of
>> the sets, and I would like to know if there is any efficient method to
>> find for each point in set Y, a point in set X such that the distance
>> between the point in set Y & the selected point in set X is <= the
>> distance between the point in set Y and any other point in set X. I can
>> use euclidean distance as the metric.
>
> This one is still unclear on how X and Y were obtained.
>
> I gather that X and Y are subsets of unit hypercube [0,1]^n,
> where n is an integer that might be as large as 50.
> Are the elements of X and Y drawn uniformly at random from the
> hypercube?
> If so, I expect you are stuck with an O(|X| |Y| n) algorithm.
>

Probably the simplest way of achieving subquadratic time-complexity is
to use a spatial hash. Each point in your N-space is truncated to the
nearest gird position of an infinite N-grid of a given cell size S. Grid
positions are denoted by an N-vector of integers. Hashes are computed
based on this N-vector of integers. In a spatial hash, points in
neighboring cells of a given cell can be retrieved in constant time. By
choosing a sensible cell size the cells can be kept relatively sparse
(for a sort-of homogeneous distribution of points).

Store the smallest of the two sets (let's assume it is X) in the spatial
hash, and use the larger set (Y) to query the hash structure. For each y
in Y, retrieve the cell and compute the nearest distance d to a point x
in the cell. If the cell is empty, then take d to be infinite. Traverse
all neighboring cells in a nearest-first manner (Dijkstra's algorithm)
until all unsearched cells are further away then the current nearest
distance d.

It can be seen that if your cell size is too small then a lot of cells
need to be traversed, wheareas if the cell size is too large then the
number of distance computations per cell will be huge. So, it pays to
balance the cell size. If you have a way of estimating the average
distance between to to points, then that would be a proper measure for
the cell size.

Hope this info helpful.

Gino