KdTree/BucketPRKdTree

So, thanks to a comment by Corbos on the Lukious/Rewrite page, Chase and I started looking into [KD Trees]. Here is a snippet from Chase's KD Tree page showing some timings I took for a KD tree implementation:

  Looks like I gave up too early on the KDTree stuff.  I had a major inefficiency in my search algorithm.  Here are the timings I get now. 
       7 dimensions, 26,000 scans, cluster size = 50.  Also, this is with squaring the distances, which seems to make it run a lot faster.
    O(n) method: 2172
    O(log n) method: 1024
  But here's the bonus move ... 52,000 scans:
    O(n) method: 3945
    O(log n) method: 1260
  104,000 scans:
    O(n) method: 8436
    O(log n) method: 1512
  Are you ready for this .... 400,000 scans.  This is more than you'll get in a targeting challenge, all for less than you WERE spending on 35 rounds.
    O(n) method: 70801
    O(log n) method: 1756

Pretty nice. The problem is I don't scan for a cluster size of 50 in my DCResearch; I scan for a cluster size of 500, with about 26,000 nodes.

    O(n) method: 3936
    O(log n) method: 3540

But I finally, through reading up on QTrees, had an idea for one of these things that could keep itself balanced. It's not a tremendous speedup, but it should help. Here's my new tree (I don't know its real name because I made it up myself ... someone else must have done it already, though, and given it a name). This is 26,000 nodes clustering 500 points:

    O(n) method: 3173
    O(log n) method: 2468
52,000 nodes
    O(n) method: 5928
    O(log n) method: 2760

Oh, and by the way, it works better than KD trees for the smaller cluster sizes, too. Here's 50 @ 26,000:

    O(n) method: 2436
    O(log n) method: 676
And at 52,000:
    O(n) method: 5052
    O(log n) method: 692
And finally, 400,000 nodes, all for about the price of 26,000 nodes with an (unbalanced) KD tree.
    O(log n) method: 1088

Now again, this is with totally random data. With real data, my tree will not be perfectly balanced, so numbers will not be this good.

Here's how the tree works. First, I'll briefly explain a kd tree for those who haven't heard of one. It is a binary tree. Each node in the tree keeps the location of one point in k-dimensional space (read: it has a double array of length k). Every point stored to the left of the root node has a 0th-dimension value lower than the root node's; every point to the right has one higher. Every point stored to the left of a node at level 1 (right below the root node) has a 1st-dimension value lower than that node's ... every point stored to the left of a node at level X has a (X%k)th-dimension value lower than that node's ... etc. Then, when searching the tree for the 50 points closest to point Y, you can entirely skip large portions of the tree by simply determining "ANY point with a 0th-dimension value greater than this node's will be farther away than all of the 50 points I've found so far" - so you never even try searching through that whole right sub-tree.
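
For those who want to see it in code, here is a minimal kd tree sketch in Java. It is only an illustration of the idea above, not the source linked below: the names are mine, and for brevity it tracks a single nearest neighbor, where a DC gun would keep the 50 best found so far and prune against the worst of them.

    import java.util.Random;

    // Illustrative k-d tree: each level compares on dimension (depth % k),
    // and whole subtrees are skipped when the splitting plane is farther
    // away than the best match found so far.
    class KdNode {
        final double[] point;
        KdNode left, right;

        KdNode(double[] point) { this.point = point; }

        // Insert p, comparing on dimension (depth % k) at each level.
        void insert(double[] p, int depth) {
            int dim = depth % point.length;
            if (p[dim] < point[dim]) {
                if (left == null) left = new KdNode(p);
                else left.insert(p, depth + 1);
            } else {
                if (right == null) right = new KdNode(p);
                else right.insert(p, depth + 1);
            }
        }

        // Squared distance; skipping the sqrt is the speedup mentioned above.
        static double distSq(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            return sum;
        }

        // Single nearest neighbor; a cluster search would prune against the
        // worst of the 50 best instead of against a single best.
        KdNode nearest(double[] target, int depth, KdNode best) {
            if (best == null || distSq(point, target) < distSq(best.point, target))
                best = this;
            int dim = depth % point.length;
            KdNode near = target[dim] < point[dim] ? left : right;
            KdNode far = near == left ? right : left;
            if (near != null) best = near.nearest(target, depth + 1, best);
            // Cross the splitting plane only if it is closer than the best so far.
            double planeDist = target[dim] - point[dim];
            if (far != null && planeDist * planeDist < distSq(best.point, target))
                best = far.nearest(target, depth + 1, best);
            return best;
        }

        public static void main(String[] args) {
            Random rnd = new Random(42);
            KdNode root = new KdNode(new double[] { rnd.nextDouble(), rnd.nextDouble() });
            for (int i = 0; i < 1000; i++)
                root.insert(new double[] { rnd.nextDouble(), rnd.nextDouble() }, 0);
            KdNode best = root.nearest(new double[] { 0.3, 0.7 }, 0, null);
            System.out.println(best.point[0] + ", " + best.point[1]);
        }
    }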

The trouble with a KD tree is that you can't keep it balanced without re-building the whole tree from scratch, and that takes a long time. So here's what I've done; it's similar to a [quad tree]. The key lies in normalizing all the values you store to be between 0 and 1 (or any other arbitrary, but fixed, range). All the data go in the leaf nodes; the nodes above them only represent dividing "lines", just like those in a KD tree. Each leaf node stores up to X values; once it passes that boundary, the node gets split in two. It takes on the dividing "line" that halves the space it used to represent, it gets two child nodes, and the data get divided between the two new children. In this way the tree will always stay just as balanced as your data are (there cannot be any "bad luck" from adding points in a worst-case order), and there is never a need to re-build the tree or do any tree rotations. It is possible, however, that your data are distributed in a fashion that makes this tree worthless (e.g. when they all gravitate exponentially more densely toward one corner of your universe).
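
Here is a sketch of that splitting scheme in Java. Again, this is just an illustration of the description above, not the linked source; names like PrKdNode and BUCKET_SIZE are my own, and the search side is omitted.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Illustrative bucket PR k-d tree node. Internal nodes hold only a
    // dividing "line"; all points live in leaf buckets, and a full bucket is
    // split at the midpoint of the region it covers, so insertion order can
    // never unbalance the tree.
    class PrKdNode {
        static final int BUCKET_SIZE = 8; // the "X" from the text above

        final double[] min, max;  // region this node covers, within [0,1]^k
        final int splitDim;       // dimension this node's dividing line cuts
        List<double[]> bucket = new ArrayList<double[]>(); // null once split
        PrKdNode low, high;

        PrKdNode(double[] min, double[] max, int splitDim) {
            this.min = min;
            this.max = max;
            this.splitDim = splitDim;
        }

        void add(double[] p) {
            if (bucket == null) { // internal node: descend to the right leaf
                child(p).add(p);
            } else {
                bucket.add(p);
                if (bucket.size() > BUCKET_SIZE) split();
            }
        }

        // Halve this node's region at its midpoint (not at a data point -
        // that is the "PR" part) and hand the points to the two children.
        private void split() {
            int k = min.length;
            double mid = (min[splitDim] + max[splitDim]) / 2;
            double[] lowMax = max.clone(), highMin = min.clone();
            lowMax[splitDim] = mid;
            highMin[splitDim] = mid;
            low = new PrKdNode(min, lowMax, (splitDim + 1) % k);
            high = new PrKdNode(highMin, max, (splitDim + 1) % k);
            List<double[]> points = bucket;
            bucket = null; // this node is now internal
            for (double[] p : points) add(p); // re-adding may split again
        }

        private PrKdNode child(double[] p) {
            double mid = (min[splitDim] + max[splitDim]) / 2;
            return p[splitDim] < mid ? low : high;
        }

        public static void main(String[] args) {
            // The root covers the whole normalized space [0,1]^2.
            PrKdNode root = new PrKdNode(new double[] { 0, 0 }, new double[] { 1, 1 }, 0);
            Random rnd = new Random(42);
            for (int i = 0; i < 1000; i++)
                root.add(new double[] { rnd.nextDouble(), rnd.nextDouble() });
            System.out.println("root split: " + (root.bucket == null));
        }
    }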

Best values for X for my own DC purposes lie between 8 and 16. If you'd like to run some timings of your own, the code for that is included in the source, in the [JUnit] tests. There are a few intricacies I haven't mentioned here, but I'll let the code do the rest of the talking. --Simonton

And [here] it is. I'm told it's a confusing coding style ... so ... you'll have to deal with that. -- Simonton


Comments

What do your numbers mean? 5052 what? Fortnights? I was assuming it was comparisons, but I would expect it to be about equal to the total number of scans if that was the case... --David Alves

If anyone knows the real name for this kind of tree, let's move this page to one by that name! -- Simonton

I plugged this tree into my gun. I can now run a 1000-round battle without discarding a single scan. That's around 500,000 scans, much faster than sequentially comparing 60,000 scans used to be. Amazing. -- ABC

This might actually be a PMR tree. In the demo, try putting one node in one corner and then many in another; it behaves the same as a PMR, so you might have made a variation of a PMR tree. Or you might have a unique version of a PR Kd tree, as it would be if you divided evenly down the center of the nodes in the area you're dividing up. --Chase-san


I had an idea for a DC-come-PM gun, but it would require about 30 dimensions. How does this tree hold up with a 'k' that large? -- Skilgannon

Not so hot. -- Simonton

 26,000 scans, 30 dimensions:
    O(n) method: 17093
    O(log n) method: 27442
 400,000 scans, 30 dimensions:
    O(n) method: 304462
    O(log n) method: 365868

Thinking about it, the tree has to look through a minimum of (k + 1)^2 - 1 leaves, therefore ((k + 1)^2 - 1)*X comparisons, to find the nearest neighbor. When the leaves aren't split at equal intervals it can get quite a bit higher than that. So, a minimum of (31*31 - 1)*8 = 7680 comparisons, discounting tree overhead. I wonder if reducing X will help for higher values of k... -- Skilgannon


You know, I realize that a TreeMap could do pretty much exactly the same thing as this, as it states "This implementation provides guaranteed log(n) time cost for the containsKey, get, put and remove operations." You just need a decent Comparator. I feel a little silly now. Though, true, it cannot get the nearest n as far as I know. But with a slight modification of the source... --Chase-san

That would require a lot more than a "slight" modification. You'd have to disable height balancing, as well as integrate the comparator enough that it would know what level of the tree it's at. And that's just for a standard K-d tree doing a nearest neighbor search. As you say, more would be necessary for the nearest n neighbors, and you might as well forget about this PR bucket stuff (which increases efficiency a lot in this application). -- Simonton
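
To illustrate this exchange, here is a sketch (my own, not from anyone's gun) of the closest a stock TreeMap gets: a Comparator only defines a one-dimensional ordering, so the best you can do is key on distance to a single query point, which means rebuilding the whole map for every search, with none of the spatial pruning described above.

    import java.util.TreeMap;

    // Keying scans by squared distance to ONE query point does yield the
    // nearest entries in order, but each search costs O(n log n) to rebuild.
    class TreeMapNearest {
        static double distSq(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            return sum;
        }

        public static void main(String[] args) {
            double[][] scans = { { 0.1, 0.9 }, { 0.4, 0.4 }, { 0.8, 0.2 } };
            double[] query = { 0.5, 0.5 };
            // Rebuilt per query; ties on distance would also collide on the key.
            TreeMap<Double, double[]> byDistance = new TreeMap<Double, double[]>();
            for (double[] scan : scans)
                byDistance.put(distSq(scan, query), scan);
            // The first n entries of byDistance are the n nearest scans.
            System.out.println("nearest distSq = " + byDistance.firstKey());
        }
    }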

They should really implement a built-in multi-dimensional binary search tree in base Java. --Chase-san

Wow... I just finished modifying Horizon to use one of these trees, and I ran some test battles to see how much faster the bot is. The old O(N log N) code took 3:45 to run a Horizon vs. Horizon battle of 50 rounds; the new bot takes only 45 seconds for the same battle. I never realized that so much of the execution time was going into the clustering! I thought at least a third of the slowness was in the PrecisePrediction. -- AaronR

