Anyways, one thing I did differently than most versions I see, is instead of rotating the dimension that is pivoted, I made it always split along the widest spread dimension. It probably doesn't make too big of a difference though except with some odder data distributions. Anyways, here are what my timings of it are like:
Searching for 10 neighbors out of 500 points spread across 2 dimensions. Flat: 0.0771735563 ms/iter Flat: 0.0689681925 ms/iter Flat: 0.0693208894 ms/iter Tree: 0.0571355537 ms/iter Tree: 0.0468793049 ms/iter Tree: 0.0451129084 ms/iter Searching for 10 neighbors out of 500 points spread across 5 dimensions. Flat: 0.0879868045 ms/iter Flat: 0.0881494044 ms/iter Flat: 0.0897309072 ms/iter Tree: 0.0654588738 ms/iter Tree: 0.0659778443 ms/iter Tree: 0.065816676 ms/iter Searching for 10 neighbors out of 5000 points spread across 2 dimensions. Flat: 0.75388945 ms/iter Flat: 0.63476881 ms/iter Flat: 0.625862029 ms/iter Tree: 0.265337165 ms/iter Tree: 0.341665337 ms/iter Tree: 0.350281228 ms/iter Searching for 10 neighbors out of 5000 points spread across 5 dimensions. Flat: 0.825682338 ms/iter Flat: 0.82461396 ms/iter Flat: 0.827343259 ms/iter Tree: 0.316616245 ms/iter Tree: 0.380413863 ms/iter Tree: 0.377532379 ms/iter Searching for 10 neighbors out of 50000 points spread across 2 dimensions. Flat: 7.11696229 ms/iter Flat: 7.17059282 ms/iter Flat: 7.01230776 ms/iter Tree: 0.24727957 ms/iter Tree: 1.66214694 ms/iter Tree: 2.65713756 ms/iter Searching for 10 neighbors out of 50000 points spread across 5 dimensions. Flat: 10.73712784 ms/iter Flat: 10.5244782 ms/iter Flat: 10.55054963 ms/iter Tree: 0.84721054 ms/iter Tree: 1.42747949 ms/iter Tree: 1.24676525 ms/iterSo well, those timings aren't all that bad really. However, I'm not sure all is well. For example, if I attempt to manually rebalance my tree, it suddenly becomes horribly inefficient (the actual search! not counting the rebalancing time). In addition, while this tree is implemented as the "Bucket PR K-D Tree" variety, setting the bucket size to anything besides 1 actually degrades performance (particularly if increasing the bucket size to over 8). Due to these two things I suspect I have some bothersome bugs hanging around somewhere. Anyways, if anyone wants to use my implementation, or has any tips about fixing it, feel free. =)
While mathematically it should preform O(ln n) however in actual practice it seems to preform upto O(ln n/2) ni the larger cases, very well done. Perhaps you can add a range search aswell (it would be very useful in a traditional pattern matcher). --Chase-san