Rednaxela/MultiplePlaneRegressionClustering - Robo Wiki -= Collecting Robocode Knowledge =-

Well, a while back I talked some of using function approximation on the points in a closer outputted by DynamicClustering. Well, that's all well and good when the enemy has only a single targeting method, however multi mode tricks could really mess with it. So, how can I focus my data on the current situation, in a way more accurate than normally used for DynamicClustering, and in a way that can handle multi mode bots better too? Well, the solution I just came up with is something I call "Multiple Plane Regression Clustering".

So, let's start with the basics of using function approximation on cluster data. So here's what the GuessFactors of LinearTargeting (with it doesn't account for walls) looks like:

It's nice and simple. Easy to create a function for by regression. Note that in the multiple dimensions of segmentation you might use, this line would be a plane (or hyperplane) which is flat in all axis except the lateral velocity. Keep in mind that lines are basically plans in 2 dimensions and are interchangable for the purpose of this discussion.

Well, that's all well and nice, but what happens when multimodal things come along?

Well, I know the above isn't a realistic targeting to expect in RoboCode, but it's a hypothetical multimodal one for sake of discussion. So let's say that hypothetically those given points are the points your DynamicClustering gave. Well, as a person, you can pretty clearly see there are two vaguely linear lines there, but how do you make an algorithm know that?

Well, the algorithm I came up with (still untested though), is essentially a hybrid of [K-means clustering] and [linear regression]. So, similar to how you pick random centroids in K-means, you pick two or more random planes. Then for each data point, you find the plane which it is located closest to. Then you regenerate each plane using multiple linear regression (linear regression across more than 2 dimensions), and repeat the process until you have a level of convergence you are satisfied with. Note, that in this case, rather than "clustering" by situations as you do with DynamicClustering, you are clustering different modes.

To use this data of the lines, you simply take every data point, shift it's GuessFactor by however much the formula for the plane it is clustered with says it should be shifted, and then use a kernel density algorithm on these shifted values. If you wanted to, you could also take statistics, on the upper and lower bounding planes of the cluster to improve it's reproduction of clusters with more complex probability distributions. In particular I think keeping track of the upper and lower bounding planes of a cluster, would allow you to do things like with less data create a truly accurate model of the probability of where Splinter is targeting for instance. Also, if you wanted more flexibility, you could very well replace the multiple linear regression, with a neural network, however I'm reluctant to consider that on account of the cpu requirements of that.

In any case, I believe that this algorithm could be an extremely effective way to model multi modal targeting/movement algorithms. Any comments/questions about this idea? :) -- Rednaxela