Voidious/SegmentationResearch - Robo Wiki -= Collecting Robocode Knowledge =-

Well, I'm taking a Machine Learning class this semester, and while I think my application of it is far from polished right now, I also think it's interesting and worth sharing.

Java programs:
- WaveRecorder - A tank that records raw GuessFactorTargeting wave data to ASCII files.
- WaveReader - A program for analysing the data output by WaveRecorder?.

I created a data collecting bot called WaveRecorder. The bot fires with RandomTargeting and records the raw data of every wave it collects. Right now, I'm just printing ASCII text to a text file, so I run out of memory if I run too many rounds at a time; but the current method is only useful against non-AdaptiveMovements anyway, so it's no big deal to run 100 round matches and concatenate the results. I put my bot file quota to like 2 gigs, which is robocode.robot.filesystem.quota=2000000000 in your robocode.properties file.

I also created a program called WaveReader designed to analyze the data collected. I'm trying to apply the concepts of Entropy, information gain and gain ratio to finding the best segments against (for now) bots with purely random, non-dodging movements. Basically, for a given number of segmentation slices, the program tests the information gain (or gain ratio) from every possible way of slicing the data, using thresholds with a certain increment in a given range.

Results so far:

Initial data for Cigaret, DuelistMicro, FloodMini, Sparrow, RandomMovementBot, all waves:
- [Optimal segments using information gain]
- [Optimal segments using gain ratio]
  - Doh: The wall distance info is using information gain, not gain ratio.
Some optimal segments found by averaging Entropy over many tanks, only firing waves. Same 5 bots as above, plus TheArtOfWar, Tron 2.01, SandboxDT 1.91, AspidMovement, FunkyChicken, and Fhqwhgads.
- [Information Gain, 4 segments, 1 to 4 slices] - 10/15/2006
- [Information Gain, 4 segments, 2 to 4 slices] - 10/16/2006, uses a nested calculation
Some optimal segments found by averaging Entropy over many tanks, only firing waves. Segments on acceleration first, and can handle different # of slices for each segment. Same 11 bots as above, plus GrubbmGrb from the TC.
- [Segmentation tests for Komarious 1.705] - 10/24/2006, # of slices as used for segmentation in Komarious, used in 1.705
Note: The segmentation measured is less than the slices given, not less than or equal to.
Note: The VChangeTime segment is (time since velocity change / (bullet time = distance / bullet velocity)). It also ignores times of 0, because those are usually segmented out with an accel segment for me.

My first tests are with RandomMovementBot, Cigaret (uh, I forgot it dodged), DuelistMicro, FloodMini (forgot again), and Sparrow. I ran 1300 rounds against each, an extra 500 against RandomMovementBot because I was testing the WaveRecorder, and then ran two different analyses overnight. (Well, after testing and polishing WaveReader first, of course.)

I don't know which of those two ideas (information gain or gain ratio) is "better" for this, but I prefer gain ratio, which has some adjustment for "shattered" data sets, basically. You can hit [Wikipedia] or [Google] to learn more about those machine learning concepts, or check out [that free book] I mentioned. It may take some more thought and work to get really good results out of these tests, but if these methodologies are worth anything, they should be capable of giving me good results.

Note that the segmentation slices this produces are all based on this being the only segmentation of the data. It's quite possible (even likely, IMO) you'd get different wall distance slices if you had segmented on distance first. Another note is that there's still some subjective choices in how this is implemented; for instance, I'm using 37 GF bins when parsing the raw data to create the data sets, and fairly large increments in the slices I'm testing (for sake of processing speed).

-- Voidious

I've been working on this a bit more, and I posted some [new results] (also in the list above). I will post my latest Java source soon. It seems to me that information gain, not gain ratio, gives results that seem more reasonable, so that's what I've been using. I've also stopped looking at finding good segments for individual bots, and am now looking for optimal segments against a whole set of tanks.

Those latest tests are 3800 - 5000 rounds against each of 11 tanks, many from the original TargetingChallenge, using only firing waves. This is based on information gain, not gain ratio, which is giving what I think are more reasonable results so far. The bots used are: Cigaret, DuelistMicro, FloodMini, Sparrow, RandomMovementBot, TheArtOfWar, Tron 2.01, SandboxDT 1.91, AspidMovement, FunkyChicken, and Fhqwhgads. Only firing waves, and a lot (but not equal number) of rounds against each, so not all the tanks are evenly weighted. (Working on it...)

It's interesting how close some of the slices it yields are to many sets I've seen and used myself. So, perhaps nothing groundbreaking will come of this, but it's interesting nonetheless... Once I get these latest tests run with some finer granularity and equally weighting all 11 tanks, I think I'm going to run a little experiment: I'll make a GF gun with 4 slices on each of these 4 segments that I just intuitively set based on experience, and another with the exact slices these tests recommend so far.

-- Voidious

I'm very interested in this. My gun is very highly segmented and I'm quite sure that it's not segmented optimally. Maybe I'll see about getting Phoenix to write waves to disk and writing a program to feed them into your WaveReader?. Will keep you posted if anything comes of it . --David Alves

That would be cool, and not too difficult to do, I think. The code has gotten a little stringy, but it should still be pretty usable, and I'll probably clean it up soon. I'm about to post the latest WaveReader on the wiki... That first version would only calculate optimal slices for individual bots, while the new one finds them over a set of bots, averaging the entropy over all the different data sets in its search. Unsurprisingly, a lot of these slices look pretty familiar; but this would (hopefully) also allow us to try new segments and quickly find good slices for them. -- Voidious

My second attempt to take some segmentation lessons from all this and apply them to Komarious yielded a small increase in rating. It may not be worth the bytes it cost, but maybe with some more tweaks... It's far from proof that this is any better than trial and error, but it's a good sign, at least. -- Voidious

Hmm, nice experiments, if I figure out what all the numbers mean I might be able to improve the segmentation of my dev7 gun branch. ;) -- Chase-san

Just posted some more recent results, which are actually the ones I used for Komarious: [nested calculations of segments]. Basically, in this version, I first calculate the best slices for LateralVelocity; using those slices as a base, I calculate the best slices for distance, then same process for wall distance then time since velocity changed. It stands to reason that the slices you will get for the segments will be different if you are already segmenting on something else, and indeed this is the case. -- Voidious

I got around to some updates to this stuff last night. First, I added some segments to WaveRecorder: reverse wall distance, time since zero velocity, time since max velocity, and distance last 8 / 15 / 25 ticks; I also tweaked the code a little to not mess up on skipped turns (like the MiniBot CustomEvent wave system does sometimes). In WaveReader, I segmented on accel before everything else, since I always segment that the same way, and (minor but important) added code to let me specify the # of slices for each segment when searching for optimal slices. Based on results from firing waves over 900 rounds vs each of 12 bots, I updated Komarious with many segmentation tweaks, and got a few points out of it... -- Voidious

I'm starting up my own research, but instead of the number of segments to have I plan to do research on what the best segments could be, i'll runn 100 round matches vs my ArtBot?(its a grapher bot I made) with added data recording via a normal TC match without my bot firing a shot at first, then slowly evolve a gun from the data. I plan to test Distance, latVelocity, Acceleration/TimeSinceVelocityChange?, Wall/Corner? Prox and whatever else I can think of. I am gonna do this mostly because I don't understand Voidious' results here. -- Chase-san

My results aren't for the best # of segments, they are for the best segments given a number of segments for a given attribute. Is there something specific I could clarify about the stuff I've posted? -- Voidious

I would add that number of segments will just about always - and certainly more so than where to slice given a number of segments - be a trade-off between learning speed and accuracy. Rarely, if ever, will more segments end up less accurate once given enough data. This makes "best number of segments" a bit trickier to find with the methods I'm using here. -- Voidious

I did a little research of my own, and the 2 most visited distances are 285 and 525 that got hits that stood head and shoulders above the rest (both over 200 hits, where as the next highest a distance of 200 only got 80 hits). I tested this with a very large diversety of 17 robots. As for lateral velocity, 0 was actually the most visited, with -8 and 8 hot on its heels (not very surprising considering), the least visisted was 3 and -3. If you wanna see the data i'm going to be working with more here is the Data my program spat out at me http://tinyurl.com/yfhqgl. -- Chase-san

One remark about lateral velocity being zero, did you also count the pre-fire stage (the first 30 ticks) and the stages where one of the bots is disabled? -- GrubbmGait

I didn't special case either of those in my research, but it would be a good idea to do so. Thanks, Grubbm... -- Voidious

Mmkay, my last attempt at this was horrible! But I actually got some good data this time, 10536 values for each timeSinceLastVelocityChange?, Distance, velocity, advancing velocity, lateral velocity, forward wall distance, and reverse wall distance from over 122 random bots (well 120, cause 2 failed to import), coming to a total of 73,752 values and 1,032,865 bytes (0.98 mb) of data. If anyone wants the file, its a direct object export of a Map<String, List<Double>>. However I gotta figure out a good way to analyze it now, heh.

I have plans to rerun it again, cept instead of 10 rounds a bot, 500 rounds a bot, using a better base, a auto-updating data file, and against about 300 bots, but this is a good start, which should come to over 100mb of data.--Chase-san

Okay, a little data on over many many many bots. Hopefully this can help you direct your research a little. http://chase.tfsnewworld.com/robodata/data.html --Chase-san

Hey, nice looking graphs ;) I never posted the raw data that I collected with my WaveRecorder? because it's literally gigs of data. However, I could post some of the segments and segment combos I found from this in case anyone's interested. Nothing that's going to rocket you 50 points in the rumble, but it was useful in tuning some segments in Komarious and also for having a good base segmentation to start for new attributes I wanted to try, like distance last 8/15/25 ticks and other even weirder ones. -- Voidious

Er, some more of the segments, I forgot I actually did post a bunch of the initial ones... -- Voidious
Sure! --Chase-san

Gigs? *Drools*.. erm, um, oh! Yes, the graphics were made in Tulip, a rebuild of my old Illustrator program, cept its alot smaller. Whats nice about this data that its diverse, over many many robots, so that you can get good stats segment idea's. Note, it was made in a surfer, which is why distance seems like it never goes beyond 750 (not that that is easy anyway), but generally okay if your gun is gonna be on a surfer too! --Chase-san