
Looking at GD's results sheet I am thinking 7 seasons must be some minimum:

Season    DT      Asp     TAOW    Spar    Cig     Tron    Fhqw    HTTC    Yngw    Funk    Total   Comment
Season 1  54.29   83.69   100.00  85.14   68.89   84.80   83.29   75.91   90.94   81.86   80.88
Season 2  58.40   90.09   100.00  93.49   80.57   84.09   87.71   82.43   98.29   76.43   85.15
Season 3  66.40   81.91   99.91   90.06   66.89   84.57   85.60   81.69   98.40   80.00   83.54
Season 4  67.77   70.40   98.51   89.37   76.31   77.34   79.43   85.69   94.97   71.89   81.17
Season 5  78.83   85.80   98.86   88.34   72.60   85.34   84.11   85.00   91.31   85.23   85.54
Season 6  55.20   80.97   100.00  78.06   63.77   77.09   84.80   86.20   93.97   79.00   79.91
Season 7  59.89   79.29   98.06   86.57   80.69   86.97   86.51   89.03   97.69   80.60   84.53

FloodMini's strength is in the long haul, but it looks like it's still close to the top against everyone using his normal gun. The # one is when he fires a wave only when he fires a real bullet ("normal" Guess-factor targeting I suppose, now you see why I don't normally do it :-p). Given only this many rounds, the fire-a-wave-every-scan philosophy outscores the normal system against even the bullet-conscious, like HTTC and Cigaret. Meanwhile, I think some Haikus would do better at this than HaikuTrogdor, but I'd like to see them hit AresHaiku. -- Kawigi

Interesting! I have also noted that when graphing with RoboGrapher most bots show the same spikes for both types of waves. Even the bullet-aware bots. They show quite different profiles with the two wave types, but the most visited factor usually stays the same. GloomyDark acts on this knowledge, shooting from the every-scan waves until the bullet-trailing waves have built up. Of course, something else is broken in GD's gun so I can't measure the effect of this ... =( -- PEZ

Hmm... This is an interesting result. Unfortunately I don't currently have time to run the current development version of Fractal through this challenge (I'm actually planning on sleeping tonight for once :D and I can't sleep with my computer on because the noisy fan drowns out my alarm clock and I'll miss class), but I will certainly look into having Fractal fire bullets every wave. Heck, in AutomatedSegmentation, this could just be added as a segmentation dimension! :D. However, I've tried firing a wave every tick before, and Fractal's wave handling is currently horribly slow so it slows the robot to a screeching halt; I'm sure there's some bug in there, so I'll look into optimizing it to get it working. And yes, if 7 seasons seems like too low a minimum, we should probably increase it; it's really arbitrary, so you can use whatever you want, but it's probably a good idea to post in the comment what you used. I'll probably run 10 or 12 for Fractal when I'm out to class tomorrow... -- Vuen

RoboGrapherBot and GloomyDark are doing just that (using wave type as just another segmentation dimension). Could it be the AutomatedSegmentation that slows your wave handling down? I'm thinking maybe it's a lot of work doing all that smoothing in all permutations on the return of all waves. Can't you do the smoothing when you read the data instead? -- PEZ
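
A minimal sketch of the idea discussed above: treat the wave type as one more dimension of the stats buffer, and aim from the every-scan waves until the bullet waves have collected enough visits. This is not GloomyDark's or RoboGrapherBot's actual code; the `WaveStats` class, the 31 bins and the threshold of 20 visits are all assumptions made up for illustration.

```python
# Hypothetical sketch: names, bin count and threshold are illustrative assumptions.
BINS = 31                # number of guess-factor bins (assumed)
MIN_BULLET_VISITS = 20   # visits needed before trusting bullet waves (assumed)

EVERY_SCAN, BULLET = 0, 1   # the two wave types used as a segmentation dimension


class WaveStats:
    def __init__(self):
        # visits[wave_type][bin] counts how often each bin was visited
        self.visits = [[0] * BINS for _ in range(2)]

    def record(self, wave_type, bin_index):
        """Register one visit when a wave of the given type reaches the enemy."""
        self.visits[wave_type][bin_index] += 1

    def best_bin(self):
        """Pick the most visited bin, preferring bullet waves once they have data."""
        bullet = self.visits[BULLET]
        if sum(bullet) >= MIN_BULLET_VISITS:
            source = bullet
        else:
            source = self.visits[EVERY_SCAN]
        return max(range(BINS), key=lambda i: source[i])
```

A real gun would combine this dimension with its other segments (distance, lateral velocity and so on); the fallback rule here is just one simple way to switch between the two data sets.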

The dev Fhqwhgads also segments on this at some level, but it's lower priority than a lot of segmentations because it's frankly better against a lot of bots to just use the continuously-compiled stats. Note what the highest score is against DT so far - FloodMini with saved data on his normal gun. -- Kawigi

Actually, I'm pretty sure it's not the smoothing that's hurting it; it slows to a halt as soon as the waves start being launched. I'm sure it has something to do with the wave handling. However I haven't tested this since Fractal 0.32, and its AutomatedSegmentation gun is wicked fast compared to that; I'll test it out and see what happens. I also posted Fractal's latest score; somewhat disappointed. Ideally I'd like to be able to hit 90 in this challenge... I'll keep working at it... -- Vuen

Hmm... Oddly enough, it runs surprisingly faster firing a wave every tick; it now runs at about 100 fps rather than 80 fps, and that's with the window open, and yes that's doing however many thousands of gaussian smoothings per wave hit (which is now every tick rather than every 16 ticks). Doesn't make much sense if you ask me... And, two hours later, I just realized I forgot to save this page after typing this. lol :D -- Vuen

Odd indeed! But firing a wave shouldn't take any time at all anyway. You fire it and then every tick check if it has traveled far enough, and that's it. It must be something else in your code eating up those CPU cycles. 100 fps sounds like a SlowBot still. But if it is using the CPU for things worthwhile I guess there's nothing wrong with it being slow. =) -- PEZ
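
The "fire it and check every tick" bookkeeping described above could look roughly like the sketch below. The `Wave` class and the `advance` helper are hypothetical names, not code from any of the bots mentioned; the only Robocode rule relied on is that bullet speed is 20 - 3 * power.

```python
import math


class Wave:
    """A wave expanding from the firing position at bullet speed (sketch only)."""

    def __init__(self, x, y, fire_time, bullet_power):
        self.x = x
        self.y = y
        self.fire_time = fire_time
        # Robocode bullet speed: 20 - 3 * power
        self.velocity = 20.0 - 3.0 * bullet_power

    def distance_traveled(self, now):
        return (now - self.fire_time) * self.velocity

    def has_reached(self, target_x, target_y, now):
        """True once the wave front has traveled at least as far as the target."""
        distance_to_target = math.hypot(target_x - self.x, target_y - self.y)
        return self.distance_traveled(now) >= distance_to_target


def advance(waves, target_x, target_y, now):
    """Split waves into those still flying and those that just arrived."""
    flying, arrived = [], []
    for wave in waves:
        if wave.has_reached(target_x, target_y, now):
            arrived.append(wave)
        else:
            flying.append(wave)
    return flying, arrived
```

Each tick costs one distance comparison per live wave, so even a wave per scan should be cheap; any heavy work would sit in whatever is done with the arrived waves.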

If anyone is in a position to run DT through this version of the Targeting Challenge feel free (I think Jim owes me a favour). -- Paul Evans

How did that debt arise? =) I'll run DT tonight for you. I have been intending to all the time since it will give us such a good reference. Look back at this space tomorrow and you'll see DT in yet another #1 spot probably. -- PEZ

100 FPS minimized sounds like a SlowBot, but 100 FPS watching it doesn't. I think HaikuTrogdor might have gone as high as 6000 FPS minimized when I was testing his gun :-p 90 in this challenge is a challenge, especially since we've barely topped that in the normal TC. I won't say it's impossible, though. It looks like if I could take the best results against each opponent for FloodMini, I'd have a 92.37 (of course, that's balanced by the 84.31 it would get if I used the lowest score for each opponent). Its seasonal scores ranged from just less than 87 to just less than 90. -- Kawigi

Paul, yes I owe you a favor. If PEZ does not do it I shall. Feel free to call it in on some other topic though if you wish. I have not forgotten. -- jim

If PEZ does not do it? What PEZ says, PEZ does. =) The results surprised me. You have some work to do Paul. One of the seasons DT broke the 90 point wall though. Check it against TCCalc if you're curious: http:/robocode/uploads/pez/TCFast-DT.xml -- PEZ

Sweet. Congratulations Kawigi on smoking SandboxDT in the fast learning challenge!! -- Vuen

I knew you would PEZ. That's why I did not spend the time doing it myself. I also wanted to communicate to Paul that I had not forgotten his favor =^> -- jim

It's a close second - given that the standard deviation for GloomyDark's results above is 2.27 you should not be that confident the result is accurate, as the difference in % scores for the top position is presently 1.02%. Of course DT's score could be a lucky high - its true average could easily be a couple of percentage points lower - putting it below Griffon! -- Paul Evans

I can run it more seasons. That's a cool feature of RoboLeague that it just continues to build on the XML file. Let's say we opt for running 15 seasons. 525 rounds against each opponent should give a more reliable score. I think that maybe GD's results are a bit misleading since I have one or more bugs in the gunning code and it might be more unstable than most. I don't think any of the seasons gave DT a score below Griffon's average though so I think you shouldn't worry too much. =) Here's the table for DT's first 7 seasons anyway:

Season    DT      Asp     TAOW    Spar    Cig     Tron    Fhqw    HTTC    Yngw    Funk    Total   Comment
Season 1  64.83   94.71   100.00  97.97   74.77   81.03   92.94   77.34   99.31   86.06   86.90
Season 2  79.63   89.31   97.60   94.06   72.37   80.03   96.57   85.97   99.66   83.94   87.91
Season 3  66.97   90.97   100.00  95.43   67.31   82.80   98.97   87.26   98.06   83.06   87.08
Season 4  59.40   88.17   99.43   95.89   76.80   84.69   93.03   81.97   95.49   81.97   85.68
Season 5  66.51   86.43   100.00  97.74   69.97   78.51   93.37   88.91   97.03   83.57   86.21
Season 6  75.43   91.71   100.00  97.03   87.89   86.97   90.63   91.20   98.34   87.63   90.68
Season 7  65.94   88.63   100.00  97.43   73.69   85.46   92.97   88.11   94.94   78.91   86.61

Hmmm, maybe the std dev is about the same here... Anyways, I'll run the next 8 seasons tonight so you'll see tomorrow if your score is adjusted upwards or downwards. Everybody else should also build up their scores with at least 15 seasons I think. If you can deduce from the tables of GD and DT what number of seasons we should run to get, say +/- 0.2 accuracy, let me know before tonight and I'll run enough seasons.

-- PEZ
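
One rough way to answer the "how many seasons for +/- 0.2" question, sketched under assumptions: the error on an averaged total shrinks as the per-season standard deviation divided by sqrt(number of seasons), and "+/- 0.2" is read as one such standard error unless a larger multiplier `z` is passed. The per-season deviations used are the ones quoted in this thread (2.27 for GloomyDark, 1.65 for SandboxDT); `seasons_needed` is a made-up helper name.

```python
import math


def seasons_needed(season_sd, target_error, z=1.0):
    """Smallest n such that z * season_sd / sqrt(n) <= target_error."""
    return math.ceil((z * season_sd / target_error) ** 2)


gd_seasons = seasons_needed(2.27, 0.2)             # GloomyDark, one standard error
dt_seasons = seasons_needed(1.65, 0.2)             # SandboxDT, one standard error
dt_seasons_95 = seasons_needed(1.65, 0.2, z=1.96)  # SandboxDT, roughly 95% two-sided
```

Under these assumptions the answer lands at 129 seasons for GloomyDark and 69 for SandboxDT (262 for a 95% interval), so 15 seasons narrows the error but does not get close to +/- 0.2.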

Ah crappy, edit conflict! [edit] er, my bad... SandboxDT's standard deviation is 1.65 (57.8 points), while the standard deviation of the mean is 0.62 (21.8 points) (Gloomy's deviation of the mean is 0.86)... Shouldn't we be comparing standard deviation of the mean though, not the deviation on the individual season results? -- Vuen

Yes, I would say it's the Total we should look at. -- PEZ

...Yes I know PEZ, the error on the individual columns is an entirely different calculation. The standard deviation is the error on any particular Total of one season. The standard deviation of the mean is the error on overall total (i.e. the average, or 'mean' of all the seasons (hence the 'mean' in 'deviation of the mean'.)) Since we're comparing overall totals, for error calculations shouldn't you be using standard deviation of the mean? Because if you were to repeat 7 seasons of SandboxDT again (or any number of times), the average of the total of these 7 new seasons would have a 67% probability of landing within 0.62 of the total given in the overall results at the top of the page, not within 1.65. -- Vuen

Vuen - I think you are correct - I'm mis-using statistics - the standard deviation result I calculated tells me that 66% of individual season totals are expected to fall within the range of the mean +/- the SD; it does not tell me how confident I should be of the value of the mean. Does anyone know how to do this? -- Paul Evans

The standard deviation of the mean is the value within which 66% of repeated experiments (repeated 7-season average trials) would fall. It is simply the standard deviation divided by the square root of the number of measurements.

In other words, if you were to run one additional season, it would have a 67% chance of landing within 1.65 of your overall score for the first 7 seasons, but if you ran 7 more seasons and averaged those 7, it would have a 67% chance of landing within 0.62 of your overall score for the first 7 seasons.

-- Vuen
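
The two figures can be reproduced from the Total column of DT's 7-season table with Python's standard library. This is only an illustrative sketch: `dt_totals` is copied from that table, and the (n - 1) sample form of the standard deviation is assumed.

```python
import math
import statistics

# Per-season totals for SandboxDT, copied from the table above
dt_totals = [86.90, 87.91, 87.08, 85.68, 86.21, 90.68, 86.61]

mean = statistics.mean(dt_totals)

# Spread of a single season's total (sample standard deviation, n - 1 divisor)
season_sd = statistics.stdev(dt_totals)

# Spread of the 7-season average: the standard deviation of the mean
sd_of_mean = season_sd / math.sqrt(len(dt_totals))

print(f"mean={mean:.2f} sd={season_sd:.2f} sd_of_mean={sd_of_mean:.2f}")
```

Rounded to two decimals this gives a mean of 87.30, a per-season deviation of 1.65 and a deviation of the mean of 0.62, matching the numbers quoted in the discussion.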

I may have found the answer to the last question - looking at MS Excel I need to find the population SD of the results totals which is 1.526 then use the CONFIDENCE Function - this tells me that I can be 95% confident that the mean is 87.30 +/- 1.13 or 50% confident that the mean is 87.30 +/- 0.39 etc. (I know however statistics are tricky - the fact that the totals are averages themselves probably invalidates the logic - I seem to remember things about degrees of freedom, population or sample standard deviations back in college and never really understanding it) -- Paul Evans

In marketing research you often use a chi-square test to calculate confidence values. It sounds a bit like what you're doing in Excel there. If I remember correctly you calculate the chi-square of the mean for a certain number of observations like so (assuming o is the observation value, m is the mean, d is the standard deviation):

  1. For each observation
    1. square(o - m) / square(d)
Sum it up and you have the chi2 value. I don't remember what you did with this value though, but it might have been that the closer to 1 it is the more confident you can be....

-- PEZ

Yeah, that seems to make sense to calculate a confidence value (since statistics are so fond of squares :D). [edit] I just realized how little sense what I derived here made... Anyway, I never took a stats course, but in the lab part of my physics class they expected us to know all of this; luckily the lab manual has a good appendix, so it explains the basics of statistics really well. I wrote a little Java app a couple months ago to do these calculations for me (which is what I used just now :D). I'm not sure however how this Excel CONFIDENCE function works, but it's probably something to do with the chi2 value you show here. The lab manual I have is more concerned with comparing the results with error to known values or to other values with error (we do stuff like measuring a spring constant two different ways and comparing the results), so I'm not sure how to calculate confidence in data. Luckily Excel does it for you :D. My roommate would probably know how because he's taking 2nd year stat, except that it's 6am right now, so he's fast asleep... Maybe I should sleep too (class in 3.5 hours!) -- Vuen

Once we have the formula I can add it to the /Calculator and it can place it in the Comments column together with the number of seasons. -- PEZ

OK - I can derive the 0.62 result Vuen got using Excel's CONFIDENCE function - I have found out the mean +/- SD is 68.27% of results, not 66% as I previously thought. If I use the sample standard deviation in Excel's CONFIDENCE function and use 68.27% as a confidence level I also get 0.62. All I'm not sure of is why Excel suggested that population SD should be used and not the sample SD. -- Paul Evans
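
A sketch of what a normal-distribution confidence half-width looks like, which is the calculation Excel's CONFIDENCE(alpha, standard_dev, size) performs with alpha = 1 - level. The helper name `confidence_half_width` is an assumption; `dt_totals` is the Total column of DT's 7 seasons. It reproduces the figures quoted above for both the population and the sample SD.

```python
import math
from statistics import NormalDist, pstdev, stdev

dt_totals = [86.90, 87.91, 87.08, 85.68, 86.21, 90.68, 86.61]
n = len(dt_totals)


def confidence_half_width(level, sd, size):
    """Half-width of a two-sided normal confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided critical value
    return z * sd / math.sqrt(size)


# Population SD (divide by n), as in the earlier Excel calculation
ci95 = confidence_half_width(0.95, pstdev(dt_totals), n)
ci50 = confidence_half_width(0.50, pstdev(dt_totals), n)

# Sample SD (divide by n - 1) at the one-sigma level of 68.27%
ci68 = confidence_half_width(0.6827, stdev(dt_totals), n)
```

These come out at about 1.13, 0.39 and 0.62. At the 68.27% level the critical value is essentially 1, so the half-width is just the sample SD divided by sqrt(7), which is why it coincides with the standard deviation of the mean.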

Ah, that makes sense. Take the gaussian smoothing function (f(x)=1/(sqrt(2*pi)*d)*exp(-x^2/(2*d^2))) and throw in x=d=1; you should get the same thing as punching in 0.624187379534571 into the confidence level for DT because the population SD is 0.624187379534571. That's why excel used population SD; the population SD excel is referring to is what my lab manual calls standard deviation of the mean. You want the certainty on the average, so it uses the error on the average rather than the error on a single value. This all makes perfect sense now. Looks like the percentage confidence function is simply 1/(sqrt(2*pi)*d)*exp(-x^2/(2*d^2))*100% where d is standard deviation of the mean and x is the deviation from the value for which you want a certainty percentage (i.e. the difference between your score and the next person's for example). This in itself isn't however very useful; the standard deviation of the mean (population SD) is much more useful, so if you're looking for something to punch into the TCCalc PEZ, put in popSD = sum{for each season}((tot_avg - tot_season)^2) / ((num_seasons - 1) * sqrt(num_seasons)). Note that this will die if the user only inputs one season :D -- Vuen

I'll do that. Thanks for putting it in pseudo code for me, I really got lost there in the rest of the discussion. =) No problem with single seasons since I don't calculate averages then anyway. -- PEZ

I added the 15 season run for Jekyl as PEZ suggested above. I had to run it twice as I was not sure of the results. Seems to be accurate. Not that it is anything outstanding, but I did want to make sure I got it correct. I find it interesting that Jekyl seems to learn fairly quickly. Very interesting indeed. One thing I have noticed is that data saving does not seem to be all that effective long term vs. some bots. It may be a bit dicey to draw this conclusion from such a limited number of runs but I have run this challenge 5 times or so with the same results each time. Against SandboxDT having data seems to make a difference (+4% to +6%). Cigaret I seem to hit just as effectively without data as I do with. In the case of Tron it seems like having data may even hurt slightly (-3% to -5%). It is hard to effectively judge though, as over any particular 35 round match I would suspect that proximity to the target plays a statistically significant role. Anyone else taking the time to look at their results and trying to determine what they might mean? -- jim

I just figured, looking at results from others, that I focus gun development on too few opponents :) And yes... my results vs Tron seem to be unaffected by learning -- Frakir

OK, so I now have a version of TCCalc that calculates the popSD the way Vuen suggested. I get popSD == 7.03 for DT's 7 seasons above. Could that be right? If so, what does it say? Should I do something else with this figure before presenting it? -- PEZ

Ehh, I just realized I used the total scores without dividing them with the number of bots. When I use the averaged scores I get 0.070 for DT. Same questions as above though. =) -- PEZ

The more I think about it the more it makes sense that data would not be effective here. Some of these bots are either best in breed or bots with some pretty distinct flaws in their movement. For the ones that are good, it stands to reason that having more data will not make too much difference as they most likely have some pretty flat movement to begin with. In the case of TAOW, it does not take long to catch on to his movement flaw and exploit it. I agree with Frakir, I am spending way too much time on way too small a population of targets. -- jim

Um, I got a popSD of 0.62 for SandboxDT... Maybe I got the formula wrong... -- Vuen

You or I got it wrong. Can you double check? I have checked the code twice now and I think I have got it right. If you know some Perl I could post the code. -- PEZ

Once we have sorted this out I think it would be best to hide this stuff away so no one gets a headache... Using the data set {86.9, 87.91, 87.08, 85.68, 86.21, 90.68, 86.61} (DT's totals) I get the following results:

Stage one is to agree on what is the definition of population and sample standard deviation for the data. -- Paul Evans

I already have the headache =) When I started with Robocode I had problems since I was Perl "tainted" and now I see I am Java tainted coding Perl... =) Perl is scary in that it just goes ahead even when there are quite severe errors in the code... Anyway, I have now fixed one bug in the calculation of popSD and I get 1.028 for DT's 7 seasons. (headache) The test calculator is named "TCCalc2" if anyone likes to test it. I'll continue my bug hunt meanwhile... -- PEZ

I can't see where I am doing things wrong. Does anyone see it?

        # Accumulate squared differences between the overall score and each season's score
        my $sumDiffSq = 0;
        for (my $i = 0; $i < $seasons; $i++) {
            $sumDiffSq += (($total_score / @bots) - ($seasonTotals[$i] / @bots)) ** 2;
        }
        my $popSD = $sumDiffSq / (($seasons - 1) * sqrt($seasons));
One thing I notice is Paul's remark there that Vuen did "SD/SQRT(7)". But in the pseudo-code Vuen posted it was "sumDifferencesSquared/((7 - 1) * SQRT(7))". -- PEZ

Well, I did it with Paul's remark as the clue instead and now it is SD/SQRT(num_seasons). I get 0.62 like Vuen and Excel using Sample SD. The name "Sample SD" in Excel sounds like what we should go for here since it is the sample size we can vary. The population size is infinite, in'it? -- PEZ

I've plugged this "confidence interval" calculation into the /Calculator now. I get 0.52 for 7 seasons of Griffon's. Let's see where we land with 15 seasons of DT's. Soon, it's running now. -- PEZ

Hmm... Looks like Sample SD and Population SD are two different things now. You are looking for sample SD; not sure what population SD is, but sample SD is the deviation on the sample which logically makes sense and gives you the right results for what you are looking for. I suggest we ignore population SD, since we can't seem to figure out what it is, and use sample SD, since we now know what it is.

Anyway for the piece of code you divide the sum by (seasons - 1) to get the standard deviation, then divide the standard deviation by sqrt(seasons) to get the standard deviation of the mean (sample SD). I just combined the two to get divide by ((seasons - 1) * sqrt(seasons)) to skip the intermediate SD step. Alternatively you could just write $sumDiffSq / ($seasons - 1) / sqrt($seasons). However what exactly is the ** 2? Is that squared? It should be squared, not a multiplication, but I can't remember my Perl well (I do hate Perl) so that's probably what that means. You could rewrite it to be sure like this:

        my $diff = 0;
        my $sumDiffSq = 0;
        for (my $i = 0; $i < $seasons; $i++) {
            $diff = ($total_score / @bots) - ($seasonTotals[$i] / @bots);
            $sumDiffSq += $diff * $diff;
        }
        my $SD = $sumDiffSq / ($seasons - 1);
        my $sampleSD = $SD / sqrt($seasons);
Although that's probably what you already have. Not sure what's wrong with the piece of code you wrote earlier though.

-- Vuen

I am pretty sure that "** 2" is squared. =) What's wrong with the piece of code I posted above as well as the piece you just suggested is that it lacks a sqrt() call. I needed Paul's hint to realize that. The code now looks like so:

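        # Sum of squared differences between the overall score and each season's score
        my $sumDiffSq = 0;
        for (my $i = 0; $i < $seasons; $i++) {
            $sumDiffSq += (($total_score / @bots) - ($seasonTotals[$i] / @bots)) ** 2;
        }
        # Standard deviation of a single season, then of the mean over all seasons
        my $sd = sqrt($sumDiffSq / ($seasons - 1));
        my $sampleSD = $sd / sqrt($seasons);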
After 15 seasons DT reaches a confidence level of 0.32. Quite predictable. =)

-- PEZ
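
To make the difference between the two versions concrete, here is a small Python transcription of both calculations, run on the Total column of DT's first 7 seasons. It is a sketch, not TCCalc itself: the function names are made up, and it assumes the season totals are already averaged over the 10 reference bots (so the division by `@bots` disappears).

```python
import math

# Per-season totals for SandboxDT (already averaged over the reference bots)
dt_totals = [86.90, 87.91, 87.08, 85.68, 86.21, 90.68, 86.61]


def sum_diff_sq(season_totals):
    """Sum of squared differences from the average of the season totals."""
    avg = sum(season_totals) / len(season_totals)
    return sum((avg - total) ** 2 for total in season_totals)


def missing_sqrt_version(season_totals):
    """The earlier attempt: the variance is never square-rooted."""
    n = len(season_totals)
    return sum_diff_sq(season_totals) / ((n - 1) * math.sqrt(n))


def sd_of_mean(season_totals):
    """Standard deviation of the mean, with the square root in place."""
    n = len(season_totals)
    if n < 2:
        raise ValueError("need at least two seasons")
    sd = math.sqrt(sum_diff_sq(season_totals) / (n - 1))
    return sd / math.sqrt(n)


without_sqrt = missing_sqrt_version(dt_totals)
with_sqrt = sd_of_mean(dt_totals)
```

On these rounded totals the first version gives about 1.03 and the second about 0.62, in line with the two figures reported in the discussion.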

Does your @bots have 10 or 11 elements? Can't tell from those snippets... -- Frakir

Ah, I remember now. There's that whole additional square root on the standard deviation formula... Can't believe I missed that... Also @bots is 10 because there are 10 pairings (10 measurements, 10 reference bots). Good work PEZ :D -- Vuen

First time with the calculator. Indeed a great job, PEZ -- Axe

Great score Axe! Robocode competition has a bright future. =) -- PEZ

Thnx, I am very proud about it, almost didnt believe that... Have i really beaten DT (at least here)??? Paul, are u sure that your results are correct? Or am I dreaming? :^D! -- Axe

I've no reason to doubt the results - I guess DT's superiority in the leagues must be in the movement (data saving will not play a factor for most of the league battles). -- Paul Evans

Which means. All you who have results near or above DT; What are you doing here? Go to the MovementLaboratory and stay there! =) -- PEZ

That will surely be my next step, but first i want to see how this gun improvement affects my ranking (i am tired of modifying both gun & movement and then discovering that a bug in the movement was masked by a gun improvement). One step at a time. I was trying to prove to myself that u can have a PatternMatcher gun as good as (or even better than) the GuessFactorTargeting guns. I know that ABC has already done it (before and better), but i wanted to do it myself. It is amazing how i converged to an idea similar to TronsGun (the MultipleChoice feature was actually the only feature inspired by TronsGun). -- Axe

Great stuff, I'm glad I inspired you to code such a powerful gun. I agree I did it first, but if it is better or not is still to be tested. I'll run my gun through this challenge next week (unless some kind soul does it for me in the meanwhile). The MultipleChoice is definitely a big factor, but you probably went the "pure" PM way while I tried to adapt it to work in melee battles, the good 1vs1 performance was a lucky coincidence ;). -- ABC

I already had three kinds of PM guns in my old HataMoto, which is a melee bot (its gun design was a StatisticalGun with 10 different targeting systems using VBs to rate them). Musashi was based on it, but the first versions ate dust. The great improvement came when i saw it at the top of the SlowBots list. I decided then to keep it simple (KISS): one good gun is better than 10 poor guns. When i say that i converged to ABC's idea, it is because i use a (-1,-2,-5,-10,-20,-40,-80) to find the best sample, also using the distances from me, the wall and the corner to rate it (i swear to god that i came to this by myself). One day i saw the TronsGun description, which uses something very similar, plus the MultipleChoice scheme - if it works with Tron & Shadow, it should work with Musashi too... -- Axe

Continuing the above, i think that ABC's MultipleChoice in PM idea gives PatternMatching a kind of GuessFactorTargeting "flavor", without losing the PM capabilities that i like so much. I also think that the combination (Mult.Choice + PM) overcomes the GuessFactorTargeting capabilities in short battles, it learns real fast! Proof of that is that Musashi's TargetingChallenge/ResultsFastLearning is very similar to the TargetingChallenge/Results. That convinced me to even stop saving data for the time being (that and the bloody JRE 1.4.2 misbehaviour - i don't like the idea of my bot lying disabled on other people's machines :)). I have only to disagree with ABC about the "lucky coincidence" thing. When we talk about Tron & Shadow very few things can be described as luck ;-) -- Axe

Just posted the 15-season results for FloodMini - dropped my score by .17, did a little better against DT, didn't lose a point to TheArtOfWar in any of the new 8 battles, but performed a bit worse against Cigaret, Tron, and HTTC. I also formatted the table so it was more readable when you're editing it. -- Kawigi

I have reworked my gun slightly moving away from ints as a storage mechanism and toward floats. I got a slight boost in performance as you can see. -- jim

Looks like even small changes to Aristocles gun can make quite a difference. -- PEZ

A slight, slight improvement for Aristocles. Probably within the margin of error. But I might have better luck in the 500 round TC with this segmentation. Time will tell. -- PEZ

Added results of Aleph. I didn't try to tweak the gun so far. let's see if I can do something... -- rozu

Added a TC for Shadow since ABC didn't get around to it. :-) Impressive scores, if it could hit DT worth anything then I suspect it would be top. -- Jamougha

Thanks, looks like my gun is still pretty good in this challenge after all... :) -- ABC

Name             Author  DT      Asp     TAOW    Spar    Cig     Tron    Fhqw    HTTC    Yngw    Funk    Total   Comment
Pugilist 1.9.7b  PEZ     78.41   90.18   99.35   96.26   79.28   86.03   94.90   83.95   97.14   88.32   89.38   15 seasons
Pugilist 1.9.7   PEZ     78.58   89.54   98.95   97.23   80.44   82.89   94.98   84.20   98.08   88.78   89.37   15 seasons

The huge bug (which really is huge) in Pugilist 1.9.7, corrected in 1.9.7b, doesn't seem to make much difference in the fast learning results overall. Against individual reference bots it makes a difference though. And with the bug in place P gets 6 more rating points in the RoboRumble. In the 500-round TC the bug-fixed version scores 1 TC point higher than the one with the bug. Food for thought... Now I guess I must reinstall that bug in Pugilist. Strange. -- PEZ

Great result, ABC! --Vic

Thanks :). Now I need to find a way to improve its long term performance... -- ABC

Cool! Any hints on how to hit TronTC and HTTC like that? -- PEZ

Sure, you just have to forget how to hit DT like you do and it will come naturally. :) -- ABC

Yay! -- PEZ

Last edited November 24, 2004 23:50 EST by PEZ