Bots that are good for testing against. They should meet the following criteria:
A list of fast bots would be good for rapid testing. Ideally I'd like a list of 20 or more, evenly spread out across the rating spectrum. :-) --David Alves
|Bot||Testing Time*||Saves data?||1-v-1 Rating||Specialization Index||Gun Type|
|1. abc.Shadow 2.46||?||Disable for test||1993||36.9||PatternMatching|
|2. pez.mini.Tityus 0.9.1||0:22||No||1889||41.1||GuessFactorTargeting|
|3. ad.Quest 1.0||0:21||No||1836||14.2||GuessFactorTargeting|
|4. kawigi.f.FhqwhgadsMicro 1.0||0:18||No||1766||14.7||GuessFactorTargeting|
|5. myl.micro.NekoNinja 1.30||?||No||1677||25.2||PatternMatching|
|6. cbot.cbot.CBot 0.8||0:25||No||1615||26.5||PatternMatching|
|7. tobe.Relativity 3.9||?||No||1588||29.2||?|
|8. tobe.calypso.Calypso 4.1||?||No||1498||25.9||?|
|9. sgp.SleepingGoat 1.1||?||No||1435||25.1||LinearTargeting|
* 100 rounds against itself on Dave's computer, an Athlon XP 2000+ with 1.5 GB of RAM
|Bot||Testing Time*||Saves data?||Melee Rating||Specialization Index|
|1. abc.Tron 2.02||?||No||1702||3.5|
|2. rz.GlowBlowMelee 1.4||?||No||1689||2.7|
|3. kawigi.micro.Shiz 1.0||2:20||No||1640||4.0|
|4. ara.Shera 0.88||?||No||1610||3.8|
|5. dummy.micro.Sparrow 2.5||?||No||1592||4.8|
|6. shu.nitro.LENIN .T34||?||No||1574||4.9|
|7. mld.DustBunny 2.0||?||No||1523||4.1|
|8. mz.NanoGod 1.91||?||No||1502||4.2|
|9. Noran.BitchingElk 0.054||?||No||1486||3.1|
* 10 rounds with 10 copies of itself on Dave's computer, an Athlon XP 2000+ with 1.5 GB of RAM
It seems that the higher your rating gets, the harder it is to benchmark progress. Recent chats with David inspired me to use the [RoboRumble RoboLeague XML Generator], meticulously remove all the data-saving bots, and end up with a 347-bot test bed to try to accurately gauge how well a new version would do in the RoboRumble. For anyone else who's interested, here it is: [nodata_rrtb_template.xml]
You'll also need a convenient way to get the average percent-total-score from the seasons you run - I use a modified version of Axe's RoboLeagueAnalyser, which I've posted: [LeagueAnalyser_V.zip]. PEZ's RoboLeague/ScoreAverageAddOn might do the trick, too.
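If you'd rather script the averaging yourself, the percent-total-score arithmetic is simple. Here's a minimal sketch; the class name and the hard-coded input are my own illustration, not RoboLeague's actual file format:

```java
import java.util.List;

public class ScoreAverager {
    // Percent-total-score for one pairing: myScore / (myScore + opponentScore) * 100
    static double percentScore(double myScore, double opponentScore) {
        return 100.0 * myScore / (myScore + opponentScore);
    }

    // Average percent-total-score over all pairings in one season;
    // each entry is {myScore, opponentScore}
    static double seasonAverage(List<double[]> battles) {
        double sum = 0;
        for (double[] b : battles) {
            sum += percentScore(b[0], b[1]);
        }
        return sum / battles.size();
    }

    public static void main(String[] args) {
        List<double[]> season = List.of(
            new double[]{4800, 1200},  // 80%
            new double[]{3000, 2000}   // 60%
        );
        System.out.println(seasonAverage(season));  // prints 70.0
    }
}
```

In practice you would feed it the per-pairing scores parsed out of RoboLeague's results XML, which is what the analyser tools above do for you.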
Time will tell just how accurate this benchmark is... And if you don't have a fast machine, good luck. =) But I thought it was worth sharing those resources.
I suggest you remove Cephalosporin from your testbed, Voidious. It times out quite a bit, e.g.:
Next grouping: Phoenix 0.63* vs. Cephalosporin 0.2 (296 remaining).
GE> gjr.Cephalosporin 0.2 still has not started after 10000 ms... giving up.
GE> gjr.Cephalosporin 0.2 still has not started after 10000 ms... giving up.
GE> gjr.Cephalosporin 0.2 still has not started after 10000 ms... giving up.
--David Alves
That 347-bot testbed sounds like a very good idea. I've just finished some testing with my old testbed and can only conclude that it is not good enough to show the effect of my changes to bullet detection on the ratings :-(. So does anybody already have some results on how accurate the testbed mentioned above really is? Is it suited to predict a tendency in rumble ratings? If so, what's the margin of error (meaning: what fluctuation is to be expected, and how large does the difference in average score percentage need to be to actually predict a change in the rating)? I'm somewhat reluctant to make a score of releases just to find out which of the changes actually is responsible for the drop in the ratings, so a reliable testbed would be extremely useful... --mue
I have scores for a few versions of Dookious, which I'll put at the bottom of this comment. I have found the testbed to be at least somewhat reliable, but I think it could be improved upon. Removing most of the HOT bots (or anything else top bots consistently get 99% against) and the rammers is something David and I had discussed, and is probably a good idea. Also, removing all the data-saving bots left the testbed with generally weaker bots: Dookious gets almost 86% against this TestBed, but a couple percent lower in the rumble.
The variance is usually 0.1% or 0.2% between seasons, but once I did see a 0.4% variance between two seasons. David's the only other person I know of who has tried this, and he had a version of Phoenix score lower against this TestBed over 4 seasons but gain a couple points in the rumble. So, I guess my final analysis would be: it's a lot closer to accurate than any TestBed of a few bots I've thrown together, but it is not perfect, or even close.
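To put season-to-season variance like that to work, you can treat each season's average as a sample and compute a standard deviation; a version-to-version difference much smaller than a couple of standard deviations is probably just noise. A sketch of that (the per-season figures below are hypothetical, not Dookious's real numbers):

```java
public class SeasonNoise {
    static double mean(double[] seasons) {
        double sum = 0;
        for (double s : seasons) sum += s;
        return sum / seasons.length;
    }

    // Sample standard deviation of the per-season averages:
    // a rough estimate of the benchmark's noise floor.
    static double stdDev(double[] seasons) {
        double m = mean(seasons), ss = 0;
        for (double s : seasons) ss += (s - m) * (s - m);
        return Math.sqrt(ss / (seasons.length - 1));
    }

    public static void main(String[] args) {
        // Hypothetical per-season average scores (percent) for one version
        double[] seasons = {85.8, 85.9, 86.0, 85.7};
        double noise = stdDev(seasons);  // about 0.13
        // Only trust a change that is well above ~2 * noise.
        System.out.println(mean(seasons) + " +/- " + noise);
    }
}
```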
Oh, this might save you some time:
All versions of Dookious with data saving off. Hope that is of some help to you (and maybe others). (Edit: added Komarious scores.)
Thank you very much, Voidious. That's exactly the data I was looking for. I just installed RoboLeague, and I'm running Ascendant against that testbed now. Concerning the numbers you posted: the difference between Ascendant 1.2.6 and the Dookious releases is about 14 - 15 points in ranking but only between 0.45% and 0.6% in average score percentage vs. the testbed, and I need to trace some smaller rating differences right now. So hm, maybe I have to add the data-saving bots again and delete the data directories after each season... But well, first there'll be some testing with the testbed as it is.
And I can't help but wonder: running 4 or even more seasons against that testbed takes more than 1200 battles. You don't happen to have a little server park at your disposal? You could just as well run your own little rumble, including smart battles, score obfuscator (aka rating system) and all that stuff :-) --mue
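For what it's worth, mue's figures imply a rough conversion factor between testbed score and rumble rating: 14 - 15 points over 0.45% - 0.6% works out to somewhere around 25 - 30 rating points per 1% of average score. A trivial sketch of that arithmetic (the class is mine, and the sample inputs just restate mue's observation):

```java
public class RatingSensitivity {
    // Rating points gained per 1% of average testbed score,
    // estimated from one observed (rating diff, score% diff) pair.
    static double pointsPerPercent(double ratingDiff, double scorePctDiff) {
        return ratingDiff / scorePctDiff;
    }

    public static void main(String[] args) {
        // mue's observation: ~14.5 rating points across ~0.5% of score
        System.out.println(pointsPerPercent(14.5, 0.5));  // prints 29.0
    }
}
```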
Heh, no, just my Core Duo MacBook and an AMD 2000 PC =) It takes me about 3 hours to run one season for Dookious, 1.5 hours if I split it across my two CPU cores, or a little less if I split it three ways. (Mind you, Dookious is a fair bit slower than Ascendant.) -- Voidious
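Splitting a season across cores, as Voidious describes, just means partitioning the opponent list into roughly equal chunks and running one RoboLeague instance per chunk. A hypothetical round-robin split (not any tool's actual API):

```java
import java.util.ArrayList;
import java.util.List;

public class TestBedSplitter {
    // Deal bots out round-robin into n roughly equal chunks,
    // one chunk per RoboLeague instance / CPU core.
    static List<List<String>> split(List<String> bots, int n) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < n; i++) chunks.add(new ArrayList<>());
        for (int i = 0; i < bots.size(); i++) {
            chunks.get(i % n).add(bots.get(i));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> bots = List.of("a", "b", "c", "d", "e", "f", "g");
        System.out.println(split(bots, 3));  // prints [[a, d, g], [b, e], [c, f]]
    }
}
```

Afterwards you would merge the per-chunk results before averaging, so every pairing still counts exactly once per season.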