Roborumble.org

http://roborumble.org/home

What is it?

Two things:

A place to store all your jars so that they're accessible to the world with version history
An RR@H server, to provide stable rankings for robots

What's Done?

Database schema
Robot creation / uploading / downloading
User creation / login / basic permissions
Ruby equivalent to J2EE's <display:table> tag
Basic CSS work

What's Not Done?

Ranking server
Battle uploading
Rating calculation
Categories for bots ("minibot", "wave surfer", etc.)

News

I put Phoenix 0.1 and 1.02 onto it as a test. Feel free to upload your own bots. I'll make sure to keep them when I update the application. Bots can be downloaded with URLs like this:

http://roborumble.org/download/Phoenix

More generally, roborumble.org/download/<somerobot> will always download the latest version of <somerobot>. Specific versions can also be downloaded, if you know you want a particular version. --David Alves

About the whole codesize/properties thing, I think I know what it's for: the storage of robots and such, right? You could always just parse the file name, as it should do now. And on a bot's first battle, have fnl add code so that it returns the codesize, and then determine its league. I don't think you want to invoke the Java runtime (which here on Windows takes up a good 10 MB to 20 MB of RAM).

Also, note that I would be happy to help with the CSS/XHTML/site design/PHP. I have been a webmaster for the last, what, it's almost 2008... 4 years. I do valid XHTML Strict, I'm a CSS wizard (I'm pretty good at complex cross-browser CSS), and I wrote the backend of the site I administrate.

--Chase-san

Oh, also, could you make it so people can use a different flag instead of the one for their country? That has come up before (they may tell everyone where they live, but wish to represent another country). --Chase-san

It is planned to have a completely new Robocode server, right? I think we should think about other rating systems. The current one is good at estimating the rating without many battles, but has the downside of rating drift. I think we should base our new score just on the average score percent (e.g. score = accumulated score / number of bots). A nonlinear translation into values around the current rating points would be nice, but I doubt we could get the exact same values. --Krabb

I did send him my prototype of the ELO-based system; he just hasn't done anything with it yet. (It worked, it was just lacking large data tests and such.) --Chase-san

Hmm, isn't the current system also ELO-based? I think an ELO system is useful in cases where you have a small number of games and many competitors (like chess rankings or online games like Warcraft III). A human player can only play a limited number of games, but we can calculate hundreds of Robocode games per day. The problem with ELO-based systems is that your rating depends on your opponent's rating (which depends on its opponents' ratings as well...), therefore your rating can't be stable. --Krabb

Really, I think a bot's rating 'should' be based on the rating of other robots, despite the instability. Besides, a fixed rating system would be just as unstable, as it would have to be based on the performance of the robot vs other robots anyway. --Chase-san
Hmm, I think a fixed rating system would definitely be more stable. We get these ugly rating drifts due to new bots and especially removed bots. And I don't see why ratings should be based on the rating of other bots. But I would be happy if you disabuse me :) --Krabb
- I just can't envision a fixed rating system; it makes me think of it too much like the challenges. --Chase-san

We should just switch to PremierLeague across the board. =) -- Voidious

That might work. I like the ELO system a lot, but true, a PremierLeague setup would be more stable. --Chase-san

The 'total % wins' column in the PL ranking already contains the accumulated score percentage/100, almost the same scoring Krabb suggested when starting this discussion. It has the advantage of being stable when all pairings are done, but not the rigidity of the PremierLeague, as it still matters how well you beat others. The ranking would be very close to the current ranking. Currently DrussGT would score 520/606*100 = 85.81, while both Ascendant and Shadow would score 510/606*100 = 84.16. I don't like the rigidity of the PL and I also don't like the rating drift, and this system would combine most advantages and some disadvantages of both systems. I don't think that the outcome could be mapped to the current rating though (85 => 2100). Another small advantage: scoring like 5763 to 0 now would count instead of being ignored. -- GrubbmGait
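
As a rough illustration of the arithmetic above, the following Python sketch divides a bot's summed per-pairing score fractions by the number of pairings. The function name `aps_from_total` is hypothetical; the totals 520 and 510 and the 606 pairings are the figures quoted above.

```python
def aps_from_total(total_fraction_sum: float, pairings: int) -> float:
    """Average percentage score: summed per-pairing score fractions / pairings * 100."""
    if pairings <= 0:
        raise ValueError("pairings must be positive")
    return total_fraction_sum / pairings * 100.0


# Figures quoted in the comment above (606 pairings).
druss = aps_from_total(520, 606)
ascendant_shadow = aps_from_total(510, 606)
print(f"DrussGT: {druss:.2f}  Ascendant/Shadow: {ascendant_shadow:.2f}")
```

Because the value depends only on a bot's own pairings, it stops moving once every pairing has settled.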

It should not be that hard to map the new score to our old rating system. Looks like a simple polynomial would do the job:

http://designnj.de/roboking/rating_table.JPG

What do you think? --Krabb

Hmm, I just made an account on Roborumble.org to test, however I accidentally put the country down as Afghanistan instead of Canada, and I can't see any way to change it. Hopefully things like changing that will get implemented or that could get fixed some time ;) Nice stuff for how this is looking so far. I'd be willing to help if only I knew Ruby. -- Rednaxela

If I try to upload a bot, somewhere it is making it a nil object and breaking somehow. I can post the full error message if you are unaware of this problem. --Baal

I was likewise unsuccessful in uploading a bot, about a week ago. - Martin

There is an error if you click an uploaded robot on the "Robots & Teams" page; if the robot is not uploaded, everything seems fine. -- asdasd

Well, as it stands right now, Roborumble.org isn't really working but I hear David is working on it. Hm.. maybe I should learn some Ruby to help out on this... -- Rednaxela

To change the subject back, I too would like to see a scoring more like a simple average percent of total score. No fancy math after that, every bot just gets a rating between 0-100. Or multiply by 10 to get 0-1000, that might be cooler. I see no need to attempt a mapping that makes the scores look the same as they do now. --Simonton

Hmm, I think such an average rating would be good too, just so long as we keep the PL rankings too, after all it's the easiest way to see someone Undefeated. Personally I'd rather 0-100 than 0-1000 but that's a matter of personal taste and I'd still want 4 digits of precision. Only problem (not a serious one) with changing the rating system like this, is The2000Club might be in need of a replacement =P. One other thought, is it might be nice to use some statistical methods to calculate "estimated error" values, to give an estimate about how much a ranking might be affected by lack of battles. Not a critical feature nor would it affect ranking/ratings, but a would-be-nice-to-have thing. -- Rednaxela

Good idea w/ the error values - but I guess I'm not sure how they would be calculated. What would you suggest? The only thing I can think of is to keep a history of how much influence x battles have on total score after y total battles, then use the average. But I definitely want to keep the premier league, too. Also, maybe the scores should be multiplied by 100 - that way we can see 4 digits (I'm just not a fan of decimals in the scoring) AND we would get things like The8000Club, which sounds WAY better than The2000Club (like 4 times better!). --Simonton
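
One possible way to get such an error value is the standard error of the mean over a bot's per-battle percent scores. This is an assumption for illustration, not a method anyone proposed in this thread. The function name `aps_with_error` and the sample scores are hypothetical.

```python
import math
import statistics


def aps_with_error(percent_scores):
    """Return (mean, standard error of the mean) for per-battle percent scores."""
    n = len(percent_scores)
    if n < 2:
        raise ValueError("need at least two battles to estimate an error")
    mean = statistics.mean(percent_scores)
    # Sample standard deviation divided by sqrt(n): shrinks as battles accumulate.
    std_error = statistics.stdev(percent_scores) / math.sqrt(n)
    return mean, std_error


sample_scores = [62.0, 58.0, 65.0, 55.0]  # hypothetical battle results
mean_score, error = aps_with_error(sample_scores)
print(f"{mean_score:.2f} +/- {error:.2f}")
```

The error term falls roughly with the square root of the number of battles, so a bot with few battles would show a visibly wider margin.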

I'm all for it =) I need something new to work on with DrussGT. Moving the goalposts will do just that. -- Skilgannon

No matter how you twist the rankings DrussGT is the goalposts... :) -- ABC

I'd certainly call the PL a "twist" in a way, and in that way, DrussGT isn't quite the goalposts... besides ABC, you're the one that sets the goalposts in melee, holding both the top place and 2 others in the top 5 in the megabot melee ;) -- Rednaxela

Well, put it this way: if improving 5% against top bots will boost my score as much as improving 5% against low ranked bots, I'll get around to adding other things, like an anti-surfer gun. I've finally figured out why improving scores against low ranked bots helps more than against high ranked bots in the old system: as your rating increases, the expected score doesn't change as much due to the ELO system, so it will contribute to canceling out ProblemBots. Against mid/high ranked bots the expected score changes more as your score increases, so they no longer contribute to canceling out ProblemBots, and may even become ProblemBots. Make sense? -- Skilgannon

The fact that your score wouldn't depend on beating low-ranked bots more than any others is a strong reason to move to this kind of system, in my opinion. --Simonton

To compare, ELO is mostly a measure of how much you thrash the weak, average score is how well you fight overall, and PL score is mostly a measure of how you overcome the strong. I think each of those is interesting and has its place, but at least personally, ELO carries less interest than PL and average score. -- Rednaxela

I believe that the % score ranking will end up exactly the same as the ELO one, but will have some (important, imo) disadvantages. You guys are wrong thinking that in that setup beating the low-ranked becomes less important; if anything it will be more important. With a 1% score increase meaning exactly the same against anyone, getting those easy points will be essential. With ELO, 5% against top bots will boost your score as much as against low ones, Skilgannon (it only depends on how far you are from the expected score curve), but it's very hard to do it without losing more than that against the middle ones! -- ABC

I still wish I knew more about how ELO works. I've just heard that a 1% increase against a low ranked bot gets you more points than a 1% increase against a top-ranked bot, but from what you say that's incorrect? I would also be hard-pressed to think that squeezing 1% more points out of the low-ranked bots is any easier than the mid range in the current top bots, but then I've never had a top bot with that much room to tweak before. But, come to think of it, probably 1/2 the reason I would like a more straightforward scoring system is because I don't know the algorithm currently used. I've asked on the wiki before, but with no answer. -- Simonton
There's a link to an article by Paul Evans about ELO rankings on the EternalRumble page. Also some stuff tucked away on the RoboRumble/OldServerDevelopment page (like new bots starting at 1600, new versions starting at old version's ranking). -- Darkcanuck
ELO is just a way of predicting battle results. If you know the ranking of BOT A and BOT B, you can predict the average score they'll get against each other before they fight. Every battle a bot fights moves its ranking up or down depending on the difference between the expected score and the actual result, and all the expected results are updated with that new ranking. If you thrash a bot that is higher ranked than you, your ranking will climb more than if you thrash a lower one. I like it a lot because it is very good at predicting battle results, even with only 35 round battles. In the end, with full pairings and lots of battles, it will be exactly the same as a %score ranking, but you get cool stuff like the PBIs and the LRP. -- ABC
Hmm, after reading more about the system, my position has changed to this: ELO is fine, but also display an average percent score alongside it for the sake of interest. -- Rednaxela
I can believe ELO will show results very similar to using average percentage of scores (APS) after enough time, but it cannot be exactly the same for at least two reasons. 1) In the ELO system thrashing a high-ranking bot gives a greater ranking increase than thrashing a low ranking bot, with APS it does not. 2) The ELO system drifts (because every time there is a problem bot fight, the scores go up/down for both bots), whereas the APS system will stabilize. Difference (1) makes ELO look more attractive to me, while (2) favors APS. A bigger difference that favors APS imo is the fact that early on in a bot's career, before its score has stabilized (which takes more than the 20 battles prescribed in RoboRumble/OldServerDevelopment), it will produce inconsistent rating adjustments to its competitors (because as its rating fluctuates drastically, its expected score does too, so the same battle that would have raised its opponent's ranking a little while ago could now lower it). I do agree that PBIs and the LRP are cool, so I would vote to come up with an equivalent (or better?) way to calculate a PBI in an average percentage of scores system. -- Simonton
One of the advantages of ELO is that it stabilizes faster than APS. The greater ranking increase comes from thrashing an enemy that is high-ranking in relation to you. The up/down movements are relative to the number of "anchors" you have already: the more pairings a bot has, the less it will move when it fights a problem bot. After that problem bot is accounted for in your score, further battles will not change your ranking. It probably takes more than 20 battles per bot for ELO to start making good guesses, but APS will take many many more. It's made for competitions where full pairings are impossible to get; if you have full pairings APS will be just as valid, and simpler, I know. But I still think its predictive abilities are amazingly cool. -- ABC
Ok, that is different than the ELO system described on the link from the EternalRumble page. Can you give more specifics on how that works, or point me somewhere that does? -- Simonton
How is it different?
[This page] says to use the following formula, which makes no mention of whether or not you're facing a ProblemBot. Edit: it also makes no mention of ranking change being relative to the number of battles already fought -- Simonton

        EstimatedScoreFraction = 1 / (1 + 20^(-RatingDiff/800))
        NewRating = OldRating + CorrectionConstant * (ActualScoreFraction - EstimatedScoreFraction)
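
The two formulas can be tried out with a short Python sketch. Two things here are assumptions rather than anything stated on the linked page: `RatingDiff` is taken to be the bot's own rating minus the opponent's, and the default `correction_constant` of 3.0 is only a placeholder value.

```python
def expected_score_fraction(rating_diff: float) -> float:
    """EstimatedScoreFraction = 1 / (1 + 20^(-RatingDiff/800))."""
    return 1.0 / (1.0 + 20.0 ** (-rating_diff / 800.0))


def new_rating(old_rating, opponent_rating, actual_fraction, correction_constant=3.0):
    """NewRating = OldRating + CorrectionConstant * (actual - estimated)."""
    # Assumption: RatingDiff is own rating minus opponent rating.
    estimated = expected_score_fraction(old_rating - opponent_rating)
    return old_rating + correction_constant * (actual_fraction - estimated)


# The same 75% result against a higher-rated and a lower-rated opponent.
gain_vs_stronger = new_rating(1600, 1800, 0.75) - 1600
gain_vs_weaker = new_rating(1600, 1400, 0.75) - 1600
print(gain_vs_stronger, gain_vs_weaker)
```

With these assumptions, equal ratings give an estimate of 0.5, an 800-point lead gives 20/21, and the same 75% result earns more against the higher-rated opponent.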

Ok, I see your point. I'm not 100% sure, but I think that what the actual server code does is sum all those factors (for all the pairings including the last one fought), and adds that (momentum in the details page) to your ranking. That way nothing changes if you get exactly the same result as before against anyone, ProblemBot or not. If you like challenges, try to understand Albert's RR server code. I remember trying and failing :), but still believe it's a marvel how well it works... -- ABC
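
A minimal sketch of that reading, in Python, sums the correction term over every stored pairing and applies the total in one step. ABC is explicitly unsure that the server works this way, so treat this as an illustration of the description only. The `momentum` function, the `pairings` layout, the rating-difference sign, and the constant 3.0 are all assumptions.

```python
def expected_score_fraction(rating_diff: float) -> float:
    return 1.0 / (1.0 + 20.0 ** (-rating_diff / 800.0))


def momentum(rating, pairings, correction_constant=3.0):
    """Sum of corrections over all pairings: (opponent_rating, actual_fraction)."""
    return sum(
        correction_constant * (actual - expected_score_fraction(rating - opponent))
        for opponent, actual in pairings
    )


my_rating = 1700.0
opponents = [1500.0, 1700.0, 1900.0]

# If every stored result matches the estimate, the summed correction is zero.
settled = [(opp, expected_score_fraction(my_rating - opp)) for opp in opponents]
print(momentum(my_rating, settled))

# One pairing above its estimate pushes the rating up.
one_surprise = settled[:2] + [(1900.0, 0.6)]
print(my_rating + momentum(my_rating, one_surprise))
```

Under this reading, repeating an identical result against any opponent leaves the sum unchanged, which matches the "nothing changes" behaviour described above.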

For an improved variant of the old ELO system, you can also check out the Glicko systems. For details have a look at the website of Prof. Mark E. Glickman, their inventor: [1]. (I haven't followed all the discussion, but I still think something like this gives very good ratings.) -- Qohnil

It looks like those systems keep some nice statistics to see (a couple measurements similar to a standard deviation), but I still favor APS. I'll put some thought into the math sometime & see if I can come up with how to get the same rankings as APS out of a more ELO-like system, because it sounds like that would satisfy everybody. But the only advantage of ELO is its predictive power when you haven't matched up against all the competitors being ranked, and that does not seem important to me in the rumble. There are only a bit over 600 bots, and everybody seems to consider rankings stabilized at 2000 battles, so seeing the predictive power for those first 600ish battles doesn't have any value, in my opinion. I still think we should move to APS & include statistics like "average standard deviation of scores against each competitor", or things like that. -- Simonton
Hmm, I'd say that APS probably makes the most sense for actual score/rankings given that complete pairings are indeed easy, but keeping track of some ELO or Glicko stuff for predictive estimates would perhaps be interesting. Or perhaps it wouldn't hurt to use it to fill in missing pairings before they're complete. One other thought I had that might be interesting would be having a neural net classify "groups" of bots based on how they perform against others, with it being likely that it would create groups for rammers and surfers etc. I think many bot categories like rammers and surfers, after all, would leave quite distinct score tendencies that a neural net could learn to categorize. I don't think that's relevant to scoring really except that it would use score data, but it might be an interesting experiment to do with the score data and may also make it easier to guess things about undocumented bots. Actually I suppose I could do that with the existing rumble data from the current server... I might do that some time... anyways I'm getting off topic :-) -- Rednaxela

@Simonton: I think the first test you should make is to generate an APS ranking and compare it to the current (ELO) one. I believe they will be exactly the same, except maybe in some very close cases where ELO will serve better as a differentiator, IMO. Also, I always wanted to know what would happen if we increased the number of rounds in the rumble battles. If we ever get that kind of ranking (100 round battles, for example) full pairings become harder to get, but with ELO we can get a meaningful ranking much more easily. -- ABC

Well - I know how to generate an APS ranking, but I don't know how to compare it to the current ELO. Let's say there are 3 bots & 3 pairings. The bots always score as such against each other: A/B=25/75, A/C=75/25, B/C=25/75. APS scores are 50% for all three bots. Maybe someone can help calculate the ELO? Does anyone have enough command of the roborumble server code to run a bunch of iterations? Or can someone point me to where I can find that server code? In response to your second statement, I agree; in a rumble where full pairings don't happen ELO-like would be the way to go. -- Simonton
- The RR client/server code is included in the robocode source zip file from sourceforge.net. -- ABC
- I'm afraid I'm having a little trouble locating it in the zip. Nothing in the "roborumble" source directory looks like server code. Do you know where it is in there? -- Simonton
I once did this graph: http://designnj.de/roboking/rating_table.JPG Each dot is the score of a randomly selected bot. I think the curve looks pretty smooth, which means that the ELO rating is quite accurate. --Krabb
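
Simonton's three-bot example above can be checked with a small Python sketch. The function name `average_percentage_scores` and the dictionary layout are hypothetical; the scores are the ones given in the example.

```python
from collections import defaultdict


def average_percentage_scores(pairings):
    """Map each bot to its mean percent share of the score across its pairings."""
    shares = defaultdict(list)
    for (first, second), (first_score, second_score) in pairings.items():
        total = first_score + second_score
        shares[first].append(100.0 * first_score / total)
        shares[second].append(100.0 * second_score / total)
    return {bot: sum(values) / len(values) for bot, values in shares.items()}


# A/B=25/75, A/C=75/25, B/C=25/75
three_bots = {("A", "B"): (25, 75), ("A", "C"): (75, 25), ("B", "C"): (25, 75)}
aps = average_percentage_scores(three_bots)
print(aps)
```

Each bot wins one pairing 75/25 and loses the other 25/75, so all three come out at 50, which is why this case makes a useful test for any ELO comparison.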

Wrote something quick: Nfwu/EloSim. Basically simulates what the server does. RR@H source is from Albert's website: http://www.geocities.com/albert_pv/RoboRumbeAtHome.html This decay (wins1 = 0.7 * wins1 + 0.3 * real1; wins2 = 0.7 * wins2 + 0.3 * real2;) looks interesting. ~~For a test on: (A/B=25/75, A/C=75/25, B/C=25/75)~~

...

~~A > C > B in terms of rating. Or I may have screwed something up.~~ I did screw something up. A, B, and C have very close rankings, but are different on every run. -- Nfwu
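
The decay quoted above (wins = 0.7 * wins + 0.3 * real) behaves like an exponential moving average. The Python sketch below assumes that `wins` is a stored running score fraction for a pairing and `real` is the latest battle's result; the source does not define them, and the function name `decayed_score` is hypothetical.

```python
def decayed_score(previous: float, latest: float, keep: float = 0.7, fresh: float = 0.3) -> float:
    """Blend the stored value with the newest result, as in the quoted decay."""
    return keep * previous + fresh * latest


# Start from a stored 0.25 and keep feeding in a result of 0.75.
value = 0.25
history = []
for _ in range(20):
    value = decayed_score(value, 0.75)
    history.append(value)

print(history[0], history[-1])
```

Under this assumption each new battle replaces 30% of the stored value, so old results fade quickly and the stored figure tracks recent performance.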