Ratings system improvements #114
Comments
It sounds like we need metric(s) of what constitutes a good rating system. For the record, there was a user on Arimaa.com who recently single-handedly changed the ratings of a number of bots by more than a hundred points each by losing repeatedly to a weak bot, then winning repeatedly against stronger bots. Some possible metrics:

- How easily a single user can have a disproportionate impact on another user's rating.
- How CPU-intensive the ratings are to compute.
- Whether two players' ratings tell you something about the probability that one of them will win against the other (see the sketch below).
- Whether ratings are stable over time: the strength of a 1400 player today should be comparable to the strength of a 1400 player last week or last year.

Also, if a player wins repeatedly against a very weak opponent, should the winner's rating rise arbitrarily high or stabilize at some value? There's also a policy element here: the Free Internet Chess Server has explicit policies prohibiting certain types of cheating and rating manipulation (for example, rules 12, 13, and 15). For what it's worth, I would say: why not just implement plain old regular WHR? If you want to allow for retroactively unrating a game, I don't know of a better solution than periodically recomputing every single player's rating.
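On the win-probability point, here is a minimal sketch, assuming the conventional Elo-style logistic curve with a 400-point scale (the site's actual model may differ), of what "ratings tell you something about the probability of winning" means in practice:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo/Bradley-Terry
    logistic model with the conventional 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Example: a 1500-rated player is expected to score about 0.64
# against a 1400-rated player.
print(expected_score(1500, 1400))  # ~0.64
```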
FWIW, also look at the line notes at ded9d1a for some more thoughts on the computational end of this. Of your metrics:
As I have a lot of personal experience working with ratings-related systems, this is probably something I will spend a lot of time tinkering with down the road. Another consideration to keep in mind in the design of the rest of the site, to facilitate later rating system improvements: "Have at least some information on the new players entering the system." In particular, we will probably want to handle each of the following cases differently:
I'm out of my league here. I understand Glicko and WHR conceptually, but not well enough to implement them myself, and I definitely don't understand the math.

When I used to play games on Yahoo Games, I remember your rating was marked as "provisional" until you had played a certain number of games. If I remember correctly, they didn't publicly display the ratings of provisional players. (This could avoid the problem clyring mentioned, where a new player with high uncertainty splits his first two games against the same opponent and his rating jumps by a large amount.)

If a new player is strong, it could be useful to let them start with a high rating provided they meet some criteria, like defeating a few high-rated bots or solving a few tactics puzzles. This is similar to colleges that have a foreign language requirement and allow you to place out of the requirement by demonstrating proficiency.

I could also imagine setting things up so that for the first X games a new player's rating is updated normally, but the opponent's rating change is calculated retroactively after the system has a better estimate of the new player's true strength. (I don't know whether that's a good idea or not.)
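A minimal sketch of the provisional-rating idea, assuming a hypothetical `Player` structure and an arbitrary 20-game cutoff (both are illustrative, not anything the site currently has):

```python
from dataclasses import dataclass

# Hypothetical cutoff; the real threshold would be a site policy decision.
PROVISIONAL_GAMES = 20

@dataclass
class Player:
    name: str
    rating: float
    rated_games: int

def displayed_rating(player: Player) -> str:
    """Hide the numeric rating until the player has finished enough rated
    games, roughly what Yahoo Games reportedly did with 'provisional' tags."""
    if player.rated_games < PROVISIONAL_GAMES:
        return "provisional"
    return str(int(round(player.rating)))

# Example:
print(displayed_rating(Player("newcomer", 1523.7, 5)))   # provisional
print(displayed_rating(Player("regular", 1523.7, 40)))   # 1524
```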
Implement a better ratings system. Something WHR-based would be nice, including support for retroactively taking into account games that are unrated after being played. This would probably involve some work thinking about how to structure the back-end update process, and some thought on how to deal with unusual user behaviors, how bots should differ, global rating anchors and adjustments, etc.
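A rough sketch of the back-end update process this might imply, assuming a periodic batch job that re-fits the full game history; the function names below are placeholders, not an existing API:

```python
import time

# Placeholder interfaces: the real site would read games from its database
# and write ratings back. These are stubs for illustration only.
def load_rated_games() -> list:
    """Return the full list of rated games, excluding any game that was
    retroactively unrated after being played."""
    raise NotImplementedError

def fit_whr(games: list) -> dict:
    """Fit WHR (or a similar whole-history model) over all games and
    return a mapping from player to current rating."""
    raise NotImplementedError

def store_ratings(ratings: dict) -> None:
    raise NotImplementedError

def rating_update_loop(interval_seconds: int = 3600) -> None:
    """Periodically recompute every player's rating from scratch.

    Because each pass re-fits the entire history, retroactively unrating a
    game only requires that load_rated_games() stop returning it; the next
    pass picks up the change automatically."""
    while True:
        store_ratings(fit_whr(load_rated_games()))
        time.sleep(interval_seconds)
```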