On June 11-12, FIDE held a special meeting in Athens, Greece to discuss the implications of changes to the FIDE rating system, especially the increase of the K-factor. The K-factor controls how rapidly a player's rating responds to their recent results. The increase had been previously agreed to by the General Assembly, and was scheduled to go into effect at the start of July. Ratings experts from around the world were brought together to recommend a course of action to the Presidential Board (which met a few days later in Krakow, Poland), and FIDE Deputy President Georgios Makropoulos chaired the two-day Athens meeting.
Meeting participants (left to right): FIDE Executive Director David Jarrett, FIDE Deputy President Georgios Makropoulos, Nick Faulks, GM Bartlomiej Macieja, Jeff Sonas, GM John Nunn, FIDE Qualification Commission Chairman Mikko Markkula, Vladimir Kukaev. Not shown (because he took the picture): FIDE Treasurer Nigel Freeman.
I should say first of all that I was very impressed with the FIDE decision to hold this meeting, as well as their conduct during the meeting. Whatever the procedural problems in the past that led to this situation, emergency discussion and analysis were certainly called for at this juncture, and it was a very productive meeting. I have nothing but good things to say about FIDE throughout this process.
One point to emphasize is that for this meeting at least, the only two options were to recommend that the Presidential Board accept, or reject, the specific decisions of the General Assembly, so we did not spend too much time debating the "perfect solution". After analyzing the issue at length during our meeting, the majority clearly felt that the doubling of the K-factor was not necessarily an improvement, and that the matter was sufficiently unclear as to require further analysis. In fact my personal feeling, after looking at the data and running many simulations, is to go beyond that statement and say that doubling the K-factor could quite likely be a very poor move. Therefore we recommended to the Presidential Board that it refer the K-factor increase back to committee, while supporting several other changes to the rating calculation, including the move to 6 lists per year (rather than 4 lists per year) as well as increasing the "350-point rule" to a "400-point rule".
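To make concrete what the "400-point rule" touches, here is a minimal sketch in Python of the expected-score calculation with a capped rating difference. The function name and the logistic formula are my own illustration (FIDE's actual regulations use a lookup table derived from the normal distribution), so treat the numbers as approximate rather than official.

```python
def expected_score(rating: float, opp_rating: float, cap: int = 400) -> float:
    """Expected score for a single game, with the rating difference capped.

    The cap is what the "350-point rule" vs. "400-point rule" controls:
    for calculation purposes, a rating gap larger than the cap is treated
    as if it were exactly the cap.
    """
    diff = max(-cap, min(cap, opp_rating - rating))
    # Logistic approximation of the Elo expectancy curve; FIDE's own tables
    # are based on the normal distribution, so values differ slightly.
    return 1.0 / (1.0 + 10.0 ** (diff / 400.0))
```

With the cap at 400, a player rated 600 points below the opponent is treated as being only 400 points below, so the expected score bottoms out at roughly 0.09 rather than dropping further.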
I am currently working on a longer writeup summarizing the technical discussion, as well as additional analysis that was inspired by the brainstorming during the meeting itself and subsequent informal discussions afterwards with other attendees. For now I would just like to point out a few highlights from the meeting:
It might seem strange to have supported the move to 6 lists per year, while at the same time rejecting the K-factor increase. You would think these two decisions in combination would actually make the rating system less dynamic rather than more. This is true, but it is important to recognize the relative magnitude of the two factors. A change to the K-factor makes a big difference, whereas changing from 4 lists/year to 6 lists/year makes a very small difference. Even increasing the K-factor for established players from 10 to 11 would be a severe overreaction to the move from 4 lists/year to 6 lists/year. I looked at the average rating change for each player across various rating formulas, and my calculations indicated that the proper increase in K-factor, to account for the reduced rating "responsiveness" associated with the move from 4 lists/year to 6 lists/year, would be to increase the K-factor from 10 to 10.2. So if we decide to increase the K-factor significantly, it ought to be for reasons other than the move from 4 lists/year to 6 lists/year.
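To see why the two levers differ so much in magnitude, consider the standard Elo update, in which the per-game rating change is simply K times the difference between the actual and the expected score. Doubling K doubles every rating swing, whereas the number of lists per year never appears in the formula at all; it only determines how soon an updated rating becomes the official one used in the following period. The sketch below, in Python with illustrative names of my own choosing, just makes that scaling explicit.

```python
def rating_change(rating: float, opp_rating: float, score: float, k: float) -> float:
    """Per-game Elo change: K times (actual score minus expected score)."""
    expected = 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))
    return k * (score - expected)

# A win against an equally rated opponent is worth K * 0.5 rating points:
for k in (10, 10.2, 20):
    print(k, rating_change(2500, 2500, 1.0, k))  # 5.0, 5.1, 10.0
```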
I felt a particular responsibility for this situation because I advocated moving to a universal K-factor of 24 seven years ago, and this conclusion was used to partially justify the decision to double the K-factor. My earlier analysis included optimizing a single universal K-factor, rather than the three-tiered K-factor that FIDE currently has (where some players have K=10, some players have K=15, and some players have K=25). In fact the vast majority of players do have K=15 already. Additionally my earlier analysis was performed on a smaller, unofficial subset of games from 1994-2001, whereas my more recent analysis covered the entire dataset – all players and all results – for the whole FIDE rating list from 1999-2009. My older model also did not have a mechanism for introducing new provisional players into the system, whereas my latest model actually matches just about all of the FIDE rating regulations. For all these reasons I feel that the recent analysis is more reliable, and I am very grateful that FIDE was so forthcoming with the data for my analysis. I will be writing much more about my conclusions regarding the K-factor as well as my simulation model itself.
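For readers unfamiliar with the three tiers, the selection logic can be sketched roughly as below. The exact thresholds used here (a development K of 25 for roughly the first 30 rated games, K=15 thereafter, and K=10 once a player's published rating has reached 2400) are my paraphrase of the FIDE regulations in force at the time, not something specified in this report.

```python
def fide_k_factor(rated_games: int, ever_reached_2400: bool) -> int:
    """Rough sketch of FIDE's three-tier K-factor; thresholds are paraphrased."""
    if rated_games < 30:
        return 25  # new players whose ratings are still being established
    if ever_reached_2400:
        return 10  # top players: ratings move slowly
    return 15      # the large middle tier most rated players fall into
```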
GM Bartlomiej Macieja consistently held the view that the ratings, which are used for direct qualification into the Candidates and World Cup events, are so important that making the calculations somewhat more complex is a good tradeoff if it results in more accurate ratings. For instance, two players may be so close in their ratings that their positions might be reversed if we adjusted for the number of games played with White vs. Black, or if we used a different K-factor. We discussed this at some length informally after the meeting, and others felt that the rating system currently strikes a nice balance between simplicity and accuracy, and were concerned about introducing further complexity into it. Certainly another way to tackle this problem would be to change how the qualification process works, to make it depend more upon direct over-the-board results and less upon the precise rating calculation. This is a very difficult issue.
GM John Nunn had surveyed several of the top players, including both younger and older players from the top ten, and shared some of the results of that survey. They all seemed to feel that the current system was working fine. Obviously it is in their interest to be conservative because their ratings are already high, but some of these players (especially the younger ones) are still improving, and thus it would also seem to be in their interest to support a more dynamic system. Nevertheless they pretty much all said that they didn't see a reason to change the K-factor.
GM Dmitriy Jakovenko had previously written an article for the ACP website regarding the increase of the K-factor, and this article was discussed extensively during our meeting. His description of different K-factors as reflecting different beliefs about how important your last 20 games are, compared to all previous games, was quite useful in conveying the impact of the K-factor. It was a very helpful perspective and provided a subjective assessment that people could make for themselves, instead of just having to trust the analysis of people like me with all our statistics. The subjective assessment shouldn't be the only one, but it absolutely helps in framing the discussion.
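Jakovenko's framing can be turned into a rough back-of-the-envelope calculation. If the Elo update is linearized, each new game blends in roughly a fraction K/400 of that game's performance, so the rating behaves approximately like an exponentially weighted average, and the combined weight of the last 20 games is about 1 - (1 - K/400)^20. This is my own approximation of the idea, not necessarily the calculation in Jakovenko's article:

```python
def weight_of_recent_games(k: float, n_games: int = 20) -> float:
    """Approximate share of the current rating determined by the last n games,
    treating the Elo update as an exponentially weighted moving average."""
    alpha = k / 400.0
    return 1.0 - (1.0 - alpha) ** n_games

for k in (10, 15, 20, 24):
    print(k, round(weight_of_recent_games(k), 2))
# roughly 0.40 at K=10, 0.53 at K=15, 0.64 at K=20, 0.71 at K=24
```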
Deputy President Makropoulos did much more than just run the meeting, and had several insightful points that motivated me to reassess some of my analysis. For instance he didn't completely agree with my approach of evaluating the "accuracy" of rating systems by comparing expected score against actual score, because we expect improving players to outperform their rating and thus perhaps some of that should be considered "improvement" in the players rather than just error in the rating itself. He was also resistant to the idea that we must look back to Professor Elo's writings and opinions to determine the ideal course of action, preferring instead that we start from the assumption that the current situation is the default that players are accustomed to. I agree that the burden of proof should fall upon people advocating a change in the current system, even if that change would involve moving more toward what Elo himself originally envisioned.
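For concreteness, the kind of accuracy comparison being debated can be written down in a few lines. The mean absolute error used here is just one reasonable choice, not necessarily the statistic from the analysis discussed above, and Makropoulos's objection applies to any such measure: for an improving player, part of the gap is genuine improvement rather than rating error.

```python
def mean_prediction_error(games) -> float:
    """Average absolute gap between expected and actual score over many games.

    `games` is an iterable of (rating, opp_rating, actual_score) triples,
    with actual_score being 0, 0.5 or 1.
    """
    errors = []
    for rating, opp_rating, actual in games:
        expected = 1.0 / (1.0 + 10.0 ** ((opp_rating - rating) / 400.0))
        errors.append(abs(actual - expected))
    return sum(errors) / len(errors)
```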
Finally, one topic that received considerable discussion was "rating inflation" – what does it actually mean, is there evidence of it, and what causes it? This was relevant to the K-factor discussion because a higher K-factor does seem to increase "inflation", depending on your definition of the word, of course! In addition, if top players' ratings are really increasing too fast, it could devalue the Grandmaster title, which is based upon an absolute rating cutoff. Inflation was certainly a topic where there was some disagreement among the experts regarding the fundamental answers to all three of the above questions, and it requires further investigation. I will be writing much more about this, and I anticipate lively discussion among the mathematically-inclined readers in particular.
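One way to make the first of those three questions concrete is to track the average rating of a fixed slice of the list, say the top 100 players, from one list to the next and see whether it drifts upward. The sketch below is only one possible operationalization, and part of the disagreement in Athens was precisely about whether a definition like this is the right one.

```python
def top_n_average(ratings, n: int = 100) -> float:
    """Average rating of the top-n players on a single rating list.

    Plotting this value list by list is one simple (and debatable)
    proxy for rating inflation.
    """
    top = sorted(ratings, reverse=True)[:n]
    return sum(top) / len(top)
```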
Jeff Sonas in Athens
Press release
Prior to the Presidential Board in Krakow, a meeting was held in Athens to discuss the proposed changes to the rating system and in particular the increase in the K factor. A small group of experts gathered to discuss the matter and to make recommendations, if needed, regarding the decisions taken in Dresden.
The two-day meeting was chaired by the FIDE Deputy President, Georgios Makropoulos, and included the FIDE Treasurer Nigel Freeman, FIDE Executive Director David Jarrett, FIDE Qualification Commission Chairman Mikko Markkula, FIDE Qualification Commission Councillor Nick Faulks, Jeff Sonas from California, GMs John Nunn and Bartlomiej Macieja, plus Vladimir Kukaev from the Ratings Office in Elista.
The meeting focused on the K factor issue but also dealt with a number of other matters including possible inflation in the rating system. The meeting supported the move to 6 lists per year and the increase of the ‘350 point rule’ to ‘400 point rule’ but felt that the increase in the K factor should be referred back to the Qualification Commission and this recommendation was agreed at the Presidential Board.
Jeff Sonas and Bartlomiej Macieja both produced interesting and detailed material and John Nunn had carried out a survey of top players. FIDE is indebted to them for their hard work in the preparation for this meeting. In addition, the ACP forwarded an important paper by GM Dmitriy Jakovenko.
It is intended that this group continues to cooperate and meet again next year.
Source: FIDE
Rating and K-factor: wrapping up the debate 11.05.2009 – The discussion regarding the K-factor – the rate at which ratings go up or down when they are calculated – reaches its climax with a wrap-up article by Dr John Nunn, grandmaster and mathematician, who evaluates the arguments that have been presented by the different parties. After this it is up to FIDE, which has already initiated positive steps to settle the matter. Final installment.
Thompson: Leave the K-factor alone! 07.05.2009 –
Rating debate (6): Here comes the proof! 04.05.2009 – "I couldn't believe my eyes when I read GM John Nunn's opinion," writes GM Bartlomiej Macieja, the original initiator of this debate. He presents proof for the fact, challenged by Nunn, that the K-factor and the frequency of rating lists are related to one another. Other readers have also weighed in; a wrap-up reply by John Nunn will appear soon. Long, interesting read.
Rating debate: is 24 the ideal K-factor? 03.05.2009 – FIDE decided to speed up the change in their ratings calculations, then turned more cautious about it. Polish GM Bartlomiej Macieja criticised them for balking, and Jeff Sonas provided compelling statistical reasons for changing the K-factor to 24. Finally John Nunn warned of the disadvantages of changing a well-functioning system. Here are some more interesting expert arguments.
Nunn on the K-factor: show me the proof! 30.04.2009 – With the debate raging over FIDE's decision to change or not to change the K-factor used in calculating players' ratings, we are glad to receive an important message from our voice-of-reason grandmaster. Dr John Nunn says "there seems no real evidence that K=20 will result in a more accurate rating system, while there are a number of risks and disadvantages." His explanation and reader feedback.
Macieja: the FIDE General Assembly must decide 30.04.2009 – "Using the FIDE Laws of Chess terminology, the move has been made, and no takeback is any longer possible." Polish GM Bartlomiej Macieja is insisting that the decision to increase the K-factor in rating calculations is not just necessary and good in the current tournament situation, it is in fact irrevocable and can only be legally changed by the body that passed it. Open letter.
FIDE: We support the increase of the K-factor 29.04.2009 – Yesterday we published a letter by GM Bartlomiej Macieja asking the World Chess Federation not to delay the decision to increase the K-factor in their ratings calculation. Today we received a reply to Macieja's passionate appeal from FIDE, outlining the reasons for the actions. In addition interesting letters from our readers, including one from statistician Jeff Sonas. Opinions and explanations.
Macieja: The increase of the K-factor is essential 28.04.2009 –
FIDE: Anand-Topalov bidding, K-Factor 27.04.2009 – The World Chess Federation has opened the bidding for the next World Championship match between Viswanathan Anand and Veselin Topalov, scheduled for April 2010. At the same time FIDE has reacted to concerns of players and decided not to simply change the K-Factor in its rating calculation, but to in fact publish two parallel lists for a year and then review the results. Press releases.