Championship Chessmetrics Analysis
By Jeff Sonas. Jeff is a statistical chess analyst who has written dozens of articles 
        since 1999 for the Kasparov Chess website. In recent months he has invented 
        a new rating system and used it to generate 150 years of historical chess 
        ratings for thousands of players. You can explore these ratings on his 
        Chessmetrics website. 
        He is also V.P. of Engineering for Ninaza, providing web-based medical 
        software for the health care industry.
INTRODUCTION
We are cursed to live in "interesting times" in the chess world. 
  We have two different organizations sponsoring their own versions of the World 
  Championship, and the top-rated player in the world wants no part of either 
  championship. Countless proposals and unification plans have been suggested, 
  and rejected, and there is no end in sight.
I am a relatively weak chess player, but a very strong computer programmer 
  and statistician. Most of all, I am a big fan of chess, and I want to help. 
  I have little to contribute in the arenas of business plans, organizational 
  details, or negotiations, but nevertheless I do have something quite useful 
  to offer. I have developed some very sophisticated statistical tools that enable 
  me to objectively explore various chess topics, and in recent weeks I have devoted 
  considerable time to analyzing thousands of different world championship formats. 
  I would like to share the results of that analysis.
I am not affiliated with any chess organization, and I have no particular agenda 
  to promote. What I do have, instead, is the distinct impression that people 
  are making important decisions about the world championship, without adequate 
  information. One possible explanation is that the decision-makers are simply 
  unaware that it is often possible to use statistics to draw reasonably sound 
  conclusions about some of these topics. Or maybe they don't even care about 
  objective truth, and simply want to promote their own agendas or improve their 
  own situations. I'm going to adopt the role of the optimist here, and assume 
  that many people would love to have an objective analysis of the various options 
  available for the world chess championship, but that nobody ever thought to 
  ask for one. Well, here's your analysis...
Based on my calculations, I can now tell you whether one world championship 
  format is "objectively better" than another one, and I can explain 
  why. If you describe a typical world championship format to me, I can tell you, 
  with reasonably good accuracy, the average percentage chance of the strongest 
  player in the world winning the championship cycle. I call that percentage the 
  "effectiveness" of a world championship format.
For instance, it turns out that the 128-player FIDE World Championship has 
  an "effectiveness" of 38%, which means that 38% of the time, it will 
  be won by the strongest player in the world (assuming no boycotts). In other 
  words, five times out of eight the strongest player will fail to win the tournament. 
  The Einstein Group's world championship cycle (which will debut in July in Dortmund, 
  Germany) has a much better effectiveness of 50%, which still means that the 
  best player will be champion only half of the time. By comparison, a slightly 
  modified version of Yasser Seirawan's "Fresh Start" proposal is extremely 
  effective, at 67%. In fact, none of the 13,000 formats under consideration managed 
  to break the 70% barrier, so Yasser's proposal is almost maximally effective.
Through statistical analysis combined with random simulation, I have analyzed 
  13,000 different world championship formats in great detail, including Swiss 
  tournaments, knockout tournaments, long matches, short matches, round-robin 
  tournaments of various types, qualifier tournaments, and much more. I have tried 
  to include all of the formats which have been used historically or which are 
  currently under consideration, as well as many experimental formats. Out of 
  those 13,000 formats, the FIDE World Championship format is ranked #12,671, 
  which means that it is in the bottom 5% in effectiveness. Although the Einstein 
  Group format is clearly better, a 50% effectiveness is still not very good: 
  it ranks #10,945 on my list. The modified Seirawan proposal, by comparison, 
  is way up at #345.
After that introduction, you might be chomping at the bit to learn what format 
  is #1 on my list. However, I'm not going to tell you just yet, because "effectiveness" 
  is not the only important factor. Without giving those other factors their due 
  consideration, it doesn't make sense to talk yet about what is "best" 
  or even "objectively best". 
THE FOUR IDEAL CHARACTERISTICS OF A WORLD CHAMPIONSHIP
In evaluating various world championship formats, I believe there are four 
  important characteristics to consider. I want to introduce a little bit of terminology 
  here, in an attempt to make all this easier to talk about. An ideal world championship 
  format would be "practical", "effective", "inclusive", 
  and "unbiased". Let me briefly cover what I mean with each of those 
  four words.
(1) "Practical" – The top players must be willing to 
  participate, the sponsors must be willing to sponsor the tournaments and/or 
  matches, and the playing sites must be available. Thus, World Championship formats 
  that include relatively shorter events, or just one event, would be more "practical" 
  than multi-stage formats or formats with very long matches or tournaments. And, 
  of course, World Championship formats with greater prize money will also be 
  more attractive to the players, although there are other important considerations 
  for most players.
(2) "Effective" – The overall purpose of the World Championship 
  is to allow the strongest player (whoever that may be) to demonstrate their 
  superiority by winning the championship. For instance, World Championship formats 
  with inadequate length or inefficient structure will frequently be won by weaker 
  players, whereas more effective formats would provide that strongest player 
  sufficient maneuvering space (even if they lose a game or two) to demonstrate 
  their superiority by winning the championship.
(3) "Inclusive" – It is easy to tell which players have 
  been the most successful in the recent past; just consult the rating list. However, 
  ratings are known to be somewhat inaccurate as measures of players' actual strength, 
  and it is quite conceivable that the strongest player is not actually the highest-rated 
  player. Thus it is typically a good idea to include several players in the World 
  Championship cycle, to give more people a chance to demonstrate their ability. 
  However, the tricky part is that many super-inclusive formats, such as the FIDE 
  championships, are extremely ineffective at determining the strongest player. 
  Nevertheless, it is still possible (though challenging) to be both "inclusive" 
  and "effective" simultaneously.
(4) "Unbiased" – Traditionally, specific players in the 
  world championship cycle have been given certain advantages, due to their past 
  accomplishments. For instance, the defending champion might be seeded directly 
  into the final match, or a recent semifinalist might automatically qualify as 
  a Candidate without needing to play in an Interzonal. Other advantages have 
  included "draw odds", and the champion's right to an automatic rematch, 
  and first-round byes for high-rated players (as in the earlier 100-player FIDE 
  knockout tournaments). These "biases" are often perceived as being 
  unfair to everyone else, and should be avoided when possible. However, a "bias" 
  is not inherently bad; it is simply an advantage granted to a particular player. 
  It can be one way to make an event more "effective" without having 
  to make it impractically long.
In all fairness to the FIDE and Einstein Group approaches, they do have their 
  important advantages. The FIDE approach is extremely inclusive and unbiased, 
  and reasonably practical (as long as there is sufficient funding for such an 
  event). The Einstein Group's format is not particularly inclusive, though it 
  has the large practical advantage that it bears some resemblance to the traditional 
  way the championship has been run, and thus its winner might indeed be more 
  accepted by the public, as a legitimate champion, than the FIDE champion often 
  has been.
THE FIDE CHAMPIONSHIPS
The FIDE championship format takes a mere 22 days of play to reduce 128 competitors 
  down to one champion. It is very inclusive, and has no biases in favor of any 
  specific participant. For comparison, I identified 72 other formats that are 
  22 days or shorter, and also have no biases. Out of these possibilities, the 
  FIDE format (38% effectiveness) is right in the middle, ranked 37th out of 73. 
  Most options are in the 30%-40% range, and only one format managed to finish 
  above 50%. If FIDE were to invite just the eight top-rated players to its knockout 
  tournament, with two rounds of 6-game matches and then a 10-game final (22 playing 
  days), it would have a 52% chance to be won by the strongest player in the world, 
  slightly better than the Einstein Group approach. Another good unbiased and 
  practical option would be to have four simultaneous single-round-robin tournaments 
  with 10 players each (9 playing days), with the four winners advancing to two 
  rounds of knockout matches (4-game semifinal matches and then an 8-game final 
  match). That approach would be significantly more inclusive and only slightly 
  less effective (46% effectiveness) than the eight-player knockout.
When there are no biases introduced (i.e., nobody gets automatically seeded 
  into any later stage, and everyone is treated equally), a knockout event seems 
  to be far better than a Swiss. For instance, the options to take the top two 
  or four finishers from a 13-round Swiss tournament and then play short matches 
  between those top finishers, turn out to be very ineffective, often lower than 
  20%. However, as you will see in a little while, a format based on a Swiss qualifier 
  can actually be considerably more effective than a comparable format with a 
  knockout qualifier. This discovery greatly surprised me, and I will go into 
  more detail further down, when I discuss the Fresh Start proposal. However, 
  first let's finish talking about the FIDE and Einstein Group approaches.
The major criticism of the FIDE championship, of course, is that the individual 
  matches are too short. A single loss can mean almost certain elimination. Everyone 
  loses a game now and then, so it seems an overly drastic punishment to be eliminated 
  because you happened to have a minus score over the span of two games. The 2002 
  tournament made a half-hearted attempt to address this by lengthening the final 
  match from 6 games to 8 games. As I mentioned before the tournament, that is 
  hardly much of an improvement (it raised the effectiveness by 0.2%). It would 
  have been better (39% effectiveness) to use those extra two days to make the 
  quarterfinal round 4 games long, instead, although of course even better would 
  be a 4-game quarterfinal AND an 8-game final (41% effectiveness). 
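To make the "matches are too short" criticism concrete, here is a minimal sketch (not the simulation behind the numbers in this article) of how the stronger player's chance of advancing grows with match length. It assumes independent games with fixed win/draw/loss odds and a fair coin flip to settle a drawn match; the function name `advance_probability` and the 30%/50%/20% odds are invented for illustration.

```python
def advance_probability(p_win, p_draw, p_loss, games):
    """Chance that a player advances from a match of `games` games.

    Assumes independent games with fixed odds, and a fair coin flip
    if the match ends level (a simplifying assumption).
    """
    # dist maps (wins minus losses) -> probability after the games so far
    dist = {0: 1.0}
    for _ in range(games):
        nxt = {}
        for diff, prob in dist.items():
            for step, p in ((1, p_win), (0, p_draw), (-1, p_loss)):
                nxt[diff + step] = nxt.get(diff + step, 0.0) + prob * p
        dist = nxt
    ahead = sum(prob for diff, prob in dist.items() if diff > 0)
    level = dist.get(0, 0.0)
    return ahead + 0.5 * level


# Hypothetical per-game odds for the stronger player: 30% win, 50% draw, 20% loss.
for games in (2, 4, 6, 10, 20):
    print(games, round(advance_probability(0.30, 0.50, 0.20, games), 3))
```

Under these made-up odds, the stronger player survives a 2-game match only 57.5% of the time, and has to do that round after round. Longer matches raise the figure, which is the effect the percentages above are measuring.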
Another obvious option would be to change all of the 2-game matches into 4-game 
  matches. Of course, this would have the unfortunate result of adding at least 
  10 days to the length of the event if it stayed a 128-player tournament. To 
  compensate, the number of players could be reduced from 128 down to 64. Thus 
  with 4-game matches throughout, leading to an 8-game final (32 playing days), 
  the effectiveness would rise to 42%. 
Unsurprisingly, the knockout tournament would become more and more effective, 
  as we make it less and less inclusive and lengthen various rounds. If we were 
  to halve the number of players again, a reasonably inclusive knockout tournament 
  (32 players) could still be held, with four-game matches throughout, leaving 
  room for either an 8-game final match (43% effectiveness) or a longer 14-game 
  match (44% effectiveness). With sixteen players, the effectiveness could be 
  improved to 46% by 4-game matches and a 14-game final. Finally, as I already 
  mentioned, the most effective unbiased tournament would be an eight-player knockout 
  tournament with six-game quarterfinal and semifinal matches, with a ten-game 
  final, an overall effectiveness of 52%.
THE EINSTEIN GROUP CHAMPIONSHIPS
Now let us turn to the Einstein Group championship format. This is an amazing 
  attempt to compress an entire Candidates Cycle and World Championship match 
  into a mere 30 days of play. The format has come under severe criticism because 
  the round-robin preliminaries and the subsequent two rounds of four-game matches 
  are perilously short. In its current state, the only significant bias involved 
  is that the defending champion gets to play in the final. So, I considered all 
  of my formats lasting 30 or fewer playing days, with the single bias that the 
  champion is seeded into the final automatically (assuming rapid tiebreaks throughout). 
  There were 208 different formats, and the Einstein Group approach (50% effectiveness) 
  ranked 148th, placing it in the bottom third.
The most effective approach (62% effectiveness), within these constraints, 
  would be to only invite the four top-rated players (other than the champion). 
  They would then play two rounds of six-game knockout matches to get from four 
  players down to one, and the winner would play the defending champion in an 
  18-game match. Even just a 14-game match would still be a 61% effectiveness, 
  and better than any other approach (given the constraints). If it were necessary 
  to include eight candidates, plus the champion (as is the case in Dortmund), 
  the 30 days would be better spent in three rounds of 4-game knockout matches, 
  followed by an 18-game match against the defending champion (60% effectiveness).
If it were desirable to be even more inclusive (for instance so that a "wildcard" 
  local participant like Christopher Lutz could be chosen, without impacting the 
  odds too significantly), you could have two simultaneous 10-player single-round-robins, 
  where the two winners play each other in a 4-game match, and the winner plays 
  the defending champion in a 16-game match (56% effectiveness). Or you could 
  even go the super-inclusive route, with a 196-player 13-round Swiss like Yasser 
  Seirawan suggests. The two top finishers could play each other in a 4-game match, 
  and the winner challenges the defending champion in an 8-game match. That would 
  only last 25 days, and would still have an effectiveness of 55%. All of these 
  options are significantly more effective than the actual format chosen by the 
  Einstein Group, while still lasting no more than 30 playing days.
Of course, none of those options resemble the format that will actually happen 
  in Dortmund. Are there less significant changes that would still greatly improve 
  the effectiveness? Absolutely. For instance, the pair of 4-game knockout matches 
  is hazardous. Even in a four-game match, it is very difficult to recover from 
  a loss. How about getting rid of one of those matches? Instead of picking the 
  top two players from each preliminary round-robin, you could just pick the top 
  finisher from each round-robin. Then a single four-game match between those 
  two winners, followed by the same 16-game final against the defending champion, 
  would make the event four days shorter, and it would raise the effectiveness 
  from 50% to 56%. It would be even better (60% effectiveness) to make use of 
  the whole 30 days by playing matches of 10 and then 14 games (rather than 4 
  and then 16).
Finally, there was a way to be even more effective, within the 30-day constraint, 
  although it did involve introducing another bias into the world championship 
  cycle. It always helps the effectiveness of a format if you allow the highest-rated 
  player to automatically bypass the qualifier event. For instance, you could 
  have the highest-rated player compete in a 10-game match against the winner 
  of a 4-player double-round-robin, and the winner would challenge the defending 
  champion in a 14-game match. That would be a 64% effectiveness, and it seems 
  likely that Garry Kasparov would have been more amenable to that option, although 
  of course I have no idea what went on with the negotiations. I should point 
  out that all of these numbers assume that nobody declines an invitation. With 
  neither Kasparov, Viswanathan Anand, nor Ruslan Ponomariov participating, the 
  effectiveness of the Einstein Group approach, in this particular cycle, will 
  of course be way lower than 50%. It will probably be more like 20% or 25%, since 
  it is reasonably likely that the best player in the world is either Kasparov, 
  Anand, or Ponomariov, and there is less than a 50-50 chance that the best player 
  in the world is actually participating in the championship cycle at all.
WHERE THESE NUMBERS COME FROM
I don't expect you to blindly accept all of these numbers. If you're still 
  paying attention by this point, you might be wondering whether I'm just making 
  up the numbers to serve my own purposes, or if I actually calculated them somehow. 
  I don't want to bog you down with all of the gory details, but here is a brief 
  summary of what I did.
I didn't want my conclusions to be skewed by any special characteristics of 
  the current rating list, such as an unusually large gap between #2 and #3, or 
  between #3 and #4. So I decided that my calculations would be based upon a "representative 
  rating list", rather than an actual one. I did some analysis of rating 
  list trends over the past few decades, and came up with a way to randomly simulate 
  millions of "typical" rating lists. Thus sometimes there is a huge 
  gap between #1 and #2, and sometimes it's very crowded at the top, with no clear 
  leader. Sometimes the champion isn't even the top-rated player.
However, it is also important to acknowledge that ratings are inaccurate. They 
  are merely estimates of players' true strengths, and those estimates have errors 
  associated with them (a standard deviation of about 50, if you're interested). 
  Somebody might have a rating of 2700, but their true strength could easily be 
  2650 or 2750. So, for each random rating list, I had to simulate a "true 
  strength" for each player. The one player with the highest "true strength" 
  is that elusive "strongest player in the world", whom we are trying 
  to identify through the use of an effective world championship format. Thus 
  sometimes the "strongest player in the world" might not be the world 
  champion or the top-rated player; they might even be rated #8 or #10 or #20 
  in the world, though it's unlikely. That is why it is important to be inclusive 
  with your world championship cycle; if you just use the top two or three players, 
  you might easily leave out the strongest player.
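The idea of a hidden "true strength" can be sketched in a few lines. This is only an illustration under stated assumptions: the rating error is taken to be normally distributed with the standard deviation of 50 mentioned above, and the ten-player rating list and the function name `top_rated_is_strongest` are invented for the example, not taken from the actual analysis.

```python
import random


def top_rated_is_strongest(ratings, rating_sd=50.0, trials=20000, seed=1):
    """Estimate how often the top-rated player also has the highest true strength."""
    rng = random.Random(seed)
    top_rated = max(range(len(ratings)), key=lambda i: ratings[i])
    hits = 0
    for _ in range(trials):
        # Assumed model: true strength = published rating + normal error.
        strengths = [r + rng.gauss(0.0, rating_sd) for r in ratings]
        strongest = max(range(len(strengths)), key=lambda i: strengths[i])
        hits += strongest == top_rated
    return hits / trials


# Hypothetical top-ten rating list, invented for illustration only.
ratings = [2800, 2770, 2750, 2740, 2730, 2720, 2710, 2700, 2690, 2680]
print(top_rated_is_strongest(ratings))
```

Even with a clear rating leader, the printed fraction falls well short of 100%, which is the argument for not restricting the cycle to the top two or three names on the list.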
Armed with the ratings and true strengths of everyone on a simulated rating 
  list, I could then proceed to simulate a world championship cycle. I tried various 
  types of qualifier formats, different numbers of simultaneous qualifying tournaments, 
  allowing the top-rated one or two players to bypass the qualifier, different 
  ways of resolving tied matches, and/or allowing the champion to enter the cycle 
  at various stages. The breakthrough was my realization that all popular world 
  championship formats could in fact be expressed as an "Interzonal" 
  qualifier followed by a series of knockout matches. This allowed me to tackle 
  the problem systematically, rather than just trying a few options which I thought 
  might be "ideal". For instance, the FIDE championships were treated 
  as eight different qualifier tournaments (each of which was a 16-player knockout 
  event won by a single player) and then a series of knockout matches among the 
  final eight players. The Einstein Group championships were treated as two simultaneous 
  qualifier tournaments (each of which was a 4-player double-round-robin tournament 
  that qualified two players), followed by three rounds of knockout matches, 
  with the champion entering the cycle in the third and final round of knockouts. 
  And so on. For each simulated championship cycle, I could see whether the "strongest 
  player" actually won, and over an average of many thousands of iterations 
  for each format, that would tell me the "effectiveness" of each world 
  championship format.
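The overall procedure can be sketched as a small Monte Carlo loop. What follows is a simplified stand-in, not the code used for this article: it assumes a fixed hypothetical rating list instead of randomly generated ones, uses the standard Elo expected-score formula with no draws as the game model, and replaces rapid tiebreaks with a single sudden-death game. All function names are invented for the example.

```python
import random


def expected_score(strength_a, strength_b):
    """Standard Elo expected score for A against B."""
    return 1.0 / (1.0 + 10 ** ((strength_b - strength_a) / 400.0))


def play_match(a, b, strengths, games, rng):
    """Return the winner (a or b) of a match; games have no draws (assumption)."""
    p = expected_score(strengths[a], strengths[b])
    wins_a = sum(rng.random() < p for _ in range(games))
    wins_b = games - wins_a
    if wins_a == wins_b:
        # Stand-in for a rapid tiebreak: one sudden-death game at the same odds.
        return a if rng.random() < p else b
    return a if wins_a > wins_b else b


def knockout_winner(field, strengths, games, rng):
    """Single-elimination bracket; len(field) must be a power of two."""
    while len(field) > 1:
        half = len(field) // 2
        # Pair first with last, second with second-to-last, and so on.
        field = [play_match(field[i], field[-1 - i], strengths, games, rng)
                 for i in range(half)]
    return field[0]


def effectiveness(ratings, games, rating_sd=50.0, trials=4000, seed=1):
    """Fraction of simulated cycles won by the player with the highest true strength."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        strengths = [r + rng.gauss(0.0, rating_sd) for r in ratings]
        strongest = max(range(len(ratings)), key=lambda i: strengths[i])
        # Seed the bracket by published rating, highest first.
        field = sorted(range(len(ratings)), key=lambda i: -ratings[i])
        hits += knockout_winner(field, strengths, games, rng) == strongest
    return hits / trials


# Hypothetical eight-player field, invented for illustration only.
ratings = [2800, 2770, 2750, 2740, 2730, 2720, 2710, 2700]
print("2-game matches: ", effectiveness(ratings, games=2))
print("24-game matches:", effectiveness(ratings, games=24))
```

Because the game model here ignores draws and the rating list is fixed, the printed figures should not be compared with the percentages quoted in this article; the sketch only shows the shape of the calculation.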
YASSER SEIRAWAN'S "FRESH START" PROPOSAL
I have to admit that I expected my analysis to reveal a searing criticism of 
  Yasser Seirawan's "Fresh Start" proposal, with its Swiss qualifier. 
  Swiss tournaments are generally perceived to be very ineffective, especially 
  compared to knockout tournaments of comparable size. I expected that I would 
  have to conclude that "it's all well and nice to play three rounds of long 
  matches at the end of your world championship cycle, but what good is that when 
  the majority of Candidates were chosen in a lottery?"
I was even advised to save myself the effort of trying to program Swiss tournaments 
  in my simulations, since they were obviously so ineffective. A very prominent 
  arbiter told me, "You do not need that for your simulation. It is perfectly 
  obvious, if you want to obtain a winner who has the highest rating prior to 
  the event, then the current FIDE knockout system is best." However, I really 
  wanted to compare the FIDE and Einstein approaches against Yasser's proposal 
  (which is based upon a Swiss qualifier), so I ultimately decided to include 
  the Swiss qualifiers in my analysis.
Well, guess what? Out of the 13,000 world championship formats I evaluated, 
  number TWO on the list, with an effectiveness of 69.4%, was the following structure: 
  The world champion and the two highest-rated players (other than the world champion) 
  bypass the qualifier and automatically become Candidates. They are joined by 
  the top five finishers from a 196-player 13-round Swiss. Those eight players 
  then play three rounds of knockout matches (16-game quarterfinal, 20-game semifinal, 
  and 20-game final).
Does that sound familiar? It's almost exactly what Yasser Seirawan suggests 
  for the next world championship cycle. He actually suggests a 10-game quarterfinal, 
  a 14-game semifinal, and a 20-game final, and that shorter format (67% effectiveness) 
  shows up at #181 on my list (still in the top 2% of formats). And there are 
  details in his proposal about tiebreaks that were not included in my overall 
  analysis (though I do cover them further down); I assumed rapid tiebreaks everywhere 
  for the eight-player candidate cycles, since otherwise the calculations would 
  have taken months to run all the possibilities! And Yasser doesn't actually 
  say that it should be the two highest-rated players who bypass the qualifier; 
  he specifically names Garry Kasparov and Ruslan Ponomariov as the two players.
The number one format on my list, with an effectiveness of 69.5%, was actually 
  very similar to number two. In this scenario, only the top finisher from that 
  same Swiss tournament qualifies, to play the #1-rated player in a 20-game match. 
  The winner then plays the defending world champion in a 20-game match for the 
  title. That is the single most effective world championship that I could find, 
  but unfortunately it includes two biases: the world champion gets automatically 
  seeded into the final round, AND the top-rated player doesn't have to play in 
  the Swiss. Yasser's proposal would be somewhat less biased, as it is less of 
  an advantage to be an "automatic Candidate" when there are eight Candidates 
  rather than two, and of course in his proposal the defending world champion 
  does not get automatically seeded into the final match.
Since we're on the topic, I should point out that the #3 format on my list 
  has actually been tried, sort of, in the world championship. In 1959 Mikhail 
  Tal won an eight-player quadruple-round-robin tournament in Yugoslavia, allowing 
  him to play a 24-game match against the defending champion. In 1962 Tigran Petrosian 
  won an identical format in Curacao. And that same format is #3 on my list, with 
  an effectiveness of 69.3%, although it says that the winner of the round-robin 
  should face the top-rated player rather than the defending champion. Thus if 
  the defending champion was not the top-rated player, the champion would have 
  to play in (and win) the round-robin tournament for the opportunity to play 
  a championship match against the top-rated player. Also, it's not strictly like 
  the 1959 and 1962 Candidates tournaments, because back then the eight players 
  came from Interzonals, whereas this format recommends just taking the players 
  from the top of the rating list. Presumably the bias in favor of the top-rated 
  player is too much to make this format acceptable, although it is clearly very 
  effective.
Of course, there is no real difference between 69.3% and 69.5%. The point is 
  not so much that nine of the top twelve formats happened to have Swiss qualifiers. 
  The real dazzler is that a Swiss qualifier can with any seriousness be called 
  "optimal". Conventional wisdom tells us that knockout tournaments 
  are more effective than Swiss tournaments of comparable length. It says that 
  knockout tournaments work better, because the strongest players are in control 
  of their own destiny, and nobody can finish ahead of you unless you are actually 
  knocked out by someone. By contrast, in a Swiss you might do well but someone 
  else might happen to do even better.
Why is conventional wisdom wrong? Well, I have two possible explanations. One 
  has to do with information theory. In a multi-stage event such as a knockout 
  tournament, it only matters if you make it to the next stage, whether that be 
  from a 2-0 whitewash or a 3-3 standoff where somebody advances from a sudden-death 
  game. After each round, the slate is wiped clean and all remaining players start 
  with the same score. Obviously, that means discarding a considerable amount 
  of information about how players have been performing. When the whole point 
  is to identify the strongest player, it seems unwise to discard so much information. 
  By contrast, in a Swiss tournament, your total score reflects the whole of your 
  performance in the event. Of course, this "additional information" 
  has to be balanced against the fact that players face different levels of opposition 
  in a Swiss tournament, so a score of +2 might sometimes be more impressive than 
  a score of +4. But there are obviously ways to address that by optimizing the 
  pairings and/or scoring method, though that lies outside the scope of my analysis... 
  for now.
To understand my other explanation, consider an alteration to Yasser's proposal. 
  Rather than a large Swiss which generates five Candidates, you could instead 
  have five different simultaneous 16-player knockout tournaments (2-game matches 
  throughout), where the winner of each knockout tournament becomes a Candidate. 
  That approach would be good (62%) but not as good as the Swiss approach (67%). 
  With the knockout approach, you are basically splitting your field into five 
  subgroups, and deciding to take the single top-performing player from each subgroup. 
  If the strongest player in the world happens to be playing in the same subgroup 
  as another player who is almost as strong, then it becomes reasonably likely 
  (in the knockout approach) that the strongest player would lose a two-game match 
  to the slightly weaker player. You can't qualify both players and resolve their 
  differences later in a long match, since you are required to take exactly one 
  player from each subgroup (i.e., the one who wins each knockout tournament). 
  The numbers (62% vs. 67%) suggest that it would work much better to have all 
  of the players intermingled in one big tournament, so the five strongest performances 
  can advance, independent of who would have been in which subgroup.
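The subgroup argument can be illustrated with a deliberately crude toy model. It does not simulate an actual Swiss or knockout; instead it assumes each player's result in the qualifier boils down to one number (true strength plus normally distributed noise) and compares two qualification rules: take the five best performances overall, or split the field randomly into five groups and take only the best performer from each. The field of 80 players spaced 10 points apart, the noise level, and the function name `qualification_rates` are all assumptions made for the example.

```python
import random


def qualification_rates(n_players=80, n_groups=5, perf_sd=50.0,
                        trials=5000, seed=1):
    """Return how often the strongest player qualifies: (pooled, grouped)."""
    rng = random.Random(seed)
    # Hypothetical field: true strengths 10 points apart; index 0 is strongest.
    strengths = [2750 - 10 * i for i in range(n_players)]
    group_size = n_players // n_groups
    pooled_hits = grouped_hits = 0
    for _ in range(trials):
        # Toy assumption: one noisy performance number per player.
        perf = [s + rng.gauss(0.0, perf_sd) for s in strengths]
        # Pooled rule: the n_groups best performances overall qualify.
        ranked = sorted(range(n_players), key=lambda i: -perf[i])
        pooled_hits += 0 in ranked[:n_groups]
        # Grouped rule: random split, exactly one qualifier per group.
        order = list(range(n_players))
        rng.shuffle(order)
        for g in range(n_groups):
            group = order[g * group_size:(g + 1) * group_size]
            if 0 in group:
                grouped_hits += max(group, key=lambda i: perf[i]) == 0
    return pooled_hits / trials, grouped_hits / trials


pooled, grouped = qualification_rates()
print("pooled:", pooled, "grouped:", grouped)
```

In this toy setting the strongest player qualifies more often under the pooled rule, because a near-equal rival in the same group cannot shut them out. It says nothing about pairing effects or match play, so it is an intuition aid, not a check on the 62% and 67% figures.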
However, the Swiss tournament is not some magical solution that should be used 
  everywhere; it is very easy to use it poorly. The Swiss only works well if the 
  highest-rated players bypass it and automatically become Candidates. Thus the 
  Swiss is best viewed as a super-inclusive way to sort through the rabble and 
  find the rare player who is extremely under-rated (literally) and actually very 
  strong. If we already know that a player is very strong (the defending champion, 
  or one of the two top-rated players in the world), it is far better to allow 
  them to bypass a Swiss where they might potentially lose a couple of games and 
  fail to qualify. For instance, if you had everyone (including the defending 
  champion) play in the Swiss, and picked the top eight finishers as your candidates, 
  then the effectiveness would only be 17%. If you automatically qualified the 
  defending champion, but the other seven qualifiers had to come from the Swiss, 
  the effectiveness would only be 53%, barely better than the Dortmund style. 
  The most important thing is to include at least the highest-rated player automatically, 
  along with the defending champion. If the two automatic qualifiers are the defending 
  champion and the (remaining) highest-rated player, the effectiveness jumps up 
  to 64%. And as we've seen already, if the second-rated player is also allowed 
  to bypass the qualifier, the effectiveness is a nearly-ideal 67%.
Another interesting question is whether the qualifier tournament becomes more 
  effective if you make it more inclusive. We have seen earlier, in the discussion 
  about the FIDE format, that a knockout loses effectiveness significantly when 
  you double the number of players. In the case of a Swiss, however, the inclusion 
  of extra players actually helps, rather than hurts, the effectiveness. For instance, 
  if you modify the Seirawan proposal to only include 64 players, the effectiveness 
  is 61%, but doubling the field of players, for a total of 128, raises the effectiveness 
  to 65%, and tripling the field (to Yasser's suggested 196-player level) leads 
  to the best effectiveness, the 67% already mentioned. Presumably this is because 
  the weaker players don't get in the way as much in a Swiss, after the first 
  round or two. 
In a 128-player knockout, you have a large number of players who clearly are 
  not the strongest players in the tournament, but who can have a huge impact 
  on the outcome through the chance elimination of a top seed. We almost saw the 
  extreme example of that in Moscow, where a single loss to the bottom seed just 
  about resulted in the first-round elimination of #1 seed Viswanathan Anand. 
  On the other hand, by having such an inclusive field in the large Swiss, you 
  give yourself the possibility of identifying an extremely underrated player 
  who actually deserves to play in the Candidate section.
If you're trying to get a feel for what level of player would typically finish 
  in the top five in the 196-player Swiss tournament, I can tell you that an average 
  set of five qualifiers would have ratings ranging from 2600 to 2780. A very 
  strong set of five qualifiers (which would happen one time out of every ten) 
  might be something like: Michael Adams, Alexei Shirov, Peter Leko, Alexander 
  Morozevich, and Judit Polgar. A much weaker set of five qualifiers (which would 
  also happen one time out of every ten) would be something like: Viswanathan Anand, 
  Zoltan Almasi, Konstantin Sakaev, Giorgi Giorgadze, and Xie Jun. On average, 
  out of the five top Swiss finishers, there would be two or three players rated 
  above 2700, and two or three players rated below 2700. Once every 25 or 30 tournaments, 
  all five qualifiers would be rated below 2700, and once every 40 or 45 tournaments, 
  all five qualifiers would be rated above 2700. About 45% of the time, at least 
  one qualifier would be a sub-2600 player.
RAPID TIEBREAKS
One controversial issue is whether rapid games are a good way to break ties. 
  This only matters, of course, if a tie actually occurs, so it is a more significant 
  factor when there are short events (such as the FIDE championships or the Dortmund 
  qualifier), and it wouldn't matter as much in the Seirawan proposal (though 
  of course it still could happen). There is a general perception that rapid and 
  blitz games are more "random" than classical games. This is undoubtedly 
  true, since time trouble always introduces an element of randomness into the 
  outcome of a game. However, I recently analyzed the results of several thousand 
  games played at various time controls over the past few years, and (statistically 
  speaking) this issue doesn't seem to be a particularly significant one. The 
  higher-rated player still manages about the expected percentage score, whether 
  the game is played at classical, rapid, or blitz controls. Here is a picture 
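  to illustrate what I am talking about.
[Graph: White's percentage score plotted against White's rating advantage, with a blue line for classical games, a red line for rapid games, and a white line for blitz games.]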
 In this graph, we see the well-known trend that as the white player's rating 
  advantage gets bigger and bigger, White tends to score a higher and higher percentage. 
  If the two players have the same rating, then White scores 55%. If White has 
  a rating advantage of 200 points, then White would score almost 70%. The blue 
  line represents this relationship at classical time controls. 
Now look at the red line, which represents rapid games. If rapid time controls 
  really did make the game a lot more random, then the higher-rated player would 
  tend to score closer to 50% than predicted, with either color. That means we 
  would see the red line being flatter, more horizontal, than the blue line. This 
  is true to a certain degree, especially on the right side of the graph, in those 
  scenarios where White has a large rating advantage. This means that rapid games 
  do indeed turn out more randomly when White is the big favorite; White is not 
  able to score as high a percentage as the ratings would suggest. For instance, 
  with a +300 rating point advantage, White would score 75% in classical games 
  but only 72% in rapid games. However, when Black is the favorite by more than 
  100 rating points (the left side of the graph), the rapid results are exactly 
  the same as classical. Thus, when outrated by 300 points, White scores an identical 
  33% whether it be classical or rapid. So, the conclusion to be drawn is that 
  the advantage of the white pieces is not as large in rapid games as in classical 
  games, especially when White is the higher-rated player. But the higher-rated 
  player should do just about as well in rapid as in classical. Perhaps the real 
  "randomness" comes from the fact that rapid matches are typically 
  only two games long, rather than four or six.
The blitz data (the white line on my graph) is a little more suspect, because 
  there are fewer results available to analyze. However, there is no compelling 
  evidence that blitz games are "more random" than rapid or even classical 
  games; the white line is not any more horizontal than the blue line. You can 
  see a distinctive bend in the middle of the white line, suggesting that the 
  advantage of the white pieces is magnified when the two players are of similar 
  strength. For instance, when the two players have the same rating, White scores 
  58% in blitz but only 55% in classical. As I just mentioned, the advantage of 
  the white pieces is not as large in rapid chess as it is in classical chess, 
  so in rapid games, when the players have identical ratings, White only manages 
  to score 53%. But again, I see no real evidence that the faster time controls 
  are diminishing or obscuring the rating difference between the two players in 
  blitz. Thus it seems that rapid games, or even blitz games if need be, are a 
  reasonably effective way to resolve ties.
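The figures quoted above can be combined in a few lines of Python to make the comparison concrete. This is only an illustrative sketch: it uses just the percentages stated in the text (not the underlying game data), and the dictionary and function names are made up for the example.

```python
# White's percentage score at a given rating advantage, as quoted in the text.
# Keys are White's rating advantage in points; only stated figures are used.
white_score = {
    "classical": {-300: 33, 0: 55, 300: 75},
    "rapid": {-300: 33, 0: 53, 300: 72},
}

def favorite_average(scores, gap=300):
    """Average score of a player rated `gap` points higher, over both colors."""
    as_white = scores[gap]          # the favorite has the white pieces
    as_black = 100 - scores[-gap]   # the favorite has Black, so White is outrated
    return (as_white + as_black) / 2

def white_edge(scores):
    """White's score above 50% when the two ratings are equal."""
    return scores[0] - 50

summary = {
    control: (favorite_average(scores), white_edge(scores))
    for control, scores in white_score.items()
}
for control, (fav, edge) in summary.items():
    print(f"{control}: 300-point favorite averages {fav}%, "
          f"White's edge at equal ratings is {edge} points")
```

On these figures the 300-point favorite averages 71% at classical controls and 69.5% at rapid controls, while White's edge at equal ratings drops from 5 points to 3. Most of the difference between the two lines comes from the color advantage, not from the rating difference.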
Now, it is certainly true that we see a lot more decisive results in the faster 
  time controls, particularly in blitz. What do I mean by "a lot more"? 
  Well, switching the time controls from classical to rapid has about the same 
  effect (on the frequency of draws) as changing one of the players from Peter 
  Leko to either Veselin Topalov or Alexei Shirov, or changing the opening from 
  a 1.d4 game to a Sicilian Dragon. Further, switching the time controls from 
  classical to blitz has about the same effect (on the frequency of draws) as 
  changing a Peter Leko-Anatoly Karpov matchup into an Alexander Morozevich-Alexei 
  Fedorov matchup, or changing a Petroff's Defense into a King's Gambit. This 
  will indeed make the results slightly more random, which (as I said) could be 
  addressed by making the rapid tiebreaks longer. I hate to sound like a broken 
  record, but I should again point out that this exact approach (using 4-game 
  matches if a rapid tiebreak becomes necessary) was already suggested by Yasser 
  Seirawan in his "Fresh Start" proposal.
For instance, let's take a very simple unbiased case, where two simultaneous 
  10-player single-round-robin tournaments are held, and the winners play each 
  other in a title match. First let's consider the case where the final match 
  is only six games long. If a drawn match is to be resolved by the spin of a 
  roulette wheel, the effectiveness of this format is 37.3%. Obviously, it would 
  be better to actually play games to resolve the tie, since the stronger player 
  would have a better-than-even chance to win the tiebreak. So if we use the rapid-blitz 
  progression like in the FIDE championships, the effectiveness goes up to 39.2%. 
  Since blitz games are more random, if we simply played a long set of 2-game 
  rapid matches, it would be slightly better (39.3%). Finally, Yasser's suggestion 
  of a rapid match which would be four games long (rather than two), is the most 
  effective tiebreak method (39.5% effectiveness).
You can see from those numbers that the tiebreak method doesn't matter too 
  much, even for a mere six-game match; the effectiveness ranged from 37.3% (random) 
  to 39.5% (4-game rapid match). Of course, as the match length is increased, 
  the tiebreak method becomes less and less of a factor; for a 16-game match, 
  the random option has an effectiveness of 41.1% and the other options are all 
  41.6% or 41.7%. And for a 24-game match, the random tiebreak has an effectiveness 
  of 42.1% and all other options are tied at 42.3%. A drawn match is just too 
  unlikely.
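The shrinking role of the tiebreak can be illustrated with a small calculation on a single match. The sketch below assumes, purely for illustration, that the stronger player wins each game with probability 0.25, draws with probability 0.55, and loses with probability 0.20, independently of color and of earlier games. It also assumes a 55% chance of winning a played tiebreak. None of these numbers, and none of the function names, come from the analysis above, and the code does not reproduce the effectiveness percentages for the full tournament-plus-match format. It only shows how the chance of a level match, and with it the value of a good tiebreak, falls as the match gets longer.

```python
def match_outcome_probs(n_games, p_win, p_draw):
    """Exact (win, level, loss) probabilities for the stronger player
    over an n-game match with fixed per-game probabilities."""
    p_loss = 1.0 - p_win - p_draw
    # dist maps the running net score (wins minus losses) to its probability
    dist = {0: 1.0}
    for _ in range(n_games):
        nxt = {}
        for diff, prob in dist.items():
            for step, p in ((1, p_win), (0, p_draw), (-1, p_loss)):
                nxt[diff + step] = nxt.get(diff + step, 0.0) + prob * p
        dist = nxt
    win = sum(p for d, p in dist.items() if d > 0)
    level = dist.get(0, 0.0)
    loss = sum(p for d, p in dist.items() if d < 0)
    return win, level, loss

def tiebreak_gain(n_games, p_win=0.25, p_draw=0.55, playoff_win=0.55):
    """Extra match-winning chance from a played tiebreak versus a coin flip."""
    _, level, _ = match_outcome_probs(n_games, p_win, p_draw)
    return level * (playoff_win - 0.5)

for n in (6, 16, 24):
    win, level, loss = match_outcome_probs(n, 0.25, 0.55)
    print(f"{n:2d} games: level match {level:.3f}, "
          f"tiebreak gain {tiebreak_gain(n):.4f}")
```

Whatever per-game probabilities are plugged in, the gain from a better tiebreak can never exceed the chance that the match is level in the first place, which is why the choice of method matters less in a long match.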
However, sometimes this issue does not even arise. Specifically, if one of 
  the players has been granted "draw odds" in a particular match, that 
  player is automatically declared the winner in the event of a drawn match. Usually, 
  the defending champion is granted draw odds in their match, and this is obviously 
  a key part of Yasser's proposal, since it acknowledges two champions, and there 
  are also the curious provisions about "inheriting" draw odds if you 
  overcome them in your quarterfinal match. Generally, draw odds are not a good 
  way to resolve ties. They are better than a roulette wheel (since on average 
  the defending champion will be stronger than the challenger), but slightly less 
  effective than any other tiebreak method. The main benefit of draw odds is that 
  they provide an incentive for a defending champion to actually participate in 
  a world championship cycle, since the draw odds are a bias that favors the defending 
  champion.
Everything that I have said to this point applies to chess world championships 
  in general. The conclusions would have been identical a decade ago, or fifteen 
  years in the future, even with a completely different set of top players. However, 
  at this point I must leave off my attempts to be "generic", because 
  there is one final issue I want to cover, which must be handled "specifically". 
  I want to discuss the topic of who would be favored by the various biases in 
  the "Fresh Start" proposal, and in order to do that we must start 
  talking about "Vladimir Kramnik" and "Garry Kasparov" and 
  "Ruslan Ponomariov", rather than just "the defending champion" 
  or "the highest-rated player".
WHO IS FAVORED BY THE FRESH START PROPOSAL?
The "Fresh Start" proposal has an interesting set of biases. Kramnik, 
  Kasparov, and Ponomariov are all "rewarded" by being allowed to bypass 
  the qualifier, but each in turn is "punished" by the fact that the 
  other two players are also bypassing the qualifier. Ponomariov would presumably 
  be happy to avoid the qualifier, but sad that Kasparov and Kramnik (probably 
  his two strongest potential opponents) were guaranteed to qualify. Further, 
  as champions of their respective organizations, Kramnik and Ponomariov are additionally 
  granted another bias: draw odds in their quarterfinal and semifinal matches. 
  Finally, Kasparov is "punished" by the fact that he will have to overcome 
  draw odds in his semifinal match, whoever the opponent. So clearly Kramnik and 
  Ponomariov would benefit from the match structure, and Kasparov would probably 
  not benefit, but how big of a deal is this? What are the magnitudes of each 
  player's advantages and disadvantages? This is an extremely important question, 
  perhaps THE most important question about the relative merits of Yasser's proposal.
First of all, let's once again draw an important distinction between the meaning 
  of "highest-rated player" and "strongest player". Ratings 
  are inexact, and so the player with the highest rating might not actually be 
  the strongest player. There is no way to exactly measure who the strongest player 
  is; all we can do is talk about the "likelihood" that each player 
  really is the strongest in the world. The rating list tells us (with great accuracy) 
  who has been most successful recently, and gives us some idea of who will do 
  best in the near future, but we should always remember that no rating difference 
  is ever 100% conclusive; you have to deal with probabilities rather than absolutes.
By the way, I want to applaud the decision of the Einstein Group to use an 
  average of the FIDE and Professional ratings for the invitations and seedings 
  in their Dortmund qualifier. I had already mentioned a year ago that a simple 
  average of the two ratings did an excellent job of masking the limitations of 
  each individual one, so I think it was a great decision. To keep things consistent, 
  I have done the same thing in the following analysis (using the April 1st 2002 
  rating lists), although I had to add 50 points to each Professional rating to 
  make the numbers similar to the FIDE ratings. With these ratings, we can apply 
  some simple statistics and calculate each player's likelihood of being the strongest 
  in the world. 
Unsurprisingly, it's probably either Garry Kasparov or Vladimir Kramnik. Kasparov 
  (average FIDE/Prof rating 2842) has a 49% chance of being the strongest player, 
  whereas Kramnik (2827) has a 34% chance. Veselin Topalov (2758), Ruslan Ponomariov 
  (2751), and Viswanathan Anand (2751) each have about a 3% chance, and the rest 
  of the world (2740 and below) has a combined 8% chance. In a perfect world championship 
  format, whenever Kasparov was indeed the strongest player, he would win the 
  championship. And likewise for Kramnik. Thus, in a perfect format, Kasparov 
  would have a 49% chance overall to win the championship, and Kramnik would have 
  a 34% chance, and so on.
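The text does not spell out the "simple statistics" used to get these likelihoods. One plausible way to produce numbers of this kind, shown here only as a sketch, is to treat each player's true strength as normally distributed around the averaged rating and count how often each player comes out on top. The 30-point standard deviation, the function name, and the omission of everyone rated 2740 and below are all assumptions made for this example, so its output should not be expected to match the percentages quoted above.

```python
import random

# Averaged FIDE/Professional ratings quoted in the text.
ratings = {
    "Kasparov": 2842,
    "Kramnik": 2827,
    "Topalov": 2758,
    "Ponomariov": 2751,
    "Anand": 2751,
}

def chance_of_being_strongest(ratings, sd=30.0, trials=20000, seed=0):
    """Monte Carlo estimate of each player's chance of having the highest
    true strength, assuming strength ~ Normal(rating, sd). The sd value is
    an assumption for illustration, not a figure from the article."""
    rng = random.Random(seed)
    counts = {name: 0 for name in ratings}
    for _ in range(trials):
        sampled = {name: rng.gauss(r, sd) for name, r in ratings.items()}
        counts[max(sampled, key=sampled.get)] += 1
    return {name: c / trials for name, c in counts.items()}

chances = chance_of_being_strongest(ratings)
for name, p in chances.items():
    print(f"{name}: {p:.1%}")
```

Raising the assumed standard deviation shifts probability away from the top-rated player toward everyone else, which is the sense in which no rating difference is ever conclusive.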
However, the "perfect world championship" is only a myth. We've already 
  seen (above) that no known world championship format is even 70% effective, 
  so even in the best case, a third of the time the championship will be won by 
  somebody who is not the strongest player. We have to keep the matches down to 
  a reasonable and practical length, and sometimes that just isn't long enough 
  for the strongest player to demonstrate their superiority over another very 
  strong player.
I have spent several hours analyzing the statistical effect of draw odds, and 
  I can state very confidently that the actual selection of Candidates is far 
  more important than the question of who gets draw odds in a 10-game (or longer) 
  match. For instance, even if there were no draw odds, Kasparov and Kramnik would 
  still be "punished" by the fact that they have to play fairly short 
  matches against players who are certainly weaker, but nevertheless have some 
  chance to eliminate them. I just told you that we can be 83% sure (49% plus 34%) 
  that either Kasparov or Kramnik is the strongest player in the world, but even 
  after they bypassed the qualifier, there would still be more than a 25% chance 
  that someone else would actually win the championship.
Ruslan Ponomariov is clearly the beneficiary of the most significant biases 
  in the "Fresh Start" proposal. Although his combined rating of 2751 
  puts Ponomariov in a virtual tie for fourth in the world with Viswanathan Anand, 
  he still has less than a 3% chance of actually being the strongest player in 
  the world. Nevertheless, Ponomariov would have a 10.4% chance to actually win 
  the championship. It turns out that if Ponomariov's rating were actually 2783 
  (rather than 2751), then the numbers would claim that Ponomariov did in fact 
  have a 10.4% chance of being the strongest player. Thus we can say that the 
  specific Fresh Start proposal "awards" Ponomariov 32 rating points, 
  in effect.
This is a very large bias in favor of Ponomariov. To try and put that bias 
  in more concrete terms, let's envision a fantasy scenario where Kasparov and 
  Kramnik are the only two players who bypass the qualifier, so Ponomariov has 
  to finish in the top six in the Swiss qualifier like anyone else. However, in 
  this fantasy, Ponomariov gets a special advantage (in the Swiss and in the final 
  rounds of matches): he receives the white pieces in five games out of every 
  six, instead of one game out of every two. According to my calculations, that 
  fantasy scenario gives Ponomariov about the same advantage that the actual Fresh 
  Start proposal gives him. Is that an unfair advantage? Or is it commensurate 
  with his position as FIDE World Champion? That is for someone else to decide, 
  I suppose.
It would be tempting to say that +32 rating points is way too many to "award" 
  Ponomariov, and that he should be granted an automatic place but not given draw 
  odds. Well, that doesn't really help very much, because the lion's share of 
  his advantage lies in his automatic Candidate status. Here is how the various 
  biases are measured by my technique:
(1) Being an automatic qualifier for the three rounds of matches (10/14/20 
  games): Kasparov -14 rating points, Kramnik -7 rating points, and Ponomariov 
  +22 rating points.
(2) Draw odds given to Kramnik and Ponomariov in the quarterfinal: Kramnik 
  +4 rating points, Ponomariov +6 rating points.
(3) Draw odds given to Kramnik and Ponomariov in the semifinal: Kramnik +4 
  rating points, Ponomariov +4 rating points.
(4) Any player who eliminates Kramnik or Ponomariov in the quarterfinal inherits 
  draw odds for the semifinal: Kasparov -2 rating points.
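Adding up the four items for each player gives a net figure. The tally below assumes the components simply add, which is consistent with Ponomariov's 22 + 6 + 4 matching the 32 points quoted earlier. The dictionary layout and labels are just for illustration.

```python
# Rating-point biases per player, taken from items (1) to (4) above.
biases = {
    "Kasparov": {
        "automatic qualifier": -14,
        "opponents may inherit draw odds": -2,
    },
    "Kramnik": {
        "automatic qualifier": -7,
        "draw odds in quarterfinal": 4,
        "draw odds in semifinal": 4,
    },
    "Ponomariov": {
        "automatic qualifier": 22,
        "draw odds in quarterfinal": 6,
        "draw odds in semifinal": 4,
    },
}

# Net bias per player, assuming the individual components are additive.
totals = {player: sum(items.values()) for player, items in biases.items()}

for player, total in totals.items():
    print(f"{player}: {total:+d} rating points")
```

On this accounting the net effect is -16 points for Kasparov, +1 for Kramnik, and +32 for Ponomariov.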
Interestingly enough, this collection of small advantages for Kramnik, and 
  small disadvantages for Kasparov, is sufficient to make Kramnik the statistical 
  favorite if the Fresh Start proposal were to actually happen. Kramnik would 
  have a 38% chance to win the championship, Kasparov would have a 36% chance 
  to win the championship, and (as I've already said) Ponomariov would have just 
  over a 10% chance to win the championship. Nevertheless, that is only because 
  Kasparov and Kramnik are already so close together. In the bigger picture, this 
  draw odds issue does not seem to merit the attention it gets. A +4 rating point 
  advantage, across the entire world championship cycle, is less important statistically 
  than the total advantage you would get from your opponent blundering a pawn 
  in one single game, sometime during the cycle. Probably this is more of a prestige 
  issue than anything else, or perhaps there is a huge psychological issue I am 
  ignoring with my statistics (like the feeling that you are battling uphill from 
  the start, if the other person has draw odds).
CONCLUSION
As I said way back at the beginning, I have no particular agenda to promote. 
  However, I have had to re-examine many of my assumptions about chess, as a result 
  of this analysis, and I hope that will happen for you as well. Among other things, 
  I now have a much greater respect for Swiss tournaments than before, along with 
  a greater respect for Yasser Seirawan's judgment and intuition about what makes 
  a good tournament format! Perhaps some deeply-held beliefs about the "randomness" 
  of rapid chess will also be challenged as a result of my analysis, but possibly 
  that is too much to expect. Likewise for the "draw odds" debate, I 
  suppose...
This essay is the culmination of many, many late-night hours of effort. However, 
  I hope that it will prove to be a beginning, rather than an end. There are many 
  problems with the current state of the chess world, and statistics will never 
  be the only answer to any of them. Statistics are merely a tool, a source of 
  information, to assist people in finding a better answer to some of their problems. 
  There has been so much debate, and yet so little objective exploration of the 
  facts, and so I hope that this will be the beginning of a new effort, a new 
  kind of debate. I invite you to send me e-mail at jeff@chessmetrics.com, and 
  if there is enough interest perhaps I will publish a follow-up analysis which 
  incorporates feedback from all of you.
I would like to conclude with a quote from baseball analyst Bill James: "It 
  has always been my experience that if you can present a good argument and back 
  up what you are saying, there are people who will be persuaded. It is sometimes 
  possible to change the tenor of the debate by injecting information into the 
  discussion." I hope, very much, that he is correct.
Thank you for taking the time to read this.
 Jeff Sonas
Links