The FIDE world championship tournament, which takes place from September 27th through October 16th in San Luis (Argentina), should prove to be a fascinating event. It is the first time in more than half a century that the FIDE (men's) world champion will be determined by a round robin tournament, rather than a two-player match or a huge knockout tournament. I have recently finished a detailed random simulation of the possible tournament outcomes, and it has produced some very interesting numbers.
By now you've probably read Nigel Short's article in which he questions Garry Kasparov's claim of a 95% chance that the tournament would be won by Viswanathan Anand, Veselin Topalov, or Peter Leko. Nigel offered to take the remaining five players (Peter Svidler, Alexander Morozevich, Michael Adams, Judit Polgar, and Rustam Kasimdzhanov) if Kasparov would give him 17-to-1 odds for a $100 wager on the tournament's winner.
There is no doubt that a win by either Anand, Topalov, or Leko is likely, and of course there are many subjective considerations which are difficult to analyze numerically. However, the statistics do paint a more uncertain picture than that suggested by Kasparov. According to my calculations, the remaining five players have a combined 41% chance (not a mere 5% chance) to win the tournament, making those 17-to-1 odds seem pretty attractive! In addition, because Leko's strong tendency toward draws makes a large plus score unlikely for him, and it could easily require a +4 score to win the tournament, Peter Svidler is actually given a slightly greater chance to win the tournament than is Leko (12% vs. 11%), and Judit Polgar is right there in the same group with an 11% chance to win.
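To see why those odds look attractive, here is a minimal sketch of the arithmetic. It assumes the wager works in the usual way (the bettor risks the $100 stake and collects 17 times the stake if one of his five players wins); the function names are purely illustrative.

```python
from fractions import Fraction

def break_even_probability(odds_against: int) -> Fraction:
    """Win probability at which a bet paying odds_against-to-1 has zero expected profit."""
    return Fraction(1, odds_against + 1)

def expected_profit(stake: float, odds_against: int, p_win: float) -> float:
    """Expected profit for the bettor: win stake * odds with p_win, lose the stake otherwise."""
    return p_win * stake * odds_against - (1 - p_win) * stake

# 17-to-1 odds break even at a 1-in-18 chance, a little above Kasparov's implied 5%.
print(break_even_probability(17))

# Expected profit on a $100 stake under each estimate of the five players' chances.
print(expected_profit(100, 17, 0.05))  # Kasparov's implied 5%
print(expected_profit(100, 17, 0.41))  # the 41% estimate from this article
```

Under the 5% figure the bet is slightly unfavorable for the bettor; under the 41% figure it is strongly favorable.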
Viswanathan Anand seems to be the clear favorite, with a 31% chance to win. Veselin Topalov has the next-best prospects, with about a 17% chance, and each of the other players has between an 8% and a 12% chance to win, except for current FIDE champion Rustam Kasimdzhanov, who is easily the lowest-rated participant and is only given one chance in thirty of winning the tournament. We will talk some more about the individual players further down, but first I want to discuss the format of the tournament itself, because it is a far cry from the FIDE championship knockout tournaments of recent years. We could debate endlessly about whether a tournament or a match is preferable, but instead I would prefer to emphasize that out of the possible tournament formats, this one is excellent.
It might be tempting to glance over the tournament rules and immediately start criticizing the fact that, just as in the past, a shared first place could ultimately be decided by rapid games, blitz games, or even the "Armageddon" sudden-death game. Of course, nobody wants a rapid game to decide the championship, but that is indeed a very real possibility in a knockout tournament, where there just aren't that many tiebreaking options other than proceeding to the faster-time-control games. You can't use head-to-head results, or number of wins, because they are always the same for both players. And "first win" (or "last win") is considered to be too unfair to the player who has Black first (or last). The FIDE championship was in fact determined by rapid games twice during the knockout era, in the first event when Anatoly Karpov defeated Anand in 1998, and then the last one when Kasimdzhanov defeated Michael Adams in 2004.
However, this concern is far less relevant in San Luis than it was for the knockout tournaments. In a round-robin event, particularly a long one, it is possible to set up the tiebreaking rules so that it’s almost certain that first place will be resolved via the classical games themselves. Over the course of the long tournament, there will be many differences in the quantity of wins for each player, and various head-to-head results can be used as well. Admittedly, some of the tiebreak criteria may seem somewhat arbitrary (e.g., why not “fewest losses” rather than “most wins”?). However, at least this way the players know beforehand what they are getting themselves into, and they can plan accordingly as the final rounds approach and the tiebreak situation crystallizes. Of course, you can always play the “What If?” game and revert to criticizing the ultimate tiebreaker (games at the faster time control), but for this particular event, and these particular rules, it is very unlikely that the championship will need to be resolved by the rapid/blitz games. More specifically, the odds are 38-to-1 against a need for rapid games to resolve the championship.
Let’s go through the various possibilities. First of all, remember that this is a very long event, fourteen rounds. That means it’s pretty likely that the players will sort themselves out enough; in fact, according to my calculations there’s almost an 80% chance that there will be a clear winner after fourteen rounds, meaning all of this concern about tiebreaks would be quite irrelevant. In other words, if we played this tournament 40 different times, we would see a clear first place winner 32 times, and we would see a shared first place only 8 times. What happens in those remaining 8 cases?
According to the rules, the first tiebreak criterion comes from head-to-head results among the tied players. For instance, if there were a three-way tie, we would look at the head-to-head results among just those three players. If there is still a tie, then it falls through to the next criterion, which is to count up the total number of games won during the tournament by each player (against all opponents, even those who didn’t share first place). Most of the time, these criteria will suffice to determine a single winner. Only one time out of forty would it actually move on to rapid games. And even if it does get to rapid games, there’s about a 98% chance that the rapid tiebreaker would only involve two players, rather than the weird multi-player mini-round-robin that the rules provide for.
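The two classical criteria described above can be sketched in a few lines. This is only an illustration of the order of the criteria, not the official regulations: the data layout, the function names, and the choice not to recompute head-to-head scores after the field narrows are all assumptions, and the four-player results are made up.

```python
from typing import Dict, List, Tuple

# results[(a, b)] lists the points a scored in each game against b (1, 0.5 or 0).
Results = Dict[Tuple[str, str], List[float]]

def record(results: Results, a: str, b: str, score_a: float) -> None:
    """Store one game from both players' points of view."""
    results.setdefault((a, b), []).append(score_a)
    results.setdefault((b, a), []).append(1 - score_a)

def keep_best(players: List[str], key) -> List[str]:
    """Keep only the players with the highest value of key."""
    best = max(key(p) for p in players)
    return [p for p in players if key(p) == best]

def resolve_first_place(tied: List[str], results: Results) -> List[str]:
    """Apply head-to-head, then total wins; more than one survivor means rapid games."""
    def head_to_head(p: str) -> float:
        # Points scored against the other tied players only.
        return sum(sum(results[(p, q)]) for q in tied if q != p)

    def total_wins(p: str) -> int:
        # Games won against all opponents, tied or not.
        return sum(pts.count(1) for (a, _), pts in results.items() if a == p)

    remaining = keep_best(tied, head_to_head)
    if len(remaining) > 1:
        remaining = keep_best(remaining, total_wins)
    return remaining

# Made-up double round robin: (first player, second player, first player's score).
games = [("A", "B", 0.5), ("B", "A", 0.5), ("A", "C", 1), ("C", "A", 0),
         ("A", "D", 0), ("D", "A", 0.5), ("B", "C", 1), ("C", "B", 0.5),
         ("B", "D", 0.5), ("D", "B", 0.5), ("C", "D", 1), ("D", "C", 0.5)]
results: Results = {}
for first, second, score in games:
    record(results, first, second, score)

# A and B both finish on 3.5/6 and drew both games with each other,
# so the second criterion (total wins) decides.
print(resolve_first_place(["A", "B"], results))
```

If the returned list still holds more than one name, the tie would go on to the rapid games.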
While we’re on the topic of rapid games, I do want to point out one other thing. I know that Rustam Kasimdzhanov’s victory in Tripoli last year was a huge surprise, but it actually could have been somewhat anticipated statistically, if rapid ratings had been used in the calculations. I didn’t use any, because there wasn’t an official FIDE rapid list handy. But in retrospect I do want to call attention to Stefan Fischl, who maintains a web site that includes an unofficial “rapid rating list” going back a few years. According to Stefan’s list, as of the start of the 2004 Tripoli tournament, Kasimdzhanov was ranked #2 at rapid chess among all 124 of the tournament participants, trailing only Veselin Topalov. Thus it is perhaps not that surprising that Kasimdzhanov was able to eliminate Alejandro Ramirez, Vassily Ivanchuk, Topalov, and finally Adams during the rapid games. And by now Kasimdzhanov’s unofficial rapid rating is third in the world among active players (behind only Viswanathan Anand and Garry Kasparov).
So if it does get to a rapid tiebreak, you might be interested to know that Anand’s unofficial rapid rating is more than sixty points higher than anyone else at San Luis, but the #2 and #3 spots (among San Luis participants) are held by underdogs Kasimdzhanov and Alexander Morozevich, with Judit Polgar far down on the rapid rating list, more than 200 points below Anand. I used those rapid ratings in my simulation model, but it really isn’t too significant since a rapid tiebreak is quite unlikely.
Enough about tiebreakers; let’s get back to rounds 1-14, the classical part of the tournament, which (as I’ve said) has a 97% chance of being sufficient to determine the next FIDE champion. Who is favored to win, and why?
My simulation model took several factors into consideration. The most important factor, of course, is the estimated strength for each player: their rating. Rather than just using the FIDE ratings, I have chosen to use my more accurate Chessmetrics rating formula to compute the strength of each player as of September 1st. I have also considered other factors such as White vs. Black strength for each player, along with their draw frequencies with each color. I have also looked for significant head-to-head results from the past, and finally (after a lot of agonizing) I decided to include a bonus/penalty for players who have done particularly well/poorly against 2700+ opposition. Taking all of these factors into consideration, and randomly simulating the entire tournament a million different times, this is how it turned out:
There are definitely some important differences between these numbers and what you would have expected from the FIDE rating list. First and foremost, Anand and Topalov are tied on the latest FIDE list, with Peter Leko being 25 rating points behind, and then there’s another 25-30 point gap before we get to Peter Svidler and Judit Polgar. Why, then, do I have Anand so far ahead of Topalov, and how did Svidler and Polgar catch up to Leko?
Well, it’s kind of hard to explain but I’ll give it a shot. You might remember an article I wrote a couple of years ago where I suggested replacing the current Elo formula with a simpler “linear” formula. My analysis showed that the Elo formula creates an unintentional bias against the players who tend to outrate their opponents by 100-200 points. For instance, if you have a 150-point rating advantage over your opponent, empirical data says you should score about 67%, whereas the Elo formula expects you to score 70%. So if you played 100 games against opposition rated 150 points below you, and let’s say you really did score 67/100 (exactly matching the empirical predictions), then the Elo formula would claim that you should have scored 70/100, and thus you had scored a full 3 points below expectations, and you would (undeservedly) lose 30 rating points.
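The arithmetic in that example can be checked with a short sketch. It uses the common logistic approximation of the Elo expectancy curve and a K-factor of 10; both are assumptions here (the K-factor is the one consistent with 3 points translating into 30 rating points), and the 67% empirical figure is simply taken from the paragraph above.

```python
def elo_expected(diff: float) -> float:
    """Expected score under the logistic Elo curve for a rating advantage of diff."""
    return 1 / (1 + 10 ** (-diff / 400))

K = 10        # assumed K-factor: 3 game points then correspond to 30 rating points
GAMES = 100
ACTUAL = 67   # points actually scored, matching the empirical 67%

expected_points = round(elo_expected(150) * GAMES)  # rounds to 70, as in the text
rating_change = K * (ACTUAL - expected_points)

print(f"Elo expectation at +150: {elo_expected(150):.3f}")
print(f"Expected points over {GAMES} games: {expected_points}")
print(f"Rating change: {rating_change}")
```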
In this tournament field, the players other than Kasimdzhanov who typically face the lowest-rated opponents (due to the events they tend to play in) are Peter Svidler and Alexander Morozevich. These two players have had an average rating advantage of about 100 points in their games from recent years, and so their FIDE ratings are considerably lower than they really deserve, due to the aforementioned bias. At the other end of the spectrum, Peter Leko and Veselin Topalov face such strong opposition that on average they only outrate their opponents by 25-30 points. Leko and Topalov have not had this Elo bias working against them, so their FIDE ratings are a little higher than they deserve, relative to the others. And so what looked like a 25-point gap in strength between Leko and Svidler on the FIDE rating list is revealed as just a function of the kinds of events they play in, and in fact Leko and Svidler are probably about the same strength, despite what the rating list would tell you. And whereas Anand faces about the same caliber of opponent that Topalov does, Anand’s rating is so high that he too outrates his typical opponent by about 100 points. Since the Elo formula places an unreasonable expectation upon Anand, his FIDE rating is also lower than he deserves.
That’s the simplistic explanation. There’s a lot more going on, because the rating formulas really are very different. For instance, the FIDE ratings don’t care that Judit Polgar took an entire year off, whereas my ratings (which account for inactivity) are sensitive to the time lapse. Topalov is playing much more frequently than he did a few years ago, and that is also affecting his rating. If anything, it’s amazing that the FIDE list and the Chessmetrics list match up so closely! However, you probably don’t care too much about the intricacies of rating calculations, so let’s just leave it at this: I have optimized my formula to provide maximal predictive power, and it says that Svidler is indeed as strong as Leko, and that Anand is somewhat stronger than Topalov, but who really knows the truth?
As long as we’re taking this nostalgic trip back through my past journalistic efforts, let me remind you of another article I wrote a few years back, trying to see whether past head-to-head results really have any bearing on future results. I know that the players think they do, that there are certain opponents they love to face and others they hate to face. In that article, I concluded that there really was no such effect, that it didn’t matter whether you’d over-performed or under-performed in the past against someone.
However, because the head-to-head results are so relevant in this tournament (it’s the first tiebreaker criterion), I figured I should revisit that investigation a little bit. So, I re-ran the analysis using the newer Chessmetrics ratings, and I checked to see whether accounting for past head-to-head results would in fact have improved the predictions of future matchups. It turns out that once you pass a certain level of significance, it does improve the future predictions if you correct for matchups where one player seems to have a particular “knack” for beating another.
The most famous historical example is (of course) Vladimir Kramnik’s career mastery of Garry Kasparov. Over the course of his career, Kramnik took about 79 rating points from Kasparov (meaning that Kramnik scored 7.9 full points more than expected, across all their games). This is by far the largest amount in chess history. If they ever played each other again, this methodology would award Kramnik a special 26-rating-point bonus when facing Kasparov. Tied for second, with 60 rating points each, are Viktor Korchnoi’s domination of Lev Polugaevsky and Kramnik’s domination of Judit Polgar (which might become particularly relevant if Polgar does manage to win this tournament; Kramnik would receive a special 20-rating-point bonus against Polgar!). Down at #33 on the historical list is Leko’s historical overperformance against Topalov, and way down at #102 on the list is Anand’s overperformance against Polgar. Those two San Luis matchups are the only ones that qualify as being “significant”. Thus in my model, I give Leko an extra 14-point rating advantage when he faces Topalov, and I give Anand an extra 11-point rating advantage against Polgar. But all in all, I think this factor is not particularly relevant.
I also decided to see whether certain players do particularly well (or particularly poorly) when facing elite opposition. I set the boundary at a 2700-level, and for each player I examined their historical results against players rated 2700+, and against lower-rated players, to see whether there was any unusual difference in results. Out of the eight participants, the three who have done unusually well against 2700+ opponents were Peter Svidler, Peter Leko, and Rustam Kasimdzhanov, and they all got a 7-rating-point bonus in my calculations (since this is such an elite event). On the other hand, Alexander Morozevich has historically done much better against lower-rated players, and not so well against 2700-level opposition, so he got an 11-rating-point penalty in my calculations. Veselin Topalov and Judit Polgar also received smaller penalties of this kind. Here is a summary of the various modifiers that I used:
[Table: one row per player (Anand, Topalov, Svidler, Leko, Polgar, Morozevich, Adams, Kasimdzhanov), with columns that include the modifiers described above, draw percentage, and final San Luis rating. The cell values were not preserved.]
In this list, I want to call your attention to the column about draw percentage. Overall, I expect a draw percentage around 46%. Although it is an elite event, and thus traditionally full of draws, the inclusion of Topalov, Polgar, Morozevich, and Kasimdzhanov should ensure plenty of decisive games, and this could have a very interesting result. Remember that a player’s total number of wins is one of the tiebreakers, and so players who win a lot and lose a few will have a tangible advantage over other players who win a few but never lose. And there is another mathematical factor involved, which is that the riskier players stand a better chance of putting a string of wins together and building up a really high score to actually win the tournament.
To illustrate this, let’s compare Judit Polgar and Peter Leko. You can see in the above list that after all the factors are considered, Leko is given a “final San Luis rating” above 2750 whereas Polgar’s is down at 2735. Nevertheless, they are given an identical 11% chance to win the tournament. Why is this? Again, kind of hard to explain but I’ll try. You’ve made it this far so you must be at least marginally interested. Players with a lot of decisive results have a “wider” bell curve of total scores, meaning that they have a pretty significant chance of a major plus score, along with a pretty significant chance of a large minus score. On the other hand, drawish players don’t have such a wide variation in possible scores:
You can see that Leko’s curve has a higher peak at an even score or a +1 score, so he is more likely than Polgar to end up with those scores. She is clearly more likely than Leko to put up a big minus score like -4 or -5, but despite her lower rating, once we get out in the +4 or +5 range, the unpredictable Polgar actually has a better chance than the drawish Leko to finish with such a high score. And since it is probably going to take at least a +3 score to win the tournament, Polgar’s chances of winning the tournament are quite comparable to Leko’s. Her average score is lower, but that doesn’t matter as much as her chance of a high final score. You would get the same explanation for why Svidler actually has a slightly better chance to win the tournament than the higher-rated Leko, and for why Morozevich has a slightly better chance to win the tournament than the higher-rated Adams.
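The "wider bell curve" effect can be reproduced with a small exact calculation. The sketch below is only an illustration: the per-game win and draw probabilities are hypothetical stand-ins for a drawish player and a risky player (they are not the figures used in the article), and every game is assumed to be independent with the same probabilities.

```python
def net_score_distribution(p_win: float, p_draw: float, games: int = 14) -> dict:
    """Exact distribution of (wins - losses) after `games` independent games."""
    p_loss = 1 - p_win - p_draw
    dist = {0: 1.0}
    for _ in range(games):
        nxt = {}
        for net, prob in dist.items():
            # Each game moves the net score by +1 (win), 0 (draw) or -1 (loss).
            for delta, p in ((1, p_win), (0, p_draw), (-1, p_loss)):
                nxt[net + delta] = nxt.get(net + delta, 0.0) + prob * p
        dist = nxt
    return dist

def chance_at_least(dist: dict, target: int) -> float:
    """Probability of finishing on a net score of `target` or better."""
    return sum(p for net, p in dist.items() if net >= target)

def chance_at_most(dist: dict, target: int) -> float:
    """Probability of finishing on a net score of `target` or worse."""
    return sum(p for net, p in dist.items() if net <= target)

def mean_net(dist: dict) -> float:
    return sum(net * p for net, p in dist.items())

# HYPOTHETICAL per-game probabilities, chosen only to contrast the two styles.
drawish = net_score_distribution(p_win=0.18, p_draw=0.70)  # slightly higher average
risky = net_score_distribution(p_win=0.32, p_draw=0.36)    # even average, few draws

for name, dist in (("drawish", drawish), ("risky", risky)):
    print(f"{name}: mean net {mean_net(dist):+.2f}, "
          f"P(+5 or better) {chance_at_least(dist, 5):.3f}, "
          f"P(-4 or worse) {chance_at_most(dist, -4):.3f}")
```

With these inputs the drawish player has the better average result, while the risky player is more likely to land at either extreme.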
Finally, I want to provide some justification for what I said in the last paragraph about it requiring a +3 or +4 score to win the tournament. Remember that I simulated this tournament a million different times. Sometimes a +2 score was good enough for clear first place, and sometimes a +9 score was not even enough for a share of first place. There was even one simulated tournament where all eight players tied with a score of 7/14, and the classical tiebreak criteria were sufficient to bring it down to two players in the rapid tiebreak, where Svidler defeated Kasimdzhanov to win the title! However, by taking an aggregate average you can get a good sense of the overall trends in what it will probably take to win.
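For readers curious what this kind of repeated random simulation looks like, here is a heavily simplified sketch. It is not the model used for this article: the four players, their ratings and draw tendencies, the white bonus, and the use of the logistic Elo curve are all hypothetical choices made for illustration, and a shared first place is simply split equally instead of being run through the tiebreak rules.

```python
import random
from itertools import permutations

# HYPOTHETICAL inputs for illustration only: name -> (rating, draw tendency).
PLAYERS = {
    "Favourite": (2790, 0.50),
    "Drawish": (2750, 0.65),
    "Risky": (2735, 0.40),
    "Outsider": (2680, 0.45),
}
WHITE_BONUS = 35  # assumed first-move advantage, in rating points

def game_probs(white: str, black: str):
    """Return (P(white wins), P(draw)) for a single game."""
    (rw, dw), (rb, db) = PLAYERS[white], PLAYERS[black]
    expected = 1 / (1 + 10 ** (-(rw + WHITE_BONUS - rb) / 400))  # logistic Elo curve
    # Average the two draw tendencies, capped so no probability goes negative.
    draw = min((dw + db) / 2, 2 * expected, 2 * (1 - expected))
    return expected - draw / 2, draw

def simulate_once(rng: random.Random) -> dict:
    """Play one double round robin: every ordered (white, black) pair meets once."""
    scores = dict.fromkeys(PLAYERS, 0.0)
    for white, black in permutations(PLAYERS, 2):
        p_win, p_draw = game_probs(white, black)
        r = rng.random()
        if r < p_win:
            scores[white] += 1
        elif r < p_win + p_draw:
            scores[white] += 0.5
            scores[black] += 0.5
        else:
            scores[black] += 1
    return scores

def title_chances(trials: int = 20000, seed: int = 1) -> dict:
    """Estimate each player's share of first places over many simulated events."""
    rng = random.Random(seed)
    credit = dict.fromkeys(PLAYERS, 0.0)
    for _ in range(trials):
        scores = simulate_once(rng)
        top = max(scores.values())
        leaders = [p for p, s in scores.items() if s == top]
        for p in leaders:
            credit[p] += 1 / len(leaders)  # simplification: no tiebreak modelled
    return {p: c / trials for p, c in credit.items()}

if __name__ == "__main__":
    for player, chance in title_chances().items():
        print(f"{player}: {chance:.1%}")
```

Aggregating over many trials in this way is what turns individual, sometimes freakish, simulated tournaments into stable percentages.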
I can tell you that more than 70% of the time, the tournament will be won by somebody scoring either +3, +4, or +5, with the most likely winning score being +4. A score of +2 is probably not going to be good enough to win the tournament; in fact, the odds are about 17-to-1 against your becoming champion if you finish with a +2 score. Managing a +3 score is obviously more promising, but the odds are still 2-to-1 against you. On the other hand, a +4 score probably gives you a 51% chance to win clear first, and an additional 11% chance to share first and still win the title, meaning that overall you have a 62% chance to become champion if you score +4. Here is a graphic illustrating these numbers for all scores between +1 and +10.
Part of the fun of this is to watch as the numbers change over the course of a tournament, and to try to figure out why. Oftentimes there is something very significant going on, that I never would have noticed without digging a little bit more. My plan is to provide statistical updates on each rest day of the tournament, and to try to explain why the numbers have changed. I hope that you have enjoyed this article. I know that the statistical perspective is not the only perspective, and is not even the most important one. But perhaps it will provide a useful counterpoint to some of the more subjective approaches. In any event, I'll see you again on the first rest day. In the meantime, feel free to visit my Chessmetrics site or send me email about any of this.
The Greatest Chess Player of All Time – Part IV
The Greatest Chess Player of All Time – Part III
The Greatest Chess Player of All Time – Part I
FIDE Championship odds after round three
Revised statistics for FIDE championship
Who will be the next FIDE world champion?
Putting his money where his stats are
How (not) to play chess against computers
Physical Strength and Chess Expertise
Are chess computers improving faster than grandmasters?
ChessBase in Slashdot
Man vs Machine – who is winning?
Does Kasparov play 2800 Elo against a computer?
Computers vs computers and humans
Dortmund statistics: who will win
The Sonas Rating Formula – Better than Elo?
The best of all possible world championships