PART III – What does the future hold for grandmasters against computers?
In Part I 
  and Part II 
  of this series, we looked at some historical evidence suggesting that right 
now the top human grandmasters and the top chess computers are extremely closely matched. 
  Further, there is no compelling evidence to indicate that computers are soon 
  going to pull ahead of the top humans. 
With the Kasparov-Deep Blue matches so far in the past, it must come as a 
  big surprise to many people that computers have not yet surpassed the top grandmasters. 
  Although computers are obviously getting stronger due to hardware and software 
  improvements, humans have also improved their play against computers, faster 
  than expected.
What does the future hold for grandmasters against computers? It all depends 
  upon which group can improve faster, relative to the other. "Improvement" 
  would typically suggest that a player is adding something positive to their 
  play. However, remember that it could also mean that a player is removing something 
  negative from their play. Either one constitutes "improvement". 
I can think of three main categories where grandmasters and/or computers could 
  improve:
- Physical strength
- Chess expertise
- Playing style against specific opposition
Let's go through those three categories and see how they apply to computer 
  improvement against humans, as well as human improvement against computers.
1. Physical strength
Clearly, improving the hardware will allow a chess computer program to play 
  objectively stronger chess. Faster-executing programs can search deeper, or evaluate 
  more accurately, in the same amount of "thinking" time. From examining 
  the past several years of the SSDF (Swedish Chess Computer Association) computer 
  rating list, we can say that hardware leaps of 80 points have happened approximately 
  every two years. This would suggest that computer hardware is providing an 
  annual increase of 40 points of strength. 
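To put rating gaps of this size in concrete terms, the standard Elo expected-score formula converts a rating difference into an expected score per game: a 40-point edge works out to roughly 56%, and an 80-point edge to roughly 61%. A rough Python sketch of that formula:

```python
# The standard Elo expected-score formula, used here only to translate
# rating gaps like the 40- and 80-point hardware jumps into expected scores.

def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

if __name__ == "__main__":
    for gap in (40, 80, 200):
        print(f"A {gap}-point edge gives an expected score of "
              f"{expected_score(gap, 0):.1%} per game.")
```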
However, it is important to remember chess columnist Mig Greengard's quote: 
  "The computer doesn't really play chess. It plays another game that looks 
  like chess but has its own rules." When two computers are playing each 
  other, if one can search 17 moves deep and the other can only search 15-16 
  moves deep, then the first computer has a big advantage because it sees everything 
  the second computer sees, and then some. That's why it is conceivable that 
  computers are gaining 40 points a year against older computers, from searching 
  depth alone. Against humans, however, a search 17 moves deep vs. a search 15 
  moves deep is not as relevant, because the human isn't thinking nearly so far 
  ahead anyway.
  
  Of course, it is difficult for humans to improve their physical strength when 
  playing computers. However, there is still a very effective way to "improve" 
  their success, and that is to remove a negative factor which has hindered grandmasters' 
  performance: the lengthy match. You can see here that computers do progressively 
  better against humans, as a one-on-one match progresses:
[Graph: computer scores against humans by game number within a one-on-one match]
This is likely due to the effect of physical and mental fatigue upon the human, 
  as the match continues. In a match between two humans, the fatigue would mostly 
  balance out as the match progressed, since both humans would get tired. But 
  obviously the computer does not get tired or discouraged. It is also possible 
  that this effect is related to humans using up their opening novelties at the 
  start of a match, or some other factor, but fatigue seems likely to be the 
  real culprit. 
I should also point out that you don't see this effect in Swiss or round-robin 
  tournaments that have both computer and human participants. Computers do about 
  the same against humans, whether in the start, middle, or end of a tournament, 
  so there seems to be something particularly draining for the humans about a 
  one-on-one match against a computer.
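To give a rough idea of how a breakdown like the one in the graph above can be produced, here is a small Python sketch that tabulates the computer's score by game number within a match. The data format and the sample results are purely illustrative, not the actual database behind the graph.

```python
# Illustrative sketch only: tabulate the computer's scoring percentage by
# game number within a match, assuming a hypothetical list of
# (game_number_within_match, computer_score) records.

from collections import defaultdict

def score_by_game_number(games):
    """games: iterable of (game_number, computer_score), score in {0, 0.5, 1}."""
    totals = defaultdict(lambda: [0.0, 0])   # game_number -> [points, games]
    for game_number, score in games:
        totals[game_number][0] += score
        totals[game_number][1] += 1
    return {n: pts / cnt for n, (pts, cnt) in sorted(totals.items())}

# Hypothetical example: three six-game matches, made up for illustration.
sample = [(1, 0.5), (2, 0.5), (3, 0.5), (4, 1), (5, 1), (6, 1),
          (1, 0),   (2, 0.5), (3, 1),   (4, 0.5), (5, 1), (6, 0.5),
          (1, 0.5), (2, 0),   (3, 0.5), (4, 0.5), (5, 1), (6, 1)]
for n, pct in score_by_game_number(sample).items():
    print(f"Game {n}: computer scored {pct:.0%}")
```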
2. Chess expertise
 Certainly, upgraded software will play objectively stronger chess, even on 
  the same hardware. Improved chess knowledge, better opening books, better endgame 
  tablebases, better search techniques, and better utilization of hardware will 
  all enable superior moves to be found in the same amount of thinking time. 
  How can we express this in terms of rating points? Well, in Part 
  I we looked at how the SSDF ratings of the top-ranked computers have progressed 
  over time. Let's review that graph once again:
[Graph: SSDF ratings of the top-ranked computers over time]
 However, remember that this only applies to games between computers. In the 
  same way that hardware upgrades probably don't give the full 40-point annual 
  improvement against humans, it seems likely that software upgrades also don't 
  provide an additional 30-point annual improvement against humans. Surely some 
  of those 30 points of software upgrades will come from improvements to a program's 
  opening library. Since the older programs are commercially available, it is 
  fairly straightforward to play thousands of games against older software and 
  to identify holes in the opening books of those older programs. This will allow 
  new software to dominate older software, but against humans, the improvements 
  to the opening book (while useful) probably won't translate to a full 30-point 
  annual improvement.
With a 40-point annual improvement due to hardware upgrades, and a 30-point 
  annual improvement due to software upgrades, that would normally suggest that 
  computers are getting stronger at a rate of 70 Elo points a year, relative 
  to humans. This is clearly not happening. If the SSDF list is indeed over-estimating 
  the true rate of improvement of computer programs, what would we expect to 
  see? Over time, the ratings of the top programs would drift higher and higher, 
  until they got so ridiculously high that some sort of correction would need 
  to be applied to reflect the true strength of the computer programs.
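Here is a deliberately simplified toy model of that drift, in Python. It assumes each new program generation really is 70 points stronger than its predecessor in computer-vs-computer terms, but only 25 points stronger against humans; the 25-point figure is an assumption chosen purely for illustration. A list calibrated once against humans and then updated only from computer-vs-computer games climbs steadily away from the ratings the programs could actually achieve against people.

```python
# Toy model of rating drift in a closed computer-vs-computer list.
# The 70-point annual gain comes from the 40 (hardware) + 30 (software)
# figures above; the 25-point "true" gain against humans is an assumption
# for illustration only.

def expected_score(r_a, r_b):
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

LIST_GAIN_PER_YEAR = 70     # measured engine-vs-engine
HUMAN_GAIN_PER_YEAR = 25    # assumed true gain against humans
START_RATING = 2550         # list level at calibration time (illustrative)
HUMAN_BENCHMARK = 2600      # a fixed, strong human opponent

for year in range(7):
    list_rating = START_RATING + LIST_GAIN_PER_YEAR * year
    true_vs_humans = START_RATING + HUMAN_GAIN_PER_YEAR * year
    predicted = expected_score(list_rating, HUMAN_BENCHMARK)
    actual = expected_score(true_vs_humans, HUMAN_BENCHMARK)
    print(f"Year {year}: list rating {list_rating}, "
          f"predicted score vs {HUMAN_BENCHMARK} human {predicted:.0%}, "
          f"'actual' score {actual:.0%}, "
          f"drift {list_rating - true_vs_humans} points")
```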
This is exactly what happened a few years ago, and that explains the curious 
  downturn of the SSDF graph in mid-2000. This is what Thoralf Karlsson, SSDF 
  chairman, had to say in August 2000:
The SSDF rating list provides information about the relative strength 
  of chess programs, when tested in the way SSDF does, but does not necessarily 
  say which Elo-rating a certain program would achieve after having played hundreds 
  of tournament games against human players. How good or bad the individual correlation 
  between SSDF- and ELO-ratings is, will most likely never be established. So 
  many games against humans will never be played.
Apart from establishing relative ratings, we have had the ambition that 
  the general level of the list would be fairly realistic, compared to human 
  ratings. From our start in 1984 we have used tournament games against Swedish 
  chess players to calibrate the list. At some points we have discarded older 
  games, believing that human chess players with time have become better to exploit 
  the weaknesses of chess programs. Until the latest rating list the level of 
  the list has been unchanged from summer 1991, and was based on 337 tournament 
  games against Swedish players between 1987 and 1991. Regrettably it has not 
  been possible for us to play any more games for many years now.
  
  For some time we had the general impression that the level of the list was 
  rather OK. But during the latest years it has become more and more obvious 
  that the best programs on the latest hardware don't get as high Elo-ratings 
  as our list could be interpreted to predict. If this is due to differences 
  between Swedish- and Elo-ratings, to the "human learning effect", 
  to some kind of "spreading effect" in a computer-computer list or 
  a combination of these and perhaps other factors, we don't know.
It is difficult to find a perfect solution, but we have chosen to correlate 
  the level of the list to the results of tournament games between computers 
  and Elo-rated humans, played during the latest years. For us it has been very 
  convenient to use Chris Carson's compilation of such games. Calculations based 
  on these games indicate that the level of the list is about 100 points too 
  high. So from now on we have lowered the list with 100 points!
To summarize, before the correction, in early 2000, the SSDF ratings were 
  still accurate in how they ranked computers against each other, but the actual 
  rating numbers were too high, across the board. Those numbers ultimately were 
  coming from a few hundred games played against Swedish players in 1987-1991. 
  And it was becoming too much of a stretch to extrapolate forward from games 
  played by the top Mephisto and Fidelity computers, on 68020 processors against 
  lower-rated humans, a dozen years in the past. For one thing, there was no 
  allowance for the fact that human players had gotten objectively stronger, 
  or had learned to play better against computers, since 1991.
So, at that point, about 100 games were analyzed from events between humans 
  and computers in 1997-2000. The humans in those games had an average FIDE rating 
  below 2400; the only two events against really strong humans were Junior at 
  Dortmund 2000, and Fritz at the Dutch Championships in 2000. Thoralf Karlsson 
  also had to make some assumptions about the impact of different hardware, since 
  the hardware used by Junior and Fritz in those events (for example) was different 
  from that used by the SSDF. The conclusion from all of this analysis was that 
  all SSDF ratings should be reduced by 100 points. There have been no further 
  corrections since then.
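The general shape of that calculation can be sketched in a few lines of Python. The game results and list rating below are invented for illustration, not taken from Chris Carson's compilation: the idea is simply to find the rating at which the programs actually performed against Elo-rated humans, and treat the gap between that and their list ratings as the required correction.

```python
# Sketch of a recalibration calculation: given computer-vs-human results
# (opponent Elo, score), find the performance rating R at which the
# Elo-expected total score matches the actual total, then compare R with
# the list rating. All numbers below are invented for illustration.

def expected_score(r_a, r_b):
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def performance_rating(results, lo=1000.0, hi=3500.0, tol=1e-6):
    """results: list of (opponent_elo, score)."""
    total = sum(score for _, score in results)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(expected_score(mid, opp) for opp, _ in results) < total:
            lo = mid      # performed better than a mid-rated player would
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical results against Elo-rated humans.
games = [(2380, 1), (2410, 0.5), (2350, 1), (2450, 0.5), (2500, 0),
         (2420, 1), (2390, 0.5), (2440, 1), (2360, 1), (2480, 0.5)]
perf = performance_rating(games)
list_rating = 2620   # invented list rating for the same program
print(f"Performance vs humans: {perf:.0f}")
print(f"Suggested downward correction: {list_rating - perf:.0f} points")
```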
However, I believe that the same kind of upward drift has continued in the 
  three-plus years since August 2000. It is true that today's top computers would 
  dominate the top computers from three years ago, leading to a 200-point difference 
  on the SSDF list. But I don't think that necessarily means today's 
  top computers would play 200 points better against top grandmasters.
For one thing, computers were doing unusually well against humans, exactly 
  in that time frame. If you remember the performance rating graph from Part 
  II a couple weeks ago, top computers had a performance rating (against 
  humans) of 2444 between 1995 and 1997, and then it shot up 200 points (to 2647) 
  between 1998 and 2000. But the improvement didn't continue at that rate; the 
  performance rating of computers against humans only went up by a total of 62 
  points between the 1998-2000 range and the 2001-2003 range. And as I tried 
  to prove in Part II, even that improvement only came from computers becoming 
  more dominant against the lower-rated humans; humans rated 2550+ are just as 
  successful against computers today as they were five years ago.
Since the SSDF list is calibrated against human-computer results from 1997-2000, 
  and more than 80% of those humans were rated below 2550 anyway, I think it 
  is a mistake to look at the 2800+ SSDF ratings of the top programs and to conclude 
  that those top programs will dominate today's top grandmasters. The battle 
  is not over yet.
In Part IV Jeff Sonas examines playing style and the question of whether 
  it is possible to "tune" computers to play especially well against 
  humans. He includes statistical analysis on which openings are especially suited 
  to the playing style of computers, i.e. which lines humans should probably 
  avoid. This article will appear this weekend – well, you're just going 
  to have to wait like everyone else, aren't you, Garry...
   
Jeff Sonas is a statistical chess analyst who has invented a new rating system 
  and used it to generate 150 years of historical chess ratings for thousands 
  of players. You can explore these ratings on his Chessmetrics website. Jeff 
  is also Chief Architect for Ninaza, providing web-based medical software for 
  clinical trials.