I'm a developer at IBM. For the last three months I’ve been working on our #whatmakesgreat campaign, showcasing the use of analytics at Wimbledon. Before the tournament ends I wanted to explain how we ended up discussing Roger Federer's perspiration and Rafael Nadal’s preference for sunshine.
The technical challenge in #whatmakesgreat was to combine tennis statistics, unstructured data (25 years of the Telegraph's and Wimbledon’s archives), the experience of tennis players and coaches and fan opinion to determine what makes a great Wimbledon champion. Raw statistics might tell us who the most successful player is, but great means more than best. Statistics can't measure sportsmanship, relationship with the crowd, flair or passion.
My assumption was that the statistics would tell us about match performance and the unstructured content would cover the fuzzier elements. That, to some degree, was true, but the most revealing insights came when the unstructured analysis forced us to ask new questions of the statistics. Bias in machine learning is an active topic and we made efforts to avoid it. In the best moments of this project that bias was reversed: the machine learning analysis challenged our biases as developers, coaches, journalists and fans.
Our first step was to get a rough sense of how each player was described in the archives. We selected and grouped content by player, and a TF-IDF analysis found words closely associated with each player. Intriguingly, ‘sweat’ was showing a statistically significant correlation to Roger Federer, but not to any other player. Does he work harder than the rest of the players? Does he have a problem with perspiration? Or is he sponsored by a deodorant company? A more detailed analysis of those documents showed the opposite: Roger Federer does not sweat. Many articles contained variations on the theme of him winning without perspiring. There are more references to Federer’s lack of sweat than to many other players’ serves.
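To give a flavour of that first step, here is a minimal TF-IDF sketch in Python. It is not the project's code: the sample text is invented, the function names are hypothetical, and the weighting (term frequency multiplied by the log of inverse document frequency, with each player's grouped text treated as one document) is just one common variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_by_player(docs_by_player):
    """Return {player: {term: score}}, one 'document' per player."""
    term_counts = {p: Counter(tokenize(t)) for p, t in docs_by_player.items()}
    n_docs = len(term_counts)

    # Document frequency: how many players' text contains each term.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    scores = {}
    for player, counts in term_counts.items():
        total = sum(counts.values())
        scores[player] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, player, k=3):
    """Highest-scoring terms for a player, ties broken alphabetically."""
    ranked = sorted(scores[player].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Invented sample text, for illustration only (not from the archives).
sample = {
    "Federer": "Federer won without breaking sweat. Barely a bead of sweat on Federer.",
    "Nadal": "Nadal won from the baseline. Nadal chased every ball on the baseline.",
    "Murray": "Murray won in front of the crowd. The crowd roared for Murray.",
}
scores = tfidf_by_player(sample)
print(top_terms(scores, "Federer", k=2))
```

A word that appears for every player (here ‘won’) scores zero, while a word concentrated in one player's text rises to the top. A player's own name will also score highly for that player, so a real run would probably filter names out before ranking.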
That Roger Federer does not sweat had become ingrained thinking, the sort of idea we were looking to challenge. Was it real or just a lazy cliche? We had IBM’s Wimbledon match data for all the top players and using Weather Underground we pulled in temperature data for those matches. This let us see the number of matches played by player and temperature.
*Games played at Wimbledon where temperature data is available.
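Combining the two sources comes down to a join on match date followed by a count. The sketch below assumes one temperature reading per day and groups temperatures into 5°C bands. The records, field names and band width are all invented for illustration and are not the project's actual data or schema.

```python
from collections import Counter

# Invented records; the field names are hypothetical.
matches = [
    {"date": "2012-06-25", "player": "Player A"},
    {"date": "2012-06-27", "player": "Player A"},
    {"date": "2012-06-25", "player": "Player B"},
    {"date": "2012-06-29", "player": "Player A"},  # no temperature reading
]
daily_temp_c = {"2012-06-25": 18.0, "2012-06-27": 23.4}


def temperature_band(temp_c, width=5):
    """Lower edge of the band containing temp_c, e.g. 23.4 -> 20 for width 5."""
    return int(temp_c // width) * width


def matches_by_player_and_band(matches, temps, width=5):
    """Count matches per (player, temperature band), skipping unknown days."""
    counts = Counter()
    for match in matches:
        temp = temps.get(match["date"])
        if temp is None:
            continue  # keep only matches where temperature data is available
        counts[(match["player"], temperature_band(temp, width))] += 1
    return counts


counts = matches_by_player_and_band(matches, daily_temp_c)
for (player, band), n in sorted(counts.items()):
    print(f"{player}: {n} match(es) at {band}-{band + 5}°C")
```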
We also knew the results of these matches, so we could calculate a win:loss ratio by temperature for each player. This showed Federer was consistent regardless of the temperature. Intriguingly, Nadal appeared to be more successful when it was cooler. Interesting maybe, but not conclusive. The top players lose so few matches (and when they do they tend to be against other top players) that a win:loss ratio is not a reliable statistic. We came up with a better measure, looking at the ratio of points won and lost. Even a great player will lose many points across a match.
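The difference between the two measures is easiest to see side by side. In this sketch the rows, the anonymous player and the 21°C cut-off between "cool" and "warm" are all invented for illustration; only the idea of comparing match results with point totals comes from the analysis described here.

```python
from collections import defaultdict

# Invented rows: (player, temperature °C, match won?, points won, points lost)
rows = [
    ("Player A", 17.0, True, 95, 70),
    ("Player A", 18.5, True, 110, 90),
    ("Player A", 26.0, True, 100, 88),
    ("Player A", 27.5, False, 120, 125),
]


def summarise(rows, threshold_c=21.0):
    """Total matches, wins and points per (player, 'cool' or 'warm')."""
    totals = defaultdict(
        lambda: {"matches": 0, "wins": 0, "pts_won": 0, "pts_lost": 0}
    )
    for player, temp_c, won, pts_won, pts_lost in rows:
        band = "cool" if temp_c < threshold_c else "warm"
        t = totals[(player, band)]
        t["matches"] += 1
        t["wins"] += int(won)
        t["pts_won"] += pts_won
        t["pts_lost"] += pts_lost
    return dict(totals)


def points_ratio(t):
    """Points won divided by points lost."""
    return t["pts_won"] / t["pts_lost"]


summary = summarise(rows)
for (player, band), t in sorted(summary.items()):
    losses = t["matches"] - t["wins"]
    print(f"{player} {band}: {t['wins']} won, {losses} lost, "
          f"points ratio {points_ratio(t):.2f}")
```

With only a handful of matches in each band, a single defeat swings the win:loss figure sharply, whereas the points ratio draws on hundreds of points and so moves much less.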
*Games played at Wimbledon where temperature and point by point data is available.
These results again showed that Federer (and the majority of the greats) performed consistently, regardless of the temperature. Perhaps there is more to this than a cliche, but Federer was not any better at dealing with warmer temperatures than any other great. So why might it be that Federer doesn’t appear to sweat? Our tennis experts knew Federer to be ruthless in finishing games: he tends to dominate play and not get involved in many long, gruelling matches. Our coach described the way Federer will attack the net. If you come in close, the point is going to end quickly whether you win or lose. The statistics back this up: his matches tend to be shorter than average. So maybe we don’t see Federer sweat because he’s rarely on court long enough?
What took our attention in these results wasn’t Federer, but Nadal. Again they confirmed the previous results, that Nadal is more successful in cooler temperatures (Agassi was the only other great who shared this trend).
As much as there's a cliche about Federer's lack of sweat, common thinking is that Nadal doesn't like to play in cooler weather. He’s from Mallorca, the stereotype goes, so of course he’ll do better in Mediterranean weather. Even Nadal thinks this is the case, as confirmed in several interviews.
“I prefer the sun and the heat. But instead we have cold, wind and rain.”
- Rafael Nadal, Daily Telegraph 3 July 2007
The data shows that is not true. He’s certainly no worse when it’s colder and if anything he has a better record when the temperature is lower. This formed the basis of a Telegraph article and fans used social media to debate the effect temperature would have on their favourite players.
Several fans highlighted the effect the temperature could be having on the ball and surface.
“I wonder too if part of it at Wimbledon is with the change in the temperature, the surface changes quickly - obviously these are the top players who can adapt quickly to it but perhaps the differences just suit Federer and Nadal differently.”
- Nickie Chapman, 29 Jun 2017
“I change my racket with the season since the warmer it is the more speed and bounce to the ball. What this tells us is that Nadal thrives on a slower less bouncy ball which when you look at his style of play is unsurprising. Nadal's racket suits my winter game and Djokovic's (which has more strings) my summer game.”
- Arnie Ward, 29 Jun 2017
We had been thinking in terms of temperature impacting players’ endurance. What we hadn’t been looking at was the temperature's effect on the balls and the surface, something Anne Keothavong agreed with the fans on. Anne was not surprised that Nadal performed better in cooler conditions. She said that it slows the ball down, making it more like playing on Nadal’s favourite surface, clay.
So Federer is good in any temperature. He does sweat; it’s just that he’s rarely on court long enough to get hot. Nadal might like the sunshine and warmth, but he performs better in slightly cooler conditions, where the slower speeds suit his strong baseline game. A tiny bit of analysis in a bigger project, but one that engaged fans and made us all think about things differently.
The Things We Didn’t Know We Knew
What I think was interesting in this piece of analysis was that everything was known, but just not quite connected together.
That there is a cliche of Federer not sweating was hidden inside lots of media reports spread over many years. It's only when we saw all the references together that it was clear how ingrained the thinking is.
Historic temperature data and Wimbledon match results both existed, but it's only in combining the two sources that you see that Nadal has a better record when it’s cooler. That Nadal himself has been influenced by the stereotype against him is only visible when you add in his interview comments.
That Federer plays shorter games is known by tennis statisticians, but not connected to his consistency across temperatures.
That the ball slows when it’s cooler is known by tennis fans and players. They also know a slower ball suits Nadal, and yet that is not used to question the idea of Nadal being better in the heat.
The analysis was mainly statistical. What the unstructured content did was force us to ask the right questions, to challenge the assumptions and bypass some of our own biases.
Darren here. I work in technology at Net-A-Porter, but my real love is for fashion, beauty and portrait photography. I spend my spare time writing about pictures I love and trying to emulate them. I’m always happy to meet models, makeup artists and stylists. So if you’re interested in working together, get in touch by email or find me on Twitter or Instagram.