Moneyball in football: Analysis on Tam Scobbie and Gregory Tade
Ok, so this article is not strictly about Moneyball. For those of you not familiar with the term, ‘Moneyball’ was popularized by the 2011 film starring Brad Pitt, in which (based on a true story) his character, the general manager of the Oakland A’s baseball team, one of the smallest-budget teams in the league, put together a new side based purely on statistics… The side went on to set the American League record of 20 consecutive wins and were narrowly defeated in the play-offs. To put it in football terms, it would be like Saints qualifying for the Champions League.
The approach has been tried in football – it’s rumored that Liverpool, under the same American ownership as the Boston Red Sox, tried this last season. It worked for the Red Sox, but for Liverpool it meant Stewart Downing and Jordan Henderson. Even my mum could have told me that wasn’t going to work! So, if it didn’t work for Liverpool, why am I even talking about it?
Well, I was inspired by this post on the Liverpool Red and White Kop Forum, and decided to put it to use with Saints’ new and prospective signings. For a full in-depth understanding of what I’m attempting to do, please read the link to the original post; however, I will give a brief explanation here (borrowing heavily from the original post!)
Essentially, you can look at statistics for footballers in 3 ways:
1. That coarse numbers pertaining to passing, shooting, tackling, crossing and of course ‘chances created’ are all relevant and can help you make an informed decision about a potential signing or how to deploy a player.
2. That coarse numbers pertaining to passing, shooting, tackling and crossing are derived from heterogeneous environments and are therefore irrelevant, and any attempt to homogenise them is a wild goose chase. However, with a bit of insight and effort there are metrics that can be derived to help you make an informed decision about a potential signing or how to deploy a player.
3. That they’re not worth a 20p mix.
The original poster rules out 1 and 2, effectively on the basis that just because a striker scored lots of goals, or has good shot accuracy, it doesn’t mean they will fit in at their new club. Based on the stats, Afonso Alves seemed a great signing for Middlesbrough, but it didn’t work out that way, and stats don’t explain how, since Cissé signed, Demba Ba has managed just one goal. So, a new approach is needed.
Humans are, in theory, pattern matching machines. We know that, despite both having four legs and a tail, a dog and a cow are different things because they fit different patterns, and so humans view the world (and by implication, football players) through the filter of such patterns. On the other hand, they tend to be a bit shit at differential calculus.
Computers, though, are little more than adding machines. By design they don’t know anything about patterns, but they can tell you the square root of 234234923874 on demand and integrate a differential form quick sharp.
So, if it were possible to combine the calculation power of a computer with the pattern matching of a human, there may exist an opportunity for a major advantage.
Enter, Artificial Intelligence – the practice of enabling machines to recognise and classify patterns.
Let’s start with a basic example. Shapes are some of the simplest patterns going: we can very quickly spot the circles, squares and triangles in the following image, but to a computer it’s just a series of ones and zeroes. It could be the Sistine Chapel for all it knows.
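To make the ‘ones and zeroes’ point a bit more concrete, here is a minimal sketch in Python. It is not the original poster’s method or mine; the two tiny bitmaps, the fill_ratio measure and the 0.8 cut-off are all made up for illustration. The idea is simply that the computer sees nothing but digits until you hand it a number (a ‘feature’) that happens to separate the patterns.

```python
# Made-up 5x5 bitmaps: "1" = ink, "0" = background.
SQUARE = [
    "11111",
    "11111",
    "11111",
    "11111",
    "11111",
]
TRIANGLE = [
    "10000",
    "11000",
    "11100",
    "11110",
    "11111",
]


def fill_ratio(bitmap):
    """Share of the grid that is inked in (a single hand-picked feature)."""
    cells = "".join(bitmap)
    return cells.count("1") / len(cells)


def guess_shape(bitmap):
    # Hypothetical cut-off: a (nearly) full grid is called a square,
    # anything emptier is called a triangle.
    return "square" if fill_ratio(bitmap) > 0.8 else "triangle"


if __name__ == "__main__":
    for name, bitmap in (("SQUARE", SQUARE), ("TRIANGLE", TRIANGLE)):
        print(name, fill_ratio(bitmap), guess_shape(bitmap))
```

With only two shapes one feature is enough; telling circles apart as well would need a second measure, which is exactly the ‘what data do you use’ problem.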
Ok, so teaching a computer that Sandaza is a square, Sheridan is a circle and Mackay is a triangle shouldn’t be too hard – but how do you decide what data you’re using? Well, this is where mathematics comes back into play. The original poster didn’t display his algorithm, and, anyway, it wouldn’t work for the SPL, as the data available is different, so I made my own. How do we know it’s right?
Well, like the original poster, we need to test it against something we can check by hand. I will use the same example he gave, with cars. This is easy to check manually, as we know a fast car from a slow car.
So, let’s run my algorithm against the data and plot the results as a scatter graph, to display the data visually.
As you can see, the ‘normal’ cars are all very closely grouped together (which we expect – they are all similar cars) and the ‘supercars’ occupy the top right of the graph. Again, this is what we would expect, and the Bugatti is off on its own, which I would also expect, as it is by far the ‘best’ supercar available. Based on this, I am happy my computer has learnt how to classify cars as squares or circles.
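For anyone wanting to try this kind of sanity check themselves, here is a rough sketch in Python. To be clear, this is not my algorithm and not the original poster’s: it swaps in a very plain 2-group k-means clustering, and the car names and figures are invented placeholders rather than real specifications. The point is only the shape of the test: feed in something where you already know the answer and see whether the machine agrees.

```python
from math import sqrt
from statistics import mean, pstdev

# Invented, illustrative figures (NOT real specifications):
# (top speed in mph, horsepower, 0-60 time in seconds)
CARS = {
    "hatchback_a": (110, 100, 11.5),
    "hatchback_b": (115, 110, 10.8),
    "saloon_a": (125, 150, 9.0),
    "saloon_b": (130, 160, 8.5),
    "supercar_a": (200, 560, 3.4),
    "supercar_b": (205, 600, 3.2),
    "hypercar": (250, 1000, 2.5),
}


def standardise(rows):
    """Rescale every column to mean 0 and standard deviation 1,
    so horsepower does not drown out the other numbers."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]


def dist(a, b):
    """Straight-line (Euclidean) distance between two points."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_means(points, rounds=20):
    """Tiny 2-cluster k-means, seeded with the two points furthest apart."""
    centres = list(max(((p, q) for p in points for q in points),
                       key=lambda pq: dist(*pq)))
    labels = []
    for _ in range(rounds):
        # Assign each point to its nearest centre...
        labels = [0 if dist(p, centres[0]) <= dist(p, centres[1]) else 1
                  for p in points]
        # ...then move each centre to the average of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = tuple(mean(c) for c in zip(*members))
    return labels


names = list(CARS)
groups = dict(zip(names, two_means(standardise(list(CARS.values())))))

if __name__ == "__main__":
    for name, group in groups.items():
        print(name, "-> group", group)
```

If the everyday cars and the fast cars end up in different groups, the method passes the ‘we know a fast car from a slow car’ test. If they don’t, there is no point pointing it at footballers.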
Still with me? Great.
Let’s now look at how this translates to Saints players.
The main difficulty was finding accurate statistics on them. It’s OK for Scottish Premier League players, but for those from lower divisions there is next to nothing available. It was also hard to find data going back many years, so, unlike the original poster, who did this for Liverpool’s signings from previous seasons, it’s hard to go back and do the same for Saints players (Haber, Sheridan and Sandaza have next to no stats from before they joined Saints).
But, as I can validate that the algorithm works, I looked at a recent Saints signing, Tam Scobbie, and a prospective signing, Gregory Tade.
What does this tell us? Well, like the examples above, it takes footballers and determines whether they are a square, triangle, circle etc. In this case, I have chosen a few players to validate Tam against. I have picked our current left back, Callum Davidson, to compare him against, and then a few others from around the league. Danny Grainger is there, very close to Lee Wallace, the left back he replaced at Hearts. I also picked Paul Dixon and Izaguirre as examples of different full backs in the league.
Tam is very much on his own over on the right. This tells me he is not a Davidson, and not a Lee Wallace. He is not going to be a ready-made replacement for Davidson, but a very different player. Does it mean he’s going to be rubbish? No, not at all, but he isn’t going to play in the same way that Davidson currently does for the club. Given that Steve Lomas said he wants players who are ‘versatile’ for his small squad, this is probably a good thing.
It is interesting how close Wallace and Grainger are in this example, though. Maybe Grainger was bought by Hearts purely because he was the closest available match to the outgoing player. By all accounts, he has done much better for Hearts than Saints fans thought he did here. It’s obvious from this that Grainger is a square, yet Saints were trying to play him in a position where we required a circle. For Hearts, he fits in the square hole vacated by Wallace, while Davidson is a circle that fits nicely at Saints.
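The ‘closest to the outgoing player’ idea can be sketched in a few lines of Python. Again, this is an illustration and not the algorithm behind my graphs: the players are anonymous placeholders, the per-90-minute numbers are invented, and the choice of three stats plus plain Euclidean distance is an assumption made purely to show the principle.

```python
from math import sqrt
from statistics import mean, pstdev

# Invented per-90-minute numbers for placeholder full backs:
# (crosses, tackles, passes into the final third)
PLAYERS = {
    "outgoing_left_back": (4.0, 2.0, 9.0),
    "candidate_a": (3.8, 2.2, 8.5),
    "candidate_b": (1.0, 4.5, 3.0),
    "candidate_c": (2.5, 3.0, 6.0),
}


def standardise(table):
    """Put every stat on the same scale (mean 0, standard deviation 1)."""
    cols = list(zip(*table.values()))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return {name: tuple((v - m) / s for v, m, s in zip(row, mus, sds))
            for name, row in table.items()}


def distance(a, b):
    """Straight-line (Euclidean) distance between two stat profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(target, table):
    """Name of the player whose scaled profile sits nearest the target's."""
    scaled = standardise(table)
    others = [name for name in scaled if name != target]
    return min(others, key=lambda name: distance(scaled[target], scaled[name]))


closest = most_similar("outgoing_left_back", PLAYERS)

if __name__ == "__main__":
    print("Nearest match to the outgoing left back:", closest)
```

A small distance means ‘same shape of player’, not ‘same quality of player’, which is the distinction that matters when reading the graphs.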
My second one today is of Gregory Tade. As I write this, it’s pure speculation that he’s joining us, but the overwhelming opinion on the forums is WHY? Is he really the right striker for us?
Interesting grouping in this one. I have used a few of the top goalscorers this season, along with our 2011/2012 strikeforce of Sandaza and Sheridan. As you can see, SoS are vastly different players, which explains why they linked up so well together. Tade is very similar to Sheridan, so, on the face of it, he would not be a bad replacement for him, but certainly not a replacement for Sandaza. What is worrying is how similar Hasselbaink, another Saints target, is to Tade. While Tade might make a good replacement for Sheridan, it doesn’t look as though Tade and Hasselbaink will make a good partnership, unless, of course, tactics are going to change.
Please bear in mind none of this has actually been tested… it looks good, but it will be interesting to revisit these articles next season and see if they were close to reality. The graphs do not depict a ‘bad’ player. They just categorise players into groupings of what kind of player they are, to give a visualization of what kind of role they are likely to perform, as compared to other players.
As we get more signings I will attempt the same analysis and then will review later on in the year to see how it worked out. Many many thanks to the original poster on the Red and White Kop for the idea and for the thought process behind it. All I did was develop and adapt an algorithm that I think works for the data I have available.