How accurate are xG models II: the ‘Big Chance’ dilemma

xG has become the most-used ‘advanced’ metric in football analysis. Everybody seems to have an xG model nowadays, yet they are all different. This raises the question: how good are they? Last year I tried to answer this question by evaluating a number of xG models. It got quite a lot of attention, so I felt it would only be right to repeat the test this season.

The results, however, were different from what I expected. These surprising results might hint at a flaw in most xG models currently in use. I will go into this further in the second part, but first I’ll show the results of the contest!

The contest

Let’s first introduce all contestants, and also the additional benchmarks we’ll be testing:

Blog 11 - Table

Methodologies (where available online) can be found in the appendix1. Apart from the new contestants, I also decided to include Paul Riley’s model from his blog ‘An xG model for everyone in 20 minutes’ to see how it performed. For an explanation of how the ‘perfect’ model is created, please read my blog from last year. For the ‘All shots equal’ model, I use a value of 0.095 for every shot, as proposed in a Deadspin article a while back.

We’ll be testing the models by calculating the RMSEP between the xG values and the actual number of goals. RMSEP is a measure of the difference between predictions (xG) and outcomes (goals). Just like last year, this measure is used because it allows for comparison with the ‘perfect model’. Penalties and own goals are excluded. For a more exact explanation of the methodology, please refer to last year’s blog.
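For reference, the error measure itself is simple. Here is a minimal sketch in Python; the per-match xG totals and goal counts are made up for illustration, and the exact aggregation follows last year’s methodology:

```python
import math

def rmsep(predictions, outcomes):
    """Root-mean-square error of prediction between xG totals and goal counts."""
    assert len(predictions) == len(outcomes)
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)
    )

# Hypothetical per-match xG totals and goals (penalties and own goals excluded)
xg_totals = [1.8, 0.9, 2.4, 1.1]
goals = [2, 0, 3, 1]
print(round(rmsep(xg_totals, goals), 3))
```

A lower RMSEP means the model’s xG totals sit closer to the actual goal counts.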

Without further ado here are this year’s results:

Blog 11 - Model Comparison OG

The winner is:… Stephen McCarthy!

Newcomers Colin Trainor and Constantinos Chappas take 2nd place. Last year’s winner and runner-up, Michael Caley and 11tegen11, finish 3rd and 4th respectively. All models perform better than the ‘all shots equal’ benchmark; even the relatively simple model proposed by Paul Riley outperforms it by a considerable margin.

You might’ve noticed something funny though…

How come most of the models perform better than the perfect model does on average? Some models are even outside the 95% confidence interval, which means the probability that this season is some sort of outlier is very small. What is going on here?

There are two possible explanations. The first is that the perfect model might not be accurate. The perfect model is based on my xG values, which means that if it were calculated for a different distribution of xG values, the RMSEP might differ. I tried to test this by taking a few similar-looking distributions (sampled from my xG values, and exponential) to see how that would change the average RMSEP. As it turned out, not much changed. Therefore I don’t think this (fully) explains these weird results.

The second explanation seems much more plausible to me. It revolves around OPTA’s Big Chance metric.

The Big Chance Dilemma

A ‘Big Chance’ is a variable that is recorded by OPTA. The idea behind it is that it will be a 1 when a shot is a big chance, and a 0 otherwise. OPTA coders decide after a shot whether it was a ‘Big Chance’ or not, and double check these decisions afterwards.

Most xG models use this variable as it can be used as a proxy for defensive pressure. Since there’s no public tracking data available for most leagues, this seems like a good way to fill the gap of missing information. For instance, ‘Big Chance’ might be able to take into account how many players there are between the ball and the goal.
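To make concrete how such a proxy typically enters a model, here is a toy sketch of a logistic xG model with a Big Chance dummy next to shot distance. The feature set and coefficients are invented for illustration and are not taken from any of the contest models:

```python
import math

def xg(distance_m, big_chance, w_dist=-0.12, w_big=1.8, bias=-0.5):
    """Toy logistic xG model: shot distance plus a Big Chance dummy.
    All coefficients here are made up purely for illustration."""
    z = bias + w_dist * distance_m + w_big * (1.0 if big_chance else 0.0)
    return 1.0 / (1.0 + math.exp(-z))

# The same shot location, with and without the Big Chance label
print(round(xg(12, big_chance=True), 3), round(xg(12, big_chance=False), 3))
```

In a fitted model the Big Chance coefficient tends to be large, so the label alone can move a shot’s xG substantially, which is exactly why it matters how the label is assigned.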

Sounds good right? Well…

It is very likely that OPTA coders fall prey to something called ‘outcome bias’. Freely transcribed from Wikipedia: “Outcome bias is an error made in evaluating the quality of a [chance] when the outcome of that [chance] is already known.” Basically, when a player converts a chance, it will be more likely for coders to note it as a ‘Big Chance’. On the other hand, when a player messes up, it might in retrospect look like a more difficult chance, and the ‘Big Chance’ label might not be given.

This means that the ‘Big Chance’ metric indirectly includes post-shot information. It can be (roughly) compared to building a model on ‘shots on target’ only. Such a model will always perform better than an ‘all shots’ model in the above contest, as it simply has more information. This makes it impossible to accurately compare models the way I did above. Some models lean heavily on the ‘Big Chance’ metric, which gives them a bigger advantage; others barely use it and end up at the bottom. Looking back at the results from earlier, this becomes clear:

Blog 11 - Model Comparison

Models with a green circle used OPTA’s Big Chance metric. Models with an orange circle used a Big Chance variable, but not the one from OPTA. Models with a red circle didn’t use a Big Chance variable at all. The difference in ranking is easy to see.

Apart from putting a huge ‘?’ next to the contest results, it is also questionable if we should want to use the ‘Big Chance’ metric in our models at all.

  1. Descriptive vs. predictive

xG models can be made for several purposes. However, as far as I know, most make them to get a better view of underlying performance. We all know goals are surrounded by a cloud of randomness and xG is a great way to see through that somewhat. That way we can judge player/team performance better. For instance we could say:

  • “Hey, the results are bad, but given their good underlying performance, it’s plausible that they will improve in the future.”
  • “Hey, the results are bad, and so is the underlying performance; they should change something.”

In both cases you make a judgement on team performance based on your projection of how it will be in the future. I feel most xG figures are used for this purpose.

That being said, it is questionable whether the Big Chance metric improves this process. It correlates very well with goals, and thus your xG model will be better at predicting whether a shot was a goal. However, its predictive performance might not improve. Taking post-shot information into account might mean overfitting on outcomes rather than accurately measuring the process. Whether this is true remains to be seen, but the fact that I haven’t seen anyone write about this yet tells me most people don’t even consider it. (Of the methodologies I’ve seen, only Michael Caley acknowledges this issue and tries to correct for it.)

  2. Dependency

The possible impact of the Big Chance variable on predicted xG values has been shown by Jan Mullenberg in a blog he wrote about xG models:

Blog 11 - xgnacstats

What you see above is the predicted xG on the y-axis and the distance to goal on the x-axis. The red dots are shots classified as a Big Chance; the black dots are not. It is baffling to see that the impact is so big that a ‘bad’ Big Chance almost always has a higher xG value than a ‘good’ non-Big Chance. An example2 of this impact in real life:

Blog 11 - Comparison

I’m not saying the left chance is definitely a better chance than the right one. However, it’s safe to say the difference should not be anywhere close to 0.5. Having a model rely this much on one subjective variable can’t be good.

  3. Inconsistency

[NOTE: From reliable sources I’ve heard that the following paragraph only applies to data that has been scraped. Opta does Big Chances for all matches they keep track of. Therefore the following paragraph only applies to public models, and not to models built on an actual Opta feed.]

Going through my Twitter timeline, I often encounter xG maps of matches I recently watched. Just last week, I noticed something weird. I’d watched the international friendly between The Netherlands and Côte d’Ivoire, which the Netherlands won 5-0. When looking at the xG plot from @11tegen11, I noticed the xG scores for some chances were much lower than I’d expected. After enquiring with Sander, it turned out none of the shots from that match were noted as a ‘Big Chance’. This was surprising, as it meant that, for instance, this shot from Janssen was not considered a ‘Big Chance’.

Blog 11 - Janssen Example

Actually, none of the shots in this match were valued as a ‘Big Chance’, indicating that OPTA might only record the ‘Big Chance’ variable for certain leagues/matches. The problem is, nobody knows which matches! Using an xG model fitted with ‘Big Chances’ on matches like this will make the total xG score much, much lower than it should’ve been. This hurts predictive power, but also the descriptive side of xG. You might, for instance, look at Janssen’s performance for the national team and conclude that his xG numbers were weak, even though they should’ve been much higher.

I hope you all agree these are some serious issues surrounding xG models that I feel aren’t given enough attention. This is not a call to stop using the Big Chance variable; it is a call to all modellers out there to re-evaluate their choices. The Big Chance might seem like a great way to model defensive pressure, but its flaws and the errors it creates might not be worth it.

 

Appendix:

1 Methodologies:

Michael Caley

Sander Ijtsma

Stephen McCarthy

Alfredo Giacobbe (Italian)

Paul Riley: An xG model for everyone in 20 minutes

2 The left example is from the international friendly between England and France, shot by Dembele around the 15 minute mark. The right example is from Gaziantepspor – Besiktas, from the Turkish Süper Lig, somewhere between the 0-4 and the final whistle.

What is the best location to pass from?

Back in May I wrote a blog about ‘xG added’, a metric that measures the value of passes (and dribbles). Basically it assigns a ‘danger’ value to the start and end location of an action. The difference between both values is the ‘xG added’ by the player performing the action. If you want to read a more elaborate explanation, please check this blog.
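In code, the core idea is just a difference between two danger values. A minimal sketch, where `danger` is a made-up stand-in for the real model:

```python
# Sketch of the xG added idea: the danger value of the end location of an
# action minus that of the start location. danger() below is a purely
# illustrative stand-in for the actual possession-based danger model.
def danger(x, y):
    """Toy danger value on a 100x100 pitch: grows as the ball gets
    closer to the opposition goal at (100, 50)."""
    dist = ((100 - x) ** 2 + (50 - y) ** 2) ** 0.5
    return max(0.0, 1 - dist / 100)

def xg_added(start, end):
    return danger(*end) - danger(*start)

# A pass from midfield to the edge of the box
print(round(xg_added((50, 50), (85, 45)), 3))
```

A backwards pass gets a negative value in this sketch, which matches the intuition that moving the ball away from goal reduces danger.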

Lately I’ve been making some improvements to the model, of which I posted the first two parts a few weeks ago:

  1. What is a possession-based model (and why does it matter)?
  2. Measuring dribbling skill

Previously I’ve posted top 10 rankings in xG added, for instance like this one:

xgaddedpl1617t110v2

Some people noted that these rankings tend to be dominated by attackers and attacking midfielders. Aren’t there any defenders who are good at progressing the ball? Well, of course there are, but due to their position on the field it’s harder for them to pass into dangerous locations. This becomes very clear when you see where these dangerous locations are on the pitch:

possessionmodelboateng

In the above picture, the whiter the area, the more dangerous it is. For a defender, no matter how good a passer he is, it’s going to be very difficult to pass into these dangerous locations and thus gain xG added. An attacker, on the other hand, is always close to this ‘danger zone’, and thus has more opportunities to gain xG added.

As @GoalImpact noted, it might therefore be useful to correct for this difference in playing position. As a start, let’s have a look at which locations usually cater to good ball progression. We can do this by looking at the average xG added for actions from a certain location.

For instance, in the above picture, Jérôme Boateng has the ball in his own half. On average, what xG added would we expect to see for an action from this position? If we do this for all locations on the field, we get the following picture:

averagexgadded
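The computation behind a picture like this can be sketched as a simple binning exercise: collect all actions starting in a grid cell and average their xG added. The grid size and coordinate scale are assumptions; the real model may differ:

```python
# Sketch: bin action start locations into grid cells and average the
# xG added of the actions starting in each cell. Coordinates 0-100.
from collections import defaultdict

def location_averages(actions, cell=10):
    """actions: iterable of (x, y, xg_added) tuples."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y, value in actions:
        key = (int(x // cell), int(y // cell))
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical actions: (x, y, xG added)
acts = [(72, 55, 0.04), (75, 52, 0.10), (12, 50, 0.01)]
avgs = location_averages(acts)
print(round(avgs[(7, 5)], 2))  # the cell containing the first two actions
```

With real data you would want far more actions per cell, or some smoothing, before trusting any single cell’s average.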

As expected, locations closer to the opponent’s goal usually yield a higher xG added per action. However, there are a few more interesting things to see here:

Right in front of goal, the average xG added for an action is pretty low. If a player has the ball in such a location, it will be very hard to improve the position any further. Not a lot of actions take place here, and those that do have a low xG added.

The areas around it all have high xG added. These locations are significantly less dangerous than the spot right in front of goal, but they are very close to it and thus make it easy to pass to more dangerous positions.

The most interesting thing to me, however, is that the ‘half-spaces’ look like the best places to pass from. Half-spaces are certain areas of the field, often used in tactical analysis.

halfspace

They are usually described as good locations because they give a player a lot of passing directions and because they are usually less crowded than the centre. If you want to read more about half-spaces, read this excellent piece by Spielverlagerung.

As it turns out, these locations also seem to be the best to pass from when looking at xG added. This is not a big surprise, but it’s nice to see a generally accepted theory reflected in the data. In and around the opposition box, an action from the half-spaces yields a higher xG added on average than one from the centre or the wing.

Furthermore, in the defensive half and all the way into the offensive corner areas, the touchlines should be avoided. xG added for actions close to the touchline is lower on average, probably because it’s easier to cut off passing options or to press players attempting actions from those locations. This again is in line with the tactical idea behind half-spaces.

Influence on players

Now that we know which locations generally result in good xG added scores, we can calculate whether players are gaining more or less xG added than expected, given their action locations. This gives the following picture for the current Premier League season, for players who have played more than 500 minutes:

xxgaddedpl2

The black line represents the line where the expected xG added is equal to the actual xG added. That means that players far above the line are performing better than expected, whereas players below the line are performing worse than expected.

NOTE: This metric only measures ball progression skill. Therefore players who tend to be more concentrated on goal scoring don’t necessarily show up in a positive way. This doesn’t mean they’re not good, just that they don’t excel in this metric.

Whether this over/underperformance is due to skill or luck remains to be seen. It’s very possible that a large part of the skill measured with xG added is getting into good positions repeatedly, just like with normal xG. It must also be noted that the sample size is relatively small, as all players have played a maximum of 10 games.

It’s definitely interesting, though, that players in deeper positions like Kanté and Stones pop up using this method, even though their positions make it harder for them to perform well in xG added. It might be a way to look for good passers in more defensive roles. For instance, Claudio Bravo tops the goalkeepers by a mile, which feels right.

That’s all for now. Thanks for reading. As usual, if you have any feedback/questions feel free to contact me on Twitter at @NilsMackay. If you want to get into contact you can also send an e-mail to mackayanalytics@gmail.com.

Measuring dribbling skill

Back in May I wrote a blog about ‘xG added’, a metric that measures the value of passes. Basically it assigns a ‘danger’ value to the start and end location of a pass. The difference between both values is the ‘xG added’ by the player making the pass. If you want to read a more elaborate explanation, please check this blog. Lately I’ve been making some improvements to the model, of which I posted the first part last Friday: What is a possession-based model (and why does it matter)?

A big flaw in the previous version of the model was that dribbles were not incorporated. This meant that if a player dribbled all the way across the pitch and then laid the ball off to a teammate close to goal, he would get almost no credit.

examplexgaddeddribble

For instance, in the example above, player 2 would get no credit for the dribble, only for the pass to player 3 at the end. When you’re evaluating passes exclusively, that’s fine. However, the goal of xG added is to measure how good players are at progressing the ball, and dribbles are a big part of that.

I’ve been wanting to add this to my model for some time, and was re-motivated after reading this great article by Tom Worville (@Worville).

The problem with dribbles is that they’re not recorded in Opta data. A dribble is only recorded as an event when an opponent is dribbled past; the dribble in the example above would not be recorded (assuming he doesn’t beat any players). However, as Michael Caley said about this in a piece from about a year ago:

“[These dribbles can be] extrapolated from pass locations: When a player receives a pass in one location and then makes a pass from a new location, he can be assumed to have run on the ball from the first point to the second point.”

In other words, we can create these events ourselves. It might not be 100% accurate, but it is a close estimation. Using the possession-based model explained in my last blog, we can then assign the start and end point of the dribble a ‘danger value’. This gives every dribble an xG added value, which we can use to assess player and team performance.
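The extrapolation Caley describes can be sketched like this; the event fields are an assumed simplification of the actual data feed:

```python
# Sketch: infer dribble events from pass data. When a player receives a
# pass at one location and his next pass starts at another, treat the gap
# between the two locations as a dribble.
def infer_dribbles(events):
    """events: chronological list of dicts with 'player', 'receiver',
    'type', 'start' and 'end' (locations as (x, y) tuples)."""
    dribbles = []
    last_received = {}  # player -> location where he last received the ball
    for e in events:
        if e["type"] == "pass":
            prev = last_received.get(e["player"])
            if prev is not None and prev != e["start"]:
                dribbles.append({"player": e["player"],
                                 "start": prev, "end": e["start"]})
            last_received[e["receiver"]] = e["end"]
    return dribbles

events = [
    {"type": "pass", "player": "A", "receiver": "B",
     "start": (30, 40), "end": (45, 40)},
    {"type": "pass", "player": "B", "receiver": "C",
     "start": (70, 45), "end": (85, 50)},  # B carried the ball (45,40) -> (70,45)
]
print(infer_dribbles(events))
```

Each inferred dribble then gets an xG added value from the danger values at its start and end points, just like a pass.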

On a player level, we can look at the total xG added by dribbles over a certain period of time. For instance, let’s have a look at the best dribblers in the Premier League so far this season. The values are per 90 minutes, and only players with at least 500 minutes are included to make sure the sample size is big enough. This gives the following:

xgwondribblespl1617

The black line in the middle shows the average xG added per 90 for all players with more than 500 minutes. The inclusion of dribblers like Hazard, Coutinho, Iwobi and Sterling in the top 10 is a good sign.

One thing to note is that it’s very hard to stand out in this specific metric. Dribbles can only be so dangerous, as most of the time there will be defenders and/or a goalkeeper in the way. For instance, dribbling into the danger zone (the zone right in front of goal) is almost impossible, since the longer you keep the ball, the more players you attract towards you. Even the best dribbler in the world, Messi, often finds himself in these situations, and every time it happens we see a picture like this:

messidribbling

That’s the end of the road, even for a player like Messi. This makes it hard to gain a lot of xG added by dribbles alone, but dribbling the ball and attracting players often gives you better passing options, as you draw defenders out of position. That means the effect of good dribbling is probably also indirectly measured in the xG added from passes and through balls. Even so, xG added by dribbles still seems to pick out the top dribblers in the league.

Another thing it seems to pick up is defenders who tend to dribble into midfield. For instance, this is a plot of the 15 best dribbles by Koscielny:

dribblingkoscielny

With an xG added by dribbles of 0.061 per 90 minutes, Koscielny is well above the average player. Defenders like Kolarov and Otamendi are even higher, but Koscielny is a nice example of a defender doing pretty well in this metric by dribbling into midfield. In contrast, somewhere at the bottom of the ranking we find Robert Huth (0.010 xG added by dribbles per 90), who seems to be afraid to keep the ball at his feet for longer than a few meters at a time:

dribblinghuth

I guess you can say Huth knows what his strengths and weaknesses are pretty well. As you can see, this metric offers a way to distinguish between these types of defenders and to reward players who progress the ball in this way.

The complete model

Let’s end with a quick look at the Premier League standings for players in xG added, so including passes etc.:

xgaddedpl1617t110v2

Belgian star De Bruyne leads the pack, which is no surprise. Since Guardiola joined City, he has really stepped up his game to new levels. Özil is a bit lower than last year, when he topped the table; I’m sure he’ll find his way back to the top as the season progresses. Other than perhaps Joe Allen, there are not many surprises here. Eden Hazard has received a nice bump due to his dribbling skill, just like Silva and Coutinho.

Let’s look at the xG added by teams:

xgaddedpl1617t110v2-teams

Manchester City is leading the pack, closely followed by the trio Liverpool, Chelsea and Arsenal. Manchester United is struggling to keep up with the top 4, both offensively and defensively. Tottenham’s offensive problems are clearly visible as well. Their defensive numbers are top class but they’re having some trouble with chance creation, with lower numbers than Southampton and Bournemouth. Even though they’re only 5 points behind Liverpool at the moment, I don’t see them challenging for the title if they can’t fix their chance creation problem.

Leicester City have been as unconvincing in the numbers as they have been in their results. At the bottom of the table we have a group of 5 teams with terrible defensive numbers. Where Swansea, Sunderland and Hull have at least some offensive output to counter that, WBA and especially Burnley have been dreadful offensively, according to xG added.

Thanks for reading! I hope you enjoyed it. If you want to know how your favorite Premier League player is doing in xG added, send me a tweet and I’ll send you his numbers. Feedback/comments are very welcome as always!

What is a possession-based model (and why does it matter)?

In football analytics the most used advanced metric is ‘Expected Goals’ or ‘xG’, which tries to measure the probability of a shot becoming a goal. It has become so popular even, that when making new metrics, some analysts (like myself) tend to work towards something of the same kind. However, transferring a regular xG model to other areas of research might not always be the best idea, as I will show in this blog.

Back in May,  I introduced the metric ‘xG added’, a metric that measures the value of passes. Basically it assigns a ‘danger’ value to the start and end location of a pass. The difference between both values is the ‘xG added’ by the player making the pass. If you want to read a more elaborate explanation, please check this blog.

Assigning the ‘danger’ value in (for instance) my xG added model can be done in several ways. At first, it seemed easy to use a regular xG model to estimate these ‘danger’ values. However, as @deepxg (if you don’t follow him already, now is the time) noted in some feedback he gave me on Twitter, this might not be the most accurate way to do it. An alternative to a regular xG model is a possession-based model.

So what does this mean? Let me explain what they both mean separately and then compare them. At the end I’ll explain why this matters for my model, and other similar models.

Shots-only model

This is your regular xG model. Shooting from distance is less dangerous to the opposition than shooting from close range, and shooting from a tight angle is less dangerous than shooting from right in front of goal. This is the basic concept behind most xG models, and it is also visible when you look at conversion rates by location across the last 4 Premier League seasons.

shotconversion

The lighter the colour, the bigger the chance you score if you shoot from that location. Apparently, someone once scored a goal from their own box: goalkeeper Begović of Stoke City managed to surprise his colleague from Southampton after only 13 seconds, back in 2013.

begovic-goal-vs-southampton

Very impressive, but obviously not indicative of the probability that a shot from that location becomes a goal. That’s why in an xG model we usually smooth this out, to get something like this:

shotonlymodel

Much better. It might not be a perfect estimation, but it will do for what I’m trying to illustrate in this blog.

Possession-based model

A possession-based model tries to estimate the probability that, given a certain event, the team with the ball will score within the same possession. Let’s look at an example. Say Team A gets to take a corner kick. The model will try to estimate what fraction of corners end in a goal within the same possession. This goal could come directly from the corner kick, from a header resulting from the corner, after a short corner and 35 more passes, etc. As long as all the events between the current event and the goal are by Team A, it counts.
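Estimating these probabilities can be sketched as follows: for each grid cell, count the fraction of possessions that touched that cell and ended in a goal. The data format and grid size are assumptions for illustration:

```python
# Sketch of the possession-based value: per grid cell, the fraction of
# possessions with an event in that cell that ended in a goal for the
# team in possession.
from collections import defaultdict

def possession_value(possessions, cell=10):
    """possessions: list of (event_locations, ended_in_goal), where
    event_locations is a list of (x, y) tuples for one possession."""
    goals, totals = defaultdict(int), defaultdict(int)
    for locations, ended_in_goal in possessions:
        seen = {(int(x // cell), int(y // cell)) for x, y in locations}
        for key in seen:  # count each cell at most once per possession
            totals[key] += 1
            goals[key] += 1 if ended_in_goal else 0
    return {key: goals[key] / totals[key] for key in totals}

possessions = [
    ([(20, 50), (60, 40), (85, 52)], True),
    ([(25, 52), (55, 60)], False),
]
values = possession_value(possessions)
print(values[(2, 5)])  # cell visited by both possessions, one ending in a goal
```

As with the shots-only model, the raw per-cell fractions would then be smoothed before use.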

For the last 4 Premier League seasons, this will give the following image:

possessionconversion

As you can see, this gives higher values in most places on the pitch. This makes sense: a shot from your own half is unlikely to end up in the goal, whereas a possession in your own half might very well end in a goal within a few moves. Using the same smoothing technique as before, we get this:

possessionmodel

At this point you might think: mate, that looks exactly like the previous one. Which would be a great assessment. To see what the difference actually is, let’s have a look at a plot of the difference in probabilities:

differencenonshotsmodel2

This is where it gets interesting. A blue colour means that a shot from that location is more dangerous than an average possession. A red colour means that a shot from that location is less dangerous than an average possession. In other words, in spots with a red colour it’s probably best to keep playing rather than shooting.

Since this is a very bold statement (pun intended), let me note some things. First of all, the exact rendition of the picture above changes significantly as you change the underlying model. However, the overall idea stays the same: red for most of the pitch and from sharp angles, and blue directly in front of goal. Second, whether it’s better to shoot or not depends heavily on factors other than location, such as passing options, opponents’ positions, etc. It’s not a good idea to use this map as hard proof of when it’s better to shoot, but it can be used to illustrate the intuition behind the decision.

We can see that a possession-based model rates a possession on the byline next to the goal much higher than a regular xG model does. It also rates positions around the box higher, compared to positions in a team’s own half. On the other hand, it rates positions right in front of goal a lot lower, since it’s not always possible to shoot if you receive the ball there.

So what does this mean for my ‘xG added’ model specifically, but also for other similar models? A few things actually:

  1. Passes into the blue area are valued less than in the old, shots-only model, as not all completed passes into this area can be converted into a shot.
  2. Passes towards the byline, next to the goal, are valued a lot more. This makes sense, as it is still a dangerous location, even if a direct shot is not an option.
  3. Passes towards the area around the box are valued more. In the previous model this would not be the case, as a shot from 25 metres out is about as dangerous as a shot from 50 metres out. It’s a bit more difficult to see in the picture due to the high values near the goal, but the plot is in general more red in the right half than in the left half.

Apart from the differences, why is a possession-based method better for this kind of model than a regular shots-only method?

In this case the answer is: it makes football sense. After all, even if you complete a pass, your teammate might not always be able to get a shot off. Furthermore, a shot might not be his best option. A shots-only model won’t account for these situations, whereas a possession-based model will.

Thanks for reading! Keep an eye out for my next blog in a few days, in which I will talk about adding dribbles to the model to see who’s the best dribbler in the Premier League, according to xG added.

These are the best passers of EURO 2016

About 2 months ago I introduced a new way to measure passing skill, xG added. With the Euros now in its final stages, I figured it was a nice time to see who xG added rates as the best passers.

(Disclaimer: the sample size of an international tournament is very small. Lots of players only play 3 games and opposition is different for teams in different groups. This means that one excellent game will influence your average score quite significantly.)

Let’s start with an example. Elseid Hysaj, right back for Albania and Napoli, puts his teammate in front of the keeper. The xG added of this pass is 0.352, which basically means the xG at the end location is about 0.352 higher than at the start of the pass. This takes into account that the pass is a through ball, which gives the end xG a bump.

Hysaj vs. Zwitserland

As I explained in my last blog, the metric is pretty harsh on centre forwards, as they often receive the ball in advanced positions, which gives them little to no option to pass forward. Apart from strikers and goalkeepers, though, players from all positions on the field can excel in xG added.

Jerome Boateng

An interesting example of this is Jérôme Boateng, centre back for Germany and Bayern München. Ever since Pep Guardiola joined Bayern, he has developed into one of the best passing centre backs in the world. Guardiola’s influence is clearly visible when you look at Boateng’s xG won (positive xG added only) before and after Guardiola arrived at Bayern in 2013/14.

JeromeBoateng

It also looks like Boateng has shifted his passing style towards longer distances. This is visible in the graph above: his xG won from passes decreased, while his xG won from long balls increased. He has continued to focus on long balls during the Euros, as can be seen in his passing plot.

JeromeBoatengPassingPlotENG

In the above plot, Boateng’s 25 best passes during the Euros are shown. The more visible lines correspond to passes with a higher xG added, and the colour indicates the type of pass. Looking at this, it instantly becomes clear that he really likes the diagonal to the left forward. He’s not a one-trick pony, however, as he also finds his teammates with ground passes in central positions. Boateng’s best pass according to xG added was this through ball to Özil against Northern Ireland.

Boateng vs. Noord-Ierland LQ

The best passers

Boateng has been excellent this tournament, which also shows when we look at the tournament’s best players according to xG added. All players who have played at least two and a half full matches, or 225 minutes, are included. I chose 225 minutes because it still includes players who were eliminated after 3 matches, even if they were substituted just before the end. The data covers the group stage and the round of sixteen, and the scores are xG added per 90 minutes. This gives the following top 10:

EKxGwonENG

Eden Hazard is in the lead, followed by Silva and Özil. Centre backs Jérôme Boateng and Leonardo Bonucci also make the list. More surprising names are Shaqiri, Dier and Hysaj (who appeared earlier in this blog). Apart from being lesser known than the others, they’re also the youngest players in the top 10. Shaqiri already appeared in 12th place in xG added in last year’s Premier League season and is excelling for his country as well. Hysaj gained a lot of xG added from the great pass I used as an example earlier. Either way, all three players look legit and will probably appear in lists like these in the future.

Another notable fact is how far Silva is ahead of teammate Iniesta. Iniesta is often praised for his passing skill, but this tournament Silva outplayed him. Where Iniesta was at the top of his game against the Czech Republic and Turkey, he was completely invisible against Croatia and Italy. This is easily shown by looking at his xG added figures for the 4 matches.

AndresIniestaGbG

Silva's numbers are somewhat different. He excelled against the Czech Republic, where he achieved the highest single-match xG added score of the tournament. He couldn't impress against Turkey and Italy afterwards, though. His game against Croatia was still pretty decent; most notably, his through ball to Cesc Fàbregas, which led to the 1-0, was amazing. All in all he ended up far above Iniesta, mostly because of Iniesta's disappointing performances against Croatia and Italy. His passing plot shows that he regularly found his teammates in advanced positions.

DavidSilvaPassingPlotENG

The best player according to xG added is Eden Hazard. The Belgian winger took some time to get started, though. He couldn't impress in the game against Italy, but has been growing into the tournament ever since.

EdenHazardGbGENG

His game against Hungary was outstanding, topped only by Silva's game against the Czech Republic. Apart from a goal and an assist, his passing was also extraordinary. The passing plot below is from the game against Hungary only.

EdenHazardPassingPlotvsHungaryENG

Hazard mainly specializes in short ground passes and generally avoids crosses. Both at this tournament and at Chelsea he delivers only 0.3 crosses a game, which is very low for a winger at an elite team. His passes are often over short distances, but also in very deep positions and generally towards goal. These passes often come against packed defences, which makes it even more impressive that he still finds the space to play them. One of his best passes against Hungary was this through ball to Carrasco.

Hazard vs. Hungary LQ

It’s a shame he wasn’t able to take his team past Wales, but Chelsea fans will be delighted to see Hazard on top of his game. If he can keep up his form, I’m sure we will see the Eden Hazard who won player of the season in 2015.

Measuring passing skill

With Mesut Özil racking up a total of 18 assists so far this season, it's clearer than ever that you don't have to be a goal scorer to be a superstar. The record of 20 assists (Thierry Henry, 02/03) might be out of reach with only 2 games to go, but there is no doubt Özil has been excellent for Arsenal this season. Last year the Premier League saw a similar story, in the form of Cesc Fàbregas. With 18 assists he also came close to taking the all-time record, and deservedly got a lot of praise for doing so. One year later, however, Fàbregas is no longer in the picture: with only 7 assists, he sits in a shared 13th place in the assists ranking. Surely he hasn't suddenly lost the ability to be an elite playmaker, right? Maybe the poor Chelsea season is causing his downfall? Or maybe, just maybe, assists are as superficial a measure of playmaker quality as goals are of striker quality.

So I decided to dive into passing data from Opta to see if I could find a more sustainable metric that quantifies the skills Özil and Fàbregas possess. My view is that every pass has a certain value, even if it doesn't end up in a shot. Obviously, defensive midfielders or defenders can also be really good at build-up play, even though they rarely create shooting opportunities for their teammates and thus won't show up in metrics like assists or key passes.

In general, a pass towards goal adds value to an attack, but is also harder to make. On the other hand, a pass away from goal often takes some danger out of the attack, and is usually easier to complete (obviously this isn't always true, since sometimes a player might be forced to play backwards in order to continue the attack). To try and quantify the value of a pass, I decided to assign xG (or Expected Goals, a metric that estimates the probability of a shot becoming a goal) values to the start location and end location of a pass. These values describe the probability that a shot would become a goal if it were taken from that location. In general, a pass towards a more dangerous location is a good pass. For the start location I used a simple xG model that only uses location (more specifically, distance to goal and angle to both posts) as input. For the end location I also included the type of pass*, since a through ball is generally more dangerous than a cross. When you subtract the start xG value from the end xG value, you get the xG added by completing the pass. This value can also be negative, for instance when you pass backwards.
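As a rough illustration of the calculation, here is a minimal sketch: a toy location-only xG model (distance and the angle subtended by the goal mouth, fed into a logistic function with made-up weights, not the weights my actual model uses), and the xG added of a completed pass as the difference between the end and start values. Coordinates assume a 105x68 pitch with the goal at x = 105:

```python
import math

def location_xg(x, y):
    """Toy location-only xG model: probability that a shot from (x, y)
    becomes a goal, based on distance to goal and the angle subtended
    by the goal mouth. Weights are made up for illustration."""
    dist = math.hypot(105.0 - x, 34.0 - y)
    # distances to the two posts (goal mouth is 7.32 m wide)
    a = math.hypot(105.0 - x, 30.34 - y)
    b = math.hypot(105.0 - x, 37.66 - y)
    # angle to both posts via the law of cosines
    cos_angle = (a * a + b * b - 7.32 ** 2) / (2 * a * b)
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))
    z = -1.0 - 0.1 * dist + 2.0 * angle  # hypothetical logistic weights
    return 1.0 / (1.0 + math.exp(-z))

def xg_added(start, end):
    """xG added by a completed pass: end-location xG minus start-location xG."""
    return location_xg(*end) - location_xg(*start)

print(xg_added((60, 34), (95, 34)))  # forward pass into the box: positive
print(xg_added((95, 34), (60, 34)))  # the same pass backwards: negative
```

The real end-location model additionally conditions on pass type, but the subtraction works exactly the same way.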

Obviously this is only relevant when the pass is actually completed, as a failed pass means the team loses possession. A good passer will try to minimize the number of failed passes while still adding danger to an attack. For that reason, every failed pass gives a player a penalty of 0.01 xG. This number is fairly arbitrary and can be changed. Furthermore, the location where the ball is lost can also be of great importance. When the ball is lost close to the opponent's corner flag, one might argue that this is not a big deal. However, when a player loses the ball close to his own goal, the team might have a serious problem. Therefore, players are also penalized for every failed pass with the xG value that the location where the pass ended has for the opponent. Obviously this has some overlap with the previous penalty, but I think that is no more than fair, as a possession gained from an interception is usually more dangerous than an average possession.
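Putting the two cases together, a single pass could be valued roughly like this. This is a sketch of the scheme described above; the 0.01 flat penalty is the arbitrary number mentioned, and the example xG values are invented:

```python
def pass_value(completed, start_xg, end_xg, opponent_xg_at_loss=0.0,
               fail_penalty=0.01):
    """Value of one pass: xG gained for a completed pass, or a flat
    penalty plus the opponent's xG at the loss location for a failed one."""
    if completed:
        return end_xg - start_xg
    return -(fail_penalty + opponent_xg_at_loss)

# A completed through ball into the box...
print(pass_value(True, start_xg=0.02, end_xg=0.25))            # ≈ +0.23
# ...versus losing the ball in a spot worth 0.15 xG to the opponent
print(pass_value(False, 0.02, 0.0, opponent_xg_at_loss=0.15))  # ≈ -0.16
```

A player's season total is then simply the sum of these values over all his passes.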

Results

Summing all these values gives the following top 20 for the current season, in xG added per 90 mins. Only players who played more than 1000 minutes are included, which gives 277 players:

Blog5plot1

As we might have expected, Mesut Özil is on top, although Alexis Sánchez following him closely might be somewhat of a surprise. Even more surprisingly, we see Cesc Fàbregas in third, albeit quite far behind the first two. According to this metric his passing is still top class, even though he hasn't provided as many assists as last year. Reasons for this could include bad luck, but also a position change: maybe he has played in a deeper role for Chelsea compared to last season. There might also be other explanations, and if you have one I'd love to hear it. Whatever the reason, this metric shows that his passing skill hasn't dropped since last season.

The other names that complete the top 10 pass the eye test quite comfortably, with an honorable mention for PFA Player of the Year Riyad Mahrez in eleventh. The fact that the first defender comes in at place 30 (Héctor Bellerín), and that the highest-placed defenders are all full backs, indicates that it might not be useful to compare across positions. Strikers also don't perform well in this metric, although that seems logical to me, since they often only receive the ball in advanced positions, after which their only options are likely to be a pass back or a shot. This doesn't necessarily mean that strikers are bad at passing, but it does mean that strikers generally don't progress the ball to more dangerous areas with passes.

Repeatability

This looks nice but a metric only becomes useful when it is repeatable. To test this I calculated the values for the past 14/15 Premier League season, to see if there is a correlation between xG added in two consecutive seasons. In total 173 players played over 1000 minutes in both seasons. When comparing the xG added for those players over the two seasons I got the following results:

Blog5plot2

Colours indicate the position of a player, and the size of the dot is larger for players who played more minutes. This looks like a very strong relationship, with an R-squared of about 0.79. Part of what makes this possible is sample size: since passes occur far more often than, for instance, assists, this metric comes with a lot more certainty than key passes or assists alone.
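For anyone who wants to run this check themselves, the season-on-season R-squared is just the squared Pearson correlation between the two per-90 series. A small sketch with hypothetical numbers (not the actual 173-player data):

```python
def r_squared(xs, ys):
    """Coefficient of determination: squared Pearson correlation
    between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical xG added per 90 for the same six players in two seasons
season_14_15 = [0.30, 0.22, 0.18, 0.10, 0.05, -0.02]
season_15_16 = [0.28, 0.25, 0.15, 0.12, 0.03, 0.00]
print(round(r_squared(season_14_15, season_15_16), 2))
```

With real data you would of course feed in the per-90 values of every player who passed the minutes cut-off in both seasons.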

When we only look at the 173 players who played over 1000 minutes in both seasons, it's remarkable that 8 of the top 10 from 14/15 are still in the top 10 in the current season. Furthermore, Riyad Mahrez was already in 15th place last season, even though he was hardly seen as a world-class player back then. Similarly, when I look at the current season, I notice some names fairly high up that I haven't heard of too often. For instance, in 14th place this season is Junior Stanislas, a 26-year-old winger for Bournemouth. He has only 2 assists this season, but my metric suggests he knows what he is doing. Perhaps we'll see more of him next year. In that way this metric might be used by clubs to scout new players, or to evaluate the passing skill of the players they currently have. If a club had known a year ago that Mahrez is a top-class player, it could probably have signed him for a fraction of what he is worth now. Obviously this is only one player and we have no idea how this will play out in the future, but I'm excited to see how players high up the list this year will perform next year.

I’ll stop here because I don’t want this to become too long. There are many more interesting things to say about this, as we can also see which player adds the most xG through long balls, through balls, crosses etc. Looking at age groups might also give an insight into which players will become elite passers in the future. If you have any suggestions for further research or if you want to know how high a certain player is on the list, contact me on Twitter (@NilsMackay).

This is also the first version of this metric, so chances are there’s plenty of room for improvement. If you have any ideas of how to improve this let me know.

*The types of passes I used for this are normal passes, long balls, through balls and crosses. Set pieces and throw-ins are removed from the data.

Biases in our xG models

As you may or may not have noticed, in my latest blog I took a shot at quantifying how good our xG models currently are. Today I won't look at overall performance, but will go more in depth to see whether the xG models have certain biases. Hopefully this will show where there is still improvement to be made, but also which parts we're already pretty good at.

Just like last time, to know if the results we achieve are any good, we’ll have to know how they should be if the model were perfect. For this purpose I’ll use a simulation of a ‘perfect’ model (same as last time) as comparison. Basically what I did is I simulated the xG values from my own model to see what the connection between the xG values and actual outcomes should look like. For a more elaborate explanation please check this blog of mine. I will also be including a model that assigns all shots the same value of 0.095, the average conversion rate of a shot. In general we’ll expect any xG model to be better than that simple model, and we aim to approach the ‘perfect’ model.
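The core of that simulation can be sketched in a few lines: treat each shot as an independent Bernoulli trial with its xG value as the success probability, so simulated scorelines are consistent with the model by construction. The shot values below are made up for illustration:

```python
import random

def simulate_match_goals(shot_xgs, rng):
    """Simulate a goal tally from a list of shot xG values by treating
    each shot as an independent Bernoulli trial."""
    return sum(1 for xg in shot_xgs if rng.random() < xg)

# A team with four chances worth 1.15 xG in total
shots = [0.08, 0.35, 0.12, 0.6]
sims = [simulate_match_goals(shots, random.Random(i)) for i in range(10000)]
print(sum(sims) / len(sims))  # ≈ 1.15, the team's total xG
```

Repeating this for every team in every match gives a full set of simulated outcomes against which any model's xG values can be compared.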

Home/away bias

The first bias I'll look at is home/away bias. Basically, the test checks whether the xG models tend to over- or underestimate the number of goals scored in home and away games. What I did was simply look at the amount of xG each model assigned to home teams and compare it with how many goals home teams actually scored (and similarly for away teams). This gave the following results:

        Goals/game   xG/game
Home    1.26         1.30 (+0.04)
Away    1.05         1.08 (+0.03)
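Reproducing this check is straightforward once you have per-match goal and xG totals. A sketch with hypothetical match records (the dictionary keys and all numbers are invented for illustration):

```python
def home_away_bias(matches):
    """Average goals and average xG per game for home and away teams,
    plus the gap between them (xG minus goals)."""
    n = len(matches)
    out = {}
    for side in ("home", "away"):
        goals = sum(m[side + "_goals"] for m in matches) / n
        xg = sum(m[side + "_xg"] for m in matches) / n
        out[side] = (round(goals, 2), round(xg, 2), round(xg - goals, 2))
    return out

# Invented sample of three matches
matches = [
    {"home_goals": 2, "home_xg": 1.7, "away_goals": 0, "away_xg": 0.6},
    {"home_goals": 1, "home_xg": 1.1, "away_goals": 1, "away_xg": 1.4},
    {"home_goals": 0, "home_xg": 0.9, "away_goals": 2, "away_xg": 1.2},
]
print(home_away_bias(matches))
```

A model with no home/away bias should show gaps of roughly the same size on both sides, as in the table above.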

The results are quite encouraging. In the 260-match sample I'm using, the models slightly overestimated the number of goals that were actually scored, but the differences are minimal. They are so minimal that I'm pretty sure they can be put down to random variance.

The main point however is that the models over/underestimate home and away matches similarly. Individually the models were spread out a bit more, but none had a significant bias towards home or away teams.

Score bias

The other bias I'll look at is score bias, meaning that certain models may over- or underestimate particular scorelines. For instance, a model might systematically underpredict the amount of xG in matches with big scores like 5-0, while overpredicting the amount of xG in matches that end 0-0.

This will be quite interesting, as I often hear people critique single-match xG plots when the actual score and the xG score don't align. People are generally quick to question a model's accuracy when the difference between what happened and what the plot says is big.

However, a fundamental thing to understand is that when a match ends in 1-1, we don’t expect the xG score to be 1.0-1.0 on average.

Wait, what? Obviously, if the xG score for a match is 1.0-1.0, the actual outcome we expect will, on average, also be 1-1. The other way around, however, this does not hold. That might seem weird, but it becomes clearer when we look at matches that end in 0-0. Would we expect the xG score to be 0.0-0.0 as well? No, of course not! The average match that ends in 0-0 will surely have had some chances, so the xG score must be larger than 0.0-0.0. The same is true for matches that end in 1-1 or any other score.

So what xG scores do we expect? Simulations of the ‘perfect model’ gave the following plot:

Blog4plot1

On the y-axis we can see the amount of goals scored by a team, and on the x-axis the average xG that such a team will have scored. The boxplots show 200 simulations of 260 games in the Premier League, and for each simulation the average xG score was taken when x goals were scored.

As we can read from the plot, when a team scores 0 goals in a game, on average we expect it to have about 0.8 xG. If a team scores once, it will have an average of about 1.1 xG. If a team scores twice we expect it to have about 1.45 xG, and when a team scores 3 times about 1.78 xG. The cause of this is that the underlying distribution of xG scores is not uniform. Obviously I don't know the actual underlying distribution, but my own model will serve as an approximation here. Let's add the models and see how they perform:
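This conditional expectation is easy to reproduce in a toy setting: simulate many matches from a mix of shot profiles and average each team's total xG, grouped by the number of goals it actually scored. The shot profiles below are invented, not my model's distribution:

```python
import random

def avg_xg_by_goals(shot_profiles, n_sims=20000, seed=1):
    """Simulate matches (each shot a Bernoulli trial) and return the
    average team xG total conditional on the number of goals scored."""
    rng = random.Random(seed)
    totals = {}  # goals scored -> (sum of team xG, count)
    for _ in range(n_sims):
        for shots in shot_profiles:
            goals = sum(1 for xg in shots if rng.random() < xg)
            s, c = totals.get(goals, (0.0, 0))
            totals[goals] = (s + sum(shots), c + 1)
    return {g: s / c for g, (s, c) in sorted(totals.items())}

# Four invented team shot profiles (xG per shot)
teams = [[0.05] * 8, [0.1] * 10, [0.05] * 4 + [0.4, 0.3], [0.02] * 12 + [0.7]]
result = avg_xg_by_goals(teams)
print(result)  # teams that score 0 still average well above 0.0 xG
```

Even in this toy version, the average xG for teams that scored 0 goals stays well above zero and rises with the number of goals scored, which is exactly the shape of the boxplots above.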

Blog4plot2

There's a lot to see here, so let's start by explaining how this is displayed. On the right are the models that were evaluated in my latest blog, with the number corresponding to how they performed (1 is best, 9 is worst). In the plot, each number is shown once per row, and the numbers are scattered a bit so they don't overlap.

In general, what we can see here is that models 1 to 5 are within the boxplot area in all 4 cases. This means that the more accurate models also perform better in this bias test. Models 6, 7 and 8 fall outside the boxplot area at least once, whereas the naïve Deadspin model isn't within the boxplot area a single time. The takeaway is that the better models tend to have smaller biases (as expected), and that simple models may seem decent over large samples while making consistent errors on smaller ones. One other thing I noticed is that when a team scores 1 goal, all models but one (Caley) overestimate the average amount of xG. Similarly, when a team scores 2 goals, all models but one (Torvaney) underestimate the average amount of xG. Whether this is due to the sample or a structural phenomenon I'm afraid I can't tell for sure.

Concluding

Not all xG models are the same. When I see a statistic that uses xG I usually just assume it's correct, while my analysis has shown that simpler models in particular can make big systematic errors. Using an overly simple xG model might well lead you to wrong conclusions. On the other hand, half of the models I tested fall within the margin of error on every test I did. To me it seems definitely worth investing a few extra hours to improve your xG model and get into that category. Simpler models can obviously still be used for analysis, but only if we understand their limitations and communicate them when publishing results.

Thus we come to the end of my 3-part piece on evaluating xG models. This was great fun so maybe it’s an idea to check back in a year or so to see how the state of xG models has changed. If you have any questions feel free to contact me on Twitter (@NilsMackay). If you want to read part 1 and 2:

Part 1: How NOT to evaluate your xG model

Part 2: How good are our xG models?

How good are our xG models?

Expected goals are a difficult metric. Apart from the huge amount of work it takes to create an xG-model, once you've got one it's hard to tell if it's any good. Most people check this by testing whether their xG totals for entire seasons are similar to the actual goal totals, mostly using R-squared. In my latest blog I tried to explain why I think this is a very poor way to evaluate your xG-model. I also hinted at a better way to evaluate an xG-model, something I will explain and apply today. First I'll explain my methodology, and afterwards I'll apply it to evaluate different xG-models, including a few of the most prominent ones in the community, like Michael Caley's and 11tegen11's. Are those models really better than other models? And how close to perfect are they? If you're only interested in the results, please skip the next paragraph.

Methodology

Let's start with the methodology. One of my critiques of the full-season R-squared plots was that a lot of information is lost by summing an entire season together. Therefore I decided to look at single-match totals. As far as I know this is the smallest sample at which we look at xG-values, and by looking at single matches rather than seasons we have a lot more data points. The method I will use to compare the xG-scores to the actual scores is the root-mean-square error percentage (RMSEP). The standard root-mean-square error is a very simple statistic that measures the differences between predictions and actual outcomes. The root-mean-square error percentage (also known as the coefficient of variation of the RMSE) is a normalized version that makes comparison possible with my 'perfect' model, a theoretical model I explained in my last blog. The formula for the root-mean-square error percentage is:

RMSEP = sqrt( (1/n) · Σᵢ (xGᵢ − goalsᵢ)² ) / mean(goals)

There might be better or more suitable metrics out there, but I think this one is reasonably easy to understand and reproduce, and it's theoretically sound.
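In code, the calculation looks something like this (the single-match totals are invented, purely to show the mechanics):

```python
import math

def rmsep(xg_totals, goal_totals):
    """Root-mean-square error percentage: the RMSE between per-match xG
    totals and actual goals, normalised by the mean number of goals
    (i.e. the coefficient of variation of the RMSE)."""
    n = len(xg_totals)
    rmse = math.sqrt(sum((x - g) ** 2
                         for x, g in zip(xg_totals, goal_totals)) / n)
    return rmse / (sum(goal_totals) / n)

# Five invented single-match totals
xg_totals = [1.3, 0.8, 2.1, 0.4, 1.6]
goal_totals = [1, 1, 3, 0, 2]
print(round(rmsep(xg_totals, goal_totals), 3))  # ≈ 0.359
```

Lower is better; the 'perfect' model provides the lower bound we can realistically hope to approach.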

However, the exact value of this metric alone might not give much insight as it’s quite technical. So we’re going to need something to compare the results with. For this purpose, I decided to include two extra ‘models’ in my evaluation, which try to describe the upper and lower bound of performances. For the lower bound (lower is better) I’ll be using the ‘perfect’ model described in my latest blog. As an upper bound I’ll be using the model explained in an infamous Deadspin article, which assigns every shot an xG-value of exactly 0.095. The idea behind this is that if an xG-model can’t do better than that then really what’s the point of making one, so it creates a nice upper bound.

The results are in…

First a short introduction into the models that will be used in this evaluation:

  1. Nils Mackay, my own model. The start of the methodology can be found here, although I have greatly improved it since.
  2. Michael Caley (@MC_of_A). Numbers taken from the xG-plots on his Twitter account. Methodology here.
  3. 11tegen11 (@11tegen11). Methodology here.
  4. FootballStatistics (@stats4footy), no methodology available.
  5. @SteMc74. Methodology here.
  6. Willy Banjo (@bertinbertin). Methodology here.
  7. SciSports, a Dutch start-up company. (@SciSportsNL). Methodology online soon.
  8. Ben (@Torvaney). Ben uses a model that only uses x,y location and whether it was a header or not. You can create your own numbers using his model here.

Now for the results:

Blog3plot2

And the winner is…. Michael Caley!

What we see above is the RMSEP for every model (the white dot) for the set of games I used (the first 260 games of this year's Premier League). At the top you see the 'perfect' model, shown as a boxplot of 200 simulations. Basically, due to variation, a simulation can be relatively more or less in line with the xG-values. If actual scores are 'luckier' or 'less expected', then the RMSEP becomes higher, and vice versa. What we can see is that (over a sample of 260 games) the RMSEP of a 'perfect' model varies about 0.08 in both directions. To illustrate this 'confidence interval', I added the blue lines for all other models, which are simply the observed value (white dot) plus or minus 0.08. Mind you, these are not actual confidence intervals, as I don't have those. They should also only be compared with the 'perfect' model, as different outcomes of games would affect all models roughly similarly. Therefore I think the ranking of the models above is basically what it would be in any sample of 260 games or more.

What's really surprising is that Michael Caley's model performs almost as well as the 'perfect' model. This indicates that his estimations are really good and don't leave much room for improvement. It is somewhat surprising because the general consensus is that positional data will improve xG-models by a lot. My analysis suggests that, although there's still room for improvement, it won't matter all that much (for xG-models).

Following by a decent margin we find 11tegen11 in second place and FootballStatistics in third. In fourth and fifth we find @SteMc74 and my own model, closely trailed by Willy Banjo's model in sixth. In seventh and eighth we find the SciSports model and Ben's model. All the way at the back we find the 'upper bound', the Deadspin model. As I somewhat expected, this model performs very poorly on single matches, and its 'confidence interval' doesn't even touch the worst simulation of the 'perfect' model.

So what can we take from here? First of all, even a simple model like Ben’s is a lot better than just counting shots. Second of all, creating a good xG-model can be really hard, but it is not impossible. Caley’s model is living proof that it’s possible to create a model that’s close to a ‘perfect’ model, even without using positional data.

I’ve done some additional analysis that looks at whether the models have certain biases. I feel like this article will get too long if I add it here, so I’ll write something about that in a week or so. Great thanks to FootballStatistics (@stats4footy) for his work in this article. Also great thanks to all the modelers who were so kind to provide data for this analysis.

(NOTE: I decided not to include Paul Riley's (@footballfactman) model in the analysis. His model looks at xG2, while all the other models in this analysis look at xG1. The main difference between them is that xG2 assigns a value of 0 to blocked shots and shots off target, while xG1 doesn't look at what happens to a shot. The implication is that fewer shots are given an xG-value, which leads to a smaller RMSEP due to lower variance. I figured comparing his model with the rest would be like comparing apples and oranges. For those interested: his RMSEP was 0.81, very close but slightly behind Caley's model.)

(NOTE 2: If you wish to reproduce these results, please note that the actual value of the RMSEP for a model varies significantly between samples, due to the different number and quality of shots in the games used. So if you want to see how your own model is doing, you'll have to use the first 260 games of the Premier League 15-16, or get data from all modelers for a different sample.)