What is the best location to pass from?

Back in May I wrote a blog about ‘xG added’, a metric that measures the value of passes (and dribbles). Basically it assigns a ‘danger’ value to the start and end location of an action. The difference between both values is the ‘xG added’ by the player performing the action. If you want to read a more elaborate explanation, please check this blog.

Lately I’ve been making some improvements to the model, of which I posted the first two parts a few weeks ago:

  1. What is a possession-based model (and why does it matter)?
  2. Measuring dribbling skill

Previously I’ve posted top 10 rankings in xG added, for instance like this one:


Some people noted that these rankings tend to be very attacker/attacking midfielder dominated. Aren’t there any defenders good at progressing the ball? Well of course there are, but due to their position on the field it’s harder for them to pass into dangerous locations. This becomes very clear when you see where these dangerous locations are on the pitch:


In the above picture, the whiter, the more dangerous. For a defender, no matter how good at passing he is, it’s going to be very difficult to pass into these dangerous locations and thus gain xG added. On the other hand, an attacker is always close to this ‘danger zone’, and thus has more opportunities to gain xG added.

As @GoalImpact noted, it might therefore be useful to correct for this difference in playing position. As a start, let’s first have a look at which locations usually lend themselves to good ball progression. We can do this by looking at the average xG added for actions from a certain location.

For instance, in the above picture, Jérôme Boateng has the ball in his own half. On average, what xG added would we expect to see for an action from this position? If we do this for all locations on the field, we get the following picture:


As expected, locations closer to the opponent’s goal are usually locations where actions yield a higher xG added. However, there are a few more interesting things to see here:

Right in front of goal, the average xG added for an action is pretty low. If a player has the ball at such a location, it will be very hard to improve the location any further. Not a lot of actions take place in this location, and if they do they will have a low xG added.

The areas around it all have a high xG added. These locations are significantly less dangerous than the spot right in front of goal, but they are very close to it, which makes it easy to pass to more dangerous positions.

The most interesting thing to me, however, is that the ‘half-spaces’ look like the best places to pass from. Half-spaces are certain areas of the field, often used in tactical analysis.


They are usually described as good locations, due to the fact that they give a player a lot of passing directions, and due to the fact that they are usually less crowded than the center. If you want to read more about half-spaces, read this excellent piece by Spielverlagerung.

As it turns out, these locations also seem to be the best locations to pass from when looking at xG added. This is not a big surprise, but it’s nice to see a generally accepted theory back in the data. In/around the opposition box, it turns out an action from a location in the half-spaces yields a higher xG added on average than one from the center/side.

Furthermore, in the defensive half and all the way in the offensive corner area, the touchlines should be avoided. xG added from actions close to the touchline is lower on average, probably because it’s easier to shield off passing options or to press players that attempt actions from those locations. This again is in line with the tactical idea behind half-spaces.
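The location averages discussed in this section can be sketched in a few lines. This is only a minimal illustration, not the actual implementation behind the pictures: the grid size, the 0-100 pitch coordinates, the function name and the example actions are all assumptions made up for the example.

```python
from collections import defaultdict


def average_xg_added_by_zone(actions, cell_size=10.0):
    """Average xG added of actions, grouped by the grid cell they start in.

    `actions` is an iterable of (x, y, xg_added) tuples, where x and y are
    the start location in hypothetical 0-100 pitch coordinates.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for x, y, xg_added in actions:
        cell = (int(x // cell_size), int(y // cell_size))
        totals[cell] += xg_added
        counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}


# Made-up example actions: (start x, start y, xG added)
actions = [
    (85.0, 35.0, 0.04),  # near the box, off-centre
    (88.0, 38.0, 0.06),  # same grid cell as the action above
    (25.0, 50.0, 0.01),  # own half, centre
]
zone_average = average_xg_added_by_zone(actions)
```

With real data, a finer grid plus some smoothing would probably be needed to get a readable picture, since many cells will contain only a handful of actions.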

Influence on players

Now that we know which locations generally result in good xG added scores, we can calculate if players are getting more or less xG added than expected, compared to their action locations. This gives the following picture for the current Premier League season, for players who played more than 500 minutes:


The black line represents the line where the expected xG added is equal to the actual xG added. That means that players far above the line are performing better than expected, whereas players below the line are performing worse than expected.
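The comparison behind this plot can be sketched as follows, assuming we already have a lookup of the average xG added per grid cell. The cell size, the lookup values, the function name and the example actions are hypothetical and only serve to show the idea: a player’s expected xG added is the sum of the location averages for the places he acted from.

```python
def expected_vs_actual(player_actions, zone_average, cell_size=10.0):
    """Return (actual, expected) total xG added for one player.

    `player_actions` holds (x, y, xg_added) tuples. The expected value is the
    sum of the average xG added for the grid cell each action starts in.
    Cells missing from `zone_average` count as 0.0 (an assumption).
    """
    actual = 0.0
    expected = 0.0
    for x, y, xg_added in player_actions:
        cell = (int(x // cell_size), int(y // cell_size))
        actual += xg_added
        expected += zone_average.get(cell, 0.0)
    return actual, expected


# Made-up averages per grid cell and made-up actions for a deep-lying player
zone_average = {(2, 5): 0.01, (8, 3): 0.05}
defender_actions = [(25.0, 50.0, 0.03), (28.0, 55.0, 0.02)]

actual, expected = expected_vs_actual(defender_actions, zone_average)
difference = actual - expected  # positive: above the black line
```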

NOTE: This metric only measures ball progression skill. Therefore players who tend to be more concentrated on goal scoring don’t necessarily show up in a positive way. This doesn’t mean they’re not good, just that they don’t excel in this metric.

Whether this over/underperformance is due to skill or luck remains to be seen. It’s very much possible that a large part of the skill measured with xG added is getting into good positions repeatedly, just like with normal xG. It must also be noted that the sample size is relatively small, as all players played a maximum of 10 games.

It’s definitely interesting though that players in deeper positions like Kanté and Stones pop up using this method, even though their positions make it harder for them to perform well in xG added. It might be a method to look for good passers in more defensive roles. For instance, Claudio Bravo tops the goalkeepers by a mile, which feels right.

That’s all for now. Thanks for reading. As usual, if you have any feedback/questions feel free to contact me on Twitter at @NilsMackay. If you want to get into contact you can also send an e-mail to mackayanalytics@gmail.com.

Measuring dribbling skill

Back in May I wrote a blog about ‘xG added’, a metric that measures the value of passes. Basically it assigns a ‘danger’ value to the start and end location of a pass. The difference between both values is the ‘xG added’ by the player making the pass. If you want to read a more elaborate explanation, please check this blog. Lately I’ve been making some improvements to the model, of which I posted the first part last Friday: What is a possession-based model (and why does it matter)?

A big flaw in the previous version of my model was that dribbles were not incorporated. This meant that if a player dribbled all the way across the pitch and then laid the ball off to a teammate close to goal, he would get almost no credit.


For instance, in the example above, player 2 would get no credit for the dribble but only for the pass to player 3 at the end. When you’re evaluating passes exclusively, that’s fine. However, the goal of xG added was to measure how good players are at progressing the ball, and dribbles are a big part of that.

I’ve been wanting to add this to my model for some time, and was re-motivated after reading this great article by Tom Worville (@Worville).

The problem with dribbles is that they’re not recorded in Opta data. A dribble is only recorded as an event when the player dribbles past an opponent, so the dribble in the example above would not be recorded (assuming he doesn’t dribble past any players). However, as Michael Caley said about this in a piece from about a year ago:

“[These dribbles can be] extrapolated from pass locations: When a player receives a pass in one location and then makes a pass from a new location, he can be assumed to have run on the ball from the first point to the second point.”

In other words, we can create these events ourselves. It might not be 100% accurate but it is a close estimation. Using the xG added model, we can now assign the start and end point of the dribble a ‘danger value’, using the possession-based model explained in my last blog. This gives every dribble an xG added value which we can use to assess player and team performance.
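A minimal sketch of how such dribble events could be created from pass data is shown below. The event fields (`passer`, `receiver`, `start`, `end`, `completed`), the function name and the minimum distance are assumptions for illustration, and real event data will need more care (for example around set pieces or events between the two passes).

```python
import math


def infer_dribbles(passes, min_distance=1.0):
    """Infer dribbles from a chronological list of pass events.

    When a player receives a completed pass at one location and makes his
    next pass from another location, he is assumed to have run with the ball
    from the first point to the second.
    """
    dribbles = []
    for prev, curr in zip(passes, passes[1:]):
        if not prev["completed"]:
            continue  # the ball never reached the receiver
        if prev["receiver"] != curr["passer"]:
            continue  # a different player made the next pass
        start, end = prev["end"], curr["start"]
        if math.dist(start, end) >= min_distance:
            dribbles.append({"player": curr["passer"], "start": start, "end": end})
    return dribbles


# Made-up example: player 2 receives at (30, 40) and passes from (80, 40)
passes = [
    {"passer": "player 1", "receiver": "player 2",
     "start": (20.0, 30.0), "end": (30.0, 40.0), "completed": True},
    {"passer": "player 2", "receiver": "player 3",
     "start": (80.0, 40.0), "end": (90.0, 35.0), "completed": True},
]
dribbles = infer_dribbles(passes)
```

Each inferred dribble then gets an xG added value in the same way as a pass: the danger value at its end point minus the danger value at its start point.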

On a player level, we can look at the total xG added by dribbles over a certain period of time. For instance, let’s have a look at the best dribblers in the Premier League so far this season. The values are per 90 minutes, and only players with at least 500 minutes are included to make sure the sample size is big enough. This gives the following:


The black line in the middle shows the average xG added per 90 for all players with more than 500 minutes. The inclusion of dribblers like Hazard, Coutinho, Iwobi and Sterling in the top 10 is a good sign.
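For completeness, the per-90 normalisation and the minutes filter can be sketched like this. The player names and totals are made up, and the function names are hypothetical; only the 500-minute threshold comes from the text.

```python
def per_90(total, minutes):
    """Scale a season total to a per-90-minutes rate."""
    return total * 90.0 / minutes


def rank_per_90(players, min_minutes=500):
    """Rank players by xG added per 90, keeping only players with enough minutes.

    `players` maps a name to (total xG added by dribbles, minutes played).
    """
    eligible = {
        name: per_90(total, minutes)
        for name, (total, minutes) in players.items()
        if minutes >= min_minutes
    }
    return sorted(eligible.items(), key=lambda item: item[1], reverse=True)


# Made-up totals for three hypothetical players
players = {
    "Player A": (0.90, 900),  # 0.09 per 90
    "Player B": (0.30, 600),  # 0.045 per 90
    "Player C": (0.50, 200),  # too few minutes, excluded
}
ranking = rank_per_90(players)
```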

One thing to note is that it’s very hard to stand out in this specific metric. Dribbles can only be so dangerous, as most of the time there will be defenders and/or goalkeepers in the way. For instance, dribbling into the danger zone (the zone right in front of goal) is almost impossible, since the longer you keep the ball, the more players you attract towards you. Even the best dribbler in the world, Messi, often finds himself in these situations, and every time it happens we see a picture like this:


That’s the end of the road, even for a player like Messi. This makes it hard to gain a lot of xG added by dribbles only, but often dribbling the ball and attracting players gives you better passing options, as you will draw defenders out of position. That means that the effect of good dribbling is probably also indirectly measured in the xG added from passes and through balls. Even so, xG added by dribbles still seems to pick out the top dribblers in the league.

Another thing it seems to pick up is defenders who tend to dribble into midfield. For instance, this is a plot of the 15 best dribbles by Koscielny:


With an xG added by dribbles of 0.061 per 90 minutes, Koscielny is well above the average player. Defenders like Kolarov and Otamendi are even higher, but Koscielny is a nice example of a defender doing pretty well in this metric by dribbling into midfield. In contrast, somewhere at the bottom of the ranking we find Robert Huth (0.010 xG added by dribbles per 90), who seems to be afraid to keep the ball at his feet for longer than a few meters at a time:


I guess you can say Huth knows pretty well what his strengths and weaknesses are. As you can see, this metric offers a way to distinguish between these types of defenders, and to reward players who progress the ball in this way.

The complete model

Let’s end with a quick look at the Premier League standings for players in xG added, so including passes etc.:


Belgian star De Bruyne leads the pack, which is no surprise. Since Guardiola joined City he has really stepped up his game to a new level. Özil is a bit lower than last year, when he was at the top of the table. I’m sure that as the season progresses, he’ll find his way back to the top. Other than perhaps Joe Allen, there are not many surprises here. Eden Hazard has gotten a nice bump due to his dribbling skill, just like Silva and Coutinho.

Let’s look at the xG added by teams:


Manchester City is leading the pack, closely followed by the trio Liverpool, Chelsea and Arsenal. Manchester United is struggling to keep up with the top 4, both offensively and defensively. Tottenham’s offensive problems are clearly visible as well. Their defensive numbers are top class but they’re having some trouble with chance creation, with lower numbers than Southampton and Bournemouth. Even though they’re only 5 points behind Liverpool at the moment, I don’t see them challenging for the title if they can’t fix their chance creation problem.

Leicester City has been as unconvincing in the numbers as they have been in their results. At the bottom of the table we have a group of 5 teams with terrible defensive numbers. Where Swansea, Sunderland and Hull have at least some offensive output to counter that, WBA and especially Burnley have been dismal offensively, according to xG added.

Thanks for reading! I hope you enjoyed it. If you want to know how your favorite Premier League player is doing in xG added, send me a tweet and I’ll send you his numbers. Feedback/comments are very welcome as always!

What is a possession-based model (and why does it matter)?

In football analytics the most used advanced metric is ‘Expected Goals’ or ‘xG’, which tries to measure the probability of a shot becoming a goal. It has even become so popular that when making new metrics, some analysts (like myself) tend to work towards something of the same kind. However, transferring a regular xG model to other areas of research might not always be the best idea, as I will show in this blog.

Back in May, I introduced the metric ‘xG added’, a metric that measures the value of passes. Basically it assigns a ‘danger’ value to the start and end location of a pass. The difference between both values is the ‘xG added’ by the player making the pass. If you want to read a more elaborate explanation, please check this blog.

Assigning the ‘danger’ value in (for instance) my xG added model, can be done in several ways. At first, it seemed easy to use a regular xG model to estimate these ‘danger’ values. However, as @deepxg (if you don’t follow him already now is the time) noted in some feedback he gave me on Twitter, this might not be the most accurate way to do it. An alternative to using a regular xG model, is using a possession-based model.

So what does this mean? Let me explain what they both mean separately and then compare them. At the end I’ll explain why this matters for my model, and other similar models.

Shots-only model

This is your regular xG model. Shooting from distance is less dangerous to the opposition than shooting from close range. Also, shooting from an angle is less dangerous than shooting from right in front of goal. This is the basic concept behind most xG models, and is also visible when you look at conversion rates by location across the last 4 seasons in the Premier League.


The lighter the colour, the bigger the chance you will score if you shoot from that location. Apparently, once, someone scored a goal from their own box. Goalkeeper Begović from Stoke City managed to surprise his colleague from Southampton after only 13 seconds, back in 2013.


Very impressive, but obviously not indicative of the probability that a shot from that location will become a goal. That’s why in an xG model, we usually smooth this out, to get something like this:


Much better. It might not be a perfect estimation, but it will do for what I’m trying to illustrate in this blog.

Possession-based model

A possession-based model tries to estimate the probability that, given a certain event, the team with the ball will score within the same possession. Let’s look at an example. Say Team A gets to take a corner kick. The model will try to estimate what fraction of corners ends in the goal within the same possession. This goal could be directly from the corner kick, from a direct header resulting from the corner kick, after a short corner and 35 more passes etc. As long as all the events between the current event and the goal are from Team A, it counts.
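The labelling step behind such a model can be sketched as follows. This is a simplified illustration under stated assumptions: events are plain (team, kind, is_goal) tuples, a possession ends as soon as the other team has an event, and the function names are hypothetical. Walking backwards through the events makes it easy to know whether a goal is still to come in the same possession.

```python
def possession_goal_labels(events):
    """Label each event 1 if its team scores later in the same possession, else 0.

    `events` is a chronological list of (team, kind, is_goal) tuples. Any event
    by the other team is treated as the end of the possession.
    """
    labels = [0] * len(events)
    scoring_team = None  # team known to score later in the current possession
    for i in range(len(events) - 1, -1, -1):
        team, _kind, is_goal = events[i]
        if is_goal:
            scoring_team = team
        elif team != scoring_team:
            scoring_team = None  # possession changed, no goal ahead
        labels[i] = 1 if team == scoring_team else 0
    return labels


def scoring_fraction(events, kind):
    """Fraction of events of a given kind that end in a goal within the possession."""
    labels = possession_goal_labels(events)
    picked = [label for (_, k, _), label in zip(events, labels) if k == kind]
    return sum(picked) / len(picked) if picked else 0.0


# Made-up sequence: Team A scores from its first corner, not from its second
events = [
    ("A", "corner", False), ("A", "pass", False), ("A", "shot", True),
    ("B", "pass", False), ("B", "pass", False),
    ("A", "corner", False), ("A", "pass", False), ("B", "clearance", False),
]
labels = possession_goal_labels(events)
corner_fraction = scoring_fraction(events, "corner")
```

Replacing the event kind by the event location (for example a grid cell) gives the raw values for a location-based picture.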

For the last 4 Premier League seasons, this will give the following image:


As you can see, this will give higher values on most places on the pitch. This makes sense, as a shot from your own half is unlikely to end up in the goal, whereas a possession in your own half might very well end in a goal within a few moves. Using the same smoothing technique as earlier, we get this:


At this point you might think: mate, that looks exactly like the previous one. Which would be a great assessment. To see what the difference actually is, let’s have a look at a plot of the difference in probabilities:


This is where it gets interesting. A blue colour means that a shot from that location is more dangerous than an average possession. A red colour means that a shot from that location is less dangerous than an average possession. In other words, in spots with a red colour it’s probably best to keep playing rather than shooting.

Since this is a very bold statement (pun intended), let me note some things. First of all, the exact rendition of the picture above changes significantly as you change the model you use. However, the overall idea will be the same: a red colour for most of the pitch and from sharp angles, and a blue colour directly in front of goal. Second of all, whether it’s better to shoot or not is highly dependent on factors other than location, such as passing options, the opponents’ positions etc. It’s not a good idea to use this map as hard proof of when it’s better to shoot or not, but it can be used to illustrate the intuition behind the decision.

We can see that a possession-based model rates a possession on the back line next to the goal much higher than a regular xG model. We can also see that it rates positions around the box higher, in comparison to positions in the own half. On the other hand, it rates positions right in front of goal a lot lower, since it’s not always possible to shoot if you receive the ball at that location.

So what does this mean for my ‘xG added’ model specifically, but also for other similar models? A few things actually:

  1. Passes into the blue area get valued less, compared to the old, shots-only model, as not all completed passes into this area can be converted to a shot.
  2. Passes towards the backline, next to the goal, get valued a lot more. This makes sense as it is still a dangerous location, even if a direct shot is not an option.
  3. Passes towards the area around the box get valued more. In the previous model this would not be the case, as a shot from 25 meters out is about as dangerous as a shot from 50 meters out. It’s a bit more difficult to see in the picture due to the high values near the goal, but the plot is in general more red in the right half than in the left half.

Apart from the differences, why is a possession-based method better for this kind of model than a regular shots-only method?

In this case the answer is: it makes football sense. After all, even if you complete a pass, your teammate might not always be able to get a shot off. Furthermore, a shot might not be the best option for your teammate. A shots-only model won’t account for these situations whereas a possession-based model will.

Thanks for reading! Keep an eye out for my next blog in a few days, in which I will talk about adding dribbles to the model to see who’s the best dribbler in the Premier League, according to xG added.

These are the best passers of EURO 2016

About 2 months ago I introduced a new way to measure passing skill, xG added. With the Euros now in its final stages, I figured it was a nice time to see who xG added rates as the best passers.

(Disclaimer: the sample size of an international tournament is very small. Lots of players only play 3 games and opposition is different for teams in different groups. This means that one excellent game will influence your average score quite significantly.)

Let’s start with an example. Elseid Hysaj, right back for Albania and Napoli, puts his team mate in front of the keeper. The xG added of this pass is 0.352. That basically means the xG at the end location is about 0.352 higher than the xG at the start of the pass. This takes into account that the pass is a through ball, which gives the end xG a bump.

Hysaj vs. Switzerland

As I explained in my last blog, the metric is pretty harsh on center forwards, as they often receive the ball in advanced positions which gives them little to no options to pass forward. Apart from the strikers and goalkeepers though, players from all positions on the field can excel in xG added.

Jerome Boateng

An interesting example of this is Jérôme Boateng, centre back for Germany and Bayern Munich. Ever since Pep Guardiola joined Bayern he has developed into one of the best passing centre backs in the world. The influence of Guardiola is clearly visible when you look at the xG won (positive xG added only) for Boateng before and after Guardiola arrived at Bayern in 2013/14.


It also looks like Boateng changed his passing style into passes over longer distances. This is visible in the graph above as his xG won in passes decreased, but xG won in long balls increased. He has continued to focus on long balls during the Euros, which can be seen when we look at his passing plot.


In the above plot, the 25 best passes of Boateng during the Euros are shown. The more visible lines correspond to passes with a higher xG added. The colour indicates the type of pass. Looking at this, it instantly becomes clear he really likes to pass diagonally to the left forward. He’s not a one-trick pony however, as he also finds his teammates with ground passes in central positions. Boateng’s best pass according to xG added was this through ball to Özil against Northern Ireland.

Boateng vs. Northern Ireland

The best passers

Boateng has been excellent this season, which is also shown when we take a look at the best players this tournament according to xG added. All players who have played at least 2 and a half full matches, or 225 minutes, are included. I chose 225 minutes because it will still include players who were eliminated after 3 matches, even if they were substituted just before the end. The data used is from the group stage and the round of sixteen. The scores are xG added per 90 minutes. This gives the following top 10:


Eden Hazard is in the lead, followed by Silva and Özil. Centre backs Jérôme Boateng and Leonardo Bonucci are also included. More surprising names are Shaqiri, Dier and Hysaj (who appeared earlier in this blog). Apart from the fact that they might be lesser known than the others, they’re also the youngest players in the top 10. Shaqiri already appeared in 12th place in xG added in last year’s Premier League season, and is excelling for his country as well. Hysaj gained a lot of xG added because of the great pass I used as an example earlier in this blog. Either way, all three players look legit and will probably appear in lists like these in the future.

Another notable fact is how far Silva is ahead of teammate Iniesta. Iniesta is often praised for his passing skill but this tournament Silva outplayed him. Where Iniesta was on top of his game against the Czech Republic and Turkey, he was completely invisible against Croatia and Italy. This is easily shown by looking at his xG added figures for the 4 matches.



Silva’s numbers are somewhat different. He excelled against the Czech Republic, where he achieved the highest single-match xG added score of the tournament. He couldn’t impress afterwards against Turkey and Italy. His game against Croatia was still pretty decent though; most notably, his through ball to Cesc Fàbregas, which led to the 1-0, was amazing. All in all he ended far above Iniesta, mostly because of Iniesta’s disappointing performances against Croatia and Italy. Silva’s passing plot shows that he regularly found his teammates in advanced positions.


The best player according to xG won is Eden Hazard. The Belgian winger took some time to get started though. He couldn’t impress in the game against Italy, but has been growing in the tournament ever since.


His game against Hungary was outstanding and only topped by Silva’s game against the Czech Republic. Apart from a goal and an assist, his passing was also extraordinary. His passing plot below is from the game against Hungary only.


Hazard mainly specializes in short ground passes, whereas he avoids crosses in general. In this tournament, as well as at Chelsea, he only delivers 0.3 crosses a game, which is very low for a winger in an elite team. His passes are often over short distances, but also in very advanced positions and in the general direction of the goal. These passes are often played against packed defences, which makes it even more impressive that he still finds space to play them. One of his best passes against Hungary was this through ball to Carrasco.

Hazard vs. Hungary

It’s a shame he wasn’t able to take his team past Wales, but Chelsea fans will be delighted to see Hazard on top of his game. If he can keep up his form, I’m sure we will see the Eden Hazard who won player of the season in 2015.

Measuring passing skill

With Mesut Özil racking up a total of 18 assists so far this season, it’s clearer than ever that you don’t have to be a goal scorer to be a superstar. The record of 20 assists (Thierry Henry 02/03) might be out of reach with only 2 games to go, but there is no doubt Özil has been excellent for Arsenal this season. Last year the Premier League saw a similar story, in the form of Cesc Fàbregas. With 18 assists he also got close to taking the all-time record, and deservedly got a lot of praise for doing so. One year later however Fàbregas is no longer in the picture, with only 7 assists, putting him in a shared 13th place in the assists ranking. Surely he hasn’t suddenly lost the ability to be an elite playmaker, right? Maybe the poor Chelsea season is causing his downfall? Or maybe assists are just as superficial a measure of playmaker quality as goals are of striker quality.

So I decided to dive into passing data from Opta to see if I could find a more sustainable metric that quantifies the skills Özil and Fabregas possess. My opinion is that every pass has a certain value, even if the pass doesn’t end up being a shot. Obviously defending midfielders or defenders can also be really good in build-up play, even though they rarely create shooting opportunities for their teammates and thus won’t show up in metrics like assists or key passes.

In general, a pass towards goal adds value to an attack, but is also harder to make. On the other hand, a pass away from goal often takes some danger out of the attack, and is usually easier to complete (obviously this isn’t always true since sometimes a player might be forced to play backwards in order to continue the attack). To try and quantify the value of a pass, I thus decided to assign xG values (xG, or Expected Goals, is a metric that estimates the probability of a shot becoming a goal) to the start location and end location of a pass. These values describe the probability that a shot would become a goal if it were taken from that location. In general a pass towards a more dangerous location is a good pass. For the start location I used a simple xG model that only uses location (more specifically distance to goal and angle to both posts) as input. For the end location I included the type of pass*, since a through ball is generally more dangerous than a cross. When you subtract the start xG value from the end xG value, you get the xG that was added by completing the pass. This value can also be negative, for instance if you pass backwards.
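To make the idea concrete, here is a rough sketch of a location-only xG value and the resulting xG added of a pass. Everything specific in it is an assumption for illustration: the pitch dimensions in meters, the logistic form and especially the coefficients, which are made-up placeholders rather than fitted values. It also leaves out the pass type that is used for the end location.

```python
import math

# Assumed pitch set-up in meters: attacking towards the goal at x = 105
GOAL_X = 105.0
GOAL_Y = 34.0           # centre of the goal on a 68 m wide pitch
HALF_GOAL_WIDTH = 3.66  # half of the 7.32 m goal width


def location_features(x, y):
    """Return (distance to goal centre, angle between the lines to both posts)."""
    distance = math.hypot(GOAL_X - x, GOAL_Y - y)
    to_far_post = math.atan2(GOAL_Y + HALF_GOAL_WIDTH - y, GOAL_X - x)
    to_near_post = math.atan2(GOAL_Y - HALF_GOAL_WIDTH - y, GOAL_X - x)
    return distance, abs(to_far_post - to_near_post)


def location_xg(x, y, intercept=-1.0, w_distance=-0.12, w_angle=1.5):
    """Logistic xG from location only. The coefficients are placeholders."""
    distance, angle = location_features(x, y)
    z = intercept + w_distance * distance + w_angle * angle
    return 1.0 / (1.0 + math.exp(-z))


def xg_added(start, end):
    """xG at the end location minus xG at the start location of a completed pass."""
    return location_xg(*end) - location_xg(*start)


forward_pass = xg_added(start=(60.0, 34.0), end=(95.0, 34.0))   # positive
backward_pass = xg_added(start=(95.0, 34.0), end=(60.0, 34.0))  # negative
```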

Obviously this is only relevant when the pass is actually completed, because when a pass is not completed the team loses possession. A good passer will try to minimize the number of failed passes while still adding danger to an attack. For that reason, a player is given a penalty of 0.01 xG for every failed pass. This number is fairly arbitrary and can be changed. Furthermore, the location where the ball is lost can also be of great importance. When the ball is lost close to the opponent’s corner flag, one might argue that this is not a big deal. However, when a player loses the ball close to his own goal, the team might have a serious problem. Therefore players are also penalized for every failed pass by the xG value for the opponent at the location where the failed pass ended. Obviously this has some overlap with the previous penalty, but I think this is no more than fair, as a possession gained from an interception is usually more dangerous than an average possession.
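Put together, the value of a single pass could be sketched like this. It reflects my reading of the rules above (a fixed penalty plus the opponent’s xG at the spot where the failed pass ended); the function name and the example numbers are made up.

```python
FAILED_PASS_PENALTY = 0.01  # fixed penalty from the text, fairly arbitrary


def pass_value(completed, start_xg, end_xg, opponent_xg_at_end):
    """xG added for a completed pass, or the combined penalty for a failed one.

    `opponent_xg_at_end` is the xG value for the opponent at the location
    where the pass ended; it only matters when the pass fails.
    """
    if completed:
        return end_xg - start_xg
    return -(FAILED_PASS_PENALTY + opponent_xg_at_end)


# Made-up numbers for illustration
completed_value = pass_value(True, start_xg=0.02, end_xg=0.10, opponent_xg_at_end=0.0)
lost_near_own_goal = pass_value(False, start_xg=0.01, end_xg=0.02, opponent_xg_at_end=0.08)
lost_near_corner_flag = pass_value(False, start_xg=0.05, end_xg=0.04, opponent_xg_at_end=0.001)
```

A player’s score is then the sum of these values over all his passes, expressed per 90 minutes.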


Summing all these values gives the following top 20 for the current season, in xG added per 90 mins. Only players who played more than 1000 minutes are included, which gives 277 players:


As we might have expected Mesut Özil is on top, although Alexis Sánchez closely following him might be somewhat of a surprise. Even more surprising we see Cesc Fàbregas in third, albeit quite far behind the first two. According to this metric his passing is still top class even though he hasn’t given as many assists as last year. Reasons for this could include bad luck, but also a position change. Maybe he started playing in a deeper position for Chelsea in comparison to last season. There obviously might also be other explanations for this, and if you have one I’d love to hear it. Whatever the reason this metric shows that his passing skill hasn’t dropped since last season.

The other names that complete the top 10 pass the eye test quite comfortably, with an honorable mention for PFA Player of the Year Riyad Mahrez in eleventh. The fact that the first defender comes in at place 30 (Héctor Bellerín) and that the highest placed defenders are all full backs indicates that it might not be useful to compare between different positions. Strikers also don’t perform well in this metric, although that seems logical to me since they often only receive the ball in advanced positions, after which their only options are likely to be a pass back or a shot. This doesn’t necessarily mean that strikers are bad at passing, but it does mean that strikers generally don’t progress the ball to more dangerous areas with passes.


This looks nice, but a metric only becomes useful when it is repeatable. To test this I calculated the values for the previous (14/15) Premier League season, to see if there is a correlation between xG added in two consecutive seasons. In total 173 players played over 1000 minutes in both seasons. When comparing the xG added for those players over the two seasons I got the following results:


Colours indicate the position of a player and the size of the dot is bigger for players who played more minutes. This looks like a very strong relationship, with an R-squared of about 0.79. Part of what made this possible is sample size. Since passes occur a lot, especially in comparison to for instance assists, this metric has a lot more certainty than key passes/assists only.
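For reference, the R-squared of a simple linear relationship between two seasons can be computed as the squared Pearson correlation. The sketch below uses made-up values for a handful of hypothetical players, not the actual data behind the plot.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance ** 2 / (variance_x * variance_y)


# Made-up xG added per 90 for the same five players in two seasons
season_1415 = [0.10, 0.25, 0.40, 0.15, 0.30]
season_1516 = [0.12, 0.22, 0.38, 0.20, 0.27]
repeatability = r_squared(season_1415, season_1516)
```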

When we only look at the 173 players who played over 1000 minutes in both seasons, it’s amazing to see that of the top 10 in 14/15, a total of 8 players are still in the top 10 in the current season. Furthermore, Riyad Mahrez was already in 15th place last season, even though he was hardly seen as a world-class player back then. Similarly, when we look at the current season, I notice some names fairly high up that I haven’t heard of too often. For instance, in 14th place this season is Junior Stanislas, a 26-year-old winger for Bournemouth. He has only had 2 assists this season, but my metric suggests he knows what he is doing. Perhaps we’ll see more of him next year. In that way this metric might be used by clubs as a way to scout new players, or to evaluate the passing skill of players they currently have. If a club had known a year ago that Mahrez is a top-class player, they could probably have gotten him for half of what he is worth now, or less. Obviously this is only one player and we have no idea how this will play out in the future, but I’m excited to see how players high up the list this year will perform next year.

I’ll stop here because I don’t want this to become too long. There are many more interesting things to say about this, as we can also see which player adds the most xG through long balls, through balls, crosses etc. Looking at age groups might also give an insight into which players will become elite passers in the future. If you have any suggestions for further research or if you want to know how high a certain player is on the list, contact me on Twitter (@NilsMackay).

This is also the first version of this metric, so chances are there’s plenty of room for improvement. If you have any ideas of how to improve this let me know.

*The types of passes I used for this are normal passes, long balls, through balls and crosses. Set pieces and throw-ins are removed from the data.

Biases in our xG models

As you may or may not have noticed, in my latest blog I took a shot at quantifying how good our xG models currently are. Today I won’t look at overall performance, but I’ll go more in depth to see if the xG models have certain biases or not. Hopefully this will show where there is still improvement to be made, but also at which parts we’re already pretty good.

Just like last time, to know if the results we achieve are any good, we’ll have to know how they should be if the model were perfect. For this purpose I’ll use a simulation of a ‘perfect’ model (same as last time) as comparison. Basically what I did is I simulated the xG values from my own model to see what the connection between the xG values and actual outcomes should look like. For a more elaborate explanation please check this blog of mine. I will also be including a model that assigns all shots the same value of 0.095, the average conversion rate of a shot. In general we’ll expect any xG model to be better than that simple model, and we aim to approach the ‘perfect’ model.

Home/away bias

The first bias I’ll look at is home/away bias. Basically, the test checks whether the xG models tend to over- or underestimate the number of goals scored in home or away games. For this I simply took the amount of xG each model assigned to home teams and compared it with how many goals were scored by home teams (and similarly for away teams). This gave the following results:

        Goals/game   xG/game
home    1.26         1.30 (+0.04)
away    1.05         1.08 (+0.03)

The results are quite encouraging. In the 260 match sample I’m using, the models slightly overestimated the number of goals that were actually scored, but the differences are minimal, so small that I’m pretty sure they can be put down to random variance.

The main point however is that the models over/underestimate home and away matches similarly. Individually the models were spread out a bit more, but none had a significant bias towards home or away teams.
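As a rough sketch of this comparison, the snippet below sums goals and xG per venue and reports the per-game difference. The record layout (one `(venue, goals, xg)` tuple per team per match), the function name `venue_bias` and the toy numbers are all assumptions made for illustration; they are not the data behind the table above.

```python
from collections import defaultdict


def venue_bias(team_matches):
    """Return {venue: (goals_per_game, xg_per_game, xg_minus_goals_per_game)}.

    `team_matches` is an iterable of (venue, goals, xg) tuples, one per
    team per match. This layout is an assumption made for illustration.
    """
    totals = defaultdict(lambda: [0, 0.0, 0])  # goals, xG, number of games
    for venue, goals, xg in team_matches:
        entry = totals[venue]
        entry[0] += goals
        entry[1] += xg
        entry[2] += 1
    return {
        venue: (g / n, x / n, (x - g) / n)
        for venue, (g, x, n) in totals.items()
    }


# Made-up toy data: two matches, one row per team.
sample = [
    ("home", 2, 1.6), ("away", 1, 0.9),
    ("home", 0, 1.0), ("away", 1, 1.3),
]
bias = venue_bias(sample)
```

A positive third value means the model hands out more xG than goals were scored for that venue. What matters for this test is whether the home and away values differ from each other, not whether both are slightly above zero.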

Score bias

The other bias I’ll look at is score bias. By that I mean that some scorelines may be systematically over- or underestimated by certain models. For instance, a model might systematically underpredict the amount of xG for matches that have big scores like 5-0. On the other hand, a model might overpredict the amount of xG for matches that end in 0-0.

This will be quite interesting, as I often hear people critique single match xG plots when the actual score and the xG score don’t align. People generally quickly start to question the model’s accuracy when the differences between what happened and what the plot says are big.

However, a fundamental thing to understand is that when a match ends in 1-1, we don’t expect the xG score to be 1.0-1.0 on average.

Wait what? Obviously if the xG score for a match is 1.0-1.0, in general the actual outcome we expect will also be 1-1. However the other way around this does not hold.  This might seem weird, but it becomes clearer when we look at matches that end in 0-0. Would we expect the xG score to be 0.0-0.0 as well? No, of course not! The average match that ends in 0-0 will surely have had some chances, so the xG score must be larger than 0.0-0.0. Similarly this is true for matches that end in 1-1 or any other score.

So what xG scores do we expect? Simulations of the ‘perfect model’ gave the following plot:


On the y-axis we can see the amount of goals scored by a team, and on the x-axis the average xG that such a team will have scored. The boxplots show 200 simulations of 260 games in the Premier League, and for each simulation the average xG score was taken when x goals were scored.
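To make the idea concrete, here is a minimal sketch of one such simulation: every shot is treated as a coin flip with its xG as the true probability, and the xG totals are then averaged per number of goals scored. The shot lists are randomly generated stand-ins (hypothetical shot counts and shot qualities), so the resulting numbers will not match the plot.

```python
import random
from collections import defaultdict


def mean_xg_by_goals(team_shot_xgs, rng):
    """Simulate every team-match once and return {goals scored: mean xG total}."""
    buckets = defaultdict(list)
    for shot_xgs in team_shot_xgs:
        # Each shot is scored with probability equal to its xG value.
        goals = sum(rng.random() < p for p in shot_xgs)
        buckets[goals].append(sum(shot_xgs))
    return {g: sum(v) / len(v) for g, v in sorted(buckets.items())}


rng = random.Random(0)
# Hypothetical input: 520 team-matches (260 games), 5 to 20 shots each,
# with made-up shot qualities. Real per-shot xG values would go here.
team_shot_xgs = [
    [rng.choice([0.03, 0.05, 0.1, 0.3]) for _ in range(rng.randint(5, 20))]
    for _ in range(520)
]
result = mean_xg_by_goals(team_shot_xgs, rng)
```

Even in this toy setup, teams that score 0 goals have an average xG well above 0, and the average xG rises with the number of goals scored without matching it one to one.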

As we can read from the plot, when a team scores 0 goals in a game, on average we expect it to have about 0.8 xG. If a team scores once, it will have an average of about 1.1 xG. If a team scores twice, we expect it to have about 1.45 xG, and when a team scores 3 times about 1.78 xG. The cause of this is that the underlying distribution of xG scores is not uniform. Obviously I don’t know the actual underlying distribution but my own model will serve as an approximation here. Let’s add the models and see how they perform:


There’s a lot to see here so let’s start by explaining how this is displayed. On the right we can see the models that were evaluated in my latest blog, with the number corresponding to how they performed (1 is best, 9 is worst). In the plot each number is shown for each row once, and they are scattered a bit so the numbers don’t overlap.

In general what we can see here is that models 1 to 5 are within the area of the boxplot in all 4 cases. This means that the more accurate models also perform better in this bias test. Models 6, 7 and 8 are outside of the boxplot area once or more, whereas the naïve Deadspin model isn’t within the boxplot area a single time. In general what we can take from here is that the better models tend to have smaller biases (as expected), and that simple models might seem decent over large samples but they make consistent errors on smaller samples. One other thing I noticed is that when a team scores 1 goal, all models but one (Caley) overestimate the average amount of xG. Similarly, when a team scores 2 goals, all models but one (Torvaney) underestimate the average amount of xG. Whether this is due to the sample or a structural phenomenon I’m afraid I can’t tell for sure.


Not all xG models are the same. When I see a statistic that uses xG I usually just assume it’s correct, while my analysis has shown that especially simpler models can make big systemic errors. I feel like the use of a too simple xG model might lead you to wrong conclusions. On the other hand half of the models I tested fall within the margin of error on every test I did. To me it seems like it is definitely worth it to invest a few extra hours to improve your xG model to get into that category. Simpler models can obviously still be used for analysis, but only if we understand their limitations and communicate this when publishing results.

Thus we come to the end of my 3-part piece on evaluating xG models. This was great fun so maybe it’s an idea to check back in a year or so to see how the state of xG models has changed. If you have any questions feel free to contact me on Twitter (@NilsMackay). If you want to read part 1 and 2:

Part 1: How NOT to evaluate your xG model

Part 2: How good are our xG models?

How good are our xG models?

Expected goals are a difficult metric. Apart from the huge amount of work it takes to create an xG-model, once you’ve got one it’s hard to tell if it’s any good. Most people try to check this by checking if their xG totals for entire seasons are similar to the actual goal totals, mostly using R2. In my latest blog I tried to explain why I think this is a very poor way to evaluate your xG-model. I also hinted at a better way to evaluate an xG-model, something I will explain and apply today. First I’ll explain my methodology and afterwards I’ll apply it to evaluate different xG-models, including a few of the most prominent ones in the community like Michael Caley’s and 11tegen11’s model. Are those models really better than other models? And how close to being perfect are they? If you’re only interested in the results please skip the next paragraph.


Let’s first start with the methodology. One of the critiques I had against the full season R2 plots was that a lot of information was lost by summing the entire season together. Therefore I decided to look at single match totals. As far as I know this is the smallest sample at which we look at xG-values, and by looking at single matches rather than seasons we have a lot more data points. The method I will be using to compare the xG-scores to the actual scores is the root-mean-square error percentage (RMSEP). The standard root-mean-squared error is a very simple statistic that measures the differences between your predictions and the actual outcomes. The root-mean-square error percentage (also known as the coefficient of variation of the RMSE) is a normalized version of this that I’ll be using so it’s possible to compare it with my ‘perfect’ model, a theoretical model which I explained in my last blog. The exact formula for the root-mean-square error percentage is:


There might be better or more suitable metrics out there but I think this is still reasonably easy and understandable/reproducible, and it’s a theoretically sound metric.
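For readers who want to reproduce the metric, below is a minimal sketch. It assumes the usual definition of the coefficient of variation of the RMSE, namely the RMSE divided by the mean of the observed values; the function name `rmsep` and the toy numbers are made up for illustration.

```python
import math


def rmsep(predicted, actual):
    """RMSE divided by the mean of the actual values (CV of the RMSE).

    Assumes the standard CV(RMSE) definition; undefined if the mean of
    `actual` is zero.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two equally long, non-empty sequences")
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    return math.sqrt(mse) / (sum(actual) / len(actual))


# One entry per team per match: xG total versus goals scored (made-up numbers).
xg_totals = [1.4, 0.7, 2.1, 0.9]
goals = [1, 0, 3, 1]
score = rmsep(xg_totals, goals)
```

Dividing by the mean number of goals makes the value scale-free, but it also means the result depends on how many goals were scored in the sample, so scores are only comparable between models evaluated on the same set of matches.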

However, the exact value of this metric alone might not give much insight as it’s quite technical. So we’re going to need something to compare the results with. For this purpose, I decided to include two extra ‘models’ in my evaluation, which try to describe the upper and lower bound of performances. For the lower bound (lower is better) I’ll be using the ‘perfect’ model described in my latest blog. As an upper bound I’ll be using the model explained in an infamous Deadspin article, which assigns every shot an xG-value of exactly 0.095. The idea behind this is that if an xG-model can’t do better than that then really what’s the point of making one, so it creates a nice upper bound.

The results are in…

First a short introduction into the models that will be used in this evaluation:

  1. Nils Mackay, my own model. The start of the methodology can be found here, although I have greatly improved it since.
  2. Michael Caley (@MC_of_A). Numbers taken from the xG-plots on his Twitter account. Methodology here.
  3. 11tegen11 (@11tegen11). Methodology here.
  4. FootballStatistics (@stats4footy), no methodology available.
  5. @SteMc74. Methodology here.
  6. Willy Banjo (@bertinbertin). Methodology here.
  7. SciSports, a Dutch start-up company. (@SciSportsNL). Methodology online soon.
  8. Ben (@Torvaney). Ben uses a model that only uses x,y location and whether it was a header or not. You can create your own numbers using his model here.

Now for the results:


And the winner is…. Michael Caley!

What we see above is the RMSEP for every model (the white dot) for the set of games I used (the first 260 games in this year’s Premier League). On the top you see the ‘perfect’ model, which is a boxplot of 200 simulations I did. Basically, due to variation, a simulation can be relatively more in line with the xG-values, or less. If actual scores are ‘luckier’ or ‘less expected’ then the RMSEP becomes higher, and vice versa. What we can see is that (over a sample of 260 games) the RMSEP of a ‘perfect’ model varies about 0.08 in both directions. To illustrate this ‘confidence interval’ I added the blue lines for all other models, which are basically just the observed value (white dot) plus or minus 0.08. Mind you these are not actual ‘confidence intervals’ as I don’t have those. These confidence intervals have to be seen in comparison with the ‘perfect’ model only though, as different outcomes of games would roughly affect all models similarly. Therefore I think the ranking of the models above is basically what it will be in any sample of 260 games or more.
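The spread of the ‘perfect’ model can be reproduced along the following lines. This is only a sketch under assumptions: the shot lists are randomly generated stand-ins rather than real xG values, `rmsep` uses the standard CV(RMSE) definition, and `perfect_model_rmseps` is a hypothetical helper name.

```python
import math
import random


def rmsep(predicted, actual):
    """RMSE divided by the mean of the actual values (assumed definition)."""
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    return math.sqrt(mse) / (sum(actual) / len(actual))


def perfect_model_rmseps(team_shot_xgs, n_sims, rng):
    """Score the xG totals against n_sims simulated sets of match outcomes."""
    totals = [sum(shots) for shots in team_shot_xgs]
    scores = []
    for _ in range(n_sims):
        # Outcomes are drawn with the xG values as the true probabilities.
        goals = [sum(rng.random() < p for p in shots) for shots in team_shot_xgs]
        scores.append(rmsep(totals, goals))
    return scores


rng = random.Random(1)
# Hypothetical shot lists for 520 team-matches (260 games); made-up values.
team_shot_xgs = [
    [rng.choice([0.03, 0.05, 0.1, 0.3]) for _ in range(rng.randint(5, 20))]
    for _ in range(520)
]
scores = perfect_model_rmseps(team_shot_xgs, 200, rng)
spread = (min(scores), max(scores))
```

The gap between the two values in `spread` is the kind of sample-to-sample variation that the boxplot summarises.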

What’s really surprising is that Michael Caley’s model performs almost as well as the ‘perfect’ model. This indicates that his estimations are really good and don’t have much room for improvement. This is somewhat surprising as the general consensus is that positional data will improve xG-models by a lot. My analysis shows that, although there’s still room for improvement, it won’t really matter that much (for xG-models).

Following by a decent margin we find 11tegen11 in second place and FootballStatistics in third. In fourth and fifth we find @SteMc74 and my own model, closely trailed by Willy Banjo’s model in sixth. In seventh and eighth we find the model by SciSports and Ben’s model. All the way at the back we find the ‘upper bound’, the Deadspin model. As I somewhat expected, this model performs very poorly on single matches and its ‘confidence interval’ doesn’t even touch the worst simulation of the ‘perfect’ model.

So what can we take from here? First of all, even a simple model like Ben’s is a lot better than just counting shots. Second of all, creating a good xG-model can be really hard, but it is not impossible. Caley’s model is living proof that it’s possible to create a model that’s close to a ‘perfect’ model, even without using positional data.

I’ve done some additional analysis that looks at whether the models have certain biases. I feel like this article will get too long if I add it here, so I’ll write something about that in a week or so. Great thanks to FootballStatistics (@stats4footy) for his work in this article. Also great thanks to all the modelers who were so kind to provide data for this analysis.

(NOTE: I decided not to include Paul Riley’s (@footballfactman) model in the analysis. His model looks at xG2, while all the other models in this analysis look at xG1. The main difference between them is that xG2 assigns a value of 0 for blocked shots and shots off target, while xG1 doesn’t look at what happens to a shot. The implication of this is that there are fewer shots to be given an xG-value, which will lead to a smaller RMSEP due to lower variance. I figured comparing his model with the rest would be like comparing apples and pears. For those interested, his RMSEP was 0.81, very close but slightly behind Caley’s model.)

(NOTE 2: If you wish to reproduce these results please note that the actual value of the RMSEP for a model varies significantly between different samples. This is due to different amount/quality of shots in the games used. So if you want to see how your own model is doing, you’re going to have to use the first 260 games of the Premier League 15-16, or you’ll have to get data from all modelers for a different sample.)

How NOT to evaluate your xG model

Expected goals is a complex metric. Not only because it is difficult to calculate, but mostly because the models are very hard to evaluate. This is something I realized after recently creating my own Expected Goals (xG) model. (For those who are unaware what xG means; it’s a metric describing the probability a certain shot will end up being a goal. The simplest example for this is a penalty, which has an xG of about 0.75. In other words, about 3 out of 4 penalties are scored.) I soon realized it is very hard to determine how good my model really was, as I had nothing to compare it with.

Therefore I first set out on making a benchmark. In this case that benchmark would be a perfect xG model, so it is possible to ask yourself: how close is my model to being perfect? *(What does a ‘perfect’ xG even mean? I discuss this in the appendix since it is quite technical.)

How can you possibly have a perfect xG model?

I don’t. If I did, I probably wouldn’t even have to write this article. It is however fairly simple to find out how a perfect model would perform. I might not know the exact xG values for all shots during this BPL season, but let’s assume I do know them for this BPL season in an alternate universe (stick with me). Let’s just for now assume that the xG values I calculated for this BPL season were actually 100% correct (which they are definitely not). The only thing we miss now are actual results in this ‘second universe’, so to gather these all I had to do was simulate all matches once using the ‘perfect’ xG values. This is also known as a Monte Carlo simulation. This gives one possible outcome of all matches. Let’s look at a 4-shot example:

At the left we can see that in reality, out of these shots only shot 1 was scored. Next to that we can see the xG value my model assigned to those shots. Simulating these xG values gave the results on the right. These simulated goals now have the xG values as their true underlying probabilities. In other words, these xG values are ‘perfect’ and the ‘simulated goals’ on the right is one of the possible outcomes.
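A minimal sketch of such a simulation is shown below. The four xG values are made up for illustration and are not the ones from the table; `simulate_shots` is a hypothetical helper name.

```python
import random


def simulate_shots(xg_values, rng):
    """Return one simulated outcome (1 = goal, 0 = no goal) per shot."""
    return [1 if rng.random() < xg else 0 for xg in xg_values]


rng = random.Random(42)
shot_xgs = [0.75, 0.08, 0.31, 0.04]  # made-up values, not the original table
simulated_goals = simulate_shots(shot_xgs, rng)  # one 'alternate universe'

# Over many runs the scoring frequency of each shot approaches its xG value,
# which is exactly what makes the xG values 'perfect' for the simulated goals.
runs = 20_000
freq = [0] * len(shot_xgs)
for _ in range(runs):
    for i, goal in enumerate(simulate_shots(shot_xgs, rng)):
        freq[i] += goal
freq = [f / runs for f in freq]
```

A single call to `simulate_shots` corresponds to the ‘simulated goals’ column: one possible outcome in which the xG values are, by construction, the true probabilities.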

Whatever measure we are going to use, we can always check how a ‘perfect’ model would perform to compare. In the rest of this article, I will use this method as a benchmark.

R-squared is really, really not ok

When checking around the web what methods were used, I was surprised to see that R2 was the most common way of evaluating whether an xG model was any good. In most cases I would see a plot similar to this:


What we see here is the amount of goals all 20 teams in the 2014/2015 Premier League scored, and the sum of the Expected Goals a model of mine assigned to all shots taken by that team. Next I applied a linear regression which gave a R2 of 0.807. This sounds great!

But it isn’t. For several reasons:

  1. Information loss
    By summing all the xG values over a season, we lost a huge amount of data. We started with around 10000 points but reduced it to 20. Furthermore, this only gives us a sample size of 20, which is way too small.
  2. Is 0.807 even good?
    How good is this R2 figure really? Apart from the fact that the small sample size probably means that the value relies heavily on variation, it is also not as good as it sounds. If we simply count the shots a team attempts in a season and plot it against the goals scored, in this specific example you’ll get an R2 of 0.712! Over large samples like we use in this example, the xG per shot tends to be pretty similar for all teams, meaning the xG values you calculated won’t improve your results by much. Even more shockingly, a single simulation of the ‘perfect’ model gave a R2 of 0.755, which is lower than what our model achieved. Obviously over a larger sample of shots it will outperform my xG model, but the fact that it doesn’t here shows how unreliable these numbers are. The variance over such a small sample size appears to be so big, that we really can’t say anything useful about this R2 value.
  3. It’s theoretically wrong
    R2 measures how much variation of the response variable (actual goals) is explained by the decision variable (xG). To do this it finds a linear function that is the best fit. The line in the above example is:

Actual goals = -6.18 + 1.11 * xG

    This is NOT what we try to model when we create xG. The idea behind xG is that 1 xG is worth exactly 1 actual goal, which is not what is assumed by the linear regression method. For example, using the above formula we would expect to score 38 goals when we score 40 xG. This is clearly not what we aim to measure when using an xG model.
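The third point can be illustrated with a short sketch. It evaluates the fitted line from the text at 40 xG, and then shows that R2 does not change when every xG total is doubled and shifted, even though such a model would be badly miscalibrated. The season totals and the helper `r_squared` are made up for illustration.

```python
def r_squared(x, y):
    """R2 of the least-squares line y = a + b*x (the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# The fitted line from the text, evaluated at 40 xG.
predicted_goals = -6.18 + 1.11 * 40  # about 38.2 goals, not 40

# Made-up season totals for five teams.
xg = [38.0, 45.5, 52.0, 61.0, 70.5]
goals = [35, 48, 50, 66, 71]
inflated_xg = [2 * v + 5 for v in xg]  # every total doubled and shifted

# The regression absorbs the rescaling, so both 'models' get the same R2.
same_r2 = abs(r_squared(xg, goals) - r_squared(inflated_xg, goals)) < 1e-12
```

Because the regression is free to choose its own slope and intercept, R2 only rewards xG totals that line up with goals, not totals that are actually equal to them.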

Go on then smartass, what metric should we use?

I have to admit that although I know R2 is wrong, I’m not sure what the best way is to evaluate xG models. Personally I believe a good way to evaluate an xG model is by looking at smaller samples than entire seasons. One could for instance look at single match totals of xG values and actual outcomes. That will make the influence of individual xG estimations much bigger, while single matches usually are the smallest sample in which we actively look at xG.

In an upcoming blog I will explain this method. Furthermore I will evaluate a set of xG models from, among others, Michael Caley, SciSports, @SteMc74 and myself, using this method. Then we’ll finally know how close we are to a perfect xG model and which one is closest. If you want to participate with your own xG model please contact me on Twitter (@NilsMackay).

*Appendix (What is a perfect xG value?)

Since xG attempts to predict whether a shot ends up in the goal or not, one might say that a perfect xG model takes into account all possible variables. It would take into account things like: wind speed, wind direction, keeper positioning, the keeper’s reaction time, the way the ball is hit etc. However, such a model would perfectly predict whether a shot will become a goal or not and therefore only return values of 1 and 0. In other words for every shot it would say either: “Yes, this shot will become a goal” or “No, this shot will not become a goal”. Such a model would return the same xG values as the amount of actual goals scored in a match, which would be rather useless. The purpose of xG is (in my opinion) not to predict right before a shot gets taken whether it becomes a goal or not. The way in which it is used is to assess the quality of a chance and thereby the quality of a team’s performance.

Therefore I prefer to look at xG as the probability a shot becomes a goal when the given player tries to score from that exact situation. This will give answers like: “If Messi tried this shot from this exact situation 100 times, he would probably score 24 times”, which would correspond with a 0.24 xG value. In this article, I assumed this definition of an xG.


Introducing my Expected Goals model

While browsing the web in March 2014 I stumbled over an article by Sander Ijtsma (@11tegen11) and Michiel de Hoog (@MichielDeHoog) explaining why Lex Immers, at the time a regular starter in the midfield of Feyenoord, was one of the best players in the Eredivisie. At the time, that was a very controversial statement, as Immers was known to blow huge chances regularly. Many Feyenoord fans even blamed him for Feyenoord not winning the title that season, as Feyenoord finished second behind Ajax with only a four point difference. The general consensus was that Immers was not good enough for a team like Feyenoord.

The article however gave a different view of reality, as it introduced me to the concept of Expected Goals (xG), a measure which quantifies how big a chance is. Basically, every shot on goal is given a value between 0 and 1, illustrating the probability that the shot will end up in the back of the net. The article showed that Immers was actually not blowing chances at all, but was scoring as much as expected. Furthermore it showed that Immers was actually very good at creating chances for a midfielder as well. The more analytic way of looking at football resonated with me, probably also partly because I was at the time (and am currently) a student in Business Analytics. After reading the article I started following the football analytics community (mostly based on Twitter), very interested in what it had to offer.

Soon after, I decided I wanted to play around with the data myself, so I wouldn’t be dependent on answers of other people for my questions. My first step was to create my own xG-model. I have to admit it took longer than I expected at first, but the first version of it is now done. That is also what this article is about. I will explain my methodology, which is different from those of the models I’ve seen so far, and test to see if it is actually doing what it suggests: predicting the probability a certain shot will end up in the back of the net.


The variables I used to predict the xG-values are the following:

  • Shot location
  • Whether it’s a header or a shot
  • Whether it’s a penalty or not
  • Whether it’s an own goal or not

Obviously there are many more factors which influence the chance of a shot going in, such as assist type, positioning of defenders and many more. Especially assist type is something I might pick up later, but for now I tried to keep it simple.

All models I know calculate the influence of shot location by dividing it into several factors such as distance to goal, and angle to goal (some use even more). Although it might be a good approximation, to me it sounded like a very complex way to compute the influence of location. The problem is that the goal posts make the distribution of values across the field very complex. For instance, a shot from 10 cm on the outside of the goalpost on the goal line will have an xG-value of practically zero, whereas the xG-value for a shot from 10 cm on the inside of the goalpost on the goal line will be about 1. This makes the exact values very hard to approximate by using angle and distance only.

To me it sounded more logical and precise to calculate the probability of a goal for a shot from a certain location by literally counting how many shots were taken from that exact position and counting how many of them ended up in the goal. Thus I divided the football pitch into squares of about a square meter, by making 100 squares in the length of the field and 50 squares in the width. Doing this for shots only gives the following field, in which white corresponds with high xG-values and red with low xG-values:


This is a mess. Even though I’ve used 10 seasons worth of shots (about 80,000 shots) for this, the sample size seems to be too small, as the differences between neighboring squares are too big at certain locations. Furthermore, lucky long shots screw up the values for locations far from goal, as not many shots are attempted from such range. This makes the influence of one lucky goal very big on the resulting xG-value. The plot for headers was very similar.
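The raw counting step can be sketched as follows. The coordinate convention (x and y both scaled to 0-100) and the function name `raw_xg_grid` are assumptions for illustration; the three shots are made up.

```python
def raw_xg_grid(shots, nx=100, ny=50):
    """Count shots and goals per square; return (shot_counts, goal_counts, xg).

    `shots` holds (x, y, is_goal) tuples with x and y scaled to 0-100, which
    is an assumed coordinate convention. Squares without shots get None.
    """
    shot_counts = [[0] * ny for _ in range(nx)]
    goal_counts = [[0] * ny for _ in range(nx)]
    for x, y, is_goal in shots:
        i = min(int(x / 100 * nx), nx - 1)  # square index along the length
        j = min(int(y / 100 * ny), ny - 1)  # square index along the width
        shot_counts[i][j] += 1
        goal_counts[i][j] += int(is_goal)
    xg = [
        [goal_counts[i][j] / shot_counts[i][j] if shot_counts[i][j] else None
         for j in range(ny)]
        for i in range(nx)
    ]
    return shot_counts, goal_counts, xg


# Three made-up shots, two of them from the same square.
shots = [(88.2, 50.5, True), (88.7, 51.9, False), (70.0, 30.0, False)]
shot_counts, goal_counts, xg = raw_xg_grid(shots)
```

With 5000 squares and about 80,000 shots, many squares hold only a handful of shots, so a single goal can swing a square's value a lot. That is the noise visible in the raw field.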

To fix this, I decided to calculate the xG-value of a position by looking at all surrounding squares. This increases the sample size significantly, apart from the fact that it makes sense intuitively. The probability to score from a certain location won’t change significantly if you move less than 1 meter from that position. The actual shots from the square itself were given some extra weight.

This still has some issues. Lucky goals from long shots still have a huge influence on the xG-value for that square. Furthermore, by also counting the squares around the actual square, the problem with the goalposts arises again. To solve this, I decided that squares that didn’t have a minimum amount of shots taken from them and goals scored from them, would be set equal to the minimum xG-value, which is about 0.017. This means that no matter from where a shot is attempted it will always have at least an xG-value of 0.017. The idea behind this is that players will only attempt a shot if they think it’ll have at least a certain probability of ending up in the goal.

This sounds very specific, but really all it does is eliminate weird xG-values for squares with too small a sample size, and eliminate incorrect xG-values for squares from which no goal was ever scored. I think it’s safe to say that if in a sample of 80000 shots there isn’t a single goal scored from a certain location, the xG-value for that position is probably not that big.
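One way to sketch the smoothing and the minimum value is shown below. Only the 0.017 floor comes from the text; the 3x3 neighbourhood, the extra weight of 2 for the square itself and the thresholds `min_shots` and `min_goals` are illustrative guesses, not the values actually used in the model.

```python
def smoothed_xg(shot_counts, goal_counts, i, j,
                own_weight=2.0, min_shots=20, min_goals=1, floor=0.017):
    """Weighted goals/shots over the 3x3 neighbourhood of square (i, j).

    own_weight, min_shots and min_goals are illustrative guesses; only the
    0.017 floor is taken from the text.
    """
    nx, ny = len(shot_counts), len(shot_counts[0])
    shots = goals = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            a, b = i + di, j + dj
            if 0 <= a < nx and 0 <= b < ny:  # skip squares off the pitch
                w = own_weight if (di, dj) == (0, 0) else 1.0
                shots += w * shot_counts[a][b]
                goals += w * goal_counts[a][b]
    # Too little evidence: fall back to the minimum xG-value.
    if shots < min_shots or goals < min_goals:
        return floor
    return max(goals / shots, floor)


# Tiny 3x3 toy grid of made-up counts.
shot_counts = [[10, 12, 9], [11, 30, 10], [8, 12, 9]]
goal_counts = [[1, 2, 1], [1, 6, 1], [0, 2, 1]]
centre = smoothed_xg(shot_counts, goal_counts, 1, 1)
sparse = smoothed_xg([[1]], [[1]], 0, 0)  # one lucky goal from one shot
```

In the toy grid the centre square ends up at 21/141, about 0.15, while the square with a single lucky goal falls back to the 0.017 floor instead of getting an xG-value of 1.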

The updated field, for shots only, then looks like this:


That looks better. You can clearly see that if the angle becomes too sharp, the chance to score drops immensely. The lucky long shots are accounted for, and the probability a shot becomes a goal rises quickly as you approach the goal. On the goal line it is nearly 100%. The field for headers is slightly different and generally gives lower values, but it looks quite similar.

Does it work though?

Although it looks pretty, the question that arises immediately is: does it work? Or in other words:

  • Does it correlate well with the actual scoring chances within the sample data?

And more importantly:

  • Is it able to predict the probability a shot outside the sample will end up in the goal?

Let’s start with the first one. To check if the model even agrees with the sample data, I grouped shots in small bins which are determined by xG-value. For example, if a shot is given an xG-value of 0.12 it is put in the bin that contains shots with values between 0.1 and 0.2. Next I calculated the average of the xG-values in the bin, and calculated the number of those shots that actually became a goal in real life. This gave the following table:


Just to clarify, the actualxG values are the percentages of shots within the bin that were scored. The modelxG is the average xG-value that the model assigned to those shots. Thus the values within the modelxG column are by definition within the range of the bin, which is not necessarily true for the values in the actualxG column.
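The binning described above can be sketched like this. The column names follow the text (modelxG and actualxG); the function name `calibration_table` and the shots are made up for illustration.

```python
def calibration_table(shots, n_bins=10):
    """Return rows of (bin_low, bin_high, n_shots, model_xg, actual_xg).

    `shots` holds (xg, is_goal) pairs; empty bins are skipped.
    """
    bins = [[] for _ in range(n_bins)]
    for xg, is_goal in shots:
        k = min(int(xg * n_bins), n_bins - 1)  # an xG of exactly 1.0 joins the top bin
        bins[k].append((xg, is_goal))
    rows = []
    for k, members in enumerate(bins):
        if members:
            n = len(members)
            model_xg = sum(xg for xg, _ in members) / n        # average assigned xG
            actual_xg = sum(1 for _, g in members if g) / n    # share actually scored
            rows.append((k / n_bins, (k + 1) / n_bins, n, model_xg, actual_xg))
    return rows


# Made-up shots as (xG, scored?) pairs.
shots = [(0.12, False), (0.18, True), (0.15, False),
         (0.16, False), (0.04, False), (0.95, True)]
rows = calibration_table(shots)
```

For the four toy shots in the 0.1-0.2 bin this gives a modelxG of 0.1525 and an actualxG of 0.25; with so few shots such a gap means nothing, which is the sample size effect again.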

Once again the effect of sample size is easily visible. The bins that contain the most shots have the highest accuracy. I’m pretty happy with the overall results. For most of the shots the actual bin value and the model bin value are less than 0.3% apart. For rarer shots this increases slightly, but the percentage difference for those shots is still fairly small. Notice that all 26 shots in the bin for values between 0.9 and 1 are scored thus far. This is likely a ‘hot streak’, as a 100% chance practically doesn’t exist. The model rates the average xG-value for those shots at around 94%.

The fact that the model’s values are close to the actual probabilities was to be expected. The model itself uses the number of times a shot went into the goal from that position. The fact that the values are similar doesn’t say that much, apart from the fact that the model describes its own sample adequately. More interesting is to see if the model is able to predict the chance a shot will become a goal for shots outside of the sample. The sample I used for this model are the seasons 12/13 and 13/14 for all 5 major leagues (Premier League (ENG), La Liga (ESP), Bundesliga (GER), Serie A (ITA) and Ligue 1 (FRA)). To see if the model has predictive value, I will do a similar test as above, except for the fact that the shots will be from the season 14/15 for those 5 leagues. These shots are not used to make the model. The table now looks as follows:


I’m very happy with the results. The difference for the small chances increased slightly, but is still below a 1% difference for most shots. As we can see the chances within the 0.9-1 bin are not all scored this time, as we expected.

Obviously these figures aren’t perfect. For instance, let’s look at the 0.1-0.2 bin. The model predicted these shots to be scored around 14.1% of the time. If we simulated all 9886 shots within this bin using that probability, we would expect about 1394 of them to be scored. In reality, only 1315 of those shots were scored. A simple binomial test shows that if the probability of 14.1% were correct, the probability of seeing 1315 or fewer goals would be only around 1%. That’s strong evidence that the model isn’t perfect, but that’s also not the point of what I did. The model I created only uses location and some very basic things to calculate the scoring probability, while in real life the scoring probability obviously depends on more variables. It does however give a very decent estimate and is easy to understand. The addition of more variables should improve the results even more.
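The binomial test for this bin can be reproduced with the standard library alone. This is a sketch that takes the rounded figures from the text (9886 shots, 14.1%, 1315 goals) as inputs; since 14.1% is rounded, the resulting tail probability is only approximate. `binom_cdf` is a hypothetical helper name.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i)
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return total


n_shots, p_model, goals = 9886, 0.141, 1315   # rounded figures from the text
expected_goals = n_shots * p_model            # about 1394
p_value = binom_cdf(goals, n_shots, p_model)  # one-sided lower tail
```

The test is one-sided: it asks how likely it is to see this few goals or fewer if the model's probability were the true one.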

Hope you enjoyed! If you did, please share. My next blog will be expanding on this subject. It’s my first blog so any comments/advice/feedback would be appreciated. If you find a flaw in my reasoning or calculations please let me know!