In the following article, we take a deep dive into expected goals data from all 20 Premier League clubs with the aim of predicting where each might finish in the coming season.
When I was looking into expected goal metrics in search of information useful for FPL, I thought to myself: “Why not take it one step further?” Why does every bit of research we do need to revolve around FPL? There must be aspects of the Premier League that are less covered yet thought-provoking nonetheless. The pool of information we have, and try to exploit to make FPL simpler, is enormous. How about something else?
This article gives an in-depth analysis of the likeliest results of the upcoming Premier League season based on expected metrics. We will look at historical data such as the previous numbers of promoted teams and each Premier League team’s average over- and underperformance against expected goals.
The gradual return of fans to the football stadium will be discussed and expected metrics numbers might even be translated into actual goals. And that could lead us to an early sight of a possible title winner and the teams most in danger of relegation from the Premier League.
1. How did each Premier League team perform based on non-penalty expected goal numbers?
Going by the logic that the most recent events and developments should give us the best data to use going forward, the first thing we will look at is historical but recent non-penalty expected goal numbers. This is considered one of the best ways to measure a team’s attack or defence.
To predict similar future events, the numbers of the last three seasons will be taken into account. Any data from before that period is close to irrelevant due to the rapid changes in modern football, whilst using last season’s data alone leaves predictions vulnerable to outliers.
Each of the last three seasons receives a certain weighting. In these calculations the 20/21 season counts three times, whereas the 19/20 and 18/19 seasons each count once. Based on that, the numbers are translated into next season’s baseline non-penalty expected goal numbers.
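As a minimal sketch of that weighting, the baseline is a weighted mean with weights 3, 1 and 1. The function name and the example team figures are our own illustrative assumptions; only the league averages in the last line come from this article.

```python
def weighted_baseline(s2021, s1920, s1819, weights=(3, 1, 1)):
    """Weighted mean of a per-90 metric over three seasons, most recent first."""
    values = (s2021, s1920, s1819)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical team with npxG per 90 of 1.50, 1.30 and 1.20
# in 20/21, 19/20 and 18/19 respectively.
baseline = weighted_baseline(1.50, 1.30, 1.20)

# The same weighting applied to the league averages quoted later in the article.
league_average = weighted_baseline(1.19, 1.2285, 1.221)
```

Counting the latest season three times means it supplies 60% of the baseline, with the two older seasons contributing 20% each.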
Why non-penalty numbers, you might ask. Whilst winning more penalties than average could reflect a player’s or even a team’s quality, there is no established mathematical relationship between the number of penalties a team wins or concedes in one season and the next.
We’ve seen these numbers fluctuate noticeably over the last couple of seasons. In addition, penalties skew the picture when measuring a team’s attack, which makes them unhelpful in these calculations.
2. How do we fit in Norwich City, Watford and Brentford?
Obviously, when trying to predict next season’s Premier League table, Fulham, WBA and Sheffield United need to be replaced with Norwich, Watford and Brentford. However, if the numbers of the relegated sides are simply replaced by the Championship numbers of Norwich, Watford and Brentford, the table won’t add up.
Not only will the three promoted sides likely perform worse in the Premier League than they did in the Championship, but the league-wide averages of expected goals scored and conceded also won’t be equal. The numbers of Norwich, Watford and Brentford will therefore be adjusted first, before the averages are levelled.
To do that, the change in numbers of promoted sides in previous seasons comes into the equation. In the past four seasons 12 Championship sides have found their way to the Prem. How did they fare in their first season of top-flight football and thus, how much did their numbers change?
Across those 12 sides, non-penalty expected goals against per 90 increased by an average of 40% and non-penalty expected goals scored per 90 decreased by an average of 32%. If we apply these averages to the Championship numbers of Norwich, Watford and Brentford over the last three seasons, the picture already becomes much more realistic.
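The adjustment itself is a simple scaling. The sketch below uses the 40% and 32% averages from this step; the function name and the Championship input values are hypothetical and only there to show the arithmetic.

```python
XGA_INCREASE = 0.40  # average rise in npxGA per 90 after promotion
XG_DECREASE = 0.32   # average drop in npxG per 90 after promotion


def adjust_promoted(champ_xg_per90, champ_xga_per90):
    """Translate Championship npxG and npxGA per 90 to Premier League level."""
    pl_xg = champ_xg_per90 * (1 - XG_DECREASE)
    pl_xga = champ_xga_per90 * (1 + XGA_INCREASE)
    return pl_xg, pl_xga


# Hypothetical Championship side: 1.50 npxG and 1.00 npxGA per 90.
pl_xg, pl_xga = adjust_promoted(1.50, 1.00)
```

A side that comfortably out-created its opponents in the Championship can end up with a negative expected goal difference once both adjustments are applied.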
If we take these adjusted numbers and put them into place, the problem we face is that the averages of the two metrics don’t match. The average of non-penalty expected goals against per 90 becomes around 1,183 whilst the average of non-penalty expected goals scored per 90 becomes close to 1,219. These numbers require further adjustment, because every chance one team creates is a chance another team concedes, so the league-wide averages of both metrics must be equal.
To make these averages match each other, we will again look at the averages of the past three Premier League seasons to decide on the most likely non-penalty expected goals scored number per 90. During last season the average was 1,19. In 19/20 the average was 1,2285 and in 18/19 the average was 1,221. Again, the most recent season counts three times and both the other seasons once, which puts the weighted average at 1,2039 non-penalty expected goals scored per 90 per team in the Premier League.
Each team’s baseline xGA must be multiplied by 1,2039 divided by the average xGA and each team’s baseline xG must be multiplied by 1,2039 divided by the average xG to have matching xG and xGA values. Even if there’s nothing to suggest that overall expected goal volume (or better: chance creation) stays around the same as it has been for the past three seasons, it is the most sensible trend to go by. This provides us with the newest expected metrics for the 21/22 Premier League season.
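The levelling step can be sketched as follows. The target is the weighted league average derived above; the three-team tables are hypothetical placeholders standing in for the full 20-team baselines, and the function name is our own.

```python
def rescale_to_mean(per_team, target_mean):
    """Scale every team's value by the same factor so the mean hits target_mean."""
    current_mean = sum(per_team.values()) / len(per_team)
    factor = target_mean / current_mean
    return {team: value * factor for team, value in per_team.items()}


# Weighted league average: 20/21 counts three times, 19/20 and 18/19 once each.
TARGET = (3 * 1.19 + 1.2285 + 1.221) / 5  # about 1.2039

# Hypothetical baselines for illustration only.
xg = {"Team A": 1.80, "Team B": 1.20, "Team C": 0.90}
xga = {"Team A": 0.85, "Team B": 1.25, "Team C": 1.60}

xg_scaled = rescale_to_mean(xg, TARGET)
xga_scaled = rescale_to_mean(xga, TARGET)
```

Because every team is multiplied by the same factor, the ranking and the relative gaps between teams are preserved; only the overall level changes.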
3. How do we continue and predict single non-penalty expected goal results?
It gets really interesting when these numbers are converted into single league results, but another factor must be taken into account first. Due to COVID-19 we’ve seen empty stadiums become the norm, resulting in more away wins than home wins for the first time in Premier League history. From the start of the 21/22 season, however, stadiums are set to return to full capacity, which will likely lead to the return of Home Field Advantage (HFA).
Home Field Advantage is an important factor to consider when predicting results for the 21/22 campaign, because obviously teams playing at home will have more of an edge over their opposition than they had last time out. But what exact value can be assigned to it?
To determine that, we’ll look at the numbers of the last four full seasons with fans. This means that both 20/21 and 19/20, the latter because Project Restart was played without fans, are completely disregarded here. The part of 19/20 played before the restart can’t be used on its own either: a season is only a fair sample when all teams have played each other both home and away, otherwise the results would be skewed.
The average Home Field Advantage over the last four full seasons with fans, still based on non-penalty expected goals, has been just over 1,111. As it needs to be expressed relative to the average, it is calculated as the average home xG divided by the average xG per team per match, home and away combined.
This number is the home rate of non-penalty expected goals: on average, for every expected goal a team creates on neutral ground, the same team would create 1,111 expected goals on home soil. Obviously, the opposite applies to the away side.
For every expected goal a team creates on neutral ground, the same team would create 0,889 expected goals away from home, which is the away rate relative to the average. We can now convert the expected goal metrics into single league results by using the HFA and the AFD (Away Field Disadvantage).
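A short sketch of how the two rates relate, assuming the "average" is the mean of home and away xG per team per match. The input averages are hypothetical values chosen to land near the rates quoted above; they are not the underlying data.

```python
def home_away_rates(avg_home_xg, avg_away_xg):
    """Return (HFA, AFD): home and away xG relative to the per-team average."""
    avg_per_team = (avg_home_xg + avg_away_xg) / 2
    return avg_home_xg / avg_per_team, avg_away_xg / avg_per_team


# Hypothetical league-wide averages per match.
hfa, afd = home_away_rates(1.30, 1.04)
```

Under this definition the two rates always sum to 2, which is why an HFA of 1,111 goes hand in hand with an AFD of 0,889.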
Example: we want to calculate the non-penalty xG result of Leeds United against Newcastle United. Goals scored by Leeds must be multiplied by the goals against rate of Newcastle as well as the HFA. Goals scored by Newcastle must be multiplied by the goals against rate of Leeds as well as the AFD.
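The calculation can be sketched as below. We assume here that a team’s "goals against rate" means its baseline xGA divided by the league average of 1,2039; the article does not spell that definition out, so treat it, the function name and the example inputs as illustrative assumptions.

```python
LEAGUE_AVG = 1.2039  # weighted league average npxG per 90 per team
HFA = 1.111          # home rate
AFD = 0.889          # away rate


def match_xg(home_xg, home_xga, away_xg, away_xga, league_avg=LEAGUE_AVG):
    """Expected non-penalty xG for the home and the away side in one fixture."""
    # Assumption: "goals against rate" = a team's xGA relative to the league average.
    home = home_xg * (away_xga / league_avg) * HFA
    away = away_xg * (home_xga / league_avg) * AFD
    return home, away


# Two exactly average teams: only the venue separates them.
home, away = match_xg(LEAGUE_AVG, LEAGUE_AVG, LEAGUE_AVG, LEAGUE_AVG)
```

With two league-average sides the result reduces to the league average times the HFA for the hosts and times the AFD for the visitors.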
4. Can we go one step further and predict non-penalty actual goal results?
How do we define the difference between expected goals scored and actual goals scored? By over- and underperformance. When we dive into the historical numbers again, we find that both can fluctuate a lot from year to year and rarely persist from one season to the next.
This means a more modest estimate of finishing skill is advisable. On the other hand, we must also try to keep the average over- or underperformance balanced, and when more seasons are involved the over- or underperformance will reflect many different players. It’s a judgment call.
The weighted average (baseline) of non-penalty expected goals was calculated by counting 20/21 three times and 19/20 and 18/19 once each. A similar method will be applied to the over- and underperformance numbers. Firstly, we handle the simpler case: teams that have been in the Premier League in each of the last three seasons.
Secondly, to be able to calculate the baseline over- and underperformance of current Premier League teams that have been in the Championship during one or more of the past three seasons, their Championship numbers need to be adjusted to Premier League level once again.
Even when a team’s Championship chance creation is adjusted to Premier League numbers, it probably won’t score as many goals from a similar expected number, due to the better quality in both defence and attack in the Premier League. Hence, we need the average increase and decrease in expected goal numbers (40% and 32%, see step 2) as well as the average increase and decrease in actual goal numbers.
In summary, the Championship numbers are adjusted as follows:
– non-penalty xGA/90 must be increased by 40% (multiplied by 1,40)
– non-penalty GA/90 must be increased by 57% (multiplied by 1,57)
– non-penalty xG/90 must be decreased by 32% (multiplied by 0,68)
– non-penalty G/90 must be decreased by 41% (multiplied by 0,59)
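Put together, the four adjustments give a Premier League-level over- or underperformance for a season spent in the Championship. The sketch assumes over- and underperformance is expressed as the ratio of actual to expected goals, which fits the multiplication used in the next step but is our reading rather than a stated definition; the input numbers are hypothetical.

```python
def adjust_championship(xg, g, xga, ga):
    """Scale Championship per-90 numbers to Premier League level."""
    return {
        "xg": xg * (1 - 0.32),    # expected goals scored: -32%
        "g": g * (1 - 0.41),      # actual goals scored: -41%
        "xga": xga * (1 + 0.40),  # expected goals against: +40%
        "ga": ga * (1 + 0.57),    # actual goals against: +57%
    }


def performance_ratios(adjusted):
    """Attack and defence ratios of actual to expected goals (assumed definition)."""
    return adjusted["g"] / adjusted["xg"], adjusted["ga"] / adjusted["xga"]


# Hypothetical Championship season, all values per 90.
adjusted = adjust_championship(xg=1.50, g=1.60, xga=1.00, ga=0.90)
attack_ratio, defence_ratio = performance_ratios(adjusted)
```

Because actual goals are cut harder than expected goals, and goals against raised more than expected goals against, a side that overperformed in the Championship can still come out as an underperformer at Premier League level.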
Combined with the earlier baselines for over- and underperformance these numbers lead to the following table:
Now it’s just a case of multiplying non-penalty xG results by over- and underperformance numbers of both sides. To give an example, we revert to the previous example of Leeds United against Newcastle United which led us to a non-penalty xG result of 1,65 – 0,91 in favour of Leeds.
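As a sketch of that multiplication, we assume each side’s non-penalty goals equal its match xG times its own attacking ratio times the opponent’s defensive ratio. The ratios used here are hypothetical placeholders, since the real values come from the table above.

```python
def actual_goals(match_xg, attack_ratio, opponent_defence_ratio):
    """Convert a match xG figure into expected actual goals (assumed formula)."""
    return match_xg * attack_ratio * opponent_defence_ratio


# Match xG from the example: Leeds 1.65, Newcastle 0.91.
# Ratios of 1.00 mean neither over- nor underperformance (placeholder values).
leeds_goals = actual_goals(1.65, attack_ratio=1.00, opponent_defence_ratio=1.00)
newcastle_goals = actual_goals(0.91, attack_ratio=1.00, opponent_defence_ratio=1.00)
```

A ratio above 1 on the attacking side, or a leaky opponent with a defensive ratio above 1, pushes the goal figure above the match xG, and the reverse pulls it below.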
Unfortunately, a football team has never scored a decimal number of goals. The question arises whether, and how, goals can be allocated as integers. Would rounding every decimal goal figure down to the integer below give us a good picture? No, since the total number of goals scored would necessarily be understated.
Would rounding every decimal goal figure up or down to the nearest integer give us a good picture? No: even though it could be the single most probable outcome, it would produce unfair results both in single fixtures and over the course of a season.
5. How do we estimate league points?
With the methods used thus far it is impossible to rationally allocate actual goals as integers. Despite that, these decimal goal volumes can still be used to estimate the probability of each team winning, drawing or losing a single match, eventually ending up with a league table after 38 gameweeks.
To do that we will be using the Poisson distribution. This is a discrete probability distribution that expresses the probability of a given number of events (goals scored by a team) occurring in a fixed interval (a single match) if these events occur with a known mean rate (the non-penalty goals in decimals) and independently of the time since the last event.
To estimate the probabilities of Leeds United winning, drawing or losing at home to Newcastle United we must calculate the probabilities of Leeds scoring 0, 1, 2, 3, 4, 5, 6 and more than 6 goals. The same must be done for Newcastle. The probabilities of all results that are a Leeds win must be summed to estimate the chances of a Leeds win. Again, the same must be done for a Newcastle win, and for a draw.
Based on our findings the probability of Leeds United winning 4-1 at home to Newcastle United would be the likelihood of Leeds scoring four goals multiplied by the likelihood of Newcastle scoring a single goal. It would be 6,2% multiplied by 36,8% which results in 2,3%.
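The scoreline probability can be reproduced with the Poisson formula. The 36,8% for Newcastle corresponds to a mean of 1,0 goals; the Leeds mean of roughly 1,68 is our own back-calculation from the 6,2% figure and is not stated anywhere above, so treat it as an assumption.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when goals arrive at the given mean rate."""
    return mean ** k * exp(-mean) / factorial(k)


LEEDS_MEAN = 1.68      # assumption, inferred from the 6.2% figure
NEWCASTLE_MEAN = 1.00  # consistent with the 36.8% figure

p_leeds_4 = poisson_pmf(4, LEEDS_MEAN)          # about 0.062
p_newcastle_1 = poisson_pmf(1, NEWCASTLE_MEAN)  # about 0.368
p_4_1 = p_leeds_4 * p_newcastle_1               # about 0.023
```

Multiplying the two probabilities treats the teams’ goal counts as independent of each other, which is the same simplification the calculation above makes.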
The probability of a Leeds win against Newcastle would be the likelihood of Leeds scoring a single goal multiplied by the likelihood of Newcastle scoring none plus the likelihood of Leeds scoring two goals multiplied by the likelihood of Newcastle scoring none or one plus the likelihood of Leeds scoring three goals multiplied by the likelihood of Newcastle scoring none, one or two, etc.
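That summation is easier to follow as a loop over every scoreline. In this sketch we cap each side at 12 goals instead of using a "more than 6" bucket, a simplification of ours that leaves a negligible share of probability unassigned; the goal means are the same assumed values as in the previous example.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of exactly k goals when goals arrive at the given mean rate."""
    return mean ** k * exp(-mean) / factorial(k)


def outcome_probabilities(home_mean, away_mean, max_goals=12):
    """Return (home win, draw, away win) by summing over all scorelines."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Assumed goal means for Leeds (home) and Newcastle (away).
p_win, p_draw, p_loss = outcome_probabilities(1.68, 1.00)
```

The three probabilities add up to practically 1, and swapping the two means swaps the win and loss probabilities while leaving the draw probability unchanged.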
6. The Premier League 21/22 Table
Now that we know how likely each team is to win each match, we can conclude this study by creating the league table. Each probability of a win must be multiplied by three points and each probability of a draw by a single point; summing these over all 38 matches gives each team’s expected points total.
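In code, the expected points calculation is a one-liner per match. The function names and the example probabilities are illustrative only.

```python
def expected_points(p_win, p_draw):
    """Expected points from one match: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3 * p_win + 1 * p_draw


def season_points(fixtures):
    """Sum expected points over a list of (p_win, p_draw) pairs, one per match."""
    return sum(expected_points(p_win, p_draw) for p_win, p_draw in fixtures)


# Hypothetical team that is a 50% favourite with a 25% draw chance in all 38 matches.
total = season_points([(0.50, 0.25)] * 38)
```

The resulting totals are decimals rather than whole points, which is exactly what allows teams to be ranked without having to round any individual match to an integer scoreline.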
The season hasn’t started yet, but it’s quite possible that we are going to call the Premier League a one-team league once again. Based on historical expected goal values, the effect of and on promoted teams, the return of fans into the stadiums and historical over- and underperformance, Manchester City are expected to finish on top with quite a margin.
Besides creating the highest expected goals by far (only Liverpool come close), they’re also fourth best for overperforming that particular metric. In the crosstable that shows the likelihood of a team scoring four goals in any match you can clearly see the percentages of City being much higher than all others.
Tottenham are perhaps the biggest surprise, coming in third. Harry Kane’s destination will obviously have a big impact on what really happens there. With him, Spurs have overperformed both expected metrics by more than any other team.
These overperformances certainly affected the outcome of actual goals. However, events such as player transfers and managerial changes, and thus changes in a team’s style of play including changes in formation, are practically impossible to cover in computations like these.
Therefore, these events, for example Kane moving to Manchester City, have been left out of the conversation. On the other hand, you could say that the general ability of a team has been measured and is covered in the process, especially since last season is weighted three times. It’s very uncommon for a team to look entirely different from how it did one or two years ago, which is why weighted averages of the last three seasons form the basis of the conversation.
Lastly, it looks like the relegation scrap is going to be exciting. Four teams are expected to finish within two points of each other and another four sides look set to just about better them by three, four or five points. Watford are the only promoted team not expected to be relegated straight away, with their overperformance on expected goals against the main reason for that.
More to come
We hope you enjoyed the article and that it helps your planning ahead of the 2021/22 Fantasy Premier League season. We have plenty more lined up for pre-season and will be publishing advice, team reveals and more strategy guides right up until the big kickoff.
*Starting data taken from FBref and Infogol*