PLAYER RATINGS

10.09.2017

COMPOSING ONE IN-GAME STATISTIC-BASED VALUE PER PLAYER

Today, we are provided with a huge amount of statistics for players in top competitions. We have shots, passes, interceptions, tackles, aerial duels, blocks and more, but it still has to be determined which of these stats really matter and which are more important than others. Additionally, all these stats have different averages and ranges, which makes it even harder to compare players. In my opinion, it would be ideal to have one value that describes a player’s quality. This value could be displayed in the lineup before the start of a match to show spectators how good the players of the two teams are. Certainly, there are already ratings for players out there, most of them in the form of grades. However, these ratings are usually not transparent at all, meaning we do not know how they are composed. Moreover, a lot of them seem to be influenced by opinions more than by statistics. As a consequence, I decided to create player ratings based solely on real in-game statistics. But how can you compose these ratings? First, we have to look at the states a player can be in during a match.

States of a football player in a match

The first and easiest state to analyze is a player being in possession of the ball. While he has the ball, he can move around with it, pass, shoot, dribble, clear the ball or lose possession. Fortunately, all these actions with the ball are statistically recorded. The second possible state is that he does not have the ball while his team is in possession. This is certainly harder to analyze, since there is no accessible data on a player’s offensive off-the-ball positioning. Therefore, we simply have to assume that a player who is well positioned on the pitch will receive the ball from his teammates.

The third possible state is when the opposing team is in possession of the ball. Again, there is no accessible data on a player’s defensive positioning, so we have to assume that the more statistically tracked defensive actions (blocks, tackles and interceptions) a player has, the better his defensive positioning is.

Calculation of the ratings

These three states provide us with enough influential factors to compose a player rating based solely on real in-game statistics. To do that, I have identified 5 key factors, which are used to calculate a rating for each player:

1. Expected goals: Expected goals values a shot based on important criteria that determine how likely it is, on average, to result in a goal. If you want more information on my expected goals model, see this article: Expected goals

2. Passing: To depict the passing ability of a player, I have created an aggregated passing value, which considers and weights publicly available passing statistics. I have to admit, though, that this is the part of the ratings with the most room for improvement, since I currently do not have information on how many players were outplayed by a pass or how many expected assists a player had. As soon as I have data on these, I will include them in the ratings.

3. Possession control: A metric I created to show how often a player loses possession in relation to the passes and shots he takes. Every statistical analysis I have made of the factors that influence success in a football match showed that this is a very important metric, although I have not seen it widely used in the football analytics community.

4. Defensive positioning: As stated before, I do not have access to positioning data, but I do have access to statistics on the defensive actions performed by a player, which are aggregated into a value for how good a player’s defense is.

5. Match practice: Another metric I have been using for years, which describes how many matches a player has played in recent weeks. As with possession control, I have not seen it used elsewhere, although it is highly significant in my statistical research models.
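
To make the composition more concrete, here is a minimal sketch of how five factor scores could be turned into one rating on the 0 to 99 scale. It is an illustration under stated assumptions, not the actual formula behind the ratings: the weights in `WEIGHTS`, the function names, and the exact definition used in `possession_control` (one plausible reading of "losses in relation to passes and shots") are all hypothetical.

```python
# Hypothetical weights: the real weighting of the five factors is not
# published, so these numbers are placeholders for illustration only.
WEIGHTS = {
    "expected_goals": 0.30,
    "passing": 0.25,
    "possession_control": 0.20,
    "defense": 0.15,
    "match_practice": 0.10,
}

MAX_RATING = 99.0


def possession_control(passes: int, shots: int, losses: int) -> float:
    """Return 1 minus losses per pass-or-shot, clipped to the range 0..1.

    This is an assumed definition; higher values mean fewer losses
    relative to the passes and shots a player takes.
    """
    actions = passes + shots
    if actions == 0:
        return 0.0
    return min(1.0, max(0.0, 1.0 - losses / actions))


def factor_points(scores: dict[str, float]) -> dict[str, float]:
    """Convert factor scores (each expected in 0..1) into rating points."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    return {
        name: MAX_RATING * weight * min(1.0, max(0.0, scores[name]))
        for name, weight in WEIGHTS.items()
    }


def overall_rating(scores: dict[str, float]) -> float:
    """Sum the points of all five factors into one overall rating."""
    return sum(factor_points(scores).values())
```

Keeping the points per factor separate, as `factor_points` does, makes it possible to show how much each category contributes to a player's overall value.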

Player Rating – Example

Now that you have seen the 5 key factors, let us look at the rating of a specific player. Below you can see the rating of Cristiano Ronaldo right before the 16-17 Champions League Final. As Ronaldo is mostly known for his scoring, it is not surprising that more than half of the points in his rating come from expected goals. He is also convincing in other categories, with his values in passing and possession control being decent for a striker. His defensive contribution is low, as expected, because he is an offensive-minded player. Regarding match practice, he is one of the best, playing the full 90 minutes in many games per season. His values in the five categories sum up to an overall rating of 94, which (not surprisingly) marks him as one of the best players in my database.

Characteristics of the ratings

After that glimpse at Ronaldo’s rating, what do the ratings look like in general? Each player rating is a number between 0 and 99. As the rated players all play at a professional level in top leagues, they receive a minimum value of 50, while only outstanding players reach a rating above 90. About 40% of the player ratings are between 70 and 75, with the average player rating at around 72.5. The ratings are balanced across all positions on the pitch, meaning the average rating of a striker does not differ from the average rating of a defender. You can see how the player ratings are distributed in the following graphic.
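
One way to obtain a scale with these properties is to standardize raw scores within each position and map them onto a common mean and spread. The sketch below is only an illustration: it assumes the ratings are roughly normally distributed (the text does not say so), in which case 40% of values within 2.5 points of the mean implies a standard deviation of about 4.8. The function name `scale_by_position` and the input layout are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

TARGET_MEAN = 72.5
# Assumption: normal distribution. 40% of ratings inside 70..75 means the
# 70th percentile lies 2.5 points above the mean, so sd = 2.5 / z(0.70).
TARGET_SD = 2.5 / NormalDist().inv_cdf(0.70)
MIN_RATING, MAX_RATING = 50.0, 99.0


def scale_by_position(raw: dict[str, tuple[str, float]]) -> dict[str, float]:
    """Map raw scores to ratings, balanced per position.

    `raw` maps a player name to (position, raw score). Scores are
    standardized within each position so that every position ends up
    with the same average rating.
    """
    by_position: dict[str, list[float]] = {}
    for position, score in raw.values():
        by_position.setdefault(position, []).append(score)

    stats = {pos: (mean(vals), pstdev(vals)) for pos, vals in by_position.items()}

    ratings = {}
    for player, (position, score) in raw.items():
        mu, sd = stats[position]
        z = (score - mu) / sd if sd > 0 else 0.0
        rating = TARGET_MEAN + TARGET_SD * z
        # Clip to the range used for professional players in top leagues.
        ratings[player] = min(MAX_RATING, max(MIN_RATING, rating))
    return ratings
```

Because the standardization happens per position, a striker and a defender who are equally far above their positional average receive the same rating, even if their raw scores live on very different scales.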

Ratings of 16-17 CL Final

To give you an idea of how the ratings look for several players, I have composed the ratings for Real Madrid and Juventus Turin right before the 16-17 Champions League Final, which you can see in the graphic below. With Real and Juventus being two of the best teams in Europe, it is not surprising that all player ratings are above the average rating of 72.5. Three players have even received one of the extremely rare 90+ ratings. The average player rating for Juventus stands at 82.6, while the average rating for Real is 84.

The Toni Kroos problem

Looking at the rating of Toni Kroos in the above graphic, a challenging problem becomes obvious. Toni Kroos is a German midfielder who has won the Champions League three times and also won the 2014 World Cup with Germany. While many people say he is an outstanding world-class player, it is difficult to reproduce his value with the common, publicly available statistics. Sure, his rating of 82 in the above graphic is far from average; he is among the top 10% of players in Europe’s top 5 leagues. However, the value Toni Kroos provides for a team does not come from high values in expected goals, assists or possession gains. It comes from outplaying a large number of defenders with his passing. These valued passes (I will write a separate article about them in the near future) are a relatively new football metric that is unfortunately not publicly available. Therefore, I have to stick with traditional advanced football statistics, which leads to Toni Kroos having a lower value than expected.

The Mikel Merino problem

When I analyzed per-90-minute statistics from the 16-17 German Bundesliga season, Mikel Merino received the highest rating in “defense” among all players in Europe’s top five leagues, alongside decent ratings in passing and possession control. So can we conclude that he is one of Europe’s best midfielders? Well, maybe he is, but his incredible per-90 stats came from just 293 minutes of playing time in the German Bundesliga, which is a very small sample size for calculating a player’s rating. A player’s rating becomes more convincing if it is based on a large sample. To achieve that, I use the data from a player’s last three seasons (if the player has played in a league where advanced player statistics were tracked). However, I think that rating a player on a small sample of his actions is better than rating him with default values; I just do not want a player to be rated too highly if he has not played many minutes. Therefore, players with a small sample size cannot exceed a certain maximum value. This maximum value increases as a player accumulates more minutes played.
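
The cap can be pictured as a ceiling that rises with playing time. The following is a minimal sketch under assumed numbers: a linear ceiling that starts at `CAP_AT_ZERO_MINUTES` and reaches the top of the scale at `FULL_SAMPLE_MINUTES`. Both constants, the linear shape and the function names are hypothetical, since the actual maximum values are not given here.

```python
# Hypothetical parameters: the real cap values and their growth with
# minutes played are not published.
CAP_AT_ZERO_MINUTES = 75.0
FULL_SAMPLE_MINUTES = 3000
MAX_RATING = 99.0


def max_rating(minutes_played: int) -> float:
    """Highest rating a player may receive given his minutes played.

    The ceiling grows linearly with minutes until the sample is
    considered large enough, after which no cap below 99 applies.
    """
    share = min(1.0, max(0, minutes_played) / FULL_SAMPLE_MINUTES)
    return CAP_AT_ZERO_MINUTES + (MAX_RATING - CAP_AT_ZERO_MINUTES) * share


def capped_rating(uncapped: float, minutes_played: int) -> float:
    """Apply the sample-size ceiling to an uncapped rating."""
    return min(uncapped, max_rating(minutes_played))
```

With these placeholder numbers, a player with 293 minutes could not be rated higher than about 77, whatever his per-90 statistics say, while a player whose uncapped rating is already below the ceiling is left unchanged.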

Wrap-up

Developing a one-value player rating was fun! The ratings make it tremendously easy to compare players and to see how much individual quality a team possesses. Because the ratings are calculated solely from in-game statistics, the values are free of subjective opinion. Additionally, I was quite surprised that they serve as a good predictor for the outcome of a football match. Looking at the last four seasons of Premier League, LaLiga, Ligue 1 and Bundesliga, 69.5% of the matches that did not end in a draw were won by the team with the higher rating. If you increase the rating of the home team slightly to reproduce home advantage, the number rises to 72% of the non-drawn matches being won by the team with the higher rating. As the ratings add value, I will start to publish them before matches to give interested readers an insight into how good the players facing each other in the upcoming match are. Other ideas include detailed portraits of single players, showing how their overall rating and their ratings in the 5 key factors have developed over time.
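
The share of decisive matches won by the higher-rated team can be computed with a few lines of code. The sketch below shows one way to do it. The tuple layout, the `home_bonus` parameter and the function name are assumptions for illustration, and the sample matches are made up and unrelated to the leagues mentioned above.

```python
# Each match: (home team rating, away team rating, home goals, away goals).
Match = tuple[float, float, int, int]


def higher_rated_win_share(matches: list[Match], home_bonus: float = 0.0) -> float:
    """Share of non-drawn matches won by the team with the higher rating.

    `home_bonus` is added to the home team's rating to mimic home
    advantage. Matches where both adjusted ratings are equal count as
    misses, because neither team has the higher rating.
    """
    decisive = 0
    hits = 0
    for home_rating, away_rating, home_goals, away_goals in matches:
        if home_goals == away_goals:
            continue  # draws are excluded from the evaluation
        decisive += 1
        adjusted_home = home_rating + home_bonus
        home_won = home_goals > away_goals
        if home_won and adjusted_home > away_rating:
            hits += 1
        elif not home_won and away_rating > adjusted_home:
            hits += 1
    if decisive == 0:
        raise ValueError("no decisive matches to evaluate")
    return hits / decisive


# Made-up sample data, for illustration only.
sample_matches: list[Match] = [
    (80.0, 75.0, 2, 0),
    (74.0, 75.0, 1, 0),
    (70.0, 78.0, 0, 2),
    (76.0, 76.0, 1, 1),
    (72.0, 79.0, 3, 1),
]
```

Calling `higher_rated_win_share(sample_matches)` and `higher_rated_win_share(sample_matches, home_bonus=2.0)` on the same data shows how a small home bonus can flip close matches in favor of the home side and change the hit rate.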