Wednesday, March 18, 2015

Complex Network in Football Clubs and players


When I saw the last blog post on the complex network of the cricket World Cup, I suspected that similar networks exist, in one form or another, in many sports. Since football is one of the most popular games in the world, I focused my search on it and found some interesting information from which to build a rich complex network.
This complex network is based on the football clubs of Brazil, and the main goal is to find out whether it exhibits complex-network properties.
Let us first define our network.

Network: Bipartite network, with clubs on one side and players on the other.
Node: Every club, and every player who has ever played for any of the clubs.
Edge: A player is linked to each club he has played for, so two players who played for the same club are connected through that club.
Weight of edge: Number of goals scored by the player for that club.
We can immediately see the similarity with the citation network here.

Using data on 127 clubs and 13,411 players from the Brazilian championship between 1971 and 2002, we can define many statistical features for each club and player. Let G denote the number of goals scored by a club and M the number of goals it conceded. The graph shows that the number of clubs Nc, plotted against the ratio G/M, is well fitted by a Gaussian curve.
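The post does not say how the Gaussian was fitted. As a minimal sketch, assuming per-club totals of goals scored (G) and conceded (M) are available, the Python below computes G/M for each club and fits a Gaussian by the method of moments (sample mean and standard deviation), which is simpler than a least-squares curve fit. The club data and function names are hypothetical.

```python
import math
from statistics import mean, pstdev

# Hypothetical per-club totals: (goals scored G, goals conceded M).
# Illustrative numbers only, not the 1971-2002 championship data.
clubs = {
    "Club A": (120, 100),
    "Club B": (90, 110),
    "Club C": (150, 120),
    "Club D": (80, 95),
    "Club E": (105, 105),
}

def goal_ratios(totals):
    """Return G/M for every club that has conceded at least one goal."""
    return [g / m for g, m in totals.values() if m > 0]

def fit_gaussian(values):
    """Method-of-moments Gaussian fit: returns (mu, sigma)."""
    return mean(values), pstdev(values)

def gaussian_count(x, mu, sigma, n_clubs, bin_width):
    """Expected number of clubs Nc in a bin of width bin_width centred at x."""
    density = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return n_clubs * bin_width * density

ratios = goal_ratios(clubs)
mu, sigma = fit_gaussian(ratios)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"expected Nc near G/M = 1: {gaussian_count(1.0, mu, sigma, len(ratios), 0.1):.2f}")
```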

Let's look at the degree distribution of each kind of vertex in the bipartite network. As in the citation network, N here is the number of clubs a player has been involved with, and the player probability P(N) shows the expected exponential decay. The club degree distribution is less clear because there are comparatively few clubs.
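A minimal sketch of how P(N) can be computed from the bipartite data, assuming the raw data is a list of (player, club) pairs; the identifiers below are made up for illustration. Plotting log P(N) against N (a semi-log plot) turns an exponential decay into a straight line.

```python
from collections import Counter

# Hypothetical bipartite edge list: one (player, club) pair per club a player
# has played for. Identifiers are made up for illustration.
memberships = [
    ("p1", "C1"), ("p2", "C1"), ("p2", "C2"),
    ("p3", "C2"), ("p4", "C2"), ("p4", "C3"), ("p4", "C1"),
    ("p5", "C3"),
]

def player_degree_distribution(edges):
    """Return {N: P(N)}: the fraction of players who played for exactly N clubs."""
    clubs_per_player = Counter(player for player, _ in set(edges))  # set() drops duplicate records
    counts = Counter(clubs_per_player.values())
    n_players = len(clubs_per_player)
    return {n: c / n_players for n, c in sorted(counts.items())}

def club_degrees(edges):
    """Degree of each club vertex: the number of distinct players it fielded."""
    return Counter(club for _, club in set(edges))

print(player_degree_distribution(memberships))
print(club_degrees(memberships))
```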

One thing to notice is that the probability of finding a nomadic player is much lower than that of finding one who stayed with one or two clubs.

Now comes the most interesting part, and the best result to showcase in the football network.
We define P(M) as the probability that a player has played a total of M matches (irrespective of club). On a semi-log plot the distribution shows a kink, or elbow, at Mc = 40, and the cumulative distribution Pc(M) can be fitted by two different exponentials on either side of it:
Pc(M) = 0.150 + 0.857*10^(-0.042M) for M<40
Pc(M) = 0.410*10^(-0.010M) for M>40

The figure shows that both exponentials fit the data quite well. One obvious conclusion from the graph is that once a player has gained some fame, it is easier for him to keep playing: beyond Mc the distribution decays much more slowly. A funnier conclusion is that the same applies to players with notoriety. Once you cross the threshold Mc, there is a good chance that you have achieved stability in your job as a player. This may sound impractical, but the same reasoning can be extended to the number of matches a player can survive while in bad form.
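To make the kink concrete, here is a minimal Python sketch that simply evaluates the two fitted exponentials quoted above; the function and constant names are hypothetical. Comparing decay factors is one way to read "stability": above Mc the exponential term loses only about 2% per extra match instead of about 9%.

```python
def pc_matches(m):
    """Cumulative distribution Pc(M) fitted in the post; M = matches played.

    M = 40 is assigned to the upper branch (the post leaves it unspecified).
    """
    if m < 40:
        return 0.150 + 0.857 * 10 ** (-0.042 * m)
    return 0.410 * 10 ** (-0.010 * m)

# Per-match decay factor of the exponential term in each branch.
DECAY_BELOW_MC = 10 ** -0.042   # about 0.908 for M < 40
DECAY_ABOVE_MC = 10 ** -0.010   # about 0.977 for M > 40
RATE_RATIO = 0.042 / 0.010      # the exponential term decays 4.2 times faster below Mc

for m in (10, 20, 30, 39, 40, 60, 100):
    print(f"Pc({m}) = {pc_matches(m):.4f}")
print(f"decay per match: {DECAY_BELOW_MC:.3f} below Mc, {DECAY_ABOVE_MC:.3f} above Mc")
```

The two branches nearly meet at Mc (about 0.168 just below 40 versus 0.163 at 40), which is consistent with a genuine change of slope rather than a jump.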

Without goals, football could not exist, so goals must form the most important dynamic of the game. We therefore plotted P(G), the probability that a player scores G goals, in the following graph, with the cumulative distribution Pc(G) plotted below it.

Again there is an interesting threshold, Gc = 10, which separates the distribution into two regions that follow different power laws. Such laws are often found in scientific collaboration networks.

Again, the two equations that best fit the cumulative distribution were found to be:
Pc(G) = -0.259 + 1.256*G^(-0.500) for G<10
Pc(G) = -0.004 + 4.454*G^(-1.440) for G>10

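Since P(G) is, up to sign, the derivative of the cumulative distribution Pc(G), each exponent is one larger than in the cumulative fits:
P(G) ~ G^(-1.5) for G<10
P(G) ~ G^(-2.44) for G>10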
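The relation between the two sets of exponents can be checked numerically. The sketch below assumes Pc(G) is the probability of scoring at least G goals and treats G as continuous; it differentiates the fitted Pc(G) by central differences and measures the log-log slope on each side of Gc. The helper names are hypothetical.

```python
import math

def pc_goals(g):
    """Cumulative distribution Pc(G) fitted in the post."""
    if g < 10:
        return -0.259 + 1.256 * g ** -0.500
    return -0.004 + 4.454 * g ** -1.440

def p_goals(g, h=1e-4):
    """P(G) approximated as -dPc/dG with a central difference."""
    return -(pc_goals(g + h) - pc_goals(g - h)) / (2 * h)

def local_exponent(g1, g2):
    """Slope of log P(G) against log G between g1 and g2 (same side of Gc)."""
    return (math.log(p_goals(g2)) - math.log(p_goals(g1))) / (math.log(g2) - math.log(g1))

print(f"exponent for G < 10: {local_exponent(2, 8):.3f}")
print(f"exponent for G > 10: {local_exponent(20, 80):.3f}")
```

The additive constants (-0.259 and -0.004) vanish on differentiation, which is why P(G) is a pure power law even though the cumulative fits are not.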

Using this distribution, we can roughly tell which of the 11 players in a team occupy which kind of position, such as goalkeeper, defender or striker. We found that nearly two thirds of the players are in the low-scoring positions.

To look for further complex-network features, we created a soccer player network by linking two players whenever they were in the same team at the same time. This projection is the same as the one applied to the scientist-paper bipartite network. The resulting graph has 13,411 vertices and 315,566 edges. From it we calculated the degree distribution P(k), and the average degree is <k> = 2*315,566/13,411 ≈ 47.1. The clustering coefficient of the network is C = 0.79, making it a highly clustered network. The assortativity coefficient is A = 0.12, so the network is assortative, although less so than the citation network, which has A = 0.46.
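The projection and the three statistics quoted above can be sketched in plain Python, assuming records of the form (player, club, season), where "the same team at the same time" means the same club in the same season. The data and function names are hypothetical; the clustering coefficient here is the average local (Watts-Strogatz) coefficient and the assortativity is Newman's degree correlation coefficient, which may differ from the exact definitions used in the study.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (player, club, season) records; illustrative only.
records = [
    ("p1", "C1", 1990), ("p2", "C1", 1990), ("p3", "C1", 1990),
    ("p3", "C2", 1991), ("p4", "C2", 1991), ("p5", "C2", 1991),
    ("p5", "C1", 1992), ("p6", "C1", 1992),
]

def project(recs):
    """Player network: link two players who share a (club, season) team."""
    teams = defaultdict(set)
    for player, club, season in recs:
        teams[(club, season)].add(player)
    adj = {player: set() for player, _, _ in recs}
    for roster in teams.values():
        for a, b in combinations(roster, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def average_degree(adj):
    """<k> = 2E / N."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += 2 * links / (k * (k - 1))
    return total / len(adj)

def degree_assortativity(adj):
    """Pearson correlation of the degrees at the two ends of each edge."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:  # every edge is visited once in each direction
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)  # ZeroDivisionError if every node has the same degree

adj = project(records)
print(f"N = {len(adj)}, <k> = {average_degree(adj):.3f}")
print(f"C = {average_clustering(adj):.3f}, A = {degree_assortativity(adj):.3f}")
```

On this toy data the network comes out disassortative (A = -0.4), a reminder that the coefficient can take either sign and that A = 0.12 is a genuine property of the real data.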

The average shortest path length was found to be D = 3.29, from which we can conclude that, on average, any two players are only 3.29 steps apart, that is, about three degrees of separation.
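The average shortest path length D can be computed with a breadth-first search from every player, as in this hypothetical sketch on a toy network. For 13,411 players an all-pairs BFS is slow in pure Python, so in practice one would use a graph library or average over a random sample of source nodes.

```python
from collections import deque

# Hypothetical toy player network as an edge list.
edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p3", "p4"),
         ("p3", "p5"), ("p4", "p5"), ("p5", "p6")]

def build_adjacency(edge_list):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edge_list:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def average_shortest_path(adj):
    """Mean BFS distance over all ordered pairs of distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

D = average_shortest_path(build_adjacency(edges))
print(f"D = {D:.3f}")
```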

We also studied how the network evolves over time. The results show the following:
1) Mean connectivity <k> increases: either players' professional careers are getting longer or the transfer rate is going up.
2) The clustering coefficient decreases: players move to outside clubs that are not part of the study.
3) The network becomes more assortative with time: players are segregated, with transfers happening mostly between clubs of a similar type.
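One way to track the first trend, assuming cumulative yearly snapshots of the (player, club, season) records, is to rebuild the player network up to each season and recompute the mean connectivity <k>. The sketch below uses made-up records and a hypothetical helper; the study's exact snapshot scheme is not described here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (player, club, season) records; illustrative only.
records = [
    ("p1", "C1", 1990), ("p2", "C1", 1990), ("p3", "C1", 1990),
    ("p3", "C2", 1991), ("p4", "C2", 1991), ("p5", "C2", 1991),
    ("p4", "C1", 1992), ("p5", "C1", 1992), ("p6", "C1", 1992),
]

def mean_degree_up_to(recs, last_season):
    """<k> = 2E/N of the player network built from all seasons <= last_season."""
    teams = defaultdict(set)
    for player, club, season in recs:
        if season <= last_season:
            teams[(club, season)].add(player)
    players = set().union(*teams.values())
    edges = set()
    for roster in teams.values():
        # Sorted pairs so that repeat teammates are counted as one edge.
        edges.update(combinations(sorted(roster), 2))
    return 2 * len(edges) / len(players) if players else 0.0

for season in (1990, 1991, 1992):
    print(season, round(mean_degree_up_to(records, season), 3))
```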

Conclusion and future work

The study shows that complex networks can be applied to many facets of everyday life and can lead to good predictions once we understand the network. Note that this study was centered on Brazilian clubs. If a similar study were carried out for European clubs spanning many countries, we might see interesting clustering and be able to identify better players for transfers between clubs.


Submission By - Nishkarsh Shastri
