This repository contains the data and code associated with the paper:
"F. Basini, V. Tsouli, I. Ntzoufras, N. Friel, Assessing competitive balance in the English Premier League for over forty seasons using a stochastic block model, Journal of the Royal Statistical Society Series A: Statistics in Society, 2023".
- README.md - you're reading it.
- /Data_Premier - contains the tables of results (match grids) of the Premier League championship from season 1978/79 to 2019/20 in csv format, e.g. Result_Premier_0102.csv for season 2001/02.
  - Note on the data: each csv file contains the results table, where the entries on the main diagonal are blank and each match result is written as | 4~3 | for a match won 4 to 3 by the row team against the column team, with the row team playing at home.
- LABEL_CORRECTION_AND_ANALYSIS.R - code to apply the label switching algorithm (collpcm) and carry out post-hoc analysis of the chain.
- SBM_FUNCTIONS.R - contains all functions used in the MCMC algorithm, e.g. get_loglik returns the collapsed log-likelihood.
- READ_TABLE_RESULTS.R - code to load the result table and extract the relational pattern y.
- MCMC_main.R - the heart (:heartpulse:) of the whole code, which calls the other source files and runs the MCMC algorithm.
- /Inference_results - empty at first; a folder for each season analysed will be created inside it once the code is run, e.g. [/Inference_results/mcmc_Premier_Season_0102] for season 2001/02.
- OVER_TIME_ANALYSIS.R - script to run the analysis for all the seasons in the data folder and to produce the analysis of top block probability over time and the associated plots.
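
The `4~3` score encoding described in the note on the data can be split into home and away goals with a few lines of base R. The following is a minimal sketch and not the code in READ_TABLE_RESULTS.R: the helper name `parse_scores` and the toy grid are hypothetical, and it assumes blank diagonal cells arrive as `NA` or empty strings.

```r
# Hypothetical helper (not part of the repository): split "home~away"
# score strings into two integer matrices with the same shape as the grid.
parse_scores <- function(grid) {
  grid <- as.matrix(grid)
  cells <- trimws(as.vector(grid))            # column-major vector of entries
  cells[is.na(cells) | cells == ""] <- NA     # blank diagonal becomes NA
  parts <- strsplit(cells, "~", fixed = TRUE)
  pick <- function(k) {
    vals <- vapply(parts, function(p) {
      if (length(p) == 2) as.integer(p[k]) else NA_integer_
    }, integer(1))
    matrix(vals, nrow = nrow(grid), ncol = ncol(grid),
           dimnames = dimnames(grid))
  }
  list(home = pick(1), away = pick(2))
}

# Toy 3-team grid (made-up results): rows play at home, columns away.
teams <- c("A", "B", "C")
toy <- matrix(c(NA,    "4~3", "0~0",
                "1~2", NA,    "2~1",
                "3~3", "0~1", NA),
              nrow = 3, byrow = TRUE, dimnames = list(teams, teams))

scores <- parse_scores(toy)
scores$home["A", "B"]  # 4: goals scored by home team A against B
scores$away["A", "B"]  # 3: goals scored by away team B
```

For the real files the grid would first be read with `read.csv`; the right arguments depend on the file layout and are not shown here.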
- Clone repository.
- Open MCMC_main.R in RStudio and set Football_SBM as your working directory.
- Uncomment line 29 and set season to the season you want to analyse, using the last two digits of each year, e.g. "1819" for 2018/19 (the season must be between 1978/79 and 2019/20):
season = "1819"
- Run it all. Waiting time: about 4 minutes
- To analyse all seasons instead, open OVER_TIME_ANALYSIS.R in RStudio and set Football_SBM as your working directory.
- Run it all. Waiting time: about 2 and a half hours.
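
The season code ("1819" for 2018/19) and the file and folder names quoted in this README follow one pattern. A small base R sketch that builds them from the starting year of a season is given below; the helper names `season_code` and `season_paths` are hypothetical and are not defined in the repository.

```r
# Hypothetical helpers (not part of the repository).

# Last two digits of each year, e.g. 2018 gives "1819".
season_code <- function(start_year) {
  # Seasons covered by the data: 1978/79 to 2019/20.
  stopifnot(start_year >= 1978, start_year <= 2019)
  sprintf("%02d%02d", start_year %% 100, (start_year + 1) %% 100)
}

# Paths following the naming shown in this README, relative to Football_SBM.
season_paths <- function(start_year) {
  code <- season_code(start_year)
  list(
    data   = file.path("Data_Premier", paste0("Result_Premier_", code, ".csv")),
    output = file.path("Inference_results", paste0("mcmc_Premier_Season_", code))
  )
}

season <- season_code(2018)   # "1819"
season_paths(2001)$data       # "Data_Premier/Result_Premier_0102.csv"
```

The range check mirrors the constraint stated in the steps, so a season outside the data fails early rather than at file-loading time.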
In the associated folder /Inference_results/mcmc_Premier_Season_*season* that will be created, the following items will be available:
In addition, if you are using OVER_TIME_ANALYSIS.R, there will be one further directory, mcmc_Premier_OVER_TIME_ANALYSIS, which will contain:
* mcmc_Premier_OVER_TIME.RData - the workspace of the over time analysis.
* Posterior_K_table_OVER_TIME.txt - the table with the posterior probability of K for each season.
* Size_top_block_table.txt - the table with the size of the estimated top block for each season.
* TopBlock_Size_barplot.pdf - the barplot showing the size of the estimated top block for each season.
* TopBlock_prob_datapoints_JITTERED.pdf - the plot of the probability of being in the top block for all teams in the league and over all seasons.
* \Over_time_Teams - folder containing the probability of belonging to the top block in each season, focusing on one team at a time (see Fig. 8 of the paper). Below we provide an example for Manchester City:
All packages used are available on CRAN.
install.packages("plyr")
install.packages("stringi")
install.packages("seqinr")
install.packages("RColorBrewer")
install.packages("lattice")
install.packages("xtable")
install.packages("collpcm")
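
To avoid reinstalling packages that are already present, the list above can be filtered first. This is an optional sketch in base R; the helper name `missing_packages` is hypothetical and is not part of the repository.

```r
# Packages listed in this README.
pkgs <- c("plyr", "stringi", "seqinr", "RColorBrewer",
          "lattice", "xtable", "collpcm")

# Hypothetical helper: names of the requested packages not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install only what is missing (uncomment to run; needs a CRAN mirror):
# install.packages(missing_packages(pkgs))
```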
Jason Wyse and Caitriona Ryan (2019). collpcm: Collapsed Latent Position Cluster Model for Social Networks. R package version 1.1. https://CRAN.R-project.org/package=collpcm