Quick Control Charts for AFL

Who doesn’t like a Wikipedia entry? From the one on the control chart: “If analysis of the control chart indicates that the process is currently under control (i.e., is stable, with variation only coming from sources common to the process), then no corrections or changes to process control parameters are needed or desired.” I mean, gee whiz, this sure could relate to something like, I don’t know, AFL total game scores?

There always seems to be talk about scoring in the AFLM; see the AFL website and Fox Sports, just to name a couple. You could find plenty more if you went looking.

Let’s use fitzRoy and the good people over at Stats Insider, who have kindly provided me with the expected score data you can get from the Herald Sun.

First thing, let’s use fitzRoy.

library(fitzRoy)
library(tidyverse)

## ── Attaching packages ──────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.1.0       ✔ purrr   0.3.2  
## ✔ tibble  2.1.1       ✔ dplyr   0.8.0.1
## ✔ tidyr   0.8.3       ✔ stringr 1.4.0  
## ✔ readr   1.3.1       ✔ forcats 0.4.0

## ── Conflicts ─────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(ggQC)
# Mean total score per game in round 1 of each season from 1990 onwards
fitzRoy::match_results %>%
  filter(Season > 1989 & Round == "R1") %>%
  mutate(total = Home.Points + Away.Points) %>%
  group_by(Season, Round) %>%
  summarise(meantotal = mean(total)) %>%
  ggplot(aes(x = Season, y = meantotal)) +
  geom_point() +
  geom_line() +
  stat_QC(method = "XmR") +   # individuals (XmR) control limits from ggQC
  ylab("Mean Round 1 Total for Each Game") +
  ggtitle("Stop Freaking OUT over ONE ROUND")

So if we look at the control chart for just round 1 of each AFLM season since 1990, it seems that even though this year’s round 1 was lower scoring, there isn’t much to see here.

After all, we can and should expect natural variation in scores. Wouldn’t footy be boring if scores were the same every week?
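For anyone curious where the lines drawn by `stat_QC(method = "XmR")` come from, here is a minimal sketch of the textbook individuals (XmR) chart calculation: the centre line is the mean, and the limits sit 2.66 average moving ranges either side of it (2.66 is 3/1.128, the d2 constant for moving ranges of two points). I’m assuming ggQC follows this standard formula; the helpers `xmr_limits()` and `xmr_flags()` are my own, and the input is just the ten rounded 2017 round means printed later in this post, used for illustration.

```r
# Hypothetical helper: textbook XmR (individuals) chart limits.
# Assumption: ggQC's stat_QC(method = "XmR") uses the same 2.66 constant.
xmr_limits <- function(x) {
  mr <- abs(diff(x))      # moving ranges between consecutive points
  mr_bar <- mean(mr)      # average moving range
  centre <- mean(x)
  c(lcl = centre - 2.66 * mr_bar,
    centre = centre,
    ucl = centre + 2.66 * mr_bar)
}

# Hypothetical helper: TRUE for any point outside the control limits,
# the basic "something has changed" signal on a control chart
xmr_flags <- function(x) {
  lim <- xmr_limits(x)
  x < lim[["lcl"]] | x > lim[["ucl"]]
}

# Rounded mean expected totals for 2017 rounds 1 to 10 (from the printout below)
x <- c(193, 191, 181, 184, 188, 181, 191, 178, 180, 186)
xmr_limits(x)
xmr_flags(x)
```

Here the average moving range is 57/9, so the limits sit roughly 17 points either side of a mean of 185.3, and none of the ten rounds is flagged: ordinary week-to-week noise.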

Next, let’s think about the expected scores framework.

Thanks to MatterOfStats, we know that week-to-week scoring has certain properties. But what I want to check with a control chart is whether, last week in round 1 of 2019, teams were creating worse opportunities, far enough outside the bounds that it would worry us if we were using a control chart at work. Here are the first few rows of the expected score data (`CDExp_17`):

## # A tibble: 6 x 8
##   Home_Team   Away_Team      Year Round H_CD_Exp A_CD_Exp H_Actual A_Actual
##   <chr>       <chr>         <dbl> <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
## 1 Carlton     Richmond       2017     1       70      120       89      132
## 2 Collingwood Western Bull…  2017     1       91       88       86      100
## 3 Sydney      Port Adelaide  2017     1       79       90       82      110
## 4 St Kilda    Melbourne      2017     1       99      109       90      120
## 5 Gold Coast  Brisbane Lio…  2017     1       73       98       96       98
## 6 Essendon    Hawthorn       2017     1      117       90      116       91
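Because the point of expected scores is to separate chances created from chances taken, one quick way to see the gap is to compare each game’s total expected score with its actual total. A minimal sketch in base R, using only the six rows printed above typed in by hand (column names shortened; on the full data the same arithmetic would go in a `mutate()` on `CDExp_17`):

```r
# The six 2017 round 1 games printed above, typed in by hand
games <- data.frame(
  home     = c("Carlton", "Collingwood", "Sydney", "St Kilda", "Gold Coast", "Essendon"),
  h_exp    = c(70, 91, 79, 99, 73, 117),
  a_exp    = c(120, 88, 90, 109, 98, 90),
  h_actual = c(89, 86, 82, 90, 96, 116),
  a_actual = c(132, 100, 110, 120, 98, 91)
)

games$total_expected <- games$h_exp + games$a_exp
games$total_actual   <- games$h_actual + games$a_actual
# Positive: the teams scored more than their chances were "worth"
games$diff <- games$total_actual - games$total_expected

games[, c("home", "total_expected", "total_actual", "diff")]
mean(games$diff)
```

In these six games the actual totals came in above expected by 14.3 points per game on average, which is exactly the "chances created versus chances taken" gap the expected score framework is built to expose.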

So now let’s get plotting.

CDExp_17 %>%
  mutate(totalexpected = H_CD_Exp + A_CD_Exp) %>%   # total expected score per game
  group_by(Year, Round) %>%
  summarise(meanexpected = mean(totalexpected)) %>% # mean per round
  ungroup() %>%                                     # drop the Year grouping
  unite("year_round", Year, Round) %>%              # e.g. "2017_1"
  ggplot(aes(x = year_round, y = meanexpected, group = 1)) +
  geom_line() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

What we can see here is that the x-axis has been re-ordered, even though our data frame is in the order we want.

CDExp_17 %>%
  mutate(totalexpected = H_CD_Exp + A_CD_Exp) %>%
  group_by(Year, Round) %>%
  summarise(meanexpected = mean(totalexpected)) %>%
  ungroup() %>%
  unite("year_round", Year, Round)

## # A tibble: 55 x 2
##    year_round meanexpected
##    <chr>             <dbl>
##  1 2017_1             193.
##  2 2017_2             191.
##  3 2017_3             181.
##  4 2017_4             184.
##  5 2017_5             188.
##  6 2017_6             181.
##  7 2017_7             191.
##  8 2017_8             178.
##  9 2017_9             180.
## 10 2017_10            186.
## # … with 45 more rows

What’s going on is that ggplot2 treats a character x variable as a factor and orders the axis by its levels, which by default are sorted as text, so "2017_10" comes before "2017_2".

df <- CDExp_17 %>%
  mutate(totalexpected = H_CD_Exp + A_CD_Exp) %>%
  group_by(Year, Round) %>%
  summarise(meanexpected = mean(totalexpected)) %>%
  ungroup() %>%
  unite("year_round", Year, Round)
str(df)

## Classes 'tbl_df', 'tbl' and 'data.frame':    55 obs. of  2 variables:
##  $ year_round  : chr  "2017_1" "2017_2" "2017_3" "2017_4" ...
##  $ meanexpected: num  193 191 181 184 188 ...

df$year_round<-as.factor(df$year_round)
levels(df$year_round)

##  [1] "2017_1"  "2017_10" "2017_11" "2017_12" "2017_13" "2017_14" "2017_15"
##  [8] "2017_16" "2017_17" "2017_18" "2017_19" "2017_2"  "2017_20" "2017_21"
## [15] "2017_22" "2017_23" "2017_24" "2017_25" "2017_26" "2017_27" "2017_3" 
## [22] "2017_4"  "2017_5"  "2017_6"  "2017_7"  "2017_8"  "2017_9"  "2018_1" 
## [29] "2018_10" "2018_11" "2018_12" "2018_13" "2018_14" "2018_15" "2018_16"
## [36] "2018_17" "2018_18" "2018_19" "2018_2"  "2018_20" "2018_21" "2018_22"
## [43] "2018_23" "2018_24" "2018_25" "2018_26" "2018_27" "2018_3"  "2018_4" 
## [50] "2018_5"  "2018_6"  "2018_7"  "2018_8"  "2018_9"  "2019_1"

If you look closely, this is the order of the x-axis in our earlier plot.

So how do we fix that?

Let’s just add a numeric column of row numbers and use that as our x-axis.

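CDExp_17 %>%
  mutate(totalexpected = H_CD_Exp + A_CD_Exp) %>%
  group_by(Year, Round) %>%
  summarise(meanexpected = mean(totalexpected)) %>%
  ungroup() %>%                      # so row_number() counts across all seasons
  unite("year_round", Year, Round) %>%
  mutate(id = row_number()) %>%      # 1, 2, 3, ... in chronological order
  ggplot(aes(x = id, y = meanexpected, group = 1)) +
  geom_line() +
  stat_QC(method = "XmR")            # XmR control limits from ggQC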
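If you would rather keep the `year_round` labels on the x-axis, another option (not the one used above) is to set the factor levels in the order the rows appear. A minimal sketch in base R: `factor(x, levels = unique(x))` does this, and forcats’ `fct_inorder()` does the same job. The short vector below is just for illustration.

```r
# A few labels in the order the summarised data frame holds them
year_round <- c("2017_1", "2017_2", "2017_10", "2018_1")

# Default: levels are sorted as text, so "2017_10" lands before "2017_2"
levels(factor(year_round))

# Fix: use the order of first appearance as the level order
in_order <- factor(year_round, levels = unique(year_round))
levels(in_order)
```

In the pipeline above, that would be a `mutate(year_round = factor(year_round, levels = unique(year_round)))` step after `unite()`, with `x = year_round` in the `aes()` call. The trade-off is that 55 rotated labels get crowded, which is one reason a plain numeric index is handy.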

There you go. If we think of expected scores as opportunities created, round 1 was actually up on the finals series; players just didn’t kick that well. But hey, as we know, goal kicking is never going to be 100% accurate.