This year just before round 16 starts the leading goalkickers are Lance Franklin and Jeremy Cameron, both of whom sit on a miserly 40 goals.

If we were to double the goalkicking totals we still wouldn’t have a player reach the magical 100 mark!

So what is going on in todays AFL? Are bags of goals dead? Will we ever see 100 goals kicked in a year again?

Lets explore the data using R and investigate:

Question - Are less bags being kicked?

First we need to think of a definition of what is a bag of goals?

Thankfully TripleM is to the rescue as they have done a poll that has gained over 2500 votes and in it 48% of voters polled thought 5 goals constitutes a bag of goals.

Second step is getting the data!

Thanks to the great website that is afltables we can just get a list of players and games that have kicked 5 or more goals.

Hopefully you have read in the data sucessfully if not it is saved here

#I am calling the dataset bags
library(ggplot2)
library(dplyr)
library(gapminder)
bags <- read_csv("insert your working directory here/bags.csv")
##to find working directory
getwd()
##remember always view the data
head(bags)
##   Goals Behinds          Player Team Opponent           Venue      date
## 1    18      NA    Fred Fanning   ME       SK   Junction Oval 30-Aug-47
## 2    17       5  Jason Dunstall   HW       RI   Waverley Park  2-May-92
## 3    17      NA Gordon Coventry   CW       FI   Victoria Park 19-Jul-30
## 4    16       4   Peter McKenna   CW       SM   Victoria Park 23-Aug-69
## 5    16       1    Peter Hudson   HW       ME Glenferrie Oval  3-May-69
## 6    16      NA    Tony Lockett   SY       FI    Western Oval 13-Aug-95
##   year
## 1 1947
## 2 1992
## 3 1930
## 4 1969
## 5 1969
## 6 1995
glimpse(bags)
## Observations: 9,134
## Variables: 8
## $ Goals    <int> 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 14, 14, 1...
## $ Behinds  <int> NA, 5, NA, 4, 1, NA, NA, 9, 4, NA, NA, 7, 5, 2, 2, 2,...
## $ Player   <fct> Fred Fanning, Jason Dunstall, Gordon Coventry, Peter ...
## $ Team     <fct> ME, HW, CW, CW, HW, SY, CW, FO, SK, SM, CW, GE, GE, H...
## $ Opponent <fct> SK, RI, FI, SM, ME, FI, HW, SK, SY, ES, ES, ES, SY, F...
## $ Venue    <fct> Junction Oval, Waverley Park, Victoria Park, Victoria...
## $ date     <fct> 30-Aug-47, 2-May-92, 19-Jul-30, 23-Aug-69, 3-May-69, ...
## $ year     <int> 1947, 1992, 1930, 1969, 1969, 1995, 1929, 1978, 1992,...

Now that we have our data all loaded, we notice something.

  • When a player hasn’t kicked any behinds its saved as NA we would rather it be a \(0\)

So lets fix that by replacing the NA with 0

bags[is.na(bags)]<-0

#check it worked
head(bags)
##   Goals Behinds          Player Team Opponent           Venue      date
## 1    18       0    Fred Fanning   ME       SK   Junction Oval 30-Aug-47
## 2    17       5  Jason Dunstall   HW       RI   Waverley Park  2-May-92
## 3    17       0 Gordon Coventry   CW       FI   Victoria Park 19-Jul-30
## 4    16       4   Peter McKenna   CW       SM   Victoria Park 23-Aug-69
## 5    16       1    Peter Hudson   HW       ME Glenferrie Oval  3-May-69
## 6    16       0    Tony Lockett   SY       FI    Western Oval 13-Aug-95
##   year
## 1 1947
## 2 1992
## 3 1930
## 4 1969
## 5 1969
## 6 1995
glimpse(bags)
## Observations: 9,134
## Variables: 8
## $ Goals    <int> 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 14, 14, 1...
## $ Behinds  <dbl> 0, 5, 0, 4, 1, 0, 0, 9, 4, 0, 0, 7, 5, 2, 2, 2, 1, 0,...
## $ Player   <fct> Fred Fanning, Jason Dunstall, Gordon Coventry, Peter ...
## $ Team     <fct> ME, HW, CW, CW, HW, SY, CW, FO, SK, SM, CW, GE, GE, H...
## $ Opponent <fct> SK, RI, FI, SM, ME, FI, HW, SK, SY, ES, ES, ES, SY, F...
## $ Venue    <fct> Junction Oval, Waverley Park, Victoria Park, Victoria...
## $ date     <fct> 30-Aug-47, 2-May-92, 19-Jul-30, 23-Aug-69, 3-May-69, ...
## $ year     <int> 1947, 1992, 1930, 1969, 1969, 1995, 1929, 1978, 1992,...

To see if the amount of bags is getting lower through time lets make the equivalent of a pivot table in excel

So instead of dragging years to ‘rows’ and instead of entering in count of in ‘values’ we use group_by and summarise to create a table called table.bags.by.year

table.bags.by.year <-bags %>% 
  group_by(year) %>%
  summarise(count_bags=n()) 

table.bags.by.year
## # A tibble: 121 x 2
##     year count_bags
##    <int>      <int>
##  1  1897          1
##  2  1898          6
##  3  1899          7
##  4  1900          3
##  5  1901         10
##  6  1902          5
##  7  1903          7
##  8  1904          4
##  9  1905         10
## 10  1906          9
## # ... with 111 more rows

Now the next thing we would like to do is view our table from largest count_bags to smallest count_bags we can do this as follows

table.bags.by.year %>% 
  arrange(desc(count_bags))
## # A tibble: 121 x 2
##     year count_bags
##    <int>      <int>
##  1  1993        181
##  2  1992        175
##  3  1991        174
##  4  1987        154
##  5  1994        151
##  6  1995        151
##  7  1982        138
##  8  1996        138
##  9  1985        136
## 10  1978        132
## # ... with 111 more rows

Looking at this table, we can see that in 1993 there was 181 instances of a bag of goals being kicked, 175 in 1992 and 174 in 1993.

If we just wanted to look at the total count_bags kicked in a certain year we can just filter out the year as follows:

table.bags.by.year %>% 
  filter(year==2017)
## # A tibble: 1 x 2
##    year count_bags
##   <int>      <int>
## 1  2017         35

Or maybe we want to look at the past 10 years?

table.bags.by.year %>% 
  filter (year>2006)
## # A tibble: 11 x 2
##     year count_bags
##    <int>      <int>
##  1  2007         90
##  2  2008        116
##  3  2009         72
##  4  2010         88
##  5  2011         71
##  6  2012         78
##  7  2013         67
##  8  2014         63
##  9  2015         67
## 10  2016         72
## 11  2017         35

In the past 10 years, we have only seen one year 2008 where 100 + \(5\) goal bags have been kicked. This year as of round 16 we only have 35 bags. We are over halfway through the season (with more AFL teams hence games compared to 2008) and even if we tripled the amount of 5+ goal hauls from here we still wouldn’t overtake 2008’s 5 goal haul!!!

Make some graphs!

If you are like me, you are probably a bit more of a visual kind of person, so instead of looking at tables of numbers (boring!) lets look at some graphs instead.

A common plot people use is a line graph to see if there is a trend over time.

To do simple plots using ggplot, I would highly reccomend following this guide here

  ggplot(data=table.bags.by.year)+ 
  geom_line(mapping=aes(x=year, y=count_bags))

This graph is a bit misleading because of the sudden dip in 2017 as we don’t have a complete year, so lets pull that year out.

table.bags.by.year%>% 
  filter(year<2017) %>%
  ggplot(aes(x=year,y=count_bags)) + geom_line()

Another good idea is to add labels to certain points of interest, below I am labelling the year 1993 as it was the year the number of bags peaked.

table.bags.by.year%>% 
  filter(year<2017) %>%
  ggplot(aes(x=year,y=count_bags)) + geom_line() +
  geom_text(data=subset(table.bags.by.year, year %in% c(1993)),aes(label=count_bags))

So graphically it clearly looks as though the amount of bags being kicked in todays game is going down. But is the game lower scoring?

Well we can check that, we can look at the same graph but instead of looking at the count of bags we can look at the average totals of games.

So again thanks to the great people at afltables for providing the data

csv saved here

gamescores <- read_csv("put your working directory here/gamescores.csv")
##remember always view the data
head(gamescores)
## # A tibble: 6 x 12
##   Date   year round home  home.g home.b home.t away  away.g away.b away.t
##   <chr> <int> <chr> <chr>  <int>  <int>  <int> <chr>  <int>  <int>  <int>
## 1 8-Ma~  1897 R1    Fitz~      6     13     49 Carl~      2      4     16
## 2 8-Ma~  1897 R1    Coll~      5     11     41 St K~      2      4     16
## 3 8-Ma~  1897 R1    Geel~      3      6     24 Esse~      7      5     47
## 4 8-Ma~  1897 R1    Sout~      3      9     27 Melb~      6      8     44
## 5 15-M~  1897 R2    Sout~      6      4     40 Carl~      5      6     36
## 6 15-M~  1897 R2    Esse~      4      6     30 Coll~      8      2     50
## # ... with 1 more variable: venue <chr>
glimpse(gamescores)
## Observations: 15,119
## Variables: 12
## $ Date   <chr> "8-May-1897", "8-May-1897", "8-May-1897", "8-May-1897",...
## $ year   <int> 1897, 1897, 1897, 1897, 1897, 1897, 1897, 1897, 1897, 1...
## $ round  <chr> "R1", "R1", "R1", "R1", "R2", "R2", "R2", "R2", "R3", "...
## $ home   <chr> "Fitzroy", "Collingwood", "Geelong", "South Melbourne",...
## $ home.g <int> 6, 5, 3, 3, 6, 4, 3, 9, 6, 5, 12, 8, 5, 5, 2, 11, 15, 5...
## $ home.b <int> 13, 11, 6, 9, 4, 6, 8, 10, 5, 9, 6, 11, 14, 11, 8, 10, ...
## $ home.t <int> 49, 41, 24, 27, 40, 30, 26, 64, 41, 39, 78, 59, 44, 41,...
## $ away   <chr> "Carlton", "St Kilda", "Essendon", "Melbourne", "Carlto...
## $ away.g <int> 2, 2, 7, 6, 5, 8, 10, 3, 5, 7, 6, 0, 3, 5, 6, 7, 3, 2, ...
## $ away.b <int> 4, 4, 5, 8, 6, 2, 6, 1, 7, 8, 5, 2, 4, 3, 6, 4, 8, 9, 9...
## $ away.t <int> 16, 16, 47, 44, 36, 50, 66, 19, 37, 50, 41, 2, 22, 33, ...
## $ venue  <chr> "Brunswick St", "Victoria Park", "Corio Oval", "Lake Ov...

So what we want to see is if bags of goals are going down, does that correspond to a decrease in total points scored in games?

To do this first we must come up with a new column that is the home.t+ away.t where home.t is the home score and away.t is the away score. We do this using mutate

afl.games<-gamescores %>% 
  mutate(total.game=home.t+away.t)

head(afl.games)
## # A tibble: 6 x 13
##   Date   year round home  home.g home.b home.t away  away.g away.b away.t
##   <chr> <int> <chr> <chr>  <int>  <int>  <int> <chr>  <int>  <int>  <int>
## 1 8-Ma~  1897 R1    Fitz~      6     13     49 Carl~      2      4     16
## 2 8-Ma~  1897 R1    Coll~      5     11     41 St K~      2      4     16
## 3 8-Ma~  1897 R1    Geel~      3      6     24 Esse~      7      5     47
## 4 8-Ma~  1897 R1    Sout~      3      9     27 Melb~      6      8     44
## 5 15-M~  1897 R2    Sout~      6      4     40 Carl~      5      6     36
## 6 15-M~  1897 R2    Esse~      4      6     30 Coll~      8      2     50
## # ... with 2 more variables: venue <chr>, total.game <int>
glimpse(afl.games)
## Observations: 15,119
## Variables: 13
## $ Date       <chr> "8-May-1897", "8-May-1897", "8-May-1897", "8-May-18...
## $ year       <int> 1897, 1897, 1897, 1897, 1897, 1897, 1897, 1897, 189...
## $ round      <chr> "R1", "R1", "R1", "R1", "R2", "R2", "R2", "R2", "R3...
## $ home       <chr> "Fitzroy", "Collingwood", "Geelong", "South Melbour...
## $ home.g     <int> 6, 5, 3, 3, 6, 4, 3, 9, 6, 5, 12, 8, 5, 5, 2, 11, 1...
## $ home.b     <int> 13, 11, 6, 9, 4, 6, 8, 10, 5, 9, 6, 11, 14, 11, 8, ...
## $ home.t     <int> 49, 41, 24, 27, 40, 30, 26, 64, 41, 39, 78, 59, 44,...
## $ away       <chr> "Carlton", "St Kilda", "Essendon", "Melbourne", "Ca...
## $ away.g     <int> 2, 2, 7, 6, 5, 8, 10, 3, 5, 7, 6, 0, 3, 5, 6, 7, 3,...
## $ away.b     <int> 4, 4, 5, 8, 6, 2, 6, 1, 7, 8, 5, 2, 4, 3, 6, 4, 8, ...
## $ away.t     <int> 16, 16, 47, 44, 36, 50, 66, 19, 37, 50, 41, 2, 22, ...
## $ venue      <chr> "Brunswick St", "Victoria Park", "Corio Oval", "Lak...
## $ total.game <int> 65, 57, 71, 71, 76, 80, 92, 83, 78, 89, 119, 61, 66...

Now lets see it average total score by year in a graph.

Hopefully now you are getting a feel for %>% while it might seem a bit weird thinking about it this way. Its another way of saying take the result from the left and apply what is coming next.

Below, we take afl.games dataset group it by year. After its been grouped by year (think pivot table) we summarise it by the mean of total.game. We then take that resulting pivot table and plot it using ggplot where x-axis is the year and y axis is the average total.

afl.games %>% 
  group_by(year)%>%
  summarise(average.total = mean(total.game)) %>% 
  ggplot(aes(x=year,y=average.total))+geom_line()

That is a bit annoying right for comparisions sake.

Lets stack the graphs on top of each other saving you the hastle of scrolling up and down. To make things a bit more clear visually lets add some vertical lines to help our eyes out.

##make plot 1
p1<-table.bags.by.year%>% 
  filter(year<2017) %>%
  ggplot(aes(x=year,y=count_bags)) + geom_line() +
  geom_text(data=subset(table.bags.by.year, year %in% c(1993,2016)),aes(label=count_bags)) +
  ylim(0,250) +
  geom_vline(xintercept =c(1993,2016))

## make plot 2
p2<-afl.games %>% group_by(year)%>%
  summarise(average.total = mean(total.game)) %>% 
  ggplot(aes(x=year,y=average.total))+geom_line() +
  ylim(0,250) +
  geom_vline(xintercept =c(1993,2016))

grid.arrange(p1, p2, nrow = 2)

Visually what do you see?

I see since 1993 it looks as though totals have remained relatively stable while the amount of bags being kicked as been on the steady decline.

What about how many people in goals are kicking games. In other words what is the spread of goalkickers?

Thankfully again thanks to afltables we have a list of all the goalkickers in games.

I have saved the file here

all_goal_kickers <- read_csv("put your working directory here/all_goals.csv")

Usually with excel if I wanted to get a summary of a few groups in one go. (The equivalent of moving lots of variables to ‘rows’) the table would actually look horrible.

I hate EXCEL

I hate EXCEL

Now a workaround, is to concatentate the columns you want in excel to do this in R you can use paste

 goal_kickers = mutate(all_goal_kickers,
              concated_column = paste(team, opponent, venue,date, sep = '_'))
# View(goal_kickers)
goal.spread <-goal_kickers %>% 
     group_by(concated_column,year) %>%
     summarise(count_spread=n())  
head(goal.spread)
## # A tibble: 6 x 3
## # Groups:   concated_column [6]
##   concated_column                year count_spread
##   <chr>                         <int>        <int>
## 1 AD_BB_Carrara_9-Aug-92         1992            9
## 2 AD_BB_Football Park_14-Aug-94  1994            7
## 3 AD_BB_Football Park_23-Jun-91  1991            9
## 4 AD_BB_Football Park_26-Apr-92  1992            6
## 5 AD_BB_Football Park_30-Jul-95  1995            6
## 6 AD_BB_Football Park_6-Jul-96   1996            7
##that would give the same as 
goal.spread.2<-goal_kickers%>%
  group_by(team,opponent,venue,date,year)%>%
  summarise(count_spread=n())
head(goal.spread.2)
## # A tibble: 6 x 6
## # Groups:   team, opponent, venue, date [6]
##   team  opponent venue         date       year count_spread
##   <chr> <chr>    <chr>         <chr>     <int>        <int>
## 1 AD    BB       Carrara       9-Aug-92   1992            9
## 2 AD    BB       Football Park 14-Aug-94  1994            7
## 3 AD    BB       Football Park 23-Jun-91  1991            9
## 4 AD    BB       Football Park 26-Apr-92  1992            6
## 5 AD    BB       Football Park 30-Jul-95  1995            6
## 6 AD    BB       Football Park 6-Jul-96   1996            7
##so now we can see the spread of goal kickers in each game

goal.spread.2%>%
   group_by(year)%>% 
   summarise(average.spread=mean(count_spread)) %>% 
   ggplot(aes(x=year,y=average.spread))+geom_line()

Looking at this, it would seem as though there was an uptake in spread of goalkickers since the mid 90s that has been maintained roughly through till today.

p3<- goal.spread.2%>%
   group_by(year)%>% 
   summarise(average.spread=mean(count_spread)) %>% 
   ggplot(aes(x=year,y=average.spread))+geom_line() +
  geom_vline(xintercept =c(1993,2016))
grid.arrange(p1, p2,p3 ,nrow = 3)

From this pretty quick and dirty run, we can see that roughly the amount of bags being kicked has gone down while totals and spread of goal kickers has remained relatively stable.

What does that mean for todays game and what are some possible next steps?

  • Instead of just eyeballing the graphs can we tests for statistically significant differences through time
  • What is the value of a “Coleman” contending forward? If scores are roughly the same with less bags being kicked. This would imply that players are chipping in 2,3 goals instead of 1,2 to maintain the same rough total while not getting on the board as much individually.

Finally, do you find this enjoyable? Does having R scripts/ datasets running all the way through and being fully reproducible make you want to do some more analysis yourself?

As always hit me up for feedback.