This year just before round 16 starts the leading goalkickers are Lance Franklin and Jeremy Cameron, both of whom sit on a miserly 40 goals.
If we were to double the goalkicking totals we still wouldn’t have a player reach the magical 100 mark!
So what is going on in todays AFL? Are bags of goals dead? Will we ever see 100 goals kicked in a year again?
Lets explore the data using R and investigate:
Question - Are less bags being kicked?
First we need to think of a definition of what is a bag of goals?
Thankfully TripleM is to the rescue as they have done a poll that has gained over 2500 votes and in it 48% of voters polled thought 5 goals constitutes a bag of goals.
Second step is getting the data!
Thanks to the great website that is afltables we can just get a list of players and games that have kicked 5 or more goals.
Hopefully you have read in the data sucessfully if not it is saved here
#I am calling the dataset bags
library(ggplot2)
library(dplyr)
library(gapminder)
bags <- read_csv("insert your working directory here/bags.csv")
##to find working directory
getwd()
##remember always view the data
head(bags)
## Goals Behinds Player Team Opponent Venue date
## 1 18 NA Fred Fanning ME SK Junction Oval 30-Aug-47
## 2 17 5 Jason Dunstall HW RI Waverley Park 2-May-92
## 3 17 NA Gordon Coventry CW FI Victoria Park 19-Jul-30
## 4 16 4 Peter McKenna CW SM Victoria Park 23-Aug-69
## 5 16 1 Peter Hudson HW ME Glenferrie Oval 3-May-69
## 6 16 NA Tony Lockett SY FI Western Oval 13-Aug-95
## year
## 1 1947
## 2 1992
## 3 1930
## 4 1969
## 5 1969
## 6 1995
glimpse(bags)
## Observations: 9,134
## Variables: 8
## $ Goals <int> 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 14, 14, 1...
## $ Behinds <int> NA, 5, NA, 4, 1, NA, NA, 9, 4, NA, NA, 7, 5, 2, 2, 2,...
## $ Player <fct> Fred Fanning, Jason Dunstall, Gordon Coventry, Peter ...
## $ Team <fct> ME, HW, CW, CW, HW, SY, CW, FO, SK, SM, CW, GE, GE, H...
## $ Opponent <fct> SK, RI, FI, SM, ME, FI, HW, SK, SY, ES, ES, ES, SY, F...
## $ Venue <fct> Junction Oval, Waverley Park, Victoria Park, Victoria...
## $ date <fct> 30-Aug-47, 2-May-92, 19-Jul-30, 23-Aug-69, 3-May-69, ...
## $ year <int> 1947, 1992, 1930, 1969, 1969, 1995, 1929, 1978, 1992,...
Now that we have our data all loaded, we notice something.
- When a player hasn’t kicked any behinds its saved as NA we would rather it be a \(0\)
So lets fix that by replacing the NA with 0
bags[is.na(bags)]<-0
#check it worked
head(bags)
## Goals Behinds Player Team Opponent Venue date
## 1 18 0 Fred Fanning ME SK Junction Oval 30-Aug-47
## 2 17 5 Jason Dunstall HW RI Waverley Park 2-May-92
## 3 17 0 Gordon Coventry CW FI Victoria Park 19-Jul-30
## 4 16 4 Peter McKenna CW SM Victoria Park 23-Aug-69
## 5 16 1 Peter Hudson HW ME Glenferrie Oval 3-May-69
## 6 16 0 Tony Lockett SY FI Western Oval 13-Aug-95
## year
## 1 1947
## 2 1992
## 3 1930
## 4 1969
## 5 1969
## 6 1995
glimpse(bags)
## Observations: 9,134
## Variables: 8
## $ Goals <int> 18, 17, 17, 16, 16, 16, 16, 15, 15, 15, 15, 14, 14, 1...
## $ Behinds <dbl> 0, 5, 0, 4, 1, 0, 0, 9, 4, 0, 0, 7, 5, 2, 2, 2, 1, 0,...
## $ Player <fct> Fred Fanning, Jason Dunstall, Gordon Coventry, Peter ...
## $ Team <fct> ME, HW, CW, CW, HW, SY, CW, FO, SK, SM, CW, GE, GE, H...
## $ Opponent <fct> SK, RI, FI, SM, ME, FI, HW, SK, SY, ES, ES, ES, SY, F...
## $ Venue <fct> Junction Oval, Waverley Park, Victoria Park, Victoria...
## $ date <fct> 30-Aug-47, 2-May-92, 19-Jul-30, 23-Aug-69, 3-May-69, ...
## $ year <int> 1947, 1992, 1930, 1969, 1969, 1995, 1929, 1978, 1992,...
To see if the amount of bags is getting lower through time lets make the equivalent of a pivot table in excel
So instead of dragging years to ‘rows’ and instead of entering in count of in ‘values’ we use group_by and summarise to create a table called table.bags.by.year
table.bags.by.year <-bags %>%
group_by(year) %>%
summarise(count_bags=n())
table.bags.by.year
## # A tibble: 121 x 2
## year count_bags
## <int> <int>
## 1 1897 1
## 2 1898 6
## 3 1899 7
## 4 1900 3
## 5 1901 10
## 6 1902 5
## 7 1903 7
## 8 1904 4
## 9 1905 10
## 10 1906 9
## # ... with 111 more rows
Now the next thing we would like to do is view our table from largest count_bags to smallest count_bags we can do this as follows
table.bags.by.year %>%
arrange(desc(count_bags))
## # A tibble: 121 x 2
## year count_bags
## <int> <int>
## 1 1993 181
## 2 1992 175
## 3 1991 174
## 4 1987 154
## 5 1994 151
## 6 1995 151
## 7 1982 138
## 8 1996 138
## 9 1985 136
## 10 1978 132
## # ... with 111 more rows
Looking at this table, we can see that in 1993 there was 181 instances of a bag of goals being kicked, 175 in 1992 and 174 in 1993.
If we just wanted to look at the total count_bags kicked in a certain year we can just filter out the year as follows:
table.bags.by.year %>%
filter(year==2017)
## # A tibble: 1 x 2
## year count_bags
## <int> <int>
## 1 2017 35
Or maybe we want to look at the past 10 years?
table.bags.by.year %>%
filter (year>2006)
## # A tibble: 11 x 2
## year count_bags
## <int> <int>
## 1 2007 90
## 2 2008 116
## 3 2009 72
## 4 2010 88
## 5 2011 71
## 6 2012 78
## 7 2013 67
## 8 2014 63
## 9 2015 67
## 10 2016 72
## 11 2017 35
In the past 10 years, we have only seen one year 2008 where 100 + \(5\) goal bags have been kicked. This year as of round 16 we only have 35 bags. We are over halfway through the season (with more AFL teams hence games compared to 2008) and even if we tripled the amount of 5+ goal hauls from here we still wouldn’t overtake 2008’s 5 goal haul!!!
Make some graphs!
If you are like me, you are probably a bit more of a visual kind of person, so instead of looking at tables of numbers (boring!) lets look at some graphs instead.
A common plot people use is a line graph to see if there is a trend over time.
To do simple plots using ggplot, I would highly reccomend following this guide here
ggplot(data=table.bags.by.year)+
geom_line(mapping=aes(x=year, y=count_bags))
This graph is a bit misleading because of the sudden dip in 2017 as we don’t have a complete year, so lets pull that year out.
table.bags.by.year%>%
filter(year<2017) %>%
ggplot(aes(x=year,y=count_bags)) + geom_line()
Another good idea is to add labels to certain points of interest, below I am labelling the year 1993 as it was the year the number of bags peaked.
table.bags.by.year%>%
filter(year<2017) %>%
ggplot(aes(x=year,y=count_bags)) + geom_line() +
geom_text(data=subset(table.bags.by.year, year %in% c(1993)),aes(label=count_bags))
So graphically it clearly looks as though the amount of bags being kicked in todays game is going down. But is the game lower scoring?
Well we can check that, we can look at the same graph but instead of looking at the count of bags we can look at the average totals of games.
So again thanks to the great people at afltables for providing the data
csv saved here
gamescores <- read_csv("put your working directory here/gamescores.csv")
##remember always view the data
head(gamescores)
## # A tibble: 6 x 12
## Date year round home home.g home.b home.t away away.g away.b away.t
## <chr> <int> <chr> <chr> <int> <int> <int> <chr> <int> <int> <int>
## 1 8-Ma~ 1897 R1 Fitz~ 6 13 49 Carl~ 2 4 16
## 2 8-Ma~ 1897 R1 Coll~ 5 11 41 St K~ 2 4 16
## 3 8-Ma~ 1897 R1 Geel~ 3 6 24 Esse~ 7 5 47
## 4 8-Ma~ 1897 R1 Sout~ 3 9 27 Melb~ 6 8 44
## 5 15-M~ 1897 R2 Sout~ 6 4 40 Carl~ 5 6 36
## 6 15-M~ 1897 R2 Esse~ 4 6 30 Coll~ 8 2 50
## # ... with 1 more variable: venue <chr>
glimpse(gamescores)
## Observations: 15,119
## Variables: 12
## $ Date <chr> "8-May-1897", "8-May-1897", "8-May-1897", "8-May-1897",...
## $ year <int> 1897, 1897, 1897, 1897, 1897, 1897, 1897, 1897, 1897, 1...
## $ round <chr> "R1", "R1", "R1", "R1", "R2", "R2", "R2", "R2", "R3", "...
## $ home <chr> "Fitzroy", "Collingwood", "Geelong", "South Melbourne",...
## $ home.g <int> 6, 5, 3, 3, 6, 4, 3, 9, 6, 5, 12, 8, 5, 5, 2, 11, 15, 5...
## $ home.b <int> 13, 11, 6, 9, 4, 6, 8, 10, 5, 9, 6, 11, 14, 11, 8, 10, ...
## $ home.t <int> 49, 41, 24, 27, 40, 30, 26, 64, 41, 39, 78, 59, 44, 41,...
## $ away <chr> "Carlton", "St Kilda", "Essendon", "Melbourne", "Carlto...
## $ away.g <int> 2, 2, 7, 6, 5, 8, 10, 3, 5, 7, 6, 0, 3, 5, 6, 7, 3, 2, ...
## $ away.b <int> 4, 4, 5, 8, 6, 2, 6, 1, 7, 8, 5, 2, 4, 3, 6, 4, 8, 9, 9...
## $ away.t <int> 16, 16, 47, 44, 36, 50, 66, 19, 37, 50, 41, 2, 22, 33, ...
## $ venue <chr> "Brunswick St", "Victoria Park", "Corio Oval", "Lake Ov...
So what we want to see is if bags of goals are going down, does that correspond to a decrease in total points scored in games?
To do this first we must come up with a new column that is the home.t+ away.t where home.t is the home score and away.t is the away score. We do this using mutate
afl.games<-gamescores %>%
mutate(total.game=home.t+away.t)
head(afl.games)
## # A tibble: 6 x 13
## Date year round home home.g home.b home.t away away.g away.b away.t
## <chr> <int> <chr> <chr> <int> <int> <int> <chr> <int> <int> <int>
## 1 8-Ma~ 1897 R1 Fitz~ 6 13 49 Carl~ 2 4 16
## 2 8-Ma~ 1897 R1 Coll~ 5 11 41 St K~ 2 4 16
## 3 8-Ma~ 1897 R1 Geel~ 3 6 24 Esse~ 7 5 47
## 4 8-Ma~ 1897 R1 Sout~ 3 9 27 Melb~ 6 8 44
## 5 15-M~ 1897 R2 Sout~ 6 4 40 Carl~ 5 6 36
## 6 15-M~ 1897 R2 Esse~ 4 6 30 Coll~ 8 2 50
## # ... with 2 more variables: venue <chr>, total.game <int>
glimpse(afl.games)
## Observations: 15,119
## Variables: 13
## $ Date <chr> "8-May-1897", "8-May-1897", "8-May-1897", "8-May-18...
## $ year <int> 1897, 1897, 1897, 1897, 1897, 1897, 1897, 1897, 189...
## $ round <chr> "R1", "R1", "R1", "R1", "R2", "R2", "R2", "R2", "R3...
## $ home <chr> "Fitzroy", "Collingwood", "Geelong", "South Melbour...
## $ home.g <int> 6, 5, 3, 3, 6, 4, 3, 9, 6, 5, 12, 8, 5, 5, 2, 11, 1...
## $ home.b <int> 13, 11, 6, 9, 4, 6, 8, 10, 5, 9, 6, 11, 14, 11, 8, ...
## $ home.t <int> 49, 41, 24, 27, 40, 30, 26, 64, 41, 39, 78, 59, 44,...
## $ away <chr> "Carlton", "St Kilda", "Essendon", "Melbourne", "Ca...
## $ away.g <int> 2, 2, 7, 6, 5, 8, 10, 3, 5, 7, 6, 0, 3, 5, 6, 7, 3,...
## $ away.b <int> 4, 4, 5, 8, 6, 2, 6, 1, 7, 8, 5, 2, 4, 3, 6, 4, 8, ...
## $ away.t <int> 16, 16, 47, 44, 36, 50, 66, 19, 37, 50, 41, 2, 22, ...
## $ venue <chr> "Brunswick St", "Victoria Park", "Corio Oval", "Lak...
## $ total.game <int> 65, 57, 71, 71, 76, 80, 92, 83, 78, 89, 119, 61, 66...
Now lets see it average total score by year in a graph.
Hopefully now you are getting a feel for %>% while it might seem a bit weird thinking about it this way. Its another way of saying take the result from the left and apply what is coming next.
Below, we take afl.games dataset group it by year. After its been grouped by year (think pivot table) we summarise it by the mean of total.game. We then take that resulting pivot table and plot it using ggplot where x-axis is the year and y axis is the average total.
afl.games %>%
group_by(year)%>%
summarise(average.total = mean(total.game)) %>%
ggplot(aes(x=year,y=average.total))+geom_line()
That is a bit annoying right for comparisions sake.
Lets stack the graphs on top of each other saving you the hastle of scrolling up and down. To make things a bit more clear visually lets add some vertical lines to help our eyes out.
##make plot 1
p1<-table.bags.by.year%>%
filter(year<2017) %>%
ggplot(aes(x=year,y=count_bags)) + geom_line() +
geom_text(data=subset(table.bags.by.year, year %in% c(1993,2016)),aes(label=count_bags)) +
ylim(0,250) +
geom_vline(xintercept =c(1993,2016))
## make plot 2
p2<-afl.games %>% group_by(year)%>%
summarise(average.total = mean(total.game)) %>%
ggplot(aes(x=year,y=average.total))+geom_line() +
ylim(0,250) +
geom_vline(xintercept =c(1993,2016))
grid.arrange(p1, p2, nrow = 2)
Visually what do you see?
I see since 1993 it looks as though totals have remained relatively stable while the amount of bags being kicked as been on the steady decline.
What about how many people in goals are kicking games. In other words what is the spread of goalkickers?
Thankfully again thanks to afltables we have a list of all the goalkickers in games.
I have saved the file here
all_goal_kickers <- read_csv("put your working directory here/all_goals.csv")
Usually with excel if I wanted to get a summary of a few groups in one go. (The equivalent of moving lots of variables to ‘rows’) the table would actually look horrible.
Now a workaround, is to concatentate the columns you want in excel to do this in R you can use paste
goal_kickers = mutate(all_goal_kickers,
concated_column = paste(team, opponent, venue,date, sep = '_'))
# View(goal_kickers)
goal.spread <-goal_kickers %>%
group_by(concated_column,year) %>%
summarise(count_spread=n())
head(goal.spread)
## # A tibble: 6 x 3
## # Groups: concated_column [6]
## concated_column year count_spread
## <chr> <int> <int>
## 1 AD_BB_Carrara_9-Aug-92 1992 9
## 2 AD_BB_Football Park_14-Aug-94 1994 7
## 3 AD_BB_Football Park_23-Jun-91 1991 9
## 4 AD_BB_Football Park_26-Apr-92 1992 6
## 5 AD_BB_Football Park_30-Jul-95 1995 6
## 6 AD_BB_Football Park_6-Jul-96 1996 7
##that would give the same as
goal.spread.2<-goal_kickers%>%
group_by(team,opponent,venue,date,year)%>%
summarise(count_spread=n())
head(goal.spread.2)
## # A tibble: 6 x 6
## # Groups: team, opponent, venue, date [6]
## team opponent venue date year count_spread
## <chr> <chr> <chr> <chr> <int> <int>
## 1 AD BB Carrara 9-Aug-92 1992 9
## 2 AD BB Football Park 14-Aug-94 1994 7
## 3 AD BB Football Park 23-Jun-91 1991 9
## 4 AD BB Football Park 26-Apr-92 1992 6
## 5 AD BB Football Park 30-Jul-95 1995 6
## 6 AD BB Football Park 6-Jul-96 1996 7
##so now we can see the spread of goal kickers in each game
goal.spread.2%>%
group_by(year)%>%
summarise(average.spread=mean(count_spread)) %>%
ggplot(aes(x=year,y=average.spread))+geom_line()
Looking at this, it would seem as though there was an uptake in spread of goalkickers since the mid 90s that has been maintained roughly through till today.
p3<- goal.spread.2%>%
group_by(year)%>%
summarise(average.spread=mean(count_spread)) %>%
ggplot(aes(x=year,y=average.spread))+geom_line() +
geom_vline(xintercept =c(1993,2016))
grid.arrange(p1, p2,p3 ,nrow = 3)
From this pretty quick and dirty run, we can see that roughly the amount of bags being kicked has gone down while totals and spread of goal kickers has remained relatively stable.
What does that mean for todays game and what are some possible next steps?
- Instead of just eyeballing the graphs can we tests for statistically significant differences through time
- What is the value of a “Coleman” contending forward? If scores are roughly the same with less bags being kicked. This would imply that players are chipping in 2,3 goals instead of 1,2 to maintain the same rough total while not getting on the board as much individually.
Finally, do you find this enjoyable? Does having R scripts/ datasets running all the way through and being fully reproducible make you want to do some more analysis yourself?
As always hit me up for feedback.