Animating administrative data

Looking at ageing in the English population

England as other developed countries is ageing. Is this process going to increase, stabilise, decrease …? The Office of National Statistics regularly releases information on population trends. In this post I look at the old-age dependency ratio (OADR) which represents the people aged 65 and over for every 1,000 people aged 16 to 64 years (“traditional working age”).

The process is relatively nice and easy. It consists of three core steps:

  • Load the geographical data and select the information for England

  • Arrange the data into a tidy format

  • Plot the data in a faceted way

Load the data

First things first, let’s get the geographical information (aka. the shapefile) on board. Before, this was a bit messy process but this is extremely easy with sf::read_sf(). The Office of National Statistics offers geographical information at several levels for the whole United Kingdom. I am only interested in the information referred to the 326 districts in England (denominated with E0... )

england = sf::read_sf("./data/geography/england_lad_2017/Local_Authority_Districts_December_2017_Ultra_Generalised_Clipped_Boundaries_in_United_Kingdom_WGS84.shp") %>% 
  filter(str_detect(lad17cd, "^E0")) 

The data created is a tibble

## Simple feature collection with 5 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 418796.4 ymin: 506491 xmax: 478259.4 ymax: 537164.9
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
## # A tibble: 5 x 11
##   objectid lad17cd lad17nm lad17nmw  bng_e  bng_n  long   lat st_areasha
##      <dbl> <chr>   <chr>   <chr>     <dbl>  <dbl> <dbl> <dbl>      <dbl>
## 1        1 E06000… Hartle… <NA>     447157 531476 -1.27  54.7  96586817.
## 2        2 E06000… Middle… <NA>     451141 516887 -1.21  54.5  54741668.
## 3        3 E06000… Redcar… <NA>     464359 519597 -1.01  54.6 247140467.
## 4        4 E06000… Stockt… <NA>     444937 518183 -1.31  54.6 206473805.
## 5        5 E06000… Darlin… <NA>     428029 515649 -1.57  54.5 198298966.
## # … with 2 more variables: st_lengths <dbl>, geometry <MULTIPOLYGON [m]>

The second main source of information refers to the OADR. The data covers years 1996 to 20361 in periods of 10 years. The names of the variables in the information of the ONS are not particularly clean so janitor::clean_names() does a splendid job here.

ageing = read_excel("data/raw/ons_uk_ageing.xls", sheet = "Old Age Dependency Ratio" ) %>%
  clean_names() %>%
  filter(str_detect(area_code, "^E0")) 

Tidy the data

The key is to present the data in long format having a time and a dependency variable for each decade and old age dependency ratio respectively. tidyr::gather() may be a good option here.

ageing = ageing %>%
  gather(year, dependency, x1996:x2036, -area_code, -area_name) %>%
  mutate(year = gsub("x", "", year)) %>%
  arrange(area_code, year)
## # A tibble: 5 x 4
##   area_code area_name  year  dependency
##   <chr>     <chr>      <chr>      <dbl>
## 1 E06000001 Hartlepool 1996        244.
## 2 E06000001 Hartlepool 2006        261.
## 3 E06000001 Hartlepool 2016        303.
## 4 E06000001 Hartlepool 2026        384.
## 5 E06000001 Hartlepool 2036        480.

Also, to fill the map I convert the OADR into a factor variable with dplyr::ntile() and link it to the data frame with geographical data (england). I create a variable lab_dep with the max and min value of each tile and that I will use as label for the represented variable.

ageing = ageing %>%
  mutate(quintile_dep = ntile(dependency, 7), 
         lab_dep = as.factor(ordered(case_when(quintile_dep == 1 ~ "[81 - 218)", 
                                               quintile_dep == 2 ~ "[218 - 250)", 
                                               quintile_dep == 3 ~ "[250 - 283)",
                                               quintile_dep == 4 ~ "[283 - 329)",
                                               quintile_dep == 5 ~ "[329 - 392)", 
                                               quintile_dep == 6 ~ "[392 - 478)", 
                                               quintile_dep == 7 ~ "[478 - 928)")))) 


england_ext = full_join(england, ageing, by = c("lad17cd" = "area_code", "lad17nm" ="area_name") )

england_ext$quintile_dep = as.factor(england_ext$quintile_dep)
## Observations: 1,630
## Variables: 15
## $ objectid     <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, …
## $ lad17cd      <chr> "E06000001", "E06000001", "E06000001", "E06000001",…
## $ lad17nm      <chr> "Hartlepool", "Hartlepool", "Hartlepool", "Hartlepo…
## $ lad17nmw     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ bng_e        <dbl> 447157, 447157, 447157, 447157, 447157, 451141, 451…
## $ bng_n        <dbl> 531476, 531476, 531476, 531476, 531476, 516887, 516…
## $ long         <dbl> -1.27023, -1.27023, -1.27023, -1.27023, -1.27023, -…
## $ lat          <dbl> 54.67616, 54.67616, 54.67616, 54.67616, 54.67616, 5…
## $ st_areasha   <dbl> 96586817, 96586817, 96586817, 96586817, 96586817, 5…
## $ st_lengths   <dbl> 50245.93, 50245.93, 50245.93, 50245.93, 50245.93, 3…
## $ year         <chr> "1996", "2006", "2016", "2026", "2036", "1996", "20…
## $ dependency   <dbl> 244.1609, 261.2594, 303.0965, 384.3626, 479.5169, 2…
## $ quintile_dep <fct> 2, 3, 4, 5, 7, 2, 2, 2, 4, 5, 3, 3, 5, 6, 7, 2, 2, …
## $ lab_dep      <ord> [218 - 250), [250 - 283), [283 - 329), [329 - 392),…
## $ geometry     <MULTIPOLYGON [m]> MULTIPOLYGON (((448801.4 53..., MULTIP…

Plot the data

Let´s begin plotting the OADR for 2016. sf::geom_sf() seems to be a simpler solution to other alternatives such as ggplot:geom_polygon().

 england_ext %>%
  filter(year == "2016") %>%
  mutate(lab_dep = gsub("[[:alpha:]]", "", lab_dep)) %>%
  rename(OADR = quintile_dep) %>%
  ggplot(aes(x = long, y = lat, fill = OADR)) + 
  geom_sf(colour = alpha("grey", 1 /3), size = 0.2) +
  coord_sf( datum = NA) +
  scale_fill_viridis(option = "viridis",
                     labels = c("[81 - 218)", "[218 - 250)", "[250 - 283)", "[283 - 329)", "[329 - 392)", "[392 - 478)","[478 - 928)"),
                     alpha = 0.85, 
                     discrete = T, 
                     direction = -1)

Not too bad. Yet, what is the picture over time?.

plot_ageing = england_ext %>%
  mutate(lab_dep = gsub("[[:alpha:]]", "", lab_dep)) %>%
  rename(OADR = quintile_dep) %>%
  ggplot(., aes(x = long, y = lat, fill = OADR)) + 
  geom_sf(colour = alpha("grey", 1 /3), size = 0.2) +
  coord_sf( datum = NA) +
  scale_fill_viridis(option = "viridis",
                     labels = c("[81 - 218)", "[218 - 250)", "[250 - 283)", "[283 - 329)", "[329 - 392)", "[392 - 478)","[478 - 928)"),
                     alpha = 0.85, 
                     discrete = T, 
                     direction = -1, 
                     guide = guide_legend(
                       direction = "horizontal",
                       title.position = "top",
                       title.hjust =0.5)) +
  facet_wrap(~ year, ncol = 3)

The key line is the facet_wrap() call which allows to facet OADR over years. A bit of fine tunning with some options in theme() give a cleaner plot.

plot_ageing +
  theme(axis.text = element_blank()
        ,axis.title = element_blank()
        ,axis.ticks = element_blank()
        ,axis.line=element_blank()
        ,panel.grid = element_blank()
        ,legend.text = element_text(size = 10)
        ,legend.key.width = unit(0.35,"cm")
        ,legend.key.height = unit(0.35,"cm")
        ,plot.title = element_text(size= 6)
        ,legend.position = "bottom"
        ,plot.caption = element_text()
        ,legend.background = element_blank()
        ,panel.background = element_blank()
        ,legend.spacing.x = unit(0.25, 'cm')
        ,strip.text.x = element_text(size = 12)) 

Animate the plot

We can combine the former plots by creating an animated map with gganimate::animate(). Here the main difference with respect to a simple plot in ggplot() is the incorporation of transition_manual() that specifies the transition in the map.

aniplot = england_ext %>%
  mutate(lab_dep = gsub("[[:alpha:]]", "", lab_dep)) %>%
  rename(OADR = quintile_dep) %>%
  ggplot(., aes(x = long, y = lat, fill = OADR)) + 
  geom_sf(colour = alpha("grey", 1 /3), size = 0.2) +
  coord_sf( datum = NA) +
  labs(title = "Old Age Dependency Ratio (OADR)",
       subtitle = 'Year: {current_frame}') +
  transition_manual(year) + 
  scale_fill_viridis(option = "viridis",
                     labels = c("[81 - 218)", "[218 - 250)", "[250 - 283)", "[283 - 329)", "[329 - 392)", "[392 - 478)","[478 - 928)"),
                     alpha = 0.85, 
                     discrete = T, 
                     direction = -1, 
                     guide = guide_legend(
                       direction = "horizontal",
                       title.position = "top",
                       title.hjust =0.5)) +
  theme(axis.text = element_blank()
        ,axis.title = element_blank()
        ,axis.ticks = element_blank()
        ,axis.line=element_blank()
        ,panel.grid = element_blank()
        ,legend.title = element_blank()
        ,legend.text = element_text(size = 14)
        ,legend.key.width = unit(0.35,"cm")
        ,legend.key.height = unit(0.5,"cm")
        ,plot.title = element_text(size= 22)
        ,plot.subtitle=element_text(size=18)
        ,legend.position = "bottom"
        ,plot.caption = element_text()
        ,legend.background = element_blank()
        ,panel.background = element_blank()
        ,legend.spacing.x = unit(0.25, 'cm'))

The code and data to reproduce the figures are available on Github.


  1. Data for 2026 and 2036 are projections