What if we divided our countries as we do our wealth? Using R to visualise wealth distributions

24 August 2012

How would the map of a country look like, if we distributed the land in proportion to the distribution of wealth in that country? A somewhat curious question, spurred by a OWS-poster that appeared in my Twitter-feed. Of course, even curious questions have a right to an answer, and I will give a walkthrough on how to answer that question with R.

For those following at home, this blog-post is generated using this R Markdown-file, which you can load into Rstudio and knit into this webpage, or use the purl() function in knitr to extract the R-code.

We first load some required packages. reshape2 and plyr help with data-manipulation, spdep and ipgrah for manipulating spatial objects and graphs (of the edge-vertex kind) and ggplot2 + RColorBrewer for pretty graphs (of the barchart kind).

require(reshape2)
require(plyr)
require(spdep)
require(igraph)
require(ggplot2)
require(RColorBrewer)

National wealth distributions

For starters, what do national wealth distributions look like? The Credit Suisse World Wealth Rapport 2011 (Shorrocks & Davies, p. 14) contains a good selection of wealth distributions, from which we select six countries (plus Belgium, cf. Rademaekers & Vuchelen, 1999) that have data for four easily presentable points in the wealth distribution, i.e.

We combine these datapoints into a dataframe:

wealth.dist <- structure(
  list(
    CA = c(0.054, 0.442, 0.349, 0.155),  # 2005, family
    FR = c(0.040, 0.440, 0.278, 0.242),  # 2010, adult
    IE = c(0.050, 0.390, 0.330, 0.230),  # 2001, household
    IT = c(0.115, 0.462, 0.301, 0.122),  # 2008, household
    GB = c(0.092, 0.465, 0.318, 0.125),  # 2008, household
    US = c(0.025, 0.265, 0.372, 0.338),  # 2007, family
    BE = c(0.135, 0.373, 0.300, 0.197)), # 1994, family   
  .Names = c("CA", "FR", "IE", "IT", "GB", "US", "BE"), 
  row.names = c("Bottom 50%", "Mid 40%", "Top 9%", "Elite 1%"), 
  class = "data.frame")

wealth.dist <- wealth.dist[,order(wealth.dist[4,])] # order by share 1%

t(wealth.dist*100) # print table
##    Bottom 50% Mid 40% Top 9% Elite 1%
## IT       11.5    46.2   30.1     12.2
## GB        9.2    46.5   31.8     12.5
## CA        5.4    44.2   34.9     15.5
## BE       13.5    37.3   30.0     19.7
## IE        5.0    39.0   33.0     23.0
## FR        4.0    44.0   27.8     24.2
## US        2.5    26.5   37.2     33.8

A simple, and probably most effective, visualisation are stacked bar charts:

wealth.dist$EQ <- c(0.5, 0.4, 0.09, 0.01)  # add perfectly equal country

# restack data (one rown per combination)
d <- melt(t(wealth.dist))
names(d) <- c("country", "group", "value")
d$group <- factor(d$group, levels = row.names(wealth.dist))
d$country <- factor(d$country, levels = names(wealth.dist))

# plot distributions in stacked barchart
q <- ggplot(d, aes(country, value * 100, fill = group)) + 
    geom_bar(stat = "identity")
q <- q + scale_fill_brewer(
    palette = "RdYlBu", 
    guide = guide_legend(reverse = TRUE, title = "Groups"))
q <- q + opts(title = "National wealth ownership")
q + scale_x_discrete(name = "Country") + 
    scale_y_continuous(name = "Percentage of wealth owned")

plot of chunk plot-wealth-bars

You appear to have a 50/50 rule: 50% of the wealth is owned by the top 10%, the other half of the wealth is owned by the bottom 90%. Within the top 10%, wealth ownership is concentrated in the top 1%, ranging from 12% in Italy to 34% in the US. On the bottom end of the wealth distribution we see that the poorest half of the population has a disproportionally small share of the wealth. This is abundantly clear if the national distributions are contrasted with the wealth distribution of a hypothetical country (“EQ”) with a perfectly equal wealth distribution.

Distributing country areas

For creating our maps, we can obtain per-country spatial R-objects, directly from the GADM Database. Conveniently, these objects contain the surface area per sub-region. We need to calculate the cumulative share of the total country surface, for each of these regions (polygons). Subsequently, we aggreagate the areas in such a way, that the cumulative area distribution reflects the national wealth distribution.

One tricky bit is that we want continous regions, as it is hard to judge the total area of discountinous, irregular shapes. We are helped by the fact that the polygons/areas constituting a country form a graph of neighbours. E.g. this graph links together neighbouring arrondissements in Belgium:

load(url('http://www.filefactory.com/file/778ibc72fct3/n/BEL_adm3_RData'))
g <- graph.adjlist(poly2nb(gadm)) # convert polygons to graph
set.seed(2222) # fix seed for nice graph layout
plot(g, # plot graph 
     vertex.label=gadm$NAME_3, 
     vertex.size=10, 
     edge.curved=F, 
     edge.arrow.size=0)

plot of chunk plot-arrond-graph

We can use this underlying graph structure of the country to group adjecent areas together. If we preform a breadth-first search on the graph of areas, we traverse all the areas going from neighbour to neighbour without visting an area twice. We then can use the resulting sequence of neighbours to order the areas into continous blocks. This operation might seem complex, but it is a oneliner in R:

# order object x based on the sequence obtained by a BFS of graph g
x[graph.bfs(g, root=i)$order,]

Putting all this together, we have a function cut.area() that takes a spatial country-object x, a series of breaks and labels representing the distribution you want to represent and the cardinal direction that we start from when “dividing” the country.

cut.area <- function(x, breaks, labels=NULL, start='E') {

  # extract the area-value of the individual polygons
  # and assign to the dataframe
  x$area <- laply(polygons(x)@polygons, .fun=slot, 'area')
  x$area.pct <- x$area/sum(x$area)

  # select most extreme N/S/W/E polygon to start aggregation
  start_id <- switch(start,
    E = which.max(coordinates(x)[,1]),
    W = which.min(coordinates(x)[,1]),
    N = which.max(coordinates(x)[,2]), 
    S = which.min(coordinates(x)[,2]))

  # convert the polygons to a neigbours adjacency list
  nb <- poly2nb(x)

  # if there are unconnected polygons (e.g. islands), link them as 
  # neighbour to an area with a similar id.
  if (any(card(nb) == 0)) { 
    link.region <- function(nb, id) {
      # TODO id = 1!
      nb[id][[1]] <- as.integer(id-1)
      nb[id-1][[1]] <- c(nb[id-1][[1]], id)
      nb
    }

    unatt.poly <- which(card(nb) == 0) 
    for (id in unatt.poly) {
      nb <- link.region(nb, id)
    }
  }

  # convert the neighbours adjecency list to a graph
  g <- graph.adjlist(nb)

  # order the polygons so that neighbours are next to eachother,
  # using breadth-first search on the polygon graph
  x <- x[graph.bfs(g, root=start_id)$order,]

  # calculate cumulative percentage of area per polygon
  x$area.cpct <- cumsum(x$area.pct)

  # finally, create a variable 'cut' holding the categorical area-cuts
  x$cut <- cut(x$area.cpct, breaks, labels=labels)

  x
}

Generating maps

We can now apply cut.area() to a country of choice, with a wealth distribution of choice. Lets apply it to the seven countries we listed above, and visualise the results.

load(url('http://www.filefactory.com/file/778ibc72fct3/n/BEL_adm3_RData'))
be_breaks <- c(0,cumsum(wealth.dist$BE)) # 0-0.13-0.50-0.80-1
be <- cut.area(gadm, breaks=be_breaks, labels=row.names(wealth.dist), start='S')
area_colors <- brewer.pal(length(levels(be$cut)), 'RdYlBu')
spplot(be, 'cut', col.regions=area_colors,
  main="Area of Belgium in proportion to national household wealth distribution")

plot of chunk plot-BE

load(url("http://www.filefactory.com/file/2dc8q1bgqmgn/n/ITA_adm2_RData"))
spplot(
  cut.area(
    gadm, breaks=c(0, cumsum(wealth.dist$IT)), 
    labels=row.names(wealth.dist), start='N'),
  'cut', col.regions=area_colors,
  main = "Area of Italy in proportion to national household wealth distribution")

plot of chunk plot-IT

load(url("http://www.filefactory.com/file/30q38feo9f9z/n/GBR_adm2_RData"))
spplot(
  cut.area(
    gadm, breaks=c(0, cumsum(wealth.dist$GB)), 
    labels=row.names(wealth.dist), start='S'),
  'cut', col.regions=area_colors,
  main = "Area of GB in proportion to national household wealth distribution")

plot of chunk plot-GB

load(url("http://www.filefactory.com/file/3kejp8u67dd3/n/IRL_adm1_RData"))
spplot(
  cut.area(
    gadm, breaks=c(0, cumsum(wealth.dist$IE)), 
    labels=row.names(wealth.dist), start='N'),
  'cut', col.regions=area_colors,
  main = "Area of Ireland in proportion to national household wealth distribution")

plot of chunk plot-IE

Unfortunatly, the areas are not fine-grained enough to assign the last 5% that is owned by the bottom 50% of the Irish populace.

load(url("http://www.filefactory.com/file/65bjutk3usdr/n/FRA_adm3_RData"))
spplot(
  cut.area(
    gadm, breaks=c(0, cumsum(wealth.dist$FR)), 
    labels=row.names(wealth.dist), start='N'),
  'cut', col.regions=area_colors,
  main = "Area of France in proportion to national household wealth distribution")

plot of chunk plot-FR

load(url("http://www.filefactory.com/file/1tzd1kse1ezf/n/CAN_adm2_RData"))
spplot(
  cut.area(
    gadm, breaks=c(0, cumsum(wealth.dist$CA)), 
    labels=row.names(wealth.dist), start='E'),
  'cut', col.regions=area_colors,
  main = "Area of Canada in proportion to national household wealth distribution")

plot of chunk plot-CA

load(url('http://www.filefactory.com/file/6mfjlkvowe2f/n/USA_adm1_RData'))

# select only mainland USA
coords <- coordinates(gadm)                                            
s <- (abs(coords[,1]) < 140) & (coords[,2] < 60 & coords[,2] > 23)
usa <- gadm[s,]

spplot(
  cut.area(
    usa, breaks=c(0, cumsum(wealth.dist$US)), 
    labels=row.names(wealth.dist), start='E'),
  'cut', col.regions=area_colors,
  main = "Area of the USA in proportion to national household wealth distribution")

plot of chunk plot-US

References

Shorrocks, A. & Davies, J. (2011). Global Wealth Databook 2011. Credit Suisse Research Institute.

Rademaekers, & Vuchelen, J. (1999). De verdeling van het Belgisch gezinsvermogen. Extraits des Cahiers Economiques de Bruxelles, nr. 164.

This entry was tagged as visualisation inequality R