24 August 2012

How would the map of a country look like, if we distributed the land in proportion to the distribution of wealth in that country? A somewhat curious question, spurred by a OWS-poster that appeared in my Twitter-feed. Of course, even curious questions have a right to an answer, and I will give a walkthrough on how to answer that question with R.

For those following at home, this blog-post is generated using this R Markdown-file, which you can load into Rstudio and knit into this webpage, or use the `purl()`

function in `knitr`

to extract the R-code.

We first load some required packages. `reshape2`

and `plyr`

help with data-manipulation, `spdep`

and `ipgrah`

for manipulating spatial objects and graphs (of the edge-vertex kind) and `ggplot2`

+ `RColorBrewer`

for pretty graphs (of the barchart kind).

require(reshape2) require(plyr) require(spdep) require(igraph) require(ggplot2) require(RColorBrewer)

For starters, what do national wealth distributions look like? The Credit Suisse World Wealth Rapport 2011 (Shorrocks & Davies, p. 14) contains a good selection of wealth distributions, from which we select six countries (plus Belgium, cf. Rademaekers & Vuchelen, 1999) that have data for four easily presentable points in the wealth distribution, i.e.

- the share of wealth owned by the
**bottom 50%**of the population, - the share of the
**mid 40%**, - the share of the
**top 9%**and - the share of the
**elite 1%**.

We combine these datapoints into a dataframe:

wealth.dist <- structure( list( CA = c(0.054, 0.442, 0.349, 0.155), # 2005, family FR = c(0.040, 0.440, 0.278, 0.242), # 2010, adult IE = c(0.050, 0.390, 0.330, 0.230), # 2001, household IT = c(0.115, 0.462, 0.301, 0.122), # 2008, household GB = c(0.092, 0.465, 0.318, 0.125), # 2008, household US = c(0.025, 0.265, 0.372, 0.338), # 2007, family BE = c(0.135, 0.373, 0.300, 0.197)), # 1994, family .Names = c("CA", "FR", "IE", "IT", "GB", "US", "BE"), row.names = c("Bottom 50%", "Mid 40%", "Top 9%", "Elite 1%"), class = "data.frame") wealth.dist <- wealth.dist[,order(wealth.dist[4,])] # order by share 1% t(wealth.dist*100) # print table

## Bottom 50% Mid 40% Top 9% Elite 1% ## IT 11.5 46.2 30.1 12.2 ## GB 9.2 46.5 31.8 12.5 ## CA 5.4 44.2 34.9 15.5 ## BE 13.5 37.3 30.0 19.7 ## IE 5.0 39.0 33.0 23.0 ## FR 4.0 44.0 27.8 24.2 ## US 2.5 26.5 37.2 33.8

A simple, and probably most effective, visualisation are stacked bar charts:

wealth.dist$EQ <- c(0.5, 0.4, 0.09, 0.01) # add perfectly equal country # restack data (one rown per combination) d <- melt(t(wealth.dist)) names(d) <- c("country", "group", "value") d$group <- factor(d$group, levels = row.names(wealth.dist)) d$country <- factor(d$country, levels = names(wealth.dist)) # plot distributions in stacked barchart q <- ggplot(d, aes(country, value * 100, fill = group)) + geom_bar(stat = "identity") q <- q + scale_fill_brewer( palette = "RdYlBu", guide = guide_legend(reverse = TRUE, title = "Groups")) q <- q + opts(title = "National wealth ownership") q + scale_x_discrete(name = "Country") + scale_y_continuous(name = "Percentage of wealth owned")

You appear to have a 50/50 rule: 50% of the wealth is owned by the top 10%, the other half of the wealth is owned by the bottom 90%. Within the top 10%, wealth ownership is concentrated in the top 1%, ranging from 12% in Italy to 34% in the US. On the bottom end of the wealth distribution we see that the poorest half of the population has a disproportionally small share of the wealth. This is abundantly clear if the national distributions are contrasted with the wealth distribution of a hypothetical country (“EQ”) with a perfectly equal wealth distribution.

For creating our maps, we can obtain per-country spatial R-objects, directly from the GADM Database. Conveniently, these objects contain the surface area per sub-region. We need to calculate the cumulative share of the total country surface, for each of these regions (polygons). Subsequently, we aggreagate the areas in such a way, that the cumulative area distribution reflects the national wealth distribution.

One tricky bit is that we want continous regions, as it is hard to judge the total area of discountinous, irregular shapes. We are helped by the fact that the polygons/areas constituting a country form a graph of neighbours. E.g. this graph links together neighbouring *arrondissements* in Belgium:

load(url('http://www.filefactory.com/file/778ibc72fct3/n/BEL_adm3_RData')) g <- graph.adjlist(poly2nb(gadm)) # convert polygons to graph set.seed(2222) # fix seed for nice graph layout plot(g, # plot graph vertex.label=gadm$NAME_3, vertex.size=10, edge.curved=F, edge.arrow.size=0)

We can use this underlying graph structure of the country to group adjecent areas together. If we preform a breadth-first search on the graph of areas, we traverse all the areas going from neighbour to neighbour without visting an area twice. We then can use the resulting sequence of neighbours to order the areas into continous blocks. This operation might seem complex, but it is a oneliner in R:

# order object x based on the sequence obtained by a BFS of graph g x[graph.bfs(g, root=i)$order,]

Putting all this together, we have a function `cut.area()`

that takes a spatial country-object `x`

, a series of breaks and labels representing the distribution you want to represent and the cardinal direction that we start from when “dividing” the country.

cut.area <- function(x, breaks, labels=NULL, start='E') { # extract the area-value of the individual polygons # and assign to the dataframe x$area <- laply(polygons(x)@polygons, .fun=slot, 'area') x$area.pct <- x$area/sum(x$area) # select most extreme N/S/W/E polygon to start aggregation start_id <- switch(start, E = which.max(coordinates(x)[,1]), W = which.min(coordinates(x)[,1]), N = which.max(coordinates(x)[,2]), S = which.min(coordinates(x)[,2])) # convert the polygons to a neigbours adjacency list nb <- poly2nb(x) # if there are unconnected polygons (e.g. islands), link them as # neighbour to an area with a similar id. if (any(card(nb) == 0)) { link.region <- function(nb, id) { # TODO id = 1! nb[id][[1]] <- as.integer(id-1) nb[id-1][[1]] <- c(nb[id-1][[1]], id) nb } unatt.poly <- which(card(nb) == 0) for (id in unatt.poly) { nb <- link.region(nb, id) } } # convert the neighbours adjecency list to a graph g <- graph.adjlist(nb) # order the polygons so that neighbours are next to eachother, # using breadth-first search on the polygon graph x <- x[graph.bfs(g, root=start_id)$order,] # calculate cumulative percentage of area per polygon x$area.cpct <- cumsum(x$area.pct) # finally, create a variable 'cut' holding the categorical area-cuts x$cut <- cut(x$area.cpct, breaks, labels=labels) x }

We can now apply `cut.area()`

to a country of choice, with a wealth distribution of choice. Lets apply it to the seven countries we listed above, and visualise the results.

load(url('http://www.filefactory.com/file/778ibc72fct3/n/BEL_adm3_RData')) be_breaks <- c(0,cumsum(wealth.dist$BE)) # 0-0.13-0.50-0.80-1 be <- cut.area(gadm, breaks=be_breaks, labels=row.names(wealth.dist), start='S') area_colors <- brewer.pal(length(levels(be$cut)), 'RdYlBu') spplot(be, 'cut', col.regions=area_colors, main="Area of Belgium in proportion to national household wealth distribution")

load(url("http://www.filefactory.com/file/2dc8q1bgqmgn/n/ITA_adm2_RData")) spplot( cut.area( gadm, breaks=c(0, cumsum(wealth.dist$IT)), labels=row.names(wealth.dist), start='N'), 'cut', col.regions=area_colors, main = "Area of Italy in proportion to national household wealth distribution")

load(url("http://www.filefactory.com/file/30q38feo9f9z/n/GBR_adm2_RData")) spplot( cut.area( gadm, breaks=c(0, cumsum(wealth.dist$GB)), labels=row.names(wealth.dist), start='S'), 'cut', col.regions=area_colors, main = "Area of GB in proportion to national household wealth distribution")

load(url("http://www.filefactory.com/file/3kejp8u67dd3/n/IRL_adm1_RData")) spplot( cut.area( gadm, breaks=c(0, cumsum(wealth.dist$IE)), labels=row.names(wealth.dist), start='N'), 'cut', col.regions=area_colors, main = "Area of Ireland in proportion to national household wealth distribution")

Unfortunatly, the areas are not fine-grained enough to assign the last 5% that is owned by the bottom 50% of the Irish populace.

load(url("http://www.filefactory.com/file/65bjutk3usdr/n/FRA_adm3_RData")) spplot( cut.area( gadm, breaks=c(0, cumsum(wealth.dist$FR)), labels=row.names(wealth.dist), start='N'), 'cut', col.regions=area_colors, main = "Area of France in proportion to national household wealth distribution")

load(url("http://www.filefactory.com/file/1tzd1kse1ezf/n/CAN_adm2_RData")) spplot( cut.area( gadm, breaks=c(0, cumsum(wealth.dist$CA)), labels=row.names(wealth.dist), start='E'), 'cut', col.regions=area_colors, main = "Area of Canada in proportion to national household wealth distribution")

load(url('http://www.filefactory.com/file/6mfjlkvowe2f/n/USA_adm1_RData')) # select only mainland USA coords <- coordinates(gadm) s <- (abs(coords[,1]) < 140) & (coords[,2] < 60 & coords[,2] > 23) usa <- gadm[s,] spplot( cut.area( usa, breaks=c(0, cumsum(wealth.dist$US)), labels=row.names(wealth.dist), start='E'), 'cut', col.regions=area_colors, main = "Area of the USA in proportion to national household wealth distribution")

Shorrocks, A. & Davies, J. (2011). Global Wealth Databook 2011. Credit Suisse Research Institute.

Rademaekers, & Vuchelen, J. (1999). De verdeling van het Belgisch gezinsvermogen. Extraits des Cahiers Economiques de Bruxelles, nr. 164.