# Mapping wealth with R: national maps as represenations of wealth distributions
How would the map of a country look like, if we distributed the land in proportion to the distribution of wealth in that country? A somewhat curious question, spurred by a [OWS-poster](http://twitter.com/PascalGoethals/status/225696415723159552/photo/1) that appeared in my Twitter-feed. Of course, even curious questions have a right to an answer, and I will give a walkthrough own how to answer that question with R.
For those following at home, this blog-post is generated using [this](../assets/postdata/wealth-area/wealth-dist-area.Rmd) R Markdown-file, which you can load into Rstudio and [knit](http://yihui.name/knitr/) into this webpage, or use the `purl()` function in `knitr` to extract the [R-code](../assets/postdata/wealth-area/wealth-dist-area.R).
We first load some required packages. `reshape2` and `plyr` help with data-manipulation, `spdep` and `ipgrah` for manipulating spatial objects and graphs (of the edge-vertex kind) and `ggplot2` + `RColorBrewer` for pretty graphs (of the barchart kind).
```{r, message=FALSE}
library(reshape2)
require(plyr)
require(spdep)
require(igraph)
library(ggplot2)
require(RColorBrewer)
```
## National wealth distributions
For starters, how do national wealth distribution look like? The Credit Suisse World Wealth Rapport 2011 (Shorrocks & Davies, p. 14) contains a good selection of wealth distributions, from which we select six countries (plus Belgium, cf. Rademaekers & Vuchelen, 1999) that have data for four easily presentable points in the wealth distribution: the share of wealth owned by the bottom 50% of the population, the share of the mid 40%, the share of the top 9% and the share of the elite 1%.
We combine these datapoints into a dataframe:
```{r}
wealth.dist <- structure(
list(
CA = c(0.054, 0.442, 0.349, 0.155), # 2005, family
FR = c(0.040, 0.440, 0.278, 0.242), # 2010, adult
IE = c(0.050, 0.390, 0.330, 0.230), # 2001, household
IT = c(0.115, 0.462, 0.301, 0.122), # 2008, household
GB = c(0.092, 0.465, 0.318, 0.125), # 2008, household
US = c(0.025, 0.265, 0.372, 0.338), # 2007, family
BE = c(0.135, 0.373, 0.300, 0.197)), # 1994, family
.Names = c("CA", "FR", "IE", "IT", "GB", "US", "BE"),
row.names = c("Bottom 50%", "Mid 40%", "Top 9%", "Elite 1%"),
class = "data.frame")
wealth.dist <- wealth.dist[,order(wealth.dist[4,])] # order by share 1%
t(wealth.dist*100) # print table
```
A simple, and probably most effective, visualisation are stacked bar charts:
```{r}
wealth.dist$EQ <- c(0.50, 0.40, 0.09, 0.01) # perfectly equal
d <- melt(t(wealth.dist))
names(d) <- c('country', 'group', 'value')
d$group <- factor(d$group, levels=row.names(wealth.dist))
d$country <- factor(d$country, levels=names(wealth.dist))
q <- ggplot(d, aes(country, value*100, fill=group)) + geom_bar(stat='identity')
q <- q + scale_fill_brewer(palette="RdYlBu", guide = guide_legend(reverse=TRUE, title="Groups"))
q <- q + opts(title='National wealth ownership')
q + scale_x_discrete(name='Country') + scale_y_continuous(name='Percentage of wealth owned')
```
You appear to have a 50/50 rule: 50% of the wealth is owned by the top 10%, the other half of the wealth is owned by the bottom 90%. Within the top 10%, wealth ownership is concentrated in the top 1%, ranging from 12% in Italy to 34% in the US. On the bottom end of the wealth distribution we see that the poorest half of the population has a disproportionally small share of the wealth. This is abundantly clear if the national distributions are contrasted with the wealth distribution of a hypothetical country ("EQ") with a perfectly equal wealth distribution.
## Distributing country areas
For creating our maps, we can obtain per-country spatial R-objects, directly from the [GADM Database](http://www.gadm.org). Conveniently, these objects contain the surface area per sub-region. We need to calculate the cumulative share of the total country surface, for each of these regions (polygons). Subsequently, we aggreagate the areas in such a way, that the cumulative area distribution reflects the national wealth distribution.
One tricky bit is that we want continous regions, as it is hard to judge the total area of discountinous, irregular shapes. We are helped by the fact that the polygons/areas constituting a country form a graph of neighbours. E.g. this graph links together neighbouring *arrondissements* in Belgium:
```{r}
load(url('http://www.filefactory.com/file/778ibc72fct3/n/BEL_adm3_RData'))
g <- graph.adjlist(poly2nb(gadm)) # convert polygons to graph
set.seed(2222) # fix seed for nice graph layout
plot(g, # plot graph
vertex.label=gadm$NAME_3,
vertex.size=10,
edge.curved=F,
edge.arrow.size=0)
```
We can use this underlying graph structure of the country to group adjecent areas together. If we preform a [breadth-first search](http://en.wikipedia.org/wiki/Breadth-first_search) on the graph of areas, we traverse all the areas going from neighbour to neighbour without visting an area twice. We then can use the resulting sequence of neighbours to order the areas into continous blocks. This operation might seem complex, but it is a oneliner in R:
# order object x based on the sequence obtained by a BFS of g
x[graph.bfs(g, root=i)$order,]
Putting all this together, we have a function `cut.area()` that takes a spatial country-object `x`, a series of breaks and labels representing the distribution you want to represent and the cardinal direction that we start from when "dividing" the country.
```{r, tidy=FALSE}
cut.area <- function(x, breaks, labels=NULL, start='E') {
# extract the area-value of the individual polygons
# and assign to the dataframe
x$area <- laply(polygons(x)@polygons, .fun=slot, 'area')
x$area.pct <- x$area/sum(x$area)
# select most extreme N/S/W/E polygon to start aggregation
start_id <- switch(start,
E = which.max(coordinates(x)[,1]),
W = which.min(coordinates(x)[,1]),
N = which.max(coordinates(x)[,2]),
S = which.min(coordinates(x)[,2]))
# convert the polygons to a neigbours adjacency list
nb <- poly2nb(x)
# if there are unconnected polygons (e.g. islands), link them as
# neighbour to an area with a similar id.
if (any(card(nb) == 0)) {
link.region <- function(nb, id) {
# TODO id = 1!
nb[id][[1]] <- as.integer(id-1)
nb[id-1][[1]] <- c(nb[id-1][[1]], id)
nb
}
unatt.poly <- which(card(nb) == 0)
for (id in unatt.poly) {
nb <- link.region(nb, id)
}
}
# convert the neighbours adjecency list to a graph
g <- graph.adjlist(nb)
# order the polygons so that neighbours are next to eachother,
# using breadth-first search on the polygon graph
x <- x[graph.bfs(g, root=start_id)$order,]
# calculate cumulative percentage of area per polygon
x$area.cpct <- cumsum(x$area.pct)
# finally, create a variable 'cut' holding the categorical area-cuts
x$cut <- cut(x$area.cpct, breaks, labels=labels)
x
}
```
## Generating maps
We can now apply `cut.area()` to a country of choice, with a wealth distribution of choice. Lets apply it to the seven countries we listed above, and visualise the results.
```{r, cache=TRUE}
load(url('http://www.filefactory.com/file/778ibc72fct3/n/BEL_adm3_RData'))
be_breaks <- c(0,cumsum(wealth.dist$BE)) # 0-0.13-0.50-0.80-1
be <- cut.area(gadm, breaks=be_breaks, labels=row.names(wealth.dist), start='S')
area_colors <- brewer.pal(length(levels(be$cut)), 'RdYlBu')
spplot(be, 'cut', col.regions=area_colors,
main="Area of Belgium in proportion to national household wealth distribution")
```
```{r, cache=TRUE}
load(url("http://www.filefactory.com/file/2dc8q1bgqmgn/n/ITA_adm2_RData"))
spplot(
cut.area(
gadm, breaks=c(0, cumsum(wealth.dist$IT)),
labels=row.names(wealth.dist), start='N'),
'cut', col.regions=area_colors,
main = "Area of Italy in proportion to national household wealth distribution")
```
```{r, cache=TRUE}
load(url("http://www.filefactory.com/file/30q38feo9f9z/n/GBR_adm2_RData"))
spplot(
cut.area(
gadm, breaks=c(0, cumsum(wealth.dist$GB)),
labels=row.names(wealth.dist), start='S'),
'cut', col.regions=area_colors,
main = "Area of GB in proportion to national household wealth distribution")
```
```{r, cache=TRUE}
load(url("http://www.filefactory.com/file/3kejp8u67dd3/n/IRL_adm1_RData"))
spplot(
cut.area(
gadm, breaks=c(0, cumsum(wealth.dist$IE)),
labels=row.names(wealth.dist), start='N'),
'cut', col.regions=area_colors,
main = "Area of Ireland in proportion to national household wealth distribution")
```
Unfortunatly, the areas are not fine-grained enough to assign the last 5% that is owned by the bottom 50% of the Irish populace.
```{r, cache=TRUE}
load(url("http://www.filefactory.com/file/65bjutk3usdr/n/FRA_adm3_RData"))
spplot(
cut.area(
gadm, breaks=c(0, cumsum(wealth.dist$FR)),
labels=row.names(wealth.dist), start='N'),
'cut', col.regions=area_colors,
main = "Area of France in proportion to national household wealth distribution")
```
```{r, cache=TRUE}
load(url("http://www.filefactory.com/file/1tzd1kse1ezf/n/CAN_adm2_RData"))
spplot(
cut.area(
gadm, breaks=c(0, cumsum(wealth.dist$CA)),
labels=row.names(wealth.dist), start='E'),
'cut', col.regions=area_colors,
main = "Area of Canada in proportion to national household wealth distribution")
```
```{r}
load(url('http://www.filefactory.com/file/6mfjlkvowe2f/n/USA_adm1_RData'))
# select only mainland USA
coords <- coordinates(gadm)
s <- (abs(coords[,1]) < 140) & (coords[,2] < 60 & coords[,2] > 23)
usa <- gadm[s,]
spplot(
cut.area(
usa, breaks=c(0, cumsum(wealth.dist$US)),
labels=row.names(wealth.dist), start='E'),
'cut', col.regions=area_colors,
main = "Area of the USA in proportion to national household wealth distribution")
```
## References
Shorrocks, A. & Davies, J. (2011). Global Wealth Databook 2011. Credit Suisse Research Institute.
Rademaekers, & Vuchelen, J. (1999). De verdeling van het Belgisch gezinsvermogen. Extraits des Cahiers Economiques de Bruxelles, nr. 164.