The mapping file for Belgian geographic codes (kudos to Steven Groenez), contains the required information to to:

With the mapping file and the free and open-source software package R, it takes ~5 lines of code to read in data, translate for example a postal code to a NIS region code, and save the data for further analyses.

In this walkthrough we will demonstrate and explain:

  1. How to use the mapping file to translate geographical codes:
    1. Read in data and the mapping file (CSV-files);
    2. Use the mapping file to translate between different Belgian geographical codes and levels;
    3. Export the data file with the translated codes to CSV, SPSS, SAS or Stata formats.
  2. How to use postal codes translated to NIS-codes on different levels geographical levels to create thematic maps.

Using the mapping file

Full code

The code below loads the mapping file and an example dataset with counts of respondents per postal code. It then maps the postal codes to NIS-codes on municipal level, district (“arrondissement”) level, and region level.

You should be able to copy-paste this code in R, run the example, and obtain a new CSV-file with the suffix ‘mapped’, containing the data from the original file and the three new columns derived from the postcal codes. In step one to four we will walk through, extend, and explain this code-snippet.

The first time you run the code below, you need to install the add-on library “readr” by running the line below.

install.packages("readr")
# Full example: using the mapping code
# ------------------------------------

# import library (install first if needed) 
library(readr)

# change working directory to location of mapping & data files
setwd('C:/Users/u0062125/Documents/work/HIVA/projects/varia/mapping_postcodes')

# load mapping and data files
mapping = read_csv('20170224_mapping_municipalities.csv', col_types = 'ccccccccccccc')
dat = read_csv('example_data_postcode.csv', col_types = 'ci')

# map required colums (postal code to NIS on muncip., district & regional level)
dat$municipality_nis_code = mapping$municipality_nis_code[
  match(dat$postcode, mapping$postcode)]
dat$arrond_nis_code = mapping$arrondissement_nis_code[
  match(dat$postcode, mapping$postcode)]
dat$region_label_nl = mapping$region_label_nl[
  match(dat$postcode, mapping$postcode)]

# export enriched data-file
write_csv(dat, 'example_data_respondents_postcode_mapped.csv')

Step 1: preparation

Get what you need:

  • If needed, install (download) and RStudio (download);
  • Download the mapping file (‘20170224_mapping_municipalities.csv’);
  • Download the example data we will be using (‘example_data_postcode.csv’).

After starting R, use setwd() to change to location to where your placed the downloaded files.

# Set the working directory where R will read & write files in the next steps
#   Note the forward slashes!
setwd('C:/Users/u0062125/Documents/work/HIVA/projects/varia/mapping_postcodes')

To load the CSV-files, we will use the R add-on library “readr”, which provides userfriendly ways to read text-delimited data such as CSV, TSV, etc. (more info).

The first time you use a library, you need to install it using install.package("libary name"). After that, when you use your script, you don’t run the install-command, but do always need to load the required libraries using library("library name").

install.packages('readr') # run this line only the first time to install add-on library
library(readr) # load add-on library every time you run the script

With the “readr” library loaded, we use read_csv() to read in the CSV mapping file with the NIS & NUTS mappings for different geographical levels in Belgium.

mapping = read_csv(
  file = '20170224_mapping_municipalities.csv', 
  col_types = 'ccccccccccccc') # read in each of the 13 columns as character ("c")

With the mapping datafile succesfully loaded, proceed to step 2.

Step 2: load example data

We read-in the downloaded example data using read_csv().

dat = read_csv(
  file = 'example_data_postcode.csv', 
  col_types = 'ci') # read first column as character, second as integer

This datafile contains 1042 records and two variables, “postcode” and “n_respondents”, summarizing the number of survey-respondents for each postal code. In total it concerns 20000 respondents.

dim(dat) # dimensions of the example dataset (rows, columns)
## [1] 1042    2
head(dat) # first 6 observations
## # A tibble: 6 × 2
##   postcode n_respondents
##      <chr>         <int>
## 1     1000            79
## 2     1020            85
## 3     1030           208
## 4     1040            44
## 5     1050            88
## 6     1060            72
sum(dat$n_respondents) # total number of respondents
## [1] 20000

We can do a quick check if all postal codes in the data, are also present in the mapping file. If this returns false, that means that there are likely invalid or malformed postal codes in the input data.

all(dat$postcode %in% mapping$postcode) # all postal codes present? => TRUE/FALSE
## [1] TRUE

Now that we are sure that every postal code can be matched, proceed to step 3.

Step 3: translate codes

We use the different columns in the mapping file to translate the postal code in the data to other codings. This can be done for each column in the mapping file, in different directions (NUTS on different levels, NIS on different level, text labels, etc.).

Also the input to the mapping does not need to be a postal code, it can be matched to any column in the mapping file. E.g. matching on the NIS-code on muncipal level, to translate it into the NIS-code for regions.

We will illustrate by mapping the postal code in the example data to

  1. the NIS code for that municipality,
  2. the NIS-code for the arrondissement that muncipality is in, and
  3. the label for the region the municipality is in.

To do so, we use each time the match() function to map observations on a specific column present in both the datafile as the mapping file:

match(datafile$column_name, mappingfile$column_name)

By specifying in front the the first square backet a column present in the mapping file, you can choose the mapped variable that will be written to the extended dataset.

mappingfile$column_name[match(datafile$column_name, mappingfile$column_name)]

# Translate postal code in example data to the NIS code for muncipality 
dat$municipality_nis_code = mapping$municipality_nis_code[
  match(dat$postcode, mapping$postcode)]
head(dat) # show first rows after translating to NIS-codes
## # A tibble: 6 × 3
##   postcode n_respondents municipality_nis_code
##      <chr>         <int>                 <chr>
## 1     1000            79                 21004
## 2     1020            85                 21004
## 3     1030           208                 21015
## 4     1040            44                 21005
## 5     1050            88                 21009
## 6     1060            72                 21013

We see that there is a new, third column “municipality_nis_code”, containing the mapped NIS-codes on municipal level, for the postal codes we used as input.

We repeat this, mapping postal codes to NIS-codes on district level and to labels on region level.

# Translate postal code in example data to the NIS code for arrondissement
dat$arrond_nis_code = mapping$arrondissement_nis_code[
  match(dat$postcode, mapping$postcode)]

# Translate postal code in example data to the label for region
dat$region_label_nl = mapping$region_label_nl[
  match(dat$postcode, mapping$postcode)]
head(dat) # show first six rows
## # A tibble: 6 × 5
##   postcode n_respondents municipality_nis_code arrond_nis_code
##      <chr>         <int>                 <chr>           <chr>
## 1     1000            79                 21004           21000
## 2     1020            85                 21004           21000
## 3     1030           208                 21015           21000
## 4     1040            44                 21005           21000
## 5     1050            88                 21009           21000
## 6     1060            72                 21013           21000
## # ... with 1 more variables: region_label_nl <chr>

We finally have three news columns, containing the requested mapped values for each row/record, i.e. “municipality_nis_code”, “arrond_nis_code” and “region_label_nl”.

To illustrate, we request a simple table of the “region_label_nl” variable, which shows that we have 22 postal codes with 1 or more respondents in Brussels, 508 in Flanders and 512 in Wallonia.

table(dat$region_label_nl)
## 
## Brussels Hoofdstedelijk Gewest                  Vlaams Gewest 
##                             22                            508 
##                   Waals Gewest 
##                            512

You can repeat the match()-step for different columns. Once happy about the resulting dataset, export it in step 4.

Step 4: export dataset

We use the write_csv() function in the “readr” library to export the expanded dataset to a filename of choice. By default it will end up in the working directory specified above.

write_csv(dat, 'example_data_respondents_postcode_mapped.csv')

Most statistical programs can read in a CSV-file. If it is really necessary to directly read in the expanded data file in a specific binary format (SAS, SPSS, Stata), you can (install and) use the haven library to do so.

library(haven) # install first if needed

write_sas(dat, 'example_data_respondents_postcode_mapped.sas7bdat')
write_sav(dat, 'example_data_respondents_postcode_mapped.sav')
write_dta(dat, 'example_data_respondents_postcode_mapped.dta')

Application: create a thematic map

Full code

Once your data is enriched with the proper NIS-codes using the mapping file, it becomes relatively straightforward to visualise data on a thematic map. This is possible for every level that is defined by the NIS-codes: muncipality, district (“arrondissement”), province and region.

The code below loads the example data and mapping file, maps postal codes to NIS-codes and visualises the number of respondents per muncipality on a thematic map. The full code below is slightly simplified, ignoring the fact that postal codes can be nested within municipal NIS-code. In step one to four we will walk through, extend, and explain this code-snippet.

The first time you run the code below, you need to install three add-on libraries by running the lines below.

install.packages('readr')
install.packages('tmap')
install.packages("BelgiumMaps.StatBel", repos = "http://www.datatailor.be/rcube", type = "source")
# Full example: create municipal map
# ----------------------------------

# load required libraries, install first if needed
library(readr)
library(tmap)
library(BelgiumMaps.StatBel)

# load geographic dataset on municipality level
data("BE_ADMIN_MUNTY")

# load mapping file & example data with postal codes
mapping = read_csv('20170224_mapping_municipalities.csv', col_types = 'ccccccccccccc')
dat = read_csv('example_data_postcode.csv', col_types = 'ci')

# Translate postal code to the NIS code for muncipality & district
dat$municipality_nis_code = mapping$municipality_nis_code[
  match(dat$postcode, mapping$postcode)]
dat$arrond_nis_code = mapping$arrondissement_nis_code[
  match(dat$postcode, mapping$postcode)]

# add number of respondents per NIS-code (municipality level)
BE_ADMIN_MUNTY@data$n_respondents = dat$n_respondents[
  match(BE_ADMIN_MUNTY@data$CD_MUNTY_REFNIS, dat$municipality_nis_code)]

# Generate a map on muncipal level
map.municip = qtm(
  BE_ADMIN_MUNTY,
  fill = 'n_respondents', 
  fill.breaks = c(0, 1, 10, 100, 150, 250, 350),
  fill.title = 'Number of respondents')

# Save the map to a PNG-file
save_tmap(
  map.municip, 
  filename = 'map_response_municipalities.png',
  width = 1920, height = 1080)

Step 1: preparation

We assume that you ran the previous steps on how to use the mapping file, which means that you should have a dataset-object called “dat” in your R-session, with NIS-codes on muncipal and district-level. Check by viewing the first rows:

head(dat)
## # A tibble: 6 × 4
##   postcode n_respondents municipality_nis_code arrond_nis_code
##      <chr>         <int>                 <chr>           <chr>
## 1     1000            79                 21004           21000
## 2     1020            85                 21004           21000
## 3     1030           208                 21015           21000
## 4     1040            44                 21005           21000
## 5     1050            88                 21009           21000
## 6     1060            72                 21013           21000

There is an issue that we skipped over in the full code example: there is not a one-on-one relationship between postal codes and NIS-codes on the municipal level. Frequently this is the case, but in larger municipalities or with historical mergers of municipalities, etc. it is possible that there are multiple postal code for one municipal NIS-code.

You can see this already it in the first records of the dat dataset above: at least two postal codes, “1000” (79 respondents) and “1020” (85 respondents) have both municipal NIS-code “21004”. So to correctly present the counts of respondents in each municipality (which is defined by a NIS-code), we need to aggregate the counts of postal codes nested in municipal NIS-codes.

To do this, we use the “dplyr”" library to create a new, aggegated dataset called dat.municip. The functionality of dplyr is very extensive, for more information a good point to start is this chapter in the (full-text online) book “R for Data Science” (Grolemund & Wickham, 2017).

library(dplyr) # load library, install first if needed
dat.municip = dat %>%
  group_by(municipality_nis_code) %>%  # group data by municip., based on NIS-code column
  tally(n_respondents) %>%             # tally respondents counts per municipality
  rename(n_respondents = n)            # rename to keep variable names consistent

This results in a dataset dat.muncip, still on municipality level, with two variables containing the NIS-code and the – properly aggregated – counts of respondents.

dat.municip
## # A tibble: 588 × 2
##    municipality_nis_code n_respondents
##                    <chr>         <int>
## 1                  11001            25
## 2                  11002           711
## 3                  11004            30
## 4                  11005            31
## 5                  11007            22
## 6                  11008            49
## 7                  11009            64
## 8                  11013            49
## 9                  11016            49
## 10                 11018            18
## # ... with 578 more rows

We still have the correct total count of 20000 respondents, and see that the respondents in municipal NIS-code “21004” are correctly aggregated. We now also have a total tally of 195 repondents for municipal NIS-code 21004.

sum(dat.municip$n_respondents)
## [1] 20000
dat.municip %>% filter(municipality_nis_code == '21004') # show aggregated count for NIS-code 21004
## # A tibble: 1 × 2
##   municipality_nis_code n_respondents
##                   <chr>         <int>
## 1                 21004           195

This new total of 195 repondents is correctly aggregated, if we look at the original dat dataset, and sum all respondents we have in the four postal codes nested in NIS-code 21004.

# show all records in the original dat dataset with NIS-code 21004
dat %>% filter(municipality_nis_code == '21004') 
## # A tibble: 4 × 4
##   postcode n_respondents municipality_nis_code arrond_nis_code
##      <chr>         <int>                 <chr>           <chr>
## 1     1000            79                 21004           21000
## 2     1020            85                 21004           21000
## 3     1120            21                 21004           21000
## 4     1130            10                 21004           21000
# tally all respondents for all records in the original dat datset with NIS-code 21004
dat %>% 
  filter(municipality_nis_code == '21004') %>% 
  tally(n_respondents) 
## # A tibble: 1 × 1
##       n
##   <int>
## 1   195

We are now pretty sure that we have the proper counts of respondents for each municipality, defined by a NIS-code. In the next step 2 we visualise these counts on a map of municipalities.

Step 2: municipal map

To generate maps , we need the “tmap” library (for plotting the map) and the “BelgiumMaps.StatBel” library. The “BelgiumMaps.StatBel” library is a tremendous time-saver, developed by Jan Wijfels on the basis of open data from the federal Belgian government and contains the required spatial datasets with boundaries to create maps for Belgium.

These spatial datasets – in our case for municipal and arrondissement/district level – are loaded using the data() function, after loading the “BelgiumMaps.StatBel” library itself.

library(tmap) # load library, install first if needed
library(BelgiumMaps.StatBel) # load library, install first if needed

data("BE_ADMIN_MUNTY") # load geographic dataset on municipality level
data("BE_ADMIN_DISTRICT") # load geographic dataset on district (arrond.) level

Other available objects are BE_ADMIN_AGGLOMERATIONS (200m sudivisions), BE_ADMIN_SECTORS (statistical sectors), BE_ADMIN_PROVINCE (provinces) and the entire region of Belgium (BE_ADMIN_BELGIUM)

These spatial datasets contain both spatial elements and data related to those elements. The data-part can be accessed and modified through the @data-modifier.

head(BE_ADMIN_MUNTY@data) # show associated data for spatial elements on municipal level
## # A tibble: 6 × 20
##   CD_MUNTY_REFNIS TX_MUNTY_DESCR_NL TX_MUNTY_DESCR_FR CD_DSTR_REFNIS
## *           <chr>             <chr>             <chr>          <int>
## 1           11001        Aartselaar        Aartselaar          11000
## 2           11002         Antwerpen            Anvers          11000
## 3           11004          Boechout          Boechout          11000
## 4           11005              Boom              Boom          11000
## 5           11007          Borsbeek          Borsbeek          11000
## 6           11008        Brasschaat        Brasschaat          11000
## # ... with 16 more variables: TX_ADM_DSTR_DESCR_NL <chr>,
## #   TX_ADM_DSTR_DESCR_FR <chr>, CD_PROV_REFNIS <int>,
## #   TX_PROV_DESCR_NL <chr>, TX_PROV_DESCR_FR <chr>, CD_RGN_REFNIS <int>,
## #   TX_RGN_DESCR_NL <chr>, TX_RGN_DESCR_FR <chr>, nuts0 <chr>,
## #   nuts1 <chr>, nuts2 <chr>, nuts3 <chr>, SURFACE.GIS.h <dbl>,
## #   SURFACE.CAD.h <dbl>, SURFACE.GIS.km2 <dbl>, SURFACE.CAD.km2 <dbl>

Out-of-the box, the associated data contains the required NIS-codes, NUTS-codes, labels in Dutch and French, some basic characteristics such as surface area, etc.

The core idea is that we add additional variables to this associated dataset, which are then used to color-in the map in the next steps. Depending on which geographic level we are dealing with, we add information to the approriate spatial dataset.

To be sure, lets’ check that every NIS-code in our data is also present in the NIS-codes in the spatial dataset:

all(dat$municipality_nis_code %in% BE_ADMIN_MUNTY@data$CD_MUNTY_REFNIS)
## [1] TRUE

This is true, so our mapping resulted in codes that are all valid, and thus are linkable to the codes in the spatial dataset.

As in the example above, we also use match() here, but this time to add the variable for number of respondents to the geographic dataset on muncipality level, while matching on the correct NIS-code for each muncipality.

# add number of respondents per NIS-code (municipality level)
BE_ADMIN_MUNTY@data$n_respondents = dat.municip$n_respondents[
  match(BE_ADMIN_MUNTY@data$CD_MUNTY_REFNIS, dat.municip$municipality_nis_code)]
# There are records with missing values, we delete those when summing
sum(BE_ADMIN_MUNTY@data$n_respondents, na.rm = TRUE)
## [1] 20000

We correctly still have information on 20000 respondents, but now contained in the associated data for the spatial dataset on muncipal level.

With the respondent data added to the spatial dataset, we use the qtm() function (“quick topic map”) from the “tmap” library. See the online documentation of this extensive library for more complex examples on how to creating both static as interactive maps.

map.muncip = qtm(
  BE_ADMIN_MUNTY,                                # use municip. spatial dataset
  fill = 'n_respondents',                        # fill with column containing respondent counts
  fill.breaks = c(0, 1, 10, 100, 150, 250, 350), # specify smaller than default cuts in scale
  fill.title = 'Number of respondents')
map.muncip

Generating this map is a one-liner with qtm(), as we have the data nicely on the correct level (municipality). Note the difference with the map on muncipal level, without the correction for nested postal codes, which resulted in visibly lower counts in the large cities.

Next to municipal level, We have already mapped the records in the example data to the NIS-code on the district level. In step 3 we also aggregate and display the data on this level.

Step 3: aggregate to district map

The same approach we used to aggerate nested postal codes within municipal NIS-codes, we now use to aggregate the respondent counts per muncipality on the district (“arrondissement”) level.

dat.district = dat %>%
  group_by(arrond_nis_code) %>%  # group data by district, based on NIS-code column
  tally(n_respondents) %>%       # tally respondents counts per district
  rename(n_respondents = n)      # rename to keep variable names consistent

This results in a two-variable dataset dat.district, with “n_respondents” indicating the number of respondents.

head(dat.district) # show first six rows
## # A tibble: 6 × 2
##   arrond_nis_code n_respondents
##             <chr>         <int>
## 1           11000          1710
## 2           12000           756
## 3           13000          1388
## 4           21000          1370
## 5           23000           928
## 6           24000           778
sum(dat.district$n_respondents)
## [1] 20000

We still have the correct total of respondents (20000), but now aggregated on a “higher” geographical level.

As we did with the munipal spatial dataset, we add the aggregated number of respondents using match().

BE_ADMIN_DISTRICT@data$n_respondents = dat.district$n_respondents[
  match(BE_ADMIN_DISTRICT@data$CD_DSTR_REFNIS, dat.district$arrond_nis_code)]

We can plot this map on arrondissement-level in the same way with qtm():

map.arrond = qtm(
  BE_ADMIN_DISTRICT, # use district spatial dataset
  fill = 'n_respondents', # fill with column containing respondent counts
  fill.breaks = c(0, 1, 100, 300, 500, 700, 900, 1000, 1200), # specify larger cuts in scale
  fill.title = 'Number of respondents')
map.arrond

Step 4: save maps

Finally, we save the two maps created by qtm() to the working directory using save_tmap().

save_tmap(map.arrond, 'map_response_arrond.png',width=1920, height=1080)
save_tmap(map.muncip, 'map_response_muncip.png',width=1920, height=1080)