Drawing maps with R. A basic tutorial

By Eugenio Petrovich

Maps are a powerful tool to visualize information. Plotting data on a map can reveal trends and patterns that are difficult to spot by inspecting a spreadsheet. Maps are also very useful to communicate information to the public in an appealing and interpretative way.

In this brief tutorial, we will learn how to generate simple geographic maps with R. In particular, we will learn how to produce the following map of the DR2 members in Europe:

Getting started

R is free and open-source software that offers many solutions for computing data and producing visualizations. A great advantage of R is that its basic functionalities can be expanded by further packages that are freely available on CRAN, the Comprehensive R Archive Network. Moreover, there is an active R community around the world that can answer most of the coding questions you may have.

The packages necessary for this tutorial can be installed with:

install.packages(c("sf", "rnaturalearth", "rnaturalearthdata",
"ggspatial", "ggrepel", "tidyverse"))

The first four packages are specifically developed for maps: sf is used to manage spatial data, rnaturalearth and rnaturalearthdata contain information about all countries of the world, as well as the information that is required to plot those countries as polygons on a map, and ggspatial improves the visualization of spatial data. ggrepel will help us in managing the labels on the map, whereas tidyverse comprises a set of R libraries that have become the standard for data manipulation and visualization.

After installing the packages, we need to load them in our workspace:

library("sf")
library("rnaturalearth")
library("rnaturalearthdata")
library("ggspatial")
library("ggrepel")
library("tidyverse")

Before creating the maps, we need to import the geographic data of the DR2 members into R. We stored them in the following data frame:

ID  City       Country      Members  Label          Lat       Lng
1   Turin      Italy        14       Turin (14)     45,07049  7,68682
2   Siena      Italy        2        Siena (2)      43,31822  11,33064
3   Pisa       Italy        1        Pisa (1)       43,70853  10,4036
4   Florence   Italy        1        Florence (1)   43,77925  11,24626
5   Barcelona  Spain        1        Barcelona (1)  41,38879  2,15899
6   Amsterdam  Netherlands  1        Amsterdam (1)  52,37403  4,88969
7   Vienna     Austria      1        Vienna (1)     48,20849  16,37208
8   Bruxelles  Belgium      1        Bruxelles (1)  50,85045  4,34878
9   Montreal   Canada       1        Montreal (1)   45,50884  -73,5878
10  Lausanne   Switzerland  1        Lausanne (1)   46,516    6,63282

As you can see, cities are the basic unit of our data frame. For each of them, we specified the country, the number of DR2 members, the label we will display on the map (the name of the city followed by the number of DR2 members in parentheses), and the latitude and longitude (you can find them here).

We import the data frame, which is stored in a CSV file, into R with the function read.csv. Since we used a header with the column names, we set the argument header to TRUE. We also need to specify that the separator between the columns is the semicolon and that the decimal separator is the comma (and not the period, because we used an Italian version of Excel to produce the file).

DR2_data <- read.csv(file="DR2_geo_data.CSV",
    header=TRUE, 
    sep=";", 
    dec = ",")

You can check the first records of the data frame by using the command head(DR2_data).
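To see what the sep and dec arguments do, here is a minimal sketch that reads two rows of the table above from a text string instead of a file. The object names (csv_text, sample_data, wrong) are only illustrative.

```r
# Two rows in the same layout as the tutorial's file:
# ";" between columns, "," as decimal mark.
csv_text <- "ID;City;Country;Members;Label;Lat;Lng
1;Turin;Italy;14;Turin (14);45,07049;7,68682
9;Montreal;Canada;1;Montreal (1);45,50884;-73,5878"

sample_data <- read.csv(text = csv_text, header = TRUE, sep = ";", dec = ",")

# With dec = "," the coordinates are parsed as numbers.
str(sample_data[, c("Lat", "Lng")])

# For comparison: with the default dec = "." they are not parsed as numbers.
wrong <- read.csv(text = csv_text, header = TRUE, sep = ";")
class(wrong$Lat)
```

If the coordinates are not numeric after the import, the decimal separator is the first thing to check. Base R also offers read.csv2, whose defaults are already sep = ";" and dec = ",".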

We are now ready to produce our map.

Creating the world map

The first step of our mapping exercise is creating a world map. To do this, we use the function ne_countries to pull country data from rnaturalearth. We specify medium as scale and sf as return class of the data frame, so that the data are already in the right format for geographic mapping.

world <- ne_countries(scale = "medium", returnclass = "sf")

We plot these data with ggplot2, the tidyverse package for visualization, and sf:

ggplot(data = world) +
       geom_sf()

The result is a map of the world:

We will use the world map as a base map on which we will highlight the countries where DR2 members are based.

To highlight the DR2 countries on the map, we now need to “add” our DR2 data to the world data frame. We do this with the function left_join.

world_joined <- left_join(world, DR2_data, by = c("name" = "Country"))

This function tells R that it has to join the DR2 data table to the world data by looking for a match on the country name (we specify the matching key between the two datasets in the by argument). When the join operation finds a match, it combines the records from the two tables. When it does not find a match, as in the case of Brazil, it sets the value of the DR2 columns (e.g., “Members”) of the non-matching records to NA, the standard code used by R for missing values. Thus, the record Brazil will have NA as the value of the column “Members”. It is important to retain all the countries in the world and not only those with DR2 members. Otherwise, when we plot our data on the map, we will lose all the countries without DR2 members! This is why we used left_join instead of a plain inner join (inner_join): we want R to retain all the records in the “left hand” dataset (i.e., the one that occupies the first argument in the function).
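The difference between the two kinds of join can be seen on a toy example. The sketch below uses two small data frames as stand-ins for world and DR2_data (the names countries and cities are only illustrative) and compares left_join and inner_join from dplyr.

```r
library(dplyr)

# Toy stand-ins for the tutorial's tables.
countries <- data.frame(name = c("Italy", "Brazil", "Spain"))
cities <- data.frame(
  Country = c("Italy", "Italy", "Spain"),
  City    = c("Turin", "Siena", "Barcelona"),
  Members = c(14, 2, 1)
)

# left_join keeps every country; Brazil gets NA in the city columns.
left <- left_join(countries, cities, by = c("name" = "Country"))

# inner_join keeps only countries that have a match, so Brazil is dropped.
inner <- inner_join(countries, cities, by = c("name" = "Country"))

# Italy matches two cities, so it appears twice in both results.
print(left)
print(inner)

# anti_join lists the rows of `cities` whose country has no match in
# `countries` (useful for spotting spelling differences before joining).
unmatched <- anti_join(cities, countries, by = c("Country" = "name"))
print(unmatched)
```

Note the side effect shown by Italy: a country with several cities is repeated once per city in the joined table. In the tutorial's data this means that Italy's polygon occurs four times in world_joined, which does not change the fill color but is worth knowing when inspecting the data.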

We now want to highlight the countries with DR2 members on the world map. To do this, we use the ifelse function in the fill argument. If the value of the column “Members” is missing (i.e., it is NA), we set the color of the country to light grey. If it is not missing, that is, if there are DR2 members in that country, we set the color to red. Note that in the first case we used the color name, whereas in the second case we used the hexadecimal color code corresponding to the color of the DR2 logo. The color argument specifies the color of the borders of the countries.
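To see the test in isolation, here is a minimal sketch on a toy vector (the values in members are made up for illustration):

```r
# Hypothetical Members values for three countries; NA means "no DR2 members".
members <- c(14, NA, 1)

# ifelse() is vectorised: it returns one color per element.
fill_colors <- ifelse(is.na(members), "lightgrey", "#c8242b")
print(fill_colors)
```

In the map code, the same expression is applied to the Members column of world_joined, so that each country polygon receives its own fill color.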

DR2_countries_map <- ggplot(data = world_joined) +
	geom_sf(fill = ifelse(is.na(world_joined$Members), "lightgrey", "#c8242b"),
		color = "black")

## Display the map
DR2_countries_map

The result is the following:

The map, however, is quite unsatisfying. Apart from Canada, the countries where DR2 members are based are too small to stand out on a world map. The world scale is thus not very effective for displaying the geographical distribution of DR2. We need to zoom in to the level of Europe. The coord_sf function of ggplot2 makes this very easy: we only need to specify the coordinates of the area we are interested in:

DR2_countries_map +
	coord_sf(xlim = c(-16.1, 32.88), 
		ylim = c(35, 60), 
		expand = TRUE)

The resulting European map is the following:
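The limits above were chosen by hand. As a possible alternative, they can be derived from the coordinates of the cities themselves. The following is only a sketch: the data frame cities holds four of the tutorial's European cities, and the padding of 5 degrees is an arbitrary choice.

```r
# A few of the tutorial's European cities (decimal degrees).
cities <- data.frame(
  City = c("Turin", "Barcelona", "Amsterdam", "Vienna"),
  Lng  = c(7.68682, 2.15899, 4.88969, 16.37208),
  Lat  = c(45.07049, 41.38879, 52.37403, 48.20849)
)

# Padding around the outermost cities, in degrees (arbitrary value).
pad <- 5

# range() returns the minimum and maximum; widen both ends by the padding.
xlim <- range(cities$Lng) + c(-pad, pad)
ylim <- range(cities$Lat) + c(-pad, pad)

print(xlim)
print(ylim)
```

The two vectors can then be passed to coord_sf(xlim = xlim, ylim = ylim) in place of the hard-coded numbers.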

Highlighting cities (point data)

We know that DR2 members are not only based in certain countries but in specific cities within those countries. Our DR2 data frame contains the DR2 cities along with their geographic coordinates. We now want to plot these cities as points on our map.

We need first to convert our data frame to an sf object:

sf_DR2_cities <- st_as_sf(DR2_data, 
	coords = c("Lng", "Lat"), 
	remove = FALSE, 
	crs = 4326, 
	agr = "constant")

Note that we had to indicate the columns in which the geographic coordinates of our cities are stored (longitude first, then latitude), as well as other parameters such as the coordinate reference system used (here WGS84, which corresponds to the EPSG code 4326).
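To check what the conversion produces, the resulting object can be inspected. A minimal sketch with a single city (the object names city and pt are only illustrative):

```r
library(sf)

city <- data.frame(City = "Turin", Lat = 45.07049, Lng = 7.68682)

# x (longitude) comes first, then y (latitude).
pt <- st_as_sf(city, coords = c("Lng", "Lat"), remove = FALSE, crs = 4326)

class(pt)            # includes "sf" and "data.frame"
st_crs(pt)$epsg      # EPSG code of the coordinate reference system
st_coordinates(pt)   # matrix with columns X (longitude) and Y (latitude)
names(pt)            # Lat and Lng are still present because remove = FALSE
```

Keeping the Lat and Lng columns (remove = FALSE) matters later in the tutorial, because the labels are positioned using these two columns.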

We can now plot the city-points on the map:

DR2_countries_map +
	geom_sf(data = sf_DR2_cities)

Cities on world map

To see them clearly, let us zoom in on Europe as we learned before:

DR2_countries_map +
	geom_sf(data = sf_DR2_cities) +
	coord_sf(xlim = c(-16.1, 32.88), 
		ylim = c(35, 60), 
		expand = TRUE)

Adding labels

To ease the interpretation of our map, it is very useful to add some labels. For instance, we want to know the number of DR2 members based in the cities we highlighted before. We already have the text of the labels in the column “Label” of the DR2 data frame. We now need to display this text on the map. We do this by using the function geom_label_repel. This function, which is included in the package ggrepel, improves the positioning of labels on a plot: it repels labels away from each other, away from data points, and away from the edges of the plotting area.

In the aesthetics parameter of the function, we specify that we want the labels to be positioned on the map based on the latitude and longitude of the cities, and that their text is taken from the “Label” column. The other parameters specify the color of the labels, the font face and size of the text, and the amount of “repelling force” of the positioning algorithm.

DR2_countries_map +
	geom_sf(data = sf_DR2_cities) +
	geom_label_repel(data = sf_DR2_cities, 
		aes(x = Lng, y = Lat, label = Label), 
		color = "black", 
		fontface = "bold", 
		size = 3, 
		force = 5)

The resulting world map with the cities and the labels is the following:

However, there is a problem. If we zoom in on Europe, we find an “intruder”: the label “Montreal (1)” should not show up in the European map!

To solve this little issue, we have to filter out the cities situated in countries outside Europe. We thus create a subset of the DR2 data frame, specifying that we want to retain all the records whose country is not ( != ) Canada:

DR2_european_cities <- subset(DR2_data, 
			Country != "Canada", 
			select = City:Lng)
## Convert to the sf format
sf_DR2_european_cities <- st_as_sf(DR2_european_cities, 
	coords = c("Lng", "Lat"), 
	remove = FALSE, 
	crs = 4326, 
	agr = "constant")

If we plot the new dataset on the European map, we discover that the intruder has been removed:

DR2_countries_map +
	geom_sf(data = sf_DR2_european_cities) +
	geom_label_repel(data = sf_DR2_european_cities, 
		aes(x = Lng, y = Lat, label = Label), 
		color = "black", 
		fontface = "bold", 	
		size = 3, 
		force = 5)+
	coord_sf(xlim = c(-16.1, 32.88), 
		ylim = c(35, 60), 
		expand = TRUE)

Of course, there can be reasons to keep the Montreal label: for instance, to show that DR2 also has overseas members.
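Filtering by country name works well here because Canada is the only non-European country. With more countries, a possible alternative is to filter on the coordinates, keeping only the cities that fall inside the zoomed area. A minimal sketch with three of the tutorial's cities (the name in_view is only illustrative):

```r
# Three of the tutorial's cities.
cities <- data.frame(
  City    = c("Turin", "Vienna", "Montreal"),
  Country = c("Italy", "Austria", "Canada"),
  Lat     = c(45.07049, 48.20849, 45.50884),
  Lng     = c(7.68682, 16.37208, -73.5878)
)

# Same limits as in the coord_sf() calls above.
xlim <- c(-16.1, 32.88)
ylim <- c(35, 60)

# Keep only the cities whose coordinates fall inside both ranges.
in_view <- subset(
  cities,
  Lng >= xlim[1] & Lng <= xlim[2] & Lat >= ylim[1] & Lat <= ylim[2]
)
print(in_view$City)
```

The filtered data frame can then be converted with st_as_sf exactly as before.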

Improving the map

In the versions of the map we have generated so far, the information about the number of DR2 members is represented in the labels, as a number in parentheses. Is it possible to represent it as a visual feature, so that it catches the eye immediately? A first idea could be to make the size of the labels proportional to the number of members:

DR2_countries_map +
	geom_sf(data = sf_DR2_european_cities) +
	geom_label_repel(data = sf_DR2_european_cities, 
		aes(x = Lng, y = Lat, label = Label, size = Members), 
		color = "black", 
		fontface = "bold", 	
		force = 5)+
	coord_sf(xlim = c(-16.1, 32.88), 
		ylim = c(35, 60), 
		expand = TRUE)

However, the result is quite bad, because of the great difference in size between Turin and the other cities. Since most of the cities have just one member, their labels are too small to be readable. Note that ggplot2 automatically adds a legend to interpret the size of the labels.

A better solution is to set the size of the city points proportional to the number of members:

DR2_countries_map +
	geom_sf(data = sf_DR2_european_cities, 
		aes(size = Members))+
	geom_label_repel(data = sf_DR2_european_cities, 
		aes(x = Lng, y = Lat, label = Label), 
		color = "black", 
		fontface = "bold",
		size = 3, 	
		force = 9)+
	coord_sf(xlim = c(-16.1, 32.88), 
		ylim = c(35, 60), 
		expand = TRUE)

Note that ggplot2 automatically creates a legend based on the size of the points:

In the same way, we can also use the color of the points to represent the number of members. We customize the color scale, setting its extremes to blue and green: with low = "blue" and high = "green", cities with few members are colored blue and cities with many members green:

DR2_countries_map +
	geom_sf(data = sf_DR2_european_cities, 
		aes(color = Members, size = Members))+
	scale_color_gradient(low = "blue", high = "green")+
	geom_label_repel(data = sf_DR2_european_cities, 
		aes(x = Lng, y = Lat, label = Label), 
		color = "black", 
		fontface = "bold",
		size = 3, 	
		force = 9)+
	coord_sf(xlim = c(-16.1, 32.88), 
		ylim = c(35, 60), 
		expand = TRUE)

Note that R adds a second legend to interpret the color of the points:

The last map, however, seems to me “overloaded”. The same information (the number of DR2 members) is visualized in three different ways: as a number in the label, as the size of the points, and as the color of the points. Personally, I find this solution redundant. I think that the second map, which uses only the size of the points, is the most balanced (and the most aesthetically pleasing).
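If the default point sizes do not look right, they can be adjusted with a size scale. The sketch below uses a toy data frame and plain geom_point rather than the map itself. The range c(2, 8) and the legend breaks are arbitrary choices for illustration.

```r
library(ggplot2)

# Toy data with the same spread as the tutorial's Members column.
cities <- data.frame(
  x = c(1, 2, 3),
  y = c(1, 2, 1),
  Members = c(14, 2, 1)
)

p <- ggplot(cities, aes(x = x, y = y, size = Members)) +
  geom_point() +
  # range: smallest and largest point size; breaks: values shown in the legend.
  scale_size_continuous(range = c(2, 8), breaks = c(1, 2, 14))

print(p)
```

The same scale_size_continuous call can be added to the map code, since geom_sf draws the city points with the same size aesthetic.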

Final touches

A great advantage of ggplot2 is that it allows us to control almost all the graphical aspects of the visualizations. By changing the parameters in the theme function, we can fine-tune our map until it matches our tastes. To produce the final version of the map, we change the color of the background of the map (that is, the ocean) to a light blue, and we remove the axis titles, texts, and ticks, as well as the legend. Lastly, we add a title to our map.

European_DR2_map2 <- DR2_countries_map +
	geom_sf(data = sf_DR2_european_cities, 
		aes(size = Members))+
	geom_label_repel(data = sf_DR2_european_cities, 
		aes(x = Lng, y = Lat, label = Label), 
		color = "black", 
		fontface = "bold",
		size = 3, 	
		force = 12)+
	coord_sf(xlim = c(-16.1, 32.88), 
		ylim = c(35, 60), 
		expand = TRUE)+
	theme_minimal() +
	theme(panel.background = element_rect(fill = "aliceblue"), 
		axis.title.x = element_blank(), 
		axis.title.y = element_blank(), 
		axis.text.x = element_blank(),
		axis.text.y = element_blank(),
		axis.ticks = element_blank(),
		legend.position = "none") +
	ggtitle("Map of DR2 Members in Europe")

The result is our target map:

The final step is saving the map in an appropriate format. We save both a PDF version of the map, which keeps the highest quality, and a lighter PNG version:

ggsave("DR2_map_Europe.pdf", plot = European_DR2_map2)
ggsave("DR2_map_Europe.png", plot = European_DR2_map2, dpi = "screen")
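An optional extra: the ggspatial package loaded at the beginning can add a scale bar and a north arrow to a map. The sketch below is not part of the target map; it draws a plain grey base map of Europe, and the positions and the arrow style are arbitrary choices.

```r
library(ggplot2)
library(sf)
library(rnaturalearth)
library(ggspatial)

world <- ne_countries(scale = "medium", returnclass = "sf")

map_with_scale <- ggplot(data = world) +
  geom_sf(fill = "lightgrey", color = "black") +
  # Scale bar in the bottom-left corner.
  annotation_scale(location = "bl", width_hint = 0.3) +
  # North arrow in the top-left corner.
  annotation_north_arrow(location = "tl", which_north = "true",
                         style = north_arrow_fancy_orienteering) +
  coord_sf(xlim = c(-16.1, 32.88), ylim = c(35, 60), expand = TRUE)

print(map_with_scale)
```

On a map in longitude and latitude such as this one, ggspatial may warn that the scale varies across the map, so the scale bar should be read as approximate.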

Further readings

This brief tutorial was largely inspired by the tutorial I used to learn the basics of mapping with R. It explains very clearly several other topics related to maps and I definitely recommend it. Another useful tutorial is this one, which explains how to use another R package for maps, ggmap, and some common data wrangling operations.

Here you can find an introduction to R graphics with ggplot2 and here numerous clear tutorials on data analysis and visualization with R.

Maps and history

The world map we started with has a clear limitation from a historical point of view: it is based on the countries existing today, with their contemporary borders. However, we know well that both countries and borders change throughout history. If we want, for instance, to reconstruct the geography of Leibniz’s correspondents and acquaintances, it would be anachronistic to plot the data on a twenty-first-century map of Europe. There are several digital archives of historical maps freely available on the web, curated by archives and museums. However, to be used in R, these maps must first be converted into shapefiles, in which countries are represented as polygons.
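Once a shapefile is available, it can be read with the st_read function of sf and plotted like the world object. Since no historical shapefile is assumed here, the sketch below uses the example shapefile that ships with the sf package (the counties of North Carolina) as a stand-in. A historical file would be read the same way by passing its path.

```r
library(sf)
library(ggplot2)

# Example shapefile shipped with sf, used here only as a stand-in
# for a shapefile of historical borders.
shp_path <- system.file("shape/nc.shp", package = "sf")

borders <- st_read(shp_path, quiet = TRUE)

# Once read, the polygons can be drawn like the `world` object.
ggplot(data = borders) +
  geom_sf()
```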

I do not know whether there are repositories of shapefiles of historical maps. If you know of any, please share the links in the comments and I will update the post.


1 Response to Drawing maps with R. A basic tutorial

  1. Eugenio Petrovich says:

    At this link, it is possible to find sources that offer historical country data in the GIS file format for download. Datasets range from the Greek and Roman world to historical China and Japan:
    https://www-gislounge-com.cdn.ampproject.org/c/s/www.gislounge.com/find-gis-data-historical-country-boundaries/amp/

    I thank Emiliano Tolusso for pointing out this material.
