By Eugenio Petrovich
Maps are a powerful tool to visualize information. Plotting data on a map can reveal trends and patterns that are difficult to spot by inspecting a spreadsheet. Maps are also very useful to communicate information to the public in an appealing and interpretative way.
In this brief tutorial, we will learn how to generate simple geographic maps with R. In particular, we will learn how to produce the following map of the DR2 members in Europe:
R is free, open-source software that offers many solutions for analyzing data and producing visualizations. A great advantage of R is that its basic functionality can be extended with further packages that are freely available on CRAN, the Comprehensive R Archive Network. Moreover, there is an active R community around the world that can answer most of the coding questions you may have.
The packages necessary for this tutorial can be installed with:
install.packages(c("sf", "rnaturalearth" , "rnaturalearthdata", "rgeos", "ggspatial", "ggrepel", "tidyverse"))
The first five packages are specifically developed for maps:
sf is used to manage spatial data,
rnaturalearth and rnaturalearthdata contain information about all countries of the world, as well as the information required to plot those countries as polygons on a map, and
ggspatial improves the visualization of spatial data.
ggrepel will help us in managing the labels on the map, whereas
tidyverse comprises a set of R libraries that have become the standard for data manipulation and visualization.
After installing the packages, we need to load them in our workspace:
library("sf")
library("rnaturalearth")
library("rnaturalearthdata")
library("rgeos")
library("ggspatial")
library("ggrepel")
library("tidyverse")
Before creating the maps, we need to import in R the geographic data of the DR2 members. We stored them in the following data frame:
As you can see, cities are the basic unit of our data frame. For each of them, we specified the country, the number of DR2 members, the label we will display on the map (the name of the city plus the number of DR2 members in brackets), and the latitude and longitude (you can find them here).
We import the data frame, which is stored in a CSV file, into R with the function read.csv. Since we used a header with the column names, we set the argument header to TRUE. We also need to specify that the separator between the columns is the semicolon and that the decimal separator is the comma (and not the period, because we used an Italian version of Excel to produce the file).
DR2_data <- read.csv(file="DR2_geo_data.CSV", header=TRUE, sep=";", dec = ",")
You can check the first records of the data frame by using the command head(DR2_data).
We are now ready to produce our map.
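As a quick check of the import settings described above, the following minimal sketch reads a tiny semicolon-separated table with decimal commas. The two rows are a made-up stand-in for the real file (the member count for Turin and the rounded coordinates are assumptions for illustration only); the column names follow those used in this tutorial.

```r
# Toy stand-in for DR2_geo_data.CSV: semicolon-separated, decimal commas.
# The values are illustrative, not the real DR2 data.
csv_text <- "City;Country;Members;Label;Lat;Lng
Turin;Italy;10;Turin (10);45,07;7,69
Montreal;Canada;1;Montreal (1);45,50;-73,57"

# Same arguments as in the tutorial, but reading from a string
toy <- read.csv(text = csv_text, header = TRUE, sep = ";", dec = ",")

# read.csv2() uses sep = ";" and dec = "," by default
toy2 <- read.csv2(text = csv_text)

str(toy)   # Lat and Lng should be numeric, not character
head(toy)
```

If the coordinates show up as character columns instead of numbers, the decimal separator was probably not recognized.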
Creating the world map
The first step of our mapping exercise is creating a world map. To do this, we use the function ne_countries to pull country data from rnaturalearth. We specify medium as the scale and sf as the return class of the data frame, so that the data are already in the right format for geographic mapping.
world <- ne_countries(scale = "medium", returnclass = "sf")
We plot these data with ggplot2, the tidyverse package for visualization, and its geom_sf function:
ggplot(data = world) + geom_sf()
The result is a map of the world:
We will use the world map as a base map on which we will highlight the countries where DR2 members are based.
To highlight the DR2 countries on the map, we now need to add our DR2 data to the world data frame. We do this with the function left_join:
world_joined <- left_join(world, DR2_data, by = c("name" = "Country"))
This function tells R to join the DR2 data table to the world data by looking for a match on the country name (we specify the matching key between the two datasets in the by argument). When the join operation finds a match, it combines the records from the two tables. When it does not find a match, as in the case of Brazil, it sets the DR2 columns (e.g., “Members”) of the non-matching records to NA, the standard code used by R for missing values. Thus, the record for Brazil will have NA as the value of the column “Members”. It is important to retain all the countries in the world and not only those with DR2 members. Otherwise, when we plot our data on the map, we will lose all the countries without DR2 members! This is why we used left_join instead of inner_join, which keeps only the matching records: we want R to retain all the records in the “left hand” dataset (i.e., the one that occupies the first argument in the function).
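The behavior of the join can be tried out on two toy tables. This is only a sketch: the tables and their values are invented for illustration, and the anti_join check at the end is an addition that is not part of the original workflow. It lists the records of the members table whose country name has no counterpart in the world table, which can help spot spelling differences between the two sources.

```r
library(dplyr)

# Toy tables; names and values are illustrative only
world_toy <- data.frame(name = c("Italy", "Brazil", "Canada"))
members_toy <- data.frame(Country = c("Italy", "Canada"),
                          Members = c(10, 1))

# Keep every row of the left-hand table, matched or not
joined <- left_join(world_toy, members_toy, by = c("name" = "Country"))
joined   # Brazil is kept, with NA in the Members column

# Rows of the members table without a matching country name
unmatched <- anti_join(members_toy, world_toy, by = c("Country" = "name"))
unmatched   # empty here, because both names match
```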
We now want to highlight the countries with DR2 members on the world map. To do this, we use the ifelse function in the fill argument. If the value of the column “Members” is missing (i.e., it is NA), we set the color of the country to light grey. If it is not missing, that is, if there are DR2 members in that country, we set the color to red. Note that in the first case we used the color name, whereas in the second case we used the hexadecimal color code corresponding to the color of the DR2 logo. The color argument specifies the color of the borders of the countries.
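The way ifelse works on a whole column at once can be seen on a small vector. The member counts below are made up; only the two colors come from the tutorial.

```r
# Toy member counts: two countries with members, one without (NA)
members <- c(10, NA, 1)

# One color per element: grey where the value is missing, red elsewhere
fill_colors <- ifelse(is.na(members), "lightgrey", "#c8242b")
fill_colors
# "#c8242b" "lightgrey" "#c8242b"
```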
DR2_countries_map <- ggplot(data = world_joined) +
  geom_sf(fill = ifelse(is.na(world_joined$Members), "lightgrey", "#c8242b"),
          color = "black")
The result is the following:
The map, however, is quite unsatisfying. Apart from Canada, the countries where DR2 members are based are in Europe and are too small to stand out on a world map. The world scale is thus not very effective for displaying the geographical distribution of DR2. We need to zoom in to the level of Europe. The coord_sf function, which ggplot2 provides for sf data, makes this very easy: we only have to specify the coordinates of the area we are interested in:
DR2_countries_map + coord_sf(xlim = c(-16.1, 32.88), ylim = c(35, 60), expand = TRUE)
The resulting European map is the following:
Highlighting cities (point data)
We know that DR2 members are not only based in some countries but in specific cities within those countries. In our DR2 data frame, we had the DR2 cities along with their geographic coordinates. We want now to plot these cities as points on our map.
We first need to convert our data frame to an sf object:
sf_DR2_cities <- st_as_sf(DR2_data, coords = c("Lng", "Lat"), remove = FALSE, crs = 4326, agr = "constant")
Note that we had to indicate the columns in which the geographic coordinates of our cities are stored, as well as other parameters such as the coordinate reference system (here WGS84, which has the EPSG code 4326).
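A minimal sketch of the conversion on a toy data frame may help to see what st_as_sf produces. The rounded coordinates are approximations used only for illustration. Note that the order in coords is longitude first, then latitude.

```r
library(sf)

# Toy data frame with approximate coordinates (illustrative values)
cities_toy <- data.frame(City = c("Turin", "Montreal"),
                         Lat = c(45.07, 45.50),
                         Lng = c(7.69, -73.57))

# coords takes the x column (longitude) first, then the y column (latitude)
sf_toy <- st_as_sf(cities_toy, coords = c("Lng", "Lat"),
                   remove = FALSE, crs = 4326)

class(sf_toy)            # includes "sf" as well as "data.frame"
st_crs(sf_toy)$epsg      # the EPSG code of the coordinate reference system
st_coordinates(sf_toy)   # X is longitude, Y is latitude
```

Because remove = FALSE, the Lat and Lng columns are kept next to the new geometry column, which is what allows the labels to be positioned with them later on.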
We can now plot the city-points on the map:
DR2_countries_map + geom_sf(data = sf_DR2_cities)
To see them clearly, let us zoom in on Europe as we learned before:
DR2_countries_map +
  geom_sf(data = sf_DR2_cities) +
  coord_sf(xlim = c(-16.1, 32.88), ylim = c(35, 60), expand = TRUE)
To ease the interpretation of our map, it is very useful to add some labels. For instance, we want to know the number of DR2 members based in the cities we highlighted before. We already have the text of the labels in the column “Label” of the DR2 data frame. We now need to display this text on the map. We do this by using the function geom_label_repel. This function, which is included in the package ggrepel, improves the positioning of labels on a plot: it repels labels away from each other, away from data points, and away from the edges of the plotting area.
In the aesthetics parameter of the function, we specify that we want the labels to be positioned on the map based on the latitude and longitude of the cities, and that their text is indicated in the “Label” column. The other parameters specify the color of the labels, the size of the text, and the amount of “repelling force” of the positioning algorithm.
DR2_countries_map +
  geom_sf(data = sf_DR2_cities) +
  geom_label_repel(data = sf_DR2_cities, aes(x = Lng, y = Lat, label = Label),
                   color = "black", fontface = "bold", size = 3, force = 5)
The resulting world map with the cities and the labels is the following:
However, there is a problem. If we zoom in on Europe, we find an “intruder”: the label “Montreal (1)” should not show up in the European map!
To solve this little issue, we have to filter out the cities situated in countries outside Europe. We thus create a subset of the DR2 data frame, specifying that we want to retain all the records whose country is not (!=) Canada:
DR2_european_cities <- subset(DR2_data, Country != "Canada", select = City:Lng)
## Convert to the sf format
sf_DR2_european_cities <- st_as_sf(DR2_european_cities, coords = c("Lng", "Lat"),
                                   remove = FALSE, crs = 4326, agr = "constant")
If we plot the new dataset on the European map, we discover that the intruder has been removed:
DR2_countries_map +
  geom_sf(data = sf_DR2_european_cities) +
  geom_label_repel(data = sf_DR2_european_cities, aes(x = Lng, y = Lat, label = Label),
                   color = "black", fontface = "bold", size = 3, force = 5) +
  coord_sf(xlim = c(-16.1, 32.88), ylim = c(35, 60), expand = TRUE)
Of course, there can be reasons to keep the Montreal label: for instance, to show that DR2 also has overseas members.
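Filtering by country name works well when only one country has to be excluded. As a possible alternative, sketched here on toy data with approximate coordinates, the cities can also be filtered by the same longitude and latitude limits that are passed to coord_sf. This is not the approach used above, only a variation that avoids listing the excluded countries by hand.

```r
# Toy data frame with approximate coordinates (illustrative values)
cities_toy <- data.frame(
  City = c("Turin", "Montreal"),
  Country = c("Italy", "Canada"),
  Lat = c(45.07, 45.50),
  Lng = c(7.69, -73.57)
)

# Filter by country name, as in the tutorial
by_country <- subset(cities_toy, Country != "Canada")

# Alternative: keep only the cities inside the map window
by_window <- subset(cities_toy,
                    Lng >= -16.1 & Lng <= 32.88 & Lat >= 35 & Lat <= 60)

by_country$City
by_window$City
```

Both subsets contain only Turin in this toy example.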
Improving the map
In the versions of the map we have generated so far, the information about the number of DR2 members is represented in the labels, as a number in brackets. Is it possible to represent it as a visual feature, so that it catches the eye immediately? A first idea could be to change the size of the labels in proportion to the number of members:
DR2_countries_map +
  geom_sf(data = sf_DR2_european_cities) +
  geom_label_repel(data = sf_DR2_european_cities,
                   aes(x = Lng, y = Lat, label = Label, size = Members),
                   color = "black", fontface = "bold", force = 5) +
  coord_sf(xlim = c(-16.1, 32.88), ylim = c(35, 60), expand = TRUE)
However, the result is quite bad, because of the great difference in size between Turin and the other cities. Since most of the cities have just one member, their labels are too small to be readable. Note that R automatically adds a legend to interpret the size of the labels.
A better solution is to set the size of the city points proportional to the number of members:
DR2_countries_map +
  geom_sf(data = sf_DR2_european_cities, aes(size = Members)) +
  geom_label_repel(data = sf_DR2_european_cities, aes(x = Lng, y = Lat, label = Label),
                   color = "black", fontface = "bold", size = 3, force = 9) +
  coord_sf(xlim = c(-16.1, 32.88), ylim = c(35, 60), expand = TRUE)
Note that R automatically creates a legend based on the size of the points:
By the same token, we can also use the color of the points to represent the number of members. We customize the color scale, setting its extremes to blue and green, so that the cities with few members will be colored in blue and the cities with many members in green:
DR2_countries_map +
  geom_sf(data = sf_DR2_european_cities, aes(color = Members, size = Members)) +
  scale_color_gradient(low = "blue", high = "green") +
  geom_label_repel(data = sf_DR2_european_cities, aes(x = Lng, y = Lat, label = Label),
                   color = "black", fontface = "bold", size = 3, force = 9) +
  coord_sf(xlim = c(-16.1, 32.88), ylim = c(35, 60), expand = TRUE)
Note that R adds a second legend to interpret the color of the points:
The last map, however, seems “overloaded” to me. The same information (the number of DR2 members) is visualized in three different ways: as a number in the label, as the size of the points, and as the color of the points. Personally, I find this solution redundant. I think that the second map is the most balanced (and the most aesthetically pleasing).
A great advantage of ggplot2 is that it allows us to control almost all the graphical aspects of a visualization. By changing the parameters in the theme function, we can fine-tune our map until it matches our tastes. To produce the final version of the map, we change the background color of the map (that is, the ocean) to a light blue, and we remove the axis titles, texts, and ticks, as well as the legend. Lastly, we add a title to our map.
European_DR2_map2 <- DR2_countries_map +
  geom_sf(data = sf_DR2_european_cities, aes(size = Members)) +
  geom_label_repel(data = sf_DR2_european_cities, aes(x = Lng, y = Lat, label = Label),
                   color = "black", fontface = "bold", size = 3, force = 12) +
  coord_sf(xlim = c(-16.1, 32.88), ylim = c(35, 60), expand = TRUE) +
  theme_minimal() +
  theme(panel.background = element_rect(fill = "aliceblue"),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        legend.position = "none") +
  ggtitle("Map of DR2 Members in Europe")
The result is our target map:
The final step is saving the map in an appropriate format. We save both a PDF version of the map, which keeps the highest quality, and a lighter PNG version:
ggsave("DR2_map_Europe.pdf")
ggsave("DR2_map_Europe.png", dpi = "screen")
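When called without a plot argument, ggsave saves the most recent plot. A slightly more explicit variant, sketched below on a toy plot, names the plot object and fixes the size of the output. The file names, the dimensions, and the resolution are arbitrary choices for illustration, and the files are written to a temporary folder.

```r
library(ggplot2)

# Toy plot standing in for the final map
p <- ggplot(data.frame(x = 1:3, y = 1:3), aes(x, y)) + geom_point()

# Hypothetical output paths in a temporary folder
out_pdf <- file.path(tempdir(), "toy_map.pdf")
out_png <- file.path(tempdir(), "toy_map.png")

# Name the plot explicitly and set width and height (in inches by default)
ggsave(out_pdf, plot = p, width = 8, height = 6)
ggsave(out_png, plot = p, width = 8, height = 6, dpi = 300)
```

Naming the plot explicitly makes the script independent of which plot happened to be created last.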
This brief tutorial was largely inspired by the tutorial I used to learn the basics of mapping with R. It explains very clearly several other topics related to maps and I definitely recommend it. Another useful tutorial is this one, which explains how to use another R package for maps, ggmap, and some common data wrangling operations.
Here you can find an introduction to R graphics with ggplot2 and here numerous clear tutorials on data analysis and visualization with R.
Maps and history
The world map we started with has a clear limitation from a historical point of view: it is based on the countries existing today, with their contemporary borders. However, we know well that both countries and borders change over time. If we want, for instance, to reconstruct the geography of Leibniz’s correspondents and acquaintances, it would be anachronistic to plot the data on a twenty-first-century map of Europe. There are several digital archives of historical maps freely available on the web, curated by archives and museums. However, to be used in R, these maps must first be converted into shapefiles, in which countries are represented as polygons.
I do not know if there are repositories of shapefiles of historical maps. If you know of any, please share the links in the comments and I will update the post.
At this link, it is possible to find sources that offer historical country data in the GIS file format for download. Datasets range from the Greek and Roman world to historical China and Japan:
I thank Emiliano Tolusso for pointing out this material.