In this post we’re going to build up this graph which shows all countries with at least one border and the connections that remain after removing countries with only a single border.
I’ve been thinking about TidyTuesday datasets with country data and how it could be interesting to use country borders as a component of the chart-making process. And what’s better than working on an actual TidyTuesday visualisation? Getting distracted by something tangential to it.
In my utility package {cjhRutils} I have a tidygraph object containing the nodes and edges of the connected countries; the code can be found in this script. After loading the package (alongside {tidygraph}) we can see our dataset:
# A tbl_graph: 173 nodes and 608 edges
#
# A directed simple graph with 26 components
#
# A tibble: 173 × 8
id iso_a2 iso_a3 name name_long name_en region_wb continent
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 AE ARE United Arab Emirates United A… United… Middle E… Asia
2 2 AF AFG Afghanistan Afghanis… Afghan… South As… Asia
3 3 AL ALB Albania Albania Albania Europe &… Europe
4 4 AM ARM Armenia Armenia Armenia Europe &… Asia
5 5 AO AGO Angola Angola Angola Sub-Saha… Africa
6 6 AQ ATA Antarctica Antarcti… Antarc… Antarcti… Antarcti…
# ℹ 167 more rows
#
# A tibble: 608 × 3
from to border_region
<int> <int> <chr>
1 1 120 Middle East & North Africa
2 1 136 Middle East & North Africa
3 2 34 Cross Region
# ℹ 605 more rows
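If you don’t have {cjhRutils} installed, a much smaller object of the same shape can be built by hand. This is only a sketch: `toy_nodes`, `toy_edges` and `toy_graph` are hypothetical names, the countries are placeholder letters, and the column set is a cut-down version of the one printed above.

```r
library(tidygraph)
library(tibble)

# Hypothetical stand-in for the packaged graph: one row per country
toy_nodes <- tibble(
  id = 1:5,
  name = c("A", "B", "C", "D", "E"),
  continent = c("Europe", "Europe", "Europe", "Africa", "Africa")
)

# One row per border; `from` and `to` are row positions in toy_nodes
toy_edges <- tibble(
  from = c(1L, 1L, 2L, 3L, 4L),
  to = c(2L, 3L, 3L, 4L, 5L),
  border_region = c("Europe", "Europe", "Europe", "Cross Region", "Africa")
)

toy_graph <- tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE)

# Printing shows the node tibble and the edge tibble, as in the output above
toy_graph
```

Keeping the region label on the edge rather than deriving it at plot time means the edge colour can be mapped directly later on.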
The graph contains all countries with at least one connection. Let’s filter the graph to only include countries with two or more borders and visualise that naively with {ggraph}.
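A minimal sketch of that filter-then-plot step, run on a hypothetical four-node toy graph rather than the real data. It assumes each border is stored once as an edge, so `centrality_degree(mode = "all")` counts borders; if every border were stored in both directions, `mode = "out"` would be the count to use instead.

```r
library(tidygraph)
library(ggraph)
library(dplyr)

# Hypothetical toy graph: D has a single border (with C)
toy_graph <- tbl_graph(
  nodes = tibble::tibble(
    name = c("A", "B", "C", "D"),
    continent = c("Europe", "Europe", "Africa", "Africa")
  ),
  edges = tibble::tibble(
    from = c(1L, 1L, 2L, 3L),
    to = c(2L, 3L, 3L, 4L),
    border_region = c("Europe", "Cross Region", "Cross Region", "Africa")
  ),
  directed = TRUE
)

# Count borders per node, then keep nodes with two or more.
# Filtering nodes also drops any edge that touched a removed node.
graph_two_plus <- toy_graph %>%
  activate(nodes) %>%
  mutate(n_borders = centrality_degree(mode = "all")) %>%
  filter(n_borders >= 2)

# Naive plot: nodes and edges each get their own colour legend
p <- ggraph(graph_two_plus, layout = "stress") +
  geom_edge_link(aes(colour = border_region)) +
  geom_node_point(aes(colour = continent), size = 4)
```

Because the filter runs once, a node that survives can still end up with fewer than two connections in the plotted graph if its neighbours were removed.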
The guide/legend for that chart is a little bit complicated. Let’s look at why:
Nodes are coloured by the continent the node belongs to.
Edges are coloured by whether the two nodes belong to the same continent.
There is a single node from “North America” but no edges with the border_region of “North America”.
There are edges that need to be coloured “Cross Region” but no nodes with that colour.
To solve this I thought of using my old trick of hijacking an unused aesthetic and manipulating its guide. However! That’s not really possible in this case, so I googled for alternatives and was extremely pleased to find that guide_custom() was added in late 2023. In the chart below I’ve used guide_custom() to add a red line that I can use to label cross-regional borders.
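As a rough illustration of the idea, here is `guide_custom()` attached to an ordinary {ggplot2} scatter plot rather than the country graph. It assumes ggplot2 3.5.0 or later; the `mtcars` data, the `red_line` grob, and the choice to put the label in the guide title are all stand-ins, not the code behind the chart.

```r
library(ggplot2)
library(grid)

# A horizontal red line drawn as a grid grob; this becomes the legend key
red_line <- linesGrob(
  x = unit(c(0, 1), "npc"),
  y = unit(c(0.5, 0.5), "npc"),
  gp = gpar(col = "red", lwd = 2)
)

p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  guides(
    # guide_custom() renders any grob as its own guide, with no scale behind it
    custom = guide_custom(
      red_line,
      title = "Cross Region",
      width = unit(1, "cm"),
      height = unit(0.5, "cm")
    )
  )
```

Since the guide is not tied to a scale, nothing in the data has to carry a “Cross Region” level for the entry to appear in the legend.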
In the final chart I’d like to add a label to the US to explain why it’s included but Canada isn’t. So let’s grab the coordinates of the node so I can use them to work out where to place the label.
# A tibble: 0 × 2
# ℹ 2 variables: x <dbl>, y <dbl>
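One way to pull a node’s position out of a layout is `ggraph::create_layout()`, sketched here on a hypothetical toy graph. The node name `"A"` and the `"stress"` layout are assumptions; the lookup only helps if it uses the same graph and layout as the chart being labelled.

```r
library(tidygraph)
library(ggraph)
library(dplyr)

# Hypothetical toy graph standing in for the filtered country graph
toy_graph <- tbl_graph(
  nodes = tibble::tibble(name = c("A", "B", "C")),
  edges = tibble::tibble(from = c(1L, 1L, 2L), to = c(2L, 3L, 3L)),
  directed = TRUE
)

# create_layout() returns one row per node with x and y columns added
layout_tbl <- create_layout(toy_graph, layout = "stress")

# Drop the layout classes, then look the node up by name
node_xy <- as.data.frame(layout_tbl) %>%
  filter(name == "A") %>%
  select(x, y)

node_xy
```

If the result has zero rows, the filter matched no node, which usually points at a mismatch with the exact spelling used in the name column.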
Nice. Now we can think about beautification. I’ve chosen to use colours from the <coolors.co> service and have found a subjective balance of colours that I think looks good based on how many nodes are in each group. To ensure a little bit of sense to the colours, I’ll order them as a factor so that the group with the most nodes appears at the top of the legend.
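The factor ordering can be sketched with `forcats::fct_infreq()`, which sorts levels by how often they occur. The node table and the hex codes below are placeholders, not the palette used in the chart.

```r
library(dplyr)
library(forcats)

# Hypothetical node table: Asia appears 3 times, Africa 2, Europe 1
nodes <- tibble::tibble(
  continent = c("Africa", "Asia", "Asia", "Europe", "Asia", "Africa")
)

# Most frequent group becomes the first level, so it sits at the top of the legend
nodes <- nodes %>%
  mutate(continent = fct_infreq(continent))

levels(nodes$continent)

# Placeholder colours, named by level so they can be passed to
# scale_colour_manual(values = continent_colours)
continent_colours <- setNames(
  c("#264653", "#2A9D8F", "#E9C46A"),
  levels(nodes$continent)
)
```

Naming the palette by level keeps each colour attached to its group even if the level order changes later.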
Let’s add in my labels, which are manually placed but use the node position extracted earlier to help place them.
gg_graph_countries <- gg_graph_before_label +
  geom_curve(
    data = tibble(
      x = 5.08 - 8.5,
      y = -4.96,
      xend = 5.08 - 1.3,
      yend = -4.96 - 0.2
    ),
    aes(x, y, yend = yend, xend = xend),
    inherit.aes = FALSE,
    arrow = arrow(length = unit(0.01, "npc")),
    curvature = 0.2,
    angle = 90
  ) +
  geom_label(
    data = tibble(
      x = 5.08 - 8.5,
      y = -4.96 - 0.5,
      label = str_wrap(
        "Canada isn't here. It only has a single land border with the US - which is included as it has two borders",
        30
      )
    ),
    aes(x, y, label = label),
    fill = colorspace::darken("#D8E4EA"),
    label.padding = unit(0.4, "lines"),
    hjust = 0,
    colour = "black",
    inherit.aes = FALSE,
    size = 4
  )

gg_graph_countries %>%
  ggsave(
    quarto_here("gg_graph_countries.png"),
    .,
    width = 4.25 * 3,
    height = 3.4 * 3,
    bg = "#D8E4EA"
  )