Visualizing geospatial data

Author

Claus O. Wilke

Introduction

In this worksheet, we will discuss how to visualize geospatial data.

First we need to load the required R packages.
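The package-loading code itself is not shown here. Here is a minimal sketch, assuming the worksheet uses the tidyverse (for dplyr and ggplot2), sf, and colorspace, because those packages provide the functions used below:

```r
# Assumed package set, inferred from the functions used in this worksheet
library(tidyverse)   # dplyr verbs (filter, left_join, arrange, slice) and ggplot2
library(sf)          # simple features: geometry columns, st_crs(), st_transform()
library(colorspace)  # HCL color scales such as scale_fill_continuous_sequential()
```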

Next we set up the data.

We will be working with the datasets texas_income and texas_counties. The dataset texas_income contains the median income of all counties in Texas, as well as the shape of each county (stored in the geometry column). The column FIPS contains a five-digit code that uniquely identifies each county.

The dataset texas_counties holds the number of people who lived in each Texas county in 2010, as well as the size of each county (column area). The column popratio is the ratio of a county's population to the median population across all counties. The column FIPS contains a five-digit code that uniquely identifies each county.
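To make the definition of popratio concrete, here is a sketch on a small made-up table. The column name pop2010 and all values are hypothetical. The actual name of the population column in texas_counties is an assumption.

```r
library(dplyr)

# Hypothetical toy data: made-up counties and populations, for illustration only
counties_toy <- tibble(
  county  = c("A", "B", "C", "D", "E"),
  pop2010 = c(1000, 4000, 2000, 8000, 500)
)

# popratio = county population / median population across all counties
counties_toy <- counties_toy |>
  mutate(popratio = pop2010 / median(pop2010))

counties_toy
```

Here the median population is 2000, so a county with popratio = 4 has four times as many inhabitants as the median county.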

Wrangling data

Before we make any visualizations, we will first gain some experience manipulating data tables that contain geospatial information. This does not require any new concepts, because data tables with geospatial information (i.e., containing a geometry column) can be manipulated just like those without.

Let’s try this out. Take the texas_income table and keep only the rows for the counties “Travis” and “Harris”.

Hint
texas_income |>
  filter(___)
Solution
texas_income |>
  filter(county %in% c("Travis", "Harris"))
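Because sf provides methods for the dplyr verbs, the geometry column travels along with the rows you keep, even if you do not select it. Here is a sketch that uses the North Carolina counties shapefile bundled with the sf package as a stand-in for texas_income. Its county names are in the column NAME.

```r
library(dplyr)
library(sf)

# Stand-in data: North Carolina counties, shipped with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Filtering and selecting work as usual; the geometry column is kept automatically
nc_two <- nc |>
  filter(NAME %in% c("Ashe", "Wake")) |>
  select(NAME, AREA)

class(nc_two)   # still an sf object
names(nc_two)   # includes the geometry column

# To get an ordinary data frame, drop the geometry explicitly
nc_two_plain <- st_drop_geometry(nc_two)
```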

Now join the texas_income table with the texas_counties table and then find the five largest counties by area.

Hint: Use the function left_join() to join the tables, and use the functions arrange() and slice() to find the five largest counties.

Hint
texas_income |>
  left_join(___) |>
  ___
Hint
texas_income |>
  left_join(texas_counties) |>
  arrange(___) |>
  slice(___)
Solution
texas_income |>
  left_join(texas_counties) |>
  arrange(desc(area)) |>
  slice(1:5)
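Here are two variations on this solution, sketched with sf's bundled North Carolina counties as stand-in data and a made-up lookup table. The first names the join key explicitly with by. The second uses dplyr's slice_max() instead of arrange() plus slice(). For the Texas tables the shared key would be FIPS. If both tables also share another column such as county, dplyr adds .x/.y suffixes to the duplicated columns.

```r
library(dplyr)
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Hypothetical lookup table keyed by FIPS code (values are made up)
lookup <- tibble(FIPS = c("37009", "37183"), flag = c("north", "capital"))

# Naming the key avoids relying on dplyr's guess about which columns to join on
nc_joined <- nc |> left_join(lookup, by = "FIPS")

# Two ways to keep the five largest counties by area
top5_arrange <- nc_joined |> arrange(desc(AREA)) |> slice(1:5)
top5_max     <- nc_joined |> slice_max(AREA, n = 5, with_ties = FALSE)
```

With with_ties = FALSE, slice_max() returns exactly five rows even if several counties share the fifth-largest area.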

Visualizing simple features

We can visualize datasets containing simple features with the function geom_sf(). This geom is very simple to use, as it automatically finds the geometry column and draws it in the appropriate coordinate system. All we need to think about is whether we want to apply a color mapping, e.g. to make a choropleth.

Try this out by making a plot of the counties in Texas, without applying any kind of aesthetic mapping. Remember, the dataset texas_income contains the required geometry information.

Hint
ggplot(texas_income) +
  ___
Solution
ggplot(texas_income) +
  geom_sf()
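To show that geom_sf() needs no x or y aesthetics, here is a self-contained sketch. It builds a tiny sf table from two made-up squares, using hypothetical longitude/latitude coordinates near Texas (EPSG:4326), and plots it with a fill mapping.

```r
library(ggplot2)
library(sf)

# Hypothetical helper: a 1 x 1 degree square polygon with lower-left corner (x0, y0)
sq <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Two made-up features stored in an sf table with a geometry column
toy <- st_sf(
  id = c("a", "b"),
  geometry = st_sfc(sq(-100, 30), sq(-99, 30), crs = 4326)
)

# geom_sf() finds the geometry column on its own and uses its CRS
p <- ggplot(toy) +
  geom_sf(aes(fill = id))
p
```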

Now map the data column median_income to the fill color. Because median income is a continuous variable, choose an appropriate continuous sequential color scale from the colorspace package.

Hint: You can see the available color palettes in the colorspace package documentation.

Hint
ggplot(texas_income) +
  geom_sf(aes(fill = ___)) +
  scale_fill_continuous_sequential(palette = ___)
Solution
ggplot(texas_income) +
  geom_sf(aes(fill = median_income)) +
  scale_fill_continuous_sequential(palette = "Lajolla")
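To compare candidate palettes before choosing one, colorspace can draw swatches of its named HCL palettes. Here is a short sketch. "Lajolla" is the palette from the solution above, and showing 7 colors is an arbitrary choice.

```r
library(colorspace)

# Draw swatches of all named HCL palettes (qualitative, sequential, diverging)
hcl_palettes(plot = TRUE)

# Inspect one sequential palette as a vector of hex colors and plot it
lajolla <- sequential_hcl(7, palette = "Lajolla")
lajolla
swatchplot(lajolla)
```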

Finally, make a plot that highlights the five smallest counties in Texas by area. This will require you to join texas_income and texas_counties first.

Hint
texas_income |>
  left_join(texas_counties) |>
  mutate(
    smallest = rank(area) <= 5
  )
Hint
texas_income |>
  left_join(texas_counties) |>
  mutate(
    smallest = rank(area) <= 5
  ) |>
  ggplot() +
  geom_sf(aes(fill = ___))
Hint
texas_income |>
  left_join(texas_counties) |>
  mutate(
    smallest = rank(area) <= 5
  ) |>
  ggplot() +
  geom_sf(aes(fill = smallest))
Solution
texas_income |>
  left_join(texas_counties) |>
  mutate(
    smallest = rank(area) <= 5
  ) |>
  ggplot() +
  geom_sf(aes(fill = smallest), linewidth = 0.2) +
  scale_fill_manual(
    values = c(
      `TRUE` = "#D55E00",
      `FALSE` = "#E8EEF9"
    )
  )
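There are two details you might refine. First, rank() averages ties by default, so a county tied at the cutoff could get a rank like 5.5 and not be highlighted. Second, the legend shows the raw TRUE/FALSE values. The sketch below uses dplyr's min_rank() and a descriptive category instead, again with sf's North Carolina counties as stand-in data. The labels and colors are illustrative choices, and linewidth assumes ggplot2 3.4.0 or later.

```r
library(tidyverse)
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# min_rank() gives tied counties the lowest shared rank, so ties at the cutoff are kept
nc_flag <- nc |>
  mutate(size_class = if_else(min_rank(AREA) <= 5, "5 smallest", "other"))

p <- ggplot(nc_flag) +
  geom_sf(aes(fill = size_class), linewidth = 0.2) +
  scale_fill_manual(
    name = NULL,  # the category labels are self-explanatory, so drop the legend title
    values = c("5 smallest" = "#D55E00", "other" = "#E8EEF9")
  )
p
```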

Changing the projection

One major benefit of the sf framework is that different map projections are built in and supported out-of-the-box. We can refer to projections by their EPSG codes, and these codes can be looked up on websites such as https://spatialreference.org/ or https://epsg.io/.

We can set the coordinate system via coord_sf(), which takes an argument crs that specifies the Coordinate Reference System (CRS). For example, coord_sf(crs = 3083) will select a Texas Centric Albers Equal Area projection (https://spatialreference.org/ref/epsg/3083/). Try this out.

Hint
ggplot(texas_income) +
  geom_sf() +
  ___
Solution
ggplot(texas_income) +
  geom_sf() +
  coord_sf(crs = 3083)
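You can also apply the projection to the data rather than only to the plot. Here is a sketch using sf's bundled North Carolina counties as stand-in data. st_crs() reports the stored CRS, and st_transform() reprojects the geometry itself, which matters if you later compute areas or distances. EPSG:32119 (NAD83 / North Carolina) plays the role that 3083 plays for Texas. For the worksheet data, the analogous call would be st_transform(texas_income, 3083).

```r
library(ggplot2)
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Which CRS are the data stored in? (this file uses geographic NAD27 coordinates)
st_crs(nc)$epsg

# Option 1: keep the data as they are and project only when drawing
p_coord <- ggplot(nc) + geom_sf() + coord_sf(crs = 32119)

# Option 2: reproject the data; geom_sf() then uses the new CRS automatically
nc_proj <- st_transform(nc, 32119)
p_data <- ggplot(nc_proj) + geom_sf()

# Areas are now computed in the projected units (square meters for EPSG:32119)
head(st_area(nc_proj))
```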

Here are a few other coordinate systems to try, to see how the choice of projection changes the map.

  • EPSG:32139: Texas Centric Lambert Conformal Conic; notice the subtle changes compared to 3083.
  • EPSG:3857: Web Mercator, used for example by Google Maps; it does not preserve areas, which makes it a poor choice for thematic maps such as choropleths.
  • EPSG:3338: Alaska Albers Equal Area; not appropriate for Texas, but it shows more extreme changes in the plot.
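To compare several projections side by side, a small helper that takes the EPSG code as a parameter can save some typing. Here is a sketch. plot_in_crs() is a hypothetical helper, and sf's bundled North Carolina counties stand in for the Texas data. With texas_income loaded, you would call plot_in_crs(texas_income, 32139), and so on.

```r
library(ggplot2)
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Hypothetical helper: draw the same map in a given EPSG projection, titled by its code
plot_in_crs <- function(data, epsg) {
  ggplot(data) +
    geom_sf() +
    coord_sf(crs = epsg) +
    ggtitle(paste0("EPSG:", epsg))
}

# Compare a state-specific projection with Web Mercator and Alaska Albers
plots <- lapply(c(32119, 3857, 3338), function(code) plot_in_crs(nc, code))
for (p in plots) print(p)
```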