Visualizing geospatial data
Introduction
In this worksheet, we will discuss we will discuss how to visualize geospatial data.
First we need to load the required R packages. Please wait a moment until the live R session is fully set up and all packages are loaded.
Next we set up the data.
We will be working with the datasets texas_income
and texas_counties
. The dataset texas_income
contains the median income of all counties in Texas, as well as the shape information about each county (stored in the geometry
column). The column FIPS
contains a five-digit id code that uniquely represents each county.
The dataset texas_counties
holds information about how many people lived in Texas counties in 2010, as well as the size of each county (column area
). The column popratio
is the ratio of the number of inhabitants to the median across all counties. The column FIPS
contains a five-digit id code that uniquely represents each county.
Wrangling data
Before we perform any visualizations, we will first gain some experience manipulating data tables containing geospatial information. This does not require us to learn any new concepts, as data tables with geospatial information (i.e., containing a geometry
column) can be manipulated just like those without.
Let’s try this out. Take the texas_income
table and filter out the rows for the counties “Travis” and “Harris”.
|>
texas_income filter(___)
|>
texas_income filter(county %in% c("Travis", "Harris"))
Now join the texas_income
table with the texas_counties
table and then find the five largest counties.
Hint: Use the function left_join()
to join the tables, and use the functions arrange()
and slice()
to find the five largest counties.
|>
texas_income left_join(___) |>
___
|>
texas_income left_join(texas_counties) |>
arrange(___) |>
slice(___)
|>
texas_income left_join(texas_counties) |>
arrange(desc(area)) |>
slice(1:5)
Visualizing simple features
We can visualize datasets containing simple features with the function geom_sf()
. This geom is very simple to use, as it automatically finds the geometry
column and draws it in the appropriate coordinate system. All we need to think about is whether we want to apply a color mapping, e.g. to make a choropleth.
Try this out by making a plot of the counties in Texas, without applying any kind of aesthetic mapping. Remember, the dataset texas_income
contains the required geometry information.
ggplot(texas_income) +
___
ggplot(texas_income) +
geom_sf()
Now map the data column median_income
to the fill color. Also choose an appropriate color scale from the colorspace package.
Hint: You can see the available color palettes here.
ggplot(texas_income) +
geom_sf(aes(fill = ___)) +
scale_fill_continuous_sequential(palette = ___)
ggplot(texas_income) +
geom_sf(aes(fill = median_income)) +
scale_fill_continuous_sequential(palette = "Lajolla")
Finally, make a plot that highlights the 10 smallest counties in Texas. This will require you to join texas_income
and texas_counties
first.
|>
texas_income left_join(texas_counties) |>
mutate(
smallest = rank(area) <= 5
)
|>
texas_income left_join(texas_counties) |>
mutate(
smallest = rank(area) <= 5
|>
) ggplot() +
geom_sf(aes(fill = ___))
|>
texas_income left_join(texas_counties) |>
mutate(
smallest = rank(area) <= 5
|>
) ggplot() +
geom_sf(aes(fill = smallest))
|>
texas_income left_join(texas_counties) |>
mutate(
smallest = rank(area) <= 5
|>
) ggplot() +
geom_sf(aes(fill = smallest), size = 0.2) +
scale_fill_manual(
values = c(
`TRUE` = "#D55E00",
`FALSE` = "#E8EEF9"
) )
Changing the projection
One major benefit of the sf framework is that different map projections are built in and supported out-of-the-box. We can refer to projections by their EPSG codes, and these codes can be looked up on websites such as https://spatialreference.org/ or https://epsg.io/.
We can set the coordinate system via coord_sf()
, which takes an argument crs
that specifies the Coordinate Reference System (CRS). For example, coord_sf(crs = 3083)
will select a Texas Centric Albers Equal Area projection (https://spatialreference.org/ref/epsg/3083/). Try this out.
ggplot(texas_income) +
geom_sf() +
___
ggplot(texas_income) +
geom_sf() +
coord_sf(crs = 3083)
Here are a few other coordinate systems to try out, to see how different projections affect how the map looks.
- EPSG:32139: Texas Centric Lambert Conformal Conic; notice the subtle changes compared to 3083.
- EPSG:3857: Web Mercator, used e.g. by Google Maps; not a good projection in practice.
- EPSG:3338: Alaska Albers equal area; not appropriate for Texas, but shows more extreme changes in the plot