Visualizing geospatial data

Claus O. Wilke

2025-04-14

Parallels (latitude) and meridians (longitude)

There are many ways to project onto a 2D plane

There are many ways to project onto a 2D plane

Mercator projection: Shapes are preserved, areas are severely distorted

There are many ways to project onto a 2D plane

Goode homolosine: Areas are preserved, shapes are somewhat distorted

Projecting the US

Alaska, Hawaii, and the lower 48 are far apart; difficult to show on one map

Projecting the US

A fair, area-preserving projection

A common visualization. What happened to Alaska?

Alaska and Hawaii were moved closer; Alaska was also reduced in size

A fair visualization of the 50 states

Alaska is the largest state; 2.2 the size of Texas

Choropleth mapping: Coloring areas by a data value

US population density as a choropleth map

Alaska has very low population density

US population density as a choropleth map

Alaska has very low population density

US median income as a choropleth map

A binned color scale can make the map more readable

Choropleth maps can be misleading

Large area of Alaska makes it appear very rich; but remember, it’s mostly empty

A cartogram heatmap may be preferable

Each state is shown as an equally sized square

Maps and layers

Maps show data in a geospatial context

Wind turbines in the San Francisco Bay Area

Maps are composed of several distinct layers

Wind turbines in the San Francisco Bay Area

The concept of aesthetic mappings still applies

Location of individual wind turbines in the Shiloh Wind Farm

Making geospatial visualizations in R

The sf package: Simple Features in R

Artwork by Allison Horst

Getting the data

We’ll be working with the texas_income dataset:

library(sf)  # always load the sf package when working with geospatial data

texas_income <- readRDS(url("https://wilkelab.org/SDS366/datasets/Texas_income.rds"))

texas_income
Simple feature collection with 254 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -106.6456 ymin: 25.83738 xmax: -93.50829 ymax: 36.5007
Geodetic CRS:  NAD83
First 10 features:
    FIPS    county median_income  moe                       geometry
1  48001  Anderson         41327 1842 MULTIPOLYGON (((-96.0648 31...
2  48003   Andrews         70423 6038 MULTIPOLYGON (((-103.0647 3...
3  48005  Angelina         44223 1611 MULTIPOLYGON (((-95.00488 3...
4  48007   Aransas         41690 3678 MULTIPOLYGON (((-96.8229 28...
5  48009    Archer         60275 5182 MULTIPOLYGON (((-98.95382 3...
6  48011 Armstrong         59737 4968 MULTIPOLYGON (((-101.6294 3...
7  48013  Atascosa         52192 3005 MULTIPOLYGON (((-98.80479 2...
8  48015    Austin         53687 3810 MULTIPOLYGON (((-96.62085 3...
9  48017    Bailey         37397 8652 MULTIPOLYGON (((-103.0469 3...
10 48019   Bandera         49863 7193 MULTIPOLYGON (((-99.60332 2...

The sf package: Simple Features in R

# the entire dataset
texas_income
Simple feature collection with 254 features and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -106.6456 ymin: 25.83738 xmax: -93.50829 ymax: 36.5007
Geodetic CRS:  NAD83
First 10 features:
    FIPS    county median_income  moe                       geometry
1  48001  Anderson         41327 1842 MULTIPOLYGON (((-96.0648 31...
2  48003   Andrews         70423 6038 MULTIPOLYGON (((-103.0647 3...
3  48005  Angelina         44223 1611 MULTIPOLYGON (((-95.00488 3...
4  48007   Aransas         41690 3678 MULTIPOLYGON (((-96.8229 28...
5  48009    Archer         60275 5182 MULTIPOLYGON (((-98.95382 3...
6  48011 Armstrong         59737 4968 MULTIPOLYGON (((-101.6294 3...
7  48013  Atascosa         52192 3005 MULTIPOLYGON (((-98.80479 2...
8  48015    Austin         53687 3810 MULTIPOLYGON (((-96.62085 3...
9  48017    Bailey         37397 8652 MULTIPOLYGON (((-103.0469 3...
10 48019   Bandera         49863 7193 MULTIPOLYGON (((-99.60332 2...

The sf package: Simple Features in R

# the column holding geometry information
print(texas_income$geometry)
Geometry set for 254 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -106.6456 ymin: 25.83738 xmax: -93.50829 ymax: 36.5007
Geodetic CRS:  NAD83
First 5 geometries:
MULTIPOLYGON (((-96.0648 31.98066, -96.06305 31...
MULTIPOLYGON (((-103.0647 32.52219, -103.0005 3...
MULTIPOLYGON (((-95.00488 31.42396, -95.00334 3...
MULTIPOLYGON (((-96.8229 28.16743, -96.82127 28...
MULTIPOLYGON (((-98.95382 33.49637, -98.95377 3...

The sf package: Simple Features in R

# data wrangling works as normal
texas_income |>
  filter(county == "Travis")
Simple feature collection with 1 feature and 4 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -98.17298 ymin: 30.0245 xmax: -97.36954 ymax: 30.62825
Geodetic CRS:  NAD83
   FIPS county median_income moe                       geometry
1 48453 Travis         61451 591 MULTIPOLYGON (((-98.15927 3...

ggplot supports simple features with geom_sf()

# plot all of Texas
ggplot(texas_income) + 
  geom_sf()

 

ggplot supports simple features with geom_sf()

# plot only Travis County
texas_income |>
  filter(county == "Travis") |>
  ggplot() + 
  geom_sf()

 

ggplot supports simple features with geom_sf()

# plot the ten richest counties
texas_income |>
  slice_max(median_income, n = 10) |>
  ggplot() + 
  geom_sf()

 

ggplot supports simple features with geom_sf()

# color counties by median income
texas_income |>
  ggplot(aes(fill = median_income)) + 
  geom_sf() +
  scale_fill_continuous_sequential(
    palette = "Blues", rev = TRUE
  )

 

ggplot supports simple features with geom_sf()

# highlight the ten richest counties
texas_income |>
  mutate(
    top_ten = rank(desc(median_income)) <= 10
  ) |>
  ggplot(aes(fill = top_ten)) + 
  geom_sf() +
  scale_fill_manual(
    values = c(
      `TRUE` = "#D55E00",
      `FALSE` = "#E8EEF9"
    )
  )

 

ggplot supports simple features with geom_sf()

# highlight the ten richest counties
texas_income |>
  mutate(
    top_ten = rank(desc(median_income)) <= 10
  ) |>
  ggplot(aes(fill = top_ten)) + 
  geom_sf(color = "black", linewidth = 0.1) +
  scale_fill_manual(
    name = NULL,
    values = c(
      `TRUE` = "#D55E00",
      `FALSE` = "#E8EEF9"
    ),
    breaks = c(TRUE),
    labels = "top-10 median income"
  ) +
  theme_minimal_grid(14)

 

We apply styling as usual

We can customize the projection with coord_sf()

ggplot(texas_income) + 
  geom_sf(
    aes(fill = median_income),
    color = "black", linewidth = 0.1
  ) +
  scale_fill_continuous_sequential(
    palette = "Blues", rev = TRUE
  ) +
  theme_minimal_grid(14)

 

We can customize the projection with coord_sf()

ggplot(texas_income) + 
  geom_sf(
    aes(fill = median_income),
    color = "black", linewidth = 0.1
  ) +
  scale_fill_continuous_sequential(
    palette = "Blues", rev = TRUE
  ) +
  coord_sf() + # added by default with geom_sf()
  theme_minimal_grid(14)

 

We can customize the projection with coord_sf()

ggplot(texas_income) + 
  geom_sf(
    aes(fill = median_income),
    color = "black", linewidth = 0.1
  ) +
  scale_fill_continuous_sequential(
    palette = "Blues", rev = TRUE
  ) +
  coord_sf(
    # Texas Centric Albers Equal Area projection
    crs = 3083
  ) +
  theme_minimal_grid(14)

 

We can customize the projection with coord_sf()

ggplot(texas_income) + 
  geom_sf(
    aes(fill = median_income),
    color = "black", linewidth = 0.1
  ) +
  scale_fill_continuous_sequential(
    palette = "Blues", rev = TRUE
  ) +
  coord_sf(
    # Texas Centric Lambert Conformal Conic projection
    crs = 32139
  ) +
  theme_minimal_grid(14)

 

We can customize the projection with coord_sf()

ggplot(texas_income) + 
  geom_sf(
    aes(fill = median_income),
    color = "black", linewidth = 0.1
  ) +
  scale_fill_continuous_sequential(
    palette = "Blues", rev = TRUE
  ) +
  coord_sf(
    # Web Mercator (Google Maps)
    crs = 3857
  ) +
  theme_minimal_grid(14)

 

We can customize the projection with coord_sf()

ggplot(texas_income) + 
  geom_sf(
    aes(fill = median_income),
    color = "black", linewidth = 0.1
  ) +
  scale_fill_continuous_sequential(
    palette = "Blues", rev = TRUE
  ) +
  coord_sf(
    # Longitude-Latitude WGS84 (GPS)
    crs = 4326
  ) +
  theme_minimal_grid(14)

 

We can customize the projection with coord_sf()

ggplot(texas_income) + 
  geom_sf(
    aes(fill = median_income),
    color = "black", linewidth = 0.1
  ) +
  scale_fill_continuous_sequential(
    palette = "Blues", rev = TRUE
  ) +
  coord_sf(
    # Alaska Albers equal area
    crs = 3338
  ) +
  theme_minimal_grid(14)

 

We can get map data from the rnaturalearth package

library(rnaturalearth)

sf_world <- ne_countries(returnclass='sf')
ggplot(sf_world) + geom_sf()

 

We can get map data from the rnaturalearth package

library(rnaturalearth)

sf_world <- ne_countries(returnclass='sf')
ggplot(sf_world) + geom_sf() + coord_sf(crs = "ESRI:54030") # Robinson projection

 

We can get map data from the rnaturalearth package

A map of the lower 48:

sf_us <- ne_states(
  country = "United States of America",
  returnclass='sf'
)

sf_us |>
  # exclude Alaska (US02), Hawaii (US15)
  filter(!code_local %in% c("US02", "US15")) |>
  ggplot() + geom_sf()

 

We can get map data from the rnaturalearth package

A map of the lower 48:

sf_us <- ne_states(
  country = "United States of America",
  returnclass='sf'
)

sf_us |>
  # exclude Alaska (US02), Hawaii (US15)
  filter(!code_local %in% c("US02", "US15")) |>
  ggplot() + geom_sf() + coord_sf(crs = "ESRI:102003") # US Albers equal area

 

Further reading