Coordinate systems and axes

Claus O. Wilke

2024-12-28

Most data visualizations use Cartesian coordinates

 

Changing units does not change the plot
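
One way to see why: a unit conversion such as Fahrenheit to Celsius is a linear transformation, so every data point keeps the same relative position along the axis and only the tick labels change. A minimal base-R sketch; the temperature values and the helper rel_pos() are made up for illustration.

```r
# Hypothetical temperatures in degrees Fahrenheit (illustrative values only)
temp_f <- c(32, 50, 68, 86, 104)

# Convert to degrees Celsius: a linear change of units
temp_c <- (temp_f - 32) * 5 / 9

# Hypothetical helper: relative position of each value along the axis (0 to 1)
rel_pos <- function(x) (x - min(x)) / (max(x) - min(x))

rel_pos(temp_f)
rel_pos(temp_c)

# The relative positions agree, so the plotted points do not move
all.equal(rel_pos(temp_f), rel_pos(temp_c))
```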

 

 

If scale units are unrelated, aspect ratio is arbitrary

 

Nonlinear scales and coordinate systems

Logarithmic scales (log scales)

Visualize these five values: 1, 3.16, 10, 31.6, 100
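
These values were chosen so that each is about 3.16 (the square root of 10) times the previous one. A short base-R sketch of where they land on a linear scale versus a log scale:

```r
values <- c(1, 3.16, 10, 31.6, 100)

# On a linear scale, the gaps between neighbors grow rapidly
diff(values)

# On a log scale, positions are log10(value): approximately 0, 0.5, 1, 1.5, 2
log_pos <- log10(values)
round(log_pos, 2)

# The gaps between neighbors are now (nearly) equal
round(diff(log_pos), 2)
```

Equal ratios between values become equal distances on a log scale, which is why the five points appear evenly spaced there.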


 

 

 

Example: Population numbers of Texas counties

A linear scale emphasizes large counties

 

Example: Population numbers of Texas counties

A log scale shows symmetry around the median

 

Nonlinear coordinate systems: Polar coordinates
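
In polar coordinates, a point is located by an angle and a radius rather than by x and y. The sketch below converts polar positions back to Cartesian ones. It assumes angles in radians measured clockwise from 12 o'clock, which matches the defaults of coord_polar(); polar_to_xy() is a hypothetical helper, not a ggplot2 function.

```r
# Hypothetical helper: convert (angle, radius) to Cartesian (x, y).
# Assumption: theta is in radians, measured clockwise from 12 o'clock.
polar_to_xy <- function(theta, r) {
  data.frame(x = r * sin(theta), y = r * cos(theta))
}

# Four points at the same radius, a quarter turn apart
theta <- c(0, pi / 2, pi, 3 * pi / 2)
xy <- polar_to_xy(theta, r = 1)
round(xy, 10)  # top, right, bottom, left of the unit circle
```

Because the angular axis wraps around, its end meets its start, which suits periodic data such as days of the year.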

 

 

Cartesian vs polar example


 

 

Scales and coordinate systems in ggplot2

Getting the data

The boxoffice dataset:

library(tidyverse)  # provides tibble(), read_csv(), the dplyr verbs, forcats, and ggplot2

boxoffice <- tibble(
  rank = 1:5,
  title = c("Star Wars", "Jumanji", "Pitch Perfect 3", "Greatest Showman", "Ferdinand"),
  amount = c(71.57, 36.17, 19.93, 8.81, 7.32) # million USD
)

The tx_counties dataset:

tx_counties <- read_csv("https://wilkelab.org/SDS375/datasets/US_census.csv") |> 
  filter(state == "Texas") |>
  mutate(popratio = pop2010/median(pop2010)) |>
  arrange(desc(popratio)) |>
  mutate(index = 1:n())

The temperatures dataset:

temperatures <- read_csv("https://wilkelab.org/SDS375/datasets/tempnormals.csv") |>
  mutate(
    location = factor(
      location, levels = c("Death Valley", "Houston", "San Diego", "Chicago")
    )
  ) |>
  select(location, station_id, day_of_year, month, temperature)

Scale functions customize the x and y axes

Recall the box-office example from a prior lecture:

ggplot(boxoffice) +
  aes(amount, fct_reorder(title, amount)) +
  geom_col()

Scale functions customize the x and y axes

Add the default scale functions explicitly (the figure does not change yet):

ggplot(boxoffice) +
  aes(amount, fct_reorder(title, amount)) +
  geom_col() +
  scale_x_continuous() +
  scale_y_discrete()

Scale functions customize the x and y axes

The name parameter sets the axis title:

ggplot(boxoffice) +
  aes(amount, fct_reorder(title, amount)) +
  geom_col() +
  scale_x_continuous(
    name = "weekend gross (million USD)"
  ) +
  scale_y_discrete(
    name = NULL  # no axis title
  )

Note: We could do the same with xlab() and ylab()
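
A sketch of that alternative, rebuilding the boxoffice data (without the rank column) so the block runs on its own:

```r
library(tidyverse)

boxoffice <- tibble(
  title = c("Star Wars", "Jumanji", "Pitch Perfect 3", "Greatest Showman", "Ferdinand"),
  amount = c(71.57, 36.17, 19.93, 8.81, 7.32) # million USD
)

p <- ggplot(boxoffice) +
  aes(amount, fct_reorder(title, amount)) +
  geom_col() +
  xlab("weekend gross (million USD)") +  # same axis title as name = ... above
  ylab(NULL)                             # no axis title
p
```

The shortcuts are convenient when only the title changes; the scale functions are the better fit once limits, breaks, or labels are set as well, since everything for one axis then lives in one place.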

Scale functions customize the x and y axes

The limits parameter sets the scale limits:

ggplot(boxoffice) +
  aes(amount, fct_reorder(title, amount)) +
  geom_col() +
  scale_x_continuous(
    name = "weekend gross (million USD)",
    limits = c(0, 80)
  ) +
  scale_y_discrete(
    name = NULL
  )

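Note: We could do the same with xlim() and ylim(), but I advise against it because these functions can have unexpected side effects. For example, each one adds a new scale, which replaces any scale for that axis already added to the plot, along with its name, breaks, and labels.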
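
One behavior of scale limits is worth knowing regardless of how they are set: by default, continuous scales treat values outside the limits as missing rather than clipping them. A small sketch using scales::censor(), the default out-of-bounds handler for continuous scales; the upper limit of 50 is chosen here only to trigger the effect.

```r
library(scales)

amount <- c(71.57, 36.17, 19.93, 8.81, 7.32)  # million USD

# Values outside the limits are replaced by NA
too_narrow <- censor(amount, range = c(0, 50))
too_narrow

# With limits that enclose all data, as in c(0, 80), nothing is lost
wide_enough <- censor(amount, range = c(0, 80))
wide_enough
```

In a bar plot, a censored value means the bar is dropped with a warning, so scale limits should enclose the data. To zoom in without dropping data, coord_cartesian(xlim = ...) is the usual alternative.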

Scale functions customize the x and y axes

The breaks parameter sets the axis tick positions:

ggplot(boxoffice) +
  aes(amount, fct_reorder(title, amount)) +
  geom_col() +
  scale_x_continuous(
    name = "weekend gross (million USD)",
    limits = c(0, 80),
    breaks = c(0, 25, 50, 75)
  ) +
  scale_y_discrete(
    name = NULL
  )

Scale functions customize the x and y axes

The labels parameter sets the axis tick labels (one label per break):

ggplot(boxoffice) +
  aes(amount, fct_reorder(title, amount)) +
  geom_col() +
  scale_x_continuous(
    name = "weekend gross (million USD)",
    limits = c(0, 80),
    breaks = c(0, 25, 50, 75),
    labels = c("0", "$25M", "$50M", "$75M")
  ) +
  scale_y_discrete(
    name = NULL
  )
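
The labels argument also accepts a function that is applied to the breaks, which avoids typing each label by hand. A minimal sketch; label_musd() is a hypothetical helper written for this example, not part of ggplot2.

```r
# Hypothetical labeling function: "0" for zero, "$<value>M" otherwise
label_musd <- function(x) ifelse(x == 0, "0", paste0("$", x, "M"))

label_musd(c(0, 25, 50, 75))
# "0" "$25M" "$50M" "$75M"
```

Passing labels = label_musd to scale_x_continuous() should give the same tick labels as the hand-written vector, and the labels stay correct if the breaks are changed later.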

Scale functions customize the x and y axes

The expand parameter sets the axis expansion, the padding added beyond the scale limits:

ggplot(boxoffice) +
  aes(amount, fct_reorder(title, amount)) +
  geom_col() +
  scale_x_continuous(
    name = "weekend gross (million USD)",
    limits = c(0, 80),
    breaks = c(0, 25, 50, 75),
    labels = c("0", "$25M", "$50M", "$75M"),
    expand = expansion(mult = c(0, 0.06))
  ) +
  scale_y_discrete(
    name = NULL
  )
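
Here expansion(mult = c(0, 0.06)) requests padding as a fraction of the scale range: none at the lower end and 6% at the upper end. The resulting axis extent can be worked out by hand; the variable names below are chosen for this sketch only.

```r
limits <- c(0, 80)
mult   <- c(0, 0.06)         # lower and upper expansion factors

scale_range <- diff(limits)  # 80

# Axis extent after expansion
lower <- limits[1] - mult[1] * scale_range
upper <- limits[2] + mult[2] * scale_range
c(lower, upper)
# 0.0 84.8
```

The bars therefore start directly at the axis line, with a small gap kept on the right. For comparison, continuous scales expand by 5% on both sides by default.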

Scale functions define transformations

Linear y scale:

ggplot(tx_counties) +
  aes(x = index, y = popratio) +
  geom_point() +
  scale_y_continuous() 

 

Log y scale:

ggplot(tx_counties) +
  aes(x = index, y = popratio) +
  geom_point() +
  scale_y_log10()

 

Parameters work the same for all scale functions

Linear y scale:

ggplot(tx_counties) +
  aes(x = index, y = popratio) +
  geom_point() +
  scale_y_continuous(
    name = "population number / median",
    breaks = c(0, 100, 200),
    labels = c("0", "100", "200")
  )

 

Log y scale:

ggplot(tx_counties) +
  aes(x = index, y = popratio) +
  geom_point() +
  scale_y_log10(
    name = "population number / median",
    breaks = c(0.01, 1, 100),
    labels = c("0.01", "1", "100")
  )
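
The parameters are the same, but sensible values differ between the two scales. In particular, 0 cannot serve as a break on a log scale because its logarithm is not finite. A quick base-R check of where each set of breaks would fall after a log10 transformation:

```r
# Breaks used on the linear scale, transformed to log10 positions
linear_breaks <- log10(c(0, 100, 200))
linear_breaks  # -Inf, 2, and about 2.3

# Breaks used on the log scale are evenly spaced after transformation
log_breaks <- log10(c(0.01, 1, 100))
log_breaks     # -2, 0, 2
```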

 

Coords define the coordinate system

ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
  geom_line() +
  coord_cartesian()  # Cartesian coords are the default

 

Coords define the coordinate system

ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
  geom_line() +
  coord_polar()   # polar coords

 

Coords define the coordinate system

ggplot(temperatures, aes(day_of_year, temperature, color = location)) +
  geom_line() +
  coord_polar() +
  scale_y_continuous(limits = c(0, 105))  # fix up temperature limits

 

Further reading