vignettes/misc-geoms-stats.Rmd
The package provides a few miscellaneous geoms and stats that can be generally helpful when generating uncertainty visualizations.
When showing an individual value, such as a mean, we may want to show it as a vertical or horizontal line instead of a point. For this purpose, ungeviz provides horizontal and vertical point lines (plines), through the two geoms geom_hpline() and geom_vpline(). These geoms behave like points but can be styled like lines. They have the useful advantage over geom_segment() that they work with both continuous and discrete position scales.
library(ggplot2)
library(ungeviz)
ggplot(iris, aes(Species, Sepal.Length)) +
  geom_point(position = position_jitter(width = 0.3, height = 0), size = 0.5) +
  geom_hpline(aes(colour = Species), stat = "summary", width = 0.6, size = 1.5)
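To see which values the horizontal lines in this plot mark, the following base-R sketch computes the per-species means directly. It assumes that stat = "summary" is used without an explicit summary function, in which case ggplot2 falls back to mean_se() and the line is placed at the group mean. The variable name species_means is only illustrative.

```r
# Per-species means of sepal length. Under the assumption stated above,
# these are the y positions at which the plines are drawn.
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)
species_means
```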
Plines have a length and a thickness. The length is specified as width for the horizontal variant and as height for the vertical variant. The thickness is always specified as size. Note that width and height specify the total length of the pline, whereas the equally named parameters in position_jitter() specify the extent of the jitter in each direction from the original position. Therefore, in the above example, we use width = 0.3 for position_jitter() and width = 0.6 for geom_hpline().
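The relationship between the two width values can be checked with a little arithmetic. The sketch below uses base R only. The category position x_center = 1 and all variable names are illustrative assumptions, not part of ungeviz or ggplot2.

```r
# Jitter width is the maximum displacement to EITHER side of the position,
# while the pline width is the TOTAL length of the line.
jitter_width <- 0.3
pline_width  <- 2 * jitter_width   # 0.6, as used for geom_hpline() above

x_center <- 1  # assumed position of one discrete category on the x axis

# Horizontal range that jittered points can occupy
jitter_range <- c(x_center - jitter_width, x_center + jitter_width)

# Horizontal range covered by the pline, which is centered on the position
pline_range <- c(x_center - pline_width / 2, x_center + pline_width / 2)

# The two ranges coincide, so the line spans the jittered point cloud
isTRUE(all.equal(jitter_range, pline_range))
```

Choosing the pline width as twice the jitter width is what makes the summary line cover the full band of jittered points.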
Plines can be combined with error bars to show parameter estimates and their uncertainties in a regression model.
library(dplyr)
library(forcats)
library(broom)
library(emmeans)
cacao_lumped <- cacao %>%
  mutate(
    location = fct_lump(location, n = 10)
  )
cacao_means <- lm(rating ~ location, data = cacao_lumped) %>%
  emmeans("location") %>%
  tidy() %>%
  mutate(location = fct_reorder(location, estimate))
ggplot(cacao_means, aes(x = estimate, y = location)) +
  geom_errorbarh(aes(xmin = estimate - std.error, xmax = estimate + std.error), height = 0.3) +
  geom_vpline(aes(x = estimate), size = 1.5, height = 0.7, color = "#D55E00") +
  xlim(2.8, 3.6) +
  theme_minimal()
Sometimes we may want to visualize uncertainty distributions as colored bands that fade out (called confidence strips). Confidence strips can be generated with stat_confidence_density(). This stat takes as input a mean value (mapped to x), a margin of error (mapped to moe), and a confidence level corresponding to the margin of error (mapped to confidence). It uses these inputs to calculate the corresponding density function for a normal distribution, which it provides as a new variable, stat(density). A second variable, stat(ndensity), is scaled so that its maximum value is one. stat_confidence_density() automatically maps the normalized density to the alpha aesthetic, so that we can draw confidence strips simply by specifying a fill color.
ggplot(cacao_means, aes(x = estimate, y = location)) +
  stat_confidence_density(aes(moe = std.error), confidence = 0.68, fill = "#81A7D6", height = 0.7) +
  geom_errorbarh(aes(xmin = estimate - std.error, xmax = estimate + std.error), height = 0.3) +
  geom_vpline(aes(x = estimate), size = 1.5, height = 0.7, color = "#D55E00") +
  xlim(2.8, 3.6) +
  theme_minimal()
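The calculation that turns a mean, a margin of error, and a confidence level into a density can be sketched in base R. This is an illustration of the description above, not the code ungeviz actually runs. The helper name confidence_density_sketch() and the input values (mean 3.2, margin of error 0.05) are hypothetical.

```r
# Hypothetical helper, not part of ungeviz: converts a margin of error at a
# given confidence level into a normal density evaluated at the points in x.
confidence_density_sketch <- function(x, mean, moe, confidence = 0.95) {
  # For a normal distribution, moe = z * sd, where z is the two-sided
  # quantile belonging to the confidence level.
  z <- qnorm(1 - (1 - confidence) / 2)
  sd <- moe / z
  density <- dnorm(x, mean = mean, sd = sd)
  data.frame(
    x = x,
    density = density,
    ndensity = density / max(density)  # rescaled so the maximum is one
  )
}

# Illustrative inputs (assumed values, not taken from the cacao data)
x_grid <- seq(2.8, 3.6, length.out = 201)
strip <- confidence_density_sketch(x_grid, mean = 3.2, moe = 0.05, confidence = 0.68)
head(strip)
```

At a confidence level of 0.68 the quantile z is close to one, so the margin of error is approximately one standard deviation. This is why the standard error can be passed as moe together with confidence = 0.68 in the plot above.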
There is one important caveat for stat_confidence_density(): It will generally not work for non-linear or transformed scales, because it uses the margin of error to fit a normal distribution and the margin of error is not modified by these scale transformations. I would recommend against using stat_confidence_density() with anything but a standard linear scale.
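A small numeric sketch shows why the caveat matters. It assumes a log10-transformed x scale and made-up values (estimate 10, margin of error 5). ggplot2 transforms position aesthetics such as x before a stat is computed, whereas moe is passed through unchanged. All variable names are illustrative.

```r
estimate <- 10   # assumed value on the original data scale
moe      <- 5    # assumed margin of error on the original data scale

# What the stat would receive on a log10 scale: x is transformed, moe is not
x_transformed <- log10(estimate)

# Interval implied by combining the transformed x with the untransformed moe
naive_interval <- c(x_transformed - moe, x_transformed + moe)

# Interval we actually meant, expressed in log10 units
intended_interval <- log10(c(estimate - moe, estimate + moe))

naive_interval      # far too wide: spans ten orders of magnitude
intended_interval   # narrow, and not symmetric around log10(estimate)
```

The mismatch has two parts: the untransformed margin of error is on the wrong scale, and even a correctly transformed interval is no longer symmetric, so a normal density fitted in transformed coordinates would not represent it.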
We can also combine stat_confidence_density() with other geoms. For example, we can draw the confidence distributions as ridgelines.
library(ggridges)
ggplot(cacao_means, aes(x = estimate, y = location)) +
  stat_confidence_density(
    aes(moe = std.error, height = stat(density)), geom = "ridgeline",
    confidence = 0.68, fill = "#81A7D6", alpha = 0.8, scale = 0.08,
    min_height = 0.1
  ) +
  geom_vpline(aes(x = estimate), size = 1.5, height = 0.5, color = "#D55E00") +
  xlim(2.8, 3.6) +
  theme_minimal()