This is the dataset you will be working with:

library(tidyverse)  # provides %>%, filter(), mutate(), case_when(), and readr

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

triathlon <- olympics %>% 
  filter(!is.na(height)) %>%             # only keep athletes with known height
  filter(sport == "Triathlon") %>%       # keep only triathletes
  mutate(
    medalist = case_when(                # add column to track medalist vs not
      is.na(medal) ~ "non-medalist",
      !is.na(medal) ~ "medalist"         # any medals (Gold, Silver, Bronze) count
    )
  )

triathlon is a subset of olympics and contains only the data for triathletes. More information about the original olympics dataset can be found at https://github.com/rfordatascience/tidytuesday/tree/master/data/2021/2021-07-27/readme.md and https://www.sports-reference.com/olympics.html.
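
Before starting on the questions, it can help to confirm that the filtering and the medalist column behave as intended. The following is a minimal sketch that runs the same steps on a small made-up data frame; toy_olympics, toy_triathlon, and all values in them are invented for illustration and are not part of the real dataset. It uses if_else() as a shorter stand-in for the two-branch case_when() above.

```r
library(dplyr)

# Hypothetical stand-in for olympics; every value here is invented
toy_olympics <- data.frame(
  name   = c("A", "B", "C", "D", "E"),
  sex    = c("F", "M", "F", "M", "F"),
  height = c(168, 180, NA, 175, 170),
  sport  = c("Triathlon", "Triathlon", "Triathlon", "Triathlon", "Swimming"),
  medal  = c("Gold", NA, "Silver", NA, NA),
  stringsAsFactors = FALSE
)

# Same logic as the triathlon pipeline: known height, triathletes only,
# then label each row by whether a medal is recorded
toy_triathlon <- toy_olympics %>%
  filter(!is.na(height), sport == "Triathlon") %>%
  mutate(medalist = if_else(is.na(medal), "non-medalist", "medalist"))

# Tabulate rows by sex and medalist status as a quick sanity check
toy_counts <- toy_triathlon %>% count(sex, medalist)
print(toy_counts)
```

Running the same count(sex, medalist) call on the real triathlon data frame gives a quick overview of how many rows fall into each group.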

For this project, use triathlon to answer the following questions about athletes competing in this sport:

  1. In how many events total did male and female triathletes compete for each country?
  2. Are there height differences among triathletes between sexes or over time?
  3. Are there height differences between triathletes who have medaled and those who have not, again taking athlete sex into account?

You should make one plot per question.
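
As one possible starting point for a count-based question such as the first, the sketch below tallies rows per country and sex and draws a dodged bar chart. It is only an illustration under stated assumptions: toy_triathlon and its values are invented, and using the team column to identify the country is an assumption (the real data may be better served by a different column). Other plot types may suit the questions equally well.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for triathlon; values are invented for illustration.
# Each row represents one athlete's entry in one event.
toy_triathlon <- data.frame(
  team = c("Australia", "Australia", "Canada", "Canada", "Canada"),
  sex  = c("F", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Count rows per country and sex; the result column is named n_entries
entries <- toy_triathlon %>% count(team, sex, name = "n_entries")

# Horizontal dodged bars: one bar per sex within each country
p <- ggplot(entries, aes(x = n_entries, y = team, fill = sex)) +
  geom_col(position = "dodge")

print(entries)
```

Printing p (or calling print(p)) renders the chart; with the real data, ordering the countries by count usually makes the plot easier to read.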

Hints:

You can delete these instructions from your project. Please also delete placeholder text such as "Your approach here" or "# Q1: Your R code here".

Introduction: Your introduction here.

Approach: Your approach here.

Analysis:

# Q1: Your R code here
# Q2: Your R code here
# Q3: Your R code here

Discussion: Your discussion of results here.