Lab Worksheet 4

Part 1: The dplyr pipe

The following questions in Part 1 are from Lab Worksheet 3. Answer these questions again, but this time use the dplyr pipe (%>%) in your answer.

Problem 1: In an in-class exercise, we made the following plot of the Sitka dataset:

# load the packages used throughout this worksheet
library(dplyr)
library(ggplot2)

# download the sitka data set:
sitka <- read.csv("http://wilkelab.org/classes/SDS348/data_sets/sitka.csv")
head(sitka)
##   size Time tree treat
## 1 4.51  152    1 ozone
## 2 4.98  174    1 ozone
## 3 5.41  201    1 ozone
## 4 5.90  227    1 ozone
## 5 6.15  258    1 ozone
## 6 4.24  152    2 ozone
ggplot(sitka, aes(x = Time, y = size, group = tree)) +
  geom_line() +
  facet_wrap(~treat)

Now modify the plot so that the line for each tree is colored according to the maximum size of the tree.

sitka_new <-
  sitka %>%
  group_by(tree) %>%
  mutate(max_size = max(size))

ggplot(sitka_new, aes(x = Time, y = size, group = tree, color = max_size)) +
  geom_line() +
  facet_wrap(~treat)
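
The key step above is calling mutate() on grouped data: every row is kept and the per-tree maximum is repeated on each of that tree's rows, which is what lets ggplot color an entire line by a single value. The following is a minimal sketch on a made-up data set (toy and its values are hypothetical, not taken from sitka) that contrasts this with summarize(), which would collapse each tree to a single row.

```r
library(dplyr)

# Hypothetical toy data: two trees measured at three time points each
toy <- data.frame(
  tree = c(1, 1, 1, 2, 2, 2),
  Time = c(152, 174, 201, 152, 174, 201),
  size = c(4.5, 5.0, 5.4, 4.2, 4.3, 4.7)
)

# mutate() after group_by() keeps every row and repeats the group maximum
toy_mutated <- toy %>%
  group_by(tree) %>%
  mutate(max_size = max(size)) %>%
  ungroup()

# summarize() after group_by() collapses each group to a single row
toy_summarized <- toy %>%
  group_by(tree) %>%
  summarize(max_size = max(size))

nrow(toy_mutated)     # one row per measurement
nrow(toy_summarized)  # one row per tree
```

Because the plot needs one row per time point, the mutate() version is the one that can be passed to ggplot directly.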

Problem 2: The package nycflights13 contains information about all flights departing from one of the NY City airports in 2013. In particular, the data table flights lists on-time departure and arrival information for 336,776 individual flights:

library(nycflights13)
flights
## # A tibble: 336,776 x 19
##     year month   day dep_time sched_dep_time dep_delay arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>
##  1  2013     1     1      517            515         2      830
##  2  2013     1     1      533            529         4      850
##  3  2013     1     1      542            540         2      923
##  4  2013     1     1      544            545        -1     1004
##  5  2013     1     1      554            600        -6      812
##  6  2013     1     1      554            558        -4      740
##  7  2013     1     1      555            600        -5      913
##  8  2013     1     1      557            600        -3      709
##  9  2013     1     1      557            600        -3      838
## 10  2013     1     1      558            600        -2      753
## # ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
## #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dttm>

We would like to collect some information about arrival delays of United Airlines (UA) flights. Do the following: pick all UA departures with non-zero arrival delay and calculate the mean arrival delay for each of the corresponding flight numbers. Which flight had the longest mean arrival delay and how long was that delay?

flights %>%
  filter(carrier == "UA" & arr_delay != 0) %>%
  group_by(flight) %>%
  summarize(mean_delay = mean(arr_delay)) %>%
  filter(mean_delay == max(mean_delay))
## # A tibble: 1 x 2
##   flight mean_delay
##    <int>      <dbl>
## 1   1510        283

UA flight 1510 had the longest mean arrival delay, at 283 minutes.
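
To see what each step of this pipeline does, it can help to run the same sequence on a table small enough to check by hand. The sketch below uses an invented stand-in for flights (toy_flights and all of its values are hypothetical) and picks the top row with arrange(desc(...)) and head(1), which is one possible alternative to filtering on max(). Note that a row with a missing arr_delay is dropped by the filter as well, because NA != 0 evaluates to NA.

```r
library(dplyr)

# Hypothetical stand-in for the flights table (values invented for illustration)
toy_flights <- data.frame(
  carrier   = c("UA", "UA", "UA", "UA", "AA", "UA"),
  flight    = c(10, 10, 20, 20, 30, 20),
  arr_delay = c(5, 15, 0, 40, 100, NA),
  stringsAsFactors = FALSE
)

worst <- toy_flights %>%
  filter(carrier == "UA" & arr_delay != 0) %>%  # drops AA, the zero delay, and the NA row
  group_by(flight) %>%
  summarize(mean_delay = mean(arr_delay)) %>%   # flight 10: mean of 5 and 15; flight 20: 40
  arrange(desc(mean_delay)) %>%                 # largest mean delay first
  head(1)

worst
```

Unlike filter(mean_delay == max(mean_delay)), this variant always returns exactly one row, so ties would be broken by row order rather than reported together.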

Part 2: Combining data-frames with dplyr

Problem 1: Invent two simple data sets that allow you to explain the difference between the dplyr functions left_join() and inner_join(). Explain which features of your data sets affect the behavior of these two functions.

# data set 1
d1 <- read.table(text = "
label number1
A 1
B 2
C 4", header = TRUE)

# data set 2
d2 <- read.table(text = "
label number2
A 2
C 4
D 6
", header = TRUE)

d1
##   label number1
## 1     A       1
## 2     B       2
## 3     C       4
d2
##   label number2
## 1     A       2
## 2     C       4
## 3     D       6
left_join(d1, d2)
## Joining, by = "label"
## Warning: Column `label` joining factors with different levels, coercing to
## character vector
##   label number1 number2
## 1     A       1       2
## 2     B       2      NA
## 3     C       4       4
inner_join(d1, d2)
## Joining, by = "label"
## Warning: Column `label` joining factors with different levels, coercing to
## character vector
##   label number1 number2
## 1     A       1       2
## 2     C       4       4

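We join the two data sets by label, the only column they share. left_join() keeps every row of the first data set and adds the matching values from the second; where there is no match, it fills in NA. Therefore, label B appears in the result with NA for number2, and label D, which exists only in the second data set, is dropped. By contrast, inner_join() keeps only the rows whose label exists in both data sets. Therefore, the resulting table has only two rows, one for label A and one for label C. The feature of the data sets that drives this difference is that each contains a label the other lacks (B in the first, D in the second).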
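
The warnings in the output above arise because read.table() stored label as a factor in the R version used here. As a hedged sketch, the same comparison can be run with character labels and an explicit by argument, which avoids both the warning and the "Joining, by" message. The names d1_chr and d2_chr are introduced here only for illustration.

```r
library(dplyr)

# Same toy tables as above, but built with character labels
# (stringsAsFactors = FALSE) so that no factor coercion is needed
d1_chr <- data.frame(label = c("A", "B", "C"), number1 = c(1, 2, 4),
                     stringsAsFactors = FALSE)
d2_chr <- data.frame(label = c("A", "C", "D"), number2 = c(2, 4, 6),
                     stringsAsFactors = FALSE)

# Naming the key column explicitly documents which column is matched
left  <- left_join(d1_chr, d2_chr, by = "label")
inner <- inner_join(d1_chr, d2_chr, by = "label")

left$label           # every label of d1_chr is kept
is.na(left$number2)  # TRUE only for B, which has no partner in d2_chr
inner$label          # only the labels present in both tables
```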

Problem 2: I have split the sitka data set into two data-frames. First, look up the documentation for the bind_rows function. What does bind_rows do? Next, use bind_rows to combine sitka1 and sitka2 back into a single data-frame.

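The bind_rows function stacks the rows of one data-frame underneath those of another. Columns are matched by name, and any column missing from one of the data-frames is filled with NA. Here sitka1 and sitka2 have identical columns, so the result simply has the rows of sitka1 followed by the rows of sitka2.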
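
As a small illustration of how bind_rows() treats columns, the sketch below combines two made-up data-frames (a and b are hypothetical and unrelated to sitka) whose columns appear in a different order and only partly overlap.

```r
library(dplyr)

# Hypothetical toy data-frames whose columns only partly overlap
a <- data.frame(size = c(4.5, 5.0), tree = c(1, 1))
b <- data.frame(tree = c(2, 2), size = c(4.2, 4.3),
                treat = c("ozone", "ozone"), stringsAsFactors = FALSE)

combined <- bind_rows(a, b)

names(combined)         # columns are matched by name, not by position
is.na(combined$treat)   # rows that came from a have no treat value
```

For sitka1 and sitka2 none of this matters, since both pieces come from the same data-frame and therefore share all column names.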

sitka1 <- sitka[1:100, ]
sitka2 <- sitka[101:395, ]
head(sitka1)
##   size Time tree treat
## 1 4.51  152    1 ozone
## 2 4.98  174    1 ozone
## 3 5.41  201    1 ozone
## 4 5.90  227    1 ozone
## 5 6.15  258    1 ozone
## 6 4.24  152    2 ozone
head(sitka2)
##     size Time tree treat
## 101 4.04  152   21 ozone
## 102 4.64  174   21 ozone
## 103 4.86  201   21 ozone
## 104 5.09  227   21 ozone
## 105 5.25  258   21 ozone
## 106 3.53  152   22 ozone
sitka_combined <- bind_rows(sitka1, sitka2)
sitka_combined
##     size Time tree   treat
## 1   4.51  152    1   ozone
## 2   4.98  174    1   ozone
## 3   5.41  201    1   ozone
## 4   5.90  227    1   ozone
## 5   6.15  258    1   ozone
## 6   4.24  152    2   ozone
## 7   4.20  174    2   ozone
## 8   4.68  201    2   ozone
## 9   4.92  227    2   ozone
## 10  4.96  258    2   ozone
## 11  3.98  152    3   ozone
## 12  4.36  174    3   ozone
## 13  4.79  201    3   ozone
## 14  4.99  227    3   ozone
## 15  5.03  258    3   ozone
## 16  4.36  152    4   ozone
## 17  4.77  174    4   ozone
## 18  5.10  201    4   ozone
## 19  5.30  227    4   ozone
## 20  5.36  258    4   ozone
## 21  4.34  152    5   ozone
## 22  4.95  174    5   ozone
## 23  5.42  201    5   ozone
## 24  5.97  227    5   ozone
## 25  6.28  258    5   ozone
## 26  4.59  152    6   ozone
## 27  5.08  174    6   ozone
## 28  5.36  201    6   ozone
## 29  5.76  227    6   ozone
## 30  6.00  258    6   ozone
## 31  4.41  152    7   ozone
## 32  4.56  174    7   ozone
## 33  4.95  201    7   ozone
## 34  5.23  227    7   ozone
## 35  5.33  258    7   ozone
## 36  4.24  152    8   ozone
## 37  4.64  174    8   ozone
## 38  4.95  201    8   ozone
## 39  5.38  227    8   ozone
## 40  5.48  258    8   ozone
## 41  4.82  152    9   ozone
## 42  5.17  174    9   ozone
## 43  5.76  201    9   ozone
## 44  6.12  227    9   ozone
## 45  6.24  258    9   ozone
## 46  3.84  152   10   ozone
## 47  4.17  174   10   ozone
## 48  4.67  201   10   ozone
## 49  4.67  227   10   ozone
## 50  4.80  258   10   ozone
## 51  4.07  152   11   ozone
## 52  4.31  174   11   ozone
## 53  4.90  201   11   ozone
## 54  5.10  227   11   ozone
## 55  5.10  258   11   ozone
## 56  4.28  152   12   ozone
## 57  4.80  174   12   ozone
## 58  5.27  201   12   ozone
## 59  5.55  227   12   ozone
## 60  5.65  258   12   ozone
## 61  4.47  152   13   ozone
## 62  4.89  174   13   ozone
## 63  5.23  201   13   ozone
## 64  5.55  227   13   ozone
## 65  5.74  258   13   ozone
## 66  4.46  152   14   ozone
## 67  4.84  174   14   ozone
## 68  5.11  201   14   ozone
## 69  5.34  227   14   ozone
## 70  5.46  258   14   ozone
## 71  4.60  152   15   ozone
## 72  4.08  174   15   ozone
## 73  4.17  201   15   ozone
## 74  4.35  227   15   ozone
## 75  4.59  258   15   ozone
## 76  3.73  152   16   ozone
## 77  4.15  174   16   ozone
## 78  4.61  201   16   ozone
## 79  4.87  227   16   ozone
## 80  4.93  258   16   ozone
## 81  4.67  152   17   ozone
## 82  4.88  174   17   ozone
## 83  5.18  201   17   ozone
## 84  5.34  227   17   ozone
## 85  5.49  258   17   ozone
## 86  2.96  152   18   ozone
## 87  3.47  174   18   ozone
## 88  3.76  201   18   ozone
## 89  3.89  227   18   ozone
## 90  4.30  258   18   ozone
## 91  3.24  152   19   ozone
## 92  3.93  174   19   ozone
## 93  4.76  201   19   ozone
## 94  4.62  227   19   ozone
## 95  4.64  258   19   ozone
## 96  4.36  152   20   ozone
## 97  4.77  174   20   ozone
## 98  5.02  201   20   ozone
## 99  5.26  227   20   ozone
## 100 5.45  258   20   ozone
## 101 4.04  152   21   ozone
## 102 4.64  174   21   ozone
## 103 4.86  201   21   ozone
## 104 5.09  227   21   ozone
## 105 5.25  258   21   ozone
## 106 3.53  152   22   ozone
## 107 4.25  174   22   ozone
## 108 4.68  201   22   ozone
## 109 4.97  227   22   ozone
## 110 5.18  258   22   ozone
## 111 4.22  152   23   ozone
## 112 4.69  174   23   ozone
## 113 5.07  201   23   ozone
## 114 5.37  227   23   ozone
## 115 5.58  258   23   ozone
## 116 2.79  152   24   ozone
## 117 3.10  174   24   ozone
## 118 3.30  201   24   ozone
## 119 3.38  227   24   ozone
## 120 3.55  258   24   ozone
## 121 3.30  152   25   ozone
## 122 3.90  174   25   ozone
## 123 4.34  201   25   ozone
## 124 4.96  227   25   ozone
## 125 5.40  258   25   ozone
## 126 3.34  152   26   ozone
## 127 3.81  174   26   ozone
## 128 4.21  201   26   ozone
## 129 4.54  227   26   ozone
## 130 4.86  258   26   ozone
## 131 3.76  152   27   ozone
## 132 4.36  174   27   ozone
## 133 4.70  201   27   ozone
## 134 5.44  227   27   ozone
## 135 5.32  258   27   ozone
## 136 4.49  152   28   ozone
## 137 4.76  174   28   ozone
## 138 5.15  201   28   ozone
## 139 5.37  227   28   ozone
## 140 5.56  258   28   ozone
## 141 4.88  152   29   ozone
## 142 5.14  174   29   ozone
## 143 5.52  201   29   ozone
## 144 6.08  227   29   ozone
## 145 6.17  258   29   ozone
## 146 4.88  152   30   ozone
## 147 5.32  174   30   ozone
## 148 5.63  201   30   ozone
## 149 5.75  227   30   ozone
## 150 5.94  258   30   ozone
## 151 3.80  152   31   ozone
## 152 4.16  174   31   ozone
## 153 4.45  201   31   ozone
## 154 4.89  227   31   ozone
## 155 5.05  258   31   ozone
## 156 4.46  152   32   ozone
## 157 4.62  174   32   ozone
## 158 5.00  201   32   ozone
## 159 5.40  227   32   ozone
## 160 5.49  258   32   ozone
## 161 4.29  152   33   ozone
## 162 4.82  174   33   ozone
## 163 5.32  201   33   ozone
## 164 5.46  227   33   ozone
## 165 5.50  258   33   ozone
## 166 4.06  152   34   ozone
## 167 4.58  174   34   ozone
## 168 4.81  201   34   ozone
## 169 5.12  227   34   ozone
## 170 5.27  258   34   ozone
## 171 5.16  152   35   ozone
## 172 5.43  174   35   ozone
## 173 5.71  201   35   ozone
## 174 6.08  227   35   ozone
## 175 6.21  258   35   ozone
## 176 3.81  152   36   ozone
## 177 4.12  174   36   ozone
## 178 4.42  201   36   ozone
## 179 4.62  227   36   ozone
## 180 4.60  258   36   ozone
## 181 5.09  152   37   ozone
## 182 5.62  174   37   ozone
## 183 5.90  201   37   ozone
## 184 6.36  227   37   ozone
## 185 6.49  258   37   ozone
## 186 4.13  152   38   ozone
## 187 4.71  174   38   ozone
## 188 5.27  201   38   ozone
## 189 5.56  227   38   ozone
## 190 5.72  258   38   ozone
## 191 4.85  152   39   ozone
## 192 5.36  174   39   ozone
## 193 5.52  201   39   ozone
## 194 5.96  227   39   ozone
## 195 6.13  258   39   ozone
## 196 4.11  152   40   ozone
## 197 4.62  174   40   ozone
## 198 4.95  201   40   ozone
## 199 5.28  227   40   ozone
## 200 5.43  258   40   ozone
## 201 4.95  152   41   ozone
## 202 5.39  174   41   ozone
## 203 5.82  201   41   ozone
## 204 6.42  227   41   ozone
## 205 6.48  258   41   ozone
## 206 4.36  152   42   ozone
## 207 4.65  174   42   ozone
## 208 5.04  201   42   ozone
## 209 5.38  227   42   ozone
## 210 5.47  258   42   ozone
## 211 4.05  152   43   ozone
## 212 4.65  174   43   ozone
## 213 5.09  201   43   ozone
## 214 5.44  227   43   ozone
## 215 5.60  258   43   ozone
## 216 3.76  152   44   ozone
## 217 4.27  174   44   ozone
## 218 4.59  201   44   ozone
## 219 5.10  227   44   ozone
## 220 5.25  258   44   ozone
## 221 2.84  152   45   ozone
## 222 3.25  174   45   ozone
## 223 3.69  201   45   ozone
## 224 4.16  227   45   ozone
## 225 4.21  258   45   ozone
## 226 4.33  152   46   ozone
## 227 4.80  174   46   ozone
## 228 5.09  201   46   ozone
## 229 5.42  227   46   ozone
## 230 5.61  258   46   ozone
## 231 3.99  152   47   ozone
## 232 4.55  174   47   ozone
## 233 4.91  201   47   ozone
## 234 5.26  227   47   ozone
## 235 5.30  258   47   ozone
## 236 3.50  152   48   ozone
## 237 3.75  174   48   ozone
## 238 3.97  201   48   ozone
## 239 4.71  227   48   ozone
## 240 4.85  258   48   ozone
## 241 3.31  152   49   ozone
## 242 3.45  174   49   ozone
## 243 4.16  201   49   ozone
## 244 4.48  227   49   ozone
## 245 4.54  258   49   ozone
## 246 3.03  152   50   ozone
## 247 3.55  174   50   ozone
## 248 3.97  201   50   ozone
## 249 4.40  227   50   ozone
## 250 4.58  258   50   ozone
## 251 3.27  152   51   ozone
## 252 3.83  174   51   ozone
## 253 4.44  201   51   ozone
## 254 4.80  227   51   ozone
## 255 4.89  258   51   ozone
## 256 3.56  152   52   ozone
## 257 4.18  174   52   ozone
## 258 4.70  201   52   ozone
## 259 5.27  227   52   ozone
## 260 5.28  258   52   ozone
## 261 3.39  152   53   ozone
## 262 3.73  174   53   ozone
## 263 3.92  201   53   ozone
## 264 4.11  227   53   ozone
## 265 4.15  258   53   ozone
## 266 3.72  152   54   ozone
## 267 4.16  174   54   ozone
## 268 4.55  201   54   ozone
## 269 5.03  227   54   ozone
## 270 5.02  258   54   ozone
## 271 4.53  152   55 control
## 272 5.05  174   55 control
## 273 5.18  201   55 control
## 274 5.41  227   55 control
## 275 5.42  258   55 control
## 276 4.97  152   56 control
## 277 5.32  174   56 control
## 278 5.83  201   56 control
## 279 6.29  227   56 control
## 280 6.45  258   56 control
## 281 4.37  152   57 control
## 282 4.81  174   57 control
## 283 5.03  201   57 control
## 284 5.19  227   57 control
## 285 5.40  258   57 control
## 286 4.58  152   58 control
## 287 4.99  174   58 control
## 288 5.37  201   58 control
## 289 5.68  227   58 control
## 290 5.93  258   58 control
## 291 4.00  152   59 control
## 292 4.50  174   59 control
## 293 4.92  201   59 control
## 294 5.44  227   59 control
## 295 5.87  258   59 control
## 296 4.73  152   60 control
## 297 5.05  174   60 control
## 298 5.33  201   60 control
## 299 5.92  227   60 control
## 300 6.01  258   60 control
## 301 5.15  152   61 control
## 302 5.63  174   61 control
## 303 6.11  201   61 control
## 304 6.39  227   61 control
## 305 6.61  258   61 control
## 306 4.10  152   62 control
## 307 4.46  174   62 control
## 308 4.84  201   62 control
## 309 5.29  227   62 control
## 310 5.48  258   62 control
## 311 3.22  152   63 control
## 312 3.85  174   63 control
## 313 4.47  201   63 control
## 314 4.85  227   63 control
## 315 5.11  258   63 control
## 316 2.23  152   64 control
## 317 2.89  174   64 control
## 318 3.16  201   64 control
## 319 3.40  227   64 control
## 320 3.52  258   64 control
## 321 3.65  152   65 control
## 322 4.36  174   65 control
## 323 4.76  201   65 control
## 324 5.18  227   65 control
## 325 5.44  258   65 control
## 326 3.40  152   66 control
## 327 3.92  174   66 control
## 328 4.50  201   66 control
## 329 4.97  227   66 control
## 330 5.14  258   66 control
## 331 5.16  152   67 control
## 332 5.49  174   67 control
## 333 5.74  201   67 control
## 334 6.05  227   67 control
## 335 6.21  258   67 control
## 336 4.04  152   68 control
## 337 4.52  174   68 control
## 338 5.15  201   68 control
## 339 5.59  227   68 control
## 340 5.87  258   68 control
## 341 4.52  152   69 control
## 342 4.91  174   69 control
## 343 5.04  201   69 control
## 344 5.71  227   69 control
## 345 5.97  258   69 control
## 346 4.56  152   70 control
## 347 5.12  174   70 control
## 348 5.40  201   70 control
## 349 5.69  227   70 control
## 350 5.89  258   70 control
## 351 4.90  152   71 control
## 352 5.35  174   71 control
## 353 5.71  201   71 control
## 354 6.12  227   71 control
## 355 6.25  258   71 control
## 356 4.83  152   72 control
## 357 5.10  174   72 control
## 358 5.43  201   72 control
## 359 5.59  227   72 control
## 360 6.04  258   72 control
## 361 5.46  152   73 control
## 362 5.79  174   73 control
## 363 6.12  201   73 control
## 364 6.41  227   73 control
## 365 6.63  258   73 control
## 366 4.17  152   74 control
## 367 4.67  174   74 control
## 368 5.16  201   74 control
## 369 5.56  227   74 control
## 370 5.75  258   74 control
## 371 3.35  152   75 control
## 372 4.05  174   75 control
## 373 4.51  201   75 control
## 374 5.22  227   75 control
## 375 5.44  258   75 control
## 376 3.33  152   76 control
## 377 3.82  174   76 control
## 378 4.38  201   76 control
## 379 4.99  227   76 control
## 380 5.17  258   76 control
## 381 3.41  152   77 control
## 382 3.68  174   77 control
## 383 4.03  201   77 control
## 384 4.28  227   77 control
## 385 4.54  258   77 control
## 386 4.50  152   78 control
## 387 4.80  174   78 control
## 388 5.28  201   78 control
## 389 5.83  227   78 control
## 390 6.16  258   78 control
## 391 2.99  152   79 control
## 392 3.61  174   79 control
## 393 4.48  201   79 control
## 394 4.91  227   79 control
## 395 5.06  258   79 control