This is the dataset you will be working with:
library(tidyverse)

ufo_sightings <- read_csv("https://wilkelab.org/classes/SDS348/data_sets/ufo_sightings_clean.csv") %>%
  # split "month/day/year time" into separate columns
  separate(datetime, into = c("month", "day", "year"), sep = "/") %>%
  separate(year, into = c("year", "time"), sep = " ") %>%
  separate(date_posted, into = c("month_posted", "day_posted", "year_posted"), sep = "/") %>%
  select(-time, -month_posted, -day_posted) %>%
  mutate(year = as.numeric(year)) %>%
  # keep only sightings with a known country
  filter(!is.na(country))
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   datetime = col_character(),
##   city = col_character(),
##   state = col_character(),
##   country = col_character(),
##   shape = col_character(),
##   duration_seconds = col_double(),
##   duration_hours_min = col_character(),
##   comments = col_character(),
##   date_posted = col_character(),
##   latitude = col_double(),
##   longitude = col_double()
## )
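To see what the chained `separate()` calls do to a single record, here is a minimal sketch on one made-up row. The values are invented for illustration, and the `month/day/year hour:minute` layout is an assumption inferred from the `sep` arguments used above, not something checked against the raw file.

```r
library(dplyr)
library(tidyr)

# One made-up record shaped like the raw datetime/date_posted columns (assumed format)
toy <- tibble(
  datetime = "10/10/1995 20:30",
  city = "seattle",
  date_posted = "4/27/2004"
)

toy_clean <- toy %>%
  # "10/10/1995 20:30" -> month = "10", day = "10", year = "1995 20:30"
  separate(datetime, into = c("month", "day", "year"), sep = "/") %>%
  # "1995 20:30" -> year = "1995", time = "20:30"
  separate(year, into = c("year", "time"), sep = " ") %>%
  separate(date_posted, into = c("month_posted", "day_posted", "year_posted"), sep = "/") %>%
  # drop the pieces that are not needed later
  select(-time, -month_posted, -day_posted) %>%
  mutate(year = as.numeric(year))

toy_clean
```

Each `separate()` call replaces one column with several, so the cleaned table ends up with two more columns than the raw one.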
Question: Since 1990, which cities have reported the most UFO sightings, and how has the number of UFO sightings for these cities changed over time?
Introduction: We are working with the ufo_sightings dataset, which contains 70,662 reports of UFO sightings from 1910 to 2014 for five countries (US, Canada, Australia, Great Britain, and Germany). Each row of the dataset represents a single UFO sighting. After the preprocessing above, the dataset contains 13 columns that provide the time, location, and description of the sighting.
To determine how the number of UFO sightings has changed over the years in the cities with the highest number of reported sightings, we will be working with the following columns:
city: the city in which the sighting was reported
year: the year of the reported sighting
Approach: Our approach is to first determine which cities have the highest number of UFO sightings. Next, we will visualize the number of UFO sightings across the years for the top six cities using a scatter plot and a linear regression line. A regression line can be used to determine whether there is a trend between two continuous variables, here the year and the number of sightings. An alternative to a scatter plot with a regression line would be a line plot. However, since we do not know whether there is a relationship between UFO sightings and time, a line plot does not seem appropriate here.
To look at the cities with the highest number of UFO sightings, these functions will be applied:
filter() to extract only the sightings after 1990
count() to count the number of sightings per city
arrange() with desc() to sort the table by descending count
slice() to keep the top cities with the highest number of reported UFO sightings
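The filter, count, sort, and slice steps listed above can be tried on a small made-up table first. This is only a sketch: the city labels and years are invented, and it uses `count(sort = TRUE)` with `slice_head()` as a compact equivalent of `arrange(desc(n))` followed by `slice()`.

```r
library(dplyr)

# Made-up sightings for three hypothetical cities
toy <- tibble(
  city = c("a", "b", "b", "c", "c", "c", "a", "c"),
  year = c(1985, 1995, 2001, 1992, 1999, 2005, 2010, 1989)
)

top_two <- toy %>%
  filter(year > 1990) %>%        # keep sightings after 1990
  count(city, sort = TRUE) %>%   # sightings per city, largest first
  slice_head(n = 2)              # keep the two cities with the most sightings

top_two
```

Note that `year > 1990` excludes sightings from 1990 itself; `year >= 1990` would include them.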
To plot the number of UFO sightings over time, we will use the following functions:
filter() to reduce the dataset to the top cities with the highest number of sightings and to all sightings after 1990
count() to count the number of observations per year and city
mutate() to rewrite the city column in a new order
fct_reorder() and fct_rev() to order the city column by the total number of sightings, from highest to lowest
fct_recode() to capitalize the city names
geom_point() to create a scatter plot of UFO sighting counts for each year
geom_smooth() to add a regression line to the scatter plot
facet_wrap() to create scatter plot facets for each city
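The factor handling in the list above is the least obvious step, so here is a small sketch on made-up counts for three hypothetical cities. It orders the levels by total sightings and then capitalizes them with `fct_relabel()` and `str_to_title()`, which is a swapped-in alternative to writing out every name in `fct_recode()`, not the approach used in this analysis.

```r
library(dplyr)
library(forcats)
library(stringr)

# Made-up yearly counts for three hypothetical cities
toy_counts <- tibble(
  year = c(2000, 2001, 2000, 2001, 2000, 2001),
  city = c("alpha", "alpha", "beta city", "beta city", "gamma", "gamma"),
  n    = c(1, 2, 10, 12, 4, 5)
)

toy_ordered <- toy_counts %>%
  mutate(
    # order levels by total sightings per city, largest first
    city = fct_rev(fct_reorder(city, n, sum)),
    # capitalize each word of every level without listing the names by hand
    city = fct_relabel(city, str_to_title)
  )

levels(toy_ordered$city)
```

Because `facet_wrap()` lays out panels in level order, the city with the most sightings ends up in the first panel.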
# extracting the top 6 cities with the highest number of UFO reports:
top_cities <- ufo_sightings %>%
  filter(year > 1990) %>%
  count(city) %>%
  arrange(desc(n)) %>%
  slice(1:6)

# let's look at the table:
top_cities
## # A tibble: 6 x 2
##   city            n
##   <chr>       <int>
## 1 seattle       503
## 2 phoenix       439
## 3 portland      360
## 4 las vegas     357
## 5 los angeles   324
## 6 san diego     315
# counting the number of UFO sightings in the top cities for each year:
summary <- ufo_sightings %>%
  filter(city %in% top_cities$city, year > 1990) %>%
  count(year, city) %>%
  # order cities by total number of sightings, highest first
  mutate(city = fct_rev(fct_reorder(city, n, sum))) %>%
  mutate(
    # capitalize all city names
    city = fct_recode(
      city,
      Seattle = "seattle",
      Phoenix = "phoenix",
      Portland = "portland",
      `Las Vegas` = "las vegas",
      `Los Angeles` = "los angeles",
      `San Diego` = "san diego"
    )
  )

# looking at the top 3 rows in the summarized data:
head(summary, n = 3)
## # A tibble: 3 x 3
##    year city            n
##   <dbl> <fct>       <int>
## 1  1991 Las Vegas       3
## 2  1991 Los Angeles     1
## 3  1991 Phoenix         1
# plotting the number of UFO sightings across time:
ggplot(summary, aes(year, n)) +
  geom_point(size = 1) +
  geom_smooth(
    method = "lm",
    color = "salmon3",
    fill = "antiquewhite3",
    size = 0.9
  ) +
  facet_wrap(vars(city)) +
  scale_x_continuous(
    name = "Year",
    limits = c(1990, 2015),
    breaks = seq(from = 1990, to = 2015, by = 5),
    labels = seq(from = 1990, to = 2015, by = 5),
    expand = c(0.05, 0.05)
  ) +
  scale_y_continuous(
    name = "Number of UFO Sightings",
    limits = c(0, 45),
    breaks = seq(from = 0, to = 45, by = 10),
    expand = c(0, 0)
  ) +
  theme_bw(12) +
  theme(
    axis.text = element_text(color = "black", size = 10),
    panel.grid.minor = element_blank(),
    panel.spacing = unit(1, "lines"),
    strip.text.x = element_text(size = 12),
    aspect.ratio = 3/5
  )
## `geom_smooth()` using formula 'y ~ x'
Discussion: The six cities with the highest number of reported UFO sightings are Seattle, Phoenix, Portland, Las Vegas, Los Angeles, and San Diego. Their reported UFO sighting counts were 503, 439, 360, 357, 324, and 315, respectively. All six of these cities are located in the western United States.
Looking at the scatter plots, we see that, overall, the number of UFO sightings increases with time for all six cities. Phoenix and Seattle appear to have the greatest variance in sighting counts over time: there were almost as many reports from 1995 to 2005 as from 2005 to 2015. These two cities also have the widest confidence bands. Las Vegas, by contrast, appears to have the least variance, with a narrow confidence band and a more pronounced linear increase in sightings across the years. We can speculate that improvements in communication technology in recent years have made it easier to submit reports to the National UFO Reporting Center.
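The visual impression of an upward trend could also be checked numerically by fitting one linear model per city and comparing the slopes, which is the same kind of fit that `geom_smooth(method = "lm")` draws. The sketch below uses made-up counts for two hypothetical cities ("A" and "B"); it is not run on the real data, so the numbers say nothing about the actual cities.

```r
library(dplyr)

# Made-up yearly counts for two hypothetical cities
toy_counts <- tibble(
  city = rep(c("A", "B"), each = 5),
  year = rep(2000:2004, times = 2),
  n    = c(1, 3, 5, 7, 9, 4, 4, 5, 5, 6)
)

slopes <- toy_counts %>%
  group_by(city) %>%
  summarize(
    # slope of the least-squares line: change in sightings per year
    slope = coef(lm(n ~ year))[["year"]],
    .groups = "drop"
  )

slopes
```

A positive slope means sightings increase over the years for that city; applying the same pattern to the per-year city counts would give one slope per facet.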