Project 1 Instructions

Please use the project template R Markdown document to complete your project. The knitted R Markdown document (as a PDF) and the raw R Markdown file (as .Rmd) must be submitted to Canvas by 12:00pm on Fri., Feb 28th, 2020. These two documents will be graded jointly, so they must be consistent (as in, don’t change the R Markdown file without also updating the knitted document!).

All results presented must have corresponding code. Any answers/results given without the corresponding R code that generated the result will be considered absent. To be clear: if you do calculations by hand instead of using R and then report the results from the calculations, you will not receive credit for those calculations. All code reported in your final project document should work properly. Please do not include any extraneous code or code which produces error messages. (Code which produces warnings is acceptable, as long as you understand what the warnings mean.)

For this project, you will be using the dataset ufo_sightings. This dataset contains over 70,000 reports of UFO sightings over the last century. It records the city, state, country, year, time, duration, description, and observed shape of each UFO sighting. The data were scraped, geolocated, and time-standardized from the National UFO Reporting Center (NUFORC) data by Sigmond Axel. More information about the dataset can be found at https://github.com/planetsig/ufo-reports and http://www.nuforc.org/.

The column contents are as follows:

This project consists of two parts. Each part should be structured as follows:

We encourage you to be concise. A paragraph should typically not be longer than 5 sentences.

You are not required to perform any statistical tests in this project, but you may do so if you find it helpful to answer your question.

Part 1 Instructions

In Part 1, we provide you with a specific question to answer and with specific instructions on how to answer the question.

In the Introduction section, write a brief introduction to the dataset, the question, and what parts of the dataset are necessary to answer the question. You may repeat some of the information about the dataset provided above, paraphrasing on your own terms. Imagine that your project is a standalone document and the grader has no prior knowledge of the dataset.

In the Approach section, describe the data manipulation necessary to answer the given question. Provide your analysis in 1–2 code chunks, including a plot that will help you find the answer to your question. For the plot, provide a clear explanation as to why this type of plot (e.g. boxplot, barplot, histogram, etc.) is best for providing the information you are asking about. (You can draw on the materials provided here for guidance.)
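As a rough illustration of what "data manipulation followed by a plot" can look like, here is a minimal sketch in R. It does not use the real ufo_sightings data: the data frame `toy_sightings` and its column names `year` and `shape` are made up for this example, and the question it addresses (which shapes are reported most often) is only a placeholder. It assumes the dplyr and ggplot2 packages are installed.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the real dataset. The object name and the column
# names (year, shape) are assumptions made for this sketch only.
toy_sightings <- data.frame(
  year  = c(1995, 1995, 2001, 2001, 2001, 2010),
  shape = c("disk", "light", "light", "triangle", "light", "disk")
)

# Data manipulation: count the reports for each shape,
# then sort from most to least frequent.
shape_counts <- toy_sightings %>%
  count(shape, name = "n_reports") %>%
  arrange(desc(n_reports))

# Plot: a bar plot suits counts across a categorical variable.
p_bar <- ggplot(shape_counts, aes(x = reorder(shape, -n_reports), y = n_reports)) +
  geom_col() +
  labs(x = "Reported shape", y = "Number of reports")

p_bar
```

A bar plot is chosen here because the quantity of interest is a count per category; a different question (for example, one about a distribution or a trend over time) would likely call for a different plot type.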

In the Discussion section, interpret the results of your analysis. Identify any trends revealed (or not revealed) by the plot. Speculate about why the data looks the way it does.

Part 2 Instructions

In Part 2, you will supply the question (still using the ufo_sightings dataset) and the approach. Your question cannot repeat the material from Part 1.

Clearly state your question, and write your introduction and approach in the sections provided. For the Introduction, you do not need to repeat the whole dataset description from Part 1, but you do need to describe the columns required to answer your question. In the Approach section, describe the data manipulation necessary to answer your question.

Use the ggplot2 library to create a plot that will help you find an answer to your question. For the plot, provide a clear explanation as to why this type of plot (e.g. boxplot, barplot, histogram, etc.) is best for providing the information you are asking about. Important: Your plot must be of a different type than the one you made for Part 1.
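To show the general shape of a ggplot2 call, the sketch below draws a boxplot from a small made-up data frame. The names `toy_durations`, `shape`, and `duration_seconds` are hypothetical and are not taken from the real dataset, and the boxplot is only one example of a plot type, not a recommendation for your question.

```r
library(ggplot2)

# Made-up data for illustration only; the column names are assumptions.
toy_durations <- data.frame(
  shape            = rep(c("disk", "light"), each = 4),
  duration_seconds = c(30, 60, 120, 600, 5, 10, 20, 300)
)

# A boxplot compares the distribution of a numeric variable across groups.
# A log-scaled y-axis helps when a few values are much larger than the rest.
p_box <- ggplot(toy_durations, aes(x = shape, y = duration_seconds)) +
  geom_boxplot() +
  scale_y_log10() +
  labs(x = "Reported shape", y = "Duration (seconds, log scale)")

p_box
```

Whatever plot type you choose, the explanation you write should connect it to your question, for instance by stating whether you are comparing distributions, counts, or a trend.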

Answer your question by interpreting your plot and identifying any trends it reveals, or does not reveal, as the case may be.