-
Notifications
You must be signed in to change notification settings - Fork 0
/
proposal.qmd
75 lines (50 loc) · 7.5 KB
/
proposal.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
---
title: "Alien Encounters 🛸"
subtitle: "Visualizing UFO Sightings"
format: html
code-fold: true
editor: visual
---
```{r, echo=FALSE, results='hide'}
#| label: load-pkgs
#| message: false
library(tidyverse)
```
# Dataset
```{r, echo=TRUE, results='hide'}
#| label: load-dataset
#| message: false
#> Loading the Dataset
#> UFO_Sighting -
ufo <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-06-20/ufo_sightings.csv")
#> Places -
places <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-06-20/places.csv")
```
The [UFO sightings Dataset](https://github.com/rfordatascience/tidytuesday/tree/master/data/2023/2023-06-20) is a rich collection of reports documenting encounters with unidentified flying objects (UFOs). Comprising diverse attributes, it offers valuable insights into these unexplained phenomena. This dataset has been sourced from the TidyTuesday project, a popular platform within the data science community, known for its weekly data visualization and data cleaning challenges. It was released as part of the TidyTuesday challenge on June 20, 2023.
With its multifaceted elements and global scope, it provides a comprehensive view of these unexplained phenomena. As we delve deeper into the data, we can uncover patterns, correlations, and insights that may shed light on the intriguing world of UFO sightings, offering both scientific and anecdotal perspectives on these enigmatic events. To provide an overview of the phenomenon, we have selected the `ufo_sighting.csv` and `places.csv` files from the provided dataset, each of which contains 96429 observations of 12 variables and 14417 instances of variables, respectively.
UFO sightings are associated with unexplained aerial phenomena, making them a subject of enduring fascination. Researchers and enthusiasts are drawn to these sightings due to the mystery surrounding them, and they seek to uncover potential patterns or explanations. Our team discovered a shared interest in mysterious and scientifically curious occurrences, therefore we chose this dataset.
The following key variables were considered to develop the proposed analysis plan.
| Variable | Class | Description |
|------------------------|-----------|------------------------------------------------------------------------------------------------------------------------|
| reported_date_time | datetime | The time and date of the sighting, as it appears in the original NUFORC data. |
| reported_date_time_utc | datetime | The time and date of the sighting, normalized to UTC. |
| city | character | The city of the sighting. |
| state | character | The state, province, or similar division of the sighting. |
| country_code | character | The 2-letter country code of the sighting, normalized from the original data. |
| day_part | character | The approximate part of the day in which the sighting took place such as "nautical twilight", "sunrise", and "sunset." |
| latitude | double | The latitude for this city, from geonames.org. |
| longitude | double | The longitude for this city, from geonames.org. |
| shape | character | The reported shape of the craft. |
| duration_seconds | double | The duration normalized to seconds. |
# Questions
Q1. What are the number of UFO encounters globally?
Q2. How did the trend of UFO sightings vary over years?
# Analysis plan
## Question 1
To answer "What are the number of UFO encounters globally?", initially the dataset has to be loaded from the "TidyTuesday" source using the `read.csv` function in R. The date and location columns have to be verified to ensure that they are in the correct data types. The structure and content of the dataset has to be understood clearly. The unique values and frequency distributions of key variables, particularly `reported_date_time`, `city`, `state`, `country_code`, and `day_part` have to be studied. Relevant date and location information from the dataset, which includes `reported_date_time`, `city`, `state`, `country_code`, and potentially `day_part` has to be extracted. Depending on the analysis, data by day, month, or year can be aggregated to examine the trends . To visualize the global distribution of UFO sightings, a geographical plot can be drawn. Bubble maps can be plotted making use of the `longitude` and `latitude` information of the values `city`, `state`, and `country_code` variables. Bubble maps can simultaneously display both the geographical locations of UFO sightings and how their frequency has changed over time. Each bubble could represent a specific location, and the size or color of the bubble can be used to encode the number of sightings in a particular year.\
A `geom_treemap` can be used to hierarchically visualize the number of sightings depending on the geographical locations. This can be done by grouping the countries into respective continents. Then, the plots have to be customized with appropriate labels, titles, legends, and color schemes to enhance clarity and readability. The generated visualizations have to be analyzed for any notable patterns, clusters, or trends in UFO sightings over time and across locations to draw insights from the data.
## Question 2
To answer "How did the trend of UFO sightings vary over years?", start by making inferences from the previously generated plots in question 1. A new column has to be created to extract the year from the `reported_date_time` or `reported_date_time_utc` column. This will allow to group sightings by year. Count the number of sightings for each year by grouping the data by the extracted year. This will provide you with the annual total of UFO sightings.\
To show the trend of UFO sightings over the years, a time series plot (`geom_line`) can be used. They will be helpful in understanding the temporal distribution of UFO sightings. Monthly or yearly counts can be plotted depending on the preference. For a more informative visualization, line charts can be used depending on the granularity of the time period. Based on the depth of analysis, additional variables such as `shape` or `duration_seconds` can be incorporated to explore correlations between sighting characteristics and dates/locations.\
We can use density plot (`geom_density`) or histogram (`geom_histogram`) to see at which part of the day the sighting has happened. This type of visualization can provide insights into the diurnal patterns of UFO sightings and helps in understanding if there are any specific times of day when sightings are more frequent. Additional aesthetics, such as color or shape can be added to represent any other variables.\
Finally, the resulting plots can be used to analyze how UFO sightings have changed over time, showing the pattern visually.