The data/ directory contains fifty CSV files (one per week) of timestamped sales data. Each row in a file has two columns:
sale_time - The timestamp at which the sale was made, e.g. 2012-10-01 01:42:22
purchaser_gender - The gender of the person who made the purchase (male or female)
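A minimal sketch for reading the weekly files with the Python standard library. It assumes each CSV has a header row naming the columns sale_time and purchaser_gender, and that all files sit directly in data/ with a .csv extension. The helper names parse_sales and load_sales are hypothetical.

```python
import csv
from datetime import datetime
from pathlib import Path

# sale_time format, taken from the example value 2012-10-01 01:42:22.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_sales(lines):
    """Parse CSV lines (header first) into (datetime, gender) tuples.

    Assumption: the header row names the columns sale_time and
    purchaser_gender. `lines` can be an open file or any iterable of strings.
    """
    records = []
    for row in csv.DictReader(lines):
        sold_at = datetime.strptime(row["sale_time"].strip(), TIME_FORMAT)
        gender = row["purchaser_gender"].strip().lower()  # normalise case
        records.append((sold_at, gender))
    return records


def load_sales(directory="data"):
    """Read every *.csv file in `directory` and return all sales in time order."""
    records = []
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="") as handle:
            records.extend(parse_sales(handle))
    # Weekly files may not sort chronologically by name, so sort by timestamp.
    records.sort(key=lambda record: record[0])
    return records
```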
-
Plot daily sales for all 50 weeks.
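One way to build the daily series, sketched in Python: count sales per calendar day, give days with no sales a count of zero so gaps stay visible, and draw a line chart. The function names are hypothetical. Plotting assumes matplotlib is installed and imports it only inside plot_daily_sales, so the counting step runs on the standard library alone (Python 3.9+ for the type hints).

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_sales(timestamps: list[datetime]) -> list[tuple[date, int]]:
    """Count sales per calendar day; days with no sales get a count of 0."""
    counts = Counter(ts.date() for ts in timestamps)
    if not counts:
        return []
    day, last = min(counts), max(counts)
    series = []
    while day <= last:
        series.append((day, counts[day]))  # Counter returns 0 for missing days
        day += timedelta(days=1)
    return series


def plot_daily_sales(series: list[tuple[date, int]]) -> None:
    """Draw daily sales as a line chart (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt  # imported lazily: only needed for plotting

    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot([day for day, _ in series], [total for _, total in series])
    ax.set_xlabel("Date")
    ax.set_ylabel("Sales per day")
    ax.set_title("Daily sales over 50 weeks")
    fig.autofmt_xdate()  # tilt date labels so they do not overlap
    plt.show()
```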
-
It looks like there has been a sudden change in daily sales. What date did it occur?
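The date can be read off the plot by eye, but a simple way to pin it down is a least-squares single change point: try every split of the daily series, fit one mean to each side, and keep the split with the smallest total squared error. This is a swapped-in technique chosen for illustration, not necessarily the method the exercise intends, and it assumes exactly one shift in the mean level. best_change_point and change_date are hypothetical names; series is assumed to be a chronologically ordered list of (day, count) pairs.

```python
def _sse(segment: list[float]) -> float:
    """Sum of squared deviations from the segment mean."""
    mean = sum(segment) / len(segment)
    return sum((x - mean) ** 2 for x in segment)


def best_change_point(values: list[float]) -> int:
    """Index k that best splits `values` into two constant-mean segments.

    values[:k] and values[k:] each get their own mean; k is the first
    index of the new regime. Ties keep the earliest split.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    best_k, best_cost = 1, float("inf")
    for k in range(1, len(values)):
        cost = _sse(values[:k]) + _sse(values[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


def change_date(series):
    """First day of the new regime, given chronological (day, count) pairs."""
    k = best_change_point([count for _, count in series])
    return series[k][0]
```

With about 350 days the quadratic scan is effectively instant; for much longer series, prefix sums of the values and their squares would make each split cost constant time.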
-
Is the change in daily sales at the selected date statistically significant? If so, what is the p-value?
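A sketch of one assumption-light answer: a two-sided permutation test on mean daily sales before vs. on/after the change date. Two caveats apply. Because the date was itself chosen from the same data, the resulting p-value is optimistic; and shuffling days treats them as exchangeable, which weekly seasonality or trends can violate. permutation_p_value, its defaults, and the fixed seed are illustrative assumptions; a Welch t-test (for example scipy.stats.ttest_ind with equal_var=False) is a common alternative.

```python
import random
from statistics import fmean


def permutation_p_value(before, after, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean daily sales.

    before/after: daily sales counts on each side of the change date.
    The fixed seed (an arbitrary choice) makes the result reproducible.
    """
    rng = random.Random(seed)
    observed = abs(fmean(after) - fmean(before))
    pooled = list(before) + list(after)
    k = len(before)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of days as before/after
        diff = abs(fmean(pooled[k:]) - fmean(pooled[:k]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Counting the observed split itself keeps the estimate above zero.
    return (extreme + 1) / (n_permutations + 1)
```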
-
Does the data suggest that the change in daily sales is due to a shift in the proportion of male-vs-female customers?
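One way to probe this: compare the share of male purchasers before and after the change date with a pooled two-proportion z-test (normal approximation, with the p-value from math.erfc). A large, significant shift would be consistent with the customer-mix explanation but would not prove it; an unchanged share argues against it. The name gender_shift_test and the convention that the change date itself belongs to the "after" period are assumptions, and records is assumed to be a list of (datetime, gender) pairs with gender already lower-cased.

```python
import math
from datetime import date, datetime


def gender_shift_test(records: list[tuple[datetime, str]], change: date):
    """Compare the share of male purchasers before vs. on/after `change`.

    Returns (share_before, share_after, z, two_sided_p) from a pooled
    two-proportion z-test. Assumption: `change` counts as "after".
    """
    male = {False: 0, True: 0}
    total = {False: 0, True: 0}
    for sold_at, gender in records:
        after = sold_at.date() >= change
        total[after] += 1
        if gender == "male":
            male[after] += 1
    n1, n2 = total[False], total[True]
    if n1 == 0 or n2 == 0:
        raise ValueError("need sales on both sides of the change date")
    p1, p2 = male[False] / n1, male[True] / n2
    pooled = (male[False] + male[True]) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("all purchasers share one gender; test undefined")
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return p1, p2, z, p_value
```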
-
Assume a given day is divided into four dayparts: night (12:00 AM to 6:00 AM), morning (6:00 AM to 12:00 PM), afternoon (12:00 PM to 6:00 PM), and evening (6:00 PM to 12:00 AM). What is the percentage of sales in each daypart over all 50 weeks?
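A short sketch of the daypart breakdown. Because the stated ranges share their endpoints, it assumes half-open six-hour windows ([0, 6), [6, 12), [12, 18), [18, 24) in hours), so a sale at exactly 6:00 AM counts as morning; with that convention the daypart index is simply hour // 6. The names DAYPARTS, daypart and daypart_percentages are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Order matters: index = hour // 6.
DAYPARTS = ("night", "morning", "afternoon", "evening")


def daypart(ts: datetime) -> str:
    # Assumption: half-open windows [0,6), [6,12), [12,18), [18,24).
    return DAYPARTS[ts.hour // 6]


def daypart_percentages(timestamps: list[datetime]) -> dict[str, float]:
    """Percentage of all sales falling in each daypart."""
    counts = Counter(daypart(ts) for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sales to summarise")
    return {name: 100 * counts[name] / total for name in DAYPARTS}
```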