---
title: "A statistical study on how people's life satisfaction presents differently among five family income levels and the presence/absence status of children"
author:
- Ruoning Guo Student No.1004772114
- Fanzhen Meng Student No.1002812824
- Yifan Zhu Student No.1006345849
- Kaiou Guo Student No.1004151865
date: "10/19/2020"
output:
  pdf_document:
    latex_engine: xelatex
  html_document:
    df_print: paged
mainfont: Times New Roman
fig.align: center
---
```{r, echo=FALSE, results="hide", message=FALSE, warning=FALSE, include=FALSE}
library(readr)
library(dplyr)
library(FrF2)
library(kableExtra)
library(ggplot2)
data <- read_csv("~/Desktop/304ps2data/gss.csv")
vars <- c("feelings_life","total_children","income_family" )
newdata <- data[vars] #20602
newdata <- newdata[!is.na(newdata$feelings_life), ]
newdata <- newdata[!is.na(newdata$income_family), ]
knitr::opts_chunk$set(echo = FALSE,warning = FALSE, message = FALSE,results = "hide")
```
```{r,echo=FALSE}
# Keep the first 1000 rows; the raw file is assumed to be in random order.
newdata <- head(newdata, n = 1000)
dim(newdata)
str(newdata)
# Children: 1 if the respondent has at least one child, 0 otherwise.
Children <- as.factor(ifelse(newdata$total_children > 0, 1, 0))
newdata1 <- cbind(newdata, Children)
# Shorten the income labels and merge the two highest brackets into "100k+".
library(plyr)
newValueIncome <- c("Less than $25,000"="below25k","$25,000 to $49,999"="25-50k","$50,000 to $74,999"="50-75k","$75,000 to $99,999"="75-100k","$125,000 and more"="100k+","$100,000 to $ 124,999"="100k+")
newdata1$income <- revalue(newdata1$income_family, newValueIncome)
# CE combines children status ("C" = children, "NC" = no children) with the income level.
CE <- as.factor(paste0(ifelse(newdata1$Children == 1, "C", "NC"), newdata1$income))
# Drop the raw total_children and income_family columns.
newdata1 <- newdata1[, c("feelings_life", "Children", "income")]
```
## Abstract
This paper studies life satisfaction scores using the 2017 Canadian General Social Survey (GSS). Our results show that people in the lowest family income group differ significantly in life satisfaction from people at other income levels, whereas the presence of children makes no significant difference. By creating a new variable that combines income level with children status, we found that: 1) among people without children, the lowest income group shows no significant difference in mean life satisfaction from the highest income group; 2) among people with children, the lowest income group differs significantly in mean life satisfaction from the higher income groups. These results may open the door to a discussion of increasing family pressure in modern society.
## Introduction
Life satisfaction at the individual level is extremely important to the well-being of society. In fact, it can be seen as a measure of human welfare (Haybron, 2011). Empirical evidence suggests that individuals with higher life satisfaction are more productive (Bellet et al., 2019). What’s more, life satisfaction is linked to better academic performance among students (Antaramian, 2017). Therefore, it is important to understand which factors contribute to a person’s life satisfaction in order to improve society as a whole.
This analysis studies how different variables contribute to an individual’s life satisfaction using the 2017 Canadian General Social Survey (GSS). The GSS recorded individual-level data for 20602 respondents, covering each respondent’s gender, income, education, etc. (Statistics Canada, 2020). In particular, one variable asks respondents about their “feelings about life as a whole”, which can be seen as a subjective measure of the respondent’s level of life satisfaction (Statistics Canada, 2020).
To conduct our analysis, we narrowed down the factors contributing to overall life satisfaction and selected two key factors from the GSS: "children" and "income". In the following study, we focus on the relationship between these two variables and Canadians' life satisfaction. This paper has six parts: data, model selection, results, discussion, weaknesses, and next steps.
## Data
```{r,echo=FALSE, results="show",message=FALSE}
# Build a small eight-row preview table with one column per variable.
angie_table <- FrF2(nruns = 8, nfactors = 3, factor.names =c("Feelings_life","Children","Income"))
angie_table$Blocks <- NULL
angie_table <- arrange(angie_table,Children, Income, Feelings_life)
# Overwrite the design columns with the example values shown in the report.
angie_table$Feelings_life <- c(8,10,8,10,8,9,4,10)
angie_table$Children <- c(1,1,1,1,0,1,1,1)
angie_table$Income <- c("25-50k","75-100k","75-100k","100k+","50-75k","50-75k","below25k","below25k")
kable(angie_table, position="center")
```
### Data understanding
To understand our data, we first give a brief introduction to the survey methodology. The GSS is an annual survey designed to study the living conditions and well-being of Canadians (Government of Canada, 2020). The target population of this survey is all Canadians; however, the sampling frame is Canadians older than 15 with a functional phone number who live in private households in all provinces, excluding the 3 territories (Government of Canada, 2020). The sampling frame should be a good representation of the target population, so the GSS is expected to provide useful and reliable information about the living conditions of Canadians. Non-responses are recorded as NA. This paper selects 1000 observations from the sampled population of 20602 individuals and conducts the study using only these 1000 observations.
The table above is a preview of our data set. The variables in the dataset are:
- Children (two levels): whether the person has children (1) or not (0).
- Income (five levels): below25k, 25-50k, 50-75k, 75-100k, and 100k+, indicating the annual family income.
- feelings_life: the person's "feelings about life as a whole" on a scale of 1-10, which can be seen as a level of life satisfaction.
### Data preparation
After running the "gsscleaning" code provided by our instructors Rohan Alexander and Sam Caetano, we made some additional selections and modifications.
- Remove NAs and unused columns: to facilitate the analysis, all rows with NA in feelings_life, total_children, or income_family were removed first.
- Select the first 1000 rows: since the raw dataset was already in random order, there is no need to randomize the sample again, so we selected the first 1000 respondents from the raw dataset.
- Shorten the income level names and combine the two levels "$125,000 and more" and "$100,000 to $124,999" into one level, "100k+". This reduces the number of income levels from 6 to 5.
- Create two new variables: 1) Children, a factor with 2 levels derived from the number of children (1 if the person has one or more children, 0 if the person has none), and 2) CE, a variable that combines children status and family income.
Replacing the count variable "total_children" with a binary variable discards some information, since a family with 1 child and a family with 5 children are placed at the same level. However, it makes the analysis more intuitive. More importantly, a binary variable is easier to combine with another categorical variable for a deeper analysis.
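The recoding steps described in this section can be sketched on a small made-up data frame. The `toy` data and the `income_labels` lookup are hypothetical and only mirror the column names and labels used in this report; this is an illustration, not the code that produced our results.

```r
# Hypothetical toy data; column names mirror the GSS extract used in this report.
toy <- data.frame(
  feelings_life  = c(8, 10, 4, 9),
  total_children = c(0, 2, 5, 1),
  income_family  = c("Less than $25,000", "$125,000 and more",
                     "$25,000 to $49,999", "$100,000 to $ 124,999"),
  stringsAsFactors = FALSE
)

# Lookup table from the original income labels to the shortened ones.
income_labels <- c(
  "Less than $25,000"     = "below25k",
  "$25,000 to $49,999"    = "25-50k",
  "$50,000 to $74,999"    = "50-75k",
  "$75,000 to $99,999"    = "75-100k",
  "$100,000 to $ 124,999" = "100k+",
  "$125,000 and more"     = "100k+"
)

# Binary children indicator: 1 for one or more children, 0 for none.
toy$Children <- factor(as.integer(toy$total_children > 0), levels = c(0, 1))

# Shortened income level, with levels ordered from lowest to highest bracket.
toy$income <- factor(unname(income_labels[toy$income_family]),
                     levels = c("below25k", "25-50k", "50-75k", "75-100k", "100k+"))

# Combined factor: "C" or "NC" prefix followed by the income level.
toy$CE <- factor(paste0(ifelse(toy$Children == 1, "C", "NC"), toy$income))

toy[, c("feelings_life", "Children", "income", "CE")]
```

Listing the income levels explicitly is a design choice of this sketch: it keeps the brackets in increasing order in tables and plots instead of the default alphabetical order.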
## Model
### Model selection
Linear regression is a statistical model used to predict a continuous outcome from one or more continuous predictor variables. By contrast, the analysis of variance (ANOVA) model is used to predict a continuous outcome from one or more categorical predictor variables (P. Mean, 2010). Since both of our predictors are categorical, a linear regression that treats them as continuous would violate its assumptions.
In our study, a two-sample t-test is first used to investigate whether life satisfaction differs by children status (having children or not). Then a one-way analysis of variance is used to model life satisfaction (on a scale of 1-10) as a function of the categorical variable Income, which has 5 levels. Specifically, it tests for differences among two or more means and gives us a way to make multiple comparisons across several group means. This is crucial to our study because comparing mean life satisfaction among income levels is one of the most important parts of our analysis. The three assumptions of one-way ANOVA are i) normality, ii) homogeneity of variance, and iii) independence (STAT:5201, University of Iowa). They are checked in the discussion section.
However, ANOVA also has weaknesses. A significant ANOVA result (p-value < 0.05) indicates that not all group means are equal, but it does not tell which means differ from which others. Thus we apply Tukey’s HSD test for further analysis. Tukey's test reports whether the means of every possible pair of levels are significantly different, and it adjusts the p-value of each comparison to correct for multiple comparisons (Meng, 2019).
With these models, we want to investigate: 1) is family income related to people's life satisfaction? 2) do people who have children have a different life satisfaction score from those who do not? 3) is there a difference in life satisfaction among the 10 categories of people classified by the combination of children status and family income?
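The sequence of tests described in this section can be sketched as follows. The data frame `sim` is simulated purely for illustration (it is not the GSS sample), so the numbers it produces say nothing about our results; only the order of the calls matches our procedure.

```r
set.seed(304)

# Simulated stand-in for the survey sample (not the GSS data).
n <- 300
income_levels <- c("below25k", "25-50k", "50-75k", "75-100k", "100k+")
sim <- data.frame(
  Children = factor(sample(c(0, 1), n, replace = TRUE)),
  income   = factor(sample(income_levels, n, replace = TRUE), levels = income_levels)
)
# Scores rise slightly with the income level and are clipped to the 1-10 scale.
sim$feelings_life <- pmin(10, pmax(1, round(6 + 0.4 * as.integer(sim$income) +
                                              rnorm(n, sd = 1.5))))

# Step 1: F-test for equal variances, then Welch's two-sample t-test.
var_res   <- var.test(feelings_life ~ Children, data = sim)
welch_res <- t.test(feelings_life ~ Children, data = sim, var.equal = FALSE)

# Step 2: one-way ANOVA across the five income levels.
fit_aov <- aov(feelings_life ~ income, data = sim)
summary(fit_aov)

# Step 3: Tukey's HSD for all pairwise comparisons, with adjusted p-values.
tukey_res <- TukeyHSD(fit_aov)
tukey_res$income
```

With five income levels, Tukey's procedure compares 10 pairs of means, and the `p adj` column holds the p-values adjusted for those multiple comparisons.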
### Alternative model
The response variable, feelings_life, is categorical and ordered: it takes a value from 1 to 10, where higher numbers indicate higher life satisfaction. Ordinal logistic regression might therefore be more suitable, since it is designed for ordinal responses (Frost, 2017). It could potentially provide better estimates than the ANOVA model.
Furthermore, the income variable can be transformed into an ordinal variable, which makes the estimated coefficient easier to interpret. We can construct a new variable, denoted X_1:

- X_1 = 1 if family income is below $25,000
- X_1 = 2 if family income is between $25,000 and $49,999
- X_1 = 3 if family income is between $50,000 and $74,999
- X_1 = 4 if family income is between $75,000 and $99,999
- X_1 = 5 if family income is between $100,000 and $124,999
- X_1 = 6 if family income is $125,000 or more

This new variable can also replace the income variable used in the ANOVA model. Note that if all we want to study is whether family income has a statistically significant impact on life satisfaction, using X_1 or not would not make a large difference. However, the estimated coefficient on X_1 has a direct interpretation under certain regression models.
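A minimal sketch of how X_1 could be built from the income labels is shown below. The vector `income_family` contains made-up responses, and the label strings are assumed to match those in the cleaned GSS file.

```r
# The six original income brackets, in increasing order.
brackets <- c("Less than $25,000", "$25,000 to $49,999", "$50,000 to $74,999",
              "$75,000 to $99,999", "$100,000 to $ 124,999", "$125,000 and more")

# Hypothetical responses, for illustration only.
income_family <- c("$50,000 to $74,999", "Less than $25,000", "$125,000 and more")

# match() returns the position of each response in `brackets`, which is X_1.
X_1 <- match(income_family, brackets)
X_1

# An ordered factor keeps the labels and their ordering together.
income_ord <- factor(income_family, levels = brackets, ordered = TRUE)
```

Treating X_1 as a number assumes that each step between adjacent brackets has the same effect, whereas the ordered factor only assumes an ordering.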
We can then regress feelings_life on X_1, Children, and their interaction term using ordinal logistic regression. A positive and significant coefficient means that the corresponding independent variable has a positive, statistically significant impact on life satisfaction. For example, given the results from our ANOVA model, we would expect the coefficient on X_1 to be positive. The coefficient on the interaction term can be used to study how the impact of income on life satisfaction changes depending on whether the respondent has children.
Generally speaking, we would expect the ordinal logistic model to outperform linear models when the response variable is ordinal (Lee, 2019). However, the ordinal logistic model has its own limitations. It has 4 key assumptions: ordinal response, variation in independent variables, no collinearity, and proportional odds (Lee, 2019).
- The first two assumptions are clearly satisfied.
- The third assumption can be tested by studying the correlation between the independent variables. Given that we only have two independent variables, this should not be a large concern. Intuitively, we would not expect having children to be strongly and directly correlated with income. Nevertheless, more rigorous methods should be used to test for multicollinearity (Lee, 2019).
- The last assumption could be problematic because it requires the relationship between each pair of adjacent response categories to be the same (Lee, 2019). Life satisfaction is a subjective quantity that cannot be measured exactly, and a single response value is not very meaningful by itself, so we cannot tell intuitively whether this assumption holds. However, Brant's test can be used to test for proportional odds (Lee, 2019).
In summary, the ordinal logistic model is an alternative model that can be used in our study. It might outperform our current model if all assumptions are satisfied; however, it also has its own limitations.
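As a rough sketch of this alternative model, the ordinal logistic regression could be fitted with `polr()` from the MASS package. The data below are simulated (the effect size 0.5 and the decile cut points are arbitrary assumptions), so the output only illustrates the form of the fit, not a finding about the GSS.

```r
library(MASS)  # provides polr(); MASS is a recommended package bundled with R

set.seed(304)
n <- 500

# Simulated predictors (not the GSS data).
sim <- data.frame(
  X_1      = sample(1:6, n, replace = TRUE),
  Children = factor(sample(c(0, 1), n, replace = TRUE))
)

# Latent satisfaction rises with income; cut at its deciles to get a 1-10 score.
latent <- 0.5 * sim$X_1 + rlogis(n)
sim$feelings_life <- cut(latent,
                         breaks = quantile(latent, probs = seq(0, 1, by = 0.1)),
                         labels = 1:10, include.lowest = TRUE,
                         ordered_result = TRUE)

# Proportional-odds model with income, children status, and their interaction.
fit <- polr(feelings_life ~ X_1 * Children, data = sim, Hess = TRUE)
summary(fit)
```

In this parameterization a positive coefficient on `X_1` means that higher income shifts respondents toward higher satisfaction categories, and the `X_1:Children1` term measures how that shift differs for respondents with children.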
## Results
### The boxplots below show life satisfaction for people with and without children, for people classified by family income, and for people classified by both children status and family income.
Figure 1: To compare life satisfaction between people with and without children
```{r, echo=FALSE, results="hide",fig.width=5.5, fig.height=4.5}
attach(newdata1)
#have children or no children
boxplot(feelings_life~Children, main="Figure 1: Life satisfaction by children status",ylab="Life satisfaction ",xlab="Children status")
children_mean <- tapply(feelings_life,Children,mean)
children_mean
points(children_mean,col="red",pch=20)
var.test(feelings_life~Children)
t.test(feelings_life~Children, var.equal=FALSE) #Welch Two Sample t-test
```
- The distributions of life satisfaction for people with and without children look similar in variance and mean (the red dots). People who have children appear to have slightly higher life satisfaction on average.
Figure 2: To compare life satisfaction by family income
```{r, echo=FALSE, results="hide",fig.width=8, fig.height=6}
attach(newdata1)
boxplot(feelings_life~income, main="Figure 2: Life satisfaction by income",ylab="Life satisfaction",xlab="Income")
income_mean <- tapply(feelings_life,income,mean)
points(income_mean,col="red",pch=20)
anova(lm(feelings_life~income))
TukeyHSD(aov(feelings_life~as.factor(income)))
```
- From the plot above, we can observe that life satisfaction tends to increase as family income increases.
Figure 3: To compare life satisfaction among the 10 categories of the new factor CE (people classified by the combination of their children status and family income)
```{r, echo=FALSE, results="hide",fig.width=13, fig.height=8}
CE_mean <- tapply(feelings_life,CE,mean)
tapply(feelings_life,CE,var)
t.test(feelings_life~Children, var.equal=FALSE)
lm(feelings_life~CE)
anova(lm(feelings_life~CE))
TukeyHSD(aov(feelings_life~CE))
boxplot(feelings_life~CE, main="Figure 3: Life satisfaction by CE",ylab="Life satisfaction",xlab="CE",xpd=NA)
points(CE_mean,col="red",pch=20)
```
Recall: CE is a variable that combines children status and family income.
(Note: C and NC indicate the children status, having children or no children, and the family income level follows. For example, NCbelow25k indicates people who don't have children and have an annual family income below 25,000 dollars; C100k+ indicates people who have children and have an annual family income above 100,000 dollars.)
- The boxplot shows that at the family income levels below25k, 25-50k, and 50-75k, people without children have a higher average life satisfaction than people with children. By contrast, at the higher family income levels 75-100k and 100k+, people with children have a higher average life satisfaction than those without.
### Residual plots
Figure 4
```{r, fig.width=7, fig.height=5}
par(mfrow=c(2,2))
lm_CE <- lm(feelings_life~CE, newdata1)
plot(lm_CE)
```
- The Residuals vs Fitted plot shows the fitted (predicted) values on the x-axis and the residuals (errors) on the y-axis. The spread of the residuals around 0 follows a pattern instead of being random, so the assumption of constant variance might not hold.
- Normal Q-Q plot: the majority of the residual points lie on the reference line, which is consistent with a normal distribution. Hence the assumption of normality appears to hold.
- The Residuals vs Leverage plot helps us detect influential cases. Since no points fall beyond the Cook's distance contours, there are no influential points.
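The visual checks above could be complemented with formal tests. The sketch below uses simulated groups with deliberately unequal spread (not the GSS data) and base R functions only; choosing Bartlett's and Shapiro-Wilk tests is our assumption for illustration, and other tests would serve equally well.

```r
set.seed(304)

# Simulated groups with deliberately unequal spread (not the GSS data).
sim <- data.frame(
  CE = factor(rep(c("Cbelow25k", "C100k+", "NC100k+"), each = 100)),
  feelings_life = c(rnorm(100, mean = 6.5, sd = 2.5),
                    rnorm(100, mean = 8.5, sd = 1.0),
                    rnorm(100, mean = 8.0, sd = 1.2))
)
fit <- lm(feelings_life ~ CE, data = sim)

# Bartlett's test: H0 is that all groups share the same variance.
bt <- bartlett.test(feelings_life ~ CE, data = sim)

# Shapiro-Wilk test on the residuals: H0 is that they are normally distributed.
sw <- shapiro.test(residuals(fit))

# Largest Cook's distance, a common screen for influential observations.
max_cook <- max(cooks.distance(fit))

c(bartlett_p = bt$p.value, shapiro_p = sw$p.value, max_cook = max_cook)
```

A small Bartlett p-value would support the impression from the Residuals vs Fitted plot that the group variances differ. Bartlett's test is itself sensitive to non-normality, so it should be read together with the Q-Q plot.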
## Discussion
### Section 1: Investigate whether or not there is a difference in life satisfaction between children status(having children or not).
- Using the F-test for equality of variances, we obtained p-value = 0.04378, so we reject H0; this is evidence that the two groups have different variances. We therefore use Welch's two-sample t-test, which adjusts the degrees of freedom when the variances differ.
- Using Welch's two-sample t-test (Meng, 2019), we have t = -1.384, df = 705, p-value = 0.1668. There is no evidence (p = 0.17) that mean life satisfaction differs between people who have children and those who do not.
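For reference, the Welch statistic and its approximate degrees of freedom take the standard textbook form below, where $\bar{x}_i$, $s_i^2$, and $n_i$ denote the sample mean, sample variance, and size of group $i$ (this notation is introduced here for illustration and is not used elsewhere in the report):

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
$$

Because $\nu$ depends on the two sample variances, it is never larger than the pooled value $n_1 + n_2 - 2$, which is consistent with the reported df = 705 for a sample of 1000 observations.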
### Section 2: Investigate whether or not there is a difference in life satisfaction among people classified by family income.
- We apply a one-way analysis of variance. The null hypothesis (H0) is that mean life satisfaction is equal across the 5 family income levels. The alternative hypothesis (H1) is that at least one mean differs (Meng, 2019).
- We obtain test statistic (df) = 26.295 (1, 998) and p-value < 0.0001, so we reject H0. This is strong evidence that mean life satisfaction differs among the 5 levels of family income.
- In addition, Tukey’s procedure (Meng, 2019) also indicates that the levels differ. In particular, people with a family income below 25k were significantly different (p-value < 0.0001) from all other income levels in mean life satisfaction.
### Section 3: Investigate whether or not there is a difference in life satisfaction among the 10 categories of people classified by the combination of their children status and family income.
- Using ANOVA, there is strong evidence that not all 10 categories of people have the same mean life satisfaction (p-value < 0.0001).
- However, Tukey’s output contains several notable comparisons. For people with family income below 25k, 25-50k, and 100k+, there is no evidence of a difference in mean life satisfaction between those who have children and those who do not (p-value = 0.998, 0.999, and 0.445 respectively). In other words, life satisfaction in these three income groups does not appear to be affected by children status. In addition, people without children and with income below 25k showed no significant difference in mean life satisfaction compared to people without children and with income 75-100k or 100k+ (p-value = 0.25 and 0.30).
- On the other hand, people with children and income below 25k were significantly different in mean life satisfaction from all other income groups (p-value < 0.01). This can also be seen in the Figure 3 boxplot, which shows that the largest mean difference in life satisfaction is between Cbelow25k and C100k+ (people who have children with income below 25k and people who have children with income above 100k).
### Section 4: Assess whether the necessary assumptions of the model hold.
- Normality: from Figure 4 in the results section, the normality assumption appears to hold.
- Constant variance: this assumption might not hold, since the spread of the residuals around 0 follows a pattern instead of being random.
- Independence: we assume that the people included in this study are not related in any way. With a sample this large, some respondents are likely related, but no additional information was available to verify this.
Perhaps surprisingly, those who have children showed no statistically significant difference in life satisfaction compared to those who do not, while controlling for income. It could be that having children makes people happier but at the same time increases the emotional burden (Angeles, 2009). This could explain why having children did not have a statistically significant impact on life satisfaction. However, it could also be the result of not controlling for enough variables, which is addressed in the weaknesses section.
On average, people at different family income levels showed statistically significantly different levels of life satisfaction, while controlling for the presence or absence of children. What's more, a notable finding is that among people who don't have children, the lowest income group showed no significant difference in mean life satisfaction compared to the high income groups, while among people who have children, the lowest income group was significantly different in mean life satisfaction from the higher income groups. This result is expected: previous empirical research has suggested that richer people are often “happier” people (Ortiz-Ospina et al., 2013). Individuals with higher incomes are not only able to purchase a greater variety and amount of goods and services, but are also better able to support children financially.
Although low-income people without children showed no significant difference in mean life satisfaction from higher-income people without children, once children are present, higher family income seems to play an important role in reaching a higher level of life satisfaction.
## Weaknesses
This work has some drawbacks. First, we used several statistical methods to verify the necessary assumptions of our models, and the spread of the residuals does not appear to be random around 0. The results may therefore need to be validated again in the future, since the constant variance assumption of the model might not fully hold.
Secondly, the goal of this study is to examine variables that contribute to life satisfaction, but we only used the Canadian GSS data from 2017. Further study can be conducted using the Canadian GSS from other years. If similar results are observed in other datasets, our conclusions become more reliable. Moreover, we do not know whether our results for the Canadian population would translate well to other parts of the world. Income and the presence of children might impact life satisfaction differently in other countries. For example, we might expect the presence or absence of children to have a greater impact on life satisfaction in countries with low birth rates, such as Japan.
Thirdly, our research might suffer from omitted variable bias. Apart from income and having children, many other variables, such as education, sex, and religion, could have a significant impact on life satisfaction. Omitting these variables may bias our estimated coefficients and may also affect the significance of the variables we included. This is because the predictor variables are then no longer independent of the error term, which can lead to invalid inferences.
## Next Steps
The results may need to be validated again, since the homogeneity of variance assumption of the model might not hold. As next steps, we have a few considerations:
- Find a transformation to deal with the non-constant variance, for example a variance-stabilizing transformation of the response variable (y) or a Box-Cox transformation (STAT:5201, University of Iowa).
- Apply multinomial regression and compare the results with the current results.
- Use more rigorous statistical methods when selecting relevant independent variables. The omitted variable bias problem was discussed earlier; however, including too many independent variables can lead to multicollinearity and over-complicate the model. Multicollinearity increases the standard errors of our estimates, so there is a tradeoff between omitted variable bias and multicollinearity. We could use the Akaike information criterion or the Bayesian information criterion to arrive at a better set of relevant independent variables for this study.
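Two of these next steps could be sketched as follows. The data are simulated (a positive response whose spread grows with the group mean, plus an irrelevant predictor `x2`), so the variable names and values are assumptions for illustration only.

```r
library(MASS)  # provides boxcox(); MASS is a recommended package bundled with R

set.seed(304)

# Simulated data: a positive response whose spread grows with the group mean.
sim <- data.frame(g = factor(rep(c("low", "mid", "high"), each = 100),
                             levels = c("low", "mid", "high")))
sim$x2 <- rnorm(300)  # an irrelevant extra predictor
sim$y  <- exp(rnorm(300, mean = c(0.5, 1, 1.5)[as.integer(sim$g)], sd = 0.4))

# y = TRUE stores the response in the fit, which boxcox() needs.
fit <- lm(y ~ g, data = sim, y = TRUE)

# Profile log-likelihood over a grid of lambda values; plotit = FALSE returns the grid.
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat

# Compare candidate models with AIC and BIC (lower is better).
fit_big <- lm(y ~ g + x2, data = sim)
AIC(fit, fit_big)
BIC(fit, fit_big)
```

A `lambda_hat` near 0 suggests a log transformation, while a value near 1 suggests that no transformation is needed. BIC penalizes additional predictors more heavily than AIC, so the two criteria can disagree about which model to keep.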
## References
Angeles, L. (2009). Children and Life Satisfaction. Journal of Happiness Studies, 11(4), 523-538. doi:10.1007/s10902-009-9168-z
Antaramian, S. (2017). The importance of very high life satisfaction for students' academic success. Retrieved October 19, 2020, from https://www.tandfonline.com/doi/abs/10.1080/2331186X.2017.1307622
Bellet, C., Neve, J. D., & Ward, G. (2019). Does Employee Happiness Have an Impact on Productivity? SSRN Electronic Journal. doi:10.2139/ssrn.3470734
Frost, J., Tamanani, R., Hedvat, J., Bumadian, I., Manfredo-Thomas, A., Padua, A., . . . Siddiqui, A. (2017, June 13). Regression Tutorial with Analysis Examples. Retrieved October 19, 2020, from https://statisticsbyjim.com/regression/regression-tutorial-analysis-examples/
Government of Canada, S. (2017, February 27). The General Social Survey: An Overview. Retrieved October 19, 2020, from https://www150.statcan.gc.ca/n1/pub/89f0115x/89f0115x2013001-eng.html
Government of Canada, S. (2020, October 17). Statistics Canada: Canada's national statistical agency. Retrieved October 19, 2020, from https://www.statcan.gc.ca/eng/start
Haybron, D. M. (2001). 3. Life satisfaction. Happiness, 31-41. doi:10.1093/actrade/9780199590605.003.0003
Lee, E. (2019, May 29). Ordinal Logistic Regression on World Happiness Report. Retrieved October 19, 2020, from https://medium.com/evangelinelee/ordinal-logistic-regression-on-world-happiness-report-221372709095
Ortiz-Ospina, E., & Roser, M. (2013, May 14). Happiness and Life Satisfaction. Retrieved October 19, 2020, from https://ourworldindata.org/happiness-and-life-satisfaction
One-way ANOVA Part 1. (n.d.). Retrieved October 19, 2020, from https://vault.hanover.edu/~altermattw/courses/220/spss/oneway/one-way-stats1.html
Meng, F. (2019). STA303 notes. Toronto, Ontario: University of Toronto.
Mean, P. (n.d.). Regression and Anova. Retrieved October 19, 2020, from http://www.pmean.com/10/Interesting2010.html
Pitman, E. J. (1979). Some basic theory for statistical inference. London, UK: Chapman and Hall.
STAT:5201 Applied Statistics II. (n.d.). Retrieved October 19, 2020, from http://homepage.divms.uiowa.edu/~rdecook/stat5201.html
Technology, A. (n.d.). Data Centre. Retrieved October 19, 2020, from http://dc.chass.utoronto.ca/myaccess.html
Software used in producing the report: Rstudio
Packages used in producing the report:
readr: Read Rectangular Text Data
dplyr: A Grammar of Data Manipulation
FrF2: Fractional Factorial Designs with 2-Level Factors
kableExtra: Construct Complex Table with 'kable' and Pipe Syntax
ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics
Retrieved October 19, 2020, from https://cran.r-project.org/web/packages/available_packages_by_name.html
## Appendix
```{r, echo=TRUE, results="show"}
#Section 1:
children_mean <- tapply(feelings_life,Children,mean)
children_mean
var.test(feelings_life~Children)
t.test(feelings_life~Children, var.equal=FALSE) #Welch Two Sample t-test
```
```{r, echo=TRUE, results="show"}
#Section 2:
income_mean <- tapply(feelings_life,income,mean)
income_mean
anova(lm(feelings_life~income))
TukeyHSD(aov(feelings_life~as.factor(income)))
```
```{r, echo=TRUE, results="show"}
#Section 3:
CE_mean <- tapply(feelings_life,CE,mean)
CE_mean
tapply(feelings_life,CE,var)
t.test(feelings_life~Children, var.equal=FALSE)
lm(feelings_life~CE)
anova(lm(feelings_life~CE))
TukeyHSD(aov(feelings_life~CE))
```
```{r, echo=TRUE, results="show", fig.width=7, fig.height=5}
#Section 4:
par(mfrow=c(2,2))
lm_CE <- lm(feelings_life~CE, newdata1)
plot(lm_CE)
```