%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Template for AIMS Rwanda Assignments %%% %%%
%%% Author: AIMS Rwanda tutors %%% ### %%%
%%% Email: tutors2017-18@aims.ac.rw %%% ### %%%
%%% Copyright: This template was designed to be used for %%% ####### %%%
%%% the assignments at AIMS Rwanda during the academic year %%% ### %%%
%%% 2017-2018. %%% ######### %%%
%%% You are free to alter any part of this document for %%% ### ### %%%
%%% yourself and for distribution. %%% ### ### %%%
%%% %%% %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%% Ensure that you do not write the questions before each of the solutions because it is not necessary. %%%%%%
\documentclass[12pt,a4paper]{article}
%%%%%%%%%%%%%%%%%%%%%%%%% packages %%%%%%%%%%%%%%%%%%%%%%%%
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{amsfonts}
\usepackage{graphicx}
\usepackage{wasysym}
\usepackage[all]{xy}
\usepackage{tikz}
\usepackage{verbatim}
\usepackage[left=2cm,right=2cm,top=3cm,bottom=2.5cm]{geometry}
\usepackage{hyperref}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{psfrag}
\usepackage{float}
\usepackage{mathrsfs}
%%%%%%%%%%%%%%%%%%%%% students data %%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\student}{Louis Mozart Kamdem}
\newcommand{\course}{LaTex}
\newcommand{\assignment}{1}
%%%%%%%%%%%%%%%%%%% using theorem style %%%%%%%%%%%%%%%%%%%%
\newtheorem{thm}{Theorem}
\newtheorem{lem}[thm]{Lemma}
\newtheorem{defn}[thm]{Definition}
\newtheorem{exa}[thm]{Example}
\newtheorem{rem}[thm]{Remark}
\newtheorem{coro}[thm]{Corollary}
\newtheorem{quest}{Question}[section]
%%%%%%%%%%%%%% Shortcut for usual set of numbers %%%%%%%%%%%
\newcommand{\N}{\mathbb{N}}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\C}{\mathbb{C}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%555
\begin{document}
%%%%%%%%%%%%%%%%%%%%%%% title page %%%%%%%%%%%%%%%%%%%%%%%%%%
\thispagestyle{empty}
\begin{center}
\textbf{AFRICAN INSTITUTE FOR MATHEMATICAL SCIENCES \\[0.5cm]
(AIMS RWANDA, KIGALI)}
\vspace{1.0cm}
\end{center}
%%%%%%%%%%%%%%%%%%%%% assignment information %%%%%%%%%%%%%%%%
\noindent
\rule{17cm}{0.2cm}\\[0.3cm]
Name: \student{} \hfill Assignment Number: 1\\[0.1cm]
Course: SML \hfill Date: \today\\
\rule{17cm}{0.05cm}
\vspace{1.0cm}
\textbf{\textit{Exercise 4}}
\begin{enumerate}
\item Upper triangular pairwise scatterplot of the data:
\begin{figure}[H]
\centering
\includegraphics[height=5cm,width=12cm]{scatterplot}
\caption{Upper triangular Scatterplot}
\end{figure}
From the figure above, the scatter plots of the response variable (score) against the variables motheriq, read and count each tend towards a straight line, which means that, among all the variables, these three are the most correlated with the response variable.
To see which of them is the most correlated, we can build a correlation table for these four variables, which yields:
\begin{table}[ht]
\centering
\caption{Correlation table between the most strongly correlated variables}
\begin{tabular}{rrrrr}
\hline
& score & count & motheriq & read \\
\hline
score & 1.00 & 0.54 & 0.57 & 0.53 \\
count & 0.54 & 1.00 & 0.02 & 0.91 \\
motheriq & 0.57 & 0.02 & 1.00 & -0.04 \\
read & 0.53 & 0.91 & -0.04 & 1.00 \\
\hline
\end{tabular}
\end{table}
From the table above, we can clearly see that the variable most correlated with the response variable is motheriq (correlation 0.57).
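
A rough sketch of what this correlation implies, using the rounded value $0.57$ from the table and assuming a simple linear regression of score on motheriq alone: in that setting the coefficient of determination equals the squared correlation,
\[
R^2 = r^2 \approx 0.57^2 \approx 0.32,
\]
so motheriq on its own would account for roughly a third of the variability in score.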
\item Correlation matrix for the data.
\begin{figure}[H]
\centering
\includegraphics[height=5cm,width=12cm]{corelation_matrix}
\caption{Correlation matrix}
\end{figure}
From the correlation matrix, we see that motheriq is the variable most correlated with the score.
\item Plot of the histogram and test of normality.
We have the plots below:
\begin{figure}[H]
\centering
\begin{subfigure}[b]{0.4\textwidth}
\centering
\includegraphics[height=5cm,width=5cm]{hist}
\end{subfigure}
\begin{subfigure}[b]{0.4\textwidth}
\centering
\includegraphics[height=5cm,width=5cm]{hist_norm}
\end{subfigure}
\caption{Histogram}
\end{figure}
The second subfigure above shows that the distribution of the response variable is approximately normal. We can confirm this with a Shapiro-Wilk test of normality in R, which yields a p-value of 0.76. Since this is greater than 0.05, we do not reject the hypothesis that the response follows a normal distribution.
\item Fitting the simple linear regression model.
The most important variable for fitting a simple linear model of the score is motheriq. The table below gives the summary of the model.
\begin{table}[ht]
\centering
\begin{tabular}{rrrrr}
\hline
& Estimate & Std. Error & t value & Pr($>$$|$t$|$) \\
\hline
(Intercept) & 111.0930 & 11.8567 & 9.37 & 6.02e-11 \\
motheriq & 0.4066 & 0.1002 & 4.06 & 0.0003 \\
\hline
\end{tabular}
\caption{SLR model}
\end{table}
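
As a quick consistency check on the table (a sketch using only the rounded values reported there), each t value is the ratio of the estimate to its standard error:
\[
t_{\text{motheriq}} = \frac{0.4066}{0.1002} \approx 4.06,
\qquad
t_{\text{(Intercept)}} = \frac{111.0930}{11.8567} \approx 9.37 .
\]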
\item Generation of the four residual plots.
\begin{figure}[H]
\centering
\begin{subfigure}[b]{0.4\textwidth}
\caption{Linearity}
\centering
\includegraphics[height=5cm,width=5cm]{1}
\end{subfigure}
\begin{subfigure}[b]{0.4\textwidth}
\caption{Normality}
\centering
\includegraphics[height=5cm,width=5cm]{2}
\end{subfigure}
\centering
\begin{subfigure}[b]{0.4\textwidth}
\caption{Homoskedasticity}
\centering
\includegraphics[height=5cm,width=5cm]{3}
\end{subfigure}
\begin{subfigure}[b]{0.4\textwidth}
\caption{Outliers}
\centering
\includegraphics[height=5cm,width=5cm]{4}
\end{subfigure}
\end{figure}
We observe from the plots above that the linearity assumption is not met, and that the homoskedasticity assumption is not fully met either, although it is acceptable. From the outliers plot, we see that some observations badly influence the estimates of our parameters; these include observations 4, 6, 13, 15, 20 and 24.
\item We use the model above. In the summary table of the model, the estimated slope is 0.4066. This means that motheriq and score are positively correlated: when motheriq increases by one unit, the score increases by about 0.4.
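
Written out with the coefficients from the summary table, and with $\widehat{\text{score}}$ introduced here only as notation for the fitted value, the model reads
\[
\widehat{\text{score}} = 111.0930 + 0.4066 \times \text{motheriq}.
\]
As an illustrative example, an increase of 10 units in motheriq changes the fitted score by $0.4066 \times 10 \approx 4.07$ units.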
\item Confidence bands and prediction bands
\begin{figure}[H]
\centering
\includegraphics[height=5cm, width=12cm]{6}
\caption{Confidence band and prediction band}
\end{figure}
From the figure above, our model indicates, with 95 percent confidence, that a mother's IQ of around 129 corresponds to a score of around 160. The fitted points also tend to follow the straight blue line, which supports the assumption of normality of the response variable.
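
For comparison, the point prediction can be recomputed by hand from the rounded coefficients of the simple linear regression summary (intercept $111.0930$, slope $0.4066$). This is only a sketch of the arithmetic, not a replacement for the bands in the figure:
\[
111.0930 + 0.4066 \times 129 \approx 163.5,
\]
which agrees with the value of around 160 read from the plot.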
\item Multiple linear regression
\begin{table}[ht]
\centering
\begin{tabular}{rrrrr}
\hline
& Estimate & Std. Error & t value & Pr($>$$|$t$|$) \\
\hline
(Intercept) & 75.5085 & 24.0262 & 3.14 & 0.0039 \\
fatheriq & 0.2525 & 0.1376 & 1.84 & 0.0771 \\
motheriq & 0.4001 & 0.0729 & 5.49 & 7.33e-06 \\
speak & 0.1876 & 0.1477 & 1.27 & 0.2143 \\
count & 0.2065 & 0.2663 & 0.78 & 0.4446 \\
read & 7.5441 & 5.5864 & 1.35 & 0.1877 \\
edutv & -4.2024 & 2.2450 & -1.87 & 0.0717 \\
cartoons & -3.3390 & 2.0181 & -1.65 & 0.1092 \\
\hline
\end{tabular}
\end{table}
With this model, we can see that, apart from motheriq, none of the other variables is individually significant at the 0.05 level, because their p-values are greater than 0.05.
Comparing this model with our first model using ANOVA and AIC gives:
\begin{table}[H]
\caption{Comparison between model and model1}
\begin{subtable}{.5\linewidth}
\centering
\caption{AIC}
\begin{tabular}{rrr}
\hline
& df & AIC \\
\hline
model1 & 9.00 & 179.65 \\
model & 3.00 & 203.27 \\
\hline
\end{tabular}
\end{subtable} %
\begin{subtable}{.5\linewidth}
\centering
\caption{ANOVA}
\scalebox{0.7}{ \begin{tabular}{lrrrrrr}
\hline
& Res.Df & RSS & Df & Sum of Sq & F & Pr($>$F) \\
\hline
1 & 34 & 505.47 & & & & \\
2 & 28 & 187.90 & 6 & 317.57 & 7.89 & 4.931e-05 \\
\hline
\end{tabular}}
\end{subtable}
\end{table}
We observe in the AIC table that the model with multiple variables has a smaller AIC than the simple linear model. In the ANOVA table, we also have a very small p-value, meaning that the model with more features is the better one to choose.
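
The figures in both tables can be checked by hand. The following is a sketch using the rounded values reported above, with $\Delta\mathrm{AIC}$ introduced here only as shorthand for the difference between the two AIC values:
\[
F = \frac{(505.47 - 187.90)/(34 - 28)}{187.90/28} = \frac{317.57/6}{187.90/28} \approx 7.89,
\qquad
\Delta\mathrm{AIC} = 203.27 - 179.65 = 23.62 .
\]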
\end{enumerate}
\textbf{Conclusion:} In conclusion, the analysis of this dataset suggests that a child's intelligence reflects that of the mother. However, all the other variables considered in this dataset also play an important role in determining a child's intelligence level.\\\\
\textbf{\textit{Exercise 2}}
\begin{enumerate}
\item History and description of the data.
The Microarray Gene Expression dataset has been collected to identify prostate cancer in patients. In our case, we have data on 79 patients (observations) and 500 variables. The output of the dataset is $Y \in \{0,1\}$, meaning that we are in the setting of pattern recognition or classification learning. When $Y = 0$, the patient is negative for prostate cancer, and when $Y = 1$, the patient is positive for prostate cancer.
\item Plot of the distribution of the response
\begin{figure}[H]
\centering
\includegraphics[height=7cm,width=5cm]{5}
\caption{Distribution of the response variable}
\end{figure}
From the barplot above, we can see that among the 79 patients, 39 are not carriers of the disease while 40 of them are.
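
In terms of proportions (a simple computation from the counts above), the two classes are almost perfectly balanced:
\[
\frac{39}{79} \approx 0.494,
\qquad
\frac{40}{79} \approx 0.506 .
\]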
\item Comment on the shape and the dimensionality.
Using the command `dim' in R, we see that the data have 500 variables (columns) against 79 observations (rows). Since $79 \ll 500$, we are in the case of an ultra-high dimensional system, that is, an underdetermined system.
\item Our comments are based on the observation of 8 variables. For those variables, we have the boxplot and scatter plot below:
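
To make the dimensions explicit, write $n$ for the number of observations, $p$ for the number of variables and $X$ for the $n \times p$ data matrix (notation introduced here for illustration only):
\[
n = 79, \quad p = 500, \quad \frac{p}{n} \approx 6.3, \quad \operatorname{rank}(X) \le n < p .
\]
Since $X^{\top}X$ is a $500 \times 500$ matrix of rank at most 79, it is singular, so an ordinary least-squares fit on all the variables would not have a unique solution.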
\begin{figure}[H]
\centering
\begin{subfigure}[b]{0.4\textwidth}
\centering
\includegraphics[height=5cm,width=7cm]{8}
\caption{Boxplot}
\end{subfigure}
\begin{subfigure}[b]{0.4\textwidth}
\centering
\includegraphics[height=5cm,width=8cm]{9}
\caption{Scatter plot}
\end{subfigure}
\end{figure}
We can observe from the scatter plot above that many of those random variables are highly correlated. For these variables, the boxplot also shows a negative median.
\end{enumerate}
\end{document}