%
% theory.tex - OpenPave Theory
%
% The contents of this file are subject to the Academic Development
% and Distribution License Version 1.0 (the "License"); you may not
% use this file except in compliance with the License. You should
% have received a copy of the License with this file. If you did not
% then please contact whoever distributed this file to you, since
% they may be in violation of the License, and this may affect your
% rights under the License.
%
% Software distributed under the License is distributed on an "AS IS"
% basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
% the License for the specific language governing rights and
% limitations under the License.
%
% The Initial Developer of the Original Software is Jeremy Lea.
%
% Portions Copyright (C) 2006-2008 OpenPave.org.
%
% Contributor(s): Jeremy Lea <reg@openpave.org>.
%
\documentclass[11pt,twoside,letterpaper]{optech}
\settrimmedsize{11in}{210mm}{*}
\setlength{\trimtop}{0.38197\stockheight-0.38197\paperheight}
\setlength{\trimedge}{0.5\stockwidth-0.5\paperwidth}
\settypeblocksize{9in-2em}{105mm}{*}
\setulmargins{*}{*}{1.618}
\setheadfoot{1.5em}{2.5em}
\setheaderspaces{*}{1.5em}{*}
\setlrmargins{20mm}{*}{*}
\setmarginnotes{1em}{65mm-1em}{1em}
\checkandfixthelayout
\setlength{\headwidth}{\textwidth}
\addtolength{\headwidth}{\marginparsep}
\addtolength{\headwidth}{\marginparwidth}
\mathindent=0em
\aliaspagestyle{chapter}{opchap}
\chapterstyle{optech}
\pagestyle{optech}
\ifpdf
\pdfinfo {
/Title (OpenPave.org Theory)
/Subject (The theory behind the OpenPave.org software)
/Author (Jeremy D. Lea)
/Keywords (Pavement design; Layered elastic theory)
}
\fi
\usepackage{lscape,rotating,flafter}
\usepackage[absolute]{textpos}
\usepackage{emp}
\empTeX{\documentclass[11pt]{optech}}
\empprelude{
input rboxes;^^J
input latexmp;^^J
setupLaTeXMP(mode=normal,textextlabel=enable,^^J
class="optech",options="11pt");^^J
latexmp_prepend:="\fontfamily{\sfdefault}\fontsize{6}{10}\selectfont ";^^J
def resetattachstrings_latexmp = "" enddef;^^J
input optech.mp;^^J
}
% Bibliographic reference style.
\usepackage[round,comma,authoryear,sort&compress]{natbib}
\usepackage[nolist,nohyperlinks]{acronym}
%\DoubleSpacing
\newcommand*{\OP}{\textsc{OpenPave.org}}
\newcommand*{\Fortran}{\textsc{Fortran}}
\newcommand*{\CC}{C\nolinebreak\hspace{-.06em}\raisebox{.5ex}{\tiny\textbf
+}\nolinebreak\hspace{-.09em}\raisebox{.5ex}{\tiny\textbf +}}
\DeclareMathOperator{\sgn}{sgn}
\DeclarePairedDelimiter\abs{\lvert}{\rvert}
\DeclarePairedDelimiter\deter{\lvert}{\rvert}
\DeclarePairedDelimiter\norm{\lVert}{\rVert}
\begin{document}
\begin{empfile}
\begin{titlingpage}
\begin{adjustwidth}{0pt}{\textwidth-\headwidth}
{\color{opgreen} \hrule width \headwidth height 2pt }
\setlength{\TPHorizModule}{\headwidth}
\setlength{\TPVertModule}{\textheight}
\textblockorigin{\spinemargin+\trimedge}{\uppermargin}
\begin{textblock}{0.95}[1,0](1,0.12)
\begin{flushright}
{\Huge \textbf{The theory behind \OP\ software} }
\end{flushright}
\end{textblock}
\begin{textblock}{0.5}[1,0](1,0.4)
{\color{opgreen} \hrule height 2pt } \vspace{5pt}
\begin{flushright}
{\large \textbf{An introduction to the mathematics of pavement analysis and design} }
\end{flushright}
\end{textblock}
\begin{textblock}{0.5}[1,0](1,0.5)
{\color{opgreen} \hrule height 2pt } \vspace{5pt}
\begin{flushright}
{\large \textbf{Jeremy D. Lea} }
\end{flushright}
\end{textblock}
\begin{textblock}{0.5}[0,1](0,0.99)
\begin{flushleft}
{\tiny \copyright\ 2006-2007 \OP. All Rights Reserved.}
\end{flushleft}
\end{textblock}
\begin{textblock}{0.5}[1,1](1,0.99)
\begin{flushright}
{\large \textbf{\today} }
\end{flushright}
\end{textblock}
\begin{textblock}{0.5}[0,0](0,0.4)
\noindent
\includegraphics[width=0.498\headwidth,clip=true,viewport=46mm 92mm 188mm 264mm]{theory-title.pdf}
\end{textblock}
\vspace{\stretch{1}}
{\color{opgreen} \hrule width \headwidth height 2pt }
\end{adjustwidth}
\end{titlingpage}
\frontmatter
\setsecnumdepth{section}
\settocdepth{section}
\tableofcontents
%\listoffigures
%\listoftables
\mainmatter
\chapter{Introduction}
\OP\ is committed to supplying the best possible open source software for
pavement engineering. To that end it is important that the theoretical and
mathematical foundations underlying that software are well understood, and
that the mathematics are accurately and faithfully translated into code.
Thus, this document outlines the mathematics of the software and how this is
carried down to level of source code. The source code makes reference to
this document rather than extensive comments, since it is hard to express
complex mathematics in comments. However, this document is not just for
people looking at the source code---it should also serve as an excellent
reference for anyone looking to understand the more theoretical aspects of
pavement engineering, and of some associated fields.
This document is written in \AmS -\LaTeX, and is licensed under the same
conditions as the source code, the \ac{ADDL}. You are, therefore, free to
make alterations to this document and to use it as course notes or for other
purposes, provided that you distribute any changes that you make. Please
review the conditions in the license for more information.
\section{Typographical conventions}
Because this document covers a number of diverse fields, there is some need
to standardize the typographical conventions. As a result there are a number
of places where this document does not follow the `normal' conventions for a
particular field. However, since there is already considerable variety in
the various fields, it is hoped that readers will be able to follow without
having to completely re-wire their brains.
While most of the mathematical conventions will be introduced at the
appropriate time in the text, the following general conventions are used
within the general text of this document:
\begin{description}
\tightlist
\item[\normalfont\textrm{Roman}] is used for general text.
\item[\normalfont\textit{Italics}] are used for emphasis.
\item[\normalfont\texttt{Typewriter}] is used for source code.
\end{description}
Within mathematical formulas the following conventions are generally used,
although they need to be broken once in a while.
\begin{description}
\tightlist
\item[\normalfont$\mathrm{Roman}$] is used for functions and operators.
\item[\normalfont$\mathit{Italics}$] are used for scalars.
\item[\normalfont$\mathbf{Bold}$] is used for matrices.
\item[\normalfont$\mathtt{Typewriter}$] is used for pseudo-mathematics in code.
\item[\normalfont$\mathsf{Sans-serif}$] is used for tensors.
\end{description}
\chapter{Numbers and Computers}
While it might seem a little strange to start at such a basic level as
numbers for a document dealing with pavements, there are few pavement
engineers who have a solid background in computer programming. In addition,
there are a number of places in common terminology where words are overused,
and so this section outlines some of the more basic concepts underlying \OP's
code.
\section{Collections of Objects}
Underlying much of what goes on in both mathematics and computers is the idea
of collections of some type of abstract object. These collections are given
different names depending on their properties, and it is necessary to
understand the differences to avoid using the wrong collections. However, we
do not have the space to develop these concepts in detail, so you may wish to
consult a more general reference.
Mathematicians, Computer Scientists and Database designers use the same terms
to refer to collections of objects, although their definitions often differ
considerably. Here we will be detailing those differences, and refining the
use of terms used later within the text. In computer science the type of
collection is referred to as a container.
The objects which can be collected are as varied as your imagination ---
anything which you can name can be an object. We will discuss objects in
more detail in a later section. However, a distinction must be drawn between
collections of like objects and collections of unlike objects. In mathematics, collections
normally only contain like objects and in this document collections of like
objects will be referred to as sets. Collections of unlike objects will be
referred to as tuples. The objects within a collection are referred to as
elements.
\subsection{Sets}
Sets are one of the most fundamental concepts in mathematics and, in fact, all
of the mathematics which we will deal with can be carefully defined in terms
of sets and operations on those sets. There is a very large branch of
mathematics which deals with Set Theory, but here we will only need to
consider na\"ive set theory, which is what most people have learned in school.
A set is a collection of objects, with no particular order and no repeated
objects (or rather repeat objects are ignored). In normal mathematical
notation sets are denoted by `curly braces', such as $\{0,$ $1,$ $2\}$ or
$\{\textrm{apple}\}$. If we denote the heads and tails outcomes of tossing
a coin as $H$ and $T$ respectively, then, because sets do not contain repeated
items, the sets $\{H,$ $T,$ $T,$ $T,$ $H\}$ and $\{H,$ $T\}$ are equivalent.
Similarly the sets $\{1,$ $2,$ $3,$ $3,$ $2,$ $1\}$ and $\{1,$ $2,$ $3\}$ are also
equivalent.
In mathematics, sets can be countably infinite (meaning that the items in
the set can be differentiated from one another and not split into smaller
items, and can thus be counted, but that there are an infinite number of
items in the set), or they can be uncountable (meaning that the items in the
set can always be split into smaller items), or the set can be finite. In
computer science sets are always finite, since computers don't like dealing
with infinite numbers. No programming languages directly implement the
mathematical concept of a set, since it is too general to be handled by
computer code, although some languages have containers which are called sets
within the documentation of that language.
\subsection{Multisets}
In mathematics a multiset is a set which can contain repeated items. In
computer science the same concept is often referred to as a bag. Thus if you
were to toss a coin five times and get the results $\{H,$ $T,$ $T,$ $T,$
$H\}$, then toss it a further five times and get $\{T,$ $H,$ $T,$ $H,$ $H\}$
then these would be different multisets. However, the multisets
$\{T,$ $H,$ $T,$ $H,$ $H\}$ and $\{T,$ $T,$ $H,$ $H,$ $H\}$ are equivalent.
Multisets are also denoted by `curly brackets', but can be distinguished by
having repeated items. In mathematics, multisets are normally ordered, as
discussed below, since there is little reason for their use if they are
unordered.
\subsection{Ordered Multisets (Arrays)}
In mathematics a tuple is an ordered multiset. Thus if you are tossing a
coin looking for the first heads the multisets $\{T,$ $H,$ $T,$ $H,$ $H\}$ and
$\{T,$ $T,$ $H,$ $H,$ $H\}$ are different (if they represent the order in
which the tosses occurred). Tuples form the basis of vectors, tensors and
matrix algebra, which we will discuss in a short while. One should not
confuse the fact that the set is ordered with it being sorted. A sorted set
is one in which the elements have been reordered according to some rule. If
one encounters a multiset in mathematics it is generally ordered.
In computer science tuple is an over-used term, which can mean an ordered
multiset (although the term array is normally used for these) or it can mean
an ordered set of unlike objects. It is also used in this second sense by
database designers. Since most of the readers of this document will be more
familiar with the term array, we will use that for ordered multisets, and
restrict the use of tuple to the database sense, which will be discussed
below.
Arrays are normally denoted using `square brackets', so the first ordered
multiset above should have been written as $[T,$ $H,$ $T,$ $H,$ $H]$. Items
in an ordered multiset are also distinguished by a subscripted index, such as
$a_i$. Most computer programming languages implement arrays of some form.
However, these can be distinguished from the mathematical concept of an ordered
multiset in two ways: they are never infinite, and they often have more than
one index. The first issue is seldom a problem since infinite quantities are
not handled well by computers in general, and so must be handled explicitly
in the programming. The second is really a convenience for a `set of sets',
which can be handled in a number of different ways.
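As a minimal sketch (in \CC, since that is the language of the \OP\ code; the
names are purely illustrative), a one-index array and a two-index `set of
sets' might be declared as:
\begin{verbatim}
// A one-index array: elements are accessed as a[i], matching the
// mathematical subscript notation.
double a[5] = { 1.0, 2.0, 3.0, 4.0, 5.0 };

// A two-index array, i.e. an array of arrays: b[i][j].
double b[3][5] = {};
\end{verbatim}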
The terms vector and matrix are also heavily overused, and we will narrow
their definitions in a later section.
\subsection{Ordered Sets (Lists)}
Although in mathematics it is traditional to write sets in their natural
sorted order, there are only a few places where ordered sets are used in
mathematics --- most mathematical formulations are actually independent of
the order of the sets. However, they are much more common in computer science
where they are normally referred to as lists, or sometimes, unique lists.
\subsection{Sets of Unlike Objects (Tuples)}
In mathematics sets generally contain objects defined from some domain, which
is normally one of the very early assumptions in various proofs or
definitions. The need to handle unlike objects is seldom encountered, and so
mathematicians have taken to using the term tuple to refer to ordered multisets (or
arrays). The term vector is also often used for the same concept.
However, in computers there are many instances where pieces of information
are logically grouped but of distinct types. The classic example is
someone's name, address and telephone number. These are all objects, in the
general sense, but not of any meaningful value in mathematics. However, they
must be stored in a set. We will use the term tuple for these types of sets.
Sometimes the data stored in a tuple can also be represented by an array (an
ordered multiset), such as coordinates for a point, which are all real
numbers. However, the ordering is important --- so important that one can
argue that although they look the same, they are very different things.
\subsection{Maps and Structures}
An ordered set or multiset has a natural index into the items --- first,
second, etc. It is conceptually simple to expand that concept to using more
abstract indices (or keys). These types of sets are referred to as maps
(which must not be confused with a mapping, which will be addressed later) or
sometimes as keyed sets. However, this becomes quite difficult using
standard mathematical notation, and maps are not often used in mathematics.
Often in computer science the key and value pair from a map will be referred
to as a tuple, since they are unlike objects with a particular order. Within
this document the term keyed set will be reserved for sets where the key is a
component of the value, and the term map reserved for sets where the key is
not a component. However, one can always convert directly between these two
representations, by either defining a keyed set which contains tuples of the
keys and values from a map, or by splitting the tuples in a keyed set into
separate keys and values and defining a map from these.
By nature the key must be unique (since otherwise one cannot find the items
--- one cannot have two first items in a list), and so a keyed set is
naturally not capable of being a multiset, while a map could have multiple
equal values and thus be a multiset.
One major use of maps in computer science is to provide names for the
different slots in a tuple (since a tuple stores unlike objects). In this
way one can handle unordered tuples. All modern computer languages also
provide the concept of structures, which are ordered tuples with named
members. A map and a structure are logically completely equivalent and some
computer languages allow one to convert easily between the two. However, the
members of a structure are normally determined in advance, while a map could
conceivably hold an infinite number of elements.
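As a minimal sketch of this equivalence (in \CC; the type and member names
are purely illustrative, not \OP\ types), the same grouping of unlike objects
can be stored either as a structure or as a map:
\begin{verbatim}
#include <map>
#include <string>

// As a structure: the members are fixed in advance and accessed by name.
struct contact {
    std::string name;
    std::string address;
    std::string telephone;
};

// As a map: the same information, keyed by the member name, although the
// set of keys could in principle grow without limit.
std::map<std::string, std::string> contact_map = {
    { "name",      "A. Person" },
    { "address",   "1 Example Road" },
    { "telephone", "555-0100" }
};
\end{verbatim}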
\section{Containers}
The various sets listed above can all be implemented in a computer in a
number of different ways depending on the characteristics of the process
being modelled by the software. These go beyond what is normally required
for mathematics, although some are used in various branches of operations
research.
\subsection{Arrays}
\section{Numbers}
As with sets, numbers are more complicated than they appear, especially in
computers. Most people are comfortable with integers and real numbers,
because they have learned these in school. Engineers typically have a better
grasp of numbers, including rational/irrational numbers, complex numbers and
others. However, engineers, like most people, typically have a poor grasp of
how computers do math.
In math, numbers can be broken into four broad groups: integers, rational, real,
and complex numbers. Computers, on the other hand, only handle two kinds of
numbers: constrained integers and constrained rational numbers, and it is thus
up to software to handle the edge cases where there is a difference between the
theory and the code.
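The following minimal sketch (plain \CC, not \OP\ code) shows two of these
edge cases:
\begin{verbatim}
#include <cstdio>
#include <climits>

int main()
{
    // Constrained integers: unsigned arithmetic wraps at its limit, so
    // the mathematical result UINT_MAX + 1 is not representable.
    unsigned int u = UINT_MAX;
    printf("UINT_MAX + 1 = %u\n", u + 1u);      // prints 0

    // Constrained rationals: 0.1 has no exact binary representation, so
    // summing ten copies typically does not give exactly 1.0.
    double sum = 0.0;
    for (int i = 0; i < 10; i++)
        sum += 0.1;
    printf("sum == 1.0? %s (sum = %.17g)\n",
           sum == 1.0 ? "yes" : "no", sum);
    return 0;
}
\end{verbatim}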
\section{Vectors and Tensors}
The word `vector' is one of the most overused words in math and programming.
It might be used to describe an array, a tuple, a matrix or a first-order
tensor, and as a result is not often informative. Since we have other terms
available, it is best to think of vectors as first-order tensors.
\section{Linear Algebra}
\chapter{Statistics}
While reliability is a fairly simple concept, there are many different
approaches to calculating it, not all of which provide a consistent, unbiased
analysis. In this section, a short mathematical background to the approach
used in \OP\ is given. The approach is an adaptive importance sampling
algorithm, based on assumed distributions for the design parameters.
\section{Statistical Terminology}
Often the terminology used in statistical analysis can be confusing, and
unless the reader understands the implications of the words being used, they
can often lose track of some of the subtleties of the problem. If the
reader is well versed in statistics, then they probably want to skip this
section. Readers who have little or no background in statistics will
certainly want to consult an entry level statistics text before continuing.
\subsection{Definition of Probability}
The three basic building blocks of statistics are variables, events and
probabilities. A variable is a number, which in general can take on a value
within some range, but has a fixed value when actually evaluated. Variables
can either be discrete (integer) or continuous (real) numbers. There are
three basic types of variables: constants, parameters and random variables.
A constant is, fairly obviously, a constant number whose value is known
precisely (normally fairly rare in engineering). A parameter is a variable
whose value has been determined based on past data, and a random variable is,
well, random --- when measured it could take on any value within its range.
However, a random variable can be observed (under a given state of the
system) and the observed value of a random variable is fixed and is known as
an observation. Observations are often in pairs or sets of random variables
observed under the same state of the system.
Random variables can be classified as exogenous and endogenous, which refer
to variables that explain the randomness in the system (hence they are
also called explanatory variables) and variables which are explained by the
randomness in the system. Thus, when a random variable is modelled as a
function of other random variables (i.e. $y=f(\mathbf{x})$), the
$\mathbf{x}$ variables are exogenous, and the $y$ variable endogenous. The
value of $y$ might be observable, but the randomness in this value is assumed
to be explained entirely by the randomness in the explanatory variables.
An event occurs when the observed value of one or more random variables meet
some criteria. Thus an event is always an endogenous Boolean random
variable, which can be represented as an indicator variable. The act of
observing an event, to determine an outcome, is a trial.
There are different kinds of randomness in any analysis system. The first
type is inherent randomness --- things that cannot be controlled
\textit{mathematically}, such as layer thickness, material properties or
wheel loads. While these can be controlled physically, they are represented
by random variables whose variability cannot be reduced mathematically. The
second kind are errors: These are problems while observing the random
variable which result in the observation being different to the real value.
These errors can be random --- they have some variance but zero mean, or
systematic --- they have a non-zero mean. The randomness can also be due to
a lack of data --- the more observations which have been made, the more is
known about the system. The first two kinds of randomness tend to affect
the individual outcomes, while the third tends to affect the parameters.
Any parameters in the model are endogenous, because, although they might be
constants, they are unknown, and we have to approximate their values using a
random variable. Thus parameters are treated as constants in some
situations and random variables in others.
Random variables in the same set, measured under the same state, can be
dependent or independent. If they are dependent, then the value of one of
the variables tells you something about the value of another. This is also
known as correlation, although this term has a more specific meaning. If a
random variable, measured under some state, can be used to predict the value
of the same variable under another state, then the system is said to be
auto-correlated.
The theory of probability is based on the theory of sets of events. While a
probability is a numerical value, it is not a variable, but relates to the
chance of a certain event being observed. The classical definition of
probability relates to the expected number of times that any event is
observed, in an infinite number of trials, and thus only relates to past
observations. The Bayesian definition of probability, which is more useful
in engineering, is the degree to which the observer expects to
see an event occur in the next trial. Given that a trial is performed, the
outcome must be one or more events. Thus if we are watching for a single
event, and it did not occur, then its complement must have occurred. Thus
the set operators: complement, union and intersection can be used to define
the probabilities of combinations of events.
Probabilities have some special properties: they can only take on a value
from 0 to 1, inclusive; the probability of observing an event is always one
minus the probability of not observing the same event and, given a set of
mutually exclusive events covering all outcomes, the sum of the probability
of all of these events is one.
Confidence is a probability value which is chosen by the observer, rather
than being calculated, and is always used in reference to some error or
interval. Thus the statement `The thickness should be $150\pm10$~mm at
$95\%$ confidence' is correct, while the statement `The thickness should be
$150\pm10$~mm with a $95\%$ probability' is not. After observing the
thickness we may be able to state that `The thickness is $150\pm10$~mm with
$98.2\%$ probability', which leads to the concept of hypothesis testing.
The first statement is a hypothesis, while the last is a fact. If the facts
agree with the hypothesis, then we conclude that the hypothesis is correct
(although there is a chance that we reject a true hypothesis or accept one
which is not true).
Based on these groupings of variables and probabilities, there are three
main groupings of statistical analysis, although they overlap somewhat. The
first is the analysis of the probabilities of random variables, known as
probability modelling. The second is the analysis of endogenous random
variables as functions of exogenous random variables, known as statistical
modelling, and the third is the testing of hypotheses, known as statistical
inference, which compares the probability that an event might be observed
with the confidence with which we wish it to be observed.
\subsection{Conditional Probability}
One of the most powerful ideas in probability theory is the idea of
conditioning. Once we have observed one random variable it can tell us
something of the probability of observing a particular value for another
variable. For example, if we observe that it is raining, then we would
expect to observe that the temperature is below average. If we have the
events $A$ and $B$, then we define the conditional probability of observing
$A$ given that we have already observed $B$ (and therefore that the
probability of observing $B$ is greater than zero), as:
\begin{equation*}
P\{A|B\}=\frac{P\{A\cap B\}}{P\{B\}}
\end{equation*}
or in other words, the probability of observing both $A$ and $B$, weighted
by the probability of observing $B$. Since probability is independent of
the order of observation, we can equally assume we measured $A$ first to obtain:
\begin{equation*}
P\{A\cap B\}=P\{A|B\}P\{B\}=P\{B|A\}P\{A\}
\end{equation*}
If the probability of $A$ is independent of the outcome $B$ then we say that
the two events are independent:
\begin{equation*}
P\{A\cap B\}=P\{A\}P\{B\}=P\{B\}P\{A\}
\end{equation*}
When we are dealing with conditional probabilities, we refer to the
unconditional probability of observing an event as the marginal probability.
Given that $B$ has outcomes $1$ to $n$, then we have:
\begin{equation*}
P\{A\}=\sum_{b=1}^n{P\{A|B=b\}P\{B=b\}}
\end{equation*}
or that the total probability of observing $A$ is the sum of the
probabilities of observing $A$ conditional on all of the outcomes of $B$.
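As a brief numerical illustration (a single fair die, not a pavement
example), let $A$ be the event that the roll is even and $B$ the event that
the roll is at most three, so that $P\{A\}=P\{B\}=\frac{1}{2}$ and
$P\{A\cap B\}=P\{\text{roll}=2\}=\frac{1}{6}$. Then:
\begin{gather*}
P\{A|B\} = \frac{P\{A\cap B\}}{P\{B\}} = \frac{1/6}{1/2} = \frac{1}{3} \\
P\{A\} = P\{A|B\}P\{B\}+P\{A|\bar B\}P\{\bar B\}
= \frac{1}{3}\cdot\frac{1}{2}+\frac{2}{3}\cdot\frac{1}{2} = \frac{1}{2}
\end{gather*}
so conditioning on $B$ lowers the probability of $A$, while summing over the
outcomes of $B$ recovers the marginal probability.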
\subsection{Probability distributions}
The analysis of the probability is complicated by the fact that we cannot
treat discrete and continuous random variables with the same mathematics,
although the concepts are the same. This is because in a continuous random
variable, the probability of observing any particular value is zero, and the
probabilities have to be handled by integration over some range, while in a
discrete random variable the probability of observing a value between any of
the values is zero, and the probabilities must be handled as summations over
a discrete number of points. For this reason we talk about probability
density in continuous random variables, and probability mass in discrete
random variables.
Given a random variable $X$, we denote any particular observation as $x$.
We will begin by considering a discrete random variable, which might take on
any value in the set $A$, with $n$ elements. No matter what the actual
values in $A$, we can map these values to the integers from $1$ to $n$ (for
discrete unbounded sets the same notation applies, except $n=\infty$) in the
same order as in $A$, if $A$ is an ordered set. The probability of
observing one value $a$, from $1$ to $n$, is a function of the system being
observed and is given by:
\begin{equation*}
p_X\left(a\right)=P\{X=a\}
\end{equation*}
The probability that $x$ is less than or equal to $a$ is given by:
\begin{equation*}
F_X\left(a\right)=P\{X\leqslant a\}=\sum_{x=1}^a{P\{X=x\}}
\end{equation*}
These two functions are respectively known as the \ac{PMF} and \ac{CDF}.
Defining probability functions for continuous random variables is a little
more tricky, because the probability of observing a particular value is
zero. We thus define a \ac{PDF}, which is the limiting probability of
observing a value within a small increment:
\begin{equation*}
f_X\left(x\right)=\mathop{\lim}\limits_{\Delta x\to 0}
\frac{P\{x\leqslant X < x + \Delta x\}}{\Delta x}
\end{equation*}
and then define the CDF as:
\begin{equation*}
F_X\left(a\right)=P\{X\leqslant a\}=\int_{-\infty}^a{f_X\left(x\right)dx}
\end{equation*}
Note that the probability density function is not constrained to the range
$[0,1]$. There is a unique relationship between the \ac{PMF} or \ac{PDF} and
the CDF, so that if one is known then there is one and only one possible
function that will satisfy the conditions of the other. For most of the rest of
this discussion we will use the notation for continuous random variables. In
either case the CDF is a non-decreasing function of $x$, with values from
$0$ to $1$, and is defined as taking on a value of $0$ for undefined values
of $x$.
There are many different functions which satisfy the requirements of either
a \ac{PMF}, \ac{PDF} or CDF, which will not all be described here, since they are well
documented in the literature. Each function takes a number of parameters
(normally one to four parameters), which are known as the distribution
parameters. For reliability analysis we are mostly concerned with the
normal and log-normal distributions. However, we will return to them after
discussing some of the general properties of distributions.
Very often we wish to know what value of a random variable corresponds to a
given probability (e.g. for what load is there a 95\% probability that a
beam will fail). This can be determined using the inverse CDF:
\begin{equation*}
x = F_X^{-1}\left(p\right) \text{ such that } p = F_X\left(x\right)
\end{equation*}
Also, in many cases we wish to know the probability that a value is greater
than some threshold, which can be calculated using the complementary CDF:
\begin{equation*}
\bar F_X\left(a\right) = P\{X>a\} = \int_a^\infty{f_X\left(x\right)dx} =
1-F_X\left(a\right)
\end{equation*}
In cases where we are numerically evaluating functions at probabilities very
close to one it is often better to use a special complementary CDF routine
because of numerical stability.
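The following minimal sketch (standard \CC\ using \texttt{std::erfc}; it is
not the \OP\ routine) shows the problem for a standard normal variable far
out in the tail:
\begin{verbatim}
#include <cmath>
#include <cstdio>

// Standard normal CDF and complementary CDF written in terms of the
// complementary error function.
double norm_cdf(double x)  { return 0.5*std::erfc(-x/std::sqrt(2.0)); }
double norm_ccdf(double x) { return 0.5*std::erfc( x/std::sqrt(2.0)); }

int main()
{
    double x = 9.0;
    // The true tail probability is on the order of 1e-19, far below the
    // spacing of doubles near 1.0, so F(x) rounds to exactly 1.0 and the
    // subtraction loses all of the information.
    printf("1 - F(x) = %g\n", 1.0 - norm_cdf(x));   // prints 0
    printf("ccdf(x)  = %g\n", norm_ccdf(x));        // about 1.1e-19
    return 0;
}
\end{verbatim}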
\subsection{Multivariate probability distributions}
It is also possible to develop probability distributions for more than one
random variable. These are known as multivariate or joint probability
density functions. Essentially, these are a probability density function
for a new random variable, which is the set of all possible joint outcomes
of the individual random variables. We normally use the random variables
$X$ and $Y$ for bivariate distributions and the random vector $\mathbf{X}$
for more than two random variables. While it is mathematically possible to
develop multivariate distributions for combinations of discrete and
continuous random variables the mathematics is fairly hairy and we will
avoid it. For discrete random variables the joint \ac{PMF} and CDF can be
defined as:
\begin{align*}
p_{XY}\left(x,y\right) & = P\{X=x\cap Y=y\} \\
\begin{split}
F_{XY}\left(x,y\right) & = P\{X\leqslant x\cap Y\leqslant y\} \\
& = \sum_{x_i\leqslant x}
{\sum_{y_j\leqslant y}
{p_{XY}\left(x_i,y_j\right)}}
\end{split}
\end{align*}
and using the rules for conditional probability outlined above it is
possible to develop a conditional \ac{PMF}:
\begin{equation*}
\begin{split}
p_{X|Y}\left(x|y\right) & = \frac{P\{X=x\cap Y=y\}}{P\{Y=y\}} \\
& = \frac{p_{XY}\left(x,y\right)}{p_Y\left(y\right)} \\
& = \frac{p_{XY}\left(x,y\right)}
{\sum_x{p_{XY}\left(x,y\right)}}
\end{split}
\end{equation*}
A similar scheme applies for continuous random variables:
\begin{align*}
\begin{split}
F_{XY}\left(x,y\right) & = P\{X\leqslant x\cap Y\leqslant y\} \\
& = \int_{-\infty}^y{\int_{-\infty}^x
{f_{XY}\left(x,y\right)dxdy}}
\end{split} \\
f_{XY}\left(x,y\right) & = \frac{\partial^2F_{XY}(x,y)}
{\partial x\partial y} \\
\begin{split}
f_{X|Y}\left(x|y\right) & = \frac{f_{XY}\left(x,y\right)}
{f_Y\left(y\right)} \\
& = \frac{f_{XY}\left(x,y\right)}{\int_{-\infty}^\infty
{f_{XY}\left(x,y\right)dx}}
\end{split}
\end{align*}
The extension of these formulae into $n$ dimensions is fairly obvious. It
should also be noted that the solution of the CDF requires an $n$-fold
integral over the \ac{PDF}, which, if possible, is tedious mathematically and
costly computationally.
\subsection{Partial descriptors of random variables}
While one or other of the distribution functions completely describe a
random variable, they are normally too cumbersome for general use and
reporting, and so we seek values which summarize the variable. The most
general summary is the mean, which is the value we would expect the variable to
take when observed: a weighted average of all of the possible
values:
\begin{align*}
\begin{split}
E\left[X\right] =\mu_X & = \frac{\sum_x{xp_X\left(x\right)}}
{\sum_x{p_X\left(x\right)}} \\
& = \sum_x{x p_X\left(x\right)}
\end{split} \\
E\left[X\right] =\mu_X & = \int_{-\infty}^\infty
{x f_X\left(x\right)dx}
\end{align*}
where the $E\left[\bullet\right]$ operator denotes the expected value. This
is different from the median, which is the value below which half of the
expected outcomes lie, and the mode, which is the point with the highest
probability mass or density.
Now that we know what value to expect, we are also interested in the spread
of the values. The standard deviation is the square root of the expected
squared deviation from the mean (the variance), and is defined as:
\begin{equation*}
\sigma_X = \sqrt{Var\left[X\right]}
\end{equation*}
where:
\begin{align*}
\begin{split}
Var\left[X\right] & = E\left[\left(X-\mu_X\right)^2\right] \\
& = \sum_x{\left(x-\mu_X\right)^2
p_X\left(x\right)}
\end{split} \\
\begin{split}
Var\left[X\right] & = E\left[\left(X-\mu_X\right)^2\right] \\
& = \int_{-\infty }^\infty
{\left(x-\mu_X\right)^2 f_X\left(x\right)dx}
\end{split}
\end{align*}
For most distributions in regular use both the mean and standard deviation
are defined by some simple formula involving the distribution parameters.
It is often convenient to use the coefficient of variation (COV), which is
defined as:
\begin{equation*}
\delta_X = \frac{\sigma_X}{\abs*{\mu_X}}
\end{equation*}
If we have two random variables $X$ and $Y$ then we often would like to know
if there is any relationship between them. The simplest measure of a
relationship is to determine whether, given that we know the deviation of an
observation of one variable from its mean, we can determine anything about
the deviation of the other variable from its mean. This is known as
covariance, and is defined as:
\begin{align*}
\begin{split}
Cov\left[X,Y\right] & = E\left[\left(X-\mu_X\right)
\left(Y-\mu_Y\right)\right] \\
& = \sum_x{ \sum_y{
\left(x-\mu_X\right)\left(y-\mu_Y\right)
p_{XY}\left(x,y\right)}}
\end{split} \\
\begin{split}
Cov\left[X,Y\right] & = E\left[\left(X-\mu_X\right)
\left(Y-\mu_Y\right)\right] \\
& = \int_{-\infty}^\infty{
\int_{-\infty}^\infty{
\left(x-\mu_X\right)\left(y-\mu_Y\right)
f_{XY}\left(x,y\right)dxdy}}
\end{split}
\end{align*}
The covariance has units which are a product of the units for $X$ and $Y$,
and so it is convenient to define the dimensionless correlation coefficient:
\begin{equation*}
\rho_{XY} = \frac{Cov\left[X,Y\right]}{\sigma_X\sigma_Y}
\end{equation*}
It is possible to prove that the correlation coefficient has a range
$\left[-1,1\right]$ and takes on a value of $0$ if the variables are
independent, and $1$ or $-1$ if the variables are perfectly correlated.
\subsection{The normal and multi-normal distributions}
By far the most widely used distribution is the normal or Gaussian
distribution, the classic bell shaped curve. The normal distribution is
defined by two parameters, which happen to also be the mean and standard
deviation of any random variable which has a normal distribution. However,
we will first consider the standard normal distribution which has a mean of
zero and a standard deviation of one. The \ac{PDF} and CDF are defined (using
traditional notation, including $u$ as the standard normal random variable)
as:
\begin{align*}
U & \sim \operatorname{N}\left(0,1\right) \\
\varphi\left(u\right) & = \frac{1}{\sqrt{2\pi}}\exp
\left(-\frac{u^2}{2}\right) \\
\Phi\left(u\right) & = \int_{-\infty}^u{\varphi\left(u\right)}du
\end{align*}
There is no closed form solution to the CDF of the normal distribution, and
so it must either be evaluated using a numerical integration technique or
approximated. In \OP\ there are approximations for the standard normal CDF
and the inverse CDF which are based on well known rational Chebyshev
approximations; these are accurate to machine precision and are evaluated
directly, without
iteration.
The generalized form of the normal \ac{PDF} and CDF are:
\begin{align*}
X & \sim \operatorname{N}\left(\mu,\sigma^2\right) \\
f_X\left(x\right) & = \frac{1}{\sigma\sqrt{2\pi}}
\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) \\
F_X\left(x\right) & = \Phi\left(\frac{x-\mu}{\sigma}\right)
\end{align*}
If we have more than one random variable then we define the multivariate
normal (or multi-normal) distribution as follows:
\begin{align*}
\operatorname{E}\left[X_1\right] & = \mu _1 \\
\operatorname{Var}\left[X_1\right] & = \sigma _1^2 \\
\operatorname{Cov}\left[X_2,X_1\right] & = \rho _{2\,1}\sigma _2\sigma _1
\end{align*}
We can express this in matrix notation as:
\begin{align*}
\mathbf{M} = \begin{bmatrix}
\mu _1 \\
\mu _2 \\
\vdots \\
\mu _n \\
\end{bmatrix} &&
\mathbf{D} = \begin{bmatrix}
\sigma _1 & {} & {} & \text{diag.} \\
{} & \sigma _2 & {} & {} \\
{} & {} & \ddots & {} \\
{} & {} & {} & \sigma _n \\
\end{bmatrix} &&
\mathbf{R} = \begin{bmatrix}
1 & {} & {} & \text{sym.} \\
\rho _{2\,1} & 1 & {} & {} \\
\vdots & \vdots & \ddots & {} \\
\rho _{n\,1} & \rho _{n\,2} & \cdots & 1 \\
\end{bmatrix} &&
\mathbf{\Sigma} = \mathbf{DRD}
\end{align*}
\begin{align*}
\mathbf{X} & \sim \operatorname{N}\left(\mathbf{M},\mathbf{\Sigma}\right) \\
f_\mathbf{X}\left(\mathbf{x}\right) & =
\frac{\left(2\pi\right)^{-n/2}}{\sqrt{\left|\mathbf{\Sigma}\right|}}
\exp\left(-\frac{1}{2}\left(\mathbf{x}-\mathbf{M}\right)^{\operatorname{T}}
\mathbf{\Sigma}^{-1}\left(\mathbf{x}-\mathbf{M}\right)\right) \\
F_\mathbf{X}\left(\mathbf{x}\right) & =
\int_{-\infty}^{x_n}{\cdots\int_{-\infty}^{x_1}
{f_\mathbf{X}\left(\mathbf{x}\right) dx_1} \cdots dx_n }
\end{align*}
If we are dealing with correlated standard normal variables (which we will
denote with $\mathbf{z}$ throughout this discussion) then $\mathbf{M}$ is a zero vector
and $\mathbf{D}$ is the identity matrix, and the distribution reduces to:
\begin{align*}
\mathbf{Z} & \sim \operatorname{N}\left( 0,\mathbf{R}\right) \\
\varphi _n\left(\mathbf{z},\mathbf{R}\right) & =
\frac{\left(2\pi\right)^{-n/2}}{\sqrt{\left|\mathbf{R}\right|}}
\exp\left(-\frac{1}{2}\mathbf{z}^{\operatorname{T}}\mathbf{R}^{-1}\mathbf{z}\right)
\end{align*}
We can further simplify this equation if we are working in the uncorrelated
standard normal space to:
\begin{align*}
\mathbf{U} & \sim \operatorname{N}\left(0,1\right) \\
\varphi_n\left(\mathbf{u}\right) & = \left(2\pi\right)^{-n/2}
\exp\left(-\frac{1}{2}\norm{\mathbf{u}}^2\right)
\end{align*}
There are no closed form solutions to the $n$-fold integral in the CDF.
There are however a number of simplifications. If we are dealing with two
correlated standard normal variables then we can obtain the result:
\begin{align*}
\Phi_2\left(z_1,z_2,\rho\right) & = \Phi\left(z_1\right)\Phi\left(z_2\right)
+\int_0^\rho{\varphi_2\left(z_1,z_2,\rho\right)d\rho} \\
\varphi_2\left(z_1,z_2,\rho\right) & = \frac{1}{2\pi\sqrt{1-\rho^2}}
\exp\left(-\frac{z_1^2-2\rho z_1z_2+z_2^2}{2\left(1-\rho^2\right)}\right)
\end{align*}
The uncorrelated standard normal space is also rotationally symmetric, so if
we are interested in the probability contained within a region bounded by a
hyper-plane then it is always possible to rotate that plane so that we can
obtain the marginal CDF in only one dimension. Since the marginal CDF of
the multi-normal distribution is also a normal distribution we can obtain the
following result:
\begin{equation*}
\Phi_n\left(\mathbf{u}:\beta-\mathbf{\alpha}^{\operatorname{T}}\mathbf{u}\leqslant 0\right)
=\int_{-\infty}^\infty{\cdots\int_\beta^\infty{\varphi_n\left(\mathbf{u}\right)du_1}\cdots du_n}
=\Phi\left(-\beta\right)
\end{equation*}
\subsection{The log-normal distribution}
Many quantities in engineering, like length and elastic modulus, cannot be
negative, and so require a distribution which does not predict non-zero
probabilities for negative numbers. For these quantities the log-normal
distribution is used. It is closely related to the normal distribution
because it is obtained by assuming that the logs of the random variable are
normally distributed. We will only deal with a single random variable in
this discussion.
The log-normal distribution has two parameters, $\lambda$ and $\zeta$, and the \ac{PDF} and CDF
are defined as:
\begin{align*}
X & \sim \operatorname{LN}\left(\lambda,\zeta^2\right) \\
f_X\left(x\right) & = \frac{1}{\zeta x\sqrt{2\pi}}
\exp\left(-\frac{1}{2}\left(\frac{\ln x-\lambda}{\zeta}\right)^2\right) \\
F_X\left(x\right) & = \Phi\left(\frac{\ln x-\lambda}{\zeta}\right)
\end{align*}
$\lambda$ and $\zeta$ are not the mean and standard deviation. These, and the coefficient
of variation, are given by the following equations:
\begin{align*}
\mu & = \exp\left(\lambda+\zeta^2/2\right) &
\delta & = \sqrt{\exp\left(\zeta^2\right)-1} &
\sigma & = \mu\delta
\end{align*}
If we are given the mean and standard deviation (or the coefficient of
variation) it is possible to derive $\lambda$ and $\zeta$ from the following equations:
\begin{align*}
\zeta & = \sqrt{\ln\left(1+\left(\frac{\sigma}{\mu}\right)^2\right)}
=\sqrt{\ln\left(1+\delta^2\right)} &
\lambda & = \ln\mu-\zeta^2/2
\end{align*}
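For example (the numbers are purely illustrative), a quantity with mean
$\mu=100$ and coefficient of variation $\delta=0.2$ has the parameters:
\begin{equation*}
\zeta = \sqrt{\ln\left(1+0.2^2\right)} \approx 0.198 \qquad
\lambda = \ln 100-\zeta^2/2 \approx 4.586
\end{equation*}
and substituting back, $\exp\left(\lambda+\zeta^2/2\right)=\exp\left(\ln 100\right)=100$
recovers the mean.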
\subsection{Transformation to the standard normal space}
Because we might be working with a number of different probability
distributions, and these might be correlated, the space in which we have to
perform calculations is very complex. In particular, although we might have
a marginal \ac{PDF} $f(x)$ for each $x$ in our space $\mathbf{x}$, we need the
multivariate \ac{PDF} $f(\mathbf{x})$, which would normally require the
evaluation of an $n$-fold integral over the space of $\mathbf{x}$ (where the
vector $\mathbf{x}$ has $n$ elements). Mathematically this is a difficult
problem, and it becomes easier to perform calculations when we transform the
variables into the uncorrelated standard normal space. To achieve this
transformation, we make use of a family of multivariate distributions known
as Nataf distributions. Nataf distributions take the form:
\begin{equation*}
f_\mathbf{X}\left(\mathbf{x}\right) =
\frac{f_1\left(x_1\right)f_2\left(x_2\right)\cdots f_n\left(x_n\right)}
{\varphi\left(z_1\right)\varphi\left(z_2\right)\cdots\varphi\left(z_n\right)}
\varphi_n\left(\mathbf{z},\mathbf{R}_0\right) = \varphi_n\left(\mathbf{u}\right)
\end{equation*}
Notice that only the marginal \ac{PDF} of each of the random variables appears in
the function, but that the correlation matrix $\mathbf{R}$ is replaced by
$\mathbf{R}_0$ (the details of this will be discussed momentarily). Notice
also, that the function assumes that some transformation from $\mathbf{x}$
to $\mathbf{u}$ exists. Many such transformations might exist, the only
requirement being that the mapping is one-to-one and reversible (i.e. we can
go $\mathbf{x}\rightarrow\mathbf{u}\rightarrow\mathbf{x}$). The
most useful of these transformations for the Nataf family takes the form:
\begin{gather*}
X_i \sim F_i\left(x_i\right) \qquad
\mu_i = \operatorname{E}\left[X_i\right] \qquad
\sigma_i^2 = \operatorname{Var}\left[X_i\right] \qquad
\rho_{ij} = \frac{\operatorname{Cov}\left[X_i,X_j\right]}{\sigma_i\sigma_j} \\
\mathbf{L}_0\mathbf{L}_0^{\operatorname{T}} = \mathbf{R}_0 = \left[\rho_{0,ij}\right] \\
\rho_{ij} = \iint{\frac{x_i-\mu_i}{\sigma_i}\frac{x_j-\mu_j}{\sigma_j}
\varphi_2\left(z_i,z_j,\rho_{0,ij}\right)dz_j\,dz_i} \\
\mathbf{u} = \mathbf{L}_0^{-1}\mathbf{z} \qquad
\mathbf{z} = \left[z_i\right] \qquad
z_i = \Phi^{-1}\left(F_i\left(x_i\right)\right)
\end{gather*}
where $\mathbf{L}_0$ is the lower triangular factor of $\mathbf{R}_0$,
obtained by Cholesky decomposition.
While the mathematics might appear overwhelming, the mechanics of the
transformation are quite simple. For each variable $x$ we make use of the
fact that the marginal CDF $F(x)$ provides a one-to-one mapping to
probabilities in a range (0,1). Using these probabilities we make use of the
inverse standard normal CDF to transform $x$ into a standard normal variable
$z$. However, these $z$ variables are still correlated, and so we use a skew
transform of the coordinate system, based on the correlation structure in the
standard normal space. The only mathematically intractable component is the
integral in the equation above (because we have to solve for $\rho_{0,ij}$
and not $\rho_{ij}$) and Liu and Der Kiureghian (1989) supply a number of
closed form approximations for $\rho_{0,ij}$ as a function of $\rho_{ij}$ and
the distribution types of $x_i$ and $x_j$.
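A minimal sketch of these mechanics is given below (plain \CC; it is
illustrative only and not the \OP\ implementation, and it assumes the caller
supplies the marginal CDFs, the inverse standard normal CDF and the Cholesky
factor $\mathbf{L}_0$):
\begin{verbatim}
#include <cstddef>
#include <functional>
#include <vector>

// Transform physical variables x into uncorrelated standard normals u.
// F[i] is the marginal CDF F_i, phi_inv is the inverse standard normal
// CDF, and L0 is the lower triangular Cholesky factor of R0.
std::vector<double>
x_to_u(const std::vector<double> &x,
       const std::vector<std::function<double(double)>> &F,
       const std::function<double(double)> &phi_inv,
       const std::vector<std::vector<double>> &L0)
{
    const std::size_t n = x.size();
    std::vector<double> z(n), u(n);
    // Step 1: z_i = Phi^-1(F_i(x_i)), correlated standard normals.
    for (std::size_t i = 0; i < n; i++)
        z[i] = phi_inv(F[i](x[i]));
    // Step 2: u = L0^-1 z, solved by forward substitution.
    for (std::size_t i = 0; i < n; i++) {
        double s = z[i];
        for (std::size_t j = 0; j < i; j++)
            s -= L0[i][j]*u[j];
        u[i] = s/L0[i][i];
    }
    return u;
}
\end{verbatim}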
If $x_i$ and $x_j$ are both normal then $\rho_{0,ij}$ equals $\rho_{ij}$ by
definition. If $x_j$ is log-normal, then $\rho_{0,ij}$ is given (exactly) by the following equations depending on
whether $x_i$ is normal or log-normal:
\begin{gather*}
\rho_{0,ij} = \rho_{ij}\frac{\delta_j}{\zeta_j} \\
\rho_{0,ij} = \frac{\ln\left(1+\rho_{ij}\delta_i\delta_j\right)}{\zeta_i\zeta_j}
\end{gather*}
If we only have normal distributions, then the mathematics is greatly
simplified:
\begin{equation*}
\begin{gathered}