2
00:00:18,320 --> 00:00:19,900
Hello there, welcome to this lecture on probability
methods in civil engineering. Today it
3
00:00:19,900 --> 00:00:32,340
is the 5th lecture, and in this lecture we will
cover some standard discrete probability
4
00:00:32,340 --> 00:00:38,780
distributions that will be very useful
for different problems in civil engineering.
5
00:00:38,780 --> 00:00:43,830
Basically, in this class and maybe one or two
more classes, we will cover some standard
6
00:00:43,830 --> 00:00:49,420
distributions of random variables. And in
today’s class, we will mostly discuss
7
00:00:49,420 --> 00:00:53,160
the
discrete random variables, and in the next
8
00:00:53,160 --> 00:00:59,190
one or two classes we will cover the different
distributions for continuous random variables.
9
00:00:59,190 --> 00:01:06,500
So, these discrete distributions,
even though limited in number, have some
10
00:01:06,500 --> 00:01:12,990
applications in different problems that we will
discuss one after another. And we will
11
00:01:12,990 --> 00:01:18,580
start our presentation with a quick
recapitulation of the pmf of the discrete
12
00:01:18,580 --> 00:01:25,460
random variable. So today, we will discuss
these discrete probability distributions,
13
00:01:25,460 --> 00:01:28,880
and there are different discrete probability
distributions, listed here.
15
00:01:29,880 --> 00:01:34,810
A list is given here; it is not necessary
that these are the only distributions which
16
00:01:34,810 --> 00:01:39,930
are discrete, there may be some others, but
the
17
00:01:39,930 --> 00:01:44,000
applications in civil engineering problems are mostly limited
to these distributions. So, first, in
18
00:01:44,000 --> 00:01:51,110
today’s lecture, we will start with
the pmf of the discrete random variable;
19
00:01:51,110 --> 00:01:54,820
this
part we covered in the last couple of classes.
20
00:01:54,820 --> 00:01:59,220
But we will just quickly see what this
distribution is in general, what is this
21
00:01:59,220 --> 00:02:02,750
distribution? Then, we will show
22
00:02:02,750 --> 00:02:08,380
one after another: the binomial distribution,
then we will go to the multinomial distribution,
23
00:02:08,380 --> 00:02:14,260
then the Poisson distribution, the geometric distribution,
the negative binomial distribution, the hypergeometric
24
00:02:14,260 --> 00:02:18,260
distribution.
So, these distributions have different
25
00:02:18,260 --> 00:02:20,230
applications in different civil engineering
problems:
26
00:02:20,230 --> 00:02:25,440
for example, when we talk about the binomial
distribution, we generally think of the
27
00:02:25,440 --> 00:02:34,160
rate of success or failure of a
particular event. When we talk about the
28
00:02:34,160 --> 00:02:41,480
multinomial, we generally have more
than two outcomes, more than two
29
00:02:41,480 --> 00:02:45,799
possibilities; that we generally
model with the multinomial distribution.
30
00:02:45,799 --> 00:02:50,190
Similarly, for the Poisson distribution, we
talk in terms of its occurrence over
31
00:02:50,190 --> 00:02:54,010
a
time or space or over an area. Such things,
32
00:02:54,010 --> 00:02:57,829
particularly the rainfall phenomenon, whether it
is a rainy day or a non-rainy day,
33
00:02:57,829 --> 00:03:02,650
or railway accidents, and all these kinds
of
34
00:03:02,650 --> 00:03:06,230
problems, we generally deal with using the Poisson
distribution.
35
00:03:06,230 --> 00:03:11,040
Then, one after another, the geometric
distribution comes, then the negative
36
00:03:11,040 --> 00:03:15,209
binomial distribution comes, the hypergeometric
distribution comes; all these distributions
37
00:03:15,209 --> 00:03:17,760
we
will see. First, we will see these distributions,
38
00:03:17,760 --> 00:03:23,850
what the different distributions’ properties
are, their PMFs particularly; you know that
39
00:03:23,850 --> 00:03:29,560
for a discrete random variable we use the
probability mass function. So, we will
40
00:03:29,560 --> 00:03:35,100
see what the PMF, the probability
mass function, is for different distributions first,
41
00:03:35,100 --> 00:03:41,460
and then some of the moments, the first
moment and second moment, we will see. And then,
42
00:03:41,460 --> 00:03:45,630
we will discuss some of the
applications in different civil engineering
43
00:03:45,630 --> 00:03:46,630
problems.
45
00:03:47,630 --> 00:03:53,910
So, to start with the discrete random variable:
as we have discussed earlier, a
46
00:03:53,910 --> 00:04:03,650
discrete random variable is a function that
can take only a finite number of values. So,
47
00:04:03,650 --> 00:04:08,920
that value is not continuous over the
domain, over the sample space; it can take
48
00:04:08,920 --> 00:04:14,419
only
some finite values. And mostly, in the general
49
00:04:14,419 --> 00:04:20,949
case, these values are generally equidistant,
one, two, three and so on, even though that
50
00:04:20,949 --> 00:04:28,120
is not compulsory. And the probability
density function of a discrete random variable
51
00:04:28,120 --> 00:04:34,200
indicates the correspondence between the
values taken by the random variable and their
52
00:04:34,200 --> 00:04:40,710
associated probabilities. It is
concentrated as a mass at a particular value
53
00:04:40,710 --> 00:04:43,630
and is generally known as the probability mass
function.
54
00:04:43,630 --> 00:04:49,120
So, these things we have discussed; as we
are saying, it can take some finite
55
00:04:49,120 --> 00:04:55,600
number of values. So, here the probability
is not treated as a density; rather, we treat
56
00:04:55,600 --> 00:05:01,830
the probability as concentrated at those
values which the random variable can
57
00:05:01,830 --> 00:05:07,310
take. So, it can be treated as a mass
which is concentrated at a particular value,
58
00:05:07,310 --> 00:05:14,530
and that is why this distribution is generally
known as the probability mass function.
60
00:05:15,530 --> 00:05:25,780
So, now, the probability mass function, PMF,
is the probability distribution of a discrete
61
00:05:25,780 --> 00:05:32,250
random variable; say the discrete random
variable is denoted as X. The PMF is generally
62
00:05:32,250 --> 00:05:38,280
denoted by the small pX(x). So, here, as
we have discussed earlier, the capital
63
00:05:38,280 --> 00:05:41,350
X
denotes the random variable, and the small
64
00:05:41,350 --> 00:05:47,310
x denotes a particular value of that random
variable. So, the values of X are now finite
65
00:05:47,310 --> 00:05:53,240
in number, some specific values that it can
take. The small p indicates that
66
00:05:53,240 --> 00:05:55,780
this is the probability mass function; when we
67
00:05:55,780 --> 00:06:01,360
indicate the cumulative distribution function,
we replace it with a capital P, as
68
00:06:01,360 --> 00:06:02,980
we
discussed earlier.
69
00:06:02,980 --> 00:06:11,110
So, this pX(x) indicates the
probability that the random variable takes a
70
00:06:11,110 --> 00:06:19,360
particular value x, that is, X equals x.
So, this is
71
00:06:19,360 --> 00:06:25,560
a particular function; this is now a function
72
00:06:25,560 --> 00:06:29,841
denoting the probabilities for each
particular value that the random variable
73
00:06:29,841 --> 00:06:34,980
can take. Now, this function, as we
discussed, should follow some properties
74
00:06:34,980 --> 00:06:41,150
to become a valid PMF, a valid
probability mass function. These properties
75
00:06:41,150 --> 00:06:47,400
are as follows: for each and
every value that the random variable can take,
76
00:06:47,400 --> 00:06:53,840
the probability should be greater than or equal to 0,
and this is valid for all possible values of
77
00:06:53,840 --> 00:07:00,340
X; and also, the summation of all these
probabilities should be equal to 1.
78
00:07:00,340 --> 00:07:06,030
Now, for the different probability
distributions for discrete random
79
00:07:06,030 --> 00:07:11,280
variables in the list we have just given,
we can test
80
00:07:11,280 --> 00:07:12,280
these two
81
00:07:12,280 --> 00:07:18,300
properties. So, for the
standard discrete random variables which are
82
00:07:18,300 --> 00:07:24,389
available, we can check
whether these two conditions,
83
00:07:24,389 --> 00:07:29,919
which are required for a valid PMF, are satisfied
or not.
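As an aside, these two conditions are easy to check numerically. The following is a minimal sketch of my own (not from the lecture); the binomial PMF, which the lecture introduces next, is used here only as a test case.

```python
from math import comb

def binomial_pmf(n, p):
    # p_X(x) = nCx * p^x * (1 - p)^(n - x), for x = 0, 1, ..., n
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def is_valid_pmf(masses, tol=1e-9):
    # Condition 1: every probability mass is non-negative.
    # Condition 2: the masses sum to 1 (within floating-point tolerance).
    return all(m >= 0 for m in masses) and abs(sum(masses) - 1.0) < tol

print(is_valid_pmf(binomial_pmf(10, 0.3)))  # True
```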
85
00:07:30,919 --> 00:07:38,910
So, we will start with the binomial distribution.
The binomial distribution is used to
86
00:07:38,910 --> 00:07:45,340
find the probability of getting x
occurrences of a particular event in a sequence
87
00:07:45,340 --> 00:07:53,460
of n repeated trials. These trials are called
Bernoulli trials provided that the following
88
00:07:53,460 --> 00:08:00,580
assumptions hold good. First, there are only two
possible outcomes for each trial, which
89
00:08:00,580 --> 00:08:06,130
are arbitrarily called success and failure.
So, we talk about one particular
90
00:08:06,130 --> 00:08:11,800
random experiment, and that experiment has
two specific outcomes. When
91
00:08:11,800 --> 00:08:17,210
we are talking about these two specific outcomes,
the labelling of success and failure is
92
00:08:17,210 --> 00:08:25,949
arbitrary. Now, if I take the very
basic example of tossing a coin, I can
93
00:08:25,949 --> 00:08:29,669
say
that coming up heads is success and coming
94
00:08:29,669 --> 00:08:34,250
up tails is failure.
So, this is arbitrary; it is not that the head
95
00:08:34,250 --> 00:08:37,770
is always success and the tail is failure; I can
just
96
00:08:37,770 --> 00:08:44,209
reverse the notation as well. To take a
specific example from civil engineering,
97
00:08:44,209 --> 00:08:49,380
if
I say that a reservoir has some particular
98
00:08:49,380 --> 00:08:57,000
high flood level, above which we have to
consider the water level dangerous.
99
00:08:57,000 --> 00:09:02,960
So, I can say that there are two outcomes:
whether it is below or above the high flood
100
00:09:02,960 --> 00:09:06,300
level. Now, in this case, I can denote
that
101
00:09:06,300 --> 00:09:12,550
above the high flood level is my success case
and below the high flood level is the failure
102
00:09:12,550 --> 00:09:17,540
case. So, this has nothing to do with the real-life
scenario of which one should be
103
00:09:17,540 --> 00:09:24,080
success and which one should be failure.
For one particular outcome, one particular event, I can
104
00:09:24,080 --> 00:09:27,150
say
that this is success and that is failure.
105
00:09:27,150 --> 00:09:31,220
When we are discussing the binomial distribution,
we have to remember that we
106
00:09:31,220 --> 00:09:37,450
are considering only two possible outcomes.
The two possible outcomes also
107
00:09:37,450 --> 00:09:44,960
should be mutually exclusive, such that the
occurrence of one event will automatically
108
00:09:44,960 --> 00:09:51,540
indicate the non-occurrence of the other
possible outcome, that
109
00:09:51,540 --> 00:09:54,760
is
why. So, when
110
00:09:54,760 --> 00:10:01,010
we say that these are Bernoulli trials, the
first assumption that must hold good
111
00:10:01,010 --> 00:10:07,930
is that there are only two possible outcomes.
These outcomes, arbitrarily, we can
112
00:10:07,930 --> 00:10:11,700
call one success and the other
failure.
113
00:10:11,700 --> 00:10:19,190
Second, the probability of success is the same
for each trial. Now, again, when we are
114
00:10:19,190 --> 00:10:25,300
saying that the probability of the outcome
is the same across the different trials,
115
00:10:25,300 --> 00:10:33,730
what we mean is this: if we take
the basic example of tossing
116
00:10:33,730 --> 00:10:39,390
a coin, I can say that the probability
of getting a head is equal to some
117
00:10:39,390 --> 00:10:47,390
number, say 0.5. That number is
fixed for each trial; if I repeat
118
00:10:47,390 --> 00:10:51,960
that
particular trial, the probability of
119
00:10:51,960 --> 00:10:55,360
that particular outcome should not change.
Now,
120
00:10:55,360 --> 00:11:02,610
for that reservoir problem, on a particular
121
00:11:02,610 --> 00:11:06,480
day,
whether the reservoir water level crosses
122
00:11:06,480 --> 00:11:12,440
the high flood level or not should
have some probability.
123
00:11:12,440 --> 00:11:17,580
And that probability should remain the same for
the different trials that we are considering,
124
00:11:17,580 --> 00:11:23,200
if we consider that particular random event
to follow a Bernoulli distribution. Now, the
125
00:11:23,200 --> 00:11:29,210
question of how to assign that particular
probability is a different issue that
126
00:11:29,210 --> 00:11:33,980
we will
discuss again in the successive classes. But
127
00:11:33,980 --> 00:11:39,520
what we should remember at this point is that
for the particular event which we are arbitrarily
128
00:11:39,520 --> 00:11:44,540
naming success, the
probability of that event
129
00:11:44,540 --> 00:11:52,000
should be known to us and that should be fixed
for all the trials that we are going to conduct.
130
00:11:52,000 --> 00:11:58,430
So, that is why it says that the probability
of success is the same for each trial.
131
00:11:58,430 --> 00:12:05,990
Third, the outcomes of different trials are
independent. Now, this is also important in
132
00:12:05,990 --> 00:12:09,340
the
sense that, when
133
00:12:09,340 --> 00:12:16,360
I am conducting a particular trial, the outcome
of this trial, whether success or failure, should
134
00:12:16,360 --> 00:12:23,000
not depend on what we got in the
immediately previous trial. So, successive
135
00:12:23,000 --> 00:12:26,520
trials are independent of each other. And
136
00:12:26,520 --> 00:12:33,260
the last condition, the last assumption of the
Bernoulli trial, is that there is a fixed number
137
00:12:33,260 --> 00:12:37,060
of trials
to be conducted. So, how many trials
138
00:12:37,060 --> 00:12:42,190
we are going to conduct to get x
occurrences of that particular event,
139
00:12:42,190 --> 00:12:46,530
which we are calling success here; this
n should be known.
140
00:12:46,530 --> 00:12:54,690
So, the two things that we must know
prior to defining the binomial distribution
141
00:12:54,690 --> 00:12:57,440
are
these: one, the total number of
142
00:12:57,440 --> 00:13:03,250
trials that we are considering, and two, the
probability of success for each trial.
143
00:13:03,250 --> 00:13:06,610
These two pieces of information should be known,
and
144
00:13:06,610 --> 00:13:14,330
with these two known and with
all four assumptions satisfied,
145
00:13:14,330 --> 00:13:17,650
we
can define the binomial distribution.
147
00:13:18,650 --> 00:13:25,420
So, the probability of success; again, I
repeat that success is arbitrary,
148
00:13:25,420 --> 00:13:28,480
in the
sense that a particular event, out
149
00:13:28,480 --> 00:13:32,220
of the two possible outcomes, I can declare
to be the
150
00:13:32,220 --> 00:13:37,970
success. So, if the probability of success,
that is the occurrence of an event in each
151
00:13:37,970 --> 00:13:41,970
trial, is
given by p; this p is the probability
152
00:13:41,970 --> 00:13:44,121
of success, so it is known to us, as just
now
153
00:13:44,121 --> 00:13:49,990
I discussed. So, this p, the probability of
success, is known to us. Then, the
154
00:13:49,990 --> 00:13:58,600
probability of getting exactly x successful
events among the n trials in a Bernoulli
155
00:13:58,600 --> 00:14:05,399
sequence is given by the binomial probability
mass function. How many trials I
156
00:14:05,399 --> 00:14:09,660
will conduct is known to us, and this
probability is also known to us.
157
00:14:09,660 --> 00:14:16,279
Now, the probability of getting exactly
x successes: this number, the number of
158
00:14:16,279 --> 00:14:21,190
successes, is the random variable that we are
considering in the binomial distribution.
159
00:14:21,190 --> 00:14:22,190
Now,
160
00:14:22,190 --> 00:14:28,660
this probability of x is expressed as nCx
times p to the power x times 1 minus p to the power
161
00:14:28,660 --> 00:14:33,360
n
minus x, and x can take any value between
162
00:14:33,360 --> 00:14:42,589
0 and n. Now, each of these terms has
some meaning: nCx means n combination
163
00:14:42,589 --> 00:14:49,010
x, that is, in how many different ways, out of the total
n trials, I can select the x successes.
164
00:14:49,010 --> 00:14:56,360
Now, this is multiplied by the probability
of success; and each success is independent
165
00:14:56,360 --> 00:15:02,680
of the others, which is why p is raised to the power x.
And that should be multiplied, as I told you, because
166
00:15:02,680 --> 00:15:05,110
these two events, success and failure,
are
167
00:15:05,110 --> 00:15:11,920
mutually exclusive; that means the probability
of failure is automatically 1 minus p,
168
00:15:11,920 --> 00:15:16,550
since the total probability is 1. So, if p
is the probability of success, then
169
00:15:16,550 --> 00:15:22,730
automatically the probability of failure
must equal 1 minus p. And that should be
170
00:15:22,730 --> 00:15:26,089
for
the failure: if the success is for x of the cases,
171
00:15:26,089 --> 00:15:30,920
then, the failure should be for the n minus
x
172
00:15:30,920 --> 00:15:36,680
number of cases. So, that is why these two
are multiplied, to get the total
173
00:15:36,680 --> 00:15:45,290
probability that X is exactly equal to x
out of the total n trials, which is given by nCx
174
00:15:45,290 --> 00:15:50,730
times p to the power x times 1 minus p to the power n minus
x; obviously, x can take any
175
00:15:50,730 --> 00:15:56,310
integer value, of course,
between 0 and n.
176
00:15:56,310 --> 00:16:02,959
Now, this nCx, n combination
x, is expressed as n factorial divided by
177
00:16:02,959 --> 00:16:08,680
x factorial multiplied by n minus x factorial,
and this is known as the binomial
178
00:16:08,680 --> 00:16:17,100
coefficient. Now, we can see here
that if we put in any value of x, for x equal
179
00:16:17,100 --> 00:16:22,810
to
0, 1, 2, up to n, all these values are positive.
180
00:16:22,810 --> 00:16:28,220
And if we take the summation of these
probabilities, we will see that the summation
181
00:16:28,220 --> 00:16:31,519
of all the individual probability masses,
which
182
00:16:31,519 --> 00:16:37,680
is concentrated at x equal to 0, 1, 2, up to
n, should be equal to 1. So, this is
183
00:16:37,680 --> 00:16:44,620
a valid
PMF, first of all, and it is the PMF of the binomial
184
00:16:44,620 --> 00:16:46,760
distribution.
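As a concrete illustration (numbers of my own, not from the lecture: n = 5 trials, probability of success p = 0.2), the binomial PMF can be evaluated directly with the standard library:

```python
from math import comb

def binomial_pmf(x, n, p):
    # P(X = x) = nCx * p^x * (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.2
for x in range(n + 1):
    print(x, round(binomial_pmf(x, n, p), 4))

# The masses over x = 0, 1, ..., n add up to 1, as a valid PMF requires.
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
print(abs(total - 1.0) < 1e-9)  # True
```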
186
00:16:47,760 --> 00:16:54,200
Now, the binomial distribution has
one very interesting property, known
187
00:16:54,200 --> 00:16:59,760
as
the additive property of the binomial distribution.
188
00:16:59,760 --> 00:17:06,740
It says that if X is a random
variable with a binomial distribution having
189
00:17:06,740 --> 00:17:12,839
parameters n1 and p, that is, p is the
probability of success and the total number of
190
00:17:12,839 --> 00:17:20,990
trials is n1, and there is another random variable
Y which also has a binomial distribution with
191
00:17:20,990 --> 00:17:29,640
the parameters n2 and p. Then their
sum: if I take the sum of these two random
192
00:17:29,640 --> 00:17:35,059
variables X and Y, their sum Z is
a
193
00:17:35,059 --> 00:17:41,899
random variable which again has a binomial
distribution with the parameters n and p,
194
00:17:41,899 --> 00:17:46,840
in
such a way that n is equal to n1 plus
195
00:17:46,840 --> 00:17:50,399
n2.
So, when we are adding two binomial distributions,
196
00:17:50,399 --> 00:17:56,529
we are getting another binomial
distribution; while adding, we should
197
00:17:56,529 --> 00:18:01,169
ensure that the probability of success for
both the random variables is the same, which is
198
00:18:01,169 --> 00:18:11,029
p. Then, the sum will also have a
binomial distribution with the probability
199
00:18:11,029 --> 00:18:16,489
of success the same as that of the two random
variables, which is p. And the total number
200
00:18:16,489 --> 00:18:21,309
of trials is the summation of the total number
of trials for the first random variable and
201
00:18:21,309 --> 00:18:23,700
the total number of trials for the second
random
202
00:18:23,700 --> 00:18:33,049
variable. Here this b, we generally write
as a capital letter, so these two should
203
00:18:33,049 --> 00:18:35,529
be
capital letters.
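The additive property can be verified numerically by convolving the two binomial PMFs and comparing the result with the PMF of a single binomial with n = n1 + n2. This is a sketch of my own, with illustrative values n1 = 3, n2 = 4, p = 0.3:

```python
from math import comb

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n1, n2, p = 3, 4, 0.3

def sum_pmf(z):
    # PMF of Z = X + Y by convolution: sum over x of P(X = x) * P(Y = z - x).
    return sum(binomial_pmf(x, n1, p) * binomial_pmf(z - x, n2, p)
               for x in range(max(0, z - n2), min(n1, z) + 1))

# Z matches a single binomial with n = n1 + n2 and the same p, term by term.
ok = all(abs(sum_pmf(z) - binomial_pmf(z, n1 + n2, p)) < 1e-12
         for z in range(n1 + n2 + 1))
print(ok)  # True
```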
205
00:18:36,529 --> 00:18:47,380
Now, let us look at the different moments of this
distribution, using what we
206
00:18:47,380 --> 00:18:52,669
discussed in the last class. If we look at
the first moment, that is the mean, the mean of
207
00:18:52,669 --> 00:18:56,559
the
binomial distribution is given by np, and
208
00:18:56,559 --> 00:19:00,470
the variance of the binomial distribution
is
209
00:19:00,470 --> 00:19:09,710
given by np into 1 minus p. And the coefficient
of skewness of the binomial distribution is
210
00:19:09,710 --> 00:19:17,299
given by gamma, which is equal to 1 minus
2p divided by the square root of np multiplied
211
00:19:17,299 --> 00:19:27,470
by 1 minus p. Now, to this skewness I will come
a little later; before
212
00:19:27,470 --> 00:19:33,890
that, if we just look at the mean and variance
here, these can be easily shown as follows.
214
00:19:34,890 --> 00:19:41,859
For a particular trial, one individual
trial, we are claiming that the probability
215
00:19:41,859 --> 00:19:45,639
of
success is p, and we are saying that there
216
00:19:45,639 --> 00:19:47,730
are n different trials.
What
217
00:19:47,730 --> 00:19:56,240
we also assumed during the discussion is
that the different trials, that is, the
218
00:19:56,240 --> 00:20:01,119
Bernoulli trials,
are independent. So, the successive
219
00:20:01,119 --> 00:20:07,999
observations, the successive outcomes, are
independent. So, what we can say is: if
220
00:20:07,999 --> 00:20:11,179
the
expected value of success
221
00:20:11,179 --> 00:20:14,460
for one trial is p, and there
are
222
00:20:14,460 --> 00:20:20,229
n such trials which are independent of each
other, then obviously the total
223
00:20:20,229 --> 00:20:26,640
expected number of successes should be:
for one trial it is p, for two trials it will
224
00:20:26,640 --> 00:20:29,020
be 2p,
and similarly, for n trials it will be
225
00:20:29,020 --> 00:20:33,000
np.
So, that is why this value np is
226
00:20:33,000 --> 00:20:37,649
the expectation of the random variable
X.
227
00:20:37,649 --> 00:20:45,460
Similarly, if we take the variance: we
can arbitrarily say that
228
00:20:45,460 --> 00:20:52,139
success is 1 and failure is 0. Then
we can say that
229
00:20:52,139 --> 00:20:55,129
the
value 1 occurs with probability p for a
230
00:20:55,129 --> 00:20:58,009
success and the value 0 with probability
1
231
00:20:58,009 --> 00:21:08,419
minus p; so the variance, p into 1 minus p, comes out
for one particular trial. And as there are n
232
00:21:08,419 --> 00:21:15,690
independent trials, we can say that
the total variance for the whole of these n
233
00:21:15,690 --> 00:21:19,479
different trials is simply this multiplied
by n.
234
00:21:19,479 --> 00:21:28,260
So, this np into 1 minus p equals the variance
of X. Similarly, we can also look at the
235
00:21:28,260 --> 00:21:34,710
skewness. The interesting point here
for the skewness is this: the skewness,
236
00:21:34,710 --> 00:21:41,889
and we discussed positively skewed,
negatively skewed and symmetrical
237
00:21:41,889 --> 00:21:48,370
distributions, is dependent on the
probability of success. Now, if we put the
238
00:21:48,370 --> 00:21:54,000
probability of success equal to 0.5,
you can see that the skewness becomes
239
00:21:54,000 --> 00:21:58,330
zero. And if this coefficient
of skewness becomes zero, we know
240
00:21:58,330 --> 00:22:05,170
that the distribution is symmetric.
Now, if the probability
241
00:22:05,170 --> 00:22:11,779
of success and the probability of failure are
exactly equal to each other, the resulting
242
00:22:11,779 --> 00:22:19,779
binomial distribution is symmetric. Now, if
p is greater than 1 minus p, it
243
00:22:19,779 --> 00:22:22,929
will be skewed to the left, and we know that
244
00:22:22,929 --> 00:22:28,529
skewed to the left means negatively
skewed. And if p
244
00:22:22,929 --> 00:22:28,529
is less than 1 minus p, then it is skewed
246
00:22:32,490 --> 00:22:37,450
to the right, that means it is positively
skewed. So, depending on the probability of
247
00:22:37,450 --> 00:22:43,899
success, the coefficient of
skewness of the binomial distribution changes
248
00:22:43,899 --> 00:22:51,799
from positively skewed to symmetric to
negatively skewed.
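This sign change is easy to check by plugging a few values of p into the formula gamma = (1 - 2p) / sqrt(np(1 - p)). The numbers below (n = 20, p = 0.3, 0.5, 0.7) are illustrative choices of my own:

```python
from math import sqrt

def binomial_skewness(n, p):
    # gamma = (1 - 2p) / sqrt(n * p * (1 - p))
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

n = 20
for p in (0.3, 0.5, 0.7):
    print(p, round(binomial_skewness(n, p), 4))
# p = 0.3: positive (skewed to the right), p = 0.5: zero (symmetric),
# p = 0.7: negative (skewed to the left)
```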
250
00:22:52,799 --> 00:22:57,499
Next, we will discuss the multinomial
distribution. It is similar to the binomial
251
00:22:57,499 --> 00:23:04,149
distribution in the sense that in the binomial
distribution, as we said, there are
252
00:23:04,149 --> 00:23:10,200
only two possible outcomes. Now, if
there are more than two possible
253
00:23:10,200 --> 00:23:16,739
outcomes, the resulting random variable
becomes a vector, and the
254
00:23:16,739 --> 00:23:23,659
distribution of that vector we call a multinomial
distribution. Now, suppose there are n
255
00:23:23,659 --> 00:23:30,489
independent trials with each trial allowing
k mutually exclusive outcomes, whose
256
00:23:30,489 --> 00:23:38,409
probabilities are p1, p2, up to pk. Now, when
we say that there are k mutually
257
00:23:38,409 --> 00:23:44,519
exclusive outcomes, just remember that in the binomial
case we were talking about two
258
00:23:44,519 --> 00:23:49,559
mutually exclusive outcomes.
So, here we generalize, mostly to
259
00:23:49,559 --> 00:23:55,979
more than two outcomes.
Now, these probabilities are p1, p2,
260
00:23:55,979 --> 00:24:01,590
up to pk; obviously, as these are mutually
exclusive, the summation of these probabilities
261
00:24:01,590 --> 00:24:08,730
should be equal to 1, which is written here:
the summation of all pi, for i from 1
262
00:24:08,730 --> 00:24:13,529
to k, is equal to 1. Then, the probability
of
263
00:24:13,529 --> 00:24:23,570
getting the first outcome exactly x1 times,
the second kind of
264
00:24:23,570 --> 00:24:26,380
outcome exactly x2 times, and
265
00:24:26,380 --> 00:24:29,739
in this way, xk outcomes of
the
266
00:24:29,739 --> 00:24:38,009
kth kind. Now, when we are talking
about x1, x2, x3, up to xk, then, as we
267
00:24:38,009 --> 00:24:42,799
have already stated, there are n independent
trials; that means the summation of
268
00:24:42,799 --> 00:24:47,570
x1, x2, up to xk should be equal to n, which is
written here.
269
00:24:47,570 --> 00:24:55,700
Then, the distribution for this kind of case,
where there are more than two possible outcomes:
270
00:24:55,700 --> 00:25:02,919
the probability of these exact numbers,
that is, x1, x2, up to xk for the first kind, second
271
00:25:02,919 --> 00:25:05,899
kind,
and the kth kind respectively, is given by
272
00:25:05,899 --> 00:25:10,899
this distribution: that is,
n
273
00:25:10,899 --> 00:25:16,720
factorial divided by x1 factorial, multiplied
by x2 factorial, up to xk factorial,
274
00:25:16,720 --> 00:25:24,759
multiplied by the probability
of success for the first kind to the power x1,
275
00:25:24,759 --> 00:25:29,809
the probability of success for the second kind to the power x2,
and like this up to the kth. Now, if we just compare
276
00:25:29,809 --> 00:25:32,499
it
with the binomial: in the binomial,
277
00:25:32,499 --> 00:25:38,440
there are only two possible outcomes.
So, here, for the first possible outcome, that
278
00:25:38,440 --> 00:25:41,440
is why the subscript was not there, and
the
279
00:25:41,440 --> 00:25:45,999
probability of success was p and obviously,
so, if I replace this
280
00:25:45,999 --> 00:25:51,220
p1 by p
then obviously, this p2 is equal to 1 minus
281
00:25:51,220 --> 00:26:00,009
p1. So, that is exactly what we got in the
binomial distribution.
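The multinomial PMF is straightforward to evaluate from the factorials. This is a sketch of my own with assumed numbers (n = 6 trials, k = 3 outcomes with probabilities 0.5, 0.3, 0.2):

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    # P(X1 = x1, ..., Xk = xk) = n!/(x1! x2! ... xk!) * p1^x1 * ... * pk^xk
    n = sum(counts)
    coef = factorial(n) // prod(factorial(x) for x in counts)
    return coef * prod(p**x for p, x in zip(probs, counts))

print(round(multinomial_pmf([3, 2, 1], [0.5, 0.3, 0.2]), 6))  # 0.135

# With k = 2 this reduces to the binomial PMF, taking p1 = p and p2 = 1 - p.
```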
283
00:26:01,009 --> 00:26:06,289
Now, the mean and variance of this multinomial
distribution. The joint probability distribution
284
00:26:06,289 --> 00:26:11,679
whose values are given by these probabilities
is called the multinomial distribution. It
285
00:26:11,679 --> 00:26:14,929
is
so called because, for the different values
286
00:26:14,929 --> 00:26:20,889
of xi, the probabilities are given by the
corresponding terms of the multinomial expansion
287
00:26:20,889 --> 00:26:29,369
of p1 plus p2 plus p3 up to pk, raised to the power n. The
mean of this distribution can be shown to
288
00:26:29,369 --> 00:26:36,370
be npi; that is, for a particular
outcome, when we are talking about the ith outcome,
289
00:26:36,370 --> 00:26:43,239
the mean is npi, that is, the total number of
trials multiplied by
290
00:26:43,239 --> 00:26:45,239
the probability of success for that particular
291
00:26:45,239 --> 00:26:52,230
outcome. And similarly, the variance
for that particular outcome is equal to npi
292
00:26:52,230 --> 00:26:57,139
into
1 minus pi; this can be derived in exactly the
293
00:26:57,139 --> 00:27:01,769
same way as for the binomial distribution.
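In numbers, each component count then behaves like a binomial count. A tiny sketch with illustrative parameters of my own (n = 6, probabilities 0.5, 0.3, 0.2):

```python
n = 6
probs = [0.5, 0.3, 0.2]

for i, p in enumerate(probs, start=1):
    mean = n * p                 # E[X_i] = n * p_i
    variance = n * p * (1 - p)   # Var[X_i] = n * p_i * (1 - p_i)
    print(f"outcome {i}: mean = {mean:.1f}, variance = {variance:.2f}")
```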
294
00:27:01,769 --> 00:27:11,989
Another important discrete distribution is
known as the Poisson distribution, and the
295
00:27:11,989 --> 00:27:15,979
underlying process is known as the Poisson process.
This is important when we are talking
296
00:27:15,979 --> 00:27:22,239
about modeling the occurrence of
rainfall, or the occurrence of
297
00:27:22,239 --> 00:27:28,109
road accidents or rail accidents, particularly in
transportation engineering; then we
298
00:27:28,109 --> 00:27:33,450
generally use this kind of distribution. The
Poisson process
299
00:27:33,450 --> 00:27:36,690
is
analogous to the binomial process, but it
300
00:27:36,690 --> 00:27:44,080
corresponds to the occurrence of the event
along a continuous time or space scale, whereas
301
00:27:44,080 --> 00:27:49,029
the binomial process corresponds to the
occurrence of the event along a discrete time
302
00:27:49,029 --> 00:27:52,419
scale.
Now, this is important: when we are talking
303
00:27:52,419 --> 00:27:57,789
about the binomial process, we are talking
about n different trials. And
304
00:27:57,789 --> 00:28:01,080
obviously, n should be
a
305
00:28:01,080 --> 00:28:09,019
particular integer value, and out of those n independent
trials we investigate the number of successes. Now, when we
306
00:28:09,019 --> 00:28:13,450
are talking about the Poisson process, it is
307
00:28:13,450 --> 00:28:19,490
generally defined on a continuous time scale; over
that time scale,
308
00:28:19,490 --> 00:28:26,009
we ask when the particular event occurs. Or,
what we can say is: over a particular
309
00:28:26,009 --> 00:28:32,499
span of time, what is the probability
of a given number of occurrences
310
00:28:32,499 --> 00:28:35,050
of a
particular event. If we take rail accidents,
311
00:28:35,050 --> 00:28:37,159
then consider a month of time.
312
00:28:37,159 --> 00:28:46,809
We count the number of rail accidents that can
happen in that month. So, this is known as the Poisson
313
00:28:46,809 --> 00:28:51,190
process.
Now, this Poisson distribution is used to
314
00:28:51,190 --> 00:28:54,559
model a particular event that can occur at
any
315
00:28:54,559 --> 00:29:00,389
time, or at any point in space. So, not only
over time; we can also say along
316
00:29:00,389 --> 00:29:06,330
the
stretch of a highway or along the
317
00:29:06,330 --> 00:29:12,679
stretch of a railway line. So, this can
happen in the time direction or in the spatial
318
00:29:12,679 --> 00:29:18,019
direction, or it can even be extended to
an area. So, over a particular area, the
319
00:29:18,019 --> 00:29:24,879
number of occurrences; or it can even be extended
to a volume: over a particular volume, the
320
00:29:24,879 --> 00:29:30,489
number of occurrences. So, over a
continuous medium, I can say now, over that
321
00:29:30,489 --> 00:29:37,179
this can be time, or 1-dimensional
space, or 2-dimensional space,
322
00:29:37,179 --> 00:29:43,139
or 3-dimensional space. So, over that domain,
the number of occurrences is modeled through
323
00:29:43,139 --> 00:29:47,980
this Poisson process.
So, now let us consider how we
324
00:29:47,980 --> 00:29:54,139
can make the analogy with the Bernoulli
process. Say, if we consider a
325
00:29:54,139 --> 00:30:01,049
Bernoulli process in a certain time interval,
where p is the probability of occurrence of
326
00:30:01,049 --> 00:30:06,989
an event within the time interval; if the
time
327
00:30:06,989 --> 00:30:14,559
interval decreases then, obviously, the probability
p also decreases. Whereas, the number
328
00:30:14,559 --> 00:30:22,889
of trials n should increase; n should
increase in such a way that the decrease
329
00:30:22,889 --> 00:30:26,909
of p
and the increase of n occur in such a manner
330
00:30:26,909 --> 00:30:33,590
that the product np remains constant. Then,
the binomial distribution approaches the
331
00:30:33,590 --> 00:30:34,960
Poisson distribution.
.
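This limiting behavior, p decreasing and n increasing with np held constant, can be checked numerically; a small sketch, with the values of n and lambda chosen arbitrarily for illustration:

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """Binomial P(X = x) in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson P(X = x) with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 2.0  # keep n * p fixed at 2 while n grows
for n in (10, 100, 10_000):
    p = lam / n
    # binomial P(X = 3) approaches the Poisson P(X = 3) as n increases
    print(n, binom_pmf(3, n, p), poisson_pmf(3, lam))
```

As n grows, the printed binomial probability converges to the fixed Poisson value.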
332
00:30:34,960 --> 00:30:45,009
Now, let us see what assumptions
must hold for a particular
333
00:30:45,009 --> 00:30:49,830
process to be called a Poisson process. The
Poisson process is based on
334
00:30:49,830 --> 00:30:54,340
the
following assumptions. First, a particular
335
00:30:54,340 --> 00:30:58,340
event can occur at random at any point in
the
336
00:30:58,340 --> 00:31:03,450
time or space and obviously, over this space
means, over a line segment or over an area
337
00:31:03,450 --> 00:31:12,759
etcetera. The number of occurrences of an
event in a given time or space interval is
338
00:31:12,759 --> 00:31:20,489
independent of that in any other non-overlapping
time (or space) interval. So, if I say that
339
00:31:20,489 --> 00:31:27,839
over the temporal direction, the number of
occurrences over say t1 to t2 is totally
340
00:31:27,839 --> 00:31:32,029
independent of the number of occurrences from
t2 to t3.
341
00:31:32,029 --> 00:31:33,029
.
342
00:31:33,029 --> 00:31:37,519
That means, on a time line, if I just start from
here, the number of occurrences
343
00:31:37,519 --> 00:31:44,229
from
t1 to t2 and from t3 to t4, as long
344
00:31:44,229 --> 00:31:47,869
as t3 is greater than t2,
that means
345
00:31:47,869 --> 00:31:53,009
these two zones are non-overlapping with each
other. Then, the number of occurrences over
346
00:31:53,009 --> 00:31:58,659
one interval is independent of the number
of occurrences over the other interval. And similarly,
347
00:31:58,659 --> 00:32:04,629
the same thing can be extended to areas,
as well as to the time and space directions.
348
00:32:04,629 --> 00:32:10,359
So,
I repeat: the number of occurrences of an event
349
00:32:10,359 --> 00:32:17,769
in a given time or space interval is
independent of that in any other non-overlapping
350
00:32:17,769 --> 00:32:24,029
time (or space) interval.
The probability of occurrence of an event
351
00:32:24,029 --> 00:32:27,559
in a small interval that is delta t is given
by
352
00:32:27,559 --> 00:32:33,799
lambda delta t, where lambda is the mean rate
of occurrence of the event. This lambda is
353
00:32:33,799 --> 00:32:40,340
the parameter of this distribution, where
the mean rate means the number of occurrences
354
00:32:40,340 --> 00:32:45,679
over unit time; that is, in a unit time, how
many times that particular event can occur.
355
00:32:45,679 --> 00:32:50,589
So, that is designated by lambda, which
is the parameter of this Poisson process.
356
00:32:50,589 --> 00:32:54,919
So,
this should be known beforehand so that we
357
00:32:54,919 --> 00:33:02,389
can define the Poisson distribution. The
probability of more than one occurrence of
358
00:33:02,389 --> 00:33:05,220
an event in the small interval delta t is
negligible.
359
00:33:05,220 --> 00:33:06,220
.
360
00:33:06,220 --> 00:33:10,849
So, what we are saying is that there is a single
occurrence in this small time interval delta
361
00:33:10,849 --> 00:33:15,200
t.
So, the
362
00:33:15,200 --> 00:33:18,019
probability of more than one occurrence of
an event
363
00:33:18,019 --> 00:33:25,839
in the small interval delta t is negligible.
So, under these
364
00:33:25,839 --> 00:33:29,820
assumptions of the Poisson process,
365
00:33:29,820 --> 00:33:40,059
the number of occurrences of an event Xt
in the time t is given by this Poisson distribution.
366
00:33:40,059 --> 00:33:44,499
One thing I just want to mention here:
whenever we are talking about any particular
367
00:33:44,499 --> 00:33:53,539
distribution, it is very essential to know
which quantity we are
368
00:33:53,539 --> 00:33:59,779
calling the random variable; this applies to each and
every distribution that we are discussing.
369
00:33:59,779 --> 00:34:06,359
Similarly, for the binomial distribution
that we discussed just now, the Poisson
370
00:34:06,359 --> 00:34:10,230
distribution, and all the distributions that we
are going to cover in this class as well
371
00:34:10,230 --> 00:34:13,560
as in the
successive classes: first, what you should
372
00:34:13,560 --> 00:34:18,230
try to understand is what the random
variable involved is. Is it a count, is it along the
373
00:34:18,230 --> 00:34:24,399
temporal direction over time, and so
on.
374
00:34:24,399 --> 00:34:30,310
So, if we understand which random
variable is being referred to here, then
375
00:34:30,310 --> 00:34:35,429
the understanding of that particular distribution
will be easier. So, here that is why I am
376
00:34:35,429 --> 00:34:42,700
repeating here what we mean by
this Poisson distribution. The random
377
00:34:42,700 --> 00:34:47,190
variable is the number of
occurrences, the number of occurrences of the
378
00:34:47,190 --> 00:34:53,760
event in the time t.
So, that number here is shown as the random
379
00:34:53,760 --> 00:34:58,820
variable X, and a particular value of the
random variable is x, with the parameter
380
00:34:58,820 --> 00:35:03,590
lambda over the time t, which is given
by
381
00:35:03,590 --> 00:35:09,790
this distribution, which is lambda t to the power
x, divided by x factorial, multiplied by exponential of minus lambda
382
00:35:09,790 --> 00:35:14,410
t.
Now, this lambda is greater than 0 and t is
383
00:35:14,410 --> 00:35:20,920
greater than 0, and this discrete random
variable X can take the values
384
00:35:20,920 --> 00:35:23,980
0, 1, 2, and it can go mathematically up
to
385
00:35:23,980 --> 00:35:31,840
infinity. Here lambda, not lambda t, is the mean
rate of occurrence, that is, the average number
386
00:35:31,840 --> 00:35:35,910
of
occurrences per unit time; since we
387
00:35:35,910 --> 00:35:41,810
are talking about a unit time specifically,
the t is not required here: lambda is the mean rate of
388
00:35:41,810 --> 00:35:46,950
occurrence, that is, the average number of
389
00:35:46,950 --> 00:35:53,350
occurrences per unit time. So, if you multiply
it by t, then lambda t is the expected total number of
390
00:35:53,350 --> 00:35:59,070
occurrences over that particular time t.
.
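The PMF just described, P(X = x) = (lambda t)^x e^(-lambda t) / x!, can be written down directly; a minimal sketch, where the rate and time values are only illustrative:

```python
from math import exp, factorial

def poisson_pmf(x, lam, t=1.0):
    """P(X = x): probability of x occurrences in time t,
    with mean rate lam per unit time (mean count = lam * t)."""
    mu = lam * t
    return mu**x * exp(-mu) / factorial(x)

# e.g. rail accidents at a mean rate of 2 per month:
# probability of exactly 3 accidents in one month
p_three = poisson_pmf(3, lam=2.0, t=1.0)
```

Summing the PMF over x = 0, 1, 2, ... recovers 1, confirming it is a proper distribution.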
391
00:35:59,070 --> 00:36:04,010
Now the mean of the Poisson distribution:
if we just show the first few moments, you
392
00:36:04,010 --> 00:36:08,070
will
see the mean of the Poisson distribution is
393
00:36:08,070 --> 00:36:14,491
given by the expectation of X, equal to
lambda. And the mean as well as the variance
394
00:36:14,491 --> 00:36:19,320
of this distribution are the same, both equal
to the parameter of the distribution,
395
00:36:19,320 --> 00:36:25,090
which is lambda. And the coefficient of
skewness for this Poisson distribution
396
00:36:25,090 --> 00:36:28,710
can be shown to be lambda to the power minus
1
397
00:36:28,710 --> 00:36:37,950
by 2, that is, 1 by square root of lambda. Now,
this indicates that with the
398
00:36:37,950 --> 00:36:43,850
increase of the value of lambda, that is, if the
mean rate of
399
00:36:43,850 --> 00:36:48,780
occurrence is increased, the
distribution shifts from a positively skewed
400
00:36:48,780 --> 00:36:53,230
distribution
to a nearly
401
00:36:53,230 --> 00:36:57,130
symmetric distribution. So, for a
low or
402
00:36:57,130 --> 00:37:01,430
small value of lambda
the distribution is distinctly positively skewed;
403
00:37:01,430 --> 00:37:08,150
now, as lambda increases, it gradually
approaches a symmetric distribution.
404
00:37:08,150 --> 00:37:09,150
.
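The stated moments, mean = variance = lambda and skewness = 1 by square root of lambda, can be verified numerically from the PMF; a sketch with an arbitrarily chosen lambda:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    return lam**x * exp(-lam) / factorial(x)

def poisson_moments(lam, xmax=100):
    """Numerically compute mean, variance and skewness from the PMF
    (the tail beyond xmax is negligible for moderate lam)."""
    mean = sum(x * poisson_pmf(x, lam) for x in range(xmax))
    var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in range(xmax))
    skew = sum((x - mean) ** 3 * poisson_pmf(x, lam) for x in range(xmax)) / var**1.5
    return mean, var, skew

m, v, g = poisson_moments(4.0)  # expect 4, 4 and 1/sqrt(4) = 0.5
```

Repeating this for larger lambda shows the skewness shrinking toward zero, matching the shift toward a symmetric shape described above.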
405
00:37:09,150 --> 00:37:17,940
(No Audio From: 37:09) So, there is also
an additive property for the
406
00:37:17,940 --> 00:37:23,680
Poisson distribution as well: if there are
two Poisson random variables with parameters,
407
00:37:23,680 --> 00:37:30,390
here lambda 1 and lambda 2, then
their sum is also a Poisson random variable
408
00:37:30,390 --> 00:37:35,080
with the parameter lambda in such a way, that
lambda is equal to lambda 1 plus lambda
409
00:37:35,080 --> 00:37:42,240
2. So, we can add more than one Poisson
random variable, and the summation is also a
410
00:37:42,240 --> 00:37:47,580
random variable with a Poisson distribution,
whose parameter lambda is the summation
411
00:37:47,580 --> 00:37:56,060
of the parameters of the random variables being
summed. So, this is the additive property
412
00:37:56,060 --> 00:37:58,250
of
the Poisson distribution.
413
00:37:58,250 --> 00:37:59,250
..
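The additive property can be checked by convolving two Poisson PMFs and comparing with a single Poisson PMF of parameter lambda 1 plus lambda 2; a small sketch, with arbitrary lambda values:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    return lam**x * exp(-lam) / factorial(x)

lam1, lam2, s = 1.5, 2.5, 4
# P(X1 + X2 = s) via convolution of the two PMFs
conv = sum(poisson_pmf(i, lam1) * poisson_pmf(s - i, lam2) for i in range(s + 1))
# should equal a single Poisson PMF with parameter lam1 + lam2
direct = poisson_pmf(s, lam1 + lam2)
```

The two numbers agree to machine precision, illustrating that the sum is again Poisson with the summed parameter.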
414
00:37:59,250 --> 00:38:06,590
Now, another distribution is known as
the geometric distribution: the number of
415
00:38:06,590 --> 00:38:13,640
trials until the first success, that is, the first
occurrence of a particular event;
416
00:38:13,640 --> 00:38:16,510
obviously,
this success again here is arbitrarily
417
00:38:16,510 --> 00:38:20,030
chosen. So, the number of trials until the
first
418
00:38:20,030 --> 00:38:25,470
success that is the occurrence of an event
in a Bernoulli sequence is given by this
419
00:38:25,470 --> 00:38:32,060
geometric distribution. Now, what is
the random variable here? As I was stressing, it is
420
00:38:32,060 --> 00:38:34,870
the
number of trials until the first occurrence.
421
00:38:34,870 --> 00:38:39,030
So, if I start one sequence of the Bernoulli
422
00:38:39,030 --> 00:38:47,950
process, then how many trials do I have to conduct
to get the first success? So, that number:
423
00:38:47,950 --> 00:38:57,210
after the first few failures, the
first success will come. So, that number
424
00:38:57,210 --> 00:39:00,560
is
here the random variable which follows the
425
00:39:00,560 --> 00:39:06,600
geometric distribution.
If the probability of occurrence of an event
426
00:39:06,600 --> 00:39:13,040
in any particular trial is p; now, you recall
from the Bernoulli process that
427
00:39:13,040 --> 00:39:17,320
all the trials are independent,
successive trials independent of each
428
00:39:17,320 --> 00:39:25,810
other. And for a particular trial, the probability
of success is p then, the probability that
429
00:39:25,810 --> 00:39:30,220
the first occurrence of the event is on the
tth trial.
430
00:39:30,220 --> 00:39:36,710
Now, what we are saying is that the probability
that the first success comes
431
00:39:36,710 --> 00:39:40,710
on
the tth trial is given by p multiplied by
432
00:39:40,710 --> 00:39:48,610
1 minus p to the power t minus 1. How do we get
this? 1 minus p is the probability
433
00:39:48,610 --> 00:39:56,270
of failure, which has occurred t minus
1 times before; at the tth trial we get the
434
00:39:56,270 --> 00:39:59,730
first success.
So, these two are independent; so, we multiply
435
00:39:59,730 --> 00:40:04,150
them with each other to get the
distribution, which is nothing but p
436
00:40:04,150 --> 00:40:07,660
multiplied by 1 minus p to the power t minus 1; this
t
437
00:40:07,660 --> 00:40:14,920
minus 1 is the number of failures that have occurred before
the first success comes. So, that is why
438
00:40:14,920 --> 00:40:22,731
this t can take the values 1, 2, 3, and
so on. There is one concept, which
439
00:40:22,731 --> 00:40:26,020
is known
as the shifted geometric distribution, where
440
00:40:26,020 --> 00:40:29,100
it says that the concept is changed slightly:
we count
441
00:40:29,100 --> 00:40:34,620
the number of failures before the
first success. Here, the way we discussed it
442
00:40:34,620 --> 00:40:41,060
is the number of trials until I get the first
success; and now what I am saying is that in some other
443
00:40:41,060 --> 00:40:47,100
texts you will also see the number of failures
before the first
444
00:40:47,100 --> 00:40:48,980
success.
Now, when we are talking about the number
445
00:40:48,980 --> 00:40:53,560
of failures before the first success, that
means the distribution is shifted a little, so that it
446
00:40:53,560 --> 00:40:56,820
is supported on zero: if even the
first trial
447
00:40:56,820 --> 00:41:02,470
itself is a success, then the number of
failures before the success is zero. So, that
448
00:41:02,470 --> 00:41:05,220
support
starts from zero and goes up to infinity.
449
00:41:05,220 --> 00:41:10,450
Both concepts are essentially the same,
but
450
00:41:10,450 --> 00:41:15,680
to differentiate the two, in some
texts you will find that it is
451
00:41:15,680 --> 00:41:18,510
called
the shifted geometric distribution. But here,
452
00:41:18,510 --> 00:41:22,681
what we are considering is the number
of trials until the first success. That
453
00:41:22,681 --> 00:41:26,990
is why our support here is from
1,
454
00:41:26,990 --> 00:41:29,340
2 up to infinity.
.
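The two conventions just contrasted, trials until the first success (supported on 1, 2, ...) versus failures before the first success (supported on 0, 1, ...), can be sketched as follows; the value of p is an arbitrary example:

```python
def geometric_pmf(x, p):
    """P(first success on trial x), x = 1, 2, ...: (1-p)**(x-1) * p."""
    return (1 - p) ** (x - 1) * p

def shifted_geometric_pmf(k, p):
    """P(k failures before the first success), k = 0, 1, ...: (1-p)**k * p."""
    return (1 - p) ** k * p

p = 0.25
# the two PMFs describe the same experiment, with the support shifted by one
same = geometric_pmf(4, p) == shifted_geometric_pmf(3, p)
```

So the shifted form is just the same distribution re-indexed, as the lecture notes.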
455
00:41:29,340 --> 00:41:37,840
So, this is our distribution; thus, the PMF
of the geometric distribution can be
456
00:41:37,840 --> 00:41:44,110
written as p multiplied
by 1 minus p to the power
457
00:41:44,110 --> 00:41:48,490
x
minus 1, where p is the probability of
458
00:41:48,490 --> 00:41:51,720
success in each trial, and this x can take
values
459
00:41:51,720 --> 00:42:01,200
from 1 to infinity. The expected value of
this geometric distribution is important
460
00:42:01,200 --> 00:42:06,560
in the sense that we can call it the
return period. In what sense is it a return period?
461
00:42:06,560 --> 00:42:09,250
Now,
this is the expected value of what we
462
00:42:09,250 --> 00:42:15,520
are talking about, the number of trials before
the first success. Now, what we can say is that
463
00:42:15,520 --> 00:42:21,750
if we take the expected value of this one,
it indicates nothing but how frequently
464
00:42:21,750 --> 00:42:26,210
that particular success (again, the success
is the particular event that we are referring
465
00:42:26,210 --> 00:42:31,880
to) is coming,
or returning.
466
00:42:31,880 --> 00:42:36,900
And this is a very important term, known as
the return period, which we will again
467
00:42:36,900 --> 00:42:43,900
discuss in the context of frequency analysis
in successive modules. But here, this
468
00:42:43,900 --> 00:42:46,670
is
the return period: a particular event
469
00:42:46,670 --> 00:42:50,120
is returning again, which is the expected
value
470
00:42:50,120 --> 00:42:55,670
of this geometric distribution. Now, this
expected value of the geometric distribution, if
471
00:42:55,670 --> 00:42:59,260
you
want to obtain it, can obviously be derived. So,
472
00:42:59,260 --> 00:43:06,790
before that: the average time between two
successive occurrences of an event in a Bernoulli
473
00:43:06,790 --> 00:43:13,110
sequence is called the mean recurrence
time, or the return period. Remember that
474
00:43:13,110 --> 00:43:17,970
what we are discussing is for this discrete
random variable, but the same idea holds
475
00:43:17,970 --> 00:43:23,570
for continuous random
variables as well, which we will discuss later.
476
00:43:23,570 --> 00:43:27,940
But here, we are discussing only with respect
to this geometric distribution. So, the
477
00:43:27,940 --> 00:43:32,550
expected value of this geometric distribution
follows, obviously, from the basic equation: this
478
00:43:32,550 --> 00:43:35,140
is
the basic definition, where we know that we have to
479
00:43:35,140 --> 00:43:41,850
multiply the variable by this PMF
and sum it up over the support. The support
480
00:43:41,850 --> 00:43:45,080
here is 1 to infinity; so, if you
take
481
00:43:45,080 --> 00:43:52,569
this infinite series and sum it, it comes to
1 by p; so, the expected value is 1 by p.
482
00:43:52,569 --> 00:43:53,569
..
483
00:43:53,569 --> 00:44:01,670
So, the mean of this geometric distribution,
this mean of the geometric distribution,
484
00:44:01,670 --> 00:44:06,660
is
1 by p. The other moment, the variance
485
00:44:06,660 --> 00:44:14,580
of the geometric distribution, can be shown
to be 1 minus p divided by p square. And
486
00:44:14,580 --> 00:44:20,760
the skewness of the geometric distribution
can again be shown; this will be the
487
00:44:20,760 --> 00:44:27,580
skewness coefficient gamma, not Var: gamma
equals 2 minus p divided by the
488
00:44:27,580 --> 00:44:33,760
square root of 1 minus p. So, this is the
skewness of the geometric distribution.
489
00:44:33,760 --> 00:44:34,760
.
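The moments just stated, mean 1/p (the return period), variance (1 - p)/p squared, and skewness (2 - p) divided by the square root of 1 - p, can be checked numerically from the PMF; a sketch with an illustrative p:

```python
from math import sqrt

def geometric_pmf(x, p):
    """P(first success on trial x) = (1-p)**(x-1) * p."""
    return (1 - p) ** (x - 1) * p

p, xmax = 0.2, 5000  # truncate the infinite series; the tail is negligible
xs = range(1, xmax)
mean = sum(x * geometric_pmf(x, p) for x in xs)               # expect 1/p = 5
var = sum((x - mean) ** 2 * geometric_pmf(x, p) for x in xs)  # expect (1-p)/p**2 = 20
skew = sum((x - mean) ** 3 * geometric_pmf(x, p) for x in xs) / var**1.5
# expect (2 - p) / sqrt(1 - p)
```

For p = 0.2, the return period is 5 trials: on average, the event returns every fifth trial.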
490
00:44:34,760 --> 00:44:41,350
Now, another discrete distribution that is
also used in different civil engineering
491
00:44:41,350 --> 00:44:48,000
problems is the negative binomial distribution.
The negative binomial distribution is
492
00:44:48,000 --> 00:44:59,350
used to find the kth occurrence of an event
in a series of Bernoulli trials. Now, again
493
00:44:59,350 --> 00:45:02,700
if
you ask what exactly the random variable is:
494
00:45:02,700 --> 00:45:09,620
we are talking about the number of trials until the kth
occurrence of an event in the series of Bernoulli
495
00:45:09,620 --> 00:45:14,710
trials. So, earlier we were talking
about the first occurrence of that event, or
496
00:45:14,710 --> 00:45:18,029
of that success; here we are talking about the
kth
497
00:45:18,029 --> 00:45:25,500
occurrence of that success. So, this one
follows the negative binomial distribution.
498
00:45:25,500 --> 00:45:35,421
So, in a series of n Bernoulli trials if Tk
is the number of trials until the kth occurrence
499
00:45:35,421 --> 00:45:39,330
of
an event, then how do we obtain the
500
00:45:39,330 --> 00:45:42,780
distribution of Tk? So, the probability
that
501
00:45:42,780 --> 00:45:54,620
Tk equals t, which is nothing but the
PMF at that particular t, is
502
00:45:54,620 --> 00:45:59,520
given
by t minus 1 combination k minus 1 multiplied
503
00:45:59,520 --> 00:46:07,270
by p to the power k, times 1 minus p to the power t minus k,
where this t can take the values k, k
504
00:46:07,270 --> 00:46:12,570
plus 1 up to infinity and for t less than
k this is
505
00:46:12,570 --> 00:46:19,930
equal to 0. Now, if we look at
the basis of this distribution, we can say
506
00:46:19,930 --> 00:46:24,460
here
that we are considering the kth occurrence at trial Tk.
507
00:46:24,460 --> 00:46:31,670
So, the kth occurrence happens
at this trial; then, just before this one, so
508
00:46:31,670 --> 00:46:35,140
if
Tk equals t, then at the tth trial we
509
00:46:35,140 --> 00:46:38,620
got the kth success, or the kth occurrence, of
that event.
510
00:46:38,620 --> 00:46:44,690
Then, what we can say is that up to trial t minus 1,
k minus 1 successes have occurred. Now,
511
00:46:44,690 --> 00:46:48,510
if
we ask what the probability is
512
00:46:48,510 --> 00:46:54,550
that k minus 1 successes come out of t
minus 1
513
00:46:54,550 --> 00:46:59,850
trials, then this is a simple binomial probability,
and that probability will
514
00:46:59,850 --> 00:47:03,190
be given by; I can write it here.
515
00:47:03,190 --> 00:47:04,190
..
516
00:47:04,190 --> 00:47:16,000
That is, t minus 1 combination k minus 1,
times p to the power of the total number of successes,
517
00:47:16,000 --> 00:47:25,420
that is k minus 1, multiplied by 1 minus p
to the power t minus k. So, from this binomial
518
00:47:25,420 --> 00:47:29,660
distribution, what we have is: the total number
of trials is t minus 1 and the total number of
519
00:47:29,660 --> 00:47:35,461
successes is k minus 1; so, this probability
is given by that term. Now, at the immediately next trial,
520
00:47:35,461 --> 00:47:40,640
that
is, the tth trial, we are getting
521
00:47:40,640 --> 00:47:44,880
the kth success.
So, for this
522
00:47:44,880 --> 00:47:50,970
particular trial, the probability
of a success is p; again, this is independent
523
00:47:50,970 --> 00:47:57,830
of
whatever has happened earlier. So, these
524
00:47:57,830 --> 00:48:02,730
t minus 1 trials have been taken,
in which the k minus 1 successes occurred; so,
525
00:48:02,730 --> 00:48:09,420
we have to calculate this probability multiplied
by the probability that at the tth trial
526
00:48:09,420 --> 00:48:15,030
we will get one success.
So, this one we are getting directly
527
00:48:15,030 --> 00:48:19,860
from this binomial distribution, which is
t
528
00:48:19,860 --> 00:48:27,380
minus 1 combination k minus 1, probability
of success to the power of the number of successes, k minus 1,
529
00:48:27,380 --> 00:48:34,470
multiplied by the probability of failure to the power
t minus k. Now, at this tth trial the probability
530
00:48:34,470 --> 00:48:40,020
of success is p, because this trial is independent.
So, as it is independent, we can
531
00:48:40,020 --> 00:48:47,540
multiply directly by this one, which
results in the required distribution:
532
00:48:47,540 --> 00:48:56,150
t
minus 1 combination k minus 1, p to the power k, 1 minus p to the power
533
00:48:56,150 --> 00:49:05,270
t minus k. So, this is the distribution
known as the negative binomial; now, here this t is taking
534
00:49:05,270 --> 00:49:11,670
values from k onwards: k, k plus
1,
535
00:49:11,670 --> 00:49:17,980
and so on, up to infinity; and for t less than k this
is zero.
536
00:49:17,980 --> 00:49:26,420
Now, in some of the texts you will find that
this one is shifted; this k is shifted so that
537
00:49:26,420 --> 00:49:29,540
it is
arranged in such a way that the support can be shifted
538
00:49:29,540 --> 00:49:31,320
to zero, so that the support is
539
00:49:31,320 --> 00:49:38,420
mathematically shown as 0, 1, 2, 3; so,
you have to rewrite the
540
00:49:38,420 --> 00:49:43,040
distribution in such a way that the support, instead
of k, k plus 1 up to infinity, becomes
541
00:49:43,040 --> 00:49:49,140
0, 1, 2,
3, and so on; that is also possible. But here,
542
00:49:49,140 --> 00:49:55,350
we are taking the support from k onwards so,
the distribution function looks like this
543
00:49:55,350 --> 00:49:59,150
that is t minus 1 combination k minus 1 p
power
544
00:49:59,150 --> 00:50:02,630
k 1 minus p power t minus k.
.
545
00:50:02,630 --> 00:50:12,700
So, from the binomial law (sometimes we
also call this distribution a law):
546
00:50:12,700 --> 00:50:18,600
from the binomial law, if there are k minus
1 occurrences of an event in the first t minus
547
00:50:18,600 --> 00:50:22,780
1
trials, and the kth occurrence
548
00:50:22,780 --> 00:50:27,690
is at the tth trial, then the probability of
Tk
549
00:50:27,690 --> 00:50:34,440
equal to t is given by this t minus 1 combination
k minus 1 term. Just as we have
550
00:50:34,440 --> 00:50:39,580
discussed, this we get from the binomial
process, multiplied by p, that is, the
551
00:50:39,580 --> 00:50:45,340
kth occurrence at the tth trial; and we
get this distribution for the negative binomial
552
00:50:45,340 --> 00:50:47,040
distribution.
553
00:50:47,040 --> 00:50:48,040
..
554
00:50:48,040 --> 00:50:55,570
Now, the mean of this negative binomial distribution
can be shown to be k by p, and
555
00:50:55,570 --> 00:51:02,150
the variance of this negative binomial distribution
equals k multiplied by 1 minus
556
00:51:02,150 --> 00:51:09,550
p, divided by p square. Now, the distributions
that we are discussing in this class
557
00:51:09,550 --> 00:51:16,360
will be used in the module after next, where
we are talking about the different
558
00:51:16,360 --> 00:51:26,290
applications to particular civil engineering
problems in that module. So here, for
559
00:51:26,290 --> 00:51:34,110
the distribution that we are talking about,
the negative binomial distribution, these
560
00:51:34,110 --> 00:51:35,760
two
moments are shown.
561
00:51:35,760 --> 00:51:36,760
.
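The mean k/p and variance k(1 - p)/p squared just quoted can be verified numerically from the PMF; a sketch, with k and p chosen only for illustration:

```python
from math import comb

def neg_binomial_pmf(t, k, p):
    """P(kth success on trial t); zero for t < k."""
    return comb(t - 1, k - 1) * p**k * (1 - p) ** (t - k) if t >= k else 0.0

k, p, tmax = 3, 0.4, 4000  # truncate the infinite sum; the tail is negligible
mean = sum(t * neg_binomial_pmf(t, k, p) for t in range(k, tmax))
# expect k / p = 7.5
var = sum((t - mean) ** 2 * neg_binomial_pmf(t, k, p) for t in range(k, tmax))
# expect k * (1 - p) / p**2 = 11.25
```

Both numerical values match the closed-form expressions to high precision.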
562
00:51:36,760 --> 00:51:42,700
Next, what we discuss is the hypergeometric
distribution. In a series of n repeated
563
00:51:42,700 --> 00:51:50,820
trials where the outcomes of the trials are not
independent, the probability of x successes
564
00:51:50,820 --> 00:51:58,010
and n minus x failures can be determined by
the hypergeometric distribution.
565
00:51:58,010 --> 00:52:02,820
Consider a group of N items out of which
m are defective and the remaining N
566
00:52:02,820 --> 00:52:09,490
minus m are good. If a sample
of n items is chosen at random, the
567
00:52:09,490 --> 00:52:15,020
probability of x defective items in this sample
is given by this distribution, which is
568
00:52:15,020 --> 00:52:20,500
shown here. One basic difference from
the earlier distributions is that this
569
00:52:20,500 --> 00:52:25,040
is
sampling without replacement; that is, once
570
00:52:25,040 --> 00:52:32,410
we take an item out, obviously, we are not
replacing it back in the sample, that is, back in
571
00:52:32,410 --> 00:52:38,120
the population.
So, that is why here, exactly
572
00:52:38,120 --> 00:52:46,570
x successes and n minus x failures are considered
out of these n repeated trials, which is shown
573
00:52:46,570 --> 00:52:53,900
by this expression, where m is the number of defective
items here. So, m combination x multiplied
574
00:52:53,900 --> 00:53:01,350
by N minus m combination n minus x, divided
by N combination n; and this x can take values
575
00:53:01,350 --> 00:53:06,890
from 0 to m. So, the minimum possible
value is 0 and the maximum value that it can
576
00:53:06,890 --> 00:53:11,200
take is m, because the total number of
defective items is m.
577
00:53:11,200 --> 00:53:15,690
.
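The hypergeometric PMF, C(m, x) times C(N - m, n - x) divided by C(N, n), can be written down directly; a minimal sketch with made-up values of N, m and n:

```python
from math import comb

def hypergeom_pmf(x, N, m, n):
    """P(x defectives in a sample of n drawn without replacement
    from N items of which m are defective)."""
    if x < max(0, n - (N - m)) or x > min(n, m):
        return 0.0
    return comb(m, x) * comb(N - m, n - x) / comb(N, n)

N, m, n = 20, 5, 4   # 20 items, 5 defective, sample of 4
mean = n * m / N     # expected number of defectives in the sample
p_two = hypergeom_pmf(2, N, m, n)
```

Summing over all feasible x gives 1, and the numerically computed mean matches nm/N.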
578
00:53:15,690 --> 00:53:21,400
The mean of this hypergeometric distribution
is nm divided by N, and the variance of the hypergeometric
579
00:53:21,400 --> 00:53:33,920
distribution can be shown to follow this
expression. So, in this class some
580
00:53:33,920 --> 00:53:40,720
discrete distributions have been discussed, and
some continuous distributions
581
00:53:40,720 --> 00:53:45,960
will be covered in the next class or the
class after that. So, whatever distributions
582
00:53:45,960 --> 00:53:46,960
that
583
00:53:46,960 --> 00:53:50,760
we are learning, along with their basic properties
and the basic assumptions that we have made;
584
00:53:50,760 --> 00:53:56,100
and we will see some specific applications
while we are modeling the different problems
585
00:53:56,100 --> 00:54:01,190
in civil engineering for different disciplines;
that we will see later. And at that time
586
00:54:01,190 --> 00:54:05,880
it
will be helpful for us to use a particular
587
00:54:05,880 --> 00:54:07,600
distribution depending on the problem at
hand.
588
00:54:07,600 --> 00:54:12,490
So, that we can understand which random
variable we are talking about, and
589
00:54:12,490 --> 00:54:19,350
what its probable behavior is; based on that,
we can select the distribution. What we are
590
00:54:19,350 --> 00:54:26,210
discussing now will help to model
that particular random variable in the
591
00:54:26,210 --> 00:54:31,170
different civil engineering problems. And that
we will do mostly in the module after next;
592
00:54:31,170 --> 00:54:35,890
and in the next class we will discuss
some continuous distributions. What we
593
00:54:35,890 --> 00:54:42,780
discussed in this class are the discrete ones; in
the next class and the class after that, we
594
00:54:42,780 --> 00:54:45,720
will
discuss some of the standard continuous distributions.
595
00:54:45,720 --> 00:54:47,039
Thank you.