1
00:00:01,360 --> 00:00:24,120
Hello there. Welcome to the second lecture
of module three. In this lecture, we will
2
00:00:24,120 --> 00:00:29,980
learn about the probability distribution of
a random variable. In the last lecture, we
3
00:00:29,980 --> 00:00:36,750
saw the definition and concept of a random
variable. Now, the concept and
4
00:00:36,750 --> 00:00:41,910
definition of a random variable is generally
useful in probability theory through
5
00:00:41,910 --> 00:00:48,720
its probability distribution. We have to know,
over the specified range of one
6
00:00:48,720 --> 00:00:53,739
particular random variable, how its probabilities
are distributed. This is what we will
7
00:00:53,739 --> 00:00:58,030
discuss in today’s class.
8
00:00:58,030 --> 00:01:03,649
So our outline for today’s presentation
is this. First, we will discuss the general
9
00:01:03,649 --> 00:01:08,810
description of the probability distribution:
what a probability distribution
10
00:01:08,810 --> 00:01:12,790
is all
about and how we can define the probability distribution
11
00:01:12,790 --> 00:01:14,180
for a random variable.
12
00:01:14,180 --> 00:01:20,220
Basically, there are two different types
of random variables we will consider. One
13
00:01:20,220 --> 00:01:25,330
is
the discrete random variable
14
00:01:25,330 --> 00:01:30,400
and the other is the continuous random
variable. The probability mass function will
15
00:01:30,400 --> 00:01:39,580
be discussed for the discrete
random variable, and the probability density function
16
00:01:39,580 --> 00:01:46,510
for the continuous random variable.
Their cumulative distribution, also
17
00:01:46,510 --> 00:01:51,040
known as the cumulative distribution function,
will be discussed too, and for all these
18
00:01:51,040 --> 00:01:54,159
things we will see some example problems
as
19
00:01:54,159 --> 00:01:55,259
well.
20
00:01:55,259 --> 00:02:04,670
The probability distribution of a random variable:
the probability distribution of a random
21
00:02:04,670 --> 00:02:12,090
variable is a function that provides a complete
description of all possible values that the
22
00:02:12,090 --> 00:02:18,450
random variable can take, along with their
probabilities, over the range of minimum and
23
00:02:18,450 --> 00:02:26,790
maximum possible values, in a statistical sense,
of that random variable. So here, the
24
00:02:26,790 --> 00:02:33,890
meaning is this. In the
last class we saw that, for
25
00:02:33,890 --> 00:02:40,280
one particular random experiment,
the sample space
26
00:02:40,280 --> 00:02:44,910
is mapped
27
00:02:44,910 --> 00:02:50,310
to the real line through the random
variable. That is, the random variable
28
00:02:50,310 --> 00:02:55,750
is a functional correspondence from the
sample space to numbers on the real line.
29
00:02:55,750 --> 00:03:01,790
So, it can take some numbers,
and these generally have a certain range.
31
00:03:02,790 --> 00:03:10,290
Now, over this range of the variable,
32
00:03:10,290 --> 00:03:15,830
suppose, for one random variable, this is
33
00:03:15,830 --> 00:03:19,900
the axis for the random variable, and
34
00:03:19,900 --> 00:03:26,430
this side is the probability, and
say that this is the range of the random
35
00:03:26,430 --> 00:03:34,020
variable. Now the question is, for each
value, or for each region if it is continuous,
36
00:03:34,020 --> 00:03:37,239
how is the probability distributed
37
00:03:37,239 --> 00:03:45,530
over the entire range of the
random variable?
38
00:03:45,530 --> 00:03:53,640
In the context of the probability distribution,
this range is known as the support of the random
39
00:03:53,640 --> 00:04:04,680
variable. If it is a discrete random
variable, then at the specific values the
40
00:04:04,680 --> 00:04:11,390
probability
will be specified. On the other hand, if it
41
00:04:11,390 --> 00:04:15,620
is
continuous, then it will be distributed as
42
00:04:15,620 --> 00:04:23,369
a function over the entire support.
Now, another point here is
43
00:04:23,369 --> 00:04:27,939
the maximum and minimum possible values.
These maximum and minimum possible values
44
00:04:27,939 --> 00:04:32,270
are, obviously, meant in the
statistical sense. What is meant is that,
45
00:04:32,270 --> 00:04:35,469
if you take a random sample of
any
46
00:04:35,469 --> 00:04:40,630
random variable and look at the observations,
the maximum or
47
00:04:40,630 --> 00:04:46,659
minimum of that sample need not be the maximum
or minimum of the random variable.
48
00:04:46,659 --> 00:04:53,219
The random variable can have values beyond them;
that is what is meant by the statistical
49
00:04:53,219 --> 00:04:58,520
sense. This will be discussed again
in detail when we come to some
50
00:04:58,520 --> 00:05:00,169
specific distributions.
51
00:05:00,169 --> 00:05:07,499
For the time being, what is important is
that a random variable has a specified
52
00:05:07,499 --> 00:05:13,819
range, and the probability distribution gives
us the distribution of the probability
53
00:05:13,819 --> 00:05:21,199
over all possible values
that the random variable can take.
55
00:05:22,199 --> 00:05:31,979
Now, as we were just mentioning, there are
two different concepts: one is for the
56
00:05:31,979 --> 00:05:36,740
discrete random variable and the other is for
the continuous random variable. As we
57
00:05:36,740 --> 00:05:42,289
discussed in the last class, a discrete
random variable can take only
58
00:05:42,289 --> 00:05:46,349
some
specific values over the range of the random
59
00:05:46,349 --> 00:05:53,479
variable. It cannot take all possible values.
Traditionally, or in most
60
00:05:53,479 --> 00:05:56,469
cases, a discrete random variable takes
61
00:05:56,469 --> 00:06:01,809
integer values, but that is only a convention.
It can take any specific values, not only
62
00:06:01,809 --> 00:06:07,710
integers, but it takes only those specific
values. So, that is a discrete random variable.
63
00:06:07,710 --> 00:06:09,869
On
the other hand, a continuous random variable
64
00:06:09,869 --> 00:06:17,009
can take any value over the entire support
of the distribution, or the entire range of the
65
00:06:17,009 --> 00:06:21,649
random variable.
So, first we will discuss the probability
66
00:06:21,649 --> 00:06:24,409
distribution with respect
to
67
00:06:24,409 --> 00:06:33,379
the discrete random variable. It states that the
probability distribution of a discrete random
68
00:06:33,379 --> 00:06:41,710
variable specifies the probability of each
possible value of the random variable. So, you
69
00:06:41,710 --> 00:06:44,529
can
see here there is one random variable
70
00:06:44,529 --> 00:06:47,729
X, which can take the values 0, 1, 2 and so
on up
71
00:06:47,729 --> 00:06:55,560
to 9. At each and every possible value
that the random variable can take, the
72
00:06:55,560 --> 00:06:57,259
probability is defined.
73
00:06:57,259 --> 00:07:03,889
So, there is nothing in between two
integers, because this random variable takes
74
00:07:03,889 --> 00:07:09,469
integer values only. In between two
integer values, say for example between this
75
00:07:09,469 --> 00:07:17,110
one and this one, nothing
is specified. So, this space
76
00:07:17,110 --> 00:07:22,959
is
not covered by the distribution. So, here
77
00:07:22,959 --> 00:07:25,189
what we can say is that, at
a
78
00:07:25,189 --> 00:07:30,709
particular point of the random variable,
for a particular specific value, the probability
79
00:07:30,709 --> 00:07:34,699
can be
treated as a mass. It is concentrated
80
00:07:34,699 --> 00:07:39,770
at a particular point, and this is why it
can be
81
00:07:39,770 --> 00:07:46,559
treated as a concentrated mass. That is why
it states that the probabilities in the
82
00:07:46,559 --> 00:07:55,520
distribution function of a discrete random
variable are concentrated as masses at
83
00:07:55,520 --> 00:08:02,610
particular values, and that is why it is generally
known as the probability mass function,
84
00:08:02,610 --> 00:08:09,919
abbreviated as PMF.
So, what is meant here is that nothing is
85
00:08:09,919 --> 00:08:16,179
specified in between two specific values
of the random variable. What is specified
86
00:08:16,179 --> 00:08:22,059
at a specific value of the random
variable can be treated as a mass of probability,
87
00:08:22,059 --> 00:08:26,949
and that is
why this kind of distribution
88
00:08:26,949 --> 00:08:31,809
is known as the probability mass function,
abbreviated as PMF.
90
00:08:32,809 --> 00:08:40,500
On the other hand, for the continuous random
variable, this is not the case. Here the probability
91
00:08:40,500 --> 00:08:47,190
is specified over a region.
The probability
92
00:08:47,190 --> 00:08:55,200
distribution of a continuous random variable
specifies a continuous distribution of the
93
00:08:55,200 --> 00:08:59,670
probability over the entire feasible range
of the random variable.
94
00:08:59,670 --> 00:09:05,050
So, if the random variable is continuous,
then that random variable is specified over
95
00:09:05,050 --> 00:09:07,509
a
region, over a range. So, the distribution
96
00:09:07,509 --> 00:09:16,550
function should be specified
in terms of a function over the
97
00:09:16,550 --> 00:09:24,180
entire range. In contrast to the discrete
random variable, the probability is distributed
98
00:09:24,180 --> 00:09:30,899
over the entire range of the random
variable, and at a particular value the magnitude
99
00:09:30,899 --> 00:09:37,079
of the distribution function can be
treated as a density. Thus, the distribution
100
00:09:37,079 --> 00:09:41,680
function of a continuous random variable is
generally known as the probability density function,
101
00:09:41,680 --> 00:09:48,610
abbreviated in lower case as pdf.
Now, this concept of
102
00:09:48,610 --> 00:09:52,029
density I will explain here.
As I was
103
00:09:52,029 --> 00:09:57,079
saying, suppose this is the entire support.
What I am discussing now is
104
00:09:57,079 --> 00:10:07,110
the case
of the continuous random variable. For
105
00:10:07,110 --> 00:10:09,899
this continuous random variable, say
that
106
00:10:09,899 --> 00:10:17,270
the distribution
looks like this. First
107
00:10:17,270 --> 00:10:21,670
of
all, it shows that for some regions
108
00:10:21,670 --> 00:10:24,170
the probability is lower compared
to
109
00:10:24,170 --> 00:10:28,170
other regions. For example, in this region
it is more, and in this region it
110
00:10:28,170 --> 00:10:32,870
is
less. Now, in this case,
111
00:10:32,870 --> 00:10:36,420
suppose I mark a specific value
of the
112
00:10:36,420 --> 00:10:43,970
random variable X; this is that specific
value. Is the height of the curve there the
113
00:10:43,970 --> 00:10:56,670
probability? If I draw the same
thing for the discrete case, then, as we have
114
00:10:56,670 --> 00:10:59,580
seen
just now, for a specific value of the random
115
00:10:59,580 --> 00:11:04,050
variable, what is specified,
what is concentrated there, is nothing but
116
00:11:04,050 --> 00:11:07,769
the probability.
In between, nothing is specified,
117
00:11:07,769 --> 00:11:12,060
but whatever is specified is nothing
118
00:11:12,060 --> 00:11:14,700
but the probability. Now, for the continuous
case, if I
119
00:11:14,700 --> 00:11:20,029
take a specific value, what does this height
mean? It is very important to know that
120
00:11:20,029 --> 00:11:27,519
this height is not the probability. Then
where is the probability here? To get a
121
00:11:27,519 --> 00:11:36,449
probability, I have to specify
a small region around this value, some small
122
00:11:36,449 --> 00:11:40,890
region. Then we get
some area, and that area is nothing
123
00:11:40,890 --> 00:11:44,639
but
the real probability. So, at a
124
00:11:44,639 --> 00:11:47,850
particular point, this
height can be
125
00:11:47,850 --> 00:11:53,550
treated as a
density, the density of the probability there.
126
00:11:53,550 --> 00:11:59,560
As in ordinary physical science, if you
multiply
127
00:11:59,560 --> 00:12:05,870
a
density by a volume, you get a
128
00:12:05,870 --> 00:12:08,209
mass. Similarly, here the height is nothing
but
129
00:12:08,209 --> 00:12:12,200
the density. If you multiply it over a certain
range, you get the area, and that
130
00:12:12,200 --> 00:12:15,639
area
is nothing but your probability. That is
131
00:12:15,639 --> 00:12:20,350
why, for the continuous
random variable, the distribution is called the
132
00:12:20,350 --> 00:12:22,779
probability density function.
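The point that the height of the curve is a density and not a probability can be sketched numerically. The density f(x) = 2x on [0, 1] used below is a hypothetical choice made only for illustration; it does not come from the lecture. Its height exceeds 1 near the upper end, which a probability could never do, while the area over a small region around the point is a proper probability.

```python
def density(x: float) -> float:
    """Hypothetical pdf f(x) = 2x on the support [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def cdf(x: float) -> float:
    """Exact area under f from 0 up to x (x is clipped to the support)."""
    x = min(max(x, 0.0), 1.0)
    return x * x


x0 = 0.9      # a specific value of the random variable
width = 0.01  # a small region around x0

height = density(x0)                                     # density at x0, larger than 1
exact_area = cdf(x0 + width / 2) - cdf(x0 - width / 2)   # probability of the small region
approx_area = height * width                             # density multiplied by the width

print(height, exact_area, approx_area)
```

For a small enough region, the product of the height and the width is close to the area, which is the sense in which the height acts as a density.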
133
00:12:22,779 --> 00:12:30,439
So, this is why the word density comes here.
Once again, if we read it: in contrast
134
00:12:30,439 --> 00:12:39,430
to the discrete random variable, the probability
is distributed over the entire range of the
135
00:12:39,430 --> 00:12:45,759
random variable, and at a particular value
the magnitude of the distribution function
136
00:12:45,759 --> 00:12:48,330
can
be treated as a density. Thus, the distribution
137
00:12:48,330 --> 00:12:54,779
function is known as the probability density
function. This will be clearer when
138
00:12:54,779 --> 00:13:00,300
we talk about one of the axioms
that we saw in the earlier classes,
139
00:13:00,300 --> 00:13:02,870
that the total probability, obviously, should
be
140
00:13:02,870 --> 00:13:11,649
equal to 1. This will be clear in a minute.
So here, what we are trying to explain is why the
141
00:13:11,649 --> 00:13:15,269
word density is used.
142
00:13:15,269 --> 00:13:21,800
So next, we will start with the probability
mass function, PMF, which, as we
143
00:13:21,800 --> 00:13:26,470
have just seen, is for the discrete
random variable. Whenever we mention
144
00:13:26,470 --> 00:13:32,689
PMF, it is for a discrete random
variable. The probability mass function, PMF,
145
00:13:32,689 --> 00:13:39,980
is the probability distribution of a discrete
random variable, say X. Generally it is
146
00:13:39,980 --> 00:13:47,319
denoted by p_X(x). Here the notation is
important. The p is lower case.
147
00:13:47,319 --> 00:13:51,360
The
subscript X denotes
148
00:13:51,360 --> 00:13:56,079
the random variable. Which random variable
the function is for is shown by
149
00:13:56,079 --> 00:14:02,019
the upper case letter in the subscript.
And the argument x is lower case, which is
150
00:14:02,019 --> 00:14:04,660
nothing but a specific value of the random variable.
We
151
00:14:04,660 --> 00:14:09,079
also discussed this in the context of some other
distribution in the last lecture.
152
00:14:09,079 --> 00:14:13,570
So, the lower case x is the specific
value of the random variable,
153
00:14:13,570 --> 00:14:20,749
and the small p stands
for the probability mass function.
154
00:14:20,749 --> 00:14:29,220
Now, it indicates the probability that
the random variable X takes the specific value x,
155
00:14:29,220 --> 00:14:32,839
that is,
P(X = x). There are some
156
00:14:32,839 --> 00:14:35,600
properties of this PMF, following
from
157
00:14:35,600 --> 00:14:40,399
what we indicated in the
last slide. The first property is
158
00:14:40,399 --> 00:14:45,209
that, for
each and every value that the random
159
00:14:45,209 --> 00:14:50,730
variable can take, the PMF should be greater than
or equal to 0. The probability can never
160
00:14:50,730 --> 00:14:58,180
be negative; it is a non-negative number.
The second concerns the summation of all these probabilities.
161
00:14:58,180 --> 00:15:03,850
X is defined over some specific
values, and it is at
162
00:15:03,850 --> 00:15:07,459
these specific values of x that the probability
is
163
00:15:07,459 --> 00:15:15,720
defined. If we add up the probabilities
of all the possible values of x,
164
00:15:15,720 --> 00:15:21,569
the sum should come
to 1. So these two are
165
00:15:21,569 --> 00:15:25,569
the
properties of the probability mass function.
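The two properties can be checked mechanically. The sketch below is only an illustration: the function name `is_valid_pmf`, the fair-die PMF, and the deliberately broken counter-example are assumptions made here, not material from the lecture slides.

```python
from fractions import Fraction


def is_valid_pmf(pmf: dict) -> bool:
    """Check the two PMF properties: p(x) >= 0 for every x, and the p(x) sum to 1."""
    non_negative = all(p >= 0 for p in pmf.values())
    sums_to_one = sum(pmf.values()) == 1
    return non_negative and sums_to_one


# Fair six-sided die: each face has probability 1/6 (exact fractions avoid rounding).
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Hypothetical counter-example: the values sum to 1, but one of them is negative.
broken = {0: Fraction(3, 2), 1: Fraction(-1, 2)}

print(is_valid_pmf(die), is_valid_pmf(broken))
```

Exact fractions are used so that the sum-to-one test is not affected by floating-point rounding.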
167
00:15:26,569 --> 00:15:31,670
Now, some notes on the PMF. These
are obvious, but still it is important to
168
00:15:31,670 --> 00:15:42,339
mention them here. Take a particular case where it
is certain that the only outcome is c. So,
169
00:15:42,339 --> 00:15:45,389
for a
random variable, we are saying that there is
170
00:15:45,389 --> 00:15:50,999
only one outcome, that outcome is certain,
and that outcome is c. Then, what is
171
00:15:50,999 --> 00:15:53,920
p_X(c)? The distribution function is nothing
but
172
00:15:53,920 --> 00:15:59,819
p_X(c), which is the probability that X
equals c. This is the only outcome that
173
00:15:59,819 --> 00:16:02,509
is
feasible, and it is a certain outcome, so it
174
00:16:02,509 --> 00:16:05,290
will obviously come up.
So
175
00:16:05,290 --> 00:16:12,259
p_X(c) should be equal to 1 to satisfy
the properties of the PMF.
176
00:16:12,259 --> 00:16:18,889
On the other hand, suppose there are some mutually
exclusive outcomes for one random
177
00:16:18,889 --> 00:16:27,170
variable, that is, x1, x2 up to xn.
These specific values
178
00:16:27,170 --> 00:16:28,170
of a
179
00:16:28,170 --> 00:16:33,779
random variable X are mutually exclusive.
Then, if you want to calculate the
180
00:16:33,779 --> 00:16:42,339
probability that the random variable takes any one
of these mutually exclusive values, it
181
00:16:42,339 --> 00:16:48,149
should be equal to the summation of their
individual probabilities. This is obvious
182
00:16:48,149 --> 00:16:51,600
in
the case of throwing a die. There are six possible
183
00:16:51,600 --> 00:16:58,319
outcomes. Say that all the
outcomes are equally possible,
184
00:16:58,319 --> 00:17:04,990
and take the numbers 1 to
4. Then the total
185
00:17:04,990 --> 00:17:11,270
probability that the random variable will
take
186
00:17:11,270 --> 00:17:18,180
either 1 or 2 or 3 or 4 will be equal to the
probability of getting 1
187
00:17:18,180 --> 00:17:21,799
plus
the probability
188
00:17:21,799 --> 00:17:27,319
of getting 2, plus the probability of getting 3,
plus the probability of getting 4. We
189
00:17:27,319 --> 00:17:32,510
know that these events, 1 to 4, are mutually
exclusive. This property
190
00:17:32,510 --> 00:17:37,080
we explained earlier.
Here, we are explaining it in
191
00:17:37,080 --> 00:17:43,980
the context of the specific values that a
discrete random variable can take.
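The die example can be written out as a short sketch. A fair die is assumed, as in the lecture; the helper name `prob_of_any` is a hypothetical label chosen for this illustration.

```python
from fractions import Fraction

# PMF of a fair six-sided die: all six outcomes are equally possible.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def prob_of_any(pmf: dict, values) -> Fraction:
    """Probability that X takes any one of the given mutually exclusive values."""
    # set() guards against counting a repeated value twice.
    return sum((pmf.get(v, Fraction(0)) for v in set(values)), Fraction(0))


p_1_to_4 = prob_of_any(pmf, [1, 2, 3, 4])
print(p_1_to_4)  # 2/3, that is 1/6 + 1/6 + 1/6 + 1/6
```

Because distinct values of a discrete random variable cannot occur together, their probabilities simply add.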
192
00:17:43,980 --> 00:17:52,200
So, we will take one small example on the probability
mass function. This is taken from the
193
00:17:52,200 --> 00:17:58,940
Kottegoda and Rosso book. The numbers of floods
recorded per year at a gauging station
194
00:17:58,940 --> 00:18:06,610
in Italy are given in this table. Find the
probability mass function and plot it. Now,
195
00:18:06,610 --> 00:18:11,809
here,
for the years 1939 to 1972, the numbers
196
00:18:11,809 --> 00:18:18,090
of floods are noted. In this table,
0 floods occurred in
197
00:18:18,090 --> 00:18:23,570
0 years, 1 flood occurred in 2 years,
2
198
00:18:23,570 --> 00:18:28,890
floods occurred in 6 years, 3 floods
occurred in 7 years, and so on.
199
00:18:28,890 --> 00:18:35,780
At the end,
9 floods in a year occurred in 0 years.
200
00:18:35,780 --> 00:18:40,450
If you add up the years, the total,
34, should match the
201
00:18:40,450 --> 00:18:45,790
data that is available to us. Now, we have
to
202
00:18:45,790 --> 00:18:52,789
define the PMF for this. In this kind
of problem, the first thing that we should
203
00:18:52,789 --> 00:18:55,610
think about
is: what is the random variable that we
204
00:18:55,610 --> 00:18:56,880
are talking about?
205
00:18:56,880 --> 00:19:02,600
Even though, for this kind of problem, this
is sometimes
206
00:19:02,600 --> 00:19:05,010
quite
obvious, it is important to know
207
00:19:05,010 --> 00:19:12,110
what the random variable is here. The
occurrence of a flood is not the random variable.
208
00:19:12,110 --> 00:19:18,929
I repeat, the occurrence of a flood event is
not the random variable here; rather, the number
209
00:19:18,929 --> 00:19:22,580
of floods in a year is the random
variable.
211
00:19:23,580 --> 00:19:29,220
So, while solving this problem, first of all,
we will define
212
00:19:29,220 --> 00:19:34,880
what the random variable is.
Let the random variable X
213
00:19:34,880 --> 00:19:40,780
denote the number of floods occurring in a year.
Then, for the given data, the probabilities
214
00:19:40,780 --> 00:19:43,250
of
the different numbers of floods can be
215
00:19:43,250 --> 00:19:51,039
obtained as follows. Take p_X(0),
that is,
216
00:19:51,039 --> 00:19:54,500
the probability of 0 floods, no
flood
217
00:19:54,500 --> 00:19:59,649
in a year. It should be equal to 0, because
from this table we see there is no
218
00:19:59,649 --> 00:20:04,270
year
where the number of floods is 0. Similarly,
219
00:20:04,270 --> 00:20:08,990
suppose we want to know the probability
that X equals 1. From this
220
00:20:08,990 --> 00:20:13,039
sample data, for X equal to 1,
221
00:20:13,039 --> 00:20:17,990
there are two such occurrences
out of 34 years. So, 2 by 34 should be the
222
00:20:17,990 --> 00:20:23,580
probability for that specific value of the
random variable, that is, X equal to 1.
224
00:20:24,580 --> 00:20:29,640
So, the probability that X equals
1 is 2 by 34. Similarly, for
225
00:20:29,640 --> 00:20:34,910
X
equal to 2 it is 6 by 34, for X equal to 3 it is 7 by
226
00:20:34,910 --> 00:20:38,610
34, and so on, and we get all
these
227
00:20:38,610 --> 00:20:45,170
probability values. Now, we are
assuming one thing. The number of years with
228
00:20:45,170 --> 00:20:51,270
9 floods in a year is 0,
and the years already sum to the total.
229
00:20:51,270 --> 00:20:56,720
So, there are no higher numbers of floods. If
I take 10 or 11, the number of
230
00:20:56,720 --> 00:21:02,000
occurrences is also 0: 10 floods in a year,
0 years; 11 floods in a year, 0 years.
231
00:21:02,000 --> 00:21:11,799
So, to complete the definition of this PMF,
p_X(x) equals 0 for all x greater than 9.
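The relative-frequency calculation above can be sketched as follows. Only the counts actually read out in the lecture are used; the table rows for 4 to 8 floods are not quoted in this transcript, so they are left out rather than guessed, and the resulting dictionary is therefore a partial PMF. The variable names are assumptions made for this illustration.

```python
from fractions import Fraction

TOTAL_YEARS = 34  # length of the record stated in the lecture (1939 to 1972)

# Number of years in which exactly n floods occurred, as read out in the lecture.
# Rows for n = 4 to 8 are not quoted in the transcript and are omitted here.
years_with_n_floods = {0: 0, 1: 2, 2: 6, 3: 7, 9: 0}

# Relative frequency of each stated value, used as its probability mass.
pmf_known = {n: Fraction(years, TOTAL_YEARS) for n, years in years_with_n_floods.items()}

# Probability of at most 3 floods in a year, using only the stated values.
p_at_most_3 = sum((pmf_known[n] for n in (0, 1, 2, 3)), Fraction(0))

print(pmf_known[1], pmf_known[2], pmf_known[3], p_at_most_3)
```

With the full table, the same dictionary comprehension would give the complete PMF, and its values would sum to 1.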
232
00:21:11,799 --> 00:21:20,559
Now, we simply have to plot this as a
mass function for these specific values only.
233
00:21:20,559 --> 00:21:25,490
There is nothing in between 1 and 2, because that is
not specified, which is obvious for a
234
00:21:25,490 --> 00:21:31,870
discrete random variable. So, this is the
plot. For 0, the probability is 0, and
235
00:21:31,870 --> 00:21:37,820
obviously for 9 the probability is 0, and for all
higher values it is also 0. For 1 onwards, these are
236
00:21:37,820 --> 00:21:40,230
the
probabilities, the values we got here,
237
00:21:40,230 --> 00:21:46,190
so we get these masses. Now,
do not be confused about
238
00:21:46,190 --> 00:21:49,130
the meaning of this line. Basically, this
239
00:21:49,130 --> 00:21:53,790
solid line has no meaning; it is just
a geometric reference showing that this point refers
240
00:21:53,790 --> 00:21:58,220
to
the number 1. Otherwise, a single
241
00:21:58,220 --> 00:22:05,260
dot at this point would be sufficient to
display the probability mass function.
242
00:22:05,260 --> 00:22:09,940
Now, from this,
we have the probability mass function
243
00:22:09,940 --> 00:22:15,730
for the data that we have. Several questions
can be answered from it. Suppose it is asked,
244
00:22:15,730 --> 00:22:21,021
what is the probability that the number
of floods is greater than or equal to 5? If
245
00:22:21,021 --> 00:22:22,021
I say
246
00:22:22,021 --> 00:22:26,789
that the number of floods is greater than
or equal to 5, then obviously I just have to add
247
00:22:26,789 --> 00:22:33,539
up these values. If I add up the
probabilities for 5, 6, 7, 8 and 9, then
248
00:22:33,539 --> 00:22:36,000
I will
get the probability that the number
249
00:22:36,000 --> 00:22:39,130
of floods in a year is greater than or equal to 5.
This is
250
00:22:39,130 --> 00:22:45,279
the utility of the PMF: all answers of this
kind we will get from the probability
251
00:22:45,279 --> 00:22:50,950
mass function. We will see this again while
discussing the cumulative
252
00:22:50,950 --> 00:22:53,720
distribution function.
253
00:22:53,720 --> 00:23:02,070
So, this is for the discrete case, and there are
some standard examples. There are some standard
254
00:23:02,070 --> 00:23:06,810
probability mass functions for
discrete random variables. One is the binomial
255
00:23:06,810 --> 00:23:11,730
distribution. We will discuss all these distributions
again in detail in the successive lectures.
256
00:23:11,730 --> 00:23:17,799
For the time being, we can just know the names.
Binomial distribution means
257
00:23:17,799 --> 00:23:23,059
that we have
a Bernoulli trial, where there are
258
00:23:23,059 --> 00:23:27,290
two
possible outcomes. One of them we call a
259
00:23:27,290 --> 00:23:34,390
success, and the probability of success in each trial is
predefined, that is, known. Now, the number
260
00:23:34,390 --> 00:23:41,429
of successes out of n successive such
trials
261
00:23:41,429 --> 00:23:47,770
is a random variable, and that random variable
follows the binomial distribution. Similarly,
262
00:23:47,770 --> 00:23:53,049
if there are more than two outcomes,
263
00:23:53,049 --> 00:23:59,529
say 1 to k, and we count how many times each outcome
occurs, and all their probabilities are
264
00:23:59,529 --> 00:24:05,570
known, then the vector of counts, x1, x2,
x3 up to xk, will follow the multinomial
265
00:24:05,570 --> 00:24:10,649
distribution. Similarly, there are different
definitions for the negative binomial
266
00:24:10,649 --> 00:24:14,940
distribution, geometric distribution, hypergeometric
distribution and Poisson distribution.
267
00:24:14,940 --> 00:24:22,690
These are examples of distributions of discrete
random variables, which will be covered
268
00:24:22,690 --> 00:24:30,590
in the successive lectures. Now, we will
go to the distribution function of the continuous
269
00:24:30,590 --> 00:24:31,940
random variable.
270
00:24:31,940 --> 00:24:36,610
So, for the continuous random variable,
we call it the probability
271
00:24:36,610 --> 00:24:44,700
density function. Why density? We have
just discussed that. A probability density
272
00:24:44,700 --> 00:24:52,000
function, abbreviated in lower case as pdf,
is the probability distribution of a continuous
273
00:24:52,000 --> 00:24:59,320
random variable. Do not be confused by the
abbreviated form. These are the
274
00:24:59,320 --> 00:25:03,630
abbreviations that will be followed in these lectures.
In some standard reference books, you
275
00:25:03,630 --> 00:25:09,650
may find some other notation. But here,
the probability density
276
00:25:09,650 --> 00:25:16,360
function is abbreviated in lower
case as pdf, just to differentiate it from the
277
00:25:16,360 --> 00:25:21,040
cumulative distribution function, where the
d stands for distribution. Here the d
278
00:25:21,040 --> 00:25:29,049
stands for density.
So, the pdf is the probability distribution
279
00:25:29,049 --> 00:25:33,360
of a continuous random variable. Generally,
it is
280
00:25:33,360 --> 00:25:39,750
denoted by f_X(x). As we discussed
last time also, the subscript is the random variable,
281
00:25:39,750 --> 00:25:43,049
shown as the upper case letter, and the argument is
the specific value of the random variable,
282
00:25:43,049 --> 00:25:46,549
which
is shown as the lower case letter.
283
00:25:46,549 --> 00:25:53,630
Here again, you can see the
distribution function defined over the support.
284
00:25:53,630 --> 00:25:58,170
This is the total range of values that the random
variable can take, and obviously
285
00:25:58,170 --> 00:26:01,600
here the density is more and here the density
is
286
00:26:01,600 --> 00:26:02,750
less.
287
00:26:02,750 --> 00:26:09,600
Now, obviously,
it is not that I can take any function and
288
00:26:09,600 --> 00:26:13,269
say
that it is a probability density
289
00:26:13,269 --> 00:26:18,190
function. That is not the case. There are
certain
290
00:26:18,190 --> 00:26:23,029
conditions that should be satisfied
for a particular function to be a
291
00:26:23,029 --> 00:26:27,499
probability density function, and those conditions
are these.
293
00:26:28,499 --> 00:26:38,299
So, there are two condition for a valid pdf.
The pdf is a continuous non negative function
294
00:26:38,299 --> 00:26:45,610
for all possible values of x. So fx(x), for
all x should be greater than equal to 0. This
295
00:26:45,610 --> 00:26:48,919
is
basically coming from the first axiom of the
296
00:26:48,919 --> 00:26:56,330
probability. So, in this graph, so everything
that is coming above this, towards the positive
297
00:26:56,330 --> 00:27:04,490
y axis and the total area bounded by the
curve and the x axis is equals to 1.
298
00:27:04,490 --> 00:27:11,980
So that, this shaded area, what you see here
below this graph, above this, above this axis
299
00:27:11,980 --> 00:27:17,520
should be equals to 1. So, this is basically
mean that, if I take the inter range of this
300
00:27:17,520 --> 00:27:25,070
random variable, then this is becoming a certain
event. So, any one, any possible value
301
00:27:25,070 --> 00:27:29,790
will take here. So, this is the entire set
of the sample space and which is equals to
302
00:27:29,790 --> 00:27:34,000
1. So,
the total probability of this entire end should
303
00:27:34,000 --> 00:27:39,130
be equals to 1. So here, we have just written
the minus infinity to plus infinity of the
304
00:27:39,130 --> 00:27:44,039
x should take care about this full one. So,
obviously this is reducing to this, the lower
305
00:27:44,039 --> 00:27:49,019
limit and the upper limit of this one, because
in the rest of the places, this density function
306
00:27:49,019 --> 00:27:55,710
is defined to be 0. So, this is the second
condition. So, if any function that passes
307
00:27:55,710 --> 00:27:59,460
through these two conditions, then that function can
be a
308
00:27:59,460 --> 00:28:05,290
valid pdf.
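The two conditions just stated can be tried out numerically. What follows is only a minimal sketch, assuming a function that is 0 outside a finite interval; the helper name is_valid_pdf and the three test functions are hypothetical and are not from the lecture slides. It checks the function only at sampled points, so it is an illustration and not a proof.

```python
def is_valid_pdf(f, lower, upper, n=100_000, tol=1e-3):
    """Numerically check both pdf conditions on the interval [lower, upper].

    Assumption: f is taken to be 0 outside [lower, upper].
    """
    width = (upper - lower) / n
    # Height of the function at the midpoint of every small strip dx.
    heights = [f(lower + (i + 0.5) * width) for i in range(n)]
    non_negative = all(h >= 0 for h in heights)   # condition 1: f(x) >= 0
    total_area = sum(heights) * width             # condition 2: area equals 1
    return non_negative and abs(total_area - 1.0) < tol


# Hypothetical test functions on [0, 1].
print(is_valid_pdf(lambda x: 2 * x, 0.0, 1.0))      # non-negative, area 1
print(is_valid_pdf(lambda x: x, 0.0, 1.0))          # non-negative, area only 0.5
print(is_valid_pdf(lambda x: 6 * x - 2, 0.0, 1.0))  # area 1, but negative near 0
```

Only the first function passes both conditions; the second fails the area condition and the third fails the non-negativity condition.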
309
00:28:05,290 --> 00:28:06,290
..
310
00:28:06,290 --> 00:28:14,710
Now, this is one important concept, what I
was just discussing, while discussing the
311
00:28:14,710 --> 00:28:20,889
density, that when the pdf is graphically
portrayed, the area under the curve between
312
00:28:20,889 --> 00:28:24,600
two
limits, x1 and x2 such that, say x2 is greater
313
00:28:24,600 --> 00:28:34,380
than x1, gives the probability that the random
variable X lies in the interval of x1 to x2.
314
00:28:34,380 --> 00:28:42,370
So, this probability that this random variable
will be in between x2 and x1. So, this is
315
00:28:42,370 --> 00:28:48,380
graphically nothing but, as we were just telling
that, each and every point here, this is implied
316
00:28:48,380 --> 00:28:54,149
that the density for that particular value.
So, if I want to know what is the probability
317
00:28:54,149 --> 00:29:00,380
that this random variable will be within this
limit, this is nothing but this hatched area
318
00:29:00,380 --> 00:29:07,960
in this graph. So, this area, how
will
319
00:29:07,960 --> 00:29:17,279
you get it? We will integrate from
this x1 to x2. This integration will
320
00:29:17,279 --> 00:29:23,000
obviously be less than or equal to 1, because
we know that this total area is equal to
321
00:29:23,000 --> 00:29:28,190
1.
Now, again if you just see this integral form,
322
00:29:28,190 --> 00:29:29,299
then it looks like this.
323
00:29:29,299 --> 00:29:30,299
..
324
00:29:30,299 --> 00:29:43,350
So, the integration here is, so integration
what we have seen here is that fx(x) dx from
325
00:29:43,350 --> 00:29:48,410
x1
to x2. Now, what we were just discussing here,
326
00:29:48,410 --> 00:29:56,250
that this is the density here. Basically,
what we are doing here for this graph, so
327
00:29:56,250 --> 00:30:00,000
we are taking a small, so dx is your small
strip
328
00:30:00,000 --> 00:30:05,480
here.
dx is the small strip here. So, this is your
329
00:30:05,480 --> 00:30:13,679
dx and this one, this f(x), gives you the
density value, that is the height of this strip. So, if you
330
00:30:13,679 --> 00:30:17,580
multiply these two, which is nothing but the
probability, you are getting over the area
331
00:30:17,580 --> 00:30:20,700
dx. And, this area, you are just basically
adding
332
00:30:20,700 --> 00:30:25,690
up from this x1 to up to x2. So, that is why
you are getting the total area below these
333
00:30:25,690 --> 00:30:32,240
two
limits from x1 to x2 and that is nothing but,
334
00:30:32,240 --> 00:30:45,760
which gives you the probability. This
probability is obviously from this x1 to x2.
335
00:30:45,760 --> 00:30:48,720
Now one thing, here again is important, so
far
336
00:30:48,720 --> 00:30:57,860
as the continuous random variable, so I just
specify this. So far as the continuous random
337
00:30:57,860 --> 00:31:05,299
variable is concerned, basically, this probability
that x1 less than equal to, so this sign I
338
00:31:05,299 --> 00:31:11,129
am just stressing the point that equal to
sign, having this equal to sign or not having
339
00:31:11,129 --> 00:31:13,860
this
equal to sign, does not mean anything because,
340
00:31:13,860 --> 00:31:21,090
ultimately for a particular specific value
the probability is 0 as this range for this
341
00:31:21,090 --> 00:31:23,630
particular value, over which the probability
is
342
00:31:23,630 --> 00:31:31,110
being defined, is 0. So, this equality sign, inclusion
of this equality sign or not inclusion of
343
00:31:31,110 --> 00:31:33,179
this
one does not change the total probability.
344
00:31:33,179 --> 00:31:37,240
So, what we can express is that probability of x1 less than equals to X less than
equals
345
00:31:37,240 --> 00:31:46,590
to x2 is equals to probability of x1 less
than X less than equals to x2 and whatever
346
00:31:46,590 --> 00:31:51,549
the
combination possible is that x1 less than
347
00:31:51,549 --> 00:31:57,250
equals to X less than x2 equals to probability
x1
348
00:31:57,250 --> 00:32:01,830
less than X less than x2.
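The four cases spoken above can be written compactly as follows. This is only a restatement in symbols, with f_X taken as the pdf of the continuous random variable X; it relies on the fact stated earlier that the probability at a single specific value is 0.

```latex
\[
P(x_1 \le X \le x_2) = P(x_1 < X \le x_2) = P(x_1 \le X < x_2) = P(x_1 < X < x_2)
= \int_{x_1}^{x_2} f_X(x)\,dx,
\]
\[
\text{because } P(X = x_1) = \int_{x_1}^{x_1} f_X(x)\,dx = 0
\quad\text{and}\quad P(X = x_2) = 0 .
\]
```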
349
00:32:01,830 --> 00:32:11,380
So, in all these four cases, the probability
is the same as long as this random variable is
350
00:32:11,380 --> 00:32:19,919
continuous. This is one important concept
here, while you are calculating the probability
351
00:32:19,919 --> 00:32:29,639
from the pdf for a continuous random variable.
Now, we will take one small example,
352
00:32:29,639 --> 00:32:38,809
mathematical example, rather, to discuss
this pdf and how to satisfy its properties.
353
00:32:38,809 --> 00:32:39,809
.
354
00:32:39,809 --> 00:32:49,520
Suppose that the function fx(x) is equal
to alpha x power 5, which is defined
355
00:32:49,520 --> 00:32:54,000
over
the zone, for this x value from 0 to 1 and
356
00:32:54,000 --> 00:33:01,480
it is 0 elsewhere. Now, for this to be a valid
pdf,
357
00:33:01,480 --> 00:33:07,309
what is the value of alpha? That we have to
determine and what is the probability of x
358
00:33:07,309 --> 00:33:11,979
being greater than or equal to 0.5.
359
00:33:11,979 --> 00:33:12,979
..
360
00:33:12,979 --> 00:33:17,470
So, if you want to know this first one, that
is the, what is the value of this alpha, then
361
00:33:17,470 --> 00:33:21,230
we
will know that the property that this should
362
00:33:21,230 --> 00:33:27,909
be from this entire range of this random
variable, that is 0 to 1 in this case, that
363
00:33:27,909 --> 00:33:35,500
the integration of this function should be equal to 1. Now,
if we do
364
00:33:35,500 --> 00:33:47,519
this one, then this will be alpha times x
power 6 by 6, which is equal to 1. This
365
00:33:47,519 --> 00:33:55,659
is
alpha, with the limits 0 to 1; so, alpha times 1 by 6 minus 0 equals
366
00:33:55,659 --> 00:34:08,290
to 1, where the alpha is equals to 6.
This value we got. So now, what we will see,
367
00:34:08,290 --> 00:34:11,680
that this, so what we got that f(x) of this
x
368
00:34:11,680 --> 00:34:22,410
equals 6 x power 5 over the range,
this x less than or equal to 1 and greater than or equal
369
00:34:22,410 --> 00:34:29,580
to 0
and 0 elsewhere. Now, the second thing is
370
00:34:29,580 --> 00:34:38,340
that, what is the probability that X is greater
than equal to 0.5. This we can express, that
371
00:34:38,340 --> 00:34:47,160
you know that 1 minus probability of X less
than or less than equals to, you know that
372
00:34:47,160 --> 00:34:50,230
for this continuous random variable, these
two
373
00:34:50,230 --> 00:34:59,430
are the same, 0.5.
So, 1 minus this, we can do that by integrating from 0
374
00:34:59,430 --> 00:35:10,920
to 0.5
6 x power 5 dx, which is 1 minus 6 by 6 times x power 6. We
375
00:35:10,920 --> 00:35:29,650
can just write the limits here, 0 to 0.5, so it is
equal to 1 minus 0.5 power 6. So, then
376
00:35:29,650 --> 00:35:33,950
we can calculate this one, this probability
with the help of this.
377
00:35:33,950 --> 00:35:39,380
Two things we want to discuss here. Now,
one is that just to get this X is greater
378
00:35:39,380 --> 00:35:42,710
than
equals to 0.5. What we can do is that, we
379
00:35:42,710 --> 00:35:49,290
can just do the integration from 0.5 to 1,
because this is the range 0.5 to 1. We can
380
00:35:49,290 --> 00:35:51,930
do this integration directly to this function
and
381
00:35:51,930 --> 00:35:56,960
we can get the probability, and the answer
will be the same. Instead of that, also,
382
00:35:56,960 --> 00:36:02,350
what we can do, we know that the total probability
is equals to 1. So, 1 minus the rest of
383
00:36:02,350 --> 00:36:07,460
this part, which means here 0 to 0.5.
That is the area that we have deducted to
384
00:36:07,460 --> 00:36:12,850
get this
probability. Basically, this is linked
385
00:36:12,850 --> 00:36:20,280
to the CDF because, we will just see in a
minute that what is a CDF. So, from the CDF,
386
00:36:20,280 --> 00:36:25,100
we can directly calculate, what is its
probability and that probability value, we
387
00:36:25,100 --> 00:36:30,070
can put in this place. Instead of, so, that
is why
388
00:36:30,070 --> 00:36:35,260
it is replaced in terms of this one, just
for one illustration purpose which can be
389
00:36:35,260 --> 00:36:37,290
linked to
the CDF that we are going to discuss in a
390
00:36:37,290 --> 00:36:40,250
minute. But, so far as this particular problem
is
391
00:36:40,250 --> 00:36:46,850
concerned, we can also calculate the integration
from 0.5 to 1 to get this particular
392
00:36:46,850 --> 00:36:48,450
probability answer.
.
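The worked example above can be reproduced with exact fractions. This is only a sketch of the arithmetic; the names alpha and cdf are chosen here for illustration, and the antiderivative x power 6 by 6 is the one used in the lecture.

```python
from fractions import Fraction

# The antiderivative of x**5 is x**6 / 6, so the area of alpha * x**5 over
# [0, 1] is alpha * (1/6 - 0). Setting that area equal to 1 gives alpha.
area_without_alpha = Fraction(1, 6) - Fraction(0, 6)
alpha = 1 / area_without_alpha


def cdf(x):
    """Area under alpha * x**5 from 0 up to x, valid for 0 <= x <= 1."""
    return alpha * Fraction(x) ** 6 / 6


half = Fraction(1, 2)
p_complement = 1 - cdf(half)    # 1 minus the area from 0 to 0.5
p_direct = cdf(1) - cdf(half)   # direct integration from 0.5 to 1

print(alpha)                     # 6
print(p_complement)              # 63/64
print(float(p_complement))       # 0.984375
print(p_direct == p_complement)  # True
```

Both routes, 1 minus the area from 0 to 0.5 and the direct integration from 0.5 to 1, give the same probability.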
393
00:36:48,450 --> 00:36:59,160
So second, there are, as we have given some
standard example for this PMF probability
394
00:36:59,160 --> 00:37:05,010
mass function, which is for the discrete
random variable. There are some examples of
395
00:37:05,010 --> 00:37:10,610
some standard pdf as well, which is for the
continuous random variable and most popular
396
00:37:10,610 --> 00:37:16,670
distribution is the Normal or Gaussian distribution.
This Normal Distribution is a
397
00:37:16,670 --> 00:37:23,220
continuous probability distribution function
with parameters mu and sigma square, this
398
00:37:23,220 --> 00:37:26,260
is
also, this is known as variance and its probability
399
00:37:26,260 --> 00:37:33,380
density function that is pdf is expressed
as this one. This is 1 by square root of 2
400
00:37:33,380 --> 00:37:36,401
pi sigma square multiplied by exponential
of minus of x
401
00:37:36,401 --> 00:37:44,400
minus mu whole square divided by 2 sigma square.
Now, this mu and sigma are known as the parameters
402
00:37:44,400 --> 00:37:51,380
of the distribution. Now this, if you
change, keeping the basic shape of this probability
403
00:37:51,380 --> 00:37:57,930
same, these things imply different
properties of this particular distribution.
404
00:37:57,930 --> 00:38:01,920
Before that, what is important, so this is
not the
405
00:38:01,920 --> 00:38:08,110
complete definition of this pdf, until and
unless you say what is its support. So here,
406
00:38:08,110 --> 00:38:10,900
the
support is minus infinity to plus infinity.
407
00:38:10,900 --> 00:38:18,240
So, in absence of this one, basically no function
is a valid pdf. So, whenever you are defining
408
00:38:18,240 --> 00:38:23,290
some pdf, the support must be specified for
that function.
409
00:38:23,290 --> 00:38:24,290
.
410
00:38:24,290 --> 00:38:31,450
For example, here, so this one, once
you are getting this alpha as 3 by 1000, then
411
00:38:31,450 --> 00:38:35,230
the
pdf is this, is equals to 3 x square by 1000
412
00:38:35,230 --> 00:38:41,840
for x 0 to 10 and equals to 0 elsewhere. So,
this support is very important. You know that,
413
00:38:41,840 --> 00:38:46,170
if you do not specify this support, then
whether this total area below the curve is equal
414
00:38:46,170 --> 00:38:54,600
to 1 or not, that cannot be tested. So here,
similarly, for this normal distribution, the
415
00:38:54,600 --> 00:38:57,000
support is minus infinity to plus infinity.
So,
416
00:38:57,000 --> 00:39:01,450
here some examples of this normal distribution
are shown.
417
00:39:01,450 --> 00:39:07,420
This is basically a bell shaped curve and,
depending on these two parameters, this can
418
00:39:07,420 --> 00:39:11,140
be
changed. So, generally this mu is the location
419
00:39:11,140 --> 00:39:20,890
parameter. I repeat, this mu is the location
parameter, where is the centre of this and
420
00:39:20,890 --> 00:39:25,850
now I use this word centre very crudely. We
will discuss all these things, may be in the
421
00:39:25,850 --> 00:39:31,470
next class, but this is the location.
Now you see here, three different
422
00:39:31,470 --> 00:39:37,530
graphs are shown here. All are Normal
Distribution, but, for the different parameter
423
00:39:37,530 --> 00:39:44,440
value. So this blue one, the mu is 0, so its
location parameter is 0 and the black one
424
00:39:44,440 --> 00:39:47,170
is again, the mu equals to 0. So, you can
see
425
00:39:47,170 --> 00:39:54,790
that for both, the maximum density
is located at this 0, so here. Again, the
426
00:39:54,790 --> 00:40:01,040
sigma is the spread; the variance is the spread
about that mean.
427
00:40:01,040 --> 00:40:07,840
So, this is 1 for the blue one and for the second one, it
is 1.5. So here, you can see that the spread,
428
00:40:07,840 --> 00:40:10,620
for the
black one is more and for the blue one is
429
00:40:10,620 --> 00:40:14,260
less. For the green one here, the mu is equals
to
430
00:40:14,260 --> 00:40:20,730
2. So, you can see that, so this is shifted
and the center here again is that 2 and sigma
431
00:40:20,730 --> 00:40:24,700
is
0.75, which is lower than this, the first one,
432
00:40:24,700 --> 00:40:27,570
this blue one. So, this mu
and
433
00:40:27,570 --> 00:40:33,390
sigma are called the parameters of this distribution.
This Normal Distribution is
434
00:40:33,390 --> 00:40:40,150
symmetric, that you can see it is bell shaped
and skewness, so all these things, the
435
00:40:40,150 --> 00:40:46,930
skewness, mean, variance these things will
be discussed in the next class. So, and again
436
00:40:46,930 --> 00:40:51,160
the Normal Distribution also in detail we
will be discussing in subsequent classes.
437
00:40:51,160 --> 00:40:53,300
What
we are just telling here is that, this is
438
00:40:53,300 --> 00:40:55,580
over the entire support here. The support
is from the
439
00:40:55,580 --> 00:41:01,020
minus infinity to plus infinity. One function
is defined here and if you do this one, here
440
00:41:01,020 --> 00:41:05,380
you cannot do this integration in closed form between
arbitrary finite limits. Still, it can be
441
00:41:05,380 --> 00:41:11,300
shown that this integration
from minus infinity to plus infinity is equal
442
00:41:11,300 --> 00:41:21,420
to 1. So, the area below this curve is equals
to 1 and this is known as the Normal or
443
00:41:21,420 --> 00:41:22,950
Gaussian distribution.
.
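As a numerical illustration of this unit area, the normal pdf can be integrated with small strips for the three parameter sets read out for the graphs (mu 0 with sigma 1, mu 0 with sigma 1.5, mu 2 with sigma 0.75). This is a rough sketch, assuming that ten standard deviations on each side stand in for minus infinity to plus infinity; the function names normal_pdf and area are chosen here only for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density with location mu and spread sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi * sigma ** 2)


def area(mu, sigma, half_width=10.0, n=20_000):
    """Midpoint-rule area over mu +/- half_width standard deviations.

    Assumption: the tails beyond that range are negligibly small.
    """
    lower = mu - half_width * sigma
    step = 2.0 * half_width * sigma / n
    total = sum(normal_pdf(lower + (i + 0.5) * step, mu, sigma) for i in range(n))
    return total * step


for mu, sigma in [(0.0, 1.0), (0.0, 1.5), (2.0, 0.75)]:
    print(mu, sigma, round(area(mu, sigma), 6))
```

For each of the three curves the computed area comes out as 1 to within rounding, whatever the location and the spread.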
444
00:41:22,950 --> 00:41:28,561
Similarly, another important distribution is
known as the exponential distribution. This
445
00:41:28,561 --> 00:41:36,010
exponential distribution is the probability
distribution function with parameter lambda,
446
00:41:36,010 --> 00:41:44,680
and its probability density function is expressed
as, the f(x) equals to lambda e power
447
00:41:44,680 --> 00:41:50,700
minus lambda x for x greater than or equal to 0. Again, you
see, this support is defined here as greater
448
00:41:50,700 --> 00:41:54,000
than
equal to 0 and this lambda is known as the
449
00:41:54,000 --> 00:41:58,240
parameter of this distribution and it is 0
otherwise.
450
00:41:58,240 --> 00:42:03,060
So, for the entire support, from the minus
infinity to plus infinity, this is there.
451
00:42:03,060 --> 00:42:07,820
So, this is
basically defined for the positive x axis.
452
00:42:07,820 --> 00:42:13,940
So, these are again some example of this
exponential distribution for different values
453
00:42:13,940 --> 00:42:19,310
of lambda. So, this blue one is showing the
lambda is equals to 1. So now, this lambda,
454
00:42:19,310 --> 00:42:24,680
this parameter is generally having some
relationship with the different, as I just
455
00:42:24,680 --> 00:42:28,050
discussed the mean and all, this will be discussed
in the next class. But, for the time being,
456
00:42:28,050 --> 00:42:33,400
this lambda is the parameter for this
distribution. So and the difference between
457
00:42:33,400 --> 00:42:38,040
this is that only one parameter is there, as against
that normal distribution, where
458
00:42:38,040 --> 00:42:42,530
two parameters are there.
So, this is single parameter distribution
459
00:42:42,530 --> 00:42:48,090
function. This lambda, if we change this lambda,
you can see the, if the lambda is equals to
460
00:42:48,090 --> 00:42:50,120
1, this blue curve, the green curve is for
the
461
00:42:50,120 --> 00:42:55,760
lambda equals to 0.5 and this black one is
for lambda equals to 0.25 and these are all
462
00:42:55,760 --> 00:43:01,350
defined from this 0 to plus infinity. Now,
this integration is very easy. You can just
463
00:43:01,350 --> 00:43:04,280
test
for these values, if you do the integration
464
00:43:04,280 --> 00:43:08,500
from 0 to infinity, then you will get that the total
area
465
00:43:08,500 --> 00:43:12,520
below this curve, above this x axis will be
equals to 1.
466
00:43:12,520 --> 00:43:19,240
Third one, these are basically, this is whatever
the distribution that we are mentioning
467
00:43:19,240 --> 00:43:25,570
here, both for this PMF and for this pdf,
it is not that this need be the complete
468
00:43:25,570 --> 00:43:29,540
list.
Only some examples we are just showing here.
469
00:43:29,540 --> 00:43:35,520
Some more distributions will be
covered in the successive classes as well
470
00:43:35,520 --> 00:43:39,770
and here just we are giving some example,
which are generally very important and mostly
471
00:43:39,770 --> 00:43:45,900
used in almost all the fields and, more
importantly, in all the fields of civil engineering.
472
00:43:45,900 --> 00:43:46,900
..
473
00:43:46,900 --> 00:43:50,620
So, the third example that we are giving is
this gamma distribution. Again, this
474
00:43:50,620 --> 00:43:57,190
distribution is a two parameter distribution.
The parameters are alpha and beta. So, this
475
00:43:57,190 --> 00:44:01,240
is
the form of this distribution. This is the
476
00:44:01,240 --> 00:44:06,230
one parameter alpha and this is another
parameter beta and this is the gamma function.
477
00:44:06,230 --> 00:44:12,540
Gamma function again is defined by this
integration form and if this alpha is a positive
478
00:44:12,540 --> 00:44:19,900
integer, then this form can be proven. So,
this distribution again, this is basically
479
00:44:19,900 --> 00:44:22,110
specified for the positive x axis, for the
negative
480
00:44:22,110 --> 00:44:29,770
side this is 0. Now, if you see, there is one
interesting point here. If you just see that,
481
00:44:29,770 --> 00:44:33,970
if
you change this alpha to be equals to 1, then,
482
00:44:33,970 --> 00:44:38,910
this is nothing but, so alpha equals to 1
gamma alpha, gamma alpha is equals to 0 factorial,
483
00:44:38,910 --> 00:44:44,030
which is equals to 1 and alpha equals
to 1, so 1 by beta. So, this is x power 0.
484
00:44:44,030 --> 00:44:48,860
So, this is 1 by beta e power minus x by
beta.
485
00:44:48,860 --> 00:44:56,970
Now, if the 1 by beta is lambda, then this
is nothing but, lambda e power minus lambda
486
00:44:56,970 --> 00:45:02,140
x. So, this is again, if I put this alpha
equals to 1, this is becoming an exponential
487
00:45:02,140 --> 00:45:05,291
distribution.
So, here you can see, if you just change this
488
00:45:05,291 --> 00:45:12,310
parameter, then this shape changes and the first
is the blue one, where the alpha equals to
489
00:45:12,310 --> 00:45:15,200
1 and beta equals to 2. As this alpha equals
to
490
00:45:15,200 --> 00:45:20,660
1, this is nothing but the exponential distribution.
For the green, it is alpha equals to 4
491
00:45:20,660 --> 00:45:27,310
and beta equals to 2 and for the black one,
alpha equals to 2 and beta equals to 1. So,
492
00:45:27,310 --> 00:45:32,380
these are again different. This is the gamma
distribution with different parameter values,
493
00:45:32,380 --> 00:45:37,400
different combination of the parameters.
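The reduction to the exponential distribution for alpha equal to 1 can be checked numerically. The slide formula is not in the transcript, so the gamma density below is an assumption: the usual form with x power alpha minus 1, e power minus x by beta, divided by beta power alpha times gamma of alpha, which agrees with the steps spoken above. The function names are chosen here only for illustration.

```python
import math


def gamma_pdf(x, alpha, beta):
    """Assumed gamma density:
    x**(alpha - 1) * exp(-x / beta) / (beta**alpha * Gamma(alpha)) for x > 0."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))


def exponential_pdf(x, lam):
    """Exponential density lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


beta = 2.0  # the blue curve: alpha equals 1 and beta equals 2
for x in [0.5, 1.0, 3.0]:
    # With alpha = 1 the gamma density should match lambda = 1 / beta.
    print(x, gamma_pdf(x, 1.0, beta), exponential_pdf(x, 1.0 / beta))
```

At every x tried, the two columns agree, which is the statement that the gamma distribution with alpha equal to 1 is the exponential distribution with lambda equal to 1 by beta.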
494
00:45:37,400 --> 00:45:38,400
..
495
00:45:38,400 --> 00:45:44,570
Now, another important thing in this class
that we will discuss is the cumulative
496
00:45:44,570 --> 00:45:51,020
distribution function. Now, to see it specifically
what we have seen now for the discrete
497
00:45:51,020 --> 00:45:56,550
as well as for the continuous distribution,
we have seen that, for what is the probability
498
00:45:56,550 --> 00:46:02,040
for a specific value in case of the discrete
and for what is the density of the distribution,
499
00:46:02,040 --> 00:46:06,720
and how it is distributed over the range.
Now, this cumulative distribution function;
500
00:46:06,720 --> 00:46:09,680
so,
for the earlier, for the pmf you can get the
501
00:46:09,680 --> 00:46:12,170
probability directly, for the pdf you cannot
get
502
00:46:12,170 --> 00:46:16,640
the probability directly. You have to do the
integration over the range to get that one.
503
00:46:16,640 --> 00:46:22,550
Now, CDF is the cumulative distribution function.
So, basically we are just going on
504
00:46:22,550 --> 00:46:27,410
adding up the probabilities starting from
the left extreme. So, the lower extreme of
505
00:46:27,410 --> 00:46:31,850
the
support to the higher extreme of the support.
506
00:46:31,850 --> 00:46:36,100
So, if you just go on adding up the
probabilities, the resulting graph will be
507
00:46:36,100 --> 00:46:40,700
the cumulative distribution function. So,
for a
508
00:46:40,700 --> 00:46:45,170
discrete or for a continuous random variable,
the cumulative distribution function
509
00:46:45,170 --> 00:46:52,540
abbreviated as CDF, upper case CDF, this D
stands for the distribution here and denoted
510
00:46:52,540 --> 00:46:58,440
by this Fx(x), where, again, upper case X is the random variable
and lower case x is the specific value of the
511
00:46:58,440 --> 00:47:04,880
random variable, is the nonexceedance probability
of the X and its range is between 0 to
512
00:47:04,880 --> 00:47:08,460
1.
So, as I was just discussing that we are just
513
00:47:08,460 --> 00:47:14,910
going on accumulating this thing from the
lower extreme. From the lower extreme, it
514
00:47:14,910 --> 00:47:21,660
is 0 and the upper extreme it will be 1,
obviously. So this F(x), as it is stated here
515
00:47:21,660 --> 00:47:24,950
is nothing but, the probability for a specific
x,
516
00:47:24,950 --> 00:47:32,710
probability of the X less than or equal to x.
So, whatever the lower value of that specific
517
00:47:32,710 --> 00:47:37,940
value of this X, the total probability up
to that point is nothing but, this cumulative
518
00:47:37,940 --> 00:47:43,190
distribution function. Sometimes, CDF for
the discrete random variable is denoted as
519
00:47:43,190 --> 00:47:50,140
Px(x). Just this P is, now again the upper
case letter and you have seen that PMF,
520
00:47:50,140 --> 00:47:55,640
probability mass function, we use this letter
as the lower case p.
521
00:47:55,640 --> 00:48:01,860
So, this notation will be followed for this
course as well. Now, to again just what we
522
00:48:01,860 --> 00:48:10,550
were just telling now, if I just show it
here, now this is your probability
523
00:48:10,550 --> 00:48:15,900
density function that we were discussing. Now,
what we are trying to say, now to calculate
524
00:48:15,900 --> 00:48:22,160
this probability, we have seen that we have
to go for this one. Go for integration of
525
00:48:22,160 --> 00:48:28,080
this
range. Instead of that, for this CDF, what
526
00:48:28,080 --> 00:48:33,750
is meant is that we will show for this one.
For
527
00:48:33,750 --> 00:48:38,840
this specific value, I will calculate this,
what is this total area and total area will
528
00:48:38,840 --> 00:48:43,410
be put
here. So, from here, where the range basically
529
00:48:43,410 --> 00:48:46,260
is starting, so this is starting from this
0.
530
00:48:46,260 --> 00:48:53,480
Now, we are just going on adding up these
values and I will just go on adding, so this
531
00:48:53,480 --> 00:48:57,790
value means nothing but, the total area up
to this point and in this way we will go on
532
00:48:57,790 --> 00:49:04,900
adding. Once, we are reaching here we know
that total area below. Just now we
533
00:49:04,900 --> 00:49:10,040
discussed, the total area below this graph,
for a valid pdf is equals to 1. So, if you
534
00:49:10,040 --> 00:49:12,460
just go
on accumulating up to this point, obviously
535
00:49:12,460 --> 00:49:17,200
I will reach to the point, where this is equals
to 1.
536
00:49:17,200 --> 00:49:22,680
So, obviously that this axis which I have
drawn it earlier, for this pdf, need not be
537
00:49:22,680 --> 00:49:27,450
the
same for this axis. I can just use one more
538
00:49:27,450 --> 00:49:31,350
axis system, where it starts from this
0 and
539
00:49:31,350 --> 00:49:37,090
this is ending up to this 1. And, as we are
going on adding up these things, obviously
540
00:49:37,090 --> 00:49:41,990
this
will never come down. If for some time, if
541
00:49:41,990 --> 00:49:48,020
it is, it can go horizontal, but it can never
come down as this is a cumulative function,
542
00:49:48,020 --> 00:49:52,950
as the quantities are getting added to this,
the
543
00:49:52,950 --> 00:50:00,800
earlier value. So, if this is understood,
all these concepts for this CDF will be clear.
544
00:50:00,800 --> 00:50:08,390
Again, if I get now, if I get this graph,
which is CDF, then for a specific value of
545
00:50:08,390 --> 00:50:11,620
this
random variable, if I want to know what is
546
00:50:11,620 --> 00:50:15,500
the probability that x is less, if this one
is x,
547
00:50:15,500 --> 00:50:21,680
then, the probability that the random
variable is less than or equal to that specific
548
00:50:21,680 --> 00:50:25,720
value is
nothing but, is straight forward, we will
549
00:50:25,720 --> 00:50:28,160
get it from here, so this will be nothing
but, this
550
00:50:28,160 --> 00:50:37,980
particular probability what we are getting
it here. Now, will go one after another. They
551
00:50:37,980 --> 00:50:40,609
are the properties of the CDF.
552
00:50:40,609 --> 00:50:41,609
..
553
00:50:41,609 --> 00:50:48,410
So this, now if you see the properties of
this CDF, this F(x), that is, which is denoted
554
00:50:48,410 --> 00:50:51,850
as
the CDF or the P(x), in case of the discrete
555
00:50:51,850 --> 00:50:54,890
random variable as was just now told,
is
556
00:50:54,890 --> 00:51:02,430
bounded by 0 to 1. So, this is obvious. As
you are starting from this left extreme of
557
00:51:02,430 --> 00:51:06,089
this
support and going up to the right extremes,
558
00:51:06,089 --> 00:51:08,991
so it will obviously start from 0 and it
will
559
00:51:08,991 --> 00:51:21,160
go up to 1. Secondly, this F(x) or this P(x)
is a monotonic function, which never decreases for
560
00:51:21,160 --> 00:51:29,741
the increasing values of x. So, this is bounded
by this 0 to 1. Again, this is monotonic,
561
00:51:29,741 --> 00:51:35,880
monotonic function, which never decreases with this
x. This also can be clear from this graph,
562
00:51:35,880 --> 00:51:40,700
that is, as we are adding up the area, as
we are adding up some quantity to the previous
563
00:51:40,700 --> 00:51:45,611
value, obviously this function will never
decrease with increasing values of the x.
564
00:51:45,611 --> 00:51:53,980
These
two are the properties of
565
00:51:53,980 --> 00:51:54,980
the CDF.
566
00:51:54,980 --> 00:51:55,980
..
567
00:51:55,980 --> 00:52:03,110
So, now we will just take a small thing for
this one, just to discuss for this random
568
00:52:03,110 --> 00:52:10,270
variable. This is one important thing, that
is, for a discrete random variable, CDF that
569
00:52:10,270 --> 00:52:13,530
is
P(x) is obtained by summing over the values
570
00:52:13,530 --> 00:52:21,310
of this PMF. For a discrete random
variable, the CDF P(x) is the sum of the probabilities
571
00:52:21,310 --> 00:52:29,740
of all possible values of X that are
less than or equal to the argument of this
572
00:52:29,740 --> 00:52:34,240
x. So, this is equals to, for this all values
of this
573
00:52:34,240 --> 00:52:39,800
x, which are less than or equal to x, should be added up.
If you take the example of throwing a
574
00:52:39,800 --> 00:52:42,700
dice
and we know that, for this, all this, there
575
00:52:42,700 --> 00:52:45,140
are six equally probable outcomes.
For
576
00:52:45,140 --> 00:52:50,360
all the outcomes, the probability is 1
by 6, if these are equally probable. Now,
577
00:52:50,360 --> 00:52:54,340
if we
want to know, what is the CDF for this one,
578
00:52:54,340 --> 00:53:00,720
this will be the starting point of our next
lecture and we will see that this is very
579
00:53:00,720 --> 00:53:04,560
important to know and where it can touch and
where it cannot touch.
580
00:53:04,560 --> 00:53:10,840
This looks like a step function. From the
next class onwards, we will start a detailed
581
00:53:10,840 --> 00:53:18,610
description of this one. So in this class,
what we have seen is that, we have seen the
582
00:53:18,610 --> 00:53:26,240
distribution of a particular random variable
and this random variable can be discrete,
583
00:53:26,240 --> 00:53:28,680
can
be continuous. In the next class also, we
584
00:53:28,680 --> 00:53:31,330
will see one example. If there are kind of
mixed
585
00:53:31,330 --> 00:53:38,230
random variable, which
we told about in the last class, that also will be covered. So here, we
586
00:53:38,230 --> 00:53:45,230
will see how to handle this issue of
a pdf, CDF for this one, and for the mixed
587
00:53:45,230 --> 00:53:50,100
random variable as well we will see. So, we
have first discussed this PMF, probability
588
00:53:50,100 --> 00:53:54,210
mass function, which is for the discrete; then pdf,
lower case pdf, probability density function,
589
00:53:54,210 --> 00:54:01,750
which is for the continuous and then we have
seen how to calculate, how to get the CDF
590
00:54:01,750 --> 00:54:08,420
cumulative distribution function, from PMF
or from the pdf. The concept we have seen
591
00:54:08,420 --> 00:54:14,560
and for the discrete one, how to get actual
representation of this PMF will be discussed
592
00:54:14,560 --> 00:54:17,890
in
the next class along with some of the examples
593
00:54:17,890 --> 00:54:21,710
taken from the civil engineering
problems. Thank you.
594
00:54:21,710 --> 00:54:21,710
.