1
00:00:17,310 --> 00:00:23,750
Hello there, welcome to the 3rd lecture of
our module 3. In this lecture, we will
2
00:00:23,750 --> 00:00:27,189
mainly cover
the descriptors of random variables.
3
00:00:27,189 --> 00:00:33,980
In the last class we stopped
after we had started the description of the
4
00:00:33,980 --> 00:00:39,450
CDF, that is, the Cumulative Distribution Function.
So we will start with the
5
00:00:39,450 --> 00:00:45,530
description of the CDF and
see some problems on it, particularly for purely discrete and purely continuous
6
00:00:45,530 --> 00:00:51,040
variables. We will also see an example of
a mixed random variable, where some part is discrete
7
00:00:51,040 --> 00:00:58,360
and some part is continuous. After that
one example, we will go to some
8
00:00:58,360 --> 00:01:02,350
general descriptors of random variables.
9
00:01:02,350 --> 00:01:07,930
Where we stopped in the last class is the
description of the CDF for a discrete random
10
00:01:07,930 --> 00:01:12,070
variable. We know that there are two types
of random variables: one is discrete and
11
00:01:12,070 --> 00:01:20,439
the other is continuous. For both these
types of variables, we have seen their
12
00:01:20,439 --> 00:01:24,950
probability functions: the probability density
function for a continuous random variable and
13
00:01:24,950 --> 00:01:32,140
the probability mass function for a discrete random
variable. Now we start with the CDF,
14
00:01:32,140 --> 00:01:39,310
that is, the Cumulative Distribution Function, for
a discrete random variable. For a discrete
15
00:01:39,310 --> 00:01:45,960
random variable, the CDF, FX(x), is
obtained by summing the values of the
16
00:01:45,960 --> 00:01:54,369
PMF, that is, the Probability Mass Function.
For a discrete random variable, the CDF FX(x)
17
00:01:54,369 --> 00:01:59,740
is the sum of the probabilities of all
possible values of X that are less than or
18
00:01:59,740 --> 00:02:04,600
equal to the argument x. So, in this notation,
19
00:02:04,600 --> 00:02:10,690
FX(x) stands for the CDF. The value of
the CDF at x is nothing but the
20
00:02:10,690 --> 00:02:18,300
sum of the PMF over all possible values xi
that are less than or equal to this small x.
21
00:02:18,300 --> 00:02:23,790
Now, take the standard
example of throwing a die and getting
22
00:02:23,790 --> 00:02:33,150
six different outcomes, 1, 2, up to 6. If
we say that all of these are equally likely,
23
00:02:33,150 --> 00:02:38,459
then the PMF says that exactly at the point
24
00:02:38,459 --> 00:02:41,970
for the outcome 1, the probability is 1 by
6, and
25
00:02:41,970 --> 00:02:48,830
for every such outcome the probability is 1
by 6, so the probability is concentrated at these points.
27
00:02:49,830 --> 00:03:00,960
In the last lecture we saw that its
PMF looks like this, where 1, 2, 3, 4, 5 and
28
00:03:00,960 --> 00:03:06,590
6
all carry concentrated masses, and these
29
00:03:06,590 --> 00:03:14,459
all have the same height, 1 by 6. Basically, this
should be the representation of this PMF.
30
00:03:14,459 --> 00:03:19,769
In some books you will see a stem
diagram or something similar, but, as we
31
00:03:19,769 --> 00:03:23,420
are saying, these are concentrated masses
of
32
00:03:23,420 --> 00:03:29,560
probability, which is why it is called a PMF. So
these dots are sufficient to declare that this
33
00:03:29,560 --> 00:03:32,840
is a
PMF, but just for reference we can
34
00:03:32,840 --> 00:03:36,830
put a dotted line like this to indicate
that the mass refers to that particular
35
00:03:36,830 --> 00:03:40,680
outcome. If I want to know the CDF
for this
36
00:03:40,680 --> 00:03:48,620
kind of descriptor, then you see that there
are two important things that we
37
00:03:48,620 --> 00:03:54,200
should keep in mind. The first one: if I just take this reference line for
38
00:03:54,200 --> 00:03:59,510
this one,
this point is 1 by 6. Now,
39
00:03:59,510 --> 00:04:04,209
if I want to know the value at
2, then I
40
00:04:04,209 --> 00:04:12,261
know that at 2 it will be exactly 2
by 6, or 1 by 3. So, when the argument
41
00:04:12,261 --> 00:04:17,680
is exactly equal to 2,
42
00:04:17,680 --> 00:04:21,430
the cumulative probability comes out to be 2 by
6.
43
00:04:21,430 --> 00:04:28,430
Now, what about this line at some number in between?
For the CDF, I can say that
44
00:04:28,430 --> 00:04:34,430
the argument can take any number
between 1 and 2, and we want the probability up to that value,
45
00:04:34,430 --> 00:04:42,080
say, for example, 1.75. So what is the cumulative
probability up to 1.75? Obviously,
46
00:04:42,080 --> 00:04:49,180
the probability is 1 by 6, so the
CDF remains constant starting from
47
00:04:49,180 --> 00:04:51,430
1
and going up to 2.
48
00:04:51,430 --> 00:04:59,120
Now, as soon as it touches 2, it jumps to 2
by 6. So, generally, in the representation this segment
49
00:04:59,120 --> 00:05:04,830
should not touch the line of 2. It is
a continuous line that simply starts from
50
00:05:04,830 --> 00:05:09,930
1 and
goes as close to 2 as it can, but it should
51
00:05:09,930 --> 00:05:16,680
not touch 2. Immediately when it touches 2,
we have the cumulative probability at 2, that
52
00:05:16,680 --> 00:05:18,650
is, less than or equal to 2. That is why the "less
than
53
00:05:18,650 --> 00:05:30,791
or equal to" matters, and that is why here the cumulative
distribution function value, or CDF, at 2
54
00:05:30,791 --> 00:05:32,380
is
2 by 6.
55
00:05:32,380 --> 00:05:41,020
Similarly, the next step is one line that can
go as close as it can to 3, but it should
56
00:05:41,020 --> 00:05:45,540
not
touch the line 3, and so on, like this.
57
00:05:45,540 --> 00:05:52,630
So basically we are getting continuous
horizontal lines, step lines like this, going
58
00:05:52,630 --> 00:05:57,900
up to 1. So this should be the representation.
59
00:05:57,900 --> 00:06:02,830
It is very important that each step can go as close
as possible to the next value, but as soon as it
60
00:06:02,830 --> 00:06:07,470
touches the next value there is a sudden
jump. This kind of step function
61
00:06:07,470 --> 00:06:14,319
is what we see for a discrete random variable;
it is the representation of the cumulative
62
00:06:14,319 --> 00:06:20,830
distribution function for a discrete random
variable.
63
00:06:20,830 --> 00:06:28,449
So in this figure you can see the
steps, and this
64
00:06:28,449 --> 00:06:31,150
line
cannot touch this line. These dotted lines
65
00:06:31,150 --> 00:06:34,460
are just representative, to
show
66
00:06:34,460 --> 00:06:42,350
that this point corresponds to exactly
2. So this CDF is represented only by
67
00:06:42,350 --> 00:06:49,340
straight lines parallel to the x axis
in six different steps, and finally it
68
00:06:49,340 --> 00:06:55,490
reaches
1. So this is the representation
69
00:06:55,490 --> 00:06:58,440
of the CDF for a discrete random variable.
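The step-function CDF of the fair die can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture material; the names `die_pmf` and `discrete_cdf` are hypothetical. It sums the PMF over all outcomes less than or equal to x, using exact fractions so the jumps of 1/6 stay visible.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face 1..6 carries probability 1/6.
die_pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def discrete_cdf(pmf, x):
    """F_X(x): sum of p_X(x_i) over all outcomes x_i <= x (a right-continuous step function)."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))

# Evaluate between the jumps and exactly at them.
for x in (0.5, 1, 1.75, 1.999, 2, 3.5, 6, 10):
    print(x, discrete_cdf(die_pmf, x))
```

Evaluating at 1.75 (or 1.999) gives 1/6, while evaluating at exactly 2 gives 1/3: the flat segment between 1 and 2 never reaches the next level until the argument equals 2, which is the "less than or equal to" in the definition.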
71
00:06:59,440 --> 00:07:04,699
Similarly, let us look at the CDF for a continuous
random variable, where we know
72
00:07:04,699 --> 00:07:09,470
the probability
density function that we discussed in the last
73
00:07:09,470 --> 00:07:13,669
class,
which has a continuous distribution
74
00:07:13,669 --> 00:07:19,629
over the entire support of x. If we
take that, and we want to know the value
75
00:07:19,629 --> 00:07:21,569
of the CDF at a particular value equal
to
76
00:07:21,569 --> 00:07:29,990
x, that is, FX(x), then the total
probability up to that particular point is what we have
77
00:07:29,990 --> 00:07:30,990
to
see.
79
00:07:31,990 --> 00:07:40,250
So here we see that, for a continuous
random variable, the distribution may
80
00:07:40,250 --> 00:07:47,509
look like this. Now, take a particular value,
say x here. If I want
81
00:07:47,509 --> 00:07:52,479
to draw
the CDF here, then the value at this point
82
00:07:52,479 --> 00:07:59,449
represents the total area from the left
end of the support up to the point x; so this value represents
83
00:07:59,449 --> 00:08:03,949
this area. So this will be a monotonically
non-decreasing function, which starts from 0
84
00:08:03,949 --> 00:08:11,160
and goes up to the maximum value it can take, which is
1. So this value
85
00:08:11,160 --> 00:08:14,180
is nothing but the integral, that is,
the
86
00:08:14,180 --> 00:08:19,460
total area under this function up to that point;
that means the integral of that function
87
00:08:19,460 --> 00:08:22,680
from
the left end of the support to that particular point.
89
00:08:22,680 --> 00:08:27,430
This is represented mathematically
here: you can see that F(x) equals
90
00:08:27,430 --> 00:08:30,090
the integral from
minus infinity to that particular point
91
00:08:30,090 --> 00:08:36,470
of the PDF, the probability
density function, and this gives you the
92
00:08:36,470 --> 00:08:42,250
CDF of a continuous random variable.
One thing is important here. Once again, I
93
00:08:42,250 --> 00:08:45,180
am repeating the fact that the PDF will not
give
94
00:08:45,180 --> 00:08:51,709
you the probability directly; it is the
representation of the probability density
95
00:08:51,709 --> 00:08:56,851
at a
particular point, but not the probability.
96
00:08:56,851 --> 00:09:03,610
The CDF at that point, however, represents
the probability of X being less than or equal
97
00:09:03,610 --> 00:09:11,730
to small x, that particular value.
Conversely, mathematically,
98
00:09:11,730 --> 00:09:14,420
if we take the derivative
of
99
00:09:14,420 --> 00:09:20,830
this CDF, the cumulative distribution function,
we end up with the probability density
100
00:09:20,830 --> 00:09:21,830
function.
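As a rough numerical check of the integral/derivative relationship just described, here is a sketch using a hypothetical triangular PDF on [0, 2] (chosen only because its areas are easy to verify by hand; it is not from the lecture). The CDF is approximated by the trapezoidal rule, and a central difference of that CDF should give back the PDF. The function names are illustrative assumptions.

```python
def pdf(x):
    """Hypothetical triangular PDF on [0, 2] with its peak at x = 1 (illustration only)."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

def cdf(x, lower=0.0, n=10_000):
    """F(x): area under pdf from the left end of the support to x (trapezoidal rule)."""
    if x <= lower:
        return 0.0
    h = (x - lower) / n
    total = 0.5 * (pdf(lower) + pdf(x))
    for i in range(1, n):
        total += pdf(lower + i * h)
    return total * h

def pdf_from_cdf(x, h=1e-4):
    """Central-difference derivative dF/dx, which should reproduce pdf(x)."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)

print(cdf(1.0))                      # area up to the peak, about 0.5
print(cdf(2.0))                      # whole support, about 1.0
print(pdf_from_cdf(0.5), pdf(0.5))   # both about 0.5
```

The value at the right end of the support comes out as 1 (total probability), and differentiating the integrated curve returns the density, mirroring the two statements in the lecture.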
102
00:09:22,830 --> 00:09:29,769
Now we will see two examples of the CDF. The first
one is a straightforward example of
103
00:09:29,769 --> 00:09:34,420
the exponential distribution: the time between
two successive railway
104
00:09:34,420 --> 00:09:41,209
accidents can be expressed by the probability
density function fX(x) equals lambda
105
00:09:41,209 --> 00:09:48,670
e power minus lambda x, where the support
is declared as 0 to infinity.
106
00:09:48,670 --> 00:09:56,390
Here lambda is called the parameter,
which is estimated as 0.2. As for how
107
00:09:56,390 --> 00:10:01,550
this parameter is estimated, we will
see in the coming classes how to
108
00:10:01,550 --> 00:10:07,590
estimate the parameters of a distribution.
So with this parameter, lambda equals 0.2,
109
00:10:07,590 --> 00:10:13,690
we have to find the probability
of the time between two successive
110
00:10:13,690 --> 00:10:15,180
rail
accidents exceeding 10 units. Suppose
111
00:10:15,180 --> 00:10:17,630
the unit is not specified
or
112
00:10:17,630 --> 00:10:23,019
mentioned here, so it is just 10 units: what is the
probability that the time between two successive
113
00:10:23,019 --> 00:10:25,279
rail
accidents will exceed 10 units?
114
00:10:25,279 --> 00:10:30,019
So, first of all, from this
probability density function we can directly
115
00:10:30,019 --> 00:10:35,501
get the CDF by integration. If we want
to know this probability, that is,
116
00:10:35,501 --> 00:10:37,470
what is
the cumulative distribution function of this one,
117
00:10:37,470 --> 00:10:43,269
we know that we have to integrate this
probability density function from minus
118
00:10:43,269 --> 00:10:51,360
infinity to x. Here, from minus
infinity to 0 it is 0, so we
119
00:10:51,360 --> 00:10:54,639
integrate from 0 to x, and if we
do this
120
00:10:54,639 --> 00:10:59,510
simple integration we get 1 minus
e power minus lambda x, which is the
121
00:10:59,510 --> 00:11:07,200
cumulative distribution function for this
probability density function. Now, if we put
122
00:11:07,200 --> 00:11:08,200
any
123
00:11:08,200 --> 00:11:14,750
particular value of x here, we
directly get the probability of the
124
00:11:14,750 --> 00:11:17,770
random variable X being less than or equal to
small x.
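A short sketch comparing the closed-form CDF, 1 minus e power minus lambda x, with a direct numerical integration of the PDF from 0 to x, assuming lambda = 0.2 as in the railway example. The function names (`exp_pdf`, `exp_cdf`, `numeric_cdf`) are illustrative assumptions, not a library API.

```python
import math

LAM = 0.2  # lambda estimated in the railway-accident example

def exp_pdf(x, lam=LAM):
    """f_X(x) = lam * exp(-lam * x) on the support [0, infinity), else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x, lam=LAM):
    """Closed form F_X(x) = 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def numeric_cdf(x, lam=LAM, n=100_000):
    """Trapezoidal integral of the PDF from 0 to x (the part below 0 contributes nothing)."""
    if x <= 0:
        return 0.0
    h = x / n
    total = 0.5 * (exp_pdf(0.0, lam) + exp_pdf(x, lam))
    total += sum(exp_pdf(i * h, lam) for i in range(1, n))
    return total * h

for x in (1, 5, 10, 20):
    print(x, round(exp_cdf(x), 4), round(numeric_cdf(x), 4))
```

Both columns agree to four decimals, which is a quick sanity check that the integration step in the lecture was done correctly.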
125
00:11:17,770 --> 00:11:25,579
Now, our question is to find the probability
of the time between two successive
126
00:11:25,579 --> 00:11:28,170
rail
accidents exceeding 10 units. Now,
127
00:11:28,170 --> 00:11:30,529
it asks for exceeding 10 units, whereas if
128
00:11:30,529 --> 00:11:38,200
I put 10 here, I get the probability
of not exceeding 10 units. So if I want
129
00:11:38,200 --> 00:11:42,850
to calculate the exceedance, then, since the total
probability is 1, we have to subtract
130
00:11:42,850 --> 00:11:48,850
that particular value from the total probability
to get the probability of exceeding
131
00:11:48,850 --> 00:11:50,610
10 units.
132
00:11:50,610 --> 00:11:57,660
Exactly the same thing is done here: the
probability of exceeding 10 units is
133
00:11:57,660 --> 00:12:02,250
equal to the probability of X greater than
10. P(X greater than 10) is nothing
134
00:12:02,250 --> 00:12:07,670
but 1 minus P(X less than or equal to 10), and
P(X less than or equal to 10) is nothing
135
00:12:07,670 --> 00:12:15,370
but FX(10). We get this FX(10) directly
from the CDF obtained from our probability density
136
00:12:15,370 --> 00:12:23,150
function: 1 minus FX(10) is 1 minus the bracket
(1 minus e power minus lambda times 10),
137
00:12:23,150 --> 00:12:29,010
and after putting in the value of lambda,
138
00:12:29,010 --> 00:12:35,320
this eventually becomes e power minus 10
multiplied by 0.2, 0.2 being the value of lambda.
139
00:12:35,320 --> 00:12:40,079
The minus sign in front of the parenthesis turns
the minus inside it into a plus when we remove the parenthesis,
140
00:12:40,079 --> 00:12:48,050
as shown here. So the probability is
141
00:12:48,050 --> 00:12:54,589
0.135,
that is, 13.5 percent: the probability that the time between
142
00:12:54,589 --> 00:13:01,339
two successive railway accidents exceeds 10 units,
for this particular case.
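The exceedance calculation can be checked in a few lines. This sketch simply evaluates FX(10) and its complement with lambda = 0.2, the value given in the example; the variable names are illustrative.

```python
import math

lam = 0.2   # estimated parameter from the lecture example
x = 10.0    # threshold of 10 (unspecified) time units

cdf_at_x = 1.0 - math.exp(-lam * x)   # P(X <= 10) = F_X(10)
p_exceed = 1.0 - cdf_at_x             # P(X > 10) = 1 - F_X(10), which simplifies to exp(-lam * x)

print(round(cdf_at_x, 4))   # 0.8647
print(round(p_exceed, 4))   # 0.1353
```

Because the 1 and the minus sign cancel, the exceedance probability is simply e power minus 2, about 0.135.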
143
00:13:01,339 --> 00:13:07,820
Now, if you want to see how it is distributed,
this plot shows the probability density
144
00:13:07,820 --> 00:13:14,220
function and the cumulative
distribution function. Now,
145
00:13:14,220 --> 00:13:16,839
if
we want to see how they vary over
146
00:13:16,839 --> 00:13:19,070
x, then this is the graph.
147
00:13:19,070 --> 00:13:25,250
You see here that at x equals 0 the value
of the probability density function is 0.2,
148
00:13:25,250 --> 00:13:30,810
which is obviously the value of lambda, and
gradually, as x goes to infinity, you see
149
00:13:30,810 --> 00:13:36,050
it
comes down and approaches asymptotically
150
00:13:36,050 --> 00:13:42,990
the value 0. Now, if I want to get
the CDF, this curve is the CDF;
151
00:13:42,990 --> 00:13:48,480
the CDF is nothing but the
integral of this area, so gradually
152
00:13:48,480 --> 00:13:51,250
that integral increases. It starts
from 0
153
00:13:51,250 --> 00:14:00,040
and approaches the value 1 asymptotically,
reaching 1 only in the limit as x goes to infinity. So this
154
00:14:00,040 --> 00:14:03,449
curve
that you see is your CDF, and this one
155
00:14:03,449 --> 00:14:07,230
is your PDF for the exponential.
This is an example of the exponential
156
00:14:07,230 --> 00:14:14,440
distribution that was shown to you in the
last class, and this is how the PDF and
157
00:14:14,440 --> 00:14:19,139
CDF look for this case.
159
00:14:20,139 --> 00:14:25,170
Another interesting problem is taken here;
this is on the daily rainfall at a rain gauge
160
00:14:25,170 --> 00:14:29,350
station. Assume that the daily rainfall at
a rain gauge station follows the following
161
00:14:29,350 --> 00:14:35,390
distribution. Now, if you take the daily rainfall
from a particular station, then
162
00:14:35,390 --> 00:14:39,459
what
we see is that most of the time there will be
163
00:14:39,459 --> 00:14:48,000
many 0 values, along with some non-zero values.
So what is done is, first of all,
164
00:14:48,000 --> 00:14:51,310
you just collect the readings.
165
00:14:51,310 --> 00:14:56,860
In those readings we get some non-zero
values, like 1, 2.1 and so on, and
166
00:14:56,860 --> 00:15:03,860
then we get many 0s, again some value like 0.1, and
so on. What is meant here is that this daily
167
00:15:03,860 --> 00:15:10,190
rainfall series
has many 0 values. So if you
168
00:15:10,190 --> 00:15:13,920
want to
calculate the probability distribution
169
00:15:13,920 --> 00:15:19,350
function, what we generally do is
exclude these 0 values first.
170
00:15:19,350 --> 00:15:22,390
We first find the probability of
getting
171
00:15:22,390 --> 00:15:27,500
the value 0, and then we fit another probability
distribution for the non-zero values. This
172
00:15:27,500 --> 00:15:37,370
is one example of a mixed distribution,
where at 0 some probability is
173
00:15:37,370 --> 00:15:45,430
concentrated. So if you see this one, at
0 some probability is concentrated
174
00:15:45,430 --> 00:15:49,950
here, and for the non-zero values it may have
some continuous distribution.
175
00:15:49,950 --> 00:15:57,319
Obviously, the larger the magnitude
of the rainfall depth, the lower the density will
176
00:15:57,319 --> 00:16:04,459
be. So this is one example where
some probability is concentrated here, and
177
00:16:04,459 --> 00:16:11,589
then the continuous part follows. Now, you can see that if at
0 some probability is concentrated, then the
178
00:16:11,589 --> 00:16:16,610
total
area under this graph would obviously be 1
179
00:16:16,610 --> 00:16:21,610
minus the probability that is concentrated
at
180
00:16:21,610 --> 00:16:27,360
0.
We have considered this example
181
00:16:27,360 --> 00:16:31,089
here, taking one
example where f(x) is given as follows: you
182
00:16:31,089 --> 00:16:35,399
see that 40 percent probability is concentrated
at
183
00:16:35,399 --> 00:16:48,540
x equals 0. For x greater than
0, the PDF is c e power minus x by 4, and
184
00:16:48,540 --> 00:16:55,540
elsewhere it is 0, that is, from minus infinity
to less than 0 the value is 0. So this is the
185
00:16:55,540 --> 00:16:59,319
complete description of the distribution, and this is
an example of a mixed distribution, where
186
00:16:59,319 --> 00:17:04,400
there is a probability mass concentrated
at 0 and a continuous distribution for
187
00:17:04,400 --> 00:17:10,200
values greater than 0.
So first we have to find c, and then we
188
00:17:10,200 --> 00:17:13,000
have to answer what is the probability
of
189
00:17:13,000 --> 00:17:20,380
daily rainfall exceeding 10 centimeters; here
x has the unit of centimeters.
190
00:17:20,380 --> 00:17:23,670
So
first, what we have to do is find c.
191
00:17:23,670 --> 00:17:26,500
c is a constant here, and to find
the
192
00:17:26,500 --> 00:17:31,631
proper value of c, the total probability,
the mass at 0 plus the area under this curve, should equal 1; that
193
00:17:31,631 --> 00:17:34,190
is the condition we have to satisfy.
195
00:17:35,190 --> 00:17:39,640
So it is done from minus infinity to plus
infinity: from the second axiom of probability
196
00:17:39,640 --> 00:17:44,330
we know the condition on the
PDF that this total integration should
197
00:17:44,330 --> 00:17:47,360
be
equal to 1. Now, I know that from minus infinity
198
00:17:47,360 --> 00:17:54,500
to less than 0 the value is obviously 0, at
0 the concentrated probability mass is 0.4,
199
00:17:54,500 --> 00:17:58,080
and from 0 to infinity we know the function,
200
00:17:58,080 --> 00:18:02,900
and all of this together should be equal to 1. So if
you just do this integration, which gives 0.4 plus 4c, and
201
00:18:02,900 --> 00:18:08,250
solve this equation for the one unknown here,
then after a few steps you will
202
00:18:08,250 --> 00:18:16,710
get that c equals 0.15. So, with the value of
c obtained, the complete description of this
203
00:18:16,710 --> 00:18:24,070
probability distribution is: P(X = 0) equals
0.4 at x equal to 0; fX(x) equals 0.15
204
00:18:24,070 --> 00:18:29,670
e
power minus x by 4 for x greater than 0; and
205
00:18:29,670 --> 00:18:31,549
0 elsewhere.
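To see where c = 0.15 comes from numerically, the sketch below approximates the integral of e power minus x by 4 over [0, infinity) with a trapezoidal sum on [0, 200] (the cut-off of 200 is an assumption; the tail beyond it is negligible) and then solves 0.4 + c times the integral = 1. Analytically the integral is 4, so c = 0.6 / 4 = 0.15.

```python
import math

P0 = 0.4  # probability mass concentrated at x = 0 (dry days), from the example

def tail_integral(upper=200.0, n=200_000):
    """Trapezoidal approximation of the integral of exp(-x/4) over [0, upper].

    The cut-off `upper` is an assumption: exp(-x/4) is negligible beyond it.
    """
    h = upper / n
    total = 0.5 * (math.exp(0.0) + math.exp(-upper / 4))
    total += sum(math.exp(-(i * h) / 4) for i in range(1, n))
    return total * h

# Total probability must be 1:  P0 + c * integral = 1  =>  c = (1 - P0) / integral
integral = tail_integral()   # analytically this integral equals 4
c = (1 - P0) / integral
print(round(integral, 4), round(c, 4))   # about 4.0 and 0.15
```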
207
00:18:32,549 --> 00:18:37,840
So here it is written for x equal to
0. Now, if I want to get the CDF, the
208
00:18:37,840 --> 00:18:43,970
cumulative distribution function, at x equal
to 0, FX(0), which here is the probability that X is exactly
209
00:18:43,970 --> 00:18:47,290
0,
equals 0.4. Now, for the range
210
00:18:47,290 --> 00:18:50,600
x greater than 0, FX(x), which is nothing but
the
211
00:18:50,600 --> 00:18:56,940
probability of X less than or equal to x, should
be equal to the concentrated mass at X
212
00:18:56,940 --> 00:19:03,900
equal to 0, which is 0.4, plus the integral from 0
to that particular point x of this function.
213
00:19:03,900 --> 00:19:10,050
If you do these few steps, you get the
distribution function for x greater
214
00:19:10,050 --> 00:19:18,180
than 0, in this zone, as 1 minus 0.6 e power
minus x by 4. Now, once you get this CDF,
215
00:19:18,180 --> 00:19:26,170
the rest of the answers we are
looking for follow; for example, here we are
216
00:19:26,170 --> 00:19:31,680
looking for the probability of x greater than
10 centimeters. So this is the final representation
217
00:19:31,680 --> 00:19:42,280
of this CDF: F(x) equals 0.4 for
x equal to 0, 1 minus 0.6 e power minus 0.25 x for x
218
00:19:42,280 --> 00:19:52,290
greater than 0, and 0 elsewhere. On the slide, the exponent
219
00:19:52,290 --> 00:19:55,930
should be minus 0.25 x instead of minus x;
that is the correction needed here.
220
00:19:55,930 --> 00:20:02,250
With this final form of the CDF
that we got, we now put x equal to
221
00:20:02,250 --> 00:20:06,180
10
to get the probability. Here it asks
222
00:20:06,180 --> 00:20:08,420
for the probability that the rainfall,
as
223
00:20:08,420 --> 00:20:11,950
shown here, exceeds 10 centimeters. So we
will first calculate from
224
00:20:11,950 --> 00:20:16,810
this
CDF the probability of the daily
225
00:20:16,810 --> 00:20:19,320
rainfall being less than or equal to 10 centimeters,
and then,
226
00:20:19,320 --> 00:20:22,400
by subtracting it from the total probability, 1,
we will get the answer.
228
00:20:23,400 --> 00:20:26,770
So, from this, if you just put X less
than or equal to 10, that is, F(x) at x equal to
229
00:20:26,770 --> 00:20:30,860
10,
which is nothing but 1 minus 0.6 e power minus 2.5,
230
00:20:30,860 --> 00:20:35,830
we get 0.9507. So the
probability of daily rainfall exceeding 10
231
00:20:35,830 --> 00:20:38,120
centimeters will obviously be 1 minus this value:
232
00:20:38,120 --> 00:20:45,300
1 minus P(X less than or equal to 10) equals
0.0493. That is the required probability.
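A sketch of the mixed CDF, assuming the corrected exponent minus x by 4 (that is, minus 0.25 x): it returns 0 for x below 0, jumps to 0.4 at x = 0, and follows 1 minus 0.6 e power minus x by 4 for x greater than 0. The name `rain_cdf` and the constant names are hypothetical.

```python
import math

P0 = 0.4     # P(X = 0): concentrated mass at zero rainfall
C = 0.15     # constant c found from the total-probability condition
RATE = 0.25  # 1/4, from the density c * exp(-x/4)

def rain_cdf(x):
    """CDF of the mixed daily-rainfall model, x in centimeters."""
    if x < 0:
        return 0.0
    # mass at 0 plus the integral of C*exp(-RATE*t) from 0 to x; equals 1 - 0.6*exp(-x/4)
    return P0 + (C / RATE) * (1.0 - math.exp(-RATE * x))

f10 = rain_cdf(10.0)
print(round(rain_cdf(0.0), 4))   # 0.4, the jump at x = 0
print(round(f10, 4))             # 0.9507 = P(X <= 10 cm)
print(round(1.0 - f10, 4))       # 0.0493 = P(X > 10 cm)
```

Writing the CDF as "mass at 0 plus the integral of the continuous part" keeps the jump at 0 explicit, which is the defining feature of a mixed distribution.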
233
00:20:45,300 --> 00:20:54,370
So, up to this point, we have
seen the PMF, which is for
234
00:20:54,370 --> 00:20:56,670
the
discrete random variable, that is, the probability
235
00:20:56,670 --> 00:21:00,560
mass function; then we have seen the
probability density function, which is for
236
00:21:00,560 --> 00:21:06,060
the continuous random variable; and last we
discussed the cumulative distribution
237
00:21:06,060 --> 00:21:10,430
function for both discrete and continuous
random variables.
239
00:21:11,430 --> 00:21:15,980
So, some notes on this: the probability
mass function, PMF, is the probability
240
00:21:15,980 --> 00:21:22,690
distribution of a discrete random variable;
the probability density function, PDF, is the
241
00:21:22,690 --> 00:21:27,640
probability distribution of a continuous random
variable; the cumulative distribution
242
00:21:27,640 --> 00:21:35,970
function is the non-exceedance probability
of a random variable, and its range is between
243
00:21:35,970 --> 00:21:40,540
0 and 1.
Now, the main descriptors. Sometimes
244
00:21:40,540 --> 00:21:49,580
we do not get a particular closed form
of the probability density
245
00:21:49,580 --> 00:21:54,800
function or probability mass function of a
variable; then, from the observed data, there
246
00:21:54,800 --> 00:21:59,720
are some descriptors of the random variable
that we will see, so that, approximately, with
247
00:21:59,720 --> 00:22:05,310
these descriptors, the nature
of the random variable can be known to us. So this
248
00:22:05,310 --> 00:22:11,790
is our next focus: to know what
the different main descriptors of a random
249
00:22:11,790 --> 00:22:12,890
variable are.
251
00:22:13,890 --> 00:22:17,620
So, we will first see what is meant by
the descriptors of random variables,
252
00:22:17,620 --> 00:22:20,040
and
then we will see
253
00:22:20,040 --> 00:22:25,130
the mean or expected value, then the variance
and
254
00:22:25,130 --> 00:22:30,650
standard deviation, then skewness, and then one
more, called kurtosis. And
255
00:22:30,650 --> 00:22:35,390
then we will also see the analogies with the
properties of an area, that is, the area under
256
00:22:35,390 --> 00:22:37,670
the
PDF, the probability density function, in
257
00:22:37,670 --> 00:22:44,130
the case of a continuous random variable;
we will see that just to relate how these
258
00:22:44,130 --> 00:22:48,990
quantities can be called moments.
260
00:22:49,990 --> 00:22:58,050
So, for the probabilistic description of a random variable,
if we take some random
261
00:22:58,050 --> 00:23:04,160
observations of a random variable, some random
sample data, then we
262
00:23:04,160 --> 00:23:09,660
can see the following. The probabilistic
characteristics of a random variable can be
263
00:23:09,660 --> 00:23:16,090
described completely if the form of the distribution
function, the PDF or PMF, is known,
264
00:23:16,090 --> 00:23:22,120
as we have discussed so far, and the
associated parameters are specified. For
265
00:23:22,120 --> 00:23:27,970
example, for the exponential distribution,
lambda e power minus lambda x,
266
00:23:27,970 --> 00:23:33,060
lambda is the parameter. So if you know
the density function, the PDF, that
267
00:23:33,060 --> 00:23:36,180
is,
that it has the exponential form, and if you know the
268
00:23:36,180 --> 00:23:40,910
value of lambda, then it is completely known
to us; the total description is given
269
00:23:40,910 --> 00:23:44,950
to us. This is what is meant by this
point.
270
00:23:44,950 --> 00:23:50,940
Now, in real-life scenarios where the
nature of the distribution function of the
271
00:23:50,940 --> 00:23:54,910
random
variable is not known, an approximate description
272
00:23:54,910 --> 00:24:00,290
becomes necessary. The approximate
description of the probabilistic characteristics
273
00:24:00,290 --> 00:24:04,730
of the random variable can be given in
terms of the main descriptors of the random
274
00:24:04,730 --> 00:24:09,930
variable. So, if we do not
know the exact closed form of the distribution,
275
00:24:09,930 --> 00:24:14,650
these main descriptors
become very important for knowing the nature
276
00:24:14,650 --> 00:24:18,460
of the distribution of that particular
random variable.
278
00:24:19,460 --> 00:24:27,800
So, the first four descriptors
are very important. Obviously there
279
00:24:27,800 --> 00:24:31,390
are
higher-order ones, but the first four are the most important
280
00:24:31,390 --> 00:24:35,300
and are mainly responsible for the nature
of
281
00:24:35,300 --> 00:24:39,730
the distribution. The first one is the measure
of central tendency. The measure of central
282
00:24:39,730 --> 00:24:46,320
tendency tells where the centre is. Now,
the centre is again a subjective word: in
283
00:24:46,320 --> 00:24:49,670
what
sense are we looking for the centre? So,
284
00:24:49,670 --> 00:24:56,180
given a random variable with its distribution,
there are
285
00:24:56,180 --> 00:24:58,181
three different ways in which you can say
what
286
00:24:58,181 --> 00:25:03,220
its central tendency is. The second one
is the measure of dispersion: about
287
00:25:03,220 --> 00:25:06,800
that
central point, how is the distribution dispersed,
288
00:25:06,800 --> 00:25:10,120
how is it spread around that particular
central value?
289
00:25:10,120 --> 00:25:14,500
Then the measure of skewness: whether the distribution
is skewed or
290
00:25:14,500 --> 00:25:16,890
whether it
is symmetric. So this is a measure of symmetry:
291
00:25:16,890 --> 00:25:21,820
you can say whether it is
symmetrical or has some skewness either
292
00:25:21,820 --> 00:25:25,400
to the left or to the right of the central point.
And
293
00:25:25,400 --> 00:25:31,400
then the measure of peakedness: whether the peak
of the distribution is very high or low,
294
00:25:31,400 --> 00:25:37,060
and so on. We will go through these one after another,
starting with the measure of central
295
00:25:37,060 --> 00:25:40,750
tendency.
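The four descriptors listed above have standard moment-based formulas for a discrete distribution. The lecture defines them in detail later, so treat this as a preview sketch under stated assumptions: skewness is taken as the coefficient mu3 / sigma cubed, and kurtosis as the plain fourth standardized moment mu4 / sigma to the fourth (not the "excess" version that subtracts 3). The `skewed` PMF is hypothetical.

```python
import math

def descriptors(pmf):
    """First four descriptors of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())                 # central tendency
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())    # dispersion
    sd = math.sqrt(var)
    mu3 = sum((x - mean) ** 3 * p for x, p in pmf.items())    # third central moment
    mu4 = sum((x - mean) ** 4 * p for x, p in pmf.items())    # fourth central moment
    return {
        "mean": mean,
        "variance": var,
        "std_dev": sd,
        "skewness": mu3 / sd ** 3,   # symmetry: 0 for a symmetric distribution
        "kurtosis": mu4 / sd ** 4,   # peakedness (fourth standardized moment)
    }

fair_die = {k: 1 / 6 for k in range(1, 7)}
skewed = {1: 0.5, 2: 0.25, 3: 0.15, 4: 0.1}   # hypothetical, long right tail

print(descriptors(fair_die))
print(descriptors(skewed))
```

For the fair die the mean is 3.5, the variance is 35/12 and the skewness is 0, as expected for a symmetric PMF; the hypothetical skewed PMF gives a positive skewness because its tail extends to the right.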
296
00:25:40,750 --> 00:25:46,420
The central value of a random variable: within the range of the possible values of
297
00:25:46,420 --> 00:25:50,090
a
random variable, different values are associated
298
00:25:50,090 --> 00:26:00,210
with different probability densities, so
the central value cannot generally be expressed
299
00:26:00,210 --> 00:26:06,030
as the midpoint of the possible
range.
301
00:26:07,030 --> 00:26:16,200
So just for a small thing if we just say,
that if you just take that particular example
302
00:26:16,200 --> 00:26:19,900
of the
discrete random variable first for the dice,
303
00:26:19,900 --> 00:26:27,180
we have seen that 1 2 3 4 5 and 6 and if you
see the all these things are equation probable,
304
00:26:27,180 --> 00:26:32,090
then we can say the central is somewhere
around 3.5, the outcome central tendency is
305
00:26:32,090 --> 00:26:36,170
3.5.
Now, think in the situation, where this out
306
00:26:36,170 --> 00:26:41,530
are not equally likely, so the mass is
concentrated differently: for 1 it is here, for 2 it
307
00:26:41,530 --> 00:26:46,370
is here, for 3 it is here, for 4 it is here; if
it is like this,
308
00:26:46,370 --> 00:26:55,270
then the midpoint of the outcomes
may not be the central tendency of this
309
00:26:55,270 --> 00:26:59,520
distribution. When I draw these dots, I
have of course kept in mind that the
310
00:26:59,520 --> 00:27:04,980
sum of all these masses must equal
one; the dots are drawn only
311
00:27:04,980 --> 00:27:10,160
after satisfying all the axioms that are needed
before I can declare that this is a valid
312
00:27:10,160 --> 00:27:16,240
PMF.
So what we should understand here about
313
00:27:16,240 --> 00:27:22,740
central tendency is that it
need not always be close to the midpoint of
314
00:27:22,740 --> 00:27:29,270
the observed values; it depends on
how much probability is associated with each
315
00:27:29,270 --> 00:27:35,930
outcome of that particular random variable.
This is what is meant by central
316
00:27:35,930 --> 00:27:39,570
tendency.
There are three different
317
00:27:39,570 --> 00:27:44,110
ways to express the
central
318
00:27:44,110 --> 00:27:51,070
tendency: the central value of a random variable
can be expressed in terms of three
319
00:27:51,070 --> 00:27:55,101
quantities.
The first is the mean or expected value, the second
320
00:27:55,101 --> 00:27:57,540
is the mode, and the third is the median.
321
00:27:57,540 --> 00:27:58,540
322
00:27:58,540 --> 00:28:04,500
We will first look at the mean, which is also called the expected
value of the random variable. The
323
00:28:04,500 --> 00:28:11,390
mean value or expected value of a random
variable is the weighted average of the
324
00:28:11,390 --> 00:28:18,900
different values of the random variable, based
on their associated probabilities. For a
325
00:28:18,900 --> 00:28:29,040
discrete random variable X with PMF pX(xi),
the expected value is E(X) = sum over i of xi
326
00:28:29,040 --> 00:28:37,980
multiplied by its associated probability pX(xi).
So instead of simply taking the
327
00:28:37,980 --> 00:28:44,450
plain average of the outcomes, what we
are doing is taking the
328
00:28:44,450 --> 00:28:51,580
weighted sum, that is, the weighted average, where the weights
are the associated
329
00:28:51,580 --> 00:28:57,490
probabilities. This is the expected value.
Now, instead of multiplying by the same value
330
00:28:57,490 --> 00:29:02,400
here, that is, 1/6 for
each and every outcome, I am multiplying
331
00:29:02,400 --> 00:29:09,300
each outcome by its own
probability. This could be anything, but
332
00:29:09,300 --> 00:29:14,360
when I put the weight for this
value 2, it is very high, so
333
00:29:14,360 --> 00:29:22,210
the central tendency will be pulled
towards this particular outcome,
334
00:29:22,210 --> 00:29:27,420
because here the probability
mass is very high compared
335
00:29:27,420 --> 00:29:32,800
to the other outcomes.
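As a minimal sketch (not from the lecture), the weighted-average definition can be checked in Python with exact fractions: the fair die gives the midpoint 3.5, while a hypothetical loaded die with half its mass at 2 has its mean pulled towards 2. The loaded-die probabilities and the function name expected_value are illustrative assumptions.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum of x * pX(x) over all outcomes x of a discrete PMF."""
    if sum(pmf.values()) != 1:
        raise ValueError("PMF masses must sum to 1")
    return sum(x * p for x, p in pmf.items())

# Fair die: every outcome has mass 1/6, so E(X) is the midpoint 3.5.
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

# Hypothetical loaded die with most of its mass at 2 (illustrative numbers only).
loaded_die = {1: Fraction(1, 10), 2: Fraction(1, 2), 3: Fraction(1, 5),
              4: Fraction(1, 10), 5: Fraction(1, 20), 6: Fraction(1, 20)}

print(expected_value(fair_die))    # 7/2
print(expected_value(loaded_die))  # 53/20, i.e. 2.65, pulled towards 2
```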
Now, applying the same concept
336
00:29:32,800 --> 00:29:35,770
to a continuous random variable:
for
337
00:29:35,770 --> 00:29:43,540
a continuous random variable X with pdf
fX(x), the expected value is obtained over the full
338
00:29:43,540 --> 00:29:49,950
support of the random variable X: the pdf multiplied by
339
00:29:49,950 --> 00:29:55,300
x and integrated, E(X) = integral of x fX(x) dx,
gives you the expected value of
340
00:29:55,300 --> 00:29:58,350
the random variable X. We will
341
00:29:58,350 --> 00:30:02,540
discuss in a minute how this is related
to moments; we will come to
342
00:30:02,540 --> 00:30:11,300
that point a little later.
So, as we have seen, when we talk
343
00:30:11,300 --> 00:30:18,220
about the expected value of a particular
random variable, we multiply the
344
00:30:18,220 --> 00:30:25,500
probability by the value
x itself, that is, by the outcome x. Now,
345
00:30:25,500 --> 00:30:29,200
instead of that, you can get
the
346
00:30:29,200 --> 00:30:33,350
expected value of any function of the random
variable. Functions of a random
347
00:30:33,350 --> 00:30:38,860
variable will be discussed in subsequent classes,
but what we are
348
00:30:38,860 --> 00:30:46,010
trying to say here is that the expected
value can be obtained for any function
349
00:30:46,010 --> 00:30:52,010
of the random variable as well, simply by
replacing this x, here in the sum and
350
00:30:52,010 --> 00:30:54,880
here in the integral,
by that particular function.
351
00:30:54,880 --> 00:30:55,880
352
00:30:55,880 --> 00:31:00,450
So here g(X) is
a function of the random variable X. When
353
00:31:00,450 --> 00:31:03,650
X
is discrete, we use the
354
00:31:03,650 --> 00:31:09,090
summation, multiplying g(xi) by the
individual masses, that is, the probability mass of
355
00:31:09,090 --> 00:31:13,350
each particular outcome, and when X is
continuous we take the integral:
356
00:31:13,350 --> 00:31:18,230
instead of multiplying by x as in E(X), we
multiply by the function g(x)
357
00:31:18,230 --> 00:31:22,910
times fX(x) dx, and so we get the expected value
of
358
00:31:22,910 --> 00:31:30,870
the function of the
random variable, E[g(X)].
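A rough Python sketch of E[g(X)] under stated assumptions: the discrete case sums g(xi) pX(xi), and the continuous case approximates the integral of g(x) fX(x) dx with a simple midpoint rule. The pdf f(x) = 2x on [0, 1] and the helper names are hypothetical, chosen only because the exact answers (E[X] = 2/3, E[X^2] = 1/2) are easy to check.

```python
from fractions import Fraction

def expect_discrete(g, pmf):
    """E[g(X)] = sum over i of g(xi) * pX(xi)."""
    return sum(g(x) * p for x, p in pmf.items())

def expect_continuous(g, pdf, a, b, n=100_000):
    """E[g(X)] = integral of g(x) * fX(x) dx over [a, b], midpoint rule with n strips."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h          # midpoint of the i-th strip
        total += g(x) * pdf(x)
    return total * h

def triangular_pdf(x):
    """Hypothetical pdf f(x) = 2x on [0, 1]; it integrates to 1."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

print(expect_discrete(lambda x: x ** 2, fair_die))                    # 91/6
print(expect_continuous(lambda x: x, triangular_pdf, 0.0, 1.0))       # close to 2/3
print(expect_continuous(lambda x: x ** 2, triangular_pdf, 0.0, 1.0))  # close to 1/2
```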
359
00:31:30,870 --> 00:31:31,870
360
00:31:31,870 --> 00:31:38,830
Now, if we know the probability
density function, we can calculate the
361
00:31:38,830 --> 00:31:44,120
mean. If instead we have some observations
and from those observations we want to get
362
00:31:44,120 --> 00:31:50,980
a sample estimate, there are
different criteria to check before I declare that
363
00:31:50,980 --> 00:31:54,360
this is
a valid estimator of that particular
364
00:31:54,360 --> 00:32:02,650
quantity; these will be discussed later. They
include consistency and unbiasedness.
365
00:32:02,650 --> 00:32:05,080
Assuming those criteria are satisfied,
if
366
00:32:05,080 --> 00:32:10,600
we have n observations of a particular
random variable, then the mean of that
367
00:32:10,600 --> 00:32:15,110
random variable can be estimated
as x bar = (1/n) times the sum of the xi, where n is the total number of
368
00:32:15,110 --> 00:32:20,330
observations taken of the random variable:
we sum them up and divide by the total
369
00:32:20,330 --> 00:32:32,940
number of observations, and what we get is the
mean of that sample.
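A small illustrative sketch of the sample mean: sum the n observations and divide by n. The data are simulated fair-die rolls (a hypothetical sample, seeded so the run is repeatable), so the sample mean should land close to the population mean of 3.5; statistics.fmean from the standard library gives the same value.

```python
import random
import statistics

def sample_mean(xs):
    """x bar = (1/n) * (x1 + x2 + ... + xn)."""
    if not xs:
        raise ValueError("need at least one observation")
    return sum(xs) / len(xs)

# Hypothetical data: 100,000 simulated rolls of a fair die (seeded for repeatability).
rng = random.Random(1)
rolls = [rng.randint(1, 6) for _ in range(100_000)]

xbar = sample_mean(rolls)
print(xbar)                     # close to the population mean 3.5
print(statistics.fmean(rolls))  # same value from the standard library
```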
370
00:32:32,940 --> 00:32:33,940
371
00:32:33,940 --> 00:32:39,010
The other two measures of central tendency are the mode
and the median of the random variable. The
372
00:32:39,010 --> 00:32:45,120
mode is the most probable
value of the random variable, that is, out of
373
00:32:45,120 --> 00:32:51,370
the different outcomes in the discrete case,
or over the whole range in the continuous case,
374
00:32:51,370 --> 00:32:59,750
the value that is most probable; for a continuous variable
it is the value of the random variable
375
00:32:59,750 --> 00:33:04,160
with the highest probability density. When
we say probability
376
00:33:04,160 --> 00:33:09,200
density, this refers to the continuous case. If
we look at the discrete example shown here, we see that at
377
00:33:09,200 --> 00:33:13,030
2
the probability mass concentration is
378
00:33:13,030 --> 00:33:16,779
maximum, so the mode here is
2.
379
00:33:16,779 --> 00:33:17,779
380
00:33:17,779 --> 00:33:26,880
Now, if this is
some probability density function, then
381
00:33:26,880 --> 00:33:36,540
the point where the
density is maximum is your mode. Again, if
382
00:33:36,540 --> 00:33:41,000
you see that there are some valleys and
some peaks, there may be other
383
00:33:41,000 --> 00:33:46,280
modes; such distributions are generally known as
bimodal or multimodal. This one has
384
00:33:46,280 --> 00:33:51,470
more than one mode: this is the secondary
mode here, where there is a secondary peak.
385
00:33:51,470 --> 00:33:56,090
For example, the standard normal distribution
is unimodal, and its mode is
386
00:33:56,090 --> 00:34:04,310
here at 0. We will see that the
mean, mode and median are the same
387
00:34:04,310 --> 00:34:08,149
for this kind of symmetric distribution,
such as the normal distribution; that
388
00:34:08,149 --> 00:34:11,010
we
will see later. What we mean here is that,
389
00:34:11,010 --> 00:34:18,899
for a discrete random variable, the point where
the probability mass is maximum is your
390
00:34:18,899 --> 00:34:23,869
mode, and for a continuous one, the point where the probability density is
maximum is the mode of that random
391
00:34:23,869 --> 00:34:30,840
variable.
Next is the median. The median is the value
392
00:34:30,840 --> 00:34:38,260
of the random variable such that values
on either side of it are equally probable. If
393
00:34:38,260 --> 00:34:42,649
xm is the median of a random variable
X,
394
00:34:42,649 --> 00:34:51,580
then FX(xm) = 0.5.
Now, to see how this works,
395
00:34:51,580 --> 00:34:55,340
let us discuss it first in the case
of a
396
00:34:55,340 --> 00:35:00,820
continuous random variable. I start
integrating the pdf from the left end of its support,
397
00:35:00,820 --> 00:35:05,730
which may be minus infinity; from there
I keep integrating and stop
398
00:35:05,730 --> 00:35:11,050
at
the point where the total area covered is
399
00:35:11,050 --> 00:35:18,940
equal to 0.5. This is the point
where the probability of being less than that particular
400
00:35:18,940 --> 00:35:25,750
value is 0.5 and of being greater than that particular
value is 0.5, so this value is your median.
401
00:35:25,750 --> 00:35:34,770
Similarly, for a discrete random variable, you keep adding up the probabilities,
and the value
402
00:35:34,770 --> 00:35:40,980
at which the corresponding CDF first reaches 0.5,
that is, where adding this plus this
403
00:35:40,980 --> 00:35:46,730
reaches 0.5,
is the median of this discrete random
404
00:35:46,730 --> 00:35:53,860
variable. So the median means that values less than
that particular value and values
405
00:35:53,860 --> 00:35:59,000
higher than that value
are equally probable in total: the left-hand
406
00:35:59,000 --> 00:36:01,190
side total probability is 0.5, the right-hand
407
00:36:01,190 --> 00:36:04,620
side total probability is 0.5, and that dividing point
is your
408
00:36:04,620 --> 00:36:05,710
median.
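As a sketch of the two discrete definitions above (not the lecture's own code): the mode is the outcome with the largest mass, and the median is taken here as the smallest x at which the cumulative sum of the masses reaches 0.5. That tie-breaking convention is an assumption; for the fair die, for instance, any value between 3 and 4 splits the mass equally, and this convention picks 3. The loaded-die masses are hypothetical.

```python
from fractions import Fraction

def pmf_mode(pmf):
    """Value with the largest probability mass (the first one found if there is a tie)."""
    return max(pmf, key=pmf.get)

def pmf_median(pmf):
    """Smallest x at which the CDF F(x) = P(X <= x) reaches 0.5."""
    cdf = Fraction(0)
    for x in sorted(pmf):
        cdf += pmf[x]
        if cdf >= Fraction(1, 2):
            return x
    raise ValueError("probability masses never reach 0.5")

# Hypothetical loaded die with most of its mass at 2 (illustrative numbers only).
loaded_die = {1: Fraction(1, 10), 2: Fraction(1, 2), 3: Fraction(1, 5),
              4: Fraction(1, 10), 5: Fraction(1, 20), 6: Fraction(1, 20)}
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}

print(pmf_mode(loaded_die), pmf_median(loaded_die))  # 2 2
print(pmf_median(fair_die))                          # 3
```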
409
00:36:05,710 --> 00:36:06,710
410
00:36:06,710 --> 00:36:12,080
Now, coming back to the
normal distribution: when we say that this
411
00:36:12,080 --> 00:36:18,460
is a normal distribution, it is completely
symmetric, and it has some
412
00:36:18,460 --> 00:36:24,380
mean here. In this case you see that
exactly at the point where the curve
413
00:36:24,380 --> 00:36:30,170
reaches its peak, because of the
symmetry, the area covered to its left is 0.5,
414
00:36:30,170 --> 00:36:37,880
so
this is your median. Now if you do the integration
415
00:36:37,880 --> 00:36:45,130
to get the mean, you will see that
exactly this point is your mean, and
416
00:36:45,130 --> 00:36:53,090
since this is the highest-density point,
the same point is your mode. So for the normal
417
00:36:53,090 --> 00:36:59,820
distribution, the mean, the mode and the median,
which is where 50 percent of the probability
418
00:36:59,820 --> 00:37:06,890
is covered, all coincide. For
this normal distribution the mean, mode and
419
00:37:06,890 --> 00:37:12,559
median are the same point, because it is a
symmetric, unimodal distribution.
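A quick numerical check of this coincidence, assuming hypothetical parameters mu = 10 and sigma = 2 and using Python's statistics.NormalDist: the mean is the mu parameter, the median is the point where the CDF equals 0.5, and the mode is located by searching a fine grid for the highest density.

```python
from statistics import NormalDist

nd = NormalDist(mu=10.0, sigma=2.0)  # hypothetical parameters

mean = nd.mean                 # the mu parameter
median = nd.inv_cdf(0.5)       # point where the CDF equals 0.5

# Mode: search a grid with spacing 0.001 over mean +/- 5 sigma for the highest density.
step = 0.001
lo = nd.mean - 5 * nd.stdev
n_steps = round(10 * nd.stdev / step)
grid = [lo + k * step for k in range(n_steps + 1)]
mode = max(grid, key=nd.pdf)

print(mean, median, mode)      # all (approximately) 10.0
```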
420
00:37:12,559 --> 00:37:13,559
421
00:37:13,559 --> 00:37:23,800
So, the mean, mode and median
are each a measure of the central value of
422
00:37:23,800 --> 00:37:29,400
a random variable. The mean, mode and median
of a random variable X are
423
00:37:29,400 --> 00:37:36,430
conventionally denoted by x bar for the
mean, x tilde for the mode, and xm
424
00:37:36,430 --> 00:37:40,080
for
the median of the particular random variable
425
00:37:40,080 --> 00:37:43,920
X.
If the pdf, the probability density function, of
426
00:37:43,920 --> 00:37:50,610
a random variable is symmetric and unimodal,
that is, it has
427
00:37:50,610 --> 00:37:56,280
only one mode, as the normal distribution does,
then the mean, mode and median
428
00:37:56,280 --> 00:38:01,480
coincide, just as we discussed.
429
00:38:01,480 --> 00:38:08,970
Now, the second descriptor of a random
variable is dispersion. By dispersion
430
00:38:08,970 --> 00:38:11,210
we
mean the spread about the mean; that
431
00:38:11,210 --> 00:38:17,651
is, once I identify where it tends
towards the center, I want to know how it
432
00:38:17,651 --> 00:38:23,340
is distributed about that mean. The
dispersion of a random variable corresponds
433
00:38:23,340 --> 00:38:31,400
to how closely the values of the variate are
clustered, or how widely they are spread, around
434
00:38:31,400 --> 00:38:34,490
the central value. In the following
figure,
435
00:38:34,490 --> 00:38:40,590
X1 and X2 have the
same mean, but their dispersion about the
436
00:38:40,590 --> 00:38:46,730
mean is different: this
one, for X1, that is f(x1),
437
00:38:46,730 --> 00:38:50,070
is
less dispersed than the other one,
438
00:38:50,070 --> 00:38:51,580
which is f(x2).
439
00:38:51,580 --> 00:38:59,040
Now, how do we measure this,
that is, what are the measures of dispersion? There are
440
00:38:59,040 --> 00:39:05,060
generally three measures that we use: the
first is the variance, which is denoted
441
00:39:05,060 --> 00:39:08,651
sigma
x squared; the standard deviation, sigma x; and the
442
00:39:08,651 --> 00:39:11,119
coefficient of variation, CV.
443
00:39:11,119 --> 00:39:12,119
444
00:39:12,119 --> 00:39:17,050
Now, the variance, which is also
generally denoted Var(X)
445
00:39:17,050 --> 00:39:21,800
for a random variable X, is a measure of
446
00:39:21,800 --> 00:39:28,960
the dispersion of the variate, taking the mean
as the central value. When we measure
447
00:39:28,960 --> 00:39:37,220
the spread, out of the three
measures of central tendency, if we pick
448
00:39:37,220 --> 00:39:39,700
the mean and then calculate how the variable
is
449
00:39:39,700 --> 00:39:44,260
spread around that mean, that is known
as the variance.
450
00:39:44,260 --> 00:39:53,540
For a discrete random variable X with PMF
pX(xi), the variance of X
451
00:39:53,540 --> 00:39:57,600
is
equal to the sum of (xi minus mu x)
452
00:39:57,600 --> 00:40:03,960
squared, multiplied by pX(xi). This is the
measure of variance; I will explain
453
00:40:03,960 --> 00:40:07,230
what this means in the context
of
454
00:40:07,230 --> 00:40:12,590
area, that is, in the context of moments,
and how it arises. Here mu x, as I said,
455
00:40:12,590 --> 00:40:16,030
is
nothing but the expected value of that particular
456
00:40:16,030 --> 00:40:17,030
random variable X.
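A minimal sketch of the discrete variance formula (illustrative, not from the lecture), using exact fractions for the fair die; the function name pmf_variance is hypothetical.

```python
from fractions import Fraction

def pmf_variance(pmf):
    """Var(X) = sum over i of (xi - mu_x)^2 * pX(xi), where mu_x = E(X)."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
print(pmf_variance(fair_die))  # 35/12, roughly 2.92
```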
457
00:40:17,030 --> 00:40:18,030
458
00:40:18,030 --> 00:40:26,540
Let us first complete this before
I go to the pictorial representation.
459
00:40:26,540 --> 00:40:31,280
This
is for the discrete random variable, and this
460
00:40:31,280 --> 00:40:37,250
one is for the continuous random variable:
if the continuous random variable X has
461
00:40:37,250 --> 00:40:45,320
the pdf fX(x), then the variance of
X is the integral of (x minus mu x) squared, where mu x is the expected
462
00:40:45,320 --> 00:40:51,400
value, multiplied by fX(x) dx;
this gives you the variance
463
00:40:51,400 --> 00:40:55,150
of X. Now, if you expand
it a
464
00:40:55,150 --> 00:41:00,270
little, we arrive at this form:
expand the square (x minus mu x) squared and take
465
00:41:00,270 --> 00:41:02,610
each
term separately. The first term is the integral of
466
00:41:02,610 --> 00:41:08,340
x squared multiplied by f(x) dx.
Now, recall the expected value
467
00:41:08,340 --> 00:41:12,590
of a function; here the function is
x
468
00:41:12,590 --> 00:41:16,920
squared, so the integral of x squared times f(x) dx is nothing
but the expected value of X squared, E(X squared), where
469
00:41:16,920 --> 00:41:19,110
x squared
is the function of the random variable. The
470
00:41:19,110 --> 00:41:24,500
second term is minus 2 mu x times x; here mu x
is a constant, since it is
471
00:41:24,500 --> 00:41:29,970
already known, so 2
mu x can be taken out: 2 mu x multiplied by the integral of
472
00:41:29,970 --> 00:41:37,310
x f(x) dx, and the integral of x f(x) dx is nothing but the
expected value E(X). The last term, mu x squared, is again a constant,
473
00:41:37,310 --> 00:41:42,710
and since f(x) integrates to 1, it gives plus mu x squared. Now,
since E(X) is
474
00:41:42,710 --> 00:41:47,330
mu x, we get minus 2 mu x squared plus mu x squared, so
the variance is
475
00:41:47,330 --> 00:41:54,720
E(X squared) minus mu x squared; this is another
way of representing the variance.
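Written out term by term, the expansion described above is the following (the last step uses the fact that a pdf integrates to 1); the same steps with sums in place of integrals give the discrete case.

```latex
\begin{aligned}
\operatorname{Var}(X) &= \int_{-\infty}^{\infty} (x-\mu_X)^2 \, f_X(x)\,dx \\
 &= \int_{-\infty}^{\infty} x^2 f_X(x)\,dx \;-\; 2\mu_X \int_{-\infty}^{\infty} x\, f_X(x)\,dx \;+\; \mu_X^2 \int_{-\infty}^{\infty} f_X(x)\,dx \\
 &= E(X^2) - 2\mu_X\,\mu_X + \mu_X^2 \cdot 1 \\
 &= E(X^2) - \mu_X^2 .
\end{aligned}
```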
476
00:41:54,720 --> 00:41:55,720
477
00:41:55,720 --> 00:42:05,870
The standard deviation is another
measure of dispersion, where
478
00:42:05,870 --> 00:42:11,860
the standard deviation sigma x is defined
as the positive square root of the variance;
479
00:42:11,860 --> 00:42:17,400
that is, sigma x is the square root of the variance
of X.
480
00:42:17,400 --> 00:42:21,490
And the coefficient of variation, CVx, is a
dimensionless measure of dispersion; it is
481
00:42:21,490 --> 00:42:24,940
the
ratio of the standard deviation to the mean,
482
00:42:24,940 --> 00:42:33,270
so CVx equals sigma x divided by mu x.
Now, to see how this leads to the higher
483
00:42:33,270 --> 00:42:41,620
moments, let me explain
it here. That
484
00:42:41,620 --> 00:42:49,270
is, what we are doing when we measure
the spread around
485
00:42:49,270 --> 00:42:54,280
the mean is this:
first, we take this
486
00:42:54,280 --> 00:42:57,501
as the mean, and we take x minus
mu x.
487
00:42:57,501 --> 00:42:58,501
488
00:42:58,501 --> 00:43:02,120
That is, for a particular value x, I take x
minus mu x. Suppose that
489
00:43:02,120 --> 00:43:12,330
the origin is here; I take
the value x here. This is your x, this is your
490
00:43:12,330 --> 00:43:15,570
mu
x, which is your mean, so I am
491
00:43:15,570 --> 00:43:20,330
taking x minus mu x, which is nothing but
the distance from the mean to that particular
492
00:43:20,330 --> 00:43:26,940
value. Now I take the density
493
00:43:26,940 --> 00:43:31,040
at that point, multiply it by dx, and multiply
that
494
00:43:31,040 --> 00:43:36,030
small area by the squared distance to the mean; this is
a kind of second moment, taken
495
00:43:36,030 --> 00:43:39,150
about
the mean of the distribution.
496
00:43:39,150 --> 00:43:47,290
We will see in a moment how it is represented
as the second moment about the mean,
497
00:43:47,290 --> 00:43:53,600
and we can continue this way: the mean is nothing
but the first moment with respect to the
498
00:43:53,600 --> 00:43:58,220
origin, and the variance is the second moment with
respect to the mean. In the same way I can
499
00:43:58,220 --> 00:44:01,760
go to the third moment with respect to the mean, the
fourth moment with respect to the mean,
500
00:44:01,760 --> 00:44:07,610
the fifth, the sixth, and so on, and each
moment gives some property of
501
00:44:07,610 --> 00:44:14,610
the distribution. For example, the second
moment, this one, with respect to the
502
00:44:14,610 --> 00:44:20,430
mean gives you the measure of dispersion,
that is, how the variable is dispersed around the mean.
503
00:44:20,430 --> 00:44:26,990
We will come to how it is represented
in terms of mu in particular. Before that,
504
00:44:26,990 --> 00:44:33,960
note that here this distance,
squared, is multiplied by the density, that is,
505
00:44:33,960 --> 00:44:38,620
f(x)
multiplied by dx, which gives the infinitesimally
506
00:44:38,620 --> 00:44:41,010
small area.
507
00:44:41,010 --> 00:44:42,010
508
00:44:42,010 --> 00:44:45,960
Now, if I want the sample statistics
of these measures of dispersion: the sample
509
00:44:45,960 --> 00:44:50,490
statistic for the variance is given as follows. A
sample statistic means that, again, for the
510
00:44:50,490 --> 00:44:53,930
random
variable some observations xi are taken. If
511
00:44:53,930 --> 00:45:01,340
the number of samples is n, then we take xi
minus x bar, where x bar is the mean of that particular
512
00:45:01,340 --> 00:45:08,720
sample, the sample mean; we square these differences,
sum them up and divide by n minus 1. This
513
00:45:08,720 --> 00:45:15,030
n minus 1 is used to make this
estimate unbiased.
514
00:45:15,030 --> 00:45:21,540
The sample statistic for the standard deviation,
as we said, is the positive square root
515
00:45:21,540 --> 00:45:27,930
of this variance: this full quantity to the power
one half, that is, its square root,
516
00:45:27,930 --> 00:45:36,340
gives the standard deviation. For the standard
deviation we take this square root, but for
517
00:45:36,340 --> 00:45:41,990
the higher moments, when we go to the measure
of skewness and so on, we generally
518
00:45:41,990 --> 00:45:48,520
do not take the third root of that
quantity. Here, for the
519
00:45:48,520 --> 00:45:50,700
first one,
that is, for the measure of variance, we take
520
00:45:50,700 --> 00:45:55,990
the root so that, if you look at this
expression, you will see that the unit of
521
00:45:55,990 --> 00:45:58,520
the standard deviation is the same as the unit
of the
522
00:45:58,520 --> 00:46:02,460
random variable itself.
This is sometimes very helpful, and
523
00:46:02,460 --> 00:46:04,820
that is why we take the square root and
we
524
00:46:04,820 --> 00:46:12,320
call that the standard deviation, which
we do not do for the higher-order moments.
525
00:46:12,320 --> 00:46:17,191
The sample estimate of the coefficient
of variation is the sample estimate of the
526
00:46:17,191 --> 00:46:21,140
standard deviation, s, divided by
x bar, the sample mean; this is how we
527
00:46:21,140 --> 00:46:25,380
get the sample coefficient of variation.
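A sketch of the sample statistics just described, for a small hypothetical data set: the sample variance uses the n - 1 divisor, the sample standard deviation is its positive square root, and the sample CV is s divided by x bar (undefined when x bar is 0). The results are cross-checked against statistics.variance and statistics.stdev; the function name sample_dispersion is illustrative.

```python
import math
import statistics

def sample_dispersion(xs):
    """Return (x bar, s^2, s, CV) with the unbiased n - 1 divisor for s^2."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # sample variance
    s = math.sqrt(s2)                                # sample standard deviation
    cv = s / xbar                                    # raises ZeroDivisionError if xbar == 0
    return xbar, s2, s, cv

obs = [4.0, 7.0, 6.0, 3.0, 5.0]  # hypothetical observations
xbar, s2, s, cv = sample_dispersion(obs)
print(xbar, s2, s, cv)           # 5.0 2.5 1.58... 0.316...
```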
528
00:46:25,380 --> 00:46:26,380
529
00:46:26,380 --> 00:46:31,570
Now let us take one example problem for these
quantities, that is, the mean
530
00:46:31,570 --> 00:46:36,220
and the standard deviation and variance.
We use the same example that
531
00:46:36,220 --> 00:46:41,620
we discussed in the context of the
CDF: the time between two successive rail
532
00:46:41,620 --> 00:46:47,350
accidents can be described by an exponential
pdf, fT(t) = lambda e to the power
533
00:46:47,350 --> 00:46:53,240
minus lambda t for t greater than or equal to
0, and 0 elsewhere, that is, for t less
534
00:46:53,240 --> 00:46:58,460
than 0.
We have seen the pdf and CDF earlier,
535
00:46:58,460 --> 00:47:04,410
that is, what the pdf and CDF look like in
such cases. We have to find the mean,
536
00:47:04,410 --> 00:47:10,090
mode, median and coefficient of variation, that is,
all the statistics we have seen
537
00:47:10,090 --> 00:47:14,560
so far; we will see how each of them
can be calculated
538
00:47:14,560 --> 00:47:19,180
for this exponential distribution.
539
00:47:19,180 --> 00:47:20,180
540
00:47:20,180 --> 00:47:27,720
This is the pdf of T, which we have
seen before: its value at t = 0 is lambda, and it
541
00:47:27,720 --> 00:47:31,430
is
gradually decreasing, approaching asymptotically
542
00:47:31,430 --> 00:47:37,600
zero as t goes to infinity. So this is the pdf
of the random variable T.
543
00:47:37,600 --> 00:47:38,600
544
00:47:38,600 --> 00:47:44,670
Now, the mean, that is, the mean time between
successive rail accidents: if
545
00:47:44,670 --> 00:47:49,170
this is the distribution, then we know that
mu T, the expected value of T,
546
00:47:49,170 --> 00:47:53,900
is the integral,
over the entire range of the pdf, of t multiplied
547
00:47:53,900 --> 00:48:02,320
by lambda e to the power minus lambda t, dt. If
we do this integration by
548
00:48:02,320 --> 00:48:06,150
parts, we see that mu T is
equal to
549
00:48:06,150 --> 00:48:14,260
1 over lambda. Now, one thing you can
see here is that this lambda, which we are
550
00:48:14,260 --> 00:48:17,250
getting here, can also be related to
sample estimates.
551
00:48:17,250 --> 00:48:23,360
If you take a sample of these particular
events and calculate their mean, the
552
00:48:23,360 --> 00:48:28,040
sample mean, and then take the inverse of
that sample mean, you get an
553
00:48:28,040 --> 00:48:32,470
estimate of lambda. This way of
estimating is called the method of
554
00:48:32,470 --> 00:48:38,100
moments for estimating parameters.
A few slides back we
555
00:48:38,100 --> 00:48:42,530
discussed how to estimate parameters,
and this is one of those methods,
556
00:48:42,530 --> 00:48:47,160
but that is not the topic
we are discussing here. Now we are just
557
00:48:47,160 --> 00:48:53,230
finding the mean
of this distribution: the mean, or expected value,
558
00:48:53,230 --> 00:48:57,910
of
the random variable is 1 over lambda.
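As a rough numerical check (the rate lambda = 0.5 is a hypothetical value, not one from the lecture), the integral for mu T can be approximated with a midpoint rule, truncating the infinite upper limit where the tail is negligible, and the method-of-moments idea can be tried on simulated waiting times drawn with random.Random.expovariate.

```python
import math
import random

lam = 0.5  # hypothetical rate (events per unit time), not from the lecture

def exp_pdf(t):
    """fT(t) = lam * exp(-lam * t) for t >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

# mu_T = integral from 0 to infinity of t * fT(t) dt; the upper limit is
# truncated at 60/lam, where the remaining tail is negligible (midpoint rule).
upper, n = 60 / lam, 200_000
h = upper / n
mu_numeric = h * sum((i + 0.5) * h * exp_pdf((i + 0.5) * h) for i in range(n))

# Method of moments: estimate lam as 1 / (sample mean) of simulated waiting times.
rng = random.Random(2024)
times = [rng.expovariate(lam) for _ in range(50_000)]
lam_hat = 1 / (sum(times) / len(times))

print(mu_numeric, 1 / lam)  # both close to 2.0
print(lam_hat)              # close to 0.5
```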
559
00:48:57,910 --> 00:49:07,890
So the mean, t bar or mu T, is 1 over lambda. From
the pdf it can be observed that the
560
00:49:07,890 --> 00:49:15,500
probability density is highest at t equal
to 0: if we
561
00:49:15,500 --> 00:49:17,310
just
look at the density values, we see that at
562
00:49:17,310 --> 00:49:20,750
t = 0 itself the magnitude
of
563
00:49:20,750 --> 00:49:27,980
the pdf is maximum, namely lambda. So from
the definition of the mode we can say that
564
00:49:27,980 --> 00:49:35,600
the mode, t tilde, is equal to 0. So
the mode is at 0 and the mean is at 1 over lambda. Now we
565
00:49:35,600 --> 00:49:41,340
will
see where the median is. For the median we
566
00:49:41,340 --> 00:49:47,480
have to integrate
the pdf and find the value up to which it covers
567
00:49:47,480 --> 00:49:52,850
50 percent of the total area; the total
area is 1, so we have to integrate
568
00:49:52,850 --> 00:49:59,849
from 0 up to the value tm at which the integral
equals 0.5; that value is the median.
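The median condition can also be solved numerically. This sketch (with a hypothetical lambda = 0.5) uses the exponential CDF F(t) = 1 - e^(-lambda t) and simple bisection, and compares the result with the closed form ln 2 / lambda, which is about 0.693 / lambda.

```python
import math

lam = 0.5  # hypothetical rate, not from the lecture

def exp_cdf(t):
    """F(t) = 1 - exp(-lam * t) for t >= 0, and 0 for t < 0."""
    return 1.0 - math.exp(-lam * t) if t >= 0 else 0.0

def solve_median(cdf, lo, hi, tol=1e-12):
    """Bisection for the t with cdf(t) = 0.5; assumes cdf(lo) < 0.5 <= cdf(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

tm = solve_median(exp_cdf, 0.0, 100.0)
print(tm, math.log(2) / lam)  # both about 1.386, i.e. 0.693 / lam
```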
569
00:49:59,849 --> 00:50:00,849
570
00:50:00,849 --> 00:50:06,070
That is exactly what is done here: the integral from
0 to tm, where tm is the median,
571
00:50:06,070 --> 00:50:10,030
of
the pdf should be equal to 0.5. If
572
00:50:10,030 --> 00:50:12,950
you do this integration, you get that the
median
573
00:50:12,950 --> 00:50:22,170
is tm = ln 2 over lambda, that is, tm equals
0.693 mu T, where mu T is 1 over lambda.
574
00:50:22,170 --> 00:50:27,800
Now, for the variance of T,
again we
575
00:50:27,800 --> 00:50:31,280
use
the same expression, with the random variable
576
00:50:31,280 --> 00:50:36,320
t minus 1 over lambda; 1 over lambda is
nothing but the mean, that is, the
577
00:50:36,320 --> 00:50:41,640
expected value of T. So if you take
t
578
00:50:41,640 --> 00:50:49,570
minus 1 over lambda, squared, multiplied by
the probability density, and carry out this
579
00:50:49,570 --> 00:50:54,860
integration by parts, we see that
sigma T squared, the variance, is equal
580
00:50:54,860 --> 00:50:57,110
to 1
over lambda squared.
581
00:50:57,110 --> 00:51:05,160
So we have seen that the mean is 1 over lambda for
this exponential distribution and the
582
00:51:05,160 --> 00:51:07,670
variance is 1 over lambda squared.
583
00:51:07,670 --> 00:51:13,490
Now, for the standard deviation, we know
it is the positive
584
00:51:13,490 --> 00:51:17,720
square root of the variance, so the standard deviation
is equal to 1 over lambda. So
585
00:51:17,720 --> 00:51:21,869
the mean and the standard deviation
have the same magnitude,
586
00:51:21,869 --> 00:51:26,940
namely 1 over lambda.
Then, if you want
587
00:51:26,940 --> 00:51:29,090
to calculate the coefficient of variation of
the
588
00:51:29,090 --> 00:51:33,440
exponential distribution, it is equal to sigma
T over mu T, which is 1 over
589
00:51:33,440 --> 00:51:39,930
lambda divided by 1 over lambda, that is, 1;
so the coefficient of variation equals 1.
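A minimal numerical sketch (hypothetical lambda = 0.5, midpoint rule, upper limit truncated at 60/lambda) that reproduces the closed-form results above: mean 1/lambda, variance 1/lambda^2, standard deviation 1/lambda and a coefficient of variation of 1.

```python
import math

lam = 0.5                       # hypothetical rate, not from the lecture
upper, n = 60 / lam, 200_000    # truncate the infinite upper limit at 60/lam
h = upper / n
mids = [(i + 0.5) * h for i in range(n)]         # strip midpoints
dens = [lam * math.exp(-lam * t) for t in mids]  # fT(t) at each midpoint

mu = h * sum(t * f for t, f in zip(mids, dens))               # E(T)
var = h * sum((t - mu) ** 2 * f for t, f in zip(mids, dens))  # E[(T - mu)^2]
sd = math.sqrt(var)
cv = sd / mu

print(mu, var, sd, cv)  # approximately 2, 4, 2 and 1
```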
590
00:51:39,930 --> 00:51:51,190
So there are two other descriptors:
one is the measure of skewness and the
591
00:51:51,190 --> 00:52:00,830
other is the measure of peakedness; we
will see these later. These are the
592
00:52:00,830 --> 00:52:06,140
higher
moments. What we have seen today is
593
00:52:06,140 --> 00:52:12,339
the first moment about the
origin; in the next class
594
00:52:12,339 --> 00:52:13,339
595
00:52:13,339 --> 00:52:21,560
we will see how the mean
is obtained: taking a general
596
00:52:21,560 --> 00:52:26,630
distribution, we will see how these moments
actually arise with
597
00:52:26,630 --> 00:52:34,070
respect to the origin. This is basically
your mu x, the moment with
598
00:52:34,070 --> 00:52:38,920
respect to the origin, and once we get
the moment with respect to the origin, that
599
00:52:38,920 --> 00:52:43,480
means we have the value of the mean, and
all higher moments are calculated with
600
00:52:43,480 --> 00:52:47,950
respect to that mean.
The first one we calculate with respect
601
00:52:47,950 --> 00:52:53,470
to the mean is the second moment with
respect to the mean, which is called
602
00:52:53,470 --> 00:53:00,720
the variance. One open question that I can
put before I conclude today’s class is this:
603
00:53:00,720 --> 00:53:03,369
as I said, the second moment with respect
to
604
00:53:03,369 --> 00:53:10,480
the mean is the variance; what is the first moment
with respect to the mean? The first moment
605
00:53:10,480 --> 00:53:16,100
with respect to the mean means I have
to take each value minus the mean,
606
00:53:16,100 --> 00:53:21,880
multiply by the density and
integrate. Interestingly, and this
607
00:53:21,880 --> 00:53:24,780
is
mathematically very easy to show, the
608
00:53:24,780 --> 00:53:30,540
first moment with respect to the mean is
always 0, because the weighted deviations on
609
00:53:30,540 --> 00:53:32,850
the positive side of the mean and those on
610
00:53:32,850 --> 00:53:41,360
the negative side of the mean cancel
out exactly, so the first moment
611
00:53:41,360 --> 00:53:46,040
with respect to the mean is equal to 0.
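A small sketch of moments about the mean for a discrete distribution (hypothetical loaded-die masses, exact fractions): the first central moment comes out as exactly 0, and the second central moment equals the variance E(X^2) - mu^2. The function name central_moment is illustrative.

```python
from fractions import Fraction

def central_moment(pmf, k):
    """k-th moment about the mean: sum over x of (x - mu)^k * p(x)."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** k * p for x, p in pmf.items())

# Hypothetical loaded die (illustrative masses that sum to 1).
loaded_die = {1: Fraction(1, 10), 2: Fraction(1, 2), 3: Fraction(1, 5),
              4: Fraction(1, 10), 5: Fraction(1, 20), 6: Fraction(1, 20)}

first = central_moment(loaded_die, 1)   # positive and negative deviations cancel
second = central_moment(loaded_die, 2)  # the variance
e_x2 = sum(x * x * p for x, p in loaded_die.items())
mu = sum(x * p for x, p in loaded_die.items())

print(first)                   # 0
print(second, e_x2 - mu ** 2)  # 611/400 611/400
```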
So we start with the second moment: the
612
00:53:46,040 --> 00:53:50,770
second moment is the measure of variance, the
third moment of skewness, the fourth moment of peakedness,
613
00:53:50,770 --> 00:53:55,680
and so on; there are fifth and sixth
moments as well, but basically up to the fourth
614
00:53:55,680 --> 00:54:00,290
moment is generally sufficient to describe a particular
random variable. In today’s class we discussed
615
00:54:00,290 --> 00:54:05,660
up to the variance; in the next class we will start
with the description of skewness and
616
00:54:05,660 --> 00:54:11,840
kurtosis, and we will also see in more detail
how these are related to moments.
617
00:54:11,840 --> 00:54:14,490
That will be the starting point, again
for
618
00:54:14,490 --> 00:54:22,480
the first moment and the moment generating
function of a random variable. So, thank
619
00:54:22,480 --> 00:54:25,069
you for today’s class.
620
00:54:25,069 --> 00:54:25,069