1
00:00:16,970 --> 00:00:23,830
Hello and welcome to the fourth lecture of
the current module, module 3. This module
2
00:00:23,830 --> 00:00:25,880
is on random variables.
3
00:00:25,880 --> 00:00:31,019
And in this lecture, we will cover further
descriptors of
4
00:00:31,019 --> 00:00:32,099
random variables.
5
00:00:32,099 --> 00:00:36,360
Basically, what did we do in the last class?
6
00:00:36,360 --> 00:00:42,800
We started with the description
of the CDF, and after that we introduced some of
7
00:00:42,800 --> 00:00:48,769
the descriptors of a random variable;
8
00:00:48,769 --> 00:00:56,699
in the absence of a proper definition of the
pdf and CDF, if we know some sample data,
9
00:00:56,699 --> 00:01:01,160
then from the sample data we
can define some descriptors of the random
10
00:01:01,160 --> 00:01:08,270
variable, and with those we get some idea about
the behavior of the pdf of that
11
00:01:08,270 --> 00:01:11,869
particular random variable.
12
00:01:11,869 --> 00:01:17,310
So, in the last class, we saw the central
tendency, and we saw that the central tendency
13
00:01:17,310 --> 00:01:21,690
is expressed in terms of the mean, mode
or median.
14
00:01:21,690 --> 00:01:25,960
And then we saw the measure of
dispersion, which is expressed in
15
00:01:25,960 --> 00:01:28,940
terms of variance and standard deviation.
16
00:01:28,940 --> 00:01:33,430
So,
in today’s lecture we will start with
17
00:01:33,430 --> 00:01:37,220
some further descriptors of this random
variable.
19
00:01:38,220 --> 00:01:41,540
And we will start with skewness and kurtosis.
20
00:01:41,540 --> 00:01:47,520
Skewness and kurtosis are the higher
moments, following the analogy with moments
21
00:01:47,520 --> 00:01:52,440
that we gave in the last
lecture.
22
00:01:52,440 --> 00:01:56,299
These are the higher moments with respect
to the mean
23
00:01:56,299 --> 00:02:03,400
and standard deviation. So we will first see
skewness and kurtosis.
24
00:02:03,400 --> 00:02:06,260
Then we
will see the analogy with the moment of an area
25
00:02:06,260 --> 00:02:08,690
that we indicated in the last class.
26
00:02:08,690 --> 00:02:11,330
And
with this, we will introduce the moment generating
27
00:02:11,330 --> 00:02:16,120
function and the characteristic function; these
two are very useful functions, and if we can
28
00:02:16,120 --> 00:02:21,980
define these functions, then we will see the
description of all such moments, that is,
29
00:02:21,980 --> 00:02:27,000
basically the mean and standard deviation, or
mean, variance, skewness and kurtosis; this is
30
00:02:27,000 --> 00:02:28,000
up to the fourth moment.
31
00:02:28,000 --> 00:02:30,510
And above that,
higher moments are also possible.
32
00:02:30,510 --> 00:02:35,480
So, we will see how, if we know the moment
generating function and the characteristic
33
00:02:35,480 --> 00:02:41,120
function, a moment of
any order can
34
00:02:41,120 --> 00:02:43,880
be defined with the help of these two functions.
35
00:02:43,880 --> 00:02:47,099
Then we will see, because these are for
distributions, that
36
00:02:47,099 --> 00:02:54,220
if some sample data
is available, we can
37
00:02:54,220 --> 00:02:55,220
represent that
38
00:02:55,220 --> 00:02:59,230
particular sample data graphically,
so that we can get some idea
39
00:02:59,230 --> 00:03:03,190
about these descriptors with respect to the
graphical representation.
41
00:03:04,190 --> 00:03:10,780
So, this is what we will try to cover in today’s
lecture, and we will start with skewness.
42
00:03:10,780 --> 00:03:15,670
As we
said, the skewness of a random variable
43
00:03:15,670 --> 00:03:19,380
is the asymmetry of its probability
distribution.
44
00:03:19,380 --> 00:03:24,130
So, the measure of skewness
may be expressed in terms of
45
00:03:24,130 --> 00:03:28,090
the expectation of (X minus mu x) cubed.
46
00:03:28,090 --> 00:03:31,069
If you recall, what did we discuss
in the last class?
47
00:03:31,069 --> 00:03:32,069
We discussed
48
00:03:32,069 --> 00:03:36,110
that when we take the first moment,
when we are looking for the central
49
00:03:36,110 --> 00:03:40,030
tendency, we take the moment with respect
to the origin.
50
00:03:40,030 --> 00:03:46,000
And so, it gives the
distance from the origin to the centroid of the pdf, which
51
00:03:46,000 --> 00:03:49,330
is nothing but its mean, the location
of
52
00:03:49,330 --> 00:03:54,410
its mean, which we also interpreted in terms of a
graphical representation.
53
00:03:54,410 --> 00:04:00,330
And now, all the higher order moments
we take with respect to the mean,
54
00:04:00,330 --> 00:04:01,330
if we
take them.
55
00:04:01,330 --> 00:04:05,850
And in the last class, we concluded that the
first moment with respect to the mean
56
00:04:05,850 --> 00:04:07,209
becomes 0.
57
00:04:07,209 --> 00:04:12,990
So, this skewness that we are talking
about is also a moment with
58
00:04:12,990 --> 00:04:20,060
respect to the mean; it is the
third moment. The second moment we
59
00:04:20,060 --> 00:04:24,629
discussed in the last lecture:
it was the variance.
60
00:04:24,629 --> 00:04:32,050
So, this is a measure of
asymmetry about the mean; if we look at this particular
61
00:04:32,050 --> 00:04:36,880
distribution here, then we
will come to know that.
63
00:04:37,880 --> 00:04:41,510
Now, from the origin we know how it is distributed
overall.
64
00:04:41,510 --> 00:04:44,720
So, we know where
the location of the mean is.
65
00:04:44,720 --> 00:04:48,460
Now, what we are trying to find, with respect
to this
66
00:04:48,460 --> 00:04:52,490
mean, is whether
it is symmetric or not.
67
00:04:52,490 --> 00:04:57,110
So, if it is not
symmetric, then how it is distributed about
68
00:04:57,110 --> 00:04:58,110
the mean; that is the goal.
69
00:04:58,110 --> 00:05:02,630
So, that is why, when
we take the third moment of X minus
70
00:05:02,630 --> 00:05:06,440
mu x, the sign remains.
71
00:05:06,440 --> 00:05:10,110
The sign remaining
means it will show that if it is more on one
72
00:05:10,110 --> 00:05:12,669
side, it will be negative, and if it
is more on the
73
00:05:12,669 --> 00:05:16,710
other side it will be positive; here "more"
means more dispersed.
74
00:05:16,710 --> 00:05:21,750
If the dispersion
is more on the left side, then it will be negative,
75
00:05:21,750 --> 00:05:25,430
and if it is on the right side, then it will be positive.
76
00:05:25,430 --> 00:05:27,540
So, what will we do?
77
00:05:27,540 --> 00:05:31,270
We will just take the expectation of this
quantity, as usual, both in the case of
78
00:05:31,270 --> 00:05:38,570
a discrete random variable and in the case
of a continuous random variable; for the discrete
79
00:05:38,570 --> 00:05:40,380
case, we know the expectation.
80
00:05:40,380 --> 00:05:43,330
Now, we see this is becoming
a function
81
00:05:43,330 --> 00:05:44,530
of the random variable.
82
00:05:44,530 --> 00:05:50,900
So, X is our random variable, and (X minus
mu x) cubed
83
00:05:50,900 --> 00:05:53,520
is a function of our random variable X.
84
00:05:53,520 --> 00:05:58,160
So, we know how to take the moment, or how
to take the expectation
85
00:05:58,160 --> 00:06:03,980
of a function, which we described in the
last lecture in terms of g(X).
86
00:06:03,980 --> 00:06:08,720
So, this function will be
multiplied by
87
00:06:08,720 --> 00:06:13,830
the probability density, or, for the
discrete case, the probability value for that
88
00:06:13,830 --> 00:06:15,300
particular outcome.
89
00:06:15,300 --> 00:06:21,350
And we will sum up all these terms to
get the moment,
90
00:06:21,350 --> 00:06:24,229
that is, the third moment with respect to
the mean.
91
00:06:24,229 --> 00:06:31,440
And similarly for a continuous random variable
X, if its pdf is defined
92
00:06:31,440 --> 00:06:33,440
by f_X(x).
93
00:06:33,440 --> 00:06:38,490
Then the expectation of this function will
again be expressed similarly,
94
00:06:38,490 --> 00:06:41,241
except the summation now converts to an
integration.
95
00:06:41,241 --> 00:06:45,890
And this integration is over the
entire support of the random variable X,
96
00:06:45,890 --> 00:06:48,460
which we have taken from minus infinity to
plus infinity.
97
00:06:48,460 --> 00:06:53,420
So, this gives the cube multiplied by the
density, and if we do this integration,
98
00:06:53,420 --> 00:06:56,740
we will get the measure of skewness.
99
00:06:56,740 --> 00:07:00,870
So, this gives the third moment
of
100
00:07:00,870 --> 00:07:05,010
the variable with respect to its mean.
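As a rough numerical sketch of the computation just described (this Python illustration is mine, not from the lecture; the example pmf, the exponential pdf, the integration range and the midpoint rule are all assumptions for illustration):

```python
import math

# Third central moment E[(X - mu)^3] of a discrete random variable:
# sum of (x - mu)^3 times the probability of each outcome.
def third_central_moment_discrete(values, probs):
    mu = sum(x * p for x, p in zip(values, probs))
    return sum((x - mu) ** 3 * p for x, p in zip(values, probs))

# For a continuous random variable the sum becomes the integral of
# (x - mu)^3 f(x) dx over the support, approximated here by the
# midpoint rule on a fine grid.
def third_central_moment_continuous(pdf, lo, hi, n=100_000):
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]
    mu = sum(x * pdf(x) for x in xs) * dx
    return sum((x - mu) ** 3 * pdf(x) for x in xs) * dx

# A pmf symmetric about its mean has third central moment 0.
sym = third_central_moment_discrete([-1, 0, 1], [0.25, 0.5, 0.25])

# The exponential pdf f(x) = e^{-x} is positively skewed; its third
# central moment is exactly 2.
expo = third_central_moment_continuous(lambda x: math.exp(-x), 0.0, 40.0)
print(sym, round(expo, 3))
```

The symmetric case comes out as 0 and the exponential case close to 2, matching the sign convention discussed above.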
102
00:07:06,010 --> 00:07:15,800
Now, if the probability distribution
is symmetric about the mean, then
103
00:07:15,800 --> 00:07:23,750
this moment will be equal to 0; this will
be 0, as you can see here.
105
00:07:24,750 --> 00:07:31,340
Now, if I draw one particular distribution
which is symmetric, this is the location
106
00:07:31,340 --> 00:07:32,340
of the mean.
107
00:07:32,340 --> 00:07:33,350
Now, what are we actually doing?
108
00:07:33,350 --> 00:07:41,199
We are taking a small area, and the
distance we are taking is
109
00:07:41,199 --> 00:07:45,139
X minus mu x, and its power is cube.
110
00:07:45,139 --> 00:07:49,800
The sign tells
whether this area is on the negative side with
111
00:07:49,800 --> 00:07:54,190
respect to the mean or
on the
112
00:07:54,190 --> 00:07:56,270
positive side with respect to this mean.
113
00:07:56,270 --> 00:07:59,990
Now, the distribution being symmetric
with respect
114
00:07:59,990 --> 00:08:03,419
to the mean, and the power being 3, the two
sides cancel out.
115
00:08:03,419 --> 00:08:10,139
So, if the distribution is symmetric, then
this moment will be exactly equal to 0.
116
00:08:10,139 --> 00:08:17,620
So, this is what is explained here: if the
probability distribution is symmetric about
117
00:08:17,620 --> 00:08:22,190
mu x, then the expectation of (X minus mu
x)
118
00:08:22,190 --> 00:08:28,460
cubed, that is, the third moment, equals
0; one example is also shown in
119
00:08:28,460 --> 00:08:30,469
this diagram.
121
00:08:31,469 --> 00:08:37,539
Now, we will compare with the symmetric one,
the black curve we have just drawn,
122
00:08:37,539 --> 00:08:44,449
which we know is symmetric,
for which the third moment with respect
123
00:08:44,449 --> 00:08:45,880
to the mean is 0.
124
00:08:45,880 --> 00:08:51,350
Now, if we see this blue one, it
is known as positively
125
00:08:51,350 --> 00:08:52,350
skewed.
126
00:08:52,350 --> 00:08:55,949
Now, we will see why this one is positively
skewed, and how it is distributed.
127
00:08:55,949 --> 00:09:01,739
If for
X greater than mu x, the values are more widely
128
00:09:01,739 --> 00:09:09,119
dispersed than for X less than or equal
to mu x, then this will obviously become positive.
129
00:09:09,119 --> 00:09:20,389
So, if this becomes positive, then
this blue pdf is positively
130
00:09:20,389 --> 00:09:22,600
skewed; we say it is positively skewed.
131
00:09:22,600 --> 00:09:25,689
Now,
the dispersion is greater in this zone.
133
00:09:26,689 --> 00:09:37,010
Now, if I just draw another one here, what
is meant by this is the following.
134
00:09:37,010 --> 00:09:44,149
Now, as this side is
dispersed, if we take the mean, maybe the
135
00:09:44,149 --> 00:09:51,440
mean will be somewhere at this location; the
mean will come here.
136
00:09:51,440 --> 00:09:57,160
Now, if we see the data range, towards the
negative side is the left
137
00:09:57,160 --> 00:10:00,410
side of the mean, and there is the right side of the
mean; this is your mu x.
138
00:10:00,410 --> 00:10:06,690
So, this side, let us
say, is where X is greater than mu x,
139
00:10:06,690 --> 00:10:12,519
and this side is your X less than mu x.
140
00:10:12,519 --> 00:10:17,249
Now,
you can see that only this side is widely spread;
141
00:10:17,249 --> 00:10:23,079
it is widely spread compared to the other;
the left is more dense compared
142
00:10:23,079 --> 00:10:24,899
to the right-hand side.
143
00:10:24,899 --> 00:10:29,519
Then, when we are
taking (X minus mu x) cubed, this side is
144
00:10:29,519 --> 00:10:36,220
given more weightage; that is why the
total summation will be positive when
145
00:10:36,220 --> 00:10:37,829
it is skewed like this.
146
00:10:37,829 --> 00:10:40,649
So, that example is shown here.
147
00:10:40,649 --> 00:10:45,410
So, the mean location will be
somewhere here,
148
00:10:45,410 --> 00:10:50,850
where the right side, the positive side with
respect to the mean, is more dispersed
149
00:10:50,850 --> 00:10:54,470
compared to the left-hand side.
150
00:10:54,470 --> 00:10:59,540
So, resulting from that,
this
151
00:10:59,540 --> 00:11:01,569
quantity will become positive.
152
00:11:01,569 --> 00:11:05,540
So, this blue pdf shown here is
positively
153
00:11:05,540 --> 00:11:07,870
skewed.
155
00:11:08,870 --> 00:11:12,600
If we see just the opposite, we get what
is called negatively skewed.
156
00:11:12,600 --> 00:11:16,089
So, in this
case it is just the opposite: for X less than
157
00:11:16,089 --> 00:11:19,500
or equal to mu x, in this region
the
158
00:11:19,500 --> 00:11:26,319
values are more widely dispersed than for
X greater than mu x.
159
00:11:26,319 --> 00:11:32,230
So, somewhere at this
location, at this point, the mean will come,
160
00:11:32,230 --> 00:11:34,810
and on the right side it will be less dispersed compared
161
00:11:34,810 --> 00:11:35,989
to the left side.
162
00:11:35,989 --> 00:11:40,649
So, obviously, due to the power cube, this
quantity will become negative.
163
00:11:40,649 --> 00:11:49,879
So, the pdf shown here by the blue line is
known as a negatively skewed pdf.
165
00:11:50,879 --> 00:12:01,139
Now, up to the second moment, what we have
seen is that it can be viewed as a power
166
00:12:01,139 --> 00:12:06,209
of the random variable, when we take the
moment with respect to the mean.
167
00:12:06,209 --> 00:12:09,589
When
we take it as a square, then obviously
168
00:12:09,589 --> 00:12:14,990
that quantity carries a corresponding power of
the unit that the random variable has.
169
00:12:14,990 --> 00:12:19,079
Now, when we
increase the order of the moment, when
170
00:12:19,079 --> 00:12:23,699
we go for the third or fourth
moment, then keeping those units may not
171
00:12:23,699 --> 00:12:26,790
be that convenient.
172
00:12:26,790 --> 00:12:34,149
So, to make it
convenient, what we do is that the
173
00:12:34,149 --> 00:12:40,129
moment is generally normalized with respect
to some power of the standard deviation.
174
00:12:40,129 --> 00:12:45,170
So, the power of the standard deviation is
selected in such a way that the quantity
175
00:12:45,170 --> 00:12:46,259
becomes dimensionless.
176
00:12:46,259 --> 00:12:52,230
This is exactly what is
done here, and the result is known as the coefficient
177
00:12:52,230 --> 00:12:53,230
of skewness.
178
00:12:53,230 --> 00:13:02,009
So, the coefficient of skewness is a
convenient dimensionless measure of
179
00:13:02,009 --> 00:13:09,579
asymmetry: in the numerator is the expectation
of this function, which is the third-order
180
00:13:09,579 --> 00:13:10,579
moment.
181
00:13:10,579 --> 00:13:16,790
Now, this quantity obviously
has as its unit the cube of the unit
182
00:13:16,790 --> 00:13:17,999
of the random variable.
183
00:13:17,999 --> 00:13:22,470
So, we raise the standard deviation
to the same power to
184
00:13:22,470 --> 00:13:25,569
get the coefficient of skewness.
185
00:13:25,569 --> 00:13:31,350
So, for the coefficient of skewness, as we
divide by
186
00:13:31,350 --> 00:13:36,860
the standard deviation to the power 3, the
total quantity, this gamma, becomes dimensionless.
187
00:13:36,860 --> 00:13:45,630
So now, in the case of a perfectly
symmetric pdf, a perfectly
188
00:13:45,630 --> 00:13:50,649
symmetric probability density function, the
numerator becomes 0, and so this
189
00:13:50,649 --> 00:13:53,029
value
becomes 0.
190
00:13:53,029 --> 00:13:56,589
So, gamma equal to 0 indicates a
symmetric
191
00:13:56,589 --> 00:14:01,861
pdf; if it is positive then it is positively
skewed, and if it is negative
192
00:14:01,861 --> 00:14:02,861
then it is
negatively skewed.
193
00:14:02,861 --> 00:14:08,149
So, this denominator is only to make this
quantity dimensionless.
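As a hedged illustration of this coefficient gamma (a sketch of mine in Python using simulated samples, not from the lecture; the distributions, seed and sample size are arbitrary choices): the exponential distribution has gamma exactly 2, while a symmetric distribution has gamma 0.

```python
import math
import random

# Coefficient of skewness gamma = E[(X - mu)^3] / sigma^3, estimated
# from a large sample by plain moment ratios (without the
# bias-correction factors of the sample formula discussed later).
def skewness_coefficient(xs):
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return sum((x - mu) ** 3 for x in xs) / n / sigma ** 3

random.seed(0)
# Exponential: positively skewed; gamma is exactly 2 in theory.
expo = [random.expovariate(1.0) for _ in range(100_000)]
# Uniform: symmetric about its mean; gamma is 0 in theory.
unif = [random.uniform(-1.0, 1.0) for _ in range(100_000)]
print(round(skewness_coefficient(expo), 2), round(skewness_coefficient(unif), 2))
```

The estimates land near 2 and near 0 respectively, up to sampling error.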
194
00:14:08,149 --> 00:14:16,559
Now, if we have some
sample data and we want to calculate this
195
00:14:16,559 --> 00:14:20,059
one,
calculate this measure that is this coefficient
196
00:14:20,059 --> 00:14:21,750
of skewness from the sample data.
197
00:14:21,750 --> 00:14:24,879
There is
a sample statistic based on which we can
198
00:14:24,879 --> 00:14:27,939
calculate this coefficient of skewness.
199
00:14:27,939 --> 00:14:32,990
Now, the sample estimate of this coefficient
of skewness is
200
00:14:32,990 --> 00:14:40,269
expressed as follows: we take each individual
observation and subtract the mean from it.
201
00:14:40,269 --> 00:14:45,489
So, it basically gives the distance of
each and every observation from
202
00:14:45,489 --> 00:14:46,489
the mean.
203
00:14:46,489 --> 00:14:50,739
We take its cube, summing it
up over all observations; this n
204
00:14:50,739 --> 00:14:54,419
is the number of observations, as shown here:
n equals the number of observations.
205
00:14:54,419 --> 00:15:00,589
So, we sum it up, multiply by n, and divide
by this quantity; this
206
00:15:00,589 --> 00:15:02,689
s is the sample
207
00:15:02,689 --> 00:15:07,619
estimate of the standard deviation that we
discussed in the last lecture.
208
00:15:07,619 --> 00:15:10,750
So, this is the sample estimate of the standard
deviation, we
209
00:15:10,750 --> 00:15:12,369
are taking its cube.
210
00:15:12,369 --> 00:15:16,309
So, this unit and this unit cancel,
and these are the
211
00:15:16,309 --> 00:15:22,699
normalizing constants; this n divided by
(n minus 1) times (n minus 2) just makes this
212
00:15:22,699 --> 00:15:24,540
estimator consistent and unbiased.
213
00:15:24,540 --> 00:15:28,949
So, this total quantity given here
is
214
00:15:28,949 --> 00:15:37,489
your coefficient of skewness, computed from
the available data x1 to xn.
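The sample formula just described can be sketched as follows (my own Python transcription of the estimator; the small data sets are hypothetical):

```python
import math

# Sample coefficient of skewness, as described above:
#   n / ((n - 1)(n - 2)) * sum((x_i - xbar)^3) / s^3,
# with s the sample standard deviation (n - 1 in its denominator).
def sample_skewness(xs):
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    m3 = sum((x - xbar) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * m3 / s ** 3

# Data spread more widely to the right of its mean is positively
# skewed; mirroring it flips the sign, and a symmetric sample gives 0.
data = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 10.0]
print(sample_skewness(data) > 0)                   # positively skewed
print(sample_skewness([-x for x in data]) < 0)     # negatively skewed
print(sample_skewness([1.0, 2.0, 3.0, 4.0, 5.0]))  # symmetric sample
```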
216
00:15:38,489 --> 00:15:43,509
Now, we will go to the fourth moment, the fourth
moment with respect to the mean, which
217
00:15:43,509 --> 00:15:44,800
is known as the kurtosis.
218
00:15:44,800 --> 00:15:52,470
So, the measure of kurtosis is the fourth
moment with
219
00:15:52,470 --> 00:15:53,470
respect to the mean.
220
00:15:53,470 --> 00:15:59,019
So, we take the expectation of the function
(X minus mu x) to the power
221
00:15:59,019 --> 00:16:01,300
4.
222
00:16:01,300 --> 00:16:06,100
And again, to make it dimensionless,
what do we use?
223
00:16:06,100 --> 00:16:09,910
We use
sigma x to the power 4.
224
00:16:09,910 --> 00:16:14,660
Again, similar to the coefficient
of skewness, we
225
00:16:14,660 --> 00:16:20,479
are raising the standard deviation to the
power 4 to make the total
226
00:16:20,479 --> 00:16:23,499
quantity unitless.
227
00:16:23,499 --> 00:16:25,790
Now, what is important here?
228
00:16:25,790 --> 00:16:30,229
What does this coefficient of kurtosis imply?
229
00:16:30,229 --> 00:16:32,310
For
example, we have seen what the mean
230
00:16:32,310 --> 00:16:41,759
implies: it is a measure of
central tendency. Then the coefficient
231
00:16:41,759 --> 00:16:47,290
of variation indicates the dispersion
with respect to the mean.
232
00:16:47,290 --> 00:16:51,380
Then skewness shows whether
the distribution is more
233
00:16:51,380 --> 00:16:56,499
spread towards the right-hand side of the mean
or the left-hand side of the mean, so that
234
00:16:56,499 --> 00:16:57,519
is the
measure of asymmetry.
235
00:16:57,519 --> 00:17:03,249
Now, what is this kurtosis? If we look, we
see that the power here is also 4, and it
236
00:17:03,249 --> 00:17:04,549
is an even
power.
237
00:17:04,549 --> 00:17:08,860
Obviously, then, whatever is on the left-hand
side and the right-hand side
238
00:17:08,860 --> 00:17:13,470
contributes with the same sign, so the sign
distinction between the two sides is lost.
239
00:17:13,470 --> 00:17:16,840
Now, this power 4 actually gives a measure
of peakedness.
240
00:17:16,840 --> 00:17:22,230
This measure of peakedness describes what
the
241
00:17:22,230 --> 00:17:25,359
particular peak of the pdf looks like.
243
00:17:26,359 --> 00:17:35,370
Now, to see the difference, say this is one
curve, which is a symmetric
244
00:17:35,370 --> 00:17:36,370
distribution.
245
00:17:36,370 --> 00:17:47,919
And if I draw another one, which is also a
symmetric distribution, and
246
00:17:47,919 --> 00:17:51,029
their means are also, say, the same.
247
00:17:51,029 --> 00:17:56,630
So, the first moment is the same, and the second
moment, the
248
00:17:56,630 --> 00:18:00,080
variance, will also be the same if we look at
the overall spread.
249
00:18:00,080 --> 00:18:03,529
So, we will see that the variance
of these two distributions with respect to
250
00:18:03,529 --> 00:18:05,340
the mean will be the same.
251
00:18:05,340 --> 00:18:11,220
Now, the coefficient of
skewness will also be the same: in
252
00:18:11,220 --> 00:18:13,640
both cases the distribution is symmetric, which
is
253
00:18:13,640 --> 00:18:15,120
why in both cases it will be 0.
254
00:18:15,120 --> 00:18:18,399
Now, the difference in the peak, how peaked
the curve is
255
00:18:18,399 --> 00:18:22,850
at this point, will be reflected in
the fourth moment.
256
00:18:22,850 --> 00:18:27,919
And that is why the
fourth moment, when normalized by the
257
00:18:27,919 --> 00:18:33,120
fourth power of this standard deviation
which is known this coefficient of kurtosis,
258
00:18:33,120 --> 00:18:35,050
this is known as this measure of peakedness.
259
00:18:35,050 --> 00:18:43,590
This measure of peakedness
also has a sample statistic for kurtosis,
260
00:18:43,590 --> 00:18:46,080
again a sample estimate.
261
00:18:46,080 --> 00:18:48,730
The sample estimate of the coefficient of kurtosis
is expressed in
262
00:18:48,730 --> 00:18:52,679
terms of each and every individual
observation.
263
00:18:52,679 --> 00:18:57,230
We take the distance from the mean
to the power 4, summing over all the observations,
264
00:18:57,230 --> 00:19:02,250
and divide by the fourth power of the
standard deviation, with a normalizing constant
265
00:19:02,250 --> 00:19:05,220
to make this estimate unbiased and
consistent.
266
00:19:05,220 --> 00:19:13,860
Now, this quantity will be equal to 3 in the case
of the most commonly used distribution,
267
00:19:13,860 --> 00:19:17,540
known as the Gaussian or normal
distribution.
268
00:19:17,540 --> 00:19:22,929
So, for the normal
distribution, if we calculate this quantity,
269
00:19:22,929 --> 00:19:26,740
the measure of peakedness will come out
equal to 3.
270
00:19:26,740 --> 00:19:33,570
Now, if this measure of peakedness
is less than 3, then we will say that the
271
00:19:33,570 --> 00:19:43,630
peak attained is a little lower than that
of the normal distribution, and
272
00:19:43,630 --> 00:19:44,930
vice versa.
273
00:19:44,930 --> 00:19:49,500
And if it is more than 3,
then it is more peaked
274
00:19:49,500 --> 00:19:55,580
compared
to the normal distribution.
275
00:19:55,580 --> 00:20:02,289
Sometimes, in some
texts, this sample estimate is also expressed
276
00:20:02,289 --> 00:20:05,380
with a minus 3, so that
this
277
00:20:05,380 --> 00:20:11,820
K, when equal to 0, indicates the same
peakedness as the normal distribution.
278
00:20:11,820 --> 00:20:14,669
And if it is
negative, it is flatter than the normal distribution,
279
00:20:14,669 --> 00:20:18,700
and if it is positive, more peaked than the normal
distribution; both conventions are correct.
280
00:20:18,700 --> 00:20:22,200
So, we have to remember that this is equal
to 3 in the
281
00:20:22,200 --> 00:20:25,000
case of the normal distribution.
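As a hedged numerical check of this value 3 (a Python sketch of mine using simulated samples, not from the lecture; the plain moment ratio is used, without the bias-correction factors of the sample formula above, and the seed and sample size are arbitrary):

```python
import math
import random

# Coefficient of kurtosis E[(X - mu)^4] / sigma^4, estimated from a
# sample by the plain moment ratio.
def kurtosis_coefficient(xs):
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n
    return sum((x - mu) ** 4 for x in xs) / n / sigma2 ** 2

random.seed(1)
normal = [random.gauss(0.0, 1.0) for _ in range(100_000)]
k_normal = kurtosis_coefficient(normal)   # close to 3 for normal data
excess = k_normal - 3.0                   # the "minus 3" convention: ~0

# A uniform sample is flatter than the normal: its kurtosis is 1.8 < 3.
k_unif = kurtosis_coefficient([random.uniform(-1.0, 1.0) for _ in range(100_000)])
print(round(k_normal, 2), round(excess, 2), round(k_unif, 2))
```

The normal sample gives a value near 3 (excess near 0) and the uniform sample a value below 3, up to sampling error.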
282
00:20:25,000 --> 00:20:31,810
So, with this, you
can now understand that we have discussed
283
00:20:31,810 --> 00:20:35,720
the first moment, second moment, third
moment and fourth moment.
284
00:20:35,720 --> 00:20:38,130
And obviously,
the first moment with respect to the origin
285
00:20:38,130 --> 00:20:42,389
we discussed; then we saw that the first
moment with respect to the mean is 0.
286
00:20:42,389 --> 00:20:44,110
Second moment with respect to the mean and
287
00:20:44,110 --> 00:20:49,450
we understood its implication; then the
third moment with respect to the mean, and the fourth
288
00:20:49,450 --> 00:20:50,570
moment with respect to the mean.
289
00:20:50,570 --> 00:20:53,769
And in this way we can keep increasing the order;
we can go for
290
00:20:53,769 --> 00:20:58,629
the fifth, sixth and seventh moments, and in
each case you will get another of these
291
00:20:58,629 --> 00:21:04,090
descriptors; but up to the fourth one, they
describe almost everything.
292
00:21:04,090 --> 00:21:13,490
Still, we will say that if we get
all these kinds of coefficients, that
293
00:21:13,490 --> 00:21:17,350
is, all
these moments with respect to the mean,
294
00:21:17,350 --> 00:21:19,850
then we can equivalently
say that
295
00:21:19,850 --> 00:21:24,159
all the properties of the pdf are
known to us.
296
00:21:24,159 --> 00:21:30,639
We will see in
a minute how this is related, how
297
00:21:30,639 --> 00:21:38,240
from a single function we
can get all these measures, all these moments;
298
00:21:38,240 --> 00:21:39,929
that we will see.
299
00:21:39,929 --> 00:21:44,240
But before that, the analogy of these
moments, that is, why this
300
00:21:44,240 --> 00:21:49,919
is called a moment, its analogy with the
area under the pdf, will be discussed
301
00:21:49,919 --> 00:21:50,980
now.
303
00:21:51,980 --> 00:21:57,830
So, now say that this is a
standard pdf that we have shown here.
304
00:21:57,830 --> 00:22:03,071
Now,
its mean will be the point where its
305
00:22:03,071 --> 00:22:04,071
c.g. is.
306
00:22:04,071 --> 00:22:08,020
Now, consider a unit area having the
general shape shown in the following figure.
307
00:22:08,020 --> 00:22:14,530
Now, if I take a very small area, over a
very small length along the x axis, which
308
00:22:14,530 --> 00:22:20,630
is dx, we can calculate the total
moment; and this is your origin.
309
00:22:20,630 --> 00:22:24,480
So, if we calculate the moment that
is due to
310
00:22:24,480 --> 00:22:29,679
this particular area, then its distance will
be multiplied by its area to get this
311
00:22:29,679 --> 00:22:30,679
one.
312
00:22:30,679 --> 00:22:36,480
Now, if we integrate over the full area,
then we will get the total moment
313
00:22:36,480 --> 00:22:39,250
of
this area with respect to the origin.
314
00:22:39,250 --> 00:22:45,360
Now, if we divide that quantity
by the total area, then we will get a
315
00:22:45,360 --> 00:22:49,909
distance, the distance from the origin to the
axis passing through its c.g.
316
00:22:49,909 --> 00:22:53,790
And we know,
from the properties of a
317
00:22:53,790 --> 00:22:58,679
standard pdf, that the
total area is equal to 1.
318
00:22:58,679 --> 00:23:04,289
Now, what this discussion here gives
is the centroidal
319
00:23:04,289 --> 00:23:07,840
distance xo of this area, that is, what xo
is equal to.
320
00:23:07,840 --> 00:23:10,270
So, what are we doing here?
321
00:23:10,270 --> 00:23:12,970
We take that
particular area, that is, f(x) multiplied by
322
00:23:12,970 --> 00:23:13,970
dx.
323
00:23:13,970 --> 00:23:14,970
So, what is this?
324
00:23:14,970 --> 00:23:19,629
This height here,
which is f(x) at the location x,
325
00:23:19,629 --> 00:23:24,760
multiplied by the small length dx, is
giving
326
00:23:24,760 --> 00:23:25,760
this area.
328
00:23:26,760 --> 00:23:29,200
That, multiplied by x, gives
the moment.
329
00:23:29,200 --> 00:23:31,789
And if we integrate from minus
infinity to plus infinity, we are getting
330
00:23:31,789 --> 00:23:37,880
the total moment, and that divided by the area
obviously gives the distance of the
331
00:23:37,880 --> 00:23:40,120
centroid of this particular area.
332
00:23:40,120 --> 00:23:43,940
And we know
that the total area is 1, so
333
00:23:43,940 --> 00:23:50,510
xo, the centroidal distance from the origin,
is
334
00:23:50,510 --> 00:23:55,620
equal to the integration from minus infinity to plus
infinity of x multiplied by f(x) dx.
335
00:23:55,620 --> 00:23:57,230
Now, what is
this again?
336
00:23:57,230 --> 00:23:59,730
This is nothing but the expectation
of X.
337
00:23:59,730 --> 00:24:03,850
The way we defined the
expectation of X is as the integral of x multiplied
338
00:24:03,850 --> 00:24:05,379
by f(x) dx.
339
00:24:05,379 --> 00:24:10,289
So, this centroidal
distance of the area from the origin is nothing
340
00:24:10,289 --> 00:24:15,450
but its mean; the distance from the origin
to this point is nothing but the mean.
341
00:24:15,450 --> 00:24:21,600
So, this is also the moment
about the origin of this irregularly shaped
342
00:24:21,600 --> 00:24:22,600
area.
343
00:24:22,600 --> 00:24:28,110
Just comparing this with the expression for
the mean or the expected value of a
344
00:24:28,110 --> 00:24:33,960
continuous random variable, the mean can
be referred to as the first moment about the
345
00:24:33,960 --> 00:24:37,620
origin of the pdf with respect to the random
variable.
346
00:24:37,620 --> 00:24:46,940
So, this is how we draw the
analogy: we take each element of this area
347
00:24:46,940 --> 00:24:49,419
with respect to the moment, multiplied
by
348
00:24:49,419 --> 00:24:51,169
its distance from the origin.
349
00:24:51,169 --> 00:24:55,509
So, this first moment about the origin is
nothing but its
350
00:24:55,509 --> 00:25:00,679
mean, and this function is nothing
but its expectation.
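The centroid argument can be checked numerically (a sketch of mine, not from the lecture; the triangular pdf f(x) = 2x on [0, 1], whose mean is 2/3, and the midpoint grid are assumptions for illustration):

```python
# First moment of the unit area under a pdf about the origin: the
# total moment divided by the total area is the centroidal distance
# x0, which is the mean of the random variable.
def centroidal_distance(pdf, lo, hi, n=100_000):
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]   # midpoint rule
    area = sum(pdf(x) for x in xs) * dx            # equals 1 for a pdf
    moment = sum(x * pdf(x) for x in xs) * dx      # integral of x f(x) dx
    return moment / area

# Triangular pdf f(x) = 2x on [0, 1]; its mean (and centroid) is 2/3.
x0 = centroidal_distance(lambda x: 2.0 * x, 0.0, 1.0)
print(round(x0, 4))
```

The computed centroid agrees with the expectation of X for this pdf.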
351
00:25:00,679 --> 00:25:09,049
Now, if we want to take the second moment
with respect to the origin then this area
352
00:25:09,049 --> 00:25:14,370
multiplied by the distance squared, divided
by the total area, will give you the,
353
00:25:14,370 --> 00:25:15,389
I
am sorry.
354
00:25:15,389 --> 00:25:21,390
So, this area multiplied by the square of
this distance will give you the second
355
00:25:21,390 --> 00:25:22,390
moment.
356
00:25:22,390 --> 00:25:23,390
.
357
00:25:23,390 --> 00:25:33,909
Now, for this second moment, what we have
seen is that,
358
00:25:33,909 --> 00:25:38,980
instead of taking it with respect to the origin,
for our convenience, from the second
359
00:25:38,980 --> 00:25:43,510
moment onwards we take it with respect to
the location of the mean.
360
00:25:43,510 --> 00:25:49,340
So, this x0 in
case of the first moment — this x0, which we
361
00:25:49,340 --> 00:25:54,600
have shown as the distance from the origin
to the centroid of this area — is, from
362
00:25:54,600 --> 00:26:01,759
this expression, nothing but
the expectation of
363
00:26:01,759 --> 00:26:04,080
X, which is nothing but the mean.
364
00:26:04,080 --> 00:26:09,650
So, this point marks the distance from the origin
to the mean. Now, the second moment.
365
00:26:09,650 --> 00:26:15,970
Now, I repeat what I said: from the
second moment onwards, we calculate it with
366
00:26:15,970 --> 00:26:22,190
respect to the mean; we take the distance
from the mean to the infinitesimally small
367
00:26:22,190 --> 00:26:28,419
area, multiply by that distance, and
calculate the moments.
368
00:26:28,419 --> 00:26:29,419
..
369
00:26:29,419 --> 00:26:38,700
So, the moment of inertia about the mean —
as I was saying, the second moment is like a
370
00:26:38,700 --> 00:26:42,759
moment of inertia — involves x
minus mu x.
371
00:26:42,759 --> 00:26:47,790
So, this x minus mu x
comes in because we are taking
372
00:26:47,790 --> 00:26:49,309
the moment with respect to the mean.
373
00:26:49,309 --> 00:26:54,460
So, for the distance:
the function we took earlier was
374
00:26:54,460 --> 00:27:00,200
only x, because that was measured from the origin —
it was actually x minus 0.
375
00:27:00,200 --> 00:27:03,080
Since we are taking it with respect to the
mean, we are
376
00:27:03,080 --> 00:27:09,159
taking x minus mu x, and for the second moment
we take its square, multiplied by
377
00:27:09,159 --> 00:27:15,059
the small length dx and the value of the pdf at
that point x.
378
00:27:15,059 --> 00:27:19,700
And this quantity we integrate
over the full support to get the second moment.
379
00:27:19,700 --> 00:27:22,749
So, what we see is that this, too,
is a moment.
380
00:27:22,749 --> 00:27:27,669
Now, from the earlier
expression for the moments, this is the second
381
00:27:27,669 --> 00:27:31,490
moment about the mean of this irregularly
shaped area.
382
00:27:31,490 --> 00:27:36,490
So, comparing it with the expression for the
variance of a continuous
383
00:27:36,490 --> 00:27:42,411
random variable, the variance can be referred
to as the second moment of the pdf of a
384
00:27:42,411 --> 00:27:48,259
random variable about the mean, as we have
discussed so far.
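A small numerical sketch of this statement (mine, not from the lecture), using the uniform pdf on [0, 1], whose mean is 0.5 and whose variance is known to be 1/12:

```python
import numpy as np

def integral(y, x):
    # trapezoidal rule on a grid
    return np.sum((y[:-1] + y[1:]) * np.diff(x) / 2)

x = np.linspace(0.0, 1.0, 100001)
fx = np.ones_like(x)                     # uniform pdf: f(x) = 1 on [0, 1]

mu = integral(x * fx, x)                 # first moment about the origin
var = integral((x - mu) ** 2 * fx, x)    # second moment about the mean
print(mu, var)                           # close to 0.5 and 1/12
```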
385
00:27:48,259 --> 00:27:53,690
And also in the last class,
towards the end, we discussed that the first
386
00:27:53,690 --> 00:27:59,390
moment of the pdf of a random variable about
the mean is zero.
387
00:27:59,390 --> 00:28:03,100
And we discussed why it will be zero:
whatever lies on the right
388
00:28:03,100 --> 00:28:07,620
hand side of the mean and on the left hand side
of the mean will
389
00:28:07,620 --> 00:28:12,350
negate each other, will cancel
each other; that is why that moment will always
390
00:28:12,350 --> 00:28:18,719
be zero, irrespective of the shape of the
pdf.
391
00:28:18,719 --> 00:28:19,719
..
392
00:28:19,719 --> 00:28:28,139
So, in general, the nth moment of
a random variable about the origin is expressed
393
00:28:28,139 --> 00:28:30,619
as the expectation of X to the power n.
394
00:28:30,619 --> 00:28:33,100
Since we are taking it with respect to the origin,
this is x power n:
395
00:28:33,100 --> 00:28:39,690
the integral from minus infinity to
plus infinity of x power n fX(x) dx.
396
00:28:39,690 --> 00:28:45,220
And,
similarly, the nth moment of the random variable
397
00:28:45,220 --> 00:28:52,830
about the mean is expressed as the
expectation of (X minus mu x) power n, which equals
398
00:28:52,830 --> 00:28:56,110
the integral from minus infinity to plus infinity of (x
399
00:28:56,110 --> 00:28:57,110
minus mu x) power n fX(x) dx.
400
00:28:57,110 --> 00:29:00,639
So, this expression, as we have seen,
is for the nth moment with respect
401
00:29:00,639 --> 00:29:01,639
to the mean.
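These two definitions can be collected into one small helper (a sketch of my own; the function name and the exponential test pdf are assumptions for illustration). For the exponential pdf with rate 1, E[X^n] = n!, and the first moment about the mean is 0:

```python
import numpy as np

def integral(y, x):
    return np.sum((y[:-1] + y[1:]) * np.diff(x) / 2)   # trapezoidal rule

def moment(x, fx, n, about_mean=False):
    # nth moment of a pdf sampled on grid x: E[X^n] or E[(X - mu)^n]
    if about_mean:
        mu = integral(x * fx, x)
        return integral((x - mu) ** n * fx, x)
    return integral(x ** n * fx, x)

x = np.linspace(0.0, 60.0, 400001)
fx = np.exp(-x)                           # exponential(1) pdf

print(moment(x, fx, 1))                   # E[X],   close to 1
print(moment(x, fx, 2))                   # E[X^2], close to 2
print(moment(x, fx, 1, about_mean=True))  # first central moment, close to 0
```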
402
00:29:01,639 --> 00:29:06,570
Hence the coefficient of skewness can be referred
to as the third moment of
403
00:29:06,570 --> 00:29:12,370
the pdf of a random variable about the mean,
normalized by the cube of the standard
404
00:29:12,370 --> 00:29:19,549
deviation — exactly what we discussed
when we were discussing its estimates.
405
00:29:19,549 --> 00:29:25,241
Dividing by the cube of the standard deviation
means: since we are taking the third moment,
406
00:29:25,241 --> 00:29:29,880
to cancel the units and make it dimensionless,
we normalize it by the
407
00:29:29,880 --> 00:29:32,309
cube of the standard deviation.
408
00:29:32,309 --> 00:29:36,039
Similarly, the coefficient of kurtosis
can be
409
00:29:36,039 --> 00:29:41,460
referred as the fourth moment of the pdf of
the random variable about the mean
410
00:29:41,460 --> 00:29:44,750
normalized by the fourth power of the standard
deviation.
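A numerical sketch of both coefficients (mine, not from the lecture), for the exponential pdf with rate 1, whose skewness coefficient is known to be 2 and whose kurtosis coefficient is 9:

```python
import numpy as np

def integral(y, x):
    return np.sum((y[:-1] + y[1:]) * np.diff(x) / 2)   # trapezoidal rule

x = np.linspace(0.0, 80.0, 800001)
fx = np.exp(-x)                          # exponential(1) pdf

mu = integral(x * fx, x)
m2 = integral((x - mu) ** 2 * fx, x)     # variance
m3 = integral((x - mu) ** 3 * fx, x)     # third central moment
m4 = integral((x - mu) ** 4 * fx, x)     # fourth central moment
sigma = np.sqrt(m2)

skew = m3 / sigma ** 3                   # normalized by sigma cubed
kurt = m4 / sigma ** 4                   # normalized by sigma to the fourth
print(skew, kurt)                        # close to 2 and 9
```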
411
00:29:44,750 --> 00:29:50,679
So, if you keep these analogies
in mind, then you will know why we call
412
00:29:50,679 --> 00:29:58,600
this quantity the moment of the pdf with
respect to the origin or with respect to the mean,
413
00:29:58,600 --> 00:30:03,279
as the case may be.
414
00:30:03,279 --> 00:30:04,279
..
415
00:30:04,279 --> 00:30:10,549
Now, as I was discussing a few minutes back,
these four things that we have
416
00:30:10,549 --> 00:30:17,669
discussed — central tendency, dispersion,
then the measure of
417
00:30:17,669 --> 00:30:19,929
symmetry and the measure of peakedness —
418
00:30:19,929 --> 00:30:23,610
all of these are the representation of
some moment,
419
00:30:23,610 --> 00:30:29,029
either with respect to the origin or with respect to
the mean.
420
00:30:29,029 --> 00:30:34,890
And in this way we can go on,
taking higher moments — the 5th, 6th,
421
00:30:34,890 --> 00:30:38,119
7th, and so on.
422
00:30:38,119 --> 00:30:41,399
But every time I would have to
define that moment and normalize it
423
00:30:41,399 --> 00:30:42,649
appropriately.
424
00:30:42,649 --> 00:30:47,740
Now the next thing, that we are going to discuss
is the moment generating function and
425
00:30:47,740 --> 00:30:52,700
we will see with the help of a single function,
how all the moments can be known.
426
00:30:52,700 --> 00:30:55,299
And
this is very important in the sense, if all
427
00:30:55,299 --> 00:30:58,309
the moments are known, then it is equivalent
to
428
00:30:58,309 --> 00:31:03,460
knowing all the properties of the pdf; that is
why this moment generating function is very
429
00:31:03,460 --> 00:31:06,320
important and this is what we are going to
discuss next.
430
00:31:06,320 --> 00:31:15,460
So, the moment generating function is the expectation
of e power sX, where X is the random
431
00:31:15,460 --> 00:31:16,460
variable.
432
00:31:16,460 --> 00:31:20,710
So, this e power sX is again a function of
the random variable X.
433
00:31:20,710 --> 00:31:26,080
So,
expectation of e power sX, which is a function
434
00:31:26,080 --> 00:31:31,950
of the random variable X is known as the
moment generating function of the random variable
435
00:31:31,950 --> 00:31:32,950
X.
436
00:31:32,950 --> 00:31:35,249
So, I have to take the expectation
of this value.
437
00:31:35,249 --> 00:31:38,370
So, how this expectation looks like.
438
00:31:38,370 --> 00:31:46,080
So, this moment generating function which
is denoted by GX(s) is expressed as GX(s)
439
00:31:46,080 --> 00:31:51,571
equals to expectation of e power sX equals
to minus infinity to plus infinity e power
440
00:31:51,571 --> 00:31:54,919
sX —
this is the function I take — multiplied
441
00:31:54,919 --> 00:31:58,560
by this probability density function and
442
00:31:58,560 --> 00:32:04,690
integrating with respect to x from minus infinity
to infinity.
443
00:32:04,690 --> 00:32:11,010
So, this particular function is
the moment generating function.
444
00:32:11,010 --> 00:32:14,940
Note that this expression is for the
continuous case —
445
00:32:14,940 --> 00:32:16,320
that is why we took the integration.
446
00:32:16,320 --> 00:32:20,509
Now, if X is discrete, then GX(s)
is
447
00:32:20,509 --> 00:32:27,830
expressed in a similar way to how we express
the expectation of a function of a discrete random
448
00:32:27,830 --> 00:32:33,669
variable: it is the summation over all xi
of
449
00:32:33,669 --> 00:32:37,169
e power s xi multiplied by the value of the
450
00:32:37,169 --> 00:32:40,429
probability mass at those respective points.
451
00:32:40,429 --> 00:32:48,309
So, this is the expression for a discrete random
variable, and the earlier one for a continuous
452
00:32:48,309 --> 00:32:51,789
random variable; each is the moment
generating function.
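Both forms can be evaluated directly; here is an illustrative sketch (mine — the exponential pdf and the fair-die pmf are assumed examples). For the exponential pdf with rate 1 the MGF has the known closed form G(s) = 1/(1 - s) for s < 1:

```python
import numpy as np

def integral(y, x):
    return np.sum((y[:-1] + y[1:]) * np.diff(x) / 2)   # trapezoidal rule

# continuous case: exponential(1) pdf, where G(s) = 1/(1 - s) for s < 1
x = np.linspace(0.0, 80.0, 400001)
fx = np.exp(-x)
s = 0.5
G = integral(np.exp(s * x) * fx, x)
print(G)                                 # close to 1/(1 - 0.5) = 2

# discrete case: a fair die, G(s) = sum over i of e^{s xi} p(xi)
xi = np.arange(1, 7)                     # die faces
pi = np.full(6, 1.0 / 6.0)               # pmf
print(np.sum(np.exp(0.1 * xi) * pi))     # MGF of the die at s = 0.1
```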
453
00:32:51,789 --> 00:32:56,480
Now, we will see why
these functions are defined, how they are
454
00:32:56,480 --> 00:32:59,029
useful and why they are important.
455
00:32:59,029 --> 00:33:00,029
.
456
00:33:00,029 --> 00:33:03,470
Now, the usefulness of the moment generating
function:
457
00:33:03,470 --> 00:33:10,289
the first derivative of
GX(s), evaluated at s equals 0, results
458
00:33:10,289 --> 00:33:13,749
in the expected value, which is the first
moment
459
00:33:13,749 --> 00:33:18,619
of the random variable with respect to the
origin.
460
00:33:18,619 --> 00:33:25,340
Now, take the derivative of GX(s) —
this is the first derivative I am taking —
461
00:33:25,340 --> 00:33:27,789
and after this derivative I am putting s
462
00:33:27,789 --> 00:33:29,179
equal to 0.
463
00:33:29,179 --> 00:33:35,720
So, by simple calculus, this will basically
turn into
464
00:33:35,720 --> 00:33:41,630
the integral from minus infinity to plus infinity: one x
we get due to the derivative, and e power
465
00:33:41,630 --> 00:33:44,809
sx,
when we put s equals 0, becomes
466
00:33:44,809 --> 00:33:48,629
1; so that is x multiplied by fX(x) dx.
467
00:33:48,629 --> 00:33:54,269
Now, this function is nothing but the first
moment of that.
468
00:33:54,269 --> 00:34:02,659
So, when we take the
first derivative, the resulting function is
469
00:34:02,659 --> 00:34:05,489
the first moment with respect to the origin.
470
00:34:05,489 --> 00:34:11,020
Similarly, the second derivative of GX(s),
evaluated at s equals 0, results in the
471
00:34:11,020 --> 00:34:13,500
second
moment of the random variable with respect
472
00:34:13,500 --> 00:34:14,500
to the origin.
473
00:34:14,500 --> 00:34:15,500
.
474
00:34:15,500 --> 00:34:16,500
Let us see.
475
00:34:16,500 --> 00:34:21,669
So, we are taking the second derivative
with respect to s.
476
00:34:21,669 --> 00:34:24,470
So,
this x now becomes x multiplied by x,
477
00:34:24,470 --> 00:34:27,730
that is, x square, multiplied by e power sx at s
equals
478
00:34:27,730 --> 00:34:28,730
to 0.
479
00:34:28,730 --> 00:34:32,429
So, e power sx equals 1, multiplied by
this fX(x).
480
00:34:32,429 --> 00:34:36,599
So, this is nothing but again
the second moment — this
481
00:34:36,599 --> 00:34:40,389
is the expression for the second moment with
respect to the origin.
482
00:34:40,389 --> 00:34:45,240
So, if we continue like this — first moment,
second moment — then in
483
00:34:45,240 --> 00:34:54,099
general, the nth
derivative of GX(s), evaluated
484
00:34:54,099 --> 00:35:00,579
at
s equals 0, results in the nth moment of
485
00:35:00,579 --> 00:35:04,480
the random variable with respect to origin.
486
00:35:04,480 --> 00:35:12,000
So, for that nth derivative, again: as
we take the nth derivative, the
487
00:35:12,000 --> 00:35:14,599
power of x
becomes n, which is nothing but
488
00:35:14,599 --> 00:35:19,760
the expression for the nth moment of the
random variable — basically of the pdf of the
489
00:35:19,760 --> 00:35:22,260
random variable here.
490
00:35:22,260 --> 00:35:26,610
So, this is how we can
use this one.
491
00:35:26,610 --> 00:35:28,250
So, if we know this GX(s).
492
00:35:28,250 --> 00:35:31,119
Now, we can take whatever
order of
493
00:35:31,119 --> 00:35:33,460
derivative we need.
494
00:35:33,460 --> 00:35:38,540
So, whatever order of derivative
we need — that nth
495
00:35:38,540 --> 00:35:39,540
order.
496
00:35:39,540 --> 00:35:41,460
So, this n is general.
497
00:35:41,460 --> 00:35:44,681
So, we can take the nth order derivative and
put s
498
00:35:44,681 --> 00:35:45,750
equal to 0.
499
00:35:45,750 --> 00:35:49,280
So, we will get directly that nth moment.
500
00:35:49,280 --> 00:35:53,869
So, this is the usefulness of this
moment generating function.
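A quick check of this property (a sketch of my own): for the exponential pdf with rate 1, the MGF is known to be G(s) = 1/(1 - s), so its first and second derivatives at s = 0 should give the first moment 1 and the second moment 2. Central finite differences confirm this:

```python
# closed-form MGF of the exponential(1) distribution (valid for s < 1)
G = lambda s: 1.0 / (1.0 - s)

h = 1e-4
g1 = (G(h) - G(-h)) / (2 * h)              # approximates G'(0)  = E[X]   = 1
g2 = (G(h) - 2 * G(0) + G(-h)) / h ** 2    # approximates G''(0) = E[X^2] = 2
print(g1, g2)
```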
501
00:35:53,869 --> 00:35:54,869
..
502
00:35:54,869 --> 00:36:01,460
Now, there is another one, called the characteristic
function: the expectation of e
503
00:36:01,460 --> 00:36:02,460
power isX.
504
00:36:02,460 --> 00:36:07,270
Now, instead of just sX, we include the
complex unit, that
505
00:36:07,270 --> 00:36:11,020
is, i, where i equals the square root of minus
1.
506
00:36:11,020 --> 00:36:14,420
So, if we take the expectation of this,
that
507
00:36:14,420 --> 00:36:18,660
expectation is known as the characteristic
function of the random variable X.
508
00:36:18,660 --> 00:36:25,530
So, this
characteristic function is denoted as ϕX
509
00:36:25,530 --> 00:36:26,860
(s).
510
00:36:26,860 --> 00:36:30,309
So, we know it is an expectation of this
function.
511
00:36:30,309 --> 00:36:38,119
So, this can be expressed as that expectation
of this e power isx, which is the integration
512
00:36:38,119 --> 00:36:43,369
of this minus infinity to plus infinity e
power isx fX(x) dx.
513
00:36:43,369 --> 00:36:49,980
Now, how can we relate this
particular function to the
514
00:36:49,980 --> 00:36:50,980
moments?
515
00:36:50,980 --> 00:36:51,980
.
516
00:36:51,980 --> 00:36:56,600
So, for this characteristic
function also, we can take the nth
517
00:36:56,600 --> 00:37:03,240
derivative and multiply
it by 1 over i power n.
518
00:37:03,240 --> 00:37:08,590
And that function,
evaluated at s equals 0, is nothing
519
00:37:08,590 --> 00:37:17,609
but the expectation of X power
n, that is, the integral from minus infinity to plus infinity
520
00:37:17,609 --> 00:37:20,690
of x power n multiplied by fX(x) dx.
521
00:37:20,690 --> 00:37:25,670
Here i
equals the square root of minus 1. The
522
00:37:25,670 --> 00:37:28,210
use of this characteristic function is mostly
523
00:37:28,210 --> 00:37:33,391
seen in electrical engineering, not so much
in civil engineering; but the moment
524
00:37:33,391 --> 00:37:41,119
generating function is important in this field.
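For completeness, a numerical sketch (mine, not from the lecture) of the characteristic function for the standard normal pdf, where φ(s) = e^(-s²/2) is the known closed form:

```python
import numpy as np

def integral(y, x):
    return np.sum((y[:-1] + y[1:]) * np.diff(x) / 2)   # works for complex y too

x = np.linspace(-10.0, 10.0, 200001)
fx = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)    # standard normal pdf

s = 1.5
phi = integral(np.exp(1j * s * x) * fx, x)       # E[e^{isX}]
print(phi)                                       # close to e^{-1.125}, imaginary part near 0
```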
525
00:37:41,119 --> 00:37:47,650
Now, as we have seen, we have some descriptors
of a random variable, and those
526
00:37:47,650 --> 00:37:50,740
we have discussed.
527
00:37:50,740 --> 00:37:55,050
Now, if we have some sample
data
528
00:37:55,050 --> 00:38:00,880
available to us, we first have to represent
that sample data graphically,
529
00:38:00,880 --> 00:38:06,940
just to see what its shape looks like and how
it is dispersed.
530
00:38:06,940 --> 00:38:10,140
So that
we can get some idea about its
531
00:38:10,140 --> 00:38:12,770
distribution pattern.
532
00:38:12,770 --> 00:38:19,420
So, now, the graphical representation of
sample data — the outcomes of some random
533
00:38:19,420 --> 00:38:25,200
experiment. In the context of civil
534
00:38:25,200 --> 00:38:30,550
engineering we can take streamflow data,
rainfall data, the strength of
535
00:38:30,550 --> 00:38:36,750
concrete specimens, or accidents on a particular
stretch of highway.
536
00:38:36,750 --> 00:38:39,810
So, these are some
sample data that we can see.
537
00:38:39,810 --> 00:38:43,770
Now, if we want to see how this random
variable behaves,
538
00:38:43,770 --> 00:38:47,760
we have to plot
the data first and see what
539
00:38:47,760 --> 00:38:48,760
the distribution looks like.
540
00:38:48,760 --> 00:38:54,890
And there are a few
techniques for representing
541
00:38:54,890 --> 00:38:56,950
a particular dataset graphically.
542
00:38:56,950 --> 00:39:01,970
That is what we
are going to discuss next: different popular
543
00:39:01,970 --> 00:39:05,620
ways in which we can
represent the sample data in
544
00:39:05,620 --> 00:39:11,150
graphical form.
545
00:39:11,150 --> 00:39:12,150
..
546
00:39:12,150 --> 00:39:15,390
Before that, just to conclude the
descriptors of the random variable, we
547
00:39:15,390 --> 00:39:18,230
will
put a short note on this.
548
00:39:18,230 --> 00:39:22,069
The description of the probabilistic
characteristics of a
549
00:39:22,069 --> 00:39:27,920
random variable can be given in terms of these
main descriptors.
550
00:39:27,920 --> 00:39:36,420
The central value of a random variable can
be expressed in terms of mean, mode or
551
00:39:36,420 --> 00:39:40,180
median, as we saw in the last class. The mean
is also called the first
552
00:39:40,180 --> 00:39:43,850
moment of this pdf of the random variable
about the origin.
553
00:39:43,850 --> 00:39:48,369
We saw that when we
were discussing the representation of the
554
00:39:48,369 --> 00:39:51,119
area, that is, the analogy with the moment of the
area under the
555
00:39:51,119 --> 00:39:54,040
pdf.
556
00:39:54,040 --> 00:40:00,960
Then the dispersion, asymmetry and peakedness
of the pdf of a random variable are
557
00:40:00,960 --> 00:40:08,470
described in terms of the second moment,
third moment and fourth moment of the pdf
558
00:40:08,470 --> 00:40:11,599
of the random variable about its mean — this
is important.
559
00:40:11,599 --> 00:40:16,800
So, when we
want to know the dispersion,
560
00:40:16,800 --> 00:40:22,530
asymmetry and peakedness, we take
the moment with respect to the mean.
561
00:40:22,530 --> 00:40:26,120
And last, we saw that this moment generating
function and this characteristic function
562
00:40:26,120 --> 00:40:30,369
are helpful, because they help to obtain
all the
563
00:40:30,369 --> 00:40:36,010
moments of a random variable in an alternative
way.
564
00:40:36,010 --> 00:40:37,010
..
565
00:40:37,010 --> 00:40:41,470
Now, as we were just discussing, we
will now see the different
566
00:40:41,470 --> 00:40:46,770
techniques by which this sample
data can be represented graphically —
567
00:40:46,770 --> 00:40:50,490
how you can represent that sample data.
568
00:40:50,490 --> 00:40:59,819
So, for convenient representation of data,
a data set may be divided into halves or
569
00:40:59,819 --> 00:41:00,829
quarters.
570
00:41:00,829 --> 00:41:09,880
When an ordered dataset is divided into halves,
the division point is called the
571
00:41:09,880 --> 00:41:10,900
median.
572
00:41:10,900 --> 00:41:16,319
An ordered dataset here means that we
have arranged the dataset in
573
00:41:16,319 --> 00:41:21,940
increasing order; we then see where it is
exactly divided into two halves, and
574
00:41:21,940 --> 00:41:25,839
that
division point is known
575
00:41:25,839 --> 00:41:27,319
as the median.
576
00:41:27,319 --> 00:41:31,750
As you have seen, when
50 percent of the probability
577
00:41:31,750 --> 00:41:34,630
is covered, we call that point the median.
578
00:41:34,630 --> 00:41:42,180
And when an ordered dataset is divided
into quarters, that is, four equal parts,
579
00:41:42,180 --> 00:41:45,599
then the division points are called the sample
quartiles.
580
00:41:45,599 --> 00:41:46,599
..
581
00:41:46,599 --> 00:41:52,380
So, these are some terms we will now see in
detail; the first is the quartile.
582
00:41:52,380 --> 00:41:59,900
The first
quartile is represented as Q1.
583
00:41:59,900 --> 00:42:06,710
It is the value of the dataset such that one-fourth
of the
584
00:42:06,710 --> 00:42:10,560
observations are less than this particular
value.
585
00:42:10,560 --> 00:42:12,740
So, this is the first quarter.
586
00:42:12,740 --> 00:42:16,099
So, we take the dataset and
arrange it
587
00:42:16,099 --> 00:42:18,160
in increasing order.
588
00:42:18,160 --> 00:42:21,750
Then it is the
particular point such that one-fourth
589
00:42:21,750 --> 00:42:26,599
of the observations are less than that
particular value — that is the first quartile.
590
00:42:26,599 --> 00:42:32,400
Similarly, the second quartile means, when
we are talking about this 50 percent of the
591
00:42:32,400 --> 00:42:35,340
data that is half of the data.
592
00:42:35,340 --> 00:42:40,390
It is the value of that dataset such that
the half of the
593
00:42:40,390 --> 00:42:43,900
observations are less than this particular
value.
594
00:42:43,900 --> 00:42:50,839
So, this is known as the second quartile,
which is also equivalent to the median, as
595
00:42:50,839 --> 00:42:54,369
we just discussed, because half
of
596
00:42:54,369 --> 00:42:56,789
the data is less than that particular value.
597
00:42:56,789 --> 00:43:01,079
And the third quartile is similar, but
with
598
00:43:01,079 --> 00:43:05,210
three-fourths, that is, 75 percent,
of the data less than it.
599
00:43:05,210 --> 00:43:10,850
So, it is the value
of the dataset such that three-fourths of the
600
00:43:10,850 --> 00:43:15,049
observations are less than this particular
value.
601
00:43:15,049 --> 00:43:20,830
Now, why are these quartiles important?
Suppose we know the
602
00:43:20,830 --> 00:43:24,309
different quartiles and the total range
of the dataset.
603
00:43:24,309 --> 00:43:28,180
Then this will give
you some idea of how the
604
00:43:28,180 --> 00:43:34,250
dataset is distributed
over
605
00:43:34,250 --> 00:43:36,010
that particular range.
606
00:43:36,010 --> 00:43:40,079
We can see whether a lot of the data lies
607
00:43:40,079 --> 00:43:48,349
below the first quartile, or
between the first and second quartiles,
608
00:43:48,349 --> 00:43:49,620
or
wherever it is.
609
00:43:49,620 --> 00:43:53,780
So, giving the
quartile values along with the total
610
00:43:53,780 --> 00:44:00,650
range of the dataset helps us understand
how dispersed the dataset is over its
611
00:44:00,650 --> 00:44:02,230
entire range.
612
00:44:02,230 --> 00:44:08,800
Otherwise, if we simply look at the
median value, that tells us only where
613
00:44:08,800 --> 00:44:11,520
50 percent of the data lies.
614
00:44:11,520 --> 00:44:15,609
Instead, if we describe the distribution
in this fashion, then it
615
00:44:15,609 --> 00:44:21,160
will be helpful for knowing the dispersion,
and graphically this is done in
616
00:44:21,160 --> 00:44:24,119
terms
of the box plot, which will come in
617
00:44:24,119 --> 00:44:29,180
a minute; before that we will look at
percentiles.
618
00:44:29,180 --> 00:44:30,180
.
619
00:44:30,180 --> 00:44:35,910
So, when we talk about a percentile of
a particular dataset available to
620
00:44:35,910 --> 00:44:39,789
us, we speak of
the 100 multiplied by pth percentile.
621
00:44:39,789 --> 00:44:45,000
Before I go on: this p is the cumulative
probability.
622
00:44:45,000 --> 00:44:50,590
So, since p is a cumulative
probability, it varies
623
00:44:50,590 --> 00:44:52,430
from 0 to 1.
624
00:44:52,430 --> 00:44:57,410
Now, for any percentile value: p equals 0.1 means
this
625
00:44:57,410 --> 00:45:04,250
quantity will give you the 10th percentile,
and p equals 0.6 will give you the 60th
626
00:45:04,250 --> 00:45:05,250
percentile value.
627
00:45:05,250 --> 00:45:11,511
So, in general, the 100pth percentile value
is defined in an ordered dataset — an ordered
628
00:45:11,511 --> 00:45:17,069
dataset means
it is arranged in ascending order.
629
00:45:17,069 --> 00:45:24,079
The percentile is such that 100 multiplied
by p percent of the observations are equal
630
00:45:24,079 --> 00:45:27,030
to or less than this particular value.
631
00:45:27,030 --> 00:45:33,690
So, basically, going from quartiles to percentiles,
we are making the idea more
632
00:45:33,690 --> 00:45:40,070
general, defined for each and every value
that we need.
633
00:45:40,070 --> 00:45:46,400
By making p any
number — any continuous value starting
634
00:45:46,400 --> 00:45:49,010
from 0 to 1 — put in any value and that
635
00:45:49,010 --> 00:45:51,990
particular percentile you will obtain.
636
00:45:51,990 --> 00:45:55,480
Now, if we just want to view the quartiles
with
637
00:45:55,480 --> 00:46:01,710
respect to percentiles, then it is very
easy to see that the first, second and
638
00:46:01,710 --> 00:46:06,569
the
third quartile are equal to the 25th, 50th
639
00:46:06,569 --> 00:46:09,440
and 75th percentile values in an ordered
dataset
640
00:46:09,440 --> 00:46:10,440
respectively.
641
00:46:10,440 --> 00:46:15,579
So, this first quartile is nothing but the
25th percentile, second quartile is
642
00:46:15,579 --> 00:46:21,539
nothing but the 50th percentile and third
quartile is nothing but the 75th percentile.
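This equivalence is easy to check numerically (an illustrative sketch of mine; note that numpy's default percentile rule uses linear interpolation between ordered values, which can differ slightly from other conventions):

```python
import numpy as np

data = np.array([7, 15, 36, 39, 40, 41])         # hypothetical sample

q1, q2, q3 = np.percentile(data, [25, 50, 75])    # quartiles as percentiles
print(q1, q2, q3)
print(q2 == np.median(data))                      # the 50th percentile is the median
```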
643
00:46:21,539 --> 00:46:22,539
.
644
00:46:22,539 --> 00:46:29,030
Now, if we want to present this kind
of representation graphically, we will
645
00:46:29,030 --> 00:46:34,470
see that this can be done through
a box plot.
646
00:46:34,470 --> 00:46:42,839
Now, to prepare that box plot we
have to know the percentiles, particularly the
647
00:46:42,839 --> 00:46:50,170
quartiles, that is, the 25th, 50th and 75th, and
sometimes the 5th and 95th as well.
648
00:46:50,170 --> 00:46:53,740
So, to get a particular percentile, that is, the
100pth
649
00:46:53,740 --> 00:46:55,820
percentile, how can we calculate it?
650
00:46:55,820 --> 00:46:58,240
So, these are the steps involved in that.
651
00:46:58,240 --> 00:47:00,910
Suppose the total number of
observations
652
00:47:00,910 --> 00:47:02,599
available in the dataset is n.
653
00:47:02,599 --> 00:47:06,440
The n observations of the dataset are ordered
from the
654
00:47:06,440 --> 00:47:13,119
smallest to the largest, that is, in ascending
order; the available dataset of size n is
655
00:47:13,119 --> 00:47:14,680
arranged first.
656
00:47:14,680 --> 00:47:19,380
Next, the product np is determined; p is
known because the desired
657
00:47:19,380 --> 00:47:20,720
percentile is known.
658
00:47:20,720 --> 00:47:23,299
So, that particular p value is known to you.
659
00:47:23,299 --> 00:47:26,000
And we know the total
number of observations n.
660
00:47:26,000 --> 00:47:30,490
So the product,
the multiplication of n and p, is
661
00:47:30,490 --> 00:47:31,490
known.
662
00:47:31,490 --> 00:47:34,100
Now, if this np is an integer.
663
00:47:34,100 --> 00:47:38,589
So, np can become one integer or it may not
be an
664
00:47:38,589 --> 00:47:39,660
integer.
665
00:47:39,660 --> 00:47:48,309
So, if it is an integer, say that integer is
k, then we take the mean of the kth and (k plus
666
00:47:48,309 --> 00:47:49,880
1)th observation.
667
00:47:49,880 --> 00:47:55,369
So, the kth and the immediately next observation
are
668
00:47:55,369 --> 00:48:00,760
taken, their mean is calculated, and that
gives the 100pth percentile.
669
00:48:00,760 --> 00:48:09,450
If np is not an integer, then it is rounded
up to the next highest integer and the
670
00:48:09,450 --> 00:48:13,299
corresponding observation gives that 100pth
percentile.
671
00:48:13,299 --> 00:48:18,730
This can also be done differently:
when np is
672
00:48:18,730 --> 00:48:22,010
not an integer, instead of rounding,
673
00:48:22,010 --> 00:48:29,529
you can take the two values, that is, the
kth and the (k plus 1)th observation.
674
00:48:29,529 --> 00:48:34,020
Between these two, the value can be linearly
interpolated.
675
00:48:34,020 --> 00:48:40,559
Or, for simplicity's sake, np can be
taken to the nearest integer value,
676
00:48:40,559 --> 00:48:45,750
and that particular observation will give the
100pth percentile.
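The steps above can be sketched directly in code (my own sketch; the sample data are hypothetical). If np is an integer k, average the kth and (k+1)th ordered values; otherwise round np up and take that observation:

```python
import math

def percentile_100p(data, p):
    # 100pth percentile by the rule described above (1-based ranks, 0 < p < 1)
    xs = sorted(data)
    n = len(xs)
    k = n * p
    if float(k).is_integer():
        k = int(k)
        return 0.5 * (xs[k - 1] + xs[k])     # mean of kth and (k+1)th values
    return xs[math.ceil(k) - 1]              # round np up, take that value

data = [12, 7, 9, 15, 21, 30, 18, 25, 11, 14]   # hypothetical sample, n = 10
print(percentile_100p(data, 0.50))   # np = 5 (integer): mean of 5th and 6th
print(percentile_100p(data, 0.25))   # np = 2.5: rounded up to the 3rd value
```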
677
00:48:45,750 --> 00:48:46,750
.
678
00:48:46,750 --> 00:48:54,079
So, in this way, whatever desired percentile
is required can be obtained, and
679
00:48:54,079 --> 00:48:57,990
we can represent that information in terms
of the box plot.
680
00:48:57,990 --> 00:49:05,980
The information regarding
the quartiles and the interquartile range
681
00:49:05,980 --> 00:49:12,190
in an ordered dataset can be depicted with
the
682
00:49:12,190 --> 00:49:14,990
help of a diagram called the box plot.
683
00:49:14,990 --> 00:49:19,369
Now, the significant information depicted
on a box
684
00:49:19,369 --> 00:49:27,309
plot are the sample minimum, first quartile,
median or second quartile, third quartile
685
00:49:27,309 --> 00:49:31,029
and
sample maximum.
686
00:49:31,029 --> 00:49:37,770
So, these five points are needed before
we prepare a box plot.
687
00:49:37,770 --> 00:49:41,260
We know how
to get them from a dataset: we have just
688
00:49:41,260 --> 00:49:43,550
seen how to obtain the first quartile, how
to
689
00:49:43,550 --> 00:49:48,990
obtain the second quartile and third quartile,
as
690
00:49:48,990 --> 00:49:51,569
well as the sample minimum and the sample
maximum values.
691
00:49:51,569 --> 00:49:52,569
..
692
00:49:52,569 --> 00:49:57,200
Now, once we get this information, it is
represented as follows.
693
00:49:57,200 --> 00:50:01,309
The range between
the first and the third quartiles — first meaning
694
00:50:01,309 --> 00:50:08,050
the 25th percentile and third the 75th
percentile — the range between these two quartiles
695
00:50:08,050 --> 00:50:11,310
is represented by a rectangle.
696
00:50:11,310 --> 00:50:12,310
.
697
00:50:12,310 --> 00:50:22,670
So, if I draw it here one step after
another: this one
698
00:50:22,670 --> 00:50:30,630
is
the first quartile and this one, say, is
699
00:50:30,630 --> 00:50:36,539
the third quartile, and this range is represented
in terms of a
700
00:50:36,539 --> 00:50:37,720
rectangle.
701
00:50:37,720 --> 00:50:45,119
Now, the second step is that the median value
is represented by a line within
702
00:50:45,119 --> 00:50:46,210
this rectangle.
703
00:50:46,210 --> 00:50:51,750
Now, about this median value: it is not
704
00:50:51,750 --> 00:50:56,010
necessary that it will be exactly in the middle
of the rectangle; it can be at any point, towards
705
00:50:56,010 --> 00:50:59,440
one
side or the other.
706
00:50:59,440 --> 00:51:05,690
Then, the first quartile is connected to
the
707
00:51:05,690 --> 00:51:10,230
minimum value, and the third quartile to
the maximum value, by
708
00:51:10,230 --> 00:51:12,440
lines.
709
00:51:12,440 --> 00:51:16,650
So suppose this is
the minimum value and
710
00:51:16,650 --> 00:51:24,289
this is the maximum value; then these two
are connected by lines.
711
00:51:24,289 --> 00:51:26,490
So these two are connected by these lines.
712
00:51:26,490 --> 00:51:29,579
Now, again, for a very large dataset, the
5th
713
00:51:29,579 --> 00:51:35,559
percentile and 95th percentile values may
be used in the place of the sample minimum
714
00:51:35,559 --> 00:51:38,069
and the sample maximum respectively.
716
00:51:39,069 --> 00:51:44,980
So, one such representation is shown here; the
dispersion and the skewness of the dataset,
717
00:51:44,980 --> 00:51:47,990
as
we just discussed, can be indicated by
718
00:51:47,990 --> 00:51:51,180
the spacing between the parts of the box in
a
719
00:51:51,180 --> 00:51:52,180
box plot.
720
00:51:52,180 --> 00:51:56,380
The box plot can indicate the outliers, if
any, in the population.
721
00:51:56,380 --> 00:51:59,620
So, here one
example is given: this is the real axis
722
00:51:59,620 --> 00:52:01,410
on which this box plot is shown.
723
00:52:01,410 --> 00:52:12,440
So here,
11.1 is the sample minimum or the 5th percentile,
724
00:52:12,440 --> 00:52:17,829
19.85 is the first quartile, 23.9 is the
median or second quartile, 27.85 is the third
725
00:52:17,829 --> 00:52:23,000
quartile and 36.7 is the 95th percentile
or
726
00:52:23,000 --> 00:52:26,430
the sample maximum value.
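The five values just described can be computed directly from a sample. A minimal sketch in Python, using only the standard library; the sample data here are hypothetical, chosen so the summary reproduces the values quoted in the lecture:

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a sample."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, q2, q3, max(data))

# Hypothetical sample constructed to reproduce the lecture's box-plot values.
sample = [11.1, 15.0, 19.85, 21.0, 23.9, 25.0, 27.85, 30.0, 36.7]
print(five_number_summary(sample))  # (11.1, 19.85, 23.9, 27.85, 36.7)
```

Note that different quantile conventions (here, `method="inclusive"`) can give slightly different quartiles for small samples.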
728
00:52:27,430 --> 00:52:31,289
There are also other ways in which we can graphically
represent the data; the first one is the
729
00:52:31,289 --> 00:52:32,289
histogram.
730
00:52:32,289 --> 00:52:35,490
In a histogram, the area of the rectangle
gives the probability and thus, for bins of
731
00:52:35,490 --> 00:52:39,430
equal width, the height of the rectangle is
proportional to the probability.
732
00:52:39,430 --> 00:52:44,230
The rectangles representing the
successive values of the random variable always
733
00:52:44,230 --> 00:52:49,679
touch those adjacent to them so that
there are no gaps in between.
735
00:52:50,679 --> 00:52:56,490
So, here one example of a histogram
is shown; the height of each rectangle
736
00:52:56,490 --> 00:52:58,799
is
proportional to its probability.
737
00:52:58,799 --> 00:53:03,510
Basically, the frequency of observations
in each
738
00:53:03,510 --> 00:53:06,920
particular class is shown by the numbers
on this axis.
739
00:53:06,920 --> 00:53:12,110
So, the frequencies are represented by the
heights of these rectangles in the histogram.
740
00:53:12,110 --> 00:53:18,130
Now, how is this useful? By looking at
it graphically, we can see
741
00:53:18,130 --> 00:53:23,320
its distribution and what it looks like, and
we can try to fit some desired distribution;
742
00:53:23,320 --> 00:53:29,160
this way, we get some idea of what the
distribution looks like.
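The class frequencies that give the rectangle heights can be tallied in a few lines. A minimal sketch; the sample data and the equal-width class edges are hypothetical, for illustration only:

```python
def histogram_counts(data, bin_edges):
    """Count observations in each half-open class [edge[i], edge[i+1])."""
    counts = [0] * (len(bin_edges) - 1)
    for x in data:
        for i in range(len(counts)):
            if bin_edges[i] <= x < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts

# Hypothetical sample and equal-width classes.
data = [11.1, 15.0, 19.85, 21.0, 23.9, 25.0, 27.85, 30.0, 36.7]
edges = [10, 20, 30, 40]
print(histogram_counts(data, edges))  # [3, 4, 2]
```

Dividing each count by the sample size turns the heights into relative frequencies, so the rectangle areas approximate probabilities.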
744
00:53:30,160 --> 00:53:31,530
But there is another one called the probability
bar chart.
745
00:53:31,530 --> 00:53:37,190
So, in a probability bar chart the
height of the rectangle is equal to the probability.
746
00:53:37,190 --> 00:53:43,309
However, unlike the probability
histogram, the width of this rectangle has
747
00:53:43,309 --> 00:53:45,410
no meaning; no significance.
748
00:53:45,410 --> 00:53:48,099
The rectangles
representing the successive values of the
749
00:53:48,099 --> 00:53:53,760
random variables do not touch their adjacent
ones.
750
00:53:53,760 --> 00:53:58,569
So, the width carries no meaning; only the
heights of these bars are important,
751
00:53:58,569 --> 00:54:00,690
and they show the probabilities.
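For a probability bar chart, the bar heights are just the relative frequencies of the distinct values in the sample. A minimal sketch; the sample of discrete outcomes is hypothetical:

```python
from collections import Counter

def empirical_pmf(values):
    """Relative frequency of each distinct value: the bar heights."""
    n = len(values)
    return {v: c / n for v, c in sorted(Counter(values).items())}

# Hypothetical discrete sample.
outcomes = [1, 2, 2, 3, 3, 3, 4, 4]
print(empirical_pmf(outcomes))  # {1: 0.125, 2: 0.25, 3: 0.375, 4: 0.25}
```

Each bar is drawn at its value with this height, and adjacent bars do not touch, since the width carries no meaning.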
753
00:54:01,690 --> 00:54:05,099
So, here is one example of what we call a
probability bar chart.
754
00:54:05,099 --> 00:54:09,510
That means, at
this particular data point, the height
755
00:54:09,510 --> 00:54:15,720
associated with it is 2; this can also
be scaled, in which case we say the height
756
00:54:15,720 --> 00:54:20,630
of the rectangle is
proportional to the probability.
757
00:54:20,630 --> 00:54:24,369
So, these are the ways we can represent the data.
758
00:54:24,369 --> 00:54:26,740
This is how, with the
histogram, we can represent the sample
759
00:54:26,740 --> 00:54:34,539
data, just to get an idea of how it is distributed
over the entire range of the sample.
760
00:54:34,539 --> 00:54:39,822
So, with this we will stop today, and in the next
class we will see a description of the
761
00:54:39,822 --> 00:54:41,810
discrete random variable.
762
00:54:41,810 --> 00:54:46,839
So, today what we learned are some more
descriptors of
763
00:54:46,839 --> 00:54:52,380
the random variable and their analogy with
the moments of the area represented by
764
00:54:52,380 --> 00:54:53,930
the pdf.
765
00:54:53,930 --> 00:54:57,309
And then we have seen how, graphically,
we can represent the sample
766
00:54:57,309 --> 00:55:03,980
data to get an idea of its distribution
over the entire range of the sample data.
767
00:55:03,980 --> 00:55:08,190
So, we
will begin the next lecture with a description
768
00:55:08,190 --> 00:55:10,270
of this discrete probability distribution.
769
00:55:10,270 --> 00:55:11,430
Thank
you.