1
00:00:14,210 --> 00:00:15,840
hello and welcome to today's lecture so i
will begin with a brief recap of what
2
00:00:15,840 --> 00:00:20,340
we discussed in the last class mainly ways
of quantifying dispersion in a population
3
00:00:20,340 --> 00:00:26,110
or a sample right and one of the widely used
metrics for characterizing this variation in
4
00:00:26,110 --> 00:00:35,930
the data is using standard deviation right
you either use a sigma to describe standard
5
00:00:35,930 --> 00:00:43,160
deviation of a population and this is given
by the square root of the summation of x minus mu whole
6
00:00:43,160 --> 00:01:02,969
square by capital n or s for a sample which is
the square root of the summation of x minus x bar whole square
7
00:01:02,969 --> 00:01:17,380
by n minus one right so once again note
this n minus one so when you are working with a sample
8
00:01:17,380 --> 00:01:37,049
then it is generally held
that by dividing by n minus one you get a
9
00:01:37,049 --> 00:01:41,810
better estimate of standard deviation of the
population right so what exactly is the practical
10
00:01:41,810 --> 00:01:46,770
significance of the standard deviation
this we had discussed in the last class and it is
11
00:01:46,770 --> 00:02:02,509
what chebyshev's theorem tells us
so it says that given a number k greater than
12
00:02:02,509 --> 00:02:11,090
one and a set of n measurements what you're
guaranteed is at least one minus one by k
13
00:02:11,090 --> 00:02:15,260
square proportion of the measurements will
lie within k standard deviations of their
14
00:02:15,260 --> 00:02:23,590
mean right so if i substitute k equal to two
then that would become one minus one fourth
15
00:02:23,590 --> 00:02:29,360
which is three fourths so at least seventy five percent of the measurements
are guaranteed to lie within two standard deviations
16
00:02:29,360 --> 00:02:38,190
of the mean which means at least seventy five percent
of the data will lie between mu or x bar
17
00:02:38,190 --> 00:02:49,790
minus two sigma and x bar plus two sigma
so as an example you have n equal to twenty
18
00:02:49,790 --> 00:02:57,180
six mean seventy five and variance hundred
so in this case x bar is equal to
19
00:02:57,180 --> 00:03:02,170
seventy five and n is equal to twenty six
variance is equal to hundred so i can
20
00:03:02,170 --> 00:03:07,470
calculate s is equal to ten
right so within seventy five plus minus twenty
21
00:03:07,470 --> 00:03:14,750
you have at least three fourths of the measurements right
so at least three fourths of the measurements
22
00:03:14,750 --> 00:03:24,830
will actually lie within this range which
is fifty five to ninety
23
00:03:24,830 --> 00:03:30,480
five similarly i can do the same thing for
three standard deviations and so you have
24
00:03:30,480 --> 00:03:34,810
seventy five plus minus thirty which in this case
will contain at least one minus one by nine that
25
00:03:34,810 --> 00:03:40,610
is eight by nine fraction of the measurements
eight by nine is roughly eighty nine percent of
26
00:03:40,610 --> 00:03:51,310
the measurements but as i had stated last class
this holds for any generic distribution
27
00:03:51,310 --> 00:04:00,790
so chebyshev's theorem is actually a very
conservative estimate so this is your x bar
28
00:04:00,790 --> 00:04:15,150
x bar minus two s x bar plus two s right so
this much is at least what
29
00:04:15,150 --> 00:04:28,870
we calculated as seventy five percent
in the generic case ok
30
00:04:28,870 --> 00:04:34,520
and between x bar minus three s and x bar plus three s this
is at least eighty nine percent of the measurements
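The chebyshev calculation for this example can be sketched in a few lines of python. This is only an illustration, assuming the numbers quoted in the lecture (mean seventy five, variance hundred); the function name `chebyshev_bound` is made up for this sketch.

```python
import math

def chebyshev_bound(k):
    """Minimum fraction of measurements within k standard deviations (needs k > 1)."""
    if k <= 1:
        raise ValueError("chebyshev's theorem is only informative for k > 1")
    return 1 - 1 / k**2

mean = 75.0
std_dev = math.sqrt(100.0)  # variance 100 gives s = 10

intervals = {}
for k in (2, 3):
    # interval mean +/- k*s together with the guaranteed minimum fraction
    intervals[k] = (mean - k * std_dev, mean + k * std_dev, chebyshev_bound(k))
    low, high, bound = intervals[k]
    print(f"k={k}: at least {bound:.1%} of measurements lie in [{low}, {high}]")
```

The bound is a guaranteed minimum for any distribution, so the actual fraction inside each interval can be much higher.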
31
00:04:34,520 --> 00:04:39,159
but chebyshev's theorem is a very conservative
approach as it does not make any assumptions
32
00:04:39,159 --> 00:04:44,599
about how the data is distributed in
contrast what you observe for a mound
33
00:04:44,599 --> 00:04:52,979
shaped distribution which is a gaussian distribution
or a normal distribution is that there is
34
00:04:52,979 --> 00:05:05,370
about sixty eight percent of the data which is
expected to be there within plus minus one
35
00:05:05,370 --> 00:05:11,469
standard deviation where chebyshev's theorem
with k equal to one guarantees nothing
36
00:05:11,469 --> 00:05:16,150
in a normal distribution sixty eight percent
stay within plus minus one standard deviation
37
00:05:16,150 --> 00:05:20,479
ok and within plus minus two standard deviations
ninety five percent of the data is there as against at least seventy five percent from chebyshev ok
38
00:05:20,479 --> 00:05:34,330
and three standard deviations ninety nine
point seven percent of the data is there so
39
00:05:34,330 --> 00:05:40,730
this also brings us to the concept of z
score as a measure of relative standing right
40
00:05:40,730 --> 00:05:58,219
so what exactly is the z score it is basically
defined by x minus mean by standard deviation
41
00:05:58,219 --> 00:06:07,660
and you can do this calculation so
for a particular experiment if your mean
42
00:06:07,660 --> 00:06:16,840
is twenty five standard deviation is four
and x is thirty then the z score returns a
43
00:06:16,840 --> 00:06:25,219
value of thirty minus twenty five by four
which is one point two five
44
00:06:25,219 --> 00:06:37,509
ok now you can use the z score to get an estimate
of whether a particular data point is
45
00:06:37,509 --> 00:06:45,990
an outlier or not and this can be
clearly gleaned from this particular example
46
00:06:45,990 --> 00:06:51,759
we worked out in the last class so what you see
if you look at the data points is that all the
47
00:06:51,759 --> 00:06:56,770
points are clustered between one and four
except for this one particular value which
48
00:06:56,770 --> 00:07:02,979
is fifteen right so it
seems to us that fifteen is an outlier or very
49
00:07:02,979 --> 00:07:18,110
close to being an outlier you can do this
calculation we had worked out what exactly
50
00:07:18,110 --> 00:07:23,340
its value is i do not remember it but you
can find out as per this statement
51
00:07:23,340 --> 00:07:28,979
whether the magnitude of the z score comes out to be greater than
three or not
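As a small sketch of the z score check described above, using the numbers from the lecture (mean twenty five, standard deviation four, x equal to thirty). The function name `z_score` and the variable names are chosen only for this illustration, and the cutoff of three is the rule of thumb quoted in the lecture.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z = z_score(30, 25, 4)
is_outlier = abs(z) > 3  # rule of thumb from the lecture: |z| greater than three

print(f"z = {z}, outlier: {is_outlier}")
```

Here z comes out as 1.25, well inside the cutoff, so this particular value would not be flagged.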
52
00:07:28,979 --> 00:07:35,919
another way of characterizing
relative standing is using the concept
53
00:07:35,919 --> 00:07:41,550
of percentile right so p th percentile is
the value which is greater than p percent
54
00:07:41,550 --> 00:07:46,720
of the measurements so
a person who is in the
55
00:07:46,720 --> 00:07:53,050
ninety ninth percentile
has performed
56
00:07:53,050 --> 00:08:02,249
better than ninety nine percent of the population
in a class ok so you can use these particular
57
00:08:02,249 --> 00:08:09,069
positions to determine how you will calculate
the first quartile or third quartile the second
58
00:08:09,069 --> 00:08:15,389
quartile is of course at position point
five times n plus one and is nothing but the median
59
00:08:15,389 --> 00:08:23,439
ok so the value below which twenty five
percent of the data points lie is the first quartile
60
00:08:23,439 --> 00:08:30,919
the value below which seventy five percent of the data points lie
is the third quartile and the interquartile range
61
00:08:30,919 --> 00:08:38,479
is defined as q three minus q one
ok so using these values
62
00:08:38,479 --> 00:08:45,720
one can plot what is called a box plot and
in a box plot the lowest value is your
63
00:08:45,720 --> 00:08:56,870
minimum ok and the box outlines
the q one which is the first
64
00:08:56,870 --> 00:09:08,130
quartile q two or the median q three or the
third quartile and this is your maximum so
65
00:09:08,130 --> 00:09:15,440
what you also see are points which may lie
outside this definition of the box so if you
66
00:09:15,440 --> 00:09:22,210
take this point this coincides with the maximum
value of the distribution but this point or
67
00:09:22,210 --> 00:09:45,610
for that matter this point really is much
outside the box limits so these points are
68
00:09:45,610 --> 00:09:50,530
examples of outliers and it is perhaps not
completely surprising that in
69
00:09:50,530 --> 00:09:53,340
many experimental data sets you do have outliers
ok
70
00:09:53,340 --> 00:10:02,500
so this square inside the box actually denotes
the mean and what you see here in this
71
00:10:02,500 --> 00:10:07,540
particular population if
you look at the y axis is that you have values
72
00:10:07,540 --> 00:10:14,570
which vary all the way from around twenty
or thirty or fifty all the way to six hundred
73
00:10:14,570 --> 00:10:20,220
so when you take an average
that six hundred is going to have a much greater
74
00:10:20,220 --> 00:10:29,780
effect than a value of fifty which is why
in this particular case the mean is slightly
75
00:10:29,780 --> 00:10:37,100
shifted above the position of the median
if you take this particular example it is the
76
00:10:37,100 --> 00:10:43,580
other way round where the median is here and
the mean is here ok so based on this if you
77
00:10:43,580 --> 00:10:46,200
were to plot it in terms of histograms
then as opposed to having a distribution like
78
00:10:46,200 --> 00:10:50,470
this where your positions of mean median mode
all coincide the mean might shift either to the
79
00:10:50,470 --> 00:10:56,120
left or to the right ok so the way to detect
outliers is using this particular formula
80
00:10:56,120 --> 00:11:11,080
so you can construct fences where
the lower fence is given by q one minus one
81
00:11:11,080 --> 00:11:22,720
point five times the interquartile range and the
upper fence is q three plus one point five
82
00:11:22,720 --> 00:11:28,130
times the interquartile range ok so let us just work
out a sample case of how we will actually
83
00:11:28,130 --> 00:11:32,630
plot our box plot
ok so let me write down the points you have
84
00:11:32,630 --> 00:11:37,140
the points three forty three hundred five
twenty three forty three twenty two ninety
85
00:11:37,140 --> 00:11:44,180
two sixty and three thirty ok so first step
of course is to sort in ascending order so
86
00:11:44,180 --> 00:11:53,480
my lowest value here is two sixty then two
ninety then three hundred then i have
87
00:11:53,480 --> 00:11:57,720
a three twenty a three thirty
two three forties and one five twenty
88
00:11:57,720 --> 00:12:05,540
ok so we can already see clearly here that
as opposed to all of these points which kind
89
00:12:05,540 --> 00:12:09,890
of are clustered together this data point
seems to be far away from the rest ok
90
00:12:09,890 --> 00:12:18,650
but let us do our necessary calculations
once again so you have two sixty two ninety
91
00:12:18,650 --> 00:12:25,220
three hundred three twenty three thirty three
forty three forty five twenty so my total
92
00:12:25,220 --> 00:12:43,390
number of measurements is one two three four
five six seven eight so n is equal to eight that
93
00:12:43,390 --> 00:12:55,720
means my median position is going
to be between the fourth and fifth values so my median value
94
00:12:55,720 --> 00:12:59,800
will be half of three twenty plus three thirty
which is equal to three twenty five the position
95
00:12:59,800 --> 00:13:11,910
of q one will be one fourth times n plus
one equal to nine by four which is two point
96
00:13:11,910 --> 00:13:20,040
two five so just after the second value so this is going to
be the position of q one this is your median
97
00:13:20,040 --> 00:13:26,950
ok and q three position is going to be three
fourths into nine which is twenty seven by four that is
98
00:13:26,950 --> 00:13:33,780
six point seven five
so one two three four five six seven right
99
00:13:33,780 --> 00:13:41,110
so q three is going to be somewhere here between the sixth and seventh values ok
so my q one value will be two ninety plus
100
00:13:41,110 --> 00:13:48,430
point two five times ten which is going to
be two ninety plus two point five that is two ninety
101
00:13:48,430 --> 00:13:57,920
two point five q three is at position six
point seven five so one two three four five six
102
00:13:57,920 --> 00:14:02,800
is three forty plus point seven five into
three forty minus three forty so it will still give the value
103
00:14:02,800 --> 00:14:10,890
three forty since q three lies between the sixth and seventh values
which are both three forty
104
00:14:10,890 --> 00:14:19,670
and the difference three forty minus three forty is zero
so q three is nothing but three forty only ok so
105
00:14:19,670 --> 00:14:28,130
we have calculated the values of q one and
q three and also our median so
106
00:14:28,130 --> 00:14:33,360
for this particular distribution i have q
one as two ninety two point five q three is
107
00:14:33,360 --> 00:14:44,560
equal to three forty which would mean that
i q r is equal to q three minus q one is forty
108
00:14:44,560 --> 00:14:50,100
seven point five
ok so now we know the lower fence so
109
00:14:50,100 --> 00:14:55,930
lower fence is q one minus one point five times
i q r q one is two ninety two point five minus one point
110
00:14:55,930 --> 00:15:00,400
five into forty seven point five which
is two ninety two point five minus seventy one
111
00:15:00,400 --> 00:15:04,510
point two five which is two twenty one point two five
so that is the exact value of the lower fence
112
00:15:04,510 --> 00:15:09,600
and as you can see if you look at our
points once more the lowest value is two sixty
113
00:15:09,600 --> 00:15:15,080
which is above the lower fence
so this implies that there are no lower outliers
114
00:15:15,080 --> 00:15:18,370
ok i can similarly calculate the value of
q three plus one point five times i q r so upper fence
115
00:15:18,370 --> 00:15:24,890
is q three plus one point five times i q r
q three is three forty plus one point five
116
00:15:24,890 --> 00:15:40,730
into forty seven point five which will
give me a value so if i approximate this as fifty
117
00:15:40,730 --> 00:15:52,340
then this is approximately three forty plus
one point five times fifty so approximately
118
00:15:52,340 --> 00:15:54,880
three forty plus seventy five
which is roughly four one five and exactly four eleven point two five so this implies that
119
00:15:54,880 --> 00:16:05,200
there is an outlier there is
an upper outlier which
120
00:16:05,200 --> 00:16:19,750
is the value equal to five twenty ok
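The quartile and fence calculation worked out above can be checked with a short python sketch. It assumes the eight data points from the lecture and the position rule used there (position equal to the fraction times n plus one, with linear interpolation between neighbouring values); the helper name `quartile` is invented for this illustration.

```python
def quartile(sorted_values, fraction):
    """Value at 1-based position fraction*(n+1), interpolating between neighbours."""
    n = len(sorted_values)
    position = fraction * (n + 1)
    lower_index = int(position)         # 1-based index of the value just below
    remainder = position - lower_index  # fractional part of the position
    if lower_index < 1:
        return sorted_values[0]
    if lower_index >= n:
        return sorted_values[-1]
    below = sorted_values[lower_index - 1]
    above = sorted_values[lower_index]
    return below + remainder * (above - below)

data = sorted([340, 300, 520, 340, 320, 290, 260, 330])

q1 = quartile(data, 0.25)
median = quartile(data, 0.50)
q3 = quartile(data, 0.75)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, median, q3, iqr, lower_fence, upper_fence, outliers)
```

Other software may use a different quartile convention, so values from a plotting library can differ slightly from this hand calculation.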
so if i were to construct the plot
121
00:16:19,750 --> 00:16:35,170
my plot would look something
like this ok so as you can see
122
00:16:35,170 --> 00:16:43,170
the minimum is two sixty and
this is your lower fence your upper fence
123
00:16:43,170 --> 00:16:47,750
is somewhere here ok and this value lies much
above so this means that you have an error
124
00:16:47,750 --> 00:16:53,510
bar which sticks out but this point is much outside
ok so there are actually no points here there
125
00:16:53,510 --> 00:16:57,950
are no data points in this region
ok because after three
126
00:16:57,950 --> 00:17:04,120
forty where are the points after
three forty you directly have five
127
00:17:04,120 --> 00:17:22,220
twenty so this just means that there is no
data point here and the error bar shows
128
00:17:22,220 --> 00:17:40,249
you the range up to the upper fence and there
are no points here ok so with that i will
129
00:17:40,249 --> 00:17:45,580
show you how to generate a box plot
now we come to another interesting concept
130
00:17:45,580 --> 00:17:49,330
of moments ok so pearson was the
first statistician to make use of moments
131
00:17:49,330 --> 00:17:54,390
to describe data now how is a moment
defined so the rth moment
132
00:17:54,390 --> 00:18:10,929
of any variable about zero is defined as
summation y to the power r by n where r can
133
00:18:10,929 --> 00:18:33,960
be one two three or any value ok
so this is the rth moment about
134
00:18:33,960 --> 00:18:43,190
zero ok
and in general the rth
135
00:18:43,190 --> 00:18:50,380
moment about a is defined as
summation of y minus a whole to the power
136
00:18:50,380 --> 00:18:54,590
r by n ok so now let us see what the moments
convey so you have the moment about zero m r
137
00:18:54,590 --> 00:18:59,960
star defined by summation y to the power r by n it is
obvious that if i put r equal to one then
138
00:18:59,960 --> 00:19:06,190
m one star is equal to summation y by n which
is nothing but y bar right so the first
139
00:19:06,190 --> 00:19:09,559
moment about zero is your mean what about
r equal to two so for r equal to two
140
00:19:09,559 --> 00:19:13,010
i have m two star about zero is summation
y square by n right so as you can clearly
141
00:19:13,010 --> 00:19:15,639
see this resembles the way
standard deviation or variance is defined
142
00:19:15,639 --> 00:19:17,529
where you have a term of y minus y bar whole square
by n
143
00:19:17,529 --> 00:19:20,169
so in other words
the rth sample moment about
144
00:19:20,169 --> 00:19:23,889
the mean would then have this particular form
which is m r is equal to summation y minus
145
00:19:23,889 --> 00:19:31,650
y bar whole to the power r by n ok so m two star
about zero is summation y square by n which would mean
146
00:19:31,650 --> 00:19:39,149
that for m r star about zero i put a
equal to zero and i have y to the power r and for
147
00:19:39,149 --> 00:19:47,490
m r about the mean i put a equal to y bar so m r is
defined as summation y minus y bar whole to
148
00:19:47,490 --> 00:20:00,070
the power r by n so m one in this case will
be summation y minus y bar by n and this is
149
00:20:00,070 --> 00:20:08,049
nothing but summation y minus summation y
bar by n now summation y is equal to n times
150
00:20:08,049 --> 00:20:14,899
y bar and summation y bar over n terms is nothing
but n y bar so this would give me a value
151
00:20:14,899 --> 00:20:37,289
of zero ok so the first moment about the mean
is zero this obviously brings us to the second
152
00:20:37,289 --> 00:20:41,280
case that is what is the second moment about
the mean so this would be defined by summation y minus y
153
00:20:41,280 --> 00:20:48,220
bar whole square by n so as you can see
if this was for a population m two is nothing
154
00:20:48,220 --> 00:21:11,980
but variance ok so m two is variance
i am going to call that an approximation
155
00:21:11,980 --> 00:21:15,970
because if it is for a sample then you have
to divide by n minus one but this is very nearly
156
00:21:15,970 --> 00:21:20,789
equal to the variance ok so m two is
what i can call the population variance ok so similarly
157
00:21:20,789 --> 00:21:23,919
i can calculate this value which is m three
equal to summation y minus y bar whole
158
00:21:23,919 --> 00:21:29,130
cube by n right
now let us consider a symmetric distribution
159
00:21:29,130 --> 00:21:39,629
if my distribution is symmetric so there
is symmetry right in this distribution and if
160
00:21:39,629 --> 00:21:48,070
i look at how m three is defined then i know
that symmetry would mean
161
00:21:48,070 --> 00:21:58,049
that for every value which is to the left
of the mean there is a similar value at similar
162
00:21:58,049 --> 00:22:18,369
frequency to the right of it right so lets
say this is
163
00:22:18,369 --> 00:22:27,129
y bar this is y one and this is y two so the
frequency of y one and the frequency of y
164
00:22:27,129 --> 00:22:36,080
two are equal and that is why
the distribution is called a symmetric
165
00:22:36,080 --> 00:22:45,110
distribution in that case
i have two points placed symmetrically
166
00:22:45,110 --> 00:22:55,419
and lets say this distance is the same so
y one minus y bar is equal to lets say minus
167
00:22:55,419 --> 00:23:03,110
delta y and
y two minus y bar is going to
168
00:23:03,110 --> 00:23:07,421
be plus delta y so if i do this summation
it just means that for every y one which is
169
00:23:07,421 --> 00:23:12,909
to the left of y bar whatever contribution
it gives will be negative in nature
170
00:23:12,909 --> 00:23:18,759
and another point which is equidistant
on the positive side and has the same frequency
171
00:23:18,759 --> 00:23:24,230
will give me a positive contribution because
if you have a negative number its cube
172
00:23:24,230 --> 00:23:31,350
is negative if you have a positive number
its cube is positive so if you add these two
173
00:23:31,350 --> 00:23:38,490
terms it will be like lets say f times
minus delta y cubed plus f times delta y cubed
174
00:23:38,490 --> 00:23:48,279
and these two terms add up to zero
so this would
175
00:23:48,279 --> 00:23:50,059
mean that m three will return you a value
of zero for symmetric distributions
176
00:23:50,059 --> 00:23:53,780
ok and this is the same for any odd r so m r about
the mean is going to be zero for symmetric
177
00:23:53,780 --> 00:23:57,871
distributions when r is odd in
other words m one is zero m
178
00:23:57,871 --> 00:23:59,480
three is zero m five is zero and so on and
so forth ok so clearly for all symmetric distributions
179
00:23:59,480 --> 00:24:03,330
the odd moments about the mean return
you a value of zero
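The moment definitions above can be sketched in python as follows. The data set is a small symmetric sample made up for this illustration (it is not from the lecture), and the function names `moment_about` and `central_moment` are likewise assumptions.

```python
def moment_about(values, r, a=0.0):
    """r-th moment about a: sum of (y - a)**r divided by n."""
    n = len(values)
    return sum((y - a) ** r for y in values) / n

def central_moment(values, r):
    """r-th moment about the mean, i.e. moment_about with a equal to y bar."""
    mean = moment_about(values, 1)  # first moment about zero is the mean
    return moment_about(values, r, a=mean)

# hypothetical sample, symmetric about 3
symmetric_sample = [1, 2, 2, 3, 3, 3, 4, 4, 5]

y_bar = moment_about(symmetric_sample, 1)
m1 = central_moment(symmetric_sample, 1)
m2 = central_moment(symmetric_sample, 2)  # variance with n in the denominator
m3 = central_moment(symmetric_sample, 3)
m5 = central_moment(symmetric_sample, 5)

print(y_bar, m1, m2, m3, m5)
```

For this sample the odd moments about the mean cancel exactly, while m2 stays positive.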
180
00:24:03,330 --> 00:24:06,440
ok now lets say the variable y that we are
measuring is actually some quantity it is
181
00:24:06,440 --> 00:24:07,440
not just a number it is a quantity lets say
temperature or height so the moments will have
182
00:24:07,440 --> 00:24:08,440
different units right
so if i were to say y represents height then
183
00:24:08,440 --> 00:24:13,490
the unit of m one lets say
is meter the unit of m three is meter cubed
184
00:24:13,490 --> 00:24:14,490
and the unit of m five is meter to the fifth so in other words
these units are not the same can there be
185
00:24:14,490 --> 00:24:15,507
a way of compressing this information and
coming up with a non dimensional parameter
186
00:24:15,507 --> 00:24:16,507
and that is what this measure
of skewness gives us ok
187
00:24:16,507 --> 00:24:21,570
so skewness is defined in a slightly different
way a three is equal to summation of y
188
00:24:21,570 --> 00:24:27,250
minus y bar whole cubed by n divided by summation of y
minus y bar whole square by n whole to the power
189
00:24:27,250 --> 00:24:29,350
of three by two which i can rewrite as m three
by m two whole to the power three by two ok
190
00:24:29,350 --> 00:24:30,490
so what you can clearly see is that m three
will have units of meter cubed m two will
191
00:24:30,490 --> 00:24:33,950
have units of meter square which whole to the power
three by two will give you a unit of meter
192
00:24:33,950 --> 00:24:35,429
cubed so the ratio is after all a dimensionless
number and this parameter a three is called
193
00:24:35,429 --> 00:24:36,429
skewness
and as is
194
00:24:36,429 --> 00:24:38,120
obvious from the word skew itself for
any symmetric distribution
195
00:24:38,120 --> 00:24:41,029
my skewness a three has to be zero ok so it
is neither skewed in this direction nor skewed
196
00:24:41,029 --> 00:24:43,649
in this direction ok so this is what skewness
is about
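Written out, the definitions used so far read as below. The index i and the star notation for moments about zero follow the spoken description; the exact symbols on the lecture slides are not visible in the transcript, so treat the notation as an assumption.

```latex
m_r^{*} = \frac{1}{n}\sum_{i=1}^{n} y_i^{\,r},
\qquad
m_r = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^{r},
\qquad
a_3 = \frac{m_3}{m_2^{3/2}}
    = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{3}}
           {\left[\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}\right]^{3/2}}
```

If y is in meters, both numerator and denominator carry meter cubed, which is why a three is dimensionless.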
197
00:24:43,649 --> 00:24:46,940
now what sign can a three take can it be negative
let us look at our definition of a three so
198
00:24:46,940 --> 00:24:52,440
lets say we take a particular distribution
which has its long tail
199
00:24:52,440 --> 00:25:01,610
to the right ok so this is going to be my
mode and this will be where
200
00:25:01,610 --> 00:25:03,563
my mean will lie ok and this will be where
my median will lie ok so what you can clearly
201
00:25:03,563 --> 00:25:08,139
see when i do this computation for a three
is that there are a large number of
202
00:25:08,139 --> 00:25:13,200
points which are less than my mean so this
is my y bar value and for all these values
203
00:25:13,200 --> 00:25:16,169
y minus y bar whole cubed will
return me a small negative value ok and only for
204
00:25:16,169 --> 00:25:22,100
a few of the others which lie far out in the tail this quantity is going to
return me a positive value but a large one because of the cube so whether i actually
205
00:25:22,100 --> 00:25:27,031
get a positive or a negative value
of a three depends on which of these two contributions dominates and
206
00:25:27,031 --> 00:25:28,770
for this kind of distribution the long tail usually dominates and a three is positive
ok so we will do one sample
207
00:25:28,770 --> 00:25:29,840
calculation to see whether what we think
holds up ok
208
00:25:29,840 --> 00:25:33,730
in
the other case if it is skewed in the other
209
00:25:33,730 --> 00:25:39,950
direction if you have a distribution like
this with its long tail to the left so this is your mode and
210
00:25:39,950 --> 00:25:57,370
here is where your mean will lie so i can
clearly see that for all these points which
211
00:25:57,370 --> 00:26:05,169
are to the right of the mean y minus y bar
is going to be positive but small ok and the few points far out in the left tail give large negative cubes so by the same
212
00:26:05,169 --> 00:26:08,330
token i will usually get a value of a three which
is negative ok so let us take a sample
213
00:26:08,330 --> 00:26:11,249
example where
we calculate the skewness of a distribution
214
00:26:11,249 --> 00:26:13,799
ok so let me write down some numbers which
kind of portray this picture so
215
00:26:13,799 --> 00:26:16,620
lets say my values are
one ok so this is this particular
216
00:26:16,620 --> 00:26:18,450
kind of a case so one one one two three
four ok let us
217
00:26:18,450 --> 00:26:20,649
do this distribution ok we have three
ones one two one three one four so your
218
00:26:20,649 --> 00:26:22,110
y bar is equal to three plus two plus three
plus four by six which is twelve by six
219
00:26:22,110 --> 00:26:23,110
that is two so y bar is two now i can calculate
my y minus y bar so i have one one one two
220
00:26:23,110 --> 00:26:24,110
three four ok so for value of one it is minus
one so minus one minus one minus one zero one two ok so
221
00:26:24,110 --> 00:26:25,110
in this case y minus y bar whole cube will
give me minus one minus one minus one zero
222
00:26:25,110 --> 00:26:26,110
one and two cube is eight ok
so in this particular case even though the
223
00:26:26,110 --> 00:26:27,110
bulk of the distribution is to the left i can see
that summation y minus y bar whole cube will
224
00:26:27,110 --> 00:26:28,110
give me a value of minus three plus nine which is six
right so in this case though the bulk of the data is
225
00:26:28,110 --> 00:26:29,110
to the left a three
is giving me a value which is positive
226
00:26:29,110 --> 00:26:32,379
ok but you can see that if the
long tail were to the left instead so if you
227
00:26:32,379 --> 00:26:36,312
had three fours
and one three one two and one one and you did
228
00:26:36,312 --> 00:26:38,900
the same calculation
for this particular
229
00:26:38,900 --> 00:26:40,629
distribution you would see that the sum becomes minus six and a three
becomes negative
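The skewness calculation for this example can be sketched in python. The first sample is the one from the lecture (one one one two three four); the mirrored sample is an assumption added here only to show the sign flipping, and the function names are invented for this sketch.

```python
def central_moment(values, r):
    """r-th moment about the mean: sum of (y - y_bar)**r divided by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** r for y in values) / n

def skewness(values):
    """Dimensionless skewness a3 = m3 / m2**(3/2)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

right_tailed = [1, 1, 1, 2, 3, 4]  # lecture example: bulk on the left, tail on the right
left_tailed = [4, 4, 4, 3, 2, 1]   # hypothetical mirror image: tail on the left

a3_right = skewness(right_tailed)
a3_left = skewness(left_tailed)

print(f"a3 (tail to the right) = {a3_right:.4f}")
print(f"a3 (tail to the left)  = {a3_left:.4f}")
```

The two values have equal magnitude and opposite sign, since the second sample is just the first one reflected about its mean.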
230
00:26:40,629 --> 00:26:41,629
ok so with that i conclude today's class so
what we have done is come up with this metric
231
00:26:41,629 --> 00:26:42,629
of skewness starting from standard deviation
and going to how we can measure relative standing
232
00:26:42,629 --> 00:26:43,629
by using the z score and then from there we
went on to see how we can define
233
00:26:43,629 --> 00:26:44,629
moments and come up
with metrics to characterize the shape of a distribution
234
00:26:44,629 --> 00:26:45,629
ok so for any
symmetric distribution skewness will return
235
00:26:45,629 --> 00:26:46,730
you a value of zero but typically if it
is skewed and the long tail of your data lies to the
236
00:26:46,730 --> 00:26:47,730
left of your mean then the skewness
value is negative whereas if the long tail
237
00:26:47,730 --> 00:26:53,379
is to the right it is positive ok with
that i conclude today's lecture
238
00:26:53,379 --> 00:26:56,190
thank you for your attention