1
00:00:13,740 --> 00:00:21,260
hello and welcome you all to today's lecture
hope you have had the time to go through our
2
00:00:21,260 --> 00:00:24,840
discussion in the previous lecture where we focused
on three metrics for quantifying data namely mean
3
00:00:24,840 --> 00:00:30,850
median and mode today we will begin by briefly recapping
what we had discussed and then go
4
00:00:30,850 --> 00:00:36,140
on to another important aspect of quantifying
data which is to quantify the variation in
5
00:00:36,140 --> 00:00:44,140
data ok so let us begin by starting with our
discussion of arithmetic mean so we had shown
6
00:00:44,140 --> 00:00:57,350
we had discussed that you can compute the arithmetic
mean simply by writing x bar is equal to summation
7
00:00:57,350 --> 00:01:10,610
of x i by n where n is the number of observations
in the sample so the summation means from
8
00:01:10,610 --> 00:01:25,509
i is equal to one to i equal to n so it is
simply x one plus x two plus up to x n by
9
00:01:25,509 --> 00:01:38,060
n ok and similarly as per convention this
x bar is for a sample if you are doing it for
10
00:01:38,060 --> 00:01:43,399
a population then you replace x bar by mu
and it is summation x i by capital n ok so
11
00:01:43,399 --> 00:01:46,439
now we had then discussed the different
kinds of transformations which can
12
00:01:46,439 --> 00:01:57,159
be done on the arithmetic mean and seen
how the different values would vary right
13
00:01:57,159 --> 00:02:13,240
so for example if you have y is equal to a
times x then we came to the conclusion that y bar
14
00:02:13,240 --> 00:02:29,350
is equal to a x bar and it is easy to prove
because if you have y i is equal to a x i then
15
00:02:29,350 --> 00:02:38,040
y bar is defined by summation of y i by n
which is equal to summation of a x i by n ok so
16
00:02:38,040 --> 00:02:50,240
from then on so if y bar is equal to summation
of a x i by n since a is a constant we can
17
00:02:50,240 --> 00:03:00,940
take it out and write a times summation x i by n which is
equal to a x bar ok so in other words when
18
00:03:00,940 --> 00:03:05,560
you have a prefactor multiplied you know a
prefactor a operating on x and that is how you
19
00:03:05,560 --> 00:03:11,940
calculate y you simply multiply
a with x bar to obtain the value of y bar
20
00:03:11,940 --> 00:03:17,060
in the case of y is equal to c plus x then
y bar is nothing but since the average of a constant
21
00:03:17,060 --> 00:03:23,040
is a constant c plus x bar the third case
is the general case where y is equal to c plus
22
00:03:23,040 --> 00:03:35,570
a x then this would give me the formula y
bar is equal to c plus a x bar ok so these
23
00:03:35,570 --> 00:03:44,370
transformations are particularly helpful when
we are you know doing things manually by hand
24
00:03:44,370 --> 00:03:49,290
ok
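The transformation rule recapped above (y bar equals c plus a times x bar) can be checked numerically. This is only a minimal sketch: the sample values and the constants `a` and `c` are illustrative assumptions, not values from the lecture.

```python
from statistics import mean

# Illustrative sample and constants (assumed for this sketch).
x = [1, 1, 1, 2, 2, 20]
a, c = 3.0, 10.0

# Apply the general transformation y = c + a * x to every observation.
y = [c + a * xi for xi in x]

x_bar = mean(x)
y_bar = mean(y)

# The mean of the transformed data should match c + a * x_bar.
print(y_bar, c + a * x_bar)
```

Setting `c` to zero or `a` to one recovers the two special cases discussed first.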
so this is arithmetic mean and we found that
25
00:03:49,290 --> 00:03:52,780
one of the main caveats of the arithmetic
mean is if you have a wide variation in your
26
00:03:52,780 --> 00:03:59,720
values so let's just take one particular example
let us say if i have my values as one one one
27
00:03:59,720 --> 00:04:13,900
two two twenty right so i have six values
my x bar is equal to three plus two plus two
28
00:04:13,900 --> 00:04:27,100
plus twenty by six ok seven so twenty seven
by six it will be four point five ok so we
29
00:04:27,100 --> 00:04:34,501
can clearly see my numbers are one one one
two two and i have kind of this outlier which
30
00:04:34,501 --> 00:04:38,999
is twenty this completely shifts my average
to a value of four point five which really
31
00:04:38,999 --> 00:04:47,860
doesn't have any relevance to how my
data looks in other words this is one of the
32
00:04:47,860 --> 00:04:56,690
main deficiencies of arithmetic mean it is
sensitive to outliers so one of the alternatives
33
00:04:56,690 --> 00:05:00,090
to arithmetic mean is to use this particular
concept of geometric mean so where we calculate
34
00:05:00,090 --> 00:05:10,030
geometric mean by so geometric mean is nothing
but the root of capital pi of x i ok
35
00:05:10,030 --> 00:05:23,289
so this is the n th root of pi of x i where pi of
x i means the product x one dot x two dot x three dot
36
00:05:23,289 --> 00:05:27,229
dot dot x n ok
so now what we find is for these particular
37
00:05:27,229 --> 00:05:34,919
values that i have chosen fifteen ten five
eight seventeen hundred so you can clearly
38
00:05:34,919 --> 00:05:42,680
see that most of the values lie within
seventeen except for this one number which
39
00:05:42,680 --> 00:05:50,440
is hundred ok so if i calculate the arithmetic
mean for this particular sample the arithmetic
40
00:05:50,440 --> 00:05:57,389
mean turns out to be twenty five point eight
and as with the previously discussed case
41
00:05:57,389 --> 00:06:02,270
we can see that twenty five point eight is
much bigger than the number seventeen ok so
42
00:06:02,270 --> 00:06:07,150
the alternative if we calculate the geometric
mean it gives me the value of fourteen point
43
00:06:07,150 --> 00:06:12,770
seven which is much closer to the bulk of the values
so this is an example which shows that
44
00:06:12,770 --> 00:06:27,110
the geometric mean is much
less sensitive to outliers compared
45
00:06:27,110 --> 00:06:34,099
to the arithmetic mean
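The comparison above can be reproduced with a short sketch using the six values from the lecture (fifteen, ten, five, eight, seventeen, hundred). The variable names are assumptions made for illustration.

```python
import math

# The six values used in the lecture example.
values = [15, 10, 5, 8, 17, 100]
n = len(values)

# Arithmetic mean: sum of the values divided by n.
arithmetic_mean = sum(values) / n

# Geometric mean: n-th root of the product of the values.
geometric_mean = math.prod(values) ** (1 / n)

print(round(arithmetic_mean, 1), round(geometric_mean, 1))
```

The two results should come out close to the twenty five point eight and fourteen point seven quoted in the lecture.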
and in the generic case so in this particular
46
00:06:34,099 --> 00:06:44,560
case that we worked out you find
that the geometric mean is less than the arithmetic
47
00:06:44,560 --> 00:06:47,379
mean is this true for any data set it
can be shown that it is so let us take two positive numbers let
48
00:06:47,379 --> 00:06:53,029
us call them a and b so my
arithmetic mean will be defined by a plus
49
00:06:53,029 --> 00:06:56,719
b by two and geometric mean will be square
root of a b ok so can i say anything as
50
00:06:56,719 --> 00:07:02,629
to how geometric mean and arithmetic mean
relate to each other so i can write so if
51
00:07:02,629 --> 00:07:08,020
i take a plus b whole square ok a
plus b whole square is a square plus b square
52
00:07:08,020 --> 00:07:14,220
plus two a b ok so in other words if i take
a plus b whole square and
53
00:07:14,220 --> 00:07:24,509
divide it by four i get
a plus b by two whole square which is a square by
54
00:07:24,509 --> 00:07:33,379
four plus b square by four plus a b by two and this is a minus b by two whole square plus a b so since
a minus b by two whole square is never negative since this quantity
55
00:07:33,379 --> 00:07:41,520
is never negative i can clearly say that
a plus b by two whole square is at least a b so this term
56
00:07:41,520 --> 00:07:45,759
is nothing but arithmetic mean whole square
ok and so it basically has to
57
00:07:45,759 --> 00:07:50,059
be greater than or equal to geometric mean whole square ok
so since geometric mean is square root of a b i
58
00:07:50,059 --> 00:07:54,910
can clearly say that in the
general case arithmetic mean is greater than or
59
00:07:54,910 --> 00:08:12,129
equal to geometric mean so as we can see
for the case a equal to b
60
00:08:12,129 --> 00:08:19,499
arithmetic mean is equal to a and equal to
geometric mean so i can have this particular
61
00:08:19,499 --> 00:08:25,469
equation which says that arithmetic mean is
always greater than or equal to geometric mean
62
00:08:25,469 --> 00:08:33,339
so this is one of the reasons why your geometric
mean is much less sensitive to extreme values
63
00:08:33,339 --> 00:08:52,970
now we have the next concept of median right
so the median the median of a set of n measurements
64
00:08:52,970 --> 00:09:02,440
is the value that falls in the middle position
when measurements are ordered from smallest
65
00:09:02,440 --> 00:09:15,300
to largest so in other words the median position
is point five times n plus one ok so if you
66
00:09:15,300 --> 00:09:20,029
have five numbers let us say one two three four
five then your median is the third number and this is a
67
00:09:20,029 --> 00:09:25,630
sorted data set so you can see that this is
my median
68
00:09:25,630 --> 00:09:34,740
but if you have six numbers let us say you have
one two three four five six then the median
69
00:09:34,740 --> 00:09:44,790
position comes in between three and four and this is
why your median is going to be an average so in the earlier
70
00:09:44,790 --> 00:09:51,029
case median is equal to three in this case
my median is equal to three plus four by two
71
00:09:51,029 --> 00:10:13,870
is equal to three point five ok
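The position rule described above can be written as a small function. This is a minimal sketch: the function name `median_by_position` is a hypothetical helper, and it assumes a non-empty list of numbers.

```python
def median_by_position(values):
    """Median via the 1-based position 0.5 * (n + 1) in the sorted data."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    position = 0.5 * (n + 1)   # e.g. 3.0 for n = 5, 3.5 for n = 6
    lower = int(position)      # whole part of the position
    if position == lower:
        # Odd n: the position is a whole number, so there is a unique middle value.
        return ordered[lower - 1]
    # Even n: average the two values on either side of the position.
    return (ordered[lower - 1] + ordered[lower]) / 2


print(median_by_position([1, 2, 3, 4, 5]))
print(median_by_position([1, 2, 3, 4, 5, 6]))
```

The two calls correspond to the five number and six number examples worked out in the lecture.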
so you have to find the median position
72
00:10:13,870 --> 00:10:19,400
as half into n plus one and then find out
whether you have to average between two numbers
73
00:10:19,400 --> 00:10:28,959
if the size of your data set is even or you have a unique
value if your data set is odd ok the third
74
00:10:28,959 --> 00:10:40,639
metric we had discussed was the mode so mode
is the most frequently occurring value and
75
00:10:40,639 --> 00:10:44,880
this is for example the number
of visits to a dental clinic in a typical
76
00:10:44,880 --> 00:10:51,310
week this is the data so how do you calculate
the mode you first find out the frequency
77
00:10:51,310 --> 00:10:59,670
distribution so we can see that you have one
two three four five six seven eight nine as
78
00:10:59,670 --> 00:11:05,740
the values and i can write the corresponding counts so
this is x and this is my frequency f so i
79
00:11:05,740 --> 00:11:08,850
can see that for one i have two ok
so let us you know without going through the
80
00:11:08,850 --> 00:11:14,240
entire list let us look at the middle values so for
five for example one two three four five six
81
00:11:14,240 --> 00:11:18,300
seven so there are seven values for the number
five and six is of course much smaller with four
82
00:11:18,300 --> 00:11:24,660
values then one two three four five so five
for four and if i am not mistaken seven is
83
00:11:24,660 --> 00:11:45,511
the maximum count that occurs
so the value with seven occurrences occurs the most number
84
00:11:45,511 --> 00:11:55,360
of times so the number five is the most
frequent in other words five is your mode
85
00:11:55,360 --> 00:12:01,510
ok so this brings us to the question as
to which of these three values should you
86
00:12:01,510 --> 00:12:18,510
really consider using mean
median or mode and generally
87
00:12:18,510 --> 00:12:28,410
mode is used when you describe large data
sets mean and median can be used interchangeably
88
00:12:28,410 --> 00:12:37,019
for both small and large datasets and as we
discussed the arithmetic
89
00:12:37,019 --> 00:12:46,079
mean is of course sensitive to outliers
but the median is less sensitive to outliers
90
00:12:46,079 --> 00:12:57,959
ok so let us just do a few examples
so in this particular example let us say you
91
00:12:57,959 --> 00:13:09,899
have the numbers one two three two four two
eight thirty six three two five forty five
92
00:13:09,899 --> 00:13:24,079
thirty six eighty nine so if i were to arrange
them in the proper order there is a one then
93
00:13:24,079 --> 00:13:30,290
four twos are there there are three threes
then four is just one then you have
94
00:13:30,290 --> 00:13:36,029
five then you have six then you have eight
then thirty six forty five eighty nine ok
95
00:13:36,029 --> 00:13:52,720
so i have my total n is one two three four
five six seven eight nine ten eleven twelve thirteen
96
00:13:52,720 --> 00:14:07,199
fourteen fifteen so n is fifteen so my median position
is going to be fifteen plus one by two is
97
00:14:07,199 --> 00:14:18,100
equal to sixteen by two is equal to eight so my median
is at the eighth position four five six seven eight so this happens
98
00:14:18,100 --> 00:14:23,990
to be my median right my average when i take
the average because of the presence of these
99
00:14:23,990 --> 00:14:30,389
three large numbers my mean is of course going
to be much greater than the median and mode
100
00:14:30,389 --> 00:14:40,120
is the maximum occurring value which is two
so mode is equal to two median is equal to
101
00:14:40,120 --> 00:14:53,089
three so median is equal to three and mean
is of course greater than median which is greater
102
00:14:53,089 --> 00:15:01,960
than mode so in this particular example so
in this example that we worked out we came
103
00:15:01,960 --> 00:15:11,589
to the conclusion that mode was less than
the median which was less than the mean ok
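The ordering found in this example can be checked with Python's standard `statistics` module. The list below is the sorted data as read out in the lecture (one, four twos, three threes, four, five, six, eight, thirty six, forty five, eighty nine); treating that sorted reading as the data set is an assumption of this sketch.

```python
from statistics import mean, median, mode

# Sorted data as read out in the lecture (fifteen values in total).
data = [1, 2, 2, 2, 2, 3, 3, 3, 4, 5, 6, 8, 36, 45, 89]

data_mode = mode(data)       # most frequently occurring value
data_median = median(data)   # value at position (15 + 1) / 2 = 8
data_mean = mean(data)       # pulled upwards by the three large values

print(data_mode, data_median, round(data_mean, 2))
```

With three large values at the top end, the mean sits well above both the median and the mode.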
104
00:15:11,589 --> 00:15:18,189
so let us consider the next example ok this
is a you know next example as again we can
105
00:15:18,189 --> 00:15:23,029
so if you look at the data set and if i
arrange them in order there is a two then three
106
00:15:23,029 --> 00:15:28,310
ok so there is another two three three three
three three there are five threes there
107
00:15:28,310 --> 00:15:44,810
is no four there is one five one six then
you have thirty six no sorry twenty nine thirty
108
00:15:44,810 --> 00:15:54,040
six thirty six thirty nine forty forty one
so thirty nine forty forty one what you clearly
109
00:15:54,040 --> 00:16:07,920
see in this data set is that i can partition this
data into two groups ok so
110
00:16:07,920 --> 00:16:18,040
as compared to the previous case where there
were only three numbers which were huge here
111
00:16:18,040 --> 00:16:26,949
you have one two three four five six numbers
which are reasonably huge ok so it gives us
112
00:16:26,949 --> 00:16:30,260
the idea that you really have two subpopulations
in this whole set so in this case neither
113
00:16:30,260 --> 00:16:38,350
the mean nor the median nor the mode would
make sense in fact if i were
114
00:16:38,350 --> 00:16:47,100
to group them separately for this group i
can work out my median four five six
115
00:16:47,100 --> 00:16:52,829
seven eight nine ten eleven twelve six so
median is three mode is three and mean will
116
00:16:52,829 --> 00:16:58,959
of course be slightly greater than three
it will be approximately
117
00:16:58,959 --> 00:17:04,690
three only slightly higher than three and
for this set you have numbers from twenty
118
00:17:04,690 --> 00:17:08,750
nine thirty six thirty six thirty nine forty
forty one ok
119
00:17:08,750 --> 00:17:15,570
so for these numbers my
median is going to be thirty six plus
120
00:17:15,570 --> 00:17:17,640
thirty nine by two my mode
is equal to thirty
121
00:17:17,640 --> 00:17:27,650
six and the mean will also be somewhere
in between so what you see here is from the
122
00:17:27,650 --> 00:17:39,360
previous case where it seemed that
there were only three outliers in this particular
123
00:17:39,360 --> 00:17:48,190
case there are clearly two different sets so
it raises the question of what would be the
124
00:17:48,190 --> 00:17:54,029
best way of quantifying this kind
of data so again let us take this particular
125
00:17:54,029 --> 00:17:57,960
example where you have symmetric versus
asymmetric distributions so on three
126
00:17:57,960 --> 00:18:01,279
different days you can have in terms of high
profiles you can have three different distributions
127
00:18:01,279 --> 00:18:04,490
what you see is on day three the data is very
symmetric on day one it is skewed to the left on
128
00:18:04,490 --> 00:18:09,650
day two it is skewed to the right so
it tells us that in addition to quantifying
129
00:18:09,650 --> 00:18:14,470
mean median and mode there must be another way
of capturing the variation in the data ok
130
00:18:14,470 --> 00:18:21,059
and one of the measures which is
very frequently used is this measure of range
131
00:18:21,059 --> 00:18:24,990
which is nothing but maximum minus minimum
ok so i can define the range as maximum minus
132
00:18:24,990 --> 00:18:27,210
minimum
so in this particular case so my minimum is
133
00:18:27,210 --> 00:18:31,770
forty my maximum is hundred so that brings
us range is equal to hundred minus forty equal
134
00:18:31,770 --> 00:18:37,950
to sixty but as you can see the range by
itself won't have any meaning unless the
135
00:18:37,950 --> 00:18:44,180
values are also put in context so for example
if i have if i have numbers as one two and
136
00:18:44,180 --> 00:18:50,690
then i add these other numbers which are forty
sixty seventy five ninety and hundred then
137
00:18:50,690 --> 00:19:00,600
my range is ninety nine right versus in the
earlier case it was sixty so the concept of range
138
00:19:00,600 --> 00:19:11,740
has to be thought about with respect
to the minimum or the maximum similarly
139
00:19:11,740 --> 00:19:27,700
for example you can have data going from
thousand all the way to five thousand or one
140
00:19:27,700 --> 00:19:44,809
to five thousand so your range
has to be thought about in the context
141
00:19:44,809 --> 00:20:00,210
of your maximum or minimum so if
you again have outliers then the range is
142
00:20:00,210 --> 00:20:06,140
too broad it does not particularly give a
clear idea as to where the bulk of the data is
143
00:20:06,140 --> 00:20:16,809
situated
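The sensitivity of the range to extreme values can be seen in a short sketch. The values forty, sixty, seventy five, ninety and hundred and the added one and two come from the lecture; the function name `value_range` is a hypothetical helper.

```python
def value_range(values):
    """Range as defined in the lecture: maximum minus minimum."""
    return max(values) - min(values)


original = [40, 60, 75, 90, 100]
with_small_values = [1, 2] + original   # same data plus two small numbers

print(value_range(original))
print(value_range(with_small_values))
```

Adding just two small numbers moves the range from sixty to ninety nine, even though the bulk of the data has not changed.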
ok so another way of measuring variability
144
00:20:16,809 --> 00:20:27,350
is using the mean absolute deviation so we
can work out this particular example
145
00:20:27,350 --> 00:20:33,620
so the mean absolute
deviation is summation of mod of x i minus
146
00:20:33,620 --> 00:20:39,080
x bar by n ok so in this particular case let
us say if i have a data set as one two five
147
00:20:39,080 --> 00:20:43,159
eight twelve eight one seven five forty
two i have to calculate my x bar so my x bar
148
00:20:43,159 --> 00:20:46,309
becomes three plus five eight eight plus eight
sixteen twenty eight thirty six thirty seven
149
00:20:46,309 --> 00:20:53,470
forty four plus five forty nine ninety one
by one two three four five six seven eight
150
00:20:53,470 --> 00:20:58,049
nine ten so approximately nine let us say so
my mean absolute deviation is nothing but so the first
151
00:20:58,049 --> 00:21:02,740
value becomes eight which is mod of one minus nine plus mod of two minus
nine which is seven plus mod
152
00:21:02,740 --> 00:21:07,040
of five minus nine plus mod of eight minus
nine and so on and so forth so i can calculate
153
00:21:07,040 --> 00:21:09,790
this exact value as the
mean absolute deviation
154
00:21:09,790 --> 00:21:12,530
equal to eight plus seven plus four plus
one plus three plus one plus eight plus two
155
00:21:12,530 --> 00:21:16,080
plus four plus thirty three whole divided
by ten ok so it roughly comes to eight plus
156
00:21:16,080 --> 00:21:20,480
seven fifteen nineteen twenty twenty three twenty four thirty two
thirty four thirty eight seventy one by ten is
157
00:21:20,480 --> 00:21:26,640
roughly seven ok so as you can clearly see
in your values
158
00:21:26,640 --> 00:21:30,834
x bar is nine and this mean absolute deviation
is seven and the reason is that your values
159
00:21:30,834 --> 00:21:39,210
are spread across a wide range from one all
the way to forty two so when your x bar
160
00:21:39,210 --> 00:21:51,250
and this mean absolute deviation are
comparable that means that you have a wide
161
00:21:51,250 --> 00:21:55,700
heterogeneity in your data
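The hand calculation above rounds the mean to nine. A minimal sketch with the same ten values from the lecture, but using the unrounded mean, gives slightly different figures; the variable names are assumptions.

```python
# The ten values used in the lecture example.
data = [1, 2, 5, 8, 12, 8, 1, 7, 5, 42]
n = len(data)

# Mean without rounding (the lecture approximates it as nine).
x_bar = sum(data) / n

# Mean absolute deviation: average of |x_i - x_bar|.
mean_abs_dev = sum(abs(x - x_bar) for x in data) / n

print(round(x_bar, 2), round(mean_abs_dev, 2))
```

The mean absolute deviation is comparable in size to the mean itself, which is the sign of wide heterogeneity pointed out in the lecture.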
so the most widely used metric
162
00:21:55,700 --> 00:22:05,330
as a measure of deviation
as a mark of variability is the standard
163
00:22:05,330 --> 00:22:11,120
deviation ok so let us come to the formula
of standard deviation what you can see here
164
00:22:11,120 --> 00:22:15,880
you have so instead of doing just the mean
absolute deviation you square the differences
165
00:22:15,880 --> 00:22:22,080
so whether or not it is positive or negative
whether your x value is
166
00:22:22,080 --> 00:22:28,230
less than the population mean or greater
than the population mean this square is always
167
00:22:28,230 --> 00:22:32,580
positive you add them up and then you divide
by the total number of observations and you
168
00:22:32,580 --> 00:22:35,190
take a square root because you had squared them
up while adding so this is your definition
169
00:22:35,190 --> 00:22:38,850
of standard deviation for a population for
standard deviation of a sample it's pretty
170
00:22:38,850 --> 00:22:45,049
much the same except there is a notable difference
instead of dividing by capital n you divide
171
00:22:45,049 --> 00:22:48,210
by n minus one
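The two definitions can be compared side by side. This sketch reuses the six values from the geometric mean example purely as illustrative data, and cross-checks the hand-written formulas against `pstdev` and `stdev` from Python's standard `statistics` module.

```python
import math
from statistics import pstdev, stdev

# Illustrative data (the six values from the earlier geometric mean example).
data = [15, 10, 5, 8, 17, 100]
n = len(data)
mean_value = sum(data) / n

# Sum of squared differences from the mean.
sum_sq = sum((x - mean_value) ** 2 for x in data)

sigma = math.sqrt(sum_sq / n)        # population form: divide by N
s = math.sqrt(sum_sq / (n - 1))      # sample form: divide by n - 1

print(sigma, pstdev(data))
print(s, stdev(data))
```

Because the divisor is smaller, the sample value is always a little larger than the population value computed from the same numbers.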
so this is the small difference in how
172
00:22:48,210 --> 00:22:50,149
you define the standard deviation between
a population and a sample and this
173
00:22:50,149 --> 00:22:58,250
division by n minus one for a sample
is simply to take into account that when your
174
00:22:58,250 --> 00:23:03,630
sample size is small when you divide by n
minus one it gives a better estimate of
175
00:23:03,630 --> 00:23:12,610
the standard deviation of the whole population
ok and variance so sigma
176
00:23:12,610 --> 00:23:20,450
square is equal to the variance of the
population and s square
177
00:23:20,450 --> 00:23:27,290
is the variance of the sample
so sigma square is nothing
178
00:23:27,290 --> 00:23:38,240
but summation x i minus mu whole square by
capital n and s square is
179
00:23:38,240 --> 00:23:55,410
nothing but summation x i minus x bar whole
square by n minus one ok so these are called
180
00:23:55,410 --> 00:24:11,019
variances so what you can clearly see is variance
is such that it is never negative and
181
00:24:11,019 --> 00:24:16,299
it is the square of the standard deviation
so how do we go about interpreting the
182
00:24:16,299 --> 00:24:27,710
variation you can clearly see these are
two distributions you can see that in one
183
00:24:27,710 --> 00:24:31,049
of them there is a much more prominent peak
in the middle and then these other values
184
00:24:31,049 --> 00:24:36,640
are less prevalent versus the second distribution
which is much broader so in
185
00:24:36,640 --> 00:24:43,500
other words if we calculate the standard deviation
it will turn out that my standard deviation
186
00:24:43,500 --> 00:24:48,519
for this population is going to be smaller
than the standard deviation from this population
187
00:24:48,519 --> 00:24:57,960
so this is what the variability will convey
now there is just one small mathematical
188
00:24:57,960 --> 00:25:03,399
trick so when i talk of summation
of x i minus x bar whole square
189
00:25:03,399 --> 00:25:16,679
i can expand it so the term inside i can
write as x i square minus two x i x bar
190
00:25:16,679 --> 00:25:29,330
plus x bar square ok so i can then bring it
out i can write summation x i square minus
191
00:25:29,330 --> 00:25:31,570
summation two x i x bar plus summation x bar
square ok
192
00:25:31,570 --> 00:25:36,610
so each of them is i is equal to one to n
i is equal to one to n i is equal to one to
193
00:25:36,610 --> 00:25:44,240
n so this remains as summation x i square
but in this particular term since x bar is
194
00:25:44,240 --> 00:26:02,470
the mean which is a constant i can take it out so i can write minus
two x bar summation x i and i can write plus
195
00:26:02,470 --> 00:26:11,419
summation x bar square ok so summation x i
is nothing but n times x bar ok so this equation
196
00:26:11,419 --> 00:26:18,490
then becomes summation x i square minus two
x bar into n x bar plus summation x bar square
197
00:26:18,490 --> 00:26:27,630
summed up n times since this is also i is equal
to one to n which will be n x bar square so
198
00:26:27,630 --> 00:26:34,559
this final expression is summation x i square
minus n x bar square so this is a useful
199
00:26:34,559 --> 00:26:39,260
formula when we are computing it ok so this
is what i have written here that in your
200
00:26:39,260 --> 00:26:42,029
measures of variability this thing can
be simplified to this form so as opposed to
201
00:26:42,029 --> 00:26:50,170
taking the difference from the mean if you have
x i you can just square them and add them up and then
202
00:26:50,170 --> 00:26:54,910
since you are anyway calculating the sample mean
or population mean you just subtract
203
00:26:54,910 --> 00:27:00,440
n x bar square to obtain this particular
value now let us do some transformations of
204
00:27:00,440 --> 00:27:04,120
standard deviation transformations with standard
deviation so we again come to this particular
205
00:27:04,120 --> 00:27:08,029
situation where you have three particular
cases first y is equal to a x
206
00:27:08,029 --> 00:27:09,929
let us say if this is my s
y the question is how are s y and s x related
207
00:27:09,929 --> 00:27:20,350
so what is the relationship
between s y and s x ok so the way to do it
208
00:27:20,350 --> 00:27:22,300
is so i know my s y so let us say if i were to write
s y square or let us say n minus one into
209
00:27:22,300 --> 00:27:23,450
s y square is nothing but summation y i minus
y bar whole square now y i is a x i so i can put it
210
00:27:23,450 --> 00:27:24,549
as summation a x i minus a x bar whole square which is nothing
but since a can be taken common a square into
211
00:27:24,549 --> 00:27:25,900
summation x i minus x bar whole square so
i can write n minus one into s y square is
212
00:27:25,900 --> 00:27:26,900
equal to a square into this term which is n minus
one into s x square so this would give
213
00:27:26,900 --> 00:27:30,961
me that s y is equal to a into s x since i can
cancel these terms out and this is the final
214
00:27:30,961 --> 00:27:32,040
formula which remains ok so s y is nothing
but a into s x or mod a into s x if a is negative so
215
00:27:32,040 --> 00:27:33,070
let us say if my y is defined by c plus
x then similarly i
216
00:27:33,070 --> 00:27:36,950
can write where c is a constant n minus one
into s y square is equal to summation y i
217
00:27:36,950 --> 00:27:41,920
minus y bar whole square but you see in this
case y i is c plus x i and y bar is
218
00:27:41,920 --> 00:27:48,440
c plus x bar so the c terms
cancel each other which leaves nothing but summation x i
219
00:27:48,440 --> 00:27:49,929
minus x bar whole square
so this is nothing but n minus one into
220
00:27:49,929 --> 00:27:50,929
s x square so this would give me that
s y is equal to s x so when you have a constant
221
00:27:50,929 --> 00:27:53,530
term when you have a constant term added to
this value it does not change the final standard
222
00:27:53,530 --> 00:27:54,696
deviation so in other words standard deviation
is insensitive to any constant term added
223
00:27:54,696 --> 00:27:56,840
in the most general case when y is equal
to c plus a x then by combining the previous
224
00:27:56,840 --> 00:27:58,242
concepts we can arrive at the equation s y
is simply equal to a s x ok because
225
00:27:58,242 --> 00:27:59,242
the c does not come into play while computing
the standard deviation ok now this
226
00:27:59,242 --> 00:28:00,399
thing can be extended to find out standard
deviation for grouped data by grouped data
227
00:28:00,399 --> 00:28:03,730
i mean that if you have x i and you have
a frequency f i for the corresponding value so for x one it
228
00:28:03,730 --> 00:28:08,970
is f one for x two it is f two and so on and so forth
then i know my x bar is summation f i x
229
00:28:08,970 --> 00:28:15,110
i by capital n or by summation f i ok all
i need to do is to compute these frequencies
230
00:28:15,110 --> 00:28:16,539
and put them in place to get the final value
ok
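The grouped-data idea can be sketched as follows. The values and frequencies are hypothetical, and weighting each squared difference by its frequency and dividing by N minus one is an assumed extension of the sample formula, since the lecture only states the grouped formula for the mean.

```python
import math

# Hypothetical grouped data: distinct values x_i and their frequencies f_i.
values = [1, 2, 3, 4]
freqs = [2, 3, 4, 1]

# Total number of observations is the sum of the frequencies.
N = sum(freqs)

# Grouped mean: sum of f_i * x_i divided by sum of f_i.
x_bar = sum(f * x for x, f in zip(values, freqs)) / N

# Assumed extension to the sample standard deviation: weight each
# squared difference from the mean by its frequency.
sum_sq = sum(f * (x - x_bar) ** 2 for x, f in zip(values, freqs))
s = math.sqrt(sum_sq / (N - 1))

print(x_bar, s)
```

Expanding the table back into a flat list of ten observations and applying the ungrouped formulas should give the same two numbers.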
231
00:28:16,539 --> 00:28:17,980
so that is all about the basics of standard
deviation so i hope you understand so
232
00:28:17,980 --> 00:28:19,750
to summarize we saw how mean
median and mode compare and what
233
00:28:19,750 --> 00:28:21,340
kind of values they give the arithmetic mean is
of course sensitive to outliers median is
234
00:28:21,340 --> 00:28:22,830
much less sensitive to outliers and in the
case when you have a bimodal distribution
235
00:28:22,830 --> 00:28:25,180
then none of these measures is meaningful it is
better to split the data into two different
236
00:28:25,180 --> 00:28:27,409
distributions and then separately calculate
their mean median or mode
237
00:28:27,409 --> 00:28:28,559
from there we went on to discussing what is
standard deviation and i hope you
238
00:28:28,559 --> 00:28:29,559
are convinced that standard deviation is a
very important metric for quantifying how your
239
00:28:29,559 --> 00:28:30,559
values are dispersed so the mean
by itself does not convey the picture of
240
00:28:30,559 --> 00:28:33,130
how dispersed your data is
so outliers will indeed have an effect on
241
00:28:33,130 --> 00:28:35,380
the standard deviation and one important point
to note is that for the population
242
00:28:35,380 --> 00:28:36,470
you divide by capital n when you calculate
the standard deviation whereas for the sample you
243
00:28:36,470 --> 00:28:37,470
divide simply by n minus one and this is because
when your sample size is small then dividing
244
00:28:37,470 --> 00:28:38,470
by n minus one gives a much better estimate
of the population standard deviation and
245
00:28:38,470 --> 00:28:40,510
we ended by doing a few transformations
just like calculating the mean for different
246
00:28:40,510 --> 00:28:42,490
transformations i hope we have seen how
if you add a
247
00:28:42,490 --> 00:28:43,640
constant it doesn't have any impact
on the standard deviation of the population
248
00:28:43,640 --> 00:28:45,260
but when you have a prefactor a in front
of x then your s y is simply
249
00:28:45,260 --> 00:28:50,159
s x multiplied by a with that i
would like to thank you for your
250
00:28:50,159 --> 00:28:56,470
attention and we will meet again in the next lecture
thank you