1
00:00:13,469 --> 00:00:17,600
dear students welcome to today's lecture so
we would as always begin with a brief
2
00:00:17,600 --> 00:00:22,060
recap of what we had covered in the last lecture
and then go on from there so in the last lecture
3
00:00:22,060 --> 00:00:24,490
we had discussed two things one is how to
calculate arithmetic mean given a data set
4
00:00:24,490 --> 00:00:29,399
so arithmetic mean represents one of the most
widely used metrics for representing your data
5
00:00:29,399 --> 00:00:36,850
and two we also briefly discussed
transformations that is given a data set how can
6
00:00:36,850 --> 00:00:41,970
you make use of transformations to make it
easier for you to calculate arithmetic mean
7
00:00:41,970 --> 00:00:46,920
particularly if you are doing it by hand ok
so there were three main transformations
8
00:00:46,920 --> 00:01:00,000
we had discussed in the last class y is equal
to a x y is equal to c plus x and y is equal
9
00:01:00,000 --> 00:01:06,820
to c plus a x so just to give you an example
if you let's say have numbers like ten
10
00:01:06,820 --> 00:01:12,320
twenty two thirty five forty six seventy
eight so what should you do so what you see
11
00:01:12,320 --> 00:01:20,200
there are two things which are going on these
are increasing roughly in jumps
12
00:01:20,200 --> 00:01:28,440
of ten or so and you have a range from ten
to eighty right so either what we can do is
13
00:01:28,440 --> 00:01:36,430
so particularly if you are doing by hand and
in this particular case there are only five
14
00:01:36,430 --> 00:01:38,240
numbers so we can even add up by hand so one
possibility is divide by ten
15
00:01:38,240 --> 00:01:41,990
ok so then you have the numbers one two point
two three point five four point six and seven
16
00:01:41,990 --> 00:01:44,570
point eight these you can add up by hand very
easily the second possibility is that
17
00:01:44,570 --> 00:01:58,850
like we did we identify the
smallest number which is ten the largest number
18
00:01:58,850 --> 00:02:07,409
which is roughly eighty so the sum is roughly ninety
so the average is roughly
19
00:02:07,409 --> 00:02:22,590
forty five so average here is roughly
equal to ten plus seventy eight by two which
20
00:02:22,590 --> 00:02:31,600
is roughly equal to forty five
so we can apply the transformation
21
00:02:31,600 --> 00:02:41,780
y is equal to x minus forty five here and
then in that case my numbers will be minus
22
00:02:41,780 --> 00:02:47,640
thirty five ok so y values are minus thirty
five minus twenty three minus ten plus one
23
00:02:47,640 --> 00:02:55,010
plus thirty three ok then i can add them up
very easily and find out what is my average
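The two shortcuts described here can be checked with a minimal Python sketch. The helper name mean and the variable names are illustrative assumptions; the numbers are the five values from the example. Each transformation is undone after averaging, so both routes should return the mean of the original numbers up to floating point rounding.

```python
def mean(values):
    """Plain arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

# the five numbers from the example
x = [10, 22, 35, 46, 78]

# option 1: y = a x with a = 1/10, average the small numbers, then multiply back by 10
y_scaled = [v / 10 for v in x]
mean_via_scaling = 10 * mean(y_scaled)

# option 2: y = x - 45, average the deviations, then add 45 back
y_shifted = [v - 45 for v in x]          # -35, -23, -10, 1, 33
mean_via_shift = 45 + mean(y_shifted)

print(mean(x), mean_via_scaling, mean_via_shift)
```

The shift of forty five is only a rough guess at the centre; any other constant would work equally well, because it is added back at the end.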
24
00:02:55,010 --> 00:02:56,040
ok
so what you see is that there is not one rule
25
00:02:56,040 --> 00:02:58,190
which will make things work depending on what
kind of transformation you use there can be
26
00:02:58,190 --> 00:03:00,790
more than one rule ok so that is one thing
so today we will discuss the next thing
27
00:03:00,790 --> 00:03:10,959
which is let's say you have a discrete data
set so if you look at the data set here so
28
00:03:10,959 --> 00:03:19,120
you have numbers which are all first of
all discrete and then they are repeated ok
29
00:03:19,120 --> 00:03:27,170
so if this was entire data set what you can
clearly see is the number ten has been repeated
30
00:03:27,170 --> 00:03:34,920
so by frequency these are my x values my frequency
ten has been repeated once twenty has been
31
00:03:34,920 --> 00:03:43,720
repeated once thirty has been repeated twice
forty has been repeated once sixty yeah sixty
32
00:03:43,720 --> 00:03:50,220
has been repeated one two three four times
and seventy has been repeated two times no
33
00:03:50,220 --> 00:04:07,540
sorry three times ok so what you see these
are discrete values which are repeated
34
00:04:07,540 --> 00:04:22,490
so if i were to go back to the way we calculated
our average so in general
35
00:04:22,490 --> 00:04:39,420
we can write it as frequency is f one f two
f three so on and so forth and these guys
36
00:04:39,420 --> 00:04:54,700
we call as x one x two so we can calculate
my x bar as simply x one repeated f one times
37
00:04:54,700 --> 00:05:05,970
plus x two which has been repeated f two times
plus x three which is repeated f three times
38
00:05:05,970 --> 00:05:16,600
so on and so forth plus x n which has been
repeated f n times ok and we want to divide
39
00:05:16,600 --> 00:05:21,320
by the total number of observations which is
nothing but f one plus f two plus f three plus
40
00:05:21,320 --> 00:05:28,770
dot dot dot plus f n ok so the final
expression becomes x bar is nothing but summation
41
00:05:28,770 --> 00:05:38,280
f i into x i by summation f i ok so as you
know as before i did not put the start and
42
00:05:38,280 --> 00:05:44,170
end values but this just summation alone means
that i am doing a sum from i is equal to one
43
00:05:44,170 --> 00:06:02,509
to i equal to n ok so you can do this as
an exercise so you know what
44
00:06:02,509 --> 00:06:16,720
are the values of f one f two what are the
values of x one x two and please calculate
45
00:06:16,720 --> 00:06:27,980
the final average ok so this is what it is
you have to basically calculate the frequency
46
00:06:27,980 --> 00:06:31,949
of each number in the discrete variable and their
value and then calculate the total sum as
47
00:06:31,949 --> 00:06:42,039
per this particular equation
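As a sketch of the formula x bar equals summation f i x i by summation f i, the calculation could look like this in Python. The function name frequency_mean is a hypothetical helper, and the values and frequencies are the ones read out in the lecture; the slide itself is not part of this transcript, so treat them as an assumption.

```python
def frequency_mean(values, freqs):
    """Mean of discrete data given as (value, frequency) pairs: sum(f*x) / sum(f)."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total = sum(f * x for x, f in zip(values, freqs))  # numerator: summation f_i x_i
    n = sum(freqs)                                     # denominator: total observations
    return total / n

# x values and how many times each one is repeated, as read out in the lecture
x = [10, 20, 30, 40, 60, 70]
f = [1, 1, 2, 1, 4, 3]

# cross-check: write every observation out in full and take the ordinary mean
expanded = [value for value, count in zip(x, f) for _ in range(count)]

print(frequency_mean(x, f), sum(expanded) / len(expanded))
```

Both printed numbers should agree, since the frequency form is only a compact way of writing the same sum.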
now i wanted to discuss one more thing
48
00:06:42,039 --> 00:06:51,840
so even in this case let's say this example
that we took the variables are ten twenty
49
00:06:51,840 --> 00:07:04,510
thirty so on and so forth right so you have
a range so your minimum value is ten your
50
00:07:04,510 --> 00:07:09,600
maximum value is around seventy ok fine you
have a range so you can see that the jump
51
00:07:09,600 --> 00:07:16,410
is roughly by sixty imagine i have data which
is one two three four and then forty five
52
00:07:16,410 --> 00:07:25,990
eighty nine so on and so forth so what i can
clearly see is if i take a mean arithmetic
53
00:07:25,990 --> 00:07:35,860
mean i will get a value which is going to
be much greater than these two and will
54
00:07:35,860 --> 00:07:41,759
be much more biased toward these bigger numbers
right so this is one of the principal weaknesses
55
00:07:41,759 --> 00:07:49,199
of calculating arithmetic mean if you have
large variation in your data set then arithmetic
56
00:07:49,199 --> 00:08:01,880
mean might give you a value which really does
not mean anything ok so what might be an alternative
57
00:08:01,880 --> 00:08:08,030
approach to come up with a better number
so one number which is often used or you know
58
00:08:08,030 --> 00:08:21,270
which is used to calculate this is called
the geometric mean ok so as opposed to adding
59
00:08:21,270 --> 00:08:27,120
up the numbers ok as opposed to adding up
the numbers you take their product and you
60
00:08:27,120 --> 00:08:33,909
take their nth root ok so if we have
n numbers x one x two x three dot
61
00:08:33,909 --> 00:08:39,969
dot dot x n the way we write the geometric mean
in short is g m is equal to nth
62
00:08:39,969 --> 00:08:49,019
root of x one into x two into x three dot dot
x n so the way to write it in short
63
00:08:49,019 --> 00:08:56,809
mathematically is to write nth root and you
put this symbol called capital pi ok pi of x i and
64
00:08:56,809 --> 00:09:10,209
i is equal to one to n here so this means
product ok so if n
65
00:09:10,209 --> 00:09:18,239
is equal to one then it will simply be x one if
n is five then you have x one x two x
66
00:09:18,239 --> 00:09:22,819
three x four x five so on and so forth ok
so this is a simple representation of the geometric
67
00:09:22,819 --> 00:09:29,249
mean
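Written out in symbols, the definition just described is the following, where GM and the index i are the same names used in the lecture:

```latex
\mathrm{GM} \;=\; \sqrt[n]{x_1 \, x_2 \, x_3 \cdots x_n}
\;=\; \left( \prod_{i=1}^{n} x_i \right)^{1/n}
```

For n equal to one the product has a single factor, so the geometric mean is just x one.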
now let us take an example and see what this
68
00:09:29,249 --> 00:09:40,860
rule does so imagine you
have a data set as follows which is fifteen
69
00:09:40,860 --> 00:09:47,509
ten five eight seventeen hundred so my numbers
are fifteen ten five eight seventeen hundred
70
00:09:47,509 --> 00:09:53,439
ok so if i were to you know sort these numbers
and put them next to each other in terms of
71
00:09:53,439 --> 00:10:05,540
ascending order so i have five eight ten
fifteen seventeen hundred so just by looking
72
00:10:05,540 --> 00:10:14,740
at the data i can clearly see that the number
hundred is really an outlier in other words
73
00:10:14,740 --> 00:10:22,980
outlier means maybe this data was taken when
there's some error in the procedure of acquisition
74
00:10:22,980 --> 00:10:34,019
of data some experimental error because i
can see that the other
75
00:10:34,019 --> 00:10:47,149
numbers are much closer to each other so here
my five to seventeen range is twelve only
76
00:10:47,149 --> 00:10:55,480
right so seventeen minus five is twelve versus
hundred minus five is ninety five now
77
00:10:55,480 --> 00:11:03,579
if i account for hundred in calculating
the average what number do i get so my arithmetic
78
00:11:03,579 --> 00:11:09,720
mean x bar becomes thirteen plus twenty five
plus seventeen plus hundred divided by six since there are six
79
00:11:09,720 --> 00:11:20,290
numbers right so thirty eight fifty five one fifty
five so one fifty five by six
80
00:11:20,290 --> 00:11:29,829
if i were to write it
roughly it is close to twenty six ok it is close
81
00:11:29,829 --> 00:11:33,399
to twenty six ok the exact value is twenty
five point eight ok
82
00:11:33,399 --> 00:11:36,130
now so clearly i can see that twenty five
point eight would be a value which is right
83
00:11:36,130 --> 00:11:44,360
here which is way further than most of the
other numbers so really this arithmetic mean
84
00:11:44,360 --> 00:11:50,290
gives me a number which is not
representative of the whole population how
85
00:11:50,290 --> 00:11:56,160
can i come up with an alternative number so
geometric mean is one such approach to get
86
00:11:56,160 --> 00:12:06,499
a number which is much more representative
of your data so if i do the same exercise
87
00:12:06,499 --> 00:12:12,040
with the geometric mean so what i get is a
value of fourteen point seven
88
00:12:12,040 --> 00:12:20,249
so we have a value
which is between ten and fifteen so which
89
00:12:20,249 --> 00:12:26,500
is still more representative of the population
so you have changed from twenty five point
90
00:12:26,500 --> 00:12:33,470
eight to fourteen point seven so this is one
of the big advantages of geometric mean
91
00:12:33,470 --> 00:12:40,050
over arithmetic mean particularly when
in your data set there is huge heterogeneity
92
00:12:40,050 --> 00:13:02,420
and there are one or two values which are
way larger than most of the other
93
00:13:02,420 --> 00:13:04,679
values
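A minimal Python sketch of this comparison, using the six numbers from the example. The variable names are illustrative assumptions; math.prod needs Python 3.8 or later. The second geometric mean goes through logarithms, which is an equivalent route that avoids very large products for long data sets.

```python
import math

# the six observations from the example, including the outlier hundred
data = [15, 10, 5, 8, 17, 100]
n = len(data)

# arithmetic mean: add the numbers and divide by n
arithmetic_mean = sum(data) / n

# geometric mean: multiply the numbers and take the nth root
geometric_mean = math.prod(data) ** (1 / n)

# same quantity computed as exp of the average of the logs
geometric_mean_via_logs = math.exp(sum(math.log(v) for v in data) / n)

print(round(arithmetic_mean, 1), round(geometric_mean, 1), round(geometric_mean_via_logs, 1))
```

Note that the geometric mean is only defined this way for positive numbers; a zero or a negative value in the data would need separate handling.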
ok so there is one more important measure so let
94
00:13:04,679 --> 00:13:11,199
us say that you draw a histogram and you plot
the data ok and your plot data looks like
95
00:13:11,199 --> 00:13:17,379
this right so logic would dictate that if
you do the arithmetic mean of this whole population
96
00:13:17,379 --> 00:13:23,859
ok the arithmetic mean will lie somewhere
here this is where your arithmetic mean lies
97
00:13:23,859 --> 00:13:32,670
ok now there is one more way of doing it ok
so what you can do is you can sort these numbers
98
00:13:32,670 --> 00:13:41,269
all the numbers x one x two up to x n in ascending
or descending order and try to find which
99
00:13:41,269 --> 00:13:44,920
is the number which is directly in the middle
ok so it would be
100
00:13:44,920 --> 00:13:49,259
a very nice alternative to the arithmetic
mean and this particular metric has
101
00:13:49,259 --> 00:14:00,629
a name it is called the median ok so this
is how you define the median the median of
102
00:14:00,629 --> 00:14:04,069
a set of n measurements is the value that
falls in the middle position when measurements
103
00:14:04,069 --> 00:14:07,910
are ordered from smallest to largest ok so
of course whether you order from smallest
104
00:14:07,910 --> 00:14:12,050
to largest or largest to smallest that does
not matter so you basically find
105
00:14:12,050 --> 00:14:14,889
the number which is at this position which
is point five times n plus one
106
00:14:14,889 --> 00:14:18,339
ok so let us do the previous numbers you had
if i go back to the previous slide and my
107
00:14:18,339 --> 00:14:21,040
numbers were fifteen ten five eight seventeen
hundred ok fifteen ten five eight seventeen
108
00:14:21,040 --> 00:14:26,290
and hundred so if i sort them and i write
them together five eight ten fifteen seventeen
109
00:14:26,290 --> 00:14:32,939
hundred ok so this is the sorted
list so i put them together so i have these
110
00:14:32,939 --> 00:14:40,769
numbers five eight ten fifteen seventeen hundred
what is my n n is the total number of numbers
111
00:14:40,769 --> 00:14:56,899
i have which is one two three four five six
so my median position is equal to
112
00:14:56,899 --> 00:15:01,269
half into n plus one right so point five times
n plus one is equal to point five into seven
113
00:15:01,269 --> 00:15:08,649
is equal to three point five ok so three
point five means what the position is somewhere
114
00:15:08,649 --> 00:15:19,040
right in the middle of ten and fifteen ok
this is the third number this is the fourth
115
00:15:19,040 --> 00:15:41,749
number but my median is the number which is
right at position three point five
116
00:15:41,749 --> 00:15:47,149
so how do i find the number at position three
point five so i use what is called interpolation
117
00:15:47,149 --> 00:15:52,119
ok so my median as per this will be the
average of the third and fourth numbers ten
118
00:15:52,119 --> 00:16:03,790
and fifteen since the position is in the
middle of these two so we will get
119
00:16:03,790 --> 00:16:16,339
half into ten plus fifteen
so since the position
120
00:16:16,339 --> 00:16:30,129
is simply halfway between them
it is going to be half into ten plus
121
00:16:30,129 --> 00:16:48,149
fifteen which is twelve point five so your
median is twelve point five ok so you get
122
00:16:48,149 --> 00:16:52,499
a number which is even better so
it is much more representative
123
00:16:52,499 --> 00:17:02,179
of this entire data set the effect of hundred
has completely been eliminated so and the
124
00:17:02,179 --> 00:17:17,339
other advantage is median is of course much
easier to calculate than geometric mean
125
00:17:17,339 --> 00:17:22,339
ok so let us solve another problem of calculating
the median
126
00:17:22,339 --> 00:17:29,080
ok so let us take this following data set
again so let me first count
127
00:17:29,080 --> 00:17:36,750
so i want to calculate the median right i
want to number them and i want to see what
128
00:17:36,750 --> 00:17:46,450
is the position so first of all how many numbers
are there one two three four five five into five
129
00:17:46,450 --> 00:17:52,299
is twenty five so n is equal to twenty five
i want to order the numbers together so
130
00:17:52,299 --> 00:17:55,299
i have to sort them in terms of you know lowest
to highest number ok so my lowest number is
131
00:17:55,299 --> 00:18:02,529
eighteen point three the next highest number
is nineteen point two the next highest number
132
00:18:02,529 --> 00:18:08,929
is nineteen point two you have twenty twenty
twenty twenty the next highest number is twenty
133
00:18:08,929 --> 00:18:13,500
i have one twenty one here after twenty
one i have twenty three here so twenty three
134
00:18:13,500 --> 00:18:20,230
after twenty three is twenty four point three
is there here and twenty four point two is
135
00:18:20,230 --> 00:18:27,450
here so twenty four point two twenty four
point three a twenty four point two to twenty
136
00:18:27,450 --> 00:18:34,120
four point five then after twenty four point
five you have twenty five you have twenty
137
00:18:34,120 --> 00:18:41,080
five twenty five point two twenty five point
six so it is twenty five point five also then
138
00:18:41,080 --> 00:18:45,410
twenty five point six twenty five point five
is there twenty five point six after that
139
00:18:45,410 --> 00:18:52,080
you have twenty six point six ok then
after twenty six point six you have twenty
140
00:18:52,080 --> 00:19:05,770
seven point five so i can write the remaining
numbers now i know that n is twenty five right
141
00:19:05,770 --> 00:19:15,490
so my median position is basically so half
into n plus one position is equal to half
142
00:19:15,490 --> 00:19:19,100
into twenty five plus one is equal to thirteen
right so i did not complete it but counting
143
00:19:19,100 --> 00:19:25,830
the numbers one two three four
five six seven eight nine ten eleven twelve
144
00:19:25,830 --> 00:19:30,350
thirteen so your median is twenty five point
six so median is twenty five point six
145
00:19:30,350 --> 00:19:33,809
ok so sometimes you will get an exact value
so if your number of data is odd so when n
146
00:19:33,809 --> 00:19:37,360
is odd then half of n plus one will give you
an exact value so you directly choose this
147
00:19:37,360 --> 00:19:45,320
value but when n is even half of n plus one
will give you a fraction so
148
00:19:45,320 --> 00:19:47,990
you have to interpolate between two numbers
as was the case previously so in this case
149
00:19:47,990 --> 00:19:53,370
i had six observations so my median position
was three point five in which case i had to
150
00:19:53,370 --> 00:19:58,710
interpolate between the two numbers but when
n is odd i can get the exact
151
00:19:58,710 --> 00:20:14,240
value because this is an exact position so
i can just collect the number at that position
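The position rule can be sketched in Python as follows. The function name median_by_position is a hypothetical helper; it follows the point five times n plus one rule from the lecture, picking the value directly when the position is a whole number and averaging the two neighbours when it falls halfway between them.

```python
def median_by_position(values):
    """Median via the 0.5 * (n + 1) position rule on the sorted data."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    position = 0.5 * (n + 1)        # 1-based position of the median
    lower = int(position)           # whole-number part of the position
    if position == lower:
        # n is odd: the position is exact, so pick that value directly
        return ordered[lower - 1]
    # n is even: the position ends in .5, so interpolate halfway
    # between the two neighbouring values
    return 0.5 * (ordered[lower - 1] + ordered[lower])

# the six numbers from the earlier example, position 3.5
print(median_by_position([15, 10, 5, 8, 17, 100]))
# an odd-sized data set, position 3
print(median_by_position([15, 10, 5, 8, 17]))
```

Python's standard library offers statistics.median, which should agree with this helper; the hand-written version is only meant to mirror the steps done on the board.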
152
00:20:14,240 --> 00:20:20,340
so there is one more metric ok there is one
more metric which is often used this is called
153
00:20:20,340 --> 00:20:24,529
the mode the most frequently occurring
value ok so in our case so let us say for
154
00:20:24,529 --> 00:20:28,720
example this is a data set of number of visits
to a dental clinic in a typical
155
00:20:28,720 --> 00:20:34,870
week and we can see which is the most
frequently occurring value so let us tabulate
156
00:20:34,870 --> 00:20:40,240
again ok i have my x and i have my frequency
ok so number of visits one i have two i
157
00:20:40,240 --> 00:20:46,820
have three i have four i have five i have
six i have seven i have eight i have nine
158
00:20:46,820 --> 00:20:49,620
ok
so frequency for one is two frequency for
159
00:20:49,620 --> 00:20:55,559
two is one frequency for three is one two
three frequency for four is one two three
160
00:20:55,559 --> 00:21:01,590
four five so frequency for five is one two
three four five one two three four five six
161
00:21:01,590 --> 00:21:05,700
seven frequency for six is one frequency for
seven is one two three frequency for eight
162
00:21:05,700 --> 00:21:10,279
is one two frequency for nine is one ok so
clearly you look at the frequency column identify
163
00:21:10,279 --> 00:21:13,559
the maximum number which is seven that means
your mode is five ok so mode is the most frequently
164
00:21:13,559 --> 00:21:18,039
occurring value which is five so most often
five visits are there in a given
165
00:21:18,039 --> 00:21:23,820
week
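Reading the mode off a frequency table can be sketched in Python like this. The dictionary holds the visit counts and frequencies as tallied in the lecture; the function name modes is a hypothetical helper. It returns a list because more than one value can share the highest frequency.

```python
# number of visits -> frequency, as tabulated in the lecture
dental_visits = {1: 2, 2: 1, 3: 3, 4: 5, 5: 7, 6: 1, 7: 3, 8: 2, 9: 1}

def modes(freq_table):
    """Return every value whose frequency equals the largest frequency."""
    highest = max(freq_table.values())
    return [value for value, count in freq_table.items() if count == highest]

print(modes(dental_visits))

# hypothetical variation: if four visits had also occurred seven times,
# two values would tie for the highest frequency
tied = dict(dental_visits)
tied[4] = 7
print(modes(tied))
```

Returning a list rather than a single number keeps the tie visible instead of silently picking one of the tied values.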
but let's say you went
166
00:21:23,820 --> 00:21:27,530
through this exercise and hypothetically
let's say you have another case where even
167
00:21:27,530 --> 00:21:34,269
four gives you the number seven or you know
one also gives you the number seven then what
168
00:21:34,269 --> 00:21:51,179
do you do so you do not have one unique answer
and that is exactly what brings us to this
169
00:21:51,179 --> 00:21:59,820
plot where you know your distribution
might be unimodal that means there is a heap
170
00:21:59,820 --> 00:22:08,880
with a unique peak ok or you can
have bimodal where there are two peaks
171
00:22:08,880 --> 00:22:17,659
in this curve so this is a bimodal distribution
in which case you have to report both the
172
00:22:17,659 --> 00:22:23,820
mode values versus the unimodal distribution
ok so now that we have three different metrics
173
00:22:23,820 --> 00:22:33,269
of quantifying the data one is mean one is
median one is mode so which one do we choose
174
00:22:33,269 --> 00:22:42,320
so a rule of thumb
is mode is used for large data sets let's
175
00:22:42,320 --> 00:22:47,740
just say for a hypothetical case you had only
five observations and these observations are one
176
00:22:47,740 --> 00:22:54,789
two two four five so in this case reporting
a mode doesn't make sense because your data
177
00:22:54,789 --> 00:23:09,090
set is too small ok so in this case it is
better to do either a mean or a median so
178
00:23:09,090 --> 00:23:17,279
as the rule of thumb if you look back at the
rule of thumb mode is used for large data
179
00:23:17,279 --> 00:23:23,889
sets what is large if you have fifty observations
i would say that is a reasonable number to
180
00:23:23,889 --> 00:23:28,870
calculate or to report mode as one of the
representative metrics
181
00:23:28,870 --> 00:23:37,000
the other two metrics mean and median are used
for both small and large data sets so as we
182
00:23:37,000 --> 00:23:44,809
said that you know one of the caveats of mode
is depending on your distribution you may
183
00:23:44,809 --> 00:23:49,039
not have one unique value but there might
be more than one but both mean and median
184
00:23:49,039 --> 00:23:53,690
give you one particular value the other
beauty about median is median is insensitive
185
00:23:53,690 --> 00:23:58,050
to outliers as we had seen in the previous case
where we could clearly see that
186
00:23:58,050 --> 00:24:02,669
when we are doing the arithmetic mean
this guy hundred will directly influence your
187
00:24:02,669 --> 00:24:06,440
observations but when you are doing the median
then essentially you are finding out the position
188
00:24:06,440 --> 00:24:08,990
which is here and then that eliminates the
possibility or effect of outliers ok so this
189
00:24:08,990 --> 00:24:12,429
is one of the beauties of this median
metric that median is less sensitive
190
00:24:12,429 --> 00:24:17,659
to outliers but if you have multiple values which
are all big and they are spread across then
191
00:24:17,659 --> 00:24:22,000
median might still be affected by them there is another
way of thinking about when to use mean when
192
00:24:22,000 --> 00:24:26,419
to use median and when to use mode
ok so let's say you did experiments where
193
00:24:26,419 --> 00:24:33,499
there were two
experimental conditions a and b and
194
00:24:33,499 --> 00:24:38,990
you have obtained your data set and you
want to ask do a and b differ ok
195
00:24:38,990 --> 00:24:40,910
so let's say you have done
your frequency histogram this is for condition
196
00:24:40,910 --> 00:24:45,179
a and this is for condition
b ok so a is a peaked distribution which has
197
00:24:45,179 --> 00:24:48,909
a distinct maximum while b is reasonably flat in
other words all values are equally probable
198
00:24:48,909 --> 00:24:53,260
so clearly this would tend to say that there
is a big difference and you can report
199
00:24:53,260 --> 00:24:56,809
mode as this value but in the example that
i took there is no mode to report
200
00:24:56,809 --> 00:25:03,049
the second question is is a bigger than b in that case
median might be a good way to look at it ok
201
00:25:03,049 --> 00:25:09,350
and last by how much that is you want to quantify the
extent to which a and b differ and in that
202
00:25:09,350 --> 00:25:14,100
case we can report the mean ok so there are
different contexts in which you might
203
00:25:14,100 --> 00:25:19,419
use a mean median or a mode let us go through
a few examples where we compute all these three
204
00:25:19,419 --> 00:25:30,360
quantities ok so let us again come to this
particular example the number of visits to
205
00:25:30,360 --> 00:25:34,330
a dental clinic in a typical week right so
i had calculated that
206
00:25:34,330 --> 00:25:39,350
frequency distribution ok so this was the
frequency distribution earlier ok so my mode
207
00:25:39,350 --> 00:25:46,677
sorry let me rewrite it ok let me rewrite
it ok so i have x and i have frequency so
208
00:25:46,677 --> 00:25:50,700
you have values one two three four five
six seven eight nine
209
00:25:50,700 --> 00:25:55,899
ok so one is two times two
is one time three is three times four is five
210
00:25:55,899 --> 00:26:02,419
times five is seven times six is one times
seven is three times eight is two times nine
211
00:26:02,419 --> 00:26:04,029
is one time ok so for this particular distribution
we want to identify
212
00:26:04,029 --> 00:26:10,490
the mode which we already have so my mode
is equal to five let us calculate the mean
213
00:26:10,490 --> 00:26:19,950
then so mean i can use the formula x bar is
equal to summation f x by summation f this
214
00:26:19,950 --> 00:26:25,580
would give the value one into two plus two
into one plus three into three plus four into
215
00:26:25,580 --> 00:26:28,549
five plus five into seven plus six into one
plus seven into three plus eight into two
216
00:26:28,549 --> 00:26:35,480
plus nine into one my total number of observations
is two plus one three three plus three six
217
00:26:35,480 --> 00:26:41,580
six plus five eleven eleven plus seven eighteen
plus one nineteen nineteen plus three twenty
218
00:26:41,580 --> 00:26:48,470
two twenty two plus two twenty four plus one twenty five so
let us calculate the sum here so here
219
00:26:48,470 --> 00:26:52,700
we have two plus two four four plus nine thirteen
thirteen plus twenty thirty three thirty
220
00:26:52,700 --> 00:26:56,940
three plus thirty five sixty eight sixty eight
plus six seventy four seventy four plus twenty
221
00:26:56,940 --> 00:27:01,259
one is ninety five ninety five plus sixteen
is hundred and eleven hundred and eleven plus nine
222
00:27:01,259 --> 00:27:08,590
is hundred and twenty so it is hundred and twenty
by twenty five ok so we just go to the next
223
00:27:08,590 --> 00:27:12,150
page
so we had found mean is equal to hundred and
224
00:27:12,150 --> 00:27:14,080
twenty by twenty five which is four point eight
mode we have found out as five and median
225
00:27:14,080 --> 00:27:17,190
so if we arrange the numbers ok so for calculating
median we have to arrange the numbers we have
226
00:27:17,190 --> 00:27:18,190
one one two three three three four four
four four four five five five five five
227
00:27:18,190 --> 00:27:19,190
five five six seven seven
seven eight eight nine right so i want to
228
00:27:19,190 --> 00:27:21,460
find out so median is at position half into
twenty five plus one which is position thirteen
229
00:27:21,460 --> 00:27:25,870
and what is position thirteen one two three
four five six seven eight nine ten eleven
230
00:27:25,870 --> 00:27:28,110
twelve thirteen so we get a median equal
to five so these are your three observations
231
00:27:28,110 --> 00:27:38,510
you see in this particular case mean mode
median they are all very close to each other
232
00:27:38,510 --> 00:27:45,779
but still there is a difference ok so
in the general context if i were to
233
00:27:45,779 --> 00:27:48,100
draw the curves so the best case situation
we will discuss is this particular curve which
234
00:27:48,100 --> 00:27:52,679
is a gaussian distribution
a gaussian or a normal distribution for
235
00:27:52,679 --> 00:27:57,769
this particular case mean equal to mode equal
to median but for all other cases for all
236
00:27:57,769 --> 00:27:59,800
other cases in general so this is one particular
case but your data might look like this or
237
00:27:59,800 --> 00:28:01,220
data might look like this ok so this is x
and this is frequency ok so in this case for
238
00:28:01,220 --> 00:28:05,740
example let's say this is my mode my median
is right in the middle median will probably
239
00:28:05,740 --> 00:28:10,600
be somewhere here so this is my median and the
average will lie somewhere here this is my
240
00:28:10,600 --> 00:28:14,909
mean so in this case i have mode
less than median less than mean ok and in
241
00:28:14,909 --> 00:28:19,559
the other case so this is case
a where all the three are equal this is
242
00:28:19,559 --> 00:28:22,820
case b and for case c you will have mean less
than median less than mode
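The two orderings can be illustrated with a small Python sketch using the standard statistics module. Both data sets here are made up for illustration and do not come from the lecture: the first has a long tail toward large values, and the second is its mirror image.

```python
import statistics

def centre_summary(values):
    """Return (mean, median, mode) for a data set with a single most frequent value."""
    return (statistics.mean(values), statistics.median(values), statistics.mode(values))

# hypothetical data with a long tail toward large values
tail_to_the_right = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

# mirror image of the same data, so the long tail points toward small values
tail_to_the_left = [15 - v for v in tail_to_the_right]

mean_r, median_r, mode_r = centre_summary(tail_to_the_right)
mean_l, median_l, mode_l = centre_summary(tail_to_the_left)

print(mode_r, median_r, mean_r)   # ordering: mode < median < mean
print(mean_l, median_l, mode_l)   # ordering: mean < median < mode
```

In both cases the mean is pulled furthest toward the long tail, the mode stays at the peak, and the median sits between them.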
243
00:28:22,820 --> 00:28:28,840
ok so with that you get an idea that you can
have three metrics mean median mode and then
244
00:28:28,840 --> 00:28:34,940
depending on your data sets so now you
know how to calculate mean median mode and
245
00:28:34,940 --> 00:28:40,309
you see that depending on the data set either
there will be a left shift of the data as
246
00:28:40,309 --> 00:28:48,919
in case b here or there will be a right shift
of the data as in case c here or in the best
247
00:28:48,919 --> 00:28:54,119
case situation if you collect enough statistics
most processes in nature follow the normal
248
00:28:54,119 --> 00:28:55,570
distribution so you will have this distribution
in which case whether you do the mean median
249
00:28:55,570 --> 00:29:02,380
mode you will get the same value
with that i thank you for today's lecture
250
00:29:02,380 --> 00:29:05,559
we'll meet again next week
thank you