1
00:00:14,730 --> 00:00:21,430
Hello and welcome to today's class. In our
last lecture, I discussed the types
2
00:00:21,430 --> 00:00:23,560
of data you acquire from a given experiment
and the ways of presenting that
3
00:00:23,560 --> 00:00:27,140
data. In today's class, we will see how,
from that data, you can extract quantitative
4
00:00:27,140 --> 00:00:30,789
parameters to distinguish between two experimental
conditions and say whether those differences
5
00:00:30,789 --> 00:00:35,010
are significant, or what kind of
conclusions you can draw by looking at
6
00:00:35,010 --> 00:00:38,270
the data.
Before we get started, I would like to do
7
00:00:38,270 --> 00:00:48,140
a brief recap of what we discussed in the
last class. In the last class, I discussed
8
00:00:48,140 --> 00:00:55,530
the types of data: categorical
data, which is more qualitative in
9
00:00:55,530 --> 00:00:58,030
nature, and quantitative data. Within
quantitative data, you might
10
00:00:58,030 --> 00:01:10,040
have discrete variables, like the number of
visits to a clinic, or continuous
11
00:01:10,040 --> 00:01:23,280
variables, like the time it takes for a
student to solve a problem.
12
00:01:23,280 --> 00:01:34,759
Then we discussed the types of plots.
The standard
13
00:01:34,759 --> 00:01:51,280
representation of qualitative data
uses pie charts or bar charts, but the
14
00:01:51,280 --> 00:01:56,319
caveat is that you can use these charts
only when the number of categories
15
00:01:56,319 --> 00:02:02,989
is small and the data are not spread across
many classes. In other words, if you
16
00:02:02,989 --> 00:02:08,630
are making a pie chart and have to show two
hundred categories, it is impossible to
17
00:02:08,630 --> 00:02:14,120
make sense of that plot, because everything
has to be squeezed into a single circle.
18
00:02:14,120 --> 00:02:24,160
Then we talked about the histogram, which
is one of the most widely used ways to show a distribution,
19
00:02:24,160 --> 00:02:44,080
and how you go about converting raw data
with frequency tables: you
20
00:02:44,080 --> 00:02:54,340
make bins and build a frequency table, so that
you can put the data into a compact form, and
21
00:02:54,340 --> 00:03:05,379
then you plot the values against their
frequencies, and you get what is known as a histogram.
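A minimal sketch of the binning step just described, in Python. The function name `frequency_table`, the bin layout, and the data (minutes taken by ten students to solve a problem) are all hypothetical, chosen only to show how raw values become bin counts; each bin is assumed to be half-open, [lower, upper), and every value is assumed to be at least `start`.

```python
from collections import Counter

def frequency_table(values, start, bin_width):
    """Count the values falling in each half-open bin [lo, lo + bin_width).

    Assumes every value is >= start.
    """
    # Bin index for each value: how many whole bin widths it lies above `start`.
    counts = Counter(int((v - start) // bin_width) for v in values)
    table = []
    for k in range(max(counts) + 1):
        lo = start + k * bin_width
        table.append((lo, lo + bin_width, counts.get(k, 0)))  # empty bins get 0
    return table

# Hypothetical data: minutes taken by ten students to solve a problem.
times = [3.2, 4.8, 5.1, 5.9, 6.4, 6.6, 7.0, 7.3, 8.8, 9.5]

# Print a sideways text histogram: one '#' per observation in the bin.
for lo, hi, count in frequency_table(times, start=3, bin_width=2):
    print(f"[{lo}, {hi}): {'#' * count} ({count})")
```

Changing `bin_width` changes the shape of the resulting histogram, which is why the choice of bins matters when the table is built.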
22
00:03:05,379 --> 00:03:10,700
Right.
So this is a sample representation of a histogram.
23
00:03:10,700 --> 00:03:16,849
Here the horizontal axis is my variable x,
and the vertical
24
00:03:16,849 --> 00:03:21,950
axis is my frequency, and this is
how I can plot the data. Now, the frequency axis
25
00:03:21,950 --> 00:03:26,430
can be plotted in absolute terms or in relative
terms, and relative frequency is important
26
00:03:26,430 --> 00:03:33,329
when you are comparing two separate
experiments or two different conditions in which
27
00:03:33,329 --> 00:03:39,879
the number of data points is not the same.
Let's say in one case you have obtained
28
00:03:39,879 --> 00:04:00,580
a total of fifty data points and
in the other case there are a hundred points.
29
00:04:00,580 --> 00:04:10,109
If you were to plot both with absolute frequencies,
the plots might look like this: if I were to
30
00:04:10,109 --> 00:04:14,709
overlay a second plot on the existing one,
it might look like this. But in reality,
31
00:04:14,709 --> 00:04:19,600
if you normalize this data,
then
32
00:04:19,600 --> 00:04:32,110
your data might actually look
like this. This is relative frequency,
33
00:04:32,110 --> 00:04:44,050
where you divide each count by
the total number of observations.
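To illustrate the normalization step, here is a small Python sketch with made-up bin counts for two experiments, one with 50 observations and one with 100. The helper name `relative_frequencies` is hypothetical; it simply divides each count by the total, so the two histograms become directly comparable even though their raw counts differ by a factor of two.

```python
def relative_frequencies(counts):
    """Divide each bin count by the total number of observations."""
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical counts over the same four bins for two experiments.
counts_a = [5, 20, 15, 10]    # experiment A: 50 observations in total
counts_b = [10, 40, 30, 20]   # experiment B: 100 observations in total

rel_a = relative_frequencies(counts_a)
rel_b = relative_frequencies(counts_b)
print(rel_a)  # [0.1, 0.4, 0.3, 0.2]
print(rel_b)  # [0.1, 0.4, 0.3, 0.2]
```

With these invented numbers the raw counts differ while the relative frequencies coincide; with real data the relative frequencies would generally differ, and that difference is what the comparison is meant to expose.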
34
00:04:44,050 --> 00:04:59,450
After the histogram, we discussed line and
scatter plots, which are widely used.
35
00:04:59,450 --> 00:05:04,660
Depending on your variables,
let's say you have two variables, x and y, and
36
00:05:04,660 --> 00:05:21,140
you are plotting how y varies with respect
to x. You can show them as a scatter plot, where
37
00:05:21,140 --> 00:05:28,720
each point is plotted individually,
as its (x, y) pair in 2D. But
38
00:05:28,720 --> 00:05:36,460
of course, you should convert scatter points
into a line plot only if it makes sense: if
39
00:05:36,460 --> 00:05:46,250
the data have a very clear trend, so that
they can be approximated by a line, then it makes
40
00:05:46,250 --> 00:05:53,760
sense to make a line plot. I also briefly
discussed double-y plots, which are useful when,
41
00:05:53,760 --> 00:06:05,190
let's say, you have three variables,
x, y, and z, where x and y are
42
00:06:05,190 --> 00:06:11,790
changing with respect to z. Then
you can plot
43
00:06:11,790 --> 00:06:17,020
both x and y against z, each on its own y axis.
Let's take
44
00:06:17,020 --> 00:06:21,140
a simple example where z is time, and x and
y are two variables that vary with
45
00:06:21,140 --> 00:06:25,160
time, and we want to see how x and y correlate.
In the plot I have drawn, with
46
00:06:25,160 --> 00:06:41,690
time on the horizontal axis,
x decreases with time while y increases with time,
47
00:06:41,690 --> 00:06:45,190
which suggests that they are inversely
correlated. The next thing I wanted to
48
00:06:45,190 --> 00:06:54,900
talk about is what to keep in mind when you are making
a plot. A few things: you
49
00:06:54,900 --> 00:07:04,520
must label your axes appropriately, and you must
choose appropriate units. For example, let's
50
00:07:04,520 --> 00:07:14,760
say we are talking about cell speed, or any speed;
you can report it
51
00:07:14,760 --> 00:07:25,140
in microns per minute, microns per hour, meters
per minute, meters per hour, and so on,
52
00:07:25,140 --> 00:07:28,170
depending on what you are measuring and what
is the smallest amount you can measure in an
53
00:07:28,170 --> 00:07:33,660
accurate manner.
That also means you want to choose
54
00:07:33,660 --> 00:07:39,910
the appropriate range. Let's say you are looking at
a population explosion; think about the maximum
55
00:07:39,910 --> 00:07:45,470
value you choose for your y axis.
In this example, let's say this is time
56
00:07:45,470 --> 00:07:48,530
and this is my population. If I want to
convey that my population is really increasing
57
00:07:48,530 --> 00:07:52,470
with time, and I choose my y range such that
my curve only looks like this, then what I
58
00:07:52,470 --> 00:07:58,230
want to convey does not come across in the way
I have plotted the data. Instead, what I
59
00:07:58,230 --> 00:08:05,300
should do is plot the same curve
with a sensible maximum: if this is my largest value, I
60
00:08:05,300 --> 00:08:11,490
set the axis limit reasonably close above it, so I can
really see the nonlinear growth in the way
61
00:08:11,490 --> 00:08:15,170
the population is expanding, as opposed to
choosing an arbitrarily large value for the
62
00:08:15,170 --> 00:08:20,860
y axis.
The other important thing to note is
63
00:08:20,860 --> 00:08:28,840
what to do when you have large variations in the data. Let's
say I am plotting
64
00:08:28,840 --> 00:08:41,560
time on the horizontal axis and x on the vertical axis,
and some of your x values are
65
00:08:41,560 --> 00:08:55,050
very low, like 0.1, 0.2, or 1, and
then suddenly the next value jumps to
66
00:08:55,050 --> 00:09:05,829
forty, fifty, or a hundred. Then it is very difficult
to represent all these values of x in the
67
00:09:05,829 --> 00:09:16,259
same plot. There are two approaches.
You can either insert a break in
68
00:09:16,259 --> 00:09:28,399
the axis: let's say zero to one is shown in
one segment, and the next segment starts
69
00:09:28,399 --> 00:09:36,519
at thirty-five, so my point at forty will
sit somewhere here, while
70
00:09:36,519 --> 00:09:46,970
a small value of, say, 0.8 will
be somewhere here, so I can see both
71
00:09:46,970 --> 00:09:56,180
kinds of points on the same axis.
The alternative is to
72
00:09:56,180 --> 00:10:02,610
plot over the wider range and then
show the smaller range as an inset. Let's say
73
00:10:02,610 --> 00:10:07,360
the main plot's vertical axis runs from zero
to a hundred, and
74
00:10:07,360 --> 00:10:13,550
the horizontal axis is time. So this
vertical axis goes from zero to a hundred, but
75
00:10:13,550 --> 00:10:20,300
the inset only goes from zero to one. There
you can show the small values, and then you
76
00:10:20,300 --> 00:10:29,230
can clearly see how they vary along the y axis.
So the second approach uses an inset, and the first uses a break.
77
00:10:29,230 --> 00:10:32,879
Now,
let us begin today's lecture with
78
00:10:32,879 --> 00:10:50,939
an example. This is a sample
data set: the scores of ten students in
79
00:10:50,939 --> 00:10:59,120
two exams, and we want to find out how we
should represent this data. I presume
80
00:10:59,120 --> 00:11:04,120
it is reasonably clear to most of you that,
since we have an x and a y for each of ten students,
81
00:11:04,120 --> 00:11:10,120
the easiest way to plot
this data is a scatter plot. You
82
00:11:10,120 --> 00:11:15,959
plot each student as one point; for example,
one point is (61, 49),
83
00:11:15,959 --> 00:11:22,370
and so on. Both scores lie between zero
and a hundred, so each axis runs from zero to a hundred,
84
00:11:22,370 --> 00:11:27,629
and then you place the individual points
and see how these performances
85
00:11:27,629 --> 00:11:30,930
are correlated. Your aim might be
to see how the students perform
86
00:11:30,930 --> 00:11:38,029
across different exams, and these are two
different exams, perhaps in two different
87
00:11:38,029 --> 00:11:52,670
subjects. Suppose you see a strong positive correlation.
Let's take one simple case in which
88
00:11:52,670 --> 00:12:06,100
you have sorted this data and plotted
it. Here I have just
89
00:12:06,100 --> 00:12:20,490
drawn the points schematically, but suppose you have plotted it
in such a way that the student
90
00:12:20,490 --> 00:12:27,260
with the lowest score in exam one
also has the lowest score in exam
91
00:12:27,260 --> 00:12:34,970
two. If you
sort the data and replot
92
00:12:34,970 --> 00:12:41,379
it, it conveys the message that the students
vary in their standards,
93
00:12:41,379 --> 00:12:51,540
and a student who is good is consistently
good. That is why a high score
94
00:12:51,540 --> 00:13:00,290
in exam one also translates into a high score
in exam two and, similarly, for a student
95
00:13:00,290 --> 00:13:18,249
who is weak, a low score in exam one is
closely associated with a low score in exam two.
96
00:13:18,249 --> 00:13:27,089
Now
let us take another example. We
97
00:13:27,089 --> 00:13:33,819
now have data for the RBC counts
of a healthy individual, measured on fifteen
98
00:13:33,819 --> 00:13:38,880
successive days. The counts are in units of
10^6 cells per microlitre (10^6 cells/µL);
99
00:13:38,880 --> 00:13:49,110
µL is short for microlitre. So here
you can see these
100
00:13:49,110 --> 00:13:55,329
values, and the question is how
we should represent this data. If I were
101
00:13:55,329 --> 00:14:10,040
to look at the values that occur,
I have 4.9, I have 5.0, I have
102
00:14:10,040 --> 00:14:28,999
5.2, I have 5.3, I have
5.4, and I have 5.5.
103
00:14:28,999 --> 00:14:39,220
These are the values, and I can tabulate
their frequencies: 4.9 appears
104
00:14:39,220 --> 00:14:48,329
twice, 5.0 appears once, 5.2 appears
four times, 5.3
105
00:14:48,329 --> 00:14:53,300
appears twice, 5.4 appears
three times, and 5.5 appears once.
106
00:14:53,300 --> 00:14:56,819
Now,
the very fact that I have converted the data
107
00:14:56,819 --> 00:15:02,939
into a frequency table means that eventually
you would plot this data as a histogram.
108
00:15:02,939 --> 00:15:11,040
These are your individual values, 4.9,
5.0, and so on, up to
109
00:15:11,040 --> 00:15:18,699
5.5, and what I can see is that
the bar heights are two, one,
110
00:15:18,699 --> 00:15:25,569
then a maximum of four, then two, three,
and one. If I were to connect these
111
00:15:25,569 --> 00:15:30,769
data points, my curve would look something like
this; the histogram looks something like
112
00:15:30,769 --> 00:15:38,480
this.
These are too few
113
00:15:38,480 --> 00:15:47,739
numbers to say much about the shape,
but let's say I asked you the following
114
00:15:47,739 --> 00:15:56,170
question. These are your values
for the last several
115
00:15:56,170 --> 00:16:09,709
days, and today's count is 5.7.
Is it unusual? You can clearly see from
116
00:16:09,709 --> 00:16:14,160
your existing histogram that
there is no value of 5.7, so one
117
00:16:14,160 --> 00:16:19,620
could argue that 5.7 is somewhat
unusual. Maybe you are suffering
118
00:16:19,620 --> 00:16:22,129
from some disease or infection, so it
warrants a doctor's examination.
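The tallying and the "is 5.7 unusual?" reasoning can be sketched in Python as follows. The counts are the ones read out above; the function `looks_unusual` is a hypothetical, deliberately crude rule (never observed before and outside the observed range) that mirrors the informal argument, not a formal statistical test.

```python
# Tallies read out in the lecture: RBC count (10^6 cells per µL) -> number of days.
tally = {4.9: 2, 5.0: 1, 5.2: 4, 5.3: 2, 5.4: 3, 5.5: 1}

# Text histogram: one '#' per day on which the value was observed.
for value in sorted(tally):
    print(f"{value:.1f} | {'#' * tally[value]}")

def looks_unusual(new_value, tally):
    """Crude flag: the value was never observed and lies outside the observed range."""
    lo, hi = min(tally), max(tally)
    return new_value not in tally and not (lo <= new_value <= hi)

print(looks_unusual(5.7, tally))  # True: above the largest count seen (5.5)
print(looks_unusual(5.2, tally))  # False: observed on four days
```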
119
00:16:22,129 --> 00:16:26,089
Let me take another example. Imagine
that I collect the
120
00:16:26,089 --> 00:16:32,550
heights of the students in a
class, and I do it on three successive days.
121
00:16:32,550 --> 00:16:41,139
This is the plot of height
against relative frequency on day one, this is
122
00:16:41,139 --> 00:16:59,550
my distribution on day two, and
this is my distribution on day three.
123
00:16:59,550 --> 00:17:05,710
Mind you, these are not from three
different populations; they are from the same
124
00:17:05,710 --> 00:17:14,300
population. So it is
a little bit baffling, as you can see:
125
00:17:14,300 --> 00:17:21,280
there are some overlapping regions, but the
curves look reasonably different from each
126
00:17:21,280 --> 00:17:26,079
other. So what are the qualitative aspects
of these curves that differ? What
127
00:17:26,079 --> 00:17:32,090
I can clearly see, if I look at
the peak, that is, the height at which you
128
00:17:32,090 --> 00:17:48,690
have the maximum number of observations, is that it
lies to the left on day one,
129
00:17:48,690 --> 00:17:52,040
to the right on day two, and in the middle on
day three.
130
00:17:52,040 --> 00:17:56,799
Now, if I imagine that your
population consists of boys and girls,
131
00:17:56,799 --> 00:18:02,429
and that on average boys are taller
than girls, then it is possible that on day
132
00:18:02,429 --> 00:18:06,130
one more girls were present
in the lecture, on day two there was a greater
133
00:18:06,130 --> 00:18:11,580
proportion of boys attending the lecture, and
on day three there was a roughly equal
134
00:18:11,580 --> 00:18:15,239
fraction of boys and girls
present in the class.
135
00:18:15,239 --> 00:18:23,320
So this raises the question: how
can I capture this difference? How do I go about
136
00:18:23,320 --> 00:18:33,980
capturing it? That brings
us to the next topic: how can I come up
137
00:18:33,980 --> 00:18:41,279
with numerical measures to describe data?
Perhaps the most popular, the most widely
138
00:18:41,279 --> 00:18:49,460
used one, is the arithmetic mean, which most likely
all of you have heard of: the average, or
139
00:18:49,460 --> 00:19:01,190
arithmetic mean. It is defined as follows: let's
say I have a set of values x_1, x_2,
140
00:19:01,190 --> 00:19:12,590
x_3, and so on up to x_n, that is, n values.
This is my sample: I have drawn
141
00:19:12,590 --> 00:19:18,630
a sample of size n from a population.
To avoid any
142
00:19:18,630 --> 00:19:30,800
confusion, let me write the members of the population
as y_1, y_2,
143
00:19:30,800 --> 00:19:40,169
and so on up to y_N. So essentially you have drawn
a sample of size small n, where n is less than capital
144
00:19:40,169 --> 00:19:45,929
N, and we want to know the arithmetic
average. As most of you are perhaps aware,
145
00:19:45,929 --> 00:19:50,340
the sample mean is denoted by
x̄ (x bar), which is nothing but x_1
146
00:19:50,340 --> 00:19:53,789
+ x_2 + x_3 + ... + x_n divided
by the number of observations, n.
147
00:19:53,789 --> 00:19:58,360
In compact form,
I can also write it using a summation:
148
00:19:58,360 --> 00:20:03,140
x̄ = (1/n) Σ x_i, where you put the summation
sign here
149
00:20:03,140 --> 00:20:11,580
and it runs from i = 1
to i = n: for i = 1
150
00:20:11,580 --> 00:20:23,970
the term is x_1, for i = 2 it
is x_2, and so on. This is your
151
00:20:23,970 --> 00:20:32,490
definition of the sample mean. Similarly, I can
define the population mean, which is typically
152
00:20:32,490 --> 00:20:46,860
denoted by the Greek letter μ (mu) to distinguish it
from the sample mean. It is
153
00:20:46,860 --> 00:20:55,769
defined in the same way: μ = (1/N) Σ y_i.
Many times you will see us just
154
00:20:55,769 --> 00:21:06,899
put the summation sign without writing
the limits; when we don't
155
00:21:06,899 --> 00:21:15,049
put the limits, it is understood that the sum runs
from the first value to the last value.
156
00:21:15,049 --> 00:21:26,679
So, in simple notation,
I can accordingly write
157
00:21:26,679 --> 00:21:39,620
x̄ = Σ x_i / n
and μ = Σ y_i / N.
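As a sketch, both definitions reduce to the same computation: sum the values and divide by how many there are. The numbers below are invented for illustration (a population of N = 6 values and a sample of n = 3 drawn from it), and `mean` is a hypothetical helper name.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty collection is undefined")
    return sum(values) / len(values)

# Hypothetical population of N = 6 values, and a sample of n = 3 drawn from it.
population = [2, 4, 4, 5, 7, 8]
sample = [4, 5, 8]

mu = mean(population)   # population mean: mu = (sum of y_i) / N
x_bar = mean(sample)    # sample mean: x_bar = (sum of x_i) / n
print(mu, x_bar)        # 5.0 5.666666666666667
```

Python's standard library also offers `statistics.mean`, which computes the same quantity.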
158
00:21:39,620 --> 00:21:49,850
Now let us do some simple transformations.
Imagine I have this variable x,
159
00:21:49,850 --> 00:22:08,330
and I define a variable y, which is nothing
but a times x, y = a·x. I want to know
160
00:22:08,330 --> 00:22:14,630
the relationship between x̄ and ȳ:
how are they connected? How do I go about
161
00:22:14,630 --> 00:22:35,400
it? If
y = a·x, this means that
162
00:22:35,400 --> 00:22:43,649
y_1 = a·x_1, y_2 = a·x_2,
y_3 = a·x_3, ..., and y_n
163
00:22:43,649 --> 00:22:49,760
= a·x_n.
Now, what is my definition
164
00:22:49,760 --> 00:23:03,750
of ȳ? ȳ is nothing but (y_1
+ y_2 + y_3 + ... + y_n) / n, so
165
00:23:03,750 --> 00:23:12,120
ȳ = (a·x_1 + a·x_2 + ... + a·x_n) / n.
I can take a out as a common factor, and
166
00:23:12,120 --> 00:23:15,769
what remains is (x_1 + x_2 + ... + x_n) / n.
Let me write it on a new page:
167
00:23:15,769 --> 00:23:24,140
ȳ = a·(x_1 +
x_2 + ... + x_n) / n, and
168
00:23:24,140 --> 00:23:29,590
(x_1 + x_2 + ... + x_n) / n is nothing but x̄,
so ȳ is
169
00:23:29,590 --> 00:23:33,950
nothing but a·x̄. This tells you that
when a prefactor multiplies
170
00:23:33,950 --> 00:23:39,870
a variable x, the average of the new
variable y is simply the prefactor multiplied
171
00:23:39,870 --> 00:23:40,870
by the average of x.
Now let us do another simple transformation.
172
00:23:40,870 --> 00:23:45,470
Imagine y is nothing but c + x.
I want to ask how ȳ and x̄
173
00:23:45,470 --> 00:23:49,269
are related. You can do the same
exercise, but now ȳ
174
00:23:49,269 --> 00:24:03,799
is ((c + x_1) + (c + x_2) + ... +
(c + x_n)) / n. If that is so, then
175
00:24:03,799 --> 00:24:09,610
in my definition of ȳ I can collect the c terms
as c times (1 + 1 + ... + 1),
176
00:24:09,610 --> 00:24:19,080
with n ones, plus (x_1 + x_2 + ... +
x_n), and this whole thing divided by n.
177
00:24:19,080 --> 00:24:24,860
Now 1 + 1 + ... + 1, n times, is simply n,
so the first part is c·n / n = c, which gives me the formula
178
00:24:24,860 --> 00:24:29,970
c plus the remaining part, which is x̄. So
ȳ = c + x̄. When you have a constant
179
00:24:29,970 --> 00:24:41,330
added to a variable x, you
can simply write ȳ as the constant plus
180
00:24:41,330 --> 00:24:50,440
x̄. If I were to generalize these two
rules, for a variable y = c
181
00:24:50,440 --> 00:24:52,881
+ a·x I can write the formula ȳ
= c + a·x̄.
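A quick numerical check of the rule ȳ = c + a·x̄, sketched in Python using the seven test scores read out in the next example. The (a, c) pairs are arbitrary choices for illustration, and `check_rule` is a hypothetical helper that returns the mean of the transformed values alongside c + a·x̄; the two should agree up to floating-point rounding.

```python
def mean(values):
    return sum(values) / len(values)

# The seven test scores used in the worked example of this lecture.
scores = [61, 93, 87, 42, 55, 67, 82]

def check_rule(xs, a, c):
    """Return (mean of c + a*x, c + a*mean(x)); the rule says these are equal."""
    ys = [c + a * x for x in xs]
    return mean(ys), c + a * mean(xs)

# Arbitrary (a, c) pairs: pure scaling, pure shifting, and both together.
for a, c in [(0.1, 0), (1, -65), (2, 5)]:
    lhs, rhs = check_rule(scores, a, c)
    print(f"a={a}, c={c}: mean(y) = {lhs:.4f}, c + a*mean(x) = {rhs:.4f}")
```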
182
00:24:52,881 --> 00:24:54,960
So these are the three rules of transformation.
Let us
183
00:24:54,960 --> 00:25:01,010
apply these transformations to an example.
Imagine these are the test scores
184
00:25:01,010 --> 00:25:03,830
of twenty students. If I write them down,
they are sixty-one,
185
00:25:03,830 --> 00:25:09,789
ninety-three, eighty-seven, forty-two, fifty-five,
sixty-seven, eighty-two, and so on.
186
00:25:09,789 --> 00:25:12,100
I will only take these seven numbers
for now.
187
00:25:12,100 --> 00:25:16,759
Let's say this is my variable
x, and I want to find
188
00:25:16,759 --> 00:25:21,330
x̄. Now, look at the range: if I just
look at these few numbers, this is the smallest
189
00:25:21,330 --> 00:25:29,010
and this is the largest. So it
boils down to
190
00:25:29,010 --> 00:25:30,010
the question: how should I define y so that
it is convenient for me to calculate this
191
00:25:30,010 --> 00:25:33,149
average using these transformations?
One transformation
192
00:25:33,149 --> 00:25:34,490
you might think of is dividing
by ten, so everything becomes 6.1,
193
00:25:34,490 --> 00:25:39,809
9.3, 8.7, and so on, but you still
have to add a lot of decimals, which
194
00:25:39,809 --> 00:25:47,750
is not so convenient if you are doing
it by hand.
195
00:25:47,750 --> 00:25:51,279
An easier alternative is to shift the data. My smallest
value is forty-two and my largest
196
00:25:51,279 --> 00:25:58,259
is ninety-three, so let us choose y = x minus
some value in the middle of forty
197
00:25:58,259 --> 00:26:02,030
and ninety, namely sixty-five,
the average of forty and ninety:
198
00:26:02,030 --> 00:26:05,350
(40 + 90) / 2 = 65.
If I define y = x - 65, then
199
00:26:05,350 --> 00:26:08,164
61 becomes -4, 93 becomes +28,
87 becomes +22,
200
00:26:08,164 --> 00:26:09,164
42 becomes -23, 55 becomes -10,
67 becomes +2, and 82
201
00:26:09,164 --> 00:26:10,809
becomes +17. Now I can add these
numbers up, which is much easier than
202
00:26:10,809 --> 00:26:27,179
adding the original, much bigger numbers.
What you can see
203
00:26:27,179 --> 00:26:33,159
is that +22 and -23 almost
cancel each other out: together they
204
00:26:33,159 --> 00:26:36,009
give -1. For the rest, 28
plus 2 is 30, minus 10 is 20, plus
205
00:26:36,009 --> 00:26:38,289
17 is 37; adding the -1 gives 36,
and 36 minus
206
00:26:38,289 --> 00:26:45,640
4 is 32.
So now you can calculate x̄. First,
207
00:26:45,640 --> 00:26:53,639
ȳ is nothing
but 32 divided by the number of observations,
208
00:26:53,639 --> 00:26:55,590
which here is seven: ȳ = 32/7 ≈ 4.6,
or roughly five. Since
209
00:26:55,590 --> 00:26:57,510
y = x - 65, the rule for adding a constant
gives ȳ = x̄ - 65, so
210
00:26:57,510 --> 00:26:58,789
x̄ = ȳ + 65, and
I find that x̄ is
211
00:26:58,789 --> 00:27:01,289
approximately seventy (65 + 4.6 ≈ 69.6). So,
without a calculator, you can use simple
212
00:27:01,289 --> 00:27:04,470
transformations to go from one variable to
the other and make the calculation easier
213
00:27:04,470 --> 00:27:05,984
to do by hand.
Please remember that
214
00:27:05,984 --> 00:27:06,984
this value of sixty-five is not a magic number;
we chose it based on the range of the numbers.
215
00:27:06,984 --> 00:27:07,984
One simple rule of thumb
is this: if you have all these numbers,
216
00:27:07,984 --> 00:27:08,984
x_1, x_2, ..., x_n, you first sort them
from lowest to highest, so that
217
00:27:08,984 --> 00:27:09,984
the smallest value is x_min and the largest
is x_max. In our previous example,
218
00:27:09,984 --> 00:27:10,984
x_min was forty-two and x_max was
ninety-three.
219
00:27:10,984 --> 00:27:11,984
Now we can apply the transformation
y = x - c, where c
220
00:27:11,984 --> 00:27:12,984
is chosen as roughly the average
of x_min and x_max. The idea
221
00:27:12,984 --> 00:27:13,984
is that when you choose something in between
and apply this transformation, then, if I
222
00:27:13,984 --> 00:27:16,143
look at the example again, choosing a value
in between gives me some
223
00:27:16,143 --> 00:27:17,143
numbers that are negative and some numbers
that are positive, because sixty-five is roughly
224
00:27:17,143 --> 00:27:18,143
in the middle of the range. When I have
negative and positive numbers, they partly cancel
225
00:27:18,143 --> 00:27:19,143
each other out and you get a smaller total.
We want the sum to be as small as possible, so
226
00:27:19,143 --> 00:27:20,143
that we can compute it directly
by hand, and then apply
227
00:27:20,143 --> 00:27:21,143
the transformation back.
That is the way to go about it:
228
00:27:21,143 --> 00:27:22,143
you sort the numbers from lowest to
highest, identify the extremes, and pick
229
00:27:22,143 --> 00:27:23,143
a round number near their midpoint. Note that we did not set c
exactly to (42 + 93) / 2 = 67.5, because
230
00:27:23,143 --> 00:27:24,143
that would have made the calculation harder
to do by hand. We wanted an approximation
231
00:27:24,143 --> 00:27:25,143
that is easy to implement, so that we
can see by hand what we are doing.
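The shift-by-a-round-number procedure can be sketched in Python as follows, using the seven scores from the example. The function name `shifted_mean` is hypothetical; c = 65 is the value chosen in the lecture, and the exact midpoint of the extremes is computed only for comparison.

```python
def shifted_mean(values, c):
    """Mean via y = x - c: average the small deviations, then add c back (x_bar = y_bar + c)."""
    deviations = [x - c for x in values]
    y_bar = sum(deviations) / len(deviations)
    return deviations, y_bar, y_bar + c

scores = [61, 93, 87, 42, 55, 67, 82]

# Rule of thumb: sort, then look at the midpoint of the two extremes.
ordered = sorted(scores)
midpoint = (ordered[0] + ordered[-1]) / 2   # (42 + 93) / 2 = 67.5
c = 65                                      # round number near the midpoint, as in the lecture

deviations, y_bar, x_bar = shifted_mean(scores, c)
print(midpoint)                          # 67.5
print(deviations)                        # [-4, 28, 22, -23, -10, 2, 17]
print(sum(deviations))                   # 32
print(round(y_bar, 2), round(x_bar, 2))  # 4.57 69.57
```

Any value of c gives the same x̄; a round c near the middle just keeps the deviations small and whole, which is what makes the sum easy by hand.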
232
00:27:25,143 --> 00:27:26,143
Let us take just one more example.
Again, you see that
233
00:27:26,143 --> 00:27:27,143
you have a whole range of
numbers; these are body mass
234
00:27:27,143 --> 00:27:28,143
indices. If I look through
the list on screen, the smallest number
235
00:27:28,143 --> 00:27:29,143
I have is 18.3,
so the smallest number is
236
00:27:29,143 --> 00:27:30,143
18.3. The largest
number is
237
00:27:30,143 --> 00:27:31,143
29.1. So, following this rule,
my shift value c is going to be (18.3
238
00:27:31,143 --> 00:27:32,143
+ 29.1) / 2. Adding,
18.3 + 29.1 = 47.4,
239
00:27:32,143 --> 00:27:33,143
and 47.4 divided by two is 23.7,
which is roughly
240
00:27:33,143 --> 00:27:34,143
twenty-four.
So c is roughly twenty-four, and
241
00:27:34,143 --> 00:27:35,143
you can then add up the shifted values. If I look at these
scores again, 18.3 will
242
00:27:35,143 --> 00:27:36,143
give me 18.3 minus 24,
while 29.1
243
00:27:36,143 --> 00:27:37,143
minus 24 will give me roughly +5,
and the first one gives
244
00:27:37,143 --> 00:27:38,143
roughly -6. So you see that if I add these two
numbers, I get a value
245
00:27:38,143 --> 00:27:39,143
that is close to zero.
In this way, we can make use of transformations
246
00:27:39,143 --> 00:27:40,143
to make our life a lot easier, particularly when
we are not working with a calculator. So
247
00:27:40,143 --> 00:27:41,143
with that, I will stop this lecture.
To summarize:
248
00:27:41,143 --> 00:27:42,143
the arithmetic mean is one of the simplest
ways of representing data, or of extracting
249
00:27:42,143 --> 00:27:43,143
a quantitative metric to characterize your
data, and by using basic transformations
250
00:27:43,143 --> 00:27:44,143
you can simplify the work and do it by hand to
find the arithmetic mean of your
251
00:27:44,143 --> 00:27:45,143
given data set. y = c + x,
y = a·x, and y =
252
00:27:45,143 --> 00:27:46,143
c + a·x are three simple transformations
you can use. Imagine your numbers
253
00:27:46,143 --> 00:27:47,143
were a hundred, two hundred, and so on; then
dividing by a hundred to make them
254
00:27:47,143 --> 00:27:48,143
one, two, three would be one simple way. With
that, I thank you for your
255
00:27:48,143 --> 00:27:49,143
cooperation, and I look forward to meeting
you in the next lecture.
256
00:27:49,143 --> 00:27:49,145
Thank you.