1
00:00:11,799 --> 00:00:19,980
So far we have looked at discrete distributions
like Poisson, Binomial. Let us now look at
2
00:00:19,980 --> 00:00:25,530
Continuous Distribution and the most important
one out of that is Normal Distribution. Normal
3
00:00:25,530 --> 00:00:30,550
Distribution is also called Bell shaped curve,
Gaussian curve and so on. And generally we
4
00:00:30,550 --> 00:00:39,280
assume that data is normally distributed.
That means, if I take 10 or 20 students in
5
00:00:39,280 --> 00:00:45,039
a class and measure their heights, generally
they will follow a Normal Distribution. That means
6
00:00:45,039 --> 00:00:49,660
there will be an average, there will be some
students who will have less height than the
7
00:00:49,660 --> 00:00:56,820
average, some students will have greater than
the average, and the two proportions will be almost
8
00:00:56,820 --> 00:00:57,820
the same.
9
00:00:57,820 --> 00:01:02,949
There are certain terms which we need
to learn. They are quite simple, and you must
10
00:01:02,949 --> 00:01:09,840
have studied them a long time back. They are
called mean, median, mode. What is a mean?
11
00:01:09,840 --> 00:01:14,750
Suppose I have a set of data and I want to
find the mean of this data set. Mean is nothing
12
00:01:14,750 --> 00:01:18,060
but taking the average. So it is quite simple.
13
00:01:18,060 --> 00:01:26,470
We add all these and divide and then we get
the mean. Now what is the median of this?
14
00:01:26,470 --> 00:01:36,860
Suppose we arrange the data set in this fashion.
The middle point is called the median. If
15
00:01:36,860 --> 00:01:41,970
you have an even number of data points, the middle
point will be the average of the two central values whereas,
16
00:01:41,970 --> 00:01:46,470
if we have an odd number of data points, the middle point
will be the center data point. In this particular
17
00:01:46,470 --> 00:01:55,860
case we have 8 data points and the median will
be the average of 12 and 13. Now what is mode?
18
00:01:55,860 --> 00:02:05,739
Mode is nothing but the value which occurs
most often. For example, the data is centered around
19
00:02:05,739 --> 00:02:14,700
that value. If you look at this data set,
8, 11, 12, 12, 13, 14, 16, 18, we find 12
20
00:02:14,700 --> 00:02:20,730
is the most common value, or the distribution is centered
around it, so the mode is 12 in this case.
21
00:02:20,730 --> 00:02:29,530
So for this particular data set we have a mean
of 13, median of 12.5 and mode of 12. Generally,
22
00:02:29,530 --> 00:02:34,890
for a Normal Distribution the mean, median and
mode are the same. That is why a Normal Distribution
23
00:02:34,890 --> 00:02:42,100
is called a symmetrically distributed data set
and looks like a bell shaped, well distributed
24
00:02:42,100 --> 00:02:44,370
data set.
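As a quick check of the mean, median and mode quoted above, here is a minimal Python sketch. It uses the data set from the lecture; the use of Python's standard `statistics` module is an assumption of this sketch, not something the lecture uses.

```python
import statistics

# Data set quoted in the lecture.
data = [8, 11, 12, 12, 13, 14, 16, 18]

mean = statistics.mean(data)      # sum of the values divided by how many there are
median = statistics.median(data)  # even count, so the average of the two middle values
mode = statistics.mode(data)      # the value that occurs most often

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

The three values differ slightly here, which is expected for a small data set; they coincide only for a perfectly symmetric distribution.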
25
00:02:44,370 --> 00:02:52,770
Then there is something called Midhinge. Suppose
you have a quartile 1. What is quartile 1?
26
00:02:52,770 --> 00:02:59,640
You have a large data set. Assume that you
can divide this data set into 4 quarters or
27
00:02:59,640 --> 00:03:05,510
quartiles. This is called the first quartile
and this is called the third quartile. We
28
00:03:05,510 --> 00:03:12,810
have divided it into 4 parts so you have
the Q 1 here. So 25 percent of the data will
29
00:03:12,810 --> 00:03:20,950
be smaller than this Q 1, 75 percent of the
data will be larger than this Q 1. You have
30
00:03:20,950 --> 00:03:26,090
a Q 2 that is a quartile 2. So 50 percent
of the data will be smaller than this and
31
00:03:26,090 --> 00:03:30,880
50 percent of the data will be larger than
this. Then we have the third quartile Q 3,
32
00:03:30,880 --> 00:03:38,350
75 percent of the data will be below this
Q 3 and 25 percent will be above this. If
33
00:03:38,350 --> 00:03:45,440
it is very uniformly distributed you will
have each of these quarters same but if it
34
00:03:45,440 --> 00:03:51,410
is not uniformly distributed you will have
some variations between Q 1 and Q 3 and so
35
00:03:51,410 --> 00:03:55,300
on actually.
So, Midhinge is nothing but the middle point
36
00:03:55,300 --> 00:04:04,390
that is, Q 1 plus Q 3, whole divided by 2. In this particular
case we have Q 1 here and Q 3 here; take an
37
00:04:04,390 --> 00:04:13,520
average it comes to 501.45. So generally the
quartiles are very useful to determine whether
38
00:04:13,520 --> 00:04:19,200
the data set is evenly distributed or whether
it is skewed in one particular range. If
39
00:04:19,200 --> 00:04:25,060
Q 1 and Q 3 are not equally far from the median, maybe it is skewed,
and so on. That is the advantage
40
00:04:25,060 --> 00:04:28,789
of looking at the quartiles in data set.
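The quartiles and the midhinge described above can be sketched in Python as follows. The data set behind the 501.45 figure is not shown in the transcript, so this sketch reuses the earlier eight-point data set purely as an illustration. The `method="inclusive"` choice is also an assumption; other quartile conventions give slightly different values.

```python
import statistics

# Illustrative data only: the data set for this slide is not in the transcript.
data = [8, 11, 12, 12, 13, 14, 16, 18]

# quantiles(n=4) returns the three cut points Q1, Q2 and Q3.
# method="inclusive" interpolates between data points, treating the
# smallest and largest values as the 0th and 100th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Midhinge: the average of the first and third quartiles.
midhinge = (q1 + q3) / 2

print("Q1:", q1, "Q2:", q2, "Q3:", q3)
print("midhinge:", midhinge)
```

Note that Q2 is the same number as the median computed earlier for this data set.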
41
00:04:28,789 --> 00:04:37,249
Range, range is nothing but the largest point
minus the smallest point. If you have a large
42
00:04:37,249 --> 00:04:45,080
data set like this, the range
of the data is 509 to 591. If I am measuring
43
00:04:45,080 --> 00:04:54,130
the fermentation yield between 50 and 20 degrees,
I would say 50 is one end, 20 is another end
44
00:04:54,130 --> 00:04:59,640
the range is 30 degrees. If I am measuring
the growth of an organism between pH 3 and
45
00:04:59,640 --> 00:05:05,169
8, the range will be 5. This is very obvious
and we have been using it. Then there is
46
00:05:05,169 --> 00:05:11,710
also the inter quartile range, that is nothing
but Q 3 minus Q 1. You know Q 3, you know Q
47
00:05:11,710 --> 00:05:18,990
1. You find the difference; that is called
the inter quartile range. Now, let us look
48
00:05:18,990 --> 00:05:21,080
at the variability of the data.
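The range and the inter quartile range can be sketched the same way. Again the data set is only illustrative (the earlier eight-point set), and the quartile convention is an assumption of this sketch.

```python
import statistics

# Illustrative data only (reused from the earlier mean/median/mode example).
data = [8, 11, 12, 12, 13, 14, 16, 18]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Inter quartile range: Q3 minus Q1.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# A rough skew check: compare how far Q1 and Q3 sit from the median (Q2).
lower_gap = q2 - q1
upper_gap = q3 - q2

print("range:", data_range)
print("IQR:", iqr)
print("median - Q1:", lower_gap, " Q3 - median:", upper_gap)
```

Here the upper gap is larger than the lower gap, which hints that this small data set is stretched toward the higher values.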
49
00:05:21,080 --> 00:05:27,909
The mean is ok, median is ok, mode is ok.
But then we would like to know: how does this
50
00:05:27,909 --> 00:05:33,249
data set vary with respect to this average?
That is a very, very important point because
51
00:05:33,249 --> 00:05:37,860
that gives you an idea about the spread of
the data. Suppose I am measuring the height
52
00:05:37,860 --> 00:05:44,979
of the students in my class and I am getting an
average of 5.5 feet. Do all the students have their
53
00:05:44,979 --> 00:05:52,290
height very close to 5.5? Or is there a large
difference from this average of 5.5? Do we
54
00:05:52,290 --> 00:05:57,719
have students of 6 feet? Do we have students
of 5 feet? That will give a very large spread
55
00:05:57,719 --> 00:06:02,630
and that is going to give you a very large
standard deviation. Whereas if the height
56
00:06:02,630 --> 00:06:09,849
of the students is very close to 5.5, say 5.6
or 5.4 or 5.3, the variations are going
57
00:06:09,849 --> 00:06:15,330
to be very small, and then the standard deviation
of this data set is also going to be very
58
00:06:15,330 --> 00:06:18,349
small.
How do you calculate that? You must have studied
59
00:06:18,349 --> 00:06:26,009
long time back. If x bar is the average of
the data, suppose I have 500.4, 502.8, 499.8,
60
00:06:26,009 --> 00:06:36,249
499.1, 503.1, 498.1 as the data set. I take
an average, which is this. So, x bar minus
61
00:06:36,249 --> 00:06:43,389
x whole square, that is, I take the difference
with respect to the average, square it up, then
62
00:06:43,389 --> 00:06:50,009
I add all of them. This is called the sum of squares.
This sum of squares leads us to
63
00:06:50,009 --> 00:07:00,889
the variance. So we take the sample sum of
squares divided by n minus 1, where n is the number
64
00:07:00,889 --> 00:07:09,770
of data points. So sum of squares divided by
n minus 1 is called the sample variance. I
65
00:07:09,770 --> 00:07:18,840
take the difference between x bar, which
is the mean, and each data point, square it up,
66
00:07:18,840 --> 00:07:24,580
add them up, and I get something called the
sum of squares. If I divide this sum of squares
67
00:07:24,580 --> 00:07:31,449
by n minus 1 we get something called the sample
variance. It is denoted by s square.
68
00:07:31,449 --> 00:07:37,289
Why is this called sample? Because you have
taken a small set of data. For the sample standard
69
00:07:37,289 --> 00:07:41,089
deviation, we take the square root of that;
that gives you the sample standard deviation.
70
00:07:41,089 --> 00:07:46,550
Now this variance and the standard deviation
give you an indication of the spread of the
71
00:07:46,550 --> 00:07:53,450
data, that means, how much the data is spread.
If this variance is very large or the standard
72
00:07:53,450 --> 00:07:58,939
deviation is very large, you can tell the
data spread with respect to the mean or the
73
00:07:58,939 --> 00:08:04,490
average is also very, very large. If the standard
deviation is very small, then we can say the
74
00:08:04,490 --> 00:08:08,819
spread of the data with respect to the mean is
also very small.
75
00:08:08,819 --> 00:08:13,710
So that is the advantage of this and variance
is very, very important as I mentioned in
76
00:08:13,710 --> 00:08:21,869
my first class that variation is part of any
data and so understanding this variation,
77
00:08:21,869 --> 00:08:30,490
the reasons for this variation, is very, very
important in the area of statistics. Identifying
78
00:08:30,490 --> 00:08:36,300
what are the causes, what are the reasons
for this particular variance, is very, very
79
00:08:36,300 --> 00:08:43,589
important. So for the standard deviation or the
variance of the sample, this is
80
00:08:43,589 --> 00:08:49,589
how you calculate: x bar minus x whole square, that
is summed to give the sum of squares, then divided
81
00:08:49,589 --> 00:08:51,740
by n minus 1.
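The sum of squares, sample variance and sample standard deviation described above can be worked through for the six data points quoted in the lecture. This is a minimal sketch in Python; the variable names are illustrative and not from the lecture.

```python
import math

# Data points quoted in the lecture.
data = [500.4, 502.8, 499.8, 499.1, 503.1, 498.1]

n = len(data)
x_bar = sum(data) / n                                  # sample mean

# Sum of squares: squared differences from the mean, added up.
sum_of_squares = sum((x - x_bar) ** 2 for x in data)

sample_variance = sum_of_squares / (n - 1)             # s squared
sample_sd = math.sqrt(sample_variance)                 # s

print("mean:", round(x_bar, 2))
print("sum of squares:", round(sum_of_squares, 3))
print("sample variance:", round(sample_variance, 3))
print("sample standard deviation:", round(sample_sd, 3))
```

The mean works out to 500.55 and the sample standard deviation to about 2.01, so most points lie within a couple of units of the average.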
82
00:08:51,740 --> 00:08:58,709
Now, in the previous class I mentioned about
population and sample. Population is something
83
00:08:58,709 --> 00:09:07,269
very, very big. You cannot even comprehend it;
it is a very large set of data.
84
00:09:07,269 --> 00:09:15,110
It is like saying the average
height of an Indian is 5.5 feet. That means,
85
00:09:15,110 --> 00:09:22,709
it involves more than a billion Indians; that is a
population. Whereas if I take about 10 people
86
00:09:22,709 --> 00:09:28,621
walking down the street and take their
average height, then that is called a sample.
87
00:09:28,621 --> 00:09:35,370
That will be represented generally as x bar,
whereas when I look at the average height
88
00:09:35,370 --> 00:09:41,839
of an Indian I will call that mu. Similarly,
generally we use sigma for the population standard deviation
89
00:09:41,839 --> 00:09:47,910
and s for the sample standard deviation.
So analogous to s square, we also have sigma
90
00:09:47,910 --> 00:09:54,040
square, where we say sum of squares divided
by N as you might have noticed, instead of
91
00:09:54,040 --> 00:09:59,630
n minus 1 which we have here. Here, we
are just having N because in a population
92
00:09:59,630 --> 00:10:06,110
the number of data points is huge, so it does
not make much difference whether we take N
93
00:10:06,110 --> 00:10:11,360
or N minus 1. The population standard deviation,
of course, is the square root of this.
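To see the difference between dividing by n minus 1 and dividing by N, here is a small Python sketch. It reuses the six data points from the lecture and the standard `statistics` module, which is an assumption of this sketch. Treating those six points as a whole population is only for illustration.

```python
import statistics

# Six data points quoted in the lecture.
data = [500.4, 502.8, 499.8, 499.1, 503.1, 498.1]

sample_var = statistics.variance(data)        # sum of squares / (n - 1)
population_var = statistics.pvariance(data)   # sum of squares / N

print("divide by n - 1:", round(sample_var, 4))
print("divide by N    :", round(population_var, 4))

# The two results differ by the factor (n - 1) / n, which approaches 1
# as the number of data points grows.
ratios = {n: (n - 1) / n for n in (6, 1000, 1_000_000)}
for n, ratio in ratios.items():
    print(n, "points: ratio =", ratio)
```

With only six points the two answers differ noticeably, while for a million points the ratio is 0.999999, which is the point made in the lecture.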
94
00:10:11,360 --> 00:10:16,629
Now, we can calculate all these, mean, mode,
median and standard deviation, from Excel also,
95
00:10:16,629 --> 00:10:24,019
right? There are some commands like average.
Suppose if I have a large set of data, I use
96
00:10:24,019 --> 00:10:28,769
this function to calculate the average. If I have
a large set of data, I can use this function
97
00:10:28,769 --> 00:10:33,740
median to calculate the median, if I have
a large set of data, I can use this function
98
00:10:33,740 --> 00:10:40,860
called mode to calculate the mode or the central
tendency. If I have a large set of data I
99
00:10:40,860 --> 00:10:45,990
can use the standard deviation command, STDEV,
to calculate the standard deviation of the
100
00:10:45,990 --> 00:10:51,000
data set. For example, let us just look at
Excel.
101
00:10:51,000 --> 00:10:57,120
Suppose I have some data points;
I am just giving some random data points.
102
00:10:57,120 --> 00:11:10,350
I need to calculate the average. I put a v e
r a g e, average. I put all these points here
103
00:11:10,350 --> 00:11:15,570
and I get an average of this, so easy. If
you want to calculate standard deviation of
104
00:11:15,570 --> 00:11:25,230
this sample set I just write s t d e v and then
I mark all these and I get the standard deviation.
105
00:11:25,230 --> 00:11:40,259
Suppose I want to calculate the median, I
just say median. Median is nothing, but the
106
00:11:40,259 --> 00:11:49,310
midpoint right. So 13 is the median. As you
can see 12, 12, 13, 13, 14, 15 the middle
107
00:11:49,310 --> 00:12:02,269
point is the median. Now mode is a measure of central
tendency. We have mode as 12 here, we have
108
00:12:02,269 --> 00:12:07,431
average or the mean here, we have the median
here, we have the mode here, we have the standard
109
00:12:07,431 --> 00:12:14,829
deviation. These are quite simple commands,
which Excel has, and we can calculate
110
00:12:14,829 --> 00:12:18,470
all these in Excel also. It is very simple.
111
00:12:18,470 --> 00:12:24,939
Now, let us look at Normal Distribution. It
is the most important one in statistics. As I said,
112
00:12:24,939 --> 00:12:31,519
we assume that many systems behave in a normal
fashion, but of course there are some tests
113
00:12:31,519 --> 00:12:36,570
which we have to perform to find out whether
the data set follows a Normal Distribution.
114
00:12:36,570 --> 00:12:41,970
If it does not follow one, then we have to be very
careful in using some of these statistical analyses
115
00:12:41,970 --> 00:12:48,320
and statistical tests; we need to remember that.
Normal Distribution is very symmetric; it is
116
00:12:48,320 --> 00:12:54,110
bell shaped, where the area under the
left hand side is exactly equal to the area
117
00:12:54,110 --> 00:13:00,510
under the right hand side. The equation is
given like this: f of x, that is, the probability density
118
00:13:00,510 --> 00:13:05,910
function of x, is equal to 1 by
sigma square root of 2 pi, times e power
119
00:13:05,910 --> 00:13:14,589
minus x minus mu whole square by 2 sigma square.
So mu is the mean or the average and sigma
120
00:13:14,589 --> 00:13:20,470
is the standard deviation.
Normal Distribution is symmetric. We know
121
00:13:20,470 --> 00:13:25,959
that in a Normal Distribution the mean, mode
and median will all be equal to mu. That is,
122
00:13:25,959 --> 00:13:35,029
in a Normal Distribution mean is equal to
mode is equal to median is equal to mu.
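The density formula and the symmetry described above can be checked numerically. This is a minimal sketch in Python; the function name `normal_pdf` and the values of mu and sigma are hypothetical, chosen only for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))"""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Hypothetical parameters: a mean height of 5.5 feet, standard deviation 0.25.
mu, sigma = 5.5, 0.25

peak = normal_pdf(mu, mu, sigma)          # height of the curve at the mean
left = normal_pdf(mu - 0.3, mu, sigma)    # a point to the left of the mean
right = normal_pdf(mu + 0.3, mu, sigma)   # the mirror-image point on the right

print("f(mu)       =", round(peak, 4))
print("f(mu - 0.3) =", round(left, 4))
print("f(mu + 0.3) =", round(right, 4))
```

The curve is highest at x equal to mu, and the two mirror-image points give the same height, which is why the mean, the mode and the median all coincide at mu.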
123
00:13:35,029 --> 00:13:42,209
There is something called Standardized Normal
Distribution. We can convert any Normal Distribution
124
00:13:42,209 --> 00:13:49,569
into a Standardized Normal Distribution.
That is generally represented as z. How do
125
00:13:49,569 --> 00:13:58,180
you convert that? We take z is equal to x
minus mu, whole divided by sigma, where mu is the mean of the population
126
00:13:58,180 --> 00:14:02,800
and sigma is the standard deviation. When we do
that,
127
00:14:02,800 --> 00:14:10,209
what will happen? The mean will become 0. That
means you are sort of transforming it and
128
00:14:10,209 --> 00:14:18,740
you are shifting it, so that the mean becomes
0 and the area under the curve becomes 1 and
129
00:14:18,740 --> 00:14:24,329
sigma becomes 1. So mean becomes 0 that means,
you have shifted your curve and then you have
130
00:14:24,329 --> 00:14:28,759
adjusted your curve, so that the standard
deviation is 1 and the area under the curve
131
00:14:28,759 --> 00:14:38,480
is exactly 1. This is very, very useful because
instead of handling problems where the averages
132
00:14:38,480 --> 00:14:46,110
and standard deviations differ widely,
when we use the Standardized Normal
133
00:14:46,110 --> 00:14:52,069
Distribution, we will know that the mean is
0 and the area under the curve is always 1.
134
00:14:52,069 --> 00:14:57,930
So that is very useful to use. We can convert
most of the problems into Standardized Normal
135
00:14:57,930 --> 00:15:03,899
Distribution and there are tables which give
the area under the curve for different values
136
00:15:03,899 --> 00:15:07,430
of z. We will do some problems
on that.
137
00:15:07,430 --> 00:15:14,319
This is a Standardized Normal Distribution
where we have the mean as 0, area under the
138
00:15:14,319 --> 00:15:24,990
curve is 1 and sigma is 1. So we have mu plus
sigma, mu minus sigma, mu plus 2 sigma, mu minus
139
00:15:24,990 --> 00:15:32,430
2 sigma, mu plus 3 sigma, mu minus 3 sigma. So
corresponding to that, in terms of z, plus
140
00:15:32,430 --> 00:15:38,779
1 sigma will become plus 1, plus 2 sigma will
become plus 2, plus 3 sigma will become plus 3.
141
00:15:38,779 --> 00:15:44,319
So minus 1 sigma will become minus 1, minus
2 sigma will become minus 2, minus 3 sigma
142
00:15:44,319 --> 00:15:49,899
will become minus 3.
All you have to do here is substitute, for example,
143
00:15:49,899 --> 00:15:57,920
x equal to mu minus
3 sigma. So z will become minus 3.
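The conversion z = (x - mu) / sigma described above can be sketched in a few lines of Python. The function name `z_score` and the values of mu and sigma are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def z_score(x, mu, sigma):
    """Standardize x: shift by the mean, then scale by the standard deviation."""
    return (x - mu) / sigma

# Hypothetical population parameters, for illustration only.
mu, sigma = 500.0, 2.0

# Points at mu - 3 sigma, ..., mu + 3 sigma map to z = -3, ..., +3.
z_values = []
for k in (-3, -2, -1, 0, 1, 2, 3):
    x = mu + k * sigma
    z = z_score(x, mu, sigma)
    z_values.append(z)
    print("x =", x, " z =", z)
```

Whatever mu and sigma are, a point k standard deviations from the mean always lands at z equal to k.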
144
00:15:57,920 --> 00:16:05,509
Now as I said, the area under this curve
is equal to 1 in a Standardized Normal
145
00:16:05,509 --> 00:16:14,129
Distribution. If you have plus or minus 1
sigma, this particular area is 68.3 percent
146
00:16:14,129 --> 00:16:24,310
of the total area, that means, it will be 0.683.
Plus or minus 1 sigma is 0.683; for plus or minus
147
00:16:24,310 --> 00:16:34,499
2 sigma, it is 95.4 percent, that means, approximately
0.95. This area spanning the plus or minus
148
00:16:34,499 --> 00:16:46,440
2 of z will be 0.954. Similarly plus or minus
3 sigma will span 99.7 percent of this area.
149
00:16:46,440 --> 00:16:54,660
So plus or minus 1 sigma will span 68.3 percent
of the area, or it will have a value of 0.683;
150
00:16:54,660 --> 00:17:03,790
plus or minus 2 sigma will have a value
of 95.4 percent or 0.954 area; plus or minus 3 sigma
151
00:17:03,790 --> 00:17:10,100
will be 0.997, and so on. We can have plus or
minus 4 sigma, 5, 6 and so on, because
152
00:17:10,100 --> 00:17:17,120
this is an exponentially decaying curve. As we go
along we will add a little bit of the area,
153
00:17:17,120 --> 00:17:21,339
because the area as we go along becomes smaller
and smaller. But still it will try to span
154
00:17:21,339 --> 00:17:28,190
as much of the area as possible.
When you say plus or minus 1 sigma this area
155
00:17:28,190 --> 00:17:35,910
is 68.3. That means, the remaining area this
plus this is going to be approximately 32
156
00:17:35,910 --> 00:17:41,980
percent. Similarly, for plus or minus 2 sigma this
area is about 95 percent. When you say 95
157
00:17:41,980 --> 00:17:47,330
percent, the remaining area is 5 percent. That
means, this side will be 2.5 percent and this
158
00:17:47,330 --> 00:17:52,670
side will be 2.5 percent, assuming it to
be symmetric. Similarly, for plus or minus 3
159
00:17:52,670 --> 00:17:59,340
sigma, if we call this roughly 99 percent, the remaining
area outside will be 1 percent in total. That means,
160
00:17:59,340 --> 00:18:06,170
this side will be 0.5 percent and this side will be
0.5 percent. So, plus or minus 2 sigma will
161
00:18:06,170 --> 00:18:15,670
be 0.95, and the area outside will be 0.05. This side
will be 0.025 and the other side will be 0.025. Similarly,
162
00:18:15,670 --> 00:18:21,340
plus or minus 3 sigma will be approximately
99 percent. So the outside area will be 1 percent.
163
00:18:21,340 --> 00:18:30,910
That means, this side is 0.5 percent and the other
side 0.5 percent, or 0.005 and 0.005. That is
164
00:18:30,910 --> 00:18:36,090
the advantage of converting a data set with
a Normal Distribution to a Standardized Normal
165
00:18:36,090 --> 00:18:42,160
Distribution. So what you do is, if I know
mu and if I know sigma, all I do is z is
166
00:18:42,160 --> 00:18:48,270
equal to x minus mu, whole divided by sigma, because you are
shifting the curve so that the mean becomes 0,
167
00:18:48,270 --> 00:18:55,610
the area under the curve becomes
1 and sigma becomes 1. When I say plus or
168
00:18:55,610 --> 00:19:02,880
minus 1 sigma, z is equal to plus 1 and minus 1. When
I say plus or minus 2 sigma z will be plus
169
00:19:02,880 --> 00:19:09,540
2 and minus 2. When I say plus or minus 3
sigma, z will be plus 3 and minus 3. Now these
170
00:19:09,540 --> 00:19:14,730
numbers are also very important. When you say
plus or minus 1 sigma, the area is 68.3 percent;
171
00:19:14,730 --> 00:19:21,340
plus or minus 2 sigma, about 95 percent; plus or minus
3 sigma, about 99 percent. These numbers will become
172
00:19:21,340 --> 00:19:26,200
very important because later on
you will keep on looking at these 95 percent
173
00:19:26,200 --> 00:19:29,440
99 percent.
So when you say 95 percent you are talking
174
00:19:29,440 --> 00:19:34,490
in terms of roughly plus or minus 2 sigma. When you are
talking 99 percent, you are talking in terms
175
00:19:34,490 --> 00:19:41,150
of roughly plus or minus 3 sigma. Generally, in statistics
most of the significance analysis is done
176
00:19:41,150 --> 00:19:48,140
around 95 percent, that is, plus or minus
2 sigma, or 99 percent, that means plus or minus
177
00:19:48,140 --> 00:19:56,560
3 sigma. We are looking at data spreading
around an average with plus or minus 2 sigma,
178
00:19:56,560 --> 00:20:04,700
which is 95 percent, or plus or minus 3 sigma,
which is 99 percent. Many times in the future we
179
00:20:04,700 --> 00:20:10,480
are going to use these 2 numbers, 95 and 99,
and now you understand what they mean. 95 percent
180
00:20:10,480 --> 00:20:17,250
means it is spanning a plus or minus 2 sigma
area; 99 percent means plus or minus 3 sigma
181
00:20:17,250 --> 00:20:18,440
area.
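The areas quoted above for plus or minus 1, 2 and 3 sigma can be reproduced with a short Python sketch. The helper name `area_within` is hypothetical. The sketch uses the error function from the standard `math` module and `statistics.NormalDist` for the reverse lookup, both of which are choices of this sketch rather than tools used in the lecture.

```python
import math
from statistics import NormalDist

def area_within(k):
    """Area under the standard normal curve between z = -k and z = +k."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    inside = area_within(k)
    outside = 1.0 - inside      # both tails together
    each_tail = outside / 2.0   # one tail, by symmetry
    print(f"+/-{k} sigma: inside={inside:.4f} outside={outside:.4f} each tail={each_tail:.4f}")

# Going the other way: which z encloses exactly 95 or 99 percent?
z_95 = NormalDist().inv_cdf(0.975)   # leaves 2.5 percent in each tail
z_99 = NormalDist().inv_cdf(0.995)   # leaves 0.5 percent in each tail
print("exactly 95 percent: +/-", round(z_95, 3))
print("exactly 99 percent: +/-", round(z_99, 3))
```

So "95 percent is plus or minus 2 sigma" and "99 percent is plus or minus 3 sigma" are convenient roundings: the exact limits are about 1.96 and 2.58, and plus or minus 3 sigma actually encloses about 99.7 percent.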
182
00:20:18,440 --> 00:20:27,460
Now the area for a given Z can also be calculated with Excel. There
is a command called NORMSDIST of Z.
183
00:20:27,460 --> 00:20:41,750
I said the area under the curve is
1. If I want to calculate, what is this area?
184
00:20:41,750 --> 00:20:50,930
And what is this area? At this place suppose
I give a value of Z here and I want to calculate
185
00:20:50,930 --> 00:20:57,940
this area, and I want to calculate this area.
I can use this particular command:
186
00:20:57,940 --> 00:21:04,270
NORMSDIST of Z gives this particular area, to the left of Z.
If we want to know what this other area is, I can
187
00:21:04,270 --> 00:21:11,430
just say 1 minus NORMSDIST. When I put Z is
equal to 0 in NORMSDIST it will give me
188
00:21:11,430 --> 00:21:17,540
0.5, that is, this area, which is correct because
the total area is 1. We can say this area
189
00:21:17,540 --> 00:21:27,460
is 0.5. When Z is equal to 1, that means
here, this area is equal to 0.841. When
190
00:21:27,460 --> 00:21:38,650
Z is equal to 2 this area is 0.977. The remaining
area will be 1 minus 0.977, that means 0.023,
191
00:21:38,650 --> 00:21:47,460
and if we put here Z is equal to 3 it will
give me 0.9987. If you want to calculate
192
00:21:47,460 --> 00:22:03,640
the remaining area I put 1 minus NORMSDIST. Let
me do it for you here. I just say NORMSDIST
193
00:22:03,640 --> 00:22:15,050
of 0; yes, that is 0.5.
194
00:22:15,050 --> 00:22:28,060
So when I put NORMSDIST of 1, that
is 0.841. What I am saying is, when
195
00:22:28,060 --> 00:22:38,870
I put it here, this side of the area is 0.84.
When I put it here, Z is equal to 2, this area
196
00:22:38,870 --> 00:22:51,040
is 0.977. So when I put it as 2, it is
0.977, and the remaining area on the right
197
00:22:51,040 --> 00:23:01,890
side, if you want to calculate it, I put 1 minus
this, that is equal to 0.02275. That is, whatever is
198
00:23:01,890 --> 00:23:11,040
on the right hand side is given by 0.02275.
Similarly, when I put Z is equal to 3, it
199
00:23:11,040 --> 00:23:16,360
gives me 0.9987 as this area. If you
want to know what is this area on the right
200
00:23:16,360 --> 00:23:25,350
side I will say 1 minus this. Using Excel
also we can do this, and the command here is NORMSDIST,
201
00:23:25,350 --> 00:23:31,840
and you can also use GraphPad to
do the same thing. But
202
00:23:31,840 --> 00:23:40,120
GraphPad gives it to you in another form:
when you put z, it gives you the area outside, on both
203
00:23:40,120 --> 00:23:46,830
sides.
Whereas Excel gives this area, GraphPad gives
204
00:23:46,830 --> 00:23:54,610
you the area outside on both sides; that is called
two-tail. So when I give Z is
205
00:23:54,610 --> 00:24:03,840
equal to 1 it gives me this area. When I give
Z equal to 1, it gives me this area and so
206
00:24:03,840 --> 00:24:09,120
on. Here it is giving both of these. Suppose
we want to know only one side of it; I
207
00:24:09,120 --> 00:24:21,740
just divide by 2 to get the area on only one
side of it, understand?
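For readers without Excel or GraphPad, the same three quantities can be sketched in Python. The function names below are hypothetical. The sketch assumes that `NormalDist().cdf(z)` from the standard `statistics` module plays the role of Excel's NORMSDIST, that is, the area to the left of Z under the standard normal curve.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def left_area(z):
    """Area to the left of z (the quantity NORMSDIST reports)."""
    return std_normal.cdf(z)

def right_tail(z):
    """Area to the right of z: 1 minus the left area."""
    return 1.0 - left_area(z)

def two_tail(z):
    """Area outside -|z| .. +|z|, on both sides together."""
    return 2.0 * right_tail(abs(z))

for z in (0, 1, 2, 3):
    print(f"z={z}: left={left_area(z):.5f} right={right_tail(z):.5f} two-tail={two_tail(z):.4f}")
```

Dividing the two-tail value by 2 gives back the one-sided area, which is the conversion described in the lecture.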
208
00:24:21,740 --> 00:24:29,410
We can use GraphPad also to calculate this.
As you can see, it tells you how to calculate
209
00:24:29,410 --> 00:24:49,530
different statistical parameters here.
We can say, here we have
210
00:24:49,530 --> 00:24:56,880
a place to give a number. Suppose I give a number
of 0. It is giving here the p value, that is,
211
00:24:56,880 --> 00:25:11,860
whatever is outside. If I give a number,
Z is equal to 1, it is giving 0.3173, that is
212
00:25:11,860 --> 00:25:25,160
the outside area, and so on. Then I give
Z is equal to 2. So it is giving 0.0455. That
213
00:25:25,160 --> 00:25:37,520
means, the area outside, on both sides,
is being given as 0.0455,
214
00:25:37,520 --> 00:25:45,290
that is, approximately 0.05, and so on.
Actually there is a mistake here, this should
215
00:25:45,290 --> 00:25:58,010
be 1 here. We can use either the NORMSDIST
command in Excel or we can use GraphPad to
216
00:25:58,010 --> 00:26:05,330
calculate the area for a given Z, and you can calculate Z itself
with a calculator also, from the formula Z is
217
00:26:05,330 --> 00:26:12,080
equal to x minus mu, whole divided by sigma.
This is very useful because when we convert
218
00:26:12,080 --> 00:26:17,790
any data into a Standardized Normal Distribution
we are shifting it so that the mean comes
219
00:26:17,790 --> 00:26:24,250
out to be 0 and the area under the curve comes
out to be 1. So we will do a lot of problems as
220
00:26:24,250 --> 00:26:34,840
we go along using this type of command. For
example, suppose I want to calculate for 2.5 sigma.
221
00:26:34,840 --> 00:26:42,020
What is Z? It is very simple. All I do is
put x equal to mu plus 2.5 sigma. So
222
00:26:42,020 --> 00:26:50,600
Z will become 2.5. Now for Z is equal to 2.5,
what is the area? I can use one of these. I
223
00:26:50,600 --> 00:26:58,900
can use NORMSDIST to calculate this area and
then subtract from 1 to get this area. So
224
00:26:58,900 --> 00:27:19,049
how do we do that? I will go to Excel
and do 1 minus NORMSDIST of 2.5, which will give 0.00621.
225
00:27:19,049 --> 00:27:25,640
That is, this area is 0.00621, whereas if you
want the area to the left of Z, that is what NORMSDIST
226
00:27:25,640 --> 00:27:37,360
gives. That is the advantage of converting
a Normal Distribution into Standardized Normal
227
00:27:37,360 --> 00:27:44,690
Distribution and we are going to do many problems
using this particular command. If we look
228
00:27:44,690 --> 00:27:49,900
at this table this is called a single tail
Z table.
229
00:27:49,900 --> 00:27:55,990
It is single tail because we are looking at
only this side. When Z is equal to 0, that means
230
00:27:55,990 --> 00:28:06,840
if it is here, this area is 0.5; that is what
this gives. When
231
00:28:06,840 --> 00:28:18,670
Z is equal to 1 here, this area will be 0.1587.
If we are looking at 2 tails, that means,
232
00:28:18,670 --> 00:28:25,560
if we are looking at both the sides, all you
have to do is multiply 0.1587 by 2. So you
233
00:28:25,560 --> 00:28:35,090
will get about 0.317, correct?
In fact, that is what this is, 0.317, that is,
234
00:28:35,090 --> 00:28:43,090
the area on both the sides, or double tailed. Whereas
this table gives you the single tail here
235
00:28:43,090 --> 00:28:49,440
and similarly if you are looking at Z
is equal to 2, what is the area on this side?
236
00:28:49,440 --> 00:28:59,490
Going down the table, it will be 0.0228. If we want two-tail
then I multiply 0.0228 by 2; that comes
237
00:28:59,490 --> 00:29:10,240
around 0.0456, and that is what we have here.
GraphPad gives about 0.0456 because GraphPad gives
238
00:29:10,240 --> 00:29:16,960
you both the sides; it is called the two-tail.
I am introducing one more terminology, that
239
00:29:16,960 --> 00:29:22,940
is, single tail and two-tail. One side
of it is single tail. If you are looking at a
240
00:29:22,940 --> 00:29:27,890
situation where you have both sides of it,
that is called two-tail. We will be using
241
00:29:27,890 --> 00:29:33,930
this terminology quite often. So GraphPad
gives you area for both the sides. If you
242
00:29:33,930 --> 00:29:38,680
want to calculate only 1 side I will divide
by 2. Or if I use this table, this table gives
243
00:29:38,680 --> 00:29:43,460
you the area outside on only 1 side. So, if you
want to calculate both sides then I multiply
244
00:29:43,460 --> 00:29:56,170
by 2. For 2, I am getting 0.0228; for 3 it
is 0.0013. If I want 2 tail, it will become
245
00:29:56,170 --> 00:30:03,310
0.0026. That is approximately what GraphPad gives.
We can use different approaches. We can use
246
00:30:03,310 --> 00:30:10,730
this table; we can use NORMSDIST, which
gives it in a different way, and you need to take
247
00:30:10,730 --> 00:30:16,790
1 minus, and then if you want 2 tail you multiply
by 2; or we can use this GraphPad calculator,
248
00:30:16,790 --> 00:30:21,470
which straight away gives both the
sides. So, there are many different approaches by which
249
00:30:21,470 --> 00:30:30,580
one could calculate the area under the curve,
either the inside area or the 1 minus area,
250
00:30:30,580 --> 00:30:36,790
for a given Z. This is called a Standardized
Normal Distribution. In the next class we
251
00:30:36,790 --> 00:30:41,610
will look at some problems related to this
Standardized Normal Distribution and how useful
252
00:30:41,610 --> 00:30:44,320
it is you will see when you start doing problems
in this case.
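To tie the approaches together, the sketch below rebuilds a few rows of a single tail Z table and the matching two-tail values, including the Z equal to 2.5 example worked earlier. It is an illustration in Python using the standard `statistics` module, not the table or the calculator shown in the lecture.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

rows = []
for z in (0.0, 1.0, 2.0, 2.5, 3.0):
    one_tail = 1.0 - std_normal.cdf(z)   # area to the right of z (single tail)
    two_tail = 2.0 * one_tail            # area outside -z .. +z (two-tail)
    rows.append((z, one_tail, two_tail))

print("   z  one-tail  two-tail")
for z, one_tail, two_tail in rows:
    print(f"{z:4.1f}  {one_tail:8.4f}  {two_tail:8.4f}")
```

For Z equal to 3 the unrounded two-tail area is about 0.0027, whereas doubling the rounded table entry 0.0013 gives 0.0026. Small differences of this kind between the table and a calculator come only from rounding.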
253
00:30:44,320 --> 00:30:45,420
Thank you very much for your time.