1
00:00:12,690 --> 00:00:19,000
Hello again, welcome to the course on Biostatistics
and Design of Experiments. In this class,
2
00:00:19,000 --> 00:00:29,509
we are going to briefly touch upon various
statistical tests that one should understand
3
00:00:29,509 --> 00:00:34,051
and how to go about using them. So, there
are different types of statistical tests: some
4
00:00:34,051 --> 00:00:41,660
are used for comparing the mean of data and
some are used for comparing the variance of data
5
00:00:41,660 --> 00:00:46,199
and some are used for comparing ratios of
data and so on. So, we will talk about that.
6
00:00:46,199 --> 00:00:51,789
Before going into that, let me recall again
how you go about hypothesis testing. You
7
00:00:51,789 --> 00:00:56,309
have something called the null hypothesis,
which means no difference, or the status quo.
8
00:00:56,309 --> 00:01:05,880
So, imagine I am comparing the IQs of 2 different
classes of students, then I would
9
00:01:05,880 --> 00:01:11,930
say the null hypothesis will be there is no
difference in the IQ of students in class
10
00:01:11,930 --> 00:01:18,750
a and class b. Then we also have the alternate
hypothesis, if I am going to say yes there
11
00:01:18,750 --> 00:01:26,460
is a possible difference, then I would say the
alternate hypothesis will be: the IQ average
12
00:01:26,460 --> 00:01:33,570
of class a is different from the IQ average of
class b. Then you can also have another situation
13
00:01:33,570 --> 00:01:40,010
where instead of just saying different, the
IQ average of class a could be better than
14
00:01:40,010 --> 00:01:46,970
the IQ average of class b. That is we are
comparing only the better part of it or the
15
00:01:46,970 --> 00:01:53,020
IQ average of class a is worse than that
of class b, then we are comparing only
16
00:01:53,020 --> 00:02:00,831
the worst part of it. So we have those 3 situations,
no difference is the null hypothesis, there
17
00:02:00,831 --> 00:02:06,800
is a difference is the alternate hypothesis
and that is called a two-tailed comparison
18
00:02:06,800 --> 00:02:11,750
that is, it is different: it could be greater or
worse. When we are comparing only 1 side of
19
00:02:11,750 --> 00:02:16,690
it, the greater side of it or the worse side of it,
then that is called a one-tailed test.
20
00:02:16,690 --> 00:02:23,381
So we decide on the hypothesis and the tail
and then we also decide on the p value, that
21
00:02:23,381 --> 00:02:32,360
is am I going to test against 95 % confidence
or am I going to test against 99 % confidence.
22
00:02:32,360 --> 00:02:44,350
So p of 0.05 indicates 95 % and p of 0.01
indicates 99 %. Once I decide on all these,
23
00:02:44,350 --> 00:02:49,070
that means I decide on the type of hypothesis,
I decide on whether it is a single tailed
24
00:02:49,070 --> 00:02:56,920
or a two-tailed test, then I decide on the
p I want to look at, then I calculate using
25
00:02:56,920 --> 00:03:06,090
different tests something called t, and then
from there I will calculate maybe the p value
26
00:03:06,090 --> 00:03:12,160
and then I will say whether the p value which
I have calculated is less than 0.05 or it
27
00:03:12,160 --> 00:03:21,840
is more. So if it is less than 0.05, obviously
there is a difference, so I cannot
28
00:03:21,840 --> 00:03:25,980
accept null hypothesis so I have to reject
the null hypothesis that means, I have to
29
00:03:25,980 --> 00:03:32,050
accept the alternate hypothesis. Now if the
p value is greater than 0.05 for a 95 % confidence
30
00:03:32,050 --> 00:03:37,030
obviously, there is no reason for me to reject
the null hypothesis.
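The decision rule just described can be sketched in a few lines of Python. This is a minimal illustration; the function name is made up for this example, not from any library.

```python
# A minimal sketch of the decision rule: compare the calculated p value
# against the chosen significance level (alpha = 0.05 for 95 % confidence,
# 0.01 for 99 %). The function name is illustrative.

def decide(p_value, alpha=0.05):
    """Return the decision on the null hypothesis H0."""
    if p_value < alpha:
        return "reject H0"       # accept the alternate hypothesis
    return "fail to reject H0"   # no reason to reject H0

print(decide(0.03))  # below 0.05: reject H0
print(decide(0.40))  # above 0.05: fail to reject H0
```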
31
00:03:37,030 --> 00:03:47,541
So, imagine I am testing 2 drugs in the market
which affect the sleeping pattern. So, the
32
00:03:47,541 --> 00:03:53,210
null hypothesis could be, there is no difference
between the 2 drugs, the alternate there is
33
00:03:53,210 --> 00:03:59,960
a difference, at a 95 % confidence level.
Now what type of hypothesis equations will we
34
00:03:59,960 --> 00:04:08,670
put first, the two-tailed and the single tailed.
For a two-tailed test your alternate will be mu
35
00:04:08,670 --> 00:04:18,130
a is not equal to mu b, your null hypothesis
will be mu a equal to mu b, that is the two-tail.
36
00:04:18,130 --> 00:04:23,940
For single tailed test null hypothesis could
be mu a is equal to mu b, alternate hypothesis
37
00:04:23,940 --> 00:04:30,030
could be mu a is greater than mu b, that means
drug a has an increased effect compared to drug b
38
00:04:30,030 --> 00:04:37,570
or mu a could be less than mu b, that means
drug a has a decreased effect compared to drug b; that
39
00:04:37,570 --> 00:04:43,040
is a lower tailed and upper tailed. Now there
is something called error, there are 2 types
40
00:04:43,040 --> 00:04:44,040
of error.
41
00:04:44,040 --> 00:04:49,320
There is something called type 1 error and
there is something called type 2 error. What
42
00:04:49,320 --> 00:04:56,449
is type 1 error? We are rejecting the null hypothesis
H naught when H naught is true; that means,
43
00:04:56,449 --> 00:05:03,220
we are rejecting the null hypothesis whereas
in reality the null hypothesis is true;
44
00:05:03,220 --> 00:05:08,530
that is called type 1 error. Then you have
the type 2 error, we fail to reject the null
45
00:05:08,530 --> 00:05:14,990
hypothesis when the alternate hypothesis is
true. Whereas, instead of accepting alternate
46
00:05:14,990 --> 00:05:22,130
hypothesis, we do not accept the alternate; we
accept the null hypothesis. We have 2 situations:
47
00:05:22,130 --> 00:05:31,669
one is do not reject the null hypothesis and
the other is reject it; and either the null hypothesis
48
00:05:31,669 --> 00:05:37,910
or the alternate hypothesis is true. So here this is correct: the null
hypothesis is true and you do not reject the null
49
00:05:37,910 --> 00:05:45,229
hypothesis; similarly, when the alternate hypothesis
is true you need to reject the null hypothesis, so this
50
00:05:45,229 --> 00:05:54,660
is the correct situation. Whereas sometimes,
although the null hypothesis
51
00:05:54,660 --> 00:06:01,979
is true we end up rejecting the null hypothesis
that is called the type 1 error or alpha error.
52
00:06:01,979 --> 00:06:09,419
Whereas in the other situation we do not reject
the null hypothesis whereas, we have H 1 is
53
00:06:09,419 --> 00:06:17,040
true that is called the type 2 error and that
is called the beta error.
54
00:06:17,040 --> 00:06:25,490
So the type 1 error is set by the
95 % or 99 % or 90 % confidence which we make use of
55
00:06:25,490 --> 00:06:36,070
in our statistical calculation. That is,
if you take a smaller alpha then obviously
56
00:06:36,070 --> 00:06:45,400
you are more sure that you will not reject
the null hypothesis when H naught is
57
00:06:45,400 --> 00:06:52,280
true, but then your significance level also
gets affected by that. So this table is very
58
00:06:52,280 --> 00:07:04,509
important: when H naught is true and we reject
H naught, that is called the type 1 error.
59
00:07:04,509 --> 00:07:12,330
When H 1 is true and we fail to reject the null
hypothesis, that is called the type 2 error
60
00:07:12,330 --> 00:07:17,090
or beta error. So type 1 error is called the
alpha error and type 2 is called the beta
61
00:07:17,090 --> 00:07:23,139
error. So we need to always, sort of balance
between the alpha and the beta error and generally
62
00:07:23,139 --> 00:07:32,610
we give more importance to the alpha error
actually.
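The alpha error can be seen directly in a small simulation, assuming a normal population. The null hypothesis is actually true in every trial, yet a 95 % test still wrongly rejects it about 5 % of the time; all the numbers here are illustrative.

```python
# A small simulation of the type 1 (alpha) error, assuming a normal
# population. H0 is actually true in every trial (the mean really is 0),
# yet a 95 % two-tailed t-test still rejects H0 about 5 % of the time;
# that 5 % is the alpha error rate.
import math
import random
import statistics

random.seed(1)
T_CRIT = 2.262          # two-tailed 95 % critical t for 9 degrees of freedom
n, trials, rejections = 10, 4000, 0

for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)
    t = (x_bar - 0.0) / (s / math.sqrt(n))   # H0: mu = 0, which is true here
    if abs(t) > T_CRIT:
        rejections += 1                       # a type 1 (alpha) error

print(rejections / trials)   # close to 0.05
```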
63
00:07:32,610 --> 00:07:39,410
As you know we have continuous data and
alternatively we have discrete data. For discrete
64
00:07:39,410 --> 00:07:47,370
data we use distributions like the Binomial
or the Poisson and so on. So when we have the
65
00:07:47,370 --> 00:07:56,260
continuous data there are many situations,
we can have one sample that means I know the
66
00:07:56,260 --> 00:08:01,380
population details, I take a small sample
and I am comparing with the population that
67
00:08:01,380 --> 00:08:09,890
is called a one sample. For example, I know
the 12th standard average from a particular
68
00:08:09,890 --> 00:08:19,680
school is 95 %, so I take 10 students in that
school and calculate their 12th standard average.
69
00:08:19,680 --> 00:08:30,370
I may get some x bar, now I want to know whether
this x bar is related to the 95 % school average,
70
00:08:30,370 --> 00:08:36,339
which is more like a population or is it very
far away. So here we are collecting only 1
71
00:08:36,339 --> 00:08:41,610
set of samples and then comparing with the
population that is called a one sample test
72
00:08:41,610 --> 00:08:46,520
or one sample t-test. Here we are comparing
the mean of the sample with the population
73
00:08:46,520 --> 00:08:53,650
mean. Now what is two samples? Suppose I am
comparing the performance of drug A and the performance
74
00:08:53,650 --> 00:08:59,880
of drug B and trying to tell there is no statistically
significant difference between their performance
75
00:08:59,880 --> 00:09:04,980
or there is a significant difference; that means
I am comparing the means of 2 samples, that
76
00:09:04,980 --> 00:09:10,959
is why it is called the two sample t-test.
Now again you can have Multiple samples, I
77
00:09:10,959 --> 00:09:20,250
may be comparing drug A, B, C, D so, I could
be having lots of different means that is
78
00:09:20,250 --> 00:09:26,191
called a Multiple samples actually.
So there are many, many ways by which one
79
00:09:26,191 --> 00:09:32,390
could analyze these data: one is to study
stability using run charts, then we can look
80
00:09:32,390 --> 00:09:38,640
at the shape: we can create a histogram and
see whether it looks normally distributed,
81
00:09:38,640 --> 00:09:43,520
then we can look at the data itself
and so on actually. And then, there is something
82
00:09:43,520 --> 00:09:50,910
called the chi-squared test and then there is
something called the t test; t-tests are generally
83
00:09:50,910 --> 00:09:58,790
meant for comparing means, chi-squared tests
are generally meant for comparing ratios.
84
00:09:58,790 --> 00:10:03,459
And there is something called F test, which
is generally meant for comparing variances
85
00:10:03,459 --> 00:10:08,411
or spreads actually.
In the t test we have 3 types of t test, one
86
00:10:08,411 --> 00:10:14,029
sample t-test that means, I take only 1 sample
and then compare it with the population or
87
00:10:14,029 --> 00:10:19,870
I can have a two sample t-test, where I am
comparing 2 sets of samples or I may be comparing
88
00:10:19,870 --> 00:10:31,070
paired t-test, that means there is a relationship
between the sample items in a and the sample
89
00:10:31,070 --> 00:10:39,880
items in b. So when I am comparing variances
there is something called the F test, or if I am
90
00:10:39,880 --> 00:10:45,550
having multiple samples and I am comparing
variances, then there is something called ANOVA,
91
00:10:45,550 --> 00:10:52,690
the analysis of variance test; and if I am comparing
the spread there is something called homogeneity
92
00:10:52,690 --> 00:10:57,130
of variance. There are different types of
test that are possible and we are going to
93
00:10:57,130 --> 00:11:04,110
spend lot of time on each one of them actually.
So we can have only one sample collected,
94
00:11:04,110 --> 00:11:08,231
comparing it with the population or we could
be having two samples collected, comparing
95
00:11:08,231 --> 00:11:14,040
with the population or we could have multiple
samples collected, comparing with the population.
96
00:11:14,040 --> 00:11:20,350
Then if I am comparing means then there is
something called t test, the one sample t-test,
97
00:11:20,350 --> 00:11:25,260
the two sample t-test, paired t-test. If I
am comparing variances there is something
98
00:11:25,260 --> 00:11:31,610
called the F test; if I am comparing the variances
of a large number of data sets or samples
99
00:11:31,610 --> 00:11:36,050
there is analysis of variance. If I am looking
at the spread of the data I can use something
100
00:11:36,050 --> 00:11:41,360
called Homogeneity of variance and so on.
So large number of tests are possible we will
101
00:11:41,360 --> 00:11:47,459
talk about each one of them in detail and
we are going to work through some examples also,
102
00:11:47,459 --> 00:11:50,570
so do not worry about it.
(Refer Slide Time: 11:50)
103
00:11:50,570 --> 00:11:54,040
So if you are comparing means, there is something
104
00:11:54,040 --> 00:12:03,550
called the t test. We can get the average of
sample one and then I get the mean of sample two and
105
00:12:03,550 --> 00:12:09,100
then I am going to find out whether both the
means come from the same population or each
106
00:12:09,100 --> 00:12:15,399
of the means comes from 2 different populations;
that is called a t-test. t-tests are quite
107
00:12:15,399 --> 00:12:23,960
robust even for non normal data. Generally
we can say the standard deviations have to
108
00:12:23,960 --> 00:12:28,950
be similar, but there can be some difference
in the standard deviation also but still,
109
00:12:28,950 --> 00:12:34,040
the t test is good. In t-tests we have 3 types:
1 sample t-test, 2 sample t-test and paired
110
00:12:34,040 --> 00:12:37,950
t test.
So one sample t-test you are taking a sample
111
00:12:37,950 --> 00:12:44,310
out and then you are getting the mean and
the variance of that sample and you are comparing
112
00:12:44,310 --> 00:12:51,940
it with the mean of the population like I
gave you some examples actually like, I take
113
00:12:51,940 --> 00:13:00,970
10 students from a class and then get their
mean average, class average, marks average,
114
00:13:00,970 --> 00:13:06,750
then I compare it with the school average
and try to tell whether these averages are
115
00:13:06,750 --> 00:13:13,709
far away from the school average or they fall
into the same population, I can do that sort
116
00:13:13,709 --> 00:13:23,540
of study. I can collect IQ of 10 students
in a university and then try to say whether
117
00:13:23,540 --> 00:13:33,329
the mean IQ falls within the university average
IQ or it falls outside that, so that is one
118
00:13:33,329 --> 00:13:39,420
sample t-test; here we are taking only 1 sample.
In a two sample t-test I am going to have 2
119
00:13:39,420 --> 00:13:45,480
sets of samples, I am taking 10 students from
1 university 10 students from another university
120
00:13:45,480 --> 00:13:51,770
and getting their IQs and then
comparing their IQs and trying to say whether
121
00:13:51,770 --> 00:13:57,310
the IQ's are statistically different or there
is no statistically significant difference
122
00:13:57,310 --> 00:14:02,389
between these 2 IQ's. So that is called two
sample t-test because I am using 2 sets of
123
00:14:02,389 --> 00:14:07,910
samples.
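The two sample t-test just described can be sketched as follows, using the pooled-variance form of the test; the IQ values and the critical value 2.101 (two-tailed 95 %, 18 degrees of freedom) are given for illustration, with made-up data.

```python
# A sketch of a two sample t-test comparing the IQs of students from two
# universities. The IQ values are invented for illustration.
import math
import statistics

iq_a = [102, 98, 110, 105, 99, 104, 101, 97, 108, 103]
iq_b = [95, 101, 92, 99, 96, 94, 100, 93, 98, 97]

na, nb = len(iq_a), len(iq_b)
xa, xb = statistics.mean(iq_a), statistics.mean(iq_b)
va, vb = statistics.variance(iq_a), statistics.variance(iq_b)

# pooled variance and the two sample t statistic
sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
t = (xa - xb) / math.sqrt(sp2 * (1 / na + 1 / nb))

T_CRIT = 2.101   # two-tailed 95 % critical t for na + nb - 2 = 18 df
print(round(t, 3), "reject H0" if abs(t) > T_CRIT else "fail to reject H0")
```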
Paired t-test, if you are pairing 2 sets of
124
00:14:07,910 --> 00:14:18,260
data, then the difference in results should be 0.
For example, I take 10 cats and I test drug
125
00:14:18,260 --> 00:14:26,519
A on the 10 cats and look at their outcome
then on the same 10 cats I give drug B and
126
00:14:26,519 --> 00:14:30,790
look at the outcome. So the difference should
be 0 if there is no difference between
127
00:14:30,790 --> 00:14:39,240
the drug A and drug B. If there is a statistically
significant difference away from 0 then I
128
00:14:39,240 --> 00:14:46,550
can say yes, drug A is different from drug
B because, I have used the same volunteer
129
00:14:46,550 --> 00:14:54,420
cats: I am testing drug A and seeing some performance
change, then I am testing drug B and seeing some
130
00:14:54,420 --> 00:15:00,490
performance change. If drug A and drug B are
the same, then the performance change we
131
00:15:00,490 --> 00:15:04,780
observed, the difference in performance change
we observed should be equal to 0 that is called
132
00:15:04,780 --> 00:15:09,829
the paired t-test. So we are going to look
at examples of each one of them. So you
133
00:15:09,829 --> 00:15:15,380
do not worry about that.
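A sketch of that paired t-test idea: the same 10 subjects receive drug A and then drug B, and we test whether the mean of the paired differences is 0. The response numbers are invented for illustration.

```python
# Paired t-test sketch: the differences between the two treatments on the
# same subjects are tested against a mean of 0. Values are illustrative.
import math
import statistics

drug_a = [6.1, 7.0, 8.2, 7.6, 6.8, 7.1, 7.9, 6.4, 7.3, 6.9]
drug_b = [5.2, 7.9, 3.9, 4.7, 5.3, 7.4, 4.2, 6.1, 3.8, 7.3]

diffs = [a - b for a, b in zip(drug_a, drug_b)]
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)
n = len(diffs)

# H0: the mean difference is 0 (the two drugs behave the same)
t = (d_bar - 0.0) / (s_d / math.sqrt(n))
T_CRIT = 2.262   # two-tailed 95 % critical t, 9 degrees of freedom
print(round(t, 3), "reject H0" if abs(t) > T_CRIT else "fail to reject H0")
```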
So, interestingly all these tests are looking
134
00:15:15,380 --> 00:15:23,380
at means, that is, averages: the average or mean
of the samples which you are taking out,
135
00:15:23,380 --> 00:15:28,550
whether it is a one sample t-test or 2 sample.
Now you may ask the question: suppose instead
136
00:15:28,550 --> 00:15:35,200
of 2 samples I have many more samples, what
will I do? Of course I can take 2 sets
137
00:15:35,200 --> 00:15:41,670
of samples 1 at a time and do a two sample
t-test, but there is another approach which
138
00:15:41,670 --> 00:15:48,089
is much faster; that is called ANOVA, analysis
of variance. It is called the 1 way ANOVA;
139
00:15:48,089 --> 00:15:50,899
we will talk about that later in the course.
140
00:15:50,899 --> 00:16:01,699
Now, if you are comparing variances, that means
you are comparing the standard deviations. So
141
00:16:01,699 --> 00:16:07,191
here we are comparing variances, this is generally
valid both for normal and non normal. So the
142
00:16:07,191 --> 00:16:13,290
H naught will be sigma 1 square is equal to
sigma 2 square and so on, we cannot reject
143
00:16:13,290 --> 00:16:18,550
H naught when p is greater than 0.05. The
alternate could be sigma 1 square is different
144
00:16:18,550 --> 00:16:28,130
from sigma 2 square; if p is less than 0.05 we
have to reject H naught and accept H a. So
145
00:16:28,130 --> 00:16:35,811
you see there are tests for comparing
variances. So in the t-test we are comparing
146
00:16:35,811 --> 00:16:39,329
means, here we are comparing variances.
147
00:16:39,329 --> 00:16:47,870
One such test is called the F test; that
means I am comparing the variation of
148
00:16:47,870 --> 00:16:56,089
sample 1 and sample 2. So, the H naught
could be sigma 1 square is equal to sigma
149
00:16:56,089 --> 00:17:00,649
2 square that means variances are same or
the alternate could be sigma 1 square is different
150
00:17:00,649 --> 00:17:06,529
from sigma 2 square. So we calculate the f
ratio which is given by s 1 square by s 2
151
00:17:06,529 --> 00:17:14,819
square; s 1 and s 2 are the sample standard
deviations, so s 1 square is the variance.
152
00:17:14,819 --> 00:17:23,549
There is a table called the F table; for 95
% or 99 % you will get an F value, and if the
153
00:17:23,549 --> 00:17:30,120
table F value is greater than the F you calculate,
then you accept H naught and if the table
154
00:17:30,120 --> 00:17:38,529
F value is less than the F you calculate,
you reject H naught and accept H a. For different
155
00:17:38,529 --> 00:17:44,029
degrees of freedom: the degrees of freedom
for data set 1 is n 1 minus 1, if you have
156
00:17:44,029 --> 00:17:50,419
collected n 1 samples, and the degrees of freedom for
data set 2 is n 2 minus 1, if you have collected
157
00:17:50,419 --> 00:17:53,019
n 2 samples that is called the F test.
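The F test just described can be sketched like this; the two samples are invented, and the critical value 3.18 is the 95 % F table entry for (9, 9) degrees of freedom.

```python
# F test sketch: the ratio of the two sample variances is compared against
# the F table value. Data are illustrative.
import statistics

sample1 = [12.1, 14.3, 11.8, 13.0, 12.7, 15.2, 13.6, 12.4, 14.0, 13.3]
sample2 = [12.9, 13.1, 13.0, 12.8, 13.2, 13.1, 12.9, 13.0, 13.2, 12.8]

s1_sq = statistics.variance(sample1)   # the larger variance goes on top
s2_sq = statistics.variance(sample2)
f_ratio = s1_sq / s2_sq

F_CRIT = 3.18   # F table, 95 %, (9, 9) degrees of freedom
print(round(f_ratio, 2), "reject H0" if f_ratio > F_CRIT else "fail to reject H0")
```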
158
00:17:53,019 --> 00:17:58,279
And then you also have ANOVA, when you are
comparing a large number of data sets in the
159
00:17:58,279 --> 00:18:03,409
previous F test you had only 2 data sets,
1 and 2. So you are saying sigma 1 square is
160
00:18:03,409 --> 00:18:09,680
equal to sigma 2 square, alternative sigma
1 square is not equal to sigma 2 square. So
161
00:18:09,680 --> 00:18:16,149
ANOVA is very, very powerful because we can
collect a large number of data sets I am comparing
162
00:18:16,149 --> 00:18:22,549
the IQ's of university A, university B, university
C, university D and trying to find whether
163
00:18:22,549 --> 00:18:27,649
there is a statistically significant difference
or not. I am comparing drug A, B, C, D in
164
00:18:27,649 --> 00:18:34,570
clinical trials, I want to perform analysis
to find out whether there is a statistically
165
00:18:34,570 --> 00:18:41,529
significant difference, then I use ANOVA here.
The variances of the samples are approximately
166
00:18:41,529 --> 00:18:46,129
equal and the responses within any given level
are normally distributed; these are the assumptions.
167
00:18:46,129 --> 00:18:55,950
So H naught will be: all these are the same, whereas
H a is: at least 1 is different; then
168
00:18:55,950 --> 00:19:01,270
you get a p value less than 0.05 ok.
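A sketch of the one way ANOVA computation for comparing several groups at once (say IQs from three universities); the data are illustrative, and 3.89 is the 95 % F table value for (2, 12) degrees of freedom.

```python
# One way ANOVA sketch: F = (between-group mean square) / (within-group
# mean square). Group values are invented for illustration.
import statistics

groups = [
    [101, 99, 104, 102, 98],    # university A
    [96, 94, 99, 95, 97],       # university B
    [103, 107, 104, 106, 105],  # university C
]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = statistics.mean([x for g in groups for x in g])

ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

ms_between = ss_between / (k - 1)        # df = k - 1 = 2
ms_within = ss_within / (n_total - k)    # df = n - k = 12
f = ms_between / ms_within

F_CRIT = 3.89   # F table, 95 %, (2, 12) degrees of freedom
print(round(f, 2), "reject H0" if f > F_CRIT else "fail to reject H0")
```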
169
00:19:01,270 --> 00:19:08,570
We have the t test, different types of t tests
which are very powerful for comparing means,
170
00:19:08,570 --> 00:19:14,759
then we have the test for comparing variances
like F test and ANOVA and then for ratios
171
00:19:14,759 --> 00:19:20,879
we have something called the chi-squared test
which we will talk about later. Then there
172
00:19:20,879 --> 00:19:26,429
is something called the power of the test:
if H 1 is true, so that the distribution
173
00:19:26,429 --> 00:19:34,150
of X is specified by H 1, then the probability
of rejecting H0 is the power of the test
174
00:19:34,150 --> 00:19:45,299
for that distribution. When H 1 is true, you have
2 situations: you do not reject H0, that is the
175
00:19:45,299 --> 00:19:54,059
beta error, or you reject H0. If H1 is
true and you reject H0, that is called the
176
00:19:54,059 --> 00:20:07,159
power of the test for that distribution. So
you have 2 types of error, alpha error where
177
00:20:07,159 --> 00:20:17,809
you should not reject H0 but you end up rejecting
H0. Whereas for the beta error, you have to reject
178
00:20:17,809 --> 00:20:28,590
H0 but you do not reject H0; that is the beta
error. And 1 minus the beta error
179
00:20:28,590 --> 00:20:34,330
is called the power of the test. So these
2 terms are very, very important when you
180
00:20:34,330 --> 00:20:39,019
are deciding on the alpha error and the beta
error.
181
00:20:39,019 --> 00:20:47,229
Let us get into problems, the first problem
is called the 1 sample t-test one-sided. The
182
00:20:47,229 --> 00:20:53,239
average size of barnacle shells is 25 mm;
you know what a barnacle is, right? It is a marine
183
00:20:53,239 --> 00:21:02,749
organism; it has got a shell, it gets attached
using glue to hard surfaces and then it feeds
184
00:21:02,749 --> 00:21:10,380
on it. So it has got a shell and the average size
is 25 mm. Now we have collected 10 barnacles
185
00:21:10,380 --> 00:21:16,960
in South India and we got their sizes, these
are the sizes. Now are the South Indian barnacles
186
00:21:16,960 --> 00:21:22,909
of smaller size? That is the question; or are they
of greater size? So we take 95 % confidence.
187
00:21:22,909 --> 00:21:32,110
So for the South Indian barnacles, the
average comes out to be 24.4 but the population
188
00:21:32,110 --> 00:21:40,979
average is 25 mm, this is the statement. Now
I want to know whether this 24.4 is statistically
189
00:21:40,979 --> 00:21:48,850
significantly smaller or it comes from the
same population at a p value of 0.05 or a
190
00:21:48,850 --> 00:21:52,499
95 % confidence. So how do we do that?
191
00:21:52,499 --> 00:22:01,749
Simple: H0 is mu equal to mu naught, where
mu naught is your original mean of
192
00:22:01,749 --> 00:22:08,090
the population; they are the same. But the alternate
is mu less than mu naught; that means you
193
00:22:08,090 --> 00:22:14,679
want to know whether the average size of the
barnacle shell from South India is less than
194
00:22:14,679 --> 00:22:21,419
this 25. So what do you do, you calculate
t, if you remember this equation we had this
195
00:22:21,419 --> 00:22:32,960
mu is equal to x bar plus or minus t into
s by square root of n. We rearrange that to
196
00:22:32,960 --> 00:22:39,899
get this t value: x bar minus mu naught divided
197
00:22:39,899 --> 00:22:47,859
by s by square root of n, with degrees of
freedom n minus 1. If the t which we calculate
198
00:22:47,859 --> 00:22:54,870
is less than the table t (there is a table of
t for different degrees of freedom), accept
199
00:22:54,870 --> 00:23:01,639
H naught; if the calculated t is greater than the
table t, then reject H naught. Of course, we are using a 1 tail test.
200
00:23:01,639 --> 00:23:09,399
So, what do we do? We get the average which
is 24.4; we get the standard deviation which
201
00:23:09,399 --> 00:23:21,309
is 2.59. So the t calculated we can use here
24.4 minus 25 divided by s divided by square
202
00:23:21,309 --> 00:23:28,009
root of n. So we get the t calculated as
minus 0.73, for 9 degrees of freedom;
203
00:23:28,009 --> 00:23:29,940
there is a t table here.
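The hand calculation can be checked with a short script from the stated summary numbers (mean 24.4 mm, standard deviation 2.59, n = 10, population mean 25 mm):

```python
# One sample t statistic from the summary statistics quoted in the lecture.
import math

x_bar, mu0, s, n = 24.4, 25.0, 2.59, 10
t = (x_bar - mu0) / (s / math.sqrt(n))
print(round(t, 3))   # about -0.733, matching the lecture's -0.73

T_CRIT = 1.833       # one-tailed 95 % critical t for 9 degrees of freedom
print("reject H0" if t < -T_CRIT else "fail to reject H0")
```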
204
00:23:29,940 --> 00:23:34,929
I want you to look here for 95 %; this is a
two-tail test and the top 1 is the single
205
00:23:34,929 --> 00:23:39,390
tail test. In our problem we are talking
about a single tail test because we want to
206
00:23:39,390 --> 00:23:44,729
know whether these South Indian barnacles are
of smaller size. So for 9 degrees of freedom
207
00:23:44,729 --> 00:23:51,909
go like this, you go like this and read out
0.05, here you get for 9 degrees of freedom
208
00:23:51,909 --> 00:24:02,219
1.833. So t table is 1.833, the t calculated
is minus 0.737. So t calculated is less than
209
00:24:02,219 --> 00:24:08,500
the t table, so there is no reason for you
to reject H naught at this condition. So what
210
00:24:08,500 --> 00:24:16,799
you can say is: for the South Indian barnacles,
there is no statistical reason for saying
211
00:24:16,799 --> 00:24:20,860
the South Indian barnacles are of smaller
size. So South Indian barnacles come from
212
00:24:20,860 --> 00:24:29,869
the same population of 25 mm. In order to
get your confidence limit on the mean which
213
00:24:29,869 --> 00:24:35,509
we calculated from the sample, as you know
this equation mu is equal to x bar plus or
214
00:24:35,509 --> 00:24:41,799
minus t(df) times s by square root of n; the s by
square root of n is called the standard error,
215
00:24:41,799 --> 00:24:48,529
right? I talked about this a long time back.
Now you need to know the t value for those degrees of freedom;
216
00:24:48,529 --> 00:24:56,250
for these you have 10 data points, so obviously
the degrees of freedom is 9; you get t as 2.26
217
00:24:56,250 --> 00:25:03,719
and the mean of this sample is 24.4. So 24.4
plus or minus 2.26 times the standard deviation, which
218
00:25:03,719 --> 00:25:13,759
is 2.59 divided by square root of 10, which
is equal to 24.4 plus or minus 1.85. So the
219
00:25:13,759 --> 00:25:23,830
confidence limit, the 95 % confidence limit for
this mean, is 22.55 to 26.25. One important
220
00:25:23,830 --> 00:25:29,830
point you need to remember is that the t which
you used is the two-tailed value, both
221
00:25:29,830 --> 00:25:35,549
the sides, because we are talking about plus
or minus; that is why we get 2.26. We have 24.4 mm as
222
00:25:35,549 --> 00:25:44,210
the mean of the sample with the standard deviation
of 2.59 and you want to know whether the mu
223
00:25:44,210 --> 00:25:51,979
which you calculate is equal to mu 0 or the
alternate mu is less than mu 0, so for mu
224
00:25:51,979 --> 00:26:00,019
is equal to mu 0. So what do you do, we calculate
the t you know t from this equation if you
225
00:26:00,019 --> 00:26:12,820
rearrange it: x bar minus mu divided by s by square
root of n. So you get minus 0.7377 for 9 degrees
226
00:26:12,820 --> 00:26:20,470
of freedom. From the table I showed
you, the t table, the top one is for single tailed,
227
00:26:20,470 --> 00:26:26,909
the bottom one is for two-tailed; so for p
of 0.05 and 9 degrees of freedom you get
228
00:26:26,909 --> 00:26:32,909
1.833. So obviously, the t which you have
calculated is much less so there is no reason
229
00:26:32,909 --> 00:26:41,659
for you to reject the null hypothesis.
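The plus-or-minus arithmetic above can be reproduced in a couple of lines; 2.262 is the two-tailed 95 % t for 9 degrees of freedom.

```python
# 95 % confidence interval on the sample mean:
# mu = x_bar +/- t(df) * s / sqrt(n)
import math

x_bar, s, n = 24.4, 2.59, 10
t_two_tailed = 2.262                       # 95 % two-tailed, 9 df
margin = t_two_tailed * s / math.sqrt(n)   # the "plus or minus 1.85"
print(round(x_bar - margin, 2), round(x_bar + margin, 2))
# the interval contains 25, so H0 cannot be rejected
```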
For 95 % confidence we say 24.4 plus or minus
230
00:26:41,659 --> 00:26:54,450
1.96 times s, which is given here as 2.59, divided
by square root of n. So you get the confidence
231
00:26:54,450 --> 00:27:00,609
limit for the mean as 22.82 to 26; so obviously
your 25 falls within that, and that is why you
232
00:27:00,609 --> 00:27:06,330
are not able to reject the null hypothesis
at this condition. So this table is very important
233
00:27:06,330 --> 00:27:14,039
as we can see this table gives you for different
degrees of freedom the top 1 for single tail,
234
00:27:14,039 --> 00:27:20,840
the bottom 1 is for two-tail. So if I am interested
in a single tail at 95 % I will use this column,
235
00:27:20,840 --> 00:27:25,409
if I am interested in two-tail 95 % I use
this column. As you go down, down, down as
236
00:27:25,409 --> 00:27:33,419
you can see for infinite degrees of freedom
we get 1.96. So, as I said two-tailed 0.05
237
00:27:33,419 --> 00:27:41,270
means a single tailed 0.025 because when you
say 95 % two-tail the tails are divided half
238
00:27:41,270 --> 00:27:46,739
on both the sides that is why you get 0.05
by 2 which is 0.025 here. Do you understand?
239
00:27:46,739 --> 00:27:51,719
Do you understand the logic of this particular
table? This table is very important when you
240
00:27:51,719 --> 00:28:00,129
are calculating t test, when you are calculating
t based on means, whether it is 1 sample t-test
241
00:28:00,129 --> 00:28:04,279
the example which we saw or later on we are
going to look at 2 sample t-test, paired t
242
00:28:04,279 --> 00:28:14,359
tests and so on. So this table is very important.
You calculate t from the equation
243
00:28:14,359 --> 00:28:21,739
and then you compare with the t in the table
for the right degrees of freedom, and then
244
00:28:21,739 --> 00:28:28,820
you say whether the t calculated is greater
than the t table; if it is greater then
245
00:28:28,820 --> 00:28:34,059
you can reject the null hypothesis, but if
the t we calculated is less than the t table
246
00:28:34,059 --> 00:28:38,029
we cannot reject the null hypothesis.
247
00:28:38,029 --> 00:28:46,869
Now we can also use the particular software
which I mentioned, the GraphPad software,
248
00:28:46,869 --> 00:28:55,880
which can also calculate the t value, given
the probability value you can see for a 1
249
00:28:55,880 --> 00:29:02,690
tail it is 1.833. So you compare with the
t which you calculated, which is minus 0.737.
250
00:29:02,690 --> 00:29:12,159
So there is no reason for you to reject the
null hypothesis. And if I take the t value,
251
00:29:12,159 --> 00:29:18,730
minus 0.737, the same software can
be used to calculate the p value; the p comes
252
00:29:18,730 --> 00:29:27,659
out to be 0.4799. So obviously, you can say
it is not statistically significant at all.
253
00:29:27,659 --> 00:29:32,630
It should have been 0.05 or less; only then
can we call it a statistically significant
254
00:29:32,630 --> 00:29:38,029
difference. So it is very useful for calculating
this.
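For readers without GraphPad, the two-tailed p value can also be computed by numerically integrating the t density; this is a from-scratch sketch using only the standard library, not a library call.

```python
# Two-tailed p value for t = 0.7377 with 9 degrees of freedom, by Simpson's
# rule on the t density.
import math

def t_pdf(x, df):
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=20000):
    # integrate the density over [0, |t|], double it for symmetry,
    # and the p value is whatever probability is left in the two tails
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    area = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    central = 2 * area * h / 3
    return 1 - central

print(two_tailed_p(0.7377, 9))   # about 0.48, close to the software's value
```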
255
00:29:38,029 --> 00:29:48,229
Or we can also use 1 sample t-test results,
again the GraphPad software can do this and
256
00:29:48,229 --> 00:29:53,850
again, as you can see, the results give
you a p value and say the difference is considered
257
00:29:53,850 --> 00:29:59,889
to be not statistically significant. The actual
mean of the sample is 24.4, the hypothetical
258
00:29:59,889 --> 00:30:09,520
mean you are comparing with is 25.0 and so on.
So we will see how to do this, it is quite
259
00:30:09,520 --> 00:30:19,789
simple. We have this data set; I will show
you how to do this.
260
00:30:19,789 --> 00:30:36,750
So here we have the continuous data; here we
need to use this particular thing, and here you
261
00:30:36,750 --> 00:30:46,190
have the one sample t-test; as you can see
here, one sample t-test, and you can say continue.
262
00:30:46,190 --> 00:30:53,039
So we can even enter data like this or we
can copy paste like this. So we can enter
263
00:30:53,039 --> 00:31:03,830
the data like that also. So I am comparing
it with respect to 25, 25 is my population.
264
00:31:03,830 --> 00:31:17,239
Now I want to know the sample which is equal
to 22 and so on. So I can copy this, copy
265
00:31:17,239 --> 00:31:32,529
this and I go there and paste it here,
or if it does not get pasted, obviously
266
00:31:32,529 --> 00:32:21,119
we can type the values in: we write 22, then 23, then
267
00:32:21,119 --> 00:33:00,529
22, 25, 28, 25, 28, 27, then again we put
28, then again we put 25, then we put 23.
268
00:33:00,529 --> 00:33:09,080
So we have 10 data points; we put them
here and then we say calculate now.
269
00:33:09,080 --> 00:33:14,849
So here we put 25, the population average which
we are interested in, so we can say calculate
270
00:33:14,849 --> 00:33:26,139
now. So by conventional criteria the difference is
considered to be not statistically significant,
271
00:33:26,139 --> 00:33:33,320
because the p value is coming out to be 0.4826,
and because the t
272
00:33:33,320 --> 00:33:42,139
value is 0.732 whereas you need 1.833. So
obviously it is not a statistically significant
273
00:33:42,139 --> 00:33:49,269
difference. So you can use the GraphPad software
to calculate this: we can put in the data
274
00:33:49,269 --> 00:33:54,249
and you can use the GraphPad software also
to perform this type of calculation.
275
00:33:54,249 --> 00:34:03,379
It is quite a useful software and this problem
is quite simple. So you have the population
276
00:34:03,379 --> 00:34:10,690
mean and you have the sample. So from the
sample you calculate the mean, from this sample
277
00:34:10,690 --> 00:34:14,700
you calculate the standard deviation and then
you calculate the standard error which is
278
00:34:14,700 --> 00:34:21,260
given by s by square root of n, and then
you can get the t value; and then
279
00:34:21,260 --> 00:34:29,619
for 9 degrees of freedom you make use of this
particular table. And for a single tailed
280
00:34:29,619 --> 00:34:37,010
test you use this, you go like this for 9
degrees of freedom you get 1.833, whereas
281
00:34:37,010 --> 00:34:45,419
t calculated is 0.7377. So obviously, we have
no reason for rejecting the null hypothesis
282
00:34:45,419 --> 00:34:50,639
that is what it is. So I showed you how to
calculate from the GraphPad software also
283
00:34:50,639 --> 00:34:55,269
actually. We will continue more on this one
sample t-test as we go along.
284
00:34:55,269 --> 00:34:59,939
Thank you very much for your time.