1
00:00:12,650 --> 00:00:16,770
We will continue with the course on Biostatistics
and Design of Experiments. We have been talking
2
00:00:16,770 --> 00:00:22,150
about two-sample t-tests in the previous class.
Basically, a two-sample t-test means we have
3
00:00:22,150 --> 00:00:27,730
two sets of samples; for example, you are comparing
drug A with drug B, or you are comparing two
4
00:00:27,730 --> 00:00:33,110
different plants for their yields. We take
samples and then we want to find out whether
5
00:00:33,110 --> 00:00:38,691
those two samples come from the same population
or from different populations. Let
6
00:00:38,691 --> 00:00:40,730
us look at some more problems on two-sample
t-tests.
7
00:00:40,730 --> 00:00:48,110
A sleep-inducing drug was given to 30 subjects,
8
00:00:48,110 --> 00:00:54,620
so 30 subjects here, and at the same time a
placebo was given to 33 subjects, as we have it
9
00:00:54,620 --> 00:01:06,510
here, and the time taken for them
to fall asleep was monitored. For example,
10
00:01:06,510 --> 00:01:12,030
if you look here, we are given 15 minutes,
25 minutes, 30 minutes. That means it took
11
00:01:12,030 --> 00:01:17,490
15 minutes to fall asleep for 2 volunteers
who took the placebo
12
00:01:17,490 --> 00:01:23,180
and 0 volunteers who took the drug;
4 volunteers who took the drug fell
13
00:01:23,180 --> 00:01:30,710
asleep in 25 minutes, and 3 volunteers who took
the placebo fell asleep in 25 minutes. So that
14
00:01:30,710 --> 00:01:35,240
is what these numbers are. These
numbers are the number of people who took so many
15
00:01:35,240 --> 00:01:41,020
minutes to sleep. Now the question is: does
the drug lead to quicker onset of sleep?
16
00:01:41,020 --> 00:01:48,390
That means I want to know whether the average
time taken to sleep by the drug group is
17
00:01:48,390 --> 00:01:57,400
less than the average time taken by the placebo
group at the 95 % confidence level. It is quite
18
00:01:57,400 --> 00:02:04,280
simple: we have 30 volunteers who took
the drug and 33 volunteers who took
19
00:02:04,280 --> 00:02:13,110
the placebo, and these are the times, so we can
use the equation which I talked about in the
20
00:02:13,110 --> 00:02:14,760
previous class.
21
00:02:14,760 --> 00:02:18,319
We need to calculate t. t is equal to X H bar
22
00:02:18,319 --> 00:02:26,590
minus X L bar, where X H bar is the mean
for one group and X L bar is the mean for the other
23
00:02:26,590 --> 00:02:33,420
group, divided by S p times the square root of 1 by
n H plus 1 by n L, where n H is the number of samples for
24
00:02:33,420 --> 00:02:41,250
one group and n L is the number of samples for the other
group. S p squared is given by n H minus
25
00:02:41,250 --> 00:02:46,999
1 times S H squared, plus n L minus 1 times S L squared,
divided by n H plus n L minus 2. S H is the standard deviation
26
00:02:46,999 --> 00:02:53,260
for the first group and S L is the standard deviation
for the second group. This t is called the t calculated. If the t calculated is less
27
00:02:53,260 --> 00:02:58,870
than the table t, accept H0; if the t calculated is
greater than the table t, then reject H0. The
28
00:02:58,870 --> 00:03:03,860
degrees of freedom is equal to n H plus
n L minus 2, so in this particular
29
00:03:03,860 --> 00:03:09,510
problem it will be 63 minus 2; 61 will be
the degrees of freedom. What is the hypothesis?
30
00:03:09,510 --> 00:03:17,659
The null hypothesis H naught will be mu a
is equal to mu b; that is the null hypothesis,
31
00:03:17,659 --> 00:03:22,430
where a could be the drug and
b could be your placebo.
32
00:03:22,430 --> 00:03:29,110
Now the alternate hypothesis could be mu a
less than mu b; b could be your placebo, a
33
00:03:29,110 --> 00:03:36,609
could be your drug. So it is a two-sample,
one-tail test, because we are talking about the alternative
34
00:03:36,609 --> 00:03:44,550
µ a less than µ b. So it is a one-tail or
single-tail test at the 95 % confidence level, so
35
00:03:44,550 --> 00:03:52,519
we have to use this equation to calculate
t and then compare it with the table t. If
36
00:03:52,519 --> 00:03:57,439
this t is less than the table t, we accept
H naught at 95 % confidence, that is, p is
37
00:03:57,439 --> 00:04:08,069
equal to 0.05, one-tail test; if the t calculated
here is greater than the table t, then you reject
38
00:04:08,069 --> 00:04:17,090
H naught. We can use Excel-like software
to calculate all these parameters: the mean,
39
00:04:17,090 --> 00:04:22,990
the standard deviation of the individual samples,
all these things. That is what we are going
40
00:04:22,990 --> 00:04:23,990
to do.
41
00:04:23,990 --> 00:04:28,650
Again I am showing you the problem
here: time taken to sleep in minutes, the number
42
00:04:28,650 --> 00:04:34,880
of subjects who took the drug, and the number of
subjects who took the placebo. I need to calculate
43
00:04:34,880 --> 00:04:40,740
the average of each set. How do I do it? I
will do 15 into 0 for the drug, 15 into 2 for the
44
00:04:40,740 --> 00:04:47,960
placebo, 25 into 4 here, 25 into 3 here,
and like that I go down, down, down, and
45
00:04:47,960 --> 00:04:53,800
then I do the summation here, and then
here I divide by 30 because there are 30 people
46
00:04:53,800 --> 00:05:00,270
who took the drug, and here I divide by 33.
So it took 40.5 minutes on an average for
47
00:05:00,270 --> 00:05:09,610
the subjects who took the drug to get to sleep,
whereas it took 43 minutes for subjects who
48
00:05:09,610 --> 00:05:17,240
took the placebo.
Now the question is: 43 and 40.5 appear to be
49
00:05:17,240 --> 00:05:25,699
different, but is 40.5 statistically less
than 43 at the 95 % confidence level? Now we
50
00:05:25,699 --> 00:05:30,879
need to calculate the standard deviation of
each set, then we need to calculate this n
51
00:05:30,879 --> 00:05:37,949
H minus 1, which here would be 30
minus 1; n L minus 1 would be 33 minus 1. Then
52
00:05:37,949 --> 00:05:43,699
the standard deviation squared, that is the variance;
take the square root, substitute here, and then
53
00:05:43,699 --> 00:05:49,130
calculate the t calculated, and that is what
I am doing in this part of the sheet. So,
54
00:05:49,130 --> 00:05:59,629
what do I do? I take the average, and my
average is 40.5. Then I will subtract the individual
55
00:05:59,629 --> 00:06:08,860
value from it, then square
it, and then multiply by the number of candidates.
56
00:06:08,860 --> 00:06:17,169
Here it will be 40.5 minus 15, squared, multiplied
by 0, whereas for the placebo
57
00:06:17,169 --> 00:06:25,020
I will do 43.03 minus 15, squared, multiplied
by 2. Like that I keep on doing down, down,
58
00:06:25,020 --> 00:06:36,069
down, down, then I will do the summation, and then
I will do the overall summation.
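The spreadsheet steps above (multiply each time by its count, sum, divide by the group size, then sum the count-weighted squared deviations from the mean) can be sketched in Python. This is only an illustration: the function name `grouped_stats` is made up here, and the small table is hypothetical, not the lecture's data, which is only partly read out.

```python
def grouped_stats(times, counts):
    """Return (n, mean, sum of squared deviations) for a frequency table.

    times  : the distinct values (e.g. minutes to fall asleep)
    counts : how many subjects recorded each value
    """
    n = sum(counts)
    # Weighted mean: sum(time * count) / total count
    mean = sum(t * c for t, c in zip(times, counts)) / n
    # Sum of (mean - time)^2 * count, as done column by column in the sheet
    ss = sum(c * (mean - t) ** 2 for t, c in zip(times, counts))
    return n, mean, ss


# Hypothetical frequency table (NOT the data from the lecture)
times = [10, 20, 30]
counts = [1, 2, 1]

n, mean, ss = grouped_stats(times, counts)
variance = ss / (n - 1)  # sample variance, i.e. standard deviation squared
print(n, mean, ss, variance)
```

Running the same function once per group gives the two means and the two sums of squares that go into the pooled variance.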
59
00:06:36,069 --> 00:06:43,780
Then I can calculate: 251 is what I am
getting here,
60
00:06:43,780 --> 00:06:51,240
and that is multiplied by 1 by 30 plus
1 by 33, that is this term here, and then I
61
00:06:51,240 --> 00:06:59,360
am taking the square root, as you can see
here. Then the difference
62
00:06:59,360 --> 00:07:05,310
is 2.5, because you have X H minus X L, so
the difference is 2.5 and the denominator
63
00:07:05,310 --> 00:07:14,819
is 3.99, so t comes out to be 0.633. Now is
this t less than the table t
64
00:07:14,819 --> 00:07:22,650
or greater than the table t for 61 degrees
of freedom for p 0.05, one-tail test? Now I
65
00:07:22,650 --> 00:07:38,090
go to the table: it is a one-tailed test, 0.05, I go
down to 61, so it is 1.645. So obviously, this
66
00:07:38,090 --> 00:07:42,740
t calculated is much less than the table t,
so we cannot reject the null hypothesis. That
67
00:07:42,740 --> 00:07:47,099
means there is no statistically significant
difference at the 95 % confidence level between
68
00:07:47,099 --> 00:07:53,969
the placebo and the drug.
Although it looks like there is a decrease
69
00:07:53,969 --> 00:07:59,689
of 2.5 minutes on an average for people
who have taken the drug compared to placebo
70
00:07:59,689 --> 00:08:05,860
to go to sleep, when you do this analysis
we find that there is no statistically significant
71
00:08:05,860 --> 00:08:11,389
difference at 95 % confidence; there is no
reason for you to reject the null hypothesis.
72
00:08:11,389 --> 00:08:17,310
So you understand this problem. Make use
of this equation: the equation is X H bar minus
73
00:08:17,310 --> 00:08:22,879
X L bar, that means the mean of sample 1 minus the mean
of sample 2; then n H
74
00:08:22,879 --> 00:08:29,610
is the number of items in sample 1, n L is
the number of items in sample 2, and S
75
00:08:29,610 --> 00:08:35,909
p is the pooled standard deviation which you calculate
using this formula, where these are the individual
76
00:08:35,909 --> 00:08:42,090
variances for sample 1 and sample 2.
The degrees of freedom here will be n H plus
77
00:08:42,090 --> 00:08:48,980
n L minus 2. Then you calculate t from this,
which is called t calculated. Then we use the
78
00:08:48,980 --> 00:08:55,460
table t, and if the t calculated is less
than the table t, there is no reason for you
79
00:08:55,460 --> 00:08:59,630
to reject the null hypothesis, which means you
accept the null hypothesis. Only if the t calculated
80
00:08:59,630 --> 00:09:05,110
is greater than the table t do we reject the
null hypothesis and accept the alternate
81
00:09:05,110 --> 00:09:08,910
hypothesis.
In this particular case we want to know whether
82
00:09:08,910 --> 00:09:17,770
the drug reduces the time to onset of sleep. So, your
hypotheses are: the null hypothesis is µa equal
83
00:09:17,770 --> 00:09:23,200
to µb, if I call a my drug and b
my placebo, and the alternate hypothesis is
84
00:09:23,200 --> 00:09:32,440
µa is less than µb. How do you calculate
each of these? It is quite simple; let me first
85
00:09:32,440 --> 00:09:35,100
talk about the averages.
How do you calculate averages? What do
86
00:09:35,100 --> 00:09:43,510
you do? 15 into 0, 25 into 4, like that you
keep on doing it, then you sum it up and then
87
00:09:43,510 --> 00:09:49,770
divide by 30; that will give you the average
time taken for the people who took the drug.
88
00:09:49,770 --> 00:09:55,900
For the placebo you do 15 into 2, 25 into
3, 30 into 3 and so on; again you add it up
89
00:09:55,900 --> 00:10:03,040
and then divide by 33, and you get the average
as 43.03. So you can call this X H bar and this
90
00:10:03,040 --> 00:10:12,011
X L bar; n H will be 30, n L will be 33. Now
how do you calculate the variances here? What
91
00:10:12,011 --> 00:10:19,590
do you do? If you want to look at the drug
part of it, we say 40.5 minus 15, squared,
92
00:10:19,590 --> 00:10:27,060
multiplied by the number here, in this case
0, then 40.5 minus 25, squared, multiplied by
93
00:10:27,060 --> 00:10:33,100
4 people, 40.5 minus 30, squared, multiplied
by 3, that is what you are writing here, then
94
00:10:33,100 --> 00:10:37,820
here 40.5 minus 35, squared, into 6, that is what
you are writing.
95
00:10:37,820 --> 00:10:44,650
So like that you keep on doing for the drug.
And for the placebo what do you do? You take
96
00:10:44,650 --> 00:10:54,530
43.03 minus 15, squared, multiplied by 2, that
is what comes here, 43.03 minus 25, squared,
97
00:10:54,530 --> 00:11:01,300
multiplied by 3, that is what comes here, and
so on. And then you add up, you end up like
98
00:11:01,300 --> 00:11:10,770
this, and then you can calculate this term,
because you have got this, and then finally we end
99
00:11:10,770 --> 00:11:20,270
up with 251 multiplied by 1 by 30 plus
1 by 33, that is nothing but this term, because
100
00:11:20,270 --> 00:11:27,900
30 is the number of samples for the drug and
33 is the number of samples for the placebo. Then you are
101
00:11:27,900 --> 00:11:33,880
taking the square root; that comes to 3.99,
so that covers the entire denominator
102
00:11:33,880 --> 00:11:40,480
part of it. The numerator is 2.5, that
is the difference in the averages, so you
103
00:11:40,480 --> 00:11:49,440
divide 2.5 by 3.99 and you get 0.633. So that
is the table t, sorry,
104
00:11:49,440 --> 00:11:55,290
that will be your calculated t, and the table t
I have written as 1.96, but actually it should not
105
00:11:55,290 --> 00:11:57,700
be.
106
00:11:57,700 --> 00:12:06,310
If I take 95 %, or p as 0.05, and I
take a one-tail test, I should be reading here,
107
00:12:06,310 --> 00:12:15,450
so it should be 1.645, not 1.96; this should
read as 1.645. Whether it is 1.645 or 1.96,
108
00:12:15,450 --> 00:12:22,190
obviously the table t is much higher than
your calculated t, so obviously there is no
109
00:12:22,190 --> 00:12:27,860
reason for you to reject the null hypothesis
at the 95 % confidence level. So, that means
110
00:12:27,860 --> 00:12:36,720
we cannot say that the drug reduces the time to onset
of sleep when compared to the placebo. So, you
111
00:12:36,720 --> 00:12:42,160
understand how to do this problem. Very interesting;
we come across these types of problems quite
112
00:12:42,160 --> 00:12:47,000
a lot when you are performing clinical trials.
It could be a placebo, or it could be drug B, that
113
00:12:47,000 --> 00:12:54,220
means another drug, so you may be comparing
2 drugs in the market and so on.
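The arithmetic of the sleep problem can be checked with a few lines of Python. This is a sketch using only the numbers quoted in the lecture; it assumes that the "251" read off the sheet is the pooled variance (S p squared), and the variable names are illustrative.

```python
import math

# Values quoted in the lecture
mean_drug, n_drug = 40.5, 30
mean_placebo, n_placebo = 43.03, 33
pooled_variance = 251.0  # assumption: the "251" on the sheet is S_p squared
t_table = 1.645          # one-tail, p = 0.05, as read from the table in the lecture

# Denominator: square root of S_p^2 * (1/n_H + 1/n_L)
std_error = math.sqrt(pooled_variance * (1 / n_drug + 1 / n_placebo))

# Numerator: difference in the averages (placebo minus drug, so t is positive)
t_calc = (mean_placebo - mean_drug) / std_error

df = n_drug + n_placebo - 2
reject_h0 = t_calc > t_table

print(f"df = {df}, denominator = {std_error:.3f}, t = {t_calc:.3f}")
print("reject H0" if reject_h0 else "no reason to reject H0")
```

The calculated t stays well below the table value, so the comparison lands on the same side as in the lecture.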
114
00:12:54,220 --> 00:13:03,320
This table is very important. This top portion
is related to one tail, the bottom portion
115
00:13:03,320 --> 00:13:08,170
is related to two tail, so when we say one tail
0.05 you have to read this column; if you
116
00:13:08,170 --> 00:13:14,990
say two tail 0.05 you have to read this column.
Please remember that. So, what is the relationship
117
00:13:14,990 --> 00:13:22,640
between two tail and one tail? For two tail the total
is 0.05, so each side will be 0.025; that
118
00:13:22,640 --> 00:13:25,360
is the relationship here.
119
00:13:25,360 --> 00:13:28,110
Or we can use another table; this gives
120
00:13:28,110 --> 00:13:37,270
you the outside portion. Here again you can
see 0.025 relates to one tail. So, for two
121
00:13:37,270 --> 00:13:42,820
tail it is 0.05, so 1.96 is the t value.
122
00:13:42,820 --> 00:13:48,890
Let us look at another problem, another interesting
problem. Artificial valves were placed in rats
123
00:13:48,890 --> 00:13:53,750
and mice for 30 days; they were removed and
tested for wear. This is very common when
124
00:13:53,750 --> 00:14:00,690
you are doing biomaterial design: when you
are creating a new material you keep it in
125
00:14:00,690 --> 00:14:06,120
animal models and then see the mechanical
strength, changes in mechanical strength, which
126
00:14:06,120 --> 00:14:13,430
could be tensile, compression, wear, tear
and so on. Artificial valves were
127
00:14:13,430 --> 00:14:20,510
placed in rats and mice for 30 days and the
wear after 30 days was measured, with the
128
00:14:20,510 --> 00:14:24,460
standard deviation, so it is all given. It
is not raw data; somebody has already done
129
00:14:24,460 --> 00:14:28,400
the calculation and given it. Can it be said
that the wear is less when the valves are
130
00:14:28,400 --> 00:14:36,240
placed in rats? This is again a one-tail t-test,
because we are trying to see whether wear is less
131
00:14:36,240 --> 00:14:42,830
when the valves are placed in rats, so rats
will be less than the mice. So obviously,
132
00:14:42,830 --> 00:14:49,930
it is a one-tail t-test; we can look at 95
or 99 depending upon the importance. The null
133
00:14:49,930 --> 00:14:58,450
hypothesis will be µ rats is equal to µ
mouse; the alternate hypothesis will be µ
134
00:14:58,450 --> 00:15:07,780
rats less than µ mouse, at, say,
99 or 95 or whatever you choose. This is what I
135
00:15:07,780 --> 00:15:08,780
have said.
136
00:15:08,780 --> 00:15:15,740
H0: µ rat is equal to µ mouse; H1: µ rat is
less than µ mouse. So, we are talking about a one-
137
00:15:15,740 --> 00:15:22,100
tail test. The degrees of freedom
is 16, 10 plus 8 minus 2, as I said, if you
138
00:15:22,100 --> 00:15:28,160
remember. So, 16 degrees of freedom, one-tail
test, p is equal to 0.05: you go here, 16 degrees
139
00:15:28,160 --> 00:15:39,600
of freedom, and you can read out 1.746. Whatever
calculation we do, if the t value we get through
140
00:15:39,600 --> 00:15:44,750
calculation is greater than 1.746, then we
can say we reject the null hypothesis; if
141
00:15:44,750 --> 00:15:51,360
the t value we calculate is less than 1.746,
then there is no reason for you to reject
142
00:15:51,360 --> 00:15:58,670
the null hypothesis at the 95 % confidence level.
So, we use the same equation, if you remember:
143
00:15:58,670 --> 00:16:05,330
X H bar minus X L bar divided by S p times the square
root of 1 by n H plus 1 by n L, where S p is
144
00:16:05,330 --> 00:16:09,560
given by this.
145
00:16:09,560 --> 00:16:13,890
We can take rats here, n r minus 1, so n r
146
00:16:13,890 --> 00:16:21,160
for rats is 10, n m for mice is 8, the degrees
of freedom is 16, 10 plus 8 minus 2. Then the
147
00:16:21,160 --> 00:16:26,830
standard deviation is given for rats, so squaring
that; the standard deviation for mice is given,
148
00:16:26,830 --> 00:16:35,380
squaring that. Here we put 9, here we put 7,
divided by 9 plus 7, so we get S p equal
149
00:16:35,380 --> 00:16:37,089
to this number.
150
00:16:37,089 --> 00:16:40,070
Now, if you remember, t is equal to X r bar minus
151
00:16:40,070 --> 00:16:46,610
X m bar, that means the averages, divided by S p,
the whole thing which you have calculated here.
152
00:16:46,610 --> 00:16:55,010
Now X r is given by this, X m is given by
this, divided by S p, and then
153
00:16:55,010 --> 00:17:03,290
we are taking the square root of 1 by 10 plus
1 by 8; we get 6.89. So obviously, the t value
154
00:17:03,290 --> 00:17:10,740
we calculate is much larger than the table
t, so we reject the null hypothesis at the 95 % confidence
155
00:17:10,740 --> 00:17:17,140
level, and we accept the alternate hypothesis.
Please note there is a minus here, because
156
00:17:17,140 --> 00:17:24,060
I have taken X r minus X m and the rat numbers
are smaller. Even if you look at a 99 % single-
157
00:17:24,060 --> 00:17:30,850
tail test, let us look at the 99 % single-tail
test, that is, you look here at 0.01, and 10
158
00:17:30,850 --> 00:17:45,970
plus 8 minus 2 is 16.
So, let us look here: 2.583, so 2.583 is still
159
00:17:45,970 --> 00:17:52,470
smaller than 6.89. So even at the 99 % confidence
level we can reject the null hypothesis.
160
00:17:52,470 --> 00:17:59,380
Whether it is 95 or 99, we can reject the null
hypothesis, which means you accept the alternate
161
00:17:59,380 --> 00:18:05,010
hypothesis. There is a statistically significant
difference, which means the wear of the artificial
162
00:18:05,010 --> 00:18:11,890
valve is less in rats than in mice.
So, you see, we have done a lot of these types
163
00:18:11,890 --> 00:18:22,950
of problems; I am showing you quite a
lot. You can use simple Excel and do all these
164
00:18:22,950 --> 00:18:30,620
calculations; all you need to know is this
set of equations here. You can use the t-test
165
00:18:30,620 --> 00:18:35,900
function that is available in Excel
when you have the raw data for one set of
166
00:18:35,900 --> 00:18:42,510
samples and the raw data for another set
of samples. With the t-test function that is
167
00:18:42,510 --> 00:18:48,730
available in Excel you can do
the two-sample t-test when the variances
168
00:18:48,730 --> 00:18:52,750
are equal and the two-sample t-test when the variances
are not equal.
169
00:18:52,750 --> 00:18:58,140
But that t-test function cannot do this type
of calculation, because in this problem the raw data
170
00:18:58,140 --> 00:19:04,440
is not given; the average wear and standard
deviation are already given, whereas the Excel
171
00:19:04,440 --> 00:19:09,760
t-test function can be used only when we have
the raw data. So if we have data like this,
172
00:19:09,760 --> 00:19:14,601
then obviously you have to use these equations,
remember that, and we cannot use the
173
00:19:14,601 --> 00:19:20,190
GraphPad online software either. If you have
data like this you have to use these equations.
174
00:19:20,190 --> 00:19:25,120
That is why I am spending a lot of time on these
equations; that way you get an idea about the
175
00:19:25,120 --> 00:19:32,350
underlying mathematics behind calculating
the t. Otherwise, software can blindly give
176
00:19:32,350 --> 00:19:37,530
you some t and then tell you it is
statistically not significant or statistically
177
00:19:37,530 --> 00:19:42,800
significant; anybody can do that. But you
need to know what the underlying equations are
178
00:19:42,800 --> 00:19:48,990
that are used when you calculate two-sample
t-tests or one-sample t-tests or confidence
179
00:19:48,990 --> 00:19:53,360
intervals; that is why I am spending a lot of
time on these equations.
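When only the mean, standard deviation and sample size of each group are available, the same equations can be applied directly. The following is a minimal sketch, assuming equal variances as in the lecture's pooled formula. The function name `pooled_t_from_summary` and the wear summaries passed to it are hypothetical (the transcript does not read out the actual means and standard deviations); only the group sizes 10 and 8 and the table value 1.746 come from the lecture.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t from summary statistics.

    Returns (t, degrees of freedom), with t = (mean1 - mean2) / standard error.
    """
    df = n1 + n2 - 2
    # Pooled variance: ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
    sp_squared = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = math.sqrt(sp_squared * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df


# Hypothetical wear summaries (NOT the lecture's values): rats first, then mice
t_calc, df = pooled_t_from_summary(0.0045, 0.0003, 10, 0.0055, 0.0003, 8)

t_table = 1.746  # one-tail, p = 0.05, 16 degrees of freedom (from the lecture)

# H1 is "rats less than mice", so t is negative when rats wear less;
# reject H0 when t falls below the negative of the table value.
reject_h0 = t_calc < -t_table
print(f"df = {df}, t = {t_calc:.3f}, reject H0: {reject_h0}")
```

Taking rats minus mice keeps the sign of t aligned with the alternate hypothesis, which is why the comparison is against the negative table value; comparing the magnitude of t with the table value, as done in the lecture, gives the same decision when the difference is in the hypothesized direction.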
180
00:19:53,360 --> 00:19:57,490
For example, these types of problems we
cannot do with Excel; you have to do them
181
00:19:57,490 --> 00:20:02,850
manually like this. You have to take these
equations, substitute here, and then, using a
182
00:20:02,850 --> 00:20:08,970
calculator or something, do the calculations,
whereas Excel or even GraphPad can do it if
183
00:20:08,970 --> 00:20:13,590
you are given raw data. What does raw
data mean? For example, if you are talking
184
00:20:13,590 --> 00:20:22,360
about 10 rats, I will have all the 10 wear
values, 0.0045, 0.0048, like that, and if
185
00:20:22,360 --> 00:20:27,690
I have 8 mice I will have all the 8 wear
values. Whereas here the average and standard
186
00:20:27,690 --> 00:20:31,620
deviation of this 10-point data set are given; that
means somebody has already done that calculation
187
00:20:31,620 --> 00:20:38,960
and given it to you. So, Excel or GraphPad
will not be able to handle this type of data,
188
00:20:38,960 --> 00:20:46,440
so you need to use these equations.
So, we talked about two-sample t-tests, where
189
00:20:46,440 --> 00:20:51,740
you can compare 2 different sets of samples,
and I told you how to calculate the t using
190
00:20:51,740 --> 00:21:00,820
this equation, and you see whether the calculated
t is greater than the table t for a one-
191
00:21:00,820 --> 00:21:06,610
tail or a two-tail test. If it is greater
than the table t we reject the null hypothesis;
192
00:21:06,610 --> 00:21:16,800
if it is less than the table t we accept the
null hypothesis. And also, we looked at the one-
193
00:21:16,800 --> 00:21:25,530
tail test, where we are asking whether the drug
induces sleep faster than placebo, or whether the
194
00:21:25,530 --> 00:21:37,480
wear in the valves when they are implanted
in rats is less, and so on. So,
195
00:21:37,480 --> 00:21:44,090
in that sort of thing we are talking about
a one-tail test. I also showed you the table and
196
00:21:44,090 --> 00:21:50,960
how to use the table, and that is very, very
important. That table tells you the p values
197
00:21:50,960 --> 00:21:57,630
for one tail on the top and the p values for
two tail here, and this one gives you the
198
00:21:57,630 --> 00:22:01,810
degrees of freedom. In a two-sample t-test,
please remember, the degrees of freedom will
199
00:22:01,810 --> 00:22:07,620
be the number of samples for sample set 1 plus
the number of samples for sample set 2, minus
200
00:22:07,620 --> 00:22:13,420
2; remember that. Whereas when you are doing
a one-sample t-test it will be the total number
201
00:22:13,420 --> 00:22:20,170
of samples minus 1; that will be the number
of degrees of freedom. So, we will continue
202
00:22:20,170 --> 00:22:23,870
and we will talk about paired t-tests in the
next class.
203
00:22:23,870 --> 00:22:25,130
Thank you very much for your time.