Hello again, welcome to the course on Biostatistics and Design of Experiments. In this class we are going to briefly touch upon various statistical tests, what they are meant for, and how to go about using them. There are different types of statistical tests: some are used for comparing the means of data, some for comparing the variances of data, some for comparing ratios of data, and so on. We will talk about all of that. Before going into that, let me recall how we go about hypothesis testing. You have something called the null hypothesis, which is "no difference", the status quo. So imagine I am comparing the IQs of two different classes of students; the null hypothesis would be that there is no difference in the IQ of students in class A and class B. Then we also have the alternate hypothesis. If I am going to say there is possibly a difference, the alternate hypothesis would be that the average IQ of class A is different from the average IQ of class B. You can also have another situation where, instead of just saying "different", the average IQ of class A could be better than that of class B; then we are comparing only the "better" side. Or the average IQ of class A could be worse than that of class B; then we are comparing only the "worse" side.
So we have those three situations. "No difference" is the null hypothesis. "There is a difference" is the alternate hypothesis, and that is called a two-tailed comparison: different, meaning it could be greater or worse. If we are comparing only one side of it, the greater side or the worse side, then that is called a one-tailed test. So we decide on the hypothesis along with the tail, and then we also decide on the p value: am I going to test against 95 % confidence, or against 99 % confidence? A p of 0.05 indicates 95 % and a p of 0.01 indicates 99 %. Once I decide on all these, that is, the type of hypothesis, whether it is a one-tailed or a two-tailed test, and the p I want to look at, I calculate, using the appropriate test, a statistic such as t, and from there I calculate the p value. Then I check whether the p value I have calculated is less than 0.05 or more. If it is less than 0.05, obviously there is a difference, so I cannot accept the null hypothesis; I have to reject it, which means I accept the alternate hypothesis. If the p value is greater than 0.05 at 95 % confidence, then obviously there is no reason for me to reject the null hypothesis. So imagine I am testing two drugs in the market which affect the sleeping pattern.
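The decision rule just described can be sketched in a few lines of Python. This is a minimal illustration, not from the lecture; the function name and the example p values are made up.

```python
def decide(p_value, alpha=0.05):
    """Reject H0 when the computed p-value falls below the chosen alpha.

    alpha = 0.05 corresponds to 95 % confidence, alpha = 0.01 to 99 %.
    """
    return "reject H0 (accept H1)" if p_value < alpha else "fail to reject H0"

print(decide(0.03))   # below alpha, so the difference counts as significant
print(decide(0.48))   # above alpha, so there is no reason to reject H0
```

The same function works for 99 % confidence by passing alpha=0.01.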
So the null hypothesis could be that there is no difference between the two drugs, and the alternate that there is a difference, at 95 % confidence. Now what type of hypothesis statements do we write first, two-tailed or one-tailed? For a two-tailed test the null hypothesis will be mu A equal to mu B, and the alternate will be mu A not equal to mu B. For a one-tailed test the null hypothesis could be mu A equal to mu B, and the alternate could be mu A greater than mu B, meaning drug A has an increased effect compared with drug B, or mu A less than mu B, meaning drug A has a decreased effect compared with drug B; those are the upper-tailed and lower-tailed versions. Now there is something called error, and there are two types of error. There is the type 1 error and the type 2 error. What is a type 1 error? We reject the null hypothesis H0 when H0 is true; that is, we reject the null hypothesis whereas in reality the null hypothesis is true. That is a type 1 error. Then you have the type 2 error: we fail to reject the null hypothesis when the alternate hypothesis is true; instead of accepting the alternate hypothesis, we end up accepting the null hypothesis. So we have two possible decisions, do not reject the null hypothesis or reject it, against two possible realities: the null hypothesis is true, or the alternate hypothesis is true.
So here, when the null hypothesis is true and you do not reject it, that is correct; similarly, when the alternate hypothesis is true and you reject the null hypothesis, that is also correct. Whereas sometimes, although the null hypothesis is true, we end up rejecting it; that is called the type 1 error, or alpha error. In the other situation we do not reject the null hypothesis whereas H1 is true; that is called the type 2 error, or beta error. The type 1 error rate corresponds to the significance level, the 95 % or 99 % or 90 % which we make use of in our statistical calculations. If you take a smaller alpha, you can be more sure that you will not wrongly reject the null hypothesis when H0 is true, but your significance level also gets affected by that. So this table is very important: when H0 is true and we reject H0, that is the type 1 error; when H1 is true and we fail to reject the null hypothesis, that is the type 2 error, or beta error. So the type 1 error is called the alpha error and the type 2 error is called the beta error. We always need to balance between the alpha and the beta error, and generally we give more importance to the alpha error. As you know, we can have continuous data and, alternatively, discrete data. For discrete data we use distributions like the Binomial or the Poisson, and so on.
So when we have continuous data there are many situations. We can have one sample: I know the population details, I take a small sample, and I compare it with the population. For example, I know the 12th standard average for a particular school is 95 %, so I take 10 students in that school and calculate their 12th standard average. I may get some x bar; now I want to know whether this x bar is consistent with the 95 % school average, which acts like the population value, or whether it is very far away. Here we are collecting only one set of samples and comparing with the population; that is called a one-sample test, or one-sample t-test, where we compare the mean of the sample with the population mean. Now what is two samples? Suppose I am comparing the performance of drug A and the performance of drug B and trying to tell whether there is a statistically significant difference between their performances or not. That means I am comparing the means of two samples, which is why it is called the two-sample t-test. Then again you can have multiple samples: I may be comparing drugs A, B, C, and D, so I could have lots of different means; that is called multiple samples.
So there are many, many ways by which one could analyze these data. One is to study stability with run charts; then we can look at the shape, creating a histogram and seeing whether the data look normally distributed; then we can look at the nature of the data itself, and so on. And then there is something called the chi-squared test and something called the t-test: t-tests are generally meant for comparing means, while chi-squared tests are generally meant for comparing ratios. And there is something called the F test, which is generally meant for comparing variances, or spreads. In the t-test family we have three types: the one-sample t-test, where I take only one sample and compare it with the population; the two-sample t-test, where I am comparing two sets of samples; and the paired t-test, where there is a relationship between the sample items in A and the sample items in B. When I am comparing variances there is something called the F test; if I have multiple samples to compare, there is something called ANOVA, analysis of variance; and if I am comparing spreads there is something called homogeneity of variance. So there are different types of tests possible, and we are going to spend a lot of time on each one of them.
So we can have only one sample collected, comparing it with the population; or we could have two samples collected, compared with each other; or we could have multiple samples collected. If I am comparing means, there is the t test: the one-sample t-test, the two-sample t-test, and the paired t-test. If I am comparing variances there is the F test; if I am comparing the variation across a large number of data sets or samples, there is analysis of variance; and if I am looking at the spread of the data I can use something called homogeneity of variance, and so on. So a large number of tests are possible; we will talk about each one of them in detail, with examples, so do not worry about it. So if you are comparing means, there is something called the t test. I get the mean of sample one and the mean of sample two, and then I want to find out whether both means come from the same population, or whether each mean comes from a different population; that is the t-test. t-tests are quite robust, even for non-normal data. Generally the standard deviations have to be similar, but there can be some difference in the standard deviations and the t test is still good. In the t-test we have three types: the one-sample t-test, the two-sample t-test, and the paired t-test.
So in the one-sample t-test you take a sample, get the mean and the variance of that sample, and compare the mean with the mean of the population, like the examples I gave you: I take 10 students from a class, get their average marks, and compare that with the school average, trying to tell whether their average is far away from the school average or falls into the same population. Or I can collect the IQs of 10 students in a university and try to say whether their mean IQ falls within the university average IQ or outside it. That is the one-sample t-test; we are taking only one sample. In the two-sample t-test I have two sets of samples: I take 10 students from one university and 10 students from another university, get their IQs, compare them, and try to say whether the IQs are statistically different or whether there is no statistically significant difference between the two. That is called the two-sample t-test because I am using two sets of samples. In the paired t-test, you are pairing two sets of data, and the difference in results should be 0 if there is no effect. For example, I take 10 cats and test drug A on them and look at the outcome; then on the same 10 cats I give drug B and look at the outcome.
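As a sketch of the two-sample case, the pooled t statistic can be computed directly from the two data sets. The IQ values below are made-up illustrations, not data from the lecture, and the equal-variance (pooled) form is assumed.

```python
import math
import statistics

def two_sample_t(a, b):
    """Pooled two-sample t statistic (assumes the two variances are similar)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)   # sample variances
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)     # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))                   # standard error of the difference
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2

# hypothetical IQ scores of 10 students from each of two universities
iq_a = [98, 105, 110, 102, 95, 108, 101, 99, 104, 107]
iq_b = [96, 100, 103, 97, 94, 105, 99, 98, 101, 102]
t_calc, df = two_sample_t(iq_a, iq_b)
```

The calculated t is then compared with the table t for 18 degrees of freedom, exactly as in the one-sample case.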
So the difference should be 0 if there is no difference between drug A and drug B. If the difference is statistically significantly away from 0, then I can say yes, drug A is different from drug B, because I have used the same volunteer cats: I test drug A and see some performance change, then I test drug B and see some performance change, and if drug A and drug B are the same, the difference in the performance changes we observed should be equal to 0. That is called the paired t-test. We are going to look at examples of each one of them, so do not worry about that. Interestingly, all these tests are looking at means, that is, the averages of the samples you take, whether it is the one-sample or the two-sample t-test. Now you may ask: suppose instead of 2 samples I have many more samples, what will I do? Of course I can take 2 sets of samples at a time and do a two-sample t-test, but there is another approach which is much faster, called ANOVA, analysis of variance; it is called one-way ANOVA and we will talk about it later in the course. Now, if you are comparing variances, that means you are comparing the spreads, the standard deviations. This is generally valid both for normal and non-normal data.
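The paired logic above, mean difference over its standard error, can be sketched as follows. The outcome numbers are hypothetical stand-ins for the cat experiment, not measurements from the lecture.

```python
import math
import statistics

def paired_t(x, y):
    """Paired t statistic: mean of per-subject differences over its standard error."""
    d = [a - b for a, b in zip(x, y)]          # per-subject differences, 0 under H0
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n)), n - 1

# hypothetical outcomes for the same 10 animals under drug A and then drug B
drug_a = [5.1, 4.8, 6.0, 5.5, 4.9, 5.2, 5.8, 5.0, 5.3, 5.6]
drug_b = [4.9, 4.7, 5.8, 5.6, 4.8, 5.0, 5.5, 4.9, 5.2, 5.4]
t_calc, df = paired_t(drug_a, drug_b)
```

Because each animal is its own control, the pairing removes animal-to-animal variation from the comparison.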
So the null hypothesis H0 will be sigma 1 squared equal to sigma 2 squared, and we cannot reject H0 when p is greater than 0.05. The alternate could be that sigma 1 squared is different from sigma 2 squared; if p is less than 0.05 we have to reject H0 and accept Ha. So there are tests for comparing variances: in the t-test we compare means, here we compare variances. One such test is the F test, where I am comparing the variation of sample 1 and sample 2. H0 could be sigma 1 squared equal to sigma 2 squared, meaning the variances are the same, and the alternate could be sigma 1 squared different from sigma 2 squared. We calculate the F ratio, which is given by s1 squared divided by s2 squared, where s1 and s2 are the sample standard deviations, so s1 squared is the variance. There is a table called the F table; for 95 % or 99 % you get a table F value. If the table F value is greater than the F you calculate, you accept H0, and if the table F value is less than the F you calculate, you reject H0 and accept Ha. This is done for the appropriate degrees of freedom: the degrees of freedom for data set 1 is n1 minus 1 if you have collected n1 samples, and for data set 2 it is n2 minus 1 if you have collected n2 samples. That is called the F test.
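A minimal sketch of that F-ratio calculation, with the larger variance conventionally placed on top so the ratio is at least 1; the two data sets are hypothetical, and the critical value 3.18 is the approximate F(0.05; 9, 9) entry from a standard F table.

```python
import statistics

def f_ratio(a, b):
    """F statistic for comparing two sample variances (larger over smaller)."""
    v1, v2 = statistics.variance(a), statistics.variance(b)
    if v1 >= v2:
        return v1 / v2, len(a) - 1, len(b) - 1
    return v2 / v1, len(b) - 1, len(a) - 1

# hypothetical measurements: s2 is visibly more spread out than s1
s1 = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3, 10.1]
s2 = [10.0, 9.5, 11.0, 10.6, 9.2, 10.8, 9.9, 9.4, 10.7, 10.2]
f_calc, df1, df2 = f_ratio(s1, s2)

# F(0.05; 9, 9) is about 3.18 in a standard F table (assumed here)
reject_h0 = f_calc > 3.18
```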
And then you also have ANOVA, for when you are comparing a large number of data sets; in the previous F test you have only 2 data sets, 1 and 2, so you are saying sigma 1 squared equal to sigma 2 squared versus sigma 1 squared not equal to sigma 2 squared. ANOVA is very, very powerful because we can work with a large number of data sets: I am comparing the IQs of university A, university B, university C, and university D and trying to find whether there is a statistically significant difference or not, or I am comparing drugs A, B, C, D in clinical trials and want to find out whether there is a statistically significant difference; then I use ANOVA. The assumptions are that the variances of the samples are approximately equal and that the responses within any given level are normally distributed. So H0 is that all the group means are the same, whereas Ha is that at least one is different; in that case you get a p value less than 0.05. So we have the t test, different types of t tests, which are very powerful for comparing means; then we have tests for comparing variances, like the F test and ANOVA; and for ratios we have something called the chi-squared test, which we will talk about later. Then there is something called the power of the test: if H1 is true, so that the distribution of X is specified by H1, then the probability of rejecting H0 is the power of the test for that distribution.
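The one-way ANOVA F statistic, the between-group mean square over the within-group mean square, can be sketched directly. The four groups below are hypothetical IQ samples, not lecture data.

```python
import statistics

def one_way_anova(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n                    # grand mean of all observations
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# hypothetical IQ samples from four universities
a = [101, 99, 103, 98, 102]
b = [100, 97, 99, 101, 98]
c = [105, 103, 106, 104, 102]
d = [99, 100, 98, 101, 97]
f_calc, df_b, df_w = one_way_anova(a, b, c, d)
```

A large F relative to the F-table value for (3, 16) degrees of freedom would suggest at least one group mean differs.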
When H1 is true, you have two situations: you do not reject H0, which is the beta error, or you reject H0. If H1 is true and you reject H0, that probability is called the power of the test for that distribution. So you have two types of error: the alpha error, where you should not reject H0 but you end up rejecting it, and the beta error, where you should reject H0 but you do not. The power of the test is 1 minus the beta error. So these two terms are very, very important when you are deciding on the alpha error and the beta error. Let us get into problems. The first problem is a one-sample t-test, one-sided. The average size of barnacle shells is 25 mm. You know what a barnacle is, right? It is a marine organism with a shell; it attaches itself to hard surfaces using a glue and feeds from there. Its average shell size is 25 mm. Now we have collected 10 barnacles in South India and measured their sizes. Are the South Indian barnacles of smaller size? That is the question. We take 95 % confidence. For the South Indian barnacles the average comes out to be 24.4 mm, but the population average is 25 mm. Now I want to know whether this 24.4 is statistically significantly smaller, or whether it comes from the same population, at a p value of 0.05, that is, 95 % confidence. So how do we do that?
Simple: H0 is mu equal to mu0, where mu0 is the original mean of the population, and the alternate is mu less than mu0; that means you want to know whether the average size of the barnacle shells from South India is less than 25. So what do you do? You calculate t. If you remember, we had the equation mu equal to x bar plus or minus t times s divided by the square root of n; we rearrange that to get the t value, t equal to (x bar minus mu0) divided by (s divided by the square root of n), with degrees of freedom n minus 1. If the t we calculate is smaller in magnitude than the table t for those degrees of freedom, we accept H0; if the calculated t exceeds the table t, we reject H0. Of course, we are using a one-tailed test here. So what do we do? We get the average, which is 24.4, and the standard deviation, which is 2.59. So the calculated t is (24.4 minus 25) divided by (s divided by the square root of n), which comes out to minus 0.73 for 9 degrees of freedom, and then there is a t table. I want you to look at the table: one header row is for the two-tailed test and the top one is for the single-tailed test. In our problem we are talking about a single-tailed test, because we want to know whether the South Indian barnacles are of smaller size.
So for 9 degrees of freedom you go along the table and read out the 0.05 column, and you get 1.833. So the table t is 1.833 and the calculated t is minus 0.737. The magnitude of the calculated t is less than the table t, so there is no reason for you to reject H0 under these conditions. So what you can say is that there is no statistical reason for saying the South Indian barnacles are of smaller size; the South Indian barnacles come from the same population of 25 mm. In order to get your confidence limits on the mean which we calculated from the sample, as you know, the equation is mu equal to x bar plus or minus t(df) times s divided by the square root of n; the s divided by the square root of n is called the standard error, right? I talked about this a long time back. Now you need the t value: you have 10 data points, so the degrees of freedom is 9, and for a two-tailed 95 % you get t as 2.26. The mean of the sample is 24.4, so mu is 24.4 plus or minus 2.26 times 2.59 divided by the square root of 10, which is 24.4 plus or minus 1.85. So the 95 % confidence limit for this mean is 22.55 to 26.25. One important point you need to remember is that the t used here must be the two-tailed value, covering both sides, because we are talking about plus or minus; that is why we use 2.26.
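The barnacle numbers can be reproduced from the summary statistics alone. This is a sketch assuming the lecture's reported values (mean 24.4, s = 2.59, n = 10) and the t-table entries it quotes (1.833 one-tailed, 2.262 two-tailed, 9 degrees of freedom).

```python
import math

# summary values from the lecture example
x_bar, mu0, s, n = 24.4, 25.0, 2.59, 10

se = s / math.sqrt(n)                    # standard error of the mean
t_calc = (x_bar - mu0) / se              # about -0.73

t_table_one_tail = 1.833                 # 95 %, one-tailed, 9 df (t table)
reject_h0 = t_calc < -t_table_one_tail   # lower-tailed: reject only below -1.833

t_table_two_tail = 2.262                 # 95 %, two-tailed, 9 df (t table)
half_width = t_table_two_tail * se       # about 1.85
ci = (x_bar - half_width, x_bar + half_width)   # roughly 22.55 to 26.25
```

Since -0.73 is nowhere near -1.833, H0 stands, and the confidence interval contains 25, which says the same thing.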
So we have 24.4 mm as the mean of the sample, with a standard deviation of 2.59, and you want to know whether the mu you estimate equals mu0 or whether the alternate, mu less than mu0, holds. So what do you do? You calculate t from the rearranged equation, t equal to (x bar minus mu0) divided by (s divided by the square root of n), and you get minus 0.7377 for 9 degrees of freedom. From the t table I showed you, where the top row is for single-tailed and the bottom row for two-tailed, at p of 0.05 and 9 degrees of freedom you get 1.833. Obviously, the magnitude of the t you have calculated is much less, so there is no reason for you to reject the null hypothesis. For 95 % confidence with the large-sample value we would say 24.4 plus or minus 1.96 times 2.59 divided by the square root of n, which gives confidence limits for the mean of about 22.8 to 26.0; obviously your 25 falls within that, which is why you are not able to reject the null hypothesis under this condition. So this table is very important: for different degrees of freedom it gives the single-tailed values in one header row and the two-tailed values in the other. If I am interested in a single-tailed 95 % I use one column; if I am interested in a two-tailed 95 % I use the other. As you go down, for infinite degrees of freedom you get 1.96.
So, as I said, a two-tailed 0.05 corresponds to a single-tailed 0.025, because when you say 95 % two-tailed, the rejection region is divided half on each side; that is why you get 0.05 divided by 2, which is 0.025. Do you understand the logic of this particular table? It is very important when you are calculating t-tests based on means, whether it is the one-sample t-test in the example we just saw, or the two-sample and paired t-tests we will look at later. So you calculate t from the equation, compare it with the table t for the appropriate degrees of freedom, and then say whether the calculated t is greater than the table t: if it is greater, you can reject the null hypothesis, but if the calculated t is less than the table t, you cannot reject the null hypothesis. Now, we can also use the particular software I mentioned, the GraphPad software, which can calculate the t value for a given probability; you can see that for one tail it is 1.833. You compare that with the t you calculated, which is minus 0.737, so there is no reason for you to reject the null hypothesis. The same software can also be used to calculate the p value from the t value of minus 0.737; the p comes out to be 0.4799. So obviously you can say it is not statistically significant at all.
It would have to be 0.05 or less for us to call it a statistically significant difference. So the software is very useful for this. We can also get the full one-sample t-test results; again, the GraphPad software can do this, and the results show a p value for which the difference is considered not statistically significant. The actual mean of the sample is 24.4 and the hypothetical mean you are comparing against is 25.0, and so on. So let us see how to do this; it is quite simple, and we have the data set. Since we have continuous data, we choose the one-sample t-test option in the software and say continue. We can enter the data directly, or we can copy and paste it. I am comparing it with respect to 25; 25 is my population mean. So we type in the sample values one by one, 22, 23, 22, 25, 28, 27, and so on, until all 10 data points are entered, and then we can say calculate now.
So here we put 25 as the global average we are interested in, and we say calculate now. By conventional criteria the difference is considered not statistically significant, because the p value comes out to be 0.4826; the t value is about 0.73, whereas you would need 1.833. So obviously it is not a statistically significant difference. So you can use the GraphPad software as well: put in your data and let it perform this type of calculation. It is quite a useful piece of software, and this problem is quite simple. You have the population mean and you have the sample; from the sample you calculate the mean and the standard deviation, and then the standard error, which is given by s divided by the square root of n. Then you get the t value, and for 9 degrees of freedom you make use of the t table. For a single-tailed test, for 9 degrees of freedom you get 1.833, whereas the calculated t is 0.7377 in magnitude, so obviously we have no reason to reject the null hypothesis. That is what it is. I showed you how to do the calculation with the GraphPad software as well. We will continue more on this one-sample t-test as we go along. Thank you very much for your time.