1
00:00:18,070 --> 00:00:35,100
Hello, good morning. Today we will discuss a topic
in univariate statistics: estimation. Estimation
2
00:00:35,100 --> 00:00:38,519
comes under univariate statistics.
3
00:00:38,519 --> 00:00:46,690
Today’s content: we will start with what
4
00:00:46,690 --> 00:00:58,000
is estimation? Then I will tell you the different
types of estimation: for a single population
5
00:00:58,000 --> 00:01:06,280
mean, for a single population variance, confidence
intervals for the difference between two population
6
00:01:06,280 --> 00:01:13,640
means, and confidence intervals for the ratio of
two population variances, followed by references.
7
00:01:13,640 --> 00:01:21,490
Now, if you see this slide estimation has
two parts.
8
00:01:21,490 --> 00:01:33,560
One is point estimation, another one is interval
estimation. So, under point estimation we
9
00:01:33,560 --> 00:01:39,240
will be discussing point estimation
of the mean, and point estimation of the standard
10
00:01:39,240 --> 00:01:48,420
deviation, or we can say point estimation
of the variance, that is, the square of the standard
11
00:01:48,420 --> 00:01:58,890
deviation. And
under interval estimation we will be
12
00:01:58,890 --> 00:02:09,450
discussing, first for a single population,
the confidence interval for the mean and the confidence
13
00:02:09,450 --> 00:02:22,700
interval for the variance. We will also
discuss today
14
00:02:22,700 --> 00:02:34,630
interval estimation for the difference of means
between two populations, that is,
15
00:02:34,630 --> 00:02:46,010
the confidence interval (CI) for
the difference of means between two
16
00:02:46,010 --> 00:03:00,139
populations, and the confidence interval
for the ratio of two population variances,
17
00:03:00,139 --> 00:03:09,280
okay?
Essentially what will happen here? Ultimately
18
00:03:09,280 --> 00:03:14,570
you will find that, when you talk about
interval estimation, the
19
00:03:14,570 --> 00:03:22,060
logic remains the same whether we go for a
single population or two populations, and the
20
00:03:22,060 --> 00:03:29,850
only difference you will find is a little bit
in the computation. So, if you have n observations
21
00:03:29,850 --> 00:03:41,669
x 1, x 2, like this up to x n, all
of you know that the estimate of the
22
00:03:41,669 --> 00:03:50,350
mean is the average of the sample data. So,
if I say x bar is an estimate of mu, then
23
00:03:50,350 --> 00:03:59,449
you all know that this is 1 by n times the sum over i equal to
1 to n of x i. So, what we say is that x bar
24
00:03:59,449 --> 00:04:08,139
is the estimate of the population mean.
Similarly, we will calculate the sample
25
00:04:08,139 --> 00:04:15,220
variance s square, which is the estimate of the population variance: it is 1 by n minus
26
00:04:15,220 --> 00:04:29,250
1 times the sum over i equal to 1 to n of x i minus
x bar, squared. When you compute like this, that is,
27
00:04:29,250 --> 00:04:37,960
you collect a sample and compute x bar and s square from the sample, this is your point estimate.
28
00:04:37,960 --> 00:04:46,790
So, x bar is the point estimate of mu and s square is the point estimate of sigma square. Now,
29
00:04:46,790 --> 00:04:58,180
as we discussed in the last class, what
will happen when I take several samples
30
00:04:58,180 --> 00:05:00,290
from a population?
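The two point estimates described above can be sketched in a few lines. This is a minimal illustration, assuming Python and a small made-up data set; the values in `sample` are hypothetical and do not come from the lecture.

```python
import statistics

# Hypothetical sample of n = 8 observations (illustrative values only).
sample = [6, 9, 4, 7, 8, 5, 10, 7]

n = len(sample)
# Point estimate of mu: x bar = (1/n) * sum of x_i
x_bar = sum(sample) / n
# Point estimate of sigma square: s^2 = (1/(n-1)) * sum of (x_i - x bar)^2
s_sq = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# The standard library computes the same two quantities.
assert abs(x_bar - statistics.mean(sample)) < 1e-12
assert abs(s_sq - statistics.variance(sample)) < 1e-12

print(f"x bar = {x_bar}, s square = {s_sq}")
```

Note the divisor n minus 1 in the variance, which is what `statistics.variance` also uses.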
31
00:05:00,290 --> 00:05:14,120
I told you in the last class: this is my population,
and suppose I collect several samples. So,
32
00:05:14,120 --> 00:05:23,440
for example, you have collected sample 1,
sample 2, up to sample n, all of the samples
33
00:05:23,440 --> 00:05:31,260
with the same sample size
n, and if you calculate the
34
00:05:31,260 --> 00:05:39,220
point estimates, they will be x 1 bar, x 2 bar,
like this up to x n bar, and we have seen that
35
00:05:39,220 --> 00:05:49,580
x bar, if I write it in vector form as
x 1 bar, x 2 bar, up to x n bar, follows
36
00:05:49,580 --> 00:05:56,560
a certain distribution; in the last class we saw
that it follows a certain distribution.
37
00:05:56,560 --> 00:06:09,860
So, x bar follows a certain distribution.
Now, you are collecting one sample and computing
38
00:06:09,860 --> 00:06:16,800
x bar; what is the guarantee that the computed
x bar will represent the population
39
00:06:16,800 --> 00:06:27,370
mean? So, we want to have a certain amount of
confidence in our estimate. That confidence
40
00:06:27,370 --> 00:06:36,190
is expressed through a confidence interval. By confidence
interval what do we mean? Suppose
41
00:06:36,190 --> 00:06:47,870
the distribution of x bar is like this; this
is my pdf of x bar, and all of us know now
42
00:06:47,870 --> 00:06:53,920
that the expected value of x bar is mu, because
x bar is an unbiased estimator of mu.
43
00:06:53,920 --> 00:07:02,000
So, then your mu is coming here.
Now, let us talk about sample 1: you
44
00:07:02,000 --> 00:07:10,780
have computed x bar using sample 1 and it is falling
here; the value of x 1 bar is falling here.
45
00:07:10,780 --> 00:07:20,760
So, what is our interest here in using confidence?
We want to know whether this x 1 bar,
46
00:07:20,760 --> 00:07:28,310
the x bar computed using sample 1, is representative
of mu or not. It all depends on the distance
47
00:07:28,310 --> 00:07:40,889
between these two.
If x bar is far away from mu, then
48
00:07:40,889 --> 00:07:46,980
it will not be a representative one. So, in order
to know whether the interval
49
00:07:46,980 --> 00:07:56,760
around x bar contains mu or not, we will
first identify the
50
00:07:56,760 --> 00:08:01,470
sampling distribution that is applicable.
And using this sampling distribution, we will
51
00:08:01,470 --> 00:08:08,680
generate an interval for
mu, that is, the population mean, and we want
52
00:08:08,680 --> 00:08:14,120
to find out whether that interval contains mu or not.
53
00:08:14,120 --> 00:08:21,870
Now, come back to the slide here. What will
happen ultimately? The population can be normal
54
00:08:21,870 --> 00:08:31,270
or non-normal. Now, if you sample from a normal
population with sigma known, then whether the sample size
55
00:08:31,270 --> 00:08:40,019
is small or large is not a problem,
not an issue at all. Then you
56
00:08:40,019 --> 00:08:47,600
will follow the z distribution. What quantity
will follow the z distribution? I will discuss that,
57
00:08:47,600 --> 00:08:53,720
but please remember the case where your population
is normal and sigma is known, irrespective
58
00:08:53,720 --> 00:09:04,110
of the sample size. The statistic we generate
for x bar is basically z equal
59
00:09:04,110 --> 00:09:10,790
to x bar minus the expected value of x bar, divided by
sigma x bar.
60
00:09:10,790 --> 00:09:18,910
This quantity, x bar minus the
expected value of x bar divided by sigma x bar, follows the
61
00:09:18,910 --> 00:09:29,339
z distribution if you sample from a normal population,
irrespective of the sample size. But sigma
62
00:09:29,339 --> 00:09:36,570
must be known. If sigma is unknown and your sample
size is large, then this quantity, with s in place of sigma, again approximately follows the
63
00:09:36,570 --> 00:09:44,089
z distribution, but if the sample size
is small and sigma is unknown, then this quantity
64
00:09:44,089 --> 00:09:50,290
follows the t distribution.
Now, if you sample from a non-normal population,
65
00:09:50,290 --> 00:09:57,870
when sigma is known and your sample size is
large, then the same quantity
66
00:09:57,870 --> 00:10:04,560
follows the z distribution approximately; even
if sigma is unknown, as long as the sample size
67
00:10:04,560 --> 00:10:13,040
is large, that is again z. But in the other two cases,
when the sample size is less than 30, that is, a
68
00:10:13,040 --> 00:10:22,220
small sample size, no parametric distribution
is available, okay? So, whether you use the t
69
00:10:22,220 --> 00:10:30,990
or the z distribution, the mathematics and the
procedure remain the same. You only have to use
70
00:10:30,990 --> 00:10:33,920
the z table or the t table.
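The selection rules stated above can be written as a small lookup. This is only a sketch: the function name `sampling_distribution` is hypothetical, and treating n of 30 or more as "large" is the rule of thumb used in this lecture, not a universal threshold.

```python
def sampling_distribution(normal_population: bool, sigma_known: bool, n: int) -> str:
    """Return "z", "t", or "none" for the mean, following the lecture's rules."""
    large = n >= 30  # lecture's rule of thumb: fewer than 30 is a small sample
    if normal_population:
        if sigma_known or large:
            return "z"  # sigma known (any n), or sigma unknown but large n
        return "t"      # normal population, sigma unknown, small n
    # Non-normal population: only the large-sample cases are covered.
    return "z" if large else "none"


print(sampling_distribution(True, True, 10))    # normal, sigma known, small n
print(sampling_distribution(True, False, 12))   # normal, sigma unknown, small n
print(sampling_distribution(False, False, 12))  # non-normal, small n
```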
71
00:10:33,920 --> 00:10:40,850
Let us see in what way you can calculate
the confidence interval. You see what we have
72
00:10:40,850 --> 00:10:47,830
said: first we collect data. Then find
the statistic you want to compute, that
73
00:10:47,830 --> 00:10:54,580
is, the estimate of the population mean is
x bar and the estimate of the population standard deviation
74
00:10:54,580 --> 00:11:07,000
is s. Then you choose a particular alpha value,
and we know the probability for the
75
00:11:07,000 --> 00:11:09,460
statistic that we generated.
76
00:11:09,460 --> 00:11:18,180
That is, x bar minus the expected value of x bar divided
by sigma x bar: the probability that
77
00:11:18,180 --> 00:11:24,600
this value will be
greater than or equal to l and less than
78
00:11:24,600 --> 00:11:36,660
or equal to u will be 1 minus alpha, okay?
So, this quantity is nothing
79
00:11:36,660 --> 00:11:44,810
but z, that is, l less than or equal to z less than
or equal to u, when the conditions are satisfied,
80
00:11:44,810 --> 00:11:52,050
like sampling from a normal population with sigma known, irrespective of the sample
size. Then what will we write?
81
00:11:52,050 --> 00:12:03,060
We will write z equal to x bar minus mu, divided by
sigma by root n; sigma x bar is sigma
82
00:12:03,060 --> 00:12:10,120
by root n, as we have seen in the last class,
and the expected value of x bar is mu.
83
00:12:10,120 --> 00:12:17,640
So, essentially, what you can write is that this
follows the unit normal distribution;
84
00:12:17,640 --> 00:12:24,459
this is my unit normal distribution, and there
is one value which is the lower value
85
00:12:24,459 --> 00:12:29,600
we are considering, another one is
the upper value we are considering, and what
86
00:12:29,600 --> 00:12:37,470
we are saying is that the probability that this z
value lies between l and u is 1 minus alpha, and that gives
87
00:12:37,470 --> 00:12:46,180
the confidence interval, which is the 100 into
1 minus alpha percent CI.
88
00:12:46,180 --> 00:12:57,040
What is this alpha? How do you determine this
alpha? Alpha is known as the significance level.
89
00:12:57,040 --> 00:13:08,959
When this quantity, x bar minus
mu divided by sigma by root n, is z distributed,
90
00:13:08,959 --> 00:13:15,470
it is a two-tailed case. So, between the left-hand side
and the right-hand side the probability
91
00:13:15,470 --> 00:13:22,680
value will be equally divided. So, the portion
here is alpha by 2, and here is alpha by 2. So,
92
00:13:22,680 --> 00:13:27,240
what does the level of significance, or significance
level alpha, indicate? It indicates
93
00:13:27,240 --> 00:13:35,440
that if I consider l to u as the
confidence interval, then what is the error
94
00:13:35,440 --> 00:13:45,360
you are accepting? That error is alpha. What
is the error? It is the probability that the
95
00:13:45,360 --> 00:13:53,600
statistic falls in these tail portions, so the interval misses the true mean; that is basically
a probability of alpha, okay?
96
00:13:53,600 --> 00:14:05,839
So, now how do we get these values? What will l and
u be? It all depends on
97
00:14:05,839 --> 00:14:17,430
what your alpha is. If we consider alpha
equal to 0.05, then alpha by 2 is 0.025. Then
98
00:14:17,430 --> 00:14:23,660
what do you need to know? You
have to find z alpha by 2; this is your
99
00:14:23,660 --> 00:14:29,810
z alpha by 2, and on the left-hand side the
value will be minus z alpha by 2. So, you
100
00:14:29,810 --> 00:14:36,740
look at the table, find the z alpha by 2 value,
and compute accordingly.
101
00:14:36,740 --> 00:14:43,680
So, mathematically, what is happening
here? My l is minus z alpha
102
00:14:43,680 --> 00:14:51,399
by 2, and this will be less than or equal to x
bar minus mu divided by sigma by root n, which is less than or equal
103
00:14:51,399 --> 00:14:59,260
to plus z alpha by 2. Now, if you rearrange
this, what you get is minus z alpha by
104
00:14:59,260 --> 00:15:07,750
2 sigma by root n less than or equal to x bar
minus mu less than or equal to z alpha by 2 sigma
105
00:15:07,750 --> 00:15:13,470
by root n.
Then again, we will manipulate this,
106
00:15:13,470 --> 00:15:20,750
because we are looking
for the confidence interval of mu. So, what do
107
00:15:20,750 --> 00:15:27,600
we do? We take
x bar out of the middle portion. Then
108
00:15:27,600 --> 00:15:34,820
I will write it like this: minus
x bar minus z alpha by 2 sigma by root n
109
00:15:34,820 --> 00:15:43,690
is less than or equal to minus mu, less than
or equal to minus x bar plus z alpha by 2 sigma
110
00:15:43,690 --> 00:15:50,339
by root n, correct?
A very simple manipulation: you are just
111
00:15:50,339 --> 00:15:56,630
taking x bar out of the middle portion and putting
it on the left-hand and right-hand sides. Then
112
00:15:56,630 --> 00:16:01,019
we modify it a little. We do not want minus
mu, we want plus mu. So, you multiply
113
00:16:01,019 --> 00:16:06,930
by minus 1. Now the inequalities
reverse, and what you get is x bar minus
114
00:16:06,930 --> 00:16:15,380
z alpha by 2 sigma by root n less than or equal
to mu, less than or equal to x bar plus z alpha
115
00:16:15,380 --> 00:16:25,490
by 2 sigma by root n.
So, this is the
116
00:16:25,490 --> 00:16:36,930
confidence interval for mu, the 100 into
1 minus alpha percent CI for mu, correct?
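Written out in symbols, the manipulation just described runs as follows. This is only a restatement of the spoken steps, using the same quantities (x bar, mu, sigma, n, and z alpha by 2).

```latex
\begin{align*}
& P\!\left(-z_{\alpha/2} \le \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1-\alpha \\
& -z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \bar{x}-\mu \le z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
& -\bar{x}-z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le -\mu \le -\bar{x}+z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
& \bar{x}-z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x}+z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
\end{align*}
```

The last line follows from multiplying the third by minus 1, which reverses both inequalities.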
117
00:16:36,930 --> 00:16:43,490
So, when you talk about a confidence interval,
it is definitely for the population parameter,
118
00:16:43,490 --> 00:16:55,339
not for the sample statistic, getting me?
Then once I know this, what will happen
119
00:16:55,339 --> 00:17:02,670
ultimately? If you know this, then how do
I know whether mu is
120
00:17:02,670 --> 00:17:09,360
contained within this interval or
not? How do you know? Because
121
00:17:09,360 --> 00:17:12,789
it is basically the z value.
122
00:17:12,789 --> 00:17:24,350
I think let us go for a problem first.
Suppose this is our problem;
123
00:17:24,350 --> 00:17:29,480
what is this? Musculoskeletal disorder (MSD) is
a serious problem for crane operators in heavy
124
00:17:29,480 --> 00:17:36,789
industries. In a survey to assess crane
operators' MSD, approximately how many times
125
00:17:36,789 --> 00:17:42,129
in a month an operator suffers from body pain was asked. This is
126
00:17:42,129 --> 00:17:50,869
one of the
measures of MSD, that is, how many times in a month.
127
00:17:50,869 --> 00:17:57,580
A random sample of 76 responses yielded a
mean of 7. So, what is our problem here? We
128
00:17:57,580 --> 00:18:02,389
have collected n equal to 76.
129
00:18:02,389 --> 00:18:11,840
You have computed x bar, which is 7, and the sample standard
deviation, S equal to 4. Let the population
130
00:18:11,880 --> 00:18:18,889
standard deviation be given as 3, that is, sigma
equal to 3. Construct a 95 percent confidence
131
00:18:18,889 --> 00:18:34,330
interval for mu; your work is to construct the
95 percent CI for mu. This is your work.
132
00:18:34,330 --> 00:18:41,659
Now, see, when I say 95 percent, that means
we are saying that 100 into 1 minus alpha
133
00:18:41,659 --> 00:18:54,179
is equal to 95. So, you are getting alpha equal
to 0.05. So, what is my alpha by 2? 0.025.
134
00:18:54,179 --> 00:19:00,429
Now, here it is clearly given that sigma is
3, that is, the population standard deviation is known,
135
00:19:00,429 --> 00:19:06,409
and we are assuming that the sample has come
from a normal distribution, that is, the population distribution
136
00:19:06,409 --> 00:19:13,970
is normal. So, you see,
assuming a normal distribution,
137
00:19:13,970 --> 00:19:19,649
what will we use? We will use the z distribution,
and accordingly our interval will be x bar
138
00:19:19,649 --> 00:19:26,960
minus z alpha by 2 sigma by root n, less than
or equal to mu, less than or equal to x bar plus
139
00:19:26,960 --> 00:19:32,919
z alpha by 2 sigma by root n. All the values
are known to you.
140
00:19:32,919 --> 00:19:44,669
So, your computation is: x bar is 7,
minus z 0.025 into sigma by root n, where the population
141
00:19:44,669 --> 00:19:56,129
sigma is 3 and your n is 76. So, this is less than
or equal to mu, less than or equal to 7 plus z 0.025
142
00:19:56,129 --> 00:20:11,970
into 3 by root over 76. All of you know
that z 0.025, if you see the table, is 1.96.
143
00:20:11,970 --> 00:20:23,269
So, I can write further that our interval
is 7 minus 1.96 into 3 by root over 76, less than
144
00:20:23,269 --> 00:20:34,429
or equal to mu, less than or equal to 7 plus 1.96 into 3
by root over 76, and the answer will be 6.33 less
145
00:20:34,429 --> 00:20:45,019
than or equal to mu, less than or equal to 7.67.
This is the confidence interval for mu.
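The crane operator computation can be checked with a short script. This is a minimal sketch, assuming Python 3.8 or later; the function name `z_interval` is hypothetical, and `statistics.NormalDist().inv_cdf` stands in for the z table lookup.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """CI for mu when sigma is known: x bar -/+ z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    # z_{alpha/2}: the point with area alpha/2 to its right (about 1.96 for 95 percent).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Values from the example: x bar = 7, sigma = 3, n = 76, 95 percent confidence.
lower, upper = z_interval(x_bar=7, sigma=3, n=76, confidence=0.95)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

With these inputs the limits round to 6.33 and 7.67, matching the values worked out by hand.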
146
00:20:45,019 --> 00:20:50,109
So, essentially, what are you getting?
You are not getting only a point estimate;
147
00:20:50,109 --> 00:21:00,379
here you are getting an interval estimate.
The point estimate says the estimate of mu is 7; the interval
148
00:21:00,379 --> 00:21:15,080
estimate says it is not just 7, it is between
6.33 and 7.67. So,
149
00:21:15,080 --> 00:21:23,019
that is the difference between point estimation
and interval estimation, or the point estimate
150
00:21:23,019 --> 00:21:30,590
of mu and the interval estimate of mu.
How do we know whether this interval estimate
151
00:21:30,590 --> 00:21:42,909
contains the true mu?
Now, see, we convert the statistic
152
00:21:42,909 --> 00:21:49,919
x bar to the equivalent z by subtracting its mean
and dividing by its standard deviation. So,
153
00:21:49,919 --> 00:22:02,470
what is the mean of z? It is 0. Now, here
what interval are you getting? 6.33
154
00:22:02,470 --> 00:22:15,909
to 7.67. So, can you find some meaning
in that, if I convert into z with mean 0 and
155
00:22:15,909 --> 00:22:21,600
then translate back
to the mu terms? What will happen ultimately?
156
00:22:21,600 --> 00:22:37,279
See, the left-hand limit,
6.33, is positive, and 7.67 is also positive.
157
00:22:37,279 --> 00:22:45,659
You think about it; next class I will explain.
My question
158
00:22:45,659 --> 00:22:56,109
to you is: how do I know whether this interval
contains the population mean or not? In the
159
00:22:56,109 --> 00:23:05,529
last but one class
I gave you a similar question also,
160
00:23:05,529 --> 00:23:12,899
but I have not asked about it. This is one question
we will discuss, but you must remind me next
161
00:23:12,899 --> 00:23:19,739
class, because I am giving it to you in the belief
that you will go through the book. Now, the next
162
00:23:19,739 --> 00:23:26,279
question here is: what will happen if the population
standard deviation is not known? You will
163
00:23:26,279 --> 00:23:27,909
use s.
164
00:23:27,909 --> 00:23:35,029
Yes, sigma square will be estimated by s square. The population
standard deviation is not known, which means the
165
00:23:35,029 --> 00:23:39,720
estimate of sigma square is what we call
s square. So, you will be using s square.
166
00:23:39,720 --> 00:23:45,119
Now, here the sample size is 76; it is a large
sample because it is greater than 30. So,
167
00:23:45,119 --> 00:23:52,509
you can still use the z distribution, but there
will be a change here.
168
00:23:52,509 --> 00:23:59,879
What will be your change? You will be using
the same z distribution, but instead of sigma
169
00:23:59,879 --> 00:24:11,229
you are using s. You see, in the given problem
sigma and s are not the same: s is equal to 4
170
00:24:11,229 --> 00:24:17,599
and sigma is equal to 3. So, you will replace
3 by 4 here and
171
00:24:17,599 --> 00:24:23,769
3 by 4 here. So, ultimately what
will happen? The resulting quantity will
172
00:24:23,769 --> 00:24:33,549
be bigger than the earlier one and the interval
will widen. So, if you compute this, you
173
00:24:33,549 --> 00:24:40,840
will find that the interval widens.
Now, another question here is: what will
174
00:24:40,840 --> 00:24:49,289
happen if the population standard deviation is
not known and the sample size is less than 30?
175
00:24:49,289 --> 00:24:52,970
Then you cannot use the z distribution.
176
00:24:52,970 --> 00:25:00,940
What is required? Now, your case is
like this: you will be using the t distribution,
177
00:25:00,940 --> 00:25:11,190
which is for x bar minus mu divided by s by root n. You
will be using the t distribution, and the
178
00:25:11,190 --> 00:25:24,159
reason is that n is less than 30. Please keep in
mind: when you sample from a normal population
179
00:25:24,159 --> 00:25:35,320
and the population variance is known, irrespective
of your sample size you will use the z distribution.
180
00:25:35,320 --> 00:25:42,739
If the population variance is not known but the sample
size is large, you will use the z distribution.
181
00:25:42,739 --> 00:25:48,190
If the population variance is not known and the sample size
is small, but you are sampling from a normal
182
00:25:48,190 --> 00:25:54,539
distribution, then you have to use t. The
rest of the things are the same. Now, in the case
183
00:25:54,539 --> 00:26:00,139
of t, I told you earlier that there will be
degrees of freedom for the t distribution. In
184
00:26:00,139 --> 00:26:04,929
this case it will be n minus 1 degrees of
freedom.
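The t statistic above leads to an interval of the same form as the z case, with s in place of sigma and a t table value with n minus 1 degrees of freedom in place of z. The sketch below assumes Python; the function name `t_interval` is hypothetical, and the small sample (n = 25) is a made-up illustration that reuses x bar = 7 and s = 4 from the crane example. The critical value 2.064 is the tabulated t value for 24 degrees of freedom and alpha by 2 equal to 0.025, passed in by hand because the Python standard library has no t quantile function.

```python
from math import sqrt


def t_interval(x_bar, s, n, t_crit):
    """CI for mu when sigma is unknown and n is small.

    t_crit is t_{n-1, alpha/2}, read from a t table for n - 1 degrees of freedom.
    """
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical small sample: n = 25, so the degrees of freedom are 24.
lower, upper = t_interval(x_bar=7, s=4, n=25, t_crit=2.064)
print(f"{lower:.4f} <= mu <= {upper:.4f}")
```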
185
00:26:04,929 --> 00:26:14,220
So, let us see the t distribution. The z
table I have shown you; the t table I
186
00:26:14,220 --> 00:26:20,929
have not kept here. So, you have to use the t distribution,
are you getting me? What is required then
187
00:26:20,929 --> 00:26:21,330
here?
188
00:26:21,330 --> 00:26:30,019
Your interval will be like this: x bar minus
t n minus 1 alpha by 2, because the t distribution
189
00:26:30,019 --> 00:26:39,590
is also a two-tailed distribution. Its mean value
is here, and on this side and this side it goes to the
190
00:26:39,590 --> 00:26:46,580
two extremes, negative to positive, that is, minus
infinity to plus infinity. Then, s by
191
00:26:46,580 --> 00:26:58,979
root n, less than or equal to mu, less than or equal
to x bar plus t n minus 1 alpha by 2 s by
192
00:26:58,979 --> 00:27:10,989
root n. For the same problem, if you collect observations
numbering less than 30 and your population
193
00:27:10,989 --> 00:27:15,799
variance is not known,
you use the t distribution: find out what
194
00:27:15,799 --> 00:27:21,669
the n value is, what
the alpha value is, and accordingly find
195
00:27:21,669 --> 00:27:28,139
the t n minus 1 alpha by 2 value and put it into this formula. You will get the
196
00:27:28,139 --> 00:27:37,049
confidence interval for the population mean.
I repeat that a confidence interval is always for
197
00:27:37,049 --> 00:27:48,269
the population parameter, okay? So, the same thing
can be applied to population variance also
198
00:27:48,269 --> 00:27:55,559
but in case of population variance your distribution
will be different. What is happening here
199
00:27:55,559 --> 00:27:57,769
in population variance?
200
00:27:57,769 --> 00:28:08,649
Population variance means sigma square. What
do you want? You have a sample
201
00:28:08,649 --> 00:28:17,629
collected and s square is calculated, that is, the
sample variance is computed, and
202
00:28:17,629 --> 00:28:24,799
n is the sample size. So, what do you need
to know? You need a confidence
203
00:28:24,799 --> 00:28:31,809
interval based on this sample data. You want
to know the confidence interval for sigma square.
204
00:28:31,809 --> 00:28:37,129
Again, what do you want? You
got this s square value, but what is the lower
205
00:28:37,129 --> 00:28:42,809
value and what is the upper value? In a similar
manner you have to find out.
206
00:28:42,809 --> 00:28:48,159
So, that means, if I know the distribution,
I can say this area is 1
207
00:28:48,159 --> 00:29:05,090
minus alpha. In the true sense, you want to do
it like this; that is what you want basically.
208
00:29:05,090 --> 00:29:11,899
But the
computed statistic value is s square, and we
209
00:29:11,899 --> 00:29:17,219
have seen under sampling distributions that
it is not s square itself; it is n minus
210
00:29:17,219 --> 00:29:27,669
1 s square by sigma square that follows the
chi square distribution. What will be the
211
00:29:27,669 --> 00:29:33,389
degrees of freedom? n minus 1. So, n minus
1 degrees.
212
00:29:33,389 --> 00:29:39,509
So, you know the chi square distribution; as
we have seen, depending
213
00:29:39,509 --> 00:29:47,009
on the degrees of freedom it will be of different
shape. Say this is the chi square distribution,
214
00:29:47,009 --> 00:29:55,070
and this one is the PDF of chi square,
which in this case is for n minus 1
215
00:29:55,070 --> 00:30:05,309
s square by sigma square. And you want to
find an upper value, that is u, and a lower
216
00:30:05,309 --> 00:30:12,129
value, that is l, for n minus 1 s square by sigma
square.
217
00:30:12,129 --> 00:30:19,109
So, again, if I consider alpha, on
this side
218
00:30:19,109 --> 00:30:27,029
let the area be alpha by 2, and on this one
what will happen? The total is 1, so the area to the right is 1 minus alpha
219
00:30:27,029 --> 00:30:33,690
by 2, and you will be getting chi square 1 minus
alpha by 2. But please keep in mind that chi square
220
00:30:33,690 --> 00:30:40,149
also has degrees of freedom: the subscripts are alpha
by 2 and the degrees of freedom, which are n minus
221
00:30:40,149 --> 00:30:50,429
1, getting me?
So, then what will you write? You will
222
00:30:50,429 --> 00:30:58,389
write n minus 1 s square by sigma square:
it must be less than or equal to chi square alpha
223
00:30:58,389 --> 00:31:09,029
by 2 n minus 1, and it must be greater
than or equal to chi square 1 minus alpha by
224
00:31:09,029 --> 00:31:19,440
2 n minus 1. You see these two: these are the two limits
from an interval point of view, and the total
225
00:31:19,440 --> 00:31:26,509
area in between is
our 100 into 1 minus alpha percent
226
00:31:26,509 --> 00:31:36,749
CI. But what do
you want here in this equation? What do you
227
00:31:36,749 --> 00:31:43,359
want? You want something like this:
l less than or equal to sigma square less
228
00:31:43,359 --> 00:31:50,639
than or equal to u. Can you not manipulate this?
229
00:31:50,639 --> 00:31:55,179
You can easily manipulate.
So, once you manipulate this,
230
00:31:55,179 --> 00:32:00,249
what will happen? You want
to keep, in between the less than or equal to
231
00:32:00,249 --> 00:32:08,879
signs, only sigma square. So, if you manipulate,
you will get this: n minus 1 s
232
00:32:08,879 --> 00:32:20,549
square by chi square alpha by 2 n minus 1
less
233
00:32:20,549 --> 00:32:28,669
than or equal to sigma square, less than or equal
to n minus 1 s square by chi square 1 minus alpha by
234
00:32:28,669 --> 00:32:39,419
2 n minus 1.
You see here, you see the numerator.
235
00:32:39,419 --> 00:32:47,969
Here it is n minus 1 into s square, here also
n minus 1 into s square, the same quantity; the
236
00:32:47,969 --> 00:32:54,109
difference is in the denominator.
Both have n minus 1 into s square in the numerator, but
237
00:32:54,109 --> 00:33:01,109
in the denominator on the left-hand side it is chi square
alpha by 2 n minus 1. What value is this?
238
00:33:01,109 --> 00:33:09,539
It is this value, and on the right-hand side the value is the other one, chi square 1 minus alpha by 2 n minus 1. You can see
239
00:33:09,539 --> 00:33:15,700
that this is the chi square axis. So, definitely
the right-hand value is less than the left-hand value, which
240
00:33:15,700 --> 00:33:21,409
means n minus 1 s square by chi square alpha
by 2 n minus 1 is definitely less than n minus
241
00:33:21,409 --> 00:33:29,399
1 s square by chi square 1 minus alpha by 2 n minus
1, and that is the interval, okay?
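In symbols, the interval for the variance comes from inverting the chi square statement. The block below restates that step, assuming the upper-tail convention used with the lecture's table, in which the subscript is the area to the right of the tabulated value, so the alpha by 2 value is the larger of the two.

```latex
\begin{align*}
& \chi^2_{1-\alpha/2,\,n-1} \le \frac{(n-1)\,s^2}{\sigma^2} \le \chi^2_{\alpha/2,\,n-1} \\
& \frac{(n-1)\,s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2,\,n-1}}
\end{align*}
```

Taking reciprocals reverses the inequalities, which is why the larger chi square value ends up in the lower limit.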
242
00:33:29,399 --> 00:33:41,320
Now, given data, what will you do? Suppose
this is the data; can you not compute this?
243
00:33:41,320 --> 00:33:50,269
A company manufactures worm wheels for worm
gears. One of the critical-to-quality variables
244
00:33:50,269 --> 00:33:55,879
is hardness, which is normally distributed.
The quality control engineer wants to control
245
00:33:55,879 --> 00:34:02,799
its variability. A random sample of 30 worm
wheels is tested, which yielded a mean hardness
246
00:34:02,799 --> 00:34:11,659
of 100, measured using the Brinell hardness
number, with a standard deviation of 5. Develop
247
00:34:11,659 --> 00:34:18,639
a 90 percent confidence interval for the population sigma.
248
00:34:18,639 --> 00:34:28,720
What will you do? What is the n value?
Here n is equal to 30. What is s square?
249
00:34:28,720 --> 00:34:34,120
You have collected a sample of 30 with a standard deviation of 5. So, that
250
00:34:34,120 --> 00:34:41,570
means s square is 5 square, correct? Then what
more do you want? You want
251
00:34:41,570 --> 00:34:47,740
only one thing: you want to know what
alpha is. We are saying 90 percent confidence
252
00:34:47,740 --> 00:34:52,490
interval.
So, alpha is equal to 0.1, that is, 1 minus 0.9,
253
00:34:52,490 --> 00:35:02,630
which means 0.10. So, your alpha by 2 is 0.05.
You want two chi square values:
254
00:35:02,630 --> 00:35:16,780
chi square alpha by 2 n minus 1, where n is
30, so n minus 1 is 29, and alpha
255
00:35:16,780 --> 00:35:21,080
by 2 is 0.05. That
is chi square 0.05 29, which you need to find,
256
00:35:21,080 --> 00:35:29,710
as well as one more value you want
to know, that is chi square 1 minus alpha by
257
00:35:29,710 --> 00:35:43,770
2 with again the same n minus 1. So, that means
chi square 0.95 29.
258
00:35:43,770 --> 00:35:52,030
If you know these two values, then put them into
this equation: s square is known, 25; n minus
259
00:35:52,030 --> 00:36:02,570
1 is 29; for chi square 0.05 29 and chi square
0.95 29, you have to find the chi square values.
260
00:36:02,570 --> 00:36:12,640
See the table. In
this table, our alpha by 2 is
261
00:36:12,640 --> 00:36:22,290
0.05. So, we first want to know chi square
where our degrees of freedom is 29; what is this
262
00:36:22,290 --> 00:36:34,230
value? That value is 42.56. So, chi square
29 0.05,
263
00:36:34,230 --> 00:36:38,000
that is 42.56.
264
00:36:38,000 --> 00:36:49,330
Similarly, chi square 29 0.95
is 17.71. So, once you know these two values,
265
00:36:49,330 --> 00:36:57,140
42.56 and 17.71, your computation
becomes very simple: n minus 1, that is 29,
266
00:36:57,140 --> 00:37:07,850
into s square, which is 25, divided by chi square 0.05 29,
that means 42.56, less than or equal
267
00:37:07,850 --> 00:37:22,870
to sigma square, less than or equal to 29 into
25 by 17.71, as in the formula. Then what is
268
00:37:22,870 --> 00:37:30,610
the result? The ultimate result will be like this:
you will get 17.03 less than or equal
269
00:37:30,610 --> 00:37:40,440
to sigma square less than or equal to 40.94, and
if you go by sigma, then taking the square root of this,
270
00:37:40,440 --> 00:37:49,750
4.13 less than or equal to sigma, less than or equal
to 6.40. That is interval estimation.
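The worm wheel computation can be reproduced with the table values quoted in the lecture. This is a minimal sketch, assuming Python; the function name `variance_interval` is hypothetical, and the two chi square values (42.56 and 17.71 for 29 degrees of freedom) are passed in by hand because the Python standard library has no chi square quantile function.

```python
from math import sqrt


def variance_interval(s_sq, n, chi2_upper, chi2_lower):
    """CI for sigma square: (n-1)s^2/chi2_upper <= sigma^2 <= (n-1)s^2/chi2_lower.

    chi2_upper is chi square_{alpha/2, n-1} (the larger table value);
    chi2_lower is chi square_{1-alpha/2, n-1} (the smaller table value).
    """
    df = n - 1
    return df * s_sq / chi2_upper, df * s_sq / chi2_lower


# Values from the example: s = 5 so s^2 = 25, n = 30, 90 percent confidence.
var_lo, var_hi = variance_interval(s_sq=25, n=30, chi2_upper=42.56, chi2_lower=17.71)
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)

print(f"{var_lo:.2f} <= sigma^2 <= {var_hi:.2f}")
print(f"{sd_lo:.2f} <= sigma <= {sd_hi:.2f}")
```

The limits round to 17.03 and 40.94 for sigma square, and 4.13 and 6.40 for sigma, as in the worked example.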
271
00:37:49,750 --> 00:37:56,780
So, if you really want to know how to
go about interval estimation, please keep
272
00:37:56,780 --> 00:38:05,530
in mind that you must know the statistic. You are
interested in the interval estimate
273
00:38:05,530 --> 00:38:10,860
for a population parameter. First, you must know
the population parameter, you must know the
274
00:38:10,860 --> 00:38:18,250
corresponding sample statistic, and you must also know
the statistic
275
00:38:18,250 --> 00:38:23,790
for which you want to develop the sampling
distribution. If it is x bar, then you are
276
00:38:23,790 --> 00:38:33,250
converting it into z or t; if it is sigma
square, then you are converting it into the appropriate
277
00:38:33,250 --> 00:38:37,760
statistic, n minus 1 times s square by sigma square,
which follows the chi square distribution.
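Written out, the step from this chi square statistic to the interval for sigma square looks as follows. The subscript convention (upper-tail area first, then degrees of freedom) is assumed from the way the table was read in the example, where chi square 0.05, 29 is the larger value.

```latex
\[
\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}
\]
\[
P\left(\chi^2_{1-\alpha/2,\,n-1} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{\alpha/2,\,n-1}\right) = 1-\alpha
\]
\[
\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}
\]
```

With n minus 1 equal to 29, s square equal to 25 and alpha by 2 equal to 0.05, the last line is exactly the computation 29 times 25 by 42.56 and 29 times 25 by 17.71.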
278
00:38:37,760 --> 00:38:44,210
So, unless you know the distribution
as well as the statistic required, you cannot
279
00:38:44,210 --> 00:38:49,340
go about interval estimation. And what is
the advantage of interval estimation? As I
280
00:38:49,340 --> 00:38:53,730
told you, instead of a point estimate you are
getting an interval. And what does it mean?
281
00:38:53,730 --> 00:38:59,280
In the 95 percent case, we say the mean value will lie
within this interval. For example, in
282
00:38:59,280 --> 00:39:05,630
this variance example, what are we saying about
the variance or the standard deviation, whichever
283
00:39:05,630 --> 00:39:08,260
you consider?
Here it is not 95 percent but 90 percent:
284
00:39:08,260 --> 00:39:18,090
in 90 percent of the cases this sigma value will lie in the interval; basically,
we are 90 percent confident that the
285
00:39:18,090 --> 00:39:22,480
population sigma square value is somewhere in
this interval. Yes, it is there; logically, the
286
00:39:22,480 --> 00:39:39,990
point estimate will be inside the
interval. Basically, if you want to understand
287
00:39:39,990 --> 00:39:46,010
interval estimation,
what is the meaning of this? Suppose there
288
00:39:46,010 --> 00:39:53,410
is a population parameter theta, and you have
estimated theta cap from some sample.
289
00:39:53,410 --> 00:39:59,700
Now, let theta cap have a particular sampling distribution.
Then what are you doing? You are basically
290
00:39:59,700 --> 00:40:07,380
finding l and u such that l less than or equal to theta
less than or equal to u holds
291
00:40:07,380 --> 00:40:13,780
with probability 1 minus alpha; you get that range
using the theta cap value.
292
00:40:13,780 --> 00:40:19,490
Now, let us consider that theta
is mu, that is, the mean value. So theta
293
00:40:19,490 --> 00:40:23,640
is mu.
Suppose you have collected a sample of
294
00:40:23,640 --> 00:40:31,290
size n and you have computed x bar, and
you found that the interval around x bar is like
295
00:40:31,290 --> 00:40:38,780
this. This is your sample 1, and for x 1 bar
you found its interval, because once you
296
00:40:38,780 --> 00:40:43,850
collect a sample you know x bar. You can calculate
the interval. Now, suppose the second one is like
297
00:40:43,850 --> 00:40:52,720
this, the third one like this, the fourth
one like this, the fifth one like this,
298
00:40:52,720 --> 00:40:58,660
the sixth one like this, the seventh one,
the eighth one like this, and so on. Suppose
299
00:40:58,660 --> 00:41:06,030
you have collected 20 samples, 20 samples
of size n, and you have computed the intervals.
300
00:41:06,030 --> 00:41:12,260
Suppose this is the true mean. Hypothetically,
we are assuming this is my true mean, and 19
301
00:41:12,260 --> 00:41:18,570
samples contain this mean, that is, their intervals
contain this mean. This is a constant value,
302
00:41:18,570 --> 00:41:23,010
the true mean.
Now, for the first sample the interval contains the mean,
303
00:41:23,010 --> 00:41:30,240
the second sample's interval contains the mean, but the third one
does not. So, out of 20, 19 contain the mean; that means
304
00:41:30,240 --> 00:41:41,320
it is 95 percent. That is the message. That
means, what we are saying is, when you collect a
305
00:41:41,320 --> 00:41:46,270
sample, it does not mean that its interval contains the
mean. You are getting an interval, but
306
00:41:46,270 --> 00:41:53,220
the method as a whole is such that it says
that there is a 95 percent chance that it will
307
00:41:53,220 --> 00:41:59,820
contain the mean; it may not contain it, and that
chance is 5 percent. That is why we say
308
00:41:59,820 --> 00:42:10,450
confidence interval. Actually, we will not
go for 20 samples, but if you did,
309
00:42:10,450 --> 00:42:20,820
you would find it like this. That is the case. Then chi
square, that we have already seen.
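The repeated-sampling picture can be tried out with a small simulation. This is a sketch under assumptions that are not in the lecture: a normal population with a known standard deviation, a z value of 1.96 for 95 percent, and hypothetical numbers for the true mean, sigma and sample size. The function name `coverage` is also made up.

```python
import math
import random


def coverage(true_mean=50.0, sigma=5.0, n=30, num_samples=2000, z=1.96, seed=1):
    """Fraction of samples whose interval x_bar +/- z*sigma/sqrt(n) contains true_mean.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)  # same width for every sample (sigma known)
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # The true mean is fixed; the interval moves from sample to sample.
        if x_bar - half_width <= true_mean <= x_bar + half_width:
            hits += 1
    return hits / num_samples


rate = coverage()
print(f"fraction of intervals containing the true mean: {rate:.3f}")
```

With many samples the fraction should come out close to 0.95, which is the 19 out of 20 of the picture. With only 20 samples the count can easily be 18 or 20 instead of exactly 19.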
310
00:42:20,820 --> 00:42:28,200
Then, the confidence interval for two populations,
for the difference between two population means. What
311
00:42:28,200 --> 00:42:35,080
will you do here? See, the entire procedure
remains the same; you only have to know what
312
00:42:35,080 --> 00:42:42,060
the random variable is, what the statistic is,
and what the sampling distribution is.
313
00:42:42,060 --> 00:42:49,800
If you know these, then your work is over. For example,
we are now creating one variable here.
314
00:42:49,800 --> 00:42:58,030
Suppose you want the difference between two
population means; this is my difference between
315
00:42:58,030 --> 00:43:11,610
two population means. If
population one has mean mu 1 and
316
00:43:11,610 --> 00:43:18,980
population two has mean mu 2, you want to find
a confidence interval for mu 1 minus mu 2. What will
317
00:43:18,980 --> 00:43:29,470
be the l and u values for which the probability
that the interval contains mu 1 minus mu 2
318
00:43:29,470 --> 00:43:38,290
is 1 minus alpha? Getting me?
So, that is the issue. So, if this is the
319
00:43:38,290 --> 00:43:47,110
case, then what do you have at hand? You have only
x 1 bar from population one, that means sample
320
00:43:47,110 --> 00:43:55,780
1, and for population two you have x 2 bar, and you
also have the difference between x 1 bar and x
321
00:43:55,780 --> 00:44:03,350
2 bar. We have seen earlier that x 1 bar, like any
statistic, is a random variable. Now, the difference
322
00:44:03,350 --> 00:44:09,020
between two statistics is also a random variable.
That means you want to know what the mean
323
00:44:09,020 --> 00:44:13,030
value of x 1 bar minus x 2 bar is, and the standard
deviation of x.