1
00:00:17,900 --> 00:00:38,830
Good afternoon. In this lecture, we will discuss
sampling distribution. In the last lecture,
2
00:00:38,830 --> 00:00:54,489
you have seen that population is characterized
by probability distribution with certain parameters.
3
00:00:54,489 --> 00:01:04,320
Also, we have discussed that a sample can be
collected from the population and then estimates
4
00:01:04,320 --> 00:01:18,350
of the population parameters can be found out
in terms of x bar and in terms of s square.
5
00:01:18,350 --> 00:01:36,580
Each one of x bar and s square is known as a statistic.
And x bar and s square collectively you can call
6
00:01:36,580 --> 00:01:46,130
statistics. So, a population parameter and a sample
statistic correspond to each other. When I talk
7
00:01:46,130 --> 00:01:55,390
about the population, I talk about population
parameters; when I talk about the sample, the
8
00:01:55,390 --> 00:02:01,390
sample statistic.
Statistic is the estimate of parameter. Now,
9
00:02:01,390 --> 00:02:16,610
if I say that x bar is an estimate of population
parameter mu, then what is the distribution
10
00:02:16,610 --> 00:02:37,340
of x bar? Similarly, you see there, s square
is the sample variance for the variable x
11
00:02:37,340 --> 00:02:42,579
and it is an estimate of sigma square, that
is population parameter. You may be interested
12
00:02:42,579 --> 00:03:02,519
to know the distribution of s square, getting
me? The distribution of a statistic is known
13
00:03:02,519 --> 00:03:12,439
as sampling distribution. So, when we talk
about sampling distribution we talk about
14
00:03:12,439 --> 00:03:22,239
the distribution of a statistic computed from
the sample.
15
00:03:22,239 --> 00:03:29,120
Now, here x bar is a sample statistic. What
is the distribution of x bar? Similarly, s
16
00:03:29,120 --> 00:03:37,689
square is a sample statistic, what is the
distribution of s square under sampling distribution?
17
00:03:37,689 --> 00:03:48,730
We will discuss these two, but please keep
in mind, in general, if we say theta is equal
18
00:03:48,730 --> 00:03:57,139
to theta 1, theta 2, up to
theta k, there are k parameters in theta.
19
00:03:57,139 --> 00:04:04,779
Then suppose you have the corresponding vector of sample
statistics. Then corresponding to every
20
00:04:04,779 --> 00:04:17,090
population parameter, there will be a sample
statistic. Now, by sampling distribution
21
00:04:17,090 --> 00:04:22,019
we say, what is the distribution of this statistic.
What is the distribution of this statistic,
22
00:04:22,019 --> 00:04:29,290
what is the distribution of this statistic.
It may be univariate, it may be multivariate.
23
00:04:29,290 --> 00:04:35,690
So, if you are interested to know collectively,
what is the distribution of this statistic
24
00:04:35,690 --> 00:04:44,919
vector, a k cross 1 vector, then it will be a
multivariate probability distribution.
25
00:04:44,919 --> 00:04:49,909
Essentially, by sampling distribution you
are talking about probability distribution
26
00:04:49,909 --> 00:04:57,409
only. But the difference is you are talking
about the distribution of not the variable
27
00:04:57,409 --> 00:05:03,139
characterizing the population, but rather the
distribution of the statistic computed from the
28
00:05:03,139 --> 00:05:11,099
sample, clear? For example,
if we consider the same
29
00:05:11,099 --> 00:05:22,930
example again, the profit, you see the 12
months data, months 1 to 12. So, there are
30
00:05:22,930 --> 00:05:30,939
different values of profit. So, you will calculate
the mean profit per month. My question
31
00:05:30,939 --> 00:05:43,780
here is, what is the distribution of x bar?
So, along this line, today’s discussion is
32
00:05:43,780 --> 00:05:46,560
sampling distributions.
33
00:05:46,560 --> 00:05:54,970
And you see the contents: unit normal distribution,
where z stands for unit normal distribution, the
34
00:05:54,970 --> 00:06:02,310
chi-square distribution, t distribution, F
distribution. These are the most popular,
35
00:06:02,310 --> 00:06:12,990
most widely used distributions for
sample statistics. Then we will discuss central
36
00:06:12,990 --> 00:06:20,009
limit theorem. Finally, sampling strategy,
because how do you collect the data? It all
37
00:06:20,009 --> 00:06:26,750
depends on what strategy you will be adopting,
because if the data collection process is wrong
38
00:06:26,750 --> 00:06:34,770
or faulty, the analysis will not give you a good
result. So, you may be wondering
39
00:06:34,770 --> 00:06:35,629
where these distributions come from.
40
00:06:35,629 --> 00:06:44,180
Suppose you have collected one sample. For
example, you collected the height of people
41
00:06:44,180 --> 00:06:48,039
for a particular community, suppose n data points.
42
00:06:48,039 --> 00:06:55,990
You have collected observations 1, 2, dot, dot, dot, n.
Your x values are x 1, x 2, dot, dot,
43
00:06:55,990 --> 00:07:03,370
dot, x n. You are thinking that I have
computed x bar here and s square here. And
44
00:07:03,370 --> 00:07:09,289
once you collect the data,
these are fixed data. So, your
45
00:07:09,289 --> 00:07:16,430
x bar is also a fixed value, s square is also
a fixed value for a particular sample. Then
46
00:07:16,430 --> 00:07:24,400
where is this distribution coming from? The
distribution concept is coming because if
47
00:07:24,400 --> 00:07:32,310
you do the same sampling next time, there
is no guarantee that you will get the same
48
00:07:32,310 --> 00:07:41,569
value of x bar, the same value of s square.
That is why, in statistics, every statistic
49
00:07:41,569 --> 00:07:58,639
is a random
variable.
50
00:07:58,639 --> 00:08:11,370
In this figure, you see that
n samples, sample 1 to sample n, are collected
51
00:08:11,370 --> 00:08:19,499
from a particular population and the sample
size is n in all the cases. Sample size means
52
00:08:19,499 --> 00:08:28,349
the number of observations collected per sample.
And if you compute the mean and variance from
53
00:08:28,349 --> 00:08:34,110
each sample, you will be getting, for the first
sample, x 1 bar and s 1 square, x 2
54
00:08:34,110 --> 00:08:42,120
bar and s 2 square for the second sample, and it is
obvious that all those values may not be the same,
55
00:08:42,120 --> 00:08:52,170
actually they will be different. Suppose, if
I say that sample, sample 1, sample 2 like,
56
00:08:52,170 --> 00:08:57,510
sample n. Then if you calculate your mean and calculate
57
00:08:57,510 --> 00:09:06,070
variance, x 1 bar is the first sample mean,
x 2 bar is the second sample mean, like this
58
00:09:06,070 --> 00:09:15,780
x n bar is the nth sample mean. s 1 square
is the first sample variance and
59
00:09:15,780 --> 00:09:23,130
s 2 square is the second sample variance.
Like this, the nth variance is s n square. Now, as
60
00:09:23,130 --> 00:09:30,500
these are not the same value, what will happen
if you draw a histogram?
61
00:09:30,500 --> 00:09:46,510
You can find out the smallest value of
x bar. Let it be x bar within bracket
62
00:09:46,510 --> 00:09:58,680
1. Originally what happened? Originally
you have x 1 bar, x 2 bar, like this x n bar.
63
00:09:58,680 --> 00:10:09,020
If I say this is a vector x bar, which is this
one, where T stands for transpose, then you
64
00:10:09,020 --> 00:10:16,430
are ordering them from smallest to largest.
What will happen if
65
00:10:16,430 --> 00:10:28,160
we put them in order? Then this
will be x bar bracket 1, x bar bracket 2, up to x bar bracket n. So, x bar bracket 1
66
00:10:28,160 --> 00:10:33,980
is the smallest one and the largest one is x bar bracket n.
67
00:10:33,980 --> 00:10:41,420
So, if you plot these, you will develop
a histogram. You may find out a distribution;
68
00:10:41,420 --> 00:10:56,410
it may be like this, or it may be like this.
So, that means what we are trying to say is
69
00:10:56,410 --> 00:11:06,560
that if you collect samples one after another
from the same population, of the same size, for
70
00:11:06,560 --> 00:11:13,290
the same variable, you will get different
values for that sample statistic. And
71
00:11:13,290 --> 00:11:18,870
that is why the sample statistic
is a random variable. And you have a probability
72
00:11:18,870 --> 00:11:27,300
density function for a random variable and
that density function is the sampling distribution
73
00:11:27,300 --> 00:11:28,820
function.
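The repeated sampling described above can be sketched in Python. This is only an illustration under assumptions that are not in the lecture at this point: the population is taken as normal with a hypothetical mean of 11 and standard deviation of 1.5, each sample has 12 observations, and 2000 samples are drawn. The function name sampling_distribution is made up for this sketch.

```python
import random
import statistics


def sampling_distribution(num_samples, n, mu=11.0, sigma=1.5, seed=0):
    """Draw num_samples samples of size n from a normal population
    (mu and sigma are hypothetical values) and return the list of
    sample means and the list of sample variances."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
        variances.append(statistics.variance(sample))  # divides by n - 1
    return means, variances


means, variances = sampling_distribution(num_samples=2000, n=12)

# Every sample gives a different x bar and s square, so each statistic
# has its own spread: that spread is its sampling distribution.
print("smallest and largest x bar:", min(means), max(means))
print("average of the x bar values:", statistics.mean(means))
print("average of the s square values:", statistics.mean(variances))
```

Sorting the list of means gives the ordered values from smallest to largest, and a histogram of that list is an empirical picture of the sampling distribution of x bar.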
74
00:11:28,820 --> 00:11:41,500
Now, you have to know what the distributions are
that come under sampling distribution. I
75
00:11:41,500 --> 00:11:47,000
told you earlier that there are four most
popular sampling distributions: z distribution,
76
00:11:47,000 --> 00:11:54,220
chi square distribution, t distribution and
F distribution. And this concept is a very,
77
00:11:54,220 --> 00:12:00,760
very vital concept, I am telling you, and very,
very important at a later stage, when we talk
78
00:12:00,760 --> 00:12:03,370
about any multivariate model. For example,
for multiple regression, there are several
79
00:12:03,370 --> 00:12:09,460
beta coefficients. You will be estimating
these regression
80
00:12:09,460 --> 00:12:14,580
coefficients through sample data.
The estimates are not the parameters, they
81
00:12:14,580 --> 00:12:22,670
are basically statistics.
So, each estimate will have a distribution.
82
00:12:22,670 --> 00:12:33,890
How do I know what distribution it follows?
If you do not understand this, you will face
83
00:12:33,890 --> 00:12:47,200
problems there. So, this is, I can say,
one of the most fundamental concepts and
84
00:12:47,200 --> 00:12:56,910
you should be thorough about
this concept. Now, let us concentrate on what
85
00:12:56,910 --> 00:12:58,330
is z distribution.
86
00:12:58,330 --> 00:13:08,410
I am sure that you will not find it a difficult
one, because all of you know that, on the left
87
00:13:08,410 --> 00:13:14,570
hand side, the figure you see is the
normal distribution. The probability density
88
00:13:14,570 --> 00:13:17,870
function is f(x) equal to 1 by…
89
00:13:17,870 --> 00:13:26,300
I am writing here once more: f(x) equal to
1 by root over 2 pi sigma square, into e to the
90
00:13:26,300 --> 00:13:37,790
power minus half of x minus mu by sigma, whole square.
And x varies from minus infinity to plus infinity.
91
00:13:37,790 --> 00:13:48,120
Now, here x is a random variable. This is
the original random variable,
92
00:13:48,120 --> 00:13:56,430
the one you have observed, correct?
Now, let us transform x in this manner. Let
93
00:13:56,430 --> 00:14:10,040
z be a transformed variable of x, which is x
minus mu by sigma. So, you have observed x,
94
00:14:10,040 --> 00:14:16,880
you are creating another variable z which
uses x as well as the population
95
00:14:16,880 --> 00:14:22,430
parameters mu and sigma. And this
is what is known as the standardized
96
00:14:22,430 --> 00:14:39,250
variable.
So, when the mean is subtracted from a variable
97
00:14:39,250 --> 00:14:46,670
and the resultant quantity is
divided by the standard deviation, that is
98
00:14:46,670 --> 00:14:53,510
known as a standardized variable. So, that means
by standardized variable, we mean the
99
00:14:53,510 --> 00:15:06,300
variable minus its expected value, that is
the mean, divided by sigma.
100
00:15:06,300 --> 00:15:17,100
Sigma is basically the square root of the expected value
of x minus mu, whole square. We take the square
101
00:15:17,100 --> 00:15:24,880
root because we are considering the standard
deviation, correct? Now,
102
00:15:24,880 --> 00:15:41,010
what will be the mean value of z? It is 0. How?
Basically, you can write in this manner.
103
00:15:41,010 --> 00:15:45,470
What is the mean? Suppose
I write the mean.
104
00:15:45,470 --> 00:15:53,250
The mean is the expected value of the variable z. Here
we are talking about mu z, that means the expected
105
00:15:53,250 --> 00:16:03,040
value of x minus mu by sigma, that means 1
by sigma into expected value of x minus expected
106
00:16:03,040 --> 00:16:10,180
value of mu. And see, the expected value of a constant
is the constant. So, expected value of x, which is mu,
107
00:16:10,180 --> 00:16:22,350
minus mu is equal to 0. So, that is
why z is a random variable whose mean
108
00:16:22,350 --> 00:16:32,700
value is 0. What will happen to its standard
deviation? You will get 1 in the same fashion.
109
00:16:32,700 --> 00:16:38,030
You can find out what the standard
deviation is. For the standard deviation, first find the
110
00:16:38,030 --> 00:16:44,200
variance. You find out
first the variance of z, which is expected value
111
00:16:44,200 --> 00:16:54,720
of z minus E of z,
whole square. The expected value of z, already
112
00:16:54,720 --> 00:17:02,240
you got, is 0. So, this is basically expected value
of z square. And z square
113
00:17:02,240 --> 00:17:16,669
is nothing but x minus mu by sigma, whole square.
So, suppose the variance
114
00:17:16,669 --> 00:17:24,790
of x is sigma square. The variance of a x,
where a is a constant, will be a square sigma
115
00:17:24,790 --> 00:17:41,660
square. Here a is 1 by sigma. So, that
means you can write this one as 1 by sigma square
116
00:17:41,660 --> 00:17:53,110
into expected value of x minus mu, whole square.
What is this expected value of x minus mu, whole
117
00:17:53,110 --> 00:18:00,350
square? That is sigma square. So, you get
sigma square by sigma square, which is 1. If
118
00:18:00,350 --> 00:18:08,990
this is the case, now you see what happens
in this equation. Come back to this slide
119
00:18:08,990 --> 00:18:15,750
again. What have we put? We put z equal to
x minus mu by sigma, and our resultant
120
00:18:15,750 --> 00:18:21,510
equation is the probability density function
for z, which is 1 by root over 2 pi sigma square, where sigma square
121
00:18:21,510 --> 00:18:27,560
is 1. So, it is 2 pi into 1, that means 1 by square root of 2 pi,
122
00:18:27,560 --> 00:18:36,320
into e to the power minus half z square. So, this is
the conversion of any variable. For example,
123
00:18:36,320 --> 00:18:45,970
this normal variable to its unit normal distribution.
So, z distribution is also known as unit normal
124
00:18:45,970 --> 00:18:56,570
distribution, because its mean value is 0
and standard deviation is 1. What is the use?
125
00:18:56,570 --> 00:19:06,390
Why should we convert to unit normal distribution?
The reason is that even if there are many normal variables,
126
00:19:06,390 --> 00:19:13,300
once you standardize any of those variables,
it will be a unit normal. So, you require
127
00:19:13,300 --> 00:19:19,000
only one table, the unit normal
distribution table, and you are
128
00:19:19,000 --> 00:19:27,160
able to use that table in
different situations, even though the variables'
129
00:19:27,160 --> 00:19:29,980
means and standard deviations differ.
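As a rough numerical check of the result that the standardized variable has mean 0 and standard deviation 1, one can standardize simulated data. This is a sketch only: the population values mu = 50 and sigma = 8 are hypothetical, chosen just to show that the outcome does not depend on them, and the helper name standardize is made up here.

```python
import random
import statistics


def standardize(values, mu, sigma):
    """Return z = (x - mu) / sigma for every x, using population parameters."""
    return [(x - mu) / sigma for x in values]


rng = random.Random(1)
mu, sigma = 50.0, 8.0  # hypothetical population mean and standard deviation
x = [rng.gauss(mu, sigma) for _ in range(50000)]
z = standardize(x, mu, sigma)

# The mean of z should be close to 0 and its standard deviation close to 1.
print("mean of z:", statistics.mean(z))
print("standard deviation of z:", statistics.pstdev(z))
```

Changing mu and sigma to any other values (with sigma positive) should leave the two printed numbers close to 0 and 1, which is why a single unit normal table is enough.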
130
00:19:29,980 --> 00:19:43,670
So, this is our example. We are
considering the profit and showing
131
00:19:43,670 --> 00:19:44,510
you that.
132
00:19:44,510 --> 00:19:53,810
What is the use of standard normal distribution
here? Assume the variable profit per month
133
00:19:53,810 --> 00:19:59,050
is normally distributed with a mean of rupees
11 million and a standard deviation of rupees
134
00:19:59,050 --> 00:20:11,190
1.5 million. What is the probability that
the profit per month will be within rupees 12.5 million?
135
00:20:11,190 --> 00:20:16,240
How to go about it? You see the left hand
side figure, it is in the original variable.
136
00:20:16,240 --> 00:20:23,900
The right hand side is the unit normal one. Now,
if you see the x axis in the bottom one,
137
00:20:23,900 --> 00:20:30,270
you see the mean is the middle value.
And at one standard deviation, two
138
00:20:30,270 --> 00:20:34,450
standard deviations, three standard deviations,
on both sides, that demarcation is there.
139
00:20:34,450 --> 00:20:41,270
Now, on the z axis sigma is equal to 1, so
the mean will be 0 and 1 into 1, that will
140
00:20:41,270 --> 00:20:51,190
be 1, 2, 3 and minus 1, minus 2, minus 3 like this.
So, this line is the z line. Now,
141
00:20:51,190 --> 00:20:58,460
you are interested to know the probability
of profit less than or equal to 12.5 in rupees
142
00:20:58,460 --> 00:21:06,720
million. And if I convert 12.5 into a z value, 12.5 minus 11 by 1.5, it
is coming 1. So, probability of z less than or equal
143
00:21:06,720 --> 00:21:16,800
to 1, that means this one, this is z equal
to 1. The probability
144
00:21:16,800 --> 00:21:25,460
of z less than or equal to 1 will be the area
under the normal distribution curve from minus
145
00:21:25,460 --> 00:21:33,000
infinity to that z value, because that is
within this. Now, you have the standard normal
146
00:21:33,000 --> 00:21:39,110
table. What will you do once
you get the z value? You go to the table.
147
00:21:39,110 --> 00:21:42,730
So, our z value is 1.
148
00:21:42,730 --> 00:21:46,340
You see that the shaded portion is
the probability. This area is the probability,
149
00:21:46,340 --> 00:21:56,410
whose value is 0.8413. So, you are able to
find out the probability that your profit
150
00:21:56,410 --> 00:21:59,120
will be within this.
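The table lookup in this profit example can be reproduced with Python's standard library. This is a small sketch rather than part of the lecture: it uses statistics.NormalDist in place of the printed unit normal table, with the mean of 11 and standard deviation of 1.5 (rupees million) from the example.

```python
from statistics import NormalDist

mu, sigma = 11.0, 1.5   # profit per month, in rupees million
limit = 12.5            # we want P(profit <= 12.5)

# Step 1: convert the limit to a z value.
z = (limit - mu) / sigma

# Step 2: area under the unit normal curve from minus infinity to z.
p_from_z = NormalDist().cdf(z)

# Same probability computed directly on the original variable.
p_direct = NormalDist(mu, sigma).cdf(limit)

print("z value:", z)
print("P(z <= 1) from unit normal:", round(p_from_z, 4))
print("P(profit <= 12.5) directly:", round(p_direct, 4))
```

Both routes give the same area, which is the point of standardizing: the unit normal alone is enough for any mean and standard deviation.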
151
00:21:59,120 --> 00:22:15,940
Now, the second sampling distribution
is chi square. Do you have any idea when
152
00:22:15,940 --> 00:22:24,320
you use chi square distribution? You find
that if you go through a standard book,
153
00:22:24,320 --> 00:22:29,250
a very good book like the Johnson and Wichern book,
you will find that, almost everywhere, when
154
00:22:29,250 --> 00:22:37,910
they talk about the constant there, the statistical
distribution used is the chi square distribution.
155
00:22:37,910 --> 00:22:43,630
But why is chi square distribution used instead
of t distribution, or instead of z distribution
156
00:22:43,630 --> 00:22:50,309
or instead of F distribution or instead of
any other distribution? What is the basis?
157
00:22:50,309 --> 00:22:54,940
That means we must know what chi
square distribution is, how it is generated and
158
00:22:54,940 --> 00:23:02,110
when it will be used.
Now, you see this slide. Here what we are seeing is
159
00:23:02,110 --> 00:23:10,710
that there are z 1, z 2, up to z k. What is z? z is
unit normal. So, suppose you have collected
160
00:23:10,710 --> 00:23:21,790
k observations, and those k unit
normal observations are z 1, z 2, up to z k,
161
00:23:21,790 --> 00:23:28,700
and you are creating one variable, which is
the sum of the squares of the unit normal
162
00:23:28,700 --> 00:23:37,309
variables. What I mean to say here is,
suppose you have collected k data points,
163
00:23:37,309 --> 00:23:40,630
1, 2, 3 up to k.
164
00:23:40,630 --> 00:23:51,100
And you have x values x 1, x 2, x 3, up to
x k, and you know the z values: z equal to
165
00:23:51,100 --> 00:23:57,210
x minus mu by sigma. Assuming that mu and
sigma are the population parameters, you are
166
00:23:57,210 --> 00:24:07,550
getting z 1, z 2, z 3, up to z k. Now, you are
making the squares: z 1 square, z 2 square, z
167
00:24:07,550 --> 00:24:17,000
3 square, like this z k square. Now, if you
take the sum of these z i squares, i equal to 1 to k,
168
00:24:17,000 --> 00:24:24,900
you get a quantity. This is also a statistic;
this is the sum of the squares
169
00:24:24,900 --> 00:24:34,000
of the values of the unit
normal variable. This quantity is the y that
170
00:24:34,000 --> 00:24:45,030
you have created; it is shown here as y,
and you can rename it, no problem.
171
00:24:45,030 --> 00:24:53,350
So, this quantity follows chi square distribution,
with how many degrees of freedom? k degrees
172
00:24:53,350 --> 00:24:58,870
of freedom. So, what is the essential learning?
Here, our learning is: suppose I know that
173
00:24:58,870 --> 00:25:09,320
there is a normal
variable x, you collected data on it, you
174
00:25:09,320 --> 00:25:17,510
converted it to standard normal, each of the observations is squared and you have taken
175
00:25:17,510 --> 00:25:26,320
the sum, and that sum you use for different
purposes. That sum will follow a certain
176
00:25:26,320 --> 00:25:35,300
distribution, and that distribution is chi square
distribution. Remember this one.
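The construction just described, squaring k unit normal values and adding them, can be tried out numerically. The sketch below is an illustration under assumptions: k = 4 and 20000 repetitions are arbitrary choices, the draws are taken as independent, and the function name chi_square_draws is made up. It compares the simulated mean and variance with k and 2k, the known mean and variance of a chi square variable with k degrees of freedom.

```python
import random
import statistics


def chi_square_draws(k, num_draws, seed=2):
    """Each draw is y = z_1^2 + ... + z_k^2 with independent unit normal z_i."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(num_draws)
    ]


k = 4  # degrees of freedom, an arbitrary choice for the illustration
y = chi_square_draws(k, num_draws=20000)

# For chi square with k degrees of freedom: mean = k, variance = 2k.
print("simulated mean:", statistics.mean(y), "expected about", k)
print("simulated variance:", statistics.variance(y), "expected about", 2 * k)
```

A histogram of y for different values of k would show the shape changing with the degrees of freedom.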
177
00:25:35,300 --> 00:25:40,500
You may ask why we usually
start with normal distribution. With normal
178
00:25:40,500 --> 00:25:46,400
distribution, plenty of things in the real
world, most of the things, can be treated as or converted
179
00:25:46,400 --> 00:25:55,620
to normally distributed variables. In most of the cases
that is the starting point. Now, for the purpose
180
00:25:55,620 --> 00:26:01,360
of your analysis, the purpose of your model building,
whatever the purpose for which you want to
181
00:26:01,360 --> 00:26:09,630
use it, what is required? If you require
the sum of the squares of the variable values,
182
00:26:09,630 --> 00:26:15,750
then what will you do when you collect one
sample? You have to have the distribution
183
00:26:15,750 --> 00:26:26,800
of that sum, and that is chi square distribution.
Any question? Understood fully or not? Any
184
00:26:26,800 --> 00:26:31,960
question?
The general form of chi square distribution
185
00:26:31,960 --> 00:26:39,470
is like this: f(y) is 1 by 2 to the power
k by 2 into gamma of k by 2, into y to the power k by 2 minus 1, into e to the power minus y by 2, for y greater than 0. This is the
186
00:26:39,470 --> 00:26:46,880
case, and the mean value of chi square is
k, which is the degrees of freedom, and the variance
187
00:26:46,880 --> 00:26:56,000
is 2 times the degrees of freedom. And in this figure
you see the probability density
188
00:26:56,000 --> 00:27:04,170
function for a chi square variable. Now,
the x axis is chi square. What is happening
189
00:27:04,170 --> 00:27:09,929
here? 1 to 8, these are the values, and
you will be getting different shapes of the chi
190
00:27:09,929 --> 00:27:17,179
square density function.
When your k is equal to 1, as well
191
00:27:17,179 --> 00:27:23,150
as k equal to 2, it looks like an exponential
distribution, but slowly that shape will change.
192
00:27:23,150 --> 00:27:28,710
So, degrees of freedom plays an important
role in chi square distribution. In z distribution,
193
00:27:28,710 --> 00:27:37,570
what is the degree of freedom required? No
degrees of freedom. We have not discussed
194
00:27:37,570 --> 00:27:44,840
anything related to degrees of freedom in
z distribution. When you see the unit normal distribution
195
00:27:44,840 --> 00:27:51,570
table, there is no degrees of
freedom column. So, that means z distribution
196
00:27:51,570 --> 00:27:59,299
is basically not affected by the degrees
of freedom available with the data set.
197
00:27:59,299 --> 00:28:04,210
When you talk about chi
square distribution, please keep in mind that the
198
00:28:04,210 --> 00:28:12,059
degrees of freedom come into consideration.
And chi square is nothing but the sum of normal squares.
199
00:28:12,059 --> 00:28:16,110
My x is normally distributed, I standardize it and take
the sum of squares
200
00:28:16,110 --> 00:28:22,299
of the z values, and that follows chi square distribution.
So, you may be wondering where is the
201
00:28:22,299 --> 00:28:35,090
use? Now, see, I told you we want to find out
the distribution of a sample statistic, getting
202
00:28:35,090 --> 00:28:47,559
me? Now, one of the sample statistics is s
square, the variance, yes or no? Very much. What
203
00:28:47,559 --> 00:28:54,940
is the distribution of s square? And how do we
know what will be the distribution
204
00:28:54,940 --> 00:29:11,740
of x bar? See, x bar is 1 by n into the sum total of
x i, a normally distributed variable. So,
205
00:29:11,740 --> 00:29:24,420
if x is normally distributed, x bar also follows
normal distribution, and we standardize it by considering x bar
206
00:29:24,420 --> 00:29:30,160
minus expected value of x bar by the standard deviation
of x bar.
207
00:29:30,160 --> 00:29:37,429
Then you get unit normal distribution. I will not tell
anything related to this computation now. Later
208
00:29:37,429 --> 00:29:45,809
on I will tell you about this from the central
limit theorem. You know what the distribution will be,
209
00:29:45,809 --> 00:29:51,179
but what I mean to say here is
that, irrespective of the x bar
210
00:29:51,179 --> 00:30:01,010
value, if you collect data from a normal distribution,
then x bar will be normally distributed, getting
211
00:30:01,010 --> 00:30:04,480
me? Now, what about s square?
What is s square?
212
00:30:04,480 --> 00:30:15,000
s square is 1 by n minus 1 into the sum total, for i
equal to 1 to n, of x i minus x bar, whole square. Can
213
00:30:15,000 --> 00:30:20,770
you find out any similarity here? What
have you done? I say x is normally distributed,
214
00:30:20,770 --> 00:30:30,700
x bar is also normally distributed, and then you
are squaring this difference. So, when a normal
215
00:30:30,700 --> 00:30:39,170
variable is squared and you take the summation
and do a little bit of manipulation using the
216
00:30:39,170 --> 00:30:47,910
population variance, what will you get? You
will get standard normal z and the summation of
217
00:30:47,910 --> 00:30:52,780
squares of standard normal z. Is it not correct?
218
00:30:52,780 --> 00:31:02,950
You see this slide. What have I shown here?
s square is nothing but the formula
219
00:31:02,950 --> 00:31:08,690
we have given. Take x 1 minus x bar, whole square, plus
x 2 minus x bar, whole square, plus x 3 minus x bar, whole
220
00:31:08,690 --> 00:31:18,900
square, and so on, divided by sigma square. This sigma square
is the population variance. Now, this quantity
221
00:31:18,900 --> 00:31:27,970
is coming like z 1 square plus z 2 square
plus, and so on, up to z n square. So, it is the sum total
222
00:31:27,970 --> 00:31:36,799
of normal squares, the squares of the variable values.
So, it follows chi square distribution; that is
223
00:31:36,799 --> 00:31:45,530
why, when we go for interval estimation
of variance, we will use chi square distribution.
224
00:31:45,530 --> 00:31:51,429
It should come to your mind why,
for x bar, we are using normal distribution,
225
00:31:51,429 --> 00:31:55,830
but for s square we are using chi square distribution.
That means the crux of the matter is here,
226
00:31:55,830 --> 00:32:05,750
this is the development,
fantastic. Then you may be wondering why
227
00:32:05,750 --> 00:32:12,179
n minus 1? Already we have seen that while
calculating s square, one degree of freedom is lost.
228
00:32:12,179 --> 00:32:20,760
The same thing continues and ultimately
n minus 1 into s square by sigma square, this
229
00:32:20,760 --> 00:32:35,419
quantity follows chi square distribution with
n minus 1 degrees of freedom. Any question
230
00:32:35,419 --> 00:32:46,620
on this? No question?
I think it is obvious now. So, keep this in mind,
231
00:32:46,620 --> 00:32:53,740
because many a time you will be
using this type of derived quantity. You will
232
00:32:53,740 --> 00:32:57,280
square and then add, and a distribution is required,
but you do not know what distribution you
233
00:32:57,280 --> 00:33:04,710
will be using. Just follow this concept. Here
you follow chi square; in some other case it
234
00:33:04,710 --> 00:33:10,880
will be another distribution.
This is the use. For example, if we
235
00:33:10,880 --> 00:33:18,130
consider the profit variance obtained
from the twelve months data, then, assuming the population
236
00:33:18,130 --> 00:33:24,720
variance is known, you can find out the distribution
of the variance computed from the
237
00:33:24,720 --> 00:33:34,620
twelve months data. All the uses of
this chi square distribution will be revealed
238
00:33:34,620 --> 00:33:42,559
in subsequent lectures, but the concept remains
the same. When we use chi square distribution,
239
00:33:42,559 --> 00:33:46,830
this is the concept, okay?
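The statement that n minus 1 into s square by sigma square follows chi square with n minus 1 degrees of freedom can be checked by simulation. This is a sketch under assumptions: samples of size n = 12 are drawn from a normal population whose standard deviation is taken as 1.5 (a hypothetical value for this illustration), and the function name scaled_variance_draws is made up. The check uses the fact that a chi square variable with n - 1 degrees of freedom has mean n - 1 and variance 2(n - 1).

```python
import random
import statistics


def scaled_variance_draws(n, sigma, num_samples, mu=0.0, seed=3):
    """For each sample of size n from N(mu, sigma), compute (n - 1) * s^2 / sigma^2."""
    rng = random.Random(seed)
    draws = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        s_square = statistics.variance(sample)  # sample variance, divisor n - 1
        draws.append((n - 1) * s_square / sigma ** 2)
    return draws


n = 12  # for example, twelve months of data
q = scaled_variance_draws(n, sigma=1.5, num_samples=20000)

# Chi square with n - 1 = 11 degrees of freedom has mean 11 and variance 22.
print("simulated mean:", statistics.mean(q), "expected about", n - 1)
print("simulated variance:", statistics.variance(q), "expected about", 2 * (n - 1))
```

The simulated mean comes out near n - 1 rather than n, which is the lost degree of freedom showing up in the numbers.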
240
00:33:46,830 --> 00:33:53,070
Now, the chi square table. When you went to the z
table, please keep in mind that in the z table
241
00:33:53,070 --> 00:33:59,700
there is no degrees of freedom column. Chi
square means there will be degrees of freedom.
242
00:33:59,700 --> 00:34:06,960
So, there the first column itself is degrees
of freedom, 1 to 15 here, and it can go on up to
243
00:34:06,960 --> 00:34:13,429
infinity, because chi square can go
up to that level. Then there are different
244
00:34:13,429 --> 00:34:19,919
probability values here, and for different
probability values the table gives your chi square
245
00:34:19,919 --> 00:34:26,529
value. So, how to use this table? Any idea?
246
00:34:26,529 --> 00:34:36,999
For example, let the chi square distribution
PDF be like this: this axis is chi square, this is the function
247
00:34:36,999 --> 00:34:50,720
of chi square, the PDF. Let the distribution
look like this. Now, I want to know what is
248
00:34:50,720 --> 00:35:09,579
the chi square value for the probability on
this side. Let it be alpha, which is 0.05. From
249
00:35:09,579 --> 00:35:16,630
this table, can you find out this
chi square value? What will be this chi square for
250
00:35:16,630 --> 00:35:24,059
0.05, understand?
Because these things you will be requiring
251
00:35:24,059 --> 00:35:31,619
later on. You have to see this chi square
table frequently. What I mean to say is, I know
252
00:35:31,619 --> 00:35:38,130
the right hand side probability
here, for which I want to know what will be
253
00:35:38,130 --> 00:35:44,809
the chi square value. If you are given only
this, chi square with probability 0.05, you
254
00:35:44,809 --> 00:35:50,970
cannot find out the
value from here, because which degrees of freedom is
255
00:35:50,970 --> 00:35:56,869
it, 1 degree of freedom or 15 degrees of freedom
or 120 degrees of freedom?
256
00:35:56,869 --> 00:36:04,989
So, another quantity should be given here, which
is the degrees of freedom. So, suppose k equal
257
00:36:04,989 --> 00:36:12,269
to 1, then what is this value? k equal to 1,
probability is 0.05. So, your chi square value
258
00:36:12,269 --> 00:36:27,529
is 3.841. So, this value of mine is 3.841. Now,
what does this probability mean?
259
00:36:27,529 --> 00:36:37,549
This one is the probability
that the chi square value,
260
00:36:37,549 --> 00:36:45,210
the value you compute,
will be greater than
261
00:36:45,210 --> 00:36:52,130
some value, getting me?
So, whatever value we have, we had basically
262
00:36:52,130 --> 00:37:03,180
k equal to 1 and some value, and this is that value.
We will say the probability that the computed chi square
263
00:37:03,180 --> 00:37:09,650
value will be greater than this value
is the area on this side, and the probability that it will
264
00:37:09,650 --> 00:37:20,509
be less than this value is the area on the
other side. Now, if
265
00:37:20,509 --> 00:37:29,920
your degrees of freedom is 6 and you want the
right hand side probability to be 0.995,
266
00:37:29,920 --> 00:37:34,009
then your chi square value will be 0.676,
which will be somewhere here.
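The two table lookups above (right hand side probability 0.05 with 1 degree of freedom, and 0.995 with 6 degrees of freedom) can be reproduced without a printed table. The sketch below is one possible way, not the lecture's method: it computes the chi square cumulative probability from the standard series for the regularized incomplete gamma function and then searches for the critical value by bisection. The function names chi2_cdf and chi2_critical are made up for this illustration; a statistics library such as SciPy offers ready-made equivalents.

```python
import math


def chi2_cdf(x, k):
    """P(Y <= x) for a chi square variable Y with k degrees of freedom,
    using the series expansion of the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, t = k / 2.0, x / 2.0
    term, total, n = 1.0, 1.0, 1
    while term > 1e-15 * total:
        term *= t / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(t) - t - math.lgamma(a + 1.0))


def chi2_critical(alpha, k):
    """Chi square value whose right hand side (upper tail) probability is alpha."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, k) > alpha:  # widen until the tail is small enough
        hi *= 2.0
    for _ in range(100):                  # bisection on the upper tail area
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, k) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print("alpha = 0.05,  k = 1:", round(chi2_critical(0.05, 1), 3))
print("alpha = 0.995, k = 6:", round(chi2_critical(0.995, 6), 3))
```

As with the table, both the probability and the degrees of freedom must be supplied; the probability alone does not determine the chi square value.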
267
00:37:34,009 --> 00:37:58,440
So, you are talking about the total probability on this side. Now, let us see t distribution.
268
00:37:58,440 --> 00:38:05,470
When will you use t distribution? First you
have to understand when we will be using t
269
00:38:05,470 --> 00:38:09,999
distribution, then we will see the uses of
t distribution. Then again I will show you
270
00:38:09,999 --> 00:38:18,700
the table, and how to use the table to find the different
critical values for t distribution. You see
271
00:38:18,700 --> 00:38:24,009
here, z and chi square k are independent unit normal
and chi square variables.
272
00:38:24,009 --> 00:38:30,569
So, we are now considering two variables: one
is a normal variable, that is unit normal, that
273
00:38:30,569 --> 00:38:41,910
is z; the other one we are talking about is chi
square. Let it be chi square with k degrees
274
00:38:41,910 --> 00:38:51,690
of freedom, fine? Now, you create another
variable which is t, which is z by square root
275
00:38:51,690 --> 00:39:06,269
of chi square k divided by k. Now, in your
development process, suppose when you are developing
276
00:39:06,269 --> 00:39:13,890
a model, you find that you have created some variable which is of this form, that is, normal
277
00:39:13,890 --> 00:39:21,670
by square root of chi square divided by its degrees of freedom. Then this quantity
278
00:39:21,670 --> 00:39:32,440
t will follow t distribution with k degrees
of freedom. t distribution with k degrees
279
00:39:32,440 --> 00:39:41,150
of freedom, that means in the t distribution case
also degrees of freedom will come.
280
00:39:41,150 --> 00:39:50,049
Degrees of freedom is very important for chi
square distribution, for t distribution also.
281
00:39:50,049 --> 00:40:00,190
And you see the shape of the t distribution,
it is similar to the normal distribution. When
282
00:40:00,190 --> 00:40:09,289
the degrees of freedom tend to
infinity, it exactly matches the
283
00:40:09,289 --> 00:40:21,089
normal distribution. What is the mean value
of t distribution? You are not getting? See
284
00:40:21,089 --> 00:40:30,029
z: what is the mean value of z? It is 0, so the
mean of the t value is also coming as 0. So,
285
00:40:30,029 --> 00:40:37,549
what is happening here? The t distribution
mean value is 0 and the standard deviation value
286
00:40:37,549 --> 00:40:45,630
is square root of k by k minus 2, and k must
be greater than 2, and you see
287
00:40:45,630 --> 00:40:52,640
here in the diagram itself that the mean value is 0.
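As a rough check of the construction just described, the sketch below builds t as a standard normal divided by the square root of an independent chi square over its degrees of freedom, and compares the simulated mean and standard deviation with 0 and square root of k by k minus 2. The function names, the choice k = 10, the number of draws and the seed are all assumptions made for illustration; the chi square variable is generated as a sum of k squared standard normals.

```python
import math
import random

def simulate_t(k, n_draws, seed=0):
    """Draw n_draws values of Z / sqrt(chi2_k / k) using only standard normals."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)
        # Chi square with k degrees of freedom: sum of k squared standard normals.
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        draws.append(z / math.sqrt(chi2 / k))
    return draws

def mean_and_sd(values):
    """Sample mean and sample standard deviation (n - 1 in the denominator)."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(var)

k = 10                                   # hypothetical degrees of freedom
t_mean, t_sd = mean_and_sd(simulate_t(k, 50_000))
theory_sd = math.sqrt(k / (k - 2))       # valid only for k > 2
print(t_mean, t_sd, theory_sd)
```

With a large number of draws the simulated mean should sit near 0 and the simulated standard deviation near the theoretical value, which is slightly larger than 1 because the t distribution has heavier tails than the normal.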
288
00:40:52,640 --> 00:41:11,799
What is this? What is its use? Which one, this?
Here it basically tells you what is the
289
00:41:11,799 --> 00:41:15,869
parameter of t distribution. What is the parameter
of normal distribution? Mu and sigma square.
290
00:41:15,869 --> 00:41:34,099
What is the parameter of t distribution? That
is k. What is here? You see, everywhere
291
00:41:34,099 --> 00:41:43,279
k is there. You see the function: gamma of k plus
1 by 2, then k. Here k, everywhere k is there. So,
292
00:41:43,279 --> 00:41:51,119
that means the parameter of this distribution,
the t distribution, is k, the degrees of freedom, and you
293
00:41:51,119 --> 00:41:57,430
are getting, that is why, mean 0 and sigma
square in terms of k. Let it be.
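For reference, the density being pointed at on the slide is presumably the standard PDF of the t distribution with k degrees of freedom, written out below. The slide itself is not in the transcript, so this is the textbook form rather than a copy of it.

```latex
f(t) \;=\; \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\;\Gamma\!\left(\frac{k}{2}\right)}
\left(1 + \frac{t^{2}}{k}\right)^{-\frac{k+1}{2}},
\qquad -\infty < t < \infty,
\qquad
E(t) = 0 \;(k > 1),
\qquad
\operatorname{Var}(t) = \frac{k}{k-2} \;(k > 2).
```

Apart from the constant pi and the variable t, the only quantity that appears is k, which is why k is the single parameter of the distribution.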
294
00:41:57,430 --> 00:42:09,930
So, you will know the parameter from the PDF
only. In the t distribution PDF, all other
295
00:42:09,930 --> 00:42:17,029
values are constants; t denotes the random
variable. For the other things, you
296
00:42:17,029 --> 00:42:23,819
see that pi is a constant, and that is
that. Otherwise only k is there everywhere,
297
00:42:23,819 --> 00:42:34,589
the gamma function is at the beginning, getting me?
Now, come back to the use.
298
00:42:34,589 --> 00:42:46,460
When do you use t distribution? Now, let us see that the t is like this: x bar
299
00:42:46,460 --> 00:43:02,299
minus mu by s by square root of n. You have
created this type of quantity. How will it
300
00:43:02,299 --> 00:43:03,960
come, or why will it come?
301
00:43:03,960 --> 00:43:09,849
How will it come? I am showing you one thing.
I am showing you what happens
302
00:43:09,849 --> 00:43:19,769
when we talk about the distribution of x bar.
So, x bar is 1 by n sum total of x i. Then
303
00:43:19,769 --> 00:43:26,119
I ask you what is the expected value of x
bar. You said it is the expected value of 1 by
304
00:43:26,119 --> 00:43:38,400
n sum total of x i. So, that means 1 by n times expected
value of x 1 plus expected value of x 2, up to
305
00:43:38,400 --> 00:43:49,739
expected value of x n. What is the expected
value of x 1? All are mu. If x is normally
306
00:43:49,739 --> 00:43:55,140
distributed, every observation will be
distributed with the same mean mu.
307
00:43:55,140 --> 00:44:05,650
So, it is basically n mu by n, that is mu.
So, we say that expected value of sample average
308
00:44:05,650 --> 00:44:11,369
is mu. Now, what will be the variance component?
309
00:44:11,369 --> 00:44:20,269
If I want to know variance of x bar, that
means this is nothing but variance of 1 by
310
00:44:20,269 --> 00:44:28,849
n sum total of x i. So, as I said, the constant comes out
squared: 1 by n square times variance
311
00:44:28,849 --> 00:44:41,479
of x 1 plus variance of x 2, up to variance
of x n, since the observations are independent. x 1, x 2, x n, all are normal random
312
00:44:41,479 --> 00:44:49,859
variables with variance sigma square. So,
1 by n square times sigma square plus sigma
313
00:44:49,859 --> 00:44:57,130
square, up to sigma square, which will be
n sigma square by n square, that is sigma
314
00:44:57,130 --> 00:45:02,130
square by n.
So, that means, using that,
315
00:45:02,130 --> 00:45:11,099
x bar, I told you,
is normally distributed.
316
00:45:11,099 --> 00:45:22,299
Mean will be mu and variance sigma square by n. Now,
you are creating a z variable here. If you
317
00:45:22,299 --> 00:45:31,489
create a z variable here, z is nothing
but x bar, this normal variable, minus
318
00:45:31,489 --> 00:45:41,170
the expected value of x bar, divided by square
root of variance of x bar. If any
319
00:45:41,170 --> 00:45:49,789
random variable is subtracted
by its mean and divided by the standard deviation,
320
00:45:49,799 --> 00:45:54,999
it is z, the standardized value.
So, what is this? Expected value
321
00:45:54,999 --> 00:46:03,680
of x bar is mu. We have already proved that,
and your variance is sigma
322
00:46:03,680 --> 00:46:10,809
square by n. So, the standard deviation is
sigma by root n. This is the
323
00:46:10,860 --> 00:46:19,969
quantity; or, the other way, I can say z is x bar
minus mu by sigma by root n. So, this will
324
00:46:19,969 --> 00:46:23,440
follow z distribution.
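The two results just derived, expected value of x bar equal to mu and variance of x bar equal to sigma square by n, can be checked by simulation. The sketch below is only illustrative: the values mu = 50, sigma = 10, n = 25, the number of repeated samples and the seed are all assumed, and the function name sample_means is hypothetical.

```python
import math
import random
import statistics

def sample_means(mu, sigma, n, n_samples, seed=1):
    """Return n_samples values of x bar, each computed from n normal observations."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(n_samples)
    ]

mu, sigma, n = 50.0, 10.0, 25            # hypothetical population and sample size
x_bars = sample_means(mu, sigma, n, n_samples=20_000)

mean_of_x_bar = statistics.fmean(x_bars)      # should be near mu
var_of_x_bar = statistics.variance(x_bars)    # should be near sigma**2 / n = 4

# Standardize each x bar: (x bar - mu) / (sigma / sqrt(n)).
z_values = [(xb - mu) / (sigma / math.sqrt(n)) for xb in x_bars]
print(mean_of_x_bar, var_of_x_bar, statistics.fmean(z_values), statistics.stdev(z_values))
```

The standardized values should have mean near 0 and standard deviation near 1, which is what following the z distribution means here.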
325
00:46:23,450 --> 00:46:28,249
Now, in the t case, what is happening?
In the t case, you will see what we are
326
00:46:28,249 --> 00:46:34,680
saying: suppose x bar minus mu is there, but
sigma is not known. You are using s.
327
00:46:34,680 --> 00:46:47,279
In the z case it is x bar minus mu by sigma by root
n. This sigma is the population standard deviation,
328
00:46:47,279 --> 00:46:53,969
but this is not known. Instead you are using
sample standard deviation in the t case. So,
329
00:46:53,969 --> 00:47:02,249
s is a random variable, but sigma is
a constant. Now, depending on the situation,
330
00:47:02,249 --> 00:47:04,349
ultimately, there are different
conditions.
331
00:47:04,349 --> 00:47:12,029
So, if the sample size is very large, then
this same quantity can be taken as unit normal
332
00:47:12,029 --> 00:47:20,710
also, but in the most general case this quantity
is t. Why? Because this quantity follows t
333
00:47:20,710 --> 00:47:27,319
distribution. How can we justify that this
quantity follows t distribution? If you want
334
00:47:27,319 --> 00:47:34,219
to justify this, then I have just written this one,
the same thing I have written in this manner:
335
00:47:34,219 --> 00:47:41,660
x bar minus mu divided by square root of sigma square by
n. And then the component within the square root
336
00:47:41,660 --> 00:47:48,170
is n minus 1 s square by sigma square, times
1 by n minus 1, just manipulation. That is,
337
00:47:48,170 --> 00:47:57,459
the numerator and denominator are manipulated
with some constants. Then what is this x bar
338
00:47:57,459 --> 00:48:07,190
minus mu by square root of sigma square by n? This is z,
already we have seen; this z is the top portion.
339
00:48:07,190 --> 00:48:11,869
What is the bottom portion?
Earlier I have shown you that n minus 1 s
340
00:48:11,869 --> 00:48:22,140
square by sigma square follows chi square distribution.
You go back and see what this one is: chi
341
00:48:22,140 --> 00:48:35,160
square distribution. So, your resultant variable
is that z, a standard normal variable, by the square root
342
00:48:35,160 --> 00:48:41,309
of chi square by its degrees of freedom, n minus 1. So,
it is t distribution; that is the use. Why
343
00:48:41,309 --> 00:48:51,779
will you use t distribution in this case?
So, you see this formula and appreciate it,
344
00:48:51,779 --> 00:48:59,839
because if you understand this, a huge problem
will be solved. That is the use: this one follows chi
345
00:48:59,839 --> 00:49:05,209
square. So, the resultant quantity
is this. Now, you require to use the t
346
00:49:05,209 --> 00:49:07,079
table.
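The manipulation described in words above can be written out as follows. This is a reconstruction from the spoken description, assuming a normal population so that the z in the top portion and the chi square in the bottom portion are independent.

```latex
t \;=\; \frac{\bar{x}-\mu}{s/\sqrt{n}}
  \;=\; \frac{\dfrac{\bar{x}-\mu}{\sqrt{\sigma^{2}/n}}}
             {\sqrt{\dfrac{(n-1)\,s^{2}}{\sigma^{2}}\cdot\dfrac{1}{n-1}}}
  \;=\; \frac{Z}{\sqrt{\chi^{2}_{n-1}\,/\,(n-1)}}
  \;\sim\; t_{\,n-1}
```

The bottom portion simplifies to s by sigma, so the unknown sigma cancels between numerator and denominator and the statistic can be computed from the sample alone.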
347
00:49:07,079 --> 00:49:09,420
This is our t table. So, please keep
348
00:49:09,420 --> 00:49:16,170
in mind, in the t table also there will
be degrees of freedom. Degrees of freedom
349
00:49:16,170 --> 00:49:19,819
come into consideration, because the parameter
350
00:49:19,819 --> 00:49:29,660
of the distribution is degrees of freedom.
I think you will be able to find out, suppose
351
00:49:29,660 --> 00:49:38,150
I say that my t distribution has 10
degrees of freedom, and what is the probability
352
00:49:38,150 --> 00:49:48,160
for that? Say 0.025. If we consider that,
then you will be getting a value of 2.228,
353
00:49:48,160 --> 00:49:54,209
but if you see the z distribution for the
same thing, for 0.025, there is no need of
354
00:49:54,209 --> 00:50:01,950
any degrees of freedom. 1.96 will be the z
value.
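The table readings can be cross-checked numerically. The sketch below is an illustration under stated assumptions, not the table itself: it integrates the standard t density with Simpson's rule (the helper names t_pdf and t_upper_tail are hypothetical) and uses NormalDist from the Python standard library for the z value. In standard t tables, 2.228 is the upper 0.025 point for 10 degrees of freedom and 2.201 is the corresponding point for 11 degrees of freedom.

```python
import math
from statistics import NormalDist

def t_pdf(t, k):
    """Density of the t distribution with k degrees of freedom."""
    coef = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return coef * (1 + t * t / k) ** (-(k + 1) / 2)

def t_upper_tail(c, k, steps=2000):
    """P(T > c) for c >= 0: Simpson's rule on [0, c], then symmetry about 0."""
    h = c / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, k) + t_pdf(c, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 0.5 - total * h / 3

tail_t10 = t_upper_tail(2.228, 10)            # about 0.025
tail_t11 = t_upper_tail(2.201, 11)            # about 0.025
z_crit = NormalDist().inv_cdf(1 - 0.025)      # about 1.96, no degrees of freedom needed
print(round(tail_t10, 4), round(tail_t11, 4), round(z_crit, 2))
```

The t critical values are larger than 1.96 because the sample standard deviation adds uncertainty; as the degrees of freedom grow they come down towards the z value.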
355
00:50:01,950 --> 00:50:15,130
Then we come to F distribution. Please note that the t distribution
was published under the name Student,
356
00:50:15,130 --> 00:50:31,190
so
far I know. That is why it is called Student's t distribution. In the chi square case,
357
00:50:31,190 --> 00:50:43,989
the sample variance is there, but the population variance
is still there: n minus 1 s square by sigma
358
00:50:43,989 --> 00:50:56,359
square. But the point is that when we compute
the t, there is no population variance. When
359
00:50:56,359 --> 00:51:02,499
you compute t there is no population variance,
s is there. There everything can be calculated.
360
00:51:02,499 --> 00:51:08,609
Now, I want to know what is the distribution
of this, and we found out that the distribution
361
00:51:08,609 --> 00:51:19,019
is t distribution. Now, F distribution. Come
to a ratio measure; what I am trying to say we
362
00:51:19,019 --> 00:51:28,619
will see here. Suppose you have two populations
for the same variable, population 1 and population
363
00:51:28,619 --> 00:51:35,009
2 are characterized by the same variable.
You have taken samples,
364
00:51:35,009 --> 00:51:41,170
and from both the samples you have computed your
standard deviation or variance, for population
365
00:51:41,170 --> 00:51:48,969
1 as well as population 2. You want to compare
whether variability in population 1 is different
366
00:51:48,969 --> 00:51:53,599
from the variability in population 2 or not,
or whether they are basically equal.
367
00:51:53,599 --> 00:52:01,469
That means you are basically creating a variable,
which is, suppose, s 1 square by s 2 square.
368
00:52:01,469 --> 00:52:16,839
This is coming from population 1, this is
coming from population 2. The question is whether the two populations
369
00:52:16,839 --> 00:52:23,359
are having the same variance or not. In many models,
we assume that the population variances are
370
00:52:23,359 --> 00:52:29,940
equal. For example, when you do ANOVA, and even
in the MANOVA case also, we will be seeing that
371
00:52:29,940 --> 00:52:39,190
sometimes one of the conditions we use is
that population variances are equal. This
372
00:52:39,190 --> 00:52:43,890
ratio we require to know in ANOVA.
You will be finding out its use in ANOVA;
373
00:52:43,890 --> 00:52:45,319
in regression also the use is the
same.
374
00:52:45,319 --> 00:53:03,259
So, when you are finding some quantity
which is basically the
375
00:53:03,259 --> 00:53:14,170
ratio of two chi square variables, each divided by
its respective degrees of freedom, then
376
00:53:14,170 --> 00:53:20,619
the quantity, and this is what I meant
to say, if this is the w, this w, if we
377
00:53:20,619 --> 00:53:27,599
create it like this, follows F distribution
with two degrees of freedom: numerator degrees
378
00:53:27,599 --> 00:53:32,890
of freedom and denominator degrees of freedom.
Why? You see the distribution here: nu 1 and
379
00:53:32,890 --> 00:53:41,749
nu 2, everywhere nu 1, nu 2 is there. The rest
is the gamma function, as in the other cases. So,
380
00:53:41,749 --> 00:53:45,489
getting me?
So, this distribution is characterized by
381
00:53:45,489 --> 00:53:51,849
numerator degrees of freedom and denominator
degrees of freedom, and when do you use f
382
00:53:51,849 --> 00:54:02,019
distribution? When you derive any quantity,
which is the ratio of two chi square variables
383
00:54:02,019 --> 00:54:07,349
and definitely that ratio is a weighted ratio.
This ratio means weighted ratio of two chi
384
00:54:07,349 --> 00:54:13,739
square variables, where the weight is nothing but
1 by the degrees of
385
00:54:13,739 --> 00:54:14,119
freedom.
386
00:54:14,119 --> 00:54:24,640
You are getting me? For example,
let us see the use. See s 1 square, s 2 square
387
00:54:24,640 --> 00:54:30,509
from two populations. The population variances
are sigma 1 square, sigma 2 square. I can write
388
00:54:30,509 --> 00:54:36,329
this like this, because we have
already seen that n minus 1
389
00:54:36,329 --> 00:54:42,219
s square by sigma square follows
chi square distribution. So, that means this
390
00:54:42,219 --> 00:54:48,299
quantity will be chi square divided by degrees
of freedom; the denominator portion will
391
00:54:48,299 --> 00:54:54,200
again be chi square divided by the respective
degrees of freedom, correct? So, this is w
392
00:54:54,200 --> 00:54:57,589
and this one is F distributed.
393
00:54:57,589 --> 00:55:03,609
So, in f distribution please keep in mind
when we talk about f distribution, there will
394
00:55:03,609 --> 00:55:07,579
be numerator degrees of freedom, denominator
degrees of freedom.
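A simulation sketch of the variance-ratio statistic is given here as an illustration. Everything specific in it is an assumption: the sample sizes n1 = 4 and n2 = 6 (so 3 numerator and 5 denominator degrees of freedom), the two population standard deviations, the number of draws, the seed and the helper names. The cut-off 7.76 is the standard F table value for 3 and 5 degrees of freedom at upper-tail probability 0.025.

```python
import random

def sample_variance(values):
    """Sample variance s square with n - 1 in the denominator."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def variance_ratio_draws(n1, n2, sigma1, sigma2, n_draws, seed=2):
    """Draw w = (s1^2 / sigma1^2) / (s2^2 / sigma2^2) from two normal populations."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_draws):
        s1_sq = sample_variance([rng.gauss(0.0, sigma1) for _ in range(n1)])
        s2_sq = sample_variance([rng.gauss(0.0, sigma2) for _ in range(n2)])
        ratios.append((s1_sq / sigma1 ** 2) / (s2_sq / sigma2 ** 2))
    return ratios

# n1 - 1 = 3 numerator and n2 - 1 = 5 denominator degrees of freedom.
ratios = variance_ratio_draws(n1=4, n2=6, sigma1=2.0, sigma2=5.0, n_draws=40_000)
tail_fraction = sum(r > 7.76 for r in ratios) / len(ratios)
print(tail_fraction)   # should be close to 0.025
```

When the two population variances are assumed equal, the sigma terms cancel and w reduces to s 1 square by s 2 square, which is the form used when comparing variability between two populations.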
395
00:55:07,579 --> 00:55:13,359
So, when you go to see the F distribution
table, you have to see
396
00:55:13,359 --> 00:55:20,199
two different degrees of freedom. For example,
if you are interested to know for 3 numerator
397
00:55:20,199 --> 00:55:27,449
degrees of freedom and 5 denominator
degrees of freedom, you have 5 denominator
398
00:55:27,449 --> 00:55:33,329
and 3 numerator degrees of freedom with a
probability 0.025, then you find out this
399
00:55:33,329 --> 00:55:46,819
value. You will be getting this value:
7.76. So, this way it will be
400
00:55:46,819 --> 00:55:51,690
used. And central limit theorem is the final
one for us.
401
00:55:51,690 --> 00:55:57,869
Today again we will discuss. Next class we
will discuss the sampling strategy. Today
402
00:55:57,869 --> 00:56:08,599
we will finish with this. Central limit theorem
says, if you sample from a normal population,
403
00:56:08,599 --> 00:56:17,349
or any other population when the sample size is
large, then the distribution of x bar, the
404
00:56:17,349 --> 00:56:24,380
distribution of the statistic, that is, the sampling
distribution of x bar, will be normal, at least approximately. x bar
405
00:56:24,380 --> 00:56:35,299
is normally distributed with mean mu, variance
sigma square by n. So, there are many sampling
406
00:56:35,299 --> 00:56:41,739
strategies: first one is random sampling, then stratified
sampling, cluster sampling, systematic sampling.
407
00:56:41,739 --> 00:56:46,259
There are some other sampling methods, convenience
sampling, all those things. Basically, we talk
408
00:56:46,259 --> 00:56:51,709
about the sample statistic. You have collected
data; how you have collected the data is the strategy,
409
00:56:51,709 --> 00:56:58,559
meaning what method you have adopted while collecting
the sample. Random means you will randomize
410
00:56:58,559 --> 00:57:01,640
the collection procedure in such a manner
that each and every observation is equally
411
00:57:01,640 --> 00:57:10,289
likely to come. Stratified sampling is sometimes
required. Suppose you want to see, at
412
00:57:10,289 --> 00:57:16,150
different age groups, what is the pattern.
Suppose, for a particular variable, say blood pressure
413
00:57:16,150 --> 00:57:22,229
pattern, then you will take young people, middle-aged
people, old people; three strata you will
414
00:57:22,229 --> 00:57:29,849
create, and in every stratum you randomly select
observations. Then cluster sampling. So, what cluster
415
00:57:29,849 --> 00:57:35,749
sampling is? Suppose you think of an exit
poll this time for the panchayat election in Bengal,
416
00:57:35,749 --> 00:57:41,769
that is, in West Bengal. There are so many
districts; each district is a cluster.
417
00:57:41,769 --> 00:57:49,059
Suppose, you cannot go for every district
and collect sample from each district, you
418
00:57:49,059 --> 00:57:55,109
can randomize the cluster. So, that means
you will select based on some randomization
419
00:57:55,109 --> 00:57:59,999
experiment. Suppose you
select one district based on randomization
420
00:57:59,999 --> 00:58:06,229
and every voter in that district is sampled.
Then that is single stage cluster sampling.
421
00:58:06,229 --> 00:58:14,559
Now, it may so happen that you may go for
3, 4 or a few more districts at random, and
422
00:58:14,559 --> 00:58:23,709
again in each of the districts you will collect
data from individuals selected randomly, getting
423
00:58:23,709 --> 00:58:32,219
me? First one: a number of districts
is there, you select one randomly, and in that one district
424
00:58:32,219 --> 00:58:38,789
you sample everybody; that is single stage cluster
sampling. Second: first randomize the selection
425
00:58:38,789 --> 00:58:44,579
of the districts, take a few districts, and again
in each district you randomize the selection
426
00:58:44,579 --> 00:58:50,549
of individuals; that is two stage cluster sampling.
And now, below the district level you
427
00:58:50,549 --> 00:58:56,609
can go for city also; there one more randomized
sampling is possible. So, that is multistage
428
00:58:56,609 --> 00:59:02,239
cluster sampling. The last one is systematic
sampling. Systematic sampling means you basically
429
00:59:02,239 --> 00:59:10,009
follow an order. For example, suppose
every tenth item you will set as a target, for example.
430
00:59:10,080 --> 00:59:15,589
Suppose every k th observation you will collect
in the sample, correct? Suppose the population
431
00:59:15,589 --> 00:59:23,459
size is capital N, sample size
is small n, and N by n, suppose
432
00:59:23,459 --> 00:59:29,999
this quantity is k. You want to collect a
sample of n. What will you do? From the first
433
00:59:29,999 --> 00:59:36,400
k observations, you may start at, say, the l th
point you observe, then you go on adding k.
434
00:59:36,400 --> 00:59:44,449
The second observation will be
the l plus k th item, then l plus 2
435
00:59:44,449 --> 00:59:51,499
k; that way you will collect the n samples,
getting me? So, these are the strategies, and
436
00:59:51,499 --> 00:59:56,539
you go through some books. And you know,
as I told you, that t distribution and F distribution
437
00:59:56,539 --> 00:59:59,749
will be having their uses.
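The sampling strategies described above (simple random, stratified, single stage and two stage cluster, and systematic) can be sketched in a few lines. This is only an illustrative sketch: the function names, the toy population of 100 numbered units, the age-group strata and the district labels are all hypothetical, and the systematic sampler assumes the population size is a multiple of the sample size.

```python
import random

def simple_random_sample(population, n, rng):
    """Every unit is equally likely to be selected."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """strata: dict of stratum name -> list of units; sample randomly within each."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def single_stage_cluster_sample(clusters, n_clusters, rng):
    """Pick clusters at random and keep every unit in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Pick clusters at random, then pick units at random within each of them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [u for name in chosen for u in rng.sample(clusters[name], n_per_cluster)]

def systematic_sample(population, n, rng):
    """Start at a random position l among the first k = N // n units, then step by k."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

rng = random.Random(3)
population = list(range(1, 101))                       # hypothetical units 1..100
strata = {"young": population[:40], "middle": population[40:70], "old": population[70:]}
clusters = {f"district_{d}": population[d * 20:(d + 1) * 20] for d in range(5)}

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
one_stage = single_stage_cluster_sample(clusters, 1, rng)
two_stage = two_stage_cluster_sample(clusters, 3, 4, rng)
systematic = systematic_sample(population, 10, rng)
print(systematic)
```

In the systematic sample the selected units are spaced exactly k apart, so only the starting point is random; in the cluster designs the randomization is applied to whole districts first.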
438
00:59:59,749 --> 01:00:06,430
And for F distribution, all of you know R
A Fisher, an English statistician; he has
439
01:00:06,430 --> 01:00:13,099
contributed a lot in the agricultural sector.
And George W Snedecor, an American mathematician;
440
01:00:13,099 --> 01:00:22,880
they are the pioneers in developing the distribution.
The next class I will bring.
441
01:00:22,880 --> 01:00:33,569
That Student's distribution, that you can
remember throughout. You know, Fisher and
442
01:00:33,569 --> 01:00:40,449
Snedecor have basically developed the
F distribution, Snedecor and Fisher. R A Fisher
443
01:00:40,449 --> 01:00:51,650
is one. Then the books: Montgomery, and Aczel A D,
Complete Business Statistics. Both these books
444
01:00:51,650 --> 01:00:58,989
are very much available, and for multivariate statistics,
Johnson and Wichern, all that I told you
445
01:00:58,989 --> 01:01:09,400
in the beginning. So, next class we will discuss
estimation, in particular confidence interval.
446
01:01:09,400 --> 01:01:23,599
So, I think again it will be on Tuesday, coming
Tuesday, 7 o'clock in the evening. Tuesday 7 to 9.
447
01:01:23,599 --> 01:01:35,919
Thank you very much.