Hello and welcome to the third and last lecture in the series on Random Variables and Probability Distributions.

In the first lecture we introduced the concept of random variables, talked about how probability distributions can be discrete or continuous, and introduced the idea of PDFs and CDFs: Probability Density Functions and Cumulative Distribution Functions.

In the second lecture we introduced five or six commonly used distributions. We also talked a little about how one can get the CDF if you are given the PDF, what the relationship between the PDF and the CDF is, and, vice versa, how to get the PDF given a CDF, symbolically. And we spoke about how, given a distribution, you can mathematically compute its mean, its variance, and so on.
In this lecture we are going to focus on a single distribution called the normal distribution; many of you might already have heard of it. We are also going to look at some applications associated with this distribution, and one really important application has to do with inferential statistics, which is something that will be quite central to the next 4 or 5 lectures. It is with that in mind that we are introducing the normal distribution.

You might have come across the normal distribution itself; if not, you might have heard of this thing called the bell-shaped curve. The distribution itself looks like the shape of a bell.
Just as the uniform distribution looks like a flat line, and different distributions have different shapes, this one looks like a symmetric bell-shaped curve, and the probability density function of this distribution is characterized by the formula shown here. One thing that is noteworthy is that this distribution has two parameters, mu and sigma. The distribution itself is defined by the mean and the variance: the mean and variance of this distribution go into the formula and define it.

So there is no point asking, "tell me, what is the probability of value x for a normal distribution?", because on its own that question does not make sense. You need to ask: for a distribution with this mean and this variance, what is the probability of the value being greater than x? That question means more. Or: what is the probability of finding a value between x and x plus delta, for a normal distribution with mean equal to mu and standard deviation equal to sigma? Once you are given the mean and sigma, it is quite simply this formula that you would use to compute the probabilities.
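For reference, the density the slide refers to is f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2*sigma^2)). A minimal sketch of evaluating it in Python (the language and the example numbers are my own, not from the lecture):

```python
import math

def normal_pdf(x, mu, sigma):
    # f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2 * sigma^2))
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# For small delta, P(x <= X <= x + delta) is approximately f(x) * delta.
mu, sigma = 70.0, 10.0                    # hypothetical weights in kg
print(normal_pdf(70.0, mu, sigma))        # density at the mean (the peak)
print(normal_pdf(70.0, mu, sigma) * 0.5)  # approx P(70 <= X <= 70.5)
```

The density is symmetric about mu, so values far from mu (the "extremes") get a smaller density than values near the center.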
So what is the mean of a particular normal distribution defined by mu and sigma? Well, that is very straightforward: it is mu, because the distribution is defined by mu. And the variance is nothing but sigma squared, so that is quite straightforward as well.
The CDF, however, is not something that simplifies very elegantly. To define the CDF you would still use the traditional procedure of taking the integral, and, by the way, the normal distribution goes from minus infinity to plus infinity, so it makes sense to actually use minus infinity here. So you would integrate from minus infinity to x of f(x) dx, where f(x) is the PDF, which is nothing but this formula. But while for many distributions this whole thing simplifies, you can do the integration and there is a closed-form answer, with the normal distribution it does not simplify elegantly without more complex mathematical machinery. So the CDF is often just stored in tables, especially for the standard normal with mean 0 and standard deviation 1, or it is something that you integrate numerically each time.
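Both routes can be sketched in a few lines of Python (my own illustration; the lecture shows no code): the normal CDF has no elementary closed form, but it can be written via the error function erf, which is effectively what the standard tables store, or obtained by numerically integrating the PDF each time.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    # Expressed through the error function; for mu=0, sigma=1 this is the standard table.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_cdf_numeric(x, mu, sigma, steps=20000):
    # "Integrate each time": trapezoidal rule, with mu - 10*sigma standing in for -infinity.
    lo = mu - 10.0 * sigma
    h = (x - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(x, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

print(normal_cdf(0.0, 0.0, 1.0))          # 0.5 by symmetry
print(normal_cdf_numeric(1.0, 0.0, 1.0))  # matches the table value for z = 1
```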
Now, this is a very interesting distribution, because there are a lot of things that are normally distributed: things like people's heights and weights (height, obviously, within each gender), grades in a class, marks that people score in exams.

The core idea of the normal distribution, unlike the uniform distribution, which says everything is equally likely, is that things at the extremes are less likely and things in the center are more likely, within certain limits, and that is what gives it its characteristic bell-shaped curve. If this sketch is any approximation to the bell-shaped curve, we are basically saying that things at the extremes, like here and here, are less likely, and things in the center, like here, are more likely; that is why they have a greater height. In all of these plots the y axis is the probability.
So if you take a look at something like heights, or let us say weights, and you fix a gender, say male, and you take something like the people who are registered for an introduction to data analytics course, then you will find that there might be very few people who weigh less than, I do not know, 40 or 50 kg, men especially. And you will find very few of them weighing more than 100 kg or so; it tapers off. At either extreme you find fewer; in the center you find more. Many other distributions are also like this, but that is the key feature of a bell-shaped curve. The other thing is that many things start to look normal after you remove outliers, and we will talk about an example of that.
In this slide I am not going to go through all of the other things that we will cover in this lecture, so that I am not rushing through them; we will take up each of these topics and go through them in detail with slides. But you will also encounter that there is this thing called the binomial approximation. We briefly spoke about this when we introduced the binomial distribution: certain problems which, just by definition, look like they fall cleanly under the binomial distribution can, for computational reasons, be quite easily approximated by a normal distribution, although the binomial is a discrete distribution and the normal is a continuous distribution. We will also talk about something called the central limit theorem, which makes the normal distribution very useful for many applications and is also a very interesting concept per se. And finally, we will look at the idea of sampling distributions, the core idea being: if you take a random sample of size n of any variable (say I randomly select five people and measure their heights), is there a distribution associated with the statistics that I compute, like the mean and standard deviation? We will talk about this in greater detail.
So the first topic: how things look after the removal of outliers. Here is an example of some real data, where we looked at the total annual household income. The graph that you see on the left-hand side essentially shows all the households with income up to a certain value (we just stopped the x axis at a certain point), and the y axis is the number of households. So I have created essentially a histogram, but that is a proxy for finding the probability distribution itself. You can think of the probability distribution as something that looks like this in this particular case; people cannot have incomes less than 0, and so on.

Now look at the same graph where I said I am not going to look all the way up to 4 lakh rupees of income; instead I am going to truncate the x axis at 90,000 rupees. The whole idea was to say that some of those values could have been outliers, so we took a certain value beyond which we do not go. And already you can see that this graph is starting to look a lot more bell-shaped. Probably not perfect, but the core idea is this: even though the distribution originally might not look normal, with a sufficient amount of outlier removal the distribution could truly be normal.
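The lecture's income data is not available in this transcript, so as an illustrative sketch I simulate a right-skewed (log-normal) stand-in for "income" and truncate it at an arbitrary cutoff, the way the slide truncates the x axis; a simple skewness statistic shows the trimmed data is much closer to symmetric:

```python
import math
import random

random.seed(0)

# Hypothetical stand-in for the household income data: right-skewed log-normal values.
incomes = [random.lognormvariate(10.0, 1.0) for _ in range(20000)]

def skewness(xs):
    # Standardized third moment: 0 for a symmetric sample, positive for a right tail.
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sum(((x - m) / sd) ** 3 for x in xs) / n

# Truncate at a cutoff, as the slide does at 90,000 rupees (this cutoff is arbitrary).
cutoff = 60000.0
trimmed = [x for x in incomes if x <= cutoff]

print(round(skewness(incomes), 2), round(skewness(trimmed), 2))
```

After removing values above the cutoff the skewness drops sharply, so a histogram of the remaining values looks much more bell-shaped, which is the point being made.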
The second concept that we want to discuss in this context is the binomial approximation. Let us very quickly review what the binomial distribution is about. We spoke about how this term in the PDF of the binomial distribution was really "n choose k": the number of combinations of k out of n. So if you have a problem of the type "what is the probability of getting 3 heads out of 10 tosses?", this works fine; you can substitute the values and get the probability. Now suppose somebody comes and asks you: what is the probability of getting 2100 heads out of 5000 tosses? If you want to use this formula, you essentially need to plug in 5000 choose 2100, and that is a very large number; that is a very hard computation. And 5000 and 2100 are just an example: it could be 5 million and 200,000, and it is very hard to do those calculations.
So one thing you can do, when n becomes really large, is to use the formulas you have for the mean and variance of the binomial distribution, construct a normal distribution with that mean and that variance, and use it to answer distribution-related questions. For instance, if there is a 50 percent chance of a coin falling heads, you can say the mean of 5000 tosses is 2500 heads, because you have 5000 tosses times a 50 percent probability. So 2500 is your mean, and your variance you would similarly calculate by plugging in n equals 5000 and p equals 0.5. Once you do that, you can essentially construct a normal distribution with these parameters, and you can answer questions like: what is the probability of there being more than 2100 heads? Or: what is the probability that the number of heads is between 2000 and 2500 out of 5000 tosses?

You obviously cannot answer a question like "what is the exact probability of getting 2112 heads?", because you have essentially converted this to a continuous distribution. And the idea of answering a question like "what is the exact probability of 2121 heads out of 5000 tosses?" becomes relatively meaningless anyway, because as n keeps becoming large, the probability of any one outcome exactly occurring becomes really small, close to 0. So you are interested more in intervals, which is, in spirit, what you can do with continuous distributions, and you can use a normal approximation of the binomial distribution to achieve that, as long as n is fairly large.
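This can be sketched concretely in Python (my own illustration; the numbers follow the lecture's example): compute P(2000 <= heads <= 2500) out of 5000 fair tosses exactly with big-integer binomial coefficients, then with the normal approximation using mean n*p and variance n*p*(1-p). A continuity correction of 0.5, which the lecture does not discuss, is included to sharpen the approximation.

```python
import math

n, p = 5000, 0.5

# Exact binomial probability, feasible here only because Python has big integers.
exact = sum(math.comb(n, k) for k in range(2000, 2501)) / 2 ** n

# Normal approximation: mean = n*p, variance = n*p*(1-p).
mu = n * p                              # 2500
sigma = math.sqrt(n * p * (1 - p))      # sqrt(1250), about 35.36

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

approx = phi((2500.5 - mu) / sigma) - phi((1999.5 - mu) / sigma)
print(exact, approx)    # the two agree closely
```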
Next, we will move to something called the central limit theorem. The core idea here is that the aggregation of a sufficiently large number of independent random variables results in a random variable which will be approximately normal. What does that mean? It just means: suppose you have some process and some distribution from that process; let us say flipping a coin or throwing a die is the process. The central limit theorem talks about what happens as long as I am aggregating many such processes. So instead of asking the simple question of the distribution associated with what I would get if I roll the die once, I instead say I want to know the distribution associated with rolling the die twice and adding the results up. The first time I roll the die, I get some number and write it down; I roll the die another time and get another number, and I add those two numbers.
Now, the distribution associated with that sum is also a probability distribution, because it is still a random process; there is still some chance of getting each value. I clearly cannot get any value less than 2, because the first roll can be a 1 and the second roll can be a 1; so the minimum value I can get is 2, not 1, and the maximum value is 12, because I can roll a 6 and a 6, and that is 12. So the idea being put forth here with the central limit theorem is about aggregating, and the word "aggregating" can be thought of as taking the sum, or as taking the average; both are essentially the same thing. The difference between the sum and the average is that the average is just divided by the number of rolls. But this form of aggregation of a sufficiently large number of random variables results in a random variable which will be approximately normal. So let us see how that works.
On the left-hand side of the graph here, I show the distribution associated with a single roll, and as you have seen, and as we have discussed, this is uniformly distributed. Why? Because the heights are all the same; it is a discrete distribution, a uniform distribution, and each probability is 1/6; that is what I have shown here. On the right-hand side, I show you the distribution of the sum of two rolls. You can think of it as rolling once, writing it down, and rolling a second time; or you can think of having two dice in your hands, rolling both of them, and summing up what shows up. And already you can see that the distribution has started moving from uniform to something else. This happens to be triangular, but that is just the first step towards starting to look more and more bell-shaped.
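These two bar charts can be reproduced exactly by enumeration; a short Python sketch (my own, not from the lecture slides):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# One roll: a discrete uniform distribution, each face with probability 1/6.
one_roll = {face: Fraction(1, 6) for face in range(1, 7)}

# Two rolls: all 36 ordered pairs are equally likely; count how many give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
two_rolls = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

print(one_roll[3])       # 1/6, the same for every face
print(two_rolls[2])      # 1/36: only (1, 1) gives a sum of 2
print(two_rolls[7])      # 6/36: the peak of the triangle
```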
What is happening? Although the probabilities of rolling 1 through 6 are uniform, the probabilities of the sums are not equal. The probability of getting a 2 is lower than the probability of getting a 3, and that should be fairly intuitive. For you to get a 2, you need to roll a 1 the first time and roll a 1 the second time. But there are two ways in which you can get a 3: you can roll a 1 the first time and then roll a 2, or you can roll a 2 and then roll a 1. And that keeps increasing until you hit the point at 7, which you can get in the most ways: you can roll a 6 the first time and then a 1 the second, or, if you are thinking of rolling both at the same time, you can get a 6 and a 1 or a 1 and a 6, a 3 and a 4, a 4 and a 3, or a 2 and a 5, and so on. So there are more ways of achieving a 7, and there are fewer ways of achieving a 2 or a 3, and so you already have something that is looking more like a normal.
Now let us go further. As I discussed, the average of two dice behaves just like the sum of two dice, so these two graphs are identical, this one and the one on the next slide. They are identical except that the axis has changed to the average: it goes from 1 to 6, whereas the other went from 2 to 12, but they are essentially identical graphs, and this is also a triangular distribution. But look at what is already happening. Now if I take the average of three dice, whether I roll three dice at the same time or roll one after the other after the other (they are all independent either way), what you are going to see is that the shape has started looking a little less sharply triangular; it is starting to get that little bit of inflection, and so on. And as you increase this number more and more, as I said, you get something that looks fairly normal. That is what the central limit theorem is about: when you aggregate a sufficiently large number of distributions, you start getting a normal distribution.

Now, this is a really important point for what we are going to say next about sampling distributions. We are going to start afresh on sampling distributions, but I just want you to keep in mind what we have discussed now in the central limit theorem.
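The progression described above can be checked with a quick simulation (my own illustration in Python): average 30 dice at a time, many times over, and the resulting distribution of averages matches what the central limit theorem predicts, centred on 3.5 with a much smaller spread than a single die.

```python
import math
import random
import statistics

random.seed(1)

def mean_of_rolls(k):
    # Aggregate k independent die rolls by averaging them.
    return sum(random.randint(1, 6) for _ in range(k)) / k

samples = [mean_of_rolls(30) for _ in range(20000)]

# A single die has mean 3.5 and variance 35/12; the CLT says the average of 30
# dice is approximately normal with mean 3.5 and variance (35/12)/30.
print(statistics.mean(samples))     # close to 3.5
print(statistics.pstdev(samples))   # close to sqrt((35/12)/30), about 0.31

# Roughly 68% of the averages land within one such standard deviation of 3.5,
# as a normal shape would predict.
sd = math.sqrt((35 / 12) / 30)
within = sum(abs(x - 3.5) < sd for x in samples) / len(samples)
print(within)
```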
Jumping now to sampling distributions, the idea here is very simple. Let us say you have some original distribution, and let us say for now that this distribution is normally distributed. Let us say this normal distribution has some mean, which I have shown with this blue vertical line; let us call it mu, so this point is mu. And let us say it has some standard deviation; I am just indicating the dispersion with the arrow, which is not the exact length of the standard deviation. But it has some standard deviation, and the variance is represented as sigma squared. By the way, this "N(mu, sigma squared)" is fairly standard nomenclature; it just means a normal distribution with mean mu and variance sigma squared (and although the symbol may look like an M, it is meant to be an N).

So you have this distribution; now let us give it a name. Let us say this is the distribution of weights, staying consistent with the previous example of the male members who registered for introduction to data analytics. So maybe this distribution starts somewhere at, I do not know, 50 kg and goes all the way to, say, 100 kg; technically it can go all the way to infinity, because by definition a normal distribution can go to infinity, and on this side it can go to minus infinity. So this is the normal distribution.
00:20:44,460 --> 00:20:53,369
Now, let us say that I took a sample from
this distribution.
244
00:20:53,369 --> 00:21:03,369
So, these data points represent the different
samples and in this particular case I have
245
00:21:03,369 --> 00:21:11,429
taken just six samples, but well that can
be more and the heights mean nothing the sample
246
00:21:11,429 --> 00:21:14,340
just mean, where they fall on the distribution.
247
00:21:14,340 --> 00:21:20,379
You can; obviously, use that and build a histogram
and the idea is that if you build a if you
248
00:21:20,379 --> 00:21:27,860
take a sample large enough that histogram
will fit very neatly to this curve, which
249
00:21:27,860 --> 00:21:30,820
is the normal distribution if that sample
is very large.
250
00:21:30,820 --> 00:21:36,679
If the sample is not you might get a different
histogram, but what we most interested in
251
00:21:36,679 --> 00:21:41,340
is taking this sample and computing some key
statistics from this sample.
For instance, suppose you took this sample and computed the arithmetic mean of the samples. You take each data point, call the first one x1, the next data point x2, and so on. Then what you are looking at is (x1 + x2 + ... + xn) / n; that is your arithmetic mean, and so you compute an arithmetic mean. But since you have some finite sample, say 6 points here, or maybe in another instance 10 points, the question is: will your arithmetic mean always be equal to mu? Remember, mu is what defined this distribution; the distribution is by definition N(mu, sigma squared). When you take a sample and compute x-bar, we differentiate between mu and x-bar: mu is the theoretical mean, whereas x-bar is the sample mean.
If you take a sample of size n and compute an x-bar, will this x-bar be equal to mu? Both intuitively and otherwise, the answer is no. Theoretically, if your sample size were infinite, that is, you take an infinite number of observations, then perhaps your sample mean would, again in theory, be equal to mu. But that is not a practical situation; who takes infinite samples like that? By definition that is not very useful. So if you take a finite sample, in this case 6, or in another case let us say 20, and you compute a sample mean, it is not going to be equal to mu.
But the idea is that the sample mean is itself a random variable. What do you mean by that? You mean that one time you go and take a sample, say a sample of 10, and you take the mean of that sample; you will get a particular value, which will generally not be equal to mu (it could be, but it could also be a little less than mu or a little greater than mu). Now you go and do that exact same thing again, and you will get some other, new value. So what that means is that you have a random variable on your hands, and that random variable is the distribution of the sample means for a given sample size n.
So that is the core idea associated with sampling: from the original distribution you take a sample and compute a mean, and you get a certain value, but that value itself belongs to a distribution, and that distribution changes based on the sample size. Suppose, like I said, your sample size was really large, infinite; then perhaps you would not even have a distribution, you would just have a line out here, because you almost always get mu when your sample size is so large. But think of the other extreme: suppose your sample size is equal to one; that is, each time you take one point from the distribution and compute the mean of that point. What does it mean to compute the mean of a single point? It is that number itself.
00:25:18,730 --> 00:25:27,340
So, let us say we were looking at 50 kgs to
a 100 kgs you took a random sample of one.
292
00:25:27,340 --> 00:25:33,440
So, 75 you know 65 kgs this time that was
the random number I picked the average of
293
00:25:33,440 --> 00:25:34,809
65 is 65.
294
00:25:34,809 --> 00:25:44,340
So, if you had a sample size of one what could
the distribution of sample means look like
295
00:25:44,340 --> 00:25:49,730
the answer is it would look exactly like this
distribution, because you taking a sample
296
00:25:49,730 --> 00:25:54,809
size of one its essentially like and you computing
the average of that, which is nothing but,
297
00:25:54,809 --> 00:25:55,809
that number itself.
298
00:25:55,809 --> 00:26:00,779
So, it is essentially like just re plotting
that graph, now if your sample size was greater
299
00:26:00,779 --> 00:26:07,359
than 1, but less an infinity, what happens
is if your sample size as the sample size
300
00:26:07,359 --> 00:26:12,470
gets larger and larger you are dealing with
the distribution step.
301
00:26:12,470 --> 00:26:19,190
Because, each time you take a sample of, let
us say 5 or 10 or 20 you are going to get
302
00:26:19,190 --> 00:26:23,809
some sample mean from that and that sample
mean is not going to always be equal to the
303
00:26:23,809 --> 00:26:26,159
exact overall population mean.
304
00:26:26,159 --> 00:26:32,879
And, but it is going to be some number nearby
and the idea is that as in this particular
305
00:26:32,879 --> 00:26:38,739
case, we had a normal distribution, and the
idea is that as long as we are taking the average
306
00:26:38,739 --> 00:26:40,809
of some number of samples.
307
00:26:40,809 --> 00:26:47,929
Let us say 10, 20, 30, 40, or 50 samples,
you are going to get a mean.
308
00:26:47,929 --> 00:26:52,639
But that mean is not certain; it is not certain
what that mean is going to be. It
309
00:26:52,639 --> 00:26:56,710
need not be mu, you know that
for a fact.
310
00:26:56,710 --> 00:27:00,379
So, what you are essentially getting is another
distribution; you are getting a random number from
311
00:27:00,379 --> 00:27:05,379
another distribution, and this is the distribution
of the sample means. Now, it
312
00:27:05,379 --> 00:27:10,129
so happens that when your original distribution
is normally distributed, the distribution of
313
00:27:10,129 --> 00:27:14,600
sample means is also normally distributed,
but there might be some questions you have
314
00:27:14,600 --> 00:27:16,549
in this regard.
315
00:27:16,549 --> 00:27:21,200
So, for instance, what is the shape of this
distribution? The quick answer to the question
316
00:27:21,200 --> 00:27:26,259
is that when the original distribution is normal,
like we said, this distribution of sample means
317
00:27:26,259 --> 00:27:27,639
is also normal.
318
00:27:27,639 --> 00:27:33,229
But we also went through the central limit
theorem, where we said that as long as you are aggregating
319
00:27:33,229 --> 00:27:38,840
a sufficiently large number of values,
the resulting distribution starts to look
320
00:27:38,840 --> 00:27:39,840
normal.
321
00:27:39,840 --> 00:27:46,429
So, even if your original distribution is
not normal, as long as you are aggregating a sufficiently
322
00:27:46,429 --> 00:27:52,729
large number, this distribution of sample means
becomes normal.
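This claim is easy to check numerically. Below is a minimal sketch, not part of the lecture, using only Python's standard library; the uniform population and the sample size of 30 are illustrative assumptions:

```python
import random
import statistics

random.seed(42)

# Original distribution: uniform on [0, 1) -- clearly not normal.
# Draw many sample means, each from a sample of size n.
n = 30
sample_means = [statistics.mean(random.random() for _ in range(n))
                for _ in range(10_000)]

# The sample means cluster tightly around the population mean 0.5,
# and a histogram of them would look approximately bell-shaped.
print(round(statistics.mean(sample_means), 2))   # close to 0.5
```

A histogram of `sample_means` (e.g. via `collections.Counter` on rounded values) would show the bell shape directly.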
323
00:27:52,729 --> 00:28:00,499
So, that is the shape. Now, what is the mean
of this distribution, the mean of
324
00:28:00,499 --> 00:28:06,620
the distribution of sample means? The quick
answer is that because you are just taking the average
325
00:28:06,620 --> 00:28:10,570
of some numbers, if you were to do this a
sufficiently large number of times, you should
326
00:28:10,570 --> 00:28:13,340
not get a mean that is biased.
327
00:28:13,340 --> 00:28:24,200
So, the mean of this distribution will also
be equal to mu. But it is clear that the standard
328
00:28:24,200 --> 00:28:28,190
deviations are not the same, right? The standard
deviation would be the same only if your sample
329
00:28:28,190 --> 00:28:33,080
size were one, in which case you are not really
sampling; you are just taking a single data
330
00:28:33,080 --> 00:28:34,360
point.
331
00:28:34,360 --> 00:28:41,570
But depending on the size of the sample, the
standard deviation is typically going to be
332
00:28:41,570 --> 00:28:51,190
lower; it will always be lower as long as
the sample size is greater than 1, and the
333
00:28:51,190 --> 00:28:55,700
relationship is expressed in terms of sigma squared.
334
00:28:55,700 --> 00:29:02,500
So, if you are using the variance sigma squared, it would
be sigma squared divided by n, where we use
335
00:29:02,500 --> 00:29:08,809
small n to refer to the sample size. But you
can also think of it as taking the square
336
00:29:08,809 --> 00:29:13,779
root of this, that is, as sigma
divided by the square root of n.
337
00:29:13,779 --> 00:29:20,649
So, sigma over root n would be the standard deviation
and sigma squared over n would be the variance.
338
00:29:20,649 --> 00:29:35,940
So, that is the variance and that
is your standard deviation.
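This sigma-over-root-n relationship can be verified by simulation; here is a minimal sketch, assuming a hypothetical normal population with mu = 75 and sigma = 10 (echoing the weights example) and a sample size of 25:

```python
import random
import statistics

random.seed(0)
mu, sigma, n = 75.0, 10.0, 25   # hypothetical population parameters

# Draw 20,000 samples of size n and record each sample's mean.
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(20_000)]

# The mean of the sample means stays at mu (unbiased), while their
# standard deviation shrinks to roughly sigma / sqrt(n).
print(round(statistics.mean(means), 1))   # close to 75.0
print(round(statistics.stdev(means), 1))  # close to 10 / sqrt(25) = 2.0
```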
339
00:29:35,940 --> 00:29:43,350
So, that is a relationship that
is very useful to remember. Now, you might
340
00:29:43,350 --> 00:29:44,419
have a question saying:
341
00:29:44,419 --> 00:29:49,509
we did all of this work to say that you
have an original distribution, you randomly
342
00:29:49,509 --> 00:29:55,629
sample from that distribution and you compute
an arithmetic mean; then that arithmetic
343
00:29:55,629 --> 00:30:01,669
mean that you compute has a distribution
of its own, and we spoke about its mean,
344
00:30:01,669 --> 00:30:03,889
shape and standard deviation.
345
00:30:03,889 --> 00:30:09,370
Similarly, if you take a sample from the original
distribution and you compute a standard deviation
346
00:30:09,370 --> 00:30:10,970
of that sample.
347
00:30:10,970 --> 00:30:18,220
Then, is that sample standard
deviation also coming from a distribution?
348
00:30:18,220 --> 00:30:23,830
The quick answer to that question is yes,
and if you are using a normal distribution
349
00:30:23,830 --> 00:30:29,289
to start with, the distribution of the
sample standard deviations tends to be chi-
350
00:30:29,289 --> 00:30:33,750
square distributed and that is also something
that we will encounter.
351
00:30:33,750 --> 00:30:39,799
But our focus for now has been on the distribution
of sample means, and the important things to
352
00:30:39,799 --> 00:30:46,390
take away are: if you start with an original
normal distribution, then by theory you will
353
00:30:46,390 --> 00:30:52,190
have a normal distribution for your sample
means for whatever sample size.
354
00:30:52,190 --> 00:30:57,539
But given that we also learnt about the central
limit theorem, even if you start with an original
355
00:30:57,539 --> 00:31:04,840
distribution that is not normal, as long as
you aggregate a sufficiently large number,
356
00:31:04,840 --> 00:31:11,360
that is, as long as your sample size is large enough,
the distribution of sample means is likely
357
00:31:11,360 --> 00:31:13,940
to be normally distributed.
358
00:31:13,940 --> 00:31:19,820
We spoke about how the mean of the distribution
of sample means should be no different from
359
00:31:19,820 --> 00:31:23,779
the mean of the original distribution, because
you are not adding or subtracting any number
360
00:31:23,779 --> 00:31:27,899
you are just
taking numbers and taking the average of them.
361
00:31:27,899 --> 00:31:34,099
So, if you do that many times, the distribution
that you get should also be centered
362
00:31:34,099 --> 00:31:42,549
around the overall grand mean of the original
distribution. We spoke about how the standard
363
00:31:42,549 --> 00:31:43,549
deviation.
364
00:31:43,549 --> 00:31:49,849
However, it keeps reducing: as long as you
are aggregating more numbers, your standard
365
00:31:49,849 --> 00:31:54,820
deviation will reduce, and the rate at which
it reduces is a function of the square
366
00:31:54,820 --> 00:31:57,299
root of n, the sample size.
367
00:31:57,299 --> 00:32:04,330
So, sigma divided by the square root of n is what
your standard
368
00:32:04,330 --> 00:32:10,169
deviation of the distribution of sample means
is with respect to the original distribution
369
00:32:10,169 --> 00:32:14,419
and actually that phenomenon you should be
able to see even in the examples that we took
370
00:32:14,419 --> 00:32:19,249
of the central limit theorem. Just to kind
of show that, you again see it in this particular
371
00:32:19,249 --> 00:32:25,509
example, and I will erase the red mark. In this
particular example I was focusing more on
372
00:32:25,509 --> 00:32:28,409
showing the central limit theorem, about how
the shape changes.
373
00:32:28,409 --> 00:32:39,389
But if you take this graph, which is this
uniform distribution out here, there is
374
00:32:39,389 --> 00:32:45,000
some standard deviation out here, right? Some
spread around the mean, correct? Some
375
00:32:45,000 --> 00:32:46,000
spread.
376
00:32:46,000 --> 00:32:54,320
Now, take a look at the average of two dice:
the mean is the same, centered around 3.5,
377
00:32:54,320 --> 00:33:00,950
but this spread has decreased, right? Before,
the spread was wider.
378
00:33:00,950 --> 00:33:06,440
So, there was a higher probability
of seeing extreme values in the earlier graph:
379
00:33:06,440 --> 00:33:13,720
up here you had data points that were, with
higher probability, further away from the
380
00:33:13,720 --> 00:33:16,279
center at 3.5.
381
00:33:16,279 --> 00:33:23,529
Now, the probability of finding
points far away from the center has reduced;
382
00:33:23,529 --> 00:33:29,429
these are low probabilities, but the probability
of finding things close to the center has increased.
383
00:33:29,429 --> 00:33:34,259
So, therefore, the standard deviation of this
distribution is lower than the standard deviation
384
00:33:34,259 --> 00:33:41,110
of the uniform distribution, and that effect
only grows: the probability
385
00:33:41,110 --> 00:33:48,200
of extremes keeps becoming lower, and thereby
the standard deviation becomes lower, given
386
00:33:48,200 --> 00:33:52,460
that for all of these you are starting
with 1 and ending with 6.
387
00:33:52,460 --> 00:33:58,299
So, that example shows both the central
limit theorem, meaning the change in the shape,
388
00:33:58,299 --> 00:34:03,889
but you can also capture this idea, which is
the distribution associated with sample means,
389
00:34:03,889 --> 00:34:11,129
and in the previous cases the sample size
was two in the first example and the sample
390
00:34:11,129 --> 00:34:19,120
size was three, right? Because we were averaging
two dice or three dice.
391
00:34:19,120 --> 00:34:26,590
So, the distribution that results from that
has a lower standard deviation.
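The spread reduction in the dice example can be computed exactly rather than simulated, since the outcome spaces are small; a minimal sketch enumerating every equally likely outcome:

```python
import statistics
from itertools import product

faces = range(1, 7)

# Standard deviation of a single fair die (uniform on 1..6).
one_die = statistics.pstdev(faces)

# Average of two dice: enumerate all 36 equally likely outcomes.
two_dice = statistics.pstdev([(a + b) / 2
                              for a, b in product(faces, faces)])

# Average of three dice: all 216 equally likely outcomes.
three_dice = statistics.pstdev([(a + b + c) / 3
                                for a, b, c in product(faces, faces, faces)])

# The spread shrinks by exactly 1/sqrt(n):
# sigma, sigma/sqrt(2), sigma/sqrt(3).
print(one_die, two_dice, three_dice)
```

Enumerating outcomes gives the exact population standard deviations, so the 1/sqrt(n) shrinkage holds to within floating-point rounding.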
392
00:34:26,590 --> 00:34:36,179
So, that should give you a sense of the whole
idea behind sampling distributions, and this
393
00:34:36,179 --> 00:34:39,080
is a good concept to revise and understand
deeply.
394
00:34:39,080 --> 00:34:44,679
Because a lot of inferential statistics is
based off of this. And with that, we conclude
395
00:34:44,679 --> 00:34:49,350
our lecture on random variables and probability
distributions.
396
00:34:49,350 --> 00:34:53,179
We will continue next class and focus more
on inferential statistics.