1
00:00:19,560 --> 00:00:24,609
Good morning. This is Rudra Pradhan here.
Welcome to NPTEL project on Econometric Modelling.
2
00:00:24,609 --> 00:00:29,980
Today, we will discuss the reliability of
bivariate econometric modelling. So, in the
3
00:00:29,980 --> 00:00:36,199
last class, we have discussed the entire setup
of bivariate econometric modelling where the
4
00:00:36,199 --> 00:00:43,199
problem is bounded with respect to two variables.
So, we have discussed the details
5
00:00:43,440 --> 00:00:49,210
about the estimation of alpha hat and beta
hat. So, after getting the estimated value
6
00:00:49,210 --> 00:00:56,210
of alpha hat and beta hat, our problem
setup is completely different, so far as
7
00:00:56,210 --> 00:00:58,330
forecasting or policy implication is concerned.
8
00:00:58,330 --> 00:01:05,330
So, let me highlight what the
last class discussion was. So, for bivariate models
9
00:01:05,430 --> 00:01:10,770
with respect to Y and X, Y equal to alpha
plus beta X, and we have introduced the
10
00:01:10,770 --> 00:01:17,630
error term U, and the estimated model is
Y hat equal to alpha hat plus beta hat
11
00:01:17,630 --> 00:01:24,630
X, where alpha hat equal to Y bar
minus beta hat X bar, and beta
12
00:01:29,829 --> 00:01:36,829
hat equal to summation x y by summation x
square, where small x equal to X minus X bar and
13
00:01:37,329 --> 00:01:42,479
small y equal to Y minus Y bar. This was our last
class discussions.
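The estimation recalled above can be sketched in a few lines. This is only an illustrative sketch, assuming Python; the function name `ols_bivariate` and the small data set are made up for the example and are not from the lecture.

```python
def ols_bivariate(X, Y):
    """Return (alpha_hat, beta_hat) for Y = alpha + beta X + U,
    using the deviation form: small x = X - X bar, small y = Y - Y bar."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    x = [xi - x_bar for xi in X]  # deviations of X from its mean
    y = [yi - y_bar for yi in Y]  # deviations of Y from its mean
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    beta_hat = sum_xy / sum_xx            # summation x y by summation x square
    alpha_hat = y_bar - beta_hat * x_bar  # Y bar minus beta hat X bar
    return alpha_hat, beta_hat


# Hypothetical data lying exactly on the line Y = 1 + 2 X.
alpha_hat, beta_hat = ols_bivariate([1, 2, 3, 4], [3, 5, 7, 9])
print(alpha_hat, beta_hat)
```

Because the made-up points lie exactly on a line, the sketch recovers that line's intercept and slope.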
14
00:01:42,479 --> 00:01:48,969
But, you know, to move from the first equation
to the second equation, that is the estimated model,
15
00:01:48,969 --> 00:01:53,869
which we have described as Y hat equal to alpha hat
plus beta hat X, we have applied the
16
00:01:53,869 --> 00:01:58,179
technique called the OLS technique. Of course,
there are several techniques like the ordinary
17
00:01:58,179 --> 00:02:02,229
least squares method, the generalized
least squares method, the weighted least squares
18
00:02:02,229 --> 00:02:07,060
method and the maximum likelihood estimator.
So, here we start with OLS technique, that
19
00:02:07,060 --> 00:02:11,620
is the ordinary least squares method, because it
is a very simple technique, easy to
20
00:02:11,620 --> 00:02:16,870
understand and easy to apply; that is why
we start with the OLS technique. Then, we subsequently
21
00:02:16,870 --> 00:02:22,660
move to GLS techniques, WLS techniques and
the MLE technique, which we need to have, because
22
00:02:22,660 --> 00:02:26,700
for some of the problems, it is very difficult
to handle through OLS techniques. So, that
23
00:02:26,700 --> 00:02:32,170
is how, we have to apply the other techniques
like GLS and WLS and maximum likelihood estimator.
24
00:02:32,170 --> 00:02:39,170
But, in the last class, we had a little bit of a
problem regarding the OLS restrictions. So,
25
00:02:39,430 --> 00:02:45,489
when you apply the OLS technique, then obviously,
we will get the estimated model. But, the
26
00:02:45,489 --> 00:02:50,549
application of OLS is based on certain assumptions.
So, before we check the reliability part of
27
00:02:50,549 --> 00:02:55,879
the models, let me first highlight the
assumptions related to the OLS technique, because
28
00:02:55,879 --> 00:03:01,510
in the last class, we have not discussed them in detail,
so, we have to finish this
29
00:03:01,510 --> 00:03:08,510
particular component, because the main problems
will derive from these particular
30
00:03:09,829 --> 00:03:13,310
assumptions because that is the starting point
of econometric modelling.
31
00:03:13,310 --> 00:03:18,689
So, you see here. So, what are the assumptions
regarding this bivariate modelling? So, that
32
00:03:18,689 --> 00:03:25,689
is our last class discussion. So, for this
particular model, Y equal to alpha plus beta
33
00:03:26,590 --> 00:03:31,170
X plus U. So, we have three parts. This is
one part, this is another part and this is
34
00:03:31,170 --> 00:03:35,400
another. This is called the total variation,
this is the explained variation, this is the unexplained
35
00:03:35,400 --> 00:03:39,829
variation. So, now, the assumptions relate to
this particular component, this particular
36
00:03:39,829 --> 00:03:45,109
component and this particular component, and
also the overall fitness of this particular model.
37
00:03:45,109 --> 00:03:50,980
So, the first standard assumption is that
the model must be linear in parameters,
38
00:03:50,980 --> 00:03:56,180
linear in parameters. So, what
have we already mentioned here? Here, alpha
39
00:03:56,180 --> 00:04:03,109
and beta enter the model in linear form,
and that too, you can say, without any problem.
40
00:04:03,109 --> 00:04:08,299
So, linear in parameters.
Second, our second assumption is that X, that is,
41
00:04:08,299 --> 00:04:15,299
your independent variable, should be non-stochastic.
This should be non-stochastic. So, the third item is
42
00:04:18,440 --> 00:04:24,310
related to the error component. So, what we
have discussed last class? Mean of error term
43
00:04:24,310 --> 00:04:30,940
should be equal to zero. Then, the fourth assumption
is that the variance of the error term should be
44
00:04:30,940 --> 00:04:37,940
constant, that is sigma square U. And the
fifth assumption is that covariance of U i
45
00:04:38,190 --> 00:04:45,190
U j should be equal to zero. So, here this
variance of U is nothing but the covariance
46
00:04:46,569 --> 00:04:53,569
of U i U j, which is equal to sigma square U, provided
i equal to j. If i is not equal to j, then
47
00:04:55,880 --> 00:05:01,280
this will come down to zero. So, now,
this particular structure
48
00:05:01,280 --> 00:05:07,690
is called homoscedasticity. If the variance
of the error term is constant,
49
00:05:07,690 --> 00:05:14,050
or you can say the same for every observation, this particular presentation
is called homoscedasticity. So,
50
00:05:14,050 --> 00:05:17,949
now, if that is not the case, then this problem
is called the heteroscedasticity issue, and that
51
00:05:17,949 --> 00:05:23,030
is a serious problem in econometric modelling.
We have a special component on heteroscedasticity.
52
00:05:23,030 --> 00:05:27,470
We will discuss that particular issue
in detail: if there is heteroscedasticity, how
53
00:05:27,470 --> 00:05:32,069
to detect it, and how to solve this particular
problem? Until and unless you solve that particular
54
00:05:32,069 --> 00:05:35,979
problem, the estimated model cannot
be used for forecasting.
55
00:05:35,979 --> 00:05:42,250
So, then, covariance of U i and U j. Covariance
of U i and U j should be equal to zero. If
56
00:05:42,250 --> 00:05:47,770
it is not equal to zero, then there is a problem
called autocorrelation, or sometimes
57
00:05:47,770 --> 00:05:54,740
it is otherwise called serial correlation.
So, now, if we have estimated models, then
58
00:05:54,740 --> 00:05:59,229
of course, with the help of estimated model,
we will get the error terms. So, now, once
59
00:05:59,229 --> 00:06:04,280
we have error term, we will create several
other variables related to error terms like
60
00:06:04,280 --> 00:06:11,280
U 1, U 2, U 3. So, now we have to track the
relationship between U 1, U 2, U 3, like this.
61
00:06:11,889 --> 00:06:18,889
So, if these relationships, that is, the cross
correlations, are not equal to zero, then it
62
00:06:21,669 --> 00:06:28,430
will lead to serial correlation. For instance,
like this: we have U 1, U 2 up to U n. So,
63
00:06:28,430 --> 00:06:35,430
this side U 1, U 2 up to U n.
So, now, we will correlate these, giving U
64
00:06:35,710 --> 00:06:42,710
1 1, U 1 2, up to U 1 n. Similarly, U 2 1, U 2 2,
up to U 2 n, and U n 1 up to U n n. So, now, the
65
00:06:47,110 --> 00:06:52,330
diagonal items relate to the homoscedasticity
and heteroscedasticity issue. The diagonal
66
00:06:52,330 --> 00:06:57,289
problem is called the homoscedasticity or
heteroscedasticity problem. Then,
67
00:06:57,289 --> 00:07:04,289
all these other items like this, apart from these
particular diagonals,
68
00:07:04,349 --> 00:07:09,419
are called the off diagonals.
These should be equal to zero;
69
00:07:09,419 --> 00:07:13,810
if that is not the case, then this particular
problem is called serial correlation
70
00:07:13,810 --> 00:07:20,810
or autocorrelation. It is a serious problem
again in the case of modelling, particularly
71
00:07:21,020 --> 00:07:24,460
econometric modelling.
So, when there is serial correlation; obviously,
72
00:07:24,460 --> 00:07:31,460
when we go for any typical problem or
any estimated model, there will definitely be some
73
00:07:32,409 --> 00:07:38,030
autocorrelation. Standard techniques
and standard statistics are there to
74
00:07:38,030 --> 00:07:43,509
know the exact value of autocorrelation or
serial correlation. There are certain limits;
75
00:07:43,509 --> 00:07:48,659
if it crosses those limits, meaning the upper
limit or the lower limit, then it is a problem
76
00:07:48,659 --> 00:07:53,800
for the estimated model. So, we have to find out
how that particular problem can be solved
77
00:07:53,800 --> 00:07:56,199
and that model can be used for forecasting.
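One simple way to put a number on serial correlation is the lag-1 sample autocorrelation of the residuals. The sketch below is an illustration only, assuming Python; the lecture does not name a specific statistic at this point, and the function name `lag1_autocorrelation` and both residual series are hypothetical.

```python
def lag1_autocorrelation(residuals):
    """Sample correlation between U t and U t-1 (lag-1 autocorrelation)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [u - mean for u in residuals]
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))  # cross products of neighbours
    den = sum(d * d for d in dev)                        # total squared deviation
    return num / den


# Hypothetical residual series: one flips sign each period, one drifts upward.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
trending = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]

r_alternating = lag1_autocorrelation(alternating)  # negative serial correlation
r_trending = lag1_autocorrelation(trending)        # positive serial correlation
print(r_alternating, r_trending)
```

A value near zero is what the fifth assumption asks for; values far from zero in either direction point to serial correlation.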
78
00:07:56,199 --> 00:08:00,990
So, this particular structure is called
the autocorrelation issue. So, this is the
79
00:08:00,990 --> 00:08:07,990
fifth standard assumption. Then, there is
another assumption; this is the sixth
80
00:08:09,060 --> 00:08:15,009
assumption. The sixth assumption is that the covariance
of U and X should be equal to zero because
81
00:08:15,009 --> 00:08:22,009
our model is that Y is a function of X and
U. So, there are three variables. We start
82
00:08:23,830 --> 00:08:30,210
with two variables initially, but ultimately
we will get a variable called U. So,
83
00:08:30,210 --> 00:08:36,770
that is the difference between Y and Y hat,
that is the estimated model. Covariance of U and X must
84
00:08:36,770 --> 00:08:42,400
be equal to zero, otherwise it will create a
problem. So, I will discuss the details in the next
85
00:08:42,400 --> 00:08:49,400
items.
So, the covariance of X
86
00:08:49,420 --> 00:08:56,420
i and X j is equal to zero. In fact, this X
i X j is not an actual problem in the bivariate model.
87
00:08:58,410 --> 00:09:02,899
So, when we go for the multivariate model,
then one of the standard constraints is that
88
00:09:02,899 --> 00:09:09,339
there are multiple number of independent variables
like X 1, X 2, X 3, like this. So, now all
89
00:09:09,339 --> 00:09:14,269
these independent variables should be independent,
means the relationship between all these variables
90
00:09:14,269 --> 00:09:19,040
should be independent. Again, there should not be
any association between these independent
91
00:09:19,040 --> 00:09:22,050
variables. If there is such a problem, then
it is called the multicollinearity issue.
92
00:09:22,050 --> 00:09:27,140
So, we will discuss it in detail when we go
for multivariate modelling because we are
93
00:09:27,140 --> 00:09:32,459
in the bivariate models and this particular
problem may not be there, means it should
94
00:09:32,459 --> 00:09:37,980
not be there in the case of bivariate model.
So, that is how there is a covariance between
95
00:09:37,980 --> 00:09:44,329
U and X. Now, since U is in the right side
of this particular equation, obviously, there
96
00:09:44,329 --> 00:09:48,920
are two variables, X and U. So, now, if we
integrate this with the multicollinearity
97
00:09:48,920 --> 00:09:53,120
issue, X and U should be totally independent;
that means, there should
98
00:09:53,120 --> 00:09:57,720
not be any association between X and U. If
it is there, then it will be a problem.
99
00:09:57,720 --> 00:10:04,519
So, now, if we integrate all these
assumptions structurally with respect to the error
100
00:10:04,519 --> 00:10:11,000
term, then the error term should follow a
normal distribution with zero mean and constant variance sigma square U.
101
00:10:11,000 --> 00:10:17,269
So, zero mean and constant variance. This is
the most important one
102
00:10:17,269 --> 00:10:23,940
for this particular econometric modelling.
So, then, ninth, there should be variation in
103
00:10:23,940 --> 00:10:30,940
X; there should be variation in X. Of course,
if there is variation in X, then there should
104
00:10:31,510 --> 00:10:37,260
be variation in Y. For instance, we have a Y
series and we have an X series; forget about
105
00:10:37,260 --> 00:10:42,190
Y for the time being. So, now, for every Y
there is an X item. So, now, if the X items
106
00:10:42,190 --> 00:10:48,450
are all very much the same, then obviously,
our model cannot be fitted well.
107
00:10:48,450 --> 00:10:54,959
For instance, if for every Y the X values are
2, 2, 2, 2, say, or 5, 5, 5, say, then obviously,
108
00:10:54,959 --> 00:11:01,200
there is no variation. So, in this particular
context, this model cannot be fitted well.
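The reason is visible in the slope formula: beta hat is summation x y by summation x square, and with no variation in X the denominator is zero. A minimal sketch, assuming Python; the function name `slope_or_none` and the Y values are hypothetical, while the X values 2, 2, 2, 2 are the ones mentioned above.

```python
def slope_or_none(X, Y):
    """Return beta hat, or None when X has no variation at all."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    sum_xx = sum((xi - x_bar) ** 2 for xi in X)
    if sum_xx == 0:
        # Every X equals X bar, so summation x square is zero
        # and the slope formula would divide by zero.
        return None
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    return sum_xy / sum_xx


no_variation = slope_or_none([2, 2, 2, 2], [1, 3, 2, 5])  # X is constant
with_variation = slope_or_none([1, 2, 3], [2, 4, 6])      # X varies
print(no_variation, with_variation)
```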
109
00:11:01,200 --> 00:11:06,680
So, by default we have to replace such data,
otherwise you cannot get a best fitted
110
00:11:06,680 --> 00:11:13,089
model. These are the inspections before you
go for, you can say, the estimated model, because
111
00:11:13,089 --> 00:11:17,639
to get the estimated model, you have to put in
lots of labor and effort. So, after that,
112
00:11:17,639 --> 00:11:23,250
if the model is not reliable, then obviously,
you have to redesign and re-estimate to get
113
00:11:23,250 --> 00:11:28,610
a better fitted model again, but there are certain
clues before you go for estimation. So, you
114
00:11:28,610 --> 00:11:33,120
have to clarify all these details, and one
such clue is this: there should be variation
115
00:11:33,120 --> 00:11:38,649
in both Y and X, but X is very important here
because it is X that influences Y.
116
00:11:38,649 --> 00:11:44,089
So, there should be some kind of variability
in X. So, this is another assumption related
117
00:11:44,089 --> 00:11:51,089
to the variance of X. Next, the model must be correctly
specified. The model must be correctly
118
00:11:51,389 --> 00:11:58,389
specified.
What does that mean? For instance, we
119
00:12:02,519 --> 00:12:07,860
are just representing Y equal to alpha plus
beta X. So, there are many ways we
120
00:12:07,860 --> 00:12:13,209
can represent this particular relationship
between Y and X. For instance, we can also
121
00:12:13,209 --> 00:12:19,430
write Y equal to alpha plus beta to the power
X, or we can write Y equal to alpha
122
00:12:19,430 --> 00:12:26,430
plus beta by X, like this, or we can also
write Y equal to alpha plus beta X square
123
00:12:28,670 --> 00:12:34,220
plus gamma X, like this. So, there are many
ways we can represent the relationship
124
00:12:34,220 --> 00:12:40,420
between Y and X, but for a particular problem
or particular instance, every equation
125
00:12:40,420 --> 00:12:44,449
may not be well suited to the econometric
modelling.
126
00:12:44,449 --> 00:12:50,920
So, we like to know, what is the best mathematical
relationship, we have to use to get the best
127
00:12:50,920 --> 00:12:56,399
fitted model. So, for that you have to
do lots of homework and you have to test;
128
00:12:56,399 --> 00:13:00,760
there are certain test procedures, or graphically
you can plot it, and you like to know what
129
00:13:00,760 --> 00:13:05,000
the exact relation is, whether it is a linear
one or a non-linear one. Then, accordingly, you
130
00:13:05,000 --> 00:13:09,320
have to proceed further. In fact, if it is a non-linear
one, then it may be a very complex problem.
131
00:13:09,320 --> 00:13:15,149
So, what we have to do in that non-linear setup is that
you first have to transform it to the linear format
132
00:13:15,149 --> 00:13:20,240
by using a transformation rule. There are various
transformation rules through which you can
133
00:13:20,240 --> 00:13:24,510
transform the non-linear problem to a linear
one. Then, you have to go for estimation.
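As an illustration of one such transformation rule, take the form Y equal to alpha plus beta by X mentioned earlier. Defining a new variable Z equal to 1 by X makes the model linear in Z, and the usual OLS formulas then apply. This is a sketch under assumptions: Python, a hypothetical helper `ols_bivariate`, and made-up data generated with alpha equal to 1 and beta equal to 4.

```python
def ols_bivariate(X, Y):
    """OLS in deviation form; returns (alpha_hat, beta_hat)."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in X)
    beta_hat = sum_xy / sum_xx
    return y_bar - beta_hat * x_bar, beta_hat


# Made-up data following Y = 1 + 4 / X exactly (non-linear in X).
X = [1.0, 2.0, 4.0, 5.0]
Y = [1.0 + 4.0 / xi for xi in X]

# Transformation rule: Z = 1 / X, so that Y = alpha + beta Z is linear in Z.
Z = [1.0 / xi for xi in X]

alpha_hat, beta_hat = ols_bivariate(Z, Y)
print(alpha_hat, beta_hat)
```

The model stays linear in the parameters alpha and beta throughout; only the regressor is transformed before estimation.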
134
00:13:24,510 --> 00:13:29,779
So, if you do not go for the transformation,
then it will create lots of problems again,
135
00:13:29,779 --> 00:13:36,779
and typically it is a problem with the OLS technique.
So, you must be very careful about this. Then,
136
00:13:38,610 --> 00:13:44,519
next is the number of observations: the number
of observations should be greater than the
137
00:13:44,519 --> 00:13:51,010
number of variables. The number
of observations should
138
00:13:51,010 --> 00:13:55,399
be greater than the number of variables. For
instance, you see, this particular problem
139
00:13:55,399 --> 00:14:01,279
is not a serious issue for the bivariate econometric
model, because we have two variables only.
140
00:14:01,279 --> 00:14:07,589
So, obviously, in most of the problems
you will get more than two samples. So,
141
00:14:07,589 --> 00:14:11,220
if it is more than two samples, then obviously,
there is no issue at all.
142
00:14:11,220 --> 00:14:16,860
But, when there is a multivariate problem where
the problem setup consists of, you can say, so
143
00:14:16,860 --> 00:14:21,230
many variables, like, say, ten, twenty, thirty
or a hundred, then sample size is
144
00:14:21,230 --> 00:14:26,029
a very critical issue. So, you have to be very careful
how many variables you are putting in
145
00:14:26,029 --> 00:14:32,070
the system and how many observations are there.
So, by default, whatever variables
146
00:14:32,070 --> 00:14:36,730
you are using in a particular system, your
sample size should be five times
147
00:14:36,730 --> 00:14:42,110
that. So, that means, if the variables
are 5 in number in a particular system or
148
00:14:42,110 --> 00:14:47,160
in a particular modelling setup, then obviously,
the sample size should be at least 5 into 5,
149
00:14:47,160 --> 00:14:53,509
that means, at least twenty five, but twenty
five is also a very small one. By the standard rule,
150
00:14:53,509 --> 00:14:58,720
your sample size should be greater than
thirty. Greater than thirty means thirty
151
00:14:58,720 --> 00:15:04,630
two, say; if it is greater than thirty
two, then obviously, you can say
152
00:15:04,630 --> 00:15:08,220
we have two different sample structures, small
sample and large sample.
153
00:15:08,220 --> 00:15:14,060
So, the minimum sample size must be greater
than thirty two for a particular problem,
154
00:15:14,060 --> 00:15:18,769
whether it is bivariate or multivariate.
And if it is multivariate, then obviously,
155
00:15:18,769 --> 00:15:22,920
there is a second criterion. So that means,
the second criterion is the number of variables,
156
00:15:22,920 --> 00:15:28,230
and the sample size should be five times
that. So, now, if there are ten variables, then
157
00:15:28,230 --> 00:15:32,959
obviously, the sample size should be at least
fifty. If it is less than that, then there
158
00:15:32,959 --> 00:15:39,040
is a problem in the modelling or estimation process.
So, if it is more than that, then no doubt
159
00:15:39,040 --> 00:15:43,699
about it. The model accuracy is very high
when the sample size is very high. The model
160
00:15:43,699 --> 00:15:48,680
will be less accurate, if the sample size
is very small. So, that is how, you have to
161
00:15:48,680 --> 00:15:54,120
go. Always go for a higher and higher sample
size, so that the model accuracy will be
162
00:15:54,120 --> 00:16:00,209
much better. So, that is what the reliability
part is concerned with, because
163
00:16:00,209 --> 00:16:06,240
the objective of the reliability part is just to
check whether the estimated model can be used
164
00:16:06,240 --> 00:16:12,029
for forecasting, or whether we can call
it the best fitted model. So, for that,
165
00:16:12,029 --> 00:16:17,980
we have different test structures. By this
process of testing, we like to
166
00:16:17,980 --> 00:16:22,449
be in a position to say that this particular
model can be used for forecasting or you can
167
00:16:22,449 --> 00:16:29,449
say, policy use. So, this is the eleventh assumption:
the number of observations should be greater
168
00:16:30,139 --> 00:16:36,410
than the number of variables.
Then, last but not the least, the relationship
169
00:16:36,410 --> 00:16:43,410
should be an identified one. So, the relationship
should be identified, properly
170
00:16:45,199 --> 00:16:52,199
identified.
For instance, you see here, this particular
171
00:16:56,199 --> 00:17:00,889
problem is not an issue for the two variable model,
but when you go for multivariate
172
00:17:00,889 --> 00:17:05,829
models, particularly, we have a component
called simultaneous equation modelling
173
00:17:05,829 --> 00:17:10,819
or structural equation modelling. In that
context, you know, model identification is
174
00:17:10,819 --> 00:17:16,490
a very serious issue. If you do not identify
that model properly, then the process of estimation,
175
00:17:16,490 --> 00:17:20,089
its interpretation and its reliability will
be affected.
176
00:17:20,089 --> 00:17:25,420
So, what we have to do first is that you must
have very sound knowledge, that is, theoretical
177
00:17:25,420 --> 00:17:30,390
knowledge, before you go for fitting
the relationship, because, so far as the
178
00:17:30,390 --> 00:17:36,610
interpretation part is concerned, the
theoretical knowledge or theoretical background
179
00:17:36,610 --> 00:17:42,150
will give you lots of ideas, how to interpret
this particular model or when you will go
180
00:17:42,150 --> 00:17:46,290
for forecasting issue.
So, these are the standard assumptions through
181
00:17:46,290 --> 00:17:53,180
which the OLS technique is practically feasible.
If these assumptions do not hold,
182
00:17:53,180 --> 00:17:58,360
then obviously, you can say,
the model you will get through
183
00:17:58,360 --> 00:18:03,200
this particular technique cannot be used
for forecasting, or you cannot use it
184
00:18:03,200 --> 00:18:09,640
for any purpose, because it will give you wrong
signals or you can say wrong indications.
185
00:18:09,640 --> 00:18:16,500
So, now, the idea is this:
when we have a model, Y equal to alpha
186
00:18:16,500 --> 00:18:22,170
plus beta X, then we have received the model
called Y hat equal to alpha hat plus
187
00:18:22,170 --> 00:18:27,390
beta hat X. So, now, we have two standard
estimators. These are called the alpha
188
00:18:27,390 --> 00:18:32,880
estimator and the beta estimator. You know,
these particular estimators should follow certain
189
00:18:32,880 --> 00:18:39,880
principles. So, there are certain principles
behind these particular estimators.
190
00:18:40,720 --> 00:18:47,720
This particular principle is represented
by the term called BLUE: Best Linear
191
00:18:47,860 --> 00:18:52,480
Unbiased Estimator. So, that means, whatever
estimators we are receiving; that is, alpha
192
00:18:52,480 --> 00:18:57,900
hat and beta hat, that should be best, that
should be linear one, that should be unbiased
193
00:18:57,900 --> 00:19:00,960
one.
So, now, you see here what is linear? Linear
194
00:19:00,960 --> 00:19:07,960
means it is a linear combination of the variable.
For instance, X bar is nothing but summation
195
00:19:08,260 --> 00:19:15,260
X i divided by n. So, that means, 1 by n
into X 1 plus 1 by n into X 2
196
00:19:17,770 --> 00:19:24,770
plus 1 by n into X 3, like this.
So, these are all linear combinations of X.
197
00:19:26,200 --> 00:19:32,940
So, what I like to say is that whatever estimator
values we have received, alpha hat and beta hat,
198
00:19:32,940 --> 00:19:39,940
they should be linear in nature, so that,
you know, the theorem can be meaningful. So,
199
00:19:39,950 --> 00:19:46,770
that is how it is called best linear.
With the classical linear regression model,
200
00:19:46,770 --> 00:19:52,290
the estimators which we have received by the
process of estimation should be linear, unbiased,
201
00:19:52,290 --> 00:19:59,290
and, you can say, must have minimum variance.
So, what is that minimum variance?
202
00:19:59,550 --> 00:20:06,550
So, before that, first it must be linear in nature;
then, second, there is what is called minimum
203
00:20:06,780 --> 00:20:12,880
variance, or, let us say, first start with unbiasedness.
For unbiasedness, first, what is bias?
204
00:20:12,880 --> 00:20:18,870
Bias means the difference between the expected value
of the estimator and the true value. Let us say we start with the beta
205
00:20:18,870 --> 00:20:24,300
parameter only. So, the expected value of beta hat should
be equal to beta. So, this is the true beta.
206
00:20:24,300 --> 00:20:29,860
So, now, if the estimated beta is exactly equal
to the true beta, then there is no gap at all; but
207
00:20:29,860 --> 00:20:35,650
in practice we expect some gap,
and that gap should be very minimal.
208
00:20:35,650 --> 00:20:40,580
If that gap is high, then obviously
it will create lots of problems. So, that
209
00:20:40,580 --> 00:20:45,460
means, for every problem, whatever estimators
we are receiving should be unbiased.
210
00:20:45,460 --> 00:20:51,590
So that means, the expected value of the estimator should
be exactly equal to the true value. There should
211
00:20:51,590 --> 00:20:57,260
not be a drastic difference between them. As
the difference increases,
212
00:20:57,260 --> 00:21:03,810
the model accuracy will be affected.
So, this should be taken care of. So, this
213
00:21:03,810 --> 00:21:10,810
is the second property of this particular estimator.
Third is the minimum variance.
214
00:21:10,810 --> 00:21:16,130
So, what is minimum
variance? Minimum variance means, you have
215
00:21:16,130 --> 00:21:22,520
to find out that the variance
of beta hat is less than or equal to
216
00:21:22,520 --> 00:21:26,190
the variance of another estimator,
say beta star.
217
00:21:26,190 --> 00:21:31,680
So, that means, whatever estimator you have
received, that should have minimum variance.
218
00:21:31,680 --> 00:21:38,190
You see, variance is an indicator
through which we can judge the accuracy of a
219
00:21:38,190 --> 00:21:45,190
particular theme or particular item. If the
variance is very high, then obviously
220
00:21:46,250 --> 00:21:51,290
that model, you can say, cannot be considered
as the best fitted model.
221
00:21:51,290 --> 00:21:56,920
So, just like, you know, we have discussed in detail
in the univariate data setup, where this particular
222
00:21:56,920 --> 00:22:01,580
component is called the dispersion issue.
So that means, the variation from the center
223
00:22:01,580 --> 00:22:06,540
point should not be so drastic. So, it should
be uniformly distributed and whatever the
224
00:22:06,540 --> 00:22:11,590
variance we have received, that should be
the minimum one. If it is minimum one, then
225
00:22:11,590 --> 00:22:15,350
the model can be treated as the best fitted
one. If it is not minimum then obviously,
226
00:22:15,350 --> 00:22:20,600
it will get affected.
So, variance should be at the lower level.
227
00:22:20,600 --> 00:22:25,740
Then, the fourth one is, it should be consistent.
Consistency here means the integration of the
228
00:22:25,740 --> 00:22:31,980
minimum variance property and the unbiasedness
property; that means, for the estimated
229
00:22:31,980 --> 00:22:38,260
beta hat, E of beta hat should be equal to
beta, and the variance of beta hat should be less than
230
00:22:38,260 --> 00:22:44,720
the variance of another estimator, say beta star.
So, when we will go for unbiasedness property
231
00:22:44,720 --> 00:22:48,850
only, then there is no link with minimum variance.
And when we go for minimum variance,
232
00:22:48,850 --> 00:22:54,970
then there is no link with, you can
say, the unbiasedness property. So that means,
233
00:22:54,970 --> 00:23:01,700
you can say, the
two things are completely different, but so
234
00:23:01,700 --> 00:23:05,610
far as the consistency property is concerned,
it is the integration of the unbiasedness property
235
00:23:05,610 --> 00:23:12,080
and the minimum variance property. So, the unbiasedness property is that the expected
value of the estimated beta is equal to the true
236
00:23:12,080 --> 00:23:15,060
value; that is not beta hat. In fact, that
is beta only.
237
00:23:15,060 --> 00:23:20,220
So, E of beta hat is equal to beta, and the variance
of beta hat should be less than the
238
00:23:20,220 --> 00:23:25,950
variance of another estimator. So, now, if that is
the case, then this particular theorem is
239
00:23:25,950 --> 00:23:32,360
called BLUE: Best Linear Unbiased Estimator.
So, now every model has to be tested with
240
00:23:32,360 --> 00:23:37,560
respect to all these indicators. That means,
we are assuming
241
00:23:37,560 --> 00:23:42,050
that these are the indicators through which
we judge the reliability of the particular
242
00:23:42,050 --> 00:23:48,190
model. Whatever estimator we have received,
that should be checked with respect to
243
00:23:48,190 --> 00:23:53,190
all these indicators. If these indicators
are going on the right track, then the model
244
00:23:53,190 --> 00:23:56,580
accuracy will be high and we can say that
model is reliable one.
245
00:23:56,580 --> 00:24:02,460
So, now we have to know what exactly this
reliability part of the modelling is. So, for this
246
00:24:02,460 --> 00:24:09,000
reliability part of the modelling, basically,
see here what exactly the proper
247
00:24:09,000 --> 00:24:10,180
setup is.
248
00:24:10,180 --> 00:24:16,900
So, now, the model is like this. Y equal to
alpha plus beta X plus U. All right.
249
00:24:16,900 --> 00:24:23,900
So, we are getting Y hat equal to alpha hat
plus beta hat X. So, now, so far
250
00:24:24,730 --> 00:24:31,730
as reliability is concerned, we
have to integrate so many things, because
251
00:24:33,380 --> 00:24:39,220
we like to know, whether this alpha hat and
beta hat are, you can say, reliable ones. So,
252
00:24:39,220 --> 00:24:45,600
that is with respect to this particular theorem,
BLUE: Best Linear Unbiased Estimator. So that
253
00:24:45,600 --> 00:24:52,600
means, alpha hat and beta hat have to be checked
so that the theorem can be satisfied. So that
254
00:24:54,750 --> 00:25:01,000
means, the properties of all these estimators
should be as per the rules or principles of
255
00:25:01,000 --> 00:25:07,410
econometric modelling. So, that is what
we call best, linear and unbiased, and obviously,
256
00:25:07,410 --> 00:25:11,660
it goes by the minimum variance and
consistency properties.
257
00:25:11,660 --> 00:25:16,270
So now, how do we go for that? When you
get the estimated model, then basically
258
00:25:16,270 --> 00:25:21,560
the estimated model will be represented
like this. This is followed by the variance of
259
00:25:21,560 --> 00:25:27,770
alpha hat, and with
respect to beta hat, we will receive the variance
260
00:25:27,770 --> 00:25:32,710
of beta hat. So, our standard
assumption is that variance of alpha hat should
261
00:25:32,710 --> 00:25:37,640
be very very low and variance of beta hat
should be very very low.
262
00:25:37,640 --> 00:25:42,800
So, with respect to the variance of alpha hat and
the variance of beta hat, we like to receive the
263
00:25:42,800 --> 00:25:48,650
standard error of alpha
hat and the standard error of beta hat, each of which
264
00:25:48,650 --> 00:25:54,550
is derived through the corresponding variance, for example
the standard error of alpha hat through the variance of alpha
265
00:25:54,550 --> 00:26:01,160
hat. Then, there is a term called
the t statistic. Then we have to apply this statistic
266
00:26:01,160 --> 00:26:05,990
actually to check whether this particular
parameter is statistically significant or
267
00:26:05,990 --> 00:26:09,730
not. Ok.
So, now, the statistical significance
268
00:26:09,730 --> 00:26:13,430
of a particular parameter is
another part of reliability. It is not
269
00:26:13,430 --> 00:26:20,090
enough that whatever estimators we have obtained
just follow the principle of BLUE.
270
00:26:20,090 --> 00:26:24,080
At the same time, these particular estimates
should be statistically significant, and for
271
00:26:24,080 --> 00:26:30,630
that we have lots of test statistics,
like the t test, F test, j test, and so
272
00:26:30,630 --> 00:26:33,870
on.
So now, for these particular standard problems,
273
00:26:33,870 --> 00:26:39,480
we usually take the help of the t statistic
and the F statistic. From this particular angle,
274
00:26:39,480 --> 00:26:45,640
so far as the significance
of alpha hat and beta hat is concerned,
275
00:26:45,640 --> 00:26:50,550
we have to apply the t statistic, through
which we like to make a judgment whether a
276
00:26:50,550 --> 00:26:56,890
particular parameter is statistically significant
or not. If it is statistically significant,
277
00:26:56,890 --> 00:27:03,160
then obviously this particular model can
be considered. If the parameters are not
278
00:27:03,160 --> 00:27:08,840
statistically significant, then it will
definitely be a problem.
279
00:27:08,840 --> 00:27:14,280
But you know, here in this particular bivariate
setup, we have only two items, that is
280
00:27:14,280 --> 00:27:18,860
alpha hat and beta hat. But when we go
for a multivariate model, there are several
281
00:27:18,860 --> 00:27:22,530
betas, like beta 1, beta 2, beta
3. Alpha is just a supporting component that
282
00:27:22,530 --> 00:27:29,530
is represented as the
constant item, and there are slope items. The
283
00:27:31,580 --> 00:27:38,580
slope items are beta 1, beta 2,
beta 3 and so on, and the constant term
284
00:27:39,750 --> 00:27:45,330
we call the intercept. This is the
supporting factor from which you
285
00:27:45,330 --> 00:27:50,420
have to start, like this.
So, this particular representation is like this:
286
00:27:50,420 --> 00:27:55,690
this is alpha hat,
and from it we have to start the Y hat
287
00:27:55,690 --> 00:28:00,110
component, that is nothing but
alpha hat plus beta hat X.
288
00:28:00,110 --> 00:28:07,110
So, now this side is the X axis and this side
is the Y axis. So, now, assume that
289
00:28:07,490 --> 00:28:12,400
this is a particular estimated model; in between
there are the true values, these are the true values.
290
00:28:12,400 --> 00:28:19,010
So, we like to know how the best
fitted line can be derived through these
291
00:28:19,010 --> 00:28:25,630
true values or true points. Because,
by default, or you can say
292
00:28:25,630 --> 00:28:30,650
by inspection, it should be in the middle of
all these points. So, that is the way you have to
293
00:28:30,650 --> 00:28:35,250
design it, so that the path will be
in between the points and the model accuracy
294
00:28:35,250 --> 00:28:39,910
will be very high.
So, now, when you have plotted points,
295
00:28:39,910 --> 00:28:46,520
there are many ways to draw a line through the middle
of the points. It may be like this, when
296
00:28:46,520 --> 00:28:50,960
we have several points like this. So, it can
be like this, it can be like this, it can
297
00:28:50,960 --> 00:28:56,120
be like this. So, all these lines cannot
be considered.
298
00:28:56,120 --> 00:29:03,120
So, we have to choose one, and that one
must be the very best. By inspection it is
299
00:29:03,810 --> 00:29:09,010
very difficult; of course you can make a judgment,
but it is not an accurate judgment. So, that
300
00:29:09,010 --> 00:29:12,700
is why you have to go through statistical
procedures only. There is a statistical procedure,
301
00:29:12,700 --> 00:29:19,700
and that statistical procedure is called the
reliability check. So, that reliability check
302
00:29:19,860 --> 00:29:25,080
will indicate whether this particular path
is the best one, or this particular path is the best
303
00:29:25,080 --> 00:29:29,960
one, or this particular path is the best one.
So, as far as the significance of the parameters
304
00:29:29,960 --> 00:29:35,160
is concerned, with the standard error of alpha hat
we use the t statistic, t of alpha hat;
305
00:29:35,160 --> 00:29:42,160
similarly, we have to see t of beta hat. So,
since it is an issue about significance,
306
00:29:43,970 --> 00:29:49,620
for the significance level we
have to compare with the probability
307
00:29:49,620 --> 00:29:54,710
level.
So, in this particular system, the significance
308
00:29:54,710 --> 00:30:01,550
levels we usually consider are the 1 percent
level, 5 percent level and 10 percent level.
309
00:30:01,550 --> 00:30:08,140
So, the maximum limit we will go up to is 10 percent.
So, now, if some item is significant
310
00:30:08,140 --> 00:30:15,140
at 1 percent, then that means we have 99
percent confidence and 1 percent
311
00:30:16,410 --> 00:30:22,480
is the chance of error. Similarly, if it is 5 percent,
that means 95 percent confidence and
312
00:30:22,480 --> 00:30:26,790
5 percent chance of error. Again, if
we say 10 percent, then it is 90 percent
313
00:30:26,790 --> 00:30:31,710
confidence and 10 percent chance of error.
Again, these test procedures
314
00:30:31,710 --> 00:30:36,290
are usually divided into two parts, called
the one tailed test and the two tailed test.
315
00:30:36,290 --> 00:30:41,600
If it is a one tailed test, then it is considered
at the 1 percent level, 5 percent level, 10
316
00:30:41,600 --> 00:30:47,280
percent level as it is. If it is a two tailed test, then
you have to divide by 2. So obviously, a two
317
00:30:47,280 --> 00:30:51,830
tailed test at 1 percent means you have to
go for 0.005 and 0.005.
318
00:30:51,830 --> 00:30:58,620
Similarly, 5 percent means you have to go
for 0.025 and 0.025. Similarly, for 10 percent you
319
00:30:58,620 --> 00:31:04,490
have to go for 0.05 and 0.05. So, this is how
you have to proceed. So that means, this particular
320
00:31:04,490 --> 00:31:09,410
item is normally distributed. So, this part
and this part have to be considered; this is
321
00:31:09,410 --> 00:31:14,640
the plus side and this is the minus side.
So, this is how you have to do it; that means, half
322
00:31:14,640 --> 00:31:18,480
of the level is on this side and half is on this
side. So, if it is one tailed, then the whole
323
00:31:18,480 --> 00:31:24,050
level is on one side only, that is, either the positive side
or the negative. So, this is how you have to go
324
00:31:24,050 --> 00:31:28,860
through it. So, now, the probability level
you have to decide.
325
00:31:28,860 --> 00:31:34,750
And with this particular
system, we have to check the
326
00:31:34,750 --> 00:31:40,410
best fitted model or reliable model.
So, now I am giving you the exact procedure
327
00:31:40,410 --> 00:31:44,660
of reliability. What is this reliability
of the estimated model all about?
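As a rough illustration of the one tailed versus two tailed split described above, the following Python sketch divides a significance level between the tails and looks up a matching critical value. It uses the standard normal distribution from Python's standard library as a stand-in, which is an assumption: the t tables used in the lecture give somewhat larger values for small samples. The function names `tail_areas` and `normal_critical_value` are hypothetical.

```python
from statistics import NormalDist

def tail_areas(level, two_tailed=True):
    """Return the area placed in each rejection tail for a significance level."""
    if two_tailed:
        return (level / 2, level / 2)  # half on the minus side, half on the plus side
    return (level,)                     # one tailed: the whole level sits on one side

def normal_critical_value(level, two_tailed=True):
    """Critical value from the standard normal (assumed stand-in for a t table)."""
    upper_tail = tail_areas(level, two_tailed)[-1]
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.01, 0.05, 0.10):
    print(level, tail_areas(level), round(normal_critical_value(level), 3))
```

For example, a 5 percent two tailed test puts 0.025 in each tail, and the normal critical value is about 1.96.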
328
00:31:44,660 --> 00:31:51,660
So, as far as reliability is
concerned, the reliability of the estimated model,
329
00:31:52,670 --> 00:31:59,670
we have to check two things: first is the
reliability of the parameters,
330
00:32:05,020 --> 00:32:12,020
and second is the reliability
of the overall fitness of the
331
00:32:13,560 --> 00:32:20,370
model.
Ok.
332
00:32:20,370 --> 00:32:26,670
So, first is the reliability of the estimators. That
means, the initial starting point is that you must
333
00:32:26,670 --> 00:32:31,390
have Y information and X information about
your problem; your basic relationship
334
00:32:31,390 --> 00:32:37,410
between Y and X must be known to you. Then
you have to make a functional relationship,
335
00:32:37,410 --> 00:32:42,730
that is what we call the mathematical
model. Then you transfer it into a statistical
336
00:32:42,730 --> 00:32:48,560
model by introducing the error component.
Then you have to estimate the model by the
337
00:32:48,560 --> 00:32:54,200
use of the OLS technique, and then you have
Y hat equal to alpha hat plus beta hat X.
338
00:32:54,200 --> 00:32:59,470
So, the moment you have Y hat equal to alpha
hat plus beta hat X, then we are going for
339
00:32:59,470 --> 00:33:04,580
reliability. So, the reliability has
two parts. One part is with respect to the significance
340
00:33:04,580 --> 00:33:10,240
of the parameters, that is, alpha hat and beta
hat, which we have just explained in detail.
341
00:33:10,240 --> 00:33:15,620
And second is the overall
fitness of the model. So, for the overall fitness
342
00:33:15,620 --> 00:33:20,570
of the model there is also a standard
technique through which you have to judge
343
00:33:20,570 --> 00:33:27,280
the overall fitness of the model. Just like,
you know, see here, there are three parts:
344
00:33:27,280 --> 00:33:33,660
in Y equal to alpha plus beta X, alpha is one part, beta
X is another part, and Y is another part.
345
00:33:33,660 --> 00:33:38,860
So, Y is one side and these two parts are
the other side. So, this
346
00:33:38,860 --> 00:33:43,760
particular reliability means:
what is the overall impact of this particular
347
00:33:43,760 --> 00:33:50,060
model? That is, the explained side against the
unexplained side. Then, obviously, there is the reliability
348
00:33:50,060 --> 00:33:55,740
of the individual parameters. That means, section
a and section b; that is,
349
00:33:55,740 --> 00:33:59,830
alpha plus beta X is one part and
U is another part.
350
00:33:59,830 --> 00:34:04,440
So, we like to judge the explained
item and to judge the unexplained
351
00:34:04,440 --> 00:34:10,339
item. So, reliability has two parts:
one is with respect to the parameter significance
352
00:34:10,339 --> 00:34:15,230
check, and the other is with respect to the overall
fitness of the model. Ok.
353
00:34:15,230 --> 00:34:22,129
So, as far as the reliability of the parameters
is concerned, we like to know the
354
00:34:22,129 --> 00:34:29,129
significance of the parameters. So
that
355
00:34:30,840 --> 00:34:36,850
means, if your objective is to check the reliability
of a parameter, the idea is very
356
00:34:36,850 --> 00:34:43,060
clear: we like to know what is the significance
level of the parameter. If that item is statistically
357
00:34:43,060 --> 00:34:48,330
significant at close to 1 percent, then
the model accuracy is very high. If
358
00:34:48,330 --> 00:34:53,610
that item is not statistically significant
at 1 percent in the one tailed test or two tailed
359
00:34:53,610 --> 00:34:57,480
test, if it totally fails, then you have
to go again for 5 percent. Ok.
360
00:34:57,480 --> 00:35:02,530
So, now at 5 percent, again you have to go for
the one tailed test or two tailed test. If it
361
00:35:02,530 --> 00:35:07,020
is not significant, then you have to go to the
10 percent level. Again at the 10 percent level,
362
00:35:07,020 --> 00:35:11,740
you have to check it through the
one tailed test and the two tailed
363
00:35:11,740 --> 00:35:17,510
test. If, after that 10 percent
experiment, it is still not statistically
364
00:35:17,510 --> 00:35:23,010
significant, then we can conclude that this
particular item or variable is not
365
00:35:23,010 --> 00:35:27,760
at all statistically significant. Yes, there
is a relationship, but that relationship is
366
00:35:27,760 --> 00:35:34,760
not strong enough to use for forecasting.
So, that is how the reliability rule
367
00:35:35,310 --> 00:35:38,510
works.
So, now what is the significance of parameters?
368
00:35:38,510 --> 00:35:44,920
For the significance of parameters, we have
two aspects, that is, the alpha
369
00:35:44,920 --> 00:35:50,250
aspect and the beta aspect. So, we like to know
what is the significance of alpha hat and
370
00:35:50,250 --> 00:35:56,410
beta hat. For that, we have to know
two things. Alpha hat is known to us, so
371
00:35:56,410 --> 00:36:03,410
we like to know t of alpha hat:
t of alpha hat, the tabulated
372
00:36:05,460 --> 00:36:12,460
value, and t of alpha hat, the estimated
value, and the same for t of beta hat.
373
00:36:14,220 --> 00:36:20,090
Actually, this is not called estimated. In fact,
we call it the calculated value: the calculated
374
00:36:20,090 --> 00:36:26,020
t statistic and the tabulated t statistic.
So, now, we have to compare the calculated
375
00:36:26,020 --> 00:36:31,770
t of alpha hat and the tabulated t of alpha hat. Similarly,
we
376
00:36:31,770 --> 00:36:36,560
have to compare the calculated t of beta hat
and the tabulated t of beta hat, all right.
377
00:36:36,560 --> 00:36:42,150
So, now what is happening here is that
this t of alpha hat depends upon two components.
378
00:36:42,150 --> 00:36:48,590
t of alpha hat depends upon the individual
alpha hat value and the
379
00:36:48,590 --> 00:36:55,560
standard error of alpha hat. Similarly,
for beta hat we have two aspects,
380
00:36:55,560 --> 00:37:02,560
that is, the tabulated t statistic
and the calculated t statistic. The calculated
381
00:37:03,350 --> 00:37:08,330
t statistic again depends upon beta hat and
the standard error of beta hat. Ok.
382
00:37:08,330 --> 00:37:14,070
So, this is one part of the story,
that is what we call the
383
00:37:14,070 --> 00:37:18,850
reliability of parameters. So, what is
the exact structure of the reliability of the
384
00:37:18,850 --> 00:37:25,810
parameters? It means we
must have the alpha hat value and beta hat value,
385
00:37:25,810 --> 00:37:31,560
we must have the variance of alpha hat and
the variance of beta hat, and then we have to get
386
00:37:31,560 --> 00:37:36,430
the standard error of alpha hat and the standard
error of beta hat, because the square root
387
00:37:36,430 --> 00:37:40,880
of the variance of alpha hat is nothing but the standard
error of alpha hat, and the square root of the
388
00:37:40,880 --> 00:37:43,420
variance of beta hat is nothing but the standard
error of beta hat.
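The chain described above (estimate, variance, standard error, calculated t) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the variance formulas are the usual textbook OLS ones for the bivariate model (they are not spelled out in this part of the lecture), the data are hypothetical, and the function name `ols_reliability` is invented for illustration.

```python
import math

def ols_reliability(X, Y):
    """Bivariate OLS estimates with their variances, standard errors and t ratios."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sum_x2 = sum((xi - x_bar) ** 2 for xi in X)  # squared deviations of X
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    beta_hat = sum_xy / sum_x2
    alpha_hat = y_bar - beta_hat * x_bar
    # Residual variance with n - 2 degrees of freedom (two estimated parameters).
    rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(X, Y))
    sigma2_hat = rss / (n - 2)
    # Assumed textbook formulas for the variances of the two estimators.
    var_beta = sigma2_hat / sum_x2
    var_alpha = sigma2_hat * sum(xi ** 2 for xi in X) / (n * sum_x2)
    # Standard error is the square root of the variance.
    se_alpha, se_beta = math.sqrt(var_alpha), math.sqrt(var_beta)
    return {
        "alpha_hat": alpha_hat, "beta_hat": beta_hat,
        "var_alpha": var_alpha, "var_beta": var_beta,
        "se_alpha": se_alpha, "se_beta": se_beta,
        "t_alpha": alpha_hat / se_alpha, "t_beta": beta_hat / se_beta,
    }

# Hypothetical data, not taken from the lecture.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
result = ols_reliability(X, Y)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The calculated t values returned here are the ones that would be compared with the tabulated t at n minus 2 degrees of freedom.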
389
00:37:43,420 --> 00:37:50,420
So now that means,
we must have alpha hat and the variance of alpha
390
00:37:51,700 --> 00:37:57,670
hat, and beta hat and the variance of beta hat,
then everything will be, not
391
00:37:57,670 --> 00:38:02,130
automatic, but very smooth to
get this work done. All right.
392
00:38:02,130 --> 00:38:08,490
So, now the second part of reliability is called
the reliability of the overall fitness
393
00:38:08,490 --> 00:38:13,380
of the model. So, this is what is
otherwise called
394
00:38:13,380 --> 00:38:20,380
the significance of the overall fitness
of the model,
395
00:38:21,250 --> 00:38:27,230
the fitness of the model.
So, the overall fitness of the model for a particular
396
00:38:27,230 --> 00:38:34,230
econometric technique solely depends
upon the statistic called R square.
397
00:38:34,600 --> 00:38:40,130
It usually depends
upon R square, and this R square is known as
398
00:38:40,130 --> 00:38:47,130
the coefficient of determination. We have
discussed the r component, that is small r, that
399
00:38:48,390 --> 00:38:54,460
is what we call the simple correlation:
the covariance of X and Y upon sigma X times sigma
400
00:38:54,460 --> 00:38:59,400
Y, that is, with respect to two variables. In
fact, the starting point of correlation is
401
00:38:59,400 --> 00:39:06,400
two variables. So, now, we must have
two variables, then
402
00:39:09,580 --> 00:39:15,690
we can calculate the correlation, and the correlation
coefficient is simply represented as small
403
00:39:15,690 --> 00:39:19,520
r. If we square it, then it is the
square of the correlation coefficient.
404
00:39:19,520 --> 00:39:24,850
But for bivariate models, this particular
small r and capital R are the same in magnitude. When
405
00:39:24,850 --> 00:39:30,110
we square it, r square, that is what
we call the
406
00:39:30,110 --> 00:39:34,710
small r square. But between small r square and capital R square
in general there is a difference. So, capital R square is called
407
00:39:34,710 --> 00:39:41,710
the coefficient of determination. This particular
item is called the coefficient of determination.
408
00:39:47,550 --> 00:39:51,160
That is the coefficient
of determination.
409
00:39:51,160 --> 00:39:54,070
What is the rule of the coefficient of determination?
The rule of the coefficient of determination is that
410
00:39:54,070 --> 00:40:01,070
it is the ratio of explained
to total. So that means, what percentage
411
00:40:02,020 --> 00:40:08,240
of the variation in the
dependent variable is explained. So, that is what we
412
00:40:08,240 --> 00:40:11,760
call R square, that is, the overall fitness
of the model.
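The explained-to-total ratio can be illustrated with a short Python sketch. The data are hypothetical and the function name `fit_summary` is invented for this example. It also checks numerically that, in the bivariate case, the square of the simple correlation r coincides with R square.

```python
import math

def fit_summary(X, Y):
    """Total, explained and residual sums of squares, R square and simple r."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    sum_x2 = sum((xi - x_bar) ** 2 for xi in X)
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    beta_hat = sum_xy / sum_x2
    alpha_hat = y_bar - beta_hat * x_bar
    fitted = [alpha_hat + beta_hat * xi for xi in X]
    tss = sum((yi - y_bar) ** 2 for yi in Y)                # total variation in Y
    ess = sum((fi - y_bar) ** 2 for fi in fitted)           # explained by the model
    rss = sum((yi - fi) ** 2 for yi, fi in zip(Y, fitted))  # left unexplained
    r = sum_xy / math.sqrt(sum_x2 * tss)                    # simple correlation
    return {"tss": tss, "ess": ess, "rss": rss, "r_square": ess / tss, "r": r}

# Hypothetical data, not taken from the lecture.
out = fit_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(out)
```

With these numbers, 60 percent of the variation in Y is explained by the fitted line, and the explained and residual parts add up to the total.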
413
00:40:11,760 --> 00:40:17,970
So, there are two
steps here. First, you have to calculate
414
00:40:17,970 --> 00:40:22,690
the R square statistic. Then, once we
get the R square value, that R square
415
00:40:22,690 --> 00:40:27,800
value has to be tested again, because here
we have Y hat equal to alpha hat
416
00:40:27,800 --> 00:40:32,070
plus beta hat X only. So that means, alpha
hat is known to us and beta hat is known to
417
00:40:32,070 --> 00:40:39,010
us, and nothing else is available, but lots of other
statistics we also have to calculate simultaneously.
418
00:40:39,010 --> 00:40:43,090
So, as far as reliability is concerned, one
part of this particular item is called
419
00:40:43,090 --> 00:40:48,380
R square. R square is the ratio of the
explained sum of squares to the total sum of squares. We
420
00:40:48,380 --> 00:40:51,250
will derive the details in the next class, not
today.
421
00:40:51,250 --> 00:40:57,510
So, this R square structure is like
this: it is the ratio of the explained
422
00:40:57,510 --> 00:41:03,950
sum of squares to the total sum of squares.
So, as far as the reliability of R square
423
00:41:03,950 --> 00:41:10,950
is concerned, it depends
upon the F statistic.
424
00:41:11,160 --> 00:41:18,160
So that means, for the significance of the parameters
we are using the t statistics, and for the overall
425
00:41:18,540 --> 00:41:25,540
fitness of the model we are using the F statistic.
So, now, like in the earlier case, here also
426
00:41:26,510 --> 00:41:32,110
you like to know what is the calculated
F and what is the tabulated F.
427
00:41:32,110 --> 00:41:39,020
So, now, we have to compare
the calculated F with the tabulated F. So, it means,
428
00:41:39,020 --> 00:41:45,230
what exactly is the difference between calculated
F and tabulated F, and similarly between calculated
429
00:41:45,230 --> 00:41:50,030
t of alpha hat and tabulated t of alpha hat, and calculated
t of beta hat and tabulated t of beta hat.
430
00:41:50,030 --> 00:41:55,670
So, now the thing is that, for the calculated value
there is a standard technique, or you
431
00:41:55,670 --> 00:42:02,670
can say formula, through which you can obtain
this value, but for the tabulated value
432
00:42:02,780 --> 00:42:07,860
we have statistical tables. Ok.
So, the statistical table gives the standard norms.
433
00:42:07,860 --> 00:42:13,920
So, now, we have a calculated statistic and
we have a standard norm. So, we have to compare
434
00:42:13,920 --> 00:42:19,080
the calculated statistic with the standard
norm. If the calculated statistic is greater
435
00:42:19,080 --> 00:42:24,140
than the standard norm in absolute value,
then that item is statistically significant.
436
00:42:24,140 --> 00:42:30,310
But the standard norms are available, you
know, at different significance levels. For
437
00:42:30,310 --> 00:42:34,860
instance, at the 1 percent level, at the 5 percent
level, at the 10 percent level, and again for one
438
00:42:34,860 --> 00:42:39,290
tailed and two tailed tests
at each level,
439
00:42:39,290 --> 00:42:43,480
that means, with respect
to 1 percent, 5 percent and 10 percent.
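The comparison of a calculated statistic with the tabulated norm at 1, 5 and 10 percent can be sketched as follows. The critical values are the two tailed t table entries for 8 degrees of freedom, an arbitrary choice made only for this illustration, and the function name `strictest_significance_level` is hypothetical.

```python
# Two tailed critical t values for 8 degrees of freedom, as printed in a
# standard t table (the choice of 8 degrees of freedom is an assumption).
T_TABLE_DF8 = {0.01: 3.355, 0.05: 2.306, 0.10: 1.860}

def strictest_significance_level(t_calculated, table=T_TABLE_DF8):
    """Return the smallest level at which |t| exceeds the tabulated value, else None."""
    for level in sorted(table):  # try 1 percent first, then 5, then 10
        if abs(t_calculated) > table[level]:
            return level
    return None                  # not significant even at 10 percent

for t in (3.5, 2.5, 2.0, 1.0):
    print(t, strictest_significance_level(t))
```

A calculated t of 2.5 fails at 1 percent but passes at 5 percent, while a calculated t of 1.0 is not significant at any of the three levels.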
440
00:42:43,480 --> 00:42:49,420
But in some of the standard books there is, you
know, a level of 25 percent; they also go
441
00:42:49,420 --> 00:42:56,420
up to 25 percent. But, you know,
for statistics and finance
442
00:42:56,870 --> 00:43:03,870
problems or any other technical problems,
we usually handle up to the 10 percent
443
00:43:04,020 --> 00:43:08,890
level only.
But for social science problems, which are very
444
00:43:08,890 --> 00:43:14,410
complicated, and when your sample size is
extremely high, then at times you
445
00:43:14,410 --> 00:43:19,960
can go, but only in a very extreme case,
for 25 percent. But 25 percent is not
446
00:43:19,960 --> 00:43:26,620
at all, you know, very reliable. So, in
the most effective way, you have to go up to 10
447
00:43:26,620 --> 00:43:31,570
percent. So, if it is not significant up to 10 percent,
then you have to reject or redesign the structure
448
00:43:31,570 --> 00:43:36,760
entirely, all right.
So, now for t statistics, you have t tables
449
00:43:36,760 --> 00:43:43,200
and for F statistics you have F tables. So,
again for the t table value and the F
450
00:43:43,200 --> 00:43:48,970
table value, there is another item you
need to bring in to get the tabulated figure,
451
00:43:48,970 --> 00:43:54,050
that is what we call the degrees of freedom.
Degrees of freedom means
452
00:43:54,050 --> 00:44:00,460
the difference between the total number
of observations and the number of, you know, independent
453
00:44:00,460 --> 00:44:04,600
variables present in the system.
So, we will discuss it in detail later;
454
00:44:04,600 --> 00:44:11,340
that is what we sometimes call
n minus k, where n is the total sample size
455
00:44:11,340 --> 00:44:16,980
and k is the total number
of independent variables in the
456
00:44:16,980 --> 00:44:23,980
system, or rather the total number
of parameters in the system. So, instead of
457
00:44:24,070 --> 00:44:29,950
variables, you can say that k represents
the total number of parameters involved
458
00:44:29,950 --> 00:44:34,050
in this particular model.
So, now this n minus k represents the degrees
459
00:44:34,050 --> 00:44:40,210
of freedom. If the degrees of freedom are
very high, then the model accuracy will be
460
00:44:40,210 --> 00:44:47,210
very high. But, you know, this is true for
the t statistic; for the F statistic we have
461
00:44:47,430 --> 00:44:52,550
two different ones. In fact, for this F statistic
we have two different components: one is called
462
00:44:52,550 --> 00:44:58,750
the explained sum of squares and another is called
the total sum of squares. So that means, there
463
00:44:58,750 --> 00:45:05,750
are three standard items: one is called
TSS, then ESS, and then what is sometimes called
464
00:45:06,140 --> 00:45:12,720
RSS, or sometimes it is called
the unexplained sum of squares or residual sum of
465
00:45:12,720 --> 00:45:16,560
squares, ok.
So, TSS represents the total sum of squares, which
466
00:45:16,560 --> 00:45:22,130
we will actually derive
from this structure, Y equal to alpha plus
467
00:45:22,130 --> 00:45:29,130
beta X plus U. So, alpha plus beta X is
nothing but the explained sum of squares, U relates
468
00:45:29,490 --> 00:45:36,490
to the unexplained or residual sum of squares, and the total relates
to Y. So, this is all about how they fit together.
469
00:45:36,510 --> 00:45:42,450
But the calculating procedure is something
different with respect to the total sum of squares,
470
00:45:42,450 --> 00:45:49,450
explained sum of squares and residual sum of squares.
So now, when we have the information,
471
00:45:49,500 --> 00:45:56,230
that is, we have Y information and X information
through which you fit the model and get the
472
00:45:56,230 --> 00:46:03,230
model, then,
typically in a smaller version
473
00:46:03,650 --> 00:46:07,850
of the problem, we will usually not derive everything
every time.
474
00:46:07,850 --> 00:46:13,620
So, what we have to do: we have Y information
and we have X information. So, we know what
475
00:46:13,620 --> 00:46:19,390
the alpha hat formula is and we know what the beta
hat formula is; similarly, we also have formulas
476
00:46:19,390 --> 00:46:23,640
for the variance of alpha hat and the variance of
beta hat.
477
00:46:23,640 --> 00:46:29,080
So, again, for the t statistic
there is a calculated statistic, that is, how to get
478
00:46:29,080 --> 00:46:35,850
this t value. So, now, if we go sequentially with
respect to the available information, then
479
00:46:35,850 --> 00:46:41,450
you can very easily
get this particular value, that is,
480
00:46:41,450 --> 00:46:46,990
t of alpha hat calculated and t of beta hat
calculated, similarly in the case of.
481
00:46:46,990 --> 00:46:53,380
So now, once we have Y information and X information
for this particular bivariate setup, what
482
00:46:53,380 --> 00:47:00,380
we have to do in the first step itself is
calculate sum Y, sum X, sum X square,
483
00:47:01,840 --> 00:47:08,840
sum Y square and sum X Y.
So, now at the same time, we must identify
484
00:47:09,480 --> 00:47:15,200
the sample size, that is, n,
and at the same time what k is. In fact,
485
00:47:15,200 --> 00:47:21,940
for this bivariate model, k is
2. So that means, the structure is n
486
00:47:21,940 --> 00:47:28,460
minus 2: the degrees of freedom are n minus 2, where
k represents the number of parameters in the system,
487
00:47:28,460 --> 00:47:33,740
the number of parameters in the system, or the number
of variables in this particular system, all
488
00:47:33,740 --> 00:47:37,390
right.
So, now this will determine the degrees of
489
00:47:37,390 --> 00:47:42,820
freedom. So, now, once you have such
information, then you first calculate the
490
00:47:42,820 --> 00:47:48,580
alpha hat statistic, then
the beta hat statistic. Actually, you first start with
491
00:47:48,580 --> 00:47:55,160
beta hat, that is, summation x
y by summation x square, which is
492
00:47:55,160 --> 00:47:59,210
in deviation form. If we go the other
way round, then it is n summation X Y
493
00:47:59,210 --> 00:48:04,390
minus sum X sum Y, divided by n summation X
square minus sum X whole square. This is the
494
00:48:04,390 --> 00:48:08,550
standard formula.
So, whatever the setup of this particular problem,
495
00:48:08,550 --> 00:48:13,650
you know, whether this setup of the problem
is perfect or not, that is what we are checking
496
00:48:13,650 --> 00:48:18,590
here. So, there is no
hard and fast rule that we have
497
00:48:18,590 --> 00:48:23,700
to check the functional form first or that you
have to derive it, etcetera. You can do it first,
498
00:48:23,700 --> 00:48:28,640
and the best procedure is to check it properly,
then fit the model accordingly; then obviously
499
00:48:28,640 --> 00:48:34,119
the reliability is very consistent. Or else, what
you can do is start with something, and then
500
00:48:34,119 --> 00:48:38,320
automatically, at the point of reliability,
it will give you an indication whether this
501
00:48:38,320 --> 00:48:41,650
model is suitable for forecasting or not,
ok.
502
00:48:41,650 --> 00:48:48,650
So, now let us assume that
we have no knowledge about all this
503
00:48:48,750 --> 00:48:54,300
testing etcetera. So, we start by the standard
rules: you get alpha hat and you get
504
00:48:54,300 --> 00:48:58,790
beta hat, then you fit the model as
Y hat equal to alpha hat plus beta hat X.
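The first step, computing the sums and then alpha hat and beta hat, can be sketched in Python as below. The data are hypothetical. The sketch computes beta hat both from the raw sums and in deviation form, to show that the two formulas mentioned in the lecture agree.

```python
# Hypothetical data, not taken from the lecture.
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
n = len(X)

# Raw sums computed in the first step.
sum_x = sum(X)
sum_y = sum(Y)
sum_x_sq = sum(xi ** 2 for xi in X)
sum_xy = sum(xi * yi for xi, yi in zip(X, Y))

# Raw-sum form: (n * sum XY - sum X * sum Y) / (n * sum X^2 - (sum X)^2).
beta_raw = (n * sum_xy - sum_x * sum_y) / (n * sum_x_sq - sum_x ** 2)

# Deviation form: sum xy / sum x^2, with x = X - X bar and y = Y - Y bar.
x_bar, y_bar = sum_x / n, sum_y / n
beta_dev = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
            / sum((xi - x_bar) ** 2 for xi in X))

# Intercept: alpha hat = Y bar - beta hat * X bar.
alpha_hat = y_bar - beta_raw * x_bar

print(beta_raw, beta_dev, alpha_hat)
```

Both forms give the same slope, so whichever is more convenient for the available sums can be used.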
505
00:48:58,790 --> 00:49:03,420
So, now you calculate the alpha hat statistic
and the beta hat statistic. Then you go by the standard
506
00:49:03,420 --> 00:49:08,080
procedure: get the variance of alpha hat,
get the variance of beta hat, then get the
507
00:49:08,080 --> 00:49:11,930
standard error of alpha hat and the standard error
of beta hat, then you calculate the t of alpha
508
00:49:11,930 --> 00:49:16,580
hat and calculate the t of beta hat. Then you
compare the calculated t of
509
00:49:16,580 --> 00:49:21,260
alpha hat and the calculated t of beta hat with the tabulated values.
So, the tabulated value is
510
00:49:21,260 --> 00:49:26,460
available with you and, obviously, you have
the calculated value. So, then we have to make
511
00:49:26,460 --> 00:49:31,030
a comparative analysis. In fact, if you go to any
statistics book or any econometrics book,
512
00:49:31,030 --> 00:49:36,420
at the end of the book you will find the standard
statistical tables, you know, with respect to
513
00:49:36,420 --> 00:49:41,300
the t statistic, with respect to the F statistic and
with respect to the j statistic, etcetera.
514
00:49:41,300 --> 00:49:48,119
But, you know, we need only the t tables and
the F tables. The t statistic is easy
515
00:49:48,119 --> 00:49:53,940
to understand and easy to pick up, but the F statistic
is a little bit complicated. Why? Because, when
516
00:49:53,940 --> 00:49:58,770
we get R square, it is the ratio of ESS to TSS,
but in this particular setup F is the
517
00:49:58,770 --> 00:50:04,080
ratio of ESS to RSS, each divided by its own degrees of freedom.
So, now ESS is a standard
518
00:50:04,080 --> 00:50:09,220
statistic and RSS is also a standard statistic.
But ESS has
519
00:50:09,220 --> 00:50:14,840
its own degrees of freedom, which are completely different,
and the RSS degrees of freedom are completely different.
520
00:50:14,840 --> 00:50:20,390
You see here, this is the ESS part.
So, this part has its degrees of freedom, and this
521
00:50:20,390 --> 00:50:24,520
part has its degrees of freedom,
because it depends
522
00:50:24,520 --> 00:50:29,030
upon two parts, but this is only the one part,
ok.
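To make the role of the two sets of degrees of freedom concrete, here is a small Python sketch of the usual textbook F ratio, where the explained sum of squares is divided by k minus 1 and the residual sum of squares by n minus k. The input numbers are hypothetical illustrative values and the function name `f_statistic` is invented for this example.

```python
def f_statistic(ess, rss, n, k):
    """F ratio for overall fit, with its numerator and denominator degrees of freedom."""
    df_explained = k - 1  # number of slope parameters
    df_residual = n - k   # observations minus estimated parameters
    f_value = (ess / df_explained) / (rss / df_residual)
    return f_value, (df_explained, df_residual)

# Hypothetical sums of squares for a bivariate model (k = 2) with n = 5.
f_value, degrees = f_statistic(ess=3.6, rss=2.4, n=5, k=2)
print(f_value, degrees)
```

The pair of degrees of freedom returned here is what is needed to pick the right row and column of the F table. In the bivariate case this F value equals the square of the calculated t of beta hat.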
523
00:50:29,030 --> 00:50:33,600
So now,so, since the degrees of freedom are
are different, so obviously, in the case of
524
00:50:33,600 --> 00:50:40,520
ess, the degrees of freedom will be completely
different and you know below tss the degrees
525
00:50:40,520 --> 00:50:47,410
of freedom is also completely different.
So, as a result,so in the f case, for f statistic,
526
00:50:47,410 --> 00:50:52,119
the table is something different. So, that
is why you must be you must have thorough
527
00:50:52,119 --> 00:50:56,780
knowledge how to pick up the tabulated figure.
So, once you get the tabulated figure, then
528
00:50:56,780 --> 00:51:02,430
you have a calculated figure, you make a comparative
analysis. If the calculated statistic is higher
529
00:51:02,430 --> 00:51:07,990
enough than the tabulated statistic, then
the item is statistical significant. Usually
530
00:51:07,990 --> 00:51:13,680
you start at 1 percent, because statistical
significance at 1 percent is the best. If the item
531
00:51:13,680 --> 00:51:19,280
is not significant at 1 percent, then you go to 5 percent,
the second best. If it is not significant at 5 percent, then
532
00:51:19,280 --> 00:51:24,270
you go to the third best, that is 10 percent.
After that, if the item is not statistically
533
00:51:24,270 --> 00:51:29,490
significant even at 10 percent, then it is better
to stop there. Stop means you have
534
00:51:29,490 --> 00:51:35,610
to either redesign or, you can say, restructure the model,
and then go again for the reliability
535
00:51:35,610 --> 00:51:41,860
test, till you get the items significant.
But for the bivariate structure, it is very easy
536
00:51:41,860 --> 00:51:46,540
to pick up, and you can proceed. But when you
go for multivariate, the problem
537
00:51:46,540 --> 00:51:51,160
is very complicated. It is very difficult
to get all the variables statistically significant
538
00:51:51,160 --> 00:51:56,700
together with the overall fit. Obviously, when we
introduce one variable after another, or when
539
00:51:56,700 --> 00:52:02,150
we move from bivariate to multivariate,
the explained sum of squares is taken
540
00:52:02,150 --> 00:52:05,880
with respect to all the X items, and the total sum of squares is with
respect to Y only.
541
00:52:05,880 --> 00:52:11,790
So, obviously, Y is always one variable, so that item, the TSS,
will remain constant. So, when we
542
00:52:11,790 --> 00:52:16,380
add one item after another, the numerator, the ESS, will
start increasing. As a result, R square
543
00:52:16,380 --> 00:52:20,390
will keep rising (it never falls) as you introduce one
variable after another. So, when R square
544
00:52:20,390 --> 00:52:25,750
is very high, your F statistic
will also be very high.
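The point about R square can be checked on a small example. The sketch below, in plain Python, fits Y on one regressor and then on two, using the usual deviation-from-mean OLS formulas, and compares the two R square values. The data and the function names are made up for illustration. F also depends on the degrees of freedom, so it is printed separately rather than assumed to rise in step with R square.

```python
def centered(v):
    """Return the values of v as deviations from their mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def r2_one(y, x1):
    """R square of Y on one regressor (bivariate model)."""
    y, x1 = centered(y), centered(x1)
    beta = dot(x1, y) / dot(x1, x1)
    ess = beta * dot(x1, y)              # explained sum of squares
    return ess / dot(y, y)               # ESS / TSS


def r2_two(y, x1, x2):
    """R square of Y on two regressors, via the 2x2 normal equations."""
    y, x1, x2 = centered(y), centered(x1), centered(x2)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    ess = b1 * s1y + b2 * s2y
    return ess / dot(y, y)


def f_from_r2(r2, n, k):
    """F written through R square, with k and n - k - 1 degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical data, 8 observations
y = [2, 4, 5, 4, 5, 7, 8, 9]
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]

r2_a = r2_one(y, x1)
r2_b = r2_two(y, x1, x2)
print(r2_a, r2_b)                        # the second is never smaller
print(f_from_r2(r2_a, 8, 1), f_from_r2(r2_b, 8, 2))
```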
545
00:52:25,750 --> 00:52:32,080
So, the overall fit will usually be very
high, or you can say it will come out significant.
546
00:52:32,080 --> 00:52:36,440
But the individual parameters may not come out significant.
So, you have to make a compromise. The
547
00:52:36,440 --> 00:52:40,410
compromise rule we will discuss in detail when
we go to the multivariate framework.
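The testing sequence described in this lecture (1 percent first, then 5 percent, then 10 percent, otherwise redesign the model) can be sketched in plain Python as follows. The function name is hypothetical. The tabulated figures are the usual F table values for 1 and 8 degrees of freedom, which is the bivariate case with 10 observations. For other sample sizes the values must be read from the table again.

```python
def best_significance_level(f_calculated, f_tabulated):
    """Return the strictest level (in percent) at which F is significant.

    f_tabulated maps a level in percent to the tabulated F value.
    Returns None when F is not significant even at the loosest level,
    which in the lecture means the model has to be redesigned.
    """
    for level in sorted(f_tabulated):        # 1, then 5, then 10
        if f_calculated > f_tabulated[level]:
            return level
    return None


# Tabulated F values for (1, 8) degrees of freedom
f_table_1_8 = {1: 11.26, 5: 5.32, 10: 3.46}

print(best_significance_level(32.0, f_table_1_8))   # 1
print(best_significance_level(6.0, f_table_1_8))    # 5
print(best_significance_level(4.0, f_table_1_8))    # 10
print(best_significance_level(2.0, f_table_1_8))    # None
```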
548
00:52:40,410 --> 00:52:44,920
So, with this, we will stop here today, and
we will discuss the details in the next class.
549
00:52:44,920 --> 00:52:45,170
Thank you very much. Have a nice day.