1
00:00:18,000 --> 00:00:24,650
Good afternoon. Welcome to the NPTEL project on
econometric modeling. Today, we will discuss
2
00:00:24,650 --> 00:00:31,119
the concept of bivariate econometric modelling.
It is a statistical tool that deals with
3
00:00:31,119 --> 00:00:38,119
the relationship between two variables only.
In the last class, we discussed the entire
4
00:00:42,640 --> 00:00:49,640
structure of data analysis, that is, univariate
modeling, bivariate modelling and multivariate
5
00:00:51,850 --> 00:00:58,579
modeling. Particularly in the last class,
we discussed the entire structure of
6
00:00:58,579 --> 00:01:05,579
univariate modeling, that is, with respect
to central tendency, dispersion, skewness
7
00:01:07,770 --> 00:01:14,770
and kurtosis.
The main objective behind univariate modelling
8
00:01:15,810 --> 00:01:22,810
is to describe the features of
a particular variable, that is, with respect
8
00:01:26,900 --> 00:01:33,900
to its average, median, frequency distribution
and its variability within the system. However,
10
00:01:43,080 --> 00:01:50,080
in the real world, many variables are interrelated
with other variables. We cannot generalize
11
00:01:57,070 --> 00:02:04,070
a particular problem, or even discuss
a particular problem, with respect to a single
12
00:02:04,670 --> 00:02:11,670
variable. Univariate analysis is very essential;
in fact, it is a necessary condition
13
00:02:15,650 --> 00:02:22,650
for bivariate
modelling and multivariate modeling. So, we
14
00:02:25,150 --> 00:02:32,150
are here to learn the details of bivariate modelling.
It is a game between two variables at a
15
00:02:35,629 --> 00:02:42,629
time. So, let me explain what exactly
the concept of bivariate econometric modelling is.
16
00:02:45,660 --> 00:02:52,660
Bivariate econometric modelling, which I will
call BEM here,
17
00:02:57,099 --> 00:03:04,099
basically
deals with two problems: association and causality.
18
00:03:18,140 --> 00:03:25,140
So, we have two variables in a system.
In bivariate econometric modelling, our objective
19
00:03:31,959 --> 00:03:38,959
is to know the association between two variables;
and the second objective is to know the cause
20
00:03:39,819 --> 00:03:46,819
and effect relationship between the two
variables. In some problems, you may
21
00:03:50,140 --> 00:03:57,140
not need to know the cause and effect relationship.
In other cases, you may not be very
22
00:04:01,549 --> 00:04:08,549
much interested in the association between
two variables. There is a very interesting history
23
00:04:08,580 --> 00:04:15,580
behind the movement from univariate to bivariate
and bivariate to multivariate. In this particular
24
00:04:15,980 --> 00:04:22,980
bivariate framework, again there is a history.
It is the movement from
25
00:04:27,430 --> 00:04:34,430
association to causality. Both are somewhat
similar in nature, but causality is a
26
00:04:39,240 --> 00:04:46,240
stronger and more informative concept than association,
because causality is the more general concept
27
00:04:49,699 --> 00:04:55,770
and association is a part of causality.
Now, in this particular structure of
28
00:04:55,770 --> 00:04:59,900
bivariate econometric modelling, we have two
basic objectives: association and causality.
29
00:04:59,900 --> 00:05:06,900
So, we have altogether three techniques.
For association, we have two different
30
00:05:08,280 --> 00:05:15,280
techniques, called covariance and
correlation. And, in the case of causality,
31
00:05:20,960 --> 00:05:27,960
we have a technique called regression.
So, in the bivariate framework, we have two
32
00:05:33,110 --> 00:05:40,110
different games: one is association between
two variables and another is cause and effect
33
00:05:41,750 --> 00:05:48,750
relationship between two variables. So, as far as
association is concerned, we can apply covariance
34
00:05:49,770 --> 00:05:56,770
and we can apply correlation. Between the
two, correlation is a better and more advanced
35
00:06:00,050 --> 00:06:05,190
technique than covariance. We will discuss
how it is more advanced and why it is better
36
00:06:05,190 --> 00:06:12,190
than covariance. However, the origin of bivariate
econometric modelling is covariance. Correlation
37
00:06:12,190 --> 00:06:19,190
is an extension of covariance. Similarly,
regression is also an extension of correlation.
38
00:06:23,080 --> 00:06:29,800
So, the movement is from covariance to correlation,
then correlation to regression.
39
00:06:29,800 --> 00:06:36,800
We will first discuss what covariance
is all about; then, we will see what
40
00:06:36,949 --> 00:06:43,949
correlation is; then, we will proceed to
regression. The moment you enter
41
00:06:47,000 --> 00:06:53,800
regression, you reach the root point of
real econometric modelling. Whatever components
42
00:06:53,800 --> 00:07:00,800
we are discussing now are just supporting
components of econometric modelling, and they
43
00:07:02,539 --> 00:07:09,539
are very essential. Unless you know the
conceptual structure of univariate modelling,
44
00:07:09,800 --> 00:07:16,800
bivariate modelling, and their limitations and
advantages, you cannot proceed further to econometric
45
00:07:19,440 --> 00:07:26,440
modelling in a multivariate framework.
Now, we will start with the issue of the bivariate
46
00:07:32,240 --> 00:07:38,080
game, that is, covariance analysis. So, what
is covariance analysis all about? Let
47
00:07:38,080 --> 00:07:45,080
me explain covariance analysis. Let us
take a case: here are two variables, say X1,
48
00:07:46,930 --> 00:07:53,930
which contains the observations X11, X12,
up to X1n. And, the other variable is
49
00:07:55,389 --> 00:08:02,389
X2, with observations X21, X22 up to X2n. So, now,
the moment you say bivariate econometric
50
00:08:04,440 --> 00:08:11,440
modelling, the boundary must be between
two variables. That is the first condition. And,
51
00:08:15,819 --> 00:08:22,819
the second condition is that, since we want to
trace the association between two variables,
52
00:08:25,190 --> 00:08:32,190
the sample
information must be uniform, that is, the same for both.
53
00:08:33,819 --> 00:08:40,819
For instance, if X1 contains n
sample points, then X2 must also have n
54
00:08:41,390 --> 00:08:48,390
points. If X1 has n minus 1 points and X2 has
n, or vice versa, then the system is inconsistent.
55
00:08:54,540 --> 00:09:01,540
To apply the covariance, correlation
or regression technique, the first
56
00:09:02,740 --> 00:09:09,740
requirement is that you must have a uniform
sample, so the number of observations for
57
00:09:09,839 --> 00:09:16,839
both variables should be the same.
If the numbers of observations differ,
58
00:09:18,839 --> 00:09:25,839
the system itself is inconsistent, and
with an inconsistent system, you cannot apply
59
00:09:26,089 --> 00:09:33,089
any of these techniques: neither covariance,
nor correlation, nor regression.
60
00:09:36,430 --> 00:09:43,430
Now, the starting point here is the classification
of variables. In the last class, I discussed
61
00:09:50,550 --> 00:09:57,410
variable classification in the context of
multivariate modelling and the entire
62
00:09:57,410 --> 00:10:04,410
structure of data analysis. When the system
is univariate, the classification of
63
00:10:07,510 --> 00:10:14,510
variables does not matter at all, because there
is only one variable; there is nothing
64
00:10:16,149 --> 00:10:23,149
to classify. The moment you enter
bivariate econometric modelling, you
65
00:10:25,829 --> 00:10:32,829
face the problem of classifying the variables.
In a bivariate framework, classification of
66
00:10:36,820 --> 00:10:43,820
variables is sometimes important and sometimes
not. It depends upon
67
00:10:45,140 --> 00:10:52,140
the technique which we use in the particular
process. If we use covariance
68
00:10:58,570 --> 00:11:05,570
or correlation, then the classification of
variables is not at all important. However,
69
00:11:11,140 --> 00:11:18,140
if you go for regression, then classification
of variables is very important; unless
70
00:11:18,800 --> 00:11:25,800
you classify the variables, you cannot
apply the regression technique.
71
00:11:29,450 --> 00:11:35,480
So, what is this classification? By classification,
here we mean classification into dependent
72
00:11:35,480 --> 00:11:42,350
and independent variables, which we
discussed earlier in detail.
73
00:11:42,350 --> 00:11:49,350
Here the issue is the classification of variables.
Now, in the case of covariance and correlation,
74
00:11:50,430 --> 00:11:57,430
we do not require any classification
of variables into dependent
75
00:12:03,899 --> 00:12:10,899
and independent. However, in the case of regression,
you need a classification of variables,
76
00:12:15,779 --> 00:12:22,779
that is, into a dependent structure
and an independent structure. Here, in the case
77
00:12:22,829 --> 00:12:29,829
of covariance, you need not describe anything
about the classification into dependent
78
00:12:29,870 --> 00:12:36,850
variables and independent variables. Before
we go into regression and all about this
79
00:12:36,850 --> 00:12:42,709
dependent structure and independent structure,
it is better we first understand the exact issue
80
00:12:42,709 --> 00:12:49,709
of covariance; then, we will move to correlation
and regression. So, the question is: what is covariance?
81
00:12:50,810 --> 00:12:57,620
For this particular system, X1 contains n
items and X2 contains n items; then, the covariance
82
00:12:57,620 --> 00:13:04,620
is written as Cov(X1, X2) = Σ (X1i - X̄1)(X2i - X̄2) / n,
where the summation runs over
83
00:13:15,690 --> 00:13:22,690
i = 1 to n. So, this
is the standard formula we apply to
84
00:13:27,470 --> 00:13:34,470
get the covariance. So, covariance is simply
the sum of the products of the X1 deviations and
85
00:13:40,089 --> 00:13:47,089
the X2 deviations, divided by the total number
of observations. Here n represents the total number
86
00:13:50,350 --> 00:13:57,350
of observations, X1 represents the
first variable, and X2 represents the
87
00:14:04,620 --> 00:14:11,620
second variable. Then, X̄1 (X1 bar) represents
the mean of X1. Similarly, X̄2 (X2 bar)
88
00:14:27,300 --> 00:14:34,300
represents the mean of the second variable, that
is, X2. Finally, i indexes the observations, running
89
00:14:39,930 --> 00:14:42,620
from 1 to n.
90
00:14:42,620 --> 00:14:49,620
Let me write it in a different way. Now, for
X1 and X2, the covariance is equal to the summation of
91
00:14:54,130 --> 00:15:01,130
(X1 minus X1 bar) into (X2 minus X2 bar),
for i equal to 1 to n, divided by n. So, now, I
92
00:15:05,480 --> 00:15:12,480
can write this structure simply as the summation of
x1 x2 divided by n, where small x1 and x2 are deviations from the means. And, for simplicity,
93
00:15:17,680 --> 00:15:24,680
you can write the two variables as X and Y. Then,
the covariance of X and Y is equal to summation xy divided
94
00:15:26,190 --> 00:15:33,190
by n, where n is the number of observations.
However, X and Y (that is, X1 and X2)
95
00:15:38,380 --> 00:15:44,970
may have n1
and n2 observations respectively. So,
96
00:15:44,970 --> 00:15:51,970
as I have already mentioned, for covariance,
the essential condition is that the number of sample observations
97
00:15:51,990 --> 00:15:58,990
must be the same. If it is the same, or uniform, then
you can proceed further; that means n1 must
98
00:16:00,829 --> 00:16:07,829
be equal to n2. If n1 is not equal to n2,
then the system itself is inconsistent. So,
99
00:16:13,680 --> 00:16:20,680
you cannot apply covariance if the
numbers of sample observations are different.
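The covariance definition and the equal-sample-size condition described above can be sketched in Python as follows. This is a minimal illustration, not the lecturer's own code: the function name `covariance` and the toy numbers are hypothetical. It uses the divisor n from the lecture's formula, whereas many libraries default to n - 1.

```python
def covariance(x, y):
    """Population covariance: sum of (x_i - x_bar)(y_i - y_bar) divided by n."""
    # The lecture's consistency condition: n1 must equal n2.
    if len(x) != len(y):
        raise ValueError(f"inconsistent system: n1={len(x)}, n2={len(y)}")
    n = len(x)
    if n == 0:
        raise ValueError("at least one observation is needed")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Multiply paired deviations, add them up, and divide by n (not n - 1).
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n


# Hypothetical toy data, used only to exercise the function.
print(covariance([1, 2, 3], [2, 4, 6]))
try:
    covariance([1, 2, 3], [2, 4])
except ValueError as err:
    print("rejected:", err)
```

Raising an error, rather than silently truncating the longer series, mirrors the lecture's point that an inconsistent system must be adjusted before any technique is applied.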
100
00:16:21,930 --> 00:16:28,930
So, if the sample observations differ in number,
you have to adjust the system first, so that
101
00:16:32,910 --> 00:16:39,910
both variables have the same number of observations.
Now, to illustrate all these things, we need
102
00:16:41,860 --> 00:16:48,860
an example. Let us take a case with two
variables, X and Y. X here is minus 10,
103
00:16:51,279 --> 00:16:58,279
minus 5, 0, 5 and 10, and Y is 5,
9, 7, 11, then 13. Now, you need to calculate
104
00:17:08,620 --> 00:17:15,620
the covariance. The covariance is simply the summation
of (X minus X bar) into (Y minus Y bar) divided
105
00:17:20,520 --> 00:17:27,520
by the total number of observations; that means,
we need to know the sum of X and the
106
00:17:30,000 --> 00:17:37,000
sum of Y. Then, we need X
bar and Y bar. X bar is nothing but sum
107
00:17:39,150 --> 00:17:46,150
X by n and Y bar is nothing but sum Y divided
by n. From X bar and Y bar, we then
108
00:17:47,570 --> 00:17:53,450
get the components X minus X bar and Y minus
Y bar.
109
00:17:53,450 --> 00:18:00,450
Now, X minus X bar means, for example, minus 10
minus X bar. So, what is X bar
110
00:18:05,460 --> 00:18:12,460
here? The sum of X here is equal to 0, and the sum
of Y is equal to 45. So, now, for
111
00:18:16,370 --> 00:18:23,370
X bar, the sum of X, 0, is divided by the number
of observations; the number of observations
112
00:18:24,799 --> 00:18:31,500
here is n equal to 5 for X and also n equal
to 5 for Y. So, 0 by 5 is equal to 0. And, the summation
113
00:18:31,500 --> 00:18:38,500
Y by n is nothing but 45 by 5, which is equal
to 9. So, X minus X bar
114
00:18:41,890 --> 00:18:48,890
is simply X minus 0, and Y minus Y bar
is Y minus 9. So, the corresponding
115
00:18:51,280 --> 00:18:58,280
deviations of X are minus 10, then minus
5, then 0, then 5, then 10. In the case
116
00:19:02,230 --> 00:19:09,230
of Y, they are minus 4, then 0, then minus
2, then 2, then 4. So, these are
117
00:19:16,049 --> 00:19:23,049
the deviations small x and small y. Next, we need
the products x into y, and then the summation
118
00:19:28,390 --> 00:19:35,390
xy. Here xy means small xy, that is,
(X minus X bar) into (Y minus Y bar). So,
119
00:19:39,600 --> 00:19:46,600
for the first observation, xy is
minus 10 into minus 4, which is 40; then,
120
00:19:48,830 --> 00:19:55,830
the remaining products are 0, 0, 10 and
40. So, the sum of xy is simply 90. Now,
121
00:20:02,870 --> 00:20:05,730
I will summarize it.
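The worked example above can be checked step by step with a short Python sketch. It assumes the lecture's data (X = -10, -5, 0, 5, 10 and Y = 5, 9, 7, 11, 13) and the divisor n. The variable names are illustrative only.

```python
# Data from the lecture's example.
X = [-10, -5, 0, 5, 10]
Y = [5, 9, 7, 11, 13]

n = len(X)                     # n = 5 for both variables
x_bar = sum(X) / n             # 0 / 5 = 0
y_bar = sum(Y) / n             # 45 / 5 = 9

# Small x and small y: deviations from the respective means.
x = [xi - x_bar for xi in X]   # -10, -5, 0, 5, 10
y = [yi - y_bar for yi in Y]   # -4, 0, -2, 2, 4

# Products xy and their sum.
xy = [a * b for a, b in zip(x, y)]   # 40, 0, 0, 10, 40
sum_xy = sum(xy)                     # 90

cov_xy = sum_xy / n                  # 90 / 5 = 18
print(sum_xy, cov_xy)
```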
122
00:20:05,730 --> 00:20:12,730
Now, remember, we have two variables:
the X variable and the Y variable. Let us start
123
00:20:17,280 --> 00:20:24,280
with the univariate structure.
So,
124
00:20:27,700 --> 00:20:34,700
what is the univariate structure? Now, we want
to know X bar and Y bar, that is, the
125
00:20:37,809 --> 00:20:44,809
arithmetic means. Then, the median, the
mode and the skewness, both
126
00:20:49,860 --> 00:20:56,860
for X and for Y. Now,
you know the median is nothing but the middlemost value
127
00:20:58,470 --> 00:21:05,429
of the sequence. So, if we arrange it
in proper sequence, then obviously, in the
128
00:21:05,429 --> 00:21:11,450
first case, the median is
0. In the second case, you have to arrange
129
00:21:11,450 --> 00:21:17,570
it in ascending order, 5, 7, 9, 11, 13. And, if you apply the
same technique, then obviously, the median
130
00:21:17,570 --> 00:21:24,570
is 9. So, in the case of X, the median
is 0; and, in the case of Y, it is 9. Then,
131
00:21:26,650 --> 00:21:33,650
to get the mode, we use mode equal to 3 median minus
2 mean. Accordingly, you can fill in the value
132
00:21:36,770 --> 00:21:43,710
of the mode for both variables.
And, skewness is equal to mean
133
00:21:43,710 --> 00:21:50,710
minus mode, or, as a relative measure, mean minus mode divided by the standard
deviation, so that we can calculate this
134
00:21:55,559 --> 00:22:02,210
skewness component. So, this is as far as the univariate
structure is concerned.
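Here is a minimal Python sketch of these univariate summaries. It assumes the empirical relation mode = 3 median - 2 mean quoted in the lecture, and a population standard deviation (divisor n) for the relative skewness measure. `univariate_summary` is a hypothetical helper name. Python's standard `statistics` module provides `mean`, `median` and `pstdev`.

```python
from statistics import mean, median, pstdev


def univariate_summary(values):
    m = mean(values)
    med = median(values)
    # Empirical relation used in the lecture: mode = 3 * median - 2 * mean.
    mode = 3 * med - 2 * m
    sd = pstdev(values)  # population standard deviation (divisor n)
    # Relative skewness: (mean - mode) / standard deviation.
    skew = (m - mode) / sd if sd > 0 else float("nan")
    return {"mean": m, "median": med, "mode": mode, "sd": sd, "skewness": skew}


X = [-10, -5, 0, 5, 10]
Y = [5, 9, 7, 11, 13]
for name, data in (("X", X), ("Y", Y)):
    print(name, univariate_summary(data))
```

This data set is symmetric, so mean, median and mode coincide for each variable and the skewness comes out as zero for both.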
135
00:22:02,210 --> 00:22:09,210
Now, before explaining the issue of covariance,
you must have a clear-cut understanding of
136
00:22:09,480 --> 00:22:16,480
univariate statistics, because the univariate
output will give you the path to the bivariate
137
00:22:16,990 --> 00:22:23,960
structure. So, bivariate results depend upon
the univariate results. So, you must have
138
00:22:23,960 --> 00:22:29,130
complete information about univariate statistics
before you proceed further
139
00:22:29,130 --> 00:22:36,130
to the bivariate structure.
Now, coming to the bivariate structure, we need
140
00:22:36,870 --> 00:22:43,870
to know the summation xy. The summation
xy here is 90; this summation xy is the sum of
141
00:22:47,640 --> 00:22:54,640
small xy, not capital XY. Let me clarify
one thing here. x is a deviation, which represents
142
00:22:56,039 --> 00:23:03,039
X minus X bar; and y is also a deviation, which
represents Y minus Y bar. So, obviously, the sum
143
00:23:04,390 --> 00:23:11,390
of (X minus X bar) into (Y minus Y bar) is nothing
but summation xy. So, now, the covariance of
144
00:23:14,340 --> 00:23:21,340
X and Y is summation xy by n. That is
nothing but 90 divided by the number of observations,
145
00:23:25,820 --> 00:23:32,820
5, which is equal to 18. So,
now, this structure gives you the concept
146
00:23:36,250 --> 00:23:43,250
of covariance.
The value of covariance is 18; that means,
147
00:23:48,919 --> 00:23:55,919
X and Y are positively associated: they tend to move
together. So, now, we have complete information
148
00:23:58,240 --> 00:24:05,240
about the univariate structure and complete information
about the bivariate structure. That means
149
00:24:06,770 --> 00:24:13,770
we know the nature of X in the system
and the nature
150
00:24:14,110 --> 00:24:21,110
of Y in the system, and we know what
the association between the two is. Now, that
151
00:24:21,360 --> 00:24:28,360
is possible under the structure of covariance.
Now, with the help of covariance, we can
152
00:24:29,250 --> 00:24:36,250
measure the degree of association, but sometimes
there is a problem. That is why there is another
153
00:24:40,330 --> 00:24:46,880
technique called correlation. Now, one
thing is very clear here: covariance
154
00:24:46,880 --> 00:24:51,990
gives you similar results to correlation,
because the
155
00:24:51,990 --> 00:24:57,179
objective of both correlation and covariance is
to measure the degree of association
156
00:24:57,179 --> 00:25:04,150
between two variables. So, now, once you calculate
the degree of association through covariance,
157
00:25:04,150 --> 00:25:09,520
correlation may seem not to matter; or, if
you calculate it through correlation, covariance
158
00:25:09,520 --> 00:25:16,520
may seem not to matter. But there is an
issue. The issue is that correlation is a
159
00:25:19,539 --> 00:25:26,049
better, more advanced technique
than covariance. Why? Because in some
160
00:25:26,049 --> 00:25:32,400
cases, covariance has limitations.
For instance, take comparative
161
00:25:32,400 --> 00:25:39,400
analysis. In the last class, we discussed the
example of the US dollar and the Japanese yen.
162
00:25:41,770 --> 00:25:48,770
When you go for comparative analysis,
the unit of measurement always
163
00:25:49,130 --> 00:25:56,130
matters. So, in that case, a relative measure
is often much handier for the analysis of
164
00:25:58,179 --> 00:26:04,120
a particular problem. Just as, in the case
of univariate analysis, the coefficient of variation is much
165
00:26:04,120 --> 00:26:09,289
better than the standard deviation, in the case
of bivariate modelling, correlation is much
166
00:26:09,289 --> 00:26:15,630
better than covariance, because it is a
unitless measure. So, now, we would like
167
00:26:15,630 --> 00:26:21,779
to know the structure and setup of
correlation. So, let me highlight
168
00:26:21,779 --> 00:26:28,779
the entire structure of correlation here.
169
00:26:28,909 --> 00:26:35,909
Correlation: take X and Y, where X contains
X1 up to Xn and Y contains Y1, Y2, ..., Yn.
170
00:26:46,049 --> 00:26:53,049
Now, we want to know the structure
of correlation. We write the correlation
171
00:26:55,100 --> 00:27:02,100
as Cor(X, Y); then,
the correlation of X and Y is nothing but the covariance
172
00:27:04,360 --> 00:27:11,360
of X and Y divided by sigma X into sigma Y.
So, what are sigma X and sigma Y? That
173
00:27:16,409 --> 00:27:23,409
is the issue here. Sometimes, it is written
as sigma xy divided by sigma x into sigma y, where
174
00:27:28,580 --> 00:27:35,580
sigma x stands for the standard deviation of X,
sigma y represents the standard deviation
175
00:27:45,960 --> 00:27:52,960
of Y, and sigma xy represents the covariance of
X and Y. So, sigma x and sigma y are products
176
00:28:01,049 --> 00:28:06,320
of univariate modelling, and sigma xy is a
product of bivariate modelling. So, correlation
177
00:28:06,320 --> 00:28:13,320
is nothing but the ratio of the covariance
of X and Y to the product of the standard deviation of X and the standard
178
00:28:13,520 --> 00:28:20,520
deviation of Y. Putting it in explicit form,
the correlation of X and Y is nothing but
179
00:28:20,580 --> 00:28:27,580
Σ(X - X̄)(Y - Ȳ)
divided by the square root of [Σ(X - X̄)²
180
00:28:36,110 --> 00:28:43,110
× Σ(Y - Ȳ)²].
Now, we need to know
181
00:28:59,120 --> 00:29:06,120
what X bar is here: X bar is the mean of X,
Y bar is the mean
of Y, and n represents the number of observations.
182
00:29:20,490 --> 00:29:27,490
Now, n does not matter here, because the n in the numerator
and the n in the denominator cancel. Now,
183
00:29:27,679 --> 00:29:34,179
if I simplify this structure, then it is simply
represented as summation xy divided by the square root of (summation x
184
00:29:34,179 --> 00:29:41,179
square into summation y square). Now, x is
nothing but X minus X bar and y is nothing
185
00:29:45,020 --> 00:29:50,010
but Y minus Y bar; both are in deviation form.
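The cancellation of n can be written out as a short derivation. This assumes the population (divisor n) definitions of covariance and standard deviation used in this lecture, with x_i = X_i - X̄ and y_i = Y_i - Ȳ:

```latex
r_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X \sigma_Y}
       = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i}
              {\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n} y_i^2}}
       = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2 \,\sum_{i=1}^{n} y_i^2}},
\qquad x_i = X_i - \bar{X},\; y_i = Y_i - \bar{Y}
```

The denominator equals (1/n) times the square root of the product of the two sums of squares, so the factor 1/n cancels against the 1/n in the numerator.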
186
00:29:50,010 --> 00:29:57,010
Now, for computational simplicity, the correlation
coefficient can be calculated as [N ΣXY
187
00:30:01,000 --> 00:30:08,000
- ΣX ΣY] divided
by [(N ΣX² - (ΣX)²)
188
00:30:12,049 --> 00:30:19,049
× (N ΣY² - (ΣY)²)]
raised to the power 1/2.
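The computational formula above uses only raw sums, so it can be coded without first forming deviations. The sketch below uses hypothetical function names and compares this formula with the deviation-form formula on the lecture's data. Both should give the same r.

```python
import math


def correlation_raw_sums(X, Y):
    """r = [N*sum(XY) - sum(X)*sum(Y)] /
           sqrt([N*sum(X^2) - sum(X)^2] * [N*sum(Y^2) - sum(Y)^2])"""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of observations")
    N = len(X)
    sum_x, sum_y = sum(X), sum(Y)
    sum_x2 = sum(v * v for v in X)
    sum_y2 = sum(v * v for v in Y)
    sum_xy = sum(a * b for a, b in zip(X, Y))
    numerator = N * sum_xy - sum_x * sum_y
    denominator = math.sqrt((N * sum_x2 - sum_x ** 2) * (N * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return numerator / denominator


def correlation_deviations(X, Y):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), with x, y as deviations."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    x = [v - x_bar for v in X]
    y = [v - y_bar for v in Y]
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(
        sum(a * a for a in x) * sum(b * b for b in y)
    )


X = [-10, -5, 0, 5, 10]
Y = [5, 9, 7, 11, 13]
print(correlation_raw_sums(X, Y), correlation_deviations(X, Y))
```

For the lecture's data, the raw sums are ΣX = 0, ΣY = 45, ΣX² = 250, ΣY² = 445 and ΣXY = 90. The numerator is therefore 450 and the denominator is the square root of 1250 × 200, which is 500.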
189
00:30:25,360 --> 00:30:32,360
So, this is the complete structure of correlation.
Now, the interesting part is the value
190
00:30:38,090 --> 00:30:45,090
and nature of correlation. Now, before we
proceed to a real example,
191
00:30:50,799 --> 00:30:57,669
we want to know the features and special properties
of correlation. The first feature
192
00:30:57,669 --> 00:31:04,669
of correlation is that the value of the
correlation coefficient, which is usually
193
00:31:07,130 --> 00:31:14,130
denoted as rho, satisfies minus 1 less than or equal to rho
less than or equal to 1; that means, the value
194
00:31:26,519 --> 00:31:33,519
of the correlation coefficient lies between minus
1 and plus 1. This is a standard result;
195
00:31:34,630 --> 00:31:41,630
mathematically, there is a proof. So, it is
always true that the value of the correlation
196
00:31:41,799 --> 00:31:48,799
coefficient lies between minus 1 and
plus 1. If it is negative, then it
197
00:31:49,190 --> 00:31:54,929
is called negative correlation; and,
if it is positive, then it is called
198
00:31:54,929 --> 00:32:01,289
positive correlation.
Now, the second property is that the correlation coefficient
199
00:32:01,289 --> 00:32:08,289
is symmetric in nature: r xy is equal to r
yx. Then, the correlation of X and Y can be
200
00:32:17,149 --> 00:32:24,149
represented as the correlation of U and V, where U and
V are treated as another set of variables.
201
00:32:28,710 --> 00:32:35,710
The most important trick is that the
correlation coefficient is independent of
202
00:32:38,120 --> 00:32:45,120
change of origin and scale. For instance, if
you take U equal to (X
203
00:32:47,279 --> 00:32:54,279
minus A) by h and V equal to (Y minus B) by k, with h and k positive,
and simplify this, then you will
204
00:32:58,419 --> 00:33:05,419
get r uv. Whatever r uv you get, you will get the
same value through r xy.
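The symmetry and origin-and-scale properties can be checked numerically. In this sketch, A = 5, h = 5, B = 9 and k = 2 are arbitrary illustrative constants, not values from the lecture, and `corr` is a hypothetical helper that uses the deviation formula. Flipping the sign of k shows why positive scale factors are assumed: only the sign of r changes.

```python
import math


def corr(X, Y):
    """Correlation in deviation form: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    x = [v - x_bar for v in X]
    y = [v - y_bar for v in Y]
    return sum(a * b for a, b in zip(x, y)) / math.sqrt(
        sum(a * a for a in x) * sum(b * b for b in y)
    )


X = [-10, -5, 0, 5, 10]
Y = [5, 9, 7, 11, 13]

# Illustrative constants for the change of origin (A, B) and scale (h, k).
A, h = 5, 5
B, k = 9, 2
U = [(v - A) / h for v in X]
V = [(v - B) / k for v in Y]

r_xy = corr(X, Y)
r_yx = corr(Y, X)   # symmetry: r_yx equals r_xy
r_uv = corr(U, V)   # invariance: r_uv equals r_xy when h, k > 0
print(r_xy, r_yx, r_uv)

# A negative scale factor (here -k) keeps the size of r but flips its sign.
V_flipped = [(v - B) / -k for v in Y]
print(corr(U, V_flipped))
```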
205
00:33:07,500 --> 00:33:14,500
This property of origin and scale is very important
when we go for higher-order and
206
00:33:14,559 --> 00:33:21,559
complex problems. So, it is very important
from that particular angle. So, now, before
207
00:33:21,700 --> 00:33:28,700
proceeding further, I would like to highlight
one thing: one of the conditions of
208
00:33:28,730 --> 00:33:35,519
correlation is that the variable structure,
or the sample observations, must be uniform for
209
00:33:35,519 --> 00:33:42,519
both X and Y. So, let me take an
example to show how it can be applied
210
00:33:45,700 --> 00:33:50,590
to calculate the correlation coefficient.
211
00:33:50,590 --> 00:33:57,590
Now, we can use the same example here;
the example
212
00:33:59,269 --> 00:34:06,269
is: X represents minus 10, minus 5,
0, 5, 10; and Y represents 5, 9, 7, 11, 13.
213
00:34:18,220 --> 00:34:25,220
We need to calculate the correlation coefficient.
As usual, you must first have some univariate statistics,
214
00:34:26,369 --> 00:34:33,329
that is, descriptive statistics. The rest
will then come automatically. Let me highlight
215
00:34:33,329 --> 00:34:39,470
them here. You need summation X and you need
summation Y.
216
00:34:39,470 --> 00:34:45,349
Now, for the univariate statistics, you
would like to know summation X and
217
00:34:45,349 --> 00:34:52,349
summation Y. Summation X here is 0 and
summation Y is 45. So, corresponding
218
00:34:53,530 --> 00:35:00,530
to summation X, we have X bar, which is nothing
but summation X by n, where n here is
219
00:35:00,710 --> 00:35:07,710
5. So, obviously, X bar is equal to summation
X by n, that is, 0 by 5, which
220
00:35:10,339 --> 00:35:17,339
is equal to 0. Similarly, for Y,
Y bar is equal to 45 by 5, which is
221
00:35:19,980 --> 00:35:26,380
equal to 9. Now, this particular information
is very much required for further analysis.
222
00:35:26,380 --> 00:35:33,380
So, now, we need X minus X bar and
Y minus Y bar. I am directly writing
223
00:35:34,880 --> 00:35:41,880
them here: minus 10, minus 5, 0, 5, 10. Then,
minus 4, 0, minus 2, 2, then 4. So,
224
00:35:53,160 --> 00:36:00,160
the first column represents small
x, and the second represents small y.
225
00:36:00,220 --> 00:36:04,970
With small x and small y,
we need information about small x
226
00:36:04,970 --> 00:36:11,970
square, information about
small y square, and information
227
00:36:11,970 --> 00:36:18,970
about small x into small y, that is, xy. Now, x square
here is 100, then 25, 0, then 25,
228
00:36:22,550 --> 00:36:29,550
and 100. Then, y square is 16, 0,
then 4, then 4, then 16. So, now, we want to
229
00:36:40,220 --> 00:36:47,220
know the sum of x square. So, now, come down
to the totals, sum x square and sum y square. Then,
230
00:36:48,390 --> 00:36:55,390
we must have summation xy. Now, for
summation xy, this is minus 10 into minus
231
00:36:57,530 --> 00:37:04,530
4, which is 40; then, minus 5 into 0, which is 0;
then 0 into minus 2, which is 0; then, 5 into 2, which is 10;
232
00:37:07,670 --> 00:37:14,670
then, 10 into 4, which is 40. Now, if we simplify
further, then summation x square is equal
233
00:37:18,700 --> 00:37:25,700
to 250; summation y square is 40; and
summation xy is nothing but 90. So, the sum
234
00:37:29,170 --> 00:37:36,170
totals are 250, 40 and 90. Now, we have to know
what the value of the correlation coefficient
235
00:37:42,180 --> 00:37:49,180
is. Having this information, we
can compute the correlation statistic. So, now,
236
00:37:49,910 --> 00:37:56,910
the correlation of X and Y is nothing but the covariance
of X and Y divided by sigma x into sigma y. Now,
237
00:38:00,859 --> 00:38:07,859
sigma x is the square root of summation x square by
n, and sigma y is the square root of summation y square
238
00:38:08,540 --> 00:38:10,030
by n.
239
00:38:10,030 --> 00:38:17,030
Now, the moment you put it here,
the correlation coefficient is equal
240
00:38:19,510 --> 00:38:26,510
to the covariance of X and Y, which is nothing but
summation xy by n, divided by sigma x into sigma
241
00:38:36,089 --> 00:38:43,089
y. So, now, we want to know
the covariance of X and Y. The covariance of X and Y
242
00:38:48,020 --> 00:38:55,020
is nothing but summation xy by n, which is
nothing but 90 by 5. So, it is exactly
243
00:38:56,220 --> 00:39:03,220
18. Now, we already have sigma x: sigma
x is the square root of summation x square by n, that is, of 250 by 5,
244
00:39:03,630 --> 00:39:10,630
which is the square root of 50; and sigma y is the square root of 40 by
5, which is the square root of 8. So, the correlation
245
00:39:18,089 --> 00:39:25,089
coefficient is r equal to 90 divided by the square
root of 250 into the square root of 40. The n's
246
00:39:30,500 --> 00:39:36,329
cancel automatically. So, r is 90 by 100, that is,
exactly 0.9.
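The same calculation, written through the covariance and the two standard deviations (divisor n), looks like this in Python. It is a sketch on the lecture's data. Here σx σy = √50 × √8 = 20, so r = 18 / 20. In floating point the result may differ from 0.9 in the last digits, which is why the print rounds it.

```python
import math

X = [-10, -5, 0, 5, 10]
Y = [5, 9, 7, 11, 13]
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n
x = [v - x_bar for v in X]
y = [v - y_bar for v in Y]

sum_x2 = sum(a * a for a in x)              # 250
sum_y2 = sum(b * b for b in y)              # 40
sum_xy = sum(a * b for a, b in zip(x, y))   # 90

cov_xy = sum_xy / n                 # 18
sigma_x = math.sqrt(sum_x2 / n)     # square root of 50
sigma_y = math.sqrt(sum_y2 / n)     # square root of 8
r = cov_xy / (sigma_x * sigma_y)    # 18 / 20 = 0.9, up to rounding

print(sum_x2, sum_y2, sum_xy, cov_xy, round(r, 4))
```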
247
00:39:36,329 --> 00:39:43,329
But one thing is very clear here: we know
the correlation coefficient always lies between
248
00:39:45,630 --> 00:39:52,630
minus 1 and plus 1, and the sign of the correlation
depends upon the sign of the covariance. The
249
00:39:57,710 --> 00:40:04,710
reason is that the standard deviation of X can
never be negative, because it is the square
250
00:40:07,390 --> 00:40:14,390
root of the variance. Similarly, the standard deviation
of Y can never be negative. So, whether the value
251
00:40:16,650 --> 00:40:23,650
of correlation is negative or positive
depends upon the value of covariance. Now,
252
00:40:27,420 --> 00:40:34,420
if the covariance is negative, then we must
have negative correlation; and, if the value
253
00:40:35,640 --> 00:40:42,640
of covariance is positive, then we must have
positive correlation. So, now, with these
254
00:40:43,160 --> 00:40:50,160
structures, we can state the essential
conditions: sigma
255
00:40:50,540 --> 00:40:57,540
x is normally greater than 0, sigma y is normally
greater than 0, and sigma xy can be greater
256
00:41:01,510 --> 00:41:08,430
than, less than or equal to 0. Sigma x
can also be equal to 0. That is why the
257
00:41:08,430 --> 00:41:14,369
condition is written as sigma x greater than or equal
to 0 and sigma y greater than or equal to 0 (if either is 0, that variable is constant and the correlation is undefined), while
258
00:41:14,369 --> 00:41:19,579
sigma xy can be positive, negative or zero.
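Here is a small sketch of the sign rule and of the zero-standard-deviation case. `Y_rev` (the lecture's Y listed in reverse order) and the constant series `C` are hypothetical data chosen only for illustration. Returning `None` for an undefined correlation is a design choice of this sketch, not a standard convention.

```python
import math


def cov_and_corr(X, Y):
    """Return (covariance, correlation) using divisor n; correlation is None if undefined."""
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    x = [v - x_bar for v in X]
    y = [v - y_bar for v in Y]
    cov = sum(a * b for a, b in zip(x, y)) / n
    sigma_x = math.sqrt(sum(a * a for a in x) / n)
    sigma_y = math.sqrt(sum(b * b for b in y) / n)
    if sigma_x == 0 or sigma_y == 0:
        # A constant variable has zero standard deviation: r is undefined.
        return cov, None
    return cov, cov / (sigma_x * sigma_y)


X = [-10, -5, 0, 5, 10]
Y = [5, 9, 7, 11, 13]          # lecture data
Y_rev = [13, 11, 7, 9, 5]      # hypothetical: Y listed in reverse order
C = [7, 7, 7, 7, 7]            # hypothetical constant series

for label, other in (("Y", Y), ("Y_rev", Y_rev), ("constant", C)):
    cov, r = cov_and_corr(X, other)
    print(label, cov, r)
```

Because both standard deviations are non-negative, the sign of r always matches the sign of the covariance: +18 gives r ≈ 0.9, and -18 gives r ≈ -0.9.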
259
00:41:19,579 --> 00:41:26,579
Now, in order for better simplicity, we like
to know the nature of the correlation coefficient.
260
00:41:29,609 --> 00:41:36,609
Now, I will explain here the detailed structure
of correlation. Correlation basically has
261
00:41:41,700 --> 00:41:48,700
three different formats: the first format
represents the simple correlation; then, second
262
00:42:02,310 --> 00:42:09,310
is called as a partial correlation; third
is called as a multiple correlation. So, correlation
263
00:42:14,530 --> 00:42:21,530
again can be linear and can be non-linear.
Then, correlation can be positive, can be
264
00:42:28,810 --> 00:42:35,810
negative or can be zero. So, correlation – the
basic framework is simple correlation, partial
265
00:42:46,520 --> 00:42:53,520
correlation and multiple correlation. It can
be linear in nature; it can be non-linear
266
00:42:58,040 --> 00:43:05,040
in nature; it can be positive; it can be negative.
Sometimes, the value of correlation coefficient
267
00:43:10,890 --> 00:43:17,890
can also be 0. If the correlation
coefficient is equal to 0, then there is no linear association
268
00:43:23,339 --> 00:43:30,339
between these two variables.
For instance, we have a relationship between
269
00:43:31,760 --> 00:43:38,760
pen and paper, but we may not have a relationship
between pen and chair, because we do not have
270
00:43:45,319 --> 00:43:52,319
any link between pen and chair. So, one important
point here is that before computing the
271
00:43:58,560 --> 00:44:05,560
correlation statistic, there must be a sound
theory behind it, because whatever two variables you try
272
00:44:08,500 --> 00:44:15,500
to relate, you will get some value, because
it is just a mathematical calculation.
273
00:44:15,810 --> 00:44:22,810
But the interpretation, the utility and the
usefulness of that value depend upon the theory only. Theory
274
00:44:27,010 --> 00:44:34,010
will give you support, or you can say a sound
structure, so that you can establish the problem
275
00:44:40,170 --> 00:44:47,170
setup. If there is no theory behind the correlation
approach, then the result is simply called
276
00:44:54,040 --> 00:44:59,180
a nonsense correlation; it is also known
as a spurious correlation.
277
00:44:59,180 --> 00:45:06,180
Now, before we go into the detailed structure of
that issue, let me explain where each type
278
00:45:09,210 --> 00:45:16,210
fits. Partial and multiple correlation
belong to what is called
279
00:45:22,290 --> 00:45:29,290
a multivariate framework. They cannot be computed
with two variables only; they require
280
00:45:33,680 --> 00:45:40,680
more than two variables. Simple correlation
is a bivariate game, whereas partial and
281
00:45:43,710 --> 00:45:50,710
multiple correlations are multivariate games.
So, here we will not discuss the partial
282
00:45:54,069 --> 00:45:58,780
correlation coefficient and the multiple correlation
coefficient; we will discuss them in detail
283
00:45:58,780 --> 00:46:05,780
when we come to multivariate modeling.
Now, if we combine all these structures
284
00:46:08,050 --> 00:46:15,050
here, then we have various forms. Let me explain
what these forms are. I will give you an
285
00:46:16,640 --> 00:46:23,640
indication. This is one way we can represent
the correlation graphically. Here we have X
286
00:46:25,000 --> 00:46:32,000
and Y. Now, one type of correlation
structure is like this. Now, I will just draw
287
00:46:35,640 --> 00:46:42,640
it. This particular setup is called a
positive linear correlation. This is case
288
00:46:55,980 --> 00:47:02,980
1. Case 2: origin 0, X and Y; now, the structure
may be like this. I am just highlighting what
289
00:47:10,280 --> 00:47:17,280
the possibilities are under correlation
modeling. Now, this particular structure is
290
00:47:17,280 --> 00:47:24,280
called a negative linear correlation;
this is case 2.
291
00:47:37,579 --> 00:47:44,579
I will put another structure here. I will
draw the axes like this: 0, X, Y. Then, this structure
292
00:47:51,510 --> 00:47:58,510
may be like this; I will put the points like this; then,
I will join them like this. So, this is
called a positive non-linear correlation.
293
00:48:15,790 --> 00:48:22,790
This is case 3. So, case 3 is a positive non-linear
correlation. Then, I will represent another case
294
00:48:25,230 --> 00:48:32,230
here. Now, that case is like this; then, I
will draw it like this. So, this is X and Y.
295
00:48:41,859 --> 00:48:48,859
So, this particular structure is called
a negative non-linear correlation. So,
296
00:48:50,930 --> 00:48:57,930
case 4 is negative non-linear correlation.
So, now, we have four different games: the first
297
00:49:22,119 --> 00:49:29,119
is positive linear correlation; then negative linear
correlation, positive non-linear correlation
298
00:49:31,260 --> 00:49:38,170
and negative non-linear correlation. So, altogether,
there are four different types of correlation
299
00:49:38,170 --> 00:49:45,170
that we can find.
The value of the correlation coefficient lies between 0 and minus
300
00:49:47,829 --> 00:49:54,829
1 or between 0 and plus 1; if the
value of the correlation coefficient is exactly
301
00:49:58,970 --> 00:50:05,970
0, the structure is completely
different, and we call it
302
00:50:07,619 --> 00:50:14,619
zero correlation. So, there is
another case, where we have no correlation
303
00:50:17,530 --> 00:50:24,530
between the two variables; the
setup is like this. So, we have no relationship
304
00:50:24,880 --> 00:50:31,880
between these two variables. That is why
you must have sound logic, sound theory and sound
305
00:50:33,059 --> 00:50:38,290
structure before you apply the correlation
technique or the covariance technique.
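The four cases drawn on the board, plus the zero case, can be reproduced numerically. This is a minimal sketch with made-up data sets (the formulas `2x + 1`, `50 - 3x`, `x^2` and `1/x` are illustrative choices, not from the lecture), using Pearson's r. The linear cases reach exactly plus or minus 1, while the curved cases keep the same sign but fall short of 1 in magnitude, because Pearson's r measures how closely the points lie on a straight line. The last line is a caution: a symmetric curve can give r = 0 even though Y depends exactly on X.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations.

    The 1/n factors cancel, so plain sums of deviations are enough here.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = list(range(1, 11))
cases = {
    "case 1: positive linear": [2 * a + 1 for a in x],
    "case 2: negative linear": [50 - 3 * a for a in x],
    "case 3: positive non-linear": [a ** 2 for a in x],
    "case 4: negative non-linear": [1 / a for a in x],
}
for name, y in cases.items():
    print(f"{name:30s} r = {pearson_r(x, y):+.3f}")

# Zero correlation: on a symmetric range, Y = X^2 gives r = 0 exactly,
# even though Y depends perfectly on X; r only picks up linear association.
xs = [-2, -1, 0, 1, 2]
print(f"{'zero (Y = X^2, symmetric X)':30s} r = {pearson_r(xs, [a ** 2 for a in xs]):+.3f}")
```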
306
00:50:38,290 --> 00:50:45,290
If you apply the correlation coefficient without any theory,
logic and structure, then,
307
00:50:47,099 --> 00:50:53,440
sometimes, you may get either zero
correlation or simply a nonsense
308
00:50:53,440 --> 00:50:59,050
correlation; nonsense correlation means the
value may not be equal to 0, but it does not
309
00:50:59,050 --> 00:51:06,050
support any theory. For example, let me
just cite one case: the number of chairs in a room.
310
00:51:08,059 --> 00:51:12,170
In the first room, the number of chairs is 10;
in the second room, the number of chairs is 20; in the third
311
00:51:12,170 --> 00:51:17,920
room, the number of chairs is 40. Then, in the first
room, the number of pens available is
312
00:51:17,920 --> 00:51:24,119
5; in the second room, the number of pens available is 20;
in the third, the number of pens is 40. Obviously,
313
00:51:24,119 --> 00:51:31,119
I do not find any theory linking the number of
chairs in one structure and the number
314
00:51:31,400 --> 00:51:38,400
of pens in another structure, yet
we will still get a correlation. Now, correlation
315
00:51:38,970 --> 00:51:44,640
can be positive or negative,
with a linear structure
316
00:51:44,640 --> 00:51:51,640
or with a non-linear structure. And,
in between, there may be the case called
317
00:51:51,740 --> 00:51:58,740
zero correlation. So, this is the entire
structure of correlation.
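The chairs-and-pens example shows why a value without theory is a nonsense correlation. Plugging the lecture's three rooms into the Pearson formula gives a very high positive value, even though nothing links the two counts. The helper name `pearson_r` is illustrative.

```python
import math

# Figures from the lecture's example: three rooms.
chairs = [10, 20, 40]
pens = [5, 20, 40]

def pearson_r(x, y):
    """Pearson correlation from sums of deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(chairs, pens)
print(f"r(chairs, pens) = {r:.3f}")  # → r(chairs, pens) = 0.994
```

The arithmetic is correct, but the number carries no economic meaning; only a theory linking the two variables could give it one.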
318
00:51:58,760 --> 00:52:05,760
Now, correlation is a very important tool
for bivariate econometric modeling. Specifically,
319
00:52:07,190 --> 00:52:14,190
it sits in the middle, between covariance
and regression. It is more informative than covariance
320
00:52:14,700 --> 00:52:21,700
and less informative than regression. The advantage
of correlation is that it measures the degree
321
00:52:27,309 --> 00:52:33,180
of association between the two variables.
As we have already mentioned, the value
322
00:52:33,180 --> 00:52:40,180
of the correlation coefficient lies between minus
1 and plus 1, so the nature of the
323
00:52:41,930 --> 00:52:48,930
association can vary widely, which makes it interesting.
If the value of correlation is exactly 1,
324
00:52:53,670 --> 00:53:00,670
then it is called a perfect positive correlation.
If the value of correlation is exactly minus
325
00:53:01,490 --> 00:53:08,490
1, then it is called a perfect negative
correlation. If it is very close to minus
326
00:53:10,550 --> 00:53:17,550
1, then it is called a highly negative
correlation. If it is very close
327
00:53:19,099 --> 00:53:25,809
to plus 1, then it is called a highly positive
correlation.
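One concrete reason correlation is more informative than covariance is that covariance depends on the units of measurement, so its size alone says little about the degree of association, whereas correlation is unit-free and bounded between minus 1 and plus 1. The sketch below is a minimal illustration under assumed, hypothetical income and consumption figures: rescaling both series by 1000 multiplies the covariance by 1000 × 1000 but leaves the correlation unchanged.

```python
import math

def cov(x, y):
    """Population covariance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def corr(x, y):
    """Correlation = covariance / (sigma_x * sigma_y), using cov(x, x) as the variance."""
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))

# Hypothetical data measured in thousands.
income_k = [20, 25, 30, 40, 50]
consumption_k = [18, 22, 24, 31, 35]

# The same data measured in single units.
income = [1000 * v for v in income_k]
consumption = [1000 * v for v in consumption_k]

print(cov(income_k, consumption_k), cov(income, consumption))    # covariance grows by 1000 * 1000
print(corr(income_k, consumption_k), corr(income, consumption))  # correlation is unchanged
```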
328
00:53:25,809 --> 00:53:32,809
Now consider the values in between. The
structure is like this: this is
329
00:53:33,380 --> 00:53:40,380
minus 1 and this is plus 1, and this is 0.
Now, if I mark minus 0.8 here and
330
00:53:44,780 --> 00:53:51,780
0.8 here, then a value between minus 1 and minus 0.8
is called a high correlation, and so is
331
00:53:52,109 --> 00:53:57,630
a value between 0.8 and plus 1: the first is a high negative correlation
and the second is a high positive correlation. So,
332
00:53:57,630 --> 00:54:04,630
now, if the value is between 0.4 and 0.6 in magnitude, then it is
called a moderate correlation; it may be
333
00:54:05,819 --> 00:54:12,819
moderate positive or moderate negative.
Now, if the value of correlation is less
334
00:54:13,369 --> 00:54:20,369
than 0.4 or 0.3, or very close to 0, then it is called
a very low correlation. So, the association
335
00:54:26,290 --> 00:54:33,290
between the two variables is very strong if
the correlation is very high in magnitude; if
336
00:54:33,770 --> 00:54:40,770
the value of correlation is low, then the
association is also very low.
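The rough bands described above can be written as a small labelling function. This is a sketch under stated assumptions: the function name `describe_correlation` is hypothetical, and because the lecture gives no label for magnitudes between 0.6 and 0.8, this version simply groups them with "moderate". The cut-offs are rules of thumb, not fixed statistical thresholds.

```python
def describe_correlation(r):
    """Label r using the rough cut-offs mentioned in the lecture.

    Assumed bands (0.6-0.8 is not assigned in the lecture; it is grouped with
    'moderate' here): |r| = 1 perfect, |r| >= 0.8 high, |r| >= 0.4 moderate, else low.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 0:
        return "zero correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.8:
        strength = "high"
    elif size >= 0.4:
        strength = "moderate"
    else:
        strength = "low"
    return f"{strength} {direction} correlation"

for r in (1.0, -0.85, 0.5, -0.45, 0.2, 0.0):
    print(f"{r:+.2f}: {describe_correlation(r)}")
```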
337
00:54:43,220 --> 00:54:50,220
This is a very interesting component, very
useful for multivariate techniques, and it is
338
00:54:51,800 --> 00:54:58,740
a component of the regression technique. However,
the essential requirement for this bivariate modelling
339
00:54:58,740 --> 00:55:04,010
is that you must have thorough knowledge
and complete information about univariate
340
00:55:04,010 --> 00:55:08,770
modeling. Unless you have that complete
information and complete setup, you cannot handle
341
00:55:08,770 --> 00:55:15,770
the game of correlation. Now, it is not possible
for us to discuss regression modelling in detail
342
00:55:15,890 --> 00:55:19,799
here; again within the setup of
bivariate modeling, we will discuss it
343
00:55:19,799 --> 00:55:23,990
in the next class. With this, we can conclude
this session.
344
00:55:23,990 --> 00:55:25,400
Thank you very much. Have a nice day.