1
00:00:20,710 --> 00:00:26,560
Good afternoon. Today we will continue with multivariate
descriptive statistics. In the last class I
2
00:00:26,560 --> 00:00:37,339
gave you S_jk, the formula to compute
the covariance between two variables X_j and
3
00:00:37,339 --> 00:00:52,929
X_k; this is the sample covariance formula.
Now we want to use matrices here,
4
00:00:52,929 --> 00:01:03,789
primarily matrix multiplication, to compute
S, the p × p matrix whose
5
00:01:03,789 --> 00:01:11,369
diagonal elements are the variances
and whose off-diagonal elements are the covariances.
6
00:01:11,369 --> 00:01:19,350
We will compute all of these from the
data matrix.
7
00:01:19,350 --> 00:01:31,240
In the last class you saw the data matrix laid out
like this: x_11, x_21, ..., x_i1, ..., x_n1, which
8
00:01:31,240 --> 00:01:40,000
are the observations on variable x_1. Similarly,
the observations on variable x_2 are x_12, x_22,
9
00:01:40,000 --> 00:01:56,750
..., x_i2, ..., x_n2, and so on for all p
variables, up to x_1p, x_2p, ..., x_ip, ...,
10
00:01:56,750 --> 00:02:08,679
x_np. This is our n × p data matrix.
Now, let us consider a general observation
11
00:02:08,679 --> 00:02:21,480
here, x_ij. From it we create one
more general quantity, x*_ij, which
12
00:02:21,480 --> 00:02:36,000
is x_ij minus x̄_j. In other words,
we subtract from each element
13
00:02:36,000 --> 00:02:43,150
the mean of its own variable. For example, in this column
we subtract x̄_1, in this one
14
00:02:43,150 --> 00:02:49,519
x̄_2, and in the last one x̄_p. This gives
another matrix, which we denote
15
00:02:49,519 --> 00:03:08,090
X*, with entries x*_11,
x*_21, ..., x*_n1 in the first column, then
16
00:03:08,090 --> 00:03:26,510
x*_12, x*_22, ..., x*_n2,
and so on up to x*_1p, x*_2p, ...,
17
00:03:26,510 --> 00:03:40,079
x*_np.
Now, if you form X* transpose,
18
00:03:40,079 --> 00:03:52,989
its order is p × n, and you multiply it
by X*. This
19
00:03:53,020 --> 00:04:11,329
product, p × n times n × p, gives a
p × p matrix, which is nothing but
20
00:04:11,329 --> 00:04:25,970
(n − 1) S, where S is the sample covariance
matrix. So, S is p × p; it is the covariance matrix
21
00:04:25,970 --> 00:04:37,000
of x, calculated from this X*. So, with
matrix manipulation
22
00:04:37,000 --> 00:04:50,849
you are able to find the covariance matrix
for all the variables in one go. Now,
23
00:04:50,849 --> 00:04:58,069
let us solve one small problem.
24
00:04:58,069 --> 00:05:14,529
For example, suppose you take this data:
the first variable is 10, 12, 11 and the second is 100,
25
00:05:14,820 --> 00:05:20,480
110 and 105. This is the data matrix.
26
00:05:20,490 --> 00:05:32,939
So X is 3 × 2, with rows (10, 100), (12, 110) and (11, 105).
27
00:05:32,939 --> 00:05:42,689
So, what will x̄_1 be? Or, if I write
x̄ = (x̄_1, x̄_2), what will
28
00:05:42,689 --> 00:05:56,050
its value be? x̄_1 is 10 plus 12 plus 11, that is
33 divided by 3, which is 11, and x̄_2 is (100 + 110 + 105)/3,
29
00:05:56,050 --> 00:06:03,969
which is 105. Now, what are you creating? You are creating
X*. What is X*? You want
30
00:06:03,969 --> 00:06:14,189
each element of x_1 to have its mean, 11,
subtracted. Similarly, each element of x_2
31
00:06:14,189 --> 00:06:25,270
has its mean, 105, subtracted. So
the first column will be 10 − 11, 12 − 11,
32
00:06:25,270 --> 00:06:42,659
11 − 11, and the second will be 100 −
105, 110 − 105, 105 − 105. Then what
33
00:06:42,659 --> 00:06:50,999
do you get? The first column is
10 − 11 = −1, then +1, then 0; the second is 100
34
00:06:50,999 --> 00:07:05,099
− 105 = −5, then +5, then 0. So, a zero
element appears in each column here.
35
00:07:05,099 --> 00:07:15,090
Let us see what happens if we compute
X* transpose X*. What we have
36
00:07:15,090 --> 00:07:33,949
here is the matrix with rows (−1, 1, 0) and (−5, 5, 0), multiplied by
X*, whose columns are (−1, 1, 0) and (−5, 5, 0). If you multiply,
37
00:07:33,949 --> 00:07:41,599
what happens? The first is a 2 ×
3 matrix and the second is 3 × 2, so you
38
00:07:41,599 --> 00:07:53,069
get a 2 × 2 matrix. First entry:
−1 times −1, that is +1,
39
00:07:53,069 --> 00:08:02,719
plus 1 plus 0, which gives 2. Now, the second
entry: −1 times −5, that is +5, and 1
40
00:08:02,719 --> 00:08:14,029
times 5 is 5, so 5 plus 5 is 10. The other off-diagonal entry
is also 5 plus 5, that is 10, and this
41
00:08:14,029 --> 00:08:29,169
last one is 25 plus 25, that is 50. So, we say (n −
1) S equals X* transpose X*,
42
00:08:29,169 --> 00:08:42,750
which here is [2 10; 10 50]. What is n
here? It is 3, so n − 1 is 2, and S will be
43
00:08:42,750 --> 00:08:54,780
1/2 times [2 10; 10 50]. Then what is its value
now?
44
00:08:54,780 --> 00:09:15,310
Our S is [1 5; 5 25]. So, you have
already computed x̄, which is (11, 105),
45
00:09:15,310 --> 00:09:28,530
and S is this. That means S_11 is
1 and S_22 is 25. What does that mean? s_1 squared
46
00:09:28,530 --> 00:09:37,650
equals 1 and s_2 squared equals 25, so s_1
equals 1 and s_2 equals 5; these are the standard
47
00:09:37,650 --> 00:09:50,590
deviations in the two cases. And S_12, which
is 5, is the covariance between x_1 and x_2. So,
48
00:09:50,590 --> 00:10:07,080
if I say that my population is multivariate
normal, where the variable vector x is 2 × 1,
49
00:10:07,080 --> 00:10:19,960
that is, x follows N_2(μ, Σ), then
μ is (μ_1, μ_2) and Σ is
50
00:10:19,960 --> 00:10:27,830
the 2 × 2 matrix with rows (σ_11, σ_12)
and (σ_12, σ_22).
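The hand computation above (x̄ = (11, 105), (n − 1)S = [2 10; 10 50], S = [1 5; 5 25]) can be checked with a minimal sketch of the centering route (n − 1)S = X*ᵀX*. It assumes plain Python lists with no NumPy, so small matrix helpers are written out; the function names are illustrative, not from the lecture.

```python
# Sample mean vector and covariance matrix via centering: (n - 1) S = X*^T X*.
# Pure-Python sketch; X is a list of n rows, each holding p values.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def mean_vector(X):
    n = len(X)
    return [sum(col) / n for col in zip(*X)]

def centered(X):
    """X*: subtract each column's mean from every entry of that column."""
    xbar = mean_vector(X)
    return [[x - m for x, m in zip(row, xbar)] for row in X]

def sample_covariance(X):
    n = len(X)
    Xs = centered(X)
    sscp = matmul(transpose(Xs), Xs)  # p x p matrix X*^T X*
    return [[v / (n - 1) for v in row] for row in sscp]

# The 3 x 2 example from the lecture: columns are x1 and x2.
X = [[10, 100],
     [12, 110],
     [11, 105]]

print(mean_vector(X))        # [11.0, 105.0]
print(centered(X))           # [[-1.0, -5.0], [1.0, 5.0], [0.0, 0.0]]
print(sample_covariance(X))  # [[1.0, 5.0], [5.0, 25.0]]
```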
51
00:10:27,830 --> 00:10:48,270
So, we can now say that x̄ is our estimate
of μ and S is our estimate of Σ;
in this manner we will proceed. So, please
52
00:10:48,270 --> 00:10:59,630
remember that multivariate descriptive statistics
has 3 components: the first is the mean vector, the second
53
00:10:59,630 --> 00:11:18,750
is the covariance matrix, and the third is the correlation
matrix. Now we will discuss the correlation
54
00:11:18,750 --> 00:11:46,340
matrix. The population correlation matrix
is denoted by ρ (rho).
55
00:11:46,340 --> 00:11:57,980
Basically, what I mean is that if the population
is characterized by p variables, then you will
56
00:11:57,980 --> 00:12:06,120
get a p × p population correlation
matrix, just as you get a p × p population
57
00:12:06,120 --> 00:12:24,220
covariance matrix. The difference here is that the
diagonal elements are 1; each is the correlation
58
00:12:24,220 --> 00:12:31,430
of a variable with itself. And the off-diagonal
elements are written like this:
59
00:12:31,430 --> 00:12:43,890
ρ_12, ..., ρ_1p in the first row; ρ_12, 1, ..., ρ_2p in the second;
and so on, down to ρ_1p, ρ_2p, ..., 1; it will
60
00:12:43,890 --> 00:12:58,360
continue like this. Now, given this,
we find a relationship between ρ and
61
00:12:58,360 --> 00:13:05,760
Σ.
What is Σ? The population covariance matrix
62
00:13:05,760 --> 00:13:16,030
that we have seen: σ_11, σ_12, ..., σ_1p;
σ_12, σ_22, ..., σ_2p; and so on,
63
00:13:16,030 --> 00:13:26,800
down to σ_1p, σ_2p, ..., σ_pp. So, the crux
of the matter is that the diagonal elements of Σ are
64
00:13:26,800 --> 00:13:36,790
variances, that is, the same variable
varying with itself, whereas the diagonal of ρ
65
00:13:36,790 --> 00:13:50,940
is the correlation of each variable with itself. So, what is the correlation
between x_j and x_k? You can write it
66
00:13:50,940 --> 00:14:11,800
as the covariance between x_j and x_k divided by the standard
deviation of x_j times the standard deviation
67
00:14:11,800 --> 00:14:23,550
of x_k. So, mathematically, we write
that the correlation
68
00:14:23,550 --> 00:14:37,830
of x_j and x_k is Cov(x_j, x_k) divided by
σ_j σ_k.
69
00:14:37,830 --> 00:14:48,760
Now, you have seen from this figure that
the correlation between x_j and x_k is
70
00:14:48,760 --> 00:14:55,930
written ρ_jk, and in Σ we
are talking about the covariance between
71
00:14:55,930 --> 00:15:05,400
the same pair, σ_jk. So, if we use
the same notation, I can write ρ_jk equal
72
00:15:05,400 --> 00:15:15,190
to σ_jk divided by σ_j σ_k. That
is the relationship. Put the other way,
73
00:15:15,190 --> 00:15:25,690
the covariance between 2 variables is the
correlation between the 2 variables times
74
00:15:25,690 --> 00:15:36,620
their standard deviations.
Now, what happens to this correlation
75
00:15:36,620 --> 00:15:46,970
when j equals
k? Let us see what happens; it means
76
00:15:46,970 --> 00:15:56,720
we get ρ_jj, which is σ_jj
divided by σ_j times σ_j, and we have discussed earlier
77
00:15:56,720 --> 00:16:06,700
that σ_jj is nothing but σ_j squared.
So, it is σ_j squared over σ_j squared, which is 1, and as a result
78
00:16:06,700 --> 00:16:19,780
you get all 1s on the diagonal. So, conceptually,
if you try to find
79
00:16:19,780 --> 00:16:25,360
the covariance of a variable with itself,
that gives the variance.
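The population relationships just described fit in one line; the diagonal case follows because σ_jj = σ_j squared.

```latex
\rho_{jk} = \frac{\operatorname{Cov}(X_j, X_k)}{\sigma_j \sigma_k} = \frac{\sigma_{jk}}{\sigma_j \sigma_k}
\quad\Longleftrightarrow\quad
\sigma_{jk} = \rho_{jk}\,\sigma_j \sigma_k,
\qquad
\rho_{jj} = \frac{\sigma_{jj}}{\sigma_j \sigma_j} = \frac{\sigma_j^2}{\sigma_j^2} = 1 .
```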
80
00:16:25,360 --> 00:16:30,490
And here what you are doing is basically
standardizing it, dividing the covariance
81
00:16:30,490 --> 00:16:37,320
by the standard deviations. So,
as a result, this standardization
82
00:16:37,320 --> 00:16:47,600
makes all the diagonal elements
equal to 1. The same thing happens for
83
00:16:47,600 --> 00:17:00,660
sample data also. What do we mean when the
correlation between x_j and x_k is +1, the correlation
84
00:17:00,660 --> 00:17:13,819
between x_j and x_k is −1, or the correlation between
x_j and x_k is 0? A value of +1 says they are perfectly
85
00:17:13,819 --> 00:17:22,970
positively correlated: the 1 stands for
perfect correlation, and the sign stands for
86
00:17:22,970 --> 00:17:27,360
the direction. So +1 means perfectly positively
correlated, −1 means perfectly negatively correlated,
87
00:17:27,360 --> 00:17:38,480
and 0 means no correlation: positive,
negative and no correlation.
88
00:17:38,480 --> 00:17:47,470
And suppose you draw this for two
variables, x_j on this axis and
89
00:17:47,470 --> 00:17:56,529
x_k on this one. If you draw the scatter plot for perfect positive
correlation, you may get something like this, that is,
90
00:17:56,529 --> 00:18:09,749
ρ_jk equal to 1. Negative correlation
means that as x_j increases, x_k decreases,
91
00:18:09,749 --> 00:18:18,950
or vice versa. So, you can picture it like this;
in this case
92
00:18:18,950 --> 00:18:30,299
ρ_jk equals −1, assuming that all the
values of x_j and x_k fall on this line.
93
00:18:30,299 --> 00:18:44,409
It is negative because here, as x_j
increases, x_k decreases.
94
00:18:44,409 --> 00:18:55,210
And in the positive case, when x_j increases,
x_k also increases, and all the points fall on
95
00:18:55,210 --> 00:19:03,909
a rising line. Positive means that both variables
co-vary in the same direction;
96
00:19:03,909 --> 00:19:09,249
negative means in the opposite direction. And
±1 is possible only when you find a perfect
97
00:19:09,249 --> 00:19:14,970
straight line if you do curve fitting. And
when do you get zero? Suppose your points
98
00:19:14,970 --> 00:19:31,590
are like this: this is your x_j, the y axis
is x_k, and the points are scattered like this.
99
00:19:31,590 --> 00:19:40,929
The points roughly resemble a
circle; it is totally random. So, when you
100
00:19:40,929 --> 00:19:53,019
find this type of randomness, where the cloud
resembles a circle, there is no linear relationship,
101
00:19:53,019 --> 00:19:59,679
because whichever direction you take, you
will not find any pattern. That is the
102
00:19:59,679 --> 00:20:11,580
meaning of a correlation coefficient such as ρ_jk.
Now, how do you
103
00:20:11,580 --> 00:20:12,649
calculate it?
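To make the three cases concrete, here is a small sketch of the sample correlation r = s_jk/(s_j s_k) on made-up toy data: points on a rising line, points on a falling line, and four points placed symmetrically on a circle. The data and the function name `correlation` are hypothetical illustrations, not the lecture's figures.

```python
import math

def correlation(x, y):
    """Sample correlation r = s_xy / (s_x * s_y); the 1/(n - 1) factors cancel."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

xj = [1, 2, 3, 4, 5]
print(correlation(xj, [2 * a + 1 for a in xj]))   # 1.0  (rising straight line)
print(correlation(xj, [10 - 3 * a for a in xj]))  # -1.0 (falling straight line)

# Four points placed symmetrically on a circle: no linear trend in any direction.
circle_x = [1, 0, -1, 0]
circle_y = [0, 1, 0, -1]
print(correlation(circle_x, circle_y))            # 0.0
```

A value of 0 only rules out a linear trend; points on a circle are still deterministically related, just not along a line.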
104
00:20:12,649 --> 00:20:25,190
You have the same n × p data set X,
and I have already shown you that we
105
00:20:25,190 --> 00:20:44,450
converted it to X*, where each observation
had its variable's mean subtracted, like this. This
106
00:20:44,450 --> 00:21:02,190
goes down to x_n1 − x̄_1, x_n2 − x̄_2, ..., x_np
− x̄_p; this is what we created earlier.
107
00:21:02,190 --> 00:21:20,049
There we took a general element x_ij and
created x*_ij, where
108
00:21:20,049 --> 00:21:33,999
x*_ij is x_ij − x̄_j. Now, let
us create another variable; let us write
109
00:21:33,999 --> 00:21:54,330
it as x̃_ij, which is (x_ij − x̄_j)
divided by s_j. What are you doing? You are
110
00:21:54,330 --> 00:21:59,769
first finding the mean-subtracted value,
and then dividing it by the corresponding
111
00:21:59,769 --> 00:22:10,009
standard deviation. I can also write it as
(x_ij − x̄_j) divided by the square root of s_jj.
112
00:22:10,009 --> 00:22:22,679
If you follow this, you will create a
matrix X̃, which will
113
00:22:22,679 --> 00:22:49,049
look like this: (x_11 − x̄_1) divided by
the square root of S_11, then (x_21 − x̄_1)
114
00:22:49,049 --> 00:23:01,269
divided by the square root of S_11; in the same
manner, for all the observations on the first variable,
115
00:23:01,269 --> 00:23:15,340
down to (x_n1 − x̄_1) divided by the square root
of S_11. For x_2 you will write (x_12 − x̄_2)
116
00:23:15,340 --> 00:23:28,450
divided by the square root of S_22, then
(x_22 − x̄_2) divided by the square root of S_22, like
117
00:23:28,450 --> 00:23:39,990
this, down to (x_n2 − x̄_2) divided by the square root of S_22.
So, going on in this manner, for the p-th variable
118
00:23:39,990 --> 00:23:52,710
you will write (x_1p − x̄_p) divided
by the square root of S_pp, then (x_2p − x̄_p)
119
00:23:52,710 --> 00:24:02,940
divided by the square root of S_pp, and in the same
manner (x_np − x̄_p) divided by the square
120
00:24:02,940 --> 00:24:22,330
root of S_pp. This is a transformed data
matrix, n × p. For x_1, all the denominators
121
00:24:22,330 --> 00:24:32,539
use S_11; for x_2, all use S_22; for x_p, all
use S_pp. See the pattern: all these
122
00:24:32,539 --> 00:24:42,029
are x_1 values; each observation has
the same mean of variable 1 subtracted,
123
00:24:42,029 --> 00:24:49,610
and each resulting quantity is divided
by the square root of S_11.
124
00:24:49,610 --> 00:24:58,649
Similarly, here it is x̄_2 and the square root of S_22,
and here x̄_p and the square root of S_pp.
125
00:24:58,649 --> 00:25:15,350
So, you see, earlier, when you looked
at the covariance and correlation
126
00:25:15,350 --> 00:25:24,960
relationship, you found that for
j and k, we basically
127
00:25:24,960 --> 00:25:32,480
divide the covariance by the corresponding
standard deviations. So, in order to achieve
128
00:25:32,480 --> 00:25:39,389
this, what we are doing here is dividing
each observation by the corresponding
129
00:25:39,389 --> 00:25:41,460
standard deviation.
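A minimal sketch of the standardization x̃_ij = (x_ij − x̄_j)/√s_jj for the same 3 × 2 example, using the n − 1 divisor for s_jj. After this step each column of X̃ has mean 0 and sample variance 1. Pure Python; the helper name `standardize` is illustrative.

```python
import math

def standardize(X):
    """Return X-tilde with entries (x_ij - xbar_j) / sqrt(s_jj), using the n - 1 divisor."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in X]

X = [[10, 100], [12, 110], [11, 105]]
Xt = standardize(X)
print(Xt)  # [[-1.0, -1.0], [1.0, 1.0], [0.0, 0.0]]
```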
130
00:25:41,460 --> 00:25:51,009
Now, multiply this by its transpose:
X̃ transpose is p × n, and if
131
00:25:51,009 --> 00:25:59,769
you take X̃ transpose times
X̃, where X̃ is n ×
132
00:25:59,769 --> 00:26:12,429
p, the resulting quantity will be
p × p. Then what happens? This will
133
00:26:12,429 --> 00:26:26,850
not be the identity matrix; you will get
(n − 1) R. If you want
134
00:26:26,850 --> 00:26:33,759
to check it, you can do so very simply. Suppose
my data matrix is like this; I will take only
135
00:26:33,759 --> 00:26:44,679
three values, x_11, x_21, x_31, and for
the second variable x_12, x_22
136
00:26:44,679 --> 00:26:50,600
and x_32.
So, in this case the data matrix is 3 ×
137
00:26:50,600 --> 00:27:01,929
2, where n is 3 and p is 2. Then what
are you creating here? You are creating X̃.
138
00:27:01,929 --> 00:27:14,220
What is it? It is nothing but (x_11 − x̄_1)
divided by s_1; I am writing s_1 only, instead
139
00:27:14,220 --> 00:27:20,820
of the square root of S_11. Then this will
be (x_21 − x̄_1)/s_1
140
00:27:20,820 --> 00:27:33,049
and (x_31 − x̄_1)/s_1, and the second column will be
(x_12 − x̄_2)/s_2,
141
00:27:33,049 --> 00:27:47,889
then (x_22 − x̄_2)/s_2, then
(x_32 − x̄_2)/s_2. What will happen
142
00:27:47,889 --> 00:28:00,909
if you now multiply? Keep track of the
variable and write it down accordingly.
143
00:28:00,909 --> 00:28:11,519
So, if I compute X̃ transpose
X̃, what happens? The first row is
144
00:28:11,519 --> 00:28:26,559
(x_11 − x̄_1)/s_1, (x_21 − x̄_1)/s_1,
(x_31 − x̄_1)/s_1. And then this
145
00:28:26,559 --> 00:28:39,239
row will be (x_12 − x̄_2)/s_2,
(x_22 − x̄_2)/s_2, (x_32 − x̄_2)/s_2;
146
00:28:39,239 --> 00:28:51,399
this is multiplied by the same quantities arranged as columns:
(x_11 − x̄_1)/s_1, (x_21 − x̄_1)/s_1,
147
00:28:51,399 --> 00:28:57,629
(x_31 − x̄_1)/s_1.
148
00:28:57,629 --> 00:29:10,950
And here (x_12 − x̄_2)/s_2, (x_22 − x̄_2)/s_2, (x_32 − x̄_2)/s_2,
149
00:29:10,950 --> 00:29:22,309
for observations 1, 2, 3. So, 2 × 3
times 3 × 2: this times this plus
150
00:29:22,309 --> 00:29:30,320
this times this plus this times this. What is
happening here? (x_11 − x̄_1)/s_1
151
00:29:30,320 --> 00:29:36,409
times (x_11 − x̄_1)/s_1: you are getting
a square. So, what are you doing then? You
152
00:29:36,409 --> 00:29:45,669
are basically getting a sum, if I write
i equal to 1 to 3, because our observations
153
00:29:45,669 --> 00:30:00,609
are 1, 2, 3, and the second subscript stands for the variable:
the sum of ((x_i1 − x̄_1) divided
154
00:30:00,609 --> 00:30:11,830
by s_1) squared.
Then what will this entry be, this row times this column?
155
00:30:11,830 --> 00:30:18,710
It is (x_11 − x̄_1)(x_12 − x̄_2) divided by s_1 s_2,
plus similar terms. See
156
00:30:18,710 --> 00:30:29,799
what you are getting here: you are also getting the sum over
i equal to 1, 2, 3 of (x_i1 − x̄_1) divided
157
00:30:29,799 --> 00:30:41,779
by s_1 times (x_i2 − x̄_2) divided by s_2.
The same quantity appears here: the sum over i equal
158
00:30:41,779 --> 00:30:54,960
to 1, 2, 3 of (x_i1 − x̄_1)/s_1 times
(x_i2 − x̄_2)/s_2. And here you will be
159
00:30:54,960 --> 00:31:05,249
getting a squared term, the sum over i equal to 1, 2, 3
of ((x_i2 − x̄_2)/s_2) squared.
160
00:31:05,249 --> 00:31:14,289
Do you see any similarity here? Are you getting
any clue? Recall what a sum
161
00:31:14,289 --> 00:31:29,149
of squares is. If I ask you what s_1 squared is,
suppose n equals 3, then what is s_1 squared, that is, S_11?
162
00:31:29,149 --> 00:31:47,450
It is 1/(n − 1) times the sum over i equal to 1 to
n of (x_i1 − x̄_1) squared.
163
00:31:47,450 --> 00:32:00,999
So, this sum is (n − 1) times
s_1 squared.
164
00:32:00,999 --> 00:32:09,979
So, ultimately,
you will get this quantity like this; we
165
00:32:09,979 --> 00:32:26,169
can write n − 1, which is 3 − 1, so
(3 − 1) times s_1 squared divided by s_1 squared. Then the second
166
00:32:26,169 --> 00:32:36,559
one is the covariance term;
it gives (n − 1) S_12 divided by s_1 s_2, where
167
00:32:36,559 --> 00:32:47,820
n is 3 here, so (3 − 1) S_12/(s_1 s_2),
and this last one also gives
168
00:32:47,820 --> 00:32:58,309
(3 − 1) s_2 squared over s_2 squared. Then if I just
take out 3 − 1, which is basically 2, then
169
00:32:58,309 --> 00:33:10,580
what I get is the matrix with rows (1, S_12/(s_1 s_2))
and (S_12/(s_1 s_2), 1).
170
00:33:10,580 --> 00:33:17,909
As a result, what we have is (n − 1) R,
and n − 1 is 2, so this is 2R.
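The same derivation, written for general n with p = 2 and using s_12 = (1/(n − 1)) times the sum of (x_i1 − x̄_1)(x_i2 − x̄_2), gives entry by entry:

```latex
\tilde{X}^{\top}\tilde{X}
= \begin{pmatrix}
\sum_{i=1}^{n}\left(\frac{x_{i1}-\bar{x}_1}{s_1}\right)^2 &
\sum_{i=1}^{n}\frac{(x_{i1}-\bar{x}_1)(x_{i2}-\bar{x}_2)}{s_1 s_2} \\
\sum_{i=1}^{n}\frac{(x_{i1}-\bar{x}_1)(x_{i2}-\bar{x}_2)}{s_1 s_2} &
\sum_{i=1}^{n}\left(\frac{x_{i2}-\bar{x}_2}{s_2}\right)^2
\end{pmatrix}
= (n-1)\begin{pmatrix}
\frac{s_1^2}{s_1^2} & \frac{s_{12}}{s_1 s_2} \\
\frac{s_{12}}{s_1 s_2} & \frac{s_2^2}{s_2^2}
\end{pmatrix}
= (n-1)\begin{pmatrix} 1 & r_{12} \\ r_{12} & 1 \end{pmatrix}
= (n-1)R
```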
171
00:33:17,909 --> 00:33:33,759
Now, the 2 cancels on both sides, so R is [1, r_12;
r_12, 1], which is [1, S_12/(s_1 s_2);
172
00:33:33,759 --> 00:33:50,350
S_12/(s_1 s_2), 1]. I think
you have seen the correlation
173
00:33:50,350 --> 00:33:58,239
and covariance relationship earlier; you have
seen what the relationship is in the population
174
00:33:58,239 --> 00:34:10,790
domain. I have told you this,
as well as this. Now, instead of ρ,
175
00:34:10,790 --> 00:34:13,350
what can you write in the sample domain?
176
00:34:13,350 --> 00:34:22,210
In the population domain, you write ρ_jk equal to σ_jk
divided by σ_j σ_k. Now, if j equals 1
177
00:34:22,210 --> 00:34:31,179
and k equals 2, then ρ_12 equals σ_12
divided by σ_1 σ_2. What exactly happens
178
00:34:31,179 --> 00:34:44,069
in the sample domain? Here ρ_jk
is replaced by
179
00:34:44,069 --> 00:34:57,099
r_jk, and σ is replaced
by s. So,
180
00:34:57,099 --> 00:35:00,670
where I wrote ρ_jk, or ρ_12 like this one,
181
00:35:00,670 --> 00:35:07,980
in the sample domain you will have
r_12 equal to S_12 divided by
182
00:35:07,990 --> 00:35:19,530
s_1 times s_2. That is why r_12 is S_12, the
covariance, divided by the corresponding standard
183
00:35:19,530 --> 00:35:33,880
deviations. So, for the same data set, can
you compute this r value? We have
184
00:35:33,880 --> 00:35:55,770
already computed what we need;
we have seen that S equals
185
00:35:55,770 --> 00:36:06,640
this. That means the standard deviations
s_1 and s_2 are known for the same
186
00:36:06,640 --> 00:36:07,109
data.
187
00:36:07,109 --> 00:36:14,970
If I want to compute R, then what will
R be? First,
188
00:36:14,970 --> 00:36:25,000
you can blindly write that the diagonal
will be 1; for r_12 the numerator is S_12, and S_12
189
00:36:25,000 --> 00:36:37,760
is 5, so you write 5 divided by the product of the
standard deviations, s_1 = 1 and s_2 = 5, so that
190
00:36:37,760 --> 00:36:59,940
gives 5/5 = 1, and R is [1 1; 1 1]. You are getting
perfect correlation; 1 means perfect correlation.
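A quick numerical check of R = X̃ᵀX̃/(n − 1) on the example data. In this data x_2 = 5 x_1 + 50 exactly (10 to 100, 12 to 110, 11 to 105), so every point lies on one rising line, which is why r_12 = 1. The function name is illustrative, and the second call uses a made-up 3 × 2 data set only to show a non-trivial off-diagonal value.

```python
import math

def transpose(A):
    return [list(c) for c in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(r, c)) for c in Bt] for r in A]

def correlation_matrix(X):
    """R = X-tilde^T X-tilde / (n - 1), where X-tilde is the standardized data."""
    n = len(X)
    cols = transpose(X)
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    Xt = [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in X]
    return [[v / (n - 1) for v in row] for row in matmul(transpose(Xt), Xt)]

X = [[10, 100], [12, 110], [11, 105]]
print(correlation_matrix(X))                         # [[1.0, 1.0], [1.0, 1.0]]
print(correlation_matrix([[1, 2], [2, 1], [3, 3]]))  # [[1.0, 0.5], [0.5, 1.0]]
```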
191
00:36:59,940 --> 00:37:11,520
So, these are the multivariate descriptive
statistics we talk about: that is,
192
00:37:11,520 --> 00:37:20,530
the mean vector, the correlation matrix and the covariance
matrix. Now, you can very easily convert
193
00:37:20,530 --> 00:37:20,839
one into the other.
194
00:37:20,839 --> 00:37:39,640
Suppose my covariance matrix is this: S_11,
S_12, ..., S_1p; S_12, S_22, ..., S_2p; and so on to S_1p, S_2p, ..., S_pp.
195
00:37:39,640 --> 00:37:56,539
My correlation matrix is this: 1, r_12, ..., r_1p;
r_12, 1, ..., r_2p; and so on to r_1p,
196
00:37:56,539 --> 00:38:13,039
r_2p, ..., 1. So, you create another diagonal matrix,
D_s. Like S and R,
197
00:38:13,039 --> 00:38:19,000
which are p × p, D_s is also a p × p matrix,
but only its diagonal elements are
198
00:38:19,000 --> 00:38:30,220
the variance components; the off-diagonal elements are
0. So, it is S_11, 0, ..., 0; 0, S_22, 0, ..., 0; and so on to
199
00:38:30,220 --> 00:38:45,359
0, ..., 0, S_pp. So, the diagonal elements of
D_s are the diagonal elements of the covariance
200
00:38:45,359 --> 00:38:51,980
matrix, and the off-diagonal elements are 0.
You create it like this, and suppose you know
201
00:38:51,980 --> 00:39:05,980
S; then with one trick you can
find R as D_s to the power minus half, times S,
202
00:39:05,980 --> 00:39:15,460
times D_s to the power minus half; in both places it is minus
half. If you use
203
00:39:15,460 --> 00:39:20,730
MATLAB, you can directly
calculate all those things from the
204
00:39:20,730 --> 00:39:26,770
data, but suppose you want to do the conversion yourself;
you can do this in Excel as well. What
205
00:39:26,770 --> 00:39:34,950
are you doing? Most of the time you may know
the variance components; once you
206
00:39:34,950 --> 00:39:39,170
know S, you know the variances and the covariances
as well, and you want to calculate R.
207
00:39:39,170 --> 00:39:44,829
Suppose instead you know the variance components
and the correlation matrix,
208
00:39:44,829 --> 00:39:50,549
and you want to go from the correlation matrix
to the covariance matrix. What you have to
209
00:39:50,549 --> 00:40:10,339
do is write S equal to D_s to the power half, times R, times D_s to the power half. So, the first is from covariance to correlation, and this one is from correlation
210
00:40:10,339 --> 00:40:17,589
to covariance. The only thing you require
in the second case is the variance components
211
00:40:17,589 --> 00:40:22,000
of all the variables considered.
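A sketch of the two conversions R = D_s^(−1/2) S D_s^(−1/2) and S = D_s^(1/2) R D_s^(1/2). Because D_s is diagonal, multiplying by it on both sides simply scales entry (j, k) by √s_jj √s_kk (or divides by it), so the code works elementwise instead of building D_s explicitly. The function names are hypothetical.

```python
import math

def cov_to_corr(S):
    """R = D^{-1/2} S D^{-1/2}: divide s_jk by sqrt(s_jj) * sqrt(s_kk)."""
    p = len(S)
    d = [math.sqrt(S[j][j]) for j in range(p)]
    return [[S[j][k] / (d[j] * d[k]) for k in range(p)] for j in range(p)]

def corr_to_cov(R, variances):
    """S = D^{1/2} R D^{1/2}: multiply r_jk by sqrt(s_jj) * sqrt(s_kk)."""
    p = len(R)
    d = [math.sqrt(v) for v in variances]
    return [[R[j][k] * d[j] * d[k] for k in range(p)] for j in range(p)]

S = [[1.0, 5.0], [5.0, 25.0]]       # covariance matrix from the worked example
R = cov_to_corr(S)
print(R)                            # [[1.0, 1.0], [1.0, 1.0]]
print(corr_to_cov(R, [1.0, 25.0]))  # [[1.0, 5.0], [5.0, 25.0]]
```

As the lecture notes, going back from R to S needs only the variances S_11, ..., S_pp in addition to R.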
212
00:40:22,000 --> 00:40:30,940
Another important concept in multivariate
data analysis is the sum of squares and cross products
213
00:40:30,940 --> 00:40:56,109
matrix, known as SSCP. If
you recall, when we calculated the covariance
214
00:40:56,109 --> 00:41:08,420
and correlation matrices, we used the formula
(n − 1) S equals X* transpose
215
00:41:08,420 --> 00:41:17,490
X*, and we used
(n − 1) R equals X̃ transpose X̃,
216
00:41:17,490 --> 00:41:28,640
where both X* and X̃ are
transformed matrices derived from the original
217
00:41:28,640 --> 00:41:38,140
data matrix X.
So, suppose I write X like this: x_11,
218
00:41:38,140 --> 00:41:55,020
x_21, x_31 in the first column and x_12, x_22, x_32 in the second.
Now, you calculate X transpose X; what
219
00:41:55,020 --> 00:42:05,710
will happen here? You will get
the sum of x_i1 squared for i equal to 1 to 3 here,
220
00:42:05,710 --> 00:42:23,490
the sum over i equal to 1 to 3 of x_i1 x_i2 here, again the sum
of x_i1 x_i2 here, and here the sum of x_i2 squared
221
00:42:23,490 --> 00:42:31,480
over i equal to 1 to 3. I have taken a 3 × 2 matrix.
If you multiply, you get this structure, because
222
00:42:31,480 --> 00:42:37,450
earlier, in the mean-subtracted case and
in the case divided by the standard deviation, we
223
00:42:37,450 --> 00:42:39,039
got similar formulas.
224
00:42:39,039 --> 00:42:45,890
Now, if you think of the same thing from the
point of view of p variables, then X transpose
225
00:42:45,890 --> 00:42:53,880
X will be p × p, because X transpose
is p × n and X is n × p,
226
00:42:53,880 --> 00:43:00,289
and you will get something like this. So, your
matrix will be: the sum of x_i1
227
00:43:00,289 --> 00:43:15,900
squared, then the sum of x_i1 x_i2, ..., the sum of x_i1 x_ip;
then the sum of x_i1 x_i2, the sum of x_i2 squared, ..., then
228
00:43:15,900 --> 00:43:29,349
the sum of x_i2 x_ip. Similarly, I write all
those things down to the sum of x_i1 x_ip, the sum of x_i2 x_ip, ..., then
229
00:43:29,349 --> 00:43:49,579
the sum of x_ip squared. So, it is a p × p matrix,
and in every sum i runs from 1 to n,
230
00:43:49,579 --> 00:44:11,390
in all cases.
These 3 matrices, X transpose X, X*
231
00:44:11,390 --> 00:44:19,890
transpose X*, and X̃ transpose X̃,
are all sum of squares and cross products
232
00:44:19,890 --> 00:44:28,230
matrices, all SSCP. Why? You see,
all the diagonal elements
233
00:44:28,230 --> 00:44:40,809
are sums of squares and all the off-diagonal elements are cross products.
234
00:44:40,809 --> 00:44:51,020
So, the sums of squares lead to the variances and the cross products to the covariances,
235
00:44:51,020 --> 00:44:58,670
from this matrix as well. Once you know these matrices, we can use them
236
00:44:58,670 --> 00:45:03,520
to calculate descriptive statistics like
the covariance and correlation matrices. This one
237
00:45:03,520 --> 00:45:16,180
is a very important matrix; later on, particularly
in regression, you will be using this matrix.
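A sketch computing the three SSCP matrices named above for the 3 × 2 example. The raw XᵀX holds plain sums of squares and cross products; the centered and standardized versions equal (n − 1)S and (n − 1)R respectively. The helper names are illustrative and the code uses only the standard library.

```python
import math

def transpose(A):
    return [list(c) for c in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(r, c)) for c in Bt] for r in A]

def sscp(A):
    """Sum of squares and cross products matrix A^T A (p x p)."""
    return matmul(transpose(A), A)

X = [[10, 100], [12, 110], [11, 105]]
n = len(X)
means = [sum(c) / n for c in transpose(X)]
Xstar = [[x - m for x, m in zip(row, means)] for row in X]
sds = [math.sqrt(sum(v * v for v in c) / (n - 1)) for c in transpose(Xstar)]
Xtilde = [[v / s for v, s in zip(row, sds)] for row in Xstar]

print(sscp(X))       # raw:          [[365, 3475], [3475, 33125]]
print(sscp(Xstar))   # centered:     [[2.0, 10.0], [10.0, 50.0]]  = (n - 1) S
print(sscp(Xtilde))  # standardized: [[2.0, 2.0], [2.0, 2.0]]     = (n - 1) R
```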
238
00:45:16,180 --> 00:45:28,839
Now, let us see how to calculate these; suppose
this is the problem given:
239
00:45:28,839 --> 00:45:37,630
compute S and R for the given data. This
data set you have seen earlier, so I have
240
00:45:37,630 --> 00:45:45,809
used Excel only.
241
00:45:45,809 --> 00:45:52,950
Using Excel I have set this up; this is my data
matrix. I want to compute the mean, so, as
242
00:45:52,950 --> 00:46:02,089
there are n data points, I created a vector of ones
with n entries. Here my n is 12,
243
00:46:02,089 --> 00:46:12,289
so it is a 12 × 1 vector. Then x̄
is 1/12 times X transpose 1. When I multiply all
244
00:46:12,289 --> 00:46:23,289
those things, I get these values. So, the profit
mean is 10.67, the sales volume mean is 1002.75,
245
00:46:23,289 --> 00:46:35,299
7.92 is for absenteeism, then breakdown is
59.33, and 1.06 is for the M ratio. So, your
246
00:46:35,299 --> 00:46:39,930
first step will be this: find x̄.
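The mean step x̄ = (1/n) Xᵀ1, with 1 an n × 1 vector of ones, can be sketched as below. The 12-observation spreadsheet shown in the lecture is not reproduced in this transcript, so the sketch reuses the earlier 3 × 2 example instead; `mean_via_ones` is a hypothetical name.

```python
def transpose(A):
    return [list(c) for c in zip(*A)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def mean_via_ones(X):
    """xbar = (1/n) X^T 1, where 1 is an n x 1 vector of ones."""
    n = len(X)
    ones = [1] * n
    return [total / n for total in matvec(transpose(X), ones)]

# Small stand-in data (the 12 x 5 spreadsheet from the lecture is not available here).
X = [[10, 100], [12, 110], [11, 105]]
print(mean_via_ones(X))  # [11.0, 105.0]
```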
247
00:46:39,930 --> 00:46:46,029
Once you use this formulation, then
what do you require next? You require
248
00:46:46,029 --> 00:46:59,099
to calculate S. You need to convert these
x values to x*, that is, subtract the mean from each; for example,
249
00:46:59,099 --> 00:47:07,099
10 minus 10.67, which is why −0.67 appears
here in X*. Once you get this, this is the
250
00:47:07,099 --> 00:47:15,760
formula: S is 1/(n − 1) times X* transpose
X*, and this gives you these values.
251
00:47:15,820 --> 00:47:25,420
On the left-hand side, in the bottom portion,
this is nothing but the SSCP matrix X*
252
00:47:25,420 --> 00:47:36,369
transpose X*. You can do the same thing
now for X̃, which is what we are interested in.
253
00:47:36,369 --> 00:47:42,990
Here this is basically X̃, and this one is X̃
transpose X̃; on this slide the transpose
254
00:47:42,990 --> 00:47:51,000
is missing from the label. And R is 1/11 times
X̃ transpose X̃, so you get something like
255
00:47:51,000 --> 00:47:58,220
this. You see, once you calculate it this way,
you will get all the diagonal elements equal to 1; if
256
00:47:58,220 --> 00:48:18,000
you do not get that, there is a problem. Do you
have any questions so far? In the
257
00:48:18,000 --> 00:48:24,670
next class I will be explaining this in detail,
when we start the multivariate normal
258
00:48:24,670 --> 00:48:34,829
distribution. What have we assumed here? We
have assumed that x is a variable
259
00:48:34,829 --> 00:48:41,910
vector (x_1, x_2, ..., x_p), which is p × 1.
260
00:48:41,910 --> 00:48:48,730
And we assume that it follows a normal distribution,
that is, a multivariate normal, which will be
261
00:48:48,730 --> 00:49:00,210
denoted N_p(μ, Σ).
Now, you are well accustomed to the nomenclature.
262
00:49:00,210 --> 00:49:08,430
Nomenclature in the sense that you know the mean
is a mean vector: if the variable vector is p × 1,
263
00:49:08,430 --> 00:49:20,789
then the mean is again p × 1, that is,
(μ_1, μ_2, ..., μ_p). And you also know this one; this
264
00:49:20,789 --> 00:49:34,529
is nothing but the covariance matrix, a p × p matrix with rows σ_11, σ_12, ..., σ_1p;
265
00:49:34,640 --> 00:49:44,680
σ_12, σ_22, ..., σ_2p; and so on down to σ_1p, σ_2p, ...,
σ_pp.
266
00:49:44,680 --> 00:49:56,510
The multivariate normal distribution of p variables is characterized
by its parameters: μ, which
267
00:49:56,510 --> 00:50:10,240
is the mean vector, and Σ, the covariance matrix. So,
please remember these are population parameters;
268
00:50:10,240 --> 00:50:24,480
these two are population parameters. So far
we have not discussed whether the
269
00:50:24,480 --> 00:50:30,990
data come from a multivariate normal or
not, but ultimately we will be going to the multivariate
270
00:50:30,990 --> 00:50:37,500
normal distribution, because most of the models
we will use rely on the assumption of multivariate
271
00:50:37,500 --> 00:50:47,890
normality. Now, when p equals 1, that is
the univariate normal, what is the probability
272
00:50:47,890 --> 00:50:50,579
density function of the univariate normal?
273
00:50:50,579 --> 00:51:00,410
Suppose X is a random variable which is univariate
normal with mean μ and variance σ squared. Then,
274
00:51:00,410 --> 00:51:08,900
if I want to know the probability
density function of X, that is, f(x), the pdf,
275
00:51:08,900 --> 00:51:18,099
you will write f(x) = 1/√(2πσ²) times
e to the power minus half ((x − μ)/σ) squared,
276
00:51:18,099 --> 00:51:30,170
where minus infinity < x < plus infinity.
So, this is
277
00:51:30,170 --> 00:51:45,529
what you have seen earlier; this is what
our normal distribution is. So, what will
278
00:51:45,529 --> 00:51:53,089
be the equivalent distribution when the number
of variables is more than 1? That will be our
279
00:51:53,089 --> 00:51:59,480
starting point in the next class.
Thank you.
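For reference, the univariate normal density quoted above, f(x) = 1/√(2πσ²) · exp(−½((x − μ)/σ)²), as a small function. Taking the variance σ² (rather than σ) as the parameter is a choice made here to match the N(μ, σ²) notation; the name `normal_pdf` is illustrative.

```python
import math

def normal_pdf(x, mu, sigma2):
    """f(x) = 1 / sqrt(2*pi*sigma^2) * exp(-0.5 * ((x - mu) / sigma)^2), -inf < x < inf."""
    sigma = math.sqrt(sigma2)
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi * sigma2)

# Peak of the standard normal: 1/sqrt(2*pi), roughly 0.3989.
print(normal_pdf(0.0, 0.0, 1.0))
```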