1
00:00:16,570 --> 00:00:24,500
So, we have been looking at iterative methods
for solving linear algebraic equations, and
2
00:00:24,500 --> 00:00:31,160
we have looked at the Gauss-Seidel, Jacobi, and relaxation
methods and their variants. We have also looked
3
00:00:31,160 --> 00:00:44,730
at their convergence behaviour: we analysed convergence
behaviour, and we know how to ensure convergence
4
00:00:44,730 --> 00:00:51,780
by modifying the problem, and so on. Now, there
is one more iterative method which is quite
5
00:00:51,780 --> 00:01:01,730
popular and which also converges pretty fast.
It is a numerical-optimization-based method.
6
00:01:01,730 --> 00:01:09,360
One of the reasons for covering it
is that it will also be useful when we move
7
00:01:09,360 --> 00:01:23,909
on to nonlinear algebraic equations.
As I said, we want to solve the problem
8
00:01:23,909 --> 00:01:37,520
$Ax = b$, and this can be solved by minimizing
$(Ax-b)^T(Ax-b)$ with respect to $x$. If
9
00:01:37,520 --> 00:01:46,180
I minimize this with respect to $x$, then I
reach the solution of $Ax = b$. In fact, I reach
10
00:01:46,180 --> 00:01:58,310
the solution: if I call this $\phi$, then $\partial\phi/\partial x = 0$,
the necessary condition for optimality,
11
00:01:58,310 --> 00:02:07,820
turns out to be $A^T(Ax-b) = 0$.
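As a rough numerical check of this condition (a sketch, not part of the lecture): the matrix `A` and vector `b` below are made-up values, and `matvec`, `transpose`, `residual`, `phi`, `grad_phi` and `fd_grad` are hypothetical helper names. The code compares the analytic gradient $2A^T(Ax-b)$ of $(Ax-b)^T(Ax-b)$ with central finite differences, and shows that it vanishes at the exact solution of $Ax = b$.

```python
# Illustrative data (assumed, not from the lecture): a nonsymmetric, nonsingular 2x2 system.
A = [[3.0, 1.0],
     [2.0, 4.0]]
b = [9.0, 16.0]


def matvec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def transpose(M):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*M)]


def residual(x):
    """Return Ax - b."""
    return [ri - bi for ri, bi in zip(matvec(A, x), b)]


def phi(x):
    """Least-squares objective (Ax - b)^T (Ax - b)."""
    return sum(r * r for r in residual(x))


def grad_phi(x):
    """Analytic gradient 2 A^T (Ax - b)."""
    return [2.0 * g for g in matvec(transpose(A), residual(x))]


def fd_grad(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g


x_test = [0.5, -1.0]
print(grad_phi(x_test))       # analytic gradient
print(fd_grad(phi, x_test))   # finite-difference gradient, should agree

x_star = [2.0, 3.0]           # solves Ax = b exactly
print(grad_phi(x_star))       # A^T(Ax - b) = 0 here, so this prints [0.0, 0.0]
```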
12
00:02:07,820 --> 00:02:15,560
Obviously, if $A$ is nonsingular and the necessary
condition is satisfied, you will reach the
13
00:02:15,560 --> 00:02:23,181
optimum. What is the second
derivative here? The second derivative will
14
00:02:23,181 --> 00:02:27,610
be $2A^TA$, which is symmetric positive
definite, so you are actually reaching the
15
00:02:27,610 --> 00:02:34,669
global minimum. Yesterday somebody had
a doubt: will the iterative methods that we have
16
00:02:34,669 --> 00:02:40,730
looked at give you local solutions
or the global solution?
17
00:02:40,730 --> 00:02:46,570
For the Jacobi, Gauss-Seidel, and relaxation
methods: if those methods are converging, if
18
00:02:46,570 --> 00:02:49,920
you know that they are converging, they will
converge to the global solution. For linear
19
00:02:49,920 --> 00:02:54,200
algebraic equations there is no distinction between
local and global solutions; if they are converging,
20
00:02:54,200 --> 00:02:59,769
they will converge to the global solution.
Now, of course, we can simplify this a little
21
00:02:59,769 --> 00:03:08,950
if $A$ is a symmetric positive definite matrix.
In that case we can just minimize, with respect
22
00:03:08,950 --> 00:03:20,799
to $x$, $\frac{1}{2}x^TAx - x^Tb$.
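A small sketch of this claim, assuming an illustrative 2x2 symmetric positive definite `A` and vector `b` (values invented for the example; `quad_phi` is a hypothetical name): it solves $Ax = b$ directly and checks that no random perturbation of that solution gives a smaller value of $\frac{1}{2}x^TAx - x^Tb$.

```python
import random

# Illustrative symmetric positive definite system (values assumed).
A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]


def quad_phi(x):
    """phi(x) = 1/2 x^T A x - x^T b."""
    Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return 0.5 * sum(xi * axi for xi, axi in zip(x, Ax)) - sum(xi * bi for xi, bi in zip(x, b))


# Solve the 2x2 system Ax = b by Cramer's rule (fine for a tiny example).
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
x_star = [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
          (A[0][0] * b[1] - b[0] * A[1][0]) / det]

# Evaluate phi at many random points near x_star.
random.seed(0)
perturbed = []
for _ in range(1000):
    d = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
    perturbed.append(quad_phi([x_star[0] + d[0], x_star[1] + d[1]]))

print(x_star)                            # solution of Ax = b
print(quad_phi(x_star), min(perturbed))  # phi at x_star is the smallest value seen
```

For SPD $A$ the gap is exactly $\frac{1}{2}d^TAd > 0$ for any perturbation $d \neq 0$, which is why the check always passes.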
23
00:03:20,799 --> 00:03:28,600
If $A$ is symmetric positive definite, which is a
special case,
24
00:03:28,600 --> 00:03:37,600
then minimizing this objective function will
give you the optimum. Now I want to present a general
25
00:03:37,600 --> 00:03:48,419
method called the gradient-based optimization
method. It is described in my notes
26
00:03:48,419 --> 00:03:59,470
in Appendix D, on page 48.
I want to solve this using a
27
00:03:59,470 --> 00:04:08,230
numerical search; I do not want to use this
optimality condition directly, I do not want to take this
28
00:04:08,230 --> 00:04:10,479
condition and solve it.
29
00:04:10,479 --> 00:04:16,321
If I take this condition and solve
it for $x$, I would need either an iterative method
30
00:04:16,321 --> 00:04:19,950
or a direct method. I do not want
to go into that; I do not want to
31
00:04:19,950 --> 00:04:25,280
use the Gauss-Seidel method or anything like it. I want
to use an iterative scheme which is based on
32
00:04:25,280 --> 00:04:33,850
optimization techniques. So I am going
33
00:04:33,850 --> 00:04:43,470
to present this gradient method, which is also called
the steepest descent method.
34
00:04:43,470 --> 00:04:52,380
Right now I am going to be concerned with
developing an iterative method for minimizing,
35
00:04:52,380 --> 00:05:03,710
with respect to $x$, some objective function
$\phi(x)$, where $\phi: \mathbb{R}^n \to \mathbb{R}$. $\phi(x)$ is
36
00:05:03,710 --> 00:05:10,730
a scalar objective function, some scalar objective
function. It need not be a norm; it need not be
37
00:05:10,730 --> 00:05:15,690
anything special; it is some objective function that you have
defined. It need not always be positive;
38
00:05:15,690 --> 00:05:19,350
I am not worried about that. I am just concerned
with a scalar objective function, so it maps
39
00:05:19,350 --> 00:05:21,730
$\mathbb{R}^n$ to $\mathbb{R}$.
40
00:05:21,730 --> 00:05:28,730
I want to come up with an iterative scheme
to reach a local minimum in this setting,
41
00:05:28,730 --> 00:05:36,620
because in general $\phi(x)$ need not be
nicely behaved. After I derive
42
00:05:36,620 --> 00:05:46,330
it, I want to apply it to this specific case.
So the purpose is twofold:
43
00:05:46,330 --> 00:05:52,810
one is to introduce you to gradient-based
methods and their variants, which are very
44
00:05:52,810 --> 00:05:57,990
useful in optimization, and I will show you
their applications later.
45
00:05:57,990 --> 00:06:06,110
So, a numerical search based on the gradient,
and then of course we will apply it to our specific
46
00:06:06,110 --> 00:06:14,280
problem, that is, solving linear algebraic equations.
This method is also known as steepest
47
00:06:14,280 --> 00:06:20,980
descent; you may have done this in your undergraduate studies,
I am not too sure. Steepest descent
48
00:06:20,980 --> 00:06:31,830
is also called the Cauchy method; it is known
by various names, such as the gradient-based method.
49
00:06:31,830 --> 00:06:42,850
The basic idea is this: suppose I look at a level
surface. What is a level surface?
50
00:06:42,850 --> 00:06:59,540
A level surface is the set of all points $x$
such that $\phi(x) = \text{constant}$.
51
00:06:59,540 --> 00:07:08,840
$\phi$ is a scalar objective function, right?
So we look at $\phi(x) = \text{constant}$.
52
00:07:08,840 --> 00:07:18,720
I want to look at level surfaces, that is, I
want to look at the locus of such $x$. Let us say it
53
00:07:18,720 --> 00:07:25,770
is a 2-dimensional problem, where $x$ is a vector
in two dimensions, $(x_1, x_2)$. I am actually
54
00:07:25,770 --> 00:07:36,840
looking at curves like these:
55
00:07:36,840 --> 00:07:48,440
say this is $c_1$, this is $c_2$, this is
$c_3$, this is $c_4$, and so on. This is my $(x_1, x_2)$
56
00:07:48,440 --> 00:07:52,360
plane.
57
00:07:52,360 --> 00:08:02,630
I am plotting all those points in the $(x_1, x_2)$ plane
for which $\phi(x_1, x_2) = \text{constant}$. So let us
58
00:08:02,630 --> 00:08:07,699
say this is 5, this is 4, this is 3, this is
2. I am plotting the locus of all
59
00:08:07,699 --> 00:08:19,560
such points; these are called level surfaces.
I am not plotting $\phi(x)$ itself in this plot.
60
00:08:19,560 --> 00:08:32,180
Actually, if you make a
3-dimensional plot of $x_1$, $x_2$ and $\phi$, each curve
61
00:08:32,180 --> 00:08:38,579
is nothing but a horizontal cross-section
projected onto the $(x_1, x_2)$ plane; it is a set of points.
62
00:08:38,579 --> 00:08:48,810
Have you seen the MATLAB logo? The MATLAB
logo is like one peak. Now, if you
63
00:08:48,810 --> 00:08:55,470
take it as an objective function, say
the height above the ground, or take a mountain,
64
00:08:55,470 --> 00:09:03,640
the height of the mountain above the ground surface
is the objective function. I am trying
65
00:09:03,640 --> 00:09:10,561
to find the set of all points where the height
is constant. How will you get it? Take
66
00:09:10,561 --> 00:09:19,380
a horizontal plane, cut the surface with it, and project the cut onto
the $xy$-plane; you will get the set of all points.
67
00:09:19,380 --> 00:09:21,360
So these curves are such sets of points.
68
00:09:21,360 --> 00:09:30,470
What is $\phi(x)$? View $\phi(x)$ as a height
and $(x_1, x_2)$ as ground locations. If you take
69
00:09:30,470 --> 00:09:35,010
a constant level, you get a level surface;
probably the name relates
70
00:09:35,010 --> 00:09:44,750
to this idea of a level. So they are called level
surfaces. Now I am going to use the local
71
00:09:44,750 --> 00:09:54,040
behaviour of these level surfaces to come up
with an iteration scheme for solving this minimization
72
00:09:54,040 --> 00:09:57,810
problem. For the time being, I am going to forget
about solving linear algebraic equations.
73
00:09:57,810 --> 00:10:03,940
I am just concentrating on this general problem,
some $\phi(x)$; it need not be this particular $\phi(x)$,
74
00:10:03,940 --> 00:10:09,640
any $\phi(x)$, not a specific one.
75
00:10:09,640 --> 00:10:22,620
So what I am going to do now: let us say
I have some guess solution $x_k$.
76
00:10:22,620 --> 00:10:29,560
$x_k$ is some guess; it is unlikely
to be the minimizer, but it is my guess. What
77
00:10:29,560 --> 00:10:34,550
is the philosophy of iterative methods? You
start with a guess and then you move on to the
78
00:10:34,550 --> 00:10:40,670
next guess, right? Start with one guess, move
on to the next guess, and hope that the iterations
79
00:10:40,670 --> 00:10:47,640
converge to the solution. In this case,
what it converges to will be a local solution.
80
00:10:47,640 --> 00:10:52,110
In some cases it will converge to the
global solution, but that depends: it depends
81
00:10:52,110 --> 00:10:59,240
on the problem and on your initial
guess. If the problem is highly nonlinear
82
00:10:59,240 --> 00:11:07,010
with funny shapes, the result depends on the guess; if the problem
is one which has only one peak or one valley,
83
00:11:07,010 --> 00:11:14,180
then you know it will reach the global minimum.
So $x_k$ is my guess solution. Our
84
00:11:14,180 --> 00:11:17,240
good old friend is Taylor's theorem.
85
00:11:17,240 --> 00:11:39,700
I am going to apply Taylor's theorem
to $\phi(x)$. I am going to write $\phi(x)$ as $\phi(x_k + (x - x_k))$,
86
00:11:39,700 --> 00:11:56,540
which is the same as $\phi(x_k + \Delta x_k)$, where
obviously $\Delta x_k = x - x_k$. So this is my
87
00:11:56,540 --> 00:12:07,730
$x - x_k$; this is $\Delta x_k$. If I do a Taylor
series approximation in the neighborhood of
88
00:12:07,730 --> 00:12:28,930
$x_k$, then $\phi(x)$ is approximately $\phi(x_k)$
plus a term involving the gradient of $\phi$.
89
00:12:28,930 --> 00:12:44,670
Let us develop a notation: let us write
this as $\nabla\phi(x_k)$,
90
00:12:44,670 --> 00:13:02,100
the gradient of $\phi$ evaluated at $x_k$; that is
what I mean. So $\phi(x) \approx \phi(x_k) + [\nabla\phi(x_k)]^T\Delta x_k$,
91
00:13:02,100 --> 00:13:10,000
and there will be higher-order terms.
I am neglecting the higher-order terms; I am looking
92
00:13:10,000 --> 00:13:19,450
locally, in the neighborhood of $x_k$, at how this
function behaves. How does this function
93
00:13:19,450 --> 00:13:26,510
behave in the neighborhood of $x_k$? Then
I want to look at the level surface, that is,
94
00:13:26,510 --> 00:13:47,760
$\phi(x) = c$. In a small neighborhood of
some point $x_k$, I get this approximation
95
00:13:47,760 --> 00:13:53,950
of $\phi(x)$.
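To see what neglecting the higher-order terms costs, here is a sketch using an arbitrary smooth objective chosen only for illustration, $\phi(x) = e^{x_1} + x_1x_2^2$ (not from the lecture; the expansion point and direction are also assumed). It compares $\phi(x_k + \Delta x_k)$ with the linear model $\phi(x_k) + [\nabla\phi(x_k)]^T\Delta x_k$; halving the step should cut the error by roughly a factor of 4, because the neglected terms start at second order.

```python
import math


def phi(x):
    """An arbitrary smooth, non-quadratic objective chosen for illustration."""
    return math.exp(x[0]) + x[0] * x[1] ** 2


def grad_phi(x):
    """Gradient of phi, computed by hand."""
    return [math.exp(x[0]) + x[1] ** 2, 2.0 * x[0] * x[1]]


x_k = [0.3, -0.7]        # expansion point (assumed)
direction = [1.0, 2.0]   # direction of Delta x_k (assumed)
g = grad_phi(x_k)

errors = []
for h in [0.01, 0.005, 0.0025]:
    dx = [h * d for d in direction]                              # Delta x_k
    linear = phi(x_k) + sum(gi * di for gi, di in zip(g, dx))    # first-order Taylor model
    exact = phi([xi + di for xi, di in zip(x_k, dx)])
    errors.append(abs(exact - linear))

ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(errors)
print(ratios)   # each close to 4: the error is second order in the step
```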
96
00:13:53,950 --> 00:14:10,900
What happens at $x = x_k$?
Then $\Delta x_k = 0$. This means that at $x = x_k$, if
97
00:14:10,900 --> 00:14:24,950
I am looking at a level surface,
$\phi(x_k) = c$ at that point. Suppose
98
00:14:24,950 --> 00:14:33,810
we go back to the figure, and let us say this
is my $x_k$. I am trying to
99
00:14:33,810 --> 00:14:46,820
model this curve locally. You will see
that I will actually model it using the tangent,
100
00:14:46,820 --> 00:14:54,269
using the tangent plane;
that will become clear soon. So what is
101
00:14:54,269 --> 00:14:59,510
the simplest approximation? This curve is
there.
102
00:14:59,510 --> 00:15:04,860
What is the simplest approximation you can
construct? A straight line. Locally, for a small
103
00:15:04,860 --> 00:15:09,340
neighborhood, you can construct a straight
line approximating the curve; that is what
104
00:15:09,340 --> 00:15:16,380
I am doing. How do I get the slope of the straight
line? I get it through the Taylor series.
105
00:15:16,380 --> 00:15:22,980
So the local slope of this line comes from the Taylor
series; the Taylor series is my vehicle
106
00:15:22,980 --> 00:15:32,430
for constructing the local approximation. Now,
$\phi(x_k)$ is the constant $c$; if I substitute it here,
107
00:15:32,430 --> 00:15:34,399
what will I get?
108
00:15:34,399 --> 00:15:44,589
On the level surface both $\phi(x)$ and $\phi(x_k)$ equal $c$, so $c = c + [\nabla\phi(x_k)]^T\Delta x_k$. So what is the local
behaviour of the curve? This implies that
109
00:15:44,589 --> 00:16:02,610
$[\nabla\phi(x_k)]^T\Delta x_k = 0$.
Is everyone with me on this? This is a scalar
110
00:16:02,610 --> 00:16:10,209
quantity, by the way. The gradient is a vector,
and $\Delta x_k$ is also a vector,
111
00:16:10,209 --> 00:16:23,829
so their inner product
is 0. Geometrically, what does it mean?
112
00:16:23,829 --> 00:16:34,130
The gradient is perpendicular to $\Delta x_k$:
$x - x_k$ is locally perpendicular to the
113
00:16:34,130 --> 00:16:36,329
gradient.
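A sketch of this orthogonality on an assumed example, $\phi(x) = x_1^2 + 4x_2^2$, whose level sets $\phi = c$ are ellipses (the function and the level value $c = 4$ are illustrative choices, not from the lecture). The code walks around the level curve, checks that each point really lies on it, and checks that the gradient is perpendicular to the curve's tangent there.

```python
import math


def phi(x1, x2):
    """Illustrative objective whose level sets phi = c are ellipses."""
    return x1 ** 2 + 4.0 * x2 ** 2


def grad_phi(x1, x2):
    return (2.0 * x1, 8.0 * x2)


c = 4.0                      # the level value (assumed)
r = math.sqrt(c)
max_level_error = 0.0
max_abs_dot = 0.0
for i in range(12):
    t = 2.0 * math.pi * i / 12
    # Point on the level curve phi(x) = c, parametrized by the angle t.
    x1, x2 = r * math.cos(t), 0.5 * r * math.sin(t)
    # Tangent vector to the level curve: derivative of the parametrization.
    t1, t2 = -r * math.sin(t), 0.5 * r * math.cos(t)
    g1, g2 = grad_phi(x1, x2)
    max_level_error = max(max_level_error, abs(phi(x1, x2) - c))
    max_abs_dot = max(max_abs_dot, abs(g1 * t1 + g2 * t2))

print(max_level_error, max_abs_dot)   # both zero up to rounding error
```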
114
00:16:36,329 --> 00:17:00,670
So locally, the gradient of $\phi$ is orthogonal to
$x - x_k$. That is what we have found. Actually,
115
00:17:00,670 --> 00:17:09,949
$[\nabla\phi(x_k)]^T\Delta x_k = 0$ is
the equation of the tangent plane to the level
116
00:17:09,949 --> 00:17:21,369
surface. In general, since we are in $n$ dimensions,
it is a tangent hyperplane
117
00:17:21,369 --> 00:17:35,789
in the $n$-dimensional space. What
I want to show here is that this local behaviour
118
00:17:35,789 --> 00:17:43,179
of the function in the neighborhood of the
point $x_k$ can be used to find the direction
119
00:17:43,179 --> 00:17:47,169
in which the function decreases at the maximum
rate.
120
00:17:47,169 --> 00:17:56,450
Suppose I am at $x_k$; let us go
back here. What am I doing? I am minimizing
121
00:17:56,450 --> 00:18:09,880
$\phi$. So if I want to move from $x_k$ to
$x_{k+1}$, in which direction should I move? I should
122
00:18:09,880 --> 00:18:19,820
move in the direction in which the function decreases
at the maximum rate. Why? The question is: what
123
00:18:19,820 --> 00:18:32,989
is the directional derivative? I want
to prove this.
124
00:18:32,989 --> 00:18:39,230
So which is the directional derivative here?
$\Delta x_k$ is the direction, and
125
00:18:39,230 --> 00:18:45,710
$[\nabla\phi(x_k)]^T\Delta x_k$ is the directional
derivative along it. So I want to show that if $\Delta x_k$
126
00:18:45,710 --> 00:18:54,090
is aligned along the gradient,
then the function increases at the maximum rate,
127
00:18:54,090 --> 00:19:03,080
and if it is aligned along the negative
gradient direction, then the function decreases
128
00:19:03,080 --> 00:19:05,369
at the maximum rate.
129
00:19:05,369 --> 00:19:15,489
So this local gradient gives me the maximum
rate of increase, and its negative gives
130
00:19:15,489 --> 00:19:20,830
me the maximum rate of decrease, and I am going
to use this local gradient to come up with
131
00:19:20,830 --> 00:19:28,470
the new point $x_{k+1}$. Before I do that,
I have to show that this is the
132
00:19:28,470 --> 00:19:33,720
direction of maximum descent. The first interpretation
that we have learnt here is that this is nothing
133
00:19:33,720 --> 00:19:41,549
but the equation of the tangent hyperplane, and
$\Delta x_k$ is perpendicular to the gradient
134
00:19:41,549 --> 00:19:43,340
locally.
135
00:19:43,340 --> 00:19:55,820
Now I am looking at a set of points $x$; I am looking
at a unit ball in the neighborhood of $x_k$.
136
00:19:55,820 --> 00:20:03,009
I am constructing a small unit ball in
the neighborhood of $x_k$, such that it
137
00:20:03,009 --> 00:20:11,070
is the set of all points with $\|\Delta x_k\|_2 \le 1$.
So if you go
138
00:20:11,070 --> 00:20:22,179
back to this figure, I am constructing a small
unit ball here such that if you pick any point
139
00:20:22,179 --> 00:20:37,779
in it, its distance from $x_k$ is at most 1. Is this clear?
I am just picking a set in which to do the analysis.
140
00:20:37,779 --> 00:20:38,779
Okay.
141
00:20:38,779 --> 00:20:44,700
Now, what is going to help me here is something
that you can probably guess. What is going
142
00:20:44,700 --> 00:20:53,950
to help me here is the Cauchy-Schwarz inequality.
$[\nabla\phi(x_k)]^T\Delta x_k$ is the inner product of this vector
143
00:20:53,950 --> 00:21:05,149
with this vector, so by the Cauchy-Schwarz inequality
$|[\nabla\phi(x_k)]^T\Delta x_k| \le \|\nabla\phi(x_k)\|_2\,\|\Delta x_k\|_2$. What is this second factor? It is the norm of $\Delta x_k$,
144
00:21:05,149 --> 00:21:26,580
right? But I am looking at the set of all
$x$ in the unit ball, so the maximum
145
00:21:26,580 --> 00:21:36,350
value $\|\Delta x_k\|_2$ can take is 1. This means,
since the maximum value it can
146
00:21:36,350 --> 00:21:37,929
take is 1,
147
00:21:37,929 --> 00:22:01,419
that this quantity, $|[\nabla\phi(x_k)]^T\Delta x_k|$,
is less than or equal to the norm
148
00:22:01,419 --> 00:22:28,869
of the gradient, $\|\nabla\phi(x_k)\|_2$. This inequality also means that
$-\|\nabla\phi(x_k)\|_2 \le [\nabla\phi(x_k)]^T\Delta x_k \le \|\nabla\phi(x_k)\|_2$.
149
00:22:28,869 --> 00:22:35,289
I have just expanded the inequality here,
since I had written an absolute value. So in a unit
150
00:22:35,289 --> 00:22:45,989
ball in the neighborhood of $x_k$, I can say
that this quantity is bounded between these
151
00:22:45,989 --> 00:22:51,500
two numbers: this is a positive number and this is
a negative number. This quantity cannot be
152
00:22:51,500 --> 00:22:55,340
smaller than $-\|\nabla\phi(x_k)\|_2$. What is the smallest value
this quantity can take?
153
00:22:55,340 --> 00:23:09,429
When will it take this value? When $\Delta x_k$
is aligned along which direction? The negative gradient
154
00:23:09,429 --> 00:23:14,850
direction. When $\Delta x_k$ is aligned along the
gradient direction or its negative, the inequality becomes
155
00:23:14,850 --> 00:23:25,179
an equality, and along the negative gradient the change is the most negative. Now, why am I concerned
about this? Let us go back and look here.
156
00:23:25,179 --> 00:23:37,889
Let me retain this figure and go back here.
157
00:23:37,889 --> 00:23:52,820
See, $\phi(x)$ is written as
$\phi(x_k + \Delta x_k)$, right? I have written it like this,
158
00:23:52,820 --> 00:24:09,269
and actually I am concerned with how this function
behaves: $\phi(x) - \phi(x_k)$.
159
00:24:09,269 --> 00:24:18,149
I want to go from $x_k$ to
a new value $x$. We say
160
00:24:18,149 --> 00:24:36,590
that in a small neighborhood this is approximately
equal to $[\nabla\phi(x_k)]^T\Delta x_k$, where the gradient of $\phi$ is given
161
00:24:36,590 --> 00:24:44,489
by this. So the behaviour of this
quantity dictates how
162
00:24:44,489 --> 00:24:53,580
this function behaves locally. Is it clear?
This is the Taylor series expansion I wrote
163
00:24:53,580 --> 00:25:01,029
some time back; I am just rearranging it.
The term $\phi(x_k)$ on the right-hand side I have taken
164
00:25:01,029 --> 00:25:03,419
to the left-hand side.
165
00:25:03,419 --> 00:25:13,639
If I move away from $x_k$ to some new $x$,
in which
166
00:25:13,639 --> 00:25:21,169
direction should I move? If I want to decrease
the function, which direction should I move in?
167
00:25:21,169 --> 00:25:26,940
I should move in the negative gradient direction,
because what is the smallest value this
168
00:25:26,940 --> 00:25:38,019
can take? See, I am restricting myself
to a unit ball around $x_k$; I want to move inside
169
00:25:38,019 --> 00:25:41,779
this unit ball. I just want to know where to move
inside this unit ball.
170
00:25:41,779 --> 00:25:46,039
What is the objective? I want to move in such
a way that the function decreases at the maximum
171
00:25:46,039 --> 00:25:53,610
rate. Now, from this Cauchy-Schwarz
inequality I know that the maximum rate of
172
00:25:53,610 --> 00:26:00,279
decrease will be obtained when $\Delta x_k$ is
aligned not along the gradient direction but
173
00:26:00,279 --> 00:26:11,100
along the negative gradient direction; then
I get the minus sign here, the lower bound $-\|\nabla\phi(x_k)\|_2$.
174
00:26:11,100 --> 00:26:19,929
When does the Cauchy-Schwarz inequality hold with equality? What
is the Cauchy-Schwarz inequality, can you tell?
175
00:26:19,929 --> 00:26:29,950
We relate the Cauchy-Schwarz inequality to the angle between the vectors: $u^Tv = \|u\|_2\|v\|_2\cos\theta$.
So I am talking about two special angles:
176
00:26:29,950 --> 00:26:41,399
one is $\theta = 0^\circ$, the other is $\theta = 180^\circ$,
the positive and negative directions. If you are
177
00:26:41,399 --> 00:26:46,820
maximizing the function, you should move along
the positive gradient direction; if
178
00:26:46,820 --> 00:26:50,899
you are minimizing the function, you should
move along the negative gradient direction,
179
00:26:50,899 --> 00:26:58,700
because then this difference will be the most negative.
When will it be most negative?
180
00:26:58,700 --> 00:27:04,729
Look here: when will it be most negative?
Along the negative gradient direction. So
181
00:27:04,729 --> 00:27:12,659
if I move along the negative gradient
direction, I will decrease the function.
182
00:27:12,659 --> 00:27:22,299
So the way I should choose my next point:
from $x_k$, when I go to $x_{k+1}$, I should
183
00:27:22,299 --> 00:27:28,029
choose my next point by moving along
the negative gradient direction, since
184
00:27:28,029 --> 00:27:34,309
I am minimizing the function. My objective
was to minimize $\phi(x)$ with respect to
185
00:27:34,309 --> 00:27:37,549
$x$.
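The Cauchy-Schwarz argument can be checked by brute force in two dimensions. In this sketch the gradient vector `g` is an assumed example value; the code sweeps unit directions $d$ around the circle and finds the one that minimizes the directional derivative $g^Td$. It should land close to $-g/\|g\|_2$, with value close to $-\|g\|_2$.

```python
import math

g = (3.0, -4.0)            # an assumed gradient vector; ||g||_2 = 5
norm_g = math.hypot(*g)

best_value, best_dir = float("inf"), None
n = 3600                   # number of unit directions to try (0.1 degree apart)
for i in range(n):
    theta = 2.0 * math.pi * i / n
    d = (math.cos(theta), math.sin(theta))   # unit vector, ||d||_2 = 1
    value = g[0] * d[0] + g[1] * d[1]        # directional derivative g^T d
    if value < best_value:
        best_value, best_dir = value, d

print(best_value, -norm_g)                           # both close to -5
print(best_dir, (-g[0] / norm_g, -g[1] / norm_g))    # both close to (-0.6, 0.8)
```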
186
00:27:37,549 --> 00:27:43,549
What I find is that, locally, the function
will decrease the most if I move along the negative
187
00:27:43,549 --> 00:27:51,919
gradient direction. What is the negative
gradient direction? If this is
188
00:27:51,919 --> 00:28:09,190
$\nabla\phi(x_k)/\|\nabla\phi(x_k)\|_2$, it is the negative of this.
Why am I dividing by the norm? Because I am looking
189
00:28:09,190 --> 00:28:18,639
at unit vectors, so this is a unit vector.
What will $[\nabla\phi(x_k)]^T$ times this be?
190
00:28:18,639 --> 00:28:27,940
It is an inner product, and the inner
product of a vector with itself is the square
191
00:28:27,940 --> 00:28:33,999
of its norm,
right?
192
00:28:33,999 --> 00:28:41,009
So if you take the inner product of the gradient with
this, you will get the square of the norm divided by
193
00:28:41,009 --> 00:28:50,340
the norm, with a minus sign: $-\|\nabla\phi(x_k)\|_2$. Is everyone
with me on this? Is this clear? If you move in
194
00:28:50,340 --> 00:28:54,600
the negative gradient direction, the
function will, locally, decrease at the maximum
195
00:28:54,600 --> 00:29:00,139
rate. So that is going to be my algorithm
for minimization.
196
00:29:00,139 --> 00:29:16,039
To find $x_{k+1}$, I am going to take $x_k$ and
move along the negative gradient direction: $x_{k+1} = x_k - \lambda g_k$,
197
00:29:16,039 --> 00:29:42,249
where $g_k = \nabla\phi(x_k)/\|\nabla\phi(x_k)\|_2$.
Is this fine? This is the direction.
198
00:29:42,249 --> 00:29:48,629
Or we can put a plus sign here and take the minus inside $g_k$; it does
not matter which way you want to look at it.
199
00:29:48,629 --> 00:29:55,159
It is the negative gradient direction; I am taking
a unit direction along the gradient.
200
00:29:55,159 --> 00:29:59,789
Of course, I am using the 2-norm; I am
not worried about other norms right now,
201
00:29:59,789 --> 00:30:07,309
so wherever I am writing
norms, these are 2-norms. So this is my negative
202
00:30:07,309 --> 00:30:14,999
gradient direction, and what is this
$\lambda$ now? I know that locally, if
203
00:30:14,999 --> 00:30:22,349
I go along the negative gradient direction, the
function is decreasing. How much do I move?
204
00:30:22,349 --> 00:30:29,349
I just know that this direction is the steepest
descent. Should I move 1 meter, 5 meters,
205
00:30:29,349 --> 00:30:31,450
10 meters? How much should I move?
206
00:30:31,450 --> 00:30:39,409
So I am going to put one unknown here, which
is the step length: $\lambda$ is my step length and
207
00:30:39,409 --> 00:30:49,340
$g_k$ is my direction. Now, how much to
move? I am going to solve another optimization
208
00:30:49,340 --> 00:30:59,320
problem. Having decided to move in this
direction, I am now going to solve this
209
00:30:59,320 --> 00:31:10,399
problem: $\lambda_k = \arg\min_{\lambda} \phi(x_k - \lambda g_k)$.
What is the difference
210
00:31:10,399 --> 00:31:16,009
between the original problem and this minimization
problem? This is a one-dimensional minimization
211
00:31:16,009 --> 00:31:17,009
problem.
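For a general $\phi$ this one-dimensional problem usually has to be solved numerically. One common option, shown here as a sketch and not as the method used in the lecture, is golden-section search on $\lambda \mapsto \phi(x_k - \lambda g_k)$ over a bracket. The objective, the starting point and the bracket $[0, 5]$ are assumptions for illustration; golden-section search is only guaranteed to work when the function is unimodal on the bracket, which holds here because this particular $\phi$ is convex.

```python
import math


def phi(x):
    """An illustrative convex, non-quadratic objective (not from the lecture)."""
    return x[0] ** 4 + x[1] ** 2 + (x[0] - x[1]) ** 2


def grad_phi(x):
    return [4.0 * x[0] ** 3 + 2.0 * (x[0] - x[1]),
            2.0 * x[1] - 2.0 * (x[0] - x[1])]


def golden_section(f, lo, hi, tol=1e-8):
    """Minimize a unimodal scalar function f on [lo, hi] by golden-section search."""
    inv_gr = (math.sqrt(5.0) - 1.0) / 2.0   # 1 / golden ratio, about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_gr * (b - a)
        d = a + inv_gr * (b - a)
        if f(c) < f(d):
            b = d   # the minimum lies in [a, d]
        else:
            a = c   # the minimum lies in [c, b]
    return 0.5 * (a + b)


x_k = [1.0, 2.0]                                   # current iterate (assumed)
grad = grad_phi(x_k)
norm = math.sqrt(sum(gi * gi for gi in grad))
g_k = [gi / norm for gi in grad]                   # unit gradient direction


def along_line(lam):
    """phi restricted to the steepest-descent line: phi(x_k - lam * g_k)."""
    return phi([xi - lam * gi for xi, gi in zip(x_k, g_k)])


lam_k = golden_section(along_line, 0.0, 5.0)       # bracket [0, 5] is an assumption
x_next = [xi - lam_k * gi for xi, gi in zip(x_k, g_k)]
print(lam_k, phi(x_k), phi(x_next))
```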
212
00:31:17,009 --> 00:31:22,070
$\lambda$ is a scalar; $\lambda$ is the step length.
The direction is fixed, and how much to move
213
00:31:22,070 --> 00:31:29,379
is given by the step length parameter.
Now, how do we solve this problem? In some cases
214
00:31:29,379 --> 00:31:34,440
this problem can be solved analytically; in
other cases it has
215
00:31:34,440 --> 00:31:44,409
to be solved numerically.
This is called a line
216
00:31:44,409 --> 00:31:52,710
search, a one-dimensional optimization problem.
It is called a line search because we know
217
00:31:52,710 --> 00:31:54,249
in which direction to move.
218
00:31:54,249 --> 00:32:01,279
We just want to find out how much to move.
In this $\phi$, $x_k$ is known,
219
00:32:01,279 --> 00:32:06,279
$g_k$ is known, and only $\lambda$ is unknown. With respect
to this one scalar I have to minimize. Of course, what
220
00:32:06,279 --> 00:32:20,799
I have to do is solve $\partial\phi/\partial\lambda = 0$;
whichever value
221
00:32:20,799 --> 00:32:27,839
satisfies this optimality condition,
I choose that value and use it as my step
222
00:32:27,839 --> 00:32:33,179
length. In some cases, if
$\phi$ is a highly complex nonlinear function,
223
00:32:33,179 --> 00:32:40,759
this has to be done using nonlinear optimization,
that is, an iterative process: you guess and then
224
00:32:40,759 --> 00:32:46,719
refine towards the minimum. I have described that,
but in the case of solving $Ax = b$ we have
225
00:32:46,719 --> 00:33:01,839
a nice structure, so we can do this analytically.
So let me go back. Is this clear? Is
226
00:33:01,839 --> 00:33:08,639
the idea clear? The line of argument is like
this: locally, the steepest descent direction, the direction
227
00:33:08,639 --> 00:33:14,859
in which the objective function decreases the most,
is the negative of the gradient.
228
00:33:14,859 --> 00:33:19,620
So you know the direction in which to move,
but you do not know
229
00:33:19,620 --> 00:33:25,450
how much to move; that is quantified by this
$\lambda$. And then we obtain $\lambda$
230
00:33:25,450 --> 00:33:38,879
by a one-dimensional minimization with respect
to $\lambda$. I am just trying to find the maximum
231
00:33:38,879 --> 00:33:52,090
possible decrease. See, I am decreasing $\phi$, right? So now,
232
00:33:52,090 --> 00:33:56,200
in one shot, when
I am taking one step,
233
00:33:56,200 --> 00:34:00,820
I would like to decrease it as much as possible.
So how do you find out how much is possible?
234
00:34:00,820 --> 00:34:06,489
Just imagine that you are going down a
slope. Let us say the slope is like
235
00:34:06,489 --> 00:34:13,869
this, and then it flattens out. Locally,
if you go down for 1 meter, your height will
236
00:34:13,869 --> 00:34:17,850
decrease, but your height might decrease even
if you go 5 meters. So how do you know
237
00:34:17,850 --> 00:34:26,250
how far to go? I know that this is the local descent direction,
but should I go 1 meter or 3 meters or 5
238
00:34:26,250 --> 00:34:29,870
meters or 9 meters? 9 meters might take me
up; I do not know.
239
00:34:29,870 --> 00:34:37,180
The contour could be like this and then
go up. So I should find out what the best
240
00:34:37,180 --> 00:34:44,990
possible step length is, so that
the decrease is as large as possible. Otherwise, see,
241
00:34:44,990 --> 00:34:54,110
just remember one thing: you are trying
to make a local move based only on the local
242
00:34:54,110 --> 00:35:01,400
derivative. There is limited information that one
derivative of a function carries, so you
243
00:35:01,400 --> 00:35:09,600
cannot take too large steps using just local
gradient information, and you should
244
00:35:09,600 --> 00:35:13,050
not take too small a step either, right?
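The step-length trade-off can be seen numerically in a sketch with an assumed elongated bowl $\phi(x) = x_1^2 + 10x_2^2$ and starting point $x_k = (10, 1)$ (both illustrative, not from the lecture). Moving along the normalized negative gradient, a tiny step decreases $\phi$ only a little, a moderate step decreases it a lot, and a large step overshoots so that $\phi$ ends up larger than where we started. The $\lambda$ values tried are arbitrary.

```python
import math


def phi(x):
    """An illustrative elongated quadratic bowl: phi(x) = x1^2 + 10 x2^2."""
    return x[0] ** 2 + 10.0 * x[1] ** 2


def grad_phi(x):
    return [2.0 * x[0], 20.0 * x[1]]


x_k = [10.0, 1.0]                 # starting point (assumed)
g = grad_phi(x_k)
norm_g = math.hypot(*g)
u = [gi / norm_g for gi in g]     # unit gradient direction

values = {}
for lam in [0.0, 0.1, 1.0, 2.5, 5.0, 10.0]:
    x_new = [xi - lam * ui for xi, ui in zip(x_k, u)]   # step along -gradient
    values[lam] = phi(x_new)
    print(f"lambda = {lam:5.1f}   phi = {values[lam]:8.3f}")
```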
245
00:35:13,050 --> 00:35:18,270
So to balance that, we introduce this
$\lambda$ and minimize the function with
246
00:35:18,270 --> 00:35:23,560
respect to $\lambda$ to find out
how much to move. Now let us see this
247
00:35:23,560 --> 00:35:47,710
applied to solving $Ax = b$. For my $\phi(x)$ here,
248
00:35:47,710 --> 00:35:58,130
for the sake of simplicity in writing, I am going
to take
249
00:35:58,130 --> 00:36:00,470
$\phi(x) = \frac{1}{2}x^TAx - x^Tb$.
250
00:36:00,470 --> 00:36:05,010
I am going to solve for the case where
$A$ is symmetric positive definite. If your matrix
251
00:36:05,010 --> 00:36:09,610
$A$ is not symmetric positive definite, you already know what to
do: pre-multiply both
252
00:36:09,610 --> 00:36:15,060
sides by $A^T$. So right now,
253
00:36:15,060 --> 00:36:19,230
for deriving the algorithm and for the sake
of notational simplicity, I am going to look
254
00:36:19,230 --> 00:36:27,750
at the case where $A$ is symmetric and positive
definite.
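A sketch of the pre-multiplication trick, using an assumed nonsymmetric but nonsingular 2x2 matrix: $A^TA$ comes out symmetric with positive leading principal minors (hence positive definite), and the normal equations $A^TAx = A^Tb$ have the same solution as $Ax = b$. Cramer's rule (`solve2`, a hypothetical helper) is used only because the system is tiny.

```python
# Illustrative nonsymmetric but nonsingular matrix (values assumed).
A = [[1.0, 2.0],
     [3.0, 1.0]]
b = [5.0, 5.0]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(M, N):
    """Matrix product M N for list-of-lists matrices."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N))) for j in range(len(N[0]))]
            for i in range(len(M))]


def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def solve2(M, r):
    """Solve a 2x2 system M x = r by Cramer's rule (adequate for this tiny example)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - r[0] * M[1][0]) / det]


AtA = matmul(transpose(A), A)   # A^T A
Atb = matvec(transpose(A), b)   # A^T b

# A^T A is symmetric, and positive leading principal minors make it positive definite.
is_symmetric = AtA[0][1] == AtA[1][0]
minors = (AtA[0][0], AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0])

print(AtA, is_symmetric, minors)   # [[10.0, 5.0], [5.0, 5.0]] True (10.0, 25.0)
print(solve2(AtA, Atb))            # normal equations A^T A x = A^T b
print(solve2(A, b))                # original system Ax = b, same solution [1.0, 2.0]
```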
255
00:36:27,750 --> 00:36:34,210
Now let us apply the algorithm. This is my
$\phi$. I have a guess solution; my guess
256
00:36:34,210 --> 00:36:56,220
solution is $x_k$. What is the
local gradient? That is, what is $\nabla\phi$? It is
257
00:36:56,220 --> 00:37:05,730
$\nabla\phi(x) = Ax - b$. Differentiate $\phi$: the term $x^TAx$
is a quadratic form with $A$ symmetric positive
258
00:37:05,730 --> 00:37:11,400
definite, so differentiating
with respect to $x$, the derivative
259
00:37:11,400 --> 00:37:24,440
of this objective function with respect to
$x$ gives you $Ax - b$. What is $\nabla\phi(x_k)$? Evaluated
260
00:37:24,440 --> 00:37:42,740
at $x_k$, your guess solution, it is
$Ax_k - b$. Is everyone with me on this?
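A quick finite-difference check, as a sketch: the 3x3 symmetric positive definite matrix, `b`, and the guess `x_k` are invented values, and the helper names are hypothetical. It confirms that the gradient of $\frac{1}{2}x^TAx - x^Tb$ is $Ax - b$ when $A$ is symmetric.

```python
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]   # illustrative symmetric positive definite matrix
b = [1.0, 2.0, 3.0]


def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def phi(x):
    """phi(x) = 1/2 x^T A x - x^T b."""
    return 0.5 * sum(xi * yi for xi, yi in zip(x, matvec(A, x))) - sum(xi * bi for xi, bi in zip(x, b))


def grad_phi(x):
    """Analytic gradient A x - b (valid because A is symmetric)."""
    return [yi - bi for yi, bi in zip(matvec(A, x), b)]


def fd_grad(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g


x_k = [0.5, -1.0, 2.0]   # an arbitrary guess
print(grad_phi(x_k))      # A x_k - b
print(fd_grad(phi, x_k))  # should agree to about six digits
```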
261
00:37:42,740 --> 00:37:51,850
So what do I want to do next? Well, I do not
always have to use the unit direction. I wrote the
262
00:37:51,850 --> 00:37:58,290
algorithm with unit directions, but I can write
it with the unnormalized direction and use $\lambda$;
263
00:37:58,290 --> 00:38:06,530
$\lambda$ will get scaled accordingly. So
I can say that I want to move now by
264
00:38:06,530 --> 00:38:28,660
$\lambda g_k$, where $g_k = Ax_k - b$.
Now you want to do the step length minimization.
265
00:38:28,660 --> 00:38:34,790
Can you do the step length minimization? Can
you solve it? Just write out
266
00:38:34,790 --> 00:38:42,100
what the step length minimization problem is
now. What will $\phi(x)$ be? What will
267
00:38:42,100 --> 00:39:28,370
$\phi(x_{k+1})$ be? It will be $\frac{1}{2}(x_k + \lambda g_k)^TA(x_k + \lambda g_k) - (x_k + \lambda g_k)^Tb$,
right? What are the things
268
00:39:28,370 --> 00:39:37,730
which are known here? I know $x_k$, and I know $g_k$,
because $g_k$ is a function of $x_k$.
269
00:39:37,730 --> 00:39:52,720
The only thing I do not know is $\lambda$. Can you tell
me what this quantity, $\partial\phi(x_{k+1})/\partial\lambda$,
270
00:39:52,720 --> 00:40:09,060
will be? I want to set it to 0. What is this
quantity? Just find out. Well, there is one
271
00:40:09,060 --> 00:40:11,110
small problem here.
272
00:40:11,110 --> 00:40:16,640
I want to move in the negative gradient
direction, so let me make one correction. I
273
00:40:16,640 --> 00:40:21,710
want to move in the negative gradient
direction. The gradient direction is $Ax_k - b$; the negative
274
00:40:21,710 --> 00:40:31,610
gradient direction is $-(Ax_k - b)$. So this
is the gradient direction, and my $g_k$, the direction
275
00:40:31,610 --> 00:40:35,880
in which I want to move, is the negative
gradient direction, so put a minus sign here: $g_k = -(Ax_k - b) = b - Ax_k$.
276
00:40:35,880 --> 00:40:53,120
what you have to do a course is expand this
what you will realize is that it terms x k
277
00:40:53,120 --> 00:40:54,540
transpose a x k will vanish
278
00:40:54,540 --> 00:40:59,460
because they are not functions of lambda you
have to only take those terms in which lambda
279
00:40:59,460 --> 00:41:05,510
will appear there will be cross terms and
a lambda square term will come out
280
00:41:05,510 --> 00:41:12,800
namely $\tfrac{1}{2}\lambda^2 g_k^T A g_k$
okay here again you can neglect the term x
281
00:41:12,800 --> 00:41:19,460
k transpose b because it is not a function
of lambda and keep only the term in lambda okay
282
00:41:19,460 --> 00:41:30,640
what do you get after you minimize? just expand it
and try; what is this quantity? you do not
283
00:41:30,640 --> 00:41:34,600
have to substitute this
284
00:41:34,600 --> 00:41:41,190
keep everything in terms of g k okay
keeping everything in terms of g k try
285
00:41:41,190 --> 00:41:52,450
to find out which value of lambda
will give you the minimum
286
00:41:52,450 --> 00:42:01,070
what i expect is that if you solve this scalar optimization
problem you should get an equation just check
287
00:42:01,070 --> 00:42:26,020
this you get an equation of the type $\lambda\, g_k^T A g_k - (b - A x_k)^T g_k = 0$,
that is $\lambda\, g_k^T A g_k - g_k^T g_k = 0$ you will
288
00:42:26,020 --> 00:42:43,610
get an equation of this type just check
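The expansion being asked for can be written out term by term. This worked derivation assumes $\phi(x) = \tfrac{1}{2}x^T A x - x^T b$ with $A$ symmetric (so $x_k^T A g_k = g_k^T A x_k$) and the corrected direction $g_k = b - A x_k$. Note that the cross term $g_k^T A x_k$ combines with $g_k^T b$, which is how the right-hand side reduces to $g_k^T g_k$.

```latex
\begin{aligned}
\phi(x_k + \lambda g_k)
  &= \tfrac{1}{2} x_k^T A x_k + \lambda\, g_k^T A x_k
     + \tfrac{1}{2}\lambda^2\, g_k^T A g_k - x_k^T b - \lambda\, g_k^T b \\
\frac{\partial \phi(x_k + \lambda g_k)}{\partial \lambda}
  &= g_k^T A x_k + \lambda\, g_k^T A g_k - g_k^T b = 0 \\
\Longrightarrow\quad \lambda\, g_k^T A g_k &= g_k^T (b - A x_k) = g_k^T g_k \\
\lambda_k &= \frac{g_k^T g_k}{g_k^T A g_k},
\qquad \frac{\partial^2 \phi}{\partial \lambda^2} = g_k^T A g_k > 0 \ \text{for } g_k \neq 0 .
\end{aligned}
```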
289
00:42:43,610 --> 00:42:54,340
if you expand this you will
get a polynomial in one variable: a lambda square
290
00:42:54,340 --> 00:42:56,470
term, a lambda term and a constant
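To see the one-variable quadratic concretely, here is a plain-Python sketch that computes the three coefficients of $\phi(x_k + \lambda d)$ as a polynomial in $\lambda$ and checks them against direct evaluation. It assumes $\phi(x) = \tfrac{1}{2}x^T A x - x^T b$ with $A$ symmetric; the 2x2 data are made up, and `matvec`, `dot`, `phi` and `line_coefficients` are hypothetical helper names.

```python
# Sketch: phi(x) = 1/2 x^T A x - x^T b with A symmetric (assumed); helper names are hypothetical.

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(A, b, x):
    return 0.5 * dot(x, matvec(A, x)) - dot(x, b)

def line_coefficients(A, b, x, d):
    """Return (c2, c1, c0) such that phi(x + lam*d) = c2*lam**2 + c1*lam + c0."""
    Ax, Ad = matvec(A, x), matvec(A, d)
    c2 = 0.5 * dot(d, Ad)              # lambda^2 term: 1/2 d^T A d
    c1 = dot(d, Ax) - dot(d, b)        # lambda term (uses x^T A d = d^T A x)
    c0 = 0.5 * dot(x, Ax) - dot(x, b)  # constant term: phi(x) itself
    return c2, c1, c0

A = [[4.0, 1.0], [1.0, 3.0]]   # assumed example data
b = [1.0, 2.0]
xk = [2.0, 1.0]
gk = [bi - ax for bi, ax in zip(b, matvec(A, xk))]   # g_k = b - A x_k

c2, c1, c0 = line_coefficients(A, b, xk, gk)
print(c2, c1, c0)   # 165.5 -73.0 7.5
for lam in (-1.0, 0.0, 0.25, 2.0):
    direct = phi(A, b, [xi + lam * gi for xi, gi in zip(xk, gk)])
    print(lam, direct, c2 * lam**2 + c1 * lam + c0)
```

With $d = g_k = b - A x_k$ the linear coefficient comes out as $-g_k^T g_k$, so the vertex of the parabola, $-c_1/(2c_2)$, is $g_k^T g_k / (g_k^T A g_k)$.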
291
00:42:56,470 --> 00:43:02,460
you will get a polynomial in only one variable
because lambda is a scalar, g is a known vector and
292
00:43:02,460 --> 00:43:15,920
x is a known vector okay so actually it turns
out that
293
00:43:15,920 --> 00:43:33,400
lambda k which minimizes this is nothing but
g k transpose g k divided by g k transpose
294
00:43:33,400 --> 00:43:44,440
a g k, that is $\lambda_k = g_k^T g_k / (g_k^T A g_k)$
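The step-length formula can be checked numerically. This plain-Python sketch assumes $\phi(x) = \tfrac{1}{2}x^T A x - x^T b$ with $A$ symmetric positive definite and uses made-up 2x2 data; `optimal_step` and the other helpers are hypothetical names. It compares $\lambda_k$ with a brute-force search over nearby step lengths and also checks the standard property of an exact line search, that the new residual $g_{k+1}$ is orthogonal to $g_k$ (since $g_{k+1} = g_k - \lambda_k A g_k$).

```python
# Sketch of lambda_k = g_k^T g_k / (g_k^T A g_k), assuming phi(x) = 1/2 x^T A x - x^T b
# with A symmetric positive definite. Helper names and data are hypothetical examples.

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(A, b, x):
    return 0.5 * dot(x, matvec(A, x)) - dot(x, b)

def optimal_step(A, g):
    """Exact line-search step along g; needs g != 0 and A SPD so the denominator is positive."""
    return dot(g, g) / dot(g, matvec(A, g))

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
xk = [2.0, 1.0]
gk = [bi - ax for bi, ax in zip(b, matvec(A, xk))]   # g_k = b - A x_k

lam = optimal_step(A, gk)                            # 73/331 for this data
x_next = [xi + lam * gi for xi, gi in zip(xk, gk)]   # x_{k+1} = x_k + lam_k g_k

# Brute-force check: no step on a grid from 0.5*lam to 1.5*lam does better.
grid = [lam * (0.5 + i / 100) for i in range(101)]
best = min(grid, key=lambda t: phi(A, b, [xi + t * gi for xi, gi in zip(xk, gk)]))

# After an exact line search the new residual is orthogonal to the old direction.
g_next = [bi - ax for bi, ax in zip(b, matvec(A, x_next))]
print(lam, best, dot(g_next, gk))
```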
295
00:43:44,440 --> 00:43:52,730
okay so my algorithm my numerical algorithm
becomes how do you summarize the numerical
296
00:43:52,730 --> 00:44:05,650
algorithm? okay this is my numerical algorithm
how do i go from x k to x k+1? i first compute
297
00:44:05,650 --> 00:44:08,380
the negative gradient direction $g_k = b - A x_k$
298
00:44:08,380 --> 00:44:19,700
see the simplicity here? no matrix
inversion is involved okay i just have to
299
00:44:19,700 --> 00:44:24,120
compute the gradient direction, and the gradient direction
is nothing but the error between the right
300
00:44:24,120 --> 00:44:30,720
hand side and the left hand side this is my guess
solution this is my b and i want this error to become zero
301
00:44:30,720 --> 00:44:37,500
when will you get the solution? when the gradient becomes 0
what is the meaning of the gradient becoming 0?
302
00:44:37,500 --> 00:44:44,230
it means you have reached the solution, a very very straightforward
and simple interpretation in this case
303
00:44:44,230 --> 00:44:58,060
if the gradient becomes 0 the necessary
condition for optimality is satisfied right but if the
304
00:44:58,060 --> 00:45:12,980
gradient is nonzero okay you will keep moving
how much do you move? lambda k times g k, $x_{k+1} = x_k + \lambda_k g_k$ okay
305
00:45:12,980 --> 00:45:20,080
this is the optimum step length if you move
less than this okay then you are not decreasing
306
00:45:20,080 --> 00:45:26,810
the function enough if you move more than this
that will not help okay using the local gradient
307
00:45:26,810 --> 00:45:36,580
you can move only this much okay this is the
optimal amount by which you should move every
308
00:45:36,580 --> 00:45:38,030
time
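The claim that moving less or more than $\lambda_k$ gives a smaller decrease can be made exact. Assuming $\phi(x) = \tfrac{1}{2}x^T A x - x^T b$ with $A$ symmetric positive definite and $g_k = b - A x_k$, take a step of $\alpha$ times the optimal length:

```latex
\begin{aligned}
\phi(x_k + t\, g_k) - \phi(x_k) &= -t\, g_k^T g_k + \tfrac{1}{2} t^2\, g_k^T A g_k ,\\
t = \alpha \lambda_k,\ \ \lambda_k = \frac{g_k^T g_k}{g_k^T A g_k}
\ \Longrightarrow\ 
\phi(x_k + \alpha\lambda_k g_k) - \phi(x_k) &= -\Bigl(\alpha - \tfrac{\alpha^2}{2}\Bigr)\frac{(g_k^T g_k)^2}{g_k^T A g_k}.
\end{aligned}
```

The factor $\alpha - \alpha^2/2$ is largest at $\alpha = 1$; for $0 < \alpha < 1$ or $1 < \alpha < 2$ the decrease is smaller, at $\alpha = 2$ there is no decrease at all, and for $\alpha > 2$ the function actually increases.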
309
00:45:38,030 --> 00:45:45,060
this is a scalar calculation, this is an
inner product calculation, and for a symmetric positive
310
00:45:45,060 --> 00:45:50,700
definite a, g k transpose a g k is also an inner product okay
calculating this scalar is very very easy
311
00:45:50,700 --> 00:46:03,210
calculating this error is very very easy when
will you terminate the iterations? when g k is
312
00:46:03,210 --> 00:46:18,100
very very small right so i could terminate
the iterations by saying norm g k is less than
313
00:46:18,100 --> 00:46:35,930
epsilon, that is norm g k is very very small, or you
could also, and sometimes it is better to, check
314
00:46:35,930 --> 00:46:49,890
whether the norm of g k+1 - g k, $\|g_{k+1} - g_k\|$, is small
315
00:46:49,890 --> 00:46:54,530
this can be a termination criterion: if there
is no significant change in the derivative
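Putting the steps together, here is a plain-Python sketch of the iteration as summarized in the lecture: compute $g_k = b - A x_k$, take the step $\lambda_k = g_k^T g_k / (g_k^T A g_k)$, and stop when $\|g_k\| < \epsilon$ or when $\|g_{k+1} - g_k\| < \epsilon$. It assumes $A$ is symmetric positive definite; the function name `steepest_descent`, the tolerance, the iteration cap and the 2x2 test system are hypothetical choices, not taken from the lecture.

```python
import math

# Sketch of the steepest-descent iteration, assuming phi(x) = 1/2 x^T A x - x^T b
# with A symmetric positive definite. Names, tolerances and test data are hypothetical.

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def steepest_descent(A, b, x0, eps=1e-10, max_iter=10_000):
    """Return (x, iterations). Stops when ||g_k|| < eps or ||g_{k+1} - g_k|| < eps."""
    x = list(x0)
    g = [bi - ax for bi, ax in zip(b, matvec(A, x))]          # g_0 = b - A x_0
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) < eps:                         # criterion 1: ||g_k|| small
            return x, k
        Ag = matvec(A, g)
        lam = dot(g, g) / dot(g, Ag)                           # optimal step length
        x = [xi + lam * gi for xi, gi in zip(x, g)]            # x_{k+1} = x_k + lam_k g_k
        g_new = [bi - ax for bi, ax in zip(b, matvec(A, x))]   # g_{k+1} = b - A x_{k+1}
        change = math.sqrt(sum((p - q) ** 2 for p, q in zip(g_new, g)))
        g = g_new
        if change < eps:                                       # criterion 2: gradient barely changes
            return x, k + 1
    return x, max_iter

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = steepest_descent(A, b, [0.0, 0.0])
print(x, iters)   # x should be close to the exact solution [1/11, 7/11]
```

Only matrix-vector products and inner products appear in the loop; no matrix is inverted or factorized.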
316
00:46:54,530 --> 00:47:02,920
okay if you have very large matrices this
is very very useful this method can quickly
317
00:47:02,920 --> 00:47:07,440
come to the solution particularly if a is
symmetric positive definite then you can reach
318
00:47:07,440 --> 00:47:11,510
the solution i think there is a specific result
about this we will talk about it later there
319
00:47:11,510 --> 00:47:16,210
is a modification of this called the conjugate
gradient method
320
00:47:16,210 --> 00:47:28,840
and we will talk about the conjugate gradient
method very quickly in the next lecture
321
00:47:28,840 --> 00:47:36,110
and then i will move on to well-conditioned
and ill-conditioned systems so this method
322
00:47:36,110 --> 00:47:43,670
actually is very often used for solving large-scale
problems and the computations involved are
323
00:47:43,670 --> 00:47:49,710
very very simple we just have to compute the
gradient direction and inner products okay
324
00:47:49,710 --> 00:47:54,880
and you can very quickly get approximate solutions
or you can quickly get very close to the
325
00:47:54,880 --> 00:47:59,220
true solution using this method okay
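To illustrate the remark about large-scale problems: the iteration only needs products $A v$ and inner products, so $A$ never has to be stored or inverted. This plain-Python sketch assumes a made-up $n = 1000$ tridiagonal test matrix with 4 on the diagonal and -1 on the neighbouring diagonals (symmetric positive definite and well conditioned, the favourable case for this method); `tridiag_matvec` and `steepest_descent_mf` are hypothetical names.

```python
import math
from typing import Callable, List

Vector = List[float]

def tridiag_matvec(x: Vector, diag: float = 4.0, off: float = -1.0) -> Vector:
    """y = A x for the n-by-n tridiagonal matrix with `diag` on the diagonal and
    `off` on the two neighbouring diagonals; A itself is never stored."""
    n = len(x)
    y = []
    for i in range(n):
        s = diag * x[i]
        if i > 0:
            s += off * x[i - 1]
        if i < n - 1:
            s += off * x[i + 1]
        y.append(s)
    return y

def dot(u: Vector, v: Vector) -> float:
    return sum(a * c for a, c in zip(u, v))

def steepest_descent_mf(apply_A: Callable[[Vector], Vector], b: Vector, x0: Vector,
                        eps: float = 1e-10, max_iter: int = 1000):
    """Matrix-free steepest descent: only apply_A(v) and inner products are used."""
    x = list(x0)
    for k in range(max_iter):
        g = [bi - ai for bi, ai in zip(b, apply_A(x))]   # g_k = b - A x_k
        gg = dot(g, g)
        if math.sqrt(gg) < eps:                          # stop when ||g_k|| < eps
            return x, k
        lam = gg / dot(g, apply_A(g))                    # optimal step length
        x = [xi + lam * gi for xi, gi in zip(x, g)]      # x_{k+1} = x_k + lam_k g_k
    return x, max_iter

n = 1000
x_true = [1.0] * n                 # assumed exact solution for the test
b = tridiag_matvec(x_true)
x, iters = steepest_descent_mf(tridiag_matvec, b, [0.0] * n)
print(iters, max(abs(xi - 1.0) for xi in x))
```

The test matrix was picked to be well conditioned on purpose: for steepest descent the number of iterations grows with the condition number of $A$, so an ill-conditioned matrix of the same size can need many more iterations.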