1
00:00:17,279 --> 00:00:25,789
so we have been looking at methods for solving
non linear algebraic equations and initially
2
00:00:25,789 --> 00:00:33,600
we looked at this successive substitutions
very, very briefly. so which are derivative
3
00:00:33,600 --> 00:00:41,739
free methods no gradient calculation and then
i move to this univariate newton method basically
4
00:00:41,739 --> 00:00:46,910
a variant called secant method.
5
00:00:46,910 --> 00:00:54,329
and then the motivation for doing this is
to look at a method which is intermediate
6
00:00:54,329 --> 00:01:01,960
between univariate method and multivariate
method. this method is called as wegstein
7
00:01:01,960 --> 00:01:11,410
iterations and then we will move on to modifications
of the newton method. the derivation of newton
8
00:01:11,410 --> 00:01:16,840
method we have already done using taylor series
expansion. so this is application of taylor
9
00:01:16,840 --> 00:01:17,840
series expansion.
10
00:01:17,840 --> 00:01:28,460
but there are many more modifications which
are useful for implementing in a practical
11
00:01:28,460 --> 00:01:32,380
scenario. so i will discuss about them today.
12
00:01:32,380 --> 00:01:52,359
so first is this secant method. so secant
method i want to solve for f of x=0 x belongs
13
00:01:52,359 --> 00:02:27,180
to r univariate problem i want to solve for
f of x=0 and the way this is done. so this
14
00:02:27,180 --> 00:02:30,150
is update rule for the newton method.
15
00:02:30,150 --> 00:02:40,760
and then in secant method you just approximate
this you replace the derivative is
16
00:02:40,760 --> 00:03:13,870
approximated as f at x. we approximate this
instead of using exact derivative you use
17
00:03:13,870 --> 00:03:28,730
last two iterations and then you find the
next guess. so to kickoff secant method you
18
00:03:28,730 --> 00:03:36,849
need 2 initial guesses and now what i want
to do is come up with a multivariate analog
19
00:03:36,849 --> 00:03:59,189
of secant method. so now my problem is so
this
20
00:03:59,189 --> 00:04:01,970
is multivariate secant method.
21
00:04:01,970 --> 00:04:08,970
so it is known as wegstein iterations or wegstein
method and what i am going to do now here
22
00:04:08,970 --> 00:04:19,151
is i am going to solve for f of x=0 where
x belongs to rn and f is a n cross 1 function
23
00:04:19,151 --> 00:04:41,720
vector. so this is n cross 1 function vector.
let say there is some way by which we can
24
00:04:41,720 --> 00:04:44,400
arrange this set of equations.
25
00:04:44,400 --> 00:05:05,820
so we essentially have fi x=0 x belongs to
rn i going from i to n and let say we have
26
00:05:05,820 --> 00:05:31,400
some
way of arranging this equations as x1-gi x=fi
27
00:05:31,400 --> 00:05:47,410
x=0. okay i have some way of arranging this
into this kind of a form. the simplest way
28
00:05:47,410 --> 00:06:02,979
could be just add if i start from this if
i add xi+ fi x if i rewrite it like this i
29
00:06:02,979 --> 00:06:16,900
could call this as gi x. so basically i want
to rearrange it. i want to rearrange it as
30
00:06:16,900 --> 00:06:28,920
some xi- this is
just adding and subtracting xi on both sides.
31
00:06:28,920 --> 00:06:37,380
so this why i am putting into this form because
a particular way this method is implemented
32
00:06:37,380 --> 00:06:42,150
that is the reason why i am putting into this
form, but what you will see soon is that this
33
00:06:42,150 --> 00:06:48,690
method is nothing but a multivariable analog
of the secant method. so now what i am going
34
00:06:48,690 --> 00:07:02,759
to do is something like this.
35
00:07:02,759 --> 00:07:33,490
i am going to define this s i.
36
00:07:33,490 --> 00:07:45,389
i am going to define this slope
37
00:07:45,389 --> 00:08:01,090
not exactly partial derivative, but this is
some kind of a crude derivative of g s i is
38
00:08:01,090 --> 00:08:08,750
a crude derivative of g with respect to xi
or the rate of change with respect to xi that
39
00:08:08,750 --> 00:08:28,690
is how you can look at it. now i am going
to apply the secant method to ith equation.
40
00:08:28,690 --> 00:08:37,710
i have these equations fi x=0 just for the
time being keep this si this side.
41
00:08:37,710 --> 00:08:57,390
why am i defining this si will become clear
soon. i am going to apply. now look at this,
42
00:08:57,390 --> 00:09:08,059
this is ith component of x vector the new
one is old value of the ith component +a correction
43
00:09:08,059 --> 00:09:18,970
which is like secant method apply it only
to the scalar function fi. f i is a scalar
44
00:09:18,970 --> 00:09:25,949
function. capital f is the vector function.
fi is the component of the vector function
45
00:09:25,949 --> 00:09:33,850
f. so i am taking one scalar function ith
scalar function.
46
00:09:33,850 --> 00:09:43,830
and this is if you look carefully this is
fi x k-fi xk-1/ this 1 upon that will become
47
00:09:43,830 --> 00:09:47,449
right.
48
00:09:47,449 --> 00:10:08,780
so this derivative has been computed for the
ith function with respect to ith element in
49
00:10:08,780 --> 00:10:15,861
x and that is how i am going to generate.
if you do this is nothing but a wegstein method
50
00:10:15,861 --> 00:10:25,760
if you generate iterations like this. see
what is the advantage of doing this way over
51
00:10:25,760 --> 00:10:32,950
newton method. first of all you do not require
explicit derivative calculations this is only
52
00:10:32,950 --> 00:10:37,520
an approximation. how many such derivatives
suppose you call this as a derivative approximation
53
00:10:37,520 --> 00:10:43,770
how many such derivative approximation you
need= number of equations compared with the
54
00:10:43,770 --> 00:10:44,770
newton method.
55
00:10:44,770 --> 00:10:53,350
you need to compute jacobian how many elements
in the jacobian n cross n. so if you have
56
00:10:53,350 --> 00:10:58,030
100 equations to solve you have to compute
derivatives which are 100 cross 100 if your
57
00:10:58,030 --> 00:11:02,829
thousand equation is solved you have to compute
numerical derivative even if you compute it
58
00:11:02,829 --> 00:11:09,080
numerically this thousand cross thousand huge
number of calculations whereas here you have
59
00:11:09,080 --> 00:11:10,380
less number of calculations.
60
00:11:10,380 --> 00:11:21,130
so this is somewhere in between. it does use
some kind of rate information, but not all
61
00:11:21,130 --> 00:11:26,540
possible rates some partial rate information
is used, but not all possible rates. so this
62
00:11:26,540 --> 00:11:39,290
is computationally more let say attractive
because it requires less calculations. so
63
00:11:39,290 --> 00:11:44,670
it does use some kind of rate information
but not fully some partial rate information
64
00:11:44,670 --> 00:11:50,530
is used. now this is not the way it is normally
reported or implemented.
65
00:11:50,530 --> 00:11:58,429
a slight variation is done not in terms of
the formula, but in terms of the way it is
66
00:11:58,429 --> 00:12:01,430
implemented now which is what i am going.
67
00:12:01,430 --> 00:12:14,100
now this derivative is same as now fi is xi-gi.
i am going to substitute that here. so if
68
00:12:14,100 --> 00:12:35,439
i substitute that here it will be x1k-xi k-1-.
okay let us develop a slightly simplified
69
00:12:35,439 --> 00:12:52,120
notations then the things will become. so
let call gi k is defined as gi of xk. so that
70
00:12:52,120 --> 00:12:59,820
if gi k-1 which means gi evaluated at xk-1
and so on.
71
00:12:59,820 --> 00:13:16,640
so with this notation this derivative here
i am going to write this as xi k-xi k-1-gi
72
00:13:16,640 --> 00:13:45,620
k/xi k
which is=1- how was si k defined. see this
73
00:13:45,620 --> 00:13:55,559
will give you 1 and this divided by this is
si k. so this formula which i have here now
74
00:13:55,559 --> 00:14:13,890
in terms of si k this can be written as xi
k+1=xi k. now what is fi- xi k-gi k what is
75
00:14:13,890 --> 00:14:27,980
fi k. this
76
00:14:27,980 --> 00:14:48,709
is xi k- we have defined it like this* 1/1
upon –si k same formula i have just re-written
77
00:14:48,709 --> 00:14:55,660
in the different form same formula we have
written in different form.
78
00:14:55,660 --> 00:15:04,000
well if you were to implement if i just go
back here if you were to implement this formula
79
00:15:04,000 --> 00:15:09,429
as it is it is fine nothing wrong with it.
that will be wegstein method. why i am doing
80
00:15:09,429 --> 00:15:17,590
this is because i want to clamp some values
i want to introduce some heuristic into my
81
00:15:17,590 --> 00:15:28,059
iterations so that is where i am just doing
this rearrangement.
82
00:15:28,059 --> 00:15:42,720
so this particular equation now i can rearrange
this as follows. i will just rearrange that
83
00:15:42,720 --> 00:15:56,720
equation nothing else just put it. now i am
going to define another variable which is
84
00:15:56,720 --> 00:16:12,660
omega i k which is 1 upon 1- si k. i am going
to introduce a new variable omega ik which
85
00:16:12,660 --> 00:16:23,430
is 1 upon 1-si k. and in terms of omega ik
i am going to rewrite this as 1-omega ik xik+
86
00:16:23,430 --> 00:16:50,480
omega ik. the reason why i am going this is
draw some parallel with successive substitution
87
00:16:50,480 --> 00:17:01,980
with relaxation method this is like relaxation
iterations except that omega in relaxation
88
00:17:01,980 --> 00:17:02,980
is typically fixed.
89
00:17:02,980 --> 00:17:08,630
here omega is not fixed omega is changing
with the iterations from one iteration to
90
00:17:08,630 --> 00:17:13,679
the other iteration omega is changing. so
time varying omega it is like successive substitution
91
00:17:13,679 --> 00:17:23,230
with relaxation and now what we are going
to do this is some kind of thumb rule to make
92
00:17:23,230 --> 00:17:35,940
your computations.
93
00:17:35,940 --> 00:17:57,240
typically we tried to restrict omega between
0 and some alpha suggested alpha for this
94
00:17:57,240 --> 00:18:09,180
alpha is 5 typically you restrict this between
0 to 5. so you are doing something like a
95
00:18:09,180 --> 00:18:15,820
relaxation method, successive substitutions
with variable omega. and omega you want to
96
00:18:15,820 --> 00:18:25,750
restrict between some number between 0 and
5. so all this jugglery i have done because
97
00:18:25,750 --> 00:18:32,380
i wanted to put this limit that is why i am
doing all this jugglery otherwise i could
98
00:18:32,380 --> 00:18:33,510
have.
99
00:18:33,510 --> 00:18:43,680
so this multivariate version of secant method
together with this limit imposed on omega
100
00:18:43,680 --> 00:18:49,430
this is called as wegstein iterations this
is very popular method and if you go to many
101
00:18:49,430 --> 00:18:57,940
of these plant wide steady state simulation
software like aspen hysys you will one of
102
00:18:57,940 --> 00:19:05,490
the options they will give this wegstein iterations.
wegstein iterations can be performed even
103
00:19:05,490 --> 00:19:12,920
if the function fi is not differentiable you
do not require differentiability here you
104
00:19:12,920 --> 00:19:15,170
just require evaluation of functions at two
points.
105
00:19:15,170 --> 00:19:26,030
and divide it by you know you may have discontinuities
when you are writing equation for some unit
106
00:19:26,030 --> 00:19:31,350
operation. you may have different equations
in different regions depending upon the operating
107
00:19:31,350 --> 00:19:36,300
region and so on. if flooding occur you may
have some different equation normal operation
108
00:19:36,300 --> 00:19:42,450
you may have some different equation. in chemical
plant you can have this kinds of situations.
109
00:19:42,450 --> 00:19:56,050
so we actually many times prefer this as in
between way of doing calculation completely
110
00:19:56,050 --> 00:20:03,650
gradient based and completely gradient free.
so this is somewhere in between. so given
111
00:20:03,650 --> 00:20:07,770
a large scale problem what i would do is well
i would first try wegstein method if it works
112
00:20:07,770 --> 00:20:16,020
great. if it does work probably i should look
for something else. so because this is computationally
113
00:20:16,020 --> 00:20:22,750
more friendly, but is this clear any doubt
about this.
114
00:20:22,750 --> 00:20:30,680
let us move on to now the newton method which
we have derived from taylor series approximation,
115
00:20:30,680 --> 00:20:37,280
multivariate taylor series approximation then
we also did some exercises in which we solved
116
00:20:37,280 --> 00:20:44,690
some problems using newton methods. so you
already know something about newton method
117
00:20:44,690 --> 00:21:04,680
now what more is there to it. so the next
one is on our radar is newton method
118
00:21:04,680 --> 00:21:06,950
and you will say that we already know about
newton method.
119
00:21:06,950 --> 00:21:18,300
i want to solve for f of x=0 x belongs to
rn and then all that you do is at each time
120
00:21:18,300 --> 00:21:29,730
period you solve for this let us define this
jacobian at time k to make my writing simple
121
00:21:29,730 --> 00:21:41,640
i am going to use this terminology jk. jk
is dou f/dou x evaluated at x=x k.
122
00:21:41,640 --> 00:21:58,640
and then one more notation that i am going
to introduce here is f k this is function
123
00:21:58,640 --> 00:22:09,490
vector f evaluated at xk. i am just introducing
this notation so that my subsequent derivations
124
00:22:09,490 --> 00:22:16,700
become simpler in notation. just remember
that f superscript k is nothing but function
125
00:22:16,700 --> 00:22:31,940
f evaluated at x k. so my formula which you
have implemented is xk+1=x k+ delta xk. and
126
00:22:31,940 --> 00:22:42,090
delta xk is computed by solving this linear
algebraic equation.
127
00:22:42,090 --> 00:22:55,920
this is what you know right now as newton
method. well one variant which is often implemented
128
00:22:55,920 --> 00:23:05,180
rather than doing this is to make this equation
well conditioned you pre multiply this by
129
00:23:05,180 --> 00:23:32,190
jk transpose. so instead of doing this equation
often this is done
130
00:23:32,190 --> 00:23:38,010
in fact the computing tutorial which i have
given you i asked you to modify this step
131
00:23:38,010 --> 00:23:47,280
like this and then solve this resulting problem
using gauss-seidel method.
132
00:23:47,280 --> 00:23:58,360
what is the reason this become positive definite
gauss-seidel method is guaranteed to converge.
133
00:23:58,360 --> 00:24:03,140
in general when we work with positive definite
matrices, matrices are well conditioned easier
134
00:24:03,140 --> 00:24:09,230
to work with positive definite matrices. so
this is instead of this step we often implement
135
00:24:09,230 --> 00:24:11,580
this step one more modification.
136
00:24:11,580 --> 00:24:25,240
this is called as damped newton method. so
what we do here is that so one modification
137
00:24:25,240 --> 00:24:32,450
i told you is this. other modification is
we change this particular equation we modify
138
00:24:32,450 --> 00:24:52,480
it to x k+1=x k+ lambda times. so lambda is
chosen between 0 and 1. now what is the reason?
139
00:24:52,480 --> 00:25:04,020
first of all remember when you took newton
step it was based on local linearization of
140
00:25:04,020 --> 00:25:07,050
function vector f of xk.
141
00:25:07,050 --> 00:25:15,630
how did we take this step it was based on
the local linearization this jk is a local
142
00:25:15,630 --> 00:25:25,520
jacobian local derivative. so actually what
should happen is that when you take this step
143
00:25:25,520 --> 00:25:33,650
the function vector should reduce because
i want to go to f of x=0. when i go from xk
144
00:25:33,650 --> 00:25:47,500
to xk+1 this function vector should reduce.
now a step which is based on linearization
145
00:25:47,500 --> 00:25:54,090
may not ensure that at a new point actually
the function is reducing.
146
00:25:54,090 --> 00:25:58,650
because you have done a local approximation
of a non linear function make some decision
147
00:25:58,650 --> 00:26:06,830
of based on the local approximation. this
may not lead to a good value of xk+1. see
148
00:26:06,830 --> 00:26:17,290
what should happen i should move towards the
solution. now what is the guarantee that if
149
00:26:17,290 --> 00:26:26,770
i make some decisions based on the local slope
alone it will lead to decrease or it will
150
00:26:26,770 --> 00:26:29,240
lead to small value of f.
151
00:26:29,240 --> 00:26:41,710
to put it in little more mathematical words
see what should happen i want that f of norm
152
00:26:41,710 --> 00:26:52,450
of f k+1 this should be less than norm of
do you agree with me. when i take a new step
153
00:26:52,450 --> 00:27:00,290
if i evaluate the function vector at a new
point. see i want to finally go to f x=0.
154
00:27:00,290 --> 00:27:10,480
when i make a new step so this function evaluation
at a new step that is xk+1 should actually
155
00:27:10,480 --> 00:27:15,900
reduce as when compared to this.
156
00:27:15,900 --> 00:27:24,530
now this may not happen if i set lambda=1
it may not happen because delta x has been
157
00:27:24,530 --> 00:27:34,130
determined using local slope. so what is the
way out? so why should we choose lambda less
158
00:27:34,130 --> 00:27:44,330
than 1 let us look at our rational behind
it. so essentially i chose a lambda by some
159
00:27:44,330 --> 00:27:55,110
means such that this condition that is function
evaluated at k+1 is less than function evaluated
160
00:27:55,110 --> 00:28:05,390
at k. you could do this let me go back here
before i move on to the rational.
161
00:28:05,390 --> 00:28:12,520
so what i want to do is first i check lambda=1.
if for lambda=1 if this condition is satisfied
162
00:28:12,520 --> 00:28:21,130
i am happy. i accept delta x chose lambda=1
and proceed with the next iteration. if it
163
00:28:21,130 --> 00:28:30,420
does not happen okay i will reduce lambda
to say 0.9 for example i will give you a very
164
00:28:30,420 --> 00:28:36,340
crude way of doing it. i will reduce it to
0.9. then for lambda= 0.9 delta x is fixed.
165
00:28:36,340 --> 00:28:37,480
i am not going to change.
166
00:28:37,480 --> 00:28:44,120
delta x has been formed by using the jacobian
so i want to reduce lambda, i will reduce
167
00:28:44,120 --> 00:28:49,630
lambda and check whether this condition is
satisfied. if not i will further reduce lambda
168
00:28:49,630 --> 00:28:54,970
i go on reducing lambda till this condition
is satisfied. the moment i get one lambda
169
00:28:54,970 --> 00:29:04,260
for which this condition is met. i will take
that step and so how this is done. one algorithm
170
00:29:04,260 --> 00:29:07,179
for doing this is called as armijo line search.
171
00:29:07,179 --> 00:29:12,430
and i am not going to go into details of that
i have given this here in the notes it is
172
00:29:12,430 --> 00:29:20,030
in table 1 damped newton algorithm with armijo
line search. so those are detailed as to how
173
00:29:20,030 --> 00:29:25,100
do you select iteratively lambda. what i want
to do on the is not the algorithm as to how
174
00:29:25,100 --> 00:29:29,429
to select lambda such that this is met. so
that is matter of implementation i want to
175
00:29:29,429 --> 00:29:32,730
give the rational why this is done.
176
00:29:32,730 --> 00:29:38,950
this is done this lambda which is less than
1 is done because we are doing taylor series
177
00:29:38,950 --> 00:29:48,240
approximation and taylor series approximation
is actually valid in a small neighborhood
178
00:29:48,240 --> 00:29:53,470
and then this lambda actually helps us to
find out what is that neighborhood where you
179
00:29:53,470 --> 00:29:59,680
should apply taylor series approximation.
so what i am going to do now is i am going
180
00:29:59,680 --> 00:30:03,140
to look at this function.
181
00:30:03,140 --> 00:30:14,700
i am going to look at this function phi which
is i am going to look at this function vector.
182
00:30:14,700 --> 00:30:30,490
f is my function vector, fk transpose what
is this, this is nothing but ½ f k+1 2 square
183
00:30:30,490 --> 00:30:41,490
nothing but norm 2 square. i am going to look
at this function. now this particular function
184
00:30:41,490 --> 00:30:58,730
is nothing but ½ f of xk+ lambda delta xk
transpose actually the raw method which we
185
00:30:58,730 --> 00:31:02,230
have studied in the beginning and which we
have implemented for some simple problems
186
00:31:02,230 --> 00:31:03,690
works only for simple cases.
187
00:31:03,690 --> 00:31:14,700
to make it work for a large size complex problems
we have to do all kinds of tricks. so this
188
00:31:14,700 --> 00:31:24,080
is a scalar function phi is a scalar function.
i am going to now expand phi in the neighborhood
189
00:31:24,080 --> 00:31:30,780
of xk.
190
00:31:30,780 --> 00:31:46,100
so my phi xk+1=phi so this taylor theorem
is like foundation it helps you everywhere
191
00:31:46,100 --> 00:32:00,000
wherever you move in applied engineering mathematics
it is one of the cornerstone. now when you
192
00:32:00,000 --> 00:32:06,570
are writing it like this what is unknown to
you here only lambda. delta x we have already
193
00:32:06,570 --> 00:32:16,570
calculated xk is known to us. so this vector
is known i am just now worried about choosing
194
00:32:16,570 --> 00:32:18,140
lambda correctly.
195
00:32:18,140 --> 00:32:34,260
so lambda dou phi/dou lambda + lambda square/2
dou 2 ph/ dou lambda square and so on. i am
196
00:32:34,260 --> 00:32:53,929
expanding this as a function of right. so
now what should happen is phi xk-1-phi xk
197
00:32:53,929 --> 00:33:07,400
what should this quantity be positive or negative?
it should be negative this quantity should
198
00:33:07,400 --> 00:33:17,429
be negative. this is nothing, but look here.
this is if i were to square this and subtract
199
00:33:17,429 --> 00:33:21,990
okay i will this quantity just multiplying
by ½. ½ is not going to make much difference
200
00:33:21,990 --> 00:33:30,300
½ is a positive quantity. so this quantity
is what i am worried about.
201
00:33:30,300 --> 00:33:55,270
now we know that delta xk=-jk inverse fk and
lambda times dou phi/ dou lambda this is same
202
00:33:55,270 --> 00:34:22,929
as lambda times grad phi x k transpose delta
x k. just check this i am just doing derivatives
203
00:34:22,929 --> 00:34:35,270
by in succession. i first differentiate phi
with respect to entire quantity and then so
204
00:34:35,270 --> 00:34:46,000
this is dou phi/dou x* dou x/dou lambda i
am not writing that just skipping the in between
205
00:34:46,000 --> 00:34:47,740
steps okay.