1 00:00:17,279 --> 00:00:25,789 so we have been looking at methods for solving non linear algebraic equations and initially 2 00:00:25,789 --> 00:00:33,600 we looked at this successive substitutions very, very briefly. so which are derivative 3 00:00:33,600 --> 00:00:41,739 free methods no gradient calculation and then i move to this univariate newton method basically 4 00:00:41,739 --> 00:00:46,910 a variant called secant method. 5 00:00:46,910 --> 00:00:54,329 and then the motivation for doing this is to look at a method which is intermediate 6 00:00:54,329 --> 00:01:01,960 between univariate method and multivariate method. this method is called as wegstein 7 00:01:01,960 --> 00:01:11,410 iterations and then we will move on to modifications of the newton method. the derivation of newton 8 00:01:11,410 --> 00:01:16,840 method we have already done using taylor series expansion. so this is application of taylor 9 00:01:16,840 --> 00:01:17,840 series expansion. 10 00:01:17,840 --> 00:01:28,460 but there are many more modifications which are useful for implementing in a practical 11 00:01:28,460 --> 00:01:32,380 scenario. so i will discuss about them today. 12 00:01:32,380 --> 00:01:52,359 so first is this secant method. so secant method i want to solve for f of x=0 x belongs 13 00:01:52,359 --> 00:02:27,180 to r univariate problem i want to solve for f of x=0 and the way this is done. so this 14 00:02:27,180 --> 00:02:30,150 is update rule for the newton method. 15 00:02:30,150 --> 00:02:40,760 and then in secant method you just approximate this you replace the derivative is 16 00:02:40,760 --> 00:03:13,870 approximated as f at x. we approximate this instead of using exact derivative you use 17 00:03:13,870 --> 00:03:28,730 last two iterations and then you find the next guess. so to kickoff secant method you 18 00:03:28,730 --> 00:03:36,849 need 2 initial guesses and now what i want to do is come up with a multivariate analog 19 00:03:36,849 --> 00:03:59,189 of secant method. so now my problem is so this 20 00:03:59,189 --> 00:04:01,970 is multivariate secant method. 21 00:04:01,970 --> 00:04:08,970 so it is known as wegstein iterations or wegstein method and what i am going to do now here 22 00:04:08,970 --> 00:04:19,151 is i am going to solve for f of x=0 where x belongs to rn and f is a n cross 1 function 23 00:04:19,151 --> 00:04:41,720 vector. so this is n cross 1 function vector. let say there is some way by which we can 24 00:04:41,720 --> 00:04:44,400 arrange this set of equations. 25 00:04:44,400 --> 00:05:05,820 so we essentially have fi x=0 x belongs to rn i going from i to n and let say we have 26 00:05:05,820 --> 00:05:31,400 some way of arranging this equations as x1-gi x=fi 27 00:05:31,400 --> 00:05:47,410 x=0. okay i have some way of arranging this into this kind of a form. the simplest way 28 00:05:47,410 --> 00:06:02,979 could be just add if i start from this if i add xi+ fi x if i rewrite it like this i 29 00:06:02,979 --> 00:06:16,900 could call this as gi x. so basically i want to rearrange it. i want to rearrange it as 30 00:06:16,900 --> 00:06:28,920 some xi- this is just adding and subtracting xi on both sides. 31 00:06:28,920 --> 00:06:37,380 so this why i am putting into this form because a particular way this method is implemented 32 00:06:37,380 --> 00:06:42,150 that is the reason why i am putting into this form, but what you will see soon is that this 33 00:06:42,150 --> 00:06:48,690 method is nothing but a multivariable analog of the secant method. so now what i am going 34 00:06:48,690 --> 00:07:02,759 to do is something like this. 35 00:07:02,759 --> 00:07:33,490 i am going to define this s i. 36 00:07:33,490 --> 00:07:45,389 i am going to define this slope 37 00:07:45,389 --> 00:08:01,090 not exactly partial derivative, but this is some kind of a crude derivative of g s i is 38 00:08:01,090 --> 00:08:08,750 a crude derivative of g with respect to xi or the rate of change with respect to xi that 39 00:08:08,750 --> 00:08:28,690 is how you can look at it. now i am going to apply the secant method to ith equation. 40 00:08:28,690 --> 00:08:37,710 i have these equations fi x=0 just for the time being keep this si this side. 41 00:08:37,710 --> 00:08:57,390 why am i defining this si will become clear soon. i am going to apply. now look at this, 42 00:08:57,390 --> 00:09:08,059 this is ith component of x vector the new one is old value of the ith component +a correction 43 00:09:08,059 --> 00:09:18,970 which is like secant method apply it only to the scalar function fi. f i is a scalar 44 00:09:18,970 --> 00:09:25,949 function. capital f is the vector function. fi is the component of the vector function 45 00:09:25,949 --> 00:09:33,850 f. so i am taking one scalar function ith scalar function. 46 00:09:33,850 --> 00:09:43,830 and this is if you look carefully this is fi x k-fi xk-1/ this 1 upon that will become 47 00:09:43,830 --> 00:09:47,449 right. 48 00:09:47,449 --> 00:10:08,780 so this derivative has been computed for the ith function with respect to ith element in 49 00:10:08,780 --> 00:10:15,861 x and that is how i am going to generate. if you do this is nothing but a wegstein method 50 00:10:15,861 --> 00:10:25,760 if you generate iterations like this. see what is the advantage of doing this way over 51 00:10:25,760 --> 00:10:32,950 newton method. first of all you do not require explicit derivative calculations this is only 52 00:10:32,950 --> 00:10:37,520 an approximation. how many such derivatives suppose you call this as a derivative approximation 53 00:10:37,520 --> 00:10:43,770 how many such derivative approximation you need= number of equations compared with the 54 00:10:43,770 --> 00:10:44,770 newton method. 55 00:10:44,770 --> 00:10:53,350 you need to compute jacobian how many elements in the jacobian n cross n. so if you have 56 00:10:53,350 --> 00:10:58,030 100 equations to solve you have to compute derivatives which are 100 cross 100 if your 57 00:10:58,030 --> 00:11:02,829 thousand equation is solved you have to compute numerical derivative even if you compute it 58 00:11:02,829 --> 00:11:09,080 numerically this thousand cross thousand huge number of calculations whereas here you have 59 00:11:09,080 --> 00:11:10,380 less number of calculations. 60 00:11:10,380 --> 00:11:21,130 so this is somewhere in between. it does use some kind of rate information, but not all 61 00:11:21,130 --> 00:11:26,540 possible rates some partial rate information is used, but not all possible rates. so this 62 00:11:26,540 --> 00:11:39,290 is computationally more let say attractive because it requires less calculations. so 63 00:11:39,290 --> 00:11:44,670 it does use some kind of rate information but not fully some partial rate information 64 00:11:44,670 --> 00:11:50,530 is used. now this is not the way it is normally reported or implemented. 65 00:11:50,530 --> 00:11:58,429 a slight variation is done not in terms of the formula, but in terms of the way it is 66 00:11:58,429 --> 00:12:01,430 implemented now which is what i am going. 67 00:12:01,430 --> 00:12:14,100 now this derivative is same as now fi is xi-gi. i am going to substitute that here. so if 68 00:12:14,100 --> 00:12:35,439 i substitute that here it will be x1k-xi k-1-. okay let us develop a slightly simplified 69 00:12:35,439 --> 00:12:52,120 notations then the things will become. so let call gi k is defined as gi of xk. so that 70 00:12:52,120 --> 00:12:59,820 if gi k-1 which means gi evaluated at xk-1 and so on. 71 00:12:59,820 --> 00:13:16,640 so with this notation this derivative here i am going to write this as xi k-xi k-1-gi 72 00:13:16,640 --> 00:13:45,620 k/xi k which is=1- how was si k defined. see this 73 00:13:45,620 --> 00:13:55,559 will give you 1 and this divided by this is si k. so this formula which i have here now 74 00:13:55,559 --> 00:14:13,890 in terms of si k this can be written as xi k+1=xi k. now what is fi- xi k-gi k what is 75 00:14:13,890 --> 00:14:27,980 fi k. this 76 00:14:27,980 --> 00:14:48,709 is xi k- we have defined it like this* 1/1 upon –si k same formula i have just re-written 77 00:14:48,709 --> 00:14:55,660 in the different form same formula we have written in different form. 78 00:14:55,660 --> 00:15:04,000 well if you were to implement if i just go back here if you were to implement this formula 79 00:15:04,000 --> 00:15:09,429 as it is it is fine nothing wrong with it. that will be wegstein method. why i am doing 80 00:15:09,429 --> 00:15:17,590 this is because i want to clamp some values i want to introduce some heuristic into my 81 00:15:17,590 --> 00:15:28,059 iterations so that is where i am just doing this rearrangement. 82 00:15:28,059 --> 00:15:42,720 so this particular equation now i can rearrange this as follows. i will just rearrange that 83 00:15:42,720 --> 00:15:56,720 equation nothing else just put it. now i am going to define another variable which is 84 00:15:56,720 --> 00:16:12,660 omega i k which is 1 upon 1- si k. i am going to introduce a new variable omega ik which 85 00:16:12,660 --> 00:16:23,430 is 1 upon 1-si k. and in terms of omega ik i am going to rewrite this as 1-omega ik xik+ 86 00:16:23,430 --> 00:16:50,480 omega ik. the reason why i am going this is draw some parallel with successive substitution 87 00:16:50,480 --> 00:17:01,980 with relaxation method this is like relaxation iterations except that omega in relaxation 88 00:17:01,980 --> 00:17:02,980 is typically fixed. 89 00:17:02,980 --> 00:17:08,630 here omega is not fixed omega is changing with the iterations from one iteration to 90 00:17:08,630 --> 00:17:13,679 the other iteration omega is changing. so time varying omega it is like successive substitution 91 00:17:13,679 --> 00:17:23,230 with relaxation and now what we are going to do this is some kind of thumb rule to make 92 00:17:23,230 --> 00:17:35,940 your computations. 93 00:17:35,940 --> 00:17:57,240 typically we tried to restrict omega between 0 and some alpha suggested alpha for this 94 00:17:57,240 --> 00:18:09,180 alpha is 5 typically you restrict this between 0 to 5. so you are doing something like a 95 00:18:09,180 --> 00:18:15,820 relaxation method, successive substitutions with variable omega. and omega you want to 96 00:18:15,820 --> 00:18:25,750 restrict between some number between 0 and 5. so all this jugglery i have done because 97 00:18:25,750 --> 00:18:32,380 i wanted to put this limit that is why i am doing all this jugglery otherwise i could 98 00:18:32,380 --> 00:18:33,510 have. 99 00:18:33,510 --> 00:18:43,680 so this multivariate version of secant method together with this limit imposed on omega 100 00:18:43,680 --> 00:18:49,430 this is called as wegstein iterations this is very popular method and if you go to many 101 00:18:49,430 --> 00:18:57,940 of these plant wide steady state simulation software like aspen hysys you will one of 102 00:18:57,940 --> 00:19:05,490 the options they will give this wegstein iterations. wegstein iterations can be performed even 103 00:19:05,490 --> 00:19:12,920 if the function fi is not differentiable you do not require differentiability here you 104 00:19:12,920 --> 00:19:15,170 just require evaluation of functions at two points. 105 00:19:15,170 --> 00:19:26,030 and divide it by you know you may have discontinuities when you are writing equation for some unit 106 00:19:26,030 --> 00:19:31,350 operation. you may have different equations in different regions depending upon the operating 107 00:19:31,350 --> 00:19:36,300 region and so on. if flooding occur you may have some different equation normal operation 108 00:19:36,300 --> 00:19:42,450 you may have some different equation. in chemical plant you can have this kinds of situations. 109 00:19:42,450 --> 00:19:56,050 so we actually many times prefer this as in between way of doing calculation completely 110 00:19:56,050 --> 00:20:03,650 gradient based and completely gradient free. so this is somewhere in between. so given 111 00:20:03,650 --> 00:20:07,770 a large scale problem what i would do is well i would first try wegstein method if it works 112 00:20:07,770 --> 00:20:16,020 great. if it does work probably i should look for something else. so because this is computationally 113 00:20:16,020 --> 00:20:22,750 more friendly, but is this clear any doubt about this. 114 00:20:22,750 --> 00:20:30,680 let us move on to now the newton method which we have derived from taylor series approximation, 115 00:20:30,680 --> 00:20:37,280 multivariate taylor series approximation then we also did some exercises in which we solved 116 00:20:37,280 --> 00:20:44,690 some problems using newton methods. so you already know something about newton method 117 00:20:44,690 --> 00:21:04,680 now what more is there to it. so the next one is on our radar is newton method 118 00:21:04,680 --> 00:21:06,950 and you will say that we already know about newton method. 119 00:21:06,950 --> 00:21:18,300 i want to solve for f of x=0 x belongs to rn and then all that you do is at each time 120 00:21:18,300 --> 00:21:29,730 period you solve for this let us define this jacobian at time k to make my writing simple 121 00:21:29,730 --> 00:21:41,640 i am going to use this terminology jk. jk is dou f/dou x evaluated at x=x k. 122 00:21:41,640 --> 00:21:58,640 and then one more notation that i am going to introduce here is f k this is function 123 00:21:58,640 --> 00:22:09,490 vector f evaluated at xk. i am just introducing this notation so that my subsequent derivations 124 00:22:09,490 --> 00:22:16,700 become simpler in notation. just remember that f superscript k is nothing but function 125 00:22:16,700 --> 00:22:31,940 f evaluated at x k. so my formula which you have implemented is xk+1=x k+ delta xk. and 126 00:22:31,940 --> 00:22:42,090 delta xk is computed by solving this linear algebraic equation. 127 00:22:42,090 --> 00:22:55,920 this is what you know right now as newton method. well one variant which is often implemented 128 00:22:55,920 --> 00:23:05,180 rather than doing this is to make this equation well conditioned you pre multiply this by 129 00:23:05,180 --> 00:23:32,190 jk transpose. so instead of doing this equation often this is done 130 00:23:32,190 --> 00:23:38,010 in fact the computing tutorial which i have given you i asked you to modify this step 131 00:23:38,010 --> 00:23:47,280 like this and then solve this resulting problem using gauss-seidel method. 132 00:23:47,280 --> 00:23:58,360 what is the reason this become positive definite gauss-seidel method is guaranteed to converge. 133 00:23:58,360 --> 00:24:03,140 in general when we work with positive definite matrices, matrices are well conditioned easier 134 00:24:03,140 --> 00:24:09,230 to work with positive definite matrices. so this is instead of this step we often implement 135 00:24:09,230 --> 00:24:11,580 this step one more modification. 136 00:24:11,580 --> 00:24:25,240 this is called as damped newton method. so what we do here is that so one modification 137 00:24:25,240 --> 00:24:32,450 i told you is this. other modification is we change this particular equation we modify 138 00:24:32,450 --> 00:24:52,480 it to x k+1=x k+ lambda times. so lambda is chosen between 0 and 1. now what is the reason? 139 00:24:52,480 --> 00:25:04,020 first of all remember when you took newton step it was based on local linearization of 140 00:25:04,020 --> 00:25:07,050 function vector f of xk. 141 00:25:07,050 --> 00:25:15,630 how did we take this step it was based on the local linearization this jk is a local 142 00:25:15,630 --> 00:25:25,520 jacobian local derivative. so actually what should happen is that when you take this step 143 00:25:25,520 --> 00:25:33,650 the function vector should reduce because i want to go to f of x=0. when i go from xk 144 00:25:33,650 --> 00:25:47,500 to xk+1 this function vector should reduce. now a step which is based on linearization 145 00:25:47,500 --> 00:25:54,090 may not ensure that at a new point actually the function is reducing. 146 00:25:54,090 --> 00:25:58,650 because you have done a local approximation of a non linear function make some decision 147 00:25:58,650 --> 00:26:06,830 of based on the local approximation. this may not lead to a good value of xk+1. see 148 00:26:06,830 --> 00:26:17,290 what should happen i should move towards the solution. now what is the guarantee that if 149 00:26:17,290 --> 00:26:26,770 i make some decisions based on the local slope alone it will lead to decrease or it will 150 00:26:26,770 --> 00:26:29,240 lead to small value of f. 151 00:26:29,240 --> 00:26:41,710 to put it in little more mathematical words see what should happen i want that f of norm 152 00:26:41,710 --> 00:26:52,450 of f k+1 this should be less than norm of do you agree with me. when i take a new step 153 00:26:52,450 --> 00:27:00,290 if i evaluate the function vector at a new point. see i want to finally go to f x=0. 154 00:27:00,290 --> 00:27:10,480 when i make a new step so this function evaluation at a new step that is xk+1 should actually 155 00:27:10,480 --> 00:27:15,900 reduce as when compared to this. 156 00:27:15,900 --> 00:27:24,530 now this may not happen if i set lambda=1 it may not happen because delta x has been 157 00:27:24,530 --> 00:27:34,130 determined using local slope. so what is the way out? so why should we choose lambda less 158 00:27:34,130 --> 00:27:44,330 than 1 let us look at our rational behind it. so essentially i chose a lambda by some 159 00:27:44,330 --> 00:27:55,110 means such that this condition that is function evaluated at k+1 is less than function evaluated 160 00:27:55,110 --> 00:28:05,390 at k. you could do this let me go back here before i move on to the rational. 161 00:28:05,390 --> 00:28:12,520 so what i want to do is first i check lambda=1. if for lambda=1 if this condition is satisfied 162 00:28:12,520 --> 00:28:21,130 i am happy. i accept delta x chose lambda=1 and proceed with the next iteration. if it 163 00:28:21,130 --> 00:28:30,420 does not happen okay i will reduce lambda to say 0.9 for example i will give you a very 164 00:28:30,420 --> 00:28:36,340 crude way of doing it. i will reduce it to 0.9. then for lambda= 0.9 delta x is fixed. 165 00:28:36,340 --> 00:28:37,480 i am not going to change. 166 00:28:37,480 --> 00:28:44,120 delta x has been formed by using the jacobian so i want to reduce lambda, i will reduce 167 00:28:44,120 --> 00:28:49,630 lambda and check whether this condition is satisfied. if not i will further reduce lambda 168 00:28:49,630 --> 00:28:54,970 i go on reducing lambda till this condition is satisfied. the moment i get one lambda 169 00:28:54,970 --> 00:29:04,260 for which this condition is met. i will take that step and so how this is done. one algorithm 170 00:29:04,260 --> 00:29:07,179 for doing this is called as armijo line search. 171 00:29:07,179 --> 00:29:12,430 and i am not going to go into details of that i have given this here in the notes it is 172 00:29:12,430 --> 00:29:20,030 in table 1 damped newton algorithm with armijo line search. so those are detailed as to how 173 00:29:20,030 --> 00:29:25,100 do you select iteratively lambda. what i want to do on the is not the algorithm as to how 174 00:29:25,100 --> 00:29:29,429 to select lambda such that this is met. so that is matter of implementation i want to 175 00:29:29,429 --> 00:29:32,730 give the rational why this is done. 176 00:29:32,730 --> 00:29:38,950 this is done this lambda which is less than 1 is done because we are doing taylor series 177 00:29:38,950 --> 00:29:48,240 approximation and taylor series approximation is actually valid in a small neighborhood 178 00:29:48,240 --> 00:29:53,470 and then this lambda actually helps us to find out what is that neighborhood where you 179 00:29:53,470 --> 00:29:59,680 should apply taylor series approximation. so what i am going to do now is i am going 180 00:29:59,680 --> 00:30:03,140 to look at this function. 181 00:30:03,140 --> 00:30:14,700 i am going to look at this function phi which is i am going to look at this function vector. 182 00:30:14,700 --> 00:30:30,490 f is my function vector, fk transpose what is this, this is nothing but ½ f k+1 2 square 183 00:30:30,490 --> 00:30:41,490 nothing but norm 2 square. i am going to look at this function. now this particular function 184 00:30:41,490 --> 00:30:58,730 is nothing but ½ f of xk+ lambda delta xk transpose actually the raw method which we 185 00:30:58,730 --> 00:31:02,230 have studied in the beginning and which we have implemented for some simple problems 186 00:31:02,230 --> 00:31:03,690 works only for simple cases. 187 00:31:03,690 --> 00:31:14,700 to make it work for a large size complex problems we have to do all kinds of tricks. so this 188 00:31:14,700 --> 00:31:24,080 is a scalar function phi is a scalar function. i am going to now expand phi in the neighborhood 189 00:31:24,080 --> 00:31:30,780 of xk. 190 00:31:30,780 --> 00:31:46,100 so my phi xk+1=phi so this taylor theorem is like foundation it helps you everywhere 191 00:31:46,100 --> 00:32:00,000 wherever you move in applied engineering mathematics it is one of the cornerstone. now when you 192 00:32:00,000 --> 00:32:06,570 are writing it like this what is unknown to you here only lambda. delta x we have already 193 00:32:06,570 --> 00:32:16,570 calculated xk is known to us. so this vector is known i am just now worried about choosing 194 00:32:16,570 --> 00:32:18,140 lambda correctly. 195 00:32:18,140 --> 00:32:34,260 so lambda dou phi/dou lambda + lambda square/2 dou 2 ph/ dou lambda square and so on. i am 196 00:32:34,260 --> 00:32:53,929 expanding this as a function of right. so now what should happen is phi xk-1-phi xk 197 00:32:53,929 --> 00:33:07,400 what should this quantity be positive or negative? it should be negative this quantity should 198 00:33:07,400 --> 00:33:17,429 be negative. this is nothing, but look here. this is if i were to square this and subtract 199 00:33:17,429 --> 00:33:21,990 okay i will this quantity just multiplying by ½. ½ is not going to make much difference 200 00:33:21,990 --> 00:33:30,300 ½ is a positive quantity. so this quantity is what i am worried about. 201 00:33:30,300 --> 00:33:55,270 now we know that delta xk=-jk inverse fk and lambda times dou phi/ dou lambda this is same 202 00:33:55,270 --> 00:34:22,929 as lambda times grad phi x k transpose delta x k. just check this i am just doing derivatives 203 00:34:22,929 --> 00:34:35,270 by in succession. i first differentiate phi with respect to entire quantity and then so 204 00:34:35,270 --> 00:34:46,000 this is dou phi/dou x* dou x/dou lambda i am not writing that just skipping the in between 205 00:34:46,000 --> 00:34:47,740 steps okay.