1
00:00:00,120 --> 00:00:31,920
Now, so far we talked about random variables
both discrete and continuous random variables
2
00:00:31,920 --> 00:00:39,219
but we didn't say anything about time dependence.
We didn't say anything about how these random
3
00:00:39,219 --> 00:00:45,250
variables could possibly change with time.
From now on we will look at random variables
4
00:00:45,250 --> 00:00:49,629
which change with time, which evolve with
time, and then you have what's called a random
5
00:00:49,629 --> 00:01:05,720
process or a stochastic process.
So this is going to be our next topic
6
00:01:05,720 --> 00:01:10,730
which is concerned
with the study of random variables with some
7
00:01:10,730 --> 00:01:18,190
rule for the evolution of certain probability
distributions as a function of time okay.
8
00:01:18,190 --> 00:01:24,160
Now the first thing we have to appreciate
is that a random process, if you sample this
9
00:01:24,160 --> 00:01:30,240
random process at discrete instants of time,
you get a time series with values for the
10
00:01:30,240 --> 00:01:34,500
random variable drawn from the sample space
of this random variable.
11
00:01:34,500 --> 00:01:42,620
So if we for instance say that this random
variable could have values x 1, x 2 dot dot
12
00:01:42,620 --> 00:01:48,650
dot etc., then from this set of values if
you sample this process at various instants
13
00:01:48,650 --> 00:02:02,560
of time say t 1, t 2 and so on; these are
the sampling instants, instants of time or
14
00:02:02,560 --> 00:02:09,020
particular values of the time variable. The
general technical term is epochs, sampling
15
00:02:09,020 --> 00:02:20,930
epochs, and these are elements of the
sample space of the random variable X okay.
16
00:02:20,930 --> 00:02:28,659
Then one asks for the probability that any
of these values is attained at any
17
00:02:28,659 --> 00:02:34,280
given instant of time okay. Now this
is very cumbersome notation, so what I'll
18
00:02:34,280 --> 00:02:41,700
do is to just take the index here and label
this value here by that index okay and for
19
00:02:41,700 --> 00:02:46,469
this index I'll use the symbols j, k,
and so on and so forth. When I have too many
20
00:02:46,469 --> 00:02:53,909
of them I'll call them j 1, j 2, etc.
So the question is what is the probability
21
00:02:53,909 --> 00:03:00,629
for a discrete random variable: what's
the probability that at some instant of
22
00:03:00,629 --> 00:03:12,269
time t 1 the value happens to be some j 1
or x sub j 1.
23
00:03:12,269 --> 00:03:18,090
So I'll call that the one-time probability.
Or if it's a continuous random variable then
24
00:03:18,090 --> 00:03:24,599
I'll interchangeably use this j 1, t
1, but I'll be careful to indicate the fact
25
00:03:24,599 --> 00:03:29,620
that this thing is a continuous variable here.
For the moment of course let's leave it at
26
00:03:29,620 --> 00:03:37,140
discrete and I have this. I could also ask
what's the probability that you have the
27
00:03:37,140 --> 00:03:46,999
value x sub j 2 at time t 2 and the value
x sub j 1 at time t 1. That's a different
28
00:03:46,999 --> 00:03:50,980
function. This is a joint probability.
It's a different function from this; there
29
00:03:50,980 --> 00:03:56,010
are two-time arguments here. To keep track
of that, let me call this P 2 and let me call
30
00:03:56,010 --> 00:04:05,680
this P 1 and clearly this can go on. I look
at the three-time probability, the four-time
31
00:04:05,680 --> 00:04:12,340
probability and so on. Now to specify this
random variable completely, I need to tell
32
00:04:12,340 --> 00:04:21,169
you all these joint probabilities.
So the first thing we learn is that a stochastic
33
00:04:21,169 --> 00:04:27,240
process is described by an infinite hierarchy
of probabilities or in the case of continuous
34
00:04:27,240 --> 00:04:33,819
variables, probability densities; but it's
an infinite hierarchy to start with.
35
00:04:33,819 --> 00:04:39,389
Of course with this formidable problem there's
not much one can do unless you start making
36
00:04:39,389 --> 00:04:44,509
certain simplifying assumptions. But there's
one thing we can do which is not even an assumption
37
00:04:44,509 --> 00:05:02,540
and that's the following.
You can always take the n-time probability.
38
00:05:02,540 --> 00:05:10,180
This n-time probability can always be written
as equal to the probability that we have j
39
00:05:10,180 --> 00:05:17,090
n, t n given that all these earlier things
happen. So I am assuming here of course that
40
00:05:17,090 --> 00:05:25,099
t 1 less than t 2 less than dot dot less than
t n, and I'm writing the earliest times to
41
00:05:25,099 --> 00:05:31,630
the right and the latest times to the left;
that's the standard notation, and this n-time
42
00:05:31,630 --> 00:05:37,600
probability can be written as the product
of a conditional probability and a vertical
43
00:05:37,600 --> 00:05:43,610
bar will denote the probability of whatever
is on this side given whatever is on the right
44
00:05:43,610 --> 00:05:52,310
hand side of the bar. That will be my notation.
So we have j n minus 1, t n minus 1 all the
45
00:05:52,310 --> 00:06:00,160
way up to j 1, t 1.
This is again an n-time argument probability,
46
00:06:00,160 --> 00:06:16,449
so it's still n. But this however is now
a conditional probability. Whereas
47
00:06:16,449 --> 00:06:30,139
this one is just a joint probability multiplied
by the probability that all these events have
48
00:06:30,139 --> 00:06:39,780
occurred, that is j n minus 1, t n minus 1
dot dot up to j 1, t 1, and that's a function
49
00:06:39,780 --> 00:06:50,710
of n minus 1 variables. So it's P n minus
1, and in turn you can take this quantity and
50
00:06:50,710 --> 00:06:56,389
write it as a conditional probability of this
last event occurring given that all the other
51
00:06:56,389 --> 00:07:01,080
events have occurred and so on.
So finally you can write it as a product of
52
00:07:01,080 --> 00:07:07,319
an n-time conditional probability and n minus
1 time, n minus 2 time, right up to the single
53
00:07:07,319 --> 00:07:17,169
time probability P 1 of j 1, t 1. So that's
one simplification one can do right away.
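The chain-rule decomposition just described can be checked numerically. Below is a minimal sketch (my own illustration, not from the lecture): for n = 3, an arbitrary joint distribution P 3 factors as P(j3 | j2, j1) P(j2 | j1) P 1(j1).

```python
import numpy as np

rng = np.random.default_rng(0)
P3 = rng.random((2, 2, 2))   # joint P(j3, j2, j1); latest time first
P3 /= P3.sum()               # normalize to a proper distribution

P2 = P3.sum(axis=0)          # marginal P(j2, j1)
P1 = P2.sum(axis=0)          # marginal P(j1)

for j3 in range(2):
    for j2 in range(2):
        for j1 in range(2):
            cond3 = P3[j3, j2, j1] / P2[j2, j1]  # P(j3 | j2, j1)
            cond2 = P2[j2, j1] / P1[j1]          # P(j2 | j1)
            assert np.isclose(P3[j3, j2, j1], cond3 * cond2 * P1[j1])
print("chain rule verified for every outcome")
```

This holds for any joint distribution; no assumption about the process has been made yet, which is exactly the point of the lecture here.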
54
00:07:17,169 --> 00:07:22,270
But even that's not helpful because you still
have this formidable task of specifying all
55
00:07:22,270 --> 00:07:27,510
these conditional probabilities if these things
have happened okay, and that's about as far
56
00:07:27,510 --> 00:07:30,659
as the general theory goes.
One can proceed further with this and so on
57
00:07:30,659 --> 00:07:35,510
but we are going to restrict ourselves to
a very special instance, a very special kind
58
00:07:35,510 --> 00:07:44,289
of random process where the memory is a short-term
memory in a very specific sense okay. Now
59
00:07:44,289 --> 00:07:51,789
this implies that the probability that this
happens at time t n depends on all that happened
60
00:07:51,789 --> 00:07:58,460
earlier, at earlier instants of time okay.
But this is like saying you have a memory
61
00:07:58,460 --> 00:08:02,629
in this process.
Now, experience with random processes
62
00:08:02,629 --> 00:08:08,530
of various kinds tells us that in nature very
often, if you use the right number of variables,
63
00:08:08,530 --> 00:08:14,169
if you take a complete set of variables in
a very specific sense, then it's short-term
64
00:08:14,169 --> 00:08:19,120
memory that occurs, never long-term memory.
No history dependence in a certain specific
65
00:08:19,120 --> 00:08:24,430
sense. Just to give you
a sort of trivial example,
66
00:08:24,430 --> 00:08:29,530
if you look at Newton's equation for a particle
moving in space, this looks like a second
67
00:08:29,530 --> 00:08:37,320
order differential equation in time okay.
So if you want
68
00:08:37,320 --> 00:08:43,240
to plot a trajectory of a particle you have
to know not only the position of the particle
69
00:08:43,240 --> 00:08:47,170
at a certain instant of time, but also the
slope of the trajectory at that instant of
70
00:08:47,170 --> 00:08:53,260
time. This is like saying really to specify
things completely, the fact that the force
71
00:08:53,260 --> 00:08:57,930
specifies the acceleration rather than the
velocity, tells you that you need both the
72
00:08:57,930 --> 00:09:03,779
initial velocity and the initial position.
Which means that dynamics is really happening
73
00:09:03,779 --> 00:09:09,690
in a phase space comprising the configuration
space of coordinates as well as the velocity
74
00:09:09,690 --> 00:09:14,620
components or the momentum components, right,
and once you put it in terms of those extra
75
00:09:14,620 --> 00:09:19,760
variables, the full set of variables, then the
equations of motion are first-order differential
76
00:09:19,760 --> 00:09:24,490
equations. So the initial state
at any given instant of time will determine,
77
00:09:24,490 --> 00:09:29,220
once you solve the equations of motion,
the future state of the system right.
78
00:09:29,220 --> 00:09:34,730
So that's an example where the dynamics is
really first order in time, so that the future
79
00:09:34,730 --> 00:09:39,070
is determined by the initial condition or
the present and not by how you reach that
80
00:09:39,070 --> 00:09:44,420
present, in exactly the same way as in quantum
mechanics, where the Schrödinger equation
81
00:09:44,420 --> 00:09:48,779
is a first-order differential equation in
time for the state vector. So if you tell
82
00:09:48,779 --> 00:09:53,200
me the state vector at an initial instant
of time and the Hamiltonian which gives you
83
00:09:53,200 --> 00:09:57,339
the rule of evolution, you can predict what
the future state of the system is going to
84
00:09:57,339 --> 00:10:03,800
be in principle.
So this experience tells us that it may be
85
00:10:03,800 --> 00:10:10,790
worthwhile looking at those random processes
or stochastic processes where this conditional
86
00:10:10,790 --> 00:10:16,890
n-time probability is not dependent on the
earlier variables other than the one immediately
87
00:10:16,890 --> 00:10:26,790
preceding it. So it's no longer P n but
it's P of j n, t
88
00:10:26,790 --> 00:10:39,390
n given j n minus 1, t n minus 1, and it's just
a two-time probability; so it's P 2. If this
89
00:10:39,390 --> 00:10:48,621
is equal to this quantity here for all n, so
P 3, P 4, P 5 etc., it doesn't matter; every
90
00:10:48,621 --> 00:10:53,010
one of those things gets truncated to just
this here.
91
00:10:53,010 --> 00:11:15,940
If that happens then it's called a Markov
process. So again to repeat, a Markov process,
92
00:11:15,940 --> 00:11:20,360
it says nothing about the form of the probability
distributions; it doesn't say anything about
93
00:11:20,360 --> 00:11:24,699
whether it's a Gaussian or whatever; those
things come later. It says something about
94
00:11:24,699 --> 00:11:30,610
the level of memory in the process. Sometimes
there are cases where you'd like to have
95
00:11:30,610 --> 00:11:34,810
this depend on the preceding two instants
of time, and then it's called a two-step Markov process
96
00:11:34,810 --> 00:11:37,470
and so on.
But I'm not going to get into that now. This
97
00:11:37,470 --> 00:11:44,570
is our straightforward definition of what
a Markov process is, okay; it doesn't always
98
00:11:44,570 --> 00:11:50,370
have to happen. But it turns out that if you
model physical systems appropriately with
99
00:11:50,370 --> 00:11:54,949
the right number of variables, almost always
you end up with a Markov process. There
100
00:11:54,949 --> 00:11:59,370
are notable exceptions. We'll talk
about a few of them. But the fact is that
101
00:11:59,370 --> 00:12:04,949
in most cases experience tells you how to
model a random process and in general the
102
00:12:04,949 --> 00:12:09,880
most common one that you use is a Markov
process okay.
103
00:12:09,880 --> 00:12:14,440
Now exactly as in the vector example I gave
of a particle moving in space, it might so
104
00:12:14,440 --> 00:12:20,170
happen that the random variable is not a single
random variable but a set of random variables,
105
00:12:20,170 --> 00:12:25,130
coupled random variables. Then it would be
a vector process of some kind maybe, and then
106
00:12:25,130 --> 00:12:29,529
it's a Markov process still in terms of memory,
but there won't be a single index here; instead
107
00:12:29,529 --> 00:12:34,230
you need now several labels here for all the
variables.
108
00:12:34,230 --> 00:12:38,899
So that's a possibility we keep in mind okay,
and that's a matter of notation which we
109
00:12:38,899 --> 00:12:45,630
can sort out if the occasion arises but this
is what I mean by a Markov process, this thing
110
00:12:45,630 --> 00:12:51,170
here. A similar thing for continuous processes,
instead of probabilities the same thing is
111
00:12:51,170 --> 00:12:56,769
true for densities okay, and then I'll call
it a conditional density in this case. But
112
00:12:56,769 --> 00:13:01,500
it's a two-time conditional density here.
As soon as you have this, you immediately
113
00:13:01,500 --> 00:13:05,680
see that this joint probability simplifies
enormously.
114
00:13:05,680 --> 00:13:23,060
So if you make the Markov assumption, this
becomes equal for a Markov process to a product
115
00:13:23,060 --> 00:13:40,300
of P 2 of j r plus 1, t r plus 1 given j
r, t r, and it's a product from r equal to
116
00:13:40,300 --> 00:13:54,920
1 to n minus 1 out here, so the last one is
this guy here, multiplied by a P 1 of j 1, t
117
00:13:54,920 --> 00:14:10,510
1. So it at once simplifies okay into a product
of two-time probabilities multiplied by a
118
00:14:10,510 --> 00:14:20,050
one-time probability P 1 okay. So the problem
now reduces to specifying these two quantities
119
00:14:20,050 --> 00:14:25,279
and once you do that then we have all information
we need for this infinite hierarchy of probabilities
120
00:14:25,279 --> 00:14:28,500
okay.
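The Markov factorization just stated can be made concrete with a small numerical sketch (the 2-state transition matrix T and all numbers are my own assumed example, not the lecturer's): the joint probability of a sampled path is just the product of two-time conditionals along the path, times P 1.

```python
import numpy as np

T = np.array([[0.9, 0.2],    # T[k, j] = P_2(k | j): one-step conditional
              [0.1, 0.8]])   # each column sums to 1
p1 = np.array([0.5, 0.5])    # one-time probability P_1

def path_probability(path):
    """Joint probability of seeing the states in `path` at successive epochs."""
    prob = p1[path[0]]
    for j, k in zip(path, path[1:]):
        prob *= T[k, j]      # one two-time conditional per step
    return prob

# Summing over all length-3 paths recovers total probability 1,
# confirming this is a proper joint distribution.
total = sum(path_probability((a, b, c))
            for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)
```

The same two objects, T and p1, specify the joint probability for any number of sampling epochs, which is the simplification being described.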
So it's a great simplifying assumption, the
121
00:14:28,500 --> 00:14:35,139
Markov assumption immediately
changes the complexion of the whole problem
122
00:14:35,139 --> 00:14:41,449
and makes it a much more tractable problem
to handle okay. As you will see this itself
123
00:14:41,449 --> 00:14:47,380
includes in it enormous amounts of complexity
but it still makes the problem quite tractable.
124
00:14:47,380 --> 00:14:54,029
So we'll focus on such cases here. We'll
look at many examples of Markov processes.
125
00:14:54,029 --> 00:15:01,600
There's a further simplification that
can happen, and that has to do with the fact
126
00:15:01,600 --> 00:15:07,139
that the process that we are talking about
may not change statistically speaking as time
127
00:15:07,139 --> 00:15:13,560
progresses. In other words it could be exactly
the same process statistically; no statistical
128
00:15:13,560 --> 00:15:19,399
properties change as a function of time. In
other words the randomness is not ageing in
129
00:15:19,399 --> 00:15:23,509
some sense. There's no systematic drift or
anything like that.
130
00:15:23,509 --> 00:15:29,089
If that happens that would be the analog of
an autonomous dynamical system where you don't
131
00:15:29,089 --> 00:15:35,200
have explicit time dependence in the way
the dynamical variables evolve, in the dynamical
132
00:15:35,200 --> 00:15:40,970
rules. They will satisfy some kind of differential
equations but then those differential equations
133
00:15:40,970 --> 00:15:46,930
don't explicitly involve the time okay. So
the analog of that here would be a process
134
00:15:46,930 --> 00:15:53,500
where the origin of time doesn't matter, and
therefore this quantity here is a function
135
00:15:53,500 --> 00:16:00,440
only of the elapsed time t r plus 1 minus
t sub r okay. And what would that imply? Such a process
136
00:16:00,440 --> 00:16:10,579
is called a stationary random process.
So stationarity implies statistical properties
137
00:16:10,579 --> 00:16:24,370
don't change with time at all. So it implies
that P 2 of say k, t, given j at time t prime,
138
00:16:24,370 --> 00:16:37,570
this quantity is a function of t minus t prime
and not of t and t prime separately okay.
139
00:16:37,570 --> 00:16:48,000
So you could write this as equal to P 2 of
k, t minus t prime, j, 0. In other words I
140
00:16:48,000 --> 00:16:55,870
can shift the origin of time and nothing happens.
The probabilities don't change okay, and very
141
00:16:55,870 --> 00:17:05,860
often I'm going to make life easier and write
this as P 2 of k, t, j, where k and j are state
142
00:17:05,860 --> 00:17:16,089
labels or they stand for sample space elements.
Iím going to use this kind of notation all
143
00:17:16,089 --> 00:17:24,750
the time. This is t minus t prime, j; I
dropped the 0 here. It's understood that
144
00:17:24,750 --> 00:17:35,710
it's a function of the difference of time arguments
here. What would it imply also for this quantity
145
00:17:35,710 --> 00:17:46,080
P 1 of j, t? This should be independent of
time. So all time dependence disappears in
146
00:17:46,080 --> 00:17:58,640
the one-time probability. So this is equal
to P 1 of j okay. No t dependence at all, and
147
00:17:58,640 --> 00:18:04,990
that together with the Markov assumption here.
So for a stationary Markov process,
148
00:18:04,990 --> 00:18:20,640
this thing here
implies this is equal to a product from r
149
00:18:20,640 --> 00:18:51,220
equal to 1 to n minus 1 P 2 of j r plus 1
t r plus 1 minus t r j r. So we now just have
150
00:18:51,220 --> 00:18:59,650
a two-time, time-dependent conditional
probability to handle,
151
00:18:59,650 --> 00:19:08,650
and an absolute one-time probability
here okay. So a stationary Markov process
152
00:19:08,650 --> 00:19:14,950
is completely defined if you tell me this
quantity as a function of t minus t prime
153
00:19:14,950 --> 00:19:22,400
and this quantity out here okay.
Now all the models we talk about are going
154
00:19:22,400 --> 00:19:28,250
to specify these two quantities okay, and if
there is no confusion, once we reach that
155
00:19:28,250 --> 00:19:33,929
stage I'll often drop the 1 and the 2. The
moment there's a time argument and there
156
00:19:33,929 --> 00:19:37,919
are these arguments with this bar I know I
am talking about a conditional density or
157
00:19:37,919 --> 00:19:46,320
probability, and this is a probability itself
in this case. You could put in one more bit
158
00:19:46,320 --> 00:19:53,000
of physical assumption or physical input,
and that's the following, although this is
159
00:19:53,000 --> 00:19:56,390
not absolutely essential. In general we won't
need it.
160
00:19:56,390 --> 00:20:06,390
But it will so turn out that you could ask
what happens to this quantity k, t, j as t
161
00:20:06,390 --> 00:20:15,130
tends to infinity okay. Notice I've dropped
this 2 here. It's supposed to be there, but
162
00:20:15,130 --> 00:20:21,100
I just dropped it for convenience. What would
you expect would happen to this quantity,
163
00:20:21,100 --> 00:20:27,960
this probability, conditional probability
as t tends to infinity. Well you might expect,
164
00:20:27,960 --> 00:20:32,429
intuitively you might expect that this quantity
should tend to something which depends on
165
00:20:32,429 --> 00:20:37,779
k but shouldn't depend on the initial condition
j, the initial state j.
166
00:20:37,779 --> 00:20:43,890
As t becomes very long, memory is lost completely.
So I would kind of expect in the same way
167
00:20:43,890 --> 00:20:49,720
I expect autocorrelations to die down and
so on and so forth. I'd expect that this
168
00:20:49,720 --> 00:20:58,470
tends to something which depends only on k,
and therefore it's just the probability of k okay,
169
00:20:58,470 --> 00:21:06,980
with a 1 here, but this needs to be established.
We need to make sure this really happens okay.
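This loss-of-memory expectation can be illustrated with a small mixing chain (an assumed two-state example of my own, not from the lecture): iterating the one-step conditional probabilities makes both columns of the long-time propagator identical, so the limit depends only on the final state k and not on the initial state j.

```python
import numpy as np

T = np.array([[0.9, 0.2],
              [0.1, 0.8]])               # T[k, j] = P(k, one step | j)

Tn = np.linalg.matrix_power(T, 200)      # long-time conditional P(k, t | j)
assert np.allclose(Tn[:, 0], Tn[:, 1])   # columns agree: j is forgotten
print(Tn[:, 0])                          # the limiting distribution P_1(k)
```

For this matrix the limiting column is (2/3, 1/3), the eigenvector of T with eigenvalue 1, playing the role of the equilibrium (Maxwellian-like) distribution mentioned below.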
170
00:21:06,980 --> 00:21:11,700
On the other hand, take the
common example of some system in thermodynamic
171
00:21:11,700 --> 00:21:16,710
equilibrium; I'd expect the statistical
properties aren't changing.
172
00:21:16,710 --> 00:21:21,890
Then if I choose a particular initial condition
and ask what happens conditioned upon that
173
00:21:21,890 --> 00:21:29,190
initial state if some variable changes with
time and I find some expression for the probability
174
00:21:29,190 --> 00:21:34,850
associated with it I could ask what happens
if time elapses, a long time elapses and the
175
00:21:34,850 --> 00:21:39,960
system, nothing is happening to it statistically,
I'd expect this relation to hold
176
00:21:39,960 --> 00:21:42,830
good.
For instance if this was the velocity of a
177
00:21:42,830 --> 00:21:50,789
molecule and I start with a particular molecule
whose velocity is some given number I specify
178
00:21:50,789 --> 00:21:55,470
and then I let it loose among all the other
molecules and I ask what's the probability
179
00:21:55,470 --> 00:22:00,990
or probability density that it has a certain
given velocity a long long time after I started
180
00:22:00,990 --> 00:22:07,940
I'd expect it to just attain the equilibrium
density all over again okay. So I'd expect
181
00:22:07,940 --> 00:22:14,110
it would tend to the Maxwellian distribution
on this side independent of what initial velocity
182
00:22:14,110 --> 00:22:18,410
I started with okay.
Well, that's a physical expectation. If the
183
00:22:18,410 --> 00:22:22,809
system has enough junk in it and there are
enough influences which are completely independent
184
00:22:22,809 --> 00:22:28,690
of each other randomizing the whole process
then I would expect this to happen. In technical
185
00:22:28,690 --> 00:22:34,330
terms one says that if the dynamical system has
a sufficient degree of what's called mixing,
186
00:22:34,330 --> 00:22:40,510
this will be true in general. So we will take
a look at examples when this happens.
187
00:22:40,510 --> 00:22:46,130
But remember that we've already assumed that
it's a stationary process okay. If it's
188
00:22:46,130 --> 00:22:50,090
not stationary then of course this is not
true; there's a time argument
189
00:22:50,090 --> 00:22:57,020
sitting here and it could well be that the
initial state is remembered okay. So this
190
00:22:57,020 --> 00:23:01,529
brings an incredible amount of simplification
once you have this. The moment you have a
191
00:23:01,529 --> 00:23:07,390
property like this, it means the entire process
is determined completely by this one-time
192
00:23:07,390 --> 00:23:14,690
conditional probability, because from that
you get this, the zero-time thing, and you get
193
00:23:14,690 --> 00:23:20,740
all the other joint probabilities as well
through this formula.
194
00:23:20,740 --> 00:23:28,380
So a stationary Markov process with this property
here of mixing actually is determined completely
195
00:23:28,380 --> 00:23:35,680
by determining this conditional probability
or density, and then
196
00:23:35,680 --> 00:23:42,669
it reduces to a question of writing down equations
for this probability in general okay. So the
197
00:23:42,669 --> 00:23:46,309
processes we will look at, a large number
of them will fall into this category and we
198
00:23:46,309 --> 00:23:50,410
will write down specific equations for this
quantity.
199
00:23:50,410 --> 00:23:55,230
If you think a little bit you realize that
any modeling that you do for physical systems
200
00:23:55,230 --> 00:24:01,080
of probabilities would always be to write
down equations for conditional probabilities
201
00:24:01,080 --> 00:24:06,691
or probability densities. You need to know
given something then whatís the probability
202
00:24:06,691 --> 00:24:11,460
of something else happening and so on. You
never say something about absolute probabilities
203
00:24:11,460 --> 00:24:19,029
itself. It's always conditional probabilities.
So, conveniently for us, joint probabilities
204
00:24:19,029 --> 00:24:24,700
reduce to conditional probabilities okay.
So all we need to do is to model these conditional
205
00:24:24,700 --> 00:24:31,309
probabilities appropriately and then we are
done okay. So it's important to distinguish
206
00:24:31,309 --> 00:24:38,029
between several assumptions here. First the
Markov assumption has reduced things to one-step
207
00:24:38,029 --> 00:24:44,110
memory if you like and then the stationarity
assumption reduces time arguments in this
208
00:24:44,110 --> 00:24:47,950
fashion here.
And it's important to remember that it does
209
00:24:47,950 --> 00:24:54,370
so for an arbitrary n. No matter how many
time arguments you have out here this conditional
210
00:24:54,370 --> 00:24:59,650
probability depends only on the preceding
instant of time okay. That instant is not
211
00:24:59,650 --> 00:25:07,090
specified; it's arbitrary, some earlier instant
of time, and that's it. That's all you need,
212
00:25:07,090 --> 00:25:10,890
and then if it is true for every such earlier
instant of time you have a Markov process
213
00:25:10,890 --> 00:25:14,940
okay.
So in a sense this process is kind of renewing
214
00:25:14,940 --> 00:25:19,470
itself: at any instant of time it's forgotten
the past, and now it looks at what it does
215
00:25:19,470 --> 00:25:25,409
next in the future. So it's not surprising
that there are going to be renewal equations
216
00:25:25,409 --> 00:25:32,059
and so on associated with this sort of process
okay. For instance, you could ask: can I write
217
00:25:32,059 --> 00:25:42,279
down an equation for this P? And now let's
use symbols like j, k, l etc., because we
218
00:25:42,279 --> 00:25:48,960
are not going to deal with these n time probabilities
anymore but essentially just one-step memory.
219
00:25:48,960 --> 00:26:00,039
So let's simplify notation and ask what is
this likely to be: k, t, j, with 0 on this side
220
00:26:00,039 --> 00:26:12,070
okay. Now clearly, if it's a Markov process,
which has this property of renewing itself
221
00:26:12,070 --> 00:26:22,170
all the time, let's look at a case where
j, k, etc., can take values 1, 2, up to some
222
00:26:22,170 --> 00:26:28,710
N. In other words, the sample space is discrete
and you have capital N of these possible values.
223
00:26:28,710 --> 00:26:34,539
We could of course subsequently look at cases
where N tends to infinity or becomes continuous
224
00:26:34,539 --> 00:26:41,950
and so on okay.
And this is equal, on this side, to the probability
225
00:26:41,950 --> 00:26:54,800
that you started with j
and reached some intermediate state l at some
226
00:26:54,800 --> 00:27:06,250
intermediate time t prime. So on the time
axis here is 0, and here is t prime, here
227
00:27:06,250 --> 00:27:21,610
is t, and in the remaining time you move from
l to k. So let's write it out properly.
228
00:27:21,610 --> 00:27:31,220
P of l, t prime, j: we started with j and reached
an intermediate state l, and then the probability
229
00:27:31,220 --> 00:27:39,700
that you went from that l in the remaining
time t minus t prime to the state k.
230
00:27:39,700 --> 00:27:44,470
But you could have done so through a variety
of paths, all kinds of intermediate states
231
00:27:44,470 --> 00:27:57,090
l would have been allowed. So you have here
a summation l equal to 1 to N in this fashion
232
00:27:57,090 --> 00:28:03,080
okay. So for a stationary Markov process,
this tells you, because it's not dependent
233
00:28:03,080 --> 00:28:09,080
on any earlier instants (the memory is one-step),
that the probability of going
234
00:28:09,080 --> 00:28:14,650
from an initial state j to a final state k
in time t is the probability of going from
235
00:28:14,650 --> 00:28:21,510
j to l at some intermediate time
t prime, and then in the remaining time going
236
00:28:21,510 --> 00:28:26,059
from l to the final state k, this desired
state k okay.
237
00:28:26,059 --> 00:28:31,540
And you must sum over all the intermediate
possibilities (()) (28:30), and that's the
238
00:28:31,540 --> 00:28:55,660
summation over l out there okay. This is like
a chain equation. It's got a technical name.
239
00:28:55,660 --> 00:29:12,250
It's called the Chapman–Kolmogorov equation.
It should really be called the Chapman–Kolmogorov–
240
00:29:12,250 --> 00:29:19,350
Bachelier–Smoluchowski equation, etc.; several
people were associated with this equation.
241
00:29:19,350 --> 00:29:24,880
But it's popularly called the Chapman–Kolmogorov
equation in this case okay.
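For a stationary chain sampled at discrete time steps, the Chapman–Kolmogorov equation P(k, t | j) = sum over l of P(k, t − t′ | l) P(l, t′ | j) is just matrix multiplication of the one-step conditional probabilities. A numerical sketch (the matrix and the times are my own assumed choices):

```python
import numpy as np

T = np.array([[0.9, 0.2],
              [0.1, 0.8]])     # one-step conditional probabilities T[k, j]
t, tp = 7, 3                   # total time t and an intermediate time t'

# P(k, t | j) = sum_l P(k, t - t' | l) P(l, t' | j)  <=>  T^t = T^(t-t') T^(t')
lhs = np.linalg.matrix_power(T, t)
rhs = np.linalg.matrix_power(T, t - tp) @ np.linalg.matrix_power(T, tp)
assert np.allclose(lhs, rhs)
print("Chapman-Kolmogorov holds for this intermediate t'")
```

The matrix product makes the sum over all intermediate states l automatic, and the result is the same for every choice of t′ between 0 and t, as the discussion below emphasizes.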
242
00:29:24,880 --> 00:29:30,480
Now if these were continuous random variables,
then you'd have to integrate over this state,
243
00:29:30,480 --> 00:29:35,460
the intermediate state l, rather than sum over
it, but that's a matter of notation in this
244
00:29:35,460 --> 00:29:46,320
case okay. What's the first
thing that strikes you about this equation?
245
00:29:46,320 --> 00:29:54,000
Well, first let me say that this is not restricted
to Markov processes. There are other processes
246
00:29:54,000 --> 00:29:59,049
which also obey the chain equation, but Markov
processes obey it. So it's not uniquely
247
00:29:59,049 --> 00:30:02,419
a property of Markov processes.
"Professor - student conversation starts"
248
00:30:02,419 --> 00:30:06,830
Ya. Pardon me. You are fixing t prime here.
We are not fixing t prime. So this is true
249
00:30:06,830 --> 00:30:18,510
for any t prime in (0, t). I think for each
term the t prime that we choose is the same, right,
250
00:30:18,510 --> 00:30:27,110
when we make the sum. Yes, yes, of course. Yes,
certainly. You must sum over all intermediate
251
00:30:27,110 --> 00:30:28,870
states at some intermediate instant of time.
"Professor - student conversation ends".
252
00:30:28,870 --> 00:30:33,010
So if you draw a picture, here's the initial
state, here's the final state. Here are all
253
00:30:33,010 --> 00:30:37,660
the possible intermediate states. We are
propagating from here to there, here
254
00:30:37,660 --> 00:30:45,159
to here in this fashion, and there's a time
slice here at this point at time t prime.
255
00:30:45,159 --> 00:30:51,850
So you are summing over all those possibilities
and adding the probabilities appropriately
256
00:30:51,850 --> 00:31:05,630
to get this right here okay. So what is it
that strikes you about this equation immediately?
257
00:31:05,630 --> 00:31:12,639
As a mathematical equation, this is not as
tractable as it looks, because it's a non-linear
258
00:31:12,639 --> 00:31:19,809
equation. This equation here is not linear
in this P okay, and therefore it's a fairly
259
00:31:19,809 --> 00:31:25,020
complicated equation. Itís not immediately
obvious what the solution will be okay.
260
00:31:25,020 --> 00:31:30,269
"Professor - student conversation starts"
Yes. In the first P, is it t plus
261
00:31:30,269 --> 00:31:39,330
t prime or t minus t prime? Well, the time
interval left here is t minus t prime. So
262
00:31:39,330 --> 00:31:43,399
thatís all the time available for the system
to go from the intermediate state to the final
263
00:31:43,399 --> 00:31:50,360
state. So it's this interval multiplied by
that interval. Also, this equation holds for
264
00:31:50,360 --> 00:31:55,390
some stationary processes but they need not
be Markov, is that right? "Professor - student
265
00:31:55,390 --> 00:31:59,429
conversation ends".
Well, this chain equation, yes. They are stationary
266
00:31:59,429 --> 00:32:04,299
processes, but there's a wider class of processes
called renewal processes for which this equation
267
00:32:04,299 --> 00:32:08,270
would also hold good. It's
an example of what's called a renewal equation,
268
00:32:08,270 --> 00:32:14,289
right. But we are concerned here with Markov
processes okay. So I am not going to get into
269
00:32:14,289 --> 00:32:18,779
the technicality of looking at processes other
than that. If time permits we will talk about
270
00:32:18,779 --> 00:32:25,600
such renewal processes later on.
When we do Poisson processes and so on then
271
00:32:25,600 --> 00:32:33,159
I'll mention what happens if you look at
a more general case here. So this nonlinearity
272
00:32:33,159 --> 00:32:39,049
makes it intractable in some sense, and if
it's a continuous variable then for the probability
273
00:32:39,049 --> 00:32:43,730
densities you have an integral equation because
there is an integral on the right hand side
274
00:32:43,730 --> 00:32:49,179
which is nonlinear and therefore fairly hard
to solve. It would be convenient to write
275
00:32:49,179 --> 00:32:55,920
this in terms of a linear equation for this
P. For this purpose one introduces the following
276
00:32:55,920 --> 00:33:02,590
idea. It doesn't always work, but when it does
this is what happens.
277
00:33:02,590 --> 00:33:13,320
So one introduces the idea of a transition
rate
278
00:33:13,320 --> 00:33:19,120
and the idea is the following. Consider this
probability here for extremely small values
279
00:33:19,120 --> 00:33:26,070
of t, very close to 0, or this probability
for extremely small values of t minus t
280
00:33:26,070 --> 00:33:38,700
prime close to 0. So if you look at P of k,
delta t, j over here, this is state j at time
281
00:33:38,700 --> 00:33:45,750
0 and this is state k at an infinitesimal
time delta t. What would you expect this to
282
00:33:45,750 --> 00:33:49,659
be proportional to?
If delta t goes to 0, I'd expect that it is
283
00:33:49,659 --> 00:33:53,880
going to remain in the initial state. I'd
expect a delta function there, right. But if
284
00:33:53,880 --> 00:34:00,360
delta t is infinitesimal then I would expect
that this quantity for all k not equal to
285
00:34:00,360 --> 00:34:09,960
j, for all k not equal to j this must be of
the form some delta t multiplied by w k, j
286
00:34:09,960 --> 00:34:18,040
where this quantity is a transition probability
per unit time that the system jumps from the
287
00:34:18,040 --> 00:34:24,090
state j to the state k okay.
I'd expect the answer to be proportional
288
00:34:24,090 --> 00:34:29,430
to delta t, and the constant of proportionality
is a probability per unit time.
289
00:34:29,430 --> 00:34:46,600
So this must have dimensions 1 over time okay
and this is a transition probability or rate
290
00:34:46,600 --> 00:35:05,220
to jump. No guarantee that this exists. No
guarantee at all this exists okay. But if
291
00:35:05,220 --> 00:35:11,480
it does then it has the physical connotation
of a transition rate because when you multiply
292
00:35:11,480 --> 00:35:18,950
it by the time interval delta t you get the
actual probability, conditional probability
293
00:35:18,950 --> 00:35:25,250
okay.
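In symbols, the short-time behaviour just described is

```latex
P(k,\,\delta t \mid j) \;=\; w_{kj}\,\delta t \;+\; o(\delta t),
\qquad k \neq j,
```

so that w_{kj}, whenever this limit exists, is the transition probability per unit time from state j to state k.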
The same thing could well be true for even
294
00:35:25,250 --> 00:35:31,420
a non-stationary process. What would happen
in that case if I had a t plus delta t here.
295
00:35:31,420 --> 00:35:40,350
So if I have a non-stationary process of the
form k, t plus delta t, j at time t you could
296
00:35:40,350 --> 00:35:46,550
still assume that if delta t is sufficiently
small and k is not equal to j, this should
297
00:35:46,550 --> 00:35:52,880
be proportional to delta t multiplied by a
transition probability but that transition
298
00:35:52,880 --> 00:35:58,520
rate would depend on time right.
So the generalization of this idea of a transition
299
00:35:58,520 --> 00:36:05,120
rate to a non-stationary process is fairly
straightforward. This would again become equal
300
00:36:05,120 --> 00:36:17,550
to w of k, j, and then
a t here to show that the transition rate
301
00:36:17,550 --> 00:36:22,210
itself could change as a function of time
because the statistical properties are changing
302
00:36:22,210 --> 00:36:27,970
with time. So the great advantage of having
made the stationarity assumption is that the
303
00:36:27,970 --> 00:36:34,202
transition rates are independent of time okay.
So this is a very physical thing that we are
304
00:36:34,202 --> 00:36:43,100
talking about. If I make that assumption,
then what's the next step? What's going
305
00:36:43,100 --> 00:36:48,700
to happen here? Well the obvious thing to
do is to say let's make t minus t prime delta
306
00:36:48,700 --> 00:36:59,960
t, and then for this quantity
put that expression in, and there'd be
307
00:36:59,960 --> 00:37:03,660
things proportional to delta t.
The obvious thing to do is to subtract from
308
00:37:03,660 --> 00:37:13,190
this P of k, t minus delta t, given j from both
sides and then divide through by delta t
309
00:37:13,190 --> 00:37:18,890
and convert it to a differential equation.
So this is what one would do immediately right.
310
00:37:18,890 --> 00:37:28,020
So I leave that to you as an exercise and
it's not hard to show that with this assumption
311
00:37:28,020 --> 00:37:42,090
this equation translates to: d over dt of P
of k, t, j equals summation, l equal
312
00:37:42,090 --> 00:38:04,630
to 1 to N, and now we have to be a little
careful. P of l, t, j; w of k, l; l not equal to
313
00:38:04,630 --> 00:38:19,630
k this side minus
314
00:38:19,630 --> 00:38:29,310
because you subtracted this quantity you end
up with a minus sign here, and now let's look
315
00:38:29,310 --> 00:38:33,490
at this equation carefully.
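Carried out explicitly (set t minus t prime equal to delta t, use the short-time form of the propagator, subtract P(k, t - delta t | j) from both sides, divide by delta t, and use the normalization of the probabilities), the steps described yield the master equation

```latex
\frac{d}{dt}\,P(k,\,t \mid j)
\;=\; \sum_{\substack{l=1 \\ l \neq k}}^{N}
\Bigl[\, w_{kl}\,P(l,\,t \mid j) \;-\; w_{lk}\,P(k,\,t \mid j) \,\Bigr],
```

with the first term the gain into state k and the second the loss out of it.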
So the trick is to subtract from this both
316
00:38:33,490 --> 00:38:41,330
sides of this equation, subtract the following
quantity: first set t minus
317
00:38:41,330 --> 00:38:50,650
t prime equal to delta t and subtract P of
k, t minus delta t which is t prime by the
318
00:38:50,650 --> 00:38:57,260
way from both sides and put that in and maneuver.
"Professor - student conversation starts"
319
00:38:57,260 --> 00:39:03,910
Sir. Ya. When we are considering the product
of probabilities, yes, the Chapman-Kolmogorov equation,
320
00:39:03,910 --> 00:39:10,960
why are we not considering all possible times?
Ah, it's not necessary. Any time will be
321
00:39:10,960 --> 00:39:13,460
true. Okay. "Professor - student conversation
ends".
322
00:39:13,460 --> 00:39:19,270
So look at it physically like the picture
I drew. You want to start at t equal to 0
323
00:39:19,270 --> 00:39:25,910
at this point, and at time t you want to reach
324
00:39:25,910 --> 00:39:33,410
this point. You are starting in this
state, ending in this state, and you have many
routes to go through with different probabilities
325
00:39:33,410 --> 00:39:38,560
and now the statement is the probability to
go from here to there, the total probability
326
00:39:38,560 --> 00:39:43,870
is the sum of all these individual probabilities
such that you go from here to here at some
327
00:39:43,870 --> 00:39:49,190
time t prime and then you traverse the rest
of the way and it doesn't matter where you
328
00:39:49,190 --> 00:39:56,460
take the time slice okay.
These quantities are mutually exclusive. They
329
00:39:56,460 --> 00:40:01,990
are different intermediate states which is
the reason you sum over them okay. So when you
330
00:40:01,990 --> 00:40:07,940
sum over these probabilities, what's
the meaning of the word "and"? It means if you
331
00:40:07,940 --> 00:40:13,530
have several possibilities, you sum over their
probabilities; this and this and this and
332
00:40:13,530 --> 00:40:18,590
this. If you have "or" then of course it's
a different story. Sorry, if it's "and" you
333
00:40:18,590 --> 00:40:23,210
multiply the probabilities, which is what I
have done, but if you have "or" you sum over
334
00:40:23,210 --> 00:40:26,460
them, and that's what I have done because
they are mutually exclusive.
335
00:40:26,460 --> 00:40:31,940
This is different from this, different from
this. So it's only at the same instant of
336
00:40:31,940 --> 00:40:38,320
time that these are all mutually exclusive
possibilities okay. So it's worth pointing
337
00:40:38,320 --> 00:40:45,220
this out. It's not an equation in time. It's
not an integral in time. There are such renewal
338
00:40:45,220 --> 00:40:49,710
equations. We'll talk about them subsequently.
But this is a summation over intermediate
339
00:40:49,710 --> 00:40:56,170
states here at any given instant of time in
between okay and therefore I can choose that
340
00:40:56,170 --> 00:41:01,260
intermediate time as I
please. I choose it to be infinitesimal.
341
00:41:01,260 --> 00:41:05,300
"Professor - student conversation starts"
Shouldn't it be a double integral, a double
342
00:41:05,300 --> 00:41:10,780
summation? No, no, that will be over-counting,
that will be over-counting. This is the physical
343
00:41:10,780 --> 00:41:18,641
way to look at it. This is over-counting because
these paths could intersect and so on. So
344
00:41:18,641 --> 00:41:24,120
you definitely have to do this at one instant
of time. So you add over mutually exclusive
345
00:41:24,120 --> 00:41:27,660
events okay. "Professor - student conversation
ends".
346
00:41:27,660 --> 00:41:31,600
Now let's look at this equation a little
bit. So this derivation is something I'm
347
00:41:31,600 --> 00:41:35,770
going to leave to you, it's straightforward
enough. But what's the interpretation of
348
00:41:35,770 --> 00:41:42,760
this equation. It says the conditional probability
to go from j to k, the rate of change
349
00:41:42,760 --> 00:41:50,990
of this probability has 2 contributions. One
is a gain term out here where you go from
350
00:41:50,990 --> 00:41:57,070
j to l an intermediate state multiplied by
the probability per unit time that you go
351
00:41:57,070 --> 00:42:07,340
from l to the final state desired k.
That's the gain term
352
00:42:07,340 --> 00:42:14,840
and this is the loss term, exactly like in
the rate equation, because you've gone from
353
00:42:14,840 --> 00:42:21,260
j to k, the state that you want, but then
you jump out of it with this transition rate
354
00:42:21,260 --> 00:42:29,120
with this probability here okay. So the input
into this is first you do this then you subtract
355
00:42:29,120 --> 00:42:36,280
this and then you use conservation of probability.
You use the fact that if you start with P
356
00:42:36,280 --> 00:42:52,831
k, t, j and you sum over all k out here, all
k now from 1 to N, what should you get? You
357
00:42:52,831 --> 00:42:56,920
should get 1 because you start with a state
and the system has not disappeared, it's
358
00:42:56,920 --> 00:43:01,820
in one of the states available to it including
the initial state itself, so when you include
359
00:43:01,820 --> 00:43:10,430
that the sum should be equal to 1 for all
t okay. This is equal to 1 for all t greater
360
00:43:10,430 --> 00:43:15,000
than or equal to 0. That's input, that's put
in.
361
00:43:15,000 --> 00:43:23,920
You need to put that 1 in and that's how
you get this minus term appropriately okay.
362
00:43:23,920 --> 00:43:29,230
So the interpretation is quite clear. The
rate of change of this probability: it increases
363
00:43:29,230 --> 00:43:35,320
when you have gain and it depletes when you
have a loss and this is the precise equation
364
00:43:35,320 --> 00:43:42,001
for it okay. This is called the master equation.
This term is used in many, many contexts, but
365
00:43:42,001 --> 00:43:56,340
this is the most common context okay. Now
what's the great advantage of this master
366
00:43:56,340 --> 00:44:00,950
equation.
It's a linear equation. The price you pay
367
00:44:00,950 --> 00:44:06,970
for it of course is that it becomes a differential
equation here in time, a first order differential
368
00:44:06,970 --> 00:44:16,000
equation okay. But it's a linear equation.
The matter is not so simple even now because
369
00:44:16,000 --> 00:44:21,890
in general if itís a continuous variable
then these would be probability, conditional
370
00:44:21,890 --> 00:44:27,350
probability densities and this would be an
integral. So then you have an integro-differential
371
00:44:27,350 --> 00:44:31,970
equation, linear but an integro-differential
equation, and that's not so simple to solve
372
00:44:31,970 --> 00:44:36,070
either okay. In fact we are going to look
at that.
373
00:44:36,070 --> 00:44:39,520
What will happen in that case is that this
side will get converted; there would have been an
374
00:44:39,520 --> 00:44:44,450
integral here. We can get rid of that integral
but we will get it converted to a partial
375
00:44:44,450 --> 00:44:49,270
differential equation in the variable itself
but it will unfortunately be an infinite order
376
00:44:49,270 --> 00:44:55,430
partial differential equation in general okay
at least formally and then we look at further
377
00:44:55,430 --> 00:45:00,440
cases, sub cases etc. But at the moment we
are talking about discrete variables with
378
00:45:00,440 --> 00:45:05,760
discrete sample spaces.
Then this is what you have as the master equation
379
00:45:05,760 --> 00:45:13,330
okay. Now when you do chemical reactions you
write down rate equations for the concentrations
380
00:45:13,330 --> 00:45:17,770
of various species. You have precisely the
same sort of equation, set of equations. You
381
00:45:17,770 --> 00:45:26,010
have things which are gain terms and loss
terms of this kind. So this is often called
382
00:45:26,010 --> 00:45:29,740
a rate equation or something like that but
in this context these are equations for the
383
00:45:29,740 --> 00:45:36,160
conditional probability, probability itself
okay. So the next task is to solve this.
384
00:45:36,160 --> 00:45:39,330
By the way, what's the initial condition?
It's a first order differential equation
385
00:45:39,330 --> 00:45:45,180
in time, so we need an initial condition
to solve it.
386
00:45:45,180 --> 00:45:59,090
And of course P of k, 0, j equal to delta
k j. Given that you are starting in the state
387
00:45:59,090 --> 00:46:07,260
j at t equal to 0, of course at t equal to
0 this becomes delta k j okay. Now in a slightly
388
00:46:07,260 --> 00:46:12,451
more general context you could look at this;
j is sitting here as a dummy variable, as
389
00:46:12,451 --> 00:46:20,130
a sort of spectator throughout. You could
write such an equation for the probabilities
390
00:46:20,130 --> 00:46:27,190
themselves without putting this j in and then
specify an initial distribution of j's.
391
00:46:27,190 --> 00:46:31,320
Then the initial condition would not be a
delta function but some appropriate distribution.
392
00:46:31,320 --> 00:46:39,790
We'll look at those cases as well. But this
is the task one has to now attack, this quantity
393
00:46:39,790 --> 00:46:47,510
here. Now let's see what we can do about
this. The first thing to do is to notice that
394
00:46:47,510 --> 00:46:54,950
if j and k run from 1 to N, these indices
run from 1 to N then this equation has the
395
00:46:54,950 --> 00:47:09,070
following structure.
Let me write P of 1, t, j; well, let me
396
00:47:09,070 --> 00:47:21,860
write this as a
column vector P of t given j, so let
397
00:47:21,860 --> 00:47:27,980
me suppress this j index for a moment because
it's a spectator sitting out here. For every
398
00:47:27,980 --> 00:47:39,740
j this is true, for each and every j. You
have such a master equation. So let me
399
00:47:39,740 --> 00:47:46,390
suppress that for a moment and write this
P of t to be a column vector which is P of
400
00:47:46,390 --> 00:47:57,720
1, t; P of 2, t up to P of N, t with j's,
understood, on the right hand side after the
401
00:47:57,720 --> 00:48:00,620
bar.
If I have defined a column vector of that
402
00:48:00,620 --> 00:48:14,840
kind then this equation here takes the form
d over dt P of t equal to some W P of t. W
403
00:48:14,840 --> 00:48:39,080
is a matrix of some kind okay and what are
the elements of W? W j k is just w j k. This
404
00:48:39,080 --> 00:48:52,791
is for j not equal to k. On the other hand
the diagonal elements of this matrix, remember
405
00:48:52,791 --> 00:49:00,630
here it's l that is getting summed over.
So in that sense this term comes out of
406
00:49:00,630 --> 00:49:06,950
the summation and this fellow multiplies the
sum over l okay.
407
00:49:06,950 --> 00:49:16,361
So it's immediately clear on a moment's thought
that W kk is equal to minus the sum of all the
408
00:49:16,361 --> 00:49:29,460
other elements in that column okay. So this
is equal to minus summation W l k,
409
00:49:29,460 --> 00:49:49,360
l equal to 1 to N, l not
equal to k. So you can rewrite this set of linear
410
00:49:49,360 --> 00:49:56,610
equations in the form of a matrix equation
with a certain column vector P which determines
411
00:49:56,610 --> 00:50:00,110
all the probabilities that you want, conditional
probabilities.
412
00:50:00,110 --> 00:50:06,190
And then on the right hand
side you have this square N by N matrix
413
00:50:06,190 --> 00:50:12,470
acting on this column vector where this matrix
has off diagonal elements which are all the
414
00:50:12,470 --> 00:50:18,350
transition probabilities and the diagonal
elements are minus the sum of the rest of
415
00:50:18,350 --> 00:50:29,430
the elements. That's a very special kind
of matrix because it says the sum of the elements
416
00:50:29,430 --> 00:50:36,370
of every column of this matrix is 0 okay.
Now what does that tell us immediately about
417
00:50:36,370 --> 00:50:45,480
the eigenvalues of this matrix? Well the determinant
is 0 because the sum of each column is 0 so
418
00:50:45,480 --> 00:50:52,310
the determinant is 0 right? The moment the
determinant is 0, you know that 0 is an eigenvalue
419
00:50:52,310 --> 00:51:01,060
of this matrix right. So this means that this
equation in general, this equation would have
420
00:51:01,060 --> 00:51:09,550
an eigenvector. You expect it to have a nontrivial
eigenvector such that W on P is 0 which would
421
00:51:09,550 --> 00:51:15,800
imply that d over dt of that P is 0 which
would imply that this is a stationary distribution.
422
00:51:15,800 --> 00:51:22,710
It doesn't depend on time at all okay. So
this is buried in it, this whole thing is
423
00:51:22,710 --> 00:51:27,200
buried in it and we will see what happens.
Of course there are other eigenvalues as well.
424
00:51:27,200 --> 00:51:34,750
What's the formal solution to this equation?
I have an equation of this kind, what's the
425
00:51:34,750 --> 00:51:40,160
formal solution. Well, it depends on the initial
condition right? Now what would the initial condition
426
00:51:40,160 --> 00:51:46,291
be? We know that at t equal to 0 this quantity
here is a delta k j.
427
00:51:46,291 --> 00:51:53,250
I've written this equation here. This is
the k index and I have suppressed the j index.
428
00:51:53,250 --> 00:52:00,980
So at t equal to 0, what's P of 0? It's
going to have 0's everywhere except at the
429
00:52:00,980 --> 00:52:10,340
j th element where you'd have 1. So you have got
to solve this equation with the initial condition
430
00:52:10,340 --> 00:52:19,450
that P of 0 equal to 0, 0 etc., till you hit
a 1 and then 0's again and this will be the
431
00:52:19,450 --> 00:52:37,400
j th, it will be in the j th row okay. Now
given that initial condition, what's the
432
00:52:37,400 --> 00:52:43,770
formal solution to this equation; the exponential.
Because this W is independent of time and
433
00:52:43,770 --> 00:52:51,050
what's the physical assumption that made
W independent of time, stationarity, stationarity.
434
00:52:51,050 --> 00:53:08,470
We assumed it was a stationary process. Otherwise
it's not true okay. You still have the formidable
435
00:53:08,470 --> 00:53:14,570
task of exponentiating this matrix. But we
know in principle what's going to happen.
436
00:53:14,570 --> 00:53:20,780
If this matrix has eigenvalues lambda 1, lambda
2 to lambda N, then in general, generically,
437
00:53:20,780 --> 00:53:24,860
barring repeated eigenvalues and so on we
are going to have terms on the right hand
438
00:53:24,860 --> 00:53:29,700
side which go like e to the lambda 1 t, e
to the lambda 2 t and so on.
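As a numerical illustration of the formal solution P(t) = exp(Wt) P(0) (again with made-up rates, not the lecture's example): for large t the e to the lambda t transients die out, and exp(Wt) P(0) tends to the normalized null vector of W, the stationary distribution.

```python
import numpy as np
from scipy.linalg import expm   # matrix exponential

# Hypothetical rates: w[k, l] is the rate for the jump from state l to state k.
w = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [4.0, 1.0, 0.0]])
W = w - np.diag(w.sum(axis=0))         # columns of W sum to zero

P0 = np.array([1.0, 0.0, 0.0])         # delta-function initial condition
Pt = expm(W * 50.0) @ P0               # formal solution at a late time

# Stationary distribution: the normalized eigenvector of W with eigenvalue 0.
vals, vecs = np.linalg.eig(W)
v = np.real(vecs[:, np.argmin(np.abs(vals))])
p_stationary = v / v.sum()

print(np.allclose(Pt, p_stationary))   # -> True
```

Probability is conserved throughout (Pt sums to 1), and the memory of the delta-function initial condition is completely lost at late times.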
439
00:53:29,700 --> 00:53:35,110
So they are going to be exponentials of the
eigenvalue multiplied by time. "Professor
440
00:53:35,110 --> 00:53:41,790
- student conversation starts" So this implies
that the eigenvalues cannot be real and
441
00:53:41,790 --> 00:53:44,620
positive, because if they are, the probabilities
keep on multiplying. Yes, absolutely. "Professor
442
00:53:44,620 --> 00:53:46,990
- student conversation ends". Absolutely.
This immediately tells you we know nothing
443
00:53:46,990 --> 00:53:49,920
about this matrix. At the moment we know nothing
about it.
444
00:53:49,920 --> 00:53:56,060
What we know is the following. We know that
these elements, these fellows are all positive
445
00:53:56,060 --> 00:54:01,310
or maybe 0. There could be some states where
there's no transition directly possible from
446
00:54:01,310 --> 00:54:09,830
k to l. So this could be 0 right, but certainly
not negative. So we have a matrix whose elements
447
00:54:09,830 --> 00:54:17,700
are all real. All the off diagonal elements
are either positive or 0. No negative elements
448
00:54:17,700 --> 00:54:22,640
and all the diagonal elements are negative
because they are minus some positive numbers
449
00:54:22,640 --> 00:54:26,500
okay and the matrix is real, not necessarily
symmetric.
450
00:54:26,500 --> 00:54:34,970
Because there is nothing that says w j k must
be w k j, nothing at all. So given that, we
451
00:54:34,970 --> 00:54:39,610
still see from this physically we'd be very
surprised if you got an eigenvalue which has
452
00:54:39,610 --> 00:54:45,770
got a positive real part because immediately
it would imply that this probability is growing
453
00:54:45,770 --> 00:54:51,270
unboundedly with time. So you need to be sure
that the eigenvalues cannot have positive
454
00:54:51,270 --> 00:54:57,390
real parts. They could be complex, but they
occur in complex conjugate pairs.
455
00:54:57,390 --> 00:55:03,160
And once they do that what it would mean is
if you have an eigenvalue of the form lambda
456
00:55:03,160 --> 00:55:13,030
plus or minus i mu then this would go like
e to the lambda t times cos or sin mu t. That's
457
00:55:13,030 --> 00:55:19,020
what the solution would look like but we must
be sure that this lambda is in fact negative.
458
00:55:19,020 --> 00:55:23,970
So we'd expect something like this: e to the
power minus lambda t where this is positive
459
00:55:23,970 --> 00:55:30,050
and possibly oscillatory behaviour etc.
So this is what we should make sure we have
460
00:55:30,050 --> 00:55:38,010
and we should expect. Now we'd expect that,
as t becomes infinite, the t dependence
461
00:55:38,010 --> 00:55:43,740
has to disappear and things to go to... where?
Well, once I say that all the eigenvalues
462
00:55:43,740 --> 00:55:51,510
have negative real parts, all these fellows
go to 0 but we know that 0 has to be an eigenvalue
463
00:55:51,510 --> 00:55:57,110
of this matrix. Therefore there'd be some
constant which is sitting there and the P
464
00:55:57,110 --> 00:56:01,780
of t will tend to that constant which will
be the stationary probability right.
465
00:56:01,780 --> 00:56:06,950
Now this is sort of formalized by a little
theorem in matrix analysis called Gershgorin's
466
00:56:06,950 --> 00:56:11,310
theorem. I am not sure if you've heard of
this but let me explain what this theorem
467
00:56:11,310 --> 00:56:19,290
is because it's simple enough.
It says if you have a square matrix with various
468
00:56:19,290 --> 00:56:30,350
elements, an N by N matrix, then let's suppose
the general elements of this matrix are a 11,
469
00:56:30,350 --> 00:56:38,070
a 12 and so on up to a NN. Then it says the
eigenvalues of this matrix, whatever be this
470
00:56:38,070 --> 00:56:45,860
matrix, in the complex plane (because eigenvalues
are in general complex) are located in certain
471
00:56:45,860 --> 00:56:54,990
circles or discs and these discs are found
as follows. Take a 11 and mark it on the complex
472
00:56:54,990 --> 00:56:58,030
plane.
It could in general be a complex matrix with
473
00:56:58,030 --> 00:57:03,330
complex entries, we don't care; it is sitting
somewhere here, and then you take the rest
474
00:57:03,330 --> 00:57:15,340
of these elements here, add their moduli together,
and that gives you a positive number right
475
00:57:15,340 --> 00:57:22,120
and with that positive number draw a circle of
that radius about this point okay. So draw
476
00:57:22,120 --> 00:57:27,640
a circle; whether you choose rows or columns
it doesn't matter, because the eigenvalues
477
00:57:27,640 --> 00:57:33,980
of a matrix are unchanged if you change the
matrix to its transpose.
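In symbols, the theorem says every eigenvalue of an N by N matrix A with entries a_{ij} lies in the union of the discs

```latex
D_i \;=\; \Bigl\{\, z \in \mathbb{C} \;:\; \bigl|\, z - a_{ii} \,\bigr| \;\le\; \sum_{j \neq i} \bigl|\, a_{ij} \,\bigr| \,\Bigr\},
\qquad i = 1, \dots, N,
```

and, since A and its transpose have the same eigenvalues, the radii may equally well be taken as the column sums over j not equal to i of |a_{ji}|.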
478
00:57:33,980 --> 00:57:41,090
So the radius here would be the sum of these
moduli. Similarly, take the next row, take
479
00:57:41,090 --> 00:57:54,150
a 22, that's somewhere here and draw a similar
circle etc. These things are called Gershgorin
480
00:57:54,150 --> 00:58:04,060
discs and the statement is all the eigenvalues
will lie either in or on these circles,
481
00:58:04,060 --> 00:58:08,940
that's all, and it's a very simple theorem
to prove. You can prove it by elementary means
482
00:58:08,940 --> 00:58:18,010
okay. Now these discs could be disjoint.
There could be another disc here which is disjoint.
483
00:58:18,010 --> 00:58:25,030
There could be things which overlap, we don't
care. What we do know: there's an extra theorem
484
00:58:25,030 --> 00:58:30,740
which says that if any of these discs is disjoint
then you are guaranteed to have at least 1
485
00:58:30,740 --> 00:58:36,100
eigenvalue there in the disc and this is a
completely general theorem. It doesn't say
486
00:58:36,100 --> 00:58:40,200
anything about the nature of the matrix. It
doesn't assume whether it has real elements,
487
00:58:40,200 --> 00:58:48,760
complex elements, etc., we don't care, still
true. Now if you apply this to this W, what's
488
00:58:48,760 --> 00:58:58,980
going to happen?
In the case of W we know that all the diagonal
489
00:58:58,980 --> 00:59:09,970
elements are negative real numbers. So they
are all sitting here or here or here etc.
490
00:59:09,970 --> 00:59:16,470
and in each case the elements add up: the
rest of the elements sum to minus whatever was
491
00:59:16,470 --> 00:59:24,690
the diagonal element right so the radius is
just this distance and this fellow here has
492
00:59:24,690 --> 00:59:33,210
a thing like this, etc., and all the eigenvalues
are in the union of these discs, which
493
00:59:33,210 --> 00:59:38,660
means no eigenvalue can have a positive real
part immediately.
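A quick numerical check of this argument, again with the hypothetical rates used above: for a relaxation matrix W, each column-wise Gershgorin disc is centred at the negative number W_kk with radius minus W_kk, so every disc sits in the closed left half-plane and touches the origin, and indeed no computed eigenvalue has a positive real part.

```python
import numpy as np

w = np.array([[0.0, 2.0, 1.0],     # hypothetical rates, w[k, l] = rate l -> k
              [1.0, 0.0, 3.0],
              [4.0, 1.0, 0.0]])
W = w - np.diag(w.sum(axis=0))     # columns of W sum to zero

# Gershgorin discs taken column-wise (eigenvalues of W and its transpose coincide):
centres = np.diag(W)                               # all negative
radii = np.abs(W).sum(axis=0) - np.abs(centres)    # off-diagonal column sums

print(np.allclose(radii, -centres))                # -> True: each disc touches the origin

eigvals = np.linalg.eigvals(W)
print(np.all(eigvals.real <= 1e-10))               # -> True: no positive real parts
```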
494
00:59:38,660 --> 00:59:42,180
And all the eigenvalues other than 0 will
have negative real parts and therefore the
495
00:59:42,180 --> 00:59:47,210
system, the probabilities, will relax towards
the equilibrium distribution. So W is called
496
00:59:47,210 --> 00:59:55,270
the relaxation matrix in the physics literature.
So I stop here now and we take it from this
497
00:59:55,270 --> 00:59:56,790
point.