1
00:00:42,700 --> 00:00:50,460
In the earlier class, we defined two new information
measures. One was the joint information measure,
2
00:00:50,460 --> 00:00:54,149
and the other was the conditional information
measure.
3
00:00:54,149 --> 00:01:25,849
The joint information measure was given as H of
X, Y equal to minus the summation, i equal to 1 to n and j equal to
4
00:01:26,020 --> 00:01:35,820
1 to m, of p of xi, yj log p of xi, yj. This was the joint information measure,
which we define when we have two events taking
5
00:01:35,830 --> 00:01:43,479
place simultaneously and we observe them as one.
Another information measure which we defined
6
00:01:43,479 --> 00:01:52,699
was the conditional information measure, and that
was defined as H of Y given X equal to
7
00:01:52,700 --> 00:02:04,980
minus the summation, i equal to 1 to n and j equal to 1 to m,
of p of xi, yj log p of yj given xi.
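These two definitions can be sketched numerically. The following Python fragment is a small sketch, assuming a hypothetical 2-by-2 joint distribution p(xi, yj) (not the lecture's example), and implements both definitions directly:

```python
import math

# A hypothetical 2-by-2 joint distribution p(xi, yj); rows are
# indexed by x, columns by y. Any valid joint pmf would do.
p_xy = [[0.4, 0.1],
        [0.2, 0.3]]

def joint_entropy(p_xy):
    # H(X, Y) = -sum_i sum_j p(xi, yj) log2 p(xi, yj)
    return -sum(p * math.log2(p) for row in p_xy for p in row if p > 0)

def conditional_entropy(p_xy):
    # H(Y | X) = -sum_i sum_j p(xi, yj) log2 p(yj | xi),
    # where p(yj | xi) = p(xi, yj) / p(xi)
    h = 0.0
    for row in p_xy:
        p_x = sum(row)          # marginal probability p(xi)
        for p in row:
            if p > 0:
                h -= p * math.log2(p / p_x)
    return h

print(round(joint_entropy(p_xy), 4))        # H(X, Y)
print(round(conditional_entropy(p_xy), 4))  # H(Y | X)
```

Here the conditional probability is recovered from the joint distribution by dividing out the marginal of X, exactly as in the spoken definition.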
8
00:02:17,260 --> 00:02:24,260
This is a conditional information measure
with regard to experiment Y given X, and similarly
9
00:02:24,549 --> 00:02:35,760
we can define H of X given Y. Let us look
into the properties of H of Y given X in a little
10
00:02:35,760 --> 00:02:43,560
more depth before we proceed ahead. So,
the two important properties of H of Y given
11
00:02:43,569 --> 00:02:52,449
X would be: first, H of Y given X is
always greater than or equal to 0. And the other
12
00:02:52,489 --> 00:03:03,849
property is that H of Y given X is always
less than or equal to H of Y, with equality if
13
00:03:04,959 --> 00:03:14,959
and only if X and Y are statistically independent.
Let us try to prove these two properties pertaining
14
00:03:14,970 --> 00:03:23,570
to H of Y given X. Now, proving
the first property is not very difficult.
15
00:03:27,140 --> 00:03:39,800
So, the first thing we have to prove is that
H of Y given X is always greater than or equal
16
00:03:39,810 --> 00:03:55,870
to 0. Now, because probability of yj given
xi is less than or equal to 1 for all i and
17
00:03:56,799 --> 00:04:06,939
j, this implies that minus log of
probability of yj given xi is always greater
18
00:04:06,939 --> 00:04:19,219
than or equal to 0. And so, from the definition
of H of Y given X, which is nothing but minus the summation over
19
00:04:19,400 --> 00:04:33,120
probability of xi, yj log probability of yj
given xi: the quantity p of xi, yj is positive, and
20
00:04:33,120 --> 00:04:39,849
minus log p of yj given xi is positive, so in total this quantity
is always greater than or equal to 0. So, the
21
00:04:39,849 --> 00:04:46,849
first property has been proved.
Now to prove the second property, that is
22
00:04:47,419 --> 00:04:54,419
H of Y given X is always less than or equal to
H of Y, it is not very difficult. We write
23
00:04:57,830 --> 00:05:04,830
the equation: H of Y given X minus H of Y is equal to minus the summation
over probability of xi, yj log of p of yj given
24
00:05:16,500 --> 00:05:23,500
xi, minus H of Y, where H of Y is minus the summation of probability of yj log probability
of yj, j equal to 1 to m. Here also j is equal
25
00:05:33,490 --> 00:05:40,490
to 1 to m and i is equal to 1 to n. Now, this
can very simply be rewritten: the quantity
26
00:06:19,720 --> 00:06:26,650
out here, the summation over j equal to 1 to m of p of
yj, can be substituted by a double summation,
27
00:06:26,650 --> 00:06:31,879
and once you do that double summation, this
expression can be written as this.
28
00:06:31,879 --> 00:06:38,879
Now, we have also seen that the natural log of x is always
less than or equal to x minus 1. If I use this
29
00:06:42,220 --> 00:06:49,220
property, then I can write this expression
out here, with x equivalent to this quantity
30
00:06:52,120 --> 00:06:59,120
out here. Then I can write that H of Y given X
minus H of Y is always less than or equal to the
31
00:07:06,009 --> 00:07:13,009
double summation of probability of xi, yj times, probability
of yj upon probability of yj given xi minus 1, with i equal to 1 to n and j equal to
32
00:07:29,789 --> 00:07:36,789
1 to m; multiplied by a factor, because this inequality
is in the natural base whereas we have here log
33
00:07:43,979 --> 00:07:49,580
to the base 2; therefore, when we convert from
log to the base 2 to log to the base
34
00:07:49,580 --> 00:07:56,580
e we will get this factor out here.
Now, this can be simply simplified as minus
35
00:08:17,979 --> 00:08:24,979
double summation of this whole thing, which gets
multiplied by log to the base 2 of e. Now, writing this
36
00:08:36,169 --> 00:08:43,169
expression using the Bayes theorem, we will
get the summation of probability of xi times probability
37
00:08:45,300 --> 00:08:52,300
of yj, minus probability of xi, yj, with i equal
to 1 to n and j equal to 1 to m. Similarly,
38
00:08:59,569 --> 00:09:06,569
I will have i equal to 1 to n and j equal to 1 to m;
now, these summations give 1 minus 1, and this
39
00:09:16,279 --> 00:09:23,279
is equal to 0.
Therefore, what we get is H of Y given X is
40
00:09:24,570 --> 00:09:31,570
always less than or equal to H of Y. What this
expression says is that the uncertainty which
41
00:09:38,670 --> 00:09:45,670
is there in the event Y or experiment Y, after
the event X has been observed, will be always
42
00:09:51,200 --> 00:09:58,200
less than or equal to the uncertainty which is there
initially, when I do not observe X. So, when
43
00:09:59,810 --> 00:10:06,810
I have the full knowledge about the event
X, the uncertainty about the event Y will
44
00:10:08,100 --> 00:10:15,100
be always less than or equal to the uncertainty of the
event Y when I do not observe event X. And
45
00:10:18,240 --> 00:10:25,240
therefore, the information H of Y given X
will be always less than or equal to H of Y.
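Both properties just proved can be checked numerically. The sketch below assumes one hypothetical dependent and one independent joint distribution (illustrative values, not from the lecture) and verifies that H of Y given X is non-negative and never exceeds H of Y, with equality in the independent case:

```python
import math

def entropy(probs):
    # H = -sum p log2 p over a marginal distribution
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(p_xy):
    # H(Y | X) = -sum_ij p(xi, yj) log2 p(yj | xi)
    h = 0.0
    for row in p_xy:
        p_x = sum(row)
        for p in row:
            if p > 0:
                h -= p * math.log2(p / p_x)
    return h

# A dependent pair: observing X tells us a lot about Y.
dependent = [[0.4, 0.1],
             [0.1, 0.4]]
# An independent pair: p(xi, yj) = p(xi) p(yj) in every cell.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

for p_xy in (dependent, independent):
    p_y = [sum(col) for col in zip(*p_xy)]   # marginal p(yj)
    h_y, h_y_x = entropy(p_y), conditional_entropy(p_xy)
    assert h_y_x >= 0.0            # first property
    assert h_y_x <= h_y + 1e-12    # second property
    print(round(h_y_x, 4), round(h_y, 4))
```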
46
00:10:26,560 --> 00:10:33,560
Now, let us have a look at the relationships
between the joint information, conditional
47
00:10:39,430 --> 00:10:46,430
information measures and the marginal information.
So, I want to find out the relationship, this
48
00:10:53,300 --> 00:11:00,300
is my joint information, I have my marginal
information measure. And then I have my conditional
49
00:11:04,440 --> 00:11:06,360
information measure.
50
00:11:06,360 --> 00:11:13,160
Now, I want to find out the relationship between
these three quantities. Now, we will show
51
00:11:13,160 --> 00:11:20,160
very shortly that H of X, Y is nothing
but equal to H of X plus H of Y given X, or I
52
00:11:22,700 --> 00:11:29,700
can also write this as H of Y plus H of X given
Y. So, this is another important relationship
53
00:11:36,190 --> 00:11:43,190
which we will be using during the course of our
lecture today. So, let us try to prove this
54
00:11:43,760 --> 00:11:48,040
relationship, let us try to prove the first
relationship. Let us look at the definition
55
00:11:48,040 --> 00:11:54,320
of H of X given Y, which we just stated in
the morning.
56
00:11:54,320 --> 00:12:01,320
So, this is nothing but minus the summation of probability of xi,
yj log of p of xi given yj. So, let us start with
57
00:12:14,060 --> 00:12:21,060
H of X, Y, which by definition is given by this
expression. So, if I simplify this,
58
00:12:25,980 --> 00:12:32,980
I can write it as minus the summation of probability of xi, yj log
of p of xi times probability of yj given xi. This
59
00:12:41,339 --> 00:12:48,329
I am writing using the Bayes rule so i is
equal to 1 to n, j is equal to 1 to m. And
60
00:12:48,329 --> 00:12:55,329
this I can simplify as minus the summation of probability of xi, yj
log of p of xi, minus the double summation of
61
00:13:04,579 --> 00:13:11,579
probability of xi yj log of probability of
yj given xi, i is equal to 1 to n, j is equal
62
00:13:19,480 --> 00:13:26,480
to 1 to m and similarly, out here.
Now, this quantity out here is nothing but
63
00:13:29,680 --> 00:13:36,680
your H of X, and this quantity is nothing but, by definition,
H of Y given X. This quantity out here should
64
00:13:44,250 --> 00:13:51,250
not be H X given Y, but should be H of Y given
X and similarly, this quantity should be H
65
00:13:55,279 --> 00:14:02,279
of X given Y. So, finally I get the joint
information which I get from two events X
66
00:14:06,550 --> 00:14:13,550
and Y is equal to the information, which I
get from the event X alone, plus the information,
67
00:14:15,000 --> 00:14:22,000
additional information, which I will get from
event Y, after the event X has occurred. So,
68
00:14:22,220 --> 00:14:29,220
similarly I can show that this is nothing
but H of Y plus H of X given Y. Now, H of
69
00:14:46,399 --> 00:14:52,660
X given Y is this quantity out here.
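The relationship just derived can be verified numerically. The sketch below assumes a hypothetical 2-by-3 joint distribution (illustrative only) and checks that both decompositions of H of X, Y agree:

```python
import math

def h(ps):
    # plain entropy of a probability list, in bits
    return -sum(p * math.log2(p) for p in ps if p > 0)

# A hypothetical 2-by-3 joint distribution p(xi, yj):
p_xy = [[0.2, 0.1, 0.1],
        [0.1, 0.3, 0.2]]

h_joint = h([p for row in p_xy for p in row])   # H(X, Y)
p_x = [sum(row) for row in p_xy]                # marginal of X
p_y = [sum(col) for col in zip(*p_xy)]          # marginal of Y

# Conditional entropies computed directly from their definitions:
h_y_x = -sum(p * math.log2(p / p_x[i])
             for i, row in enumerate(p_xy) for p in row if p > 0)
h_x_y = -sum(row[j] * math.log2(row[j] / p_y[j])
             for row in p_xy for j in range(3) if row[j] > 0)

# Both decompositions recover the same joint information:
print(abs(h_joint - (h(p_x) + h_y_x)) < 1e-9)
print(abs(h_joint - (h(p_y) + h_x_y)) < 1e-9)
```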
70
00:14:52,660 --> 00:14:59,660
If you look at H of X, Y equal to H of X
plus H of Y given X: we just proved that
71
00:15:05,790 --> 00:15:12,790
this quantity is always less than or equal to the information
in Y alone. So, this quantity out here will
72
00:15:13,790 --> 00:15:20,790
be always less than or equal to H of X plus H of Y; so the
joint information in X and Y is always less
73
00:15:25,829 --> 00:15:32,829
than or equal to the sum of the information in X and
Y, and they are only equal when X and Y are independent.
74
00:15:36,959 --> 00:15:43,839
with this little background, we will move ahead
from where we had left off last time. And we were
75
00:15:43,839 --> 00:15:50,839
studying the properties of the Markov source. So,
let me revisit the Markov source and let us
76
00:15:52,440 --> 00:15:57,630
look in more depth at the properties of this
Markov source.
77
00:15:57,630 --> 00:16:04,630
So, I will start again with a first order
Markov process. A first order Markov source
78
00:16:05,170 --> 00:16:12,170
will have its source alphabet; let me assume it
as s1, s2 up to sq. The size of the alphabet
79
00:16:17,190 --> 00:16:24,190
of this source is q and this is the first
order Markov source. What I mean by first
80
00:16:33,970 --> 00:16:40,350
order Markov source is that, the occurrence
of any particular symbol will be dependent
81
00:16:40,350 --> 00:16:47,040
upon, the occurrence of the previous symbol.
That is what we mean by a first order Markov
82
00:16:47,040 --> 00:16:54,040
source, let me assume that I have a time instant
t1, at this time instant t1, let us assume
83
00:16:57,769 --> 00:17:04,769
some symbol occurs. Let us call that symbol
which occurs
84
00:17:05,060 --> 00:17:12,060
s1i; so s1i is one of the symbols from this
source alphabet. And let me assume that I
85
00:17:13,730 --> 00:17:20,730
have another time instant t2, and at that
instant another symbol occurs; let us call
86
00:17:22,110 --> 00:17:29,110
it s2j. s2j is again
one of the symbols from this source alphabet.
87
00:17:31,090 --> 00:17:38,090
Now, if I were to find out what is the information
which I gain when I go from,
88
00:17:47,500 --> 00:17:54,500
when I transit from, s1i to s2j, then to
find out that information I will require
89
00:17:58,119 --> 00:18:05,119
the conditional probabilities of s2j given
s1i. So, if I have these conditional probabilities
90
00:18:10,419 --> 00:18:17,419
available, then I can calculate what is the
information, which I get when I transit from
91
00:18:18,570 --> 00:18:25,570
s1i to s 2 j. Now, considering the time instant
at t1 and time instant t 2, as two different
92
00:18:30,030 --> 00:18:37,030
experiments, which are related like
X and Y, then similar to how we defined H of Y given X,
93
00:18:37,919 --> 00:18:44,919
we can define the information which I get,
additional information which I get, when I
94
00:18:45,889 --> 00:18:52,580
transit from s1i to s 2 j.
So, that is very easy to calculate and I can
95
00:18:52,580 --> 00:18:59,580
say that H of s2 given s1 would be given as minus the double
summation of p of s1i, s2j log of p of s2j given s1i. So, this is the amount of
96
00:19:38,099 --> 00:19:45,099
information which I get when there is
an arbitrary transition from one state to
97
00:19:45,169 --> 00:19:50,500
another state. In the light of what we have
done earlier, with relation to the experiment
98
00:19:50,500 --> 00:19:55,349
X and Y, I can write that the additional amount
of information which I get, when arbitrary
99
00:19:55,349 --> 00:20:02,349
transition like this occurs from t1 to t2, is given
by this expression out here. So, if I were
100
00:20:05,330 --> 00:20:10,500
to find out, what is the joint information
between s 2 and s 1.
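The transition quantities just described can be sketched numerically. The fragment below assumes a hypothetical 3-symbol first-order chain with transition matrix P and its stationary probabilities (illustrative values, not the lecture's), and computes both a single transition's information and the average H of s2 given s1:

```python
import math

# Hypothetical first-order Markov source with alphabet size q = 3.
# P[i][j] = p(s2j | s1i), the transition probabilities.
P = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
pi = [1/3, 1/3, 1/3]   # stationary probabilities (matrix is doubly stochastic)

# Information gained on one particular transition s1i -> s2j:
info_12 = -math.log2(P[0][1])   # from symbol 1 to symbol 2: 2 bits
print(info_12)

# Averaging over all transitions gives H(s2 | s1):
h_s2_s1 = -sum(pi[i] * P[i][j] * math.log2(P[i][j])
               for i in range(3) for j in range(3) if P[i][j] > 0)
print(round(h_s2_s1, 4))
```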
101
00:20:10,500 --> 00:20:17,500
We can find it out very easily as H of s1, s2;
this is the joint information which I will
102
00:20:24,220 --> 00:20:31,220
get from messages of length 2, would be minus the summation,
i equal to 1 to q and j equal to 1 to q, of
103
00:20:37,429 --> 00:20:44,429
probability of s1i, s2j log of probability
of s1i, s2j. Now, this can be easily shown
104
00:20:58,970 --> 00:21:05,970
to be H of s1 plus H of s2 given s1.
This relationship we have just seen; instead
105
00:21:14,940 --> 00:21:21,940
of s1 and s 2, we had seen in terms of X and
Y. So, it is not very difficult to derive
106
00:21:22,300 --> 00:21:29,300
this relationship. Now, we have also seen
that H of s2 given s1 is always less
107
00:21:34,480 --> 00:21:41,480
than or equal to H of s2.
108
00:21:51,659 --> 00:21:58,659
Therefore, H of s1, s2 is always less
109
00:22:08,520 --> 00:22:15,520
than or equal to H of s1 plus H of s2, which we have
derived. And if you assume this Markov process
110
00:22:15,550 --> 00:22:22,550
as stationary and ergodic, then H of s1 is
equal to H of s 2 is equal to H of S. And
111
00:22:29,220 --> 00:22:36,220
therefore, H of s1, s2 will be always less
than or equal to twice of H of S.
112
00:22:43,790 --> 00:22:50,790
So, what we have derived now is that, when
I look at the entropy of messages which are
113
00:22:59,520 --> 00:23:06,520
of length two symbols, and the symbols come
from a Markov process of first order, then
114
00:23:08,030 --> 00:23:15,030
the total information in the messages
of length two turns out to be less than or equal
115
00:23:16,109 --> 00:23:23,109
to twice of H of S. Now, if this Markov
process were of order 0, then I would
116
00:23:27,470 --> 00:23:38,139
have got equality with twice of H of S. So, the
conclusion is that whenever there is dependency
117
00:23:38,140 --> 00:23:45,140
among the symbols, then for messages of length
L, the total information which will be there in
118
00:23:48,760 --> 00:23:55,760
them will be always less than or equal to L
times the entropy of the source. We had seen
119
00:23:59,470 --> 00:24:06,470
this result earlier, where we had proved that
if the symbols are independent then H of s
120
00:24:07,429 --> 00:24:14,429
1, s2 turns out to be twice H of S. Now, this
we had proved for the first order Markov
121
00:24:18,020 --> 00:24:24,220
process. Now, let us move to some higher
order Markov process, and let us make the
122
00:24:24,220 --> 00:24:26,879
things little more generic.
123
00:24:26,879 --> 00:24:33,879
So, I will consider a Markov source of order
k, which is greater than 1. And at the moment
124
00:24:49,899 --> 00:24:56,899
I am interested in the occurrence of a particular
symbol at the time instant say, capital N.
125
00:25:01,320 --> 00:25:07,300
So, let me say that the symbol which occurred
at time instant capital N, let me denote it
126
00:25:07,300 --> 00:25:14,300
as sN. And if I assume that this symbol, which
occurs from the source S is coming from a
127
00:25:16,300 --> 00:25:23,300
Markov process, then what I am interested
in is the additional information,
128
00:25:24,889 --> 00:25:31,889
the average additional information, which
I get on the occurrence of a symbol, at a
129
00:25:32,119 --> 00:25:39,119
time instant N, namely sN, given that I have observed
all the preceding symbols, right from time
130
00:25:43,010 --> 00:25:50,010
instant 1 up to time instant N minus 1.
So, if I use the general properties of entropy
131
00:25:53,780 --> 00:26:00,780
then I can define the average additional information
which I am going to get on the observation of
132
00:26:02,570 --> 00:26:09,470
sN, having observed the preceding symbols,
will be nothing but this quantity: H of sN given sN minus
133
00:26:09,470 --> 00:26:16,470
1, sN minus 2, and this will continue up to s1.
So, this is the additional information which
134
00:26:24,669 --> 00:26:31,669
I get when I observe symbol sN at time instant
N. Now, this quantity I will define:
135
00:26:37,210 --> 00:26:44,210
by definition, I will call it F N of S.
Now, before we go ahead, there are some interesting
136
00:26:49,909 --> 00:26:56,909
properties of these F N of S. One interesting property
would be that H of sN, given sN minus
137
00:27:02,359 --> 00:27:09,359
1, sN minus 2, down to s2, s1, is always less than or equal
to H of sN given sN minus 1, sN minus 2 up
138
00:27:25,839 --> 00:27:32,839
to s2.
Now, it is not very difficult to prove this, and
139
00:27:33,339 --> 00:27:40,339
it is intuitively satisfying that the occurrence
of s1 cannot increase the uncertainty of
140
00:27:46,540 --> 00:27:53,540
the occurrence of sN. So, what this
relationship implies is
141
00:28:13,629 --> 00:28:20,629
that the
knowledge that is delivered by the first symbol
142
00:28:20,659 --> 00:28:27,659
s1 cannot lead to an increase in the uncertainty
about the Nth symbol, but will always decrease it
143
00:28:31,210 --> 00:28:38,210
or leave it unchanged. So, this is the significance
of this expression. Now, there is a very important
144
00:28:39,010 --> 00:28:46,010
theorem, which we will try to derive from this.
145
00:28:49,500 --> 00:28:56,500
The theorem says: the conditional amount of information,
that is F N of S, which is by definition equal
146
00:29:25,929 --> 00:29:32,929
to H of sN given sN minus 1 up to s1, of the
Nth symbol in the case where the preceding
147
00:30:09,260 --> 00:30:16,260
N minus 1 symbols are known, is a monotonic
decreasing function of N. What we mean by
148
00:30:40,190 --> 00:30:47,190
that is, H of sN given the N minus 1 preceding
symbols will be always less than or equal to
149
00:30:55,629 --> 00:31:02,629
H of sN minus 1 given sN minus 2 down to s1, and
we can continue this way. So, this is a very important
150
00:31:24,679 --> 00:31:30,609
theorem, which is associated with a general
Markov process.
151
00:31:30,609 --> 00:31:36,060
So, this is what we had defined as additional
information, which I get on the occurrence
152
00:31:36,060 --> 00:31:42,349
of the Nth symbol when I know the preceding
N minus 1 symbols. So, what it says is that the information
153
00:31:42,349 --> 00:31:49,349
which I get from here, will be always less
than or equal to the information which I get,
154
00:31:50,740 --> 00:31:57,740
when I go back to the time instant N minus
1 and observe the symbol there, given that
155
00:31:58,820 --> 00:32:03,869
at that time instant the N minus 2 preceding
symbols have been observed. And if I continue
156
00:32:03,869 --> 00:32:10,869
like this, finally I end up with the first
symbol. So, the uncertainty which I have
157
00:32:11,720 --> 00:32:18,720
when I observe the first symbol is the maximum
compared to the uncertainty, which I have
158
00:32:20,089 --> 00:32:24,990
at the Nth instant of time. We will try to
prove this theorem.
159
00:32:24,990 --> 00:32:31,990
So, let us look into the proof of this theorem.
Since the Markov source which
160
00:32:49,310 --> 00:32:56,310
we are considering is stationary,
the conditional amount of information
161
00:33:17,540 --> 00:33:24,540
is independent of the position
of the Nth symbol in the sequence of source
symbols which are being emitted. So, what
162
00:34:07,399 --> 00:34:14,399
it means is that H of sN minus 1, given sN
minus 2 down to s1, is equal to H of sN given sN
163
00:34:31,240 --> 00:34:38,240
minus 1, but with the conditioning going only down to s2. I can
write this expression because the source
164
00:34:39,859 --> 00:34:46,859
is stationary.
And we have just seen that the property of
165
00:34:51,120 --> 00:34:58,120
a Markov source is that this
166
00:35:03,460 --> 00:35:10,460
conditional information will always
be less than or equal to the quantity on the
167
00:35:13,490 --> 00:35:20,490
right hand side. This I can write because
the source is stationary and this is the property
168
00:35:23,300 --> 00:35:30,300
of the additional information measure and
from these two it directly follows that, H
169
00:35:31,080 --> 00:35:38,080
of sN given sN minus 1 up to s1 is less than or
equal to H of sN minus 1 given sN minus 2 up to s1.
170
00:35:53,480 --> 00:36:00,480
So, using these two properties I get this result,
and this is by definition nothing but F N of
171
00:36:05,820 --> 00:36:12,820
S less than or equal to F N minus 1 of S.
So, similarly I can extend: H of sN minus 1
172
00:36:17,840 --> 00:36:24,840
given sN minus 2 up to s1, is less than or equal to
H of sN minus 2 given sN minus 3 up to
173
00:36:27,650 --> 00:36:34,650
s1. So, I can extend like this and simply show
that this condition is true. Now, what
174
00:36:57,460 --> 00:37:04,460
follows is that as I keep on increasing N, the
value of F N of S keeps on decreasing. Now, since
175
00:37:06,390 --> 00:37:10,520
F N of S is always greater than or equal to 0.
176
00:37:10,520 --> 00:37:17,520
What this implies is that the limit as N tends
to infinity of F N of S should converge, and let
177
00:37:24,650 --> 00:37:31,650
me call that limit, by definition, H infinity of
S. And this is nothing but the limit as N tending
178
00:37:36,060 --> 00:37:43,060
to infinity of the additional information of sN,
given that the N minus 1 preceding symbols have been
179
00:37:46,990 --> 00:37:53,990
observed. This is in bits per symbol. And from
this expression out here,
180
00:38:10,840 --> 00:38:17,840
from these two expressions, I can
write that 0 is less than or equal to H infinity of S, which
181
00:38:32,140 --> 00:38:39,140
is less than or equal to F N of S, up to F1 of S.
Now, this quantity out here, the quantity
182
00:38:40,740 --> 00:38:47,740
which I get when N tends to infinity of F N of S,
is by definition known as
183
00:38:50,230 --> 00:38:57,230
the amount of information of a discrete
information source with memory. Now, we have
formally defined the information measure for
184
00:39:31,320 --> 00:39:38,320
a Markov source with memory. We have considered
the value of k to be arbitrary, the k stands
185
00:39:43,480 --> 00:39:48,510
for the order of the Markov process. So, what
we get is that, if I want to calculate the
186
00:39:48,510 --> 00:39:52,990
entropy of a Markov process then entropy of
the Markov process is nothing but the limit as N
187
00:39:52,990 --> 00:39:58,490
tends to infinity of F N of S.
And that is, by definition, nothing but the limit as N
188
00:39:58,490 --> 00:40:05,490
tends to infinity of the additional amount
of information. So, this by definition is
189
00:40:06,230 --> 00:40:13,230
the definition for the information measure
of a Markov source with memory. Now, if you assume
190
00:40:17,980 --> 00:40:24,980
a Markov source of order k then what this
implies is that as I keep on increasing the
191
00:40:26,130 --> 00:40:33,130
value of N beyond a certain value, this
F N of S will not fall further. This is very easy to appreciate.
192
00:40:39,770 --> 00:40:46,770
Because probability of sN given sN minus 1
up to s1, will be equal to probability of
193
00:40:52,210 --> 00:40:59,210
sN given sN minus 1 up to sN minus k, when
the Markov source is of a kth order. And in
194
00:41:07,010 --> 00:41:14,010
this relationship is valid, then H of sN given
sN minus 1 up to s1, would be equal to H of
195
00:41:19,470 --> 00:41:26,470
sN given sN minus 1 up to sN minus k, because
only the k preceding symbols will come into the picture.
196
00:41:33,910 --> 00:41:40,910
And because the Markov source is stationary,
I can write this as H of s k plus 1 given s
197
00:41:45,540 --> 00:41:52,540
of k, s of k minus 1 up to s1.
So, for a Markov source of order k, H infinity of
198
00:41:59,810 --> 00:42:06,810
S will be equal to nothing but F of k plus
1 of S, which is equal to this quantity. Because for
199
00:42:16,070 --> 00:42:23,070
N beyond k plus 1, this
quantity will remain constant and it will
200
00:42:28,480 --> 00:42:35,480
not go lower than this value. And then I can
write H infinity s is equal to this value,
201
00:42:38,130 --> 00:42:45,130
For a zero-memory source, k is equal to 0, and in
that case H infinity of S is nothing but H of
202
00:42:48,090 --> 00:42:55,090
S. So far, we have considered messages
with symbol length of unity. The
203
00:43:04,750 --> 00:43:11,750
next question is, like we had done earlier: if
I look at messages of length other than
204
00:43:15,440 --> 00:43:22,440
1 then what happens to the entropy of those
messages. Let us look into that.
205
00:43:23,840 --> 00:43:30,840
So, let us assume that I have a message v,
which is composed of N symbols of this Markov
206
00:43:41,560 --> 00:43:48,560
process. So, I have s i1, s i2, up to s iN,
capital N symbols. So, what I do basically is that,
207
00:44:01,150 --> 00:44:08,150
I assume that I have messages now of N symbols,
capital N. Now,
208
00:44:11,130 --> 00:44:18,130
if I look at these messages, and if I were
to find out the entropy of this then how this
209
00:44:18,370 --> 00:44:25,370
entropy is related to my original entropy.
Let us analyze this so if I want to calculate
210
00:44:28,910 --> 00:44:35,910
the entropy of this, then I can write the entropy
H of V, which is nothing but H of s1, s2 up to sN. So,
211
00:44:47,000 --> 00:44:54,000
this is the information which I get from messages
of symbols consisting of length N, capital
212
00:45:02,130 --> 00:45:09,130
N.
Now, if I define another quantity as average
213
00:45:11,690 --> 00:45:18,690
information per symbol, which is by definition
equal to 1 by N of H V so this is my average
214
00:45:22,840 --> 00:45:29,840
information which I get per symbol. So, this
is equal to 1 upon N of H of s1, s2 up to sN; this will be in
215
00:45:39,720 --> 00:45:46,720
bits per symbol. Now, obviously, if the symbols
are statistically independent, then H of s1 up to sN
216
00:45:55,800 --> 00:46:02,800
would be nothing but the summation of H of si.
This expression I can write provided
217
00:46:10,590 --> 00:46:17,590
my symbols in this message of symbol length
N are independent.
218
00:46:18,260 --> 00:46:25,260
And in that case, I can simplify this to be
H of S. Now, if the symbols are dependent,
219
00:46:32,280 --> 00:46:38,740
then I cannot write it like this, and I
must go to the more fundamental definition. I can
220
00:46:38,740 --> 00:46:45,740
say that H N of S is equal to 1 upon N times this
information out here; the joint information in
221
00:46:48,630 --> 00:46:55,630
N symbols can be written as H of s1, plus H of
s2 given s1, and so on up to H of sN given sN minus 1
222
00:47:12,860 --> 00:47:19,860
up to s1, all of it multiplied by 1 upon N.
So, this way I get the average information per symbol,
223
00:47:43,770 --> 00:47:50,770
and that is equal to 1 upon N times the summation of
F j of S. Now, as was the case for F N of S, we can
224
00:47:57,170 --> 00:48:04,170
similarly show that H N of S is a monotonic
decreasing function of N. And it will be interesting
225
00:48:10,340 --> 00:48:16,400
to find out that the limiting value of H N of S
turns out to be the same as the limiting
226
00:48:16,400 --> 00:48:23,400
value of F N of S. The limiting value of F N of S, we
have seen, was H infinity of S, and this was
227
00:48:26,710 --> 00:48:32,720
the entropy of a Markov source, that is what
we have defined. Now, we can also show that
228
00:48:32,720 --> 00:48:40,220
the limiting value of H N of S as N tends to infinity
also turns out to be H infinity of S. We will
229
00:48:40,230 --> 00:48:44,500
try to prove this in the next lecture.
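As a numerical preview of that claim, the sketch below assumes a hypothetical two-state stationary first-order chain (illustrative only) and compares H N of S with H infinity of S, which for a first-order source equals F 2 of S:

```python
import math
from itertools import product

# Hypothetical two-state stationary first-order chain:
P = [[0.8, 0.2],
     [0.3, 0.7]]
pi = [0.6, 0.4]   # stationary distribution

def joint_h(n):
    # H(S1, ..., Sn), brute force over all 2**n sequences
    h = 0.0
    for seq in product(range(2), repeat=n):
        p = pi[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= P[a][b]
        h -= p * math.log2(p)
    return h

# For a first-order source, H_infinity(S) = F_2(S):
h_inf = joint_h(2) - joint_h(1)

# H_N(S) = (1/N) H(S1..SN) decreases toward H_infinity(S):
for n in (1, 2, 4, 8, 12):
    print(n, round(joint_h(n) / n, 6), round(h_inf, 6))
```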