1
00:00:17,279 --> 00:00:22,519
Welcome to today's class on probability and
information theory. This is an extremely important
2
00:00:22,519 --> 00:00:27,759
field of study and a huge field. So, today
we shall be trying to concentrate on some
3
00:00:27,759 --> 00:00:33,729
of the basic principles, which are necessary
to understand the design and analysis of ciphers.
4
00:00:33,729 --> 00:00:38,909
So, in today's talk, we shall be talking about
the importance of probability in cryptography
5
00:00:38,909 --> 00:00:44,040
and then discuss computational security,
then follow it up with some discussions on
6
00:00:44,040 --> 00:00:51,010
the binomial distribution and its applications,
the very important birthday paradox, and then
7
00:00:51,010 --> 00:00:54,789
conclude with some concepts of entropy and
information theory.
8
00:00:54,789 --> 00:01:01,789
Now, as we have been discussing, and as you may
remember from our first class, we are trying to answer
9
00:01:05,729 --> 00:01:09,039
questions of this nature, such as: how probable
is an insecure event?
10
00:01:09,039 --> 00:01:14,979
So, you remember that in our example on the
coin flipping over telephone, the question
11
00:01:14,979 --> 00:01:20,640
was: what is the probability that Alice can create
x not equal to y, such that f x
12
00:01:20,640 --> 00:01:26,829
and f y are essentially the same? If you remember,
the question was whether Alice
13
00:01:26,829 --> 00:01:31,810
is able to choose two different values x and y,
such that the outcomes, denoted by
14
00:01:31,810 --> 00:01:36,320
f x and f y are the same?
The other question that we tried to address
15
00:01:36,320 --> 00:01:40,590
was, what is the probability that Bob
can guess the parity of x? That was the important
16
00:01:40,590 --> 00:01:45,829
information and the question was, whether
from the value of f x, Bob is able to extract
17
00:01:45,829 --> 00:01:50,680
out the information about the parity of x.
These types of questions will appear again
18
00:01:50,680 --> 00:01:57,500
and again when we are trying to design and
also analyse the
19
00:01:57,500 --> 00:02:01,740
ciphers; therefore, the theory of probability
is quite central to the development of this
20
00:02:01,740 --> 00:02:02,500
field.
21
00:02:02,500 --> 00:02:08,910
So, a good cryptosystem, as we discussed in
the class before last, should
22
00:02:08,910 --> 00:02:13,570
produce a cipher text which has a random
distribution; that means it should look as
23
00:02:13,570 --> 00:02:18,859
random as possible to a distinguisher.
Therefore, it should not be detectable,
24
00:02:18,859 --> 00:02:23,940
it should not be easily distinguished from
a random distribution; therefore, it
25
00:02:23,940 --> 00:02:29,060
has to be random over the entire space of cipher
text messages. For now, let us call it
26
00:02:29,060 --> 00:02:34,750
perfectly random; we will try to make this notion more
27
00:02:34,750 --> 00:02:40,200
and more concrete as we proceed. There should
not be any information leakage.
28
00:02:40,200 --> 00:02:45,599
So, a lot of terms have been coined here,
like, information and also the notion of perfect
29
00:02:45,599 --> 00:02:49,330
randomness; we will try to make these things
more concrete and more mathematical as we
30
00:02:49,330 --> 00:02:54,830
proceed.
Now, just try to understand this intuitively:
31
00:02:54,830 --> 00:03:00,409
when something is perfectly
random, it is essentially not giving us
32
00:03:00,409 --> 00:03:07,409
any extra information; therefore, the idea
is that it has got zero information content.
33
00:03:07,760 --> 00:03:12,459
If you remember, we talked about the magic
function f x; if the function
34
00:03:12,459 --> 00:03:19,459
f x does not leak any information or any other
fact about the parity of x, then we say that
35
00:03:20,049 --> 00:03:27,049
it does not leak any information. Therefore,
this information, or rather, the lack of information,
36
00:03:27,989 --> 00:03:32,819
is sometimes also referred to as uncertainty
of ciphers. So, we will try to measure these
37
00:03:32,819 --> 00:03:37,970
terms, like, perfect randomness, information
and also uncertainty.
38
00:03:37,970 --> 00:03:44,970
We rely heavily on the theory of probability
and also develop the theory of information using
39
00:03:45,099 --> 00:03:52,040
these probabilistic notions. So, another important
concept, although we shall not be really going
40
00:03:52,040 --> 00:03:55,590
deep into this, is the concept
of provable security.
41
00:03:55,590 --> 00:04:01,670
So, something which is called semantic security
is very central, and it is defined as follows:
42
00:04:01,670 --> 00:04:06,560
remember Alice and Bob: the idea
is that Alice essentially encrypts either
43
00:04:06,560 --> 00:04:12,620
0 or 1 with equal probability; therefore,
Bob knows either 0 is encrypted or 1 is encrypted
44
00:04:12,620 --> 00:04:17,219
and the probability of choosing a 0 or a 1
is half.
45
00:04:17,219 --> 00:04:24,219
Now, Alice encrypts either 0 or 1 and
sends the resultant cipher c to Bob as a challenge.
46
00:04:24,849 --> 00:04:31,849
Now, Bob has to guess from the challenge without
the decryption key whether 0 was encrypted
47
00:04:32,039 --> 00:04:36,650
or 1 was encrypted.
So, if Bob is not provided with a cipher text,
48
00:04:36,650 --> 00:04:43,650
then what will Bob do? Bob will simply guess.
Now, when Bob is provided the cipher text,
49
00:04:45,610 --> 00:04:51,000
he should not be able to guess any better
than the random guess; so, that is the idea.
50
00:04:51,000 --> 00:04:56,550
If he is not able to guess any better than
the random guess, then we say that the encryption
51
00:04:56,550 --> 00:05:01,050
algorithm which Alice is using is semantically
secure.
52
00:05:01,050 --> 00:05:08,050
If we just consider a simple kind of stream
cipher, for example, if there is a kind of
53
00:05:11,270 --> 00:05:18,270
message m, what Alice does is choose
a key value randomly and create a cipher
54
00:05:18,830 --> 00:05:24,729
text; therefore, this message can either be
0 or 1. So, the key will also be either 0
55
00:05:24,729 --> 00:05:27,250
or 1 and it is randomly chosen.
56
00:05:27,250 --> 00:05:33,400
Now, this particular cipher text has been
provided to Bob and Bob from this cipher text
57
00:05:33,400 --> 00:05:40,180
value has to guess, whether 0 was encrypted
or whether 1 was encrypted. This is an encryption
58
00:05:40,180 --> 00:05:47,180
algorithm which Alice uses. Bob is either
receiving m or receiving m bar, because
59
00:05:47,449 --> 00:05:53,220
the key can either take 0 value or 1 value;
we are just considering a very simple case.
60
00:05:53,220 --> 00:05:59,270
So, Bob has to guess whether 0 has been encrypted
or whether 1 has been encrypted and if Bob
61
00:05:59,270 --> 00:06:03,850
does not have any information about the cipher
text, then Bob would simply have guessed.
62
00:06:03,850 --> 00:06:09,770
So, in this case also, when Bob is even provided
with the cipher text, the probability of Bob
63
00:06:09,770 --> 00:06:14,520
being able to guess whether the message is
0 or 1 should, even then, be close to half;
64
00:06:14,520 --> 00:06:18,740
so, that is the idea of semantic security.
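The one-bit XOR construction just described can be sketched in a few lines; the helper below is purely illustrative and not from the lecture. It shows that when the key bit is chosen uniformly, the ciphertext distribution is the same whether 0 or 1 was encrypted, so Bob gains no advantage over a random guess.

```python
import random

def encrypt(m: int) -> int:
    # One-bit "stream cipher": XOR the message bit with a uniformly chosen key bit.
    k = random.randint(0, 1)
    return m ^ k

# Empirically, the ciphertext looks the same for both messages:
random.seed(0)  # fixed seed so the run is reproducible
trials = 100_000
for m in (0, 1):
    zeros = sum(encrypt(m) == 0 for _ in range(trials))
    print(f"m={m}: fraction of c=0 is about {zeros / trials:.2f}")
```

For either message bit, the ciphertext is 0 roughly half the time, which is exactly the "no better than a random guess" situation described above.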
65
00:06:18,740 --> 00:06:25,740
So, Bob or any eavesdropper does not have
any advantage over the random guess; so semantic
66
00:06:26,449 --> 00:06:33,449
security tries to encapsulate or capture this
particular notion.
67
00:06:33,479 --> 00:06:38,220
We have been discussing something called
message indistinguishability, that is, semantic
68
00:06:38,220 --> 00:06:44,860
security and message indistinguishability
are the same notions; these notions say that
69
00:06:44,860 --> 00:06:51,190
the attacker is not able to distinguish between
the encryptions of either a 0 or a 1.
70
00:06:51,190 --> 00:06:58,190
Often we talk about something which is called
computational security. We will see that there
71
00:07:00,360 --> 00:07:06,050
are two ways you can model the attacker:
you can either assume that the attacker is
72
00:07:06,050 --> 00:07:10,800
an unbounded adversary, that is, he is all
powerful; he has got access to a large
73
00:07:10,800 --> 00:07:17,039
amount of resources and can do large computations;
so, that is the notion of an unbounded attacker.
74
00:07:17,039 --> 00:07:21,629
So, you can try to make your security algorithms
protected against an unbounded attacker.
75
00:07:21,629 --> 00:07:25,379
The other thing which
you can do is to assume that, with
76
00:07:25,379 --> 00:07:30,949
today's computational power, the maximum
computations which an adversary
77
00:07:30,949 --> 00:07:36,379
can do is limited. The rule of thumb believed
today is that if there is any algorithm
78
00:07:36,379 --> 00:07:41,530
which requires more than 2 to the power of 80 computations,
then it is termed as infeasible.
79
00:07:41,530 --> 00:07:47,370
So, if we can limit or bound the attacker
of today's world by, say, 2 to the power of 80 computations,
80
00:07:47,370 --> 00:07:51,849
and if we can prove that the particular attack
for an encryption algorithm requires more
81
00:07:51,849 --> 00:07:55,430
than 2 to the power of 80 computations, then we are
happy as a designer.
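To get a feel for why 2 to the power of 80 operations is considered infeasible, here is a rough back-of-the-envelope computation; the rate of 10 to the power of 9 operations per second is an assumed figure for illustration, not from the lecture.

```python
# Assume (for illustration only) an attacker doing 10**9 operations per second.
ops = 2 ** 80
rate = 10 ** 9                       # assumed operations per second
seconds = ops / rate
years = seconds / (365 * 24 * 3600)
print(f"2^80 operations would take roughly {years:.1e} years at this rate")
```

Even at this optimistic rate, the attack would take on the order of tens of millions of years, which is why a designer is satisfied with such a bound.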
82
00:07:55,430 --> 00:08:01,509
So, that is the idea of our computational
security analysis; that is, we are not trying
83
00:08:01,509 --> 00:08:05,970
to really give you guarantees of security
against an unbounded attacker, but we are
84
00:08:05,970 --> 00:08:12,099
trying to give you security against a bounded
adversary - an adversary, which is bounded
85
00:08:12,099 --> 00:08:17,330
by computational power.
The other advantage that you get, apart
86
00:08:17,330 --> 00:08:22,620
from the simplicity, is that often you will find
that when we are trying to give security guarantees
87
00:08:22,620 --> 00:08:26,979
against an unbounded adversary, you may end
up with an encryption algorithm or a technique
88
00:08:26,979 --> 00:08:31,729
which is not practical.
But as we discussed, we also need practical
89
00:08:31,729 --> 00:08:37,770
security, that is, the cipher should be practical;
so, in order to make it practical, it is often
90
00:08:37,770 --> 00:08:43,229
advantageous to assume that the adversary
is actually not unbounded, but rather
91
00:08:43,229 --> 00:08:44,020
bounded.
92
00:08:44,020 --> 00:08:50,439
So, that is the notion and importance of computational
security analysis; we try to make this idea
93
00:08:50,439 --> 00:08:56,220
clearer as we proceed. Therefore, we define
a crypto-system to be computationally secure
94
00:08:56,220 --> 00:09:01,860
if the best algorithm for breaking it requires
at least N operations, where N is a very large
95
00:09:01,860 --> 00:09:04,750
number.
Another approach is often taken:
96
00:09:04,750 --> 00:09:08,929
to reduce the problem of breaking
97
00:09:08,929 --> 00:09:14,720
a cryptosystem to a known problem, like, "factoring
a large number to its prime factors".
98
00:09:14,720 --> 00:09:20,260
It is often assumed that factoring a large
number into its prime factors is a difficult
99
00:09:20,260 --> 00:09:24,559
problem; it is quite a hard problem. Although
we do not have a real proof of this
100
00:09:24,559 --> 00:09:29,290
fact, the idea is that this particular
problem has withstood a large amount of analysis
101
00:09:29,290 --> 00:09:35,010
over a long period of time, and it
is believed that it is fairly hard.
102
00:09:35,010 --> 00:09:39,829
So, the approach which is taken is that when
a crypto-system is given, for example, if
103
00:09:39,829 --> 00:09:46,069
I just take the example of an asymmetric crypto-system
called RSA, and somebody asks me to prove
104
00:09:46,069 --> 00:09:52,189
the security of this system, the approach
which is adopted is to reduce the problem
105
00:09:52,189 --> 00:09:58,530
of breaking
this RSA crypto-system to this known problem;
106
00:09:58,530 --> 00:10:02,780
that is, we do a reduction proof and show
that if
107
00:10:02,780 --> 00:10:09,319
this known problem is a difficult problem, so
is the problem of breaking this RSA crypto-system.
108
00:10:09,319 --> 00:10:15,459
Thus, these kinds of proofs are relative,
not absolute, and they are sometimes
109
00:10:15,459 --> 00:10:22,100
called proofs by reduction; we will also
see some examples of such kind of security
110
00:10:22,100 --> 00:10:28,850
proofs in our class. Probability is a
good tool which helps us
111
00:10:28,850 --> 00:10:33,799
to analyse the ciphers. So, let us try to
make these concepts a little more well-defined.
112
00:10:33,799 --> 00:10:39,059
So, there are some important definitions,
one of them is probability space. The probability
113
00:10:39,059 --> 00:10:45,610
space is an arbitrary but fixed set of points,
which is often denoted by S.
114
00:10:45,610 --> 00:10:52,390
So, let us consider an example: suppose there
is an unbiased coin; so the unbiased coin
115
00:10:52,390 --> 00:10:57,670
can take two possible values. Therefore, we
define its sample space to be the values like,
116
00:10:57,670 --> 00:11:02,449
head and tail; so, it can take either head
value or the tail value. Then,
117
00:11:02,449 --> 00:11:08,179
we define an experiment; so, the experiment
is defined as follows - experiment e1 is nothing
118
00:11:08,179 --> 00:11:15,179
but the outcome of a toss. So, if we assume
that this is an unbiased coin, then this experiment,
119
00:11:17,199 --> 00:11:22,069
what it does is that it chooses or samples
out a point from the sample space.
120
00:11:22,069 --> 00:11:27,880
Therefore, this result can either be a head
or it can be a tail; so, it just chooses a
121
00:11:27,880 --> 00:11:33,380
point from this head and tail. Similarly,
you can actually make this example a little
122
00:11:33,380 --> 00:11:40,360
bit more complicated and even for this simple
example, I think we can actually understand
123
00:11:40,360 --> 00:11:47,360
the concepts quite well.
So, if I just make it a little more general, then
124
00:11:47,480 --> 00:11:51,660
it looks like this: the probability space
is an arbitrary but fixed set of points. For
125
00:11:51,660 --> 00:11:56,540
example, the head and tail, it could be more
than that also and we denote that by S.
126
00:11:56,540 --> 00:12:00,370
What the experiment does is that it is an
action of taking a point from S; so, it just
127
00:12:00,370 --> 00:12:07,370
chooses a random point from the sample space;
this sample point is commonly called the
128
00:12:08,900 --> 00:12:10,079
outcome of an experiment.
129
00:12:10,079 --> 00:12:13,620
So, you toss a coin, it is either head or
a tail; so, either head or the tail is the
130
00:12:13,620 --> 00:12:19,419
sample point of the experiment. Let us try
to make it a little more complicated; you
131
00:12:19,419 --> 00:12:25,780
have got two possibilities, head or tail,
but what you do is that you toss the coin
132
00:12:25,780 --> 00:12:27,959
ten times.
133
00:12:27,959 --> 00:12:34,189
If you toss the coin ten times,
then there are several possible outcomes.
134
00:12:34,189 --> 00:12:41,049
So, if you toss the coin ten times,
the outcome could be a sequence
135
00:12:41,049 --> 00:12:45,069
of heads and tails - it can be head tail,
tail, tail, and so on, tail head or something
136
00:12:45,069 --> 00:12:49,990
like that.
If you denote the head by 0 and the tail by 1,
137
00:12:49,990 --> 00:12:54,069
do you know how many
possible enumerations or possible outcomes there are?
138
00:12:54,069 --> 00:13:01,030
There are 2 to the power of 10 possible outcomes.
Now, if I define a particular event
139
00:13:01,030 --> 00:13:06,730
as saying that the outcome has five heads
and five tails, then it means that from
140
00:13:06,730 --> 00:13:11,510
all the sample points, I am interested in
the probability of one particular event.
141
00:13:11,510 --> 00:13:17,049
So, this particular event, like this head
and tail can occur actually in certain possible
142
00:13:17,049 --> 00:13:24,049
ways. So, we can actually compute the number
of ways in which exactly five heads fall by
143
00:13:24,660 --> 00:13:31,459
simply computing 10 choose 5. Therefore, probability
of this event, that is, the probability of
144
00:13:31,459 --> 00:13:38,459
the event that there are 5 heads will be nothing
but 10 choose 5 divided by 2 to the power of 10;
145
00:13:39,370 --> 00:13:43,449
so, this way of computing the probability
is something that we have seen quite often.
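The five-heads-in-ten-tosses probability can be checked directly; a minimal sketch using only Python's standard library:

```python
from math import comb

# Exactly 5 heads in 10 tosses of a fair coin: (10 choose 5) / 2^10.
p = comb(10, 5) / 2 ** 10
print(p)  # 252/1024 = 0.24609375
```

So a little under one in four ten-toss sequences has exactly five heads.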
146
00:13:43,449 --> 00:13:49,640
Therefore, if I state it, or rather define
it, this is the classical definition of probability,
147
00:13:49,640 --> 00:13:55,799
that is, if there is an experiment, which
yields one out of n possible equally probable
148
00:13:55,799 --> 00:14:01,400
points and that every experiment must yield
a point of course, and let m be the number
149
00:14:01,400 --> 00:14:06,400
of points which form event E, then the probability
of an event E is defined as the ratio of m
150
00:14:06,400 --> 00:14:08,140
and n.
151
00:14:08,140 --> 00:14:13,209
There is a statistical definition also. Now,
why do we require this statistical definition,
152
00:14:13,209 --> 00:14:18,260
if the classical one is easy to understand? For
example, if I tell you that there is an unbiased
153
00:14:18,260 --> 00:14:24,380
coin and if I tell you that the probability
of a head is half, then that does not mean
154
00:14:24,380 --> 00:14:29,400
that if I toss the coin two times,
then one of them will be a head, it does not
155
00:14:29,400 --> 00:14:34,870
mean that. What it means is that if you keep
on tossing the coin for a large number of
156
00:14:34,870 --> 00:14:41,299
times, then approximately half
of the tosses
157
00:14:41,299 --> 00:14:46,650
will be heads.
So, the notion of probability actually holds
158
00:14:46,650 --> 00:14:51,480
in reality, when you repeat the experiment
for a large number of times; so, there is
159
00:14:51,480 --> 00:14:56,480
a notion of statistics involved in the way
we are defining probability.
160
00:14:56,480 --> 00:15:01,240
Suppose, there are n experiments and these
are carried out under the same conditions
161
00:15:01,240 --> 00:15:06,890
in which the event E has occurred mu times.
So, for a large value of n, and this
162
00:15:06,890 --> 00:15:10,900
is important, if I repeat the experiment
for a large number of times, then the event
163
00:15:10,900 --> 00:15:15,390
E is said to have the probability which is
denoted by probability of E, approximately
164
00:15:15,390 --> 00:15:19,240
equal to mu by n.
So, what I do is that instead of computing
165
00:15:19,240 --> 00:15:24,169
the probability by finding out the outcomes
and the number of possible ways these outcomes
166
00:15:24,169 --> 00:15:28,240
can come, what we can do is that we can keep
on repeating the experiment.
167
00:15:28,240 --> 00:15:34,040
We find that, if we actually repeat the experiment
for a large number of times and the
168
00:15:34,040 --> 00:15:38,000
particular event has occurred mu
times, then we can fairly approximate its
169
00:15:38,000 --> 00:15:43,699
probability by the ratio of mu by n; so, this
is the statistical definition, which is often useful
170
00:15:43,699 --> 00:15:45,370
for analysis.
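The statistical definition suggests a simple simulation: repeat the coin-toss experiment many times and estimate the probability as mu by n. A small sketch (the trial count is arbitrary, just "large"):

```python
import random

random.seed(42)           # fixed seed so the run is reproducible
n = 100_000               # number of repetitions (arbitrary, just "large")
mu = sum(random.random() < 0.5 for _ in range(n))  # count the "heads"
print(mu / n)             # statistical estimate of P(head), close to 0.5
```

The estimate gets closer to the true probability of one half as n grows, which is exactly the statistical definition at work.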
171
00:15:45,370 --> 00:15:50,280
Some of the very elementary probability rules
are as follows; we know these, but this
172
00:15:50,280 --> 00:15:55,079
is a kind of recapitulation; that is, probability
of A union B is equal to probability of A
173
00:15:55,079 --> 00:16:00,720
plus probability of B minus probability of
A intersection B; if they are mutually exclusive,
174
00:16:00,720 --> 00:16:04,150
then we know that probability of A intersection
B works out to zero and therefore probability
175
00:16:04,150 --> 00:16:09,390
of A union B is equal to probability of A
plus probability of B. Now, this particular
176
00:16:09,390 --> 00:16:14,000
rule is often handy in giving us upper bounds
of the probability of A union B, because we
177
00:16:14,000 --> 00:16:18,160
can say that probability of A union B will
be less than or equal to probability of A plus
178
00:16:18,160 --> 00:16:23,900
probability of B, because this probability
is definitely greater than or equal to 0.
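The inclusion-exclusion rule and the resulting union bound can be verified by brute-force enumeration on a tiny sample space; the single-die events below are illustrative, not from the lecture.

```python
from fractions import Fraction

space = set(range(1, 7))                  # one roll of a fair die
A = {x for x in space if x % 2 == 0}      # "even": {2, 4, 6}
B = {x for x in space if x > 3}           # "greater than 3": {4, 5, 6}

def P(event):
    # Classical probability: favourable points over total points.
    return Fraction(len(event), len(space))

assert P(A | B) == P(A) + P(B) - P(A & B)   # inclusion-exclusion
assert P(A | B) <= P(A) + P(B)              # the union bound
print(P(A | B))  # 2/3
```

The union bound holds precisely because the subtracted term P(A intersection B) is never negative.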
179
00:16:23,900 --> 00:16:29,449
So, we know the definition of conditional
probability, probability of A given B;
180
00:16:29,449 --> 00:16:35,370
so this probability means that we know the
event B has occurred, what is the probability
181
00:16:35,370 --> 00:16:38,760
of A?
The way it is computed is: probability
182
00:16:38,760 --> 00:16:45,760
of A intersection B divided by probability
of B. Now, if A and B are independent, then
183
00:16:46,610 --> 00:16:51,740
what is the probability of A, given B? That
is, it is the same as probability of A, because
184
00:16:51,740 --> 00:16:55,750
A does not really depend upon the outcome
of B.
185
00:16:55,750 --> 00:17:02,530
So, the probability of A given B becomes equal
to probability of A when they are independent
186
00:17:02,530 --> 00:17:07,530
and therefore probability of A intersection
B becomes equal to probability of A multiplied
187
00:17:07,530 --> 00:17:13,020
by probability of B. So, that is, if A and
B are independent, then probability of A intersection
188
00:17:13,020 --> 00:17:18,679
B is nothing but probability of A multiplied
by probability of B.
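Both facts about independence can be checked by enumerating two fair coin tosses; an illustrative sketch:

```python
from fractions import Fraction
from itertools import product

space = list(product("HT", repeat=2))     # all outcomes of two fair tosses
A = {w for w in space if w[0] == "H"}     # first toss is a head
B = {w for w in space if w[1] == "H"}     # second toss is a head

def P(event):
    # Classical probability on the four equally likely outcomes.
    return Fraction(len(event), len(space))

assert P(A & B) / P(B) == P(A)            # P(A | B) = P(A): independence
assert P(A & B) == P(A) * P(B)            # product rule for independent events
print(P(A), P(B), P(A & B))  # 1/2 1/2 1/4
```

Knowing the second toss tells us nothing about the first, so conditioning on B leaves the probability of A unchanged.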
189
00:17:18,679 --> 00:17:22,069
Then there is a very important law, which
is known as the law of total probability.
190
00:17:22,069 --> 00:17:29,069
If there are n events like E 1, E 2, and so
on till E n, and the union of them is the sample
191
00:17:29,480 --> 00:17:36,480
space S. If I know that they are mutually
non-intersecting, that is, E i and E j cannot
192
00:17:36,530 --> 00:17:43,530
take place together; therefore, the intersection
of E i and E j is equal to the null set. Then,
193
00:17:44,030 --> 00:17:51,030
for any event A, we can say probability of
A is equal to a sum. What we do is, we multiply
194
00:17:51,620 --> 00:17:57,230
probability of E i, that is, the probabilities
that the ith event has occurred with probability
195
00:17:57,230 --> 00:18:03,160
of A given E i, that is, the probability
of A given that the event E i has occurred;
196
00:18:03,160 --> 00:18:09,310
and then take a sum over all possible i values;
so, i runs from 1 to n. So, this is also a
197
00:18:09,310 --> 00:18:13,480
very handy rule for doing our computations.
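The law of total probability can be illustrated with a hypothetical two-urn experiment (the numbers are made up for the example): urn E1 or E2 is picked with probability one half each, and A is the event of drawing a red ball.

```python
from fractions import Fraction

p_E = [Fraction(1, 2), Fraction(1, 2)]          # P(E_i): which urn is picked
p_A_given_E = [Fraction(2, 3), Fraction(1, 3)]  # P(A | E_i): chance of red in each urn

# P(A) = sum over i of P(E_i) * P(A | E_i)
p_A = sum(pe * pa for pe, pa in zip(p_E, p_A_given_E))
print(p_A)  # 1/2
```

The E_i form a partition of the sample space, which is exactly the mutual-exclusivity condition the rule requires.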
198
00:18:13,480 --> 00:18:19,890
When we are talking about cryptography, we
are not talking about a continuous probability
199
00:18:19,890 --> 00:18:24,830
space; we are actually talking about a discrete
space. Therefore, in this case, the total
200
00:18:24,830 --> 00:18:29,500
possible sample points can actually take some
discrete possible outcomes; so it can run
201
00:18:29,500 --> 00:18:35,670
from x 1, say, till x hash S; hash S is
nothing but the number of elements in the
202
00:18:35,670 --> 00:18:39,410
sample space.
What we do is that for each of these sample
203
00:18:39,410 --> 00:18:43,840
points, we actually define a probability.
So, first of all we define something which
204
00:18:43,840 --> 00:18:50,100
is called a random variable and then we say
that this particular random variable can take
205
00:18:50,100 --> 00:18:55,380
such and such possible values; then, we try
to assign a probability for this random variable.
206
00:18:55,380 --> 00:19:01,060
So, if there is a discrete space S, which
has got a countable number of points like
207
00:19:01,060 --> 00:19:08,060
x 1, x 2, and so on till x hash S, then a discrete
random variable is nothing but a numerical result
208
00:19:08,920 --> 00:19:13,290
of an experiment.
So, it is a function which is defined on a
209
00:19:13,290 --> 00:19:20,220
discrete sample space. For example, an
example of a discrete random variable could
210
00:19:20,220 --> 00:19:27,040
be like this: there are some points x 1,
x 2, till x hash S, and we define a function
211
00:19:27,040 --> 00:19:32,670
on the value of x; imagine that
all of them are binary values. Suppose they
212
00:19:32,670 --> 00:19:39,010
are four-bit binary values, so there could
be sixteen possible such numbers, from
213
00:19:39,010 --> 00:19:46,010
all four 0s to all four 1s. We say that
we are interested in finding out the probability
214
00:19:47,230 --> 00:19:53,440
that the numbers are, for example, even;
so, we know that what we will do is, from
215
00:19:53,440 --> 00:19:58,940
all these possible values we will try to find
out those binary values which end with zero.
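The four-bit example can be enumerated directly; a small sketch showing that the "even" values are exactly those ending in the bit 0:

```python
# All sixteen 4-bit binary strings, 0000 through 1111.
space = [format(x, "04b") for x in range(16)]
even = [s for s in space if s.endswith("0")]  # even numbers end with bit 0
print(len(space), len(even), len(even) / len(space))  # 16 8 0.5
```

Half of the sixteen values are even, so under a uniform distribution the event "the number is even" has probability one half.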
216
00:19:58,940 --> 00:20:02,960
So, we define an experiment and therefore,
this is a function which is defined on a discrete
217
00:20:02,960 --> 00:20:04,600
sample space.
218
00:20:04,600 --> 00:20:11,310
So, what we do is that now, let S be a discrete
probability space and X be a random variable;
219
00:20:11,310 --> 00:20:17,680
so, X is the random variable, and we actually
start assigning probability values to this
220
00:20:17,680 --> 00:20:23,940
random variable.
So, what are the possible values this random
221
00:20:23,940 --> 00:20:30,220
variable X can take? It can either take x
1 value, x 2 value, until the x hash S value, and
222
00:20:30,220 --> 00:20:34,010
what we do is that for each of these possible
values of the random variable, we assign a
223
00:20:34,010 --> 00:20:39,010
probability.
So, we say the random variable X can take
224
00:20:39,010 --> 00:20:44,970
the value of x 1 with a probability of say,
p 1, the random variable X can take the value
225
00:20:44,970 --> 00:20:51,200
of x 2 with a probability of p 2, and so on:
the random variable X can take the value of
226
00:20:51,200 --> 00:20:56,510
x hash S with a probability of p hash S; so,
we can define the probabilities like this.
227
00:20:56,510 --> 00:21:01,550
So, these probabilities will be discrete probabilities
and they need to satisfy two important properties
228
00:21:01,550 --> 00:21:06,100
- one of them is that each probability
value should be greater than or equal to 0,
229
00:21:06,100 --> 00:21:09,720
that is, they should not be negative, and
the other thing is that the summation of all these
230
00:21:09,720 --> 00:21:16,320
probabilities should be equal to 1.
Therefore, these probabilities should satisfy
231
00:21:16,320 --> 00:21:22,930
these two properties and we can actually say
that these probabilities are essentially a
232
00:21:22,930 --> 00:21:27,810
map from the sample space S to the set of
real numbers, because the probability values
233
00:21:27,810 --> 00:21:34,810
are nothing but real numbers which lie
between 0 and 1, both sides included; so,
234
00:21:36,010 --> 00:21:41,100
these are actually the individual probabilities
and they should satisfy these two important
235
00:21:41,100 --> 00:21:42,800
properties.
236
00:21:42,800 --> 00:21:48,820
One of the very frequently used distributions
is something which is called the uniform distribution,
237
00:21:48,820 --> 00:21:54,420
and we know that in this case, all the values
like x 1 till x hash S are equally probable.
238
00:21:54,420 --> 00:22:00,170
So, the random variable X can take x 1 or
x 2 or x 3 or x hash S with the same probability
239
00:22:00,170 --> 00:22:05,400
and the probability is nothing but 1 divided
by hash S. Then X is said to follow a uniform
240
00:22:05,400 --> 00:22:11,100
distribution and it is often denoted like
this: we write that p is chosen
241
00:22:11,100 --> 00:22:16,100
uniformly from S, which means that we
choose p uniformly from S. So, this is a very
242
00:22:16,100 --> 00:22:19,090
common notation which is useful.
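A uniform choice from a finite sample space is one line with the standard library; the set S below is an illustrative stand-in for a sample space with hash S equal to 4 points.

```python
import random

S = ["x1", "x2", "x3", "x4"]   # an illustrative sample space with hash S = 4 points
p = random.choice(S)           # p chosen uniformly from S: each point has probability 1/4
print(p)
```

Each point comes out with probability 1 divided by hash S, which is exactly the uniform distribution just described.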
243
00:22:19,090 --> 00:22:24,170
Then we actually define a very important distribution
which is called a binomial distribution. It
244
00:22:24,170 --> 00:22:28,640
says that, suppose, there is an experiment
and it has got two possible values or possible
245
00:22:28,640 --> 00:22:33,040
outcomes, like, it could be either a success
or a failure, such as a HEAD
246
00:22:33,040 --> 00:22:39,350
or a TAIL, then we repeat the experiments
independently. Such experiments
247
00:22:39,350 --> 00:22:44,480
are called Bernoulli Trials, and if I denote
the probability of a HEAD to be p and the
248
00:22:44,480 --> 00:22:49,010
probability of a TAIL to be 1 minus p of course,
because the HEAD and the TAIL together, if
249
00:22:49,010 --> 00:22:53,520
I add these two probabilities, should be equal
to 1, what is the probability that there are
250
00:22:53,520 --> 00:22:58,390
k successes in n trials?
So, I repeat the experiments and the question
251
00:22:58,390 --> 00:23:04,970
is, what is the probability that among the
n trials, there are actually k number of successes?
252
00:23:04,970 --> 00:23:09,170
We know this actually works out as follows:
n choose k means that we choose
253
00:23:09,170 --> 00:23:14,920
the k success points and then for k success
points, the probability will be actually p
254
00:23:14,920 --> 00:23:21,100
raised to the power of k. Since the other
points are failure points, so they should
255
00:23:21,100 --> 00:23:25,500
also be multiplied by 1 minus p, whole to
the power of n minus k, because there are
256
00:23:25,500 --> 00:23:32,500
n minus k failure points; so, this gives us
the probability that
257
00:23:33,110 --> 00:23:36,070
there are k successes in n trials.
258
00:23:36,070 --> 00:23:43,070
Now, if a random variable Y takes values like
0, 1, up to n; so, 0, 1, up to n means: if I say that
259
00:23:44,850 --> 00:23:49,760
I count the number of successes
in n trials and I denote that by a random
260
00:23:49,760 --> 00:23:55,040
variable, then this random variable can take
values from 0, that is, it can be that there
261
00:23:55,040 --> 00:24:01,360
are no success to n, that is, all of them
are success. Therefore, the random variable
262
00:24:01,360 --> 00:24:08,200
Y can take values from 0 to n and for values,
which lie between 0 and 1, that is, for p,
263
00:24:08,200 --> 00:24:12,570
we say that the
probability that Y is equal to k which means
264
00:24:12,570 --> 00:24:18,300
that there are k success is given by, n choose
k p to the power of k multiplied by 1 minus
265
00:24:18,300 --> 00:24:23,320
p whole to the power of n minus k.
If this random variable satisfies this probability
266
00:24:23,320 --> 00:24:30,320
distribution, then we say that Y follows the binomial
distribution, and this is a very common and
267
00:24:31,170 --> 00:24:34,710
useful distribution.
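The binomial probability mass function can be written out directly; a minimal sketch, checking that the probabilities sum to 1 and that the ten-toss, five-head case matches the earlier 10 choose 5 over 2 to the power of 10 computation.

```python
from math import comb

def binom_pmf(n: int, k: int, p: float) -> float:
    # P(Y = k) for Y following a binomial distribution with parameters n and p.
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = [binom_pmf(n, k, p) for k in range(n + 1)]
print(abs(sum(pmf) - 1.0) < 1e-12)   # True: a valid probability distribution
print(binom_pmf(n, 5, p))            # 0.24609375, i.e. (10 choose 5) / 2^10
```

The two distribution properties from earlier (non-negative values, summing to 1) both hold, as they must for any assignment of probabilities.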
268
00:24:34,710 --> 00:24:39,360
Then we talk about something called the law of
large numbers, which is very useful. So, it says that
269
00:24:39,360 --> 00:24:44,010
- if we repeat a trial for a large number
of times, where n
270
00:24:44,010 --> 00:24:50,260
tends to infinity, and we note the number
of successes, after a point the number of
271
00:24:50,260 --> 00:24:54,300
successes will become predictable and
can be computed by something which is called
272
00:24:54,300 --> 00:24:58,540
the expectation. The expectation is nothing
but the number of times you are repeating
273
00:24:58,540 --> 00:25:01,330
the experiment multiplied by the probability
of success.
274
00:25:01,330 --> 00:25:08,330
Here, p is the probability of success; so this product
is often referred to as the expectation of the
275
00:25:08,530 --> 00:25:12,200
random variable. As we stated, in the limit as
n tends to infinity, the probability that
276
00:25:12,200 --> 00:25:19,200
epsilon n by n minus p, in absolute value, is less than a very
small, but fixed, number - and
277
00:25:19,250 --> 00:25:23,300
this probability is equal to one. Therefore,
this is something which is called the law
278
00:25:23,300 --> 00:25:28,490
of large numbers, which says that if you repeat
the experiment for a large number of times,
279
00:25:28,490 --> 00:25:33,880
then essentially you will find that the number
of times you get a success will
280
00:25:33,880 --> 00:25:37,850
actually be found out by computing the expectation
of the random variable.
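The law of large numbers stated above can be illustrated with a small simulation (an assumed sketch; the function name and parameters are mine): the observed fraction of successes settles near p, so the count of successes is close to the expectation n times p.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

def success_fraction(p: float, n: int) -> float:
    """Repeat a Bernoulli(p) trial n times and return the observed
    fraction of successes; the law of large numbers says this
    approaches p as n grows."""
    return sum(random.random() < p for _ in range(n)) / n

# With p = 0.3 the fraction hovers around 0.3 for large n, so the
# number of successes is approximately the expectation n * p.
for n in (100, 10_000, 1_000_000):
    print(n, success_fraction(0.3, n))
```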
281
00:25:37,850 --> 00:25:44,240
So, the expectation of the random variable
gives us an estimate about the number of times
282
00:25:44,240 --> 00:25:48,860
a success of the experiment will occur, if
I repeat the experiment for a large number
283
00:25:48,860 --> 00:25:55,180
of times. Now, this particular law and the
concept of binomial distribution have got
284
00:25:55,180 --> 00:25:59,740
a very important application in the field
of cryptographic analysis.
285
00:25:59,740 --> 00:26:06,740
So, for that, let us consider the particular
result. It says that let epsilon be an event
286
00:26:07,870 --> 00:26:13,700
in a probability space X with probability
of epsilon being equal to p and p is greater
287
00:26:13,700 --> 00:26:20,470
than 0, and what we do is that repeatedly,
we perform the random experiment X independently.
288
00:26:20,470 --> 00:26:25,240
So, what we are saying here is, there is an
event epsilon; so, epsilon is an event and
289
00:26:25,240 --> 00:26:32,240
there is the probability space X. So, what
we know that the probability that epsilon
290
00:26:35,290 --> 00:26:42,290
occurs, that is, this particular event occurs,
is actually given by p, where p is some non-zero
291
00:26:43,290 --> 00:26:49,240
value, it is greater than zero. So, what we
then do is that we repeat this experiment
292
00:26:49,240 --> 00:26:53,280
again and again.
So, we repeat the experiment X once, we do
293
00:26:53,280 --> 00:27:00,280
the experiment X twice, and what we need is
that we need to find out the expected number
294
00:27:02,990 --> 00:27:08,630
of experiments of x until epsilon occurs the
first time. So, what we need is that we need
295
00:27:08,630 --> 00:27:13,410
the expected number of the experiments of
x until epsilon occurs the first time. So,
296
00:27:13,410 --> 00:27:20,410
we are interested in a particular event; we
know the probability of this event and we
297
00:27:22,770 --> 00:27:27,540
need to find out the number of times we would
like to repeat this experiment, until we first
298
00:27:27,540 --> 00:27:34,540
get the success. The success is defined as
the fact that this event, epsilon occurs.
299
00:27:34,610 --> 00:27:41,610
So, how do I compute that? What we say, if
this G is a random variable, it says that
300
00:27:43,250 --> 00:27:50,250
this is the expected number of experiments
of x until epsilon occurs the first time,
301
00:28:09,250 --> 00:28:13,360
and what we will do is that we will try to
give you a proof or develop a proof that this
302
00:28:13,360 --> 00:28:20,360
expectation E G is actually given by the reciprocal
of p. Now, what is the impact of this result?
303
00:28:22,210 --> 00:28:29,070
Suppose, there is an attack, suppose, you
develop an attack and you say that the probability
304
00:28:29,070 --> 00:28:33,600
that this attack works is,
say, 2 to the power of minus n.
305
00:28:33,600 --> 00:28:40,600
So, that means the probability that you can actually
find out that particular key is 2 to the power
306
00:28:43,780 --> 00:28:50,760
of minus n; then this particular result
gives us a kind of indication that, if I repeat
307
00:28:50,760 --> 00:28:56,920
the experiment, if I repeat the attack for
1 by 2 to the power of minus n number of times,
308
00:28:56,920 --> 00:29:02,050
that is, 2 to the power of n number of times,
then I should get the attack working at least
309
00:29:02,050 --> 00:29:05,080
once.
That is, after 2 to the power of n operations,
310
00:29:05,080 --> 00:29:12,080
I should get the attack to work. So, therefore,
if I say, for example, in AES,
311
00:29:12,679 --> 00:29:18,270
that we have got 128 bit security and if
I tell you that the probability that you can
312
00:29:18,270 --> 00:29:23,309
actually make the attack work is 2 to the power of
minus 128, then immediately you know that
313
00:29:23,309 --> 00:29:27,080
if I repeat the experiment for 2 to the power
of 128 number of times, which is very
314
00:29:27,080 --> 00:29:32,950
huge, you will get 1 success.
Therefore, this actually gives us a nice indication
315
00:29:32,950 --> 00:29:38,850
to find out, if there is a property which
you can exploit for an attack, it gives you
316
00:29:38,850 --> 00:29:42,920
an estimate about the number of times you
need to repeat the experiment to get a success
317
00:29:42,920 --> 00:29:46,840
once.
The proof of this is actually quite simple
318
00:29:46,840 --> 00:29:52,490
and follows from the binomial distribution
notions and the idea is as follows: what you
319
00:29:52,490 --> 00:29:59,490
do is, you find out probability of G equal
to the value at G is equal to t. You know
320
00:30:00,440 --> 00:30:07,440
that G equal to t means the success occurs
for the first time at the t-th trial, that is, it occurs
321
00:30:11,820 --> 00:30:18,820
at the t-th instance, which means that the previous
experiments till t minus 1 have been failures.
322
00:30:19,840 --> 00:30:26,490
Therefore, this probability can be computed
by 1 minus p raised to power of t minus 1
323
00:30:26,490 --> 00:30:33,490
because they were all failures multiplied
by p; therefore, the expectation of G, by
324
00:30:34,630 --> 00:30:40,570
using the law of large numbers, will be equal
to this probability multiplied by the number
325
00:30:40,570 --> 00:30:44,570
of times you are repeating the experiment.
So, what you are doing is that you are repeating
326
00:30:44,570 --> 00:30:50,410
this for t number of times and therefore,
you multiply t by 1 minus p to the power
327
00:30:50,410 --> 00:30:57,410
of t minus 1 and multiply that by p. Now,
you note that this particular t can be anything,
328
00:30:57,730 --> 00:31:04,730
it can go from t equal to 1, when the first
trial succeeds, to infinity. Working this out
329
00:31:10,890 --> 00:31:17,830
even further, this is actually equal to minus
p differential with respect to p of sigma
330
00:31:17,830 --> 00:31:24,830
1 minus p whole to the power of t.
So, this will work out like this where t runs
331
00:31:25,000 --> 00:31:32,000
from 1 to infinity; you can check this that
it actually gives you the same value; so,
332
00:31:32,720 --> 00:31:38,350
then that follows from simple differential
calculus, if you differentiate this, you will
333
00:31:38,350 --> 00:31:42,630
get this.
So, this means that if I take this sigma,
334
00:31:42,630 --> 00:31:48,210
then this is nothing but minus p d d p and
I would like to add this sigma. So, this 1
335
00:31:48,210 --> 00:31:52,700
minus p to the power of 1 plus 1 minus p to
the power of 2 plus 1 minus p to the power
336
00:31:52,700 --> 00:31:59,700
of 3, so until 1 minus p to the power of infinity.
So, you note that since this value is less
337
00:32:00,170 --> 00:32:05,460
than 1, this sigma actually converges and
this summation actually converges and what
338
00:32:05,460 --> 00:32:12,460
we get is minus p into d d p of 1 by p minus 1, and this
actually works out to 1 by p. So, therefore,
339
00:32:17,480 --> 00:32:22,290
from here we actually get an estimate about
the number of times we have to repeat the
340
00:32:22,290 --> 00:32:29,290
experiment, so that we actually get this experiment
to work.
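The result that the expectation of G equals 1 by p can also be sanity-checked by simulation (an illustrative sketch with assumed names): repeat an experiment with success probability p until the first success and average the trial counts.

```python
import random

random.seed(7)  # fixed seed for reproducibility

def trials_until_success(p: float) -> int:
    """Run Bernoulli(p) trials until the first success and return the
    number of trials used; this is the random variable G, with
    P(G = t) = (1 - p)**(t - 1) * p."""
    t = 1
    while random.random() >= p:
        t += 1
    return t

# With p = 0.1, averaging G over many runs comes out close to
# E[G] = 1/p = 10, matching the derivation above.
p = 0.1
mean_g = sum(trials_until_success(p) for _ in range(100_000)) / 100_000
print(mean_g)
```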
341
00:32:29,350 --> 00:32:34,700
Therefore, it is important to know this result,
because this actually gives us an idea about
342
00:32:34,700 --> 00:32:41,700
if an experiment's success probability is
defined, then from there how to get an estimate
343
00:32:42,920 --> 00:32:48,740
about the number of times we need to repeat
the experiment to get that success or to get
344
00:32:48,740 --> 00:32:49,720
that event.
345
00:32:49,720 --> 00:32:55,500
Then we come to the next important concept
in today's class, which is called the birthday
346
00:32:55,500 --> 00:33:01,170
paradox. Birthday paradox is actually quite
central to the idea of analysis of ciphers.
347
00:33:01,170 --> 00:33:08,140
So, consider a function f, which is a mapping
from X to Y, where Y is a set of n elements.
348
00:33:08,140 --> 00:33:14,860
Consider that this class of students forms X; let
Y denote the birthdays, say, fifteenth September
349
00:33:14,860 --> 00:33:16,309
is the birthday of a person in X.
350
00:33:16,309 --> 00:33:23,309
So, I mean, there are two things - one of
them is X and the other one is Y. So, we are
351
00:33:26,750 --> 00:33:32,900
considering a mapping from X to Y. So, let
us consider this class of students from X.
352
00:33:32,900 --> 00:33:37,240
Let us consider that in this class there are
X students, that is, there are X students
353
00:33:37,240 --> 00:33:43,559
and let us consider that their possible number
of birthdays can be 365. So, there are 365
354
00:33:43,559 --> 00:33:49,940
possible days of birthday. So, choose a person,
say A and we know that A will be mapped to
355
00:33:49,940 --> 00:33:55,640
one of these days from this 365; if there
is another person, then he will also or she
356
00:33:55,640 --> 00:34:02,640
will be mapped to another particular day among
this 365. So, the question is, if you consider,
357
00:34:03,030 --> 00:34:08,240
for example, what we are essentially considering
is that we are considering the fact, that
358
00:34:08,240 --> 00:34:15,240
there can be two persons, say, one person and another,
who are born on the same day; so, there is
359
00:34:15,520 --> 00:34:17,200
the collision of these two birthdays.
360
00:34:17,200 --> 00:34:22,500
So, we can say that the problem is like this,
that is, we can abstract out this problem
361
00:34:22,500 --> 00:34:28,530
as this, that is, choose k pair-wise distinct
points from X uniformly and define collision
362
00:34:28,530 --> 00:34:34,450
to be the event for i not equal to j, where
f x i is equal to f x j. We check from the
363
00:34:34,450 --> 00:34:39,379
corresponding f x i's, when a collision occurs
and clearly, the probability of a collision
364
00:34:39,379 --> 00:34:44,040
increases if k is increased. That is, if I
choose a large number of points, that is, if
365
00:34:44,040 --> 00:34:48,270
I choose a large number of students from the
class, then the probability that they are
366
00:34:48,270 --> 00:34:52,869
born on the same day also increases.
The question now, is what is the least value
367
00:34:52,869 --> 00:34:58,820
of k? We are actually interested in the least
value of k, so that the probability of a collision
368
00:34:58,820 --> 00:35:03,400
is more than epsilon; so we would like to
give a lower bound of the probability. So,
369
00:35:03,400 --> 00:35:06,950
first of all you understand that there are
two important things - one of them is this
370
00:35:06,950 --> 00:35:10,740
least value, and then we are actually giving
the lower bound of the probability.
371
00:35:10,740 --> 00:35:17,190
So, why are we choosing the least value of
k? Because if I start increasing this k and
372
00:35:17,190 --> 00:35:21,390
if this k is quite large, then this probability
will obviously increase and the probability
373
00:35:21,390 --> 00:35:25,700
of collision will obviously increase, but
what we are interested is in finding out,
374
00:35:25,700 --> 00:35:31,090
what is the least value of this k so that
this collision occurs?
375
00:35:31,090 --> 00:35:34,790
Then we are actually trying to give a lower
bound of this probability, that is, the probability
376
00:35:34,790 --> 00:35:41,280
should be at least this much, so we would
like to give an answer to this question. So,
377
00:35:41,280 --> 00:35:46,290
talking in terms of birthdays, like
we are actually interested in finding out,
378
00:35:46,290 --> 00:35:50,470
what is the number of students, which should
be in that class, that is, what is the least
379
00:35:50,470 --> 00:35:55,210
number of students which should be in the
class, so that the probability that two of
380
00:35:55,210 --> 00:35:59,590
them are born on the same day is say more
than half.
381
00:35:59,590 --> 00:36:04,210
So, I am interested in finding out, what is
the least size of the class, so that there
382
00:36:04,210 --> 00:36:09,000
are two students, at least, who are born on
the same day and a probability of this fact
383
00:36:09,000 --> 00:36:14,940
is greater than or equal to half.
So, we can actually compute this quite easily
384
00:36:14,940 --> 00:36:20,310
and we can do this as follows: like, the probability
of no collisions - let us find that first
386
00:36:20,310 --> 00:36:21,690
among these k persons.
386
00:36:21,690 --> 00:36:28,690
So, we know that if there are k persons, that
is, if there are k persons in the class and
387
00:36:29,780 --> 00:36:35,260
let us consider that this person is mapped
to a particular date, that is, he is basically
388
00:36:35,260 --> 00:36:40,440
born on that day. So, if we are considering
the probability of no collisions, then it
389
00:36:40,440 --> 00:36:47,440
means that the second person should not be
born on this date; that means, from 365 days,
390
00:36:47,580 --> 00:36:54,580
he can be born on 364 days; so this is 364
divided by 365 or that is 1 minus 1 by 365.
391
00:36:59,960 --> 00:37:05,520
What about the third person? The third person
cannot be born on this date or that date,
392
00:37:05,520 --> 00:37:10,010
therefore, his probability will be 1 minus
2 divided by 365.
393
00:37:10,010 --> 00:37:14,800
Similarly, if I consider the last person,
that is, like this person, then his probability
394
00:37:14,800 --> 00:37:21,800
will be 1 minus k minus 1 divided by 365 because
there are k persons in this set. I am considering
395
00:37:22,710 --> 00:37:27,470
this least size of the set.
Therefore, we have these probabilities, and all of them
396
00:37:27,470 --> 00:37:33,130
are independent events. So, the probability
of no collision among these k persons can
397
00:37:33,130 --> 00:37:40,130
be found out by this, that is, I multiply
1 minus 1 by 365 with 1 minus 2 by 365 with
398
00:37:40,859 --> 00:37:47,859
1 minus k minus 1 by 365 and so on; therefore,
this is nothing but the product of these probabilities.
399
00:37:48,130 --> 00:37:54,119
Now, for a large n and a small x, we have
got this approximation, that is, one plus
400
00:37:54,119 --> 00:38:00,500
x by n is nothing but e to the power of x
by n; so for the large n and a small x, this
401
00:38:00,500 --> 00:38:07,500
holds. So, if I use this approximation, then
this product of 1 minus i by 365, as each
402
00:38:09,540 --> 00:38:15,600
of this terms is substituted by e to the power
of minus i divided by 365, then we try to
403
00:38:15,600 --> 00:38:20,460
find out the product of these terms. This
actually works out to e to the power of minus
404
00:38:20,460 --> 00:38:26,859
k into k minus 1 divided by 730, because when
we are doing a product of these terms, in
405
00:38:26,859 --> 00:38:31,340
the powers we are doing actually a sigma;
so, if I do this sigma, then this sigma of
406
00:38:31,340 --> 00:38:38,340
minus i by 365 will work out as this.
407
00:38:39,810 --> 00:38:46,810
See, what we are doing is this, that is, i
equal to 1 to k minus 1 and we are multiplying
408
00:38:46,890 --> 00:38:53,890
1 minus i by 365. So, by using the approximation,
this is the product, i equal to 1 to k minus 1, of e to the
409
00:38:56,100 --> 00:39:01,970
power of minus i by 365.
Now, when we are doing this product, then
410
00:39:01,970 --> 00:39:07,710
this means that what we are doing is e to
the power of minus i by 365 and then we are
411
00:39:07,710 --> 00:39:14,080
doing a summation here, so that means that
it is e to the power of 1 by 365, if you take
412
00:39:14,080 --> 00:39:21,080
common, it is 1 plus 2 plus, so on till k
minus 1. So, this is e to the power of 1 by
413
00:39:26,160 --> 00:39:33,160
365 into k into k minus 1 by 2; there is a
minus out. So, this minus will come out; so,
414
00:39:35,220 --> 00:39:42,220
minus 1 by 365 into k into k minus 1 by 2,
so that is e to the power of minus k into
415
00:39:43,310 --> 00:39:50,310
k minus 1 divided by 730.
Suppose, we say this is the probability where
416
00:39:53,119 --> 00:39:57,619
there is no collision, so what is the probability
that there is at least one collision? That
417
00:39:57,619 --> 00:40:04,619
is simple, that is 1 minus e to the power
of minus k into k minus 1 divided by 730,
418
00:40:05,430 --> 00:40:12,040
and if I say this probability should be at
least equal to 0.5, then we can actually get
419
00:40:12,040 --> 00:40:17,330
an estimate of this k by computing, or rather,
equating this to 0.5. That means, you are
420
00:40:17,330 --> 00:40:21,940
trying to say, this probability should be
greater than equal to 0.5, but in order to
421
00:40:21,940 --> 00:40:28,940
compute the value of k, let us equate this
to be 0.5 and find out an estimate of k.
422
00:40:31,040 --> 00:40:35,119
We can actually see this and I am not really
going into this, that is, you can find by
423
00:40:35,119 --> 00:40:42,119
calculations like this, that is, 1 minus e
to the power of minus k into k minus 1 divided
424
00:40:42,750 --> 00:40:48,940
by 730 is equal to 0.5 and therefore, k into
k minus 1 divided by 730 will be equal to
425
00:40:48,940 --> 00:40:55,349
l n 2; so, you can actually do this type of calculation.
Therefore, what you can do is that you can
426
00:40:55,349 --> 00:41:02,349
bring this to e to the power of minus k into
k minus 1 by 730 will be equal to 0.5 and
427
00:41:04,660 --> 00:41:11,660
then if you take a log on both sides, then
it works out to k into k minus 1 by 730 is
428
00:41:12,359 --> 00:41:19,359
nothing but l n of 2. Therefore, you can get
k square minus k will be equal to 730 into
429
00:41:21,030 --> 00:41:28,030
l n 2. Therefore, here if you neglect k in comparison with
k square, then k will be roughly equal to
430
00:41:28,330 --> 00:41:35,330
square root of 730 l n 2 and that will be
approximately 23; so, which means that if
431
00:41:39,640 --> 00:41:45,070
there is a room of 23 random people, then
the probability that there are 2 persons with
432
00:41:45,070 --> 00:41:47,960
the same birthday is 0.5.
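The calculation above is easy to verify numerically (a sketch; the function names are mine): compare the exact product of no-collision probabilities with the e-based approximation derived in the lecture.

```python
from math import exp, prod

def p_collision_exact(k: int, days: int = 365) -> float:
    """Exact probability that at least two of k people share a
    birthday, via the product (1 - 1/days)(1 - 2/days)..."""
    return 1.0 - prod(1 - i / days for i in range(1, k))

def p_collision_approx(k: int, days: int = 365) -> float:
    """The lecture's approximation 1 - e^(-k(k-1)/(2*days)); for
    days = 365 the denominator in the exponent is 730."""
    return 1.0 - exp(-k * (k - 1) / (2 * days))

# For k = 23 both values sit right around 0.5, while for a room of
# 22 the exact probability is still below one half.
print(p_collision_exact(23), p_collision_approx(23))
```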
433
00:41:47,960 --> 00:41:53,340
This actually seems to be a paradox, why?
Because if I asked you, what is the probability
434
00:41:53,340 --> 00:41:58,490
that two specific people are born on
the same date, the answer is actually very small; it is
435
00:41:58,490 --> 00:42:01,440
actually 1 by 365, so that is a very small
number.
436
00:42:01,440 --> 00:42:06,849
But we see that in a random room of 23 people,
the probability that there are two persons
437
00:42:06,849 --> 00:42:12,200
who share the same birthday is actually
quite high; it is actually 0.5 and we will
438
00:42:12,200 --> 00:42:16,990
see that if I increase this 23 to a larger number,
then this probability shoots up quite
439
00:42:16,990 --> 00:42:21,690
fast.
Now, we have a very specific example, but
440
00:42:21,690 --> 00:42:26,780
you can work out the more general case, that
is, instead of 365 we can have nearly 2 to
441
00:42:26,780 --> 00:42:31,280
the power of n possible outcomes and then
try to find out the estimate of k, but what
442
00:42:31,280 --> 00:42:35,910
we will find is that this k will roughly be
proportional to the square root of total number
443
00:42:35,910 --> 00:42:41,000
of possible ways. So, which means that if
you need to do a brute force search of 2 to
444
00:42:41,000 --> 00:42:45,270
the power of n possible values, then if you
apply the birthday paradox, then this gives
445
00:42:45,270 --> 00:42:51,580
you an estimate that after 2 to the power
of n by 2 random searches, there is a high
446
00:42:51,580 --> 00:42:55,900
probability that two of them will actually
result in a collision.
447
00:42:55,900 --> 00:43:01,210
So, this particular paradox or this particular
analysis is used again and again to do the
448
00:43:01,210 --> 00:43:05,760
analysis of ciphers and develop security
proofs and other results. So, there are a lot
449
00:43:05,760 --> 00:43:09,859
of applications: deciding the bit length of
the hash function (in digital signature schemes, these
450
00:43:09,859 --> 00:43:15,160
have to be kept at more than 128 bits); it
is used for doing cryptanalysis, like index
451
00:43:15,160 --> 00:43:22,160
calculus, which are algorithms to solve
something which is called the discrete logarithm
452
00:43:22,290 --> 00:43:23,240
problems.
453
00:43:23,240 --> 00:43:28,650
We actually try to find out a very interesting
application of the birthday paradox; so it
454
00:43:28,650 --> 00:43:34,109
is something which is called cycle finding
algorithms. Suppose, there is a linked list
455
00:43:34,109 --> 00:43:39,190
which is very large and I am interested in
finding out a cycle in the linked list; so,
456
00:43:39,190 --> 00:43:45,369
this is one question that may appear. One
way of doing that is, go through the entire
457
00:43:45,369 --> 00:43:50,950
linked list and store a huge amount of
data and once you see a repetition, you report
458
00:43:50,950 --> 00:43:57,099
a cycle, but can we do better than that?
Because what the birthday paradox says is
459
00:43:57,099 --> 00:44:02,619
that if you choose the elements in these lists
at random and if there are 2 to the power
460
00:44:02,619 --> 00:44:09,390
of n possible elements, then after 2 to the
power of n by 2 possible searches or possible
461
00:44:09,390 --> 00:44:16,390
points, there is a high probability that two
of them will actually lead to a collision.
462
00:44:16,480 --> 00:44:23,480
Suppose you actually keep on arbitrarily
choosing the points; then we will find that
463
00:44:24,270 --> 00:44:29,770
after some point it may happen that if you
keep on computing, you will find there will
464
00:44:29,770 --> 00:44:35,869
be a resulting collision. The moment you get
a collision, you know that there is a cycle.
465
00:44:35,869 --> 00:44:41,000
Consider a function F from S to itself, what
you do is, you start from a point, that is,
466
00:44:41,000 --> 00:44:45,990
you start from X 0. So, X 0 could
be here and then you generate a sequence by
467
00:44:45,990 --> 00:44:51,349
recursively as follows: X i plus 1 is equal
to F of X i and you just keep on computing
468
00:44:51,349 --> 00:44:52,050
this value.
469
00:44:52,050 --> 00:44:57,200
The goal is to find a collision, such that
X i and X j are same, that is, X i and X j
470
00:44:57,200 --> 00:45:01,650
are resulting in the same value. So, we can
actually have a birthday approach, that is,
471
00:45:01,650 --> 00:45:06,330
note if f is random then the birthday paradox
comes into play and we expect a collision
472
00:45:06,330 --> 00:45:11,900
after 2 to the power of n by 2 points, if
S has got 2 to the power n points.
473
00:45:11,900 --> 00:45:16,740
So, assume that the cycle structure is like
this, that there is a tail from X 0 to X s
474
00:45:16,740 --> 00:45:23,740
minus 1 and there is a loop from X s to X
s plus l, that is, from X 0 to X s minus 1,
475
00:45:23,869 --> 00:45:28,210
you come here - this is the tail, and then
from the next one, that is, the X s to X s
476
00:45:28,210 --> 00:45:34,440
plus l, there is a kind of cycle; so, that
means this cycle has got a length of l.
477
00:45:34,440 --> 00:45:40,070
Now, the question is how to detect this cycle?
One way of doing this could be a tree based
478
00:45:40,070 --> 00:45:43,720
approach. So, what you do is that you start
storing the sequence elements in a binary
479
00:45:43,720 --> 00:45:48,170
search tree as long as there is no duplicate;
so, you keep on adding them to the tree and
480
00:45:48,170 --> 00:45:53,230
as long as there is no duplicate. Now, the
first duplicate occurs when X s plus l is
481
00:45:53,230 --> 00:45:57,470
to be inserted because there is already X
s which is inserted into the tree and the
482
00:45:57,470 --> 00:46:01,890
moment you get to insert X s plus l, you know
that there is a kind of duplication and therefore
483
00:46:01,890 --> 00:46:05,340
you report a collision.
Now, what is the time complexity here? We
484
00:46:05,340 --> 00:46:09,000
know that if we are using a binary search
tree, then the complexity will be around O
485
00:46:09,000 --> 00:46:14,650
s plus l log of s plus l, but what about the
space complexity? The space complexity is
486
00:46:14,650 --> 00:46:20,130
still O s plus l, so that means that although
the runtime is optimal because actually you
487
00:46:20,130 --> 00:46:25,599
cannot do better than this in terms of time,
but the space requirement is quite high, that
488
00:46:25,599 --> 00:46:30,970
is, this could be exponentially large;
this could be quite a large value, so the
489
00:46:30,970 --> 00:46:33,710
question is, can I make the space requirement
less?
490
00:46:33,710 --> 00:46:38,260
A very interesting technique is being adopted
for finding out the cycles. A very simple
491
00:46:38,260 --> 00:46:44,130
algorithm I will discuss here, is called Floyd's
cycle finding algorithm and it works out as
492
00:46:44,130 --> 00:46:49,490
follows; we will see an application of this
in context to factorization, when we discuss
493
00:46:49,490 --> 00:46:53,710
about factorization. So, you see that what
we do is that we set that Y 0 is equal to
494
00:46:53,710 --> 00:46:59,990
X 0 and we compute another sequence Y i plus
1 as F of F of Y i, so instead of applying
495
00:46:59,990 --> 00:47:05,560
once F, I am applying twice F.
So, the input initial sequence is X 0 and
496
00:47:05,560 --> 00:47:10,510
we set the maximum iterations as M; so,
what we do is that we start x is equal to
497
00:47:10,510 --> 00:47:16,180
X 0 and y is equal to X 0. So, we start at
the same point and then for all these possible
498
00:47:16,180 --> 00:47:20,869
iterations that we have set as the maximum
value, we compute x is equal to F x and we
499
00:47:20,869 --> 00:47:27,869
compute twice for y; so, we apply F twice
and we obtain the points. If we get a collision
500
00:47:28,550 --> 00:47:32,770
at some point, like if x and y is same, then
we say that there is the collision between
501
00:47:32,770 --> 00:47:39,770
i and 2 i, because the x
sequence is at point i and the y sequence is at
502
00:47:40,200 --> 00:47:46,310
point 2 i; if they match, then you say
that there is a collision, otherwise you say
503
00:47:46,310 --> 00:47:52,560
that there is a failure. You see that you actually
have got quite a good chance of getting
504
00:47:52,560 --> 00:47:58,190
a collision because of the simple fact that
this particular length, that is, the length
505
00:47:58,190 --> 00:48:04,960
of the cycle will divide 2 i minus i; it will
actually divide this 2 i minus i.
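The two-speed iteration just described can be written down directly (a sketch under my own naming; the example map at the end is an assumed toy, not from the lecture).

```python
def floyd_collision(f, x0, max_iters=1_000_000):
    """Floyd's cycle finding: per step, advance x by one application
    of f (so x = X_i) and y by two (so y = X_{2i}).  When x == y we
    have X_i = X_{2i}, which happens once the cycle length l divides
    2i - i = i; return that i, or None if the bound is hit."""
    x = y = x0
    for i in range(1, max_iters + 1):
        x = f(x)
        y = f(f(y))
        if x == y:
            return i
    return None

# Toy example: iterating f(x) = x*x + 1 mod 255 from 0 falls into a
# short cycle, and the collision index is a multiple of its length.
print(floyd_collision(lambda x: (x * x + 1) % 255, 0))
```

Only two points are stored at any time, which is exactly the space saving over the binary search tree approach.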
506
00:48:04,960 --> 00:48:11,960
So, this is actually a very useful algorithm.
It is often useful for doing analysis and
507
00:48:15,089 --> 00:48:20,420
one of the reasons is that for example, if
you consider the cycle it is like this, it
508
00:48:20,420 --> 00:48:27,420
looks like a rho. If you start from X 0 and
you come to this, suppose this point is X
509
00:48:27,560 --> 00:48:34,560
s minus 1 and this is X s and we keep on obtaining
and then this is again X s plus l; therefore
510
00:48:34,869 --> 00:48:40,660
the length of the cycle is l. There are some
l successive points here and therefore, you
511
00:48:40,660 --> 00:48:45,280
know immediately that, among the
l successive points, there must be one point
512
00:48:45,280 --> 00:48:51,450
here, call it i dash, which is divisible by
l; so, there are l successive terms, so there
513
00:48:51,450 --> 00:48:58,450
must be one term i dash which l divides. Therefore,
if l divides i dash, then that means that
514
00:49:00,070 --> 00:49:07,070
l divides 2 i dash minus i dash, so that is
i dash itself; that means, if you compute
515
00:49:08,630 --> 00:49:15,170
the series of X like this, that is, compute
X i dash and if you compute X 2 i dash, then
516
00:49:15,170 --> 00:49:21,089
that means, if you are basically
computing these values, then you may not be
517
00:49:21,089 --> 00:49:25,190
able to find out the first time this particular
collision occurs, but you will find out a
518
00:49:25,190 --> 00:49:32,190
successive point when the collision occurs.
Therefore that will give you this collision
519
00:49:33,250 --> 00:49:40,250
which will lead to the detection of the cycle,
so we will see this in one of our future classes
520
00:49:41,589 --> 00:49:43,609
when we discuss about factorization.
521
00:49:43,609 --> 00:49:50,609
So, we conclude our talk with some notions
of information measurement. Let us consider
522
00:49:54,020 --> 00:49:59,810
a language of n different symbols - a 1 to
a n, and let us assign independent probabilities
523
00:49:59,810 --> 00:50:04,349
like - probability of a 1, probability of
a 2, and so on, till probability of a n.
524
00:50:04,349 --> 00:50:08,820
These probabilities must satisfy that sigma
of these probabilities will be equal to 1;
525
00:50:08,820 --> 00:50:15,290
so, what is the entropy of the source S? The
entropy of source S is defined as H S is equal
526
00:50:15,290 --> 00:50:22,290
to sigma probability of a i multiplied by
logarithm of 1 by probability of a i base
527
00:50:23,820 --> 00:50:24,540
2.
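The entropy formula can be sketched in a few lines (illustrative; the helper name is mine):

```python
from math import log2

def entropy(probs) -> float:
    """Shannon entropy H(S) = sum over i of p_i * log2(1 / p_i),
    in bits per source output; zero-probability symbols are skipped
    since they contribute nothing."""
    return sum(p * log2(1.0 / p) for p in probs if p > 0)

# A source that always emits one symbol carries 0 bits; a uniform
# source over 8 symbols carries log2(8) = 3 bits per output.
print(entropy([1.0]))
print(entropy([1 / 8] * 8))
```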
528
00:50:24,540 --> 00:50:30,869
So, this actually gives us the number of bits
which are required per source output; therefore,
529
00:50:30,869 --> 00:50:37,869
this notion of entropy is often useful for
computing the information content. Let us
530
00:50:38,140 --> 00:50:43,790
consider some examples, if S outputs a 1 with
probability of 1, then H S will be equal to
531
00:50:43,790 --> 00:50:47,880
0 because your probability is equal to one;
therefore, if I plug in probability 1 here,
532
00:50:47,880 --> 00:50:54,040
then this logarithm computes to 0 and therefore,
this H S is 0. But if S outputs n symbols
533
00:50:54,040 --> 00:50:58,869
with equal probability, that is, the probability
is 1 by n, that is, S is the source of uniform
534
00:50:58,869 --> 00:51:05,050
distribution, then this H S will compute to
1 by n sigma of logarithm n base 2 and that
535
00:51:05,050 --> 00:51:09,310
is nothing but logarithm n base 2.
Therefore, H S can be thought of as the amount
536
00:51:09,310 --> 00:51:16,310
of uncertainty or information in each output
from S; so consider, if you have got a binary
537
00:51:16,650 --> 00:51:21,580
sequence of values, like if there is a n bit
length then, there are 2 to the power of n
538
00:51:21,580 --> 00:51:27,070
possible values here and all these 2 to the
power of n possible values can be chosen.
539
00:51:27,070 --> 00:51:32,930
Therefore, the probability of a particular
sequence is nothing but 1 by 2 to the power
540
00:51:32,930 --> 00:51:34,300
of n.
541
00:51:34,300 --> 00:51:41,300
So, in that case, the number of bits
which are there, in order to find out
542
00:51:45,410 --> 00:51:50,849
the value of X, is actually n; therefore,
you need to ascertain these n bits in order
543
00:51:50,849 --> 00:51:55,830
to find out the value.
So, that is essentially the amount of uncertainty
544
00:51:55,830 --> 00:52:02,400
or the amount of information you need to find
out the value. If we think of the previous
545
00:52:02,400 --> 00:52:07,270
example, that is, if S outputs a particular
a 1 with probability 1, that means,
546
00:52:07,270 --> 00:52:10,619
you know that there is no information, that
is, no uncertainty; therefore, the uncertainty
547
00:52:10,619 --> 00:52:16,950
is 0. In this case, the uncertainty was 0,
but here the uncertainty is logarithm n base
548
00:52:16,950 --> 00:52:23,950
2, that means, if I say, if S outputs n symbols
and if n is equal to 2 to the power of capital
549
00:52:26,270 --> 00:52:33,270
N, which can be denoted by capital N bits, then this logarithm
will work out to be capital N. So, that means that
550
00:52:34,099 --> 00:52:40,650
there is an uncertainty of capital N bits, which is
there in H S. So, there is an uncertainty
551
00:52:40,650 --> 00:52:47,650
of capital N bits in the source S; therefore
this notion of entropy gives us an idea about
552
00:52:49,420 --> 00:52:51,450
uncertainty or information.
553
00:52:51,450 --> 00:52:57,930
Now, I will leave you with a question that
is - suppose that there are four digit PINs
554
00:52:57,930 --> 00:53:04,260
which are randomly distributed. How many
people must be in a room such that the probability
555
00:53:04,260 --> 00:53:08,109
that two of them have got the same PIN is
at least half?
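If you want to check your answer numerically, the same no-collision product used for birthdays works for PINs (a sketch with assumed names; the square root of 2 N ln 2 heuristic from the birthday analysis predicts a value near 118 for N equal to 10000).

```python
def min_people_for_pin_collision(n_pins: int = 10_000, target: float = 0.5) -> int:
    """Smallest k such that, with PINs uniform over n_pins values,
    the probability that some two of k people share a PIN is at
    least target; uses the exact no-collision product."""
    p_no_collision = 1.0
    k = 1
    while 1.0 - p_no_collision < target:
        p_no_collision *= 1 - k / n_pins
        k += 1
    return k

# Sanity check: with 365 "PINs" this recovers the classic 23 people;
# for four-digit PINs the answer lands near sqrt(2 * 10000 * ln 2).
print(min_people_for_pin_collision(365))
print(min_people_for_pin_collision())
```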
556
00:53:08,109 --> 00:53:11,980
So, you can immediately understand that you
should be applying birthday paradox to solve
557
00:53:11,980 --> 00:53:18,170
this problem. The references that are used
are: Wenbo Mao's Modern Cryptography: Theory
558
00:53:18,170 --> 00:53:22,540
and Practice and some of the things are taken
from Algorithmic Cryptanalysis by Antoine
559
00:53:22,540 --> 00:53:28,910
Joux, and Buchmann's Introduction to Cryptography
- it is a Springer book.
560
00:53:28,910 --> 00:53:34,099
So, in the next class, we will take up the
topic of classical cryptosystems and discuss
561
00:53:34,099 --> 00:53:36,170
about some classical methods of doing cryptography.