1
00:00:17,190 --> 00:00:23,570
Welcome to this class on cryptanalysis of
classical ciphers. So, we will essentially
2
00:00:23,570 --> 00:00:28,560
be continuing with what we were discussing
about classical ciphers, and will discuss
3
00:00:28,560 --> 00:00:35,220
some cryptanalytic techniques, that is, attack
methods for finding out the keys, in
4
00:00:35,220 --> 00:00:36,950
the context of classical ciphers.
5
00:00:36,950 --> 00:00:41,870
So, in today's class, our objectives are as
follows: we will discuss some models
6
00:00:41,870 --> 00:00:46,300
which are used for cryptanalysis, and then
7
00:00:46,300 --> 00:00:51,830
discuss the cryptanalysis of monoalphabetic
and polyalphabetic ciphers, and conclude with
8
00:00:51,830 --> 00:00:54,400
the cryptanalysis of Hill ciphers.
9
00:00:54,400 --> 00:01:00,550
So, to start with, we were discussing that
one of the fundamental principles in cryptography,
10
00:01:00,550 --> 00:01:06,680
or rather cryptanalysis, is Kerckhoffs's principle,
which says that the cryptosystem is always
11
00:01:06,680 --> 00:01:11,600
available in the public domain and is known to
the adversary, but what is not known is the
12
00:01:11,600 --> 00:01:15,130
value of the key.
So, the entire secrecy of the cryptosystem
13
00:01:15,130 --> 00:01:22,130
lies in the key, and cryptanalysis is that
field of cryptology which tries to obtain the
14
00:01:22,310 --> 00:01:26,620
value of the key, and it tries to find out
the value of the key, typically,
15
00:01:26,620 --> 00:01:30,650
better than a brute-force search; that is, rather
than doing an exhaustive search,
16
00:01:30,650 --> 00:01:35,840
it tries to develop
methods to obtain the
17
00:01:35,840 --> 00:01:39,350
key more efficiently than a brute-force search
for the key.
18
00:01:39,350 --> 00:01:44,820
So, there are certain models which have been
laid down for cryptanalysis, and these models
19
00:01:44,820 --> 00:01:49,620
are often useful for the study of ciphers.
The first and most obvious is the
20
00:01:49,620 --> 00:01:54,030
ciphertext-only attack, where the opponent possesses
a string of ciphertext; that means that
21
00:01:54,030 --> 00:02:00,470
the opponent, the attacker, has access only to
the ciphertext, or cryptotext, and from there
22
00:02:00,470 --> 00:02:06,370
tries to obtain the key.
Now, this is the kind of attack which one
23
00:02:06,370 --> 00:02:12,560
would expect in a real-life scenario, and
therefore, it is the hardest
24
00:02:12,560 --> 00:02:16,750
task from the point of view
of the attacker, who has only the information
25
00:02:16,750 --> 00:02:23,040
of the ciphertext. But when
we do our studies, we
26
00:02:23,040 --> 00:02:30,040
sometimes allow the attacker,
that is, the cryptanalyst, some extra information
27
00:02:31,710 --> 00:02:35,250
apart from the ciphertext.
What we do sometimes is, for example,
28
00:02:35,250 --> 00:02:40,480
we give information about the plaintext and
say that you have the ciphertext and
29
00:02:40,480 --> 00:02:44,730
you also have the corresponding plaintext,
and from there you try to deduce the value
30
00:02:44,730 --> 00:02:50,709
of the key. Now, this may not seem
intuitively very practical, but
31
00:02:50,709 --> 00:02:56,340
it is often very relevant, and there can be
a lot of examples where these kinds of attacks,
32
00:02:56,340 --> 00:03:01,330
which are known as known-plaintext attacks
or chosen-plaintext attacks, can also be practical.
33
00:03:01,330 --> 00:03:07,730
So, typically, in a known-plaintext attack,
the opponent possesses a plaintext x and also
34
00:03:07,730 --> 00:03:12,620
the corresponding ciphertext y.
Now, in order to give a practical scenario,
35
00:03:12,620 --> 00:03:17,900
imagine, for example, a
case where we write an email and we respond
36
00:03:17,900 --> 00:03:22,550
to the email by pressing
the reply button. What may happen
37
00:03:22,550 --> 00:03:27,880
is that the text that I have
written remains in your reply message. So,
38
00:03:27,880 --> 00:03:32,610
this is a typical example where
I know the content of the plaintext,
39
00:03:32,610 --> 00:03:36,680
and if I can observe the encrypted reply going out,
then I also have access to the corresponding
40
00:03:36,680 --> 00:03:40,580
ciphertext.
So, this is an example of a known-plaintext attack,
41
00:03:40,580 --> 00:03:45,800
where I have access to a plaintext and
I also obtain the corresponding ciphertext.
42
00:03:45,800 --> 00:03:52,800
Now, this also becomes more relevant
in the context of asymmetric ciphers, where anybody
43
00:03:53,520 --> 00:03:57,920
can encrypt, because, as we were
discussing, in an asymmetric cipher there
44
00:03:57,920 --> 00:04:02,190
are two keys, right? One of them is the public
key and the other one is the private key.
45
00:04:02,190 --> 00:04:07,319
So, in this case, everybody
knows the public key, right? So,
46
00:04:07,319 --> 00:04:10,819
the public key is known to everybody,
which means that
47
00:04:10,819 --> 00:04:17,819
anybody can encrypt a plaintext message, and
therefore, to obtain the
48
00:04:18,739 --> 00:04:22,360
ciphertext for a corresponding plaintext
is not a difficult job. So, therefore, you
49
00:04:22,360 --> 00:04:25,210
can have an ample number of pairs of plaintext
and ciphertext.
50
00:04:25,210 --> 00:04:30,080
So therefore, in such conditions,
a known-plaintext attack is not
51
00:04:30,080 --> 00:04:34,820
unintuitive or
impractical, and therefore, it is also used as
52
00:04:34,820 --> 00:04:40,590
an important model for study. There
are some other models of cryptanalysis
53
00:04:40,590 --> 00:04:44,860
also, like the chosen-plaintext attack, where the
attacker can choose plaintexts and obtain the
54
00:04:44,860 --> 00:04:49,380
corresponding ciphertexts. So, in this case,
the attacker does not only know
55
00:04:49,380 --> 00:04:55,199
the plaintext but can also choose the plaintext.
So, this is an extra power, an extra capability,
56
00:04:55,199 --> 00:04:59,620
which the adversary has.
As you can understand, all these are actually
57
00:04:59,620 --> 00:05:05,060
different levels of adversary. So, which one
is the strongest attacker? The chosen-plaintext
58
00:05:05,060 --> 00:05:09,790
attacker is stronger
than the known-plaintext attacker, who is
59
00:05:09,790 --> 00:05:12,190
stronger than the ciphertext-only attacker,
right?
60
00:05:12,190 --> 00:05:17,100
So, in terms of information, the ciphertext-only
attacker has the least information
61
00:05:17,100 --> 00:05:22,260
compared to the known-plaintext attacker,
who in turn has less information compared
62
00:05:22,260 --> 00:05:24,520
to the chosen-plaintext attacker.
63
00:05:24,520 --> 00:05:29,350
So, there is another important model
which is also used, which is known as the
64
00:05:29,350 --> 00:05:34,470
chosen-ciphertext attack. In this case, the opponent
has temporary access also to the decryption
65
00:05:34,470 --> 00:05:39,630
function. What he can do is choose
ciphertexts and decrypt them to obtain the corresponding
66
00:05:39,630 --> 00:05:45,860
plaintexts. But what is to be kept in mind
is that, in the case of chosen ciphertext,
67
00:05:45,860 --> 00:05:52,860
the idea is like this: the attacker
is given a large number of ciphertexts and,
68
00:05:53,270 --> 00:05:58,040
using the decryption function, obtains
the corresponding plaintexts.
69
00:05:58,040 --> 00:06:02,669
But in this decryption function, the key is
embedded. Therefore, when the attacker is using
70
00:06:02,669 --> 00:06:06,840
the decryption function, the attacker
does not have knowledge of the key, right?
71
00:06:06,840 --> 00:06:11,290
But at the end of these operations,
which we call oracle queries, what is done
72
00:06:11,290 --> 00:06:17,020
is that the attacker is given a challenge
ciphertext and is asked to find out the corresponding
73
00:06:17,020 --> 00:06:21,850
key. So, this comes at the end of all the previous
exchanges which have taken place.
74
00:06:21,850 --> 00:06:26,080
So, in this case, the attacker is
the strongest, right? This is
75
00:06:26,080 --> 00:06:30,509
the strongest notion: the attacker has access
not only to the encryption function but
76
00:06:30,509 --> 00:06:37,400
can also decrypt a significant
number of chosen ciphertexts, right?
77
00:06:37,400 --> 00:06:42,590
So, therefore, in each case, the
objective is to obtain the key, and we can
78
00:06:42,590 --> 00:06:47,240
say that, in increasing order of strength,
the ciphertext-only attacker is the
79
00:06:47,240 --> 00:06:51,479
weakest attacker;
then comes the known-plaintext attack; then comes
80
00:06:51,479 --> 00:06:55,780
the chosen-plaintext attack; and finally, the
chosen-ciphertext attack, which is the strongest
81
00:06:55,780 --> 00:06:59,310
form of the attacker, ok?
So, when you are designing a cipher,
82
00:06:59,310 --> 00:07:03,210
ideally you would like to look at
the chosen-ciphertext attack, and
83
00:07:03,210 --> 00:07:10,210
say that my cryptosystem is
protected against as strong a form of
84
00:07:10,550 --> 00:07:13,320
attacker as the chosen-ciphertext
attacker, right?
85
00:07:13,320 --> 00:07:17,940
So, therefore, ideally I would like to
counter even the chosen-ciphertext attack.
86
00:07:17,940 --> 00:07:22,740
From the other point of view, when you are
attacking a system, when you are doing research
87
00:07:22,740 --> 00:07:28,509
in cryptanalysis, you would be happiest
if you are able to find a ciphertext-only
88
00:07:28,509 --> 00:07:32,500
attack, and you are less happy as you
go down the series, right? Because the
89
00:07:32,500 --> 00:07:36,080
attacks become less and less strong,
that is, weaker, ok.
90
00:07:36,080 --> 00:07:43,080
So, now, we will discuss some tools
which are often used for cryptanalysis, but
91
00:07:43,470 --> 00:07:49,789
it may be kept in mind that these techniques are
mainly applicable to old ciphers, that
92
00:07:49,789 --> 00:07:54,259
is, the classical ciphers; the machine ciphers
and the modern ciphers are much more robust,
93
00:07:54,259 --> 00:07:59,699
much stronger. So, to start with, let
us make some observations. We see that the
94
00:07:59,699 --> 00:08:03,240
English language, for example, has got certain
probabilities of occurrence for its letters.
95
00:08:03,240 --> 00:08:06,990
So, typically, we have got twenty-six
letters, and there are certain statistics
96
00:08:06,990 --> 00:08:12,289
which you can easily observe if you
do an analysis of a large number of
97
00:08:12,289 --> 00:08:16,130
texts.
For example, e has got the highest probability
98
00:08:16,130 --> 00:08:20,150
and occurs with a probability of around
twelve percent. Next in order come
99
00:08:20,150 --> 00:08:27,150
letters like t, a, o,
i, n, s, h and r; then d and l come at around 0.04;
100
00:08:29,729 --> 00:08:36,180
and then some other letters,
namely c, u, m, w, f, g, y, p and b,
101
00:08:36,180 --> 00:08:43,180
come between 0.015 and 0.028; and then we
have got v, k, j, x, q and z, which are less than
102
00:08:45,380 --> 00:08:49,310
0.01.
So, these are the single
103
00:08:49,310 --> 00:08:54,430
letters. If you observe digrams,
or double occurrences of letters, then t h
104
00:08:54,430 --> 00:08:59,209
is supposedly the most commonly occurring digram.
Then you have got h e, then you have got i
105
00:08:59,209 --> 00:09:05,089
n, and you have got these digrams in decreasing
order. Similarly, among common trigrams,
106
00:09:05,089 --> 00:09:11,380
t h e is the most frequently occurring
trigram, and then you have got this particular
107
00:09:11,380 --> 00:09:15,930
series. So, you can actually form statistics,
and we call these statistics a priori
108
00:09:15,930 --> 00:09:18,620
statistics, formed before we start the cryptanalysis,
ok?
109
00:09:18,620 --> 00:09:22,360
So, therefore, this is the knowledge of the
plaintext that you have. You know
110
00:09:22,360 --> 00:09:28,660
that a person who is using a cipher is trying
to do an encryption of a meaningful
111
00:09:28,660 --> 00:09:33,230
message, right? This meaningful message
is, in our case, assumed to be formed
112
00:09:33,230 --> 00:09:37,630
of English-language characters and
to follow English-language
113
00:09:37,630 --> 00:09:41,020
grammar. Therefore,
for example, these statistics are
114
00:09:41,020 --> 00:09:46,200
of some English-language texts that have been
preprocessed, and from there, people have
115
00:09:46,200 --> 00:09:51,850
found out the frequency distribution
of single occurrences, of double occurrences,
116
00:09:51,850 --> 00:09:56,790
of triple occurrences and so on, and this
is actually very useful information when
117
00:09:56,790 --> 00:10:01,000
you are doing cryptanalysis.
So, to start with, let us consider
118
00:10:01,000 --> 00:10:07,300
a shift cipher. If you remember, in the
shift cipher, which we studied in the context of
119
00:10:07,300 --> 00:10:11,680
monoalphabetic ciphers, every letter
is given a shift, right?
120
00:10:11,680 --> 00:10:16,870
So, therefore, for example, e, which is the
most commonly occurring letter, is also shifted,
121
00:10:16,870 --> 00:10:21,180
right? In the case of the Caesar cipher, you know
it is shifted by, say, three steps, so
122
00:10:21,180 --> 00:10:27,330
e moves past f and g and becomes h, right? Therefore, if
e was the most commonly occurring letter in
123
00:10:27,330 --> 00:10:32,660
normal plaintext, then in the ciphertext
which uses the Caesar cipher, for example,
124
00:10:32,660 --> 00:10:35,380
h would be the most commonly occurring
letter, right?
125
00:10:35,380 --> 00:10:42,029
So, therefore, if you take
a piece of cryptogram, that is, of the ciphertext,
126
00:10:42,029 --> 00:10:47,269
and you start finding out the letter
which occurs most frequently, and suppose
127
00:10:47,269 --> 00:10:53,040
that letter is h, right? In that
case, you can conclude that h
128
00:10:53,040 --> 00:10:57,500
corresponds to e in the plaintext,
right? So, that is how you can actually
129
00:10:57,500 --> 00:11:01,240
obtain the shift which exists in the case of shift
ciphers.
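As a rough illustration of the frequency idea just described, here is a minimal Python sketch. The function names and the sample sentence are made up for this illustration and are not from the lecture. It assumes lowercase plaintext, uppercase ciphertext, and that e really is the most frequent plaintext letter, which can easily fail on short texts.

```python
from collections import Counter

def shift_encrypt(plaintext, key):
    # Keep letters only and shift each one by `key` positions modulo 26.
    return "".join(
        chr((ord(c) - ord("a") + key) % 26 + ord("A"))
        for c in plaintext.lower() if c.isalpha()
    )

def guess_shift(ciphertext):
    # Assume the most frequent ciphertext letter is the image of 'e'.
    most_common_letter, _ = Counter(ciphertext).most_common(1)[0]
    return (ord(most_common_letter) - ord("E")) % 26

def shift_decrypt(ciphertext, key):
    return "".join(
        chr((ord(c) - ord("A") - key) % 26 + ord("a")) for c in ciphertext
    )

# Hypothetical sample text in which 'e' is clearly the most frequent letter.
sample = "meet me here when the evening bell rings"
ciphertext = shift_encrypt(sample, 3)      # Caesar shift of three
key_guess = guess_shift(ciphertext)        # ciphertext-only: uses frequencies alone
recovered = shift_decrypt(ciphertext, key_guess)
```

If the guessed key gives unreadable text, one would fall back to the next most likely plaintext letter, or simply try all 26 shifts.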
130
00:11:01,240 --> 00:11:06,470
So, that is what we have said here. So,
this is actually an example
131
00:11:06,470 --> 00:11:10,170
of a ciphertext-only attack, because here you are
using only letter frequencies, and you
132
00:11:10,170 --> 00:11:12,390
have access only to the ciphertext,
ok?
133
00:11:12,390 --> 00:11:16,459
The other information which you have got is
the letter frequencies in the English language,
134
00:11:16,459 --> 00:11:22,830
that is, of the plaintext character
set. So, this is a typical frequency
135
00:11:22,830 --> 00:11:26,519
distribution. You can see that e is the most
commonly occurring letter. Similarly, here
136
00:11:26,519 --> 00:11:33,519
we have got t, a, o, i, n and so on. So, this
is a bar chart showing the frequency
137
00:11:34,080 --> 00:11:35,240
distribution.
138
00:11:35,240 --> 00:11:41,330
So, next,
let us consider
139
00:11:41,330 --> 00:11:46,420
the affine
cipher; you remember the affine cipher, right?
140
00:11:46,420 --> 00:11:50,899
So, suppose an attacker has got the following
ciphertext from an affine cipher. This is
141
00:11:50,899 --> 00:11:57,140
the ciphertext which has been
retrieved by an attacker, and he
142
00:11:57,140 --> 00:12:01,050
knows that the cipher which has been
used is the affine cipher.
143
00:12:01,050 --> 00:12:06,240
So, let us try to do some cryptanalysis.
First of all, we try to find out the
144
00:12:06,240 --> 00:12:10,850
frequency of occurrence of the letters.
We find out, for example, that r has
145
00:12:10,850 --> 00:12:15,870
eight occurrences; d has
seven; e, h and k have
146
00:12:15,870 --> 00:12:21,230
five each; f, s and v have four each.
So, first of all, we would try to guess the
147
00:12:21,230 --> 00:12:26,050
letters and solve the equations, and then
decrypt the ciphertext and judge whether it makes
148
00:12:26,050 --> 00:12:30,410
a meaningful sentence or not.
So, the first case would be, as I told you,
149
00:12:30,410 --> 00:12:36,420
that r is the most frequently occurring letter, and
therefore it should correspond to e, because,
150
00:12:36,420 --> 00:12:42,260
as we saw in the frequency diagram,
e is generally the most frequently
151
00:12:42,260 --> 00:12:47,110
occurring
letter. So, since in the ciphertext r is the most
152
00:12:47,110 --> 00:12:51,029
commonly occurring letter, and we know that
this is the affine cipher, r
153
00:12:51,029 --> 00:12:55,490
should correspond to e.
So, therefore, we make a mapping like r is
154
00:12:55,490 --> 00:13:02,490
e. Similarly, the next most frequent letter
is, in this case, d. So, the next most frequent
155
00:13:03,240 --> 00:13:08,120
ciphertext letter being d, we say that d
must correspond to the next most frequent plaintext letter,
156
00:13:08,120 --> 00:13:12,640
which is t. So, we say that d must have been
mapped from t.
157
00:13:12,640 --> 00:13:17,829
So, therefore, we can actually write these
equations. We know that e_K(4) = 17, since I encode
158
00:13:17,829 --> 00:13:22,720
e as 4 and r as 17; all
the letters have been encoded by numbers
159
00:13:22,720 --> 00:13:28,190
from 0 to 25. Similarly, we
have got the other equation, e_K(19) = 3,
160
00:13:28,190 --> 00:13:34,550
since t is encoded as 19
and d as 3.
161
00:13:34,550 --> 00:13:38,540
So then, we have got these equations.
If you remember the affine cipher, the
162
00:13:38,540 --> 00:13:42,269
key was a tuple
(a, b), and therefore you can write equations
163
00:13:42,269 --> 00:13:49,269
like 4a + b = 17 and 19a +
b = 3, both modulo 26. If you solve
164
00:13:49,880 --> 00:13:54,000
these, you will find that a is equal to 6 and
b is equal to 19. Immediately you can
165
00:13:54,000 --> 00:13:59,089
say that this is a wrong guess. Why? Because,
if you remember, in an affine cipher it is
166
00:13:59,089 --> 00:14:04,120
a requirement, for the invertibility
of the affine cipher, that a has to be coprime
167
00:14:04,120 --> 00:14:08,860
to 26. So, if we take the greatest
common divisor of a and 26, we should get
168
00:14:08,860 --> 00:14:13,870
one. But since a is 6,
if you take the g c d with 26, you actually
169
00:14:13,870 --> 00:14:17,899
get two, and since this is not equal
to one, this is an incorrect decipherment.
170
00:14:17,899 --> 00:14:23,500
So, therefore, we go for the next guess.
Keeping r as e, what we do is take
171
00:14:23,500 --> 00:14:29,260
the next most frequent letter, which is in this
case e, and we suppose that t is mapped to e.
172
00:14:29,260 --> 00:14:35,029
So, in this case you get a is equal to 13,
and 13 is again not correct, because for 13 and
173
00:14:35,029 --> 00:14:38,209
26 the g c d will be 13, which is again not
equal to one.
174
00:14:38,209 --> 00:14:43,450
So then, we go for the next letter.
We again keep r as e, and the next most frequent
175
00:14:43,450 --> 00:14:48,579
letter is h, so say t is mapped to
h. If you solve again, you
176
00:14:48,579 --> 00:14:52,630
will get a is equal to 8, which is again
not correct, for the same reason:
177
00:14:52,630 --> 00:14:56,440
the g c d of a and 26 is again not equal to
one.
178
00:14:56,440 --> 00:15:01,550
So, we can continue like this. Luckily
for us, the next guess is correct: we say,
179
00:15:01,550 --> 00:15:07,079
let t get mapped to k.
You see, k is the next most frequent letter,
180
00:15:07,079 --> 00:15:13,300
so we take it that t has got mapped to
k in this case,
181
00:15:13,300 --> 00:15:17,529
and
therefore we write the corresponding equation.
182
00:15:17,529 --> 00:15:22,589
In this case, we solve and get a
equal to three and b equal to five.
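The guess-and-solve loop described above can be sketched in Python roughly as follows. The helper names are invented for this illustration. The sketch assumes the letter encoding a = 0 through z = 25 used in the lecture, and it relies on `pow(x, -1, 26)` for the modular inverse, which needs Python 3.8 or later.

```python
from math import gcd

def to_num(letter):
    # Encode a letter as a number from 0 to 25.
    return ord(letter.lower()) - ord("a")

def solve_affine_key(p1, c1, p2, c2):
    """Solve a*p1 + b = c1 and a*p2 + b = c2 (mod 26) for (a, b).

    Returns None when the two plaintext letters do not determine a uniquely.
    """
    x1, y1, x2, y2 = map(to_num, (p1, c1, p2, c2))
    dx = (x1 - x2) % 26
    dy = (y1 - y2) % 26
    if gcd(dx, 26) != 1:
        return None
    a = (dy * pow(dx, -1, 26)) % 26
    b = (y1 - a * x1) % 26
    return a, b

def first_valid_key(candidates):
    # Keep e -> R fixed and try each candidate image of t in turn.
    for cipher_for_t in candidates:
        key = solve_affine_key("e", "R", "t", cipher_for_t)
        if key is not None and gcd(key[0], 26) == 1:
            return cipher_for_t, key
    return None

# The four guesses tried in the lecture: t -> D, E, H, K.
guesses = {c: solve_affine_key("e", "R", "t", c) for c in "DEHK"}
```

A key that passes the g c d check is only a candidate; the decrypted text still has to be read to see whether it is meaningful.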
183
00:15:22,589 --> 00:15:27,769
So, we now think that this is correct, because
if I take the g c d of a and 26, I indeed
184
00:15:27,769 --> 00:15:33,720
get one. Therefore, I say the formula
can be this: (3x + 5) mod 26, which
185
00:15:33,720 --> 00:15:39,790
is the encryption of x when K is used as the
key. The corresponding decryption function
186
00:15:39,790 --> 00:15:45,519
can be obtained as (9y - 19)
mod 26, and this decryption function exists
187
00:15:45,519 --> 00:15:51,410
because the g c d of a and 26 is equal to one.
This we discussed in the last class.
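To see where the decryption formula comes from, here is a small Python check, offered only as a sketch with made-up function names. Since 3 times 9 is 27, which is 1 modulo 26, the inverse of 3 is 9, and 9(y - 5) = 9y - 45, which is the same as 9y - 19 modulo 26.

```python
def affine_encrypt(plaintext, a, b):
    # Lowercase plaintext in, uppercase ciphertext out: y = (a*x + b) mod 26.
    return "".join(
        chr((a * (ord(c) - ord("a")) + b) % 26 + ord("A")) for c in plaintext
    )

def affine_decrypt(ciphertext, a, b):
    # pow(a, -1, 26) raises an error when a has no inverse modulo 26.
    a_inv = pow(a, -1, 26)
    return "".join(
        chr((a_inv * (ord(c) - ord("A") - b)) % 26 + ord("a")) for c in ciphertext
    )

a_inv = pow(3, -1, 26)          # inverse of a = 3 modulo 26
offset = (-a_inv * 5) % 26      # constant term of the decryption function
word = "algorithms"             # illustrative input, not the lecture's ciphertext
encrypted = affine_encrypt(word, 3, 5)
decrypted = affine_decrypt(encrypted, 3, 5)
```

The same two functions work for any of the 312 valid keys, so they can also be used for an exhaustive key search.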
188
00:15:51,410 --> 00:15:56,089
So, using this decryption function, we can
actually decrypt the entire ciphertext, and
189
00:15:56,089 --> 00:16:02,550
we get "algorithms are quite general definitions ..."
and so on, and you get a meaningful text.
190
00:16:02,550 --> 00:16:07,839
If this
decrypted text had not
191
00:16:07,839 --> 00:16:12,949
been meaningful, then we would have tried
another guess. So, therefore, you see that
192
00:16:12,949 --> 00:16:18,290
this is a systematic technique, and therefore
it can be programmed: you can
193
00:16:18,290 --> 00:16:22,889
compute the frequencies, solve the equations
and check whether the g c d of a and 26 is
194
00:16:22,889 --> 00:16:26,290
equal to one.
So, in the case of affine ciphers,
195
00:16:26,290 --> 00:16:31,089
you know that the total number of possible
keys is 12 times 26, which is equal to
196
00:16:31,089 --> 00:16:35,889
312 keys, which is quite small, and therefore
you can indeed write a program to try all
197
00:16:35,889 --> 00:16:42,889
the keys. But
the idea here, with the affine cipher,
198
00:16:42,940 --> 00:16:48,000
is that you can actually use the frequency
analysis technique quite efficiently; that
199
00:16:48,000 --> 00:16:53,000
is, you can form an a priori
frequency distribution of English-language
200
00:16:53,000 --> 00:16:56,680
text. From English-language text, you
can form an a priori frequency distribution
201
00:16:56,680 --> 00:17:01,480
of the letters of the alphabet, of the digrams
and of the trigrams, and you can actually use
202
00:17:01,480 --> 00:17:06,250
this information to obtain the
decipherment of the output of an
203
00:17:06,250 --> 00:17:10,010
affine cipher, ok?
But this may not be so obvious when you
204
00:17:10,010 --> 00:17:15,630
actually have a polyalphabetic cipher,
because in a polyalphabetic cipher, one particular
205
00:17:15,630 --> 00:17:19,539
letter, if you remember, can get mapped
to various letters, right? And therefore
206
00:17:19,539 --> 00:17:26,019
the frequency distribution is not
preserved in this fashion. But
207
00:17:26,019 --> 00:17:31,029
we will be discussing this,
and we will see that you can actually use these ideas,
208
00:17:31,029 --> 00:17:33,989
but you have to use them in a little more
clever way, ok?
209
00:17:33,989 --> 00:17:40,989
So, let us study that next. So,
we now discuss the cryptanalysis of the polyalphabetic
210
00:17:41,320 --> 00:17:46,389
cipher. The example of that which we discussed was
the Vigenere cipher. In some sense, the
211
00:17:46,389 --> 00:17:50,739
cryptanalysis of the Vigenere cipher is also a
systematic method and can be totally programmed,
212
00:17:50,739 --> 00:17:55,509
but first of all let us try to understand
that, in the case of a polyalphabetic
213
00:17:55,509 --> 00:17:59,840
cipher, if you remember, you have to
obtain the entire
214
00:17:59,840 --> 00:18:00,269
keyword, right?
215
00:18:00,269 --> 00:18:05,759
Therefore, the first step is to determine the
length of the key. What is done in a polyalphabetic
216
00:18:05,759 --> 00:18:11,509
cipher, just to recap? You remember
that you have got a plaintext message.
217
00:18:11,509 --> 00:18:18,509
So, let us just consider a plaintext message,
for example, "this is a class on cryptanalysis",
218
00:18:28,179 --> 00:18:34,940
and the key could be, for example, "code". So,
in the case of a polyalphabetic cipher, what we
219
00:18:34,940 --> 00:18:40,450
did was take this key, "code",
and repeat it, right?
220
00:18:40,450 --> 00:18:47,450
So, we wrote it like this, and
we performed an addition operation, just
221
00:18:55,259 --> 00:18:59,539
as in a normal shift cipher. We took
t, added the number corresponding
222
00:18:59,539 --> 00:19:04,700
to c, took the result modulo 26 and obtained this.
So, the point which we noted here is
223
00:19:04,700 --> 00:19:10,929
that this t, depending on where it occurs,
could be transformed by either c or o
224
00:19:10,929 --> 00:19:17,119
or d or e. Therefore, any
letter, if I take t for example, can be
225
00:19:17,119 --> 00:19:24,119
shifted by c, can be shifted by o, can be
shifted by d, or can be shifted by e. So, if
226
00:19:24,359 --> 00:19:29,600
you have got a keyword like this which,
instead of length four, has got a length of
227
00:19:29,600 --> 00:19:35,859
m, then every letter
can be encoded by m possible
228
00:19:35,859 --> 00:19:38,269
transformations, right?
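The recap above can be tried out with a few lines of Python. This is only a sketch: the function name is invented, spaces are dropped from the lecture's example sentence, and ciphertext is written in uppercase. It records, for each plaintext letter, the set of ciphertext letters it was turned into under the key "code".

```python
from collections import defaultdict

def vigenere_encrypt(plaintext, keyword):
    out = []
    for i, c in enumerate(plaintext):
        # The keyword is repeated along the message; each key letter is a shift.
        shift = ord(keyword[i % len(keyword)]) - ord("a")
        out.append(chr((ord(c) - ord("a") + shift) % 26 + ord("A")))
    return "".join(out)

plaintext = "thisisaclassoncryptanalysis"   # the lecture's example, spaces removed
ciphertext = vigenere_encrypt(plaintext, "code")

# For every plaintext letter, collect all ciphertext letters it maps to.
images = defaultdict(set)
for p, c in zip(plaintext, ciphertext):
    images[p].add(c)
```

With a keyword of length four, no letter can have more than four images; in this short message the letter s happens to reach all four, while t reaches only two.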
229
00:19:38,269 --> 00:19:42,509
So therefore, the first
important step is to determine the length
230
00:19:42,509 --> 00:19:48,059
of this key, that is, to find out in how
many ways a letter can get mapped. For this
231
00:19:48,059 --> 00:19:53,749
we do a study, and the first step is known
as the Kasiski test. So, the first
232
00:19:53,749 --> 00:19:58,539
step is to determine the length m of the keyword
by the Kasiski test, and then we will
233
00:19:58,539 --> 00:20:01,679
confirm it by a quantity which is called the
index of coincidence.
234
00:20:01,679 --> 00:20:06,820
So, we will first of all try to determine
this key, that is, K = (k 1, k 2, ..., k m),
235
00:20:06,820 --> 00:20:11,090
and therefore first of all we find out the
length of the key, and then, in the second step, we
236
00:20:11,090 --> 00:20:16,139
will determine the key, and we will determine
each of these key letters separately,
237
00:20:16,139 --> 00:20:19,499
independently of each other.
238
00:20:19,499 --> 00:20:24,509
So, the first observation is that if there
are two identical plaintext segments, then
239
00:20:24,509 --> 00:20:29,849
they will be encrypted to the same ciphertext
when they appear delta positions apart in
240
00:20:29,849 --> 00:20:36,369
the plaintext, where delta is congruent
to zero modulo m. This actually
241
00:20:36,369 --> 00:20:41,259
holds vice versa as well.
So, therefore, the idea is that if there are
242
00:20:41,259 --> 00:20:46,179
two identical plaintext segments,
then
243
00:20:46,179 --> 00:20:51,009
they will be encrypted to the
same ciphertext whenever they appear a multiple
244
00:20:51,009 --> 00:20:54,220
of m positions apart in the plaintext,
ok?
245
00:20:54,220 --> 00:21:00,419
See, for example, if there is a particular
segment like t h e, which is an example
246
00:21:00,419 --> 00:21:07,239
of a trigram, and
if there is a repetition of this plaintext
247
00:21:07,239 --> 00:21:13,570
somewhere, that is, if t h e occurs again somewhere,
and in the ciphertext also we have got a
248
00:21:13,570 --> 00:21:19,330
repetition, where, taking any corresponding
ciphertext, suppose t h e is mapped to a b
249
00:21:19,330 --> 00:21:26,330
c both times, then you can actually say that the separation
between the two occurrences
250
00:21:27,840 --> 00:21:34,139
is a multiple of the size of the
key, because
251
00:21:34,139 --> 00:21:38,499
the size of the key has to exactly
divide the separation.
252
00:21:38,499 --> 00:21:44,509
So therefore, if there
is an identical plaintext segment occurring here
253
00:21:44,509 --> 00:21:50,519
and it also encrypts to the same ciphertext,
then we can say this about their separation:
254
00:21:50,519 --> 00:21:56,019
if you measure the distance between these
two segments, then this distance is a
255
00:21:56,019 --> 00:21:58,539
multiple of the size of the key.
256
00:21:58,539 --> 00:22:02,519
So, you can take the ciphertext,
and from there you can find out similar such
257
00:22:02,519 --> 00:22:09,029
occurrences, and you can start observing
the distances. Because the size of the
258
00:22:09,029 --> 00:22:14,029
key divides all these distances,
what you can do is
259
00:22:14,029 --> 00:22:20,549
find the greatest common divisor of the distances,
and that can serve as a candidate key size;
260
00:22:20,549 --> 00:22:25,679
so, that can serve as your key size.
So, what you can very well
261
00:22:25,679 --> 00:22:31,840
say is that the key size
divides all
262
00:22:31,840 --> 00:22:38,090
these distances, and therefore it must divide
the greatest common divisor of these distances,
263
00:22:38,090 --> 00:22:43,340
and then, in a later test, which is known as
the index of coincidence test,
264
00:22:43,340 --> 00:22:50,340
we confirm
this key size.
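A minimal Python sketch of this Kasiski step is given below. The function names are invented here, and the toy ciphertext is a hypothetical example: it is what the key "code" produces from a short text in which "the" appears at positions 0, 8 and 20. On real ciphertext, accidental repeats can occur, so the result is only a candidate that the key length should divide.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def repeated_segment_distances(ciphertext, seg_len=3):
    # Record the starting positions of every segment of length seg_len.
    positions = defaultdict(list)
    for i in range(len(ciphertext) - seg_len + 1):
        positions[ciphertext[i:i + seg_len]].append(i)
    # For each repeated segment, measure distances from its first occurrence.
    distances = []
    for starts in positions.values():
        distances.extend(s - starts[0] for s in starts[1:])
    return distances

def kasiski_guess(ciphertext, seg_len=3):
    # The key length m must divide the g c d of all recorded distances.
    distances = repeated_segment_distances(ciphertext, seg_len)
    return reduce(gcd, distances) if distances else None

toy_ciphertext = "VVHFTCZRVVHNWASMPURBVVHIPR"   # hypothetical, made with key "code"
distances = repeated_segment_distances(toy_ciphertext)
m_candidate = kasiski_guess(toy_ciphertext)
```

Here the only repeated trigram is VVH, at distances 8 and 20 from its first occurrence, so the candidate is their g c d.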
265
00:22:50,570 --> 00:22:54,599
So, what we do is
take the ciphertext and search it for
266
00:22:54,599 --> 00:23:00,409
pairs of identical segments, and then we record
the distances between their starting
267
00:23:00,409 --> 00:23:05,009
positions. Suppose they are delta 1, delta
2 and so on; then m should divide all of these
268
00:23:05,009 --> 00:23:11,580
delta i's, and therefore m should divide
the g c d of all these delta i's. So, this
269
00:23:11,580 --> 00:23:16,349
can be used. In the next test, what we
do is use the index of coincidence
270
00:23:16,349 --> 00:23:21,179
to determine m, or rather we use
it to confirm the m which is determined by the
271
00:23:21,179 --> 00:23:25,019
Kasiski test.
So, what is the definition of the index of
272
00:23:25,019 --> 00:23:32,019
coincidence? Let us see the definition.
Suppose x is equal to x 1,
273
00:23:32,739 --> 00:23:39,019
x 2 and so on till x n; this is a string
of length n, right? Now, the index of coincidence
274
00:23:39,019 --> 00:23:46,019
of x is often denoted by
I c (x), and it is defined to be the probability
275
00:23:46,409 --> 00:23:53,409
that two random elements of x are identical.
So, you take an x, which is a string, and
276
00:23:53,909 --> 00:23:57,830
you find out the index of coincidence, and
it is defined to be the probability that, if
277
00:23:57,830 --> 00:24:01,369
you take two random elements of x, they
are identical.
278
00:24:01,369 --> 00:24:08,090
So, let us try to compute this index of
coincidence, and let us assume that the frequencies
279
00:24:08,090 --> 00:24:15,090
of a, b and so on till z in x are obtained
to be f 0, f 1 and so on till f 25. So, now,
280
00:24:16,539 --> 00:24:23,229
we will find out that what is the probability
that two random elements of x are identical?
281
00:24:23,229 --> 00:24:29,149
So, in how many ways can you actually choose
two random elements from this string
282
00:24:29,149 --> 00:24:32,779
of length n? You can actually choose them
in n choose 2 ways.
283
00:24:32,779 --> 00:24:38,919
And since you want to find out the
probability that two random elements of x are
284
00:24:38,919 --> 00:24:42,269
actually identical; so, which means that if
the first letter which you have chosen is
285
00:24:42,269 --> 00:24:47,659
a, then the second letter is also a; if the
first letter is b, then
286
00:24:47,659 --> 00:24:52,799
the second letter is also b.
So, if you assume that the frequency of
287
00:24:52,799 --> 00:24:59,289
occurrence of a is denoted by f zero, then
in how many ways can you choose so that
288
00:24:59,289 --> 00:25:03,019
both the letters are a, that is,
both times you are
289
00:25:03,019 --> 00:25:03,840
choosing a.
290
00:25:03,840 --> 00:25:09,779
So, in how many ways can you do that? The
first time, you can choose an a in
291
00:25:09,779 --> 00:25:16,739
f zero ways, right? The second time, it
can be any of the remaining f 0 minus
292
00:25:16,739 --> 00:25:20,460
1 a's, because one of them we have already
chosen; so, that means that if you kind
293
00:25:20,460 --> 00:25:25,929
of make a sigma over this, then that means
that, so if, if you have got a occurrences,
294
00:25:25,929 --> 00:25:32,929
that is, if you have got what I am trying
to say is this, that is, there are n letters,
295
00:25:37,099 --> 00:25:44,099
so, you have got x 1 to x n and that forms
your string x, and you know that the occurrence
296
00:25:46,590 --> 00:25:52,320
of a here is denoted by f 0; the occurrence
of b is denoted by f 1; similarly, the occurrence
297
00:25:52,320 --> 00:25:59,320
of z is denoted by f 25.
So, in how many ways can you choose
298
00:26:01,190 --> 00:26:08,190
two elements? In n choose 2 ways. And in how
many ways can both the letters be a? So,
299
00:26:10,710 --> 00:26:14,539
that is the question, right? You have to determine
the probability; you have to find out the
300
00:26:14,539 --> 00:26:21,539
probability that two random elements of x
are identical. This is what you have to
301
00:26:30,739 --> 00:26:37,450
find now. So, you see that the number
of cases where we have got a is actually denoted
302
00:26:37,450 --> 00:26:41,820
by f 0.
So therefore, if we have got to choose two
303
00:26:41,820 --> 00:26:47,519
letters from here such that both of them are a, it
is actually f 0 choose 2, because you choose
304
00:26:47,519 --> 00:26:53,929
these two letters from the f 0 a's, right?
Similarly, for b, you can actually choose them
305
00:26:53,929 --> 00:27:00,929
in f 1 choose 2 ways, because that gives the number
of ways in which you can choose two b's. So, you
306
00:27:01,509 --> 00:27:05,859
can continue like this and the final thing
will be f 25 choose 2.
307
00:27:05,859 --> 00:27:11,179
So, you can actually add all of these and divide by n choose 2, and
this will be actually equal to
308
00:27:11,179 --> 00:27:18,179
sigma f i into f i minus
1 divided by n into n minus 1 for all possible
309
00:27:19,899 --> 00:27:25,739
i's. So, this you can actually approximate
and make it equal to sigma f i square by n
310
00:27:25,739 --> 00:27:32,570
squared. You can actually bring this n square
below and you can actually neglect this minus
311
00:27:32,570 --> 00:27:39,570
1. So, this actually will work out to be equal
to sigma p i square; so, that is the sum of the squares
312
00:27:43,769 --> 00:27:45,200
of the probabilities.
313
00:27:45,200 --> 00:27:52,200
So, you can actually deduce this in this fashion,
and therefore, you obtain this result, that
314
00:27:52,690 --> 00:27:59,200
is, the index of coincidence is nothing but
sigma f i into f i minus 1 divided by n into n minus 1.
315
00:27:59,200 --> 00:28:03,389
So, you can actually neglect this minus 1
in both places, and then, you can actually
316
00:28:03,389 --> 00:28:08,429
divide f i by n, and that is
squared. So, therefore, f i by n is nothing
317
00:28:08,429 --> 00:28:10,859
but the probability that that letter occurs.
318
00:28:10,859 --> 00:28:16,889
So therefore, you get sigma p i square, which
is denoted by here as sigma p i square; so,
319
00:28:16,889 --> 00:28:21,479
that means that you obtain the probability
of any alphabet and then find out the square
320
00:28:21,479 --> 00:28:25,940
and then you take a sigma of all the possible
i values, ok?
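The counting argument above translates directly into code. The sketch below computes both the exact value, sigma f i (f i minus 1) over n (n minus 1), and the approximation sigma p i squared; the function names are illustrative assumptions, and only the letters A to Z are counted.

```python
from collections import Counter


def index_of_coincidence(text):
    """Exact IC: sum of f_i * (f_i - 1) divided by n * (n - 1)."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0  # two elements cannot be drawn from fewer than 2 letters
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


def index_of_coincidence_approx(text):
    """Approximate IC: sum of (f_i / n) ** 2, i.e. sum of p_i squared."""
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n == 0:
        return 0.0
    counts = Counter(letters)
    return sum((f / n) ** 2 for f in counts.values())
```

For short strings the two values differ noticeably (for "AABB" the exact value is 1/3 and the approximation is 1/2); the gap shrinks as n grows.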
321
00:28:25,940 --> 00:28:32,399
So, this you can actually do here. So,
suppose x is an English text, and the
322
00:28:32,399 --> 00:28:36,950
expected probabilities of occurrence of
a, b and so on till
323
00:28:36,950 --> 00:28:42,139
z are denoted by
p 0, p 1 and so on till p 25, with values from
324
00:28:42,139 --> 00:28:46,609
the frequency graph. As we have seen before
that the probability of two random elements
325
00:28:46,609 --> 00:28:51,589
both of them being a is p 0 square; both of
them being b is p 1 square, and therefore,
326
00:28:51,589 --> 00:28:56,119
you can say that the i c x as we have seen
before is sigma p i square, and therefore,
327
00:28:56,119 --> 00:29:00,940
if you take the squares like 0.082 square
plus 0.015 square plus so on, you get a value
328
00:29:00,940 --> 00:29:07,940
of 0.065. Now, this value is very important;
I will tell you exactly
329
00:29:08,519 --> 00:29:13,149
why it is important, but please remember this
number; so, it is 0.065, ok?
330
00:29:13,149 --> 00:29:18,509
So now, if y is a ciphertext which is obtained
by a shift cipher, then what is, i c, i c
331
00:29:18,509 --> 00:29:23,279
y? So, that is the question. So, if i take
x and if i kind of transform this x by a by
332
00:29:23,279 --> 00:29:28,109
a shift cipher, then what will be the i c
y? So, you note one thing that if you would
333
00:29:28,109 --> 00:29:33,359
take a shift cipher, then the alphabet set
is kind of just permuted; the frequency values
334
00:29:33,359 --> 00:29:37,259
as such do not get changed.
So therefore, the i c of y actually does not
335
00:29:37,259 --> 00:29:42,289
get changed and it should remain 0.065, because
the individual probabilities will be just
336
00:29:42,289 --> 00:29:48,729
permuted, but the sigma p i square will not
change; that will remain invariant. So, this
337
00:29:48,729 --> 00:29:53,109
property is actually used or exploited to
determine the key value.
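Both claims above, that sigma p i squared is about 0.065 for English and that a shift leaves it unchanged, can be checked numerically. The probability table below is an assumption: it is a commonly quoted approximate English letter table (it agrees with the 0.082 and 0.015 mentioned for a and b), not necessarily the exact values on the lecture slide, and the helper names are hypothetical.

```python
# Approximate English letter probabilities for a..z (assumed table).
ENGLISH_P = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
             0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
             0.063, 0.091, 0.028, 0.010, 0.023, 0.001, 0.020, 0.001]


def expected_ic(probs):
    """Expected index of coincidence: the sum of the squared probabilities."""
    return sum(p * p for p in probs)


def shift_distribution(probs, k):
    """Letter distribution after a shift cipher with key k:
    ciphertext letter i has the probability of plaintext letter i - k."""
    return [probs[(i - k) % 26] for i in range(26)]
```

With this table `expected_ic(ENGLISH_P)` is about 0.0656, and `expected_ic(shift_distribution(ENGLISH_P, k))` gives the same value for every k, because the same 26 numbers are squared and added in a different order.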
338
00:29:53,109 --> 00:29:58,979
So, we continue with our index of coincidence,
and then, what we do is that, so, if you remember
339
00:29:58,979 --> 00:30:03,690
that in our kasiski test, we have got the
suggestion of m. So, what we do is that starting
340
00:30:03,690 --> 00:30:08,899
from one to m, we actually start arranging
these entire letters like this.
341
00:30:08,899 --> 00:30:13,879
So, you have got like y 1 y 2 and so on till
y n which is the ciphertext, which is obtained
342
00:30:13,879 --> 00:30:18,969
from the polyalphabetic vigenere cipher. Then
for any given m, if we are to kind of confirm
343
00:30:18,969 --> 00:30:23,849
this value of m, what we do is that we start
dividing y into m substrings like this. So,
344
00:30:23,849 --> 00:30:29,539
what we do is that we make m substrings like
y 1 y 2 and so on till y m like this.
345
00:30:29,539 --> 00:30:35,719
So, we say got y 1 first then y 2 and so on
till y m; then y m plus 1 y m plus 2 and so
346
00:30:35,719 --> 00:30:40,820
on till y 2 m; then we have got start from
y 2 m plus 1 y 2 m plus 2 and so on till y
347
00:30:40,820 --> 00:30:46,070
3 m and we continue like this, ok?
So, if m is indeed the keyword length, then
348
00:30:46,070 --> 00:30:52,960
you see that each of these
rows will essentially be a shift cipher, because
349
00:30:52,960 --> 00:30:59,309
if you remember how the polyalphabetic cipher
works, so, if your keyword is actually indeed
350
00:30:59,309 --> 00:31:04,529
a string of length
m, then each of these letters is actually
351
00:31:04,529 --> 00:31:11,529
being obtained from the plaintext by shifting
it; suppose this one is shifted by
352
00:31:11,539 --> 00:31:15,719
the first letter in the key; this is shifted
by the second letter in the key and so
353
00:31:15,719 --> 00:31:20,839
on; this is shifted by the mth alphabet in
the key, but what about this? This is again
354
00:31:20,839 --> 00:31:24,669
shifted by the first alphabet in the- in the
key, right?
355
00:31:24,669 --> 00:31:29,669
So therefore, all these elements which are
there in the rows or all these, cipher, cipher
356
00:31:29,669 --> 00:31:34,700
alphabets are actually shifted by the same
letter. So, this is also shifted by the same
357
00:31:34,700 --> 00:31:39,159
letter; these are also shifted by the same letter, and
these are also shifted by the same letter;
358
00:31:39,159 --> 00:31:42,649
so, which means that each of these rows is
nothing but a shift cipher, and therefore,
359
00:31:42,649 --> 00:31:49,649
their individual i c's should also be
equal to 0.065,
360
00:31:49,839 --> 00:31:55,279
but if it is not so, that is, if m is
not the actual keyword length, then all of them will
361
00:31:55,279 --> 00:31:58,589
be a kind of random string, because this could
be shifted by something; this will be shifted
362
00:31:58,589 --> 00:32:03,369
by something; this will be, the next one will
be shifted by something, right?
363
00:32:03,369 --> 00:32:06,929
So therefore, they will behave as a kind
of random string, and therefore, for the random
364
00:32:06,929 --> 00:32:11,219
string, actually you will find that, if each
of the letters of the
365
00:32:11,219 --> 00:32:14,580
English language alphabet
occurs with the probability
366
00:32:14,580 --> 00:32:21,580
of 1 by 26, then the i c of that corresponding
text will be equal to 26 into 1 by 26 square,
367
00:32:22,029 --> 00:32:26,129
because that is the sigma p i square in that
case and that works out to 0.038.
368
00:32:26,129 --> 00:32:31,519
And now, you see that the 0.038 value and
the 0.065 are actually quite distinguishable,
369
00:32:31,519 --> 00:32:35,729
right? So therefore, you see that this is
the property which exists in these character
370
00:32:35,729 --> 00:32:41,589
set, which is actually quite distinct from
that of a random string, and therefore,
371
00:32:41,589 --> 00:32:46,940
as I told you at the very beginning of our
classes, if you
372
00:32:46,940 --> 00:32:52,229
observe a property which
kind of distinguishes the given
373
00:32:52,229 --> 00:32:57,639
cipher from a random distribution, then you
can actually exploit that for developing an
374
00:32:57,639 --> 00:33:00,609
attack, which is precisely what is done
here also, ok?
375
00:33:00,609 --> 00:33:04,929
So therefore, first of all, in order
to confirm the value of the key
376
00:33:04,929 --> 00:33:10,509
size, then what you do is that you start arranging
them like this and then you start finding
377
00:33:10,509 --> 00:33:15,899
out the i c's of the corresponding rows. If
your i c's work out to be 0.065 for all these
378
00:33:15,899 --> 00:33:22,509
rows, then the value of m is confirmed; otherwise,
if the value of i c's works out to around
379
00:33:22,509 --> 00:33:29,179
0.038, then the value of m is a wrong value.
So, you can continue with the next value until
380
00:33:29,179 --> 00:33:33,299
you find the correct m.
So, in order to verify
381
00:33:33,299 --> 00:33:37,489
the keyword length m, what you do is that
divide the ciphertext into m substrings and
382
00:33:37,489 --> 00:33:42,139
then you compute the index of coincidence
for each substring. If all the i c values
383
00:33:42,139 --> 00:33:47,019
of the substrings are around 0.065, then
m is the correct keyword length; otherwise,
384
00:33:47,019 --> 00:33:52,309
m is not the correct keyword length, ok?
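The verification step can be sketched as follows. The i-th substring collects every m-th ciphertext letter starting at position i, so all of its letters were enciphered with the same key letter. The function names, the search bound `max_m` and the cut-off `threshold=0.06` are assumptions chosen for illustration; the lecture only says the values should be around 0.065 rather than around 0.038.

```python
from collections import Counter


def split_substrings(ciphertext, m):
    """Substring i holds letters i, i + m, i + 2m, ... of the ciphertext."""
    return [ciphertext[i::m] for i in range(m)]


def ic(text):
    """Index of coincidence: sum f_i (f_i - 1) / (n (n - 1))."""
    n = len(text)
    if n < 2:
        return 0.0
    return sum(f * (f - 1) for f in Counter(text).values()) / (n * (n - 1))


def substring_ics(ciphertext, m):
    """IC of each of the m substrings for a candidate keyword length m."""
    return [ic(s) for s in split_substrings(ciphertext, m)]


def guess_key_length(ciphertext, max_m=20, threshold=0.06):
    """Smallest m whose substrings all have an IC above the (assumed) threshold,
    or None if no m up to max_m qualifies."""
    for m in range(1, max_m + 1):
        if all(v >= threshold for v in substring_ics(ciphertext, m)):
            return m
    return None
```

On real ciphertext the substring values only cluster near 0.065, so it is worth printing `substring_ics(ciphertext, m)` for a few candidate m and comparing them by eye instead of trusting one fixed threshold.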
So, if you want to use i c to determine the
385
00:33:52,309 --> 00:33:56,899
correct keyword length m, then what will we
do? You start from m equal
386
00:33:56,899 --> 00:34:03,899
to 2, 3 and so on, until you find an m for which all substrings
have i c values of around 0.065. Now, the
387
00:34:04,359 --> 00:34:09,240
question is how to determine the actual key?
Now you have
388
00:34:09,240 --> 00:34:14,389
confirmed the size of the key. Now, your
objective is to find out each of the key letters.
389
00:34:14,389 --> 00:34:20,820
So, for doing that or rather to in order to
determine the keyword, we use another concept
390
00:34:20,820 --> 00:34:26,010
which is known as the mutual index of coincidence.
So, mutual index of coincidence works as follows:
391
00:34:26,010 --> 00:34:31,450
so, consider two strings x and y, where x
is formed of x 1 x 2 and so on till x n,
392
00:34:31,450 --> 00:34:37,589
and y is formed of y 1 y 2 and so on till
y n dash. These are strings of n and n dash alphabetic
393
00:34:37,589 --> 00:34:43,169
characters respectively. Then the mutual index
of coincidence of x and y is denoted by this
394
00:34:43,169 --> 00:34:48,529
m i c x, y. Here it is the probability that
a random element of x is actually equal to
395
00:34:48,529 --> 00:34:52,639
a random element of y.
So therefore, the calculation
396
00:34:52,639 --> 00:34:57,510
is exactly the same as that of index of coincidence.
If your probabilities of occurrence in case
397
00:34:57,510 --> 00:35:02,269
of a b and so on, so, it is actually not the
probabilities but the frequencies. The frequencies
398
00:35:02,269 --> 00:35:09,269
are like f 0, f 1 and so on till f 25 for x,
and for the next one, y, they are f 0 dash, f 1 dash and
399
00:35:09,329 --> 00:35:14,130
so on till f 25 dash; then the mutual index
of coincidence is obtained as
400
00:35:14,130 --> 00:35:20,660
sigma f i f i
dash divided by n into n dash.
401
00:35:20,660 --> 00:35:24,710
So, this n into n dash is nothing but the
total number of ways in which you can choose
402
00:35:24,710 --> 00:35:28,369
two letters, one from each string, and the numerator is the total number
of ways in which you can actually choose so
403
00:35:28,369 --> 00:35:32,640
that both the letters which
you choose are the same, that is, both of
404
00:35:32,640 --> 00:35:36,260
them are a, or both of them are b, or
both of them are c and so on.
405
00:35:36,260 --> 00:35:41,380
So, the number of ways in which you can choose
the ith letter from x, the first string,
406
00:35:41,380 --> 00:35:45,609
is f i, and the number of ways in which you
can choose the ith letter from the second
407
00:35:45,609 --> 00:35:50,700
string is f i dash, because that
is its frequency of occurrence, and therefore,
408
00:35:50,700 --> 00:35:57,039
the probability
that a random element of x is equal to a random
409
00:35:57,039 --> 00:36:02,690
element of y is given by this ratio.
410
00:36:02,690 --> 00:36:08,990
So, if you see that if you've got a b and
so on till z and these are the corresponding
411
00:36:08,990 --> 00:36:14,640
probabilities like p 0, p 1 and so on
till p 25, and if k i is used as a key, then
412
00:36:14,640 --> 00:36:19,019
each of these letters gets transformed: a becomes a
plus k i; b gets transformed to b plus k i
413
00:36:19,019 --> 00:36:21,890
and so on till z gets transformed to z plus
k i.
414
00:36:21,890 --> 00:36:26,150
So now, if i ask you like what is the probability
that in the cryptogram a character is a. So,
415
00:36:26,150 --> 00:36:31,730
a is denoted by the letter 0, by the number
0. So, therefore, it is the probability corresponding
416
00:36:31,730 --> 00:36:38,650
to that j for which j plus k i is equal to 0, and therefore,
j will be equal to minus k i mod 26, that
417
00:36:38,650 --> 00:36:42,420
is, this probability will be equal to p of minus
k i, right?
418
00:36:42,420 --> 00:36:49,420
So therefore, that is equal to
p of minus k i. So, that is p j, which
419
00:36:49,829 --> 00:36:55,660
is p of minus k i; so, that is the corresponding
probability.
420
00:36:55,660 --> 00:36:59,480
So what i'm saying is basically that now if
i tell you that what is the probability that
421
00:36:59,480 --> 00:37:04,190
in the cryptogram a character is a, then this
is not a; it is actually a plus k i. So, therefore,
422
00:37:04,190 --> 00:37:08,430
among all these strings among all these alphabets,
you have to find out which one is corresponds
423
00:37:08,430 --> 00:37:14,329
to 0. So, suppose j plus k i corresponds to
0, therefore, the corresponding probability
424
00:37:14,329 --> 00:37:19,450
here will be p j and this p j is nothing but
p of minus k i.
425
00:37:19,450 --> 00:37:25,029
So similarly, the probability that both character
in x and y are a is therefore found out by
426
00:37:25,029 --> 00:37:31,819
p of minus k i multiplied by p of minus k
j, because they are independent choices,
427
00:37:31,819 --> 00:37:36,990
and similarly, the probability that both characters
in x and y are b is obtained as p of 1 minus k
428
00:37:36,990 --> 00:37:42,130
i multiplied by p of 1 minus k j, and you can
continue in this fashion, and therefore, the
429
00:37:42,130 --> 00:37:47,569
mutual index of coincidence of these two
strings will be equal to sigma p h minus k
430
00:37:47,569 --> 00:37:54,569
i multiplied by p h minus k j - where h varies
from 0 to 25, and that is equal to sigma;
431
00:37:54,819 --> 00:38:01,819
h is equal to 0 to 25 and you can actually
make some changes in the in the variables
432
00:38:02,049 --> 00:38:09,049
and you will get the p h multiplied by p h
plus k i minus k j and then you take a sigma
433
00:38:09,549 --> 00:38:16,119
from h is equal to 0 to h equal to 25.
So therefore, you actually obtain a mutual
434
00:38:16,119 --> 00:38:21,760
index of coincidence, and this value, therefore
this mutual index of coincidence actually
435
00:38:21,760 --> 00:38:27,170
depends upon k i minus k j; so, that is the
relative shift, right? Therefore, it depends
436
00:38:27,170 --> 00:38:33,230
upon the difference k i minus k j mod 26, and
you can actually prove, as a technical exercise,
437
00:38:33,230 --> 00:38:40,230
that a relative shift of i yields the same
estimate as that of 26 minus i; that is quite
438
00:38:40,230 --> 00:38:43,720
trivial from this formula, right?
439
00:38:43,720 --> 00:38:47,490
So now you see that there are some typical
values of mutual index of coincidence for
440
00:38:47,490 --> 00:38:52,609
different values of k i minus k j. So, you
see that if k i minus k j is 0, then it is
441
00:38:52,609 --> 00:38:56,960
same as that of the index of coincidence,
and therefore, you get the value of 0.065,
442
00:38:56,960 --> 00:39:02,039
but for other values of k i minus k j, you
actually get a value which
443
00:39:02,039 --> 00:39:07,970
varies around 0.03, and therefore, it is quite
distinct from the case where k i minus k j
444
00:39:07,970 --> 00:39:11,269
is equal to 0.
So, what we can do is that you can always
445
00:39:11,269 --> 00:39:16,910
fix a y i and you can modify the corresponding
y j by shifting it by each value from 0 to 25, and then the
446
00:39:16,910 --> 00:39:22,220
shift for which we get an m i c which is close
to 0.065 will actually indicate the correct
447
00:39:22,220 --> 00:39:24,289
value of k i minus k j.
448
00:39:24,289 --> 00:39:29,710
So, you can actually I mean try to understand
using this, that is, so, if you if you want
449
00:39:29,710 --> 00:39:34,460
to compute the shift between two keys, what
we do is that under the key k i, you obtain
450
00:39:34,460 --> 00:39:38,760
the this is the corresponding frequency of
occurrences, and if under the corresponding
451
00:39:38,760 --> 00:39:43,839
key k j, this is the frequency of occurrences,
and you consider the m i c between these two
452
00:39:43,839 --> 00:39:49,430
series and it works out to 0.065, then you
can say that k i minus k j is equal
453
00:39:49,430 --> 00:39:51,269
to 0, right?
454
00:39:51,269 --> 00:39:55,400
That is observed from this table, that is, if
k i minus k j is 0, then this mutual index
455
00:39:55,400 --> 00:40:00,640
of coincidence is 0.065. But what
if not? If it is not so, then
456
00:40:00,640 --> 00:40:06,039
what we do is that you keep the
first one the same, that is, keep one of the
457
00:40:06,039 --> 00:40:08,720
frequency same but you start shifting the
next one.
458
00:40:08,720 --> 00:40:13,750
So, you just start shifting it by some
shift g like this, and therefore,
459
00:40:13,750 --> 00:40:20,450
the frequency of a
character being i is now f dash i minus g.
460
00:40:20,450 --> 00:40:24,579
So, this is because of exactly the same
thing which I told you
461
00:40:24,579 --> 00:40:28,920
previously.
So therefore, the corresponding frequency
462
00:40:28,920 --> 00:40:35,920
is f dash i minus g and thus we compute the
mutual index of coincidence of x and y g as
463
00:40:36,559 --> 00:40:43,559
sigma f i multiplied by f dash i minus g,
divided by n n dash, and now, if we have got
464
00:40:44,589 --> 00:40:50,079
a value of 0.065 or close to it, then you
can say that k i is equal to k j plus g. So,
465
00:40:50,079 --> 00:40:54,500
therefore, k i becomes equal to k j plus g,
and from there you can actually compute the
466
00:40:54,500 --> 00:40:57,440
difference of k i minus k j being equal to
g.
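The shifted mutual index of coincidence, sigma f i times f dash i minus g over n n dash, can be sketched like this. The helper names are assumptions for illustration, and both strings are assumed to be non-empty and to contain only the uppercase letters A to Z.

```python
from collections import Counter

A = ord("A")


def letter_counts(text):
    """Frequencies f_0 .. f_25 of the letters A .. Z in text."""
    counts = Counter(text)
    return [counts.get(chr(A + i), 0) for i in range(26)]


def shift_text(text, g):
    """Shift every letter of text forward by g (mod 26)."""
    return "".join(chr((ord(c) - A + g) % 26 + A) for c in text)


def mutual_ic(x, y, g=0):
    """MIC of x and (y shifted by g): sum_i f_i * f'_{(i - g) mod 26} / (n * n')."""
    f, f_dash = letter_counts(x), letter_counts(y)
    total = sum(f[i] * f_dash[(i - g) % 26] for i in range(26))
    return total / (len(x) * len(y))


def best_relative_shift(x, y):
    """The shift g in 0..25 with the largest MIC, an estimate of (k_i - k_j) mod 26."""
    return max(range(26), key=lambda g: mutual_ic(x, y, g))
```

Taking the maximum over g is a design choice made here for simplicity; the lecture's criterion is a value close to 0.065, so on short rows it is safer to inspect all 26 values than to rely on the maximum alone.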
467
00:40:57,440 --> 00:41:02,279
So therefore, you keep one of them constant
and then keep on shifting the other values
468
00:41:02,279 --> 00:41:07,160
and you have also start computing the mutual
index of coincidence, in this, in this fashion,
469
00:41:07,160 --> 00:41:11,980
and then, if this mutual index of coincidence
matches 0.065, then you can say that this
470
00:41:11,980 --> 00:41:16,789
particular shift is actually the correct shift,
and k i and k j are actually having a difference
471
00:41:16,789 --> 00:41:19,950
of this shift which is equal to g in this
case.
472
00:41:19,950 --> 00:41:24,029
So, I give you some examples to show how it
works. So, this is an example to show of,
473
00:41:24,029 --> 00:41:29,759
this is an example of a ciphertext. The first
important thing is to obtain the repeated
474
00:41:29,759 --> 00:41:35,109
occurrences; here, c h r is a trigram
which
475
00:41:35,109 --> 00:41:42,109
kind of repeats, right? And we observe
that in the text c h r starts at 1, 166, 236 and
476
00:41:43,630 --> 00:41:46,490
286 positions, and therefore, the distance
that between the first occurrence and the
477
00:41:46,490 --> 00:41:53,140
successive ones are 165, 235 and 285.
So, you see that if I take the g c d of
478
00:41:53,140 --> 00:41:57,759
these distances, then it works out to five,
and therefore, we verify m by computing the
479
00:41:57,759 --> 00:42:03,339
i c by trying m is equal to 1, 2, 3, 4 and
5, ok?
480
00:42:03,339 --> 00:42:09,269
So, so, we will like to verify this m by,
the, the index of coincidence test. So, what
481
00:42:09,269 --> 00:42:13,690
we do first of all is that we kind of take
one of these rows, therefore, this, this is
482
00:42:13,690 --> 00:42:19,240
the one of the rows, we have actually divided
them and we have formed. So, what we do is
483
00:42:19,240 --> 00:42:25,529
that we start forming five rows and the first
row is given as this; this is a first row.
484
00:42:25,529 --> 00:42:30,910
So, what we do is that, for
the index of coincidence, we
485
00:42:30,910 --> 00:42:35,200
again obtain the corresponding frequencies,
and after obtaining the frequencies, we obtain
486
00:42:35,200 --> 00:42:41,519
the i c value. So, if you obtain this i c
value, this value comes to around 0.065,
487
00:42:41,519 --> 00:42:46,539
and this actually holds for the other four
rows also. So, therefore, if m were anything
488
00:42:46,539 --> 00:42:52,000
other than five, then as we have discuss the
i c x would have been around 0.04, but since
489
00:42:52,000 --> 00:42:57,259
you are getting a 0.065 value, the value
of m equal to five is actually confirmed by
490
00:42:57,259 --> 00:42:58,430
this test.
491
00:42:58,430 --> 00:43:03,680
So, next thing is to obtain the key. So, how
do you obtain the key? Now, there are 313
492
00:43:03,680 --> 00:43:07,779
characters in the text, it is divided into
five rows because five is the length of the
493
00:43:07,779 --> 00:43:13,569
key, each having 62 characters; the last row
having the remaining. Now, each row of the
494
00:43:13,569 --> 00:43:18,480
table has been shifted as we have discussed
by the same key. So, its index of coincidence
495
00:43:18,480 --> 00:43:23,250
was 0.06. So, we have actually observed
that. Now, we need to obtain or rather compute
496
00:43:23,250 --> 00:43:26,509
the shifts by the mutual index of coincidence
test.
497
00:43:26,509 --> 00:43:30,759
So, what we do is that we form
each of these rows, and for each of these
498
00:43:30,759 --> 00:43:36,730
rows, we actually take as reference an English language
text, which corresponds to a shift of zero; in
499
00:43:36,730 --> 00:43:42,200
that case, the key is equal to 0, and we
try to find out, not for every character but rather
500
00:43:42,200 --> 00:43:47,990
for every row, what is the shift? For every
row, what is the shift? So, if you have got
501
00:43:47,990 --> 00:43:54,990
five rows, so, if you have got row 1 row 2
and so on till row 5, then you assume that
502
00:43:55,849 --> 00:44:01,339
this row 1 is shifted by the first letter
in the key which is k 1. The second one is
503
00:44:01,339 --> 00:44:07,509
shifted by k 2, and so on; the last
one is shifted by k m, right?
504
00:44:07,509 --> 00:44:14,349
So, the first objective is to find out what
is k 1 minus k 0. So, what is the value of
505
00:44:14,349 --> 00:44:20,779
k 1 minus k 0? So, what is k 0 in this case?
k 0 is 0 because that corresponds to the normal
506
00:44:20,779 --> 00:44:25,700
English language text. So similarly, we find
out k 2 minus 0 and so on till k m minus 0
507
00:44:25,700 --> 00:44:27,859
to obtain the corresponding values of the
key.
508
00:44:27,859 --> 00:44:32,970
So therefore, we take, we have obtained the
frequency distributions here. We know the
509
00:44:32,970 --> 00:44:37,990
frequency distribution in context of the normal
English language. So, we kind of compute
510
00:44:37,990 --> 00:44:44,579
the mutual index of coincidence between this
string and a normal English language string.
511
00:44:44,579 --> 00:44:49,750
So that kind of gives the estimate of
k 1 minus 0. So, we can actually
512
00:44:49,750 --> 00:44:54,509
automate this process, and using that automation,
we can actually obtain
513
00:44:54,509 --> 00:44:55,200
the key and decrypt it.
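One way such an automation might look is sketched below. It is not the program used in the lecture: the function names are hypothetical, the probability table is an assumed approximate English table, and the ciphertext is assumed to contain only the uppercase letters A to Z. Each row is compared with unshifted English, and the shift with the largest score is taken as that key letter.

```python
from collections import Counter

A = ord("A")
# Approximate English letter probabilities for a..z (assumed table).
ENGLISH_P = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
             0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
             0.063, 0.091, 0.028, 0.010, 0.023, 0.001, 0.020, 0.001]


def shift_score(row, k):
    """MIC between a ciphertext row and English shifted by k:
    sum_i (f_i / n) * p_{(i - k) mod 26}; about 0.065 for the right k."""
    counts = Counter(row)
    total = sum(counts.get(chr(A + i), 0) * ENGLISH_P[(i - k) % 26]
                for i in range(26))
    return total / len(row)


def recover_key(ciphertext, m):
    """For each of the m rows, pick the shift with the highest score."""
    key = []
    for row in (ciphertext[i::m] for i in range(m)):
        k = max(range(26), key=lambda s: shift_score(row, s))
        key.append(chr(A + k))
    return "".join(key)


def vigenere_decrypt(ciphertext, key):
    """Subtract the repeating key from the ciphertext, letter by letter."""
    return "".join(
        chr((ord(c) - ord(key[i % len(key)])) % 26 + A)
        for i, c in enumerate(ciphertext)
    )
```

A final check that the decrypted text reads as meaningful English is still needed, since a short row can give its best score at a wrong shift.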
514
00:44:55,200 --> 00:45:00,990
So, I will give you another example. So, this
is an example of another ciphertext. In this
515
00:45:00,990 --> 00:45:05,640
case also you see that we have obtained
the commonly recurring strings, and this
516
00:45:05,640 --> 00:45:09,630
is the first index and the second index the
difference is denoted here, and therefore,
517
00:45:09,630 --> 00:45:14,259
the Kasiski test thus predicts that the key
size is the g c d, which is in this case 4;
518
00:45:14,259 --> 00:45:17,180
if we take the g c d of these distances we get 4,
ok?
519
00:45:17,180 --> 00:45:22,250
So, we will confirm this key size. So therefore,
we again kind of break it into four rows in
520
00:45:22,250 --> 00:45:25,490
this case. So, the first string is this; second
string is this; third string is this and fourth
521
00:45:25,490 --> 00:45:32,490
string is this. So, we actually obtain quite
high values more than 0.06, and therefore,
522
00:45:32,500 --> 00:45:35,930
the size of the key is
confirmed, ok?
523
00:45:35,930 --> 00:45:40,640
So then, we would need to compute the shift
of each row. So, what we do is that we perform
524
00:45:40,640 --> 00:45:44,900
the mutual index of coincidence to obtain
the actual key value. So, we run this test,
525
00:45:44,900 --> 00:45:50,119
that is,
we obtain the corresponding frequency distribution
526
00:45:50,119 --> 00:45:55,480
of this string, and we also know the
expected frequency
527
00:45:55,480 --> 00:45:59,029
distribution
of the English language. We use these
528
00:45:59,029 --> 00:46:05,779
two frequency distributions to compute the mutual index
of coincidence of this string,
529
00:46:05,779 --> 00:46:09,240
and the shift for which
we get 0.065 is actually the correct shift,
530
00:46:09,240 --> 00:46:15,829
ok? So therefore, we actually obtain
the actual value of the key, that is, we
531
00:46:15,829 --> 00:46:22,670
obtain the shift.
So, what we do is that we assumed that the
532
00:46:22,670 --> 00:46:27,819
shift in this case is 0; next we assume
that the shift in this case is 1, and
533
00:46:27,819 --> 00:46:33,900
so on till the shift is 25. Whichever shift actually
gives us a mutual index of coincidence close to 0.065
534
00:46:33,900 --> 00:46:40,380
is the correct shift, and that we do exactly
like this.
535
00:46:40,380 --> 00:46:47,269
So, if this is not so, then we actually start
varying this g from 1 to 25, and
536
00:46:47,269 --> 00:46:52,079
whichever value actually gives the value of
0.065 is the correct shift. So, that
537
00:46:52,079 --> 00:46:59,079
is the way of obtaining the corresponding
key value, or key letter, right?
538
00:46:59,299 --> 00:47:04,799
So, in this case, we perform this; if you
run this test, which we have
539
00:47:04,799 --> 00:47:09,259
actually automated, we obtain
the key, which in this case is "code", and
540
00:47:09,259 --> 00:47:13,759
the corresponding plaintext is this, and it
is a meaningful text,
541
00:47:13,759 --> 00:47:19,059
and therefore, we kind of conclude that our
decryption has been correct.
542
00:47:19,059 --> 00:47:23,430
So then, we actually discuss and conclude
with the cryptanalysis of hill cipher. So,
543
00:47:23,430 --> 00:47:27,190
in this case, the cipher-text only attack
is difficult because there is a large key
544
00:47:27,190 --> 00:47:31,990
space. The key space, for an m cross
m matrix, can be as high
545
00:47:31,990 --> 00:47:35,970
as 26 to the power of m square, but actually
it will not be so high, because not all matrices
546
00:47:35,970 --> 00:47:41,009
are invertible, right? And in a Hill cipher,
it is important that the matrix has to
547
00:47:41,009 --> 00:47:43,390
be invertible.
But the important point
548
00:47:43,390 --> 00:47:47,660
to be stressed
is that Hill ciphers do not preserve the statistics
549
00:47:47,660 --> 00:47:52,450
of the plaintext, and therefore, frequency
analysis does not work. Now, for a key matrix
550
00:47:52,450 --> 00:47:57,700
of size m cross m, a frequency analysis on blocks of
size m may work, but it is very rare for the
551
00:47:57,700 --> 00:48:03,499
plaintext to repeat the same string of characters;
so, I mean, it may work on blocks of size m, but
552
00:48:03,499 --> 00:48:08,910
it is very rare for a plaintext to have repeated
strings of characters of size m, because m is
553
00:48:08,910 --> 00:48:13,589
quite large.
See, for example, "the" can occur again and again,
554
00:48:13,589 --> 00:48:20,589
but the probability of a trigram
repeating is more than that of
555
00:48:22,410 --> 00:48:27,130
a string which has got ten letters;
for such a string to repeat, the probability
556
00:48:27,130 --> 00:48:30,279
reduces.
So therefore,
557
00:48:30,279 --> 00:48:36,440
I mean, if the value of m is, suppose, something
like fifty, then the probability that in
558
00:48:36,440 --> 00:48:41,180
your text a particular string of length 50
will repeat is quite small. Therefore, you
559
00:48:41,180 --> 00:48:45,749
will get a very small sample to work with, and
since this is a statistical technique, it may
560
00:48:45,749 --> 00:48:47,319
not work.
561
00:48:47,319 --> 00:48:52,640
So therefore, I mean, a plain and
simple ciphertext-only attack
562
00:48:52,640 --> 00:48:57,380
like what we have seen previously may not
work; however, a known plaintext attack can
563
00:48:57,380 --> 00:49:00,420
easily work.
So, in this case, you see that you can
564
00:49:00,420 --> 00:49:05,460
create two m cross m matrices,
one for the plaintext and one for the ciphertext,
565
00:49:05,460 --> 00:49:09,999
and if the key matrix is
denoted by K, then you can represent
566
00:49:09,999 --> 00:49:16,509
it by C is equal to P into K, and here,
every row of C and P is a corresponding ciphertext
567
00:49:16,509 --> 00:49:20,920
and plaintext pair, and therefore, you can
obtain K, if this plaintext
568
00:49:20,920 --> 00:49:23,920
matrix is invertible, simply by multiplying
P inverse with C.
569
00:49:23,920 --> 00:49:30,099
So here we have an example. We
assume that m is equal to three and some
570
00:49:30,099 --> 00:49:36,269
known plaintext-ciphertext pairs are given
here, like suppose 05 07 10 is getting
571
00:49:36,269 --> 00:49:43,269
mapped into 03 06 00, 13 17 07 is getting
mapped into 14 16 09, and 00 05 04 is
572
00:49:46,490 --> 00:49:52,369
getting mapped into 03 17 11. So, you
can form two matrices, the plaintext
573
00:49:52,369 --> 00:49:58,099
and ciphertext matrices, and use them to obtain
the inverse of the matrix P, and actually in
574
00:49:58,099 --> 00:50:02,609
this case P is luckily invertible. If P is
not invertible, then you have to
575
00:50:02,609 --> 00:50:08,359
obtain more plaintext-ciphertext pairs,
and find a set for which P is invertible, and
576
00:50:08,359 --> 00:50:14,410
from there, you can obtain the corresponding
key K by multiplying P inverse with C. So,
577
00:50:14,410 --> 00:50:20,099
this is quite straightforward and can be
done easily. So, essentially, this gives
578
00:50:20,099 --> 00:50:24,980
us a technique for mounting a known
plaintext attack on the Hill cipher.
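The known plaintext attack above can be sketched in Python with the three pairs from this example. This is a minimal sketch under stated assumptions, not a definitive implementation: the helper names (minor, det, inverse_mod, matmul_mod, recover_key) are hypothetical, the inverse is computed with the adjugate formula, which is one possible choice and is practical only for small m, and pow(d, -1, 26) assumes Python 3.8 or later.

```python
MOD = 26


def minor(mat, i, j):
    """Return mat with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(mat) if k != i]


def det(mat):
    """Integer determinant by cofactor expansion along the first row."""
    if not mat:
        return 1  # determinant of the empty (0 x 0) matrix
    return sum((-1) ** j * mat[0][j] * det(minor(mat, 0, j))
               for j in range(len(mat)))


def inverse_mod(mat, mod=MOD):
    """Inverse of a square matrix modulo mod using the adjugate formula.

    Raises ValueError when det(mat) has no inverse modulo mod.
    """
    n = len(mat)
    d_inv = pow(det(mat) % mod, -1, mod)
    # Entry (i, j) of the inverse is d_inv times the cofactor of entry (j, i).
    return [[(d_inv * (-1) ** (i + j) * det(minor(mat, j, i))) % mod
             for j in range(n)] for i in range(n)]


def matmul_mod(a, b, mod=MOD):
    """Matrix product a * b with every entry reduced modulo mod."""
    return [[sum(x * y for x, y in zip(row, col)) % mod for col in zip(*b)]
            for row in a]


def recover_key(p, c, mod=MOD):
    """Solve C = P * K (mod 26) for K as K = P^{-1} * C."""
    return matmul_mod(inverse_mod(p, mod), c, mod)


# Each row is one known plaintext block and its ciphertext block.
P = [[5, 7, 10], [13, 17, 7], [0, 5, 4]]
C = [[3, 6, 0], [14, 16, 9], [3, 17, 11]]

K = recover_key(P, C)
print(K)  # [[2, 3, 7], [5, 7, 9], [1, 2, 11]]
```

If inverse_mod raises ValueError, the chosen plaintext matrix is not invertible modulo 26, and a different set of m plaintext-ciphertext pairs has to be tried.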
579
00:50:24,980 --> 00:50:31,240
I will give you certain points to think
on, like, why does a Hill cipher at all disturb
580
00:50:31,240 --> 00:50:35,680
the frequency of the plaintext? So, you can
take a Hill cipher of size m and you can
581
00:50:35,680 --> 00:50:40,589
try to find out why, essentially,
the frequency of the plaintext is disturbed.
582
00:50:40,589 --> 00:50:44,789
The other important thing which you can try
to do is to write a C program to
583
00:50:44,789 --> 00:50:48,839
automate the cryptanalysis of polyalphabetic
ciphers, and you can try to play around with
584
00:50:48,839 --> 00:50:54,999
various ciphers; like, you can take a normal
English text and encrypt it using the
585
00:50:54,999 --> 00:51:00,829
Vigenere cipher, a polyalphabetic cipher; choose
some key values and obtain the ciphertext,
586
00:51:00,829 --> 00:51:05,099
and then, you feed it to your
program and see whether you are able
587
00:51:05,099 --> 00:51:09,150
to retrieve the key: whether you are able to retrieve
the size of the key, the actual key, and from
588
00:51:09,150 --> 00:51:14,009
there, whether you are able to decipher the
ciphertext. The interesting thing
589
00:51:14,009 --> 00:51:16,849
about this Kasiski test is that it can be
automated.
590
00:51:16,849 --> 00:51:21,650
So therefore, you can actually write a nice
program and play around and experiment with
591
00:51:21,650 --> 00:51:22,099
them.
592
00:51:22,099 --> 00:51:27,009
So, the reference that I have followed is
Cryptography and Network Security, the second
593
00:51:27,009 --> 00:51:33,559
edition of the book, by Forouzan and myself,
and next day, we shall continue
594
00:51:33,559 --> 00:51:34,470
with Shannon's theory.