Welcome to the course on Coding Theory. I am Adrish Banerjee from IIT Kanpur. Today we are going to give a basic introduction to what coding theory is all about. So we will start our lecture. In the introduction, as I said, we will talk about what coding theory is, and we will illustrate with a very simple example how error correcting codes can be used for error detection and error correction. Before I start my lecture, I would like to talk about the books that we are going to use for this course. We are going to follow Error Control Coding by Lin and Costello; it is the second edition of this book, and we are going to follow it as our textbook. And there are some very nice books which you can use as reference books. For example, the book by MacWilliams and Sloane is a very nice book on block codes. You could also follow the book by Blahut, Algebraic Codes for Data Transmission; the book Error Correction Coding by Todd K. Moon, which also gives a very nice introduction to error correcting codes; or you could use the book by Huffman and Pless called Fundamentals of Error-Correcting Codes. So when we talk about communications, communication basically involves three basic steps. The first is encoding a message.
You have a message source that you want to represent efficiently. For example, consider a speech signal. If you want to transmit a speech signal, you first have to convert the analog signal to a digital signal, and then you need to get rid of useless redundancy. Why? Because we want to transmit only the useful information. A source inherently has a lot of redundancy, and when we try to represent a source, we would like to represent it efficiently, in the minimum possible number of bits. So the first step involved in any communication is encoding a message. Second, once you have represented your source, you want to transmit it over a communication channel. So the second step is transmission of the message through a communication channel. And finally, once the receiver has received the message, it has to decode it to find out what information was transmitted. So broadly there are three steps involved in communication: encoding, transmission, and finally decoding. Now, information theory gives us the fundamental limits: what is the best compression that we can achieve, and what is the maximum transmission rate possible over a communication channel. So let's spend some time on what our transmission medium is.
The transmission medium over which we want to send a packet is known as a channel, and here I have illustrated two very simple channel models. The first is the binary symmetric channel. You can see it has binary inputs, 0 and 1, and similarly it has binary outputs, 0 and 1. With probability 1 minus epsilon, whatever you transmit is received correctly at the receiver. So this is the transmitter side and this is the receiver side. If you transmit 0, with probability 1 minus epsilon you will receive it correctly; similarly, if you transmit 1, with probability 1 minus epsilon you will receive it correctly. And the crossover probability of error is given by epsilon. So this is a symmetric channel, and it is a binary channel because it has binary input and binary output; hence it is known as the binary symmetric channel. Another channel, which is very commonly used to model packet data networks, is what is known as the binary erasure channel. It has binary inputs, 0 and 1, and at the output either you receive whatever has been transmitted correctly, or whatever you have transmitted is erased. The symbol that you see is how we denote an erased bit. So with probability 1 minus delta you receive the bit correctly, and with probability delta the bit is erased, or lost.
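The two channel models just described can be sketched in a few lines of code. This is my own illustration, not part of the lecture; the function names and the choice of Python's random module are assumptions made for the sketch.

```python
import random

def bsc(bits, eps):
    """Binary symmetric channel: flip each bit with crossover probability eps."""
    return [b ^ 1 if random.random() < eps else b for b in bits]

def bec(bits, delta):
    """Binary erasure channel: erase each bit (None) with probability delta."""
    return [None if random.random() < delta else b for b in bits]

random.seed(1)
sent = [0, 1, 1, 0, 1, 0, 0, 1]
print(bsc(sent, 0.2))  # a few bits may be flipped
print(bec(sent, 0.2))  # a few bits may be replaced by None (erased)
```

Note the asymmetry in what the receiver knows: on the erasure channel the receiver sees exactly which positions were lost, while on the symmetric channel a flipped bit looks like any other bit.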
In his landmark paper in 1948, Shannon introduced the concept of channel capacity: the maximum rate at which we can communicate over a communication link. Channel capacity is defined as the maximum amount of information that can be conveyed from the input to the output of a channel. Shannon also proved in his theorem that there exist channel coding schemes that can achieve arbitrarily low probability of error, as long as the transmission rate is below the channel capacity. So Shannon showed that good channel codes exist: as long as the transmission rate is below the channel capacity, we can achieve arbitrarily low probability of error. For example, if the channel capacity of a particular link is two gigabits per second, then we should be able to communicate at any rate up to two gigabits per second over this link and still achieve a very low probability of error at the decoder. Now, in this theorem Shannon did not specify how to design such codes with rate close to capacity, and that is where error control coding comes into the picture.
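For the binary symmetric channel discussed above, the capacity has a well-known closed form, C = 1 − H(ε), where H is the binary entropy function. The lecture does not derive this; the following is my own sketch of the computation.

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(eps):
    """Capacity of a binary symmetric channel with crossover probability eps."""
    return 1.0 - binary_entropy(eps)

print(bsc_capacity(0.0))  # noiseless channel: 1 bit per channel use
print(bsc_capacity(0.5))  # pure-noise channel: capacity 0
```

Intuitively, at ε = 0 every channel use carries one full bit, and at ε = 0.5 the output is statistically independent of the input, so nothing can get through.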
So the goal of error correcting coding theory is to design codes which can achieve this limit. Shannon showed that as long as we design error correcting codes with rate less than the channel capacity, we can achieve arbitrarily low probability of error. So the goal of coding theory, or error control coding, is to design error correcting codes with rates as close to capacity as possible which can achieve arbitrarily low probability of error. Shannon did not specify how to design such codes; that is where coding theory comes into the picture. So how do we design an error correcting code? An error correcting code is designed by adding some redundant bits to the message bits. The message bits we call information bits, and the additional redundant bits that we add are known as parity bits. So an error correcting code is designed by properly adding some redundant bits to your message bits and then sending this coded message over a communication link. We use these additional redundant bits to detect and correct errors. Error correcting codes have a wide range of applications in digital communications and storage.
I have listed a few of the uses. For example, when we send a signal over a communication link, it gets corrupted by noise, fading, and interference; to combat these, we use error correcting codes to correct the errors. Similarly, in digital storage systems we want to correct errors caused by storage media defects, dust particles, and radiation, and we use error correcting codes there as well. So let us take a very simple example of an error correcting code and illustrate how we can use error correcting codes to detect and correct errors. The example I am going to show you now is what is known as a repetition code. The rate is defined as the ratio of the number of information bits to the number of coded bits. So when I say a rate one half code, I mean there is one information bit, or one message bit, and there are two coded bits. In a binary repetition code, we simply repeat whatever the information bit is. So a binary rate one half repetition code would look something like this: for 0 we would transmit 0 0, and for 1 we would transmit 1 1. Similarly, for a rate one third repetition code, for 0 we will transmit 0 0 0, and for 1 we will transmit 1 1 1.
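The rate one half and rate one third encodings above generalize directly: a rate 1/n repetition code just repeats each message bit n times. A minimal sketch (the function name is my own, not from the lecture):

```python
def repetition_encode(bits, n):
    """Encode with a rate 1/n binary repetition code: repeat each bit n times."""
    out = []
    for b in bits:
        out.extend([b] * n)
    return out

print(repetition_encode([0, 1], 2))  # rate 1/2: [0, 0, 1, 1]
print(repetition_encode([0, 1], 3))  # rate 1/3: [0, 0, 0, 1, 1, 1]
```

The rate is 1/n because for every information bit the encoder emits n coded bits, n − 1 of which are parity (redundant) bits.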
So you can see that in the rate one half code we are adding one additional redundant bit, and in the rate one third code we are adding two additional redundant bits. How we are going to make use of these redundant bits for error correction and error detection will be explained in the next slide. So let's take this example. Let's say I want to transmit this set of bits: 0 0 1 1 0 1. Now if I use a rate one half repetition code, what would be my coded bits? The first 0 I would encode as 0 0, the next 0 as 0 0, the first 1 as 1 1, the next 1 as 1 1, the 0 as 0 0, and the final 1 as 1 1. So this will be my coded sequence. Now here I have illustrated a case where there is a single error. This was the sequence which was transmitted; think of this bit sequence as having been transmitted over a binary symmetric channel, and this is the sequence I received. You can see this is a case of a single error: the first bit, which was transmitted as 0, was received as 1. Now, how can I use the error correcting code to detect the error? Since it is a rate one half code, for each information bit I am sending two coded bits. So at the receiver I will look at two bits at a time. First I will look at this 1 0.
Now, since it is a repetition code, what do you expect? I expect that both bits should be the same, right? But here the first bit is 1 and the second bit is 0, which means there is a transmission error. So I am able to detect a single error. How? Because these bits were encoded using a rate half repetition code, I expect these two bits to be the same. So I know there is an error in the first pair, but I don't know whether the transmitted bit was 0 or 1. Let's look at the other received pairs: 0 0 will be decoded as 0; 1 1 will be decoded as 1; 1 1 will be decoded as 1, there is no ambiguity; 0 0 will be decoded as 0, again no ambiguity; and 1 1 will be decoded as 1. So we can see that using one additional redundant bit we are able to detect a single error. Now let's look at an example with a double error. Let's say the first and second bits are received in error. So what we have received is 1 1 0 0 1 1 1 1 0 0 1 1; the first two received bits, 1 1, are in error. Now let's see whether we can detect this using the rate half repetition code. Again we will follow the same logic for decoding: we look at two bits at a time. The first two bits are 1 1; since these bits are the same, we will decode them as 1. But what was transmitted was 0. So we can see that this is a case of an undetected error.
Even though these two bits were received in error, the decoder is not able to detect this error. This kind of thing happens when the error pattern is such that it transforms one codeword into another codeword. Since 1 1 is a valid codeword for 1, the decoder is not able to detect this error. So the rate one half repetition code is able to detect single errors, but it is not able to detect double errors. Now let's look at whether it can correct any errors. Consider again the example where we had a single error. What we received was 1 0, so we were able to detect that there was an error. But can we correct it? No, we cannot. Why? Because if we are talking about a binary symmetric channel, it is equally likely that the 1 we received was transmitted as 0, or that the 0 we received was transmitted as 1. We do not know whether the first bit got flipped to 1 instead of 0, or the second bit got flipped to 0 instead of 1. So this rate half repetition code cannot correct any errors; it can only detect single errors. Now look at another example. This time we are considering a rate one third repetition code. What does a rate one third repetition code mean? Again we are considering a binary code, so for each bit we are adding two parity bits, repeating the same bit.
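The pairwise check just described — look at two bits at a time and flag any pair whose bits disagree — can be sketched as follows. This is my own illustration; the helper name and the use of None to mark an uncorrectable pair are assumptions of the sketch.

```python
def detect_rate_half(received):
    """Rate 1/2 repetition decoder: return (decoded bits, detected error pairs).

    A disagreeing pair signals a detected error. Its decoded bit is None,
    because on a binary symmetric channel a 0->1 flip and a 1->0 flip
    are equally likely, so the error cannot be corrected.
    """
    decoded, errors = [], []
    for i in range(0, len(received), 2):
        a, b = received[i], received[i + 1]
        if a == b:
            decoded.append(a)
        else:
            decoded.append(None)   # detected but uncorrectable
            errors.append(i // 2)  # index of the offending pair
    return decoded, errors

# The lecture's example: 0 0 1 1 0 1 encoded, first bit flipped by the channel.
rx = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
print(detect_rate_half(rx))  # ([None, 0, 1, 1, 0, 1], [0])
```

Running the same function on the double-error sequence 1 1 0 0 1 1 1 1 0 0 1 1 returns no detected errors and decodes the first bit as 1, reproducing the undetected-error case from the lecture.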
So for 0 we will code it as 0 0 0, and for 1 we will code it as 1 1 1. Again we consider the same example of transmitting 0 0 1 1 0 1. We are transmitting the same information sequence, but this time we are encoding it using the rate one third repetition code, so this 0 will be encoded as 0 0 0, and similarly 1 will be encoded as 1 1 1. So this information sequence will be coded in this particular way. Now we will again look at what happens when there are errors in the received sequence, as we did for the rate one half repetition code. Let's again look at the single error scenario. Let's say the first bit was received in error, so instead of 0 we received 1. Now let's see whether our rate one third repetition code can detect a single error. Since it is a rate one third code, for each information bit we are sending three coded bits, so at the decoder we are going to look at three bits at a time. We first look at these three bits, 1 0 0. Now what do you expect? Since we are using a repetition code, we expect all three bits to be the same. But here they are not, because what we have received is 1 0 0. What does that mean? It means there is a transmission error.
So we are able to detect a single error using the rate one third repetition code. If we look at the other sets of bits: 0 0 0, there is no error here; 1 1 1, no error; and again no error in the remaining groups. So the rate one half repetition code was able to detect a single error, and the rate one third code is also able to detect a single error. Now let's look at double errors. Consider the scenario where the first two bits are received in error. So we have 1 1, and the rest of the sequence is as before. Can we detect the double error? We again look at three bits at a time. The first three bits are 1 1 0. We can see that there is an error. Why? Because this should have been either 0 0 0 or 1 1 1, but we received 1 1 0. So using the rate one third repetition code we are able to detect double errors as well, which we were not able to do with the rate one half repetition code. Now let's look at the error correcting capability of this code. Let's go back and look at the single error situation. When a single error happens, one of the bits gets flipped; say we receive 1 0 0. Now can we correct single errors? And the answer in this case is yes. Why? Because if you look at these three bits, two bits are 0 and one bit is 1. So what are the possible transmitted codewords?
It could be either 0 0 0 or 1 1 1, and it is more likely that one bit got flipped than two. It is more likely that a single 0 got flipped to 1 rather than two 1s getting flipped to 0. So using majority logic, since the majority of the bits are 0, we will decode this as 0. So we can see that this rate one third repetition code can correct a single error; this was not possible for the rate one half repetition code. Now, can it correct double errors? If you look at this 1 1 0, the decoder will think that the last bit got flipped from 1 to 0, so it will decode this as 1. So it cannot correct double errors. To summarize, we saw that the rate one half repetition code can detect a single error but cannot correct it, and it cannot detect double errors, whereas the rate one third repetition code can detect and correct a single error, and it can detect a double error but cannot correct double errors. So why is this code better than the first code? It certainly has better error detecting and correcting capability than the rate one half code. This we will discuss in subsequent lectures.
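The majority logic rule just described can be sketched for the rate one third repetition code. Again this is my own illustration of the decoding rule from the lecture, not code from the course.

```python
def majority_decode_rate_third(received):
    """Rate 1/3 repetition decoder: majority vote over each group of 3 bits."""
    decoded = []
    for i in range(0, len(received), 3):
        group = received[i:i + 3]
        # At least two 1s in the group -> decode 1, otherwise decode 0.
        decoded.append(1 if sum(group) >= 2 else 0)
    return decoded

# Single error in the first group: corrected by the majority vote.
print(majority_decode_rate_third([1, 0, 0, 0, 0, 0, 1, 1, 1]))  # [0, 0, 1]

# Double error in the first group: miscorrected to 1, as in the lecture.
print(majority_decode_rate_third([1, 1, 0, 0, 0, 0, 1, 1, 1]))  # [1, 0, 1]
```

The two calls reproduce the lecture's conclusion: a single flipped bit is outvoted by the two correct copies, while two flipped bits outvote the one correct copy and produce a decoding error.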
It has to do with the distance separation between the codewords. You can see that in this particular code we are using two redundant bits, while in the previous case we were using just one redundant bit. So the error correcting and error detecting capability of a code is dependent on the distance properties of the code, and we will talk about that in subsequent lectures. To summarize, I think this quotation by Solomon Golomb rightly captures what error correcting codes are all about, so I will read it: "A message of content and clarity has got to be quite a rarity. To combat the terror of serious error, use bits of appropriate parity." Thank you.