So, welcome back to this session on network analysis. We will continue with the second part of network analysis, and today we will start with the second metric that we will get ourselves introduced to; this is called the clustering coefficient.

The clustering coefficient actually comes from the transitivity property of a network. The idea is very simple: if there are two nodes in the network, say A and B, having a mutual friend, say C, then there is a high probability that A and B are also friends in the social network. So, if A and C are friends, and B and C are friends, then there is a high probability that A and B are also friends. This is what is called the famous transitivity property of a network.

Now, this property comes from the observation that, in many cases, this probability becomes higher and higher in situations like the one that I am drawing: suppose there are many such common friends. The larger the number of mutual friends between A and B, the higher is the probability that A and B are themselves friends.
So, this idea is called the transitivity of a particular social network, and this transitivity can be quantified using something called the clustering coefficient, which we will define now.

The clustering coefficient can be measured for each vertex; it is a vertex-centric measure. So, you can measure this value for each individual vertex separately. Suppose you want to measure the clustering coefficient of a node, say x; you basically look into the neighbors of x. Say x has four neighbors: 1, 2, 3 and 4. You look into the total number of connections between the neighbors of x, and this is expressed as a fraction of the total number of possible connections among all the neighbors of x. That is basically the definition of the clustering coefficient.

So, if you are interested in finding out the clustering coefficient of x, you look into the neighbors of x. Here, for this particular example, you have four neighbors: 1, 2, 3 and 4. So, to write down the clustering coefficient of the node x in this particular example: among the neighbors, how many edges are there?
There are 4 edges: 1, 2, 3 and 4. So, you have 4 divided by the total number of possible edges. And what is the total number of possible edges? Since there are four neighbor nodes, there should be 4C2 = 6 possible edges. This ratio, 4/6, is what defines the clustering coefficient of a particular node.

In this way, you can measure the clustering coefficient for each individual node in the network. And the clustering coefficient of the whole network is just the average of the individual clustering coefficients of the different nodes.

So, it is a node-centric property. First of all, for each individual node you measure the clustering coefficient; basically, for each individual node you estimate the extent of cliquishness, that is, how complete or how cliquish the neighborhood of that particular node is, expressed as a fraction of the maximum cliquishness possible. Then, for the whole network, you just average out the individual clustering coefficients of the different nodes. That is how you define the clustering coefficient of a particular network.

Now, for this part of the lecture I have shown you three examples.
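As a small sketch of the computation just described (plain Python; the adjacency list below is a hypothetical graph matching the 4-neighbor example on the board, not taken from the slides), the per-node clustering coefficient and the network average could be computed like this:

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Local clustering coefficient of v: edges among v's neighbors
    divided by the number of possible edges among them."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than 2 neighbors: no pair to connect
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)  # k*(k-1)/2 is kC2

def average_clustering(adj):
    """Clustering coefficient of the whole network: the average
    of the per-node values."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

# Hypothetical graph matching the lecture's example: x has 4
# neighbors (a, b, c, d) with 4 edges among them (a-b, b-c, c-d, d-a).
adj = {
    "x": {"a", "b", "c", "d"},
    "a": {"x", "b", "d"},
    "b": {"x", "a", "c"},
    "c": {"x", "b", "d"},
    "d": {"x", "a", "c"},
}
print(clustering_coefficient(adj, "x"))  # 4 / 6 ≈ 0.667
```

Here x's four neighbors have 4 edges among them out of 4C2 = 6 possible, so its clustering coefficient comes out to 4/6.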
If you see in the slides, there are three different examples. In the first example, the clustering coefficient of the black node in the center should be equal to zero, because no edge exists between its neighbors. Similarly, the clustering coefficient in the second example is 0.5, because there are three edges between the neighbors out of a possible 4C2 = 6 edges; that is how you measure it. And in the third example, the clustering coefficient is one, because all the nodes that are neighbors of the black node are completely connected. That is how you actually measure the clustering coefficient for each individual node.

In the next slide, I have defined the formula more precisely; it is the same thing that I have written down in the text.

Now, the interesting part is that once you have this quantity, you can measure the extent of transitivity of different real-world networks. That is actually what people were trying to do around 2001 and 2002. What they found is that if you look into networks such as the World Wide Web, the Internet, the co-authorship network, the metabolic network and the C. elegans network, most of them have clustering coefficients somewhere between 0 and 1. But what is interesting to note is that networks like the co-authorship network have a very high clustering coefficient; these are mostly social networks, which have very high transitivity. The idea is that if there are a lot of mutual coauthors between a pair of scientists, then it is highly likely that those two scientists have themselves coauthored a paper. That probability is quite high, roughly close to 0.43 as per the table shown in the slides.

Another interesting example is the actor network. Since this is an interesting example, I would like to take it up and discuss it a bit more. This example draws from the complex system of movies and actors. First of all, you conceive of a bipartite network, where one partition consists of the movie nodes and the other partition consists of the actor nodes. So, you have some movies and you have some actors, and you draw an edge between a movie and an actor: say the name of a movie is M1 and the name of an actor is A1; you draw an edge between M1 and A1 if A1 is in the cast of the movie M1.
So, this is how you construct a movie-actor bipartite network. Now, from this bipartite network you can construct something called one-mode projections. These one-mode projections can be drawn on the actor nodes as well as on the movie nodes.

So, what would you do to draw the one-mode projection? Let us define that in the next part. From the movie-actor bipartite network, you construct a one-mode projection, say on the actor nodes, as follows. Suppose there are two actors, A1 and A2. You draw an edge between A1 and A2 if A1 and A2 have co-acted in a movie, say M. Now, this graph, as you can imagine, can be a weighted graph: if A1 and A2 have co-acted in w movies, then the weight of the edge between them is w. In this way, you can construct an actor-actor one-mode projection.

Now, think of the movie-actor bipartite network: when you are constructing this projection on the actor nodes, what is happening? If you think carefully, what is happening is that for every individual movie there is a clique imposed on this network.
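The one-mode projection just described can be sketched in a few lines of plain Python (the movie and actor names below are made-up placeholders, not from the slides):

```python
from itertools import combinations
from collections import defaultdict

def actor_projection(casts):
    """Weighted actor-actor one-mode projection of a movie-actor
    bipartite network. `casts` maps each movie to its cast; the
    weight of an actor-actor edge is the number of movies the
    pair co-acted in."""
    weight = defaultdict(int)
    for movie, actors in casts.items():
        # Every movie imposes a clique on its cast: each pair of
        # actors in that cast gets (another unit of) an edge.
        for a1, a2 in combinations(sorted(actors), 2):
            weight[(a1, a2)] += 1
    return dict(weight)

# Hypothetical cast lists (illustrative names only).
casts = {
    "M1": ["A1", "A2", "A3"],
    "M2": ["A1", "A2"],
}
print(actor_projection(casts))
# {('A1', 'A2'): 2, ('A1', 'A3'): 1, ('A2', 'A3'): 1}
```

Note how each movie contributes a clique among its cast, which is exactly why these projections tend to be so cliquish.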
So, all the actors in that movie will have an edge between them, because they have acted in that movie; that forms a clique of actors for that particular movie. And for every individual such movie, you are imposing a clique on this network. That is why these kinds of graphs are usually pretty cliquish. And that is why, as I have shown you in the slides, the clustering coefficient of this particular network, the actor-actor network, is 0.79, which is very high. So, the probability that there exists an edge between a pair of actors who have co-acted with many other actors is very high. That is what I wanted to point out.

The next thing that we will talk about is the concept of the small world and the six degrees of separation. This idea is again very interesting. Look at the slides: suppose I say that all late registrants in the complex networks course will be given ten marks bonus. How fast do you think this information will spread? In general, I would imagine that it would spread very fast among all the registrants of this course. If that is the case, then the question is: why does it spread so fast, and why does it spread at all in the first place?

To show that this spread actually takes place, the famous scientist Milgram designed a very interesting experiment, which came to be called the six degrees of separation experiment. What he actually did was something like this. Travers and Milgram, in 1969, designed this classic study in social science; actually, it is one of the most interesting classic studies in social science. Suppose you have a source, say Kharagpur, and there is a stockbroker in Kharagpur who wants to send a message or a packet to some stockbroker in Kolkata. Now, what this Kharagpur stockbroker does is forward this letter; he does not post it through the usual postal system. Instead, he passes this letter, which carries the name, the address, etcetera of the destination stockbroker, to one of his friends, whom he believes would know the Kolkata stockbroker.
He feels that this friend might know the Kolkata stockbroker and would pass it on to him. Now, this friend will again pass it to some other friend of his, and that friend will again pass it to some other friend of his; and in this way the chain might at some point complete, reaching the destination, or otherwise it may fail.

What came out of the experiment they performed was something like this: if the letter was to reach the target, it would reach in roughly 6 steps. But, as you can understand, this is a stochastic experiment, and things happen by chance most of the time. What happened was that 64 out of 296 chains actually reached the target. They initiated the experiment with 296 chains, but only 64 of them survived; many of them dropped midway. Those that survived all reached the destination in an average of 5.2 steps. So, basically, every time the letter reached the destination, it reached within roughly 6 hops. This was a very interesting observation that Travers and Milgram made, and it is known as the six degrees of separation.
And this is actually a very popular trivia question in various quiz contests.

Now, you might think: is all this happening by magic? The point is that if you try to explain it intuitively, a reasonable explanation exists; nothing is happening by magic. In the next slide, we will try to discuss this intuitive explanation.

So, imagine that you are a person in the network, and think of your Facebook friends. How many Facebook friends do you roughly have? I would imagine somewhere between 500 and 1000. Let us be much more moderate, and take it that you have, say, some 100 friends; and I am assuming that these 100 friends are not connected among themselves. Actually, you have more than that; but say there are at least 100 who are in no way connected among themselves. Now, these 100 friends, by the same hypothesis, will again have another 100 friends each, and this will continue. Now, look at this tree of acquaintances or friendships: a completely disparate tree, where nodes only know their parent, but not one another.
So, if you grow this tree, then what you see is that, with 100 friends at each level, within 6 to 7 steps we have covered the entire population of the earth. Of course, I understand that there could be transitivity triangles. But what I am assuming is that, while many of you will have 500, 800 or 1000 friends, among these there are at least 100 friends who are not connected among themselves. That is the assumption, and it is the assumption at every step. If you go by this simple assumption, which I would say is a realistic assumption, then at every step you spawn a bunch of new nodes; and if you have spawned until, say, 6 steps, you have covered at least a few billion nodes. So, that is why Milgram and his colleagues could have each of the letters reach its destination within roughly 6 steps. That is the basic idea of the six degrees of separation.

Then, after this, we will start with another very important family of quantitative metrics that people quite often use. These are called centrality metrics.
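Before moving on, the friendship-tree argument can be checked with a few lines of plain Python (the branching factor of 100 comes from the lecture; the world-population figure is a rough round assumption):

```python
def reach(branching, steps):
    """Number of people reachable in a friendship tree where every
    person contributes `branching` new, mutually unconnected friends."""
    total = 0
    frontier = 1  # start from a single person
    for _ in range(steps):
        frontier *= branching
        total += frontier
    return total

world_population = 8_000_000_000  # rough round figure (assumption)
print(reach(100, 5))  # 10_101_010_100: already past the world population
```

With 100 new acquaintances per level, the tree passes the whole population within a handful of steps, which is the intuition behind the roughly 6-step chains.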
So, as the name suggests, you can immediately understand that centrality would indicate how central a node is in a given network. You want to estimate how central a node is in a given network; and, as I write in the slides, you want to estimate centrality in terms of four quantities, the 4 Ps: prestige, prominence, importance and power. These are the 4 Ps, or the four pillars, of measuring centrality. So, you want to identify prestigious nodes, prominent nodes, powerful nodes and important nodes; all of them mean more or less the same thing, if you think carefully. So, we have to have quantitative measures to identify such central nodes.

And this is actually important in various ranking experiments, because in such experiments what you want is to rank the nodes according to their centrality values: the nodes with higher centrality values go at the top, and the nodes with lower centrality values come at the bottom. This ranking is actually necessary for various other applications, as we shall see in later parts of the course.

So now, if we are okay with this philosophical definition of centrality, we have to find quantitative measures to identify the centrality of the nodes in a network.
Now, one of the simplest measures that comes to one's mind, in the context of a network, would perhaps be the degree of a node. That is what I write in the next slide, and this is termed the degree centrality of a node. The definition is very simple. Suppose there is a node, say A, and it has degree d. Then the degree centrality is equal to d / (N - 1), where N is the number of nodes in the network.

So, what are you basically doing? You are expressing the degree of a node as a fraction of the maximum possible degree of a node. In all, a node can be connected to N - 1 other nodes, if there are N nodes in the system. So, if the node has degree d, then you express this d as a fraction of N - 1. That is the degree centrality of a node in a network.

Now, given this definition of degree centrality, we can also define something called the centralization of a network, which is the variance in degree centrality. For each node you can estimate the degree centrality; then, using these values, you can measure their variance.
This variance is called the centralization of the network. If this variance is high, it means that there are some nodes with very high degree centrality compared to the others. If it is low, then all the degree centralities are mostly similar to each other. An example of a case where the degree centrality is skewed, that is, where the centralization is high, is a star network: the inner black node will have a high degree centrality, whereas the outer nodes will have low degree centrality. So, a star network has a high centralization, whereas a line network will have a low centralization, because all of its nodes have almost equal degree centrality.

So, with that we end this part of the lecture. Next day we will start with centrality.
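To close, here is a plain-Python sketch of degree centrality and the variance-based centralization as defined above (the 5-node star and line graphs are built by hand for illustration):

```python
def degree_centrality(adj):
    """Degree centrality of every node: degree / (N - 1)."""
    n = len(adj)
    return {v: len(neigh) / (n - 1) for v, neigh in adj.items()}

def centralization(adj):
    """Centralization as the variance of the degree centralities
    (the definition used in this lecture)."""
    dc = list(degree_centrality(adj).values())
    mean = sum(dc) / len(dc)
    return sum((x - mean) ** 2 for x in dc) / len(dc)

# Star on 5 nodes: center 0 connected to 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
# Line (path) on 5 nodes: 0-1-2-3-4.
line = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

print(degree_centrality(star)[0])  # 1.0: the hub touches every other node
print(centralization(star) > centralization(line))  # True
```

As the lecture argues, the star's hub-versus-leaves split gives a high variance, while the line's nearly uniform degrees give a low one.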