In the last few lectures we have been talking about the basic statistical metrics for analyzing large, complex networks, and we were introduced to different centrality measures, PageRank, etcetera. In this set of lectures, from now onwards, we will mostly talk about social network principles, and one of the first social network principles that we will discuss is called assortativity, or homophily.
The idea is somewhat like this: given a social network, rich people tend to make friends with other rich people. That is the idea of homophily or assortativity. In other words, you can say that like goes with like, so the rich go with the rich and possibly the poor go with the poor.
If you look at the slides, the first example that we have here is a friendship network from one of the US high schools, and what you see is that there are three types of nodes in this network. The black ones correspond to black people in the school, the white ones correspond to white people in the school, and the grey ones are the others, people who could not be classified into either of these groups. An edge in this network indicates a friendship relationship. What you observe here immediately is the existence of homophily.
That is, blacks are mostly friends with other blacks, whereas whites are mostly friends with other whites, and there are hardly any connections between blacks and whites. This is the idea of homophily that we will build upon from now. So this is one very interesting example.
Another example is a study that was conducted in San Francisco, where 1958 couples were interviewed. These couples classified themselves into four basic classes: blacks, whites, Hispanics (people of Spanish or Portuguese origin), and others, who could not be classified into any of these three. People from all these origins were interviewed, and the question they were asked was about their sexual partnerships: given a chance, what type of sexual partner would they prefer? The matrix in the slide shows what their preferences are in general.
One immediate observation from this slide, or specifically from this table, is that the cells on the diagonal are the heaviest. This again indicates that people of the same type prefer partners from their own class: blacks want more partners from the black class itself, Hispanics want partners mostly from the Hispanic class, whites tend to choose partners mostly from the white class, and the others from the other class.
You see that this is a very typical example: in the majority of social networks, mostly those built on this idea of friendship, this particular phenomenon is very, very prevalent. The idea, to reiterate, is that if people are from the same class, then partnerships or friendships between them are more probable than between people from two different classes. This could also be thought of as people tending to go with other like people, so rich people tend to go with rich people; you can interpret it in various different forms, but the basic idea is this.
Some more examples: if you now look at this slide, you see two typical examples. The left-hand-side network, as it shows, is much more assortative than the right-hand-side network; the right-hand-side network, on the other hand, is less homophilic. In general, networks of this type are termed disassortative networks, that is, rich do not go with rich; rich usually tend to go with poor. As we saw long back in one of our introductory lectures, in biological networks you see such disassortative networks. Even in technological networks like router networks you see this sort of disassortative structure, where many small computers, many mini computers, connect to a large router. So that is mostly a disassortative network, whereas social networks or friendship networks are mostly assortative in nature.
That is, popular people tend to make friends with other popular people, and rich people tend to make friends with other rich people; that is the basic idea. Given this observation from various social networks, the immediate question is: how can we have a quantitative measure of this phenomenon?
Now we will see how to quantify assortativity. The quantification goes like this. Consider a node of degree $k$. The assortativity can be expressed by a quantity called $k_{nn}$, the nearest neighbor degree, which is defined as $k_{nn}(k) = \sum_{k'} k' \, p(k' \mid k)$, where $p(k' \mid k)$ is nothing but the conditional probability that a node of degree $k$ ends up connecting with another node of degree $k'$. So this is the conditional probability that a node with degree $k$ will connect, at its other end, with a node of degree $k'$. This conditional probability, multiplied by the degree $k'$ of the node at the other end and summed over all such $k'$, defines the nearest neighbor degree.
The idea is very simple. Say we have a node $x$. We look at the degree of the node $x$, and we also look at the degree of each of the neighbors of $x$. Let us draw it like this. Suppose you have a node $x$ here, and say $x$ has $k$ neighbors $N_1, N_2, N_3$, and so on, up to $k$ such neighbors.
Then what we do is check the degree of each of the individual neighbors, and we find the average of the degrees of all the neighbors; that is the nearest neighbor degree. So you have two things: the degree of the node $x$ and the average degree of its neighbors. On the x axis you have the degree of the node $x$, and on the y axis you have the average degree of the neighbors of $x$.
Now, if this plot is a scatter diagram that mostly concentrates on the $y = x$ line, then there is a high probability that nodes of similar degree are friends in the social network. What you see is that my degree, which is $k$, is highly correlated with the average degree of my neighbors; that is the idea. If my degree is highly correlated with the degree of my neighbors, then it is an assortative network, and such correlation is reflected by a scatter diagram that is concentrated close to the $y = x$ line on this plot.
So this is how you identify whether a particular graph is assortative or not: you plot the degree of each node and the average degree of the neighbors of that node on the x and y axes, and you look at how well the points concentrate around the $y = x$ line.
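The procedure just described can be sketched in a few lines of Python. This is only a minimal illustration, assuming an undirected graph given as a list of edges; the function names and the toy star graph are hypothetical and not taken from the slides.

```python
from collections import defaultdict


def degree_vs_neighbor_degree(edges):
    """Return {node: (degree, average degree of its neighbors)} for an undirected graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    points = {}
    for x, nbrs in neighbors.items():
        k = len(nbrs)  # degree of x
        avg = sum(len(neighbors[n]) for n in nbrs) / k  # average neighbor degree
        points[x] = (k, avg)
    return points


def knn_by_degree(points):
    """Average the neighbor degree over all nodes that share the same degree k."""
    buckets = defaultdict(list)
    for k, avg in points.values():
        buckets[k].append(avg)
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}


# Hypothetical toy graph: one hub connected to four leaves (a star).
star = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d")]
points = degree_vs_neighbor_degree(star)
knn = knn_by_degree(points)
print(knn)  # {1: 4.0, 4: 1.0}
```

In this toy star the low-degree nodes have a high-degree neighbor and the hub has only low-degree neighbors, so the points fall away from the $y = x$ line rather than along it.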
For instance, suppose you have a similar plot where you have $k$ and the average degree of the neighbors of $x$; $k$ is basically the degree of $x$. If the scatter plot goes just the opposite way, like this, then you have good reason to believe that this particular network is disassortative in nature. So on one side, when the two are highly positively correlated, the network is assortative in nature; on the other side, if they are negatively correlated, the network is thought to be disassortative.
Just to make things clearer, look at this diagram. In each of these plots, what we have plotted on the x axis is the degree of every node $x$ in the network, and on the y axis we have plotted the average degree of the neighbors of each such node $x$; that generates the plot. Looking at this plot and doing this fit, this correlation analysis, you can immediately say whether it is an assortative network or a disassortative network.
In order to quantify this idea further, the concept of mixing was introduced. To understand what exactly we mean by mixing in a social network, we will look at the same example that I showed you earlier: the partnership choices of these 4 categories of inhabitants of San Francisco, Black, Hispanic, White and the Others.
Now, we will translate the table that we see here into a normalized version. If you look at the slides, in this normalized version each cell of the table is divided by the sum of all the entries across all the cells of the table. That means the individual cells now add up to 1; if you look at the slides, that is why we write $\sum_{ij} e_{ij} = 1$. Again, even by looking at this table you can very nicely observe that the diagonal is heavy.
Now, if we have a matrix where the diagonal contains all the values and there is no value in any other cell, that would mean that the network is perfectly assortative. So blacks only go with blacks, Hispanics only go with Hispanics, others only go with others, and whites only go with whites. In such a case the diagonal will have all the concentration of the values, while the other cells will be empty, or 0.
In order to quantify this notion, we will define the assortative mixing coefficient $r$. On one extreme you have $e_{ii}$, the diagonal elements; $\sum_i e_{ii}$ is the sum of all the diagonal elements, so you are counting the total density of the diagonal by $\sum_i e_{ii}$.
Now you subtract from there the part of the diagonal that you would get anyway if people chose their partners independently, at random; that is quantified by $\sum_i a_i b_i$. As we have shown in the table, $a_i$ is the sum of the elements in row $i$, whereas $b_i$ (or $b_j$) is the sum of the elements in the corresponding column. So $a_i b_i$ is the chance that two people who pair up independently both fall in group $i$, and that is what you discount from the total volume. Basically, you take the actual partnership that you are getting from the data, minus the part that you could have observed just by random chance. This is similar to the idea of defining a correlation coefficient in statistics.
The basic idea, if I reiterate, is that by looking at the data you can estimate the probability of a pair of people grouping for sexual partnership. Black going with black, white going with white: this fraction is counted in $\sum_i e_{ii}$. From there we remove the part that could be observed just by random chance, which is $\sum_i a_i b_i$. Now, this is normalized. As I said, perfect assortativity would be when $\sum_i e_{ii}$ is 1 and everything else is 0. So that extreme is 1, and the extreme value of $\sum_i e_{ii} - \sum_i a_i b_i$ is $1 - \sum_i a_i b_i$, which is what we divide by.
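As a rough sketch of this computation, assuming the table is given as a square list of raw pair counts, the steps (normalize, sum the diagonal, subtract the chance term, divide by the extreme value) could look like the following. The function name and the two small tables are hypothetical and are not the San Francisco data.

```python
def mixing_coefficient(counts):
    """Observed diagonal mass minus chance, divided by its largest possible value."""
    n = len(counts)
    total = sum(sum(row) for row in counts)
    # Normalize so that all cells e[i][j] add up to 1.
    e = [[c / total for c in row] for row in counts]
    a = [sum(e[i]) for i in range(n)]  # row sums a_i
    b = [sum(e[i][j] for i in range(n)) for j in range(n)]  # column sums b_j
    diagonal = sum(e[i][i] for i in range(n))  # sum of e_ii
    chance = sum(a[i] * b[i] for i in range(n))  # sum of a_i * b_i
    # Note: undefined (division by zero) if everyone belongs to a single group.
    return (diagonal - chance) / (1 - chance)


# Hypothetical tables, for illustration only.
only_diagonal = [[10, 0], [0, 30]]  # every pair stays within its own group
only_off_diagonal = [[0, 20], [20, 0]]  # every pair crosses groups

print(mixing_coefficient(only_diagonal))  # 1.0
print(mixing_coefficient(only_off_diagonal))  # -1.0
```

A table whose cells are all equal gives 0 under this sketch, since the observed diagonal then matches what independent pairing would produce.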
This fraction, $r = \frac{\sum_i e_{ii} - \sum_i a_i b_i}{1 - \sum_i a_i b_i}$, is what we call the mixing coefficient. Basically, you find out the chance that blacks go with blacks, whites go with whites, and so on, and you sum up all these fractions; you discount from this value the probability that two such people pair up just by chance; and then you normalize the whole metric by $1 - \sum_i a_i b_i$, where 1 is the extreme value of $\sum_i e_{ii}$, the maximum that you can achieve.
So if it is a perfectly assortative network, this mixing coefficient will be 1. For perfectly assortative networks $\sum_i e_{ii}$ is equal to 1, as we said, which implies $r = \frac{1 - \sum_i a_i b_i}{1 - \sum_i a_i b_i} = 1$. So for perfectly assortative graphs we will have a mixing coefficient equal to 1. However, if it is a perfectly disassortative network, then $\sum_i e_{ii}$ will be 0, so $r = \frac{-\sum_i a_i b_i}{1 - \sum_i a_i b_i}$, and we will have a negative mixing coefficient.
Now that we have a little bit of an idea about homophily, or assortativity, we will look at another very interesting concept called signed graphs.
Basically, this is a formal graph structure through which, for instance in a social network or a friendship network, you can express both friendship and enmity. Some examples are given here in the slides, so look at these graphs. A plus sign on an edge of the network indicates friendship, whereas a minus sign indicates enmity. If two nodes are connected by an edge that has a plus sign, there is a friendship relationship between those two nodes; if two nodes are connected by a negative edge, the relationship is one of enmity. And I have an interesting question: given our online class, it would be a nice exercise to see how it looks in terms of such a signed graph. Do you really have enemies here?
Once we have this concept of signed graphs, the first thing that people were interested in studying was the idea of balance. This idea borrows from traditional balance theory. Look at the graphs given here. Take the first graph, the one marked (a). There are three nodes, $u$, $v$ and $w$; it is basically a triangle. All the edges are marked plus, so everybody is a friend of everybody else in this network. This is a very stable configuration. Now let us take the second example.
The second example is a bit tricky. Here there are two nodes who are friends with each other, and both of them share an enmity relationship with the third node. This is again a possible configuration, because two friends might have a common enemy; in general that is also a stable configuration. The third one is where two of the edges are positive, whereas the third edge is negative. This is a rare case. And the fourth case is impossible: three enemies in a triangle is a completely impossible case.
Given these examples of triangles, we can also imagine cases of 4-cycles: what should signed graphs on 4 nodes look like? Some of the stable configurations are shown here. These are two friends, each of whom has an enemy; or these are two friends, and then there are two enemies on the other side. So these are some of the stable configurations that you observe here.
In general, the idea is that you should have an even number of negative signs on the edges of the cycle. Only if the number of negative edges is even is the configuration stable; otherwise it is not.
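The even-number rule stated above can be checked mechanically. The sketch below assumes a cycle is given simply as the list of signs on its edges, written as "+" or "-"; the function name and this representation are illustrative choices, not something from the slides.

```python
def is_stable_cycle(signs):
    """A cycle is stable when it contains an even number of negative edges."""
    negatives = sum(1 for s in signs if s == "-")
    return negatives % 2 == 0


# The four triangle cases discussed above, written as edge signs.
triangles = {
    "a": ["+", "+", "+"],  # three mutual friends
    "b": ["+", "-", "-"],  # two friends with a common enemy
    "c": ["+", "+", "-"],  # two positive edges, one negative
    "d": ["-", "-", "-"],  # three mutual enemies
}
for name, signs in triangles.items():
    print(name, is_stable_cycle(signs))
# a True
# b True
# c False
# d False

# A 4-cycle with two negative edges also passes the check.
print(is_stable_cycle(["+", "-", "+", "-"]))  # True
```

Counting negative edges modulo 2 is equivalent to multiplying the signs around the cycle and asking whether the product is positive.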
For instance, in this particular example you see that (c) and (d) have an odd number of negative edges, and that is why they are unstable configurations. Whereas in this case, the 4-cycles, you have only even numbers of negative edges, which is why both of them are stable configurations.
The next idea that we will talk about is structural holes. This is again a very interesting idea, and we have already looked at a sort of quantification of it in one of our previous lectures, when we discussed betweenness centrality. Basically, structural holes are nothing but nodes, or social actors, in a network who act like brokers: they transmit relevant information from one part of the network to the other part, so they behave like information brokers.
For instance, let us take these examples here. Structural holes, as it reads, separate non-redundant sources of information, sources that are additive and not overlapping. Say you have two parts of the network, one here and the other here. This green node here is denoted as a structural hole, because we are imagining that the information that is there within this particular group of members of the social network is very different from the information that is stored here in this other group; that is why we call this particular node a structural hole.
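One rough way to flag such a broker node, and this is an assumption for illustration rather than a definition from the lecture, is to check whether removing the node splits the rest of the network into separate parts. The helper names and the toy graphs below are hypothetical, and this purely structural test says nothing about whether the information on the two sides actually differs.

```python
from collections import defaultdict


def components_without(edges, removed):
    """Connected components of the undirected graph after deleting one node."""
    adj = defaultdict(set)
    for u, v in edges:
        if removed in (u, v):
            continue  # drop every edge that touches the removed node
        adj[u].add(v)
        adj[v].add(u)
    # Keep nodes even if their only edges went through the removed node.
    nodes = {n for edge in edges for n in edge if n != removed}
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first traversal from start
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        components.append(comp)
    return components


def is_broker(edges, node):
    """True if removing the node leaves more than one connected component."""
    return len(components_without(edges, node)) > 1


# Hypothetical toy network: two tight groups joined only through node "g".
bridge = [("a", "b"), ("b", "c"), ("a", "c"),
          ("d", "e"), ("e", "f"), ("d", "f"),
          ("c", "g"), ("g", "d")]
# Hypothetical cohesive group: everybody is connected to everybody.
clique = [("p", "q"), ("p", "r"), ("p", "s"), ("q", "r"), ("q", "s"), ("r", "s")]

print(is_broker(bridge, "g"))  # True
print(is_broker(bridge, "a"))  # False
print(is_broker(clique, "p"))  # False
```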
We have a word of caution here; there are two things that one needs to be careful about. First, a cohesive group cannot have a structural hole. For instance, if you have a network like this, it is a very cohesive network, and since it is very cohesive, everybody has a similar piece of information. That is why nobody in this network actually qualifies as a structural hole.
Second, there is a related concept of equivalence. Suppose you have a node here, and on the two sides of it you have nodes that have equivalent information; then this is also not an example of a structural hole. For instance, this node, or this node, or this node, or this node: none of them are structural holes. Here also, this particular black node is not a structural hole, because it does not enjoy any extra information beyond what either of these green nodes has. However, if you have the same black node here, but the nodes on the left-hand side have a very different set of information from the nodes on the right-hand side, then this particular node actually qualifies as a structural hole.
So, we will stop here.