1
00:00:20,670 --> 00:00:29,920
So, in the last few lectures we have been talking
about the basic statistical metrics for analyzing
2
00:00:29,920 --> 00:00:37,090
large, complex networks. And we have
been introduced to different centrality measures,
3
00:00:37,090 --> 00:00:43,290
PageRank, etcetera.
In this set of lectures, from now onwards,
4
00:00:43,290 --> 00:00:48,940
we will mostly talk about Social Network Principles,
and one of the first social network principles
5
00:00:48,940 --> 00:01:01,600
that we will discuss is called Assortativity
or Homophily.
6
00:01:07,620 --> 00:01:17,590
The idea is somewhat like this: given
a social network, rich people tend to
7
00:01:17,590 --> 00:01:23,850
make friends with other rich people. So
this is the idea of Homophily or Assortativity.
8
00:01:23,850 --> 00:01:29,960
In other words, you can say that like
goes with like, so the rich go with the
9
00:01:29,960 --> 00:01:33,200
rich, and possibly the poor go with the poor.
10
00:01:34,200 --> 00:01:39,450
So if you look at the slides, the first
example that we have here is a friendship
11
00:01:39,450 --> 00:01:46,829
network from one of the US high schools,
and what you see here is that there are three types
12
00:01:46,829 --> 00:01:55,759
of nodes in this network. The black ones correspond
to black people in the school, the white ones
13
00:01:55,759 --> 00:02:00,119
correspond to white people in the school,
and the grey ones are the others, that is,
14
00:02:00,119 --> 00:02:05,420
people who cannot be classified into
either of these groups. And an edge in this
15
00:02:05,420 --> 00:02:12,010
network indicates a friendship relationship.
So, what you observe here immediately is that
16
00:02:12,010 --> 00:02:18,690
there is evidence of homophily. That is,
blacks are more often friends with
17
00:02:18,690 --> 00:02:24,180
other blacks, whereas whites are more often friends
with other whites, and there are hardly any
18
00:02:24,180 --> 00:02:29,400
connections between blacks and whites. This
is the idea of homophily that we will build
19
00:02:29,400 --> 00:02:34,100
upon from now on. So this is one very
interesting example.
20
00:02:34,730 --> 00:02:43,660
Another example was this experiment that was
conducted in San Francisco, where there
21
00:02:43,660 --> 00:02:52,010
were 1958 couples who were interviewed. Now,
these couples classified themselves
22
00:02:52,010 --> 00:02:59,720
into four basic classes: the blacks, the whites,
the Hispanics, or the people of Spanish or Portuguese
23
00:02:59,720 --> 00:03:06,800
origin, and others, who could not be classified
into any of these three. And people from all
24
00:03:06,800 --> 00:03:13,709
these origins were interviewed, and the question
they were asked was about their sexual partnership.
25
00:03:13,709 --> 00:03:20,989
So, given a chance, what type of sexual partner
would they prefer? And this particular matrix
26
00:03:20,989 --> 00:03:26,569
in the slide shows you what their
preferences are in general.
27
00:03:26,900 --> 00:03:33,910
So, one of the immediate observations from
this particular slide or specifically this
28
00:03:33,910 --> 00:03:43,980
particular table is that the cells that are
on the diagonal are the heaviest. Which again
29
00:03:43,980 --> 00:03:50,980
indicates that people who are of the same
type prefer to have a partner from their
30
00:03:50,980 --> 00:03:57,459
own class: blacks want to have more
partners from the black class itself, Hispanics
31
00:03:57,459 --> 00:04:04,829
want to have partners mostly from the
Hispanic class itself, whites tend to choose
32
00:04:04,829 --> 00:04:10,689
partners mostly from the white class and the
others from the other class. You see that
33
00:04:10,689 --> 00:04:17,690
this is a very typical example; in the majority
of social networks, mostly those built
34
00:04:17,690 --> 00:04:24,730
on this idea of friendship, this particular
phenomenon is very, very prevalent.
35
00:04:24,730 --> 00:04:35,000
So, the idea, to reiterate, is that
if there are people from the same class, then
36
00:04:35,000 --> 00:04:40,540
partnerships or friendships between them are
more probable than between people from two different
37
00:04:40,540 --> 00:04:47,600
classes. This idea could also be thought of
as: people tend to go with other like
38
00:04:47,600 --> 00:04:52,500
people, so rich people tend to go with rich
people; you can interpret it in various
39
00:04:52,500 --> 00:05:01,320
different forms. But the basic idea is this.
So, some more examples: if you now look at
40
00:05:01,320 --> 00:05:08,040
this slide, you see two typical examples. The
left-hand-side network, as shown, is much
41
00:05:08,040 --> 00:05:13,020
more assortative than the right-hand-side
network; the right-hand-side network, on the
42
00:05:13,020 --> 00:05:20,870
other hand, is less homophilic. And in general
this type of network is termed a disassortative
43
00:05:20,870 --> 00:05:28,180
network, that is, the rich do not go with the rich;
the rich usually tend to go with the poor. As we have
44
00:05:28,180 --> 00:05:32,840
seen long back in one of our introductory
lectures, in biological networks you see such
45
00:05:32,840 --> 00:05:37,920
disassortative networks. Even in technological
networks like router networks you see this
46
00:05:37,920 --> 00:05:43,130
sort of disassortative network, where
many small computers, many minicomputers,
47
00:05:43,130 --> 00:05:47,090
connect to a large router. So it is mostly
a disassortative network.
48
00:05:47,350 --> 00:05:53,110
Whereas social networks or friendship networks
are mostly assortative in nature. That is,
49
00:05:53,110 --> 00:05:56,840
popular people tend to go with other popular
people, tend to make friends with other
50
00:05:56,840 --> 00:06:01,520
popular people; rich people tend to make friends
with other rich people; that is the basic
51
00:06:01,520 --> 00:06:07,910
idea. Now given this observation from various
social networks, the immediate question is
52
00:06:07,910 --> 00:06:13,610
this: how can we have a quantitative measure
of this particular phenomenon?
53
00:06:14,620 --> 00:06:35,420
Now we will see how to quantify assortativity.
The quantification goes like this: let us
54
00:06:35,420 --> 00:06:55,580
consider a node of degree k. Now
the assortativity can be expressed by a quantity
55
00:06:55,580 --> 00:07:15,960
called $k_{nn}$, that is, the average nearest-neighbor degree.
And this is defined as follows:
56
00:07:15,960 --> 00:07:36,800
$k_{nn}(k) = \sum_{k'} k' \, P(k' \mid k)$, where $P(k' \mid k)$
is nothing but the conditional probability
57
00:07:36,800 --> 00:08:06,270
that a node of degree k ends up connecting
with another node of degree k prime. So this
58
00:08:06,270 --> 00:08:12,960
is the conditional probability that a node
with degree k will connect at its other end
59
00:08:12,960 --> 00:08:18,500
with the node of degree k prime.
So, this conditional probability multiplied
60
00:08:18,500 --> 00:08:26,450
by the node degree at the other end, the k
prime, summed over all such
61
00:08:26,450 --> 00:08:33,300
k primes defines the nearest neighbor degree.
The idea is very, very simple. So what you
62
00:08:33,300 --> 00:08:41,820
do is this: let us say that we have a node x. Now
we look at the degree of the node x; we also
63
00:08:41,820 --> 00:08:47,690
look at the degree of each of the neighbors of
x. Let us draw it like this.
64
00:08:47,690 --> 00:09:05,680
Suppose you have a node x here. Now say x
has k neighbors N1, N2, N3, and so on, up to
65
00:09:05,680 --> 00:09:19,420
k neighbors. Then what we do is we see what
is the degree of each of the individual neighbors;
66
00:09:19,420 --> 00:09:29,680
we check the degree of each of the individual
neighbors. We find an average of the degree
67
00:09:29,680 --> 00:09:35,810
of the neighbors; that is the nearest-neighbor
degree. We find an average of the degree
68
00:09:35,810 --> 00:09:51,079
of all the neighbors, so you have the degree
of the node x and the average degree of the
69
00:09:51,079 --> 00:09:59,139
neighbors. You have these two things, on the
x axis you have the degree of the node x and
70
00:09:59,139 --> 00:10:04,339
on the y axis you have the average degree
of the neighbors of x.
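The two quantities just described, the degree of a node and the average degree of its neighbors, can be computed with a small sketch like the one below. This is only an illustration under stated assumptions: the graph is stored as a plain dictionary of neighbor lists, the function names are made up for this example, and the star-shaped toy graph is hypothetical, not taken from the slides. Averaging the per-node values over all nodes of the same degree k gives the same number as the sum over k prime of k prime times P(k prime | k).

```python
from collections import defaultdict

def average_neighbor_degree(adj):
    """Map each node to the mean degree of its neighbors (isolated nodes are skipped)."""
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    return {
        node: sum(degree[n] for n in nbrs) / len(nbrs)
        for node, nbrs in adj.items()
        if nbrs
    }

def knn_by_degree(adj):
    """Average the per-node values over all nodes that share the same degree k."""
    per_node = average_neighbor_degree(adj)
    buckets = defaultdict(list)
    for node, value in per_node.items():
        buckets[len(adj[node])].append(value)
    return {k: sum(vals) / len(vals) for k, vals in buckets.items()}

# Hypothetical toy graph: a star with hub "x" and three leaves.
star = {"x": ["a", "b", "c"], "a": ["x"], "b": ["x"], "c": ["x"]}

print(average_neighbor_degree(star))  # hub sees degree-1 neighbors, leaves see the hub
print(knn_by_degree(star))            # one value per distinct degree k
```

Each (degree, average neighbor degree) pair is one point of the scatter diagram, with the degree on the x axis and the average neighbor degree on the y axis.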
71
00:10:04,450 --> 00:10:18,899
Now, if this plot is a scatter diagram which
mostly concentrates around the y equals x line,
72
00:10:18,899 --> 00:10:28,470
then there is a high probability that nodes
of similar degree
73
00:10:28,470 --> 00:10:36,579
are friends in a social network. So what you
see is that my degree, which is k, is highly
74
00:10:36,579 --> 00:10:43,410
correlated with the average degree of my neighbors,
so that is the idea. If my degree is highly
75
00:10:43,410 --> 00:10:48,839
correlated with the degree of my neighbors,
then it is an assortative network.
76
00:10:48,839 --> 00:10:55,709
And such correlation is reflected by a scatter
diagram which is concentrated close to the
77
00:10:55,709 --> 00:11:03,389
y equals x line on this particular plot. So
this is how you identify it: by plotting
78
00:11:03,389 --> 00:11:09,410
the degree of a node and the
average degree of the neighbors of that node
79
00:11:09,410 --> 00:11:15,990
by plotting them on the x and the y axis and
looking at how well they concentrate around
80
00:11:15,990 --> 00:11:23,249
the y equals x line, you identify whether a
particular graph is assortative or not.
81
00:11:23,249 --> 00:11:32,259
For instance, if you have a similar plot where
you have the k and the average degree of the
82
00:11:32,259 --> 00:11:38,819
neighbors of x, where k is the degree
of x.
83
00:11:45,519 --> 00:11:57,310
And if you have a scatter plot which is just
the opposite, like this, then you have good reason
84
00:11:57,310 --> 00:12:06,019
to believe that this particular network is
disassortative in nature. So, on one side, when
85
00:12:06,019 --> 00:12:12,600
it is highly positively correlated, it is assortative
in nature; on the other side, if it is negatively
86
00:12:12,600 --> 00:12:16,620
correlated, then the network is considered to
be disassortative.
87
00:12:17,720 --> 00:12:25,829
Just to make things clearer, look at this
diagram. In each of these plots, what we have
88
00:12:25,829 --> 00:12:31,389
plotted on the x axis are the degree values
of all the nodes. So, for every node x in the
89
00:12:31,389 --> 00:12:36,959
network, we have plotted its degree
on the x axis, and on the y axis we
90
00:12:36,959 --> 00:12:44,160
have plotted the average degree of the neighbors
of each such node x in the network that generates
91
00:12:44,160 --> 00:12:47,680
this plot.
Now looking at this plot and having this fit
92
00:12:47,680 --> 00:12:53,600
and this correlation analysis, you can
immediately say whether this is an assortative
93
00:12:53,600 --> 00:12:56,220
network or a disassortative network.
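As a rough companion to reading the scatter plot by eye, one could compute the correlation between each node's degree and the average degree of its neighbors. The sketch below is a minimal illustration, assuming a dictionary-of-lists graph and a hand-written Pearson correlation; the helper names and the star-shaped toy graph are hypothetical and do not come from the lecture slides.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation; undefined (division by zero) if either list is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def degree_vs_neighbor_degree(adj):
    """Return the scatter-plot points: node degrees (x) and average neighbor degrees (y)."""
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    xs, ys = [], []
    for node, nbrs in adj.items():
        if nbrs:  # skip isolated nodes, which have no neighbors to average over
            xs.append(degree[node])
            ys.append(sum(degree[n] for n in nbrs) / len(nbrs))
    return xs, ys

# Hypothetical toy graph: many leaves attached to one hub, like small
# computers connected to a large router.
star = {"x": ["a", "b", "c"], "a": ["x"], "b": ["x"], "c": ["x"]}

xs, ys = degree_vs_neighbor_degree(star)
print(pearson(xs, ys))  # negative for this hub-and-leaves graph
```

A value near +1 corresponds to points lying along a rising line (assortative), and a negative value to the opposite, falling pattern (disassortative).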
94
00:12:58,670 --> 00:13:14,680
Now, in order to quantify this
idea further, the concept of mixing was introduced.
95
00:13:14,680 --> 00:13:20,970
Now in order to understand what exactly we
mean by mixing in a social network we will
96
00:13:20,970 --> 00:13:29,370
look at the same example that I showed you
last time: the example of the partnership
97
00:13:29,370 --> 00:13:38,499
choices of these 4 categories of inhabitants
of San Francisco: Black, Hispanic, White and
98
00:13:38,499 --> 00:13:46,209
the Others. Now, from this particular table
that we see here, we will translate this table
99
00:13:46,209 --> 00:13:50,469
into a more normalized version.
100
00:13:51,100 --> 00:13:58,589
So what we will do in this normalized version,
if you look at the slides, each cell of this
101
00:13:58,589 --> 00:14:07,269
table is normalized by the sum of all the
entries across all the cells of the table.
102
00:14:07,269 --> 00:14:13,939
Basically, you divide each cell by the sum
of all the entries in all the cells of this
103
00:14:13,939 --> 00:14:24,749
table. That means, now the sum of all the
individual cells will add up to 1. If you look
104
00:14:24,749 --> 00:14:33,259
at the slides, that is why we write here
$\sum_{ij} e_{ij} = 1$. Now again, even
105
00:14:33,259 --> 00:14:39,399
by looking at this table you can very nicely
observe that the diagonal is heavy.
106
00:14:39,399 --> 00:14:48,470
Now, if we have a matrix where the diagonal
contains all the values and there are no other
107
00:14:48,470 --> 00:14:55,389
values in any other cell, then that would
mean that the network is perfectly assortative,
108
00:14:55,389 --> 00:15:01,899
that is, there is no value in any
cell except on the diagonal. So, blacks only
109
00:15:01,899 --> 00:15:07,529
go with blacks, Hispanics only go with Hispanics,
others only go with others, and whites only
110
00:15:07,529 --> 00:15:12,529
go with whites. Then, in such a case, only the
diagonal will have all the concentration of
111
00:15:12,529 --> 00:15:18,069
the values while the other cells will be empty
or 0.
112
00:15:19,540 --> 00:15:26,759
In order to quantify this particular notion
we will define the assortative mixing coefficient
113
00:15:26,759 --> 00:15:36,399
r. On one side you have $\sum_i e_{ii}$, where $e_{ii}$ is
a diagonal element; this is the sum of all
114
00:15:36,399 --> 00:15:46,360
the diagonal elements so you are counting
the total density of the diagonal elements
115
00:15:46,360 --> 00:15:57,639
by $\sum_i e_{ii}$. Now you are subtracting from
this the chance that a black pairs with a black,
116
00:15:57,639 --> 00:16:06,490
or any group pairs with itself, purely by
random chance, independently; that is quantified
117
00:16:06,490 --> 00:16:16,699
by this $\sum_i a_i b_i$. As you see here, as
we have shown in the table, $a_i$ is the sum
118
00:16:16,699 --> 00:16:22,499
of the elements along the rows, whereas $b_i$
or $b_j$ is the sum of the elements along the
119
00:16:22,499 --> 00:16:28,730
columns.
Basically, this is the chance that, if
120
00:16:28,730 --> 00:16:38,430
the two ends were chosen independently, two nodes from the
same group pair up for sexual partnership, so
121
00:16:38,430 --> 00:16:47,230
that you discount from the total volume. Basically,
you take the actual partnership fraction that
122
00:16:47,230 --> 00:16:53,990
you are getting from the data minus the part
that you could have observed just by random
123
00:16:53,990 --> 00:17:01,851
chance. This is similar to the idea of defining
a correlation coefficient in statistics. The basic
124
00:17:01,851 --> 00:17:10,350
idea, if I reiterate, is that by looking at
the data you can
125
00:17:10,350 --> 00:17:20,429
estimate the probability of a pair of people
grouping for sexual partnership. This is, say,
126
00:17:20,429 --> 00:17:26,850
black going with black, white going with white,
this value is counted, or this fraction is
127
00:17:26,850 --> 00:17:34,630
counted, in $\sum_i e_{ii}$. And from there we
remove the part which could be observed just
128
00:17:34,630 --> 00:17:43,290
by random chance, which is $\sum_i a_i b_i$.
Now, this is normalized. As I said, perfect
129
00:17:43,290 --> 00:17:49,050
assortativity would be when $\sum_i e_{ii}$
is 1 and everything else is 0; that is perfect
130
00:17:49,050 --> 00:17:56,080
assortativity. So that extreme is 1, and the
extreme value of $\sum_i e_{ii} - \sum_i a_i b_i$ is
131
00:17:56,080 --> 00:18:03,320
$1 - \sum_i a_i b_i$. So we divide by that
extreme value. This fraction is
132
00:18:03,320 --> 00:18:10,230
what we call the mixing coefficient.
Basically, what you see is you find out what
133
00:18:10,230 --> 00:18:16,870
is the probability or what is the chance that
blacks go with blacks, whites go with whites,
134
00:18:16,870 --> 00:18:21,580
and you sum up all these counts, minus what
is the probability that you see by chance
135
00:18:21,580 --> 00:18:29,290
that two people pair up that is what you discount
from this value and then you normalize this
136
00:18:29,290 --> 00:18:36,110
whole quantity by $1 - \sum_i a_i b_i$,
where 1 is the extreme value of $\sum_i e_{ii}$, that
137
00:18:36,110 --> 00:18:42,270
is the maximum that you can achieve. So if
it is a perfectly assortative network then
138
00:18:42,270 --> 00:18:48,950
what will happen is this mixing coefficient
again will be 1.
139
00:18:49,360 --> 00:19:00,190
Because in such a case you have
$r = \frac{\sum_i e_{ii} - \sum_i a_i b_i}{1 - \sum_i a_i b_i}$.
140
00:19:00,190 --> 00:19:21,390
Now, for perfectly assortative
networks, $\sum_i e_{ii}$ will be equal to 1, as
141
00:19:21,390 --> 00:19:31,570
we said. That implies
$r = \frac{1 - \sum_i a_i b_i}{1 - \sum_i a_i b_i}$,
142
00:19:31,570 --> 00:19:38,160
which is equal to 1. So, for perfectly
assortative graphs we will have a mixing coefficient
143
00:19:38,160 --> 00:19:46,380
equal to 1. However, if it is a disassortative
network, then $\sum_i e_{ii}$ will be 0 and we will have
144
00:19:46,380 --> 00:19:50,580
a negative mixing coefficient value.
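The whole recipe for the mixing coefficient (normalize the table, sum the diagonal, subtract the chance term, divide by one minus the chance term) can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function name is made up, and the two small count tables are hypothetical examples chosen to show the two extremes, not the San Francisco data.

```python
def mixing_coefficient(counts):
    """Compute r = (sum_i e_ii - sum_i a_i b_i) / (1 - sum_i a_i b_i).

    `counts` is a square table of raw pair counts between groups.
    The result is undefined (division by zero) if all pairs fall in one group.
    """
    total = sum(sum(row) for row in counts)
    e = [[c / total for c in row] for row in counts]        # entries now sum to 1
    n = len(e)
    a = [sum(e[i]) for i in range(n)]                       # row sums a_i
    b = [sum(e[i][j] for i in range(n)) for j in range(n)]  # column sums b_j
    diagonal = sum(e[i][i] for i in range(n))               # observed same-group fraction
    chance = sum(a[i] * b[i] for i in range(n))             # same-group fraction expected by chance
    return (diagonal - chance) / (1 - chance)

# Hypothetical tables, only to show the two extremes.
perfectly_assortative = [[10, 0], [0, 30]]   # everything on the diagonal
perfectly_disassortative = [[0, 5], [5, 0]]  # nothing on the diagonal

print(mixing_coefficient(perfectly_assortative))
print(mixing_coefficient(perfectly_disassortative))
```

With everything on the diagonal the coefficient comes out as 1, and with an empty diagonal it is negative, matching the two cases discussed above.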
145
00:19:54,960 --> 00:20:04,491
Now, after we have got a little
bit of an idea about homophily or assortativity,
146
00:20:04,491 --> 00:20:10,451
we will now look into another very interesting
concept called Signed Graphs.
147
00:20:11,720 --> 00:20:24,750
Basically, this is a formal structure of graphs
through which you can express, for instance
148
00:20:24,750 --> 00:20:30,310
in a social network or in a friendship network
you can express both friendship as well as
149
00:20:30,310 --> 00:21:01,550
enmity. A network by which one can express
both friendship and enmity. Some of the
150
00:21:01,550 --> 00:21:09,440
examples are the ones that we have given here in
the slides, so look at these graphs. So, a
151
00:21:09,440 --> 00:21:18,290
plus sign on an edge of this network would
indicate friendship, whereas a minus sign
152
00:21:18,290 --> 00:21:25,890
would indicate enmity. If two nodes are connected
by an edge which has a plus sign, then it is
153
00:21:25,890 --> 00:21:28,350
a friendship relationship between these two
nodes.
154
00:21:28,670 --> 00:21:35,510
However, if two nodes are connected by a negative
edge, then this relationship is an enmity relationship.
155
00:21:35,510 --> 00:21:42,390
And I have this interesting question: given
our online class, it would be a nice exercise
156
00:21:42,390 --> 00:21:53,750
to see how it would look in terms of a
signed graph. Do you really have enemies here?
157
00:21:53,750 --> 00:21:59,630
Once we have this concept of signed graphs, the
first thing that people were interested in
158
00:21:59,630 --> 00:22:15,920
studying was this idea of balance. Basically,
this idea borrows from the traditional balance
159
00:22:15,920 --> 00:22:22,790
theory. Look at the graphs that are given
here. For instance, the first graph, the graph
160
00:22:22,790 --> 00:22:29,930
marked as (a). You see there are three nodes,
u, v and w; it is basically a triangle. Now
161
00:22:29,930 --> 00:22:38,980
all the edges are marked as plus. So everybody
is a friend of everybody else in this network.
162
00:22:38,980 --> 00:22:45,690
This is a very stable configuration.
Now let us take the second example. The second
163
00:22:45,690 --> 00:22:52,780
example is a bit tricky. So what you have
here is that there are two nodes who are friends
164
00:22:52,780 --> 00:23:00,060
with each other, and both of them actually
share an enmity relationship with the third
165
00:23:00,060 --> 00:23:07,730
node. This is again a possible configuration
because two friends might have a common enemy
166
00:23:07,730 --> 00:23:18,270
in general; that is also a stable configuration.
The third one is where you have two
167
00:23:18,270 --> 00:23:26,750
edges which are positive, whereas the third
edge between these two is negative. This is
168
00:23:26,750 --> 00:23:34,070
a rare case. And the fourth case is impossible.
That there are three enemies in a triangle
169
00:23:34,070 --> 00:23:36,750
is a completely impossible case.
170
00:23:40,940 --> 00:23:50,120
Now, given these examples of triangles, we can
also imagine cases of 4-cycles. Now, how
171
00:23:50,120 --> 00:23:59,410
should signed graphs of 4 nodes taken together
look? Some examples are here. So, some
172
00:23:59,410 --> 00:24:07,000
of the stable configurations are shown here.
These are the 2 friends each of each are enemies
173
00:24:07,000 --> 00:24:12,570
or these are the two friends and then there
are 2 enemies on the other side. So these
174
00:24:12,570 --> 00:24:16,170
are some of the stable configurations that
you observe here.
175
00:24:16,790 --> 00:24:25,650
In general, the idea is that you should have
an even number of negative signs in each cycle of the graph;
176
00:24:25,650 --> 00:24:30,440
unless you have an even number of negative
signs in the graph the configuration is not
177
00:24:30,440 --> 00:24:38,890
stable. Only if you have an even number of
negative signs on the edges of a graph
178
00:24:38,890 --> 00:24:44,150
is your configuration a stable configuration.
For instance, in this particular example you
179
00:24:44,150 --> 00:24:54,330
see that (c) and (d) have an odd number of negative
edges, and that is why these are unstable
180
00:24:54,330 --> 00:25:02,060
configurations, whereas in this particular
case, the 4-cycles, you have only an even number
181
00:25:02,060 --> 00:25:05,560
of negative edges that is why both of them
are stable configurations.
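The even-number-of-negative-edges rule is easy to check mechanically. The sketch below is a minimal illustration, assuming each cycle is given simply as a list of edge signs (+1 for friendship, -1 for enmity); the function name is made up, and the four sign lists are written out from the verbal description of triangles (a) to (d) above.

```python
def is_stable_cycle(signs):
    """A signed cycle is stable (balanced) when it has an even number of negative edges."""
    negatives = sum(1 for s in signs if s < 0)
    return negatives % 2 == 0

# The four triangles as described in the lecture: +1 is friendship, -1 is enmity.
triangles = {
    "a": [+1, +1, +1],  # everybody is a friend of everybody else
    "b": [+1, -1, -1],  # two friends with a common enemy
    "c": [+1, +1, -1],  # two positive edges, one negative edge
    "d": [-1, -1, -1],  # three mutual enemies
}

for label, signs in triangles.items():
    print(label, is_stable_cycle(signs))
```

The same function works unchanged for 4-cycles, since it only counts the negative signs around the cycle.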
182
00:25:06,680 --> 00:25:30,570
So, the next idea that we will talk about
is Structural Holes. This is also again a
183
00:25:30,570 --> 00:25:36,760
very interesting idea and we have already
looked into some sort of a quantification
184
00:25:36,760 --> 00:25:44,010
of this idea in one of our previous lectures
when we discussed betweenness centrality.
185
00:25:44,010 --> 00:25:52,220
Basically, structural holes are nothing but
nodes or social actors in a network who are
186
00:25:52,220 --> 00:26:01,790
like brokers: they actually transmit
relevant information from one part of the
187
00:26:01,790 --> 00:26:09,490
network to the other part; they actually behave
like information brokers.
188
00:26:09,490 --> 00:26:19,470
For instance, let us take these examples here.
So, structural holes, as the slide reads,
189
00:26:19,470 --> 00:26:28,450
separate non-redundant sources of information,
sources that are additive and not overlapping.
190
00:26:28,450 --> 00:27:11,520
If you have two parts of the network say,
one here and the other here. Basically, this
191
00:27:11,520 --> 00:27:19,700
green node here is denoted as a structural
hole, because we are imagining that the information
192
00:27:19,700 --> 00:27:26,750
that is there within this particular group
of members in the social network is very different
193
00:27:26,750 --> 00:27:32,730
from the information that is stored here in
this other group of nodes, so that is why we
194
00:27:32,730 --> 00:27:43,990
call this particular node a Structural Hole.
We have a word of caution here; there are
195
00:27:43,990 --> 00:27:51,770
two things that one needs to be careful about.
A cohesive group cannot have a structural
196
00:27:51,770 --> 00:28:06,490
hole. For instance, if you have a network like
this, this is a very cohesive network. And since
197
00:28:06,490 --> 00:28:11,690
this is a very cohesive network everybody
has a similar piece of information; that is why
198
00:28:11,690 --> 00:28:18,810
nobody in this network actually qualifies
as a structural hole. Similarly, if there
199
00:28:18,810 --> 00:28:29,990
is another similar concept of equivalence.
For instance, suppose you have a node here
200
00:28:29,990 --> 00:28:42,370
and on two sides of it you have nodes that
have equivalent information; then also
201
00:28:42,370 --> 00:28:48,980
this is not an example of a structural hole.
For instance say, this node or this node or
202
00:28:48,980 --> 00:28:56,200
this node or this node none of them are structural
holes. Here also this particular black node
203
00:28:56,200 --> 00:29:03,400
is not a structural hole, because it does
not enjoy any extra information beyond that of
204
00:29:03,400 --> 00:29:10,781
either of these green nodes. However, if you
have a case where you have the same black
205
00:29:10,781 --> 00:29:20,821
node here, but then the nodes on the left
hand side have a very different set of information
206
00:29:22,350 --> 00:29:37,110
from the nodes on the right hand side. Then
this particular node actually qualifies as
207
00:29:37,110 --> 00:29:38,250
a structural hole.
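One crude way to make the broker picture concrete is to ask whether a node's neighbors fall apart into separate groups once that node is removed. The sketch below does exactly that, and it rests on a strong assumption that is not from the lecture: being disconnected from each other is used as a stand-in for "having very different information". The function names and both toy graphs are hypothetical. Note that this proxy cannot detect the equivalence case above, where the two sides are separate but still hold the same information.

```python
def neighbor_groups_without(adj, node):
    """Group the neighbors of `node` by connected component once `node` is removed."""
    seen, groups = set(), []
    for start in adj[node]:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first search that never walks through `node`
            cur = stack.pop()
            if cur in component or cur == node:
                continue
            component.add(cur)
            stack.extend(adj[cur])
        seen |= component
        groups.append(sorted(n for n in adj[node] if n in component))
    return groups

def looks_like_broker(adj, node):
    """True when removing `node` leaves its neighbors in more than one group."""
    return len(neighbor_groups_without(adj, node)) > 1

# Hypothetical graph: node "g" sits between two otherwise separate pairs.
bridge = {
    "g": ["a", "b", "c", "d"],
    "a": ["b", "g"], "b": ["a", "g"],
    "c": ["d", "g"], "d": ["c", "g"],
}
# Hypothetical cohesive group: everybody is connected to everybody else.
clique = {
    "p": ["q", "r", "s"], "q": ["p", "r", "s"],
    "r": ["p", "q", "s"], "s": ["p", "q", "r"],
}

print(looks_like_broker(bridge, "g"))  # neighbors split into two groups
print(looks_like_broker(clique, "p"))  # cohesive group, no split
```

This matches the word of caution about cohesive groups: in the fully connected example nobody qualifies, because removing any one node leaves everyone else still connected.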
208
00:29:38,960 --> 00:29:40,960
So, we will stop here.