1
00:00:18,900 --> 00:00:23,539
So, welcome back to this session on network
analysis.
2
00:00:23,539 --> 00:00:26,719
So, we will continue with the second part
of network analysis.
3
00:00:26,820 --> 00:00:33,300
And, today we will start with the second metric
that we will get ourselves introduced to,
4
00:00:33,300 --> 00:00:35,400
and this is called clustering coefficient.
5
00:00:42,860 --> 00:00:50,060
So, the clustering coefficient actually comes
from the transitivity property of a network;
6
00:00:53,520 --> 00:01:01,810
transitivity property of a particular network.
7
00:01:02,700 --> 00:01:11,600
So, the idea is very simple: if there are
two nodes in the network, say
8
00:01:11,600 --> 00:01:20,130
A and B having a mutual friend say C, then
there is a high probability that A and B are
9
00:01:20,130 --> 00:01:22,759
also friends in the social network.
10
00:01:22,759 --> 00:01:34,859
So, if A and C are friends, and B and C are friends,
then there is a high probability.
11
00:01:34,859 --> 00:01:45,340
High probability that A and B are also friends.
12
00:01:47,040 --> 00:01:53,040
So, this is what is called the famous transitivity
property of a network.
13
00:01:53,580 --> 00:02:00,599
Now, this property actually comes from an
observation that holds in many cases.
14
00:02:00,599 --> 00:02:07,459
So, this probability actually will be higher
and higher, if you encounter situations like
15
00:02:07,459 --> 00:02:10,039
the one that I am drawing.
16
00:02:11,800 --> 00:02:24,240
Suppose there are many such common friends,
the larger the number of common friends between
17
00:02:24,250 --> 00:02:26,650
A and B, the higher is the probability.
18
00:02:26,650 --> 00:02:37,950
So, the higher is the probability that
A and B are friends.
19
00:02:38,819 --> 00:02:47,579
So, the larger is the number of mutual friends
between A and B, the higher is the probability
20
00:02:47,579 --> 00:02:49,559
that A and B are themselves friends.
21
00:02:49,960 --> 00:02:55,800
So, this idea actually is called the transitivity
of a particular social network.
22
00:02:56,590 --> 00:03:03,810
And, this transitivity can be quantified using
something called the clustering coefficient.
23
00:03:04,260 --> 00:03:05,960
We will define it now.
24
00:03:09,980 --> 00:03:14,280
So, the clustering coefficient can be measured
for each vertex.
25
00:03:14,280 --> 00:03:15,680
So, it is a vertex centric measure.
26
00:03:16,050 --> 00:03:30,470
So, the clustering coefficient
is a vertex-centric measure.
27
00:03:30,960 --> 00:03:36,040
So, you can measure this value for each individual
vertex separately.
28
00:03:36,620 --> 00:03:44,170
So what you do is, suppose you want to measure
the clustering coefficient of a node say x
29
00:03:44,170 --> 00:03:48,840
here, you basically look into the neighbors
of x.
30
00:03:48,840 --> 00:03:54,189
So, say x has 1, 2, 3 and 4 neighbors.
31
00:03:54,189 --> 00:04:00,599
You look into the total number of connections
between the neighbors of x.
32
00:04:00,599 --> 00:04:19,150
So, in this particular example, the number of connections
between the neighbors of x is expressed
33
00:04:19,150 --> 00:04:44,910
as a fraction of the total number of possible
connections among all the neighbors of x.
34
00:04:45,380 --> 00:04:48,380
So, this is basically the definition of clustering
coefficient.
35
00:04:48,900 --> 00:04:53,759
So, you look into the neighbors of a particular
node x.
36
00:04:53,759 --> 00:04:57,620
So, if you are interested to find out the
clustering coefficient of x, you look into
37
00:04:57,620 --> 00:04:58,620
the neighbors of x.
38
00:04:58,789 --> 00:05:04,330
Here for this particular example, you have
1, 2, 3 and 4 neighbors.
39
00:05:04,330 --> 00:05:13,810
So, you can write the clustering coefficient
of the node x in this particular example is,
40
00:05:13,810 --> 00:05:16,310
among the neighbors, how many edges are there?
41
00:05:16,310 --> 00:05:21,370
There are 4 edges: 1, 2, 3 and 4.
42
00:05:21,970 --> 00:05:26,890
So, you have four divided by the total number
of possible edges.
43
00:05:27,009 --> 00:05:29,189
So, what is the total number of possible edges?
44
00:05:29,490 --> 00:05:35,379
So, since there are four neighbor nodes,
there should be 4 C 2 = 6 possible edges.
45
00:05:35,379 --> 00:05:38,459
So, this is basically the clustering coefficient.
46
00:05:38,800 --> 00:05:44,020
This ratio actually defines the clustering
coefficient of a particular node.
47
00:05:44,360 --> 00:05:49,919
So, in this way you can measure the clustering
coefficient for each individual node in the
48
00:05:49,919 --> 00:05:50,499
network.
49
00:05:50,919 --> 00:05:57,629
And, the clustering coefficient of the whole
network is just an average of all the individual
50
00:05:57,629 --> 00:05:59,549
clustering coefficients of the different nodes.
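The procedure just described can be sketched in code. This is a minimal pure-Python sketch; the adjacency dictionary mirrors the lecture's 4-neighbor example, with hypothetical node names:

```python
from itertools import combinations

def local_clustering(adj, x):
    """Fraction of possible edges that actually exist among x's neighbors."""
    nbrs = adj[x]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: coefficient conventionally 0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)  # divide by k C 2 possible edges

def network_clustering(adj):
    """Clustering coefficient of the whole network: average over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# The lecture's example: x has 4 neighbors with 4 edges among them.
adj = {
    'x': {'a', 'b', 'c', 'd'},
    'a': {'x', 'b', 'd'},
    'b': {'x', 'a', 'c'},
    'c': {'x', 'b', 'd'},
    'd': {'x', 'a', 'c'},
}
print(local_clustering(adj, 'x'))  # 4 / (4 C 2) = 4/6
```

The same function reproduces the three slide examples: a star neighborhood with no edges among neighbors gives 0, and a completely connected neighborhood gives 1.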
51
00:05:59,699 --> 00:06:03,059
So, it is a node-centric property.
52
00:06:03,259 --> 00:06:07,079
First of all, for each individual node you
measure the clustering coefficient.
53
00:06:07,580 --> 00:06:14,000
Basically, for each individual node you try
to estimate what is the extent of cliquishness,
54
00:06:14,000 --> 00:06:18,669
how complete or how cliquish the neighborhood
of that particular node is.
55
00:06:18,669 --> 00:06:24,860
So, you try to express that as a fraction
of the maximum cliquishness possible, and
56
00:06:24,860 --> 00:06:29,280
then this fraction can be estimated for each
individual node.
57
00:06:29,280 --> 00:06:34,470
And then, for the whole network you just have
to average out all the individual clustering
58
00:06:34,470 --> 00:06:36,020
coefficients for the different nodes.
59
00:06:36,020 --> 00:06:43,210
So, that is how you define the clustering
coefficient of a particular network.
60
00:06:43,210 --> 00:06:51,599
Now, for this part of the lecture I have shown
you three examples.
61
00:06:51,599 --> 00:06:55,849
If you see in the slides, I have shown you
three examples.
62
00:06:55,849 --> 00:07:03,580
There are three different examples, where
the clustering coefficient for the first example
63
00:07:03,580 --> 00:07:10,000
for that black node, there in the center,
should be equal to zero because there is no
64
00:07:10,000 --> 00:07:17,919
edge that exists between its neighbors.
65
00:07:17,919 --> 00:07:27,539
So, similarly the clustering coefficient for
the second example is actually 0.5 because
66
00:07:27,539 --> 00:07:32,319
there are three edges between the neighbors.
67
00:07:32,319 --> 00:07:33,880
There are three edges between the neighbors.
68
00:07:33,880 --> 00:07:36,819
And, there could be a possible 4 C 2 = 6
edges.
69
00:07:36,819 --> 00:07:39,150
So, that is how you measure.
70
00:07:39,150 --> 00:07:45,939
Similarly, for the third example, the clustering
coefficient is one because all the nodes that
71
00:07:45,939 --> 00:07:50,560
are neighbors of the black node are completely
connected.
72
00:07:50,560 --> 00:07:56,669
That is how you actually measure the clustering
coefficient for each individual node.
73
00:07:56,669 --> 00:08:06,290
So, in the next slide, I actually
define the formula more precisely.
74
00:08:06,290 --> 00:08:13,150
It is the same thing that I have written down
in the text; now, the interesting part is that
75
00:08:13,150 --> 00:08:14,150
you can measure.
76
00:08:14,150 --> 00:08:20,199
Once you have this quantity, you can measure
the extent of transitivity of the different
77
00:08:20,199 --> 00:08:21,450
real world networks.
78
00:08:21,450 --> 00:08:29,330
And actually, that was what people
were trying to do in early 2001, 2002.
79
00:08:29,330 --> 00:08:35,940
And, what they found is this: if you look
into these networks, the World Wide Web, the
80
00:08:35,940 --> 00:08:42,040
internet, the co-authorship network, the metabolic
network, the C. elegans network, you see
81
00:08:42,040 --> 00:08:47,150
most of them have clustering coefficients
somewhere between 0
82
00:08:47,150 --> 00:08:48,150
and 1.
83
00:08:48,150 --> 00:08:55,250
But, what is interesting to note is that networks
like the co-authorship network have a very high
84
00:08:55,250 --> 00:08:56,330
clustering coefficient.
85
00:08:56,330 --> 00:09:01,440
That is, these are mostly social networks
which have very high transitivity.
86
00:09:01,440 --> 00:09:09,050
That is, the idea is that if there are a lot
of mutual coauthors between a pair of scientists,
87
00:09:09,050 --> 00:09:14,110
then it is highly likely that those two scientists
have also coauthored a paper.
88
00:09:14,110 --> 00:09:19,900
So, that probability is quite high, roughly
close to 0.43 as per the
89
00:09:19,900 --> 00:09:23,040
table shown in the slides.
90
00:09:23,040 --> 00:09:27,300
So, another interesting example is this actor
network.
91
00:09:27,300 --> 00:09:31,540
Since this is an interesting
example, I would like to take it up and
92
00:09:31,540 --> 00:09:32,870
discuss a bit more.
93
00:09:32,870 --> 00:09:43,000
So, this example actually draws from the complex
system of movies and actors.
94
00:09:43,000 --> 00:09:59,710
So, first of all, you conceive of a bipartite
network, where one partition is basically
95
00:09:59,710 --> 00:10:08,100
the movies or the movie nodes and the other
partition is basically the actors or the actor
96
00:10:08,100 --> 00:10:09,100
nodes.
97
00:10:09,100 --> 00:10:13,630
Now, you have some movies and you have some
actors.
98
00:10:13,630 --> 00:10:19,030
Now, you draw an edge between a movie and
an actor.
99
00:10:19,030 --> 00:10:26,520
Suppose the name of this movie
is M 1 and the name of this actor is A 1.
100
00:10:26,520 --> 00:10:41,990
You draw an edge between M 1 and A 1, if A
1 is in the cast of the movie M 1.
101
00:10:41,990 --> 00:10:47,230
So, this is how you construct a movie-actor
bipartite network.
102
00:10:47,290 --> 00:11:00,150
Now, from this bipartite network you can construct
something called one mode projections.
103
00:11:03,200 --> 00:11:12,880
These one mode projections can be drawn
on either side.
104
00:11:14,250 --> 00:11:27,200
So, these one mode projections can be drawn
on the actor
105
00:11:27,200 --> 00:11:31,900
nodes as well as on movie nodes.
106
00:11:37,400 --> 00:11:41,240
So, what would you do in drawing the one mode
projection?
107
00:11:41,480 --> 00:11:47,040
So, let us try to define that in the next
part.
108
00:11:47,580 --> 00:11:48,920
So, suppose you have.
109
00:11:49,930 --> 00:12:05,750
So, from the movie-actor
bipartite network, you construct a one mode
110
00:12:05,750 --> 00:12:08,550
projection say on the actor nodes as follows.
111
00:12:09,420 --> 00:12:14,540
Suppose there are two actors A 1 and A 2.
112
00:12:15,000 --> 00:12:33,410
You draw an edge between A 1 and A 2, if A
1 and A 2 have co-acted in a movie say M.
113
00:12:33,410 --> 00:12:39,170
Now, this graph as you can imagine can be
a weighted graph.
114
00:12:39,170 --> 00:12:44,680
So, this can have a weight w.
115
00:12:44,680 --> 00:13:06,270
And, this w is nothing but if A 1 and A 2
have co-acted in w movies, then the weight
116
00:13:06,270 --> 00:13:11,220
of the edge is w.
117
00:13:11,220 --> 00:13:29,780
Now, in this way you can construct
an actor-actor one mode projection.
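The weighted projection just described can be sketched as follows; the movie casts here are hypothetical toy data, not taken from any real film database:

```python
from itertools import combinations
from collections import Counter

def actor_projection(casts):
    """One-mode projection of a movie-actor bipartite network onto actors.

    casts maps each movie to its set of actors; the weight of edge (A1, A2)
    is the number of movies in which A1 and A2 have co-acted.
    """
    weights = Counter()
    for actors in casts.values():
        # every movie imposes a clique on its cast
        for a1, a2 in combinations(sorted(actors), 2):
            weights[(a1, a2)] += 1
    return dict(weights)

# Hypothetical toy data: A1 and A2 co-act twice, so their edge gets weight 2.
casts = {'M1': {'A1', 'A2', 'A3'}, 'M2': {'A1', 'A2'}}
print(actor_projection(casts))
# {('A1', 'A2'): 2, ('A1', 'A3'): 1, ('A2', 'A3'): 1}
```

Note how each movie's cast contributes a complete clique of edges, which is exactly why these projections end up so cliquish and have such high clustering coefficients.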
118
00:13:29,780 --> 00:13:36,980
So now, if you think of the movie-actor bipartite
network, when you are constructing this projection
119
00:13:36,980 --> 00:13:39,450
on the actor nodes, what is happening?
120
00:13:39,450 --> 00:13:45,500
If you think carefully, what is happening
is that for every individual movie there is
121
00:13:45,500 --> 00:13:47,720
a clique imposed on this network.
122
00:13:48,410 --> 00:13:52,490
So, all the actors in that movie will have
an edge between them because they have acted
123
00:13:52,490 --> 00:13:53,490
in that movie.
124
00:13:53,490 --> 00:13:57,320
So, that forms a clique of actors for that
particular movie.
125
00:13:57,320 --> 00:14:01,790
And for every individual such movie, you are
imposing a clique on this network.
126
00:14:01,790 --> 00:14:06,220
So, that is why these kinds of graphs are
usually pretty cliquish.
127
00:14:06,220 --> 00:14:12,260
And, that is why as you see the clustering
coefficient of this network, as I have shown
128
00:14:12,260 --> 00:14:16,890
you in the slides, the clustering coefficient
of this particular network, the actor-actor
129
00:14:16,890 --> 00:14:19,840
network is 0.79, which is very
high.
130
00:14:19,840 --> 00:14:26,520
So, the probability that there exists an edge
between a pair of actors, if they have co-acted
131
00:14:26,520 --> 00:14:29,240
with many common co-actors, is very high.
132
00:14:29,830 --> 00:14:33,790
So, that is what I wanted to actually point
out.
133
00:14:39,300 --> 00:14:46,560
So, then the next thing that we will talk
about is this concept of small world
134
00:14:50,800 --> 00:14:57,420
and the 6 degrees of separation.
135
00:15:02,760 --> 00:15:06,280
So, this idea is again very interesting.
136
00:15:06,710 --> 00:15:12,930
So, look at the slides: suppose
I say that all late registrants in the complex
137
00:15:12,930 --> 00:15:15,660
networks course will be given ten marks bonus.
138
00:15:15,660 --> 00:15:18,990
So, how fast do you think will this information
spread?
139
00:15:18,990 --> 00:15:24,060
In general, I would imagine that it would
spread very fast among all the registrants
140
00:15:24,060 --> 00:15:25,580
of this course.
141
00:15:25,580 --> 00:15:31,320
So if that is the case, then the question
is: why is it so?
142
00:15:31,320 --> 00:15:38,320
Why does it spread so fast, and why does
it spread in the first place?
143
00:15:38,320 --> 00:15:46,250
And, to show that this actually happens, this
spread actually takes place, the famous scientist
144
00:15:46,250 --> 00:15:52,980
Milgram designed a very interesting experiment,
which he called the 6 degrees of separation
145
00:15:52,980 --> 00:15:54,160
experiment.
146
00:15:54,430 --> 00:15:59,190
So, what he actually did was something like
this.
147
00:15:59,730 --> 00:16:05,370
So, Travers and Milgram in 1969, they designed
this classic study in Social Science.
148
00:16:05,370 --> 00:16:09,220
Actually, this is one of the very interesting
and classic studies in Social Science.
149
00:16:09,220 --> 00:16:17,770
So, what they said is that suppose you have
a source say Kharagpur, and there is a stockbroker
150
00:16:17,770 --> 00:16:26,630
in Kharagpur, who wants to send a message
or a packet to some stockbroker in Kolkata.
151
00:16:26,630 --> 00:16:34,310
So, now what this person, this stockbroker,
does is forward this letter.
152
00:16:34,310 --> 00:16:37,060
So, he does not post this letter.
153
00:16:37,060 --> 00:16:41,290
He does not post it through the usual postal
service.
154
00:16:41,290 --> 00:16:46,310
What he does is he passes this letter to one
of his friends.
155
00:16:46,310 --> 00:16:54,250
So, this letter actually has the name, the
address, etcetera of the destination stockbroker.
156
00:16:54,600 --> 00:16:59,420
So, what he does is the stockbroker takes
up this letter and passes it to one of his
157
00:16:59,420 --> 00:17:05,230
friends, whom he believes would know the Kolkata
stockbroker.
158
00:17:05,230 --> 00:17:12,559
The Kharagpur stockbroker picks up this letter
and passes it to one of his random friends.
159
00:17:12,559 --> 00:17:18,819
And then he feels that this friend might
know the Kolkata stockbroker and would
160
00:17:18,819 --> 00:17:19,889
pass it on to him.
161
00:17:19,889 --> 00:17:26,009
So, now this friend will again pass it to
some other friend of his, and that friend
162
00:17:26,009 --> 00:17:30,919
will again pass it to some other friend of
his and in this way the chain might at some
163
00:17:30,919 --> 00:17:37,399
point complete, leading to the destination
or otherwise it may fail.
164
00:17:37,660 --> 00:17:42,620
So, from the experiment that they performed,
what came out was something
165
00:17:42,629 --> 00:17:51,559
like this: if the letter was to reach
the target, it would reach in roughly 6 steps.
166
00:17:51,559 --> 00:18:10,480
So, if the letter reached the destination,
it did so in roughly 6 steps.
167
00:18:10,480 --> 00:18:15,970
But, as you can understand, this is,
you know, a stochastic experiment,
168
00:18:15,970 --> 00:18:19,509
and things happen by chance most of the time.
169
00:18:19,509 --> 00:18:24,950
So, what happened was like 64 out of 296 chains
actually reached the target.
170
00:18:24,950 --> 00:18:30,750
So, they initiated 296 chains to start
this experiment, but then only 64 of
171
00:18:30,750 --> 00:18:31,310
them survived.
172
00:18:31,750 --> 00:18:32,630
Many of them dropped midway.
173
00:18:33,020 --> 00:18:40,049
But, those that survived, among them, all
of them reached the destination in an average
174
00:18:40,049 --> 00:18:42,070
of 5.2 steps.
175
00:18:42,070 --> 00:18:49,970
So, basically every time the letter
reached the destination, it reached
176
00:18:49,970 --> 00:18:53,779
within 6 hops, roughly within 6 hops.
177
00:18:53,779 --> 00:18:59,200
So, this was a very interesting observation
that Travers and Milgram made.
178
00:18:59,200 --> 00:19:02,040
And, this is known as the 6 degrees of
separation.
179
00:19:02,170 --> 00:19:06,870
And, this is actually a very interesting trivia
question in various quiz contests.
180
00:19:08,230 --> 00:19:14,050
So, now you might think:
is this all magic?
181
00:19:14,470 --> 00:19:15,910
How is this happening?
182
00:19:16,600 --> 00:19:23,870
The point is, if you try to explain
this intuitively, there is a reasonable explanation that
183
00:19:23,870 --> 00:19:24,490
exists.
184
00:19:24,870 --> 00:19:26,650
Nothing is happening by magic.
185
00:19:27,149 --> 00:19:31,109
There can be an intuitive
explanation for this.
186
00:19:31,309 --> 00:19:37,249
And, in the next slide we will try to discuss
this intuitive explanation.
187
00:19:37,509 --> 00:19:42,209
So, imagine that you are a person in the network.
188
00:19:44,620 --> 00:19:47,380
So, think of your Facebook friends.
189
00:19:47,820 --> 00:19:50,400
Roughly how many Facebook friends do you have?
190
00:19:50,400 --> 00:19:53,509
So, I would imagine somewhere between 500
and 1000.
191
00:19:53,509 --> 00:19:56,279
So, let us be much more moderate.
192
00:19:56,279 --> 00:20:01,419
Let us take that you have say some 100 friends.
193
00:20:01,419 --> 00:20:08,200
And, I am assuming that these 100 friends
are not connected among themselves.
194
00:20:08,200 --> 00:20:10,879
Actually, you have more than that.
195
00:20:10,879 --> 00:20:16,610
But then, say there are at least 100 who
are in no way connected among
196
00:20:16,610 --> 00:20:17,640
themselves.
197
00:20:17,640 --> 00:20:24,570
Now, these 100 friends, by the same hypothesis,
will again have another 100 friends each, like this.
198
00:20:25,570 --> 00:20:26,310
And, this will continue.
199
00:20:27,039 --> 00:20:34,520
Now, if you look at this tree of acquaintances
or friendships, this completely disparate
200
00:20:34,520 --> 00:20:42,929
tree, where any two nodes know only their parent,
but not each other.
201
00:20:42,929 --> 00:20:51,260
So this, if you grow this tree, then what
you see is that within 6 to 7 steps; so, if
202
00:20:51,260 --> 00:21:01,350
you have 100 friends at each level, then within
6 to 7 steps
203
00:21:01,350 --> 00:21:12,929
we have covered the entire population of the
earth.
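This back-of-the-envelope argument can be checked with simple arithmetic, under the lecture's stated assumption of 100 mutually unconnected friends per person (the world population figure is a rough round number):

```python
# Assumption from the lecture: each person has 100 friends who are not
# connected among themselves, so level k of the acquaintance tree adds
# 100**k new people.
friends_per_level = 100
world_population = 8_000_000_000  # rough figure, for comparison

reached = 0
for step in range(1, 7):
    reached += friends_per_level ** step

print(reached)                      # about 1.01 trillion people after 6 steps
print(reached >= world_population)  # True
```

In fact 100**5 alone is already 10 billion, so under this idealized assumption the whole population is covered within 5 to 6 steps.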
204
00:21:12,929 --> 00:21:16,799
Of course, I understand that there could be
transitivity triangles.
205
00:21:16,799 --> 00:21:22,789
But, what I am assuming is that you have
say 500 friends.
206
00:21:22,789 --> 00:21:25,549
So, many of you will have 800 or 1000 friends.
207
00:21:25,750 --> 00:21:30,570
But among these, there are at least 100 friends
who are not connected among themselves.
208
00:21:30,570 --> 00:21:31,330
That is the assumption.
209
00:21:31,570 --> 00:21:32,490
And that is the assumption in every step.
210
00:21:32,850 --> 00:21:37,330
If you make this simple assumption, which I
would say is a realistic one.
211
00:21:38,230 --> 00:21:43,799
And if you go by this, then at every step
you spawn a bunch of new nodes.
212
00:21:43,799 --> 00:21:50,129
And, if you have spawned until say 6 steps,
you have at least a few billion nodes
213
00:21:50,129 --> 00:21:51,360
that you have covered.
214
00:21:51,360 --> 00:21:54,019
So, that is how.
215
00:21:54,019 --> 00:22:01,529
So, this is why Milgram and his colleagues
could have each of the letters reach their
216
00:22:01,529 --> 00:22:04,929
destination within 6 steps.
217
00:22:04,929 --> 00:22:12,860
That is the basic idea of the 6 degrees of
separation.
218
00:22:12,860 --> 00:22:21,299
So then, after this we will start with another
very important class of quantitative metrics that people
219
00:22:21,299 --> 00:22:22,730
quite often use.
220
00:22:22,730 --> 00:22:28,549
So, these are called centrality metrics.
221
00:22:28,549 --> 00:22:38,639
So, as the name suggests you can immediately
understand that centrality would indicate
222
00:22:38,639 --> 00:22:52,899
how central a node is in a given network.
223
00:22:52,899 --> 00:23:01,639
So, you want to estimate how central a node
is in a given network; and, as I write in
224
00:23:01,639 --> 00:23:03,200
the slides.
225
00:23:03,200 --> 00:23:09,970
So, you want to estimate basically centrality
in terms of these four quantities.
226
00:23:09,970 --> 00:23:19,370
These 4 P's, prestige, prominence, importance
and power; these are the 4 P's, or the 4
227
00:23:19,370 --> 00:23:20,899
pillars of measuring centrality.
228
00:23:20,899 --> 00:23:28,340
So, you want to identify prestigious nodes,
prominent nodes, powerful nodes and important
229
00:23:28,340 --> 00:23:29,340
nodes.
230
00:23:29,340 --> 00:23:33,139
All of them mean more or less the same thing,
if you think carefully.
231
00:23:33,139 --> 00:23:37,870
So, we have to have quantitative measures
to identify such central nodes.
232
00:23:37,870 --> 00:23:49,580
And, this is actually important in various
ranking experiments; because in various experiments,
233
00:23:49,580 --> 00:23:55,289
what you want is to rank the nodes according
to their centrality values.
234
00:23:55,289 --> 00:23:59,590
So, the nodes with more central values go
235
00:23:59,590 --> 00:24:03,370
at the top and the nodes with less central
values come at the bottom.
236
00:24:03,370 --> 00:24:08,700
And, this ranking is actually necessary for
various other applications as we shall see
237
00:24:08,700 --> 00:24:10,860
in some of the later part of the course.
238
00:24:12,429 --> 00:24:21,580
So, now if we are okay with the philosophical
definition of centrality, we have to find
239
00:24:21,580 --> 00:24:27,619
out quantitative measures to identify centrality
of the nodes in a network.
240
00:24:27,619 --> 00:24:34,510
Now, one of the simplest measures that comes
to one's mind, in the context of a
241
00:24:34,510 --> 00:24:43,369
network, would be perhaps the degree of a
node.
242
00:24:43,369 --> 00:24:52,529
Degree of a node; that is what I write
in the next slide, and this is termed the
243
00:24:52,529 --> 00:25:00,649
degree centrality of a node.
244
00:25:02,300 --> 00:25:04,080
So, and the definition is very simple.
245
00:25:04,480 --> 00:25:16,580
Suppose, there is a node say A and it has
say a degree equal to d.
246
00:25:17,779 --> 00:25:30,470
So then the degree centrality
is equal to d divided by capital N minus
247
00:25:30,470 --> 00:25:40,630
1, where N is the number of nodes in the network.
248
00:25:41,420 --> 00:25:44,780
So, basically what are you doing?
249
00:25:44,900 --> 00:25:51,659
You are expressing the degree of a node
as a fraction of the maximum possible degree
250
00:25:51,659 --> 00:25:52,219
of a node.
251
00:25:52,660 --> 00:26:00,820
So, in all, a node can be connected to N minus
1 other nodes, if there are N nodes in the
252
00:26:00,820 --> 00:26:01,820
system.
253
00:26:01,820 --> 00:26:06,269
So, if the node has a degree d, then you are
expressing this d as a fraction of N minus 1
254
00:26:07,269 --> 00:26:12,929
That is the degree centrality of a
node in a network.
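The formula d divided by N minus 1 can be sketched as follows; the 4-node graph is a hypothetical example, with A connected to every other node:

```python
def degree_centrality(adj):
    """Degree of each node as a fraction of the maximum possible degree N-1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

# Hypothetical 4-node example: A is connected to all 3 others, so its
# degree centrality is 3/3 = 1.0; the leaves each get 1/3.
adj = {
    'A': {'B', 'C', 'D'},
    'B': {'A'},
    'C': {'A'},
    'D': {'A'},
}
print(degree_centrality(adj))
```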
255
00:26:12,929 --> 00:26:20,389
Now given this definition of degree centrality,
we can also define something called the centralization
256
00:26:20,389 --> 00:26:23,929
of a network.
257
00:26:23,929 --> 00:26:38,640
Centralization of a network, which means the
variance in degree centrality, is what you try to
258
00:26:38,640 --> 00:26:39,640
measure.
259
00:26:39,640 --> 00:26:42,499
So, for each node you can estimate the degree
centrality.
260
00:26:42,499 --> 00:26:47,450
Now, using these values you can measure the
variance of these values.
261
00:26:47,450 --> 00:26:53,139
So, this variance is
called the centralization of the network.
262
00:26:53,139 --> 00:27:00,179
If this variance is high, this means that
there are some nodes with very high degree
263
00:27:00,179 --> 00:27:01,179
centrality.
264
00:27:01,179 --> 00:27:20,389
If it is high, there are some nodes with very high degree
centrality compared to the others.
265
00:27:20,389 --> 00:27:33,580
If it is low, then all degree centralities
are mostly similar to each other.
266
00:27:33,580 --> 00:27:41,210
And, an example of a case where the degree
centrality is skewed, that is, the centralization
267
00:27:41,210 --> 00:27:42,590
is high, would be.
268
00:27:43,260 --> 00:27:51,659
You can simply imagine a star network,
where the inner black node will have a high
269
00:27:51,659 --> 00:27:55,289
degree centrality; whereas the outer nodes
will have low degree centrality.
270
00:27:55,289 --> 00:28:02,299
So, this has a high centralization, whereas
a line network will have a low centralization
271
00:28:02,299 --> 00:28:05,779
because all of them will have almost equal
degree centrality.
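The star-versus-line comparison can be sketched as follows, taking centralization as the variance of the degree centralities, as described above (the 5-node graphs and their labels are hypothetical examples):

```python
def degree_centrality(adj):
    """Degree of each node as a fraction of the maximum possible degree N-1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def centralization(adj):
    """Variance of the degree centralities across all nodes."""
    vals = list(degree_centrality(adj).values())
    mean = sum(vals) / len(vals)
    return sum((c - mean) ** 2 for c in vals) / len(vals)

# Star: one hub (node 0) connected to 4 leaves; line: 5 nodes in a path.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
line = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

print(centralization(star) > centralization(line))  # True
```

The hub's centrality (1.0) is far from the leaves' (0.25), so the star's variance is high, while the line's centralities (0.25 or 0.5) stay close together, so its variance is low.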
272
00:28:06,340 --> 00:28:09,740
So, that is where we end this part of the lecture.
273
00:28:10,519 --> 00:28:12,399
Next day we will continue with centrality.