1
00:00:18,760 --> 00:00:27,771
In the last class we discussed the shared memory
models and then we also discussed
2
00:00:27,771 --> 00:00:39,899
mesh connected computers. Today the first
model that we will be discussing
3
00:00:39,899 --> 00:00:46,710
is known as the butterfly model.
4
00:00:46,710 --> 00:01:07,340
Now it consists of (K plus 1) times 2 to the power
K processors, and these processors are divided into
5
00:01:07,340 --> 00:01:32,310
K plus 1 rows, and each row contains 2 to
the power K processors. Now the rows are numbered
6
00:01:32,310 --> 00:01:47,150
0, 1, 2, up to K, and also the processors
in any row are indexed as 0, 1, 2, up to 2
7
00:01:47,150 --> 00:02:11,709
to the power K minus 1. OK, now let us assume that
P IJ is the Jth indexed processor in the Ith row. Now
8
00:02:11,709 --> 00:02:30,239
this processor P IJ has four connections:
one connection is P I minus 1 J, another one
9
00:02:30,239 --> 00:03:04,499
is P I plus 1 J, provided it exists, then P I minus
1 M and P I plus 1 L,
10
00:03:04,499 --> 00:04:02,760
where M and L are obtained by inverting
the Ith MSB of J and
11
00:04:02,760 --> 00:04:10,840
so P IJ is connected with at most four
processors: P I minus 1 J, P I plus 1 J, P I
12
00:04:10,840 --> 00:04:19,010
minus 1 M and P I plus 1 L, where M and L are
as follows: M is obtained by inverting the Ith
13
00:04:19,010 --> 00:04:32,760
MSB of J and L is obtained by inverting the (I plus
1)th MSB of J respectively. Now by Ith
14
00:04:32,760 --> 00:04:38,650
MSB I mean that, if we start with
15
00:04:38,650 --> 00:04:49,250
B1, B2, up to BK, these are the K bits
you have, because you have 2 to the power K
16
00:04:49,250 --> 00:05:03,560
processors, so K bits: this is the
0th MSB, this is the 1st MSB and
17
00:05:03,560 --> 00:05:20,290
so on. Now let us consider, say, K equal
to 3; that is, the number of processors is 32, the number
18
00:05:20,290 --> 00:05:40,600
of rows is 4 and the number of processors in each
row is 8.
19
00:05:40,600 --> 00:06:02,400
So let us assume that you have row 0, row 1, row 2
and row 3, OK, and here the columns are 0, 1, 2, 3, 4, 5, 6,
20
00:06:02,400 --> 00:06:23,560
7. So
you have P 0 0 here; this is your row index,
21
00:06:23,560 --> 00:06:40,110
that is 0, 1, 2, 3; this side is your I,
this side is your J. So P 0 0 is connected
22
00:06:40,110 --> 00:07:32,710
with P 1 0, so this is connected with this, this
is connected with this, OK.
23
00:07:32,710 --> 00:07:50,470
So these links I established based on these
two. Now think about row 1, say this one, P 1
24
00:07:50,470 --> 00:08:22,800
0. For P 1 0,
by inverting this bit we will be getting one,
25
00:08:22,800 --> 00:08:31,890
because this is the zeroth bit,
this is the first bit, this is the second bit,
26
00:08:31,890 --> 00:08:54,370
so it is the Ith MSB we are inverting, and P 1 0
will be connected to P 0 0 and P 0 2, since the row
27
00:08:54,370 --> 00:09:16,930
is I minus 1. So this will be connected
to P 0 0; P 1 0 should be connected, OK.
28
00:09:16,930 --> 00:09:31,250
Hold on, this is one thing you have to keep in mind: this
is the first, the second and the third bit. So in that case
29
00:09:31,250 --> 00:09:45,540
P 1 0 will be connected to P 0 0 and, here
the first bit will be inverted, 1 0 0, so
30
00:09:45,540 --> 00:10:07,450
to P 0 4, OK.
Then you have P 1 1 and it will be connected to P 0
31
00:10:07,450 --> 00:10:25,210
5; similarly you have P 0 6 and P 0 7. Now
what happens with P 1 4? For P 1 4 you get 0 0 0, so P 1
32
00:10:25,210 --> 00:10:36,990
4 will be connected to this. Similarly this
will be connected to this, and this will be connected to
33
00:10:36,990 --> 00:10:51,070
this, OK. So this is based on this
connection. Now P I plus 1 L will give you the reverse
34
00:10:51,070 --> 00:10:56,450
connection; basically, when I consider this
one, this will give you that, so this is a
35
00:10:56,450 --> 00:11:09,240
bidirectional link. And then you
have, suppose, P 2 0; for P 2 0 it becomes
36
00:11:09,240 --> 00:11:29,740
P 1 and 0 1 0, that is P 1 2, so P 2 0 is connected with P 1 2. So similarly
we will be getting this one, this will get
37
00:11:29,740 --> 00:11:46,730
this one,
and the last one we get.
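The connection rule worked through above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name butterfly_neighbors and the helper flip_msb are hypothetical, and it assumes the MSBs are counted from 1 (bit 1 is the most significant of the K bits), which is the convention that reproduces the links P 1 0 to P 0 4 and P 2 0 to P 1 2 from the example.

```python
def butterfly_neighbors(i, j, k):
    """Neighbours of processor P(i, j) in a butterfly with k + 1 rows and 2**k columns.

    Assumption: MSBs are counted from 1, so bit 1 is the most significant of the k bits.
    """
    def flip_msb(value, t):
        # Invert the t-th most significant bit (1-based) of a k-bit number.
        return value ^ (1 << (k - t))

    neighbors = []
    if i > 0:
        # Links to row i - 1: same column, and column with the i-th MSB inverted.
        neighbors.append((i - 1, j))
        neighbors.append((i - 1, flip_msb(j, i)))
    if i < k:
        # Links to row i + 1 (the reverse direction of the rule above).
        neighbors.append((i + 1, j))
        neighbors.append((i + 1, flip_msb(j, i + 1)))
    return neighbors


if __name__ == "__main__":
    k = 3
    print(butterfly_neighbors(1, 0, k))  # contains (0, 0) and (0, 4)
    print(butterfly_neighbors(2, 0, k))  # contains (1, 0) and (1, 2)
```

Rows 0 and K have only two links each, which is why the lecture says "at most four" connections.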
38
00:11:46,730 --> 00:11:54,690
So these two are basically the
bidirectional links. So the structure of
39
00:11:54,690 --> 00:11:59,529
this butterfly network, you observe,
looks like a butterfly. The height of
40
00:11:59,529 --> 00:12:09,500
this is log n plus 1, and on this
side you have 2 to the power K
41
00:12:09,500 --> 00:12:18,170
processors. And here you observe that to
transmit the data from one corner to another
42
00:12:18,170 --> 00:12:27,361
corner you will not take more than order log
n time. And another thing is that with this type
43
00:12:27,361 --> 00:12:38,770
of network, when you need, you can add another
butterfly here and you can plug it into
44
00:12:38,770 --> 00:12:42,960
that, OK.
The only condition is that it should be
45
00:12:42,960 --> 00:12:58,290
in the form of (K plus 1) times 2 to the power K. So
suppose you are given (K plus 2) times 2 to the
46
00:12:58,290 --> 00:13:12,880
power K plus 1 number of processors. You
first obtain (K plus 1) times 2 to the power K as one cluster;
47
00:13:12,880 --> 00:13:26,000
the other cluster is also (K plus 1) times 2 to the power
K, right, and then on the top you put one more row. So far
48
00:13:26,000 --> 00:13:34,980
you get basically here (K plus 1) times 2 to the power
K plus 1, right, so basically you need an additional
49
00:13:34,980 --> 00:13:44,020
2 to the power K plus 2 to the power K, that is 2 to the power K plus
1, nodes to be fixed for the additional row.
50
00:13:44,020 --> 00:13:53,130
So the thing is that when you want
to extend it and provide upgradability,
51
00:13:53,130 --> 00:14:08,580
you have to do a little homework or bookkeeping
to do that, OK.
52
00:14:08,580 --> 00:14:16,120
The type of problem which you can solve on a
butterfly network is one where
53
00:14:16,120 --> 00:14:22,120
the question of pipelining comes in; then
you find that this network is useful.
54
00:14:22,120 --> 00:14:43,140
The next model we will be discussing is the hypercube.
Here you have 2 to the power K processors
55
00:14:43,140 --> 00:15:21,940
and each processor is connected
with log n processors. Now P 0, P 1, P 2, up to P 2
56
00:15:21,940 --> 00:15:38,060
to the power K minus 1 are the indices of
the processors, and P I is connected to P J
57
00:15:38,060 --> 00:16:06,360
if J can be obtained by inverting
any one bit, there are K bits you have, of the binary
58
00:16:06,360 --> 00:16:14,370
representation of I. OK, P I is connected
to P J where J can be obtained by inverting
59
00:16:14,370 --> 00:16:20,670
any one bit, there are K bits you have, of the
binary representation of I.
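The hypercube rule just stated (P I is linked to P J when J differs from I in exactly one of the K bits) can be sketched as below. The function name hypercube_neighbors is hypothetical and chosen only for this illustration; inverting one bit is done with an exclusive-or against a power of two.

```python
def hypercube_neighbors(i, k):
    """Indices j such that P_j is linked to P_i in a hypercube of 2**k processors."""
    # Inverting bit b of i is the same as XOR-ing i with 2**b.
    return [i ^ (1 << b) for b in range(k)]


if __name__ == "__main__":
    # List every processor of the 8-processor hypercube with its K = 3 neighbours.
    for i in range(8):
        print(i, format(i, "03b"), hypercube_neighbors(i, 3))
```

Every processor gets exactly K = log n neighbours, which matches the count given in the lecture.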
60
00:16:20,670 --> 00:16:37,560
Let us see that. Suppose you have N equal
to 8, so you have P 0, P 1, up to P 7.
61
00:16:37,560 --> 00:16:55,250
So P 0 is connected with... so you have 0,
1, 2, 3, 4, 5, 6, 7, and the binary representation
62
00:16:55,250 --> 00:17:17,569
of these is 000, 001, 010, 011, 100,
101, 110 and 111. Now this 0 is connected with,
63
00:17:17,569 --> 00:18:04,720
by inverting, 001, 010, 100; then 000, 011, 101; 011,
000, 110; 010, 001, 111; 101, 110, 000; 100, 111, 001;
64
00:18:04,720 --> 00:18:26,850
111, 100, 010; and 110, 101, 011. So P0 is connected
with these three processors, because there are
65
00:18:26,850 --> 00:18:42,210
K connections.
So if I draw this, P0, P1, P2, P3, P4, P5, P6, P7:
66
00:18:42,210 --> 00:19:00,440
P0 is connected with P 1, P 2 and P 4; P1
is connected with P 0, P 3 and P 5;
67
00:19:00,440 --> 00:19:21,549
P 2 is connected with P 3, P 0 and P 6;
P3 is connected with P 2, P 1 and P 7; P 4 is connected
68
00:19:21,549 --> 00:19:48,540
with P 5, P 6 and P0; P 5 is connected with
P 4, P 7 and P 1; P6 is connected with P 7, P 4
69
00:19:48,540 --> 00:20:05,130
and P2; P 7 is connected with P 6, P 3 and P 5,
is it not? So this is the structure of your
70
00:20:05,130 --> 00:20:21,620
hypercube with 8 processors.
Now what happens when we have 16 processors? In
71
00:20:21,620 --> 00:20:47,250
the case of 16 processors we
72
00:20:47,250 --> 00:21:01,470
will be adding another zero in front, and you have
0001, 0010, 0100; 0000, 0011, 0101; 0011, 0000, 0110;
73
00:21:01,470 --> 00:21:15,310
0010, 0001, 0111; 0101, 0110, 0000; 0100, 0111, 0001;
0111, 0100, 0010; 0110, 0101, 0011. And you write
74
00:21:15,310 --> 00:21:20,071
another one in front: 1000, 1001, 1010, 1011, 1100, 1101,
1110, 1111, one for each processor. Similarly you
75
00:21:20,071 --> 00:21:26,990
can have the remaining 8 processors. So here what
happens is that you have already connected 8; now
76
00:21:26,990 --> 00:21:35,460
P 0 is connected with P 8, so I am drawing
another one, that is P 8.
77
00:21:35,460 --> 00:22:14,260
P 1 is connected with P 9 and P 2 is connected
with P 10, P3 is connected with P 11, P4 is
78
00:22:14,260 --> 00:22:38,300
connected with P12, P5 is
connected with P13, P6 is connected with P14,
79
00:22:38,300 --> 00:23:26,160
P7 is connected with P15. Now think about
8 onwards. 8 is connected with 0, that
80
00:23:26,160 --> 00:23:44,669
is done; 8 is connected with 9,
8 is connected
81
00:23:44,669 --> 00:24:08,080
with 10, and 8
is connected with 12.
82
00:24:08,080 --> 00:24:21,220
Now think about the next one: 9 is connected
with 8, 9 is connected with 11, 9
83
00:24:21,220 --> 00:24:36,170
is connected with 13 and 9 is connected
with 1. The next one is 10: 10 is connected with
84
00:24:36,170 --> 00:24:54,970
11, and then 10 is
connected with 8, 10
85
00:24:54,970 --> 00:25:19,410
is connected with 8 plus 4 plus 2, that is 14, and 10
is connected with 2. Then 11 is connected with
86
00:25:19,410 --> 00:25:43,880
10, and 11 is connected with 9, 11 is
connected with 15
87
00:25:43,880 --> 00:25:57,400
and 11 is connected with 3.
Then 12 is connected with 13,
88
00:25:57,400 --> 00:27:02,660
and 12 is connected with 14;
12 is connected
89
00:27:02,660 --> 00:27:24,350
with 8, yes, and 12 is connected with 4, OK.
Then 13 is connected with 12, yes, then 13 is
90
00:27:24,350 --> 00:27:39,090
connected with 15, and then 13 is connected
with 9, yes, and 13 is connected with 5. The next
91
00:27:39,090 --> 00:27:52,049
one is 14: 14 is connected with 15, 14 is connected
with 12, yes, and then it is connected
92
00:27:52,049 --> 00:28:09,309
with 8 plus 2, that is 10, OK.
And 14 is connected with 6, OK. Then 15 is
93
00:28:09,309 --> 00:28:22,010
connected with 14, 15 is connected with 13,
15 is connected with 8 plus 3, that is 11, and 15 is connected
94
00:28:22,010 --> 00:28:29,310
with 7. So this is the structure
of the hypercube when n equals
95
00:28:29,310 --> 00:28:36,539
16. If you observe, this is basically a
round shape,
96
00:28:36,539 --> 00:28:50,770
and the good thing about this is that, given two hypercubes
of size 2 to the power K, I can merge them with additional
97
00:28:50,770 --> 00:29:04,120
links to get one of size 2 to the power K plus 1.
Now one beauty is that, say,
98
00:29:04,120 --> 00:29:10,330
initially you had
this hypercube, initially you had a
99
00:29:10,330 --> 00:29:17,230
hypercube of size 8, and this is another hypercube
of size 8, and you observe that for every
100
00:29:17,230 --> 00:29:23,970
processor there is one additional link you
have established, and you can merge them and get the
101
00:29:23,970 --> 00:29:33,650
hypercube of size 16. Now if I have another
16 processors, similarly
102
00:29:33,650 --> 00:29:42,320
I can add these two hypercubes to
get a hypercube of size 32, and you will observe
103
00:29:42,320 --> 00:29:50,470
that we need only one additional
link between the two processors, OK.
104
00:29:50,470 --> 00:30:02,870
And these two processors are those whose
binary representations differ in the MSB; of the
105
00:30:02,870 --> 00:30:11,860
two processors, that bit is different, OK. So
here every processor is having log n connections.
106
00:30:11,860 --> 00:30:27,790
In the case of the two dimensional
mesh connected computers, every processor
107
00:30:27,790 --> 00:30:37,960
is having four connections at most and in
the case of Partridge suppose every processor
108
00:30:37,960 --> 00:30:50,900
as opposed to anything the next one is.
109
00:30:50,900 --> 00:31:02,650
the cube connected cycles. It is the combination
of butterfly and hypercube. You have K into
110
00:31:02,650 --> 00:31:27,980
2 to the power K processors; these processors are divided
into K groups, and each group has 2 to the power
111
00:31:27,980 --> 00:31:48,010
K processors. These processors are indexed 0, 1, up to 2 to the power K minus
1, and the groups are numbered as 1, 2, up to K. Here P IJ
112
00:31:48,010 --> 00:32:15,940
is the Jth processor in the Ith
group. The
113
00:32:15,940 --> 00:32:46,240
processors of different groups having the same index
J form a cycle;
114
00:32:46,240 --> 00:33:23,860
that is, P IJ is connected to P (I plus 1 mod K) J.
I have P11, P21, P31, P41, up to PK1, so this forms a
115
00:33:23,860 --> 00:33:29,600
cycle.
Now in this equation observe there is a problem
116
00:33:29,600 --> 00:33:47,049
when it reaches K, when I plus 1
is K: then it is connected to
117
00:33:47,049 --> 00:34:03,860
K mod K, so this becomes
zero, but actually P 0 should be P K.
118
00:34:03,860 --> 00:34:29,369
So what I have to write here correctly is P
I plus 1 J if I is less than K, that is
119
00:34:29,369 --> 00:34:44,570
if I is less than or equal to K minus 1, and
it is P 1 J if I is K; then
120
00:34:44,570 --> 00:34:53,950
P K J is connected with P 1 J, OK.
So that is the structure we have, right. These
121
00:34:53,950 --> 00:35:00,869
are important observations, because
I have numbered the groups 1 to K, so I have to
122
00:35:00,869 --> 00:35:14,680
write P IJ is connected with P I plus 1 J,
and with P 1 J if I equals K, so this forms a
123
00:35:14,680 --> 00:35:43,930
cycle. And besides this cycle, P IJ is connected
with P I M, where M is
124
00:35:43,930 --> 00:36:01,500
obtained by inverting the Ith MSB of J;
M is obtained by inverting the Ith MSB of
125
00:36:01,500 --> 00:36:10,570
J. So let us see when you have 24 processors,
that is K equal to 3. You have
126
00:36:10,570 --> 00:36:28,510
24 processors and the number of groups is 3, and
each group is having
127
00:36:28,510 --> 00:36:34,960
eight processors.
128
00:36:34,960 --> 00:37:01,451
So initially let me write P 1 0, P 2
0 and you have P 3 0; you have P 11, P
129
00:37:01,451 --> 00:38:15,400
2 1, P 3 1; you have here P12, P22, P32; you
have here P 13, P 23, P 33; P14, P24, P34; P
130
00:38:15,400 --> 00:38:24,322
15, P25, P35; P16, P26, P36; P17, P27, P37. So this
is obtained based on the initial connection
131
00:38:24,322 --> 00:38:30,589
that P IJ is connected with P I plus 1 J if
I is less than K, otherwise with P 1 J if
132
00:38:30,589 --> 00:38:40,640
I = K. And P 1 0 is
connected with P 1 4.
133
00:38:40,640 --> 00:38:58,500
Because this is the first bit. Then P 2 0 is connected
with, for P 2 0 you will be
134
00:38:58,500 --> 00:39:13,740
changing the second bit, P 2 2, and P 3 0 is connected
with P 3 1, OK. Similarly P 1 1 is connected
135
00:39:13,740 --> 00:39:32,730
with P 1 5, P 2 1 is connected with P 2 3
and P 3 1 with P 3 0, OK. Similarly P 1 1 is connected with
136
00:39:32,730 --> 00:39:57,710
this. Now P 1 2 is connected with P 1 6, and this will
be connected with this. Now P 1 3 is connected
137
00:39:57,710 --> 00:40:27,010
with P 1 7. Now P 2 4 is connected with P 2 6.
Similarly for P 2 5; P 3 4 will be connected
138
00:40:27,010 --> 00:40:47,600
with P 3 5, and P 3 6 is connected with P 3 7, OK. So
you observe that every three processors, if
139
00:40:47,600 --> 00:40:55,220
I consider them as a single node, then it becomes
a hypercube; if I consider this whole thing
140
00:40:55,220 --> 00:41:02,890
as a single node, this is a single node, it
becomes a hypercube. And every processor is
141
00:41:02,890 --> 00:41:13,240
having three connections: two are
used to form the cycle and one is connected
142
00:41:13,240 --> 00:41:18,430
with another cycle, OK.
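As a rough sketch of the cube connected cycles rule above, the following Python function lists the three links of P IJ. The name ccc_neighbors is hypothetical; the sketch assumes groups numbered 1 to K, indices 0 to 2 to the power K minus 1, MSBs counted from 1, and K at least 3 so that the two cycle links are distinct.

```python
def ccc_neighbors(i, j, k):
    """Links of P(i, j) in cube connected cycles with groups 1..k, indices 0..2**k - 1.

    Assumptions: bit 1 is the MSB of the k-bit index j, and k >= 3.
    """
    nxt = i + 1 if i < k else 1      # next group on the cycle (K wraps around to 1)
    prv = i - 1 if i > 1 else k      # previous group on the cycle (1 wraps around to K)
    m = j ^ (1 << (k - i))           # invert the i-th MSB of j
    return [(nxt, j), (prv, j), (i, m)]


if __name__ == "__main__":
    k = 3
    for i in (1, 2, 3):
        print((i, 0), ccc_neighbors(i, 0, k))
```

For K = 3 this reproduces the links read out in the lecture, for example P 1 0 with P 1 4, P 2 0 with P 2 2 and P 3 0 with P 3 1.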
143
00:41:18,430 --> 00:41:35,619
So the next model is the linear array.
Here, if you have n processors, P 0, P 1, up to P N
144
00:41:35,619 --> 00:41:52,420
minus 1, they are linearly connected; it means P I
is connected with P I - 1 and P I + 1, provided
145
00:41:52,420 --> 00:41:57,440
they exist. The structure looks like this:
146
00:41:57,440 --> 00:42:12,320
you have P 0, P 1, P 2, up to P N minus 1, and these connections are
bidirectional.
147
00:42:12,320 --> 00:42:23,040
The next model is the tree model. Here you have
2 to the power K - 1 processors. Now these 2
148
00:42:23,040 --> 00:42:30,670
to the power K - 1 processors are arranged in such
a way that they form a full binary tree, and
149
00:42:30,670 --> 00:42:36,290
this binary tree is indexed by the breadth first
search numbering.
150
00:42:36,290 --> 00:43:11,340
That is, P0 is the root, then P 1, P 2, P 3, P 4,
P 5, P 6, P7, P8, P9, P10, P11, P12, P13,
151
00:43:11,340 --> 00:43:13,490
P14 and so on.
152
00:43:13,490 --> 00:43:27,700
That is, P I is
connected with P 2I plus 1 and P 2I plus 2, and so
153
00:43:27,700 --> 00:43:41,150
on. P I is connected with P 2I + 1 and P 2I
+ 2, and also it is connected with P (I - 1) by 2,
154
00:43:41,150 --> 00:43:53,680
which is its parent, provided they exist. Now the
height of this tree is log n, that is, the
155
00:43:53,680 --> 00:43:59,230
height is K. It indicates that to transmit
the data from bottom to top you need
156
00:43:59,230 --> 00:44:03,599
order log n time.
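A small sketch of the tree indexing just described, assuming the root is P 0 and the processors are numbered level by level. The name tree_links is hypothetical. With children at 2I + 1 and 2I + 2, the parent of a non-root P I works out to the integer part of (I - 1) / 2; that formula is derived here from the children rule rather than taken from the slide.

```python
def tree_links(i, n):
    """Links of P_i in a binary tree of n processors numbered level by level from 0."""
    links = {}
    if i > 0:
        # Parent index follows from the children rule 2i + 1 and 2i + 2.
        links["parent"] = (i - 1) // 2
    children = [c for c in (2 * i + 1, 2 * i + 2) if c < n]
    if children:
        links["children"] = children
    return links


if __name__ == "__main__":
    n = 15  # 2**4 - 1 processors
    for i in (0, 6, 14):
        print(i, tree_links(i, n))
```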
157
00:44:03,599 --> 00:44:16,349
Now the next model is the pyramid model. There
are two types of pyramid. One is the one dimensional
158
00:44:16,349 --> 00:44:31,609
pyramid model; it is the combination of linear
array and tree. You have
159
00:44:31,609 --> 00:44:32,880
the usual tree,
160
00:44:32,880 --> 00:44:42,150
and the siblings form a linear
array, and it gives
161
00:44:42,150 --> 00:44:45,670
you the one dimensional pyramid.
162
00:44:45,670 --> 00:45:05,900
Now the next model is known as the two dimensional
pyramid, and it is the combination of mesh
163
00:45:05,900 --> 00:45:28,670
and tree. For every processor P I,
you remember that you have P 0 and it has
164
00:45:28,670 --> 00:45:35,750
four children; now for this node you have again four children.
So at every level you observe that this is a
165
00:45:35,750 --> 00:46:22,400
mesh, this is a mesh, this is another mesh
and so on, and each node of this mesh
166
00:46:22,400 --> 00:46:29,079
is connected with the four neighboring processors
of the mesh, it is connected with its parent,
167
00:46:29,079 --> 00:46:35,800
and also it has connections with four children.
So the two dimensional pyramid
168
00:46:35,800 --> 00:46:43,000
is a combination of the tree
and the mesh, and each node has at most nine
169
00:46:43,000 --> 00:46:49,520
connections: four connections at the same level,
through the mesh connectivity, one is
170
00:46:49,520 --> 00:46:58,050
to its parent and four are to its children. So
these are the major SIMD models.
171
00:46:58,050 --> 00:46:59,050
Now
172
00:46:59,050 --> 00:47:05,900
let us come to the definition of MIMD: multiple
instruction stream and multiple data stream.
173
00:47:05,900 --> 00:47:13,040
Here you have several independent machines;
each machine will have its control unit, that is
174
00:47:13,040 --> 00:47:28,300
C 1, C 2, up to C n, these are n control units, and
you have processor P 1 attached to control
175
00:47:28,300 --> 00:47:37,590
unit 1, up to processor P N attached to control unit N, and
you have these processors connected either
176
00:47:37,590 --> 00:47:48,320
through shared memory or through interconnection
networks, OK.
177
00:47:48,320 --> 00:47:58,410
Now these processors are capable enough to solve
a problem. Now here the only issue is that while
178
00:47:58,410 --> 00:48:07,300
control unit 1 broadcasts one stream
of instructions, control unit 2 broadcasts
179
00:48:07,300 --> 00:48:14,500
a different set of instructions to be performed,
and as a result synchronization plays
180
00:48:14,500 --> 00:48:22,480
a major role, so that the delay becomes very small.
And since each processor is capable enough to
181
00:48:22,480 --> 00:48:30,530
solve the problem, each control unit broadcasts
as large a set of instructions as it can,
182
00:48:30,530 --> 00:48:36,230
so that the communication
between the two processors becomes minimum.
183
00:48:36,230 --> 00:48:44,600
Now once you have the model based on
the interconnection networks,
184
00:48:44,600 --> 00:48:53,790
we call it a multicomputer,
and the algorithm designed for this is known
185
00:48:53,790 --> 00:49:06,600
as a distributed algorithm. And in this case
message passing plays
186
00:49:06,600 --> 00:49:19,680
an important role, because once you pass the
message you should see that the other processors are active
187
00:49:19,680 --> 00:49:24,900
and do not get disturbed, so a minimum number
of messages should be distributed or passed among
188
00:49:24,900 --> 00:49:28,960
themselves.
So in order to measure the complexity,
189
00:49:28,960 --> 00:49:35,200
we measure based on the number of messages
transmitted between the processors. Now
190
00:49:35,200 --> 00:49:40,069
we have another type, where
the processors communicate among themselves
191
00:49:40,069 --> 00:49:50,339
through shared memory, and in that case the
model name is multiprocessor and the algorithm
192
00:49:50,339 --> 00:50:05,900
is known as an asynchronous parallel algorithm;
earlier we had the synchronous parallel algorithm.
193
00:50:05,900 --> 00:50:16,619
And here the time complexity of the asynchronous algorithm
plays the role to measure its complexity.
194
00:50:16,619 --> 00:50:21,220
So these are the various types of
models you have to think of for designing
195
00:50:21,220 --> 00:50:27,460
the parallel algorithms. Now, how to measure
the complexity of algorithms? In the case of
196
00:50:27,460 --> 00:50:31,710
sequential algorithms we measure the complexity
of the algorithm
197
00:50:31,710 --> 00:50:41,050
based on two factors: they are time complexity
and space complexity, and there exists a
198
00:50:41,050 --> 00:50:48,050
trade-off relationship between these two. If
you have more space, time possibly reduces to
199
00:50:48,050 --> 00:50:53,190
solve a problem, or similarly if you have less
space, time can grow. So there exists a trade-off
200
00:50:53,190 --> 00:50:57,089
relationship between these two, but there
is a limitation: for example, to find the sum
201
00:50:57,089 --> 00:51:05,190
of N numbers, whatever the space you have,
you have to perform the N - 1 additions, OK.
202
00:51:05,190 --> 00:51:12,410
Now in the case of parallel algorithms you
have another factor, the number of processors,
203
00:51:12,410 --> 00:51:20,360
the number of processors you are using. Now
here you have the trade-off relationship between
204
00:51:20,360 --> 00:51:25,220
the three factors: time, space and number of
processors. It may so happen that if you have
205
00:51:25,220 --> 00:51:30,270
more processors, time may be less,
or if you have fewer processors
206
00:51:30,270 --> 00:51:35,340
you will find that time is going up. But there
is a limitation here again, that whatever the
207
00:51:35,340 --> 00:51:49,010
case may be you have to pay additional cost
for using more processors. Now,
208
00:51:49,010 --> 00:51:55,260
to measure the
gain with respect to the parallelism,
209
00:51:55,260 --> 00:52:24,930
there is a term known as speedup ratio. It
is defined as the time complexity of the best
210
00:52:24,930 --> 00:52:39,369
known sequential algorithm, and it should be the
worst case time complexity of the best known
211
00:52:39,369 --> 00:52:54,640
sequential algorithm, divided by the time complexity of
the parallel algorithm.
212
00:52:54,640 --> 00:53:17,849
So the more the value of speedup, the better is your
algorithm, agreed. But the speedup
213
00:53:17,849 --> 00:53:23,710
has a limitation: what we have is the worst
case time complexity of the best known sequential algorithm
214
00:53:23,710 --> 00:53:26,089
divided by the time complexity of the parallel
algorithm,
215
00:53:26,089 --> 00:53:38,609
and I can write that this is less than or equal to the
number of processors used. Let us consider
216
00:53:38,609 --> 00:53:46,150
the problem of finding the summation of numbers
on mesh connected computers, and as you know
217
00:53:46,150 --> 00:53:47,150
that,
218
00:53:47,150 --> 00:54:05,230
say I have a mesh of n
219
00:54:05,230 --> 00:54:18,000
cross n,
and suppose the number of elements I have is
220
00:54:18,000 --> 00:54:28,500
n squared; so N is n squared and P, the number of
processors, is also n squared. So you observe
221
00:54:28,500 --> 00:54:34,030
that there are n squared elements and n squared
processors. What I can do is distribute
222
00:54:34,030 --> 00:54:43,530
these n squared elements among these n squared
processors, each having 1 element, right.
223
00:54:43,530 --> 00:54:54,569
Now in order to find the sum of these n squared
elements, first I will assume
224
00:54:54,569 --> 00:55:04,660
that each row is a linear array and I will add
them; this can be done in order n time.
225
00:55:04,660 --> 00:55:13,350
So you will observe that the sums
of the respective rows are available in the
226
00:55:13,350 --> 00:55:21,480
last column. So again I will add these, assuming
that this column is a linear array; I will combine
227
00:55:21,480 --> 00:55:32,190
them, so this can be done in order n time. So
the total time becomes order n using n squared
228
00:55:32,190 --> 00:55:37,130
processors to find the sum of n squared elements.
229
00:55:37,130 --> 00:55:47,900
But if I estimate the cost of this method,
as you know, the cost is the number
230
00:55:47,900 --> 00:55:55,390
of processors times the time required to find
the sum of the numbers, which is nothing but
231
00:55:55,390 --> 00:56:09,680
n squared into order n, so it becomes order n cube
to find the sum of n squared elements. So it
232
00:56:09,680 --> 00:56:27,190
is not cost optimal,
because to find the sum of n squared
233
00:56:27,190 --> 00:56:34,610
elements using a sequential processor takes
order n squared. So can we obtain a cost
234
00:56:34,610 --> 00:56:42,050
optimal parallel algorithm to do that? Before
doing that, let us assume that
235
00:56:42,050 --> 00:57:02,089
I have n to the power 2.5 number of elements.
So what we do is
236
00:57:02,089 --> 00:57:14,390
distribute these n to the
power 2.5
237
00:57:14,390 --> 00:57:20,110
elements among the n squared processors
such that every processor contains square
238
00:57:20,110 --> 00:57:27,890
root of n elements. Now you observe that
each processor is having square root of n
239
00:57:27,890 --> 00:57:33,490
elements and there are n squared processors, so
n squared into n to the power half, which is n
240
00:57:33,490 --> 00:57:42,730
to the power 2.5, elements you have distributed
among the n squared processors. Now I ask
241
00:57:42,730 --> 00:57:51,770
each processor to find the sum of its
square root n elements, right,
242
00:57:51,770 --> 00:57:53,100
which can be done
243
00:57:53,100 --> 00:58:07,040
in order square root n time, sequentially.
Now every processor will retain the sum of
244
00:58:07,040 --> 00:58:16,640
square root n elements, which can be done in
order square root n time. Now these
245
00:58:16,640 --> 00:58:29,589
partial sums I can sum row-wise in
order n time, so you need again order n time
246
00:58:29,589 --> 00:58:44,440
to find the sum of the sums of square
root n elements row-wise, and in the last column
247
00:58:44,440 --> 00:58:52,109
you will find the sum of n times square root
of n
248
00:58:52,109 --> 00:59:09,730
elements in each row. Now these sums
can be added in order n time, using the
249
00:59:09,730 --> 00:59:19,520
last column processors as a linear array,
to get the sum of n to the power 2.5 elements.
250
00:59:19,520 --> 00:59:30,790
So the complexity becomes order n time, so
cost becomes the number of processors, n squared,
251
00:59:30,790 --> 00:59:38,920
times the time to solve this problem, so n to the
power 3. Now you observe that if I have n to
252
00:59:38,920 --> 00:59:44,510
the power 2.5 number
of elements, to find
253
00:59:44,510 --> 00:59:54,950
the sum of these n to the power 2.5 elements
using n squared processors it takes order
254
00:59:54,950 --> 01:00:06,030
n time, and the cost is
better than that of the sum of n squared elements.
255
01:00:06,030 --> 01:00:13,260
So can we do better than that? Can we get a
cost optimal parallel algorithm using this
256
01:00:13,260 --> 01:00:15,140
idea?
257
01:00:15,140 --> 01:00:29,410
So to do that, let us assume there
are n cube elements. Now these n cube elements
258
01:00:29,410 --> 01:00:38,930
are distributed among these n squared processors,
so each processor contains
259
01:00:38,930 --> 01:00:53,150
n elements. So the problem can be redefined
as: you have n squared processors and n cube
260
01:00:53,150 --> 01:01:01,050
elements; these n cube elements are distributed
in such a way that every processor contains
261
01:01:01,050 --> 01:01:02,050
n elements. Each processor first adds the elements
262
01:01:02,050 --> 01:01:12,750
assigned to it, which takes order n time, and
then these sums are to be added here one by one.
263
01:01:12,750 --> 01:01:18,980
While you are adding this you can move this
data as if they are
264
01:01:18,980 --> 01:01:25,869
linearly connected; the next phase you add
this, the next phase you add this, and so on.
265
01:01:25,869 --> 01:01:34,319
So after order n time you will find
the sum is lying in this column; then you move
266
01:01:34,319 --> 01:01:39,950
this data here, then linearly you follow
it, and you take another order n time.
267
01:01:39,950 --> 01:01:49,500
So in order to do that you basically need
order n time to find the sum of n cube elements,
268
01:01:49,500 --> 01:01:58,280
which is stored here, OK. So this is the
time complexity to find the sum of n cube
269
01:01:58,280 --> 01:02:06,660
elements using n squared processors, and the
cost to find the sum of n cube elements becomes
270
01:02:06,660 --> 01:02:14,280
n squared processors times order n time,
which is order n cube, which is cost
271
01:02:14,280 --> 01:02:23,720
optimal, because
to find the sum of n cube elements you need
272
01:02:23,720 --> 01:02:25,140
order n cube additions.
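The three phases described above (local sums, then row-wise accumulation, then accumulation down the last column) can be simulated sequentially as a rough check of the step count. This is only a sketch under stated assumptions: the name mesh_sum and the way one parallel step is counted (one step per local element, one step per hop along a row or column) are choices made for this illustration, not taken from the lecture.

```python
def mesh_sum(blocks):
    """Simulate summation on an n x n mesh.

    blocks[r][c] is the list of numbers held by processor (r, c).
    Returns (total, parallel_steps).
    """
    n = len(blocks)
    steps = 0
    # Phase 1: every processor adds its own elements; all processors work in parallel.
    local = [[sum(cell) for cell in row] for row in blocks]
    steps += max(len(cell) for row in blocks for cell in row)
    # Phase 2: each row acts as a linear array; partial sums move one hop right per step.
    for c in range(1, n):
        for r in range(n):
            local[r][c] += local[r][c - 1]
        steps += 1
    # Phase 3: the last column acts as a linear array; sums move one hop down per step.
    for r in range(1, n):
        local[r][n - 1] += local[r - 1][n - 1]
        steps += 1
    return local[n - 1][n - 1], steps


if __name__ == "__main__":
    n = 4
    values = list(range(1, n ** 3 + 1))          # n cube elements
    blocks = [[values[(r * n + c) * n:(r * n + c + 1) * n] for c in range(n)]
              for r in range(n)]                 # n elements per processor
    total, steps = mesh_sum(blocks)
    print(total, steps, n * n * steps)           # sum, parallel steps, cost
```

With n elements per processor the step count is n + 2(n - 1), which is order n, so the cost n squared times order n is order n cube, matching the number of elements.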
273
01:02:25,140 --> 01:02:35,280
The next model is the hypercube. That is a very simple
idea, because we know that if
274
01:02:35,280 --> 01:02:42,020
you have a hypercube
of dimension K, a hypercube of 2 to the power K processors,
275
01:02:42,020 --> 01:02:50,309
this can be thought of as the combination of
two sub-hypercubes, each of size 2 to the power
276
01:02:50,309 --> 01:03:06,599
K - 1, and each P I is connected with
P J where the MSBs of these two binary representations
277
01:03:06,599 --> 01:03:16,059
differ, right.
So what I can think about is that you can
278
01:03:16,059 --> 01:03:26,410
bring the content of P J into P I and add it;
similarly for all P I of this zone
279
01:03:26,410 --> 01:03:32,510
I get the data from its counterpart
and add it, and then what happens is that the size
280
01:03:32,510 --> 01:03:36,740
is reduced from 2 to the power K to 2 to
the power K - 1.
281
01:03:36,740 --> 01:03:40,410
(Refer Slide Time: 1:03:40)
And now you have a hypercube of size 2 to
282
01:03:40,410 --> 01:03:48,930
the power k - 1. This you divide into two
parts, each of size 2 to the power k - 2, and
283
01:03:48,930 --> 01:03:57,670
again there exists a connection between
P i and the P j whose second MSB is different,
284
01:03:57,670 --> 01:04:05,569
and again this data will move up and be added,
so the size is reduced to a hypercube of 2 to the
285
01:04:05,569 --> 01:04:15,990
power k - 2, and so on. After k steps you
will find that the sum is available in the
286
01:04:15,990 --> 01:04:18,970
first processor, P 0.
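The halving steps just described can be sketched as a small sequential simulation. This is only an illustrative sketch, not code from the lecture: the function name hypercube_sum and the sample values are assumptions, and the inner loop stands in for additions that all active processors would perform in parallel.

```python
def hypercube_sum(values):
    """Simulate summing 2**k values on a hypercube of 2**k processors.

    Returns (sum held by processor 0, number of parallel steps).
    """
    n = len(values)
    k = n.bit_length() - 1
    if n == 0 or (1 << k) != n:
        raise ValueError("number of processors must be a power of two")
    a = list(values)
    # Walk through the address bits from the MSB down to the LSB.
    for bit in range(k - 1, -1, -1):
        half = 1 << bit
        # Processors 0 .. half-1 stay active. Each one adds the value of the
        # partner whose address differs from its own only in this bit.
        for i in range(half):
            a[i] += a[i | half]
    return a[0], k


total, steps = hypercube_sum([3, 1, 2, 4, 7, 5, 6, 8])
```

Each pass of the outer loop halves the active sub-hypercube, so 2 to the power k values need k parallel steps.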
287
01:04:18,970 --> 01:04:27,160
So here, if n is equal to 2
to the power k, then you need order log n time
288
01:04:27,160 --> 01:04:37,579
to find the sum. But the cost becomes
order n log n, because n processors take log n time,
289
01:04:37,579 --> 01:04:43,910
so the cost is order n log n. Now, in order to
obtain a cost-optimal algorithm,
290
01:04:43,910 --> 01:04:46,730
again your idea is the same: you divide.
291
01:04:46,730 --> 01:04:58,200
Suppose you have n elements. You define
your hypercube of size n by log n. Each processor
292
01:04:58,200 --> 01:05:07,040
initially holds log n elements, and sequentially
it finds the sum of these log n elements,
293
01:05:07,040 --> 01:05:19,430
which takes order log n time. Plus you need
to add these partial sums, which takes order log of n
294
01:05:19,430 --> 01:05:33,170
by log n time. So this takes order log
n time, and the number of processors you have used is
295
01:05:33,170 --> 01:05:42,290
n by log n, so the cost is order n,
which is optimal for finding the sum of n elements
296
01:05:42,290 --> 01:05:49,690
using n by log n processors on a hypercube. Today
we will be finishing our lecture by considering
297
01:05:49,690 --> 01:05:59,070
the sum of n numbers on
another model.
298
01:05:59,070 --> 01:06:15,780
This is known as the perfect shuffle computer.
You remember that in the perfect shuffle computer
299
01:06:15,780 --> 01:06:24,930
each processor has three connections,
and here we will be using the shuffle and
300
01:06:24,930 --> 01:06:31,690
exchange operations to perform this addition
of n numbers. Suppose we have
301
01:06:31,690 --> 01:06:46,869
n equal to 2 to the power k, that is,
2 to the power k processors. Here, for i equal
302
01:06:46,869 --> 01:07:15,150
to 1 to k do: for each processor
d in parallel, where d goes from 0 to 2 to the
303
01:07:15,150 --> 01:07:44,559
power k minus 1: A d is
equal to shuffle of A i, B d is equal to
304
01:07:44,559 --> 01:07:57,780
exchange of A i; sorry, it should be d: A d is equal
to shuffle of A d, B d is equal to exchange
305
01:07:57,780 --> 01:08:09,940
of A d, and A d is equal to A d plus B d.
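The loop above can be sketched as a sequential simulation. This is a sketch under stated assumptions, not the lecture's own code: the names shuffle, exchange and shuffle_exchange_sum are chosen here for illustration, shuffle is taken to be a cyclic left rotation of the k-bit processor index, and exchange flips the least significant bit.

```python
def shuffle(d, k):
    """Perfect shuffle of index d: cyclic left rotation of its k bits."""
    n = 1 << k
    return ((d << 1) | (d >> (k - 1))) & (n - 1)


def exchange(d):
    """Exchange partner of index d: flip the least significant bit."""
    return d ^ 1


def shuffle_exchange_sum(values):
    """Return the list A after k rounds of shuffle, exchange and add."""
    n = len(values)
    k = n.bit_length() - 1
    if n < 2 or (1 << k) != n:
        raise ValueError("need 2**k values with k >= 1")
    a = list(values)
    for _ in range(k):
        # Shuffle: the value held by processor d moves to processor shuffle(d).
        shuffled = [0] * n
        for d in range(n):
            shuffled[shuffle(d, k)] = a[d]
        # Exchange: every processor reads the value held by its partner.
        b = [shuffled[exchange(d)] for d in range(n)]
        # Add: A_d = A_d + B_d on every processor.
        a = [shuffled[d] + b[d] for d in range(n)]
    return a


result = shuffle_exchange_sum([3, 1, 2, 4, 7, 5, 6, 8])
```

After k rounds every processor holds the total, not only processor 0.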
306
01:08:09,940 --> 01:08:23,839
So basically, if you have
d equal to zero, one,
307
01:08:23,839 --> 01:08:37,139
two, three, four, five, six, seven, and say the
numbers are three, one, two, four, seven, five,
308
01:08:37,139 --> 01:09:05,659
six, eight, then we write down shuffle of A d, exchange of A d, and
this new A d. So shuffle of A d is nothing but: shuffle
309
01:09:05,659 --> 01:09:29,150
of zero is zero, so it stays here; shuffle of
one moves to two; shuffle of two moves
310
01:09:29,150 --> 01:09:52,900
to four; then shuffle of three moves to six; shuffle
of four moves to one; shuffle of five moves
311
01:09:52,900 --> 01:10:15,599
to three.
Shuffle of five moves to three, shuffle of six moves to five,
312
01:10:15,599 --> 01:10:41,709
and shuffle of seven stays at seven, so the eight
stays here. Now after the shuffle you can see
313
01:10:41,709 --> 01:10:50,630
three, seven, one, five,
two, six, four, eight. Now
314
01:10:50,630 --> 01:11:03,329
after the exchange you get seven, three, five, one, six, two, eight, four, so
if I add them you get ten, six, eight, twelve, each twice. Next,
315
01:11:03,329 --> 01:11:29,459
you again do the shuffle of A d,
so the first ten stays here, then
316
01:11:29,459 --> 01:11:41,949
an eight will come here and a ten will come here.
317
01:11:41,949 --> 01:11:56,409
So after the shuffle you have ten, eight, ten, eight,
six, twelve, six, twelve. Now you
318
01:11:56,409 --> 01:12:00,349
perform the exchange and the addition, so that this
becomes 18, this becomes 18, this becomes 18,
319
01:12:00,349 --> 01:12:04,689
this becomes 18, and so on
for every processor.
320
01:12:04,689 --> 01:12:12,199
Then you do the shuffle of A d again and then
the exchange to find B d, and of course
321
01:12:12,199 --> 01:12:20,340
when you add you will get 36, okay.
So this is the way you can do it. You
322
01:12:20,340 --> 01:12:26,179
observe that it takes order k time to
find the sum of 2 to the power k elements, and
323
01:12:26,179 --> 01:12:32,699
this can also be turned into a
cost-optimal parallel algorithm, because at
324
01:12:32,699 --> 01:12:37,789
this moment it is not cost optimal; its cost is order
n log n. In order to get an order
325
01:12:37,789 --> 01:12:49,050
n cost algorithm, you assume that you have n
by log n number of processors,
326
01:12:49,050 --> 01:12:54,071
that is, n by log n groups, each group having log
n elements. Sequentially, for
327
01:12:54,071 --> 01:13:05,349
each group, you find the sum of its log n elements, and
then you proceed as before. You can easily show that it takes
328
01:13:05,349 --> 01:13:13,590
order n cost to find the sum of n numbers
using the perfect shuffle computer.
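The grouping idea can be sketched as follows. This is a hedged illustration only: grouped_sum is a hypothetical name, the group size is taken as the ceiling of log base 2 of n, and the second phase is modelled as a generic pairwise reduction that stands in for whichever parallel summation the network provides.

```python
import math


def grouped_sum(values):
    """Two-phase sum: about n / log n groups, each summed sequentially first.

    Returns (total, group size, number of parallel reduction rounds).
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    # Assumed group size: ceil(log2 n) elements per processor.
    group = max(1, math.ceil(math.log2(n))) if n > 1 else 1
    # Phase 1: every processor sums its own group, order log n time.
    partial = [sum(values[i:i + group]) for i in range(0, n, group)]
    # Phase 2: combine the partial sums pairwise, one round per halving.
    rounds = 0
    while len(partial) > 1:
        partial = [sum(partial[i:i + 2]) for i in range(0, len(partial), 2)]
        rounds += 1
    return partial[0], group, rounds


total, group_size, rounds = grouped_sum(list(range(1, 17)))
```

With 16 values this uses 4 processors, each holding 4 values, and 2 reduction rounds, so the processor count times the time stays proportional to n.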
329
01:13:13,590 --> 01:13:27,110
So you can try to find
the sum of n elements using the CREW model.
330
01:13:27,110 --> 01:13:36,010
Here the condition is a little different:
instead of finding the sum of a i for all
331
01:13:36,010 --> 01:13:45,961
i, I want a i to be replaced
by the summation of a k, for k from 1 to i, which
332
01:13:45,961 --> 01:13:53,570
is known as the cumulative sum. Basically,
you have a 1, a 2, a 3 and so
333
01:13:53,570 --> 01:13:54,570
on.
334
01:13:54,570 --> 01:14:04,479
I want to replace a 1 by a 1, a 2 by a
1 + a 2, a 3 by a 1 + a 2 + a 3, and so on.
335
01:14:04,479 --> 01:14:10,849
So please try at home, if possible, to obtain
an algorithm for this: given n
336
01:14:10,849 --> 01:14:15,210
elements, you want to find
the cumulative sum of these n
337
01:14:15,210 --> 01:14:28,429
numbers.
Okay, thank you.