1
00:00:18,750 --> 00:00:25,750
In the last class we discussed the shared memory model, and then we also discussed
2
00:00:31,689 --> 00:00:36,559
the mesh connected computer.
3
00:00:36,559 --> 00:00:43,559
Today the first model we will be discussing
is the butterfly model. It consists of (k plus
4
00:00:51,600 --> 00:00:58,600
1) times 2 to the power k processors, and these processors
are divided into k plus 1 rows, and each row contains
5
00:01:19,899 --> 00:01:26,899
2 to the power k processors. Now, the rows
are numbered 0, 1, 2, up to k, and the processors
6
00:01:38,450 --> 00:01:45,450
in any row are indexed 0, 1, up to 2 to the power
k minus 1.
7
00:01:52,250 --> 00:01:59,250
Now let us assume that Pi,j is the jth
processor in the ith row. Now this processor,
8
00:02:15,700 --> 00:02:22,700
Pi,j, has 4 connections. One connection
is to Pi-1,j and another one is to Pi+1,j, provided
they exist, and the others are to Pi-1,m and Pi+1,l, provided they
9
00:02:56,769 --> 00:03:03,769
exist, where m and l are obtained by inverting
the ith msb of j and inverting the (i-1)th msb of j,
10
00:03:57,209 --> 00:04:04,209
respectively. So, Pi,j is connected with at
most 4 processors, Pi-1,j, Pi+1,j, Pi-1,m
11
00:04:11,920 --> 00:04:18,920
and Pi+1,l, provided they exist, where m
and l are obtained by inverting the ith msb of
12
00:04:19,500 --> 00:04:26,500
j and inverting the (i-1)th msb of j, respectively.
13
00:04:32,720 --> 00:04:39,720
Let us start with the ith msb. We start with
the bits b1, b2, ..., bk. These are the k bits
14
00:04:44,610 --> 00:04:51,610
you have because each row has 2 to the power k
processors, so, k bits. This is the 0th
15
00:04:57,780 --> 00:05:04,780
msb, this is the first msb and so on.
16
00:05:04,949 --> 00:05:11,949
Now let us consider, say, k equal
to 3. The number of processors is 32, the number
17
00:05:20,300 --> 00:05:27,300
of rows is 4 and the number of processors in each
row is 8. So, we will assume we have row 0,
18
00:05:54,889 --> 00:06:01,889
row 1, row 2 and then another row 3 and then,
0 1 2 3 4 5 6 7. So, Pi is connected. So,
19
00:06:24,130 --> 00:06:31,130
you have P0. This is 0. The row index is 0, 1,
2, 3. This side is i and this side is your
20
00:06:42,840 --> 00:06:49,840
j. So, P0,0 is connected to P1,0. So, this
is connected to this, this is connected
21
00:07:33,759 --> 00:07:40,759
to this. So, these links are established based
on these 2. Now think about row 1, so, this
22
00:08:02,860 --> 00:08:09,860
one. P1,1. P1,0 is that. So, by inverting
this, we will be getting 1 because 1st this
23
00:08:25,379 --> 00:08:32,379
is the 0-th bit, this is the 1st bit, this
is the 2nd bit. So, here ith msb is replacing
24
00:08:35,560 --> 00:08:42,560
and P will be connected to P1,0 which is connected
to P0,2. First it is ith, ith i-1, so, this
25
00:09:01,980 --> 00:09:07,620
would be connected to P0,2.
26
00:09:07,620 --> 00:09:14,620
So, this is one thing you have to remember:
this would be the first msb, the second msb, up to the kth msb.
27
00:09:29,060 --> 00:09:36,060
So, in that case P1,0 will be connected
to P0, and here the first bit will be inverted,
28
00:09:43,240 --> 00:09:47,149
1 0 0. So, P0,4.
29
00:09:47,149 --> 00:09:54,149
Ok, then you
have P1,1 and it will be connected to P0,5. Similarly,
you have P0,6 and P0,7. Now, what happens,
30
00:10:18,510 --> 00:10:25,510
for P1,4 it is P0,0. So, P1,4 will be connected to
this. Similarly, this will be connected to
31
00:10:26,709 --> 00:10:33,709
this, this will be connected to this and this
will be connected to this, ok.
32
00:10:37,529 --> 00:10:44,529
So, this is based on this connection. Now,
Pi+1,l gives you the reverse connection.
33
00:10:51,820 --> 00:10:56,269
Basically, consider this. So, this
will give you that. So, this is a bidirectional
34
00:10:56,269 --> 00:11:03,269
thing which I will show, and then suppose you have
P2,0. It becomes P1, and 010, so
35
00:11:17,269 --> 00:11:24,269
P2,0 is connected to P1,2. Similarly, we will get this one,
this will get this one,
and the last one we will get like this,
36
00:11:45,450 --> 00:11:46,570
ok.
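The links worked out on the board can be sketched in code. This is only a minimal sketch under stated assumptions, not the lecturer's own code: the function name `butterfly_neighbors` is hypothetical, bits of j are counted from the most significant end starting at 0, and the cross link between rows i-1 and i is assumed to flip bit i-1, a convention chosen because it reproduces the examples above (P1,0 with P0,4, and P2,0 with P1,2).

```python
def butterfly_neighbors(i, j, k):
    """Neighbours of P(i, j) in a butterfly with rows 0..k, columns 0..2**k - 1.

    Assumption: bit positions of j are counted from the msb starting at 0,
    and the cross link between rows i-1 and i flips bit i-1.
    """
    def flip(col, msb_pos):
        # Invert the bit that sits msb_pos places from the most significant end.
        return col ^ (1 << (k - 1 - msb_pos))

    out = []
    if i > 0:                      # links to the row above, if it exists
        out.append((i - 1, j))
        out.append((i - 1, flip(j, i - 1)))
    if i < k:                      # reverse links to the row below, if it exists
        out.append((i + 1, j))
        out.append((i + 1, flip(j, i)))
    return out
```

Every processor gets at most 4 neighbours, and each link appears from both ends, which is the bidirectional property described above.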
37
00:11:46,570 --> 00:11:53,570
So, these two are basically to indicate the
bidirectional link. So, the structure of this
38
00:11:55,519 --> 00:12:02,209
butterfly, you observe, looks like
a butterfly wing. The height of this block
39
00:12:02,209 --> 00:12:09,209
is k plus 1, and on this side you have 2 to
the power k processors, and here you observe
40
00:12:16,339 --> 00:12:22,519
that to transfer data from one corner to another
corner you will not take more than order log n
41
00:12:22,519 --> 00:12:29,519
time. And another thing is that this type
of network is upgradeable.
42
00:12:32,000 --> 00:12:39,000
You can add another butterfly here and you
can plug into that. The only condition is
43
00:12:41,560 --> 00:12:48,560
that you build it in the form of (k plus 1) times
2 to the power k. So, suppose we are given (k plus 2) times
44
00:12:58,450 --> 00:13:05,450
2 to the power (k plus 1) processors.
You first take (k plus 1) times 2 to the power
45
00:13:07,560 --> 00:13:14,560
k as one cluster, the other cluster is also (k plus 1) times
2 to the power k, and then at the top you
46
00:13:21,170 --> 00:13:28,170
put the additional row. So, the two clusters give you (k plus 1) times
2 to the power (k plus 1), right. So, basically
47
00:13:33,829 --> 00:13:40,829
you need an additional
2 to the power (k plus 1) nodes to be fixed for the
48
00:13:42,730 --> 00:13:46,700
additional row.
So, the thing is that converting for given
49
00:13:46,700 --> 00:13:53,440
to 2 by 2 butterfly networks and there is
1,2 break it the next grade upgradability
50
00:13:53,440 --> 00:14:00,440
thing we have to do little homework or book
reading to do that, ok.
51
00:14:08,579 --> 00:14:15,579
The type of problem which you can solve on
a butterfly network is this: wherever
52
00:14:15,660 --> 00:14:22,130
the question of pipelining comes up,
you find that this network is very useful.
53
00:14:22,130 --> 00:14:29,130
The next model we are discussing is the hypercube.
Here, you have 2 to the power k processors
54
00:14:43,310 --> 00:14:50,310
and each processor is connected with log
n, that is, with k, other processors.
55
00:15:20,279 --> 00:15:27,279
Now P0, P1, up to P(2 to the power k minus 1)
are the processors, and Pi is
56
00:15:31,709 --> 00:15:38,709
connected to Pj if j can be obtained by inverting
one bit of the binary representation of i.
57
00:16:08,709 --> 00:16:15,709
Pi is connected to Pj if j can be obtained by inverting
any one of the k bits of the binary
58
00:16:18,820 --> 00:16:20,850
representation of i.
59
00:16:20,850 --> 00:16:27,850
Ok, let us assume that you have n equal
to 8, so, you have P0, P1, up to P7.
60
00:16:42,480 --> 00:16:49,480
So you have 0 1 2 3 4 5 6 7, and
the binary representations of these are 000,
61
00:16:58,779 --> 00:17:05,779
001, 010, 011, 100, 101, 110, 111. Now, by inverting one bit, 000
is connected with 001, 010, 100; 001 with 000, 011, 101;
62
00:17:21,620 --> 00:17:28,620
010 with 011, 000, 110; 011 with 010, 001, 111;
100 with 101, 110, 000; 101 with 100, 111, 001;
63
00:17:45,970 --> 00:17:47,080
110 with 111, 100, 010; and 111 with 110, 101, 011.
64
00:17:47,080 --> 00:17:54,080
So, P0 is connected to these 3 processors because
there are k connections. So, if I draw this,
65
00:18:08,570 --> 00:18:15,570
P0 P1 P2 P3 P4 P5 P6 P7: P0 is connected with
P1, P2 and P4. P1 is connected with P0, P3 and
66
00:18:47,270 --> 00:18:54,270
P5. P2 is connected with P3, P0 and P6. P3
is connected with P2, P1 and P7. P4 is connected
67
00:19:23,270 --> 00:19:30,270
with P5, P6 and P0. P5 is connected with P4, P7
and P1. P6 is connected with P7, P4 and P2. P7
68
00:19:53,549 --> 00:20:00,549
is connected with P6, P5 and P3. Draw these carefully.
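The neighbour lists read out above follow directly from the bit inversion rule. A minimal sketch, assuming a hypothetical helper name `hypercube_neighbors` and processors numbered 0 to 2 to the power k minus 1:

```python
def hypercube_neighbors(i, k):
    """Indices j such that Pi and Pj are linked in a hypercube of 2**k processors.

    j is obtained by inverting exactly one of the k bits of i.
    """
    return [i ^ (1 << bit) for bit in range(k)]


if __name__ == "__main__":
    # Print the neighbour list of every processor for k = 3 (8 processors).
    for i in range(8):
        print(f"P{i}:", ["P%d" % j for j in hypercube_neighbors(i, 3)])
```

Each processor gets exactly k neighbours, which is the log n connections per processor mentioned in the definition.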
69
00:20:01,510 --> 00:20:08,510
So this is the structure of your hypercube with
8 processors. Now what happens when we have
70
00:20:18,580 --> 00:20:25,580
16 processors? In the case of 16 processors, we
will be adding another 0 in front there, and
71
00:20:34,630 --> 00:20:41,630
here also there will be a 0, there also will be
another 0, and so on. Then we have
72
00:21:01,750 --> 00:21:08,750
1000 1001 1010 1011 1100 1101 1110 1111. These
are 4 bit indices. Similarly, you
73
00:21:19,400 --> 00:21:21,059
have the remaining 8 processors.
74
00:21:21,059 --> 00:21:28,059
So, here what happens is that you have
already connected 8. Now P0 is connected with P8, so,
75
00:21:32,220 --> 00:21:39,220
I am drawing another one, P8. P1 is connected
with P9 and P2 is connected with P10. P3 is
76
00:22:14,320 --> 00:22:21,320
connected with P11. P4 is connected with P12.
P5 is connected with P13. P6 is connected with
77
00:22:32,669 --> 00:22:39,669
P14 and P7 is connected with P15, ok.
78
00:22:47,549 --> 00:22:54,549
Now think about 8 onwards. 8 is connected
with 0, that is done. 8 is connected with 9.
79
00:23:28,559 --> 00:23:35,559
8 is connected with 9, 8 is connected with
10, and 8 is connected with 12. 8 is connected
80
00:24:02,460 --> 00:24:09,460
with 12, ok. Now think
about the next one. 9 is connected with 8. 9 is connected
81
00:24:19,620 --> 00:24:26,620
with 11. 9 is connected with 13 and 9 is connected
with 1. The next one is 10. 10 is connected with
82
00:24:36,220 --> 00:24:43,220
11, and then 10 is
connected with 8. 10
83
00:24:54,970 --> 00:25:01,970
is connected with 8 plus 4 plus 2, that is 14, and 10
is connected with 2.
84
00:25:16,440 --> 00:25:23,440
Then 11 is connected with 10.
11 is connected with 9. 11 is
85
00:25:35,110 --> 00:25:42,110
connected with 15,
and 11 is connected with 3. Then 12
86
00:25:50,279 --> 00:25:57,279
is connected with 13.
12 is connected
87
00:26:11,830 --> 00:26:18,830
with 14.
12 is also connected
88
00:27:08,299 --> 00:27:15,299
with 8, and 12 is connected with 4.
Ok, then 13 is connected with 12. Then
89
00:27:31,169 --> 00:27:38,169
13 is connected with 15. And then, 13 is connected
with 9. 13 is connected with 5. The next one is
90
00:27:39,409 --> 00:27:46,409
14. 14 is connected with 15. 14 is connected with
12. And then 14 is connected with 8 plus
91
00:27:53,870 --> 00:28:00,870
2, that is 10, and 14 is connected with 6.
Ok, then 15 is connected with 14. 15 is connected
92
00:28:12,890 --> 00:28:19,890
with 13. 15 is connected with 8 plus 3, that is 11.
15 is connected with 7. So,
93
00:28:25,360 --> 00:28:32,360
this is the structure of the hypercube when n equals
16. If you observe,
94
00:28:35,970 --> 00:28:42,970
one good thing about this is that, given
a hypercube of size 2 to the power k, I can
95
00:28:46,809 --> 00:28:53,809
merge it with an additional one to get size
2 to the power k plus 1. The beauty
96
00:28:57,159 --> 00:29:04,159
is this: initially
you had this hypercube
97
00:29:08,720 --> 00:29:15,659
of size 8, this
is another hypercube of size 8, and you observe
98
00:29:15,659 --> 00:29:22,659
that for each of the 8 processors there is one additional
link you have established, and by merging you
99
00:29:23,090 --> 00:29:30,090
get the hypercube of size 16.
Now if I have another 16 processors,
100
00:29:31,779 --> 00:29:38,779
similarly, I can join these 2 sub
hypercubes to get a hypercube of size 32, and you
101
00:29:41,419 --> 00:29:48,419
will observe the same thing. There is only one additional
link between the 2 processors, and these 2
102
00:29:51,899 --> 00:29:58,899
processors are those whose binary representations
differ only in the first msb; the remaining bits
103
00:30:02,020 --> 00:30:06,730
of the 2 processors are equal.
104
00:30:06,730 --> 00:30:13,730
Ok, so, in the hypercube every processor has
log n connections if you have n processors.
105
00:30:14,029 --> 00:30:21,029
In the case of the butterfly, every processor has
at most 4 connections. In the case of the 2 dimensional
106
00:30:26,270 --> 00:30:33,270
mesh connection, every processor has
4 connections at most, and in the case of the perfect
107
00:30:34,100 --> 00:30:41,100
shuffle, every processor has at most 3 connections.
The next one is the cube connected cycle. It is the
108
00:30:58,850 --> 00:31:05,850
combination of butterfly and hypercube. You
have k into 2 to the power k processors.
109
00:31:08,039 --> 00:31:15,039
These processors are divided into k groups,
and each group has 2 to the power k processors.
110
00:31:32,440 --> 00:31:39,440
These processors are indexed 0, 1, up to 2 to the power
k minus 1, and the groups are numbered 1, 2, up to k.
111
00:31:47,730 --> 00:31:54,730
Pi,j is the jth processor
in the ith group.
112
00:32:13,240 --> 00:32:20,240
The processors
of different groups having the same index
j form a cycle. That is, Pi,j is connected
113
00:32:55,399 --> 00:33:02,399
to P(i plus 1 mod k),j. I have P1,1, P2,1,
P3,1, P4,1 and Pk,1. So, this forms a cycle
114
00:33:24,350 --> 00:33:31,350
here. Now in this operation we observe there
is a problem when
115
00:33:32,179 --> 00:33:39,179
it reaches k. When i is k minus 1, Pk-1,j
is connected to, I will write, P(k mod k),j.
116
00:33:56,200 --> 00:34:03,200
So, this becomes 0; actually P0 is basically
your Pk. So, that is what I have to pay as the cost
117
00:34:07,220 --> 00:34:14,220
of the modulo operation. We have to write it
in a different way: it is connected to Pi+1,j
118
00:34:18,740 --> 00:34:25,740
if i is
less than or equal to k minus 1, and it is P1,j
119
00:34:37,940 --> 00:34:44,940
if i is k.
If i is k, then Pk,j is connected to
120
00:34:47,339 --> 00:34:54,339
P1,j. So that is the structure we have to
write instead of the modulo operation. Because
121
00:34:56,599 --> 00:35:03,470
I have numbered the groups 1 to k,
I have to write that Pi,j is connected with
122
00:35:03,470 --> 00:35:10,470
Pi+1,j if i is less than or equal to
k minus 1, and with P1,j if i equals k.
123
00:35:14,070 --> 00:35:21,070
So this forms a cycle, and besides this cycle
Pi,j is connected with Pi,m, where m is obtained
124
00:35:36,520 --> 00:35:43,520
by inverting the ith msb of j. m is
obtained by inverting the ith msb of j. So, let
us see when you have 24 processors and
125
00:36:06,520 --> 00:36:13,520
k equal to 3. With k equal to 3 you have 24 processors
and the number of groups is 3, and each group
126
00:36:28,220 --> 00:36:35,220
has 8 processors.
127
00:36:38,020 --> 00:36:45,020
So, let me write them down: P1,0,
P2,0 and P3,0. You have P1,1, P2,1,
128
00:37:00,890 --> 00:37:07,890
P3,1. You have here P1,2, P2,2, P3,2.
You have here P1,3, P2,3, P3,3; P1,4,
129
00:37:42,680 --> 00:37:49,680
P2,4, P3,4; P1,5, P2,5, P3,5; P1,6,
P2,6, P3,6; P1,7, P2,7, P3,7.
130
00:38:21,359 --> 00:38:26,440
So this is our drawing based on the initial
connection, that Pi,j is connected with Pi+1,j
131
00:38:26,440 --> 00:38:33,440
if i is less than or equal to k minus 1, otherwise
with P1,j if i equals k. And P1,0 is connected
132
00:38:34,960 --> 00:38:41,960
with P1,4, because
this is our first msb. P2,0 is connected with,
133
00:38:50,490 --> 00:38:57,490
here we will be changing
the second one, P2,2, and P3,0 is connected
134
00:39:04,300 --> 00:39:11,300
with P3,1.
Ok, similarly, P1,1 is connected with
135
00:39:17,400 --> 00:39:24,400
P1,5, P2,1 is connected with P2,3 and P3,1
is connected with this. Now P1,2 is connected
136
00:39:35,670 --> 00:39:42,670
with P1,6, and P3,2 should be connected to this.
Now P1,3 is connected with P1,7. Now
137
00:40:04,170 --> 00:40:11,170
P2,4 is connected with P2,6. Similarly, we
will complete it: P3,4 is to be connected with
P3,5, and P3,6 will be connected with P3,7.
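The cube connected cycle links for the 24 processor example can be reproduced with a short sketch. The function name `ccc_neighbors` is hypothetical, and the code assumes groups numbered 1 to k with the ith msb counted from 1, as in the rule stated above.

```python
def ccc_neighbors(i, j, k):
    """Neighbours of P(i, j) in a cube connected cycle.

    Groups are numbered 1..k and indices run 0..2**k - 1 (assumed numbering).
    """
    nxt = i + 1 if i <= k - 1 else 1   # cycle link written without the modulo
    prv = i - 1 if i >= 2 else k       # the same cycle seen from the other side
    m = j ^ (1 << (k - i))             # invert the ith msb of j (msb counted from 1)
    return {(nxt, j), (prv, j), (i, m)}
```

For k equal to 3 every processor has exactly 3 neighbours: 2 on its cycle and 1 on another cycle.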
138
00:40:39,140 --> 00:40:46,140
Ok, so, we observe that
if I consider this cycle as a single node, then
139
00:40:50,540 --> 00:40:57,540
it becomes a hypercube. If I consider this whole
thing as a single node, and this as a single node,
140
00:40:58,960 --> 00:41:05,960
it becomes a hypercube, and every processor
has 3 connections: 2 of them
141
00:41:09,240 --> 00:41:16,240
form the cycle and 1 connects it with another
cycle. So this is all about the cube connected
142
00:41:21,000 --> 00:41:21,470
cycle.
143
00:41:21,470 --> 00:41:28,470
Now the next model is known as the linear array.
Here if you have n processors P0, P1, up to
144
00:41:40,369 --> 00:41:47,369
Pn-1, they are linearly connected. It
means Pi is connected with Pi-1 and
145
00:41:49,780 --> 00:41:56,780
Pi+1, provided they exist. The structure
looks like this: P0, P1, P2, up to Pn-1, and
146
00:42:08,180 --> 00:42:11,910
these connections are bidirectional.
147
00:42:11,910 --> 00:42:18,910
So, the next model is the tree model. Here you have
2 to the power k minus 1 processors. Now these 2 to the
148
00:42:23,910 --> 00:42:30,910
power k minus 1 processors are arranged in such
a way that they form a full binary tree, and this
149
00:42:31,359 --> 00:42:38,359
binary tree is indexed in breadth first
search order: P0 is the root, then
150
00:42:45,500 --> 00:42:52,500
P1, P2, P3, P4, P5, P6, P7, P8, P9, P10,
P11, 12, 13, 14 and so on. That is, Pi is connected
151
00:43:18,599 --> 00:43:25,599
with P2i+1 and P2i+2, and also it
is connected with P(i-1)/2, rounded down, which is the parent
152
00:43:41,099 --> 00:43:45,310
of this, provided they exist.
153
00:43:45,310 --> 00:43:52,310
Now the height of this tree is log n, that is, the height
is k. It indicates that to send data
154
00:43:56,960 --> 00:44:03,960
from bottom to top you need
order log n time. Now the next model
155
00:44:07,550 --> 00:44:13,400
is the one dimensional pyramid model. There are
2 types of pyramid; one is the one dimensional
156
00:44:13,400 --> 00:44:20,400
pyramid model. It is the combination of linear
array and tree. You have the usual tree, and
157
00:44:31,260 --> 00:44:38,260
the nodes at each level are joined as a
linear array, and it gives you the one dimensional
pyramid.
158
00:44:45,670 --> 00:44:52,670
Now the next model is known as the 2 dimensional pyramid,
and it is the combination of mesh and tree,
159
00:45:09,430 --> 00:45:16,430
and every processor Pi is connected as follows.
Remember that you have P0 and these
160
00:45:25,950 --> 00:45:32,950
are its 4 children. Now for this node you have
4 again, so at every level you observe that this
161
00:46:07,920 --> 00:46:14,920
is a mesh, this is another mesh and so on. And
a node of this mesh is connected to
162
00:46:23,770 --> 00:46:30,770
the 4 neighbouring processors on the mesh. It
is connected to its parent, and it
163
00:46:32,880 --> 00:46:37,369
is connected to 4 children. So, this
is the 2 dimensional pyramid. It is a combination
164
00:46:37,369 --> 00:46:44,369
of tree and mesh, and each node
has at most 9 connections: 4 connections on
165
00:46:45,740 --> 00:46:52,740
the same level, giving the mesh connectivity,
1 to its parent and 4 to its children. So,
166
00:46:56,810 --> 00:46:58,569
these are the major SIMD models.
167
00:46:58,569 --> 00:47:04,410
Now let us come to the definition of MIMD,
multiple instruction stream and multiple
168
00:47:04,410 --> 00:47:11,410
data stream. Here you have several independent
machines. Each machine will have a control
169
00:47:13,420 --> 00:47:20,420
unit, CU1, CU2 up to CUn, these n control units,
and you have processor P1 attached
170
00:47:27,589 --> 00:47:34,589
to control unit 1, up to processor Pn
attached to control unit n, and
171
00:47:35,230 --> 00:47:42,230
these processors are connected either through
an interconnection network or through
172
00:47:45,010 --> 00:47:52,010
shared memory, ok.
Now these processors are capable enough to
173
00:47:53,920 --> 00:48:00,920
solve a problem. Now here the only issue is
that while control unit 1 issues one
174
00:48:04,220 --> 00:48:08,369
stream of instructions, control unit 2
can have a different set of instructions
175
00:48:08,369 --> 00:48:15,369
to be performed, and as a result synchronization
plays the major role. So, the delay should become
176
00:48:16,819 --> 00:48:23,819
very small, and since these processors are capable
enough to solve the problems, each control
177
00:48:25,420 --> 00:48:32,369
unit broadcasts as large a set of instructions
as it can, so that
178
00:48:32,369 --> 00:48:37,210
the communication between the 2 processors
becomes minimum.
179
00:48:37,210 --> 00:48:44,210
Now once you have the model based on the interconnection
network, we call it a multicomputer,
and the algorithm designed for this is known
180
00:48:53,790 --> 00:49:00,790
as a distributed algorithm. In this case
message passing plays an important role, because
181
00:49:09,180 --> 00:49:16,180
once you pass a message we should see that
the other processors stay active and do
182
00:49:20,250 --> 00:49:26,260
not get disturbed. So, a minimum number of messages
is to be distributed or passed amongst them.
183
00:49:26,260 --> 00:49:31,859
So here you measure the complexity
based on the number of messages
184
00:49:31,859 --> 00:49:37,559
transmitted between the processors. Now
we have another type, where
185
00:49:37,559 --> 00:49:42,900
the processors can communicate amongst themselves
through shared memory, and in that case the model
186
00:49:42,900 --> 00:49:49,900
name is multiprocessor and the algorithm is known
as an asynchronous parallel algorithm.
187
00:49:59,900 --> 00:50:06,900
We have the asynchronous parallel algorithm,
and here the time complexity, as in a basic
188
00:50:12,000 --> 00:50:18,520
parallel algorithm, plays a role in measuring its complexity.
So, these are the various types of
189
00:50:18,520 --> 00:50:24,940
models you have in front of you for designing
parallel algorithms. Now, how to measure
190
00:50:24,940 --> 00:50:31,640
the complexity of an algorithm? In the case of
a sequential algorithm we measure the complexity
191
00:50:31,640 --> 00:50:38,640
of the algorithm based on two factors. They
are time complexity and space complexity,
192
00:50:40,450 --> 00:50:46,490
and there exists a kind of trade off between
these two. If you have more space,
193
00:50:46,490 --> 00:50:51,109
possibly you can take less time to solve a problem,
and similarly, if you have less space, more time
194
00:50:51,109 --> 00:50:55,900
may be needed. So, there exists a trade off relationship
between the 2, but with limitations. For example,
195
00:50:55,900 --> 00:51:01,980
to find the sum of n numbers, whatever space
you have, you have to perform n minus
196
00:51:01,980 --> 00:51:07,640
1 additions, ok.
Now in the case of parallel algorithms you
197
00:51:07,640 --> 00:51:14,640
have one other factor, the
number of processors you are using. Now here
you have a trade off relationship between
198
00:51:20,970 --> 00:51:26,119
the 3 factors: time, space and number of processors.
It may so happen that if we have more
199
00:51:26,119 --> 00:51:31,130
processors the time may be less, or if
you have fewer processors you find
200
00:51:31,130 --> 00:51:38,130
the time is going up. But there is a limitation
here again: whatever the case may be, you
201
00:51:39,839 --> 00:51:44,710
have to pay an additional cost for using more
processors.
202
00:51:44,710 --> 00:51:51,710
Now, to measure how good
the parallel algorithm is, there
203
00:51:56,079 --> 00:52:03,079
is a term known as the speedup ratio. It is defined
as the time complexity of the best known sequential
204
00:52:24,750 --> 00:52:31,750
algorithm, that is, it should be the worst case
time complexity of the best known sequential algorithm,
205
00:52:40,440 --> 00:52:47,440
divided by the time complexity of the parallel algorithm.
So, the more the value of speedup, the better is your
206
00:53:00,190 --> 00:53:07,190
algorithm. But is there
any limitation on the speedup? Let me explain that
207
00:53:18,339 --> 00:53:21,400
limitation.
What we have defined is the worst case time
208
00:53:21,400 --> 00:53:26,109
complexity of the best known sequential algorithm
divided by the time complexity of the parallel algorithm,
209
00:53:26,109 --> 00:53:31,079
and can I write that this is
less than or equal to the number of processors
210
00:53:31,079 --> 00:53:38,000
used?
211
00:53:38,000 --> 00:53:44,290
Let us consider the problem of finding the
sum of numbers on mesh connected computers.
212
00:53:44,290 --> 00:53:51,290
As we know, I have a mesh, an
n
213
00:54:05,180 --> 00:54:12,180
cross n mesh, and suppose the number
of elements I have is n square. So, the number of elements is n square,
214
00:54:22,150 --> 00:54:29,150
and P, the number of processors, is also n square.
So, observe that with n square elements and n square
215
00:54:31,569 --> 00:54:38,069
processors, what I can do is distribute
the n square elements among the n square
216
00:54:38,069 --> 00:54:42,460
processors, each having one element, right.
217
00:54:42,460 --> 00:54:49,460
Now in order to find the sum of these n square
elements, first I will assume
218
00:54:54,970 --> 00:55:01,970
that each row is a linear array and I will add.
This can be done in order n time. So, you will
219
00:55:04,930 --> 00:55:11,930
observe that the sums of the respective
rows are available in the last column.
220
00:55:14,530 --> 00:55:21,530
So, again I will add these, assuming that this column
is a linear array. I will combine them, so
221
00:55:22,130 --> 00:55:25,770
this can be done in order n time.
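The two phases just described can be simulated sequentially to count the parallel steps. This is a minimal sketch with a hypothetical function name `mesh_sum`; each pass of an outer loop stands for one parallel step in which all rows (or the last column) act at once, and the direction of data movement is an assumption.

```python
def mesh_sum(grid):
    """Simulate summing an n x n mesh holding one element per processor.

    Returns (total, parallel_steps). The inner loop over rows models what all
    rows do simultaneously in one parallel step.
    """
    n = len(grid)
    a = [row[:] for row in grid]       # work on a copy
    steps = 0
    # Phase 1: each row is a linear array; partial sums move to the right.
    for c in range(n - 1):
        for r in range(n):
            a[r][c + 1] += a[r][c]
        steps += 1
    # Phase 2: the last column is a linear array; row sums move downwards.
    for r in range(n - 1):
        a[r + 1][n - 1] += a[r][n - 1]
        steps += 1
    return a[n - 1][n - 1], steps
```

The step count is 2(n - 1), that is order n, and multiplying by the n square processors gives the order n cube cost.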
222
00:55:25,770 --> 00:55:32,770
So the total time becomes order n using n
square processors to find the sum of n square
223
00:55:35,849 --> 00:55:42,849
elements. But, if I estimate the cost of this
method, as you know, the cost is the
224
00:55:47,740 --> 00:55:54,700
number of processors times the time required
to find the sum of the numbers, which is
225
00:55:54,700 --> 00:56:01,700
nothing but n square into order n, so it becomes order
n cube to find the sum of n square elements.
226
00:56:07,369 --> 00:56:14,369
So, it is not cost optimal, because finding
the sum of n square elements using a
227
00:56:29,640 --> 00:56:36,640
sequential processor is order n square. So
can you obtain a cost optimal parallel algorithm?
228
00:56:37,359 --> 00:56:44,359
Before doing that, let us assume
that I have n to the power 2 point 5
229
00:56:54,990 --> 00:57:01,990
elements.
So, what we do is
230
00:57:03,549 --> 00:57:10,549
distribute these
n to the power 2 point 5 elements among the
231
00:57:15,569 --> 00:57:21,599
n square processors such that every processor
contains square root of n elements.
232
00:57:21,599 --> 00:57:28,599
Now if you observe that a processor has
square root of n elements, and there are
233
00:57:29,290 --> 00:57:35,309
n square processors, then n square into n to the
power half, which is n to the power 2 point
234
00:57:35,309 --> 00:57:42,309
5 elements, you have distributed among the
n square processors. Now I employ each processor
235
00:57:44,030 --> 00:57:51,030
to find the sum of its own
square root n elements,
236
00:57:51,349 --> 00:57:58,349
which can be done in order square root n time sequentially.
Now every processor
237
00:58:03,710 --> 00:58:10,710
will retain the sum of its square root n elements,
in order square root n time. Now take these sums.
238
00:58:17,619 --> 00:58:24,619
Now I can sum row wise in order n time. You
need order n time to find the sum of the
239
00:58:33,230 --> 00:58:40,230
sums of square root n elements
row wise, and in the last column you will find
240
00:58:45,760 --> 00:58:52,760
the sum of the n square
root n elements in each row.
241
00:58:53,430 --> 00:59:00,430
Now these sums can be added in order
n time, using the
242
00:59:12,380 --> 00:59:19,380
column of processors, to get the sum of
n to the power 2 point 5 elements. So, the
243
00:59:21,059 --> 00:59:28,059
complexity becomes order n time, so the cost becomes
the number of processors, n square, into the time to
244
00:59:31,930 --> 00:59:38,930
solve this problem, so, n to the power 3. Now
you observe that if I have n to the power
245
00:59:39,180 --> 00:59:46,180
2 point 5 elements, to find the
sum of these n to the power 2 point 5 elements
246
00:59:48,440 --> 00:59:55,440
using n square processors takes order n time,
and the cost, order n cube, is still greater than the sequential cost of summing these
247
01:00:05,480 --> 01:00:06,480
elements.
248
01:00:06,480 --> 01:00:13,480
Can you do better than that? Can you get a cost
optimal parallel algorithm using this idea? So
249
01:00:15,309 --> 01:00:22,309
to do that let us assume there are n cube
elements. Now these n cube elements are distributed
250
01:00:30,160 --> 01:00:37,160
among the n square processors. So, each processor
contains n elements. So the
251
01:00:43,829 --> 01:00:50,829
problem can be redefined as: you have n square
processors and n cube elements, and these n cube
252
01:00:54,930 --> 01:01:01,930
elements are distributed in such a way that every
processor contains n elements. Each processor adds the elements assigned to
253
01:01:03,410 --> 01:01:10,410
it, which takes order n time, and then these
sums are to be added one by one. While you
254
01:01:13,210 --> 01:01:20,210
are adding these you can move the data as if
the processors are linearly connected. In the next step
255
01:01:21,640 --> 01:01:28,640
you add this, in the next step you add this, and
so on. So after order n time you will find
256
01:01:30,130 --> 01:01:37,130
the sums lying in this column. Then you move
the data along this column, again linearly,
257
01:01:37,180 --> 01:01:39,950
and you need another order n time.
258
01:01:39,950 --> 01:01:46,950
So, basically you
need order n time to find the sum of n cube
259
01:01:48,470 --> 01:01:55,470
elements which are stored here. So, this is
the time complexity to find the sum of n cube
260
01:01:58,280 --> 01:02:05,280
elements using n square processors, and the
cost to find the sum of n cube elements becomes
261
01:02:06,660 --> 01:02:13,660
n square processors into order
n time, which is order n cube, which is
262
01:02:14,010 --> 01:02:20,760
cost optimal with respect to the sequential
algorithm, because to find the
263
01:02:20,760 --> 01:02:25,109
sum of n cube elements you need order n cube
additions.
264
01:02:25,109 --> 01:02:32,109
The next model is the hypercube. The idea is
very simple, because we know that a
265
01:02:35,630 --> 01:02:42,630
hypercube of 2 to the power k processors
can be
266
01:02:44,579 --> 01:02:51,579
viewed as a combination of two hypercubes,
each of size 2 to the power k minus 1, and
267
01:02:54,470 --> 01:03:01,470
Pi in one is connected with Pj in the other, where
the two binary representations differ in the MSB.
268
01:03:11,380 --> 01:03:18,380
So, what I can do is
bring the content of Pj into Pi and add it.
269
01:03:22,250 --> 01:03:29,250
Similarly, every Pi in this half gets
the data from its counterpart in the other half and
270
01:03:31,140 --> 01:03:36,730
adds it. Then the size is
reduced from 2 to the power k to 2 to the
271
01:03:36,730 --> 01:03:42,329
power k minus 1, and now you have a hypercube
of size 2 to the power k minus 1. This is
272
01:03:42,329 --> 01:03:49,329
divided into two parts, each of size 2 to
the power k minus 2, and again there exists
273
01:03:52,720 --> 01:03:59,180
a connection between Pi and Pj where the second
MSB is different, and again the data will move
274
01:03:59,180 --> 01:04:06,180
across and be added. So, the size is reduced to
a hypercube of 2 to the power k minus 2, and
275
01:04:06,510 --> 01:04:07,799
so on.
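The halving steps just described can be sketched as a sequential simulation in Python. This is only an illustration under stated assumptions: the function name hypercube_sum and the use of a list to stand for the contents of the processors P0, P1, ... are hypothetical and do not come from the lecture.

```python
def hypercube_sum(values):
    """Simulate the hypercube sum; values[i] is the content of processor Pi."""
    n = len(values)
    k = n.bit_length() - 1
    if n == 0 or (1 << k) != n:
        raise ValueError("number of processors must be a power of two")
    a = list(values)
    # Step s pairs processors whose indices differ only in the s-th MSB.
    for s in range(k):
        half = 1 << (k - 1 - s)   # size of the sub-hypercube that stays active
        for i in range(half):     # conceptually, these additions run in parallel
            a[i] += a[i + half]   # Pi brings the content of its counterpart and adds it
    return a[0]                   # after k steps the total is held by P0


print(hypercube_sum([3, 1, 2, 4, 7, 5, 6, 8]))
```

Each pass of the outer loop stands for one parallel step, so the simulation performs k passes for 2 to the power k values.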
276
01:04:07,799 --> 01:04:14,799
After log n, that is, after k iterations, you find that
the sum is available in the first processor, P0.
277
01:04:18,579 --> 01:04:25,579
So, here, if n is equal to 2
to the power k, then you need order log n time
278
01:04:27,670 --> 01:04:34,670
to find the sum. But the cost becomes order
n log n, because of n processors and log n time. So,
279
01:04:37,780 --> 01:04:43,789
the cost is order n log n. Now, in order to find
a cost optimal algorithm, the idea here
280
01:04:43,789 --> 01:04:50,789
is again the same. Suppose you have n elements.
You define your hypercube of size n by log
281
01:04:57,119 --> 01:05:04,119
n. Each processor initially holds log n elements
and sequentially finds the sum of these
282
01:05:05,720 --> 01:05:12,720
log n elements, which takes order log n time.
Plus, you need to find the sum on the hypercube, which takes order
283
01:05:18,420 --> 01:05:25,420
log of n by log n time. So, the total
is order log
284
01:05:25,730 --> 01:05:32,730
n time, and the number of processors used is n by
log n, so the cost is order n, which is cost optimal
285
01:05:40,109 --> 01:05:45,130
for finding the sum of n elements using n by log
n processors on a hypercube.
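A rough sketch of this two-phase scheme, again as a sequential simulation in Python. The function name cost_optimal_sum, the returned tuple, the rounding of log n, and the padding with zeros so that the processor count is a power of two are all assumptions made for the illustration; the lecture does not specify them.

```python
def cost_optimal_sum(values):
    """Return (total, processors, parallel_steps) for the two-phase scheme."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one element")
    group = max(1, (n - 1).bit_length())   # about log n elements per processor
    # Phase 1: each processor adds its own group sequentially, order log n time.
    partial = [sum(values[i:i + group]) for i in range(0, n, group)]
    steps = group - 1                      # additions performed by each processor
    # Assumption: pad with zeros so the processor count is a power of two.
    p = 1 << (len(partial) - 1).bit_length()
    partial += [0] * (p - len(partial))
    # Phase 2: hypercube reduction over p processors, order log(n / log n) time.
    half = p // 2
    while half >= 1:
        for i in range(half):              # conceptually parallel
            partial[i] += partial[i + half]
        half //= 2
        steps += 1
    return partial[0], p, steps


total, processors, steps = cost_optimal_sum(list(range(1, 17)))
print(total, processors, steps, processors * steps)
```

For 16 elements this uses 4 processors and 5 parallel steps, so the product of processors and steps stays proportional to n rather than to n log n.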
286
01:05:45,130 --> 01:05:52,130
Today we will finish our lecture by
considering the sum of
287
01:05:53,500 --> 01:06:00,500
n numbers on another model, which is known
as the perfect shuffle computer. Remember, in the
288
01:06:11,589 --> 01:06:18,589
case of the perfect shuffle computer,
each processor has three connections,
289
01:06:20,880 --> 01:06:27,880
and here we will be using the shuffle and
exchange operations to perform this addition
290
01:06:28,390 --> 01:06:35,390
of n numbers. Suppose we have
n equal to 2 to the power k processors.
Here, for i equal to 1 to k, for each
291
01:06:53,270 --> 01:07:00,270
processor d, do in parallel, where d goes
from 0 to 2 to the power k minus
292
01:07:22,960 --> 01:07:29,960
1: take the shuffle and then the exchange.
Sorry, the index should be d, not i.
293
01:07:52,960 --> 01:07:59,960
a d is replaced by the shuffle of a d, b d is the
exchange of a d, and a d equals a d plus b d.
294
01:08:05,200 --> 01:08:12,200
So, basically, if you have
d equal to 0, 1, 2, 3, 4,
295
01:08:26,380 --> 01:08:33,380
5, 6, 7, and say the numbers are 3, 1, 2, 4, 7, 5, 6,
8. Then you have a after the shuffle, a after the exchange, and then
296
01:08:55,569 --> 01:09:02,569
the new a d. So, the shuffle is nothing but this:
the shuffle of 0 is 0, so it stays here; the shuffle of 1
297
01:09:19,370 --> 01:09:26,370
moves to 2, then the shuffle of 2 moves to 4, then
the shuffle of 3 moves to 6, the shuffle of 4 moves
298
01:09:43,880 --> 01:09:50,880
to 1, the shuffle of 5 moves to 3,
the shuffle of 6 moves to 5, and
299
01:10:27,469 --> 01:10:34,469
the shuffle of 7 stays at 7. Now, with the exchange,
here you get 3 7 and 7 3, 1 5 and 5 1, 2 6 and 6
300
01:10:55,280 --> 01:10:59,850
2, 4 8 and 8 4.
301
01:10:59,850 --> 01:11:06,850
So, if I add them you get 10, 10, 6, 6, 8, 8, 12, 12.
Next, you again do the shuffle of d, so
302
01:11:11,360 --> 01:11:18,360
this 10 will stay here, then
this 10 will come here, this 6 will come here,
303
01:11:32,000 --> 01:11:39,000
this 8 will move up, this 8 will be here, and
this will be here and this will be here. Now
304
01:11:47,989 --> 01:11:54,989
you perform the exchange after the shuffle and add:
this becomes 18, this becomes 18, this becomes
305
01:11:56,140 --> 01:12:03,140
18, this becomes 18. In the next step,
for the remaining iteration, you do
306
01:12:05,540 --> 01:12:12,540
the shuffle of d again and the exchange of
d, and after you add, you will
307
01:12:13,150 --> 01:12:14,150
get 36.
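The worked example above can be replayed with a short sequential simulation in Python. It is a sketch under assumptions: the function name shuffle_exchange_sum and the returned trace are hypothetical, the shuffle is taken to be a left rotation of the k-bit index, and the exchange is taken to swap neighbours whose indices differ in the last bit.

```python
def shuffle_exchange_sum(values):
    """Return the contents of all processors after each of the k iterations."""
    n = len(values)
    k = n.bit_length() - 1
    if n < 2 or (1 << k) != n:
        raise ValueError("need 2**k values with k >= 1")
    a = list(values)
    trace = []
    for _ in range(k):
        # Shuffle: the value held at d moves to the left rotation of d.
        shuffled = [0] * n
        for d in range(n):
            target = ((d << 1) | (d >> (k - 1))) & (n - 1)
            shuffled[target] = a[d]
        # Exchange: b[d] is the shuffled value held by the neighbour d XOR 1.
        b = [shuffled[d ^ 1] for d in range(n)]
        # Add: every processor adds its own value and its neighbour's value.
        a = [shuffled[d] + b[d] for d in range(n)]
        trace.append(a)
    return trace


for row in shuffle_exchange_sum([3, 1, 2, 4, 7, 5, 6, 8]):
    print(row)
```

For the eight numbers used in the lecture, the three rows are 10 10 6 6 8 8 12 12, then 18 in every position, then 36 in every position, so every processor ends up holding the total.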
308
01:12:14,150 --> 01:12:21,150
So, this is the way we can do it. You
observe that it takes order k time to find
309
01:12:23,420 --> 01:12:30,420
the sum of 2 to the power k elements, and this
can also be turned into a cost optimal parallel
310
01:12:31,960 --> 01:12:36,730
algorithm. In this form it is not
cost optimal; its cost is order n log n.
311
01:12:36,730 --> 01:12:42,570
To get the cost optimal algorithm,
you assume that you have n by log n
312
01:12:42,570 --> 01:12:49,570
processors and n elements. Initially, you
make n by log n groups, each group having
313
01:12:51,230 --> 01:12:57,010
log n elements. Sequentially,
each processor obtains the sum of its log n
314
01:12:57,010 --> 01:13:04,010
elements, and then you proceed as before. You can
easily show that it takes order log n time,
315
01:13:04,040 --> 01:13:11,040
and that it takes order n cost to find the sum
of n numbers using the perfect shuffle computer.
316
01:13:13,590 --> 01:13:20,590
So, you can try one more problem related to finding the
sum of n numbers. Use the CREW model.
317
01:13:27,800 --> 01:13:34,800
Here the condition is a little different:
instead of finding the sum of a i for
318
01:13:35,900 --> 01:13:42,900
all i, I want a i to
be replaced by the summation of a k, where k goes from 1
319
01:13:43,820 --> 01:13:50,820
to i, which is known as the cumulative sum. Basically,
you have a1, a2, a3, and
320
01:13:53,570 --> 01:14:00,570
so on. I want to replace a2 by a1 plus
a2, a3 by a1 plus a2 plus a3, and so
321
01:14:02,679 --> 01:14:09,679
on. So, please try it at home. It is possible
to do this, just as for finding the sum: given
322
01:14:10,080 --> 01:14:14,570
the n elements, you want to find
the cumulative sum
323
01:14:14,570 --> 01:14:21,570
of these n numbers.
Thank you.