1
00:00:18,560 --> 00:00:24,669
So far, we have been discussing the different
paradigms that can be used to solve problems
2
00:00:24,669 --> 00:00:31,669
on sequential machines. Today, we would
like to begin by discussing
3
00:00:35,239 --> 00:00:42,239
parallel algorithms and parallel machines,
and then we have to find out, or define, the
4
00:00:49,079 --> 00:00:56,079
different models available for parallel
machines. And finally, if time permits,
5
00:00:58,760 --> 00:01:04,780
we will discuss one example of how to solve
a problem on these parallel machines. The
6
00:01:04,780 --> 00:01:11,700
problem we will be considering for our study
is, for example, finding the sum
7
00:01:11,700 --> 00:01:18,700
of n numbers, or finding the minimum or
maximum of n numbers.
8
00:01:20,229 --> 00:01:26,579
Now, what has happened? Actually, the demand for
computing power is increasing day by day.
9
00:01:26,579 --> 00:01:33,579
We must agree on that; and the designers, accordingly,
are trying their best to improve the speed of
10
00:01:36,950 --> 00:01:43,950
the computer. If you observe the last thirty
years, they have been trying their best to increase
11
00:01:44,340 --> 00:01:51,340
the speed of computing, and every
few years you observe that the speed
12
00:01:56,119 --> 00:02:02,520
of computing, the speed of the machine,
increases manifold. But then a limitation
13
00:02:02,520 --> 00:02:09,520
is coming up because of the
speed of the components: the components that
14
00:02:11,140 --> 00:02:18,140
will be used in your computer
cannot be faster than the speed of light.
15
00:02:22,150 --> 00:02:29,150
So, you may not be able to go beyond a certain
speed. Now, what are the alternatives? Because
16
00:02:29,909 --> 00:02:36,909
demand is increasing day by day. For
example, initially, if you remember,
17
00:02:39,040 --> 00:02:46,040
our monsoon prediction model was based on 13
parameters, and we had to obtain these thirteen
18
00:02:48,480 --> 00:02:54,640
parameters. Using these 13 parameters,
if I have to predict
19
00:02:54,640 --> 00:03:00,980
something, I need around 24 hours to solve
the problem and to predict. But by that time
20
00:03:00,980 --> 00:03:07,980
the monsoon cloud will have covered,
or will have crossed, the designated area, so
21
00:03:11,319 --> 00:03:18,230
you may not be able to predict anything.
Now, if I have to give you a better prediction
22
00:03:18,230 --> 00:03:25,230
model, then what do we have to do? You have to
predict the monsoon condition based on the
23
00:03:27,769 --> 00:03:34,769
clouds that are still far away from there;
while the cloud is far off, or nearby,
24
00:03:37,110 --> 00:03:44,110
I must be able to say where and when
the cloud will reach this area. In order to
25
00:03:44,540 --> 00:03:51,540
do that you need a larger number of parameters.
But as you increase the parameters, the computation increases
26
00:03:51,760 --> 00:03:58,560
many times over. What we want now is
to increase the number of parameters
27
00:03:58,560 --> 00:04:04,170
for prediction, and at the same time the speed,
the operating speed, should be
28
00:04:04,170 --> 00:04:10,980
such that you are able to predict
the monsoon day.
29
00:04:10,980 --> 00:04:17,980
So, this is a very difficult task. So, the
demand is increasing, and at the same time the
30
00:04:19,130 --> 00:04:26,130
cost of hardware is coming down. So, one solution
could be: instead of using one machine, why not use
31
00:04:28,650 --> 00:04:35,650
several machines to solve a problem? Because
the cost of the machine is decreasing and, at the
32
00:04:36,650 --> 00:04:41,550
same time, the demand for computing
power is increasing. So, what we are suggesting is
33
00:04:41,550 --> 00:04:46,310
that we use several machines to solve
a single problem.
34
00:04:46,310 --> 00:04:53,310
So, what do you have to do? Suppose you have
a problem and you have n machines; say the
35
00:04:54,160 --> 00:05:01,160
problem is P and the machines are M 1 to M
n. Then you divide this problem P into n sub
36
00:05:06,900 --> 00:05:13,900
problems, and the i-th sub problem, P i,
is assigned to machine M i,
37
00:05:15,639 --> 00:05:22,310
and machine M i solves the sub problem P i,
and similarly for all i. So, M 1
38
00:05:22,310 --> 00:05:29,310
solves the sub problem P 1, and so on.
Let S 1 be the solution of
39
00:05:29,590 --> 00:05:36,590
sub problem P 1, S 2 the solution
of sub problem P 2, and S i the solution
40
00:05:36,699 --> 00:05:41,560
of sub problem P i.
Finally, these machines are used together
41
00:05:41,560 --> 00:05:48,560
to combine the results and find the solution
of P. So, the idea is that you want to
42
00:05:52,490 --> 00:05:59,490
use several machines to solve the problem,
and each of the different processors, the different
43
00:06:00,169 --> 00:06:07,169
machines, takes a sub
problem and tries to solve the sub problems
44
00:06:08,979 --> 00:06:15,259
simultaneously, and then the results are combined
to get the final solution. This idea
45
00:06:15,259 --> 00:06:18,240
basically gives us the idea of parallel
machines.
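The divide, solve and combine idea above can be sketched for the sum of n numbers. This is only an illustrative sketch: the function name parallel_sum and the use of a thread pool to stand in for the machines M i are assumptions, not something given in the lecture.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(numbers, n_machines):
    """Split `numbers` into sub problems P_i, solve each, then combine the S_i.

    Hypothetical helper: each pool worker stands in for one machine M_i.
    """
    if n_machines < 1:
        raise ValueError("need at least one machine")
    # Sub problem P_i: every n_machines-th element, starting at offset i.
    sub_problems = [numbers[i::n_machines] for i in range(n_machines)]
    # Each worker computes the partial solution S_i of its sub problem.
    with ThreadPoolExecutor(max_workers=n_machines) as pool:
        partial_solutions = list(pool.map(sum, sub_problems))
    # Combine step: the solution of P is the sum of the partial solutions.
    return sum(partial_solutions)
```

For example, parallel_sum(list(range(1, 101)), 4) splits the hundred numbers among four workers and adds up the four partial sums.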
46
00:06:18,240 --> 00:06:25,240
Now, the algorithm you design for this
parallel machine is known as a parallel algorithm.
47
00:06:25,639 --> 00:06:32,639
Now, what happens is that some problems are
inherently parallel; some sequential algorithms
48
00:06:34,460 --> 00:06:41,460
are inherently parallel, and in that case the problem
is not that difficult: you can easily divide
49
00:06:44,270 --> 00:06:51,270
the problem into sub problems, and
the final solution is obtained when the solutions
50
00:06:52,520 --> 00:06:54,849
of the sub problems are combined.
51
00:06:54,849 --> 00:07:01,849
Say, for example, I have to do
vector addition: I have two vectors and I want
52
00:07:07,919 --> 00:07:14,919
to add them, and there are n machines. What I can
do is this: each vector is divided
53
00:07:15,729 --> 00:07:22,729
into n equal parts. Say the vectors are v
1 and v 2; the addition of the first part
54
00:07:26,919 --> 00:07:33,919
is done by processor P 1,
or machine M 1, the next
55
00:07:37,099 --> 00:07:44,099
is done by machine M 2, and the last is done by machine M n.
So, this is inherent parallelism,
56
00:07:49,340 --> 00:07:56,340
so you can easily achieve a solution of
this problem. But in reality this may not be the
57
00:07:57,740 --> 00:08:04,740
case; you cannot
use, or may not be able to use, the available
58
00:08:05,810 --> 00:08:12,110
sequential algorithm to solve the problem
on parallel machines.
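The vector addition example mentioned earlier can be written out as a small sketch. The helper names split_into_parts and chunked_vector_add are hypothetical, and the chunks are processed one after another here; the point is only that each chunk touches its own indices, so n machines could handle them at the same time.

```python
def split_into_parts(length, n_parts):
    """Return (start, stop) index pairs that cut range(length) into n_parts chunks."""
    base, extra = divmod(length, n_parts)
    bounds, start = [], 0
    for part in range(n_parts):
        # The first `extra` chunks take one additional element.
        stop = start + base + (1 if part < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def chunked_vector_add(v1, v2, n_machines):
    """Add two vectors chunk by chunk; chunk i is the work of machine M_i."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    result = [0] * len(v1)
    for start, stop in split_into_parts(len(v1), n_machines):
        # No two chunks share an index, so the chunks are independent.
        for idx in range(start, stop):
            result[idx] = v1[idx] + v2[idx]
    return result
```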
59
00:08:12,110 --> 00:08:18,520
So, in order to do that you may have to redesign
the whole algorithm,
60
00:08:18,520 --> 00:08:25,520
or redesign the algorithm for your parallel
machines. That is the goal, or aim, of this
61
00:08:29,680 --> 00:08:36,159
course: how to design parallel
algorithms for different problems.
62
00:08:36,159 --> 00:08:43,159
According to Flynn,
the whole class of machines, or computers, can
63
00:09:00,470 --> 00:09:07,470
be divided into 4 classes. These classes
are based on the stream of instructions, or instruction
64
00:09:12,880 --> 00:09:19,880
stream,
and the other one is the data stream.
By instruction stream we mean the set of instructions
65
00:09:32,529 --> 00:09:39,529
to be performed by the different machines, and
by data stream we mean the set of data
66
00:09:43,250 --> 00:09:45,860
used in the computation. So,
Flynn's classification tells us that machines
67
00:09:45,860 --> 00:09:52,860
can be divided into 4 classes. One is known
as single instruction stream
68
00:10:05,880 --> 00:10:12,880
and single data stream;
the next one is multiple instruction stream and
single data stream; then you have single instruction
69
00:10:46,970 --> 00:10:53,970
stream
70
00:10:59,740 --> 00:11:06,740
and multiple data stream. Finally, you have
multiple instruction stream and multiple data
71
00:11:30,250 --> 00:11:37,250
stream. In short, we call them SISD, MISD,
SIMD and MIMD.
72
00:11:48,779 --> 00:11:55,779
So, we classify the whole
class of computers into 4 categories: one
73
00:11:59,389 --> 00:12:06,389
is SISD, another one is MISD, then
SIMD, and finally you have MIMD. Well,
74
00:12:09,279 --> 00:12:11,170
that is it.
75
00:12:11,170 --> 00:12:18,170
Let us discuss the first one, SISD, which is basically the
purely sequential machine.
76
00:12:18,389 --> 00:12:25,389
Here, what happens is that you have
one control unit, you have one processor, and you
77
00:12:43,760 --> 00:12:50,760
have memory. So, the control unit sends
the processor a stream of instructions
78
00:12:56,740 --> 00:13:03,740
to perform, a single stream of instructions.
The processor gets the data from the memory,
79
00:13:05,290 --> 00:13:11,490
performs the operation, and writes the result
back to the memory. This is the simple structure
80
00:13:11,490 --> 00:13:17,040
of SISD, and the usual sequential machine
works on it.
81
00:13:17,040 --> 00:13:24,040
Now, let us discuss MISD. Here,
what happens is that you have several control units,
82
00:13:33,480 --> 00:13:40,480
C 1, C 2, up to C n, and you have processor P
1 attached to control unit 1, P 2 attached to control unit
83
00:13:51,420 --> 00:13:58,420
2, then you have P 3, and you have P
n, and you have memory. So, the control unit
84
00:14:09,620 --> 00:14:16,620
C 1 sends one instruction to P 1, say
instruction A, this one gets instruction B; this may be add, this
85
00:14:22,410 --> 00:14:29,410
may be multiply, and all the processors
get the data from the same memory location.
86
00:14:32,070 --> 00:14:39,070
That is, all
the control units broadcast, or send,
87
00:14:40,660 --> 00:14:47,660
instructions to the processors for different types
of operation, but the operations are performed
88
00:14:48,470 --> 00:14:55,470
on the same data. In reality, an application
for such a type of model does not exist, and as
89
00:14:58,850 --> 00:15:02,370
a result this model died on the spot.
90
00:15:02,370 --> 00:15:09,370
Now, let us consider the third model, which
is SIMD. Here, we have one control unit and
91
00:15:15,649 --> 00:15:22,649
you have processors P 0, P 1, P 2, up to P n. The control
unit broadcasts a single stream of instructions
92
00:15:36,889 --> 00:15:43,889
to all the processors. Each processor fetches its data either from
the local memory or from the common memory,
93
00:15:52,190 --> 00:15:59,190
based on the model
of the machine, or it can get the data from
94
00:16:02,110 --> 00:16:09,110
another processor, and it performs
the operation. So,
95
00:16:09,459 --> 00:16:16,459
here what happens is that the data may be different
for each processor: while P 0 is getting its data, say,
96
00:16:22,930 --> 00:16:29,380
from location x, P 1 may get it from
location y, this one may be on z, this one may be on
97
00:16:29,380 --> 00:16:34,790
a, and they perform
the operation on the data and write the result to
98
00:16:34,790 --> 00:16:41,790
the common memory or to the local memory.
So, the control unit broadcasts the same set of instructions
99
00:16:41,910 --> 00:16:48,620
to the different processors; each processor gets its data
either from the local memory or from the
100
00:16:48,620 --> 00:16:55,180
common memory, based on the network model,
or from another
101
00:16:55,180 --> 00:17:02,180
processor connected to it; the processor performs the
operations and sends the data to the appropriate
102
00:17:02,569 --> 00:17:09,569
area. So, this is your SIMD model; we will
discuss SIMD later on. And then we have
103
00:17:11,319 --> 00:17:16,360
the multiple instruction stream and multiple
data stream model. Here you have the following.
104
00:17:16,360 --> 00:17:23,360
Basically, you have control unit 1, control unit 2, up to control
unit n; you have processor P 1, processor
105
00:17:25,560 --> 00:17:31,340
P 2, up to processor P n, and these processors are connected either
through a processor interconnection network
106
00:17:31,340 --> 00:17:38,340
or through a common memory. Then control unit
1 sends instructions to processor P 1,
107
00:17:44,310 --> 00:17:51,310
one set of instructions to perform, and P 1
gets its data either from the local area
108
00:17:54,230 --> 00:17:59,210
or from the common memory or from the neighboring
processors.
109
00:17:59,210 --> 00:18:03,200
Similarly, control unit 2 broadcasts another
set of instructions to P 2; P 2 receives that
110
00:18:03,200 --> 00:18:10,190
set of instructions and performs the
operation, taking the data from the common
111
00:18:10,190 --> 00:18:17,190
memory, or from a neighboring processor, or from the
local memory, and so on. So, that is the idea
112
00:18:17,310 --> 00:18:18,220
of MIMD.
113
00:18:18,220 --> 00:18:25,170
So, we will be discussing SIMD and MIMD
in detail, and for most of the algorithms
114
00:18:25,170 --> 00:18:32,170
in this course we will be considering
SIMD for our study. Here, what do you
115
00:18:36,350 --> 00:18:43,350
have? You have one control unit, you have n processors,
and these processors are interconnected: either
116
00:18:55,840 --> 00:19:02,840
through an interconnection network, so that the processors
can communicate among themselves through the inter
117
00:19:13,640 --> 00:19:20,640
connection network, or all the processors
read the data through a common memory.
118
00:19:25,450 --> 00:19:32,450
So, you have one control unit and you have several
processors. The control unit broadcasts the
119
00:19:37,620 --> 00:19:44,620
instructions to the different processors. All the
processors which are active take the
120
00:19:49,380 --> 00:19:56,380
data from the common memory, or they can get
the data from a neighboring processor
121
00:19:58,460 --> 00:20:04,730
through the interconnection network, perform the
operations, and store the result in the
122
00:20:04,730 --> 00:20:11,730
desired area.
The machine, or model, which is based on the
123
00:20:12,080 --> 00:20:19,080
common memory is known as the shared memory model.
In the other kind of machine,
124
00:20:27,260 --> 00:20:33,710
you have a control unit along with
n processors, and the processors can communicate
125
00:20:33,710 --> 00:20:39,700
among themselves through an interconnection network. So,
you get the interconnection based model.
126
00:20:39,700 --> 00:20:46,700
Now, based on the different types of inter
connection, you get different
127
00:20:49,670 --> 00:20:52,400
types of parallel machine.
128
00:20:52,400 --> 00:20:59,400
Now, one could be the model which is known as
the PE to PE model. Here you have a control unit,
129
00:21:07,830 --> 00:21:14,830
you have PE 0 with a memory attached to it,
PE 1 with memory ME 1, its local memory, and you have
130
00:21:23,880 --> 00:21:30,880
PE n minus 1 with ME n minus 1, and finally we
have the interconnection network.
131
00:21:37,140 --> 00:21:44,140
So, this is the PE to PE model. The control unit
sends the instruction to all the processors;
132
00:21:50,350 --> 00:21:57,350
each processor gets its data either
from its own local memory or, if it wants
133
00:21:57,870 --> 00:22:00,060
data from elsewhere, it goes through the interconnection network,
134
00:22:00,060 --> 00:22:07,060
gets the data of another processor, and uses that
for its work. There may be another model, which
135
00:22:07,310 --> 00:22:14,310
is known as the PE to ME model. Here,
you have the control unit, and here you have
136
00:22:24,880 --> 00:22:31,880
PE 0, PE 1, up to PE n minus 1; here you have
the interconnection network;
137
00:22:37,310 --> 00:22:44,310
and here you have ME 0, ME 1, up to ME n minus
1. So, for PE 1 to get data, depending on the
138
00:22:59,020 --> 00:23:04,300
instruction, it can get the data from here
through the interconnection network; and, say,
139
00:23:04,300 --> 00:23:10,440
PE n minus 1 gets its data
from this memory location.
140
00:23:10,440 --> 00:23:15,660
So, these are the 3 models you can think of: one is
known as the shared memory model, one is the PE to PE
141
00:23:15,660 --> 00:23:22,660
model, and another is the PE to ME model. Now, let us
first consider the shared memory model in
142
00:23:24,130 --> 00:23:31,130
detail in this session. Here you have n processors;
the n processors are numbered P 0, P 1, P 2, up to P
143
00:23:39,780 --> 00:23:46,780
n minus 1. The control unit uses these n processors,
and a processor can be made active
144
00:23:48,310 --> 00:23:55,310
or inactive by setting the mask. All the active processors
are allowed to perform the operations, taking
145
00:23:58,900 --> 00:24:05,330
the data from the common memory and then writing
back to the common memory.
146
00:24:05,330 --> 00:24:12,330
Now, based on the different access schemes, we
can classify the shared memory model into
147
00:24:13,810 --> 00:24:20,810
4 groups. One is known as the concurrent read
concurrent write model, which is the least restrictive;
148
00:24:30,120 --> 00:24:37,120
next is concurrent read exclusive write; next is
exclusive
149
00:25:02,430 --> 00:25:09,430
read
concurrent write; and finally the exclusive read
exclusive write model.
150
00:25:31,930 --> 00:25:38,930
Now, what do we mean by concurrent read?
By concurrent read we mean that different
151
00:25:53,160 --> 00:26:00,160
processors, 2 or more processors, are allowed
to read a particular memory location simultaneously.
152
00:26:01,770 --> 00:26:08,770
By exclusive read I mean that no two
processors are allowed to read a particular
153
00:26:15,370 --> 00:26:21,670
memory location simultaneously. Similarly, concurrent write
means that 2 or more processors are allowed
154
00:26:21,670 --> 00:26:28,670
to write data
into the same memory location simultaneously. By exclusive
155
00:26:28,860 --> 00:26:35,860
write we mean that no two processors are allowed
to write simultaneously to a particular memory
156
00:26:38,290 --> 00:26:45,290
location at any instant of time.
Now, in the concurrent read concurrent write
157
00:26:46,690 --> 00:26:47,880
model, what happens is that
2 or more processors
158
00:26:47,880 --> 00:26:54,880
are allowed to read simultaneously from the
same memory location, and are also allowed to write
159
00:27:01,300 --> 00:27:05,520
into the same memory location simultaneously
at any instant of time.
160
00:27:05,520 --> 00:27:12,120
Now, think about this concurrent write, and
you observe that, in reality, allowing different
161
00:27:12,120 --> 00:27:17,140
processors to write to
the same memory location does not have much meaning,
162
00:27:17,140 --> 00:27:24,140
and it is not that easy to handle. However,
we can think about this, or a similar model where concurrent write
163
00:27:26,900 --> 00:27:33,900
is allowed; we will discuss this part shortly.
So, the next model is the concurrent read and
164
00:27:35,530 --> 00:27:42,530
exclusive write model. That means 2 or
more processors are allowed to read the same memory
165
00:27:44,450 --> 00:27:50,040
location simultaneously, but no two processors are allowed
to write to the same memory location at any
166
00:27:50,040 --> 00:27:55,710
instant of time. Now,
the exclusive read concurrent write model does not have much meaning,
167
00:27:55,710 --> 00:28:02,710
because you are not allowing concurrent
read, which is the simpler
168
00:28:02,740 --> 00:28:08,020
problem compared with concurrent write.
So, this model actually did not last for long,
169
00:28:08,020 --> 00:28:14,580
and so we will not discuss this one. And
finally, in exclusive read exclusive write, no two processors are allowed to
170
00:28:14,580 --> 00:28:21,580
read simultaneously at a location, and no
two processors are allowed to write simultaneously
171
00:28:25,480 --> 00:28:32,480
at any instant of time. Now, how do we handle, in
reality, the concurrent read part? Suppose
172
00:28:32,750 --> 00:28:39,750
there are n processors, and the
n processors want to read the same memory
173
00:28:41,430 --> 00:28:48,430
location simultaneously. Then this can be done
through
174
00:28:53,120 --> 00:29:00,120
broadcasting. That is, suppose P 0, P 1, P 2, P
3, P 4, P 5, P 6, P 7 simultaneously want
175
00:29:12,490 --> 00:29:19,490
to read the location L. What we do is: P
0 reads L and writes it into a location, say
176
00:29:23,980 --> 00:29:30,980
B 1. Now, P 1 reads L and P 2 reads B 1 simultaneously,
and they write into the locations
177
00:29:38,780 --> 00:29:45,780
B 2 and B 3. Now, P 3 reads the data from L,
P 4 reads the data from B 1, P 5 reads from
178
00:29:56,150 --> 00:30:03,150
B 2, and P 6 reads from B 3, and they write
into B 4, B 5, B 6 and B 7, and so on.
179
00:30:12,650 --> 00:30:19,650
So, can you tell me how much time you need
to broadcast, for the n processors to read from
180
00:30:22,360 --> 00:30:29,360
location L? What is the time needed to read the
content of location L by n processors? Here,
181
00:30:36,520 --> 00:30:43,180
the time will be of the order of: the first time,
only 1 processor; the second time, 2 processors will
182
00:30:43,180 --> 00:30:50,180
be reading the location; the third time, 4 processors;
the fourth time, 8 processors; and so on.
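The doubling pattern described above (1 reader, then 2, then 4, then 8) can be simulated in a few lines. This is a sketch under assumptions: the function name erew_broadcast and the list standing in for locations L, B 1, B 2 and so on are illustrative, and each round models one step in which no cell is read by more than one processor.

```python
def erew_broadcast(value, n):
    """Simulate n processors reading one value without any concurrent read.

    Returns (values received by P0..P(n-1), number of rounds used).
    """
    cells = [value]   # cells[0] plays the role of location L
    received = []     # values obtained so far by P0, P1, ...
    rounds = 0
    while len(received) < n:
        # As many readers as there are copies, one reader per cell.
        readers = min(len(cells), n - len(received))
        fresh = [cells[c] for c in range(readers)]
        received.extend(fresh)
        # Each reader writes its copy into a new cell (B1, B2, ...).
        cells.extend(fresh)
        rounds += 1
    return received, rounds
```

With 8 processors the rounds serve 1, 2, 4 and then the 1 remaining processor, so the number of rounds grows like log n rather than n.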
183
00:30:51,960 --> 00:30:58,960
So, basically, you can find that after log n iterations,
after of the order of log n iterations, all the
184
00:31:00,890 --> 00:31:07,890
processors, all the n processors, will be able
to read the location L. So, even
185
00:31:08,280 --> 00:31:15,280
if you do not have the opportunity to design
a model with concurrent read, through broadcasting we
186
00:31:17,480 --> 00:31:24,480
can handle the situation.
Now, how do we handle the concurrent
187
00:31:25,560 --> 00:31:32,560
write part? There are different models
for concurrent write. One
188
00:31:35,890 --> 00:31:40,180
possible approach is this: suppose n processors
want to write simultaneously to a particular
189
00:31:40,180 --> 00:31:45,230
location. One way to do that is to put the sum of
the
190
00:31:45,230 --> 00:31:52,230
values to be written into the location. That is,
suppose P 1 wants to write x 1, P 2 wants to write
191
00:31:53,980 --> 00:32:00,980
x 2, and P n wants to write x n; one possibility
is that you take the sum of the x i and
192
00:32:03,130 --> 00:32:10,130
write it into location L. Another way could
be that the processor with the smallest index,
193
00:32:13,470 --> 00:32:18,100
the processor which has the smallest index, is
allowed to write, or one of them chosen at random is allowed
194
00:32:18,100 --> 00:32:25,100
to write; or you can define
some other way, so that you can handle the
195
00:32:26,760 --> 00:32:27,890
problem of concurrent write.
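The three ways of resolving a concurrent write mentioned above (write the sum, let the smallest index win, or pick one writer at random) can be sketched as follows. The function name, the policy strings and the dictionary of requests are all hypothetical and chosen only for illustration.

```python
import random


def resolve_concurrent_write(requests, policy="sum", rng=None):
    """Decide what ends up in location L when several processors write at once.

    requests: dict mapping processor index i to the value x_i it wants to write.
    policy:   "sum", "min_index" or "random" (illustrative names).
    """
    if not requests:
        raise ValueError("no processor is writing")
    if policy == "sum":
        # Store the sum of all the x_i.
        return sum(requests.values())
    if policy == "min_index":
        # The processor with the smallest index is allowed to write.
        return requests[min(requests)]
    if policy == "random":
        # One of the competing processors is chosen at random.
        rng = rng or random.Random()
        return requests[rng.choice(sorted(requests))]
    raise ValueError(f"unknown policy: {policy}")
```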
196
00:32:27,890 --> 00:32:34,600
Now, let us discuss some of the models based
on the interconnection network.
197
00:32:34,600 --> 00:32:41,600
The first model is known as the mesh connected
computer. Suppose you have n processors, P
198
00:33:15,820 --> 00:33:22,820
0, P 1, up to P n minus 1; these are the n processors
you have. Now, these processors are arranged in
199
00:33:26,390 --> 00:33:33,390
the form of a q dimensional array of size n 1 by n 2 by ...
by n q; this is the q dimensional array, where
200
00:33:46,200 --> 00:33:53,200
n 1 into n 2 into ... into n q is your n,
and P i 1 i
201
00:34:21,639 --> 00:34:28,639
2 ... i q is the processor at
position i 1 in the first dimension, i 2 in the
202
00:34:37,080 --> 00:34:41,409
second dimension, and i q in the q-th dimension.
203
00:34:41,409 --> 00:34:48,409
Now, each processor has at
most 2 q connections;
that is, P i 1 i 2 ... i q
204
00:35:13,600 --> 00:35:20,600
is connected with P i 1 i 2 ... i j plus or minus
1 ... i q, for all j, j equal to 1 to
205
00:35:34,530 --> 00:35:39,040
q.
So, i 1 i 2 ... i q is the index
206
00:35:39,040 --> 00:35:46,040
of the processor, and it is connected
with P i 1 i 2 ... i j plus or minus 1 ... i q,
207
00:35:47,700 --> 00:35:54,700
provided
208
00:36:03,100 --> 00:36:08,790
they exist. So, let us consider the 2 dimensional case,
a 2 dimensional
209
00:36:08,790 --> 00:36:14,290
mesh connected computer, and also consider
that there are 16 processors and the processors
210
00:36:14,290 --> 00:36:21,290
are arranged 4 cross 4.
211
00:36:34,790 --> 00:36:41,790
This is the 4 cross 4 array of processors, and you have,
say, P 1 1, P 1 2, P 1 3, P 1 4, P 2 1, P
212
00:36:53,700 --> 00:37:00,700
2 2, P 2 3, P 2 4, and so on. So, these are the 16 processors.
Now, we can say that P 1 1 is connected
213
00:37:21,580 --> 00:37:28,580
with P 1 2 and P 2 1, and also
this connection is bidirectional; that
214
00:37:39,010 --> 00:37:46,010
is why we are taking the plus or minus sign. So, this
is the two-way mesh connected computer.
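The connection rule above (change one coordinate by plus or minus 1, provided the neighbour exists) can be sketched directly. The function name mesh_neighbors and the use of 1-based coordinate tuples are assumptions made for this illustration.

```python
def mesh_neighbors(index, dims):
    """Neighbours of processor P(i1, ..., iq) in a q dimensional mesh.

    index: tuple of 1-based coordinates (i1, ..., iq).
    dims:  tuple of sizes (n1, ..., nq).
    """
    neighbors = []
    for j in range(len(dims)):
        for step in (-1, 1):
            coord = index[j] + step
            # Keep the neighbour only if it lies inside the mesh.
            if 1 <= coord <= dims[j]:
                neighbors.append(index[:j] + (coord,) + index[j + 1:])
    return neighbors
```

In a 4 cross 4 mesh, mesh_neighbors((1, 1), (4, 4)) gives the two neighbours of the corner processor, while an interior processor gets the full 2 q = 4 connections.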
215
00:37:48,830 --> 00:37:55,830
Now, here we observe that the processors, as
we have labelled them, are in the form P i j; in reality,
216
00:37:56,510 --> 00:38:03,510
processors are numbered 0, 1, up to n minus
1. So, there is a need to introduce some
217
00:38:06,590 --> 00:38:13,590
indexing scheme so that P i can be mapped on
to P j k.
218
00:38:17,890 --> 00:38:24,890
Several schemes exist for
this type of indexing. One is known as the row major
219
00:38:38,360 --> 00:38:45,360
indexing scheme. Say P i is mapped to P
j k, where P i is the processor
220
00:38:52,330 --> 00:38:59,330
with index i, and P j k is the processor
in the j-th row and the k-th column.
221
00:39:05,950 --> 00:39:12,950
Now, since it is the row major
indexing scheme, the numbering should look like
222
00:39:15,250 --> 00:39:22,250
P 0, P 1, P 2, P 3, P 4, P 5, P 6, P 7, P 8, P
9, P 10, P 11, P 12, P 13, P 14 and P 15. So, we
223
00:39:51,810 --> 00:39:58,810
have to find out the relationship;
say, P 3 2 should point to P 9, or P
224
00:40:01,150 --> 00:40:08,150
9 should occupy the position of the processor
at the third row and the second column.
225
00:40:08,250 --> 00:40:15,250
So, what is the relationship between i, j and
k? i is equal to (j minus 1) n plus (k minus 1), where n is the number of columns. That is, when j is equal
226
00:40:45,330 --> 00:40:52,330
to 1 and k is equal to 1, i is equal to
0; when j is equal to 1 and k
227
00:40:54,000 --> 00:41:01,000
is equal to 2, i is equal to 1; when j is equal to
2 and k is equal to 1, that is 4; and so on. So, this
228
00:41:03,560 --> 00:41:09,250
is the relationship between i and j and k.
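The row major relationship can be checked with a short sketch. It assumes 1-based row and column numbers j and k, processor indices starting at 0, and n_cols columns per row (4 in the lecture's example); the function names are hypothetical.

```python
def row_major_index(j, k, n_cols):
    """Index i of the processor at row j, column k (both 1-based)."""
    return (j - 1) * n_cols + (k - 1)


def row_major_position(i, n_cols):
    """Inverse mapping: (row, column) of processor P_i, both 1-based."""
    return i // n_cols + 1, i % n_cols + 1
```

For the 4 cross 4 mesh, row_major_index(3, 2, 4) evaluates (3 - 1) * 4 + (2 - 1), which is the P 9 quoted in the lecture.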
229
00:41:09,250 --> 00:41:16,250
Similarly, we can have the numbering scheme
as a column major indexing scheme. Then
230
00:41:21,550 --> 00:41:28,550
there is another indexing scheme we can define,
which is known as the snake like row major indexing
231
00:41:30,240 --> 00:41:37,240
scheme. In the snake like row major indexing
scheme, the numbering looks like this:
232
00:42:01,990 --> 00:42:08,990
it is in the form of a snake, or rope.
In that case, what should be the relationship
233
00:42:13,320 --> 00:42:20,320
between P i and P j k? This can be written for
P i and P j k. Now, you observe that when
234
00:42:28,190 --> 00:42:35,190
the row number is odd, it is the same as before;
that is, i is equal to (j minus 1) n plus
235
00:42:47,070 --> 00:42:54,070
(k minus 1) if j is odd. Now, if j is even,
then i is (j minus 1) into n
236
00:43:11,450 --> 00:43:18,450
plus (n minus k). Let us see: for P 2 1, that
is 4 plus (4 minus 1), 4 plus 3, which is 7; and then
237
00:43:30,660 --> 00:43:37,660
for P 2 4, the second term becomes 0 and that
is 4. So, when j is even, it is (j minus 1)
238
00:43:38,180 --> 00:43:45,180
n plus (n minus k). So, this is the snake
like row major indexing scheme, and similarly you
239
00:43:47,370 --> 00:43:48,910
can define a snake like column
240
00:43:48,910 --> 00:43:55,910
major indexing scheme. Another scheme looks
like this; this is known as the shuffled row major
241
00:44:00,860 --> 00:44:07,860
indexing scheme. Suppose the processor P i
occupies, in the array, the location P j
242
00:44:09,990 --> 00:44:16,990
k, the j-th row and the k-th column, under
the row major indexing scheme, and the binary representation
243
00:44:34,110 --> 00:44:41,110
of i is b 1 b 2 b 3 ... b q. Then the shuffle of i
is defined as b 1 b q by 2 plus 1, b 2 b q
244
00:45:02,500 --> 00:45:09,500
by 2 plus 2, ..., b q by 2 b q.
Then what we do is: if this is i dash,
245
00:45:12,910 --> 00:45:19,910
then P i dash occupies the location P j
k in the shuffled row major indexing scheme.
246
00:45:25,220 --> 00:45:32,220
The idea is: suppose P i is at P j k, meaning the j-th
row and the k-th column of the 2 dimensional
247
00:45:35,590 --> 00:45:42,270
array under the row major indexing scheme,
and if I convert i into binary, its representation
248
00:45:42,270 --> 00:45:49,270
is b 1 b 2 b 3 ... b q. Then we define
the shuffle of i as b 1 b q by 2 plus 1, b 2 b q by
249
00:45:54,940 --> 00:46:01,940
2 plus 2, ..., b q by 2 b q; this is i dash.
So, P i dash will be
250
00:46:04,500 --> 00:46:09,990
occupying the position at the j-th row and the
k-th column.
251
00:46:09,990 --> 00:46:16,990
In the row major indexing scheme we have P 0, P 1, P 2, P 3, P
4, P 5, P 6, P 7, P 8, P 9, P 10, P 11, and so on. Now,
252
00:46:39,200 --> 00:46:46,200
let us consider i as 0 0 0 0; this is your
i. What is your i dash? i dash is the same thing,
253
00:46:51,170 --> 00:46:58,170
0 0 0 0. You have 0 0 0 1, and
this gives you 0 0 0 1. Then
254
00:47:07,310 --> 00:47:14,310
you have 0 0 1 0; this will be 0 1 0 0. Then
you have 0 0 1 1; this will be 0 1 0 1. Then,
255
00:47:18,960 --> 00:47:25,960
next, you have 0 1 0 0; this will be 0
0 1 0. You have 0 1 0 1, which gives 0 0 1 1. Then
256
00:47:32,410 --> 00:47:39,410
you have 0 1 1 0, so it is 0 1 1 0. You have
0 1 1 1; still it is 0 1 1 1. Then you have
257
00:47:51,510 --> 00:47:58,490
1 0 0 0, and this is 1 0 0 0. Then you
have 1 0 0 1, so it is 1 0 0 1. Then you have
258
00:47:58,490 --> 00:48:05,490
1 0 1 0; this
becomes 1 1 0 0. Then you have 1 0
259
00:48:12,900 --> 00:48:19,900
1 1, so it becomes 1 1 0 1. Then you have 1
1 0 0; this becomes 1 0 1 0. Then you have 1 1
260
00:48:30,350 --> 00:48:37,350
0 1; this is 1 0 1 1. Then you have 1 1 1
0, so this is 1 1 1 0. Then you have all 1s,
261
00:48:44,130 --> 00:48:44,960
which is all 1s.
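Both the snake like row major formula and the bit shuffle worked through above can be checked with a short sketch. The function names are hypothetical; rows and columns are 1-based, indices start at 0, and the shuffle assumes an even number of bits q, as in the 4 bit example.

```python
def snake_row_major_index(j, k, n_cols):
    """Index of the processor at row j, column k under snake like row major order."""
    if j % 2 == 1:
        # Odd rows run left to right, exactly as in plain row major order.
        return (j - 1) * n_cols + (k - 1)
    # Even rows run right to left.
    return (j - 1) * n_cols + (n_cols - k)


def shuffled_index(i, q):
    """Shuffle the q bits b1 ... bq of i into b1 b(q/2+1) b2 b(q/2+2) ... b(q/2) bq."""
    if q % 2 != 0:
        raise ValueError("q is assumed to be even in this sketch")
    bits = format(i, f"0{q}b")  # bits[0] is b1, the most significant bit
    half = q // 2
    interleaved = "".join(bits[t] + bits[half + t] for t in range(half))
    return int(interleaved, 2)
```

For instance, shuffled_index(2, 4) turns 0010 into 0100, matching the table above, and snake_row_major_index(2, 1, 4) reproduces the value 7 computed in the lecture.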
262
00:48:44,960 --> 00:48:51,960
So, in that case the shuffled row major indexing
scheme becomes: P 0, P 1, then this is 2, and 2
263
00:48:52,590 --> 00:48:59,590
maps to 4, so this becomes 4; this
becomes 5; this becomes 2; this becomes 3;
264
00:49:04,930 --> 00:49:11,930
this becomes 6; this becomes 7; this becomes
8; this becomes 9; this is 10; this is 11; 12,
265
00:49:18,340 --> 00:49:25,340
13, 14, 15. So, basically, you can think about
it this way.
266
00:49:33,380 --> 00:49:40,380
So, in generalized form, you can define it this way: suppose
you have 16 processors on this side, on this side
267
00:49:42,460 --> 00:49:49,460
another 16, and on this side 16, for this 32 cross
32, and within each block of 16 you can define it
268
00:49:50,000 --> 00:49:57,000
like this, and so on. So, here this is P 0,
P 1, up to P 15; here you will have P 16, P 17,
269
00:50:09,780 --> 00:50:16,780
up to 31 here, and so on. You observe that for
the 2 d mesh connected computer you have
270
00:50:21,950 --> 00:50:24,550
at most 4 connections.
271
00:50:24,550 --> 00:50:31,550
Now, in the boundary since you may have 2
or 3 connections and there are for example,
272
00:50:37,369 --> 00:50:44,369
you have p 0 p 1 p 2 p 3 p 4 p 5 p 6 p 7 p
8 p 9 10 11 12 13 14 15. So, p 9 you observe
273
00:51:05,300 --> 00:51:10,580
direct this is a 4 connections but, in the
case of p 0 you will have the 2 connection
274
00:51:10,580 --> 00:51:16,140
while p 8 it is having 3 connections so on.
Now, there is another model, which is known
275
00:51:16,140 --> 00:51:23,140
as the wrap-around mesh-connected
computer, and here it means that P0 is connected
276
00:51:29,490 --> 00:51:36,490
with P12, or P13 is connected with P1, and
so on. Similarly, as is the case with these
277
00:51:39,170 --> 00:51:46,140
P's, every processor will have
4 connections.
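A minimal sketch of the wrap-around connections, again assuming row-major numbering on a 4 cross 4 mesh and assuming that each row and each column simply closes into a ring. `wraparound_neighbors` is a hypothetical name.

```python
def wraparound_neighbors(i, side):
    """Neighbors of processor i when every row and column wraps around."""
    row, col = divmod(i, side)
    up = ((row - 1) % side) * side + col
    down = ((row + 1) % side) * side + col
    left = row * side + (col - 1) % side
    right = row * side + (col + 1) % side
    return [up, down, left, right]


if __name__ == "__main__":
    print(0, wraparound_neighbors(0, 4))    # includes 12
    print(13, wraparound_neighbors(13, 4))  # includes 1
```

Under these assumptions P0 reaches P12 through the column wrap and P13 reaches P1 the same way, so every processor has exactly 4 connections.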
278
00:51:46,140 --> 00:51:53,140
So, the ILLIAC IV, or ILLIAC machines, are of this
type. This model, even though it looks a little
279
00:52:03,770 --> 00:52:10,770
complex, has homogeneity, so it
is easy to understand and to implement
280
00:52:19,040 --> 00:52:23,520
algorithms on this model. Remember, here
the q-dimensional mesh-connected computer
281
00:52:23,520 --> 00:52:25,720
needs 2q connections.
282
00:52:25,720 --> 00:52:32,720
Now, the next model is known as the perfect shuffle
computer. Let us assume there are n processors,
283
00:52:48,380 --> 00:52:55,380
P0, P1, P2, P3, ..., P_{n-1}; these are the n processors.
And let us see the binary representation
284
00:53:02,050 --> 00:53:09,050
of i. For simplicity, let us assume
n is equal to 2 to the power q. For this
285
00:53:28,560 --> 00:53:35,220
q, can you tell me what should be the number
of bits here? There will be q bits. And let
286
00:53:35,220 --> 00:53:42,220
us denote them i_{q-1}, i_{q-2}, down to
i_0; that is, the binary representation of i
287
00:53:43,430 --> 00:53:50,430
can be expressed in the form i_{q-1}
i_{q-2} ... i_0.
288
00:53:50,720 --> 00:53:57,720
Then P_i is connected with 3 processors,
provided they exist: P_j, P_k, and P_l, where j is
289
00:54:07,030 --> 00:54:14,030
obtained by the operation known as exchange. So, the
exchange of i is nothing but i_{q-1}
290
00:54:23,950 --> 00:54:30,950
i_{q-2} ... i_1 followed by the complement of i_0. So, j
is nothing but the exchange of i, where
291
00:54:39,760 --> 00:54:46,760
exchange is defined by i_{q-1} i_{q-2} ... i_1
and the complement of i_0. Now, k is known as the shuffle
292
00:54:49,210 --> 00:54:56,210
of i, which is defined as i_0
i_{q-1} i_{q-2} ... i_1, and l is the unshuffle
293
00:55:03,190 --> 00:55:10,190
of i, which is defined as i_{q-2} i_{q-3}
... i_0 i_{q-1}. So, basically, every processor
294
00:55:21,000 --> 00:55:28,000
is connected with
at most three processors. Remember, in the
295
00:55:28,810 --> 00:55:35,810
case of the two-dimensional mesh every processor is
connected with at most four processors; here, that means, we
296
00:55:37,730 --> 00:55:44,730
need a smaller
number of connections for
297
00:55:46,150 --> 00:55:53,150
any processor. Now, to illustrate
this with an example, let us consider
298
00:55:58,859 --> 00:56:02,070
n equal to 8.
299
00:56:02,070 --> 00:56:09,070
See how it looks. So, you have P0, and so on; say the
processor index i is 0, 1, 2, 3, 4, 5, 6, 7. These
300
00:56:21,590 --> 00:56:28,590
are the 8 processors we have taken. Then you have
j, which is the exchange operation,
301
00:56:33,920 --> 00:56:40,920
exchange of i; you have k, which is nothing but the
shuffle of i; and you have l, which is the unshuffle
302
00:56:49,240 --> 00:56:56,240
of i.
So, the exchange of 0 is 1. In binary, 0 is nothing but
303
00:56:58,080 --> 00:57:05,080
triple 0, 1 is nothing but 0 0 1, then it is
0 1 0, 0 1 1, 1 0 0, 1 0 1, 1 1 0, and all 1s. So, for the rest, the
304
00:57:13,970 --> 00:57:20,970
exchange is nothing but: this is 0, this is
3, this is 2, this is 5, this is 4, this is 7,
305
00:57:26,490 --> 00:57:33,490
and this is 6. Shuffle of i: this is 0, this
is 4, this is 1, this is 5, this is 2, this is
306
00:57:46,230 --> 00:57:53,230
6, this is 3, this is 7. Unshuffle: this is
0, this is 2, this is 4, this is 6, this is 1,
307
00:58:17,510 --> 00:58:24,510
this is 3, this is 5, this is 7.
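The three columns of this table can be recomputed with a short sketch. It follows the definitions as stated in this lecture (shuffle moves i_0 to the front, unshuffle moves i_{q-1} to the end); the function names and the use of Python integers for the indices are choices made only for illustration.

```python
def exchange(i):
    """Complement the last bit i_0."""
    return i ^ 1


def shuffle(i, q):
    """i_{q-1} ... i_1 i_0  ->  i_0 i_{q-1} ... i_1 (as defined in the lecture)."""
    return ((i & 1) << (q - 1)) | (i >> 1)


def unshuffle(i, q):
    """i_{q-1} i_{q-2} ... i_0  ->  i_{q-2} ... i_0 i_{q-1}."""
    mask = (1 << q) - 1
    return ((i << 1) & mask) | (i >> (q - 1))


if __name__ == "__main__":
    q = 3  # n = 2**q = 8 processors
    print("i  exchange  shuffle  unshuffle")
    for i in range(1 << q):
        print(i, exchange(i), shuffle(i, q), unshuffle(i, q))
```

Unshuffle undoes shuffle, so applying one after the other gives back the original index.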
308
00:58:26,500 --> 00:58:33,500
So, i is connected with its exchange, shuffle,
and unshuffle. Now, if I have to draw it, then
309
00:58:34,770 --> 00:58:41,770
let us do it: P0, P1, P2, P3, P4, P5, P6,
P7. Now, P0 is connected with P1 and with P0 itself.
310
00:58:59,660 --> 00:59:06,660
P1 is connected with P0, P4, and P2.
P2 is connected with P3, P1, and P4. P3
311
00:59:24,350 --> 00:59:31,350
is connected with P2, P5, and P6. P4 is connected
with P5, P2, and P1. P5 is connected with P4,
312
00:59:48,380 --> 00:59:55,380
P6, and P3. P6 is connected with P7,
P3, and P5. P7 is connected with P6, P7,
313
01:00:11,010 --> 01:00:18,010
and P7.
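The drawing can also be written down as a connection list. This is only a sketch: it uses the exchange, shuffle, and unshuffle definitions from this lecture, drops the connections that lead back to the same processor (as for P0 and P7), and the name `perfect_shuffle_links` is hypothetical.

```python
def perfect_shuffle_links(q):
    """Map each processor index to the distinct processors it is linked with."""
    n = 1 << q
    mask = n - 1
    links = {}
    for i in range(n):
        partners = {
            i ^ 1,                               # exchange
            ((i & 1) << (q - 1)) | (i >> 1),     # shuffle
            ((i << 1) & mask) | (i >> (q - 1)),  # unshuffle
        }
        partners.discard(i)  # P0 and P7 shuffle onto themselves
        links[i] = sorted(partners)
    return links


if __name__ == "__main__":
    for i, partners in perfect_shuffle_links(3).items():
        print(f"P{i}:", ", ".join(f"P{p}" for p in partners))
```

For n equal to 8 no processor ends up with more than three links, and P0 and P7 are left with only the exchange link.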
So, this is the structure of the perfect shuffle
314
01:00:18,500 --> 01:00:22,640
computer. Now, one thing
I want to tell you: the perfect shuffle computer
315
01:00:22,640 --> 01:00:29,640
not only has a smaller number of connections,
it is also based on 2 important properties.
316
01:00:31,609 --> 01:00:38,609
One property is this: suppose d is the data available
in P_i. Then, after
317
01:00:42,040 --> 01:00:48,880
q shuffles, when n is equal to 2 to the power
q,
318
01:00:48,880 --> 01:00:55,740
where n is the number of processors,
the data will come back to
319
01:00:55,740 --> 01:00:59,310
its original position. So, property 1 is that
if there are 2 to the power q processors,
320
01:00:59,310 --> 01:01:06,310
then after q
shuffles the data of each processor comes
321
01:01:10,260 --> 01:01:12,540
back to its original position.
322
01:01:12,540 --> 01:01:18,220
This is because there are q bits; you shuffle
q times, so each bit goes back to its original
323
01:01:18,220 --> 01:01:25,220
position. The second property is this: suppose data
x is in P_i and data y is in P_j,
324
01:01:31,520 --> 01:01:38,520
and the binary representation
of i and the binary representation
325
01:01:44,740 --> 01:01:51,740
of j differ only in the
(q-k)-th bit.
326
01:01:56,980 --> 01:02:03,980
That is, the binary representation of i
and the binary representation of j differ only
327
01:02:05,380 --> 01:02:12,380
in the (q-k)-th bit. Then, after k shuffles,
the data will come to adjacent locations.
328
01:02:15,599 --> 01:02:22,599
Because, say, you have i_{q-1}, i_{q-2},
and so on, and here you get this bit,
329
01:02:27,990 --> 01:02:34,990
and then here you have i_0; and the other is the same,
except that here it is the complement of this bit, and here i_0.
330
01:02:37,359 --> 01:02:43,700
After the shuffles, this will come here and
this will come here. So, they will be in adjacent
331
01:02:43,700 --> 01:02:50,490
locations.
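Both properties can be checked by brute force for a small q. The sketch below uses the shuffle as defined in this lecture (i_0 moves to the front). For the second property it assumes that the (q-k)-th bit is counted from the left starting at 1, which makes it the bit i_k; that numbering is an assumption, not something stated explicitly. "Adjacent" is read here as "the two indices differ only in the last bit", so that one exchange connects them. All function names are illustrative.

```python
def shuffle(i, q):
    """i_{q-1} ... i_1 i_0  ->  i_0 i_{q-1} ... i_1."""
    return ((i & 1) << (q - 1)) | (i >> 1)


def shuffle_times(i, q, k):
    """Apply the shuffle k times to index i."""
    for _ in range(k):
        i = shuffle(i, q)
    return i


def returns_after_q_shuffles(q):
    """Property 1: q shuffles bring every index back to where it started."""
    return all(shuffle_times(i, q, q) == i for i in range(1 << q))


def adjacent_after_k_shuffles(q):
    """Property 2 (assumed bit numbering): indices differing only in bit i_k
    differ only in the last bit after k shuffles."""
    for k in range(q):
        for i in range(1 << q):
            j = i ^ (1 << k)  # flip bit i_k only
            if shuffle_times(i, q, k) ^ shuffle_times(j, q, k) != 1:
                return False
    return True


if __name__ == "__main__":
    print(returns_after_q_shuffles(3))
    print(adjacent_after_k_shuffles(3))
```

Each shuffle is a rotation of the q index bits by one place, which is why q of them restore the index and k of them bring bit i_k into the last position.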
So, for the perfect shuffle computer, most of
332
01:02:50,490 --> 01:02:57,490
the algorithms depend
on these 2 properties. One property is that if
333
01:02:58,210 --> 01:03:05,210
data x is in P_i and y is in P_j, and
the binary representation of i and the binary
334
01:03:05,820 --> 01:03:12,820
representation of j differ only
in 1 bit, the (q-k)-th bit, then
335
01:03:13,780 --> 01:03:20,690
after k shuffles the data will come to
adjacent locations. And the other one is that
336
01:03:20,690 --> 01:03:24,730
if you have 2 to the power q processors,
then after q shuffles, or q unshuffles, the data
337
01:03:24,730 --> 01:03:31,730
will come back to the original position.
Now, in the next class, the models we would like to consider
338
01:03:41,770 --> 01:03:48,770
are these: the
first one is the butterfly model, the second one is
339
01:03:49,970 --> 01:03:56,970
the hypercube, the third one is the cube-connected
cycles, and the next one is the tree model,
340
01:04:02,609 --> 01:04:07,200
then the one-dimensional pyramid and two-dimensional
pyramid models. So, these are the models you have
341
01:04:07,200 --> 01:04:14,200
to consider as models of SIMD machines.
We have already covered the mesh-connected and perfect
342
01:04:29,750 --> 01:04:36,750
shuffle computers. We will be doing the butterfly,
the hypercube, the cube-connected cycles, the tree
343
01:04:52,240 --> 01:04:59,240
model, the linear array, the one-dimensional pyramid
model, and then the two-dimensional pyramid
344
01:05:28,030 --> 01:05:34,790
model. So, these are the well-known
models which we would like to consider. We have already
345
01:05:34,790 --> 01:05:41,790
discussed the mesh-connected and perfect
shuffle models, and the remaining models we
346
01:05:44,030 --> 01:05:46,000
will be discussing tomorrow.