1
00:00:17,920 --> 00:00:23,950
Good afternoon, welcome back again. We continue
with our discussion of probability theory,
2
00:00:23,950 --> 00:00:28,900
and this time, I am going to begin a topic
called Conditional Probability in a couple
3
00:00:28,900 --> 00:00:32,610
of minutes, we will be looking at examples,
which are directly taken from probability
4
00:00:32,610 --> 00:00:38,600
theory, and these are all based on what we
call conditional probability, a very, very
5
00:00:38,600 --> 00:00:43,010
important concept in the theory of probability.
6
00:00:43,010 --> 00:00:49,320
If you look at my screen here, I have the
union defined here. The
7
00:00:49,320 --> 00:00:55,720
union, A union B, is basically this:
any object or any outcome
8
00:00:55,720 --> 00:01:02,720
that falls either in A or in B is also in
A union B, and therefore,
9
00:01:03,170 --> 00:01:09,380
if you look at the colored regions here,
you look at the blue and the yellow; everything
10
00:01:09,380 --> 00:01:15,210
that is either in blue or in yellow or of
course in both, they all turn out to have
11
00:01:15,210 --> 00:01:21,420
the property that they belong to A union B. That
is one, and you already know about the
12
00:01:21,420 --> 00:01:25,310
intersection, which is the green area right
in the middle that is the intersection part.
13
00:01:25,310 --> 00:01:30,500
And as I told you the formula here is the
probability of A union B which is like finding
14
00:01:30,500 --> 00:01:36,799
an outcome either in A or in B; that turns
out to be P(A) plus P(B) minus P(A intersection
15
00:01:36,799 --> 00:01:39,219
B). This we have seen before.
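The union rule just stated can be checked by counting. The sketch below is not from the lecture: it assumes a hypothetical experiment, one roll of a fair die, with assumed events A (the roll is even) and B (the roll is greater than 3), and compares the probability of A union B counted directly against P(A) plus P(B) minus P(A intersection B).

```python
from fractions import Fraction

# Hypothetical sample space (an assumption for illustration):
# one roll of a fair six-sided die.
outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # assumed event: the roll is even
B = {4, 5, 6}   # assumed event: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

# Count the union directly, then apply the addition rule.
p_union_direct = prob(A | B)
p_union_formula = prob(A) + prob(B) - prob(A & B)

print(p_union_direct, p_union_formula)
```

Subtracting P(A intersection B) is what stops the outcomes 4 and 6, which lie in both events, from being counted twice.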
16
00:01:39,219 --> 00:01:45,119
Let us move into an example now. The
example goes as follows: again I have got
17
00:01:45,119 --> 00:01:50,570
kids writing tests. I had given them
a test and I have counted how many got A and
18
00:01:50,570 --> 00:01:55,499
how many got less than A. And also I have
kept track of how many were male and how many
19
00:01:55,499 --> 00:01:59,320
were females, how many girls were there, how
many boys were there. The questions that are
20
00:01:59,320 --> 00:02:04,780
being raised again are what we had before,
which is like, what is the probability that
21
00:02:04,780 --> 00:02:11,390
a randomly picked student is
going to be male, or that a randomly picked student
22
00:02:11,390 --> 00:02:18,390
is going to be scoring A,
he or she would have scored A, and what
23
00:02:18,810 --> 00:02:25,630
is the probability that the person I picked
is male, and he scored A, what is that probability
24
00:02:25,630 --> 00:02:31,519
And immediately it turns out this is
a matter of intersection.
25
00:02:31,519 --> 00:02:35,670
Now, so far there is not much of a problem;
I can probably find that out. But in addition
26
00:02:35,670 --> 00:02:40,489
to these probabilities, I am given another question
which says, what is the probability that the
27
00:02:40,489 --> 00:02:47,489
person has scored A, if he is a boy, if he
is male, that means given M, given that the
28
00:02:48,230 --> 00:02:53,959
person is male, what is the probability, that
he would have scored A? This would require
29
00:02:53,959 --> 00:02:56,040
me to define some additional theory.
30
00:02:56,040 --> 00:03:01,209
And let us see how we move into that. We define
what we call the conditional probability. What
31
00:03:01,209 --> 00:03:08,209
we say is probability of A given B, the probability
of event A happening given that B has
32
00:03:10,299 --> 00:03:16,620
happened before it; this is the probability of
A occurring given that B has occurred. And
33
00:03:16,620 --> 00:03:22,790
this is given by this formula here: probability of
A given B, and it is always marked with the slash
34
00:03:22,790 --> 00:03:28,629
that will be there, you see the slash there,
it is equal to probability A intersection
35
00:03:28,629 --> 00:03:34,549
B divided by P B, let me give you a pictorial
idea, let me give you a little idea of how
36
00:03:34,549 --> 00:03:39,349
this is done, how did I end up with
this thing.
37
00:03:39,349 --> 00:03:46,349
I have what you see, I have the set A or the
event A, and I have what we call, the set
38
00:03:50,000 --> 00:03:56,290
B which is the other event, I am going to
give it a different colour; this is
39
00:03:56,290 --> 00:04:03,290
set B; A intersection B is this part, that
is going to be A intersection B is going to
40
00:04:03,420 --> 00:04:09,599
be this part A intersection B.
Now, if I have to figure out the probability
41
00:04:09,599 --> 00:04:16,599
of what I called conditional probability,
that I am going to write as probability of A
42
00:04:17,920 --> 00:04:24,920
given B, let us look at the total possibilities
given B means, all of B would have occurred,
43
00:04:27,060 --> 00:04:34,060
so I divide that in the bottom, so I just
call it P B in the bottom, that is the denominator.
44
00:04:34,330 --> 00:04:41,330
In the numerator I only have to carry that
part which is common between A and B that
45
00:04:43,030 --> 00:04:50,030
is this part that is this part this is the
part, this is the part that would have occurred
46
00:04:50,810 --> 00:04:55,910
if B had occurred; now A is occurring and
also B is occurring; this is the intersection
47
00:04:55,910 --> 00:04:58,220
part.
So, what have I got here, what have I got
48
00:04:58,220 --> 00:05:05,220
to write here, I have to write here probability
of A given B is going to be probability A
49
00:05:06,810 --> 00:05:12,590
intersection B, which is this part, which
is this part really, this is the part that
50
00:05:12,590 --> 00:05:19,590
is A intersection B, divided by this whole
region of B. Now, the ratio of this little thing here,
51
00:05:20,540 --> 00:05:25,020
divided by this whole thing there, that is
going to be my conditional probability; I
52
00:05:25,020 --> 00:05:29,950
am not interested in this part, because there
B is not occurring, I am interested in only
53
00:05:29,950 --> 00:05:34,910
those parts where B is occurring, B is occurring
all over this place and this is the fraction
54
00:05:34,910 --> 00:05:39,610
of times when A and B occur together.
So, the probability of A occurring, when B
55
00:05:39,610 --> 00:05:44,980
has occurred B has occurred all over the place
and the probability of A has occurred in this
56
00:05:44,980 --> 00:05:48,930
area, so it is going to be this ratio that
actually gives me the conditional probability
57
00:05:48,930 --> 00:05:55,040
of A given B. This
is a very, very important formula; this you will
58
00:05:55,040 --> 00:05:58,670
be using many, many times when you get into
probability calculations; you will be doing
59
00:05:58,670 --> 00:05:59,770
this many, many times.
60
00:05:59,770 --> 00:06:05,420
Let us see how we use them, what we have to
remember of course, it is again a little reminder
61
00:06:05,420 --> 00:06:12,420
of what we have done before. We have what we call
independent events, and independent events
62
00:06:12,980 --> 00:06:17,880
are ones that occur regardless
of what has occurred somewhere else.
63
00:06:17,880 --> 00:06:24,880
So for example, an IIT Kharagpur student
has his bicycle stolen, and by chance on July
64
00:06:25,650 --> 00:06:30,900
15th it rained in Mumbai; these events really
have nothing to do with each other; they
65
00:06:30,900 --> 00:06:36,600
are independent of each other, that is like
an example of events being independent.
66
00:06:36,600 --> 00:06:42,020
And look at this side I have got you know
fire in the lab, in the chemistry lab there
67
00:06:42,020 --> 00:06:49,020
was a fire, and you know Obama got elected;
a couple of hundred days back Obama got elected
68
00:06:49,650 --> 00:06:54,630
in the US. Probably these have nothing to
do with each other; they are probably independent
69
00:06:54,630 --> 00:06:59,770
of each other. And there was a car accident
at the gate of IIT Kharagpur; there was
70
00:06:59,770 --> 00:07:05,900
a car accident. Again this has nothing to
do with Obama, nor did it have anything to
71
00:07:05,900 --> 00:07:10,400
do with the lab fire, and these are examples
of independent events.
72
00:07:10,400 --> 00:07:16,900
So, I see here independent events thrown
together that is all, on this other side I
73
00:07:16,900 --> 00:07:23,840
have got mutually exclusive events what are
those, I am tossing a coin and I get with
74
00:07:23,840 --> 00:07:28,720
a certain probability I get heads, certain
number of times I get heads, and at other
75
00:07:28,720 --> 00:07:33,520
times I get tails, and because these two
are the only two possible outcomes,
76
00:07:33,520 --> 00:07:39,270
head and tail, the sum of the probability
of there being a head and there being a tail,
77
00:07:39,270 --> 00:07:44,300
it is going to be 1, probability of head plus
probability of tail is going to be 1.
78
00:07:44,300 --> 00:07:50,180
And really, if head occurs, tail will
not occur, if tail occurs head will not occur
79
00:07:50,180 --> 00:07:55,390
therefore, these are mutually exclusive events,
I hope you are clear now about independent
80
00:07:55,390 --> 00:08:02,390
events, which occur individually regardless
of what is happening somewhere else,
81
00:08:03,220 --> 00:08:09,920
and mutually exclusive events are such that if
one of them occurs,
82
00:08:09,920 --> 00:08:15,050
the other will not; the ones that are mutually
exclusive, they preclude the occurrence
83
00:08:15,050 --> 00:08:19,450
of the other one, we have to remember this
when we are combining events, we are combining
84
00:08:19,450 --> 00:08:20,830
probabilities.
85
00:08:20,830 --> 00:08:27,830
So, events A and B are independent, if and
only if, I do this conditional probability
86
00:08:28,270 --> 00:08:34,120
and I find it is unchanged from the probability
of A; so whether B had occurred or B had
87
00:08:34,120 --> 00:08:39,750
not occurred, it would not really matter for
the probability of occurrence of A. If this
88
00:08:39,750 --> 00:08:45,529
is the situation I say A and B are independent
and of course, this will also imply the probability of
89
00:08:45,529 --> 00:08:52,529
B given A would be equal to P(B); probability of
B given A will be equal to P(B). And now, we
90
00:08:52,990 --> 00:08:58,420
can go back to the two way table that we did,
and would like to take a look at what exactly
91
00:08:58,420 --> 00:08:59,019
is the situation.
92
00:08:59,019 --> 00:09:06,019
So, I bring up the two way table, and I have
some numbers there, and what I want to ask is,
93
00:09:06,370 --> 00:09:09,470
are the grades independent? But before that
I would like to do a calculation, I would
94
00:09:09,470 --> 00:09:16,379
like to go back to that thing there, and I
have this problem set which is given to me,
95
00:09:16,379 --> 00:09:22,040
and the questions that are being asked in this
case are: what is the probability that the student
96
00:09:22,040 --> 00:09:29,040
is male, what is the probability that he or
she scored A? This is a randomly picked student.
97
00:09:30,250 --> 00:09:36,040
What is the probability that the randomly
picked student scored A and is a boy, and
98
00:09:36,040 --> 00:09:40,269
what is the probability that if he was a boy
he scored A, this is the conditional probability,
99
00:09:40,269 --> 00:09:47,269
probability of A given B; B has occurred,
that is, B is given to me and I want to
100
00:09:47,879 --> 00:09:52,310
see the probability of A being there.
Let us see, how we calculate these things
101
00:09:52,310 --> 00:09:57,540
I have the solution here on this little
piece of paper here.
102
00:09:57,540 --> 00:10:02,689
Let us start by looking at this two way table,
we start by taking a look at this two way
103
00:10:02,689 --> 00:10:09,470
table. What the two way table does is it allows
me, from counting, to calculate
104
00:10:09,470 --> 00:10:14,230
the different probabilities in a very simple
way, which is just by counting, so here I
105
00:10:14,230 --> 00:10:19,949
am using counting, I am doing really objective
evaluation, I am not going by opinion, I am
106
00:10:19,949 --> 00:10:24,540
going here by hard data.
What is the probability that the randomly picked
107
00:10:24,540 --> 00:10:29,449
student is male? Just take a look at that;
how will I get 0.375? For the chance of the student
108
00:10:29,449 --> 00:10:35,800
being male: the males who got A, they were
30, and those who did not get A, they were 45;
109
00:10:35,800 --> 00:10:42,720
the total is 75. I divide 75 by 200,
and I use my machine, and it turns out that
110
00:10:42,720 --> 00:10:47,670
the machine gives me a number, 0.375; that is
the probability of a randomly picked
111
00:10:47,670 --> 00:10:52,579
student being male.
What is the probability that the randomly picked
112
00:10:52,579 --> 00:10:57,480
student has scored A? A could have occurred
two ways: either he was male
113
00:10:57,480 --> 00:11:03,420
or the student was female. Therefore,
90 is the number of students of either
114
00:11:03,420 --> 00:11:10,420
sex scoring A; therefore, 90 divided by 200,
that is the probability that the randomly
115
00:11:11,949 --> 00:11:17,540
picked student would
have received A. Let us try to do that, so
116
00:11:17,540 --> 00:11:24,540
what I do is I take 90 and I divide that by
200 and I look at the number that turns out
117
00:11:26,319 --> 00:11:31,689
to be 0.45, and this 0.45 turns out to
be the answer there; see the 0.45 written there.
118
00:11:31,689 --> 00:11:38,689
Now, let us take a look at this third possibility,
probability of A and M, what is A and M, A
119
00:11:39,029 --> 00:11:46,029
is the scored A, the person scored A and the
person was male; that is going to be now 30
120
00:11:47,329 --> 00:11:53,329
divided by 200, again by counting and 30 divided
by 200 it turns out I do not need the machine
121
00:11:53,329 --> 00:11:58,439
now; it turns out to be 0.15, that is the
probability that the randomly picked student
122
00:11:58,439 --> 00:12:05,060
was male and also had scored A.
Now, this conditional probability of A given
123
00:12:05,060 --> 00:12:12,060
M, he scored A given that he was a boy, how
do I do that, I bring up the probability calculations
124
00:12:12,660 --> 00:12:17,589
and I go to my conditional probability
formula and the conditional probability formula
125
00:12:17,589 --> 00:12:23,319
is given as this: P(A intersection B) divided
by P(B). And let us take a look at, just look at
126
00:12:23,319 --> 00:12:27,309
this; these are all joint probabilities, these
are all going to be joint occurrences of
127
00:12:27,309 --> 00:12:33,920
things therefore, if I have to find the joint
occurrence A and M, A and M is this cell,
128
00:12:33,920 --> 00:12:40,610
A and M is this cell, this is the cell that
gives me the count of scoring A and also
129
00:12:40,610 --> 00:12:44,449
being male.
So, it turns out 30 divided by 200 that is
130
00:12:44,449 --> 00:12:51,449
the probability of A and M, A and M that is
0.15. And then I need the probability that the person
131
00:12:53,970 --> 00:12:59,870
was male, because when I am using my formula,
when I am using my conditional probability
132
00:12:59,870 --> 00:13:04,439
formula, I have probability A given B that
is equal to probability A B divided by probability
133
00:13:04,439 --> 00:13:10,839
B, which turns out to be probability A intersection
M divided by P M. P M in this case, I have
134
00:13:10,839 --> 00:13:17,819
already calculated it is 0.375, so 0.15 divided
by 0.375, this is a very complicated question,
135
00:13:17,819 --> 00:13:24,819
let us try to work it out, 0.15 divided by
0.375 equal to 0.4, so that is the answer,
136
00:13:29,079 --> 00:13:33,240
that is the answer right there. So, I have
solved this problem now, and I have also solved
137
00:13:33,240 --> 00:13:39,009
a question that involved conditional probability.
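The four answers worked out above can be reproduced by counting cells in the two way table. The following is a minimal Python sketch: the male counts (30 and 45), the 90 A grades and the 200 students are the numbers quoted in the lecture, while the female counts are not read out and are derived here from those totals, so they are an assumption of this sketch. The variable names are illustrative only.

```python
from fractions import Fraction

# Two way table of counts. Male counts and the totals (90 A grades,
# 200 students) are quoted in the lecture; the female counts are
# derived from those totals (an assumption of this sketch).
counts = {
    ("male", "A"): 30,
    ("male", "not A"): 45,
    ("female", "A"): 90 - 30,
    ("female", "not A"): 200 - 75 - (90 - 30),
}
total = sum(counts.values())

# Marginal, joint and conditional probabilities, all by counting.
p_m = Fraction(counts[("male", "A")] + counts[("male", "not A")], total)
p_a = Fraction(counts[("male", "A")] + counts[("female", "A")], total)
p_a_and_m = Fraction(counts[("male", "A")], total)
p_a_given_m = p_a_and_m / p_m   # P(A given M) = P(A and M) / P(M)

print(float(p_m), float(p_a), float(p_a_and_m), float(p_a_given_m))
```

Note that P(A given M) is also simply 30 divided by 75: once we know the student is male, only the male row of the table matters.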
Let us go now, to the next question, what
138
00:13:39,009 --> 00:13:46,009
is the next question, let us try to get the
question now. The question is this: are gender
139
00:13:48,769 --> 00:13:55,769
and grades independent? What is the test for
independence? Recall now, the test of independence
140
00:13:56,519 --> 00:14:03,519
is that P(A given B) is the same as P(A), or P(B given
A) does not depend on A, that is, it is equal
141
00:14:09,160 --> 00:14:14,629
to P(B). So, let us work all those things
out. What I am going to do is, I am going
142
00:14:14,629 --> 00:14:21,629
to work out those probabilities. And
it also turns out there is a simple
143
00:14:22,189 --> 00:14:27,550
test for the independence
of events, and that I am going to
144
00:14:27,550 --> 00:14:34,550
show you, just by flipping around a little
bit
and I am going to bring that sheet there,
145
00:14:36,279 --> 00:14:40,019
the sheet is here now.
146
00:14:40,019 --> 00:14:47,019
Are genders and grades independent, and it
turns out then remember now the intersection,
147
00:14:51,009 --> 00:14:56,569
remember the formula I had let me remind you
of that formula, let me remind you of that
148
00:14:56,569 --> 00:15:03,569
formula here, remember the conditional probability
formula which is this, and this formula I
149
00:15:05,319 --> 00:15:11,559
could also rewrite, let me rewrite that for
you on a sheet of paper, there nothing it
150
00:15:11,559 --> 00:15:15,790
will become a little more clear.
151
00:15:15,790 --> 00:15:22,790
The conditional probability formula says P(A
given B) is equal to P(A intersection B) divided
152
00:15:26,269 --> 00:15:33,269
by P(B). This I can also write as
P(A given B) multiplied by P(B) is equal to P(A
153
00:15:39,170 --> 00:15:46,170
intersection B), which I can also write loosely
as P(AB); no problem so far. Now, if A and
154
00:15:50,569 --> 00:15:57,569
B are independent, then this guy, P(A given B), turns out
to be P(A), and P(A) multiplied by P(B) is equal
155
00:15:59,569 --> 00:16:06,569
to P A B, this is the result, this result
I want to utilize.
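Written out as a short derivation, assuming P(B) is greater than 0 so that the conditional probability is defined, the steps just described are:

```latex
\begin{align*}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)}
  && \text{(definition of conditional probability)} \\
P(A \mid B)\,P(B) &= P(A \cap B)
  && \text{(multiply both sides by } P(B)\text{)} \\
P(A)\,P(B) &= P(A \cap B)
  && \text{(if } A \text{ and } B \text{ are independent, } P(A \mid B) = P(A)\text{)}
\end{align*}
```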
156
00:16:06,589 --> 00:16:13,139
I look at P(AB): is that equal to P(A) times
P(B)? And this should be true for all outcomes
157
00:16:13,139 --> 00:16:17,470
by the way, you cannot just check this in
a special case and say that I have got A and
158
00:16:17,470 --> 00:16:21,459
B declared to be independent, they are ready
to be declared independent, you cannot do
159
00:16:21,459 --> 00:16:26,439
that. You have to check all
the outcomes, and from that you should
160
00:16:26,439 --> 00:16:32,279
make a general statement and verify that
in no situation is this condition
161
00:16:32,279 --> 00:16:36,529
violated; then, of course, A and B are independent.
Let us see how you utilize that, we go back
162
00:16:36,529 --> 00:16:43,529
to the problem: are grades
independent of gender, and I have the solution
163
00:16:51,050 --> 00:16:58,050
here. What do I have to check now? I
only have to check this condition P A M, A
164
00:16:58,220 --> 00:17:05,220
is scoring A, M is male, so I have got sex
here, I have got score here, is that equal
165
00:17:07,860 --> 00:17:13,339
to this? This is what I would like to check.
So, really this is being posed as a question,
166
00:17:13,339 --> 00:17:20,339
this is being posed as a question. I have already
calculated P A and M in the previous question,
167
00:17:21,689 --> 00:17:28,250
I have already calculated P A and M and I
have also calculated P A and I have also calculated
168
00:17:28,250 --> 00:17:32,450
P M.
So, I have got all the things I have got this
169
00:17:32,450 --> 00:17:36,680
guy calculated separately, this calculated
separately, this calculated separately I just
170
00:17:36,680 --> 00:17:40,770
have to check, is the left hand side equal
to the right hand side, that is what I have
171
00:17:40,770 --> 00:17:47,770
to check; and what I did was, I took the value
of P A and I took the value of P M and multiplied
172
00:17:52,480 --> 00:17:57,790
them. And let me just do it for you, so that
you will believe that I have done the job,
173
00:17:57,790 --> 00:18:04,790
I have 0.45 multiplied by 0.375 equal to and
I get 0.16875, but my god you know, we should
174
00:18:16,970 --> 00:18:20,070
have got 0.15, because
that is P(A and M).
175
00:18:20,070 --> 00:18:26,210
So, it turns out this number is not the same
as this therefore, what is happening here,
176
00:18:26,210 --> 00:18:33,210
this condition is being violated. What does
it mean? Grades and gender are not independent.
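The check just carried out on the machine can be written as a small Python sketch. It assumes the three probabilities computed earlier from the two way table (90, 75 and 30 out of 200) and compares the joint probability with the product of the marginals; exact fractions are used so the comparison is not affected by rounding.

```python
from fractions import Fraction

# Probabilities taken from the two way table counts quoted in the lecture.
p_a = Fraction(90, 200)         # P(A): student scored A
p_m = Fraction(75, 200)         # P(M): student is male
p_a_and_m = Fraction(30, 200)   # P(A and M): male and scored A

# Independence would require P(A and M) == P(A) * P(M).
product = p_a * p_m
independent = (p_a_and_m == product)

print(float(product), float(p_a_and_m), independent)
```

As the lecture notes, one matching cell would not be enough to declare independence, but a single mismatch like this one is enough to rule it out.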
177
00:18:33,450 --> 00:18:39,070
So, it depends on whether they are a boy or a girl how they are
going to be doing in the exam; that is
178
00:18:39,070 --> 00:18:42,470
what this little test here is suggesting,
just that.
179
00:18:42,470 --> 00:18:48,340
So, that is a question of probability that we
have done. You already know about this
180
00:18:48,340 --> 00:18:51,800
condition here, which is that the sum of all
the probabilities should be equal to
181
00:18:51,800 --> 00:18:53,960
1.
182
00:18:53,960 --> 00:19:00,960
If we talk about mutually exclusive events,
and it is sort of like this you go and pick
183
00:19:01,360 --> 00:19:07,660
a couple of hats, and there are a certain number
of red hats, and a certain number of green
184
00:19:07,660 --> 00:19:14,660
hats certain number of black hats, what is
the chance that you will be picking up a green
185
00:19:16,040 --> 00:19:22,060
or a red or a black, they are now mutually
exclusive; it will all depend on the fraction
186
00:19:22,060 --> 00:19:26,860
of red being there or the fraction of black
being there or fraction of green hats being
187
00:19:26,860 --> 00:19:32,980
there, in that basket where
all the hats are put there. And in fact it
188
00:19:32,980 --> 00:19:37,980
turns out whenever you have got mutually exclusive
events, the probability of the union will turn out to be the sum
189
00:19:37,980 --> 00:19:42,330
of the probabilities of these two, so this is
again a very useful formula, which we have utilized
190
00:19:42,330 --> 00:19:45,460
and we will be also using them, as we go in
to problem solving.
191
00:19:45,460 --> 00:19:51,000
Now, I have a situation here and
I would just like you to read this and verify
192
00:19:51,000 --> 00:19:57,340
that this is so, I would like you to just
read this and verify whether, you know, being shot
193
00:19:57,340 --> 00:20:03,370
in office, which is a president being
shot in office, dying, being killed
194
00:20:03,370 --> 00:20:10,370
in office, and having a beard, are these two
situations independent of each other? Being
195
00:20:11,070 --> 00:20:17,960
shot in office and having a beard, are these
independent of each other.
196
00:20:17,960 --> 00:20:21,960
This we can verify by going back,
going to history and looking into all the
197
00:20:21,960 --> 00:20:26,540
data, and if the data turns out to show nothing
different, then we will say indeed they are
198
00:20:26,540 --> 00:20:29,830
independent of each other, having a beard has
nothing to do with getting shot in office.
199
00:20:29,830 --> 00:20:35,450
Then of course, I will ask you to read, I will
ask you to read this slide and work it out
200
00:20:35,450 --> 00:20:36,630
from that.
201
00:20:36,630 --> 00:20:42,400
Now, conditional probability this is something
I used already and I told you that conditional
202
00:20:42,400 --> 00:20:48,480
probability is something that is there, and
what they have done here they tried to use
203
00:20:48,480 --> 00:20:55,480
the conditional probability formula, and they
have tried to show that P A times P B in this
204
00:20:56,480 --> 00:21:02,060
case, turns out to be the intersection and
so on and so forth. In fact it turns out that P(A) and
205
00:21:02,060 --> 00:21:07,580
P(B), if you multiply them, you will end up with
P(A and B), but I would like you to work it
206
00:21:07,580 --> 00:21:08,640
out on your own.
207
00:21:08,640 --> 00:21:13,920
So, I will not solve that particular problem
instead I will be solving some other problem.
208
00:21:13,920 --> 00:21:19,390
Let us take a look at the other problem
that I would like to solve for you. I
209
00:21:19,390 --> 00:21:24,810
have this notion of joint probability. What
is joint probability? It is basically saying,
210
00:21:24,810 --> 00:21:30,680
if I have two events A and B, it is the probability
that they will occur together, that they will be
211
00:21:30,680 --> 00:21:36,990
observed to occur together. They might not
be causing each other, no; all we are saying
212
00:21:36,990 --> 00:21:43,990
is they are observed to occur together: it
rains and I have poor marks in the
213
00:21:45,270 --> 00:21:48,500
test.
It is just joint occurrence nothing more than
214
00:21:48,500 --> 00:21:54,440
that, it is not that one is causing the other,
that will require a lot of verification;
215
00:21:54,440 --> 00:21:59,590
that will require a designed experiment to
really, you know, sprinkle water and see if
216
00:21:59,590 --> 00:22:05,390
grades are affected by that, or observe the
situation when it is really a cause and effect
217
00:22:05,390 --> 00:22:10,340
type of relationship. If that can be
verified, then only will I say A is causing B;
218
00:22:10,340 --> 00:22:14,010
otherwise, they are just observed to occur together,
that is all.
219
00:22:14,010 --> 00:22:20,030
So, it turns out that whenever we talk
of joint probability, events are just seen
220
00:22:20,030 --> 00:22:24,440
to occur together nothing more, what kind
of events are we talking about, we have got
221
00:22:24,440 --> 00:22:31,440
two events here, that we are playing with,
one is H H two tosses and both are heads;
222
00:22:34,490 --> 00:22:41,490
that is the first event. First event is I
toss the coin twice and I get two heads, the
223
00:22:42,350 --> 00:22:48,710
second event is I get one head: first
I get a head, then I get a tail, that is one
224
00:22:48,710 --> 00:22:54,020
outcome and the second outcome is I get a
tail and then I get a head.
225
00:22:54,020 --> 00:23:01,020
So, there are two outcomes now that constitute
the event B; event B is
226
00:23:01,400 --> 00:23:08,400
head and then tail, or tail and then head.
What is the joint probability between A and
227
00:23:09,350 --> 00:23:16,350
B? Can you really work out
the joint probability?
228
00:23:16,880 --> 00:23:23,880
there is one easy way to do this and that
is what I have tried to do here, that is what
229
00:23:24,240 --> 00:23:30,640
I am trying to do here. What I have done
is I have sketched here event A and I have
230
00:23:30,640 --> 00:23:37,640
sketched here event B, notice something the
outcomes have nothing in common, the outcomes
231
00:23:41,890 --> 00:23:47,110
are really these sequences of two events:
one is the observation of head, then the observation
232
00:23:47,110 --> 00:23:52,090
of tail.
So, if I find head head I say event A has
233
00:23:52,090 --> 00:23:58,260
occurred, if I find head and then tail or
tail and then head, I say event B has occurred,
234
00:23:58,260 --> 00:24:03,810
nothing is common between them, nothing is
common between them, if you workout the probability
235
00:24:03,810 --> 00:24:10,190
there will be nothing common between them
and, if you go back to that probability
236
00:24:10,190 --> 00:24:16,410
of A and B, A intersection B,
it will not turn out to be the product of P(A)
237
00:24:16,410 --> 00:24:21,000
and P(B); you can work it out in detail and check
that. I am just saying right now, there is
238
00:24:21,000 --> 00:24:26,710
nothing common between them therefore, there
being nothing in that intersection that probability
239
00:24:26,710 --> 00:24:33,710
is going to be 0; see in fact I show it here
by saying the set A intersection B is null,
240
00:24:33,840 --> 00:24:40,840
the probability of that is 0, if this occurs
the other will not occur, it turns out if
241
00:24:42,450 --> 00:24:46,560
this occurs there is nothing from here, that
has also occurred there.
242
00:24:46,560 --> 00:24:52,740
So, the probability that A has occurred, it might
be some quantity there, but this joint probability
243
00:24:52,740 --> 00:24:57,820
is going to be 0 in those experiments. Let
us carry on here.
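The two-toss example can be worked out in detail by listing the sample space. The sketch below assumes a fair coin, so the four outcomes HH, HT, TH and TT are equally likely, and takes event A as two heads and event B as exactly one head, as described in the lecture; the helper name prob is illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a coin (assumed fair, so all four
# outcomes are equally likely).
sample_space = set(product("HT", repeat=2))

A = {("H", "H")}                 # event A: both tosses are heads
B = {("H", "T"), ("T", "H")}     # event B: head then tail, or tail then head

def prob(event):
    """Probability of an event by counting equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

p_joint = prob(A & B)   # the intersection is the empty set
p_union = prob(A | B)

print(p_joint, p_union, prob(A) + prob(B), prob(A) * prob(B))
```

The joint probability is 0 while the product P(A) times P(B) is not, so these mutually exclusive events are not independent; the union, on the other hand, is just the sum of the two probabilities.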
244
00:24:57,820 --> 00:25:02,150
And let us go to another example, and I am
going to be showing another example, which
245
00:25:02,150 --> 00:25:07,400
sort of goes like this remember whenever I
said A and B are independent, my test was
246
00:25:07,400 --> 00:25:14,400
P(AB), P(A intersection B), is going to be P(A)
multiplied by P(B). And of course, I give
247
00:25:16,470 --> 00:25:20,690
you the reminder that independence does not
mean that A and B cannot be observed to occur
248
00:25:20,690 --> 00:25:24,450
together; they might be occurring together
and that does not stop them being independent
249
00:25:24,450 --> 00:25:30,060
of each other, independence does not mean
they cannot occur together, they might be
250
00:25:30,060 --> 00:25:33,570
occurring together and that is the joint probability,
that is the joint occurrence of the two.
251
00:25:33,570 --> 00:25:39,400
So, there is an accident occurring, and also there
will be somebody scoring C in an exam; those
252
00:25:39,400 --> 00:25:43,410
two events they may occur together or may
not occur together, sometimes they occur together,
253
00:25:43,410 --> 00:25:47,990
sometimes they occur independent of each other
and not really related to each other anywhere
254
00:25:47,990 --> 00:25:54,990
at all. If I have a set of events, if I have
got A 1, A 2, A 3, A 4 a set of the events,
255
00:25:56,290 --> 00:26:03,290
then if I look at their intersection if they
are independent of each other then this relationship
256
00:26:04,310 --> 00:26:11,000
will hold good: probability of
A 1 intersection A 2 intersection A
257
00:26:11,000 --> 00:26:16,130
3 and so on and so forth. This is now whatever
is common between A 1, A 2, A 3, A 4, A 5
258
00:26:16,130 --> 00:26:21,700
and so on; the probability
of that turns out to be the product of
259
00:26:21,700 --> 00:26:26,810
the probabilities of the individual events;
all the events occurring together turns out to be this
260
00:26:26,810 --> 00:26:27,980
way.
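The product rule for several independent events can be illustrated by counting. The example below is hypothetical and not from the lecture: it assumes three tosses of a fair coin and takes A 1, A 2, A 3 to be "toss 1 is heads", "toss 2 is heads", "toss 3 is heads", then compares the probability of the intersection with the product of the individual probabilities.

```python
from fractions import Fraction
from itertools import product

# Hypothetical experiment (an assumption for illustration):
# three tosses of a fair coin, eight equally likely outcomes.
sample_space = list(product("HT", repeat=3))

def prob(event):
    """Probability of an event by counting equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

# events[i] is the event "toss i+1 is heads".
events = [{o for o in sample_space if o[i] == "H"} for i in range(3)]

# Probability that all three events occur together.
p_joint = prob(set.intersection(*events))

# Product of the individual probabilities.
p_product = Fraction(1)
for event in events:
    p_product *= prob(event)

print(p_joint, p_product)
```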
261
00:26:27,980 --> 00:26:34,980
Here is an example: there are two events, the
birth of a daughter and the
262
00:26:39,660 --> 00:26:46,660
birth of any child with AB positive blood
type. Now, AB positive blood type is determined
263
00:26:47,020 --> 00:26:54,020
by biological considerations and different
biological conditions lead to the birth of
264
00:26:57,220 --> 00:27:03,940
a daughter, these two actually turn out to
be independent events, again if you do this
265
00:27:03,940 --> 00:27:09,820
test P A B, it will turn out to be P A times
P B and that is the test for independence
266
00:27:09,820 --> 00:27:13,090
or you could actually say.
267
00:27:13,090 --> 00:27:20,090
In another way, you could really say, if I
have to test this, I will say P A given B, this
268
00:27:20,600 --> 00:27:27,600
turns out to be P A, on the right hand side
I do not have B influencing anything at all
269
00:27:27,630 --> 00:27:34,630
and the same thing will be applying if A and
B are independent I could say P B given A,
270
00:27:34,890 --> 00:27:41,890
B given A this will turn out to be equal to
P B that is all; these are these are tests
271
00:27:42,960 --> 00:27:49,070
of independence these are tests of independence
they actually say that B really has no influence
272
00:27:49,070 --> 00:27:55,050
on the occurrence of A, and A also has
no influence on the occurrence
273
00:27:55,050 --> 00:27:58,760
of B and let us try to work out one example.
274
00:27:58,760 --> 00:28:03,390
Let us try to work out this example, we are
working out many examples, and obviously we
275
00:28:03,390 --> 00:28:07,690
can stop the tape and we can actually rewind
and so on, you could do that, but let us take
276
00:28:07,690 --> 00:28:14,690
a look at this example here, I have the joint
occurrence defined here, how I defined it,
277
00:28:16,230 --> 00:28:23,230
I defined it like this a drug is used to treat
people, and these people could be men or women
278
00:28:26,350 --> 00:28:31,890
and the result of administering this
drug is that either the drug is successful,
279
00:28:31,890 --> 00:28:38,890
it heals people, or it fails to heal people.
And some data has been collected, so the drug
280
00:28:39,920 --> 00:28:46,920
was given to women and also to men who are
sick, and in a certain number of cases
281
00:28:47,640 --> 00:28:52,950
the drug was found to heal people
and in certain other cases it was found not
282
00:28:52,950 --> 00:28:58,160
to heal people, and the numbers are given
here. So,
283
00:28:58,160 --> 00:29:05,160
I have here a total of 200 plus 1800, that
is 2000, and 1800 plus 200, that is again
284
00:29:06,960 --> 00:29:13,960
2000, so 4000 people were given this
drug: 200 women got cured, 1800 men got
285
00:29:17,050 --> 00:29:24,050
cured, 1800 women could not be cured, and
200 men could not be cured.
286
00:29:24,100 --> 00:29:31,100
And let us talk about the events, that we
are going to be testing against each other,
287
00:29:32,050 --> 00:29:39,050
A is the event that the patient is a woman.
What is the chance of the patient
288
00:29:39,240 --> 00:29:45,200
being a woman? It comes
from the count in this column; this
289
00:29:45,200 --> 00:29:50,240
is like a 50 percent chance
of the randomly picked patient
290
00:29:50,240 --> 00:29:57,240
being a woman. For the chance of the drug failing,
I go to the failure row and I find 1800 plus
291
00:29:59,680 --> 00:30:04,590
200 that is again 2000.
So, what I have here, I have basically got
292
00:30:04,590 --> 00:30:11,570
a count of how many women were there in the
test, and also how many times the drug failed
293
00:30:11,570 --> 00:30:18,570
that count also I have, the question I am
asking is are A and B independent, what are
294
00:30:18,710 --> 00:30:25,410
A and B? A is the event that the patient is
a woman and B is the event that the drug failed.
295
00:30:25,410 --> 00:30:28,260
Are these
independent of each other?
296
00:30:28,260 --> 00:30:33,680
Let us try to see how we work
it out, and for that what I have
297
00:30:33,680 --> 00:30:40,680
done again is that I have taken
a print of the slide and I have worked out
298
00:30:41,040 --> 00:30:45,910
the example for you. First of
all, you notice I have done the totaling, so
299
00:30:45,910 --> 00:30:50,890
I found out how many women were there
in the test, and that turns out to be 200 plus
300
00:30:50,890 --> 00:30:57,360
1800, which is 2000; how many men were there, that
again is 2000; how many successes,
301
00:30:57,360 --> 00:31:04,360
I had 2000 successes, and I also had 2000 failures;
a total of 4000 people were involved
302
00:31:05,730 --> 00:31:08,980
in this.
Now, let us take a look at this quantity we
303
00:31:08,980 --> 00:31:13,820
have, which is
the joint probability
304
00:31:13,820 --> 00:31:19,490
of the patient being a woman and the drug failing, and that I
can find by going to this joint probability
305
00:31:19,490 --> 00:31:24,970
table here; I have to
look for the count that is shown here, this
306
00:31:24,970 --> 00:31:31,970
is the count of the patient being women and
also the drug failing that is 1800. So, 1800
307
00:31:33,190 --> 00:31:37,720
divided by 4000 which is the total number
that turns out to be 0.45; that
308
00:31:37,720 --> 00:31:44,720
is the probability that a woman was tested
and the drug failed;
309
00:31:46,110 --> 00:31:53,110
this is woman and failure,
this probability here.
310
00:31:53,790 --> 00:31:59,210
What I then do is I go into the pieces the
parts of it and look at the probability of
311
00:31:59,210 --> 00:32:04,620
failure, and this is really called the marginal probability,
the marginal probability of failure; that
312
00:32:04,620 --> 00:32:08,840
turns out to be some number, that will be
like probability of failure is 2000 divided
313
00:32:08,840 --> 00:32:15,840
by 4000 that is this number. And the other
part is the patient is women and that again
314
00:32:16,470 --> 00:32:21,420
turns out to be if I go to this column, and
look at the total 2000 divided by 4000 that
315
00:32:21,420 --> 00:32:27,820
is this column here. So, I have got 1 by 2
multiplied by 1 by 2, that is 1 by 4. And look
316
00:32:27,820 --> 00:32:34,820
at W F, W F is the joint probability which
is 0.45 it is not equal to 0.25 therefore,
317
00:32:36,490 --> 00:32:42,060
it turns out, these are not independent, these
are not independent.
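The independence check just worked out on the sheet can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary layout and the names `counts`, `marginal`, `p_woman`, `p_failed` and `p_joint` are assumptions made here, not something shown on the slide; the four counts are the ones given in the example.

```python
# Counts from the drug example; the layout and names are illustrative assumptions.
counts = {
    ("woman", "cured"): 200,
    ("man", "cured"): 1800,
    ("woman", "failed"): 1800,
    ("man", "failed"): 200,
}
total = sum(counts.values())  # 4000 patients in all


def marginal(index, value):
    """Marginal probability: add up every cell whose key matches, divide by total."""
    return sum(n for key, n in counts.items() if key[index] == value) / total


p_woman = marginal(0, "woman")                   # P(A): patient is a woman
p_failed = marginal(1, "failed")                 # P(B): drug failed
p_joint = counts[("woman", "failed")] / total    # P(AB): woman and failure

# Test for independence: is P(AB) equal to P(A) times P(B)?
independent = abs(p_joint - p_woman * p_failed) < 1e-12
print(p_woman, p_failed, p_joint, p_woman * p_failed, independent)
```

The joint probability is compared with the product of the two marginals, which is exactly the test described above.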
318
00:32:42,060 --> 00:32:49,060
So, the conclusion is that
the situation here is not independent;
319
00:32:51,180 --> 00:32:58,180
the drug failing and the patient
being a woman are not independent.
320
00:32:58,990 --> 00:33:03,610
This I get by just applying this little
formula here; this formula
321
00:33:03,610 --> 00:33:07,980
is the test for independence:
is P A B equal to P A times P B?
322
00:33:07,980 --> 00:33:14,980
This is the test for independence,
and that is
323
00:33:20,200 --> 00:33:21,460
what I have done.
324
00:33:21,460 --> 00:33:27,280
Let us see what else, we are being told about
independence and I am going to be coming back
325
00:33:27,280 --> 00:33:34,280
again to one of the examples, consider the
example when I toss the coin twice, and the
326
00:33:35,530 --> 00:33:42,530
outcomes for event A were defined as head and
then tail, or two heads, and B was the event
327
00:33:46,690 --> 00:33:53,690
head and then tail. Is event A independent
of B? This is the question I have.
328
00:33:59,500 --> 00:34:04,830
What is the test I will be applying
here? Again, is P A B equal to P A times
329
00:34:04,830 --> 00:34:11,830
P B? For that, what I have to do is
calculate P A, and also I will have to calculate
330
00:34:12,000 --> 00:34:16,679
P B and I have done that.
And let me show you, what the calculations
331
00:34:16,679 --> 00:34:23,679
look like. For P A, one outcome is
head then tail, which is half times half; that
332
00:34:29,139 --> 00:34:36,139
turns out to be 1 by 4. And also I could have
the same event occur if I have 2 heads, and
333
00:34:36,859 --> 00:34:42,659
the chance of them occurring like this is
again 1 by 4. So, 1 by 4 plus 1 by 4, because
334
00:34:42,659 --> 00:34:48,519
these are two different outcomes
and I am talking about this occurring or this
335
00:34:48,519 --> 00:34:54,169
occurring; therefore, I add the probabilities.
So, P A has the probability of 1
336
00:34:54,169 --> 00:35:01,169
by 2. P B is now this case here, first getting
a head then getting a tail; getting a head
337
00:35:03,150 --> 00:35:08,910
is one half and getting a tail is also one
half, and I multiply the two, because they
338
00:35:08,910 --> 00:35:15,910
must be together, that turns out to be 1 by
4. Now, I have got P A and P B what is the
339
00:35:16,079 --> 00:35:23,079
chance of my P A B? P A B is what? P A B means that
A has occurred and also B has occurred.
340
00:35:24,569 --> 00:35:31,569
It is a very funny situation; I have a situation
like this: I have A here, and A consists of
341
00:35:32,049 --> 00:35:39,049
two things, H H and H T, and
guess what B consists of? B consists of this
342
00:35:46,369 --> 00:35:53,369
part, this is B.
So, what is going to be the intersection of
343
00:35:54,200 --> 00:36:01,200
A and B, that is P A B, A intersection B,
what is that set, that is this and what is
344
00:36:03,599 --> 00:36:08,759
the probability for this? That is 1 by 4.
So here again I have got a situation where
345
00:36:08,759 --> 00:36:15,549
P A times P B, which is 1 by 8, does not
equal P A B; that means again they are not
346
00:36:15,549 --> 00:36:19,740
independent of each other. You can work out
the other examples on your own;
347
00:36:19,740 --> 00:36:24,599
I am providing you the solutions, and
you can come back and check them
348
00:36:24,599 --> 00:36:29,059
again, to make sure whether
there is a problem or there is something
349
00:36:29,059 --> 00:36:29,759
we could do.
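The two-toss coin example can be checked by listing the four equally likely outcomes. This is a small sketch under that fair-coin assumption; the names `outcomes`, `prob`, `A` and `B` are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: HH, HT, TH, TT, all equally likely.
outcomes = set(product("HT", repeat=2))


def prob(event):
    """Probability of an event given as a set of equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))


A = {("H", "T"), ("H", "H")}   # head then tail, or two heads
B = {("H", "T")}               # head then tail

p_a = prob(A)          # 1/4 + 1/4 = 1/2
p_b = prob(B)          # 1/2 * 1/2 = 1/4
p_ab = prob(A & B)     # the intersection is just HT

print(p_a, p_b, p_ab, p_a * p_b, p_ab == p_a * p_b)
```

Exact fractions are used so that the comparison of P A B with P A times P B is not disturbed by rounding.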
350
00:36:29,759 --> 00:36:36,009
Let us take a look at another situation, when
I am talking about conditioning, and let us
351
00:36:36,009 --> 00:36:43,009
see how I work with the conditioning problem,
I start by again having a joint table and
352
00:36:49,480 --> 00:36:53,119
the joint table looks like this, if you look
at the screen here, the joint table looks
353
00:36:53,119 --> 00:37:00,119
like this. And what I am going to be checking
is, what is going to be P B given A, I just
354
00:37:00,940 --> 00:37:05,910
have to calculate that, and for that
I will be using the joint table;
355
00:37:05,910 --> 00:37:12,910
I will be looking up P A B, I will
be looking up P A, and I will work out this
356
00:37:13,809 --> 00:37:20,809
definition. Now, P B given A will be
equal to P A B divided by P A, that is all;
357
00:37:26,569 --> 00:37:33,569
P A given B will be equal to P A B divided
by P B. If you just look at the sheet there,
358
00:37:34,720 --> 00:37:39,029
this is exactly what I have done here.
So, conditioning calculations are quite easy
359
00:37:39,029 --> 00:37:44,140
and quite straightforward. What
have I done here? If you
360
00:37:44,140 --> 00:37:51,140
look at my sheet, now P B given A is equal
to P A B divided by P A that is P B given
361
00:37:55,670 --> 00:38:02,670
A. And I have this P A B already calculated;
I did the fractions before, and so I know
362
00:38:02,970 --> 00:38:09,079
those numbers. I have 0.45 divided
by 0.5, which is P A; that turns out to be 0.9,
363
00:38:09,079 --> 00:38:13,650
that is one conditional probability. The other
conditional probability is P A given
364
00:38:13,650 --> 00:38:19,769
B; there is some symmetry in the data, and therefore
that answer also turns out to be
365
00:38:19,769 --> 00:38:23,299
0.9.
Now, let us just go back and remind you, how
366
00:38:23,299 --> 00:38:30,289
I found my P A and P B and so on. What is
P A? P A in this case is the event that the patient
367
00:38:30,289 --> 00:38:37,289
is a woman; that really
means I have to count up the number of women,
368
00:38:37,849 --> 00:38:44,019
that is 2000, divided by 4000; that is going
to be the patient being a woman, which is this
369
00:38:44,019 --> 00:38:51,019
part, and that turns out to be half, a
50 percent probability. Because half the population
370
00:38:51,869 --> 00:38:56,180
is women, there is a 50 percent probability that
the randomly picked
371
00:38:56,180 --> 00:39:03,180
patient is a woman, and that turns out to
be half. What about P A B? P A B is when the
372
00:39:03,460 --> 00:39:10,240
drug
fails when a woman is using it, and that is
373
00:39:10,240 --> 00:39:16,420
this cell here, this is the cell where I have
got women using the drug and the drug failing.
374
00:39:16,420 --> 00:39:23,410
So, that 1800 number I put there, and I have
1800 divided by 4000 and that turns out to
375
00:39:23,410 --> 00:39:29,369
be, as we have done before with our machine,
0.45. This is how I found P
376
00:39:29,369 --> 00:39:35,269
A B, which I bring here and I divide that
by P B and I end up with my other numbers
377
00:39:35,269 --> 00:39:35,730
there.
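The two conditional probabilities from the drug table can be reproduced with exact fractions. This is a minimal sketch; the variable names are assumptions made here for illustration, and the counts are the ones from the example (1800 women for whom the drug failed, 2000 women, 2000 failures, 4000 patients).

```python
from fractions import Fraction

total = 4000
p_a = Fraction(2000, total)    # P(A): patient is a woman
p_b = Fraction(2000, total)    # P(B): drug failed
p_ab = Fraction(1800, total)   # P(AB): woman and drug failed

# Definition of conditional probability.
p_b_given_a = p_ab / p_a       # P(B given A) = P(AB) / P(A)
p_a_given_b = p_ab / p_b       # P(A given B) = P(AB) / P(B)

print(float(p_ab), float(p_b_given_a), float(p_a_given_b))
```

Both answers come out the same here only because the two marginals happen to be equal in this table.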
378
00:39:35,730 --> 00:39:40,130
So, conditional probabilities are quite
easy to calculate once you have the discipline
379
00:39:40,130 --> 00:39:47,089
there. With this we can move on, and we can
actually test if there is any relationship
380
00:39:47,089 --> 00:39:54,089
there. Given that A is independent
of B, what is the relationship between P A given B
381
00:39:59,789 --> 00:40:04,190
and P A? This I am sure you can work out;
let me ask you the question again and I think
382
00:40:04,190 --> 00:40:11,099
you will understand. There are
two events.
383
00:40:11,099 --> 00:40:18,099
One event is A the other event is B, and what
they are saying is A is independent of B,
384
00:40:22,369 --> 00:40:29,369
that means P A is the same as P A given B,
this is what they are saying they are actually
385
00:40:32,660 --> 00:40:39,660
saying that, what is the relationship between
P A given B and P A; it turns out they are
386
00:40:42,470 --> 00:40:48,420
equal, because this is not dependent on B
this is actually equal to P A. If I only know
387
00:40:48,420 --> 00:40:53,470
this I do not have to worry about this, because
A in no way is dependent on this,
388
00:40:53,470 --> 00:40:58,380
so this conditioning really has no effect
at all, whether B has occurred or not it has
389
00:40:58,380 --> 00:41:03,400
no impact on the occurrence of A. So, it is
a purely straight forward question nothing
390
00:41:03,400 --> 00:41:10,079
more, just a little tricky question; they just
slipped in this little question for you.
391
00:41:10,079 --> 00:41:14,940
Then we move to something which is an application
of what we have done, and this turns out to be
392
00:41:14,940 --> 00:41:21,940
a very vital sort of application of conditional
probability, and let me tell you that it
393
00:41:22,160 --> 00:41:28,630
also gives us an indication of how good our
quality control inspection
394
00:41:28,630 --> 00:41:33,569
will be, and I will illustrate that using
some assumptions and I am going to solve it
395
00:41:33,569 --> 00:41:35,950
for you, I am going to show you the solution
also for this.
396
00:41:35,950 --> 00:41:41,930
Let me first give you a picture, let me first
give you the scenario and to do that what
397
00:41:41,930 --> 00:41:48,930
I will be doing is, I will be bringing you a
situation in a hospital. The hospital is using
398
00:41:53,529 --> 00:42:00,529
some device, maybe a mammogram or something,
some sort of scanning,
399
00:42:00,900 --> 00:42:07,900
to test to see if a person has cancer. It
turns out that if the test says positive,
400
00:42:11,609 --> 00:42:16,239
it is not true that the person always has
cancer.
401
00:42:16,239 --> 00:42:23,239
In fact, if the person does have cancer,
look at the screen now,
402
00:42:23,819 --> 00:42:30,819
look at the display here, it says, if the
person has cancer the test will be positive
403
00:42:33,150 --> 00:42:40,150
with a probability of only 0.92; only
in 92 percent of cases with a real cancer will the
404
00:42:42,349 --> 00:42:48,599
test say that the patient has cancer.
The patient really has cancer, but
405
00:42:48,599 --> 00:42:55,359
there is only a 92 percent chance that the test
will catch it. In the reverse situation, bring
406
00:42:55,359 --> 00:43:01,349
in a healthy person; he goes there and takes
the test, the mammogram, and then the
407
00:43:01,349 --> 00:43:07,970
test shows positive 4 percent of the time.
So it is a perfectly healthy person, he takes
408
00:43:07,970 --> 00:43:13,910
a test, and because the instrument is not
perfect, 4 percent of the time it gives you a
409
00:43:13,910 --> 00:43:20,910
faulty result.
Now, of these numbers, 92 percent seems sufficiently
410
00:43:21,029 --> 00:43:26,970
high, and 4 percent seems sufficiently
low, so you might say, I think I
411
00:43:26,970 --> 00:43:33,970
will get that instrument; I will have it installed
in our hospital. Now, what we would like to
412
00:43:35,920 --> 00:43:40,890
do is, we would like to do this analysis using
probability theory, would like to find out
413
00:43:40,890 --> 00:43:47,890
is it reasonable for me to purchase this equipment,
how often is it going to get me false signals,
414
00:43:47,930 --> 00:43:54,029
what kind of false signals; we will work that
out. If a person is randomly
415
00:43:54,029 --> 00:44:01,029
selected and his test is positive, what is
the chance that he really has cancer? Think
416
00:44:05,339 --> 00:44:12,339
of this again let me repeat the question,
the test is positive, what is the chance that
417
00:44:16,039 --> 00:44:20,069
the person really has cancer, knowing that
the instrument is not perfect.
418
00:44:20,069 --> 00:44:27,069
Let us try to work this out, what had been
given of course, is that in general, in the
419
00:44:27,700 --> 00:44:34,700
wide population 0.1 percent of people have cancer,
which is probably true for this
420
00:44:35,180 --> 00:44:38,940
geographic area; it would be quite different if
you are in New Jersey or some other place
421
00:44:38,940 --> 00:44:44,200
where the chance of having cancer, you know,
could be in double digits, because there are a
422
00:44:44,200 --> 00:44:49,319
lot of chemicals in the area, a lot of garbage
dumps and so on, and the air is not clean
423
00:44:49,319 --> 00:44:53,779
and so forth. It would be true of many
other places also, but New Jersey I know in
424
00:44:53,779 --> 00:44:56,420
particular, because I lived there and I saw
all these problems there.
425
00:44:56,420 --> 00:45:02,329
Let us see, how we work this out, so what
is the data that has been given to us, I am
426
00:45:02,329 --> 00:45:09,329
first presenting you just the data and the
data is this in the wide population 0.1 percent
427
00:45:09,920 --> 00:45:16,920
of people have cancer and that means 0.999
is the probability of the person being healthy;
428
00:45:18,369 --> 00:45:25,369
no problem there. And the instrument's capability
is the following: it shows the test to be positive
429
00:45:28,359 --> 00:45:34,789
when there is cancer 92 percent of the time,
which some people may think is pretty good,
430
00:45:34,789 --> 00:45:41,789
and unfortunately, for a healthy person
also, it says that the person has cancer
431
00:45:44,210 --> 00:45:50,359
4 percent of the time. This is the instrument's
performance.
432
00:45:50,359 --> 00:45:57,359
The question that is being asked is a manager's
question: will you rely on this test equipment,
433
00:45:57,529 --> 00:46:04,269
and will you
start a cancer treatment, which is pretty
434
00:46:04,269 --> 00:46:09,799
severe, just because the machine said you
have cancer? Will you do that, given
435
00:46:09,799 --> 00:46:16,799
this being the record, this being the
story? Let us see how we solve this problem.
436
00:46:18,059 --> 00:46:22,920
What I will be doing is walking you
through some calculations; be just a little
437
00:46:22,920 --> 00:46:28,420
patient and I am going to work out the problem
for you.
438
00:46:28,420 --> 00:46:33,920
And we will be concentrating straight on the
sheet there that I worked out for you. It
439
00:46:33,920 --> 00:46:39,920
took me a little while, but it still could be
done on one sheet, so I thought I should
440
00:46:39,920 --> 00:46:46,920
preserve this for the class here. These
things are given to us:
441
00:46:50,509 --> 00:46:56,329
the probability that the test is positive
given that there is cancer is 0.92; the probability
442
00:46:56,329 --> 00:47:03,009
that the test is positive given that there
is no cancer, that is a healthy body, is 0.04.
443
00:47:03,009 --> 00:47:10,009
In the wide population outside, the probability
of a person having
444
00:47:12,539 --> 00:47:18,710
cancer is only 0.001, and the person being
healthy that means no cancer is 0.999, these
445
00:47:18,710 --> 00:47:24,680
are given to us.
Now, let us try to work out some questions,
446
00:47:24,680 --> 00:47:29,960
what is the question that I am asking?
I am asking this question: what is the chance
447
00:47:29,960 --> 00:47:35,369
that the person has cancer given that the
test is positive? Given that the test is positive,
448
00:47:35,369 --> 00:47:41,410
what is the chance that the person has cancer?
This is what
449
00:47:41,410 --> 00:47:46,759
I have to evaluate, and for that
I will have to go back into what we have here,
450
00:47:46,759 --> 00:47:51,619
the data; I will have to find this, I will
have to find this, and I will have to find
451
00:47:51,619 --> 00:47:57,329
this. How did I find this formula? This came
straight from our
452
00:47:57,329 --> 00:48:00,859
conditional probability formula; I am going
to show you that formula in just one minute,
453
00:48:00,859 --> 00:48:04,380
I will show you.
In fact, what we have to do now is, we have
454
00:48:04,380 --> 00:48:10,509
to look at these things. Look at this intersection
here: the probability that the test is positive
455
00:48:10,509 --> 00:48:17,509
and the person has cancer is equal to the
probability the test is positive given he has
456
00:48:18,229 --> 00:48:23,269
cancer multiplied by the probability he has cancer.
Do I have this data? Yes, I have this data
457
00:48:23,269 --> 00:48:30,269
here. Do I have this data?
Yes, I have this data; it is also given to me.
458
00:48:31,589 --> 00:48:38,589
So, I can calculate this quantity
without any problem, and it turns out
459
00:48:42,799 --> 00:48:48,559
this is also equal to the probability written this
way, because I can flip these two; this is
460
00:48:48,559 --> 00:48:54,680
the same intersection.
So, whether I take C first or I take plus,
461
00:48:54,680 --> 00:48:58,960
the positive indication, first, it does
not really matter. So here what I have done is
462
00:48:58,960 --> 00:49:04,529
I have made C conditional on
the test being positive, so
463
00:49:04,529 --> 00:49:10,509
on this side I have something that is equal
to this. On the left hand side I have got
464
00:49:10,509 --> 00:49:14,170
this quantity;
I know its value,
465
00:49:14,170 --> 00:49:19,099
its value is going to be the product of
these two; that I can find from these two.
466
00:49:19,099 --> 00:49:25,789
Then I look at this quantity here, which is
the probability that the test is indicated
467
00:49:25,789 --> 00:49:32,460
to be positive. When can that happen? That
can happen this way: the probability that the
468
00:49:32,460 --> 00:49:35,309
test is positive given that the person has
cancer multiplied by the probability that
469
00:49:35,309 --> 00:49:42,309
the person has cancer; and that is found from
here, so I know this and I know this, no real
470
00:49:42,359 --> 00:49:48,460
problem there. And then I add this part, which is
the probability that the test is positive
471
00:49:48,460 --> 00:49:53,430
given that there is no cancer multiplied by the
probability that there is no cancer;
472
00:49:53,430 --> 00:49:59,700
where can I find this I can again look up
my given data and it turns out that this I
473
00:49:59,700 --> 00:50:06,700
know and also this I know from here.
So, I know all quantities, so I can evaluate
474
00:50:08,229 --> 00:50:15,229
this quantity for sure. Now the only thing
I need to find out is this probability of C given
475
00:50:15,460 --> 00:50:19,910
that the test was positive;
this is something that I am going to
476
00:50:19,910 --> 00:50:24,349
be calculating.
that. So, let us try to see, if I could calculate
477
00:50:24,349 --> 00:50:31,349
this quantity here, what is this now what
is this probability this is the critical question,
478
00:50:31,519 --> 00:50:37,900
what is the chance that the person has cancer,
given that the test is positive.
479
00:50:37,900 --> 00:50:44,900
Now, look at this statement
here, look at statement 1. Statement
480
00:50:45,180 --> 00:50:52,180
1 says the probability that the test is positive given
that the person has cancer, multiplied by this,
481
00:50:52,970 --> 00:50:58,380
is the same as the probability of the test being
positive and the
482
00:50:58,380 --> 00:51:04,359
person having cancer,
and that is also equal to this. It is like probability
483
00:51:04,359 --> 00:51:10,319
A given B multiplied by P B, and this is probability
B given A multiplied by P A; that is what it
484
00:51:10,319 --> 00:51:17,259
is, so I have really written the same formula,
once here another time here just by changing
485
00:51:17,259 --> 00:51:19,890
the order of the thing, so it is the same
formula.
486
00:51:19,890 --> 00:51:25,589
This is valid from the
conditional
487
00:51:25,589 --> 00:51:31,569
probability formula. I use this formula, and
I am after this quantity.
488
00:51:31,569 --> 00:51:37,940
What is that quantity? Let me just try to put a green
bar there; this is
489
00:51:37,940 --> 00:51:44,940
the quantity I am after. I know this and I
know this, and this quantity I have already
490
00:51:45,180 --> 00:51:49,999
evaluated by equation 2; so I know this and
I know this and I know this, therefore I can
491
00:51:49,999 --> 00:51:56,059
find this, and this I write here: the probability
of there being cancer given that the
492
00:51:56,059 --> 00:52:02,369
test is positive is equal to this quantity
here. I just took this, this is what I took,
493
00:52:02,369 --> 00:52:06,710
and I put that in the numerator and I divided
by the denominator there.
494
00:52:06,710 --> 00:52:13,710
So, from 1 I found this, then from 2 I find
the probability of the test being positive
495
00:52:14,739 --> 00:52:21,739
which turns out to be this quantity;
this is the quantity that I put down here,
496
00:52:22,049 --> 00:52:27,859
using which equation? Equation 2. So I have
a quantity here where I have got all the
497
00:52:27,859 --> 00:52:34,859
numbers: I have got 0.92, I have got 0.001,
I have got 0.92 again, I have got 0.001, and
498
00:52:37,229 --> 00:52:44,229
I have got 0.04 given to us, and I have got
0.999. This I have done. And if I do that, and
499
00:52:45,680 --> 00:52:52,680
I use my machine,
I do some multiplication, division
500
00:52:53,489 --> 00:53:00,489
and so on, I get 0.0225; that is the probability,
believe me, that is the probability of a person
501
00:53:02,880 --> 00:53:09,150
having cancer if the test is shown as positive.
The instrument says the person has cancer,
502
00:53:09,150 --> 00:53:14,369
and the chance he truly has cancer is this.
So, the probability of truly having cancer
503
00:53:14,369 --> 00:53:21,369
given that the test is positive is only about 2
percent. Will you trust a machine like this?
504
00:53:23,680 --> 00:53:29,779
Will you? What we
have worked out is not known to many people
505
00:53:29,779 --> 00:53:36,309
in medical practice, because they have not
gone through this, and just see how serious
506
00:53:36,309 --> 00:53:42,989
this matter is, how can we fix this situation,
one way is to start playing with these numbers
507
00:53:42,989 --> 00:53:49,559
here. Notice here there is a quantity,
0.92, which is the goodness of the
508
00:53:49,559 --> 00:53:56,559
instrument; this is also the goodness of the
instrument. These are the factors; I need not
509
00:53:56,579 --> 00:54:01,799
really worry about population statistics.
But I must worry about this number and this
510
00:54:01,799 --> 00:54:08,799
number. One has to be sufficiently high
and the other low; in fact this has to
511
00:54:09,569 --> 00:54:16,160
be low, and this has to be high. This has
to be low, because this is saying there is
512
00:54:16,160 --> 00:54:21,549
cancer when there is no cancer; and this is for when
there is cancer and it is saying positive, so this
513
00:54:21,549 --> 00:54:28,079
should be as high as possible. What we have
done is, we have taken this to 0.999 and we have
514
00:54:28,079 --> 00:54:35,079
reduced this to 0.001; if we do that, this result
is still only 50 percent. That means you need
515
00:54:36,109 --> 00:54:41,900
a machine that must be much better than what
these fellows are supplying to you.
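The whole calculation on the sheet fits in one small function. This is a sketch only: the function name `posterior` and its parameter names are made up here for illustration; the numbers are the ones given in the example (0.92, 0.04, 0.001) and the improved instrument just discussed (0.999 and 0.001).

```python
def posterior(p_pos_given_cancer, p_pos_given_healthy, p_cancer):
    """P(cancer given positive test), from the conditional probability formula."""
    p_healthy = 1 - p_cancer
    # Statement 1: P(positive and cancer) = P(positive given cancer) * P(cancer)
    p_pos_and_cancer = p_pos_given_cancer * p_cancer
    # Equation 2: P(positive) adds the cancer case and the healthy case
    p_pos = p_pos_and_cancer + p_pos_given_healthy * p_healthy
    return p_pos_and_cancer / p_pos


given_instrument = posterior(0.92, 0.04, 0.001)
better_instrument = posterior(0.999, 0.001, 0.001)

print(round(given_instrument, 4))    # about 0.0225, roughly 2 percent
print(round(better_instrument, 4))   # about 0.5, still only 50 percent
```

Changing the third argument shows how strongly the answer depends on how rare the disease is in the population being tested.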
516
00:54:41,900 --> 00:54:46,470
That is the performance level these people
are supplying. This is something we have
517
00:54:46,470 --> 00:54:51,239
got to keep in mind, and this is the analysis
that you have got to go through
518
00:54:51,239 --> 00:54:56,869
before you make a purchase;
you should not go just by this 0.92 and 0.04.
519
00:54:56,869 --> 00:55:03,559
They look great, they sound very good. But
when you actually do the calculation, what
520
00:55:03,559 --> 00:55:10,279
is the chance of the person really
having cancer when the test is positive? That
521
00:55:10,279 --> 00:55:15,430
turns out to be about 2 percent only,
and obviously we cannot really send these
522
00:55:15,430 --> 00:55:21,410
people for chemotherapy with this sort of
probability; we cannot send people who
523
00:55:21,410 --> 00:55:27,239
test positive to chemotherapy, we
just cannot do that. This
524
00:55:27,239 --> 00:55:31,710
just kind of gives you an idea of how important
this conditional probability theory is; it
525
00:55:31,710 --> 00:55:35,150
is very, very important.
This gets into quality assurance, this gets
526
00:55:35,150 --> 00:55:39,950
into production and manufacturing, this gets into
any kind of improvement that you are looking for,
527
00:55:39,950 --> 00:55:45,170
for example, if you want to remove false
alarms or bad treatments. A hospital is
528
00:55:45,170 --> 00:55:52,049
a test facility, and a hospital is also a service
facility. It should provide good service;
529
00:55:52,049 --> 00:55:57,630
if it is based on this sort of data, it is
not really possible for this particular
530
00:55:57,630 --> 00:56:02,650
facility to survive for too long, and people
can take them to court if this is the kind
531
00:56:02,650 --> 00:56:06,869
of instrument that is being used and false
treatments have been given, when there is only a 2
532
00:56:06,869 --> 00:56:12,559
percent chance of the person really having
cancer when the machine says he has cancer.
533
00:56:12,559 --> 00:56:18,119
That being so low, people can be taken to
court, so just be mindful of this. So, when
534
00:56:18,119 --> 00:56:23,299
you get into a situation like this, please
think of this example; this is a pretty
535
00:56:23,299 --> 00:56:26,670
famous example that is used by many different
people.
536
00:56:26,670 --> 00:56:33,670
So, will you really get the treatment? Our
answer is going to be a plain and simple no.
537
00:56:34,630 --> 00:56:38,200
If this
kind of performance is what we are getting
538
00:56:38,200 --> 00:56:43,329
from the instrument, you should not get treated
in that hospital; certainly you should not
539
00:56:43,329 --> 00:56:48,279
go for treatment. Rather, you should try to
get other tests and try to see if this number
540
00:56:48,279 --> 00:56:55,150
could be improved. And as you saw, even 0.999 is
not so good, so that is like a lesson for
541
00:56:55,150 --> 00:56:59,759
us. We will continue with this as we move
along. Thank you very much, thank you.