1
00:00:09,610 --> 00:00:21,020
Hello and welcome to our lecture on the Bias-Variance
Dichotomy; this is a conceptual lecture. So,
2
00:00:21,020 --> 00:00:25,110
you must right now be going through lectures
where you are learning many of the machine
3
00:00:25,110 --> 00:00:32,191
learning tools and techniques, and this is
not one of them. So, we are not using a new technique
4
00:00:32,191 --> 00:00:40,570
in this lecture, but we are introducing a
very important concept in machine learning,
5
00:00:40,570 --> 00:00:45,680
and therefore it applies to all the techniques
that you are learning. I would
6
00:00:45,680 --> 00:00:51,400
go so far as to say this is probably one of
the most important concepts that you should
7
00:00:51,400 --> 00:00:53,590
understand in machine learning.
8
00:00:53,590 --> 00:00:59,320
So, let us dive into the core concept; the
concept can be stated in a very simple sentence.
9
00:00:59,320 --> 00:01:10,370
The idea here is that as you keep adding
complexity to a model,
10
00:01:10,370 --> 00:01:16,240
you might improve the fit of the model.
So, you might
11
00:01:16,240 --> 00:01:23,560
find out that you are better describing the
data, but that need not improve the predictive
12
00:01:23,560 --> 00:01:32,280
accuracy of the model when you compare it
to new data that you get, and this concept
13
00:01:32,280 --> 00:01:36,110
holds for whatever type of model you
look at.
14
00:01:36,110 --> 00:01:42,189
So, take the example of linear regression:
you can add complexity to it by either
15
00:01:42,189 --> 00:01:50,090
adding more input variables, or, even with one
input variable, by adding more complex transformations.
16
00:01:50,090 --> 00:01:57,119
So, for instance, one way of adding complexity
to a simple problem, where you have one input
17
00:01:57,119 --> 00:02:02,360
variable and one output variable, is to say:
I am not only interested in looking at it through
18
00:02:02,360 --> 00:02:08,890
the standard input variable, but I would like
to look at it as a polynomial. So, what is the
19
00:02:08,890 --> 00:02:16,300
model when x is the input variable,
but you also take x squared, x cubed and
20
00:02:16,300 --> 00:02:20,620
so on?
So, you can have a more complicated fit between
21
00:02:20,620 --> 00:02:26,290
y and x, because in simple regression you
always see y = mx + c, but now
22
00:02:26,290 --> 00:02:36,610
we have y = m1 x + m2 x² + c. So,
you can add complexity that way,
23
00:02:36,610 --> 00:02:42,780
or you can add complexity by adding more variables.
Now, this again is not confined to regression
24
00:02:42,780 --> 00:02:47,739
alone. With almost any method that you take,
you will typically find that there is some
25
00:02:47,739 --> 00:02:52,080
way of getting more and more complex.
So, for instance, you might have already covered
26
00:02:52,080 --> 00:02:59,520
trees, classification and regression trees. You
can add more complexity by, you know, creating
27
00:02:59,520 --> 00:03:04,500
more and more and more branches, to the point
where you have such a complicated tree, such
28
00:03:04,500 --> 00:03:11,830
a large tree, that each terminal
node or leaf is a single data point that you
29
00:03:11,830 --> 00:03:20,740
are using in your training set. So, you have
really added more complexity. In k-nearest neighbors,
30
00:03:20,740 --> 00:03:25,090
the choice of k could set the complexity. With neural
networks, something that you will learn in the
31
00:03:25,090 --> 00:03:28,709
future, the number of layers in the neural
network can set the complexity.
32
00:03:28,709 --> 00:03:33,970
And this idea of complexity will become clearer
as we, you know, talk through some examples.
33
00:03:33,970 --> 00:03:39,849
But the idea is that you can make the model
more and more and more complex, the model
34
00:03:39,849 --> 00:03:46,030
that you are going to use to create this relationship
between the input variables and the output variable;
35
00:03:46,030 --> 00:03:50,870
it can become more and more and more complex,
and the more and more complex it becomes, the
36
00:03:50,870 --> 00:03:54,060
better a job you will do of fitting the data that
you have.
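This claim can be checked numerically. A hedged sketch, not from the lecture itself, with made-up noisy linear data: as polynomial degree grows, the training error of a least-squares fit can only go down, because each higher-degree model contains the lower-degree one.

```python
import numpy as np

# Hypothetical noisy linear data, standing in for the lecture's example.
rng = np.random.default_rng(0)
x = np.linspace(0.5, 5.0, 10)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

def train_sse(degree):
    """Sum of squared residuals of a degree-`degree` least-squares fit."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.sum((np.polyval(coeffs, x) - y) ** 2))

# Nested models: training error is weakly decreasing in complexity.
errors = [train_sse(d) for d in range(1, 10)]
```

Note that nothing here says the higher-degree fits predict better; the monotone drop is about fitting the data you already have.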
37
00:03:54,060 --> 00:03:58,850
But that does not mean you are creating a
better model; the answer is you might not be,
38
00:03:58,850 --> 00:04:05,099
because while it might fit the data that you
have better, tomorrow when you need to predict using
39
00:04:05,099 --> 00:04:12,190
this model, you might not do a better job
of predicting, and we are going to see how
40
00:04:12,190 --> 00:04:17,750
that can possibly happen.
Now, this core concept in machine learning
41
00:04:17,750 --> 00:04:24,870
is also sometimes referred to as Occam's razor;
that is more of a mathematical concept and it
42
00:04:24,870 --> 00:04:30,599
is definitely used prominently in machine
learning. And the idea there
43
00:04:30,599 --> 00:04:39,560
is that if there are two models with equal
predictive accuracy, then you prefer the model
44
00:04:39,560 --> 00:04:45,439
that is simpler, that is less complex.
So, let us now go back, and you see how these
45
00:04:45,439 --> 00:04:52,039
two are highly related concepts. But with the concept
of the bias-variance dichotomy, you are essentially
46
00:04:52,039 --> 00:04:58,379
questioning, saying: I can add more complexity
to the model and it will look like it is doing
47
00:04:58,379 --> 00:05:04,990
a better job of fitting the data, but am I getting
better predictions? So, let us actually, you
48
00:05:04,990 --> 00:05:11,449
know, understand this through an example.
In the two tables that
49
00:05:11,449 --> 00:05:17,180
I have shown you, here is model one, which is
your good old linear regression that you know,
50
00:05:17,180 --> 00:05:22,029
and here is the data set.
So, I use the same data set, which is the same
51
00:05:22,029 --> 00:05:32,039
x on the right hand side and the same y, but
I say that I do not necessarily believe this
52
00:05:32,039 --> 00:05:36,372
is the right model, that is, that model one is the
right model. What if the relationship between
53
00:05:36,372 --> 00:05:43,710
x and y were more complex? So, I added an
x squared term, and x squared is nothing but,
54
00:05:43,710 --> 00:05:52,029
it is really simple, just: I take x and
I square it and I create a new column, and like
55
00:05:52,029 --> 00:06:02,999
that I keep on adding columns till x to the
power 9, and finally, I have all these as potential
56
00:06:02,999 --> 00:06:10,039
input variables as described by this model.
So, y is some function of all these parameters
57
00:06:10,039 --> 00:06:14,629
and I do a multiple regression on that. So,
how does it work out?
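The column-building step just described can be sketched as follows; the data values are assumptions for illustration, not the lecture's table. Each power of x becomes one more input column, and one multiple regression is run on all of them.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.5, 5.0, 10)            # ten observed inputs
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)

# Build the table the lecture describes: a column for x, x**2, ..., x**9
# (p = 0 supplies the intercept column of ones).
X = np.column_stack([x ** p for p in range(10)])

# One multiple regression on all the power columns at once.
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
sse = float(np.sum((X @ coeffs - y) ** 2))
# Ten coefficients and ten points: the fit can pass through every point,
# so the training residual is essentially zero.
```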
58
00:06:14,629 --> 00:06:24,680
So, here is my linear model, here are the
data points that you saw before, and the
59
00:06:24,680 --> 00:06:34,460
line that you see is the fitted line
that goes through these data points, and so this
60
00:06:34,460 --> 00:06:41,619
is what you get.
So, the polynomial, the one that I created
61
00:06:41,619 --> 00:06:47,050
before, is actually a ninth order polynomial.
So, how does that work?
62
00:06:47,050 --> 00:06:52,349
Well, it turns out that it does a really good
job of fitting all the data points. As we can
63
00:06:52,349 --> 00:07:00,009
see, this ninth order polynomial, described
by this black curve, goes through every single
64
00:07:00,009 --> 00:07:05,860
data point, and that should not be surprising:
you have ten data points and you are using a ninth
65
00:07:05,860 --> 00:07:13,159
order polynomial to fit them; there is enough flexibility
in the model, in the coefficients, enough
66
00:07:13,159 --> 00:07:21,959
complexity in the model, that you
can actually go through each data point. Here,
67
00:07:21,959 --> 00:07:29,400
in the first model, it is a straight line;
even if it wanted to, it cannot. Even if I had
68
00:07:29,400 --> 00:07:36,169
the flexibility to put this line wherever
I wanted, I can move this line up and down, I
69
00:07:36,169 --> 00:07:43,360
can rotate this line, but the best job that
I can wind up doing is actually the line that
70
00:07:43,360 --> 00:07:46,580
you see on this screen.
So, I did a simple linear regression, which
71
00:07:46,580 --> 00:07:53,400
does try to, you know, fit as many data points
as possible, and this is the best it could
72
00:07:53,400 --> 00:07:57,779
do. Now with the ninth order polynomial, when
it tries to do that, it does a really good job:
73
00:07:57,779 --> 00:08:02,499
it fits all the data points.
Now, you might be sitting there thinking: wow, so
74
00:08:02,499 --> 00:08:09,090
maybe my system is a ninth order polynomial.
But it is not; let me let you in on a secret: this
75
00:08:09,090 --> 00:08:16,969
actual data between x and y was created by
a linear system with some amount
76
00:08:16,969 --> 00:08:24,770
of noise. So, the true equation, the true
relationship, which, because I am the god out
77
00:08:24,770 --> 00:08:30,400
here, I am the one who is actually creating
these data points, so I have this oracle view,
78
00:08:30,400 --> 00:08:35,390
I am letting you in on the secret that I actually
created these data points through some
79
00:08:35,390 --> 00:08:45,519
y = b0 + b1 x, not with x squared
and x cubed terms, but then
80
00:08:45,519 --> 00:08:52,089
with some error, some noise, like any regular
system.
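The lecturer's "secret" data-generating process can be sketched like this; the coefficients and noise scale are made-up assumptions, not the lecture's actual values.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed true system: y = b0 + b1*x plus irreducible noise.
b0, b1, noise_sd = 1.0, 2.0, 2.0

def generate(n=10):
    """Draw one noisy sample from the true linear system."""
    x = np.linspace(0.5, 5.0, n)
    y = b0 + b1 * x + rng.normal(scale=noise_sd, size=n)
    return x, y

x, y = generate()
```

Only the oracle who wrote `generate` knows the truth is linear; the learner sees just `x` and `y`.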
81
00:08:52,089 --> 00:08:57,680
So, given this, it looks like the
ninth order polynomial still did a better
82
00:08:57,680 --> 00:09:07,740
job of fitting; the fit is great. But let
us look at how these two models compare when
83
00:09:07,740 --> 00:09:15,310
they have to predict on another occurrence.
Here is the fitted model, the same graph as
84
00:09:15,310 --> 00:09:21,449
above in the linear model. But I created
a new data set, and
85
00:09:21,449 --> 00:09:28,399
I see how good a job my line does of predicting
what is going to happen next, and the answer
86
00:09:28,399 --> 00:09:33,009
is it does not do too badly. So, this is the
predicted line, the blue line is the predicted
87
00:09:33,009 --> 00:09:38,490
line that I got from my training data.
Now, I go get new data; it looks like I
88
00:09:38,490 --> 00:09:46,019
missed the target a couple of times, and I probably
should expect that given what I know right
89
00:09:46,019 --> 00:09:51,020
now: there is some amount of noise in
the system that is irreducible. What
90
00:09:51,020 --> 00:09:56,000
this basically means is that tomorrow, when
you come to me saying, hey, at 4.5 what do
91
00:09:56,000 --> 00:10:01,940
you predict, I say I am going to predict this
value, and in reality I wind up seeing this
92
00:10:01,940 --> 00:10:10,350
value; the line in red is what I wind up
seeing in the field, while the prediction
93
00:10:10,350 --> 00:10:16,779
I make comes from
the blue line.
94
00:10:16,779 --> 00:10:23,189
Now, what do you see in terms of predicting
with the ninth order polynomial? You actually
95
00:10:23,189 --> 00:10:30,470
do quite terribly badly. I mean, look at this
data point. So, at this data point, where it is
96
00:10:30,470 --> 00:10:41,339
1.5, my prediction at 1.5 would be something
close to minus 40, minus 39 or whatever. So,
97
00:10:41,339 --> 00:10:47,949
at 1.5, if I were to use this ninth order
polynomial as my fitted model, I would be
98
00:10:47,949 --> 00:11:01,750
predicting minus 40, but look at what I actually
got: I got plus 50. So, I was off by
99
00:11:01,750 --> 00:11:07,720
almost 90 units, whereas I am not off by that much with the
linear model, and you are going to see the
100
00:11:07,720 --> 00:11:16,019
same kind of behavior elsewhere. In fact,
at something like 0.5 this model just
101
00:11:16,019 --> 00:11:20,319
goes through the roof; I cannot even fit it
inside the graph.
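The comparison just described can be sketched as follows, with synthetic stand-in data rather than the lecture's numbers: fit both models on one sample from a linear-plus-noise system, then score them on a fresh sample.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed true system: y = 1 + 2x + noise.
x_train = np.linspace(0.5, 5.0, 10)
y_train = 1.0 + 2.0 * x_train + rng.normal(scale=2.0, size=10)

# "Tomorrow's" data: a fresh draw from the same system.
x_new = rng.uniform(0.5, 5.0, 50)
y_new = 1.0 + 2.0 * x_new + rng.normal(scale=2.0, size=50)

def new_data_mse(degree):
    """Mean squared prediction error on the new sample."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return float(np.mean((np.polyval(coeffs, x_new) - y_new) ** 2))

mse_line, mse_ninth = new_data_mse(1), new_data_mse(9)
# The interpolating ninth-order fit swings wildly between training
# points, so its error on new data is typically far larger.
```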
102
00:11:20,319 --> 00:11:24,740
So, clearly the ninth order polynomial, while
it is doing a great job of fitting, does a very
103
00:11:24,740 --> 00:11:35,100
bad job of predicting. And my goal, more often
than not,
104
00:11:35,100 --> 00:11:42,990
is to really come up with good predictive
accuracy. So, in most machine learning you are
105
00:11:42,990 --> 00:11:47,439
not trying to just fit a model to the data, that
does not buy you that much; you want to
106
00:11:47,439 --> 00:11:53,420
fit a model that can generalize to other
situations.
107
00:11:53,420 --> 00:11:57,420
So, tomorrow when you get a data set, because
what you are going to do with the model is either
108
00:11:57,420 --> 00:12:02,310
predict with it or interpret
it, in either case you need to acknowledge
109
00:12:02,310 --> 00:12:07,600
that the data that you have
is nothing but a sample, and if you take another
110
00:12:07,600 --> 00:12:12,749
sample and it turns out that you would have
told a completely different story, then maybe
111
00:12:12,749 --> 00:12:18,699
the way you are doing things is not really correct.
My whole point is that, for instance, if you
112
00:12:18,699 --> 00:12:25,410
had seen the red star data in the linear
regression model, you might have created a
113
00:12:25,410 --> 00:12:29,690
slightly different line; maybe the line would
have looked a little bit like this.
114
00:12:29,690 --> 00:12:34,749
But think about what you might have done with
the complex polynomial: you might have created
115
00:12:34,749 --> 00:12:40,871
a, you know, completely different polynomial
function, one that again will go through
116
00:12:40,871 --> 00:12:45,749
all the red data points. But if for one sample
you create one story and for another sample
117
00:12:45,749 --> 00:12:50,470
you create a completely different story, then
maybe the way you are doing things is not really
118
00:12:50,470 --> 00:12:57,360
accurate. So, here is what we have shown you: a
system where the truth was a linear system,
119
00:12:57,360 --> 00:13:02,439
and clearly using a linear model made
more sense than having a more complex model.
120
00:13:02,439 --> 00:13:07,290
Now, what happens when there is more complexity
in the system? Let us for instance say that
121
00:13:07,290 --> 00:13:12,250
my true model was quadratic, and that is what
I am going to show here: I am going to take a quadratic
122
00:13:12,250 --> 00:13:17,850
model and then see what happens when I try
a linear fit. So, here is the quadratic model;
123
00:13:17,850 --> 00:13:24,550
this is the truth that the machine learning
algorithm does not know. In the real world
124
00:13:24,550 --> 00:13:31,399
I will never know that the true system is quadratic.
The only thing that I have is data; I show you
125
00:13:31,399 --> 00:13:38,250
the data, and I am just letting you in on a secret
that for today's exercise I created this data.
126
00:13:38,250 --> 00:13:45,790
So, this is the data, and this is the model:
I created this data using this model and adding
127
00:13:45,790 --> 00:13:53,670
some amount of noise or uncertainty.
So, the model that created this data has
128
00:13:53,670 --> 00:14:07,399
actually, you know, the form y = b0 + b1 x,
that is it, because there is only one x, plus b2 x².
129
00:14:07,399 --> 00:14:15,010
So, I am using some model like this, with
some b0, b1, b2, plus some amount of
130
00:14:15,010 --> 00:14:22,540
noise. Obviously, if you knew that this was
the model, then this is the model you are going
131
00:14:22,540 --> 00:14:29,130
to try to fit; I mean, if you knew this is the
model, then this is the model you should use,
132
00:14:29,130 --> 00:14:33,749
with the truly known b1, b2. So, you do
not even need to do any kind of machine learning
133
00:14:33,749 --> 00:14:37,339
or statistics exercise.
But sadly, you are only given the data and
134
00:14:37,339 --> 00:14:42,029
you are not told which model it is. Now, let us
see what happens if you are handed this data and
135
00:14:42,029 --> 00:14:49,970
you try to fit a line; this is one fit.
But, to kind of give you a feel for what
136
00:14:49,970 --> 00:14:55,089
happens when you do this many times, I generated
another set of data points using this model.
137
00:14:55,089 --> 00:15:00,360
So, I have shown you only one set of blue
dots, blue small mini circles, but effectively
138
00:15:00,360 --> 00:15:04,999
I generate another set of blue mini circles and
then fit a line; I fit another line.
139
00:15:04,999 --> 00:15:12,399
But one thing you should note is: look,
all these lines are in general more or less
140
00:15:12,399 --> 00:15:17,889
trying to do the same job, and in general
they wind up failing chronically in certain
141
00:15:17,889 --> 00:15:23,790
regions; they always wind up underestimating in
this region, because they are always under the
142
00:15:23,790 --> 00:15:29,839
truth, and elsewhere they always wind up overestimating.
So, each time I do this exercise it looks
143
00:15:29,839 --> 00:15:38,079
like I am chronically off in certain areas, but
I am fairly consistent: each time we do
144
00:15:38,079 --> 00:15:40,740
this exercise I wind up creating kind of the
same line.
145
00:15:40,740 --> 00:15:46,809
Now, what happens when you use a ninth order
polynomial? What happens is you are still not doing
146
00:15:46,809 --> 00:15:50,959
too great, and the reason you are not doing too great
is because this is a quadratic system and
147
00:15:50,959 --> 00:15:58,269
you are trying to fit something far more complex.
So, you are still overshooting a lot, but there
148
00:15:58,269 --> 00:16:04,620
are a couple of things to note. One, in general,
as expected and as shown on the previous slide,
149
00:16:04,620 --> 00:16:09,459
you are not always telling the same story: one
time you tell one story, the next time, you
150
00:16:09,459 --> 00:16:14,649
know, you predict something vastly different. This
kind of extreme variance from one
151
00:16:14,649 --> 00:16:17,999
prediction to the other is not seen in the
linear fit.
152
00:16:17,999 --> 00:16:25,430
But take another look: you are not, obviously,
chronically off in certain areas. Yes, this
153
00:16:25,430 --> 00:16:31,190
is only five such fits, but imagine if you
had five thousand such fits. Even if you had
154
00:16:31,190 --> 00:16:36,180
five thousand fits of the linear model, you
would always be overestimating some regions
155
00:16:36,180 --> 00:16:42,360
and always underestimating some regions,
and that is called bias. Whereas in the ninth order
156
00:16:42,360 --> 00:16:52,730
polynomial, the idea is that yes, there is so much
variability, but if you were to do many, many,
157
00:16:52,730 --> 00:17:01,800
many fits, then on average you might not be
off from the blue line, and that is useful in
158
00:17:01,800 --> 00:17:05,850
concept but not useful in practice, because
in reality you are going to get only one data
159
00:17:05,850 --> 00:17:10,313
set and you are going to try one fit.
So, if it is off, it is off, whether it is
160
00:17:10,313 --> 00:17:17,480
because of whatever reason, but it really
helps to understand why it is off. Here, in
161
00:17:17,480 --> 00:17:24,240
the linear fit, it is off because you are
chronically always going to be off, because
162
00:17:24,240 --> 00:17:29,510
you are trying to fit a very rudimentary model,
a very simplistic model, to something that is in
163
00:17:29,510 --> 00:17:37,289
reality a little more complex; the
reality is more complex, the system is more complex
164
00:17:37,289 --> 00:17:42,780
because it is quadratic, and the model you
are trying to fit is too simplistic, because
165
00:17:42,780 --> 00:17:47,590
here the model can only be a line.
So, it is a line, which is, you know, simple,
166
00:17:47,590 --> 00:17:55,309
whereas...
So, here what you are going to witness is a lot
167
00:17:55,309 --> 00:18:02,210
of bias: in some regions you are going to be always
off. Whereas out here, in the ninth order polynomial,
168
00:18:02,210 --> 00:18:08,970
there is so much variability, because you
are just getting fooled by the pure noise,
169
00:18:08,970 --> 00:18:12,590
the same thing that you saw in the previous
example; nothing has really
170
00:18:12,590 --> 00:18:18,510
changed, you are still trying
to fit a model that is overly
171
00:18:18,510 --> 00:18:27,090
complex to a system that is not that complex.
We went from a linear data source to a quadratic
172
00:18:27,090 --> 00:18:29,820
data source, but that still does not need a
ninth order polynomial, so there is a lot of
173
00:18:29,820 --> 00:18:32,980
variability.
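What the last few slides describe can be simulated. A hedged sketch with assumed values: draw many datasets from a quadratic truth, fit a line and a ninth-order polynomial each time, and look at the average offset and the spread of the predictions at one fixed input.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.5, 5.0, 10)

def truth(t):
    return t ** 2                      # assumed quadratic system

x0 = x[4]                              # probe predictions at one fixed input

def predictions(degree, n_sims=500):
    """Prediction at x0 from each of n_sims independent noisy datasets."""
    preds = np.empty(n_sims)
    for i in range(n_sims):
        y = truth(x) + rng.normal(size=x.size)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x0)
    return preds

line, ninth = predictions(1), predictions(9)
bias_line = line.mean() - truth(x0)    # chronic offset: large for the line
bias_ninth = ninth.mean() - truth(x0)  # near zero for the flexible fit
var_line, var_ninth = line.var(), ninth.var()
```

The line shows high bias and low variance; the ninth-order polynomial shows the reverse, which is exactly the trade the lecture is describing.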
174
00:18:32,980 --> 00:18:37,600
And this is the point that gets captured
in what is often described
175
00:18:37,600 --> 00:18:46,470
as the bias-variance dichotomy. This graph is
taken from the ESL book by Hastie and Tibshirani,
176
00:18:46,470 --> 00:18:53,230
and it captures what we have kind of tried to illustrate
in the last two slides, which is: as model
177
00:18:53,230 --> 00:19:00,679
complexity increases, it looks like you
are doing a very good job of fitting the data;
178
00:19:00,679 --> 00:19:08,299
this is nothing but the fit. So, you take
the data and look: the error keeps dropping, and you want
179
00:19:08,299 --> 00:19:11,650
very low error.
So, as you keep on making a more and more
180
00:19:11,650 --> 00:19:17,450
and more and more complex model, you are going
to go through more and more data points. But
181
00:19:17,450 --> 00:19:28,750
at some point your ability to predict, this
is prediction on new data, the red line is the
182
00:19:28,750 --> 00:19:35,899
prediction error on new data, the blue line is the
fit on the data that was given to you for
183
00:19:35,899 --> 00:19:40,649
training. So, it is called the training sample,
and the prediction is done on the test sample.
184
00:19:40,649 --> 00:19:48,230
Your prediction error keeps on getting lower up to a sweet
spot for model complexity, and then after that
185
00:19:48,230 --> 00:19:57,580
it goes up. And what is the sweet spot? That
sweet spot would be where you are
186
00:19:57,580 --> 00:20:00,899
able to match the exact complexity of the
system.
187
00:20:00,899 --> 00:20:03,980
So, if you had a quadratic system, for instance,
and you used a quadratic model, that might
188
00:20:03,980 --> 00:20:11,650
be the sweet spot, since you have a complexity
that is a perfect match. But in what we are going
189
00:20:11,650 --> 00:20:17,820
to do, in the real world, you don't know
what the true model is. So, how do you figure
190
00:20:17,820 --> 00:20:24,730
out what this complexity should be? That
is going to be covered in our lectures on validation.
191
00:20:24,730 --> 00:20:32,250
How do you validate a model, how do you fine-tune
some parameters of a particular model,
192
00:20:32,250 --> 00:20:37,900
again, it can be K-nearest neighbors, it can
be trees, it can be neural networks, it can
193
00:20:37,900 --> 00:20:42,060
be a support vector machine, it can be a simple
regression.
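The validation idea previewed here can be sketched as a toy version, not the course's actual procedure: sweep the complexity knob, score each setting on held-out data, and pick the degree with the lowest held-out error.

```python
import numpy as np

rng = np.random.default_rng(4)

def sample(n=30):
    """Assumed quadratic truth with noise (illustrative values)."""
    x = rng.uniform(0.5, 5.0, n)
    return x, 1.0 + 2.0 * x - 0.8 * x ** 2 + rng.normal(size=n)

x_train, y_train = sample()
x_val, y_val = sample()                 # held-out validation sample

def mse(degree, xs, ys):
    """Fit on the training sample, score on (xs, ys)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

train_err = {d: mse(d, x_train, y_train) for d in range(1, 10)}
val_err = {d: mse(d, x_val, y_val) for d in range(1, 10)}
best_degree = min(val_err, key=val_err.get)   # the "sweet spot"
```

Training error keeps shrinking with degree, while validation error typically bottoms out near the true complexity and then worsens; `best_degree` plays the role of the tuning parameter chosen by validation.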
194
00:20:42,060 --> 00:20:45,679
But if there is some kind of tuning parameter
that can increase or decrease complexity,
195
00:20:45,679 --> 00:20:49,820
how do you go about increasing and decreasing
complexity to see what works best and then
196
00:20:49,820 --> 00:20:54,020
choosing the appropriate one? That is captured
in the lectures on validation.
197
00:20:54,020 --> 00:20:59,750
In this lecture we want to create an appreciation
that as model complexity increases, the fit
198
00:20:59,750 --> 00:21:06,870
becomes better, but for prediction you need to
find the sweet spot of model complexity, and
199
00:21:06,870 --> 00:21:14,269
that is the core idea.
The other idea here is that, yes, when model
200
00:21:14,269 --> 00:21:20,641
complexity is low, you do not do too well
in terms of prediction, but the reason
201
00:21:20,641 --> 00:21:27,180
for that is that there is high bias, meaning,
in that linear regression, if you remember,
202
00:21:27,180 --> 00:21:32,830
you are always off: when you try to fit that
linear regression to the quadratic function,
203
00:21:32,830 --> 00:21:37,710
you will always be off in certain regions. So,
you had a high bias and a low variance. What
204
00:21:37,710 --> 00:21:43,529
we mean by that, if I go back to the slide, is
that you are biased in certain regions. So, these
205
00:21:43,529 --> 00:21:50,250
are regions where you have bias, you are always
going to be off, because of the nature of
206
00:21:50,250 --> 00:21:56,539
trying to fit a line through a curve. But
you have a very low variance: the variability
207
00:21:56,539 --> 00:22:03,620
between many such fits is low.
So, on any given day, whichever data
208
00:22:03,620 --> 00:22:10,250
set you get, it is not like you are going to come up
with a completely new equation. So, that
209
00:22:10,250 --> 00:22:31,649
is what we call high bias and low variance.
Now, you step over to the ninth order polynomial;
210
00:22:31,649 --> 00:22:37,080
here the bias is not that high; it is not
like I can tell you that you are going
211
00:22:37,080 --> 00:22:43,820
to chronically be under-predicting or over-predicting
in some regions. So, this has low
212
00:22:43,820 --> 00:22:53,380
bias, so I cannot tell you upfront that you are
going to be always off in one direction, but
213
00:22:53,380 --> 00:22:58,620
it has got high variance.
What we mean by that is, on any given day,
214
00:22:58,620 --> 00:23:03,519
if I take any given data set, I take the sample
and I try to fit this polynomial, I do not
215
00:23:03,519 --> 00:23:09,700
know which line I am going to get; I mean,
this line predicts some astronomically high
216
00:23:09,700 --> 00:23:15,549
value out here, whereas this curve predicts
some astronomically low value out here. So,
217
00:23:15,549 --> 00:23:21,210
with such high variance, on a given data
set I do not know what I am going to be predicting,
218
00:23:21,210 --> 00:23:27,039
and therefore, if I have such high variance,
it just means that I am probably not going
219
00:23:27,039 --> 00:23:32,919
to do a very good job of predicting. I do not
even know what I am going to be predicting.
220
00:23:32,919 --> 00:23:38,159
So, that is the concept behind the bias-variance
dichotomy, which is that when you go for lower
221
00:23:38,159 --> 00:23:43,070
model complexity you get high bias and low
variance, and when you go for higher model complexity
222
00:23:43,070 --> 00:23:47,850
you will get low bias and high variance, and
this is a very important concept to internalize
223
00:23:47,850 --> 00:23:52,980
with respect to machine learning; and
it is going to be extremely important even
224
00:23:52,980 --> 00:23:59,380
in terms of applying more advanced techniques.
So, I hope the bias-variance dichotomy is
225
00:23:59,380 --> 00:24:00,320
clear.
Thank you.