1
00:00:13,880 --> 00:00:21,000
hi welcome to today's lecture so i will
start with a brief recap of what we have
2
00:00:21,000 --> 00:00:24,820
discussed in the last lecture so in the last lecture
we began with standard deviation we had a
3
00:00:24,820 --> 00:00:29,190
recap of z score and plotting box plots
and towards the end of the last lecture
4
00:00:29,190 --> 00:00:33,750
we discussed moments as a way of
characterizing data ok so the definition of
5
00:00:33,750 --> 00:00:38,700
moment as you would recall is in general so
given a set of observations y i of a variable
6
00:00:38,700 --> 00:00:44,020
y the rth sample moment about zero is defined
as m r star is equal to summation y to the
7
00:00:44,020 --> 00:01:13,350
power r by n for r equal to one two three and so on
ok so clearly if we set the
8
00:01:13,350 --> 00:01:20,149
value of r equal to one so m one star is summation
y by n which is nothing but the mean so in
9
00:01:20,149 --> 00:01:32,420
other words the first moment about zero of
a set of observations is the mean of the distribution
10
00:01:32,420 --> 00:01:36,659
ok so we can next define in a more general
sense the rth sample moment about any particular
11
00:01:36,659 --> 00:01:37,840
value and in particular we want to know the
rth sample moment about the mean so the rth
12
00:01:37,840 --> 00:01:54,600
sample moment about the mean is defined by
summation y minus y bar to the power r by
13
00:01:54,600 --> 00:02:08,250
n for r equal to one two three dot dot dot
so clearly so when you say about the mean
14
00:02:08,250 --> 00:02:12,810
if we are to generalize it about a value a
instead of y bar we will put a value of a
15
00:02:12,810 --> 00:02:13,810
ok so again as we did before if you put r
equal to one so first moment about the mean
16
00:02:13,810 --> 00:02:17,720
is summation of y minus y bar whole to the
power one by n and summation y minus y bar
17
00:02:17,720 --> 00:02:23,500
is going to give you a value of zero as we
had determined in last lecture .
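The two definitions above can be checked numerically. The following is a minimal sketch in R, not code from the lecture: the helper name `sample_moment` and the five observations in `y` are made up for illustration.

```r
# rth sample moment about a value a: sum((y - a)^r) / n
# (sample_moment is a hypothetical helper, not a built-in R function)
sample_moment <- function(y, r, a = 0) {
  sum((y - a)^r) / length(y)
}

y <- c(2, 4, 4, 7, 8)   # made-up observations for illustration

m1_star <- sample_moment(y, 1)                 # first moment about zero
m1      <- sample_moment(y, 1, a = mean(y))    # first moment about the mean

print(m1_star)   # equals mean(y)
print(m1)        # zero
```

Setting `a = mean(y)` gives moments about the mean, and any other value of `a` gives the generalization mentioned above.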
18
00:02:23,500 --> 00:02:34,490
so first moment about the mean is zero what
about the second moment about the mean if
19
00:02:34,490 --> 00:02:47,930
lets say n is reasonably large so your
m two is nothing but summation y minus y bar
20
00:02:47,930 --> 00:03:03,870
whole square by n and you can clearly see
that this is nothing but very close to what
21
00:03:03,870 --> 00:03:08,850
is our definition of the variance ok as opposed
to dividing by n minus one we have divided
22
00:03:08,850 --> 00:03:13,840
by n but for large n your m two is very nearly
the sample variance ok again we can use
23
00:03:13,840 --> 00:03:17,670
it for getting higher moments like third moment
about the mean y minus y bar whole to the
24
00:03:17,670 --> 00:03:28,421
power three by n so one aspect that we discussed
in the last class was that for a symmetric
25
00:03:28,421 --> 00:03:39,040
distribution all odd moments about
the mean which means m one m three
26
00:03:39,040 --> 00:03:48,340
m five m seven and so on will return
a value of zero this is because for every
27
00:03:48,340 --> 00:04:01,210
value of y which is situated to the left of
y bar there is another value of y which
28
00:04:01,210 --> 00:04:06,340
is situated to the right of y bar and the
frequencies of these two values are equal
29
00:04:06,340 --> 00:04:10,640
which means that for every negative value
that you accumulate for lets say y one minus
30
00:04:10,640 --> 00:04:17,190
y bar there is a corresponding y two minus
y bar which is positive and equal in magnitude so
31
00:04:17,190 --> 00:04:25,289
these will cancel each other out eventually
giving you a value of m three or m five which
32
00:04:25,289 --> 00:04:37,790
will be equal to zero
but of course for an asymmetric
33
00:04:37,790 --> 00:04:41,510
distribution this value is not going to be
zero it will have some value now given the
34
00:04:41,510 --> 00:04:47,880
way these moments are defined if you have
a value of y which has a given unit then this
35
00:04:47,880 --> 00:04:53,600
m r will not return a value which is unit
less rather it has some
36
00:04:53,600 --> 00:04:59,470
units so you of course want to eliminate that
aspect of dimensionality
37
00:04:59,470 --> 00:05:02,440
in your measurements and for that purpose
what you typically do is you divide by another
38
00:05:02,440 --> 00:05:15,120
moment which is raised to some other power
so that the units are the same so a three
39
00:05:15,120 --> 00:05:20,590
is one such measure it is defined as summation
of y minus y bar whole cube by n divided by summation of
40
00:05:20,590 --> 00:05:31,630
y minus y bar whole square by n whole to the power
three by two so it is nothing but m three
41
00:05:31,630 --> 00:05:49,669
by m two whole to the power three by two so
this is of course unit less as you can see
42
00:05:49,669 --> 00:05:57,980
from the definition this is called skewness
so we had reasoned
43
00:05:57,980 --> 00:06:01,960
that if you have a distribution which looks
like this so this is y this is frequency so
44
00:06:01,960 --> 00:06:08,460
your mean is somewhere here and
because there is a predominance of
45
00:06:08,460 --> 00:06:13,960
values which are to the left all these
y minus y bar values in this domain will
46
00:06:13,960 --> 00:06:19,240
be negative and y minus y bar in this
domain will be positive as a consequence of
47
00:06:19,240 --> 00:06:42,540
which there is a possibility that when you
compute y minus y bar whole cube summation
48
00:06:42,540 --> 00:07:01,760
this might turn out to be negative so there
is a greater chance that in this case
49
00:07:01,760 --> 00:07:06,000
when you calculate m three or a three
you get a value which is negative whereas
50
00:07:06,000 --> 00:07:10,310
m two m four m six are always positive because
they have y minus y bar whole square whole
51
00:07:10,310 --> 00:07:13,270
fourth whole sixth those are always positive
so odd moments for asymmetric distributions
52
00:07:13,270 --> 00:07:16,040
may be either negative or positive depending
on how the data is biased
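The sign behaviour of a three described above can be tried out directly. This is a small sketch in R under stated assumptions: `central_moment` and `skewness_a3` are hypothetical helper names, and the three samples are invented purely to show a zero, a positive and a negative result.

```r
# rth sample moment about the mean: sum((y - ybar)^r) / n
central_moment <- function(y, r) {
  sum((y - mean(y))^r) / length(y)
}

# a3 = m3 / m2^(3/2), the unitless skewness measure from the lecture
skewness_a3 <- function(y) {
  central_moment(y, 3) / central_moment(y, 2)^(3 / 2)
}

symmetric  <- c(1, 2, 3, 4, 5)   # mirror image about the mean
tail_right <- c(1, 1, 1, 2, 6)   # one large value far to the right
tail_left  <- c(1, 5, 6, 6, 6)   # one small value far to the left

print(skewness_a3(symmetric))    # zero
print(skewness_a3(tail_right))   # positive
print(skewness_a3(tail_left))    # negative
```

Because m two is always positive, the sign of a three comes entirely from m three, which is dominated by whichever side has the values farthest from the mean.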
53
00:07:16,040 --> 00:07:19,840
ok so skewness of data
basically lets us differentiate a
54
00:07:19,840 --> 00:07:25,020
symmetric distribution from a non symmetric
distribution which is biased either to
55
00:07:25,020 --> 00:07:29,510
the left or biased to the right ok so these
two are asymmetric and these will give me
56
00:07:29,510 --> 00:07:43,280
different values so we had worked out an example
in last class where we tried to find out what
57
00:07:43,280 --> 00:08:02,600
is the skewness measure for this particular
population we anticipated it would be negative
58
00:08:02,600 --> 00:08:11,259
we ended up with a value which is slightly
positive but let us work out another example
59
00:08:11,259 --> 00:08:18,390
where let us say we take data which
is biased to the right ok so in that case
60
00:08:18,390 --> 00:08:28,009
let me have these values
so lets say our mode is three
61
00:08:28,009 --> 00:08:35,289
this is one this is five i have two and
an intermediate value four ok so lets say
62
00:08:35,289 --> 00:08:39,690
i have one one two twos three threes two
fours and one five ok so one two three four
63
00:08:39,690 --> 00:08:48,600
five six seven eight nine values ok so
let us calculate the mean so
64
00:08:48,600 --> 00:08:55,080
y bar is going to be one plus four plus nine
plus eight plus five over nine numbers which
65
00:08:55,080 --> 00:09:04,550
is twenty seven by nine
so y bar is nothing but three
66
00:09:04,550 --> 00:09:13,100
ok so summation of y minus y bar whole cubed is going
to give me a value of minus two whole cubed
67
00:09:13,100 --> 00:09:24,320
plus minus one whole cubed into two plus one
whole cubed into two plus two cubed ok so
68
00:09:24,320 --> 00:09:28,720
i have minus eight here i have minus
two here ok plus two here plus eight here
69
00:09:28,720 --> 00:09:39,410
ok so two cubed is eight so these
exactly balance each other out and for
70
00:09:39,410 --> 00:09:43,480
this particular distribution i get y minus
y bar whole cube to be zero summation of y
71
00:09:43,480 --> 00:09:50,350
minus y bar whole cube to be zero ok so clearly
what you see is your data is
72
00:09:50,350 --> 00:09:56,260
slowly shifting to the right ok
but if you bias the data even
73
00:09:56,260 --> 00:10:01,339
more to the right side then we will slowly
get a value which is more positive than
74
00:10:01,339 --> 00:10:05,180
otherwise
ok there is another metric
75
00:10:05,180 --> 00:10:13,110
of characterizing a distribution which we
call kurtosis so kurtosis is a way
76
00:10:13,110 --> 00:10:18,370
of measuring the peakedness of a curve or
how flat or how sharp is the curve ok so i
77
00:10:18,370 --> 00:10:26,339
can have a few curves lets say this is one situation
this is another situation this is another
78
00:10:26,339 --> 00:10:33,240
situation ok so what i see is the peakedness
of the curves is increasing right so kurtosis
79
00:10:33,240 --> 00:10:40,130
measures the peakedness or flatness
of a curve ok and it is given by this metric
80
00:10:40,130 --> 00:10:48,250
of a four which is defined by summation y
minus y bar whole to the power four by n divided by summation
81
00:10:48,250 --> 00:11:00,470
y minus y bar whole square by n whole square ok
so this is nothing but m four by m two square
82
00:11:00,470 --> 00:11:14,959
so if we compare our definitions of a three
and a four a three
83
00:11:14,959 --> 00:11:26,000
was defined as m three by m two whole to the
power three by two a four is defined by m
84
00:11:26,000 --> 00:11:36,800
four by m two whole square and as i said
the aim here is to define these metrics in
85
00:11:36,800 --> 00:11:45,260
such a way that you come up with a non dimensional
term and that is exactly how we have defined
86
00:11:45,260 --> 00:11:54,649
a four because m four has powers of y minus
y bar whole to the power four so
87
00:11:54,649 --> 00:12:00,870
m four has power four and m two
has power two so you have to
88
00:12:00,870 --> 00:12:07,850
square it to generate something which has
the same units as m four and that is why your
89
00:12:07,850 --> 00:12:14,710
a four is defined as m four by m two square
ok so let us work out a simple example
90
00:12:14,710 --> 00:12:18,662
where we calculate a four ok so we want to
calculate a four so
91
00:12:18,662 --> 00:12:26,191
let us take a very flat distribution ok so
lets say our data is one two three four so
92
00:12:26,191 --> 00:12:37,470
each of these values appears only once in this
case ok so your y bar is equal to two point
93
00:12:37,470 --> 00:12:53,040
five ok so for y equal to one two
three four i can write y minus y bar whole
94
00:12:53,040 --> 00:12:58,149
to the power four so the mean is two point five
so one point five whole to the power four
95
00:12:58,149 --> 00:13:02,459
this is point five whole to the power four
point five whole to the power four one point
96
00:13:02,459 --> 00:13:10,610
five whole to the power four ok so
you can go through the calculation and see
97
00:13:10,610 --> 00:13:26,160
what value of a four you get for this distribution
versus for another distribution where let
98
00:13:26,160 --> 00:13:30,149
us say one two two three three
four right so we have made the two middle
99
00:13:30,149 --> 00:13:33,100
points lets say slightly higher
so we have generated another distribution
100
00:13:33,100 --> 00:13:37,279
so lets say this is distribution one which
is completely flat and this is distribution
101
00:13:37,279 --> 00:13:43,100
two which is slightly more peaked because
you have these two points which are occurring
102
00:13:43,100 --> 00:13:50,820
at a slightly higher frequency ok please go
through this calculation and see what value
103
00:13:50,820 --> 00:13:59,620
of a four you get you can generate one more
distribution where you arbitrarily lets say
104
00:13:59,620 --> 00:14:07,790
you make it one two two two three three four
so it is asymmetric so let us say the next
105
00:14:07,790 --> 00:14:10,100
distribution is asymmetric ok and two has
a higher frequency ok you can
106
00:14:10,100 --> 00:14:13,950
keep on making it more and more peaked
and see what kind of value you get
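The exercise above can be carried out in a few lines of R. This is only a sketch: `central_moment` and `kurtosis_a4` are hypothetical helper names, and the second data set is an assumption, taken to be one two two three three four because the values read out in the recording are unclear.

```r
# rth sample moment about the mean: sum((y - ybar)^r) / n
central_moment <- function(y, r) {
  sum((y - mean(y))^r) / length(y)
}

# a4 = m4 / m2^2, the unitless kurtosis measure from the lecture
kurtosis_a4 <- function(y) {
  central_moment(y, 4) / central_moment(y, 2)^2
}

d1 <- c(1, 2, 3, 4)            # flat: every value appears once
d2 <- c(1, 2, 2, 3, 3, 4)      # assumed reading: middle values appear twice
d3 <- c(1, 2, 2, 2, 3, 3, 4)   # asymmetric: two appears three times

a4_values <- c(d1 = kurtosis_a4(d1), d2 = kurtosis_a4(d2), d3 = kurtosis_a4(d3))
print(a4_values)   # a4 is larger for d2 than for the flat d1
```

For the flat data set the hand calculation gives m two equal to 1.25 and m four equal to 2.5625, so a four is 1.64, which the code reproduces.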
107
00:14:13,950 --> 00:14:16,640
so these exercises will help you get an idea
of how to go about generating or coming up
108
00:14:16,640 --> 00:14:20,730
with important metrics for quantifying the
statistics of the data so in the later part
109
00:14:20,730 --> 00:14:24,440
of this class today i want to discuss
an analytical tool that is how can
110
00:14:24,440 --> 00:14:28,180
you use software to do this statistical
analysis of course we saw very well that even
111
00:14:28,180 --> 00:14:32,060
for these four points if you have to
do the calculation by hand beyond a point
112
00:14:32,060 --> 00:14:34,959
we are not able to do so we need a tool which
would enable us to do these calculations if
113
00:14:34,959 --> 00:14:39,420
you have large data sets or where the data
is to be actually read from a file where you
114
00:14:39,420 --> 00:14:43,560
have lets say observations from ten
different experiments and so on so for
115
00:14:43,560 --> 00:14:48,220
that i want to introduce you
to this language called r so what
116
00:14:48,220 --> 00:14:52,110
exactly is r r is a software environment which
is used for data analysis specifically it
117
00:14:52,110 --> 00:14:55,470
is a gnu package and the source code of r
is freely available so that is the best part
118
00:14:55,470 --> 00:15:01,100
of it it has a command line interface
and it has other interfaces also and
119
00:15:01,100 --> 00:15:05,610
more importantly it can produce publication
quality graphs with mathematical symbols ok
120
00:15:05,610 --> 00:15:12,029
so r is essentially an interpreted language
this is an example of a console of r
121
00:15:12,029 --> 00:15:25,240
ok so as it is written
clearly here r is free software and comes
122
00:15:25,240 --> 00:15:36,050
with absolutely no warranty
ok so believe me you can download it
123
00:15:36,050 --> 00:15:43,310
from the net and i will come to the details
of how you can download it and how you can
124
00:15:43,310 --> 00:15:47,570
use it so what are the applications of this
r language it is used by statisticians it
125
00:15:47,570 --> 00:16:12,370
was developed at the university of auckland
and it is now widely used to the extent that
126
00:16:12,370 --> 00:16:14,660
there is a group of researchers who contribute
to the further development of this language
127
00:16:14,660 --> 00:16:19,190
ok so it is used by statisticians
for statistical computation and software development
128
00:16:19,190 --> 00:16:22,630
r supports matrix arithmetic and its performance
is comparable to that of
129
00:16:22,630 --> 00:16:25,280
widely used commercial software
like matlab for which you need to purchase
130
00:16:25,280 --> 00:16:30,589
a license and r can be used to perform high
performance statistical computation to
131
00:16:30,589 --> 00:16:34,790
the extent that it is also used by the business
fraternity so it brings us to the first question
132
00:16:34,790 --> 00:16:48,820
how do you get r so r is an open source programming
language so you can download it from
133
00:16:48,820 --> 00:16:56,630
a website called r project dot
org and the best part about it is it is available
134
00:16:56,630 --> 00:17:00,130
for all the different
operating systems so you can download it for
135
00:17:00,130 --> 00:17:03,082
windows you can download it for linux or you
can download it for mac ok so now r itself
136
00:17:03,082 --> 00:17:13,850
has a command line interface so sometimes people
want graphical user interfaces for easy use
137
00:17:13,850 --> 00:17:31,090
of the facility and even to understand how
it is used so there are various g u i tools
138
00:17:31,090 --> 00:17:33,070
which run r code so r studio is
one such
139
00:17:33,070 --> 00:17:36,350
so let me give you an example of how
r and r studio work so this is a console
140
00:17:36,350 --> 00:17:41,390
of r ok this is an r console ok and this is
an r studio console ok
141
00:17:41,390 --> 00:17:48,620
so you can download r and type commands
in the console and you can see the difference
142
00:17:48,620 --> 00:17:57,090
between r studio and r in r everything you
have to type out to get to your point
143
00:17:57,090 --> 00:18:07,490
in r studio you have a way of browsing
through different aspects seeing what are
144
00:18:07,490 --> 00:18:15,950
the tools and also the help file is
much more easily accessible ok so you can download
145
00:18:15,950 --> 00:18:21,929
either of these two tools i recommend
that you download r studio for your use ok
146
00:18:21,929 --> 00:18:30,442
so that brings us to r studio ok so we
can clearly see this is for example
147
00:18:30,442 --> 00:18:41,380
the console of r studio so
if you look at the r studio window
148
00:18:41,380 --> 00:18:49,010
there are three different windows this is
called the workspace so lets say if you have
149
00:18:49,010 --> 00:18:53,090
generated some variables you will
see them being recorded here with the full
150
00:18:53,090 --> 00:18:56,080
information and this is the main command window
where you will actually enter various things
151
00:18:56,080 --> 00:18:58,640
to do ok
so let us just do some simple computation
152
00:18:58,640 --> 00:19:04,820
in r studio so i can open r studio and
lets say if i want to do simple arithmetic
153
00:19:04,820 --> 00:19:14,330
i can define a a as a variable and i can assign
it the value of one ok i can write a a equal
154
00:19:14,330 --> 00:19:20,530
to one now the value of a a is not displayed
but what you see is in this section you can
155
00:19:20,530 --> 00:19:23,120
see the variable a a being created and its
value is written here so in order
156
00:19:23,120 --> 00:19:31,290
to know exactly what is the value of a a
you write a a and press enter then you get
157
00:19:31,290 --> 00:19:42,929
a value of one ok so similarly i can do b
b is equal to two so note that in this
158
00:19:42,929 --> 00:19:52,450
language irrespective of whether you
give a space or not it will still work it
159
00:19:52,450 --> 00:19:59,010
will not complain it will still work so in both
these cases b b is stored as two and
160
00:19:59,010 --> 00:20:03,970
b c is also stored as two even though you
gave spaces before the equal to but for your
161
00:20:03,970 --> 00:20:08,520
own clarity it is better that when you write
there is a space around an equal to or
162
00:20:08,520 --> 00:20:12,450
any other symbol ok
so in the r studio console itself we can do
163
00:20:12,450 --> 00:20:18,120
basic calculations so for example i can write
a a plus b b enter so i get the value of three
164
00:20:18,120 --> 00:20:23,702
i can do simple arithmetic so i can write
a a power b b so that is x to the power y
165
00:20:23,702 --> 00:20:32,930
right and i press enter and it evaluates to a value
of one so a a is one one to the power two is
166
00:20:32,930 --> 00:20:48,840
one i can write b b to the power b b so
two square is four i can do
167
00:20:48,840 --> 00:20:57,880
other simple calculations so if i do sine
of thirty so remember that
168
00:20:57,880 --> 00:21:10,350
it is calculated in radians so sine of
thirty degrees we always think it is half
169
00:21:10,350 --> 00:21:31,020
but this is giving a value of minus point
nine eight this is because it is calculating
170
00:21:31,020 --> 00:21:43,260
in radians
ok so in order to calculate the value of sine
171
00:21:43,260 --> 00:21:50,960
of thirty degrees you have to write
thirty star pi by one eighty ok if you
172
00:21:50,960 --> 00:21:57,150
write and the good part is it shows you
the way in which you write
173
00:21:57,150 --> 00:22:03,370
so sine of x is how you have to enter this
value and within this you can do anything
174
00:22:03,370 --> 00:22:10,080
if i press enter here now i get a value of point
five which is not negative so when you are
175
00:22:10,080 --> 00:22:16,960
doing trigonometry calculations you have to
enter these values in terms of radians ok
176
00:22:16,960 --> 00:22:26,929
similarly i can do the same thing cos of pi
by two returns me this value so one
177
00:22:26,929 --> 00:22:32,780
thing is these values are calculated
numerically so this is the reason why you
178
00:22:32,780 --> 00:22:40,780
see that the value of cos pi by two
is not zero but it is coming as six point
179
00:22:40,780 --> 00:22:44,460
one two e minus seventeen which
means six point one two into ten to the power
180
00:22:44,460 --> 00:22:53,179
minus seventeen which is as good as zero but it
is not exactly zero and this is because these
181
00:22:53,179 --> 00:22:59,690
values are internally computed numerically so
it is an approximation
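The trigonometry commands from this part of the demonstration can be written out as follows. This is a short sketch using only built-in R functions; the variable names are chosen here for illustration.

```r
# sin() and cos() expect angles in radians
s_raw <- sin(30)              # 30 radians, not 30 degrees: about -0.988
s_deg <- sin(30 * pi / 180)   # 30 degrees converted to radians: about 0.5
c_half <- cos(pi / 2)         # tiny non-zero number of order 1e-17

print(s_raw)
print(s_deg)
print(c_half)

# compare floating point results with a tolerance instead of ==
near_zero <- isTRUE(all.equal(c_half, 0))
print(near_zero)              # TRUE
```

The last two lines show one common way to treat a value like cos of pi by two as zero: test it with `all.equal`, which allows a small numerical tolerance, rather than with an exact comparison.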
182
00:22:59,690 --> 00:23:09,820
so i can do the same thing i can do lets say
log of ten so if you see the syntax it
183
00:23:09,820 --> 00:23:15,750
is log x comma base ok so i can write it
in more than one way
184
00:23:15,750 --> 00:23:18,010
so lets say if i do log of ten i get the value
two point three that means that it is actually
185
00:23:18,010 --> 00:23:20,990
calculating the natural logarithm and not
the log base ten ok so lets say if i do log
186
00:23:20,990 --> 00:23:25,140
ten comma ten now it is giving a value
of one so if i enter the base this is
187
00:23:25,140 --> 00:23:28,600
my number and this is the base with which i am
calculating my logarithm it is giving me the
188
00:23:28,600 --> 00:23:33,800
value of one but if you just write
log it will give you a value with respect
189
00:23:33,800 --> 00:23:39,559
to e the natural log i can also do log
ten of ten then also you get a value of one
190
00:23:39,559 --> 00:23:46,930
ok so you can easily go through the list of
these kinds of inbuilt functions which do
191
00:23:46,930 --> 00:23:54,760
the basic calculations ok now lets say i have
ten values and i want
192
00:23:54,760 --> 00:24:01,000
to calculate lets say the standard
deviation mean or median of a distribution
193
00:24:01,000 --> 00:24:09,049
how do i do it ok
so what you do here is you enter lets say
194
00:24:09,049 --> 00:24:14,540
data is so because i
generated this data before it is already
195
00:24:14,540 --> 00:24:18,191
showing up there but i can
rewrite data i would use this expression c
196
00:24:18,191 --> 00:24:23,390
of one comma two comma three and so on ok so
i enter it as a vector so the syntax
197
00:24:23,390 --> 00:24:31,179
is c and within that you put numbers
ok so when
198
00:24:31,179 --> 00:24:37,549
i press enter and then i write data what
you see here is data got initialized to a row
199
00:24:37,549 --> 00:24:47,150
vector which has five entries one two three
four five ok in order to display data i should
200
00:24:47,150 --> 00:24:57,130
just write data and then i get back what it
is and because it is a row vector it is showing
201
00:24:57,130 --> 00:25:04,130
as one two three four five ok so now
lets say i have some
202
00:25:04,130 --> 00:25:11,900
new entries i can write data is equal to
c of so i had
203
00:25:11,900 --> 00:25:15,960
my original data and i am overwriting it adding
three more numbers say six seven nine ok so
204
00:25:15,960 --> 00:25:24,300
i write data is equal to c of data comma six
comma seven comma nine now if i type data what you
205
00:25:24,300 --> 00:25:31,260
see here is data has now become an eight column
entry where in addition to one two three
206
00:25:31,260 --> 00:25:35,700
four five you have three more numbers which
have been added
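The vector commands described above look like this when typed at the console. This sketch uses the same variable name `data` as the demonstration and the arrow form of assignment, which behaves the same as the equal to sign at the top level.

```r
# create a vector with c(), then display it by typing its name
data <- c(1, 2, 3, 4, 5)
print(data)

# overwrite data with itself plus three more values
data <- c(data, 6, 7, 9)
print(data)
print(length(data))   # number of entries, now eight
```

Passing an existing vector as the first argument of `c()` is what appends the new values to the end rather than replacing the old ones.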
207
00:25:35,700 --> 00:25:45,240
ok so i can just get the value of data here
by writing data and enter and then i get this
208
00:25:45,240 --> 00:25:53,320
value now calculating these basic metrics
in r is super simple so in order
209
00:25:53,320 --> 00:26:07,270
to calculate the mean of data i will just
write mean of data and press enter and i
210
00:26:07,270 --> 00:26:17,240
get the exact value which is four point six
two five ok i can calculate a median of data
211
00:26:17,240 --> 00:26:25,330
median is four point five so i have one two
three four five six seven eight values so my median
212
00:26:25,330 --> 00:26:36,180
is between the fourth and fifth values
which is nothing but four plus five by two which
213
00:26:36,180 --> 00:26:45,370
is what it is giving four point five now what
is the mode of this distribution we can clearly
214
00:26:45,370 --> 00:26:50,360
see that these are all different values
so there is no particular value
215
00:26:50,360 --> 00:27:05,050
which is maximal in frequency
so if i write mode of data let us see what
216
00:27:05,050 --> 00:27:11,920
it gives
it says numeric so this
217
00:27:11,920 --> 00:27:16,110
is not giving an exact value because
i don't have any particular value
218
00:27:16,110 --> 00:27:21,150
which is repeating so let us again change
the expression for data by writing
219
00:27:21,150 --> 00:27:30,020
data is equal to c of data comma three three
comma three comma four comma four comma four
220
00:27:30,020 --> 00:27:47,750
comma four ok this is how i modify data i
can type it here but i can clearly see here
221
00:27:47,750 --> 00:27:58,820
now data is showing up as a fourteen column
vector ok now once it is bigger than a
222
00:27:58,820 --> 00:28:05,020
certain size of course it is difficult
to see here but what you can do is you can
223
00:28:05,020 --> 00:28:12,440
write data and enquire its value so you have
this entire distribution of data so i can
224
00:28:12,440 --> 00:28:16,890
just write it
now if i do mode of data it is again giving me numeric
225
00:28:16,890 --> 00:28:24,049
so we have to see so let
us see median of data ok it is giving the value
226
00:28:24,049 --> 00:28:32,529
four so now i guess if we arrange them in
ascending order then four will be
227
00:28:32,529 --> 00:28:52,360
there in the center and that is why the median
is giving you a value of four ok so if i go
228
00:28:52,360 --> 00:28:58,780
back to the presentation let me just briefly
say what we have done
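The summary commands from the demonstration above can be reproduced as below. Two points are assumptions in this sketch. First, the appended values are taken to be 3, 3, 4, 4, 4, 4, which matches the fourteen entries and the median of four, although the recording is not clear on the exact numbers. Second, `stat_mode` is a hypothetical helper written here, since base R's `mode()` reports how an object is stored (for example "numeric") and does not compute the most frequent value.

```r
first_eight <- c(1, 2, 3, 4, 5, 6, 7, 9)
print(mean(first_eight))     # 4.625
print(median(first_eight))   # 4.5

# assumed reading of the values appended in the demonstration
data <- c(first_eight, 3, 3, 4, 4, 4, 4)
print(median(data))          # 4
print(mode(data))            # "numeric": the storage mode, not a statistic

# hypothetical helper: most frequent value(s), found by tabulating counts
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}
print(stat_mode(data))       # 4
```

This explains why `mode(data)` printed numeric both before and after the repeated values were added. For the first eight values, every entry occurs once, so `stat_mode` would return all of them.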
229
00:28:58,780 --> 00:29:06,539
so you can create a custom vector using
this c open bracket and then you have various
230
00:29:06,539 --> 00:29:15,279
entries one two three four five six in that
way you will get these values you can enter
231
00:29:15,279 --> 00:29:26,110
them or you can introduce this
vector as a sequence so i can write from one
232
00:29:26,110 --> 00:29:38,830
to seven by one which means i want a sequence
which is increasing in units of one i can
233
00:29:38,830 --> 00:29:42,659
generate this vector you can repeat it so
you have one which you want to repeat ten
234
00:29:42,659 --> 00:29:48,539
times you can generate this vector you can
also repeat a
235
00:29:48,539 --> 00:29:53,110
range and generate this vector
so of course you
236
00:29:53,110 --> 00:29:56,279
need to remember that names are case sensitive
so if in one line you write b and in the next line
237
00:29:56,279 --> 00:29:59,520
you write capital b you will be shown an error
ok
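The ways of building a vector listed above can be sketched with the built-in functions `c`, `seq` and `rep`. The variable names are chosen here for illustration, and the slide's exact arguments are not visible in the transcript, so the values follow what is read out.

```r
v <- c(1, 2, 3, 4, 5, 6)             # custom vector, entries typed by hand
s <- seq(from = 1, to = 7, by = 1)   # sequence from one to seven in steps of one
r1 <- rep(1, times = 10)             # the value one repeated ten times
r2 <- rep(1:3, times = 2)            # a range repeated: 1 2 3 1 2 3

print(s)
print(r1)
print(r2)

# names are case sensitive: bb and BB are different objects
bb <- 2
print(exists("bb"))   # TRUE
print(exists("BB"))   # FALSE here, so typing BB would raise an error
```

Using `exists` lets the case sensitivity be shown without stopping the script on an error.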
238
00:29:59,520 --> 00:30:06,210
and we briefly discussed
all these basic manipulations
239
00:30:06,210 --> 00:30:16,700
like addition subtraction multiplication division
so please note that if
240
00:30:16,700 --> 00:30:23,990
you have a vector then when you do a sum
so lets say a is this particular vector when
241
00:30:23,990 --> 00:30:30,690
i write b is a plus one one is getting added
element wise so that is why you are having
242
00:30:30,690 --> 00:30:36,880
two three four five six seven from
one two three four five six ok so these operations
243
00:30:36,880 --> 00:30:46,059
operate on an element wise basis which is
why if you write c is equal to a by five
244
00:30:46,059 --> 00:31:08,480
you will get these particular values ok or
a star b now if you do a star b again it is
245
00:31:08,480 --> 00:31:16,799
an element wise operation first one will be
one into two ok so we had modified b somewhere
246
00:31:16,799 --> 00:31:29,580
else so b is a plus one c is a minus
one ok so a star b you can see that this will
247
00:31:29,580 --> 00:31:39,820
accordingly change ok so this is a plus b
this is a star b so one into two is two in
248
00:31:39,820 --> 00:31:50,000
the last case six into seven is forty two
so you have these particular elements or you
249
00:31:50,000 --> 00:32:04,560
can also do a to the power two or exponential and so on
ok so this gives you an idea of the usability
250
00:32:04,560 --> 00:32:15,850
of this particular language r which is very
easy to learn you can download it you can
251
00:32:15,850 --> 00:32:26,850
use it for analyzing your data with that i
stop here
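The element wise operations summarized in this recap can be sketched as follows. The names `a` and `b` follow the lecture. The name `cc` is used here instead of `c` only to avoid reusing the name of the `c()` function, which is a choice made for this example.

```r
a <- c(1, 2, 3, 4, 5, 6)
b <- a + 1     # one is added to every element: 2 3 4 5 6 7
cc <- a - 1    # every element reduced by one

print(a + b)   # element by element sum
print(a * b)   # element by element product: 2 6 12 20 30 42
print(a / 5)   # every element divided by five
print(a^2)     # every element squared
print(exp(a))  # exponential of every element
```

The product `a * b` is not a matrix product. Each entry of `a` is multiplied by the entry of `b` in the same position, which is why the first result is one into two and the last is six into seven.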
252
00:32:26,850 --> 00:32:32,860
in the next class we'll do one more session
with the language r to see how we can import
253
00:32:32,860 --> 00:32:45,649
data from a file so of course
it is easy enough to type six values seven
254
00:32:45,649 --> 00:32:51,570
values but if you have a tranche of data then
you need a way to import this data into this
255
00:32:51,570 --> 00:32:58,429
r software and operate on it and we'll
also briefly discuss how to do plotting
256
00:32:58,429 --> 00:33:03,809
with that i thank you for your attention today
and i look forward to having our next lecture
257
00:33:03,809 --> 00:33:04,460
thank you