[slide 01]
Hi!
The objective of this lesson is to get you started on the first major assignment, and to introduce array, or vector, formulas in spreadsheets. These will come in handy in both this and future assignments, so let's get started!
[spreadsheet]
This is your template for the first major assignment. We have weekly data points for our source data, and the yellow-shaded cells are the areas where you have to input your results. Please read the guidelines document for details.
In this video, we will first create the returns, then calculate the geometric average return in two ways - once by explicitly creating an extra column with the interest factors, multplying them together and taking the Nth root, and a second time, by using array functions.
[slide 02]
From the basic time value of money relationship, we can calculate the rate of change as Final divided by Initial minus 1.
Recall also that to find the geometric average of returns, we first compound all the returns to find the total return, by multiplying all the interest factors together, then take the Nth root to find the average per period return interest factor. Subtract 1 in your head to find the rate.
[spreadsheet]
So, let us first create the returns. We need to find the percent change between beginning of week and end of week adjusted closing price. Let's do it in the Returns column.
Final divided by initial - 1 gives us the rate of change. Fill this all the way down.
Let's go down to the bottom and clear out the spurious invalid return at the end, since we don't have a beginning value to use.
Now, if we didn't know about array functions, in order to calculate the geometric average we would first have to create a column of interest factors - 1 plus the rate for each observation - then get a product of all of them to compound the returns, and take the Nth root of that cumulative return to find the geometric average interest factor. Cap that off by subtracting 1 to find the rate. Let's do that to start with.
Using this next column to hold our interest factors, add 1 to all the rates. Then our geometric average, is the product of all of these, take the 1240th root because we have 1240 weekly returns, and subtract 1. And there is our geometric average return.
We can take care of the product-and-root part using the GEOMEAN function on the interest factors, as well. And we get the same result.
Using array functions allows us to get away from having to create an extra column to hold the array of interest factors. Array functions allow us to tell the spreadsheet to conduct element-wise operations on a range of cells - such as add 1 to each of the selected rates, and return a vector of the same length. We can then display that vector in a column, which gets us to the same place as before, OR, we can then feed that vector directly into other functions for further processing.
In order to tell the software that you want it to treat the formula as an array function, either use control-shift-enter instead of just enter when submitting your calculation, or check the "Array" checkbox in the function editor. Let's see how it works.
Let's delete our interest factors column, and recreate them with an array function. Just enter the range of cells we want to use, add 1, and hit control-shift-enter, and there's our new column of data.
To the same effect, hit the function editor button, enter the formula, and check the Array checkbox, to achieve the same result.
Doesn't really get us anything new, other than a neat trick to avoid having to enter and then fill down a relative reference formula.
Instead of just printing the array result, let's use it as input to another function. In our case, we want to feed the array of interest factors to GEOMEAN function, then subtract 1. Let's try that. Hit control-shift-enter, and voila, we have calculated the geometric average, without having to create extra columns of intermediate data.
Give both of these methods a try when you work on this assignment!
[slide 03]
To extend our dive into array functions further, let's discuss how we would calculate the semideviation of returns. Recall that the semivariance is the sum of squared /negative/ deviations from the mean, divided by (n-1). It's just like the normal variance, except we exclude, or effectively set to 0, any positive deviations. Semideviation is the square root of semivariance.
To calculate this, we need a conditional function - a function that will return different results depending on whether the deviation is negative or positive. The great news is that there is such a logical function built in - I introduce you to the IF function, which returns different results depending on whether its first argument is true or false.
The function takes three arguments - the logical statement, the value to return if the statement is true, and the value to return if the statement is false. I hope you can see that we should be able to use the IF function to select out all the negative deviations from the mean.
[spreadsheet]
Let's try some simple examples: IF(1+1=2, "yes","no") will return "yes". IF(3>5,1,0) will return 0. You get the idea.
Like other functions, IF also can be used as a vector function, applied to a range of values. For example, let's create a range of cells, 1 to 10, and then check which ones are greater than 5. =IF(range > 5, 1, 0). Hit control-shift-enter, and there we have the result.
A powerful extension of this is that the values to be returned can also be arrays, and the IF function will return the appropriate array indices for true and false conditions.
For example, rather than just returning 0 or 1 for whether the input array element is greater than 5, we could have it output the actual element if the condition is true, and 0 otherwise. Let's try that.
=IF(range > 5, range, 0)
Notice how we zeroed out all the elements where condition was not true, but returned the actual element when it was true.
I hope you can now see how we can use this to create the semivariance - if a deviation from the mean is negative, we will return the square deviation from the mean for that observation, but return 0 if it is not, and then it's just a matter adding up the remaining squared deviations, and dividing by (n-1). Let's try it on our returns data.
Let's clear up some space. First, calculate the mean:
=AVERAGE(H2:H1241)
Now let's calculate the semivariance
=SUM(IF(returns - mean < 0, (returns - mean)^2, 0))/(1240-1)
Oops, looks like we forgot to hit control-shift-enter, so let's make sure to use the array function, and there is our actual semivariance.
Now we can calculate the semideviation as the square root of semivariance, and there we are.
Notice that an array function cells is denoted with curly braces, so that makes it easy to see if you are using array functions or not.
Let's review the formula: we are checking each deviation for whether it is negative, and are including its square in the output array if that is true, or 0 if it is false. Then we are feeding the whole output array into the sum function, and dividing by N-1.
Wait a second, I notice that the second reference to the mean has shifted up and is pointing to an empty cell. Let's fix this, both should be pointing to the mean return, and hit ctl-shift-enter. That will change the numbers a little bit. There we go.
At this point I must mention a crucial difference between the way LibreOffice and Microsoft Excel deal with array functions that output more than one cell at a time. For example, let's say we have a column of 1,2,3, and we want to calculate that column plus 1, using an array function on the right. In LibreOffice, which is what I'm using here and have been doing, I start with an equal sign, select the input range, add 1, and hit ctl-shift-enter, and magically I get an output with multiple cells in it.
In Excel, that won't happen, it will only output the first cell of the output. In order to make it output the entire range, you have to pre-select the range of cells where you expect the output to go. So, let's get rid of this stuff here. You'd pre-select the range, if you don't know how large it is you can select extra cells, and then enter your formula, select input range, add 1, and hit ctl-shift-enter. You'll notice that any cells that output doesn't need will get populated with NA, that just means you don't have anything there. So if you are not sure exactly how large your output will be, such as for instance when you do linear regression, you can just be generous and select a whole bunch of cells, and if there are any extras they'll get NAs in them.
Additionally, unlike in LibreOffice, where the function editor has a nice little array checkbox you can use, that doesn't exist in Microsoft Excel, you just have to use ctl-shift-enter to enter the array formulas.
[slide 04]
I hope you enjoyed this introduction to spreadsheet array functions. Feel free to play around with these and get a better feel for how they work. Experimenting with something is really one of the best ways to figure out anything.
Another good idea if you want more help and examples is to use your favorite search engine and seach for something like "excel array functions tutorial". The internet is a magical tool.
Good luck with the assignments for this week. Hit up the discussion boards if you run into trouble and need some help!