Functions in R

Anna Krystalli

Institute de Ecologia, UNAM 30 Aug. 2016
https://annakrystalli.github.io/UNAM-site/Functions_in_R.nb.html


@annakrystalli | annakrystalli@googlemail.com



What are functions?

Code written to carry out specific task

Functions used to:

  • incorporate sets of instructions that we want to use repeatedly
  • contain complex code in a neat sub-program
  • reduce opportunity for errors
  • make code more readable

usually:

  • accepts parameters (arguments) <- INPUT
  • returns value(s) <- OUTPUT


functions in R

R is a functional programming language!

provides many tools for the creation and manipulation of functions.

“To understand computations in R, two slogans are helpful:

  • Everything that exists is an object.
  • Everything that happens is a function call."

John Chambers

even + and indexing through [ ]!

`+`(3,4)
## [1] 7
`[`(1:10, 1)
## [1] 1


You can do anything with functions that you can do with vectors:

  • assign them to variables
  • store them in lists
  • pass them as arguments to other functions
  • create them inside functions
  • return them as the result of a function



function basics

Basic structure

function(arglist){body}

TIP: in Rstudio, typing the fun snippet inserts an R function definition:

name <- function(variables) {
    
}

Just try typing fun



example function

Let’s write a function that will calculate the standard deviation of the values in a vector x.

std.dev <- function(x){
    n <- length(x)
    xbar <- sum(x)/n
    diff <- x - xbar
    sum.sq <- sum(diff^2)
    var <- sum.sq / (n-1)
    sqrt(var)
}

function environment

  • every time a function is called, a new environment is created to host execution.
  • each invocation is completely independent of previous ones
  • variables used within are local, e.g. their scope lies within - and is limited to - the function itself. They are therefore invisible outside the function body
s.d <- std.dev(1:10)
s.d
## [1] 3.02765
xbar
## Error in eval(expr, envir, enclos): object 'xbar' not found



elements of a function

function name

This can be any valid variable name, but you should avoid using names that are used elsewhere in R, such as dir, function, plot, etc

  • choose descriptive names
  • use verbs
  • check whether they are already in use: ? function.name

(you can access a function from a specific package using package.name::function.name)


arguments

Functions can have any number of arguments. These can be any R object: numbers, strings, arrays, data frames, of even pointers to other functions; anything that is needed for the function.name function to run.

  • Again, use descriptive names for arguments
formals(std.dev)
## $x


The ..., or ellipsis, element in the definition of a function allows for other arguments to be passed into the function, and passed onto to another function.

ellipsis_example <- function(x, ...) {
    input_list <- list(...)
    output_list <- lapply(X=input_list, summary)
    return(output_list)
}
ellipsis_example(x = 1, a=1:10,b=11:20,c=21:30)
## $a
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    3.25    5.50    5.50    7.75   10.00 
## 
## $b
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11.00   13.25   15.50   15.50   17.75   20.00 
## 
## $c
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   21.00   23.25   25.50   25.50   27.75   30.00



body

The function code between the {} brackets is run every time the function is called. Ideally functions are short and do just one thing.

body(std.dev)
## {
##     n <- length(x)
##     xbar <- sum(x)/n
##     diff <- x - xbar
##     sum.sq <- sum(diff^2)
##     var <- sum.sq/(n - 1)
##     sqrt(var)
## }



return value

The last line of the code is the value that will be returned by the function. It is not necessary that a function return anything, for example a function that makes a plot might not return anything, whereas a function that does a mathematical operation might return a number, or a list.




arguments & environment

All arguments required for computation must be supplied.

missing arguments not required for computation are fine:

f1 <- function(a, b, c){a + b}
f1(a = 10 , b = 20)
## [1] 30

objects required by function will be sought first in the local environment. If an argument specified in the function is missing, it will return an error, even if such an object exists in the global environment.

b <- 10
f1 <- function(a, b){a + b}
f1(a = 10)
## Error in f1(a = 10): argument "b" is missing, with no default
f1(a = 10 , b = 20)
## [1] 30

Objects required by computation but not specified as function arguments will be sought in the containing environment iteratively until it reaches the global environment.

b <- 10
f2 <- function(a){a + b}
f2(a = 10)
## [1] 20
rm(b) # remove object b
f2(a = 10)
## Error in f2(a = 10): object 'b' not found


default values

Default values for arguments can also be supplied when writing the functions

f3 <- function(a, b = 20){a + b}
f3(a = 10)
## [1] 30
f3(a = 10, b = 10)
## [1] 20
  • can be a bit dangerous depending on objects from outside the function environment.
  • consider using ...



outputs

A function by default returns the last ‘thing’ evaluated

f4 <- function(x){x + 10}
f4(10) # returns a value
## [1] 20
f5 <- function(x){
    y <- x + 10}
f5(10) # return an object which needs to be assigned
z <- f5(10)
z
## [1] 20
f6 <- function(x){
    y <- x + 10
    return(y)} 
f6(10) # returns a value
## [1] 20
v <- f6(10)
v
## [1] 20

I generally advise you always use return() to specify the outputs of your functions.

You can also use it conditionally to return different values or hault evaluation and return back to the calling environment:

f7 <- function(x) {
    y <- x - 10
    if(y < 0){return(NA)}else{
        y <- 2 * y
        return(y)}
}

f7(20)
## [1] 20
f7(5)
## [1] NA


multiple outputs

If you want to return multiple values/objects, you can collect objects created within a function in a list. Let’s say instead of just the std.dev, we also wanted the mean (xbar) and n returned. No problem. Just collect them in a list.

std.dev <- function(x){
    n <- length(x)
    xbar <- sum(x)/n
    diff <- x - xbar
    sum.sq <- sum(diff^2)
    var <- sum.sq / (n-1)
    s.d <- sqrt(var)
    
    return(list(s.d, xbar, n))
}

std.dev(1:10)
## [[1]]
## [1] 3.02765
## 
## [[2]]
## [1] 5.5
## 
## [[3]]
## [1] 10
  • Lists can collect diverse and complex outputs.
  • This is basically what outputs of function like lm are, a list.



Calling a function given a list of arguments

Suppose you had a list of function arguments:

args <- list(c(1:10, NA), na.rm = TRUE)

You could you then send that list to mean() by using do.call():

do.call(mean, list(1:10, na.rm = TRUE))
## [1] 5.5



function pipes

If you have a long workflow of computations:

  • break it up into logical blocks
  • write function for each
  • write functions so that output from one is the first argument to the next
  • use package dplyr and the pipe shorthand %>% to set up function pipeline
install.packages("dplyr")

Say I want to prepare a vector of values by removing NAs, scaling and centering the values. First I create a list containg the vector of values as well as an element to track the status of the process.

require(dplyr)
l <- list(x = c(1:10, NA), status = NULL)

Then I write three functions that will receive and return the list.

rmNAs <- function(X){
    X$x <- na.omit(X$x)
    X$status <- c(X$status, "NAs_removed")
    return(X)
}
         
scaleVector <- function(X){
    X$x <- scale(X$x, scale = T)
    X$status <- c(X$status, "scaled")
    return(X)
}          
          
centerVector <- function(X){
    X$x <- scale(X$x, center = T)
    X$status <- c(X$status, "centered")
    return(X)
}  

Then I set up a pipeline with the functions and pass the vector through:

l %>% rmNAs() %>% scaleVector %>% centerVector
## $x
##             [,1]
##  [1,] -1.4863011
##  [2,] -1.1560120
##  [3,] -0.8257228
##  [4,] -0.4954337
##  [5,] -0.1651446
##  [6,]  0.1651446
##  [7,]  0.4954337
##  [8,]  0.8257228
##  [9,]  1.1560120
## [10,]  1.4863011
## attr(,"scaled:center")
## [1] 0
## attr(,"scaled:scale")
## [1] 1
## 
## $status
## [1] "NAs_removed" "scaled"      "centered"

more on pipelines: https://rpubs.com/tjmahr/pipelines_2015




anonymous functions

If you just want to use a function once, you don’t have to name it:

(function(x) x * 10)(10)
## [1] 100

Anonymous functions can be particularly useful in conjunction with vectorising functions like lapply(). Here’s an unammed function for calculating the mean of a vector x. In the following example, the input x to the function is each element of the list l.

l <- list(1:5, 5:7)
lapply(l, FUN = function(x){sum(x)/length(x)})
## [[1]]
## [1] 3
## 
## [[2]]
## [1] 6



sourcing functions

I often write functions associated with a particular project.

I will save all the functions in a separate "project.name_functions.R" script. I’ll then call that script to make all the functions available to my workflow.

source("project.name_functions.R")

Remember to document details of your functions!




Avoiding errors

Let’s start with another example of a function:

getY <- function(X.matrix, b.vec, a.scalar) {

    # multiply the matrix by the vector using %*% operator
    Xb.prod <- X.matrix %*% b.vec

    # multiply the two resulting objects together to get a final object
    y <- Xb.prod * a.scalar

    # return the result
    return(y)
}

Clearly this function requires that the length of the vector and the number of columns in the matrix match.

Let’s make some objects to pass to the function:

mat <- cbind(c(1, 3, 4), c(5, 4, 3))
vec <- c(4, 3)

getY(mat, vec, 3)
##      [,1]
## [1,]   57
## [2,]   72
## [3,]   75

In this case the function works because the number of columns of matrix mat (2) and the length of vec (2) match.

How can we prevent or identify errors?


1. sanity checks

We can adapt the function and use the print() function to print off values of interest, for example the dimensions of arguments of concern

getY <- function(X.matrix, b.vec, a.scalar) {
    
    # print diagnostics
    print(dim(X.matrix))
    print(length(b.vec))

    # multiply the matrix by the vector using %*% operator
    Xb.prod <- X.matrix %*% b.vec

    # multiply the two resulting objects together to get final y
    y <- Xb.prod * a.scalar
    
    return(y)
}

getY(mat, vec, 3)
## [1] 3 2
## [1] 2
##      [,1]
## [1,]   57
## [2,]   72
## [3,]   75


2. debugging

When you have an error, one thing you can do is use R’s built-in debugger debug() to find at what point the error occurs.

debug(getY)
getY(X.matrix = mat, b.vec = c(2, 3, 6, 4, 1), a.scalar = 9)
## debugging in: getY(X.matrix = mat, b.vec = c(2, 3, 6, 4, 1), a.scalar = 9)
## debug at <text>#1: {
##     print(dim(X.matrix))
##     print(length(b.vec))
##     Xb.prod <- X.matrix %*% b.vec
##     y <- Xb.prod * a.scalar
##     return(y)
## }
## debug at <text>#4: print(dim(X.matrix))
## [1] 3 2
## debug at <text>#5: print(length(b.vec))
## [1] 5
## debug at <text>#8: Xb.prod <- X.matrix %*% b.vec
## Error in X.matrix %*% b.vec: non-conformable arguments


3. stopping and error messages

To ensure functions run smoothly you can use functions stop() or stopifnot() to hault the execution of functions should specific conditions not be met and flag with an appropriate and informative error

stop()

stops execution and returns error message. Usually used with conditional statements

f1 <- function(x) {
    if(!is.numeric(x)){stop("x is not numeric")}else{
        return(2*x)
    }
}
f1(10)
## [1] 20
f1("a")
## Error in f1("a"): x is not numeric

stopifnot()

tests the conditional statements supplied as arguments and stops execution if any return FALSE

f1 <- function(x) {
    stopifnot(is.numeric(x))
    return(2*x)
}
f1(10)
## [1] 20
f1("a")
## Error: is.numeric(x) is not TRUE


Testing

Each function should be easy to test, then you can “freeze” it. Write test cases, which can be automatically checked. - Unit Testing in R: The Bare Minimum




Tips & tricks

  • Keep your functions short. Remember you can use them to call other functions!
    • code cleaner and easily testable.
    • code easy to update
  • Document what the inputs to the function are, what the function does, and what the output is.
  • Check for errors along the way
  • Try out your function with simple examples to make sure it’s working properly
  • Use debugging and error messages, as well as sanity checks as you build your function.
  • Avoid mixing computation and plotting in the same function eg.
res <- some.computation(par1, par2, par3) 
plot(res)
  • it can be useful to look at the code of a function. Type the function name without the ( )
    • Will work on any function apart from base functions which are written in C
std.dev
## function(x){
##     n <- length(x)
##     xbar <- sum(x)/n
##     diff <- x - xbar
##     sum.sq <- sum(diff^2)
##     var <- sum.sq / (n-1)
##     s.d <- sqrt(var)
##     
##     return(list(s.d, xbar, n))
## }



Exercises:

source: https://www.r-bloggers.com/functions-exercises/

Exercise 1 Create a function that will return the sum of 2 integers.

Exercise 2 Create a function what will return TRUE if a given integer is inside a vector.

Exercise 3 Create a function that given a data frame will print by screen the name of the column and the class of data it contains (e.g. Variable1 is Numeric).

Exercise 4 Create the function unique, which given a vector will return a new vector with the elements of the first vector with duplicated elements removed.

Exercise 5 Create a function that given a vector and an integer will return how many times the integer appears inside the vector.

Exercise 6 Create a function that given a vector will print by screen the mean and the standard deviation, it will optionally also print the median.

Exercise 7 Create a function that given an integer will calculate how many divisors it has (other than 1 and itself). Make the divisors appear by screen.

Exercise 8 Create a function that given a data frame, and a number or character will return the data frame with the character or number changed to NA.

Further reading