class06

Author

Patrick Nguyen (A17680785)

Background

Functions are at the heart of using R.Everything we do involves calling and using functions (from data input, analysis to results).

All functions in R have at least 3 things:

  • A name the thing we use to call the function.
  • One or more input arguments that are comma seperated
  • The *body, liens of code between curly brackets {} that does the work of the function.

A first function

Let’s write a silly wee function to add some numbers:

add <- function(x) {
  x + 1
}

Let’s try it out

add(100)
[1] 101

Will this work?

add( c(100, 200, 300) )
[1] 101 201 301

Modify to be more useful and add more than just 1

add <- function(x, y=1) {
  x + y
}
add(100, 10)
[1] 110

Will this work?

add(100)
[1] 101
plot(1:10, col="blue", typ="b")

N.B. Input arguments can be either required or optional. The later have a fall-back default that is specifed in the function code with an equals sign.

#add(x=100, y=200, z=300)

##A second function

All functions in R look like this

name <- function(arg) {
  body
}

The ‘sample()’ function in R…

sample(1:10, size=4)
[1] 10  7  2  5

Q. Return 12 numbers picked randomly from the input 1:10

sample(1:10, size=12, replace=TRUE)
 [1] 10 10  3  5  2  8  7  7 10  3  7  2

Q. Write the code that generates a random 12 nucleotide long DNA sequence?

sample( c("A","T","G","C"), size=12, replace=TRUE )
 [1] "C" "T" "A" "A" "G" "G" "T" "A" "C" "A" "T" "G"

Q. Write a first version function called ‘generate_dna()’ that generates a user specified length ‘n’ random DNA sequence?

name <- function(arg) {
  body
}
generate_dna <- function(n=3) {
  sample( c("A","T","G","C"), size=n, replace=TRUE )
}
generate_dna(100)
  [1] "A" "T" "G" "A" "A" "C" "A" "T" "G" "T" "A" "T" "G" "T" "G" "A" "A" "C"
 [19] "C" "A" "A" "T" "A" "T" "T" "T" "A" "A" "C" "A" "A" "T" "G" "T" "G" "C"
 [37] "T" "A" "T" "T" "G" "T" "A" "T" "T" "C" "G" "C" "G" "C" "A" "C" "A" "G"
 [55] "A" "C" "T" "C" "A" "G" "T" "G" "C" "G" "G" "T" "T" "A" "C" "T" "C" "C"
 [73] "T" "T" "C" "T" "G" "T" "C" "C" "T" "G" "A" "G" "A" "G" "G" "C" "A" "A"
 [91] "C" "T" "T" "G" "T" "C" "C" "C" "C" "A"

Q. Modify your function to return a FASTA like sequence so rather than [1] “C” “G” “C” “A” “A” “A” “C” “T” “A” “C” “C” “T” we want “GCAAT”

generate_dna <- function(n=3) {
  ans <- sample( c("A","T","G","C"), size=n, replace=TRUE )
  ans <- paste(ans, collapse ="")
  return(ans)
}
generate_dna(10)
[1] "GAGATGCCTT"

An example

# Example pattern (not using your bases)
x <- c("H","E","L","L","O")

paste(x, collapse = "****")
[1] "H****E****L****L****O"
# returns "HELLO"

Q. Give the user an option to return FASTA format output sequence or standard multi-element vector format?

generate_dna <- function(n=3, fasta=TRUE) {
  ans <- sample( c("A","T","G","C"), size=n, replace=TRUE )
  
if(fasta) {
  ans <- paste(ans, collapse ="")
  cat("Hello...")
} else {
  cat("...is it me you are looking for...") 
  }
  
  return(ans)
  }
generate_dna(10)
Hello...
[1] "CAAGGAACCC"
generate_dna(10, fasta=FALSE)
...is it me you are looking for...
 [1] "C" "G" "G" "C" "A" "C" "A" "C" "A" "C"

A new cool function

Q.Write a function called ‘generate_protein()’ that generates a user specified length protein sequence in FASTA like format?

generate_protein <- function(n=3, fasta=TRUE) {
  aa <- c( "A","R","N","D",
           "C","Q","E","G",
           "H","I","L","K",
           "M","F","P","S",
           "T","W","Y","V")
  gen <- sample(aa,size=n, replace=TRUE)
  
  if(fasta) {
    prn <- paste(gen, collapse = "")
  }
  
  return(prn)
  
}
generate_protein(10)
[1] "KRICYWLWNE"

Q. Use your new ‘generate_protein()’ function to generate sequences between lengths 6 and 12 amino acids in length and check of any of these are unique in nature (i.e. found in the MR database at NCBI)?

generate_protein(6)
[1] "GMSRIP"
generate_protein(7)
[1] "WEHHSFL"
generate_protein(8)
[1] "SWELSKAD"
generate_protein(9)
[1] "YMPTIFEAQ"
generate_protein(10)
[1] "VRMNMKWLFF"
generate_protein(11)
[1] "GWLWSNPFPIP"
generate_protein(12)
[1] "CQTVLKPLSLCG"

Or we could do a ‘for()’ loop:

for(i in 6:12) {
  cat(">", i, sep="", "\n")
  cat( generate_protein(i), "\n" )
}
>6
LHYDIS 
>7
HDMIHPI 
>8
RSIPSICD 
>9
IYLCLHKML 
>10
HWIRGIHTSH 
>11
PKWEQGWWSNR 
>12
FELPAWGAIATT