class06
Background
Functions are at the heart of using R. Everything we do involves calling and using functions (from data input and analysis to reporting results).
All functions in R have at least 3 things:
- A name, the thing we use to call the function.
- One or more input arguments that are comma separated.
- The body, lines of code between curly brackets {} that do the work of the function.
A first function
Let’s write a silly wee function to add some numbers:
add <- function(x) {
  x + 1
}
Let’s try it out:
add(100)
[1] 101
Will this work?
add( c(100, 200, 300) )
[1] 101 201 301
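The call works because `+` in R is vectorized: it is applied to every element of `x`, so no explicit loop is needed. A minimal sketch comparing the vectorized call with a loop-based version; the helper name `add_loop()` is hypothetical and exists only for this comparison:

```r
# Same one-argument add() as in this section
add <- function(x) {
  x + 1
}

# Hypothetical loop-based equivalent, for comparison only
add_loop <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {
    out[i] <- x[i] + 1
  }
  out
}

v <- c(100, 200, 300)
add(v)                           # adds 1 to each element
add_loop(v)                      # same values, computed one at a time
identical(add(v), add_loop(v))   # both approaches agree
```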
Modify it to be more useful and add more than just 1:
add <- function(x, y=1) {
  x + y
}
add(100, 10)
[1] 110
Will this work?
add(100)
[1] 101
plot(1:10, col="blue", type="b")
N.B. Input arguments can be either required or optional. The latter have a fall-back default that is specified in the function code with an equals sign.
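To make the distinction concrete, the sketch below calls `add()` with and without its optional argument, then shows what happens when the required argument is left out or an unknown argument is supplied. `tryCatch()` is used only so that the example keeps running after each error.

```r
add <- function(x, y = 1) {
  x + y
}

add(100)               # y falls back to its default of 1
add(100, y = 10)       # the default is overridden by name
add(y = 10, x = 100)   # named arguments can be given in any order

# x has no default, so leaving it out fails as soon as x is used
missing_x <- tryCatch(add(), error = function(e) "error: x is missing")
missing_x

# add() has no argument called z, so this call is rejected
extra_arg <- tryCatch(add(x = 100, y = 200, z = 300),
                      error = function(e) conditionMessage(e))
extra_arg
```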
# Commented out because it fails: add() has no argument named z
#add(x=100, y=200, z=300)
A second function
All functions in R look like this:
name <- function(arg) {
  body
}
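The parts of this template can be inspected on a real function with the base R helpers `formals()` and `body()`. A short sketch using the two-argument `add()` from earlier, redefined here so the block runs on its own:

```r
add <- function(x, y = 1) {
  x + y
}

names(formals(add))   # the input arguments: "x" "y"
formals(add)$y        # the default value stored for y: 1
body(add)             # the code between the curly brackets
class(add)            # "function"
```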
The ‘sample()’ function in R…
sample(1:10, size=4)
[1] 10  7  2  5
Q. Return 12 numbers picked randomly from the input 1:10
sample(1:10, size=12, replace=TRUE)
 [1] 10 10  3  5  2  8  7  7 10  3  7  2
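`replace=TRUE` is needed here because `sample()` draws without replacement by default, so it cannot return more values than the input holds. A small sketch of the failure and the fix; `set.seed()` is an addition made only so the draw is repeatable and is not part of the original code:

```r
set.seed(42)  # assumption: fixed seed so reruns give the same draw

# Asking for 12 values from 10 without replacement is an error
no_replace <- tryCatch(sample(1:10, size = 12),
                       error = function(e) "cannot draw 12 from 10 without replacement")
no_replace

# With replacement, values may repeat and any size is allowed
draw <- sample(1:10, size = 12, replace = TRUE)
length(draw)            # 12
any(duplicated(draw))   # TRUE: 12 draws from only 10 values must repeat
```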
Q. Write the code that generates a random 12-nucleotide-long DNA sequence.
sample( c("A","T","G","C"), size=12, replace=TRUE )
 [1] "C" "T" "A" "A" "G" "G" "T" "A" "C" "A" "T" "G"
Q. Write a first version of a function called ‘generate_dna()’ that generates a random DNA sequence of user-specified length ‘n’.
name <- function(arg) {
  body
}
generate_dna <- function(n=3) {
  sample( c("A","T","G","C"), size=n, replace=TRUE )
}
generate_dna(100)
  [1] "A" "T" "G" "A" "A" "C" "A" "T" "G" "T" "A" "T" "G" "T" "G" "A" "A" "C"
 [19] "C" "A" "A" "T" "A" "T" "T" "T" "A" "A" "C" "A" "A" "T" "G" "T" "G" "C"
 [37] "T" "A" "T" "T" "G" "T" "A" "T" "T" "C" "G" "C" "G" "C" "A" "C" "A" "G"
 [55] "A" "C" "T" "C" "A" "G" "T" "G" "C" "G" "G" "T" "T" "A" "C" "T" "C" "C"
 [73] "T" "T" "C" "T" "G" "T" "C" "C" "T" "G" "A" "G" "A" "G" "G" "C" "A" "A"
 [91] "C" "T" "T" "G" "T" "C" "C" "C" "C" "A"
Q. Modify your function to return a FASTA-like sequence, so rather than [1] “C” “G” “C” “A” “A” “A” “C” “T” “A” “C” “C” “T” we want the single string “CGCAAACTACCT”.
generate_dna <- function(n=3) {
  ans <- sample( c("A","T","G","C"), size=n, replace=TRUE )
  ans <- paste(ans, collapse="")
  return(ans)
}
generate_dna(10)
[1] "GAGATGCCTT"
An example
# Example pattern (not using your bases)
x <- c("H","E","L","L","O")
paste(x, collapse = "****")
[1] "H****E****L****L****O"
# with collapse = "" this would instead return "HELLO"
Q. Give the user an option to return either FASTA-format output or the standard multi-element vector format.
generate_dna <- function(n=3, fasta=TRUE) {
  ans <- sample( c("A","T","G","C"), size=n, replace=TRUE )
  if(fasta) {
    ans <- paste(ans, collapse="")
    cat("Hello...")
  } else {
    cat("...is it me you are looking for...")
  }
  return(ans)
}
generate_dna(10)
Hello...
[1] "CAAGGAACCC"
generate_dna(10, fasta=FALSE)
...is it me you are looking for...
[1] "C" "G" "G" "C" "A" "C" "A" "C" "A" "C"
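Strictly speaking, a FASTA record also carries a header line that starts with `>`. As a sketch of how the string output could be wrapped into such a record, the hypothetical helper `to_fasta()` below prefixes a header line. Its name, its `id` argument, and the example identifier are all assumptions made for illustration, and `generate_dna()` is redefined without the `cat()` messages so the block runs on its own:

```r
# Quiet version of generate_dna(), without the cat() messages
generate_dna <- function(n = 3, fasta = TRUE) {
  ans <- sample(c("A", "T", "G", "C"), size = n, replace = TRUE)
  if (fasta) {
    ans <- paste(ans, collapse = "")
  }
  ans
}

# Hypothetical helper: build a two-line FASTA record from an id and a sequence
to_fasta <- function(id, sequence) {
  paste0(">", id, "\n", sequence)
}

rec <- to_fasta("random_dna_1", generate_dna(12))
cat(rec, "\n")   # header line, then the 12-base sequence
```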
A new cool function
Q. Write a function called ‘generate_protein()’ that generates a protein sequence of user-specified length in FASTA-like format.
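generate_protein <- function(n=3, fasta=TRUE) {
  # One-letter codes for the 20 standard amino acids
  aa <- c( "A","R","N","D",
           "C","Q","E","G",
           "H","I","L","K",
           "M","F","P","S",
           "T","W","Y","V")
  ans <- sample(aa, size=n, replace=TRUE)
  # Collapse to a single string only when FASTA-like output is requested;
  # otherwise return the multi-element vector (previously fasta=FALSE failed)
  if(fasta) {
    ans <- paste(ans, collapse="")
  }
  return(ans)
}
generate_protein(10)
[1] "KRICYWLWNE"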
Q. Use your new ‘generate_protein()’ function to generate sequences between 6 and 12 amino acids in length and check if any of these are unique in nature (i.e. not found in the NR database at NCBI).
generate_protein(6)
[1] "GMSRIP"
generate_protein(7)
[1] "WEHHSFL"
generate_protein(8)
[1] "SWELSKAD"
generate_protein(9)
[1] "YMPTIFEAQ"
generate_protein(10)
[1] "VRMNMKWLFF"
generate_protein(11)
[1] "GWLWSNPFPIP"
generate_protein(12)
[1] "CQTVLKPLSLCG"
Or we could do a ‘for()’ loop:
for(i in 6:12) {
  cat(">", i, sep="", "\n")
  cat( generate_protein(i), "\n" )
}
>6
LHYDIS
>7
HDMIHPI
>8
RSIPSICD
>9
IYLCLHKML
>10
HWIRGIHTSH
>11
PKWEQGWWSNR
>12
FELPAWGAIATT
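If the sequences need to be kept for later use (for example to paste into a BLAST search) rather than only printed, one option is to collect them in a named vector with `sapply()`. This is a sketch of an alternative to the loop, not something the exercise requires; `generate_protein()` is redefined here so the block runs on its own, and the variable names `lens` and `seqs` are illustrative:

```r
generate_protein <- function(n = 3, fasta = TRUE) {
  aa <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
          "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
  ans <- sample(aa, size = n, replace = TRUE)
  if (fasta) {
    ans <- paste(ans, collapse = "")
  }
  ans
}

lens <- 6:12
seqs <- sapply(lens, generate_protein)   # one sequence per length
names(seqs) <- lens                      # label each sequence by its length

# Print in the same FASTA-like layout as the loop
cat(paste0(">", names(seqs), "\n", seqs), sep = "\n")
```

Because the results are stored in `seqs`, they can be reused or written to a file without regenerating new random sequences.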