class: center, middle, inverse, title-slide

# Advanced R - Object oriented programming
## Part 1 - Intro & Basics
### Hannes Oberreiter
### Cohort 5

---
class: left, top

## OOP - Intro

Three types to choose from in R:

- **S3** (base)
  - base R, use it if possible! Good for solving simple problems
  - *S4*: a stricter rewrite of S3
  - *RC*: special case of S4, objects are mutable (can be modified in place)
- **R6**
  - models objects that exist independently of R (e.g. a web API)
  - similar to RC, will be discussed in the R6 section
- **S4**
  - planning a big project? Go for S4
  - used by Bioconductor (https://www.bioconductor.org/developers/package-guidelines/)

---
class: left, top

### Two main paradigms

- **encapsulated OOP**
  - methods belong to objects or classes: `object.method(arg1, arg2)`, e.g. R6
- **functional OOP**
  - methods belong to generic functions: `generic(object, arg2, arg3)`

--

*functional OOP* in my opinion should be called *multiple dispatch*:

> A method no longer “belongs” to a class. Methods belong primarily to the generic function for which they are defined.

<small>
How S4 Methods Work, John Chambers, August 30, 2006
https://developer.r-project.org/howMethodsWork.pdf
</small>

---
class: left, top

### Multiple Dispatch Example

```r
# choose the implementation from the signature label
mde <- function(data, signature) {
  switch(signature,
    char = stringr::str_count(data),
    number = sum(data)
  )
}
```

```r
mde("test", "char")
```

```
## [1] 4
```

```r
mde(c(10, 20), "number")
```

```
## [1] 30
```

> The process of finding the correct method given a class is called **method dispatch**.

> *functional OOP*: This is called functional because from the outside it looks like a regular function call, and internally the components are also functions.

---
class: left, top

## OOP - Glossary

- **polymorphism**
  - the main reason to use OOP
  - use the same function form for different types of input

--

```r
dplyr::glimpse(ggplot2::diamonds$cut)
```

```
##  Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
```

```r
base::summary(ggplot2::diamonds$cut)
```

```
##      Fair      Good Very Good   Premium     Ideal
##      1610      4906     12082     13791     21551
```

--

```r
dplyr::glimpse(ggplot2::diamonds$carat)
```

```
##  num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
```

```r
base::summary(ggplot2::diamonds$carat)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  0.2000  0.4000  0.7000  0.7979  1.0400  5.0100
```

---
class: center, middle

> Conceptually, a generic function extends the idea of a function in R by allowing different methods to be selected corresponding to the classes of the objects supplied as arguments in a call to the function.

---
class: left, top

## OOP - Glossary

> OOP lets us extend a generic such as `summary()` with methods for new types (which would not be possible from the outside if it were implemented with `if_else` branches).

--

- **encapsulation**
  - users don't need to worry about the details of an object; they are hidden

--

- **class**
  - describes what an object is
  - defines the **fields** contained in every instance
- **method**
  - describes what an object of that class can do

--

> Classes form a hierarchy: if a method does not exist for a class, the parent's method is used. The child **inherits** its parent's methods and traits. (method dispatch == process to find the correct method given a class)

---
class: center, middle

<div class="figure">
<img src="data:image/png;base64,#./img/inheritance.png" alt="By User:Crashed greek - User:Crashed greek, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=64193508" width="" />
<p class="caption">By User:Crashed greek - User:Crashed greek, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=64193508</p>
</div>

---
class: left, top

## Inheritance Example:

- ordered factor *inherits* from a regular factor
- generalized linear model *inherits* from a linear model.
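---
class: left, top

## Inheritance Example: class vectors

A minimal sketch of the two inheritance examples, using only base R and `stats`. The objects `sizes` and `fit` are made up for illustration, and the model formula is an arbitrary choice on the built-in `mtcars` data.

```r
# An ordered factor carries both classes; the child class comes first.
sizes <- factor(
  c("small", "large", "medium"),
  levels = c("small", "medium", "large"),
  ordered = TRUE
)
class(sizes)              # "ordered" "factor"
inherits(sizes, "factor") # TRUE

# A fitted glm carries the class vector c("glm", "lm").
fit <- glm(mpg ~ wt, data = mtcars)
class(fit)                # "glm" "lm"
inherits(fit, "lm")       # TRUE
```

In S3 the class attribute is an ordered vector: method dispatch tries the first class, then falls back to the classes after it.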
---
class: left, middle

## sloop - sail the seas of OOP

```r
library(sloop)
sloop::otype(1:10)
```

```
## [1] "base"
```

```r
sloop::otype(mtcars)
```

```
## [1] "S3"
```

```r
mle_obj <- stats4::mle(function(x = 1) (x - 2) ^ 2)
sloop::otype(mle_obj)
```

```
## [1] "S4"
```

---
class: center, middle

## Base versus OO objects

---
class: left, middle

## Base versus OO objects

Base objects do not have a class attribute:

```r
attributes(1337L)$class
```

```
## NULL
```

```r
# class() can be misleading with base objects
class(1337L)
```

```
## [1] "integer"
```

```r
# sloop can show the class vector:
# it returns the implicit class that the S3 and S4 systems will use
sloop::s3_class(1337L)
```

```
## [1] "integer" "numeric"
```

> Note: `numeric` in S3 and S4 can mean either integer or double

---
class: left, middle

## Base versus OO objects 2

Only OO objects have a class attribute, but every object has a base type:

```r
typeof(1337L)
```

```
## [1] "integer"
```

```r
typeof(trees)
```

```
## [1] "list"
```

```r
attr(trees, "class")
```

```
## [1] "data.frame"
```

Base types are mostly handled in C code that dispatches with switch statements, so we cannot simply add new base types without modifying the R source code.
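---
class: left, middle

## Base versus OO objects: adding a class

The relationship between base type and class attribute can be sketched with base R alone. The class name `my_ids` is hypothetical and only serves as an illustration; no methods are defined for it.

```r
x <- 1:3
is.object(x)          # FALSE: no class attribute, a plain base object
typeof(x)             # "integer"

# Setting a class attribute turns x into an S3 object ...
class(x) <- "my_ids"  # hypothetical class name
is.object(x)          # TRUE
attr(x, "class")      # "my_ids"

# ... but the underlying base type is unchanged
typeof(x)             # "integer"
```

New classes can be added freely from the outside, while the set of base types stays fixed.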
---
class: left, middle

```r
# INTSXP, Vectors Chapter 3
typeof(1L)
```

```
## [1] "integer"
```

```r
# CLOSXP, Functions Chapter 6
typeof(mean)
```

```
## [1] "closure"
```

```r
# ENVSXP, Environments Chapter 7
typeof(globalenv())
```

```
## [1] "environment"
```

```r
# S4 Chapter 15 (don't inherit from any base type)
mle_obj <- stats4::mle(function(x = 1) (x - 2)^2)
typeof(mle_obj)
```

```
## [1] "S4"
```

---
class: left, top

## Primitive function

```r
getGeneric("+")
```

```
## standardGeneric for "+" defined from package "base"
##   belonging to group(s): Arith
##
## function (e1, e2)
## standardGeneric("+", .Primitive("+"))
## <bytecode: 0x7fc522259800>
## <environment: 0x7fc522251bf0>
## Methods may be defined for arguments: e1, e2
## Use  showMethods("+")  for currently available ones.
```

```r
showMethods("+")
```

```
## Function: + (package base)
## e1="Date", e2="Duration"
## e1="Date", e2="Period"
## e1="difftime", e2="Duration"
## e1="Duration", e2="Date"
## e1="Duration", e2="difftime"
## e1="Duration", e2="Duration"
## e1="Duration", e2="numeric"
## e1="Duration", e2="POSIXct"
## e1="Duration", e2="POSIXlt"
## e1="numeric", e2="Duration"
## e1="numeric", e2="Period"
## e1="Period", e2="Date"
## e1="Period", e2="numeric"
## e1="Period", e2="Period"
## e1="Period", e2="POSIXct"
## e1="Period", e2="POSIXlt"
## e1="POSIXct", e2="Duration"
## e1="POSIXct", e2="Period"
## e1="POSIXlt", e2="Duration"
## e1="POSIXlt", e2="Period"
```
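---
class: left, top

## Primitive function: adding a method

`showMethods("+")` lists whatever S4 methods are registered in the current session, so the output depends on which packages are loaded. As a rough sketch of how such an entry gets there, the code below registers a `+` method for a class called `Money`. The class, its slots and the object `total` are hypothetical and not part of the original slides.

```r
library(methods)

# Hypothetical S4 class used only for this illustration
setClass("Money", slots = c(amount = "numeric", currency = "character"))

# Register a method on the primitive generic "+" for two Money objects
setMethod("+", signature(e1 = "Money", e2 = "Money"), function(e1, e2) {
  stopifnot(identical(e1@currency, e2@currency))
  new("Money", amount = e1@amount + e2@amount, currency = e1@currency)
})

total <- new("Money", amount = 10, currency = "EUR") +
  new("Money", amount = 20, currency = "EUR")
total@amount  # 30

# The new method now shows up next to the existing ones
showMethods("+", classes = "Money")
```

Here dispatch looks at the classes of both arguments `e1` and `e2`, which is why the method is said to belong to the generic rather than to one class.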