Chapter 11 DSSC - Data Wrangling, Presentation and Applications

To check for missing data use,

print(paste("Missing data:", sum(is.na(df$var)), sep=" ", collapse=""))
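
As a small sketch of the same check applied to a whole data frame, `colSums(is.na(...))` counts the missing values in every column at once. The data frame `df` and its columns below are made up purely for illustration.

```r
# Hypothetical data frame with some missing values
df <- data.frame(var = c(1, NA, 3, NA),
                 other = c("a", "b", NA, "d"))

# Missing values in a single column
n_missing <- sum(is.na(df$var))
print(paste("Missing data:", n_missing))

# Missing values in every column at once
missing_by_col <- colSums(is.na(df))
missing_by_col
```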

11.1 Data Wrangling with Tidyverse

Loading tidyverse,

library("tidyverse")

11.1.1 Tidy Form (`tidyr`)

What is tidy data?

each variable is in a column
each observation is in a row
each type of observational unit forms a table

Moving to and from tidy data

Problems (how data may violate tidy form)

Data is too wide - one variable spread over multiple columns (use pivot_longer())
Data is too long - one observation spread along multiple rows (use pivot_wider())

pivot_longer()

Makes Wide Data Longer

The arguments are:

Data Frame
Columns to transform
Name of the column where previous column names should go
Name of the column where values from the column should go

Example

who_wide

##       country  y1999  y2000
## 1 Afghanistan    745   2666
## 2      Brazil  37737  80488
## 3       China 212258 213766

pivot_longer(who_wide,
             c(`y1999`, `y2000`),
             names_to = "year",
             values_to = "cases")

## # A tibble: 6 × 3
##   country     year   cases
##   <chr>       <chr>  <dbl>
## 1 Afghanistan y1999    745
## 2 Afghanistan y2000   2666
## 3 Brazil      y1999  37737
## 4 Brazil      y2000  80488
## 5 China       y1999 212258
## 6 China       y2000 213766
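
In the output above, year is still a character column with a leading "y". The following is a minimal sketch of one way to tidy that up inside pivot_longer() itself. It assumes tidyr 1.1.0 or later (for names_transform) and rebuilds who_wide by hand so that it runs on its own.

```r
library(tidyr)

# Rebuild the wide table shown above
who_wide <- data.frame(country = c("Afghanistan", "Brazil", "China"),
                       y1999 = c(745, 37737, 212258),
                       y2000 = c(2666, 80488, 213766))

who_tidy <- pivot_longer(who_wide,
                         c(y1999, y2000),
                         names_to = "year",
                         names_prefix = "y",                        # drop the leading "y"
                         names_transform = list(year = as.integer), # "1999" becomes 1999L
                         values_to = "cases")
who_tidy
```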

pivot_wider()

Makes Long Data Wider

The arguments are:

Data Frame
(Optional) Columns that identify each observation (id_cols); by default, all columns not used below
Name of the column where column names should come from
Name of the column where values should come from

Example

who_long

##        country year       type      count
## 1  Afghanistan 1999      cases        745
## 2  Afghanistan 1999 population   19987071
## 3  Afghanistan 2000      cases       2666
## 4  Afghanistan 2000 population   20595360
## 5       Brazil 1999      cases      37737
## 6       Brazil 1999 population  172006362
## 7       Brazil 2000      cases      80488
## 8       Brazil 2000 population  174504898
## 9        China 1999      cases     212258
## 10       China 1999 population 1272915272
## 11       China 2000      cases     213766
## 12       China 2000 population 1280428583

pivot_wider(who_long,
            names_from = "type",
            values_from = "count")

## # A tibble: 6 × 4
##   country      year  cases population
##   <chr>       <dbl>  <dbl>      <dbl>
## 1 Afghanistan  1999    745   19987071
## 2 Afghanistan  2000   2666   20595360
## 3 Brazil       1999  37737  172006362
## 4 Brazil       2000  80488  174504898
## 5 China        1999 212258 1272915272
## 6 China        2000 213766 1280428583

Additional Example - DSSC Lab 5.6

pres.res

##   Candidate       California       Arkansas
## 1   Clinton 8753788/14181595 380494/1130676
## 2     Trump 4483810/14181595 684872/1130676
## 3     Other  943997/14181595  65310/1130676

pres.res2 <- pivot_longer(pres.res,
                          c("California", "Arkansas"),
                          names_to = "State",
                          values_to = "Proportion")
pres.res2

## # A tibble: 6 × 3
##   Candidate State      Proportion      
##   <chr>     <chr>      <chr>           
## 1 Clinton   California 8753788/14181595
## 2 Clinton   Arkansas   380494/1130676  
## 3 Trump     California 4483810/14181595
## 4 Trump     Arkansas   684872/1130676  
## 5 Other     California 943997/14181595 
## 6 Other     Arkansas   65310/1130676

pres.res3 <- separate(pres.res2, "Proportion", c("Votes", "Total"))
pres.res3

## # A tibble: 6 × 4
##   Candidate State      Votes   Total   
##   <chr>     <chr>      <chr>   <chr>   
## 1 Clinton   California 8753788 14181595
## 2 Clinton   Arkansas   380494  1130676 
## 3 Trump     California 4483810 14181595
## 4 Trump     Arkansas   684872  1130676 
## 5 Other     California 943997  14181595
## 6 Other     Arkansas   65310   1130676

pres.res4 <- mutate(pres.res3, Votes = as.numeric(Votes), Total = as.numeric(Total))
str(pres.res4)

## tibble [6 × 4] (S3: tbl_df/tbl/data.frame)
##  $ Candidate: chr [1:6] "Clinton" "Clinton" "Trump" "Trump" ...
##  $ State    : chr [1:6] "California" "Arkansas" "California" "Arkansas" ...
##  $ Votes    : num [1:6] 8753788 380494 4483810 684872 943997 ...
##  $ Total    : num [1:6] 14181595 1130676 14181595 1130676 14181595 ...

pres.res5 <- pres.res4 |> 
  group_by(Candidate) |> 
  summarise(Percent = sum(Votes)/sum(Total)*100) |> 
  arrange(desc(Percent))
pres.res5

## # A tibble: 3 × 2
##   Candidate Percent
##   <chr>       <dbl>
## 1 Clinton     59.7 
## 2 Trump       33.8 
## 3 Other        6.59

Other useful `tidyr` functions

separate() - splits one column of strings into multiple new columns
unite() - combines many columns into one (as a string)
extract() - uses regular expressions to pull out specific information from a string column

Example

fball

##        home     away score
## 1     Man U Shef Wed   2-1
## 2 Tottenham  Arsenal   0-0
## 3   Chelsea    W Ham   1-0

separate(fball, "score", c("home_goals", "away_goals"))

##        home     away home_goals away_goals
## 1     Man U Shef Wed          2          1
## 2 Tottenham  Arsenal          0          0
## 3   Chelsea    W Ham          1          0
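
The other two functions in the list can be sketched on the same data. This is an illustration only: fball is rebuilt by hand, and the new column name "fixture" and the regular expression are choices made for this example.

```r
library(tidyr)

fball <- data.frame(home = c("Man U", "Tottenham", "Chelsea"),
                    away = c("Shef Wed", "Arsenal", "W Ham"),
                    score = c("2-1", "0-0", "1-0"))

# unite(): paste home and away into a single "fixture" column
fixtures <- unite(fball, "fixture", home, away, sep = " v ")
fixtures

# extract(): one capture group per new column;
# convert = TRUE turns the strings "2" and "1" into integers
goals <- extract(fball, score, c("home_goals", "away_goals"),
                 regex = "([0-9]+)-([0-9]+)", convert = TRUE)
goals
```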

11.1.2 Data Manipulation (`dplyr`)

Main `dplyr` functions

(First argument is always the data frame)

filter() - Focus on a subset of rows

Other Arguments

condition to filter by

For example, filter(who, year == 1999)

(see above list of logical operators)

arrange() - Reorder the rows

Other Arguments

Variable names to sort by, sub-sorting by later variables
Wrap variable name in desc() to sort descending (ascending by default)

For example, arrange(who, year, desc(country))

select() - Focus on a subset of variables (columns)

Other Arguments

Name of variables to retain

For example, select(who, year, cases)

mutate() - Create new derived variables

Other Arguments

Name of new variable and equation defining it

For example, mutate(who, rate = cases/population)

group_by() - Splits a data frame up into groups according to one or more variables

Other Arguments

Name of variable to group by

For example, group_by(who, country)

summarise() - Create summary statistics (collapsing many rows) by groupings

Other Arguments

Function to summarise by

For example, summarise(who, total = sum(cases))

Note: often want to summarise by group

For example,

who2 <- group_by(who, country)
summarise(who2, total = sum(cases), change = max(cases)-min(cases))
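
The one-line examples above can be run end to end on a small table. The sketch below assumes who is the tidy table produced by pivot_wider() earlier in the chapter, rebuilt here by hand so the block is self-contained.

```r
library(dplyr)

who <- data.frame(
  country = rep(c("Afghanistan", "Brazil", "China"), each = 2),
  year = rep(c(1999, 2000), times = 3),
  cases = c(745, 2666, 37737, 80488, 212258, 213766),
  population = c(19987071, 20595360, 172006362, 174504898,
                 1272915272, 1280428583)
)

only_1999 <- filter(who, year == 1999)             # rows for 1999 only
sorted <- arrange(who, year, desc(country))        # by year, then country Z to A
with_rate <- mutate(who, rate = cases / population) # new derived column

# One summary row per country
totals <- summarise(group_by(who, country),
                    total = sum(cases),
                    change = max(cases) - min(cases))
totals
```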

11.1.3 Pipelines

Chain functions (not limited to tidyverse functions) so that the result of the first function becomes the first argument of the second function, and so on.

Example,

filter(x, ...) |> 
  select(...) |> 
  mutate(...) |> 
  group_by(...) |> 
  arrange(...)

Pipeline operator shortcut in RStudio: CMD-SHIFT-M (CTRL-SHIFT-M on Windows/Linux)
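
To make the "result becomes the first argument" idea concrete, here is a small sketch using only base R functions. It assumes R 4.1 or later, where the native pipe |> is available; the vector x is made up.

```r
x <- c(1, 4, 9, 16)

# Nested calls read inside-out
nested <- round(mean(sqrt(x)), 1)

# The same computation as a pipeline reads left to right
piped <- x |>
  sqrt() |>
  mean() |>
  round(1)

identical(nested, piped)
```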

11.1.4 Joining Data Frames in Tidyverse

Simplest case of joining data frames (more details in data frames section):

rbind() - paste rows together (above/below)
cbind() - paste cols together (left/right)

These methods can be very error prone (they require variables/observations to be in identical order, etc.)
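
A minimal sketch of both functions, using made-up data frames a, b and extra, shows why the ordering matters:

```r
a <- data.frame(id = 1:2, score = c(10, 20))
b <- data.frame(id = 3:4, score = c(30, 40))

# rbind(): stack rows; the column names must match
stacked <- rbind(a, b)

# cbind(): put columns side by side; rows are matched purely by position
extra <- data.frame(grade = c("B", "A"))
side_by_side <- cbind(a, extra)

# If 'extra' were sorted differently, cbind() would silently pair the
# wrong grade with each id, which is why joining on a key is safer
dim(stacked)
dim(side_by_side)
```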

Advanced Data Frame Joins

left_join(x, y) - add new variables from y to x, keeping all x obs
right_join(x, y) - add new variables from x to y, keeping all y obs
inner_join(x, y) - keep only matching rows
full_join(x, y) - keep all rows in both x and y

Example

band_members

## # A tibble: 3 × 2
##   name  band   
##   <chr> <chr>  
## 1 Mick  Stones 
## 2 John  Beatles
## 3 Paul  Beatles

band_instruments2

## # A tibble: 3 × 2
##   artist plays 
##   <chr>  <chr> 
## 1 John   guitar
## 2 Paul   bass  
## 3 Keith  guitar

left_join(band_members, band_instruments2, by = c("name" = "artist"))

## # A tibble: 3 × 3
##   name  band    plays 
##   <chr> <chr>   <chr> 
## 1 Mick  Stones  <NA>  
## 2 John  Beatles guitar
## 3 Paul  Beatles bass
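
For comparison, the other three joins can be tried on the same two tables. This sketch relies on band_members and band_instruments2 being example tibbles that ship with dplyr; the object names key, inner, full and right are just for illustration.

```r
library(dplyr)

key <- c("name" = "artist")

# Only John and Paul appear in both tables
inner <- inner_join(band_members, band_instruments2, by = key)

# Everyone from either table: Mick, John, Paul and Keith
full <- full_join(band_members, band_instruments2, by = key)

# Everyone from band_instruments2; Keith has no band, so it is NA
right <- right_join(band_members, band_instruments2, by = key)

inner
full
right
```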

11.2 Dynamic Documents and Interactive Dashboards

11.2.1 RMD

Document Preamble

---
title: "Example"
author: "(optional) Jamie Reason"
date: "(optional)"
output:
  html_document: default
  pdf_document: default
---

Course Slides on RMD

For further formatting, refer to RMD Cheat Sheet

11.2.2 Shiny

Resource: Mastering Shiny Book

Outline

UI
Server

R code can be added to any part of a Shiny script, but only the (reactive) code inside the server function is re-run when needed.

Starting a Shiny Dashboard (create a new Shiny app in RStudio):

fluidPage() is just the most common page function, but there are alternatives

library(shiny)

#misc code

ui <- fluidPage(
  ...
)

server <- function(input, output, session){
  #server code
}

shinyApp(ui, server)

11.2.2.1 UI

UI elements reference guide

11.2.2.1.1 Pages

Examples

ui <- fluidPage(
  "One",
  "Two",
  "Three"
)
shinyApp(ui, server = function(input, output, session) {})

ui <- navbarPage(
  "Title of page",
  tabPanel("My first tab", "Hello Alice"),
  tabPanel("My second tab", "Hello Bob")
)
shinyApp(ui, server = function(input, output, session) {})

Other pages: fixedPage(), fillPage(), …

11.2.2.1.2 Layouts and Panels

Goes inside of the page

titlePanel("My App")
sidebarLayout()
- first argument sidebarPanel()
- second argument mainPanel()
fluidRow() - creates a new row made up of columns
- contains column() calls
- first argument of column() is a width from 1 to 12 (the widths in a row should sum to at most 12)
- other arguments are the contents (e.g. outputs)

Examples

ui <- fluidPage(
  titlePanel("My App"),
  sidebarLayout(
    sidebarPanel("I'm in sidebar"),
    mainPanel("I'm in main panel")
  )
)
shinyApp(ui, server = function(input, output, session) {})

ui <- fluidPage(
  fluidRow(
    column(4, "Lorem ipsum dolor ..."),
    column(8, "Lorem ipsum dolor ...")
  ),
  fluidRow(
    column(6, "Lorem ipsum dolor ..."),
    column(6, "Lorem ipsum dolor ...")
  )
)
shinyApp(ui, server = function(input, output, session) {})

11.2.2.1.3 UI Inputs

All inputs take the same first argument, inputId, the unique identifier of the input.

Its value can be accessed in the server as input$name (for an input whose inputId is "name").

The second argument is a label, i.e. how its name appears on the dashboard.

Text Inputs

textInput()
passwordInput()
textAreaInput()

Numeric Inputs

numericInput()
sliderInput()

Categoric Inputs

selectInput()
radioButtons()
checkboxGroupInput()

Examples

ui <- fluidPage(
  numericInput("num", "Number one", value = 0, min = 0, max = 100),
  sliderInput("num2", "Number two", value = 50, min = 0, max = 100),
  sliderInput("rng", "Range", value = c(10, 20), min = 0, max = 100)
)
shinyApp(ui, server = function(input, output, session) {})

animals <- c("dog", "cat", "mouse", "bird", "other", "I hate animals")
ui <- fluidPage(
  selectInput("state", "What's your favourite state?", state.name),
  radioButtons("animal", "What's your favourite animal?", animals),
  checkboxGroupInput("animal2", "What animals do you like?", animals)
)
shinyApp(ui, server = function(input, output, session) {})

11.2.2.2 Server and UI Outputs

All outputs take the same first argument, outputId, and an output is filled in from the server by assigning to output$name (where "name" is the outputId).

11.2.2.2.1 UI Outputs

Text Outputs

textOutput() paired with renderText()
verbatimTextOutput() paired with renderPrint()

Plot Outputs

plotOutput() and renderPlot()
- width argument
- res = 96 argument (in renderPlot()) gives the result closest to what you see in RStudio

Examples

ui <- fluidPage(
  textInput("name", "What's your name?"),
  textOutput("greet")
)
server <- function(input, output, session) {
  output$greet <- renderText({
    if(nchar(input$name) > 0) {
      return(paste0("Hello ", input$name))
    } else {
      return("Hello friend, tell me your name!")
    }
  })
}
shinyApp(ui, server)

ui <- fluidPage(
  plotOutput("myplot", width = "400px")
)
server <- function(input, output, session) {
  output$myplot <- renderPlot({
    plot(iris$Sepal.Length, iris$Sepal.Width)
  }, res = 96)
}
shinyApp(ui, server)

11.2.2.2.2 Variables outside outputs (reactive)

Ordinary variables created in the server are not reactive, so they would not update when an input changes. Instead, wrap the calculation in a reactive({}) call:

Inside the server,

name <- ...

becomes,

name <- reactive({
  ...
})

And when name is used it should be called as name()

Examples

server <- function(input, output, session) {
  name <- reactive({
    toupper(input$name)
  })
  output$greet <- renderText({
    if(nchar(input$name) > 0) {
      return(paste0("Hello ", name(), ", here is your plot ..."))
    } else {
      return("Hello friend, tell me your name!")
    }
  })
  output$myplot <- renderPlot({
    if(nchar(input$name) > 0) {
      # assumes the UI defines inputs "xvar" and "yvar" holding column names of iris
      ggplot(iris, aes(x = .data[[input$xvar]], y = .data[[input$yvar]])) +
        geom_point() +
        labs(title = paste0(name(), "'s plot!"))
    }
  }, res = 96)
}
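
The server above reads input$name, input$xvar and input$yvar and writes to output$greet and output$myplot, but no matching UI is shown. The following is one possible UI, offered as a sketch: the labels, the choice of selectInput() and the default selections are assumptions made for this illustration.

```r
library(shiny)

# The four numeric columns of iris are sensible choices for a scatter plot
num_vars <- names(iris)[1:4]

ui <- fluidPage(
  textInput("name", "What's your name?"),
  selectInput("xvar", "x variable", num_vars, selected = "Sepal.Length"),
  selectInput("yvar", "y variable", num_vars, selected = "Sepal.Width"),
  textOutput("greet"),
  plotOutput("myplot", width = "400px")
)

# With the server above defined and library(ggplot2) loaded, launch with:
# shinyApp(ui, server)
```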

11.2.2.3 Full Example

From exercise 5.78 (Lab 8)

library("shiny")
library("ukpolice")
library("tidyverse")
library("leaflet")

nbd <- ukc_neighbourhoods("durham")
nbd2 <- nbd$id
names(nbd2) <- nbd$name

# Define UI for application
ui <- fluidPage(
  titlePanel("UK Police Data"),
  sidebarLayout(
    sidebarPanel(
      selectInput("nbd", "Choose Durham Constabulary Neighborhood", nbd2),
      textInput("date", "Enter the desired year and month in the format YYYY-MM", value = "2021-09")
    ),

    # Show the bar chart and the map
    mainPanel(
      plotOutput("barchart"),
      leafletOutput("map")
    )
  )
)

# Define server logic
server <- function(input, output) {
  # Get boundaries for selected neighbourhood
  # Wrapped in a reactive because we need this to trigger a
  # change when the input neighborhood changes
  bdy <- reactive({
    bdy <- ukc_neighbourhood_boundary("durham", input$nbd)
    bdy |>
      mutate(latitude = as.numeric(latitude),
             longitude = as.numeric(longitude))
  })

  # Get crimes for selected neighbourhood
  # Also wrapped in a reactive because we need this to trigger a
  # change when the boundary above, or date, changes
  crimes <- reactive({
    bdy2 <- bdy() |>
      select(lat = latitude,
             lng = longitude)

    ukc_crime_poly(bdy2[round(seq(1, nrow(bdy2), length.out = 100)), ], input$date)
  })

  # First do plot
  output$barchart <- renderPlot({
    ggplot(crimes()) +
      geom_bar(aes(y = category, fill = outcome_status_category)) +
      labs(y = "Crime", fill = "Outcome Status")
  }, res = 96)

  # Then do map
  output$map <- renderLeaflet({
    leaflet() |>
      addTiles() |>
      addPolygons(lng = bdy()$longitude, lat = bdy()$latitude) |>
      addCircles(lng = as.numeric(crimes()$longitude), lat = as.numeric(crimes()$latitude), label = crimes()$category, color = "red")
  })
}

# Run the application
shinyApp(ui = ui, server = server)

11.3 Dates

(see DSSC Lab 9)

Use the lubridate package

library("lubridate")

## Warning: package 'lubridate' was built under R version 4.1.2

## Loading required package: timechange

## Warning: package 'timechange' was built under R version 4.1.2

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

lubridate.tidyverse.org

Creating Dates

Current date and time

today()

## [1] "2023-01-15"

now()

## [1] "2023-01-15 16:34:16 GMT"

str(today()) #these are dates not strings

##  Date[1:1], format: "2023-01-15"

Constructing dates from strings and numbers

ymd("2021-12-02")

## [1] "2021-12-02"

mdy("December 2nd, 2021")

## [1] "2021-12-02"

ymd(20211202)

## [1] "2021-12-02"

ymd_hms("2021-12-02 12:33:59")

## [1] "2021-12-02 12:33:59 UTC"

Constructing dates and times from individual components

make_date(2021, 12, 2)

## [1] "2021-12-02"

make_date("2021", "12", "2")

## [1] "2021-12-02"

make_datetime(2021, 12, 2, 12)

## [1] "2021-12-02 12:00:00 UTC"

make_datetime(2021, 12, 2, 12, 33, 59)

## [1] "2021-12-02 12:33:59 UTC"

11.3.1 Time Zones

Date-time creation functions take a tz argument, e.g. tz = "America/New_York".

now(tz = "America/New_York")

## [1] "2023-01-15 11:34:16 EST"

To see all available zones call OlsonNames()

Changing Time Zone

#forces change of time zone without changing date/time
x <- ymd_hm("2019-12-02 15:10")
force_tz(x, "America/New_York")

## [1] "2019-12-02 15:10:00 EST"

#converts date/time to a new time zone
with_tz(x, "America/New_York")

## [1] "2019-12-02 10:10:00 EST"
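
The difference between the two functions is easiest to see by comparing each result with the original value. A small sketch (the names forced and converted are just for this illustration):

```r
library(lubridate)

x <- ymd_hm("2019-12-02 15:10")               # parsed as UTC by default

forced    <- force_tz(x, "America/New_York")  # same clock time, different instant
converted <- with_tz(x, "America/New_York")   # same instant, different clock time

converted == x                                # still the same moment in time
difftime(forced, x, units = "hours")          # the forced value is a later instant
```

In December New York is five hours behind UTC, so the forced value lies five hours after the original instant, while the converted value is the same instant shown on a different clock.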

11.3.2 Extracting From Dates

datetime <- today()
year(datetime)

## [1] 2023

yday(datetime)

## [1] 15

wday(datetime, week_start = 1) #by default, Sunday is the first day of the week; week_start = 1 makes it Monday

## [1] 7

month(datetime, label = TRUE)

## [1] Jan
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec

Rounding Dates/Times

floor_date(datetime, unit = "minute")

## [1] "2023-01-15 UTC"

ceiling_date(datetime, unit = "week")

## [1] "2023-01-22"

ceiling_date(datetime, unit = "quarter")

## [1] "2023-04-01"

floor_date(datetime, unit = "week", week_start = 1)

## [1] "2023-01-09"

11.3.3 Misc

Updating Dates/Times

datetime <- ymd_hms("2021-12-02 12:33:59")
datetime <- update(datetime, hour = 11, second = 33)
datetime

## [1] "2021-12-02 11:33:33 UTC"

#Alternatively,
datetime <- ymd_hms("2021-12-02 12:33:59")
hour(datetime) <- 11
second(datetime) <- 33
datetime

## [1] "2021-12-02 11:33:33 UTC"

Durations

Can do arithmetic with dates and times

einstein <- dmy("14th March 1879")
age <- today() - months(42) - einstein #age 42 months ago
age

## Time difference of 51257 days

Get a duration after arithmetic using as.duration()

as.duration(age)

## [1] "4428604800s (~140.33 years)"
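
When the quantity of interest is a length of time in calendar units, one option is to build an interval and measure it with time_length(). This is a sketch rather than the method from the lab; the fixed reference date is an assumption chosen so that the result does not depend on the day the code is run.

```r
library(lubridate)

einstein  <- dmy("14th March 1879")
reference <- ymd("2023-01-15")   # fixed date so the result is reproducible

span <- interval(einstein, reference)

# Length of the interval in years
age_years <- time_length(span, "years")
floor(age_years)
```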

11.4 Strings and Regular Expressions

11.4.1 Strange characters

When you want a string with strange characters, enclose it in r"(...)" instead of just "...".

z <- r"(As Roosevelt said,
"Believe you can and you're halfway there."
)"
cat(z)

## As Roosevelt said,
## "Believe you can and you're halfway there."

cat() prints the contents of the string itself, without the quotes and escape sequences that print() would show.
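
Backslashes are another common case. The sketch below compares an escaped string with its raw-string equivalent; it assumes R 4.0.0 or later (where raw strings were introduced), and the file path is hypothetical.

```r
# In a normal string every backslash must be doubled
escaped <- "C:\\Users\\me\\data.csv"

# The raw-string form needs no escaping
raw <- r"(C:\Users\me\data.csv)"

identical(escaped, raw)
cat(raw, "\n")
```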

11.4.2 `stringr` (part of tidyverse)

Most stringr functions begin with str_ so can use autocomplete for many string operations.

Basics

String Length

str_length(c("Data Science and Statistical Computing", "by", "Dr Louis Aslett"))

## [1] 38  2 15

Combining Strings

str_c("Data Science and Statistical Computing", "by", "Dr Louis Aslett")

## [1] "Data Science and Statistical ComputingbyDr Louis Aslett"

str_c("Data Science and Statistical Computing", "by", "Dr Louis Aslett", sep = " ")

## [1] "Data Science and Statistical Computing by Dr Louis Aslett"

str_c(c("Data Science and Statistical Computing", "by", "Dr Louis Aslett"))

## [1] "Data Science and Statistical Computing"
## [2] "by"                                    
## [3] "Dr Louis Aslett"

str_c(c("Data Science and Statistical Computing", "by", "Dr Louis Aslett"), collapse = " ")

## [1] "Data Science and Statistical Computing by Dr Louis Aslett"

Subsetting Strings

z <- c("Alice", "Bob", "Connie", "David")
str_sub(z, 1, 4)

## [1] "Alic" "Bob"  "Conn" "Davi"

str_sub(z, 1, 2) <- "Zo"
z

## [1] "Zoice"  "Zob"    "Zonnie" "Zovid"

Trimming

str_trim("  String with  trailing,   middle, and    leading   white space\n\n")

## [1] "String with  trailing,   middle, and    leading   white space"

str_squish("  String with  trailing,   middle, and    leading   white space\n\n")

## [1] "String with trailing, middle, and leading white space"

11.4.2.1 Regexes

See all details in docs or lecture slides

Regexes (regular expressions) are used for finding patterns in strings

`str_view()`

Identify a pattern in a string:

Exact matching

str_view("string to find pattern in", "pattern")

## [1] │ string to find <pattern> in

Wildcard matching

x <- c("apple", "banana", "pear")
str_view(x, ".a.")

## [2] │ <ban>ana
## [3] │ p<ear>

How to match a literal .? Escape it with a backslash, and escape that backslash in the R string: str_view(c(".bc", "a.c", "be."), "a\\.c")
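
A short sketch with str_detect() shows what the escaping changes. The extra string "abc" is added here for illustration, and fixed() is an alternative stringr helper that treats the whole pattern as plain text.

```r
library(stringr)

s <- c(".bc", "a.c", "be.", "abc")

escaped   <- str_detect(s, "a\\.c")     # only the literal text "a.c" matches
unescaped <- str_detect(s, "a.c")       # "a", then any character, then "c"
literal   <- str_detect(s, fixed("."))  # any string containing a literal dot

escaped
unescaped
literal
```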

Anchoring

To start:

str_view(x, "^a")

## [1] │ <a>pple

To end:

str_view(x, "a$")

## [2] │ banan<a>

can also anchor to both.
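
As a sketch of anchoring at both ends, using str_detect() on the same vector (the patterns are chosen for illustration; .* means "any characters, zero or more times"):

```r
library(stringr)

x <- c("apple", "banana", "pear")

# ^ and $ together force the pattern to cover the entire string
whole_word <- str_detect(x, "^pear$")

# Starts with "a" and ends with "e", with anything in between
a_to_e <- str_detect(x, "^a.*e$")

whole_word
a_to_e
```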

Matching Set of Characters I

Match any single character from the set (every match is highlighted):

str_view(x, "[pan]")

## [1] │ <a><p><p>le
## [2] │ b<a><n><a><n><a>
## [3] │ <p>e<a>r

Find one or more instances consecutively:

str_view(x, "[pan]+")

## [1] │ <app>le
## [2] │ b<anana>
## [3] │ <p>e<a>r

Find exact number of instances occurring consecutively:

str_view(x, "[pan]{2}")

## [1] │ <ap>ple
## [2] │ b<an><an>a

Find a range of instances occurring consecutively:

str_view(x, "[pan]{1,3}")

## [1] │ <app>le
## [2] │ b<ana><na>
## [3] │ <p>e<a>r

Matching Set of Characters II

y <- c("There were 122 in total", "Overall about 390 found", "100 but no more")
str_view(y, "[0-9]+")

## [1] │ There were <122> in total
## [2] │ Overall about <390> found
## [3] │ <100> but no more

str_view(y, "[^A-Za-z ]+") #^ inside the brackets negates the set

## [1] │ There were <122> in total
## [2] │ Overall about <390> found
## [3] │ <100> but no more

str_view(y, "^[0-9]+") #^ anchor on outside

## [3] │ <100> but no more

str_view(y, "[a-z ]+")

## [1] │ T<here were >122< in total>
## [2] │ O<verall about >390< found>
## [3] │ 100< but no more>
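
str_view() only displays matches. To use them in a calculation, the same pattern can be passed to str_extract(), as in this small sketch (the name counts is made up):

```r
library(stringr)

y <- c("There were 122 in total", "Overall about 390 found", "100 but no more")

# Pull out the first run of digits from each string and convert to numbers
counts <- as.numeric(str_extract(y, "[0-9]+"))
counts
```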

11.5 Probability Distributions

Letter	Function	Use
“d”	`dnorm()`	evaluates pdf $f(x)$
“p”	`pnorm()`	evaluates cdf $F(x)$
“q”	`qnorm()`	evaluates inverse cdf $F^{-1}(q)$ i.e. the $x$ such that $P(X \leq x) = q$
“r”	`rnorm()`	generates random numbers

Parameters will vary, e.g.

Normal distribution: dnorm, pnorm, qnorm, rnorm. Parameters: mean ($\mu$) and sd ($\sigma$).
t distribution: dt, pt, qt, rt. Parameter: df
$\chi^2$ distribution: dchisq, pchisq, qchisq, rchisq. Parameter: df
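
A brief sketch of the four prefixes in use. The particular numbers (1.96, 3.84, mean 10, sd 2, df 5) and the seed are arbitrary choices for illustration.

```r
# Standard normal: "p" and "q" are inverses of each other
p <- pnorm(1.96)   # P(X <= 1.96), roughly 0.975
q <- qnorm(p)      # recovers 1.96

# Non-default parameters are passed by name
d <- dnorm(10, mean = 10, sd = 2)   # density of N(10, 2^2) at its mean
t.crit <- qt(0.975, df = 5)         # upper 2.5% point of t with 5 df
chi.p <- pchisq(3.84, df = 1)       # roughly 0.95

set.seed(1)                         # make the random draws reproducible
draws <- rnorm(1000, mean = 10, sd = 2)

c(p = p, q = q, d = d, t.crit = t.crit, chi.p = chi.p)
```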

11.5.1 DSSC Theory Applications

11.5.1.1 Monte Carlo Hypothesis Test

Example 2.1

# Specify test statistic and null value
x.bar <- 8.6
n <- 6
mu0 <- 9.2

# Simulate lots of data assuming the null is true
t <- rep(0, 50000)

for(j in 1:50000) {
  z <- rnorm(n, mu0, sqrt(0.4)) #random sample (of n=6) generated under H0
  t[j] <- abs(mean(z)-mu0) #difference in mean of random sample and mean under H0 assumption
}

# Calculate empirical p-value
sum(t > abs(x.bar-mu0)) / 50000 #proportion of random samples whose mean was further from mu0 than the observed mean
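
Because the data in this example are simulated from a normal distribution with known variance, the Monte Carlo answer can be checked against an exact two-sided p-value. The sketch below uses replicate() in place of the loop; the seed and the names sigma, N, t.sim, p.mc and p.exact are introduced here for illustration.

```r
set.seed(2023)       # arbitrary seed, only for reproducibility

x.bar <- 8.6
n <- 6
mu0 <- 9.2
sigma <- sqrt(0.4)   # standard deviation used in the example above
N <- 50000

# One simulated test statistic per replicate, generated under H0
t.sim <- replicate(N, abs(mean(rnorm(n, mu0, sigma)) - mu0))
p.mc <- mean(t.sim >= abs(x.bar - mu0))

# Exact p-value: under H0 the sample mean is N(mu0, sigma^2 / n)
p.exact <- 2 * pnorm(-abs(x.bar - mu0) / (sigma / sqrt(n)))

c(monte.carlo = p.mc, exact = p.exact)
```

The two values should agree to roughly two decimal places; the Monte Carlo approach is mainly useful when no such exact formula is available.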

11.5.1.2 Bootstrap

Set-up

A sample of $n$ independent observations
There is a statistic $S( \cdot )$ we wish to estimate
We also want the standard error of this estimate

General Method:

Draw $B$ new samples of size $n$ with replacement from $\mathbf{x} = (x_1, \ldots , x_n)$
Call these samples $\mathbf{x}^{\star 1}, \ldots , \mathbf{x}^{\star B}$
Calculate the estimate, $\bar{S}^{\star}=\frac{1}{B} \sum_{b=1}^{B} S\left(\mathbf{x}^{\star b}\right)$
Calculate the variance, $\widehat{\operatorname{Var}}(S(\mathbf{x}))=\frac{1}{B-1} \sum_{b=1}^{B}\left(S\left(\mathbf{x}^{\star b}\right)-\bar{S}^{\star}\right)^{2}$ (the standard error is its square root)

Example 3.1 (Also see 3.5)

# Mouse data
x <- c(94,197,16,38,99,141,23)

# Number of bootstraps
B <- 1000

# Statistic
S <- mean

# Perform bootstrap
S.star <- rep(0, B)
for(b in 1:B) {
  x.star <- sample(x, replace = TRUE)
  S.star[b] <- S(x.star)
}

# Bootstrap estimate
mean(S.star)

# Standard error of estimate
sd(S.star)
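
To connect the code back to the formulas in the general method, the sketch below computes the estimate and variance explicitly and checks them against mean() and sd(). It uses replicate() instead of the loop; the seed and the names S.bar and var.hat are introduced for this illustration.

```r
set.seed(1)                            # arbitrary seed for reproducibility

x <- c(94, 197, 16, 38, 99, 141, 23)   # mouse data from the example
B <- 1000

# One bootstrap statistic per resample of size n, drawn with replacement
S.star <- replicate(B, mean(sample(x, replace = TRUE)))

S.bar <- sum(S.star) / B                        # bootstrap estimate
var.hat <- sum((S.star - S.bar)^2) / (B - 1)    # bootstrap variance

# The explicit formulas agree with the built-in functions
all.equal(S.bar, mean(S.star))
all.equal(sqrt(var.hat), sd(S.star))
```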

Empirical CDF - ecdf(x)
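
A minimal sketch of ecdf() on the mouse data from the bootstrap example. Note that ecdf() returns a function, here given the made-up name Fn.

```r
x <- c(94, 197, 16, 38, 99, 141, 23)

Fn <- ecdf(x)    # the empirical CDF, as a function

Fn(99)           # proportion of observations <= 99 (5 of the 7 values)
Fn(c(0, 200))    # 0 below the smallest value, 1 above the largest
```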
