Chapter 11 DSSC - Data Wrangling, Presentation and Applications

To check for missing data use,

print(paste("Missing data:", sum(is.na(df$var)), sep=" ", collapse=""))
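To count missing values in every column at once, one option is colSums(is.na(df)). A minimal sketch, assuming a small made-up data frame df (the column names and values are illustrative only):

```r
# Hypothetical data frame with some missing values (illustration only)
df <- data.frame(var = c(1, NA, 3, NA), other = c("a", "b", NA, "d"))

# Missing values in a single column
sum(is.na(df$var))

# Missing values in every column at once
colSums(is.na(df))

# Keep only the rows with no missing values in any column
df[complete.cases(df), ]
```

complete.cases() is handy when a later step cannot cope with NA at all.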

11.1 Data Wrangling with Tidyverse

Loading tidyverse,

library("tidyverse")

11.1.1 Tidy Form (tidyr)

What is tidy data?

  • each variable is in a column
  • each observation is in a row
  • each type of observational unit forms a table

Moving to and from tidy data

Problems (how data may violate tidy form)

  • Data is too wide - one variable spread over multiple columns (use pivot_longer())
  • Data is too long - one observation spread along multiple rows (use pivot_wider())

pivot_longer()

Makes Wide Data Longer

The arguments are:

  • Data Frame
  • Columns to transform (cols)
  • Name of the new column where the previous column names should go (names_to)
  • Name of the new column where the values from those columns should go (values_to)

Example

who_wide
##       country  y1999  y2000
## 1 Afghanistan    745   2666
## 2      Brazil  37737  80488
## 3       China 212258 213766
pivot_longer(who_wide,
             c(`y1999`, `y2000`),
             names_to = "year",
             values_to = "cases")
## # A tibble: 6 × 3
##   country     year   cases
##   <chr>       <chr>  <dbl>
## 1 Afghanistan y1999    745
## 2 Afghanistan y2000   2666
## 3 Brazil      y1999  37737
## 4 Brazil      y2000  80488
## 5 China       y1999 212258
## 6 China       y2000 213766
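In the output above, year is still a string such as "y1999". A possible refinement, sketched here under the assumption that a recent tidyr (1.1.0 or later) is installed, is to strip the prefix and convert the type inside pivot_longer() itself. The who_wide data frame is rebuilt from the printed values so the sketch is self-contained, and who_tidy is just an illustrative name:

```r
library(tidyr)

# Same values as who_wide above, rebuilt so the sketch runs on its own
who_wide <- data.frame(
  country = c("Afghanistan", "Brazil", "China"),
  y1999 = c(745, 37737, 212258),
  y2000 = c(2666, 80488, 213766)
)

who_tidy <- pivot_longer(who_wide,
                         c(y1999, y2000),
                         names_to = "year",
                         names_prefix = "y",                         # drop the leading "y"
                         names_transform = list(year = as.integer),  # "1999" becomes 1999L
                         values_to = "cases")
who_tidy
```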

pivot_wider()

Makes Long Data Wider

The arguments are:

  • Data Frame
  • Name of the column where the new column names should come from (names_from)
  • Name of the column where the values should come from (values_from)

Example

who_long
##        country year       type      count
## 1  Afghanistan 1999      cases        745
## 2  Afghanistan 1999 population   19987071
## 3  Afghanistan 2000      cases       2666
## 4  Afghanistan 2000 population   20595360
## 5       Brazil 1999      cases      37737
## 6       Brazil 1999 population  172006362
## 7       Brazil 2000      cases      80488
## 8       Brazil 2000 population  174504898
## 9        China 1999      cases     212258
## 10       China 1999 population 1272915272
## 11       China 2000      cases     213766
## 12       China 2000 population 1280428583
pivot_wider(who_long,
            names_from = "type",
            values_from = "count")
## # A tibble: 6 × 4
##   country      year  cases population
##   <chr>       <dbl>  <dbl>      <dbl>
## 1 Afghanistan  1999    745   19987071
## 2 Afghanistan  2000   2666   20595360
## 3 Brazil       1999  37737  172006362
## 4 Brazil       2000  80488  174504898
## 5 China        1999 212258 1272915272
## 6 China        2000 213766 1280428583

Additional Example - DSSC Lab 5.6

pres.res
##   Candidate       California       Arkansas
## 1   Clinton 8753788/14181595 380494/1130676
## 2     Trump 4483810/14181595 684872/1130676
## 3     Other  943997/14181595  65310/1130676
pres.res2 <- pivot_longer(pres.res,
                          c("California", "Arkansas"),
                          names_to = "State",
                          values_to = "Proportion")
pres.res2
## # A tibble: 6 × 3
##   Candidate State      Proportion      
##   <chr>     <chr>      <chr>           
## 1 Clinton   California 8753788/14181595
## 2 Clinton   Arkansas   380494/1130676  
## 3 Trump     California 4483810/14181595
## 4 Trump     Arkansas   684872/1130676  
## 5 Other     California 943997/14181595 
## 6 Other     Arkansas   65310/1130676
pres.res3 <- separate(pres.res2, "Proportion", c("Votes", "Total"))
pres.res3
## # A tibble: 6 × 4
##   Candidate State      Votes   Total   
##   <chr>     <chr>      <chr>   <chr>   
## 1 Clinton   California 8753788 14181595
## 2 Clinton   Arkansas   380494  1130676 
## 3 Trump     California 4483810 14181595
## 4 Trump     Arkansas   684872  1130676 
## 5 Other     California 943997  14181595
## 6 Other     Arkansas   65310   1130676
pres.res4 <- mutate(pres.res3, Votes = as.numeric(Votes), Total = as.numeric(Total))
str(pres.res4)
## tibble [6 × 4] (S3: tbl_df/tbl/data.frame)
##  $ Candidate: chr [1:6] "Clinton" "Clinton" "Trump" "Trump" ...
##  $ State    : chr [1:6] "California" "Arkansas" "California" "Arkansas" ...
##  $ Votes    : num [1:6] 8753788 380494 4483810 684872 943997 ...
##  $ Total    : num [1:6] 14181595 1130676 14181595 1130676 14181595 ...
pres.res5 <- pres.res4 |> 
  group_by(Candidate) |> 
  summarise(Percent = sum(Votes)/sum(Total)*100) |> 
  arrange(desc(Percent))
pres.res5
## # A tibble: 3 × 2
##   Candidate Percent
##   <chr>       <dbl>
## 1 Clinton     59.7 
## 2 Trump       33.8 
## 3 Other        6.59

Other useful tidyr functions

  • separate() - splits one column of strings into multiple new columns
  • unite() - combines many columns into one (as a string)
  • extract() - uses regular expressions to pull out specific information from a string column

Example

fball
##        home     away score
## 1     Man U Shef Wed   2-1
## 2 Tottenham  Arsenal   0-0
## 3   Chelsea    W Ham   1-0
separate(fball, "score", c("home_goals", "away_goals"))
##        home     away home_goals away_goals
## 1     Man U Shef Wed          2          1
## 2 Tottenham  Arsenal          0          0
## 3   Chelsea    W Ham          1          0
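The other two functions in the list can be tried on the same data. A minimal sketch, with fball rebuilt from the printed values; the new column name fixture and the separator " v " are arbitrary choices for illustration:

```r
library(tidyr)

fball <- data.frame(
  home = c("Man U", "Tottenham", "Chelsea"),
  away = c("Shef Wed", "Arsenal", "W Ham"),
  score = c("2-1", "0-0", "1-0")
)

# unite(): paste home and away into a single "fixture" column
fixtures <- unite(fball, "fixture", home, away, sep = " v ")
fixtures

# extract(): one regex capture group per new column;
# convert = TRUE turns the captured strings into integers
goals <- extract(fball, score, c("home_goals", "away_goals"),
                 regex = "([0-9]+)-([0-9]+)", convert = TRUE)
goals
```

Unlike separate(), extract() only keeps what the capture groups match, so it copes with strings that carry extra text around the parts of interest.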

11.1.2 Data Manipulation (dplyr)

Main dplyr functions

(First argument is always the data frame)

filter() - Focus on a subset of rows

Other Arguments

  • condition to filter by

For example, filter(who, year == 1999)

(see above list of logical operators)

arrange() - Reorder the rows

Other Arguments

  • Variable names to sort by, sub-sorting by later variables
  • Wrap variable name in desc() to sort descending (ascending by default)

For example, arrange(who, year, desc(country))

select() - Focus on a subset of variables (columns)

Other Arguments

  • Name of variables to retain

For example, select(who, year, cases)

mutate() - Create new derived variables

Other Arguments

  • Name of new variable and equation defining it

For example, mutate(who, rate = cases/population)

group_by() - Splits a data frame up into groups according to one or more variables

Other Arguments

  • Name(s) of the variable(s) to group by

For example, group_by(who, country)

summarise() - Create summary statistics (collapsing many rows) by groupings

Other Arguments

  • Function to summarise by

For example, summarise(who, total = sum(cases))

Note: often want to summarise by group

For example,

who2 <- group_by(who, country)
summarise(who2, total = sum(cases), change = max(cases)-min(cases))
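To see what grouping changes, the two calls can be compared on a toy table. This is only a sketch: the who data frame below is an assumption, rebuilt from the case counts printed earlier in the chapter.

```r
library(dplyr)

# Toy version of the who table (cases only), for illustration
who <- data.frame(
  country = rep(c("Afghanistan", "Brazil", "China"), each = 2),
  year = rep(c(1999, 2000), times = 3),
  cases = c(745, 2666, 37737, 80488, 212258, 213766)
)

# Ungrouped: the whole table collapses to a single row
overall <- summarise(who, total = sum(cases))
overall

# Grouped: one row per country
who2 <- group_by(who, country)
by_country <- summarise(who2, total = sum(cases), change = max(cases) - min(cases))
by_country
```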

11.1.3 Pipelines

Chain functions (not limited to tidyverse functions) so that the result of the first function becomes the first argument of the second function, and so on.

Example,

filter(x, ...) |> 
  select(...) |> 
  mutate(...) |> 
  group_by(...) |> 
  arrange(...)

Pipe operator shortcut in RStudio: Cmd-Shift-M (Ctrl-Shift-M on Windows/Linux)
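A pipeline is only a different way of writing nested calls. The sketch below uses base R functions to show that both forms give the same result; it assumes R 4.1 or later, where the native |> operator is available.

```r
# Nested call: read from the inside out
nested <- round(sum(sqrt(c(1, 4, 9))), 1)

# Same computation as a pipeline: read from top to bottom
piped <- c(1, 4, 9) |>
  sqrt() |>
  sum() |>
  round(1)

identical(nested, piped)
```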

11.1.4 Joining Data Frames in Tidyverse

Simplest case of joining data frames (more details in data frames section):

  • rbind() - paste rows together (above/below)
  • cbind() - paste cols together (left/right)

These methods can be very error prone (they require variables/observations to be in identical order, etc.)

Advanced Data Frame Joins

  • left_join(x, y) - add new variables from y to x, keeping all x obs
  • right_join(x, y) - add new variables from x to y, keeping all y obs
  • inner_join(x, y) - keep only matching rows
  • full_join(x, y) - keep all rows in both x and y

Example

band_members
## # A tibble: 3 × 2
##   name  band   
##   <chr> <chr>  
## 1 Mick  Stones 
## 2 John  Beatles
## 3 Paul  Beatles
band_instruments2
## # A tibble: 3 × 2
##   artist plays 
##   <chr>  <chr> 
## 1 John   guitar
## 2 Paul   bass  
## 3 Keith  guitar
left_join(band_members, band_instruments2, by = c("name" = "artist"))
## # A tibble: 3 × 3
##   name  band    plays 
##   <chr> <chr>   <chr> 
## 1 Mick  Stones  <NA>  
## 2 John  Beatles guitar
## 3 Paul  Beatles bass
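The other three joins can be compared on the same two tables, which ship with dplyr. A short sketch; the object names key, inner, right and full are illustrative:

```r
library(dplyr)

# Match band_members$name against band_instruments2$artist
key <- c("name" = "artist")

inner <- inner_join(band_members, band_instruments2, by = key)  # only John and Paul match
right <- right_join(band_members, band_instruments2, by = key)  # keeps Keith, drops Mick
full  <- full_join(band_members, band_instruments2, by = key)   # keeps everyone

nrow(inner)
nrow(right)
nrow(full)
```

Unmatched rows that are kept get NA in the columns that come from the other table, as Mick does in the left_join() output above.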

11.2 Dynamic Documents and Interactive Dashboards

11.2.1 RMD

Document Preamble

---
title: "Example"
author: "(optional) Jamie Reason"
date: "(optional)"
output:
  html_document: default
  pdf_document: default
---

Course Slides on RMD

For further formatting, refer to RMD Cheat Sheet

11.2.2 Shiny

Resource: Mastering Shiny Book

Outline

  • UI
  • Server

R code can be added to any part of a shiny document, but only the reactive code in the server is re-run when the inputs it depends on change.

Starting a Shiny Dashboard (create a new shiny app in R studio):

fluidPage() is just the most common page function, but there are alternatives

library(shiny)

#misc code

ui <- fluidPage(
  ...
)

server <- function(input, output, session){
  #server code
}

shinyApp(ui, server)

11.2.2.1 UI

UI elements reference guide

11.2.2.1.1 Pages

Examples

ui <- fluidPage(
  "One",
  "Two",
  "Three"
)
shinyApp(ui, server = function(input, output, session) {})
ui <- navbarPage(
  "Title of page",
  tabPanel("My first tab", "Hello Alice"),
  tabPanel("My second tab", "Hello Bob")
)
shinyApp(ui, server = function(input, output, session) {})

Other pages: fixedPage(), fillPage(), …

11.2.2.1.2 Layouts and Panels

These go inside the page

  • titlePanel("My App")
  • sidebarLayout()
    • first argument sidebarPanel()
    • second argument mainPanel()
  • fluidRow() - creates a new row containing columns
    • built from column() calls
      • first argument is the width, a number from 1 to 12 (the widths in a row should sum to 12); the other arguments are the contents, such as outputs

Examples

ui <- fluidPage(
  titlePanel("My App"),
  sidebarLayout(
    sidebarPanel("I'm in sidebar"),
    mainPanel("I'm in main panel")
  )
)
shinyApp(ui, server = function(input, output, session) {})
ui <- fluidPage(
  fluidRow(
    column(4, "Lorem ipsum dolor ..."),
    column(8, "Lorem ipsum dolor ...")
  ),
  fluidRow(
    column(6, "Lorem ipsum dolor ..."),
    column(6, "Lorem ipsum dolor ...")
  )
)
shinyApp(ui, server = function(input, output, session) {})

11.2.2.1.3 UI Inputs

All inputs take the same first argument, inputId, the unique identifier of the input.

Its value can be accessed in the server as input$name, where name is the inputId.

The second argument is a label, i.e. how its name appears on the dashboard.

Text Inputs

  • textInput()
  • passwordInput()
  • textAreaInput()

Numeric Inputs

  • numericInput()
  • sliderInput()

Categoric Inputs

  • selectInput()
  • radioButtons()
  • checkboxGroupInput()

Examples

ui <- fluidPage(
  numericInput("num", "Number one", value = 0, min = 0, max = 100),
  sliderInput("num2", "Number two", value = 50, min = 0, max = 100),
  sliderInput("rng", "Range", value = c(10, 20), min = 0, max = 100)
)
shinyApp(ui, server = function(input, output, session) {})
animals <- c("dog", "cat", "mouse", "bird", "other", "I hate animals")
ui <- fluidPage(
  selectInput("state", "What's your favourite state?", state.name),
  radioButtons("animal", "What's your favourite animal?", animals),
  checkboxGroupInput("animal2", "What animals do you like?", animals)
)
shinyApp(ui, server = function(input, output, session) {})

11.2.2.2 Server and UI Outputs

All outputs take the same first argument, outputId, and an output is assigned in the server as output$name, where name is the outputId.

11.2.2.2.1 UI Outputs

Text Outputs

  • textOutput() and renderText()
  • verbatimTextOutput() and renderPrint()

Plot Outputs

  • plotOutput() and renderPlot()
    • width argument
    • res = 96 argument closest to what you see in RStudio

Examples

ui <- fluidPage(
  textInput("name", "What's your name?"),
  textOutput("greet")
)
server <- function(input, output, session) {
  output$greet <- renderText({
    if(nchar(input$name) > 0) {
      return(paste0("Hello ", input$name))
    } else {
      return("Hello friend, tell me your name!")
    }
  })
}
shinyApp(ui, server)
ui <- fluidPage(
  plotOutput("myplot", width = "400px")
)
server <- function(input, output, session) {
  output$myplot <- renderPlot({
    plot(iris$Sepal.Length, iris$Sepal.Width)
  }, res = 96)
}
shinyApp(ui, server)

11.2.2.2.2 Variables outside outputs (reactive)

A variable that depends on an input can’t be created directly in the server, because it wouldn’t be reactive (it would not update when the input changes). Instead, use the reactive({}) call:

Inside the server,

name <- ...

becomes,

name <- reactive({
  ...
})

And when name is used it should be called as name()

Examples

server <- function(input, output, session) {
  name <- reactive({
    toupper(input$name)
  })
  output$greet <- renderText({
    if(nchar(input$name) > 0) {
      return(paste0("Hello ", name(), ", here is your plot ..."))
    } else {
      return("Hello friend, tell me your name!")
    }
  })
  output$myplot <- renderPlot({
    if(nchar(input$name) > 0) {
      # .data[[...]] replaces aes_string(), which is deprecated in recent ggplot2
      ggplot(iris, aes(x = .data[[input$xvar]], y = .data[[input$yvar]])) +
        geom_point() +
        labs(title = paste0(name(), "'s plot!"))
    }
  }, res = 96)
}
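The server above reads input$name, input$xvar and input$yvar and fills output$greet and output$myplot, but no matching UI is shown. One possible UI is sketched below. It is an assumption rather than the original app: the labels, the choice of selectInput() and the helper num_vars are all illustrative. The server would also need ggplot2 loaded.

```r
library(shiny)

# Hypothetical helper: the four numeric columns of iris, offered as axis choices
num_vars <- names(iris)[1:4]

# Assumed UI: the ids must match the input$... and output$... names in the server
ui <- fluidPage(
  textInput("name", "What's your name?"),
  selectInput("xvar", "x variable", num_vars),
  selectInput("yvar", "y variable", num_vars, selected = num_vars[2]),
  textOutput("greet"),
  plotOutput("myplot", width = "400px")
)
```

With this in place, shinyApp(ui, server) would launch the app.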

11.2.2.3 Full Example

From exercise 5.78 (Lab 8)

library("shiny")
library("ukpolice")
library("tidyverse")
library("leaflet")

nbd <- ukc_neighbourhoods("durham")
nbd2 <- nbd$id
names(nbd2) <- nbd$name

# Define UI for application
ui <- fluidPage(
  titlePanel("UK Police Data"),
  sidebarLayout(
    sidebarPanel(
      selectInput("nbd", "Choose Durham Constabulary Neighborhood", nbd2),
      textInput("date", "Enter the desired year and month in the format YYYY-MM", value = "2021-09")
    ),

    # Show a plot of the generated distribution
    mainPanel(
      plotOutput("barchart"),
      leafletOutput("map")
    )
  )
)

# Define server logic
server <- function(input, output) {
  # Get boundaries for selected neighbourhood
  # Wrapped in a reactive because we need this to trigger a
  # change when the input neighborhood changes
  bdy <- reactive({
    bdy <- ukc_neighbourhood_boundary("durham", input$nbd)
    bdy |>
      mutate(latitude = as.numeric(latitude),
             longitude = as.numeric(longitude))
  })

  # Get crimes for selected neighbourhood
  # Also wrapped in a reactive because we need this to trigger a
  # change when the boundary above, or date, changes
  crimes <- reactive({
    bdy2 <- bdy() |>
      select(lat = latitude,
             lng = longitude)

    ukc_crime_poly(bdy2[round(seq(1, nrow(bdy2), length.out = 100)), ], input$date)
  })

  # First do plot
  output$barchart <- renderPlot({
    ggplot(crimes()) +
      geom_bar(aes(y = category, fill = outcome_status_category)) +
      labs(y = "Crime", fill = "Outcome Status")
  }, res = 96)

  # Then do map
  output$map <- renderLeaflet({
    leaflet() |>
      addTiles() |>
      addPolygons(lng = bdy()$longitude, lat = bdy()$latitude) |>
      addCircles(lng = as.numeric(crimes()$longitude), lat = as.numeric(crimes()$latitude), label = crimes()$category, color = "red")
  })
}

# Run the application
shinyApp(ui = ui, server = server)

11.3 Dates

(see DSSC Lab 9)

Use the lubridate package

library("lubridate")
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

lubridate.tidyverse.org

Creating Dates

Current date and time

today()
## [1] "2023-01-15"
now()
## [1] "2023-01-15 16:34:16 GMT"
str(today()) #these are dates not strings
##  Date[1:1], format: "2023-01-15"

Constructing dates from strings and numbers

ymd("2021-12-02")
## [1] "2021-12-02"
mdy("December 2nd, 2021")
## [1] "2021-12-02"
ymd(20211202)
## [1] "2021-12-02"
ymd_hms("2021-12-02 12:33:59")
## [1] "2021-12-02 12:33:59 UTC"

Constructing dates and times from individual components

make_date(2021, 12, 2)
## [1] "2021-12-02"
make_date("2021", "12", "2")
## [1] "2021-12-02"
make_datetime(2021, 12, 2, 12)
## [1] "2021-12-02 12:00:00 UTC"
make_datetime(2021, 12, 2, 12, 33, 59)
## [1] "2021-12-02 12:33:59 UTC"

11.3.1 Time Zones

Date creation functions take an argument tz = "America/New_York".

now(tz = "America/New_York")
## [1] "2023-01-15 11:34:16 EST"

To see all available time zones, call OlsonNames()

Changing Time Zone

#forces change of time zone without changing date/time
x <- ymd_hm("2019-12-02 15:10")
force_tz(x, "America/New_York")
## [1] "2019-12-02 15:10:00 EST"
#converts date/time to a new time zone
with_tz(x, "America/New_York")
## [1] "2019-12-02 10:10:00 EST"
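The difference between the two functions is whether the instant in time changes. A small check, reusing the same x as above (x_forced and x_shown are illustrative names):

```r
library(lubridate)

x <- ymd_hm("2019-12-02 15:10")              # parsed as UTC by default
x_forced <- force_tz(x, "America/New_York")  # same clock time, different instant
x_shown  <- with_tz(x, "America/New_York")   # same instant, different clock time

x_shown == x     # TRUE: only the display changed
x_forced == x    # FALSE: 15:10 in New York is a later instant than 15:10 UTC
x_forced - x     # the two instants are 5 hours apart
```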

11.3.2 Extracting From Dates

datetime <- today()
year(datetime)
## [1] 2023
yday(datetime)
## [1] 15
wday(datetime, week_start = 1) # Sunday is the first day of the week by default; week_start = 1 makes it Monday
## [1] 7
month(datetime, label = TRUE)
## [1] Jan
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec

Rounding Dates/Times

floor_date(datetime, unit = "minute")
## [1] "2023-01-15 UTC"
ceiling_date(datetime, unit = "week")
## [1] "2023-01-22"
ceiling_date(datetime, unit = "quarter")
## [1] "2023-04-01"
floor_date(datetime, unit = "week", week_start = 1)
## [1] "2023-01-09"

11.3.3 Misc

Updating Dates/Times

datetime <- ymd_hms("2021-12-02 12:33:59")
datetime <- update(datetime, hour = 11, second = 33)
datetime
## [1] "2021-12-02 11:33:33 UTC"
#Alternatively,
datetime <- ymd_hms("2021-12-02 12:33:59")
hour(datetime) <- 11
second(datetime) <- 33
datetime
## [1] "2021-12-02 11:33:33 UTC"

Durations

Can do arithmetic with dates and times

einstein <- dmy("14th March 1879")
age <- today() - months(42) - einstein #age 42 months ago
age
## Time difference of 51257 days

Get a duration after arithmetic using as.duration()

as.duration(age)
## [1] "4428604800s (~140.33 years)"
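If the goal is an age in years rather than in days or seconds, one option is lubridate's interval() together with time_length(). This is a sketch, not part of the lab: it fixes the reference date at 2019-07-15 (the date that today() - months(42) produced in the output above) so that the result is reproducible, and the names ref, span, age_days and age_years are illustrative.

```r
library(lubridate)

einstein <- dmy("14th March 1879")
ref <- ymd("2019-07-15")   # fixed reference date, assumed for reproducibility

span <- interval(einstein, ref)           # a time span tied to its start and end dates
age_days  <- as.numeric(ref - einstein)   # plain day count
age_years <- time_length(span, "years")   # length of the span in years

age_days
floor(age_years)   # completed years
```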

11.4 Strings and Regular Expressions

11.4.1 Strange characters

When you want a string with strange characters (quotes, backslashes, line breaks), enclose it in a raw string r"(...)" instead of just "..." (available from R 4.0.0).

z <- r"(As Roosevelt said,
"Believe you can and you're halfway there."
)"
cat(z)
## As Roosevelt said,
## "Believe you can and you're halfway there."

cat() prints the contents of the string as they are, without the quotes and escape sequences that print() shows
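Raw strings are most useful when a string contains backslashes, as regular expressions and Windows paths do. A brief sketch, assuming R 4.0.0 or later; the path is made up for illustration:

```r
# A regex for a literal dot, written both ways
escaped <- "a\\.c"     # ordinary string: the backslash must be doubled
raw     <- r"(a\.c)"   # raw string: the backslash is written once

identical(escaped, raw)

# A hypothetical Windows-style path, without doubling every backslash
path <- r"(C:\Users\me\data.csv)"
cat(path)     # shows the characters as they are
print(path)   # shows R's escaped representation, in quotes
```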

11.4.2 stringr (part of tidyverse)

Most stringr functions begin with str_, so autocomplete can be used to find many string operations.

Basics

String Length

str_length(c("Data Science and Statistical Computing", "by", "Dr Louis Aslett"))
## [1] 38  2 15

Combining Strings

str_c("Data Science and Statistical Computing", "by", "Dr Louis Aslett")
## [1] "Data Science and Statistical ComputingbyDr Louis Aslett"
str_c("Data Science and Statistical Computing", "by", "Dr Louis Aslett", sep = " ")
## [1] "Data Science and Statistical Computing by Dr Louis Aslett"
str_c(c("Data Science and Statistical Computing", "by", "Dr Louis Aslett"))
## [1] "Data Science and Statistical Computing"
## [2] "by"                                    
## [3] "Dr Louis Aslett"
str_c(c("Data Science and Statistical Computing", "by", "Dr Louis Aslett"), collapse = " ")
## [1] "Data Science and Statistical Computing by Dr Louis Aslett"

Subsetting Strings

z <- c("Alice", "Bob", "Connie", "David")
str_sub(z, 1, 4)
## [1] "Alic" "Bob"  "Conn" "Davi"
str_sub(z, 1, 2) <- "Zo"
z
## [1] "Zoice"  "Zob"    "Zonnie" "Zovid"

Trimming

str_trim("  String with  trailing,   middle, and    leading   white space\n\n")
## [1] "String with  trailing,   middle, and    leading   white space"
str_squish("  String with  trailing,   middle, and    leading   white space\n\n")
## [1] "String with trailing, middle, and leading white space"

11.4.2.1 Regexes

See all details in docs or lecture slides

Regexes (regular expressions) are used for finding patterns in strings

str_view()

Identify a pattern in a string:

Exact matching

str_view("string to find pattern in", "pattern")
## [1] │ string to find <pattern> in

Wildcard matching

x <- c("apple", "banana", "pear")
str_view(x, ".a.")
## [2] │ <ban>ana
## [3] │ p<ear>

How to match a literal .? Escape it with \, and escape the backslash itself inside the R string: str_view(c(".bc", "a.c", "be."), "a\\.c")

Anchoring

To start:

str_view(x, "^a")
## [1] │ <a>pple

To end:

str_view(x, "a$")
## [2] │ banan<a>

Can also anchor to both ends, e.g. "^pear$" matches only the string "pear" exactly.

Matching Set of Characters I

Match exactly one character from the set (every match is highlighted):

str_view(x, "[pan]")
## [1] │ <a><p><p>le
## [2] │ b<a><n><a><n><a>
## [3] │ <p>e<a>r

Find one or more instances occurring consecutively:

str_view(x, "[pan]+")
## [1] │ <app>le
## [2] │ b<anana>
## [3] │ <p>e<a>r

Find exact number of instances occurring consecutively:

str_view(x, "[pan]{2}")
## [1] │ <ap>ple
## [2] │ b<an><an>a

Find a range of instances occurring consecutively:

str_view(x, "[pan]{1,3}")
## [1] │ <app>le
## [2] │ b<ana><na>
## [3] │ <p>e<a>r

Matching Set of Characters II

y <- c("There were 122 in total", "Overall about 390 found", "100 but no more")
str_view(y, "[0-9]+")
## [1] │ There were <122> in total
## [2] │ Overall about <390> found
## [3] │ <100> but no more
str_view(y, "[^A-Za-z ]+") # ^ inside [] negates the set
## [1] │ There were <122> in total
## [2] │ Overall about <390> found
## [3] │ <100> but no more
str_view(y, "^[0-9]+") # ^ outside [] anchors to the start of the string
## [3] │ <100> but no more
str_view(y, "[a-z ]+")
## [1] │ T<here were >122< in total>
## [2] │ O<verall about >390< found>
## [3] │ 100< but no more>
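str_view() only displays matches. To use the same patterns in code, stringr has functions such as str_detect(), str_extract() and str_replace_all(). A short sketch on the same y; the replacement "#" is an arbitrary choice:

```r
library(stringr)

y <- c("There were 122 in total", "Overall about 390 found", "100 but no more")

starts_with_number <- str_detect(y, "^[0-9]+")   # does the string start with digits?
first_number <- str_extract(y, "[0-9]+")         # first run of digits, as character
masked <- str_replace_all(y, "[0-9]+", "#")      # replace every run of digits

starts_with_number
as.numeric(first_number)
masked
```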

11.5 Probability Distributions

Letter  Function  Use
“d”     dnorm()   evaluates the pdf \(f(x)\)
“p”     pnorm()   evaluates the cdf \(F(x)\)
“q”     qnorm()   evaluates the inverse cdf \(F^{-1}(q)\), i.e. returns \(x\) such that \(P(X \leq x) = q\)
“r”     rnorm()   generates random numbers

Parameters will vary, e.g.

  • Normal distribution: dnorm, pnorm, qnorm, rnorm. Parameters: mean (\(\mu\)) and sd (\(\sigma\)).
  • t distribution: dt, pt, qt, rt. Parameter: df
  • \(\chi^2\) distribution: dchisq, pchisq, qchisq, rchisq. Parameter: df

11.5.1 DSSC Theory Applications

11.5.1.1 Monte Carlo Hypothesis Test

Example 2.1

# Specify test statistic and null value
x.bar <- 8.6
n <- 6
mu0 <- 9.2

# Simulate lots of data assuming the null is true
t <- rep(0, 50000)

for(j in 1:50000) { #R vectors are indexed from 1, so start at 1 rather than 0
  z <- rnorm(n, mu0, sqrt(0.4)) #random sample (of n=6) generated under H0
  t[j] <- abs(mean(z)-mu0) #difference in mean of random sample and mean under H0 assumption
}

# Calculate empirical p-value
sum(t > abs(x.bar-mu0)) / 50000 #proportion of random samples whose mean was at least as far from mu0 as the observed mean
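The loop above can also be written with replicate(), and in this particular example the Monte Carlo answer can be checked against an exact value. This is a sketch under the same assumptions as the example (normal data with known variance 0.4 under the null); the seed and the names sigma, n.sim, t.sim, p.mc and p.exact are illustrative.

```r
set.seed(42)   # arbitrary seed, for reproducibility only

x.bar <- 8.6
n <- 6
mu0 <- 9.2
sigma <- sqrt(0.4)   # standard deviation assumed known under H0
n.sim <- 50000

# One simulated test statistic per replicate
t.sim <- replicate(n.sim, abs(mean(rnorm(n, mu0, sigma)) - mu0))

# Empirical p-value
p.mc <- mean(t.sim >= abs(x.bar - mu0))

# Exact two-sided p-value, since the sample mean is N(mu0, sigma^2 / n) under H0
p.exact <- 2 * pnorm(-abs(x.bar - mu0) / (sigma / sqrt(n)))

c(monte_carlo = p.mc, exact = p.exact)
```

The two values should agree to about two decimal places, and the Monte Carlo approach carries over to test statistics whose null distribution is not known exactly.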

11.5.1.2 Bootstrap

Set-up

  • A sample \(\mathbf{x} = (x_1, \ldots , x_n)\) of \(n\) independent observations
  • There is a statistic \(S( \cdot )\) we wish to estimate
  • We also want the standard error of this estimate

General Method:

  1. Draw \(B\) new samples of size \(n\) with replacement from \(\mathbf{x} = (x_1, \ldots , x_n)\)
  2. Call these samples \(\textbf{x}^{\star 1}, \ldots , \textbf{x}^{\star B}\)
  3. Calculate the estimate, \(\bar{S}^{\star}=\frac{1}{B} \sum_{b=1}^{B} S\left(\mathbf{x}^{\star b}\right)\)
  4. Calculate the variance, \(\widehat{\operatorname{Var}}(S(\mathbf{x}))=\frac{1}{B-1} \sum_{b=1}^{B}\left(S\left(\mathbf{x}^{\star b}\right)-\bar{S}^{\star}\right)^{2}\)

Example 3.1 (Also see 3.5)

# Mouse data
x <- c(94,197,16,38,99,141,23)

# Number of bootstraps
B <- 1000

# Statistic
S <- mean

# Perform bootstrap
S.star <- rep(0, B)
for(b in 1:B) {
  x.star <- sample(x, replace = TRUE)
  S.star[b] <- S(x.star)
}

# Bootstrap estimate
mean(S.star)

# Standard error of estimate
sd(S.star)

Empirical CDF - ecdf(x)
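ecdf() returns a function rather than a number, which can then be evaluated at any point. A small sketch using the mouse data from Example 3.1 (Fn is an illustrative name):

```r
x <- c(94, 197, 16, 38, 99, 141, 23)   # mouse data from Example 3.1

Fn <- ecdf(x)   # ecdf() returns a step function
Fn(99)          # proportion of observations <= 99, here 5/7
Fn(c(0, 200))   # 0 below the smallest value, 1 above the largest

plot(Fn)        # draws the step function
```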