dplyr is a widely adopted r package, part of tidyverse, which uses English verbs to reshape data.

library(tidyverse)

I made a small subset of the starwars dataset: small_sw

small_sw

Arrange

Arrange to sort rows (observations) by column headers (variables)

small_sw %>% 
  arrange(species, desc(mass), homeworld)

Select

Select to choose columns (variables)

starwars %>% 
  arrange(species, desc(mass), homeworld) %>% 
  select(species, mass, homeworld, name:starships)

Filter

Filter to select rows (observations)

starwars_32_mass <- starwars %>% 
  filter(mass == 80)

starwars_32_mass    # display filtered data frame

Mutate

Mutate to create new variables (columns)

starwars_32_mass %>% 
  mutate(BMI = height / mass) %>% 
  select(name, BMI, height, mass, hair_color, 1:13)

Count

Count to summarize observations (rows)

starwars %>% 
  count(mass)

Summarize

Summarize to collapse values into a summary

starwars %>% 
  group_by(species) %>% 
  summarize(Count = n(), 
            mean_ht = mean(height), 
            min_ht = min(height), 
            max_ht = max(height)) %>% 
  arrange(desc(Count))

Put it all together

We can pipe commands together. Think of a pipe (i.e. %>%) as a conjunction. Any time you see a pipe, think “and then.” You can insert a pipe with ‘Ctrl+Shift+M’ (Help > Keyboard Shortcuts)

starwars %>% 
  group_by(species) %>% 
  summarize(Count = n(), 
            mean_ht = mean(height), 
            min_ht = min(height), 
            max_ht = max(height)) %>% 
  arrange(desc(Count)) %>% 
  filter(!is.na(mean_ht), Count > 1)
 
R We Having Fun Yet‽ -- Learning Series
Data & Visualization Services
Duke University Libraries
C bn
Shareable via Creative Commons: CC By-NC