Wrangle Data with {dplyr}

Modified: February 19, 2024

{dplyr} verbs help you wrangle, clean, and normalize your data.

dplyr function   Use for
select()         Subset columns
filter()         Subset rows
arrange()        Sort rows by the values of a column
mutate()         Create new variables or modify existing ones
group_by()       Group rows; use with summarize() for subtotals
summarize()      Generate column totals, subtotals, and other summaries
count()          Count rows per group; a specialized summarize()

Examples

First, we need to load the {dplyr} package for wrangling and the {readr} package for importing CSV data. In our case, we’ll do that by loading the {tidyverse}, which loads {dplyr}, {readr}, and several other helpful packages. Then we load our data:

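# The tidyverse loads {dplyr}, {readr}, and other helpful packages
library(tidyverse)
# Import the Brodhead Center data from a CSV file
brodhead_center <- read_csv("data/brodhead_center.csv")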

select()

# Keep only the name and type columns
brodhead_center |> 
  select(name, type)

Select columns

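select() can also drop and rename columns. The sketch below uses a small hypothetical tibble, `toy_menu`, whose column names are borrowed from brodhead_center but whose values are invented. A leading minus drops a column, `new_name = old_name` renames while selecting, and tidyselect helpers such as starts_with() match columns by name.

```r
library(dplyr)

# Hypothetical stand-in for brodhead_center; the values are invented
toy_menu <- tibble(
  name     = c("Grill", "Noodle Bar"),
  type     = c("American", "Asian"),
  menuType = c("entree", "dessert"),
  cost     = c(9.5, 4),
  rating   = c(4, 5)
)

# Drop a column with a leading minus
toy_menu |> select(-rating)

# Rename while selecting: new_name = old_name
toy_menu |> select(restaurant = name, cost)

# A tidyselect helper picks columns by name pattern
toy_menu |> select(starts_with("menu"))
```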

filter()

# Keep only the rows whose menuType is "dessert"
brodhead_center |> 
  filter(menuType == "dessert")

Filter by rows

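filter() keeps the rows where every condition is TRUE: comma-separated conditions are combined with AND, `%in%` matches any of several values, and rows where a condition evaluates to NA are dropped. A minimal sketch on a hypothetical tibble (`toy_menu` and its values are made up for illustration):

```r
library(dplyr)

# Hypothetical menu items; values invented for illustration
toy_menu <- tibble(
  name     = c("Grill", "Grill", "Noodle Bar", "Bakery"),
  menuType = c("entree", "dessert", "dessert", "dessert"),
  cost     = c(9.5, 6, 4, NA)
)

# Commas combine conditions with AND: desserts under 5
# (the Bakery row is dropped because NA < 5 is NA, not TRUE)
cheap_desserts <- toy_menu |>
  filter(menuType == "dessert", cost < 5)

# %in% keeps rows matching any of several values
entrees_or_sides <- toy_menu |>
  filter(menuType %in% c("entree", "side"))
```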

arrange()

# Sort rows by cost, lowest first
brodhead_center |> 
  arrange(cost)

Arrange rows by the values in a column

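arrange() sorts in ascending order by default. Wrapping a column in desc() reverses that, and listing several columns breaks ties in order. A short sketch on a hypothetical tibble (`toy_menu` and its values are invented):

```r
library(dplyr)

# Hypothetical items; values invented for illustration
toy_menu <- tibble(
  name = c("Grill", "Noodle Bar", "Bakery", "Grill"),
  cost = c(9.5, 11, 4, 3)
)

# Most expensive items first
by_cost_desc <- toy_menu |> arrange(desc(cost))

# Sort by restaurant name, then by cost within each restaurant
by_name_cost <- toy_menu |> arrange(name, cost)
```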

mutate()

# Add ratings_high, a new column holding rating doubled
brodhead_center |> 
  mutate(ratings_high = rating * 2)

Create a new variable or modify an existing one with mutate()

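To modify a variable rather than create one, reuse its name inside mutate(); a new column can also be derived from a condition with if_else(). A sketch on a hypothetical tibble (`toy_menu`, its values, and the `price_band` column and its cutoff of 10 are all invented for illustration):

```r
library(dplyr)

# Hypothetical items; values invented for illustration
toy_menu <- tibble(
  name   = c("Grill", "Noodle Bar", "Bakery"),
  cost   = c(9.5, 12, 4),
  rating = c(4, 5, 3)
)

rescored <- toy_menu |>
  mutate(
    rating     = rating * 2,                             # reusing a name overwrites that column
    price_band = if_else(cost > 10, "pricey", "budget")  # new column built from a condition
  )
```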

We can also mutate data within groups or categories, here with the .by argument:

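# Mean item rating for each restaurant (name), ignoring missing ratings,
# placed right after the name column
brodhead_center |> 
  mutate(avg_item_rating_rest = mean(rating, na.rm = TRUE), 
         .by = name, 
         .after = name)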
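A grouped mutate() keeps every row and repeats each group's result on that group's rows; with .by, the output is not left grouped. The same result can be written with group_by() followed by ungroup(). A sketch on a hypothetical tibble (`toy_menu`, its values, and the `item` column are invented):

```r
library(dplyr)

# Hypothetical items; values and the item column are invented
toy_menu <- tibble(
  name   = c("Grill", "Grill", "Bakery"),
  item   = c("burger", "fries", "muffin"),
  rating = c(4, NA, 3)
)

# Per-restaurant mean with .by; every row is kept
with_by <- toy_menu |>
  mutate(avg_item_rating_rest = mean(rating, na.rm = TRUE),
         .by = name,
         .after = name)

# Equivalent with group_by(); ungroup() removes the grouping afterwards
with_group_by <- toy_menu |>
  group_by(name) |>
  mutate(avg_item_rating_rest = mean(rating, na.rm = TRUE), .after = name) |>
  ungroup()
```

The .by form is handy for one-off grouped operations because there is no grouping to remember to undo.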

count()

Count values in a group

brodhead_center |> 
  count(menuType)

menuType    n
entree     24
appetizer  23
dessert     7
side        5
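count() is shorthand for grouping and then tallying rows with summarise(n = n()). By default it orders the output by the values of the counted column; sort = TRUE puts the largest groups first, and passing two columns counts each combination. A sketch on a hypothetical tibble (`toy_menu` and its values are invented):

```r
library(dplyr)

# Hypothetical items; values invented for illustration
toy_menu <- tibble(
  name     = c("Grill", "Grill", "Grill", "Bakery", "Bakery"),
  menuType = c("entree", "entree", "side", "dessert", "entree")
)

# Largest groups first
toy_menu |> count(menuType, sort = TRUE)

# The long form that count(menuType) abbreviates
tally_long <- toy_menu |>
  group_by(menuType) |>
  summarise(n = n())

# Counts for each name/menuType combination
toy_menu |> count(name, menuType)
```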

group_by() & summarise()

Summarise a column
Sum_of_cost
412
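A single column total like Sum_of_cost comes from summarise() with no grouping, which collapses the whole table to one row. The code behind that figure isn't shown, so this is only a sketch on hypothetical values (`toy_menu` and its costs are invented; the column name Sum_of_cost is borrowed from the figure):

```r
library(dplyr)

# Hypothetical costs; one is missing
toy_menu <- tibble(cost = c(9.5, 12, 4, NA))

# No grouping: the whole table collapses to a single row;
# na.rm = TRUE skips the missing cost
total <- toy_menu |>
  summarise(Sum_of_cost = sum(cost, na.rm = TRUE))
```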
# Minimum, mean, and maximum cost for each restaurant
brodhead_center |> 
  group_by(name) |> 
  summarise(min_cost  = min(cost),
            mean_cost = mean(cost),
            max_cost  = max(cost))
Or, summarize by groups without group_by() by using the .by argument:

# .by groups by name for this one summarise() call only
brodhead_center |> 
  summarise(min_cost = min(cost), .by = name)
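The two forms compute the same values but can return rows in a different order: group_by() sorts the group keys in ascending order, while .by keeps groups in the order they first appear in the data. A sketch on a hypothetical tibble (`toy_menu` and its values are invented):

```r
library(dplyr)

# Hypothetical items; "Noodle Bar" appears before "Grill"
toy_menu <- tibble(
  name = c("Noodle Bar", "Grill", "Noodle Bar", "Grill"),
  cost = c(11, 9.5, 6, 3)
)

# group_by(): groups come back sorted by name
grouped <- toy_menu |>
  group_by(name) |>
  summarise(min_cost = min(cost))

# .by: groups come back in order of first appearance
by_arg <- toy_menu |>
  summarise(min_cost = min(cost), .by = name)
```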