Wrangle Data with {dplyr}

Modified

February 19, 2024

{dplyr} verbs help you wrangle, clean, and normalize your data

dplyr function use for
select() subset columns
filter() subset rows
arrange() sort rows by column variable values
mutate() Create new, or modify variables
group_by() use with summarize for subtotals
summarize() generate column totals and subtotals, etc.
count() a specialized summarize() function

Examples

First we need to load the {dplyr} package for wrangling and the {readr} package for importing CSV data. In our case, we’ll do that by loading the tidyverse which loads {dplyr}, {readr} and several other helpful packages. Then we need to load our data

library(tidyverse)
brodhead_center <- read_csv("data/brodhead_center.csv")

select()

brodhead_center |> 
  select(name, type)
ABCDEFGHIJ0123456789
name
<chr>
type
<chr>
Devils Krafthousebar and grill
Devils Krafthousebar and grill
Devils Krafthousebar and grill
Devils Krafthousebar and grill
Devils Krafthousebar and grill
Devils Krafthousebar and grill
Devils Krafthousebar and grill
Devils Krafthousebar and grill
Devils Krafthousebar and grill
Devils Krafthousebar and grill

Select columns

Select columns

filter()

brodhead_center |> 
  filter(menuType == "dessert")
ABCDEFGHIJ0123456789
name
<chr>
type
<chr>
menuType
<chr>
itemType
<chr>
Devils Krafthousebar and grilldessertdessert
Devils Krafthousebar and grilldessertdessert
Devils Krafthousebar and grilldessertdessert
Cafecafedessertdessert
Cafecafedessertdessert
Cafecafedessertdessert
Cafecafedessertdessert

Filter by rows

Filter by rows

arrange()

brodhead_center |> 
  arrange(cost)
ABCDEFGHIJ0123456789
name
<chr>
type
<chr>
menuType
<chr>
itemType
<chr>
Tandoorrestaurantappetizerbread
Tandoorrestaurantappetizerbread
Tandoorrestaurantappetizerbread
Devils Krafthousebar and grillappetizersnack
Devils Krafthousebar and grillappetizersnack
Devils Krafthousebar and grillappetizersnack
Devils Krafthousebar and grillsidesoup
Tandoorrestaurantappetizerappetizer
Tandoorrestaurantappetizerappetizer
Tandoorrestaurantappetizerappetizer

Arrange rows by the values in a column

Arrange rows by the values in a column

mutate()

brodhead_center |> 
  mutate(ratings_high = rating * 2)
ABCDEFGHIJ0123456789
name
<chr>
type
<chr>
menuType
<chr>
itemType
<chr>
Devils Krafthousebar and grillappetizersnack
Devils Krafthousebar and grillappetizersnack
Devils Krafthousebar and grillappetizersnack
Devils Krafthousebar and grillappetizersnack
Devils Krafthousebar and grillappetizersnack
Devils Krafthousebar and grillappetizersnack
Devils Krafthousebar and grillappetizersnack
Devils Krafthousebar and grillappetizersnack
Devils Krafthousebar and grillentreesandwich
Devils Krafthousebar and grillentreesandwich

Create new variable or modify variable with mutate()

Create new variable or modify variable with mutate()

We can also mutate data by groups or categories

brodhead_center |> 
  mutate(avg_item_rating_rest = mean(rating, na.rm = TRUE), 
         .by = name, 
         .after = name)
ABCDEFGHIJ0123456789
name
<chr>
avg_item_rating_rest
<dbl>
type
<chr>
Devils Krafthouse6.65625bar and grill
Devils Krafthouse6.65625bar and grill
Devils Krafthouse6.65625bar and grill
Devils Krafthouse6.65625bar and grill
Devils Krafthouse6.65625bar and grill
Devils Krafthouse6.65625bar and grill
Devils Krafthouse6.65625bar and grill
Devils Krafthouse6.65625bar and grill
Devils Krafthouse6.65625bar and grill
Devils Krafthouse6.65625bar and grill

count()

Count values in a group
menuType n
entree 24
appetizer 23
dessert 7
side 5
brodhead_center |> 
  count(menuType)
ABCDEFGHIJ0123456789
menuType
<chr>
n
<int>
appetizer23
dessert7
entree24
side5

group_by() & summarise()

Summarise column
Sum_of_cost
412
brodhead_center |> 
  group_by(name) |> 
  summarise(min_cost = min(cost), mean_cost = mean(cost), max_cost = max(cost))
ABCDEFGHIJ0123456789
name
<chr>
min_cost
<dbl>
mean_cost
<dbl>
max_cost
<dbl>
Cafe56.5000008
Devils Krafthouse47.50000010
Tandoor26.31578912
or

Summarize by groups, without group_by()

brodhead_center |> 
  summarise(min_cost = min(cost), .by = name)
ABCDEFGHIJ0123456789
name
<chr>
min_cost
<dbl>
Devils Krafthouse4
Tandoor2
Cafe5