The Tidyverse is a set of packages that work together within a common tidy-data framework. These packages modernize R and enable an efficient workflow. The tidy framework lets you stay productive while you are still learning.

Image Credit:  RStudio.com

Load library Packages:

library(tidyverse)

 

Note: You must install a package before you can load it. The code chunk above calls the library() function to load the “tidyverse” packages. (Video Help: Install Pages)

As you see from the image above, the tidyverse package is a super-package consisting of several more precise and focused packages. You can install packages from the Tools menu-bar in the RStudio IDE.
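If you prefer the console to the Tools menu, the install-then-load sequence can be sketched in code. This is a minimal sketch, assuming you have an internet connection and a CRAN mirror configured; the check with requireNamespace() is an addition for illustration and simply skips the install when the package is already present.

```r
# Install the tidyverse only if it is not already installed.
# Installing is needed once per machine; loading is needed once per session.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Attach the core tidyverse packages (dplyr, readr, tibble, and others)
library(tidyverse)
```

After this runs, functions such as read_csv() and glimpse() are available without a package prefix.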

Load Custom Data

Outside of this workshop, you’ll likely want to load your own data. R and RStudio support many methods of gathering and importing data. Two common data import methods include importing data from the local file system or via a URL.

RStudio has a built-in data import wizard that uses the readr package. There are two ways to open it:

  • File > Import Dataset
  • Or, via the Import Dataset “button” found in the Environment Pane.
RStudio Environment Pane


Using the Import wizard, you can generate (and execute) the code necessary to read in the cars.csv file.

# readr::read_csv
# 'read_csv()' is part of the tidyverse 'readr' package.  

cars <- read_csv("data/cars.csv")

‘readr::read_csv()’ is an alternative to the “base R” read.csv() function with more helpful defaults. For example, read_csv() never converts strings to factors automatically (read.csv() did so by default before R 4.0.0), and it returns a tibble rather than a plain data frame. You can read more about readr at http://readr.tidyverse.org/
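To see the two readers side by side, here is a minimal sketch. It writes a tiny CSV to a temporary file so it can run anywhere; the file contents and the object names (tmp, cars_tbl, cars_df) are made up for illustration and are not the workshop's cars.csv.

```r
library(readr)

# Write a small, made-up CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("model,mpg", "Mazda RX4,21.0", "Datsun 710,22.8"), tmp)

# readr: returns a tibble; show_col_types = FALSE silences the column-spec message
cars_tbl <- read_csv(tmp, show_col_types = FALSE)

# base R: returns a plain data frame
cars_df <- read.csv(tmp)

class(cars_tbl)        # includes "tbl_df"
class(cars_df)         # "data.frame"
class(cars_tbl$model)  # "character"
```

The same read_csv() call also accepts a URL in place of a local file path.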

Workshop Data Set

During this workshop demonstration, we’ll use the Star Wars characters dataset, starwars. The data, part of the dplyr package, come from SWAPI, the Star Wars API, http://swapi.co/. Since the data are integrated into dplyr, we don’t need to load the data, but you may still want to find information about the dataset. The codebook for the starwars dataset is integrated into the dplyr documentation. To view the codebook, first load the dplyr package with library(dplyr). Then, via the Help pane, search for starwars. Alternatively, in the Console pane, type ?starwars.

Note: dplyr is part of tidyverse which we loaded with the command: library(tidyverse)

In the next module, Data Wrangling, we’ll discuss the dplyr package in greater detail.

View your data in a grid.

starwars

Other Data Viewers, a selective list

  • View() is an exploratory convenience in RStudio. It opens a built-in, clickable, sortable data viewer that is handy while you work interactively, but it does not generate output in your R Markdown reports.

  • The Environment Pane within RStudio presents information about data objects.

Modern Data Frames: Tibbles

A tibble (tbl) is a “Table as data frame”, a modern tidyverse table.

class(starwars) 
## [1] "tbl_df"     "tbl"        "data.frame"
starwars
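A tibble can also be built from an ordinary data frame or from scratch. The following is a minimal sketch, assuming the tibble package is installed (it is loaded as part of the tidyverse); the three rows are copied by hand from the starwars output shown in this workshop, and the object names df, tb, and tb2 are hypothetical.

```r
library(tibble)

# A plain base R data frame, typed in by hand
df <- data.frame(
  name   = c("Luke Skywalker", "C-3PO", "R2-D2"),
  height = c(172L, 167L, 96L)
)

# Convert an existing data frame to a tibble
tb <- as_tibble(df)

# Or build a tibble directly
tb2 <- tibble(
  name   = c("Luke Skywalker", "C-3PO", "R2-D2"),
  height = c(172L, 167L, 96L)
)

class(df)  # "data.frame"
class(tb)  # "tbl_df" "tbl" "data.frame"
```

Because "data.frame" is still among a tibble's classes, functions written for data frames generally accept tibbles as well.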

 

Data Structure

Most Common Data Structures

  • Vector
  • Data Frame & Tibble
  • List
  • Matrix

Read more about it in R for Data Science
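As a quick illustration of the four structures listed above, here is a minimal sketch that creates one of each. The values are taken from the first rows of the starwars output in this workshop; the object names are hypothetical.

```r
library(tibble)

# Vector: one dimension, every element has the same type
heights <- c(172L, 167L, 96L)

# List: elements may differ in type and length
droid <- list(name = "R2-D2", height = 96L, species = "Droid")

# Matrix: two dimensions, every element has the same type
m <- matrix(1:6, nrow = 2)

# Data frame and tibble: columns are equal-length vectors, types may differ by column
df <- data.frame(name = c("Luke Skywalker", "C-3PO", "R2-D2"), height = heights)
tb <- tibble(name = c("Luke Skywalker", "C-3PO", "R2-D2"), height = heights)

is.vector(heights)  # TRUE
is.list(droid)      # TRUE
dim(m)              # 2 3
is.data.frame(tb)   # TRUE
```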

Glimpse into a data frame

First, let’s glimpse into a Tibble. How do you know the object is a Tibble? Read below about class(), but first …

glimpse() reveals the structure of an object

glimpse(starwars)
## Observations: 87
## Variables: 13
## $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", ...
## $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188...
## $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 8...
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "b...
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "l...
## $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", "blue",...
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, NA, 24.0...
## $ gender     <chr> "male", NA, NA, "male", "female", "male", "female",...
## $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine", "Alder...
## $ species    <chr> "Human", "Droid", "Droid", "Human", "Human", "Human...
## $ films      <list> [<"Revenge of the Sith", "Return of the Jedi", "Th...
## $ vehicles   <list> [<"Snowspeeder", "Imperial Speeder Bike">, <>, <>,...
## $ starships  <list> [<"X-wing", "Imperial shuttle">, <>, <>, "TIE Adva...

The old-school way to glimpse into the structure is via the str() function. It’s fine but not as pretty as glimpse(). Sometimes it’s necessary to str(), particularly when investigating the structure of lists – but that’s way beyond what we’re talking about here.

str(starwars)  #example, not executed. Practice this in your RStudio console or R Notebook.  
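To give a feel for why str() is useful with lists, here is a minimal sketch on a small nested list. The list is built by hand from values that appear in the glimpse() output above; the object name nested and the grouping under craft are made up for illustration.

```r
# A hand-built nested list: a list that contains another list
nested <- list(
  name   = "Luke Skywalker",
  height = 172L,
  craft  = list(
    starships = c("X-wing", "Imperial shuttle"),
    vehicles  = c("Snowspeeder", "Imperial Speeder Bike")
  )
)

# Show the full structure, including the inner list
str(nested)

# Limit the display to the top level only
str(nested, max.level = 1)
```

The max.level argument is a convenient way to keep the display manageable when a list is deeply nested.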

 

Data Type

Most Common Data Types

  • character
  • numeric
    • integer (e.g. 30L, as.integer(30))
    • double (floating-point approximation, with special values such as NA, NaN, Inf, and -Inf)
  • logical (TRUE or FALSE)
  • factor ([forcats](https://forcats.tidyverse.org/) is especially useful for manipulating factor data)
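The types in the list above can be checked in the console. This is a minimal sketch using base R only; the literal values are arbitrary examples.

```r
# character
class("Tatooine")   # "character"

# numeric: integer versus double
class(30L)          # "integer"
class(77.5)         # "numeric"
typeof(77.5)        # "double"
is.integer(30)      # FALSE: without the L suffix, 30 is stored as a double

# logical
class(TRUE)         # "logical"

# factor: a categorical variable with a fixed set of levels
gender <- factor(c("male", "female", "male"))
class(gender)       # "factor"
levels(gender)      # "female" "male"
```

Note that class() reports "numeric" for a double, while typeof() reports the underlying storage type.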

Class

class() identifies the class of an object, revealing its data structure or data type.

class(starwars)
## [1] "tbl_df"     "tbl"        "data.frame"

You can reference a vector (column) within a data frame via the $ syntax: the data frame name, then $, then the column name.

class(starwars$name)
## [1] "character"
class(starwars$height)  
## [1] "integer"
class(starwars$mass)
## [1] "numeric"
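The $ syntax is one of several ways to get at a column. The following is a minimal sketch, assuming dplyr is installed, that compares it with double brackets, single brackets, and dplyr's pull(); the object names h1, h2, and h3 are hypothetical.

```r
library(dplyr)

# Both of these return the column as a plain vector
h1 <- starwars$height
h2 <- starwars[["height"]]
identical(h1, h2)  # TRUE

# Single brackets keep the table structure: a one-column tibble
h3 <- starwars["height"]
class(h3)

# pull() is the dplyr equivalent of $, convenient inside a pipeline
pull(starwars, height)
```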

Other Useful Functions

tbl_vars() lists table variables (column headers) as a vector.

tbl_vars(starwars)
##  [1] "name"       "height"     "mass"       "hair_color" "skin_color"
##  [6] "eye_color"  "birth_year" "gender"     "homeworld"  "species"   
## [11] "films"      "vehicles"   "starships"
 
R We Having Fun Yet‽ -- Learning Series
Data & Visualization Services
Duke University Libraries
Shareable via Creative Commons: CC By-NC