Michael Lydeamore

Working with Data

R packages can include data, not just functions. This is helpful when:

  • You want to ship example data with your package
  • You’re documenting datasets for teaching or analysis
  • You use specific reference data across projects

There are two types of package data:

  • External data: stored in data/ and available to users after loading the package
  • Internal data: stored in R/sysdata.rda and used inside the package only

Step 1: Add Raw Data Script

Run:

usethis::use_data_raw("my_dataset")

This creates:

  • A data-raw/ folder (if it doesn’t exist)
  • A starter script at data-raw/my_dataset.R

Inside that script, read in and clean your data.
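
As a rough sketch of what "read in and clean" might look like, the snippet below uses base R only. The CSV content, column names, and cleaning steps are all invented for illustration; a real data-raw script would read your own file rather than inline text.

```r
# Hypothetical raw data, inlined so the example is self-contained.
# In a real data-raw script you would call read.csv() on your own file.
raw_csv <- "Patient ID,Age (years),Outcome
1,34,recovered
2,NA,recovered
3,51,hospitalised"

# check.names = FALSE keeps the original (non-syntactic) headers for now
raw <- read.csv(text = raw_csv, check.names = FALSE, stringsAsFactors = FALSE)

# Give the columns short, syntactic names
names(raw) <- c("patient_id", "age", "outcome")

# Drop rows with a missing age and store the outcome as a factor
my_dataset <- raw[!is.na(raw$age), ]
my_dataset$outcome <- factor(my_dataset$outcome)
rownames(my_dataset) <- NULL

str(my_dataset)
```

Keeping these steps in the script, rather than editing the CSV by hand, means the packaged data can be rebuilt from the raw file at any time.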


Step 2: Read and Save Data

Replace the contents of data-raw/my_dataset.R with something like:

# Read raw CSV
my_dataset <- read.csv("path/to/your.csv")

# Clean or rename variables here
# my_dataset <- dplyr::rename(my_dataset, new_col = old_col)

# Save it for package use
usethis::use_data(my_dataset, overwrite = TRUE)

After this runs, you’ll see:

data/
└── my_dataset.rda
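
An .rda file is an ordinary R data file of the kind written by save(). The sketch below mimics that step in a temporary directory with base R, to show why the object name and the file name line up. It is an illustration under stated assumptions (a toy data frame, a temporary folder), not the usethis implementation.

```r
# Toy stand-in for the cleaned dataset (hypothetical values)
my_dataset <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# Write it to <tempdir>/data/my_dataset.rda
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
rda_path <- file.path(data_dir, "my_dataset.rda")
save(my_dataset, file = rda_path, compress = "bzip2")

# load() restores the object under its original name and
# returns the names of everything it restored
restored <- new.env()
loaded_names <- load(rda_path, envir = restored)
loaded_names
all.equal(restored$my_dataset, my_dataset)
```

Because the name is stored inside the file, renaming the .rda by hand does not rename the object; it is safer to rename the object in the script and rerun it.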

Step 3: Load and Use the Data

After loading your package:

library(mypackage)
my_dataset

The dataset is ready for use, just like iris or mtcars, provided LazyData: true is set in DESCRIPTION (usethis::use_data() adds this for you).

The object name (my_dataset) is the same as the name in your R script and .rda file.
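
If you would rather not attach the package, the same objects can be reached with the :: operator, and data() can list what a package ships. The sketch below uses the built-in datasets package as a stand-in, because mypackage is only a placeholder; for your own package the equivalent would be mypackage::my_dataset, assuming lazy-loading is enabled.

```r
# Access a dataset without calling library(): package::object
head(datasets::mtcars, 3)

# List the datasets a package provides
available <- data(package = "datasets")$results[, "Item"]
head(available)
```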


Step 4: Document the Dataset

You document datasets with roxygen2, just like functions.

Create a new file, e.g. R/data-documentation.R:

#' Example dataset: my_dataset
#'
#' A dataset containing example data imported from a CSV.
#'
#' @format A data frame with 100 rows and 3 variables:
#' \describe{
#'   \item{col1}{Description of column 1}
#'   \item{col2}{Description of column 2}
#'   \item{col3}{Description of column 3}
#' }
#' @source path/to/your.csv
"my_dataset"

Then run:

devtools::document()

Now, after reloading the package (for example with devtools::load_all()), you can view:

?my_dataset

✅ You’ll see a proper help page for the data!
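
Typing out the @format block by hand is where row counts and column names tend to drift from the real data. One possible approach is to generate the skeleton from the data frame and then fill in the descriptions. The helper below, format_skeleton(), is hypothetical (it is not part of roxygen2 or usethis), and the toy data frame stands in for my_dataset.

```r
# Hypothetical helper: build a roxygen @format skeleton whose row and
# column counts come from the data itself.
format_skeleton <- function(df) {
  header <- sprintf("#' @format A data frame with %d rows and %d variables:",
                    nrow(df), ncol(df))
  items <- sprintf("#'   \\item{%s}{TODO: describe %s}", names(df), names(df))
  c(header, "#' \\describe{", items, "#' }")
}

# Toy data frame standing in for my_dataset
my_dataset <- data.frame(
  col1 = 1:4,
  col2 = letters[1:4],
  col3 = c(TRUE, FALSE, TRUE, FALSE)
)

# Print the skeleton so it can be pasted into R/data-documentation.R
writeLines(format_skeleton(my_dataset))
```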


Internal vs External Data

If you want to use a dataset inside your package but not expose it to users:

usethis::use_data(my_secret_data, internal = TRUE)

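This stores it in R/sysdata.rda. You can still use it inside your package code, but it is not exported, so users won’t see it after calling library(mypackage). All internal objects share this single file, so pass every internal object in one use_data() call (adding overwrite = TRUE when you update them).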
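
A typical use for internal data is a lookup table that an exported function relies on. The sketch below imitates that pattern in a plain script: my_secret_data is defined inline here, whereas in a real package it would come from R/sysdata.rda, and label_for() is a hypothetical function name.

```r
# Hypothetical internal lookup table. In a package this would be built in
# data-raw/ and saved with usethis::use_data(my_secret_data, internal = TRUE).
my_secret_data <- data.frame(
  code  = c("A", "B"),
  label = c("Alpha", "Beta"),
  stringsAsFactors = FALSE
)

# Hypothetical exported function: users call this and never touch the
# table directly. Unknown codes come back as NA.
label_for <- function(code) {
  my_secret_data$label[match(code, my_secret_data$code)]
}

label_for(c("B", "A", "Z"))
```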


Tips for Good Data Documentation

📋 When documenting data, always include:

  • What the dataset represents
  • The number of rows and columns
  • Descriptions for each variable (@format)
  • The source of the data
  • Any transformation or filtering applied

Use @examples too if it makes sense:

#' @examples
#' summary(my_dataset)
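
One way to keep the documented row and column counts from going stale is to assert them wherever the data is rebuilt, for instance at the end of the data-raw script. A minimal sketch in base R, using a toy data frame and invented expectations:

```r
# Toy data frame standing in for my_dataset
my_dataset <- data.frame(col1 = 1:4, col2 = letters[1:4])

# Expectations that should match what the help page claims (hypothetical)
expected_cols <- c("col1", "col2")
expected_rows <- 4L

# Stops with an error if the data no longer matches the documentation
stopifnot(
  identical(names(my_dataset), expected_cols),
  nrow(my_dataset) == expected_rows
)
```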

Coming Up Next

Now that you can include data and document it properly, let’s look at how to build, install, and share your package.

👉 Build and Install

Dr. Michael Lydeamore
Lecturer in Business Analytics
Monash University
© Copyright 2023 Michael Lydeamore

 
